Podcasts about HumblePod

  • 8 PODCASTS
  • 335 EPISODES
  • 31m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • Apr 30, 2024 LATEST

POPULARITY

(popularity chart, 2017–2024)


Best podcasts about HumblePod

Latest podcast episodes about HumblePod

Experiencing Data with Brian O'Neill
142 - Live Webinar Recording: My UI/UX Design Audit of a New Podcast Analytics Service w/ Chris Hill (CEO, Humblepod)


Apr 30, 2024 · 50:56


Welcome to a special edition of Experiencing Data. This episode is the audio capture from a live Crowdcast video webinar I gave on April 26th, 2024, where I conducted a mini UI/UX design audit of a new podcast analytics service that Chris Hill, CEO of Humblepod, is working on to help podcast hosts grow their shows. Humblepod is also the team behind the scenes of Experiencing Data, and Chris had asked me to take a look at his new “Listener Lifecycle” tool to see if we could find ways to improve the UX and visualizations in the tool, how we might productize this MVP in the future, and how improving the tool's design might help Chris show his prospective podcast clients how their listener data could help them grow their listenership and “true fans.”

On a personal note, it was fun to talk to Chris on the show given we speak every week: Humblepod has been my trusted resource for audio mixing, transcription, and show-note summarizing for probably over 100 of the most recent episodes of Experiencing Data. It was also fun to do a “live recording” with an audience—and we did answer questions in the full video version. (If you missed the invite, join my Insights mailing list to get notified of future free webinars.)

To watch the full audio and video recording on Crowdcast, free, head over to: https://www.crowdcast.io/c/podcast-analytics-ui-ux-design

Additional show notes, full transcription, and quotes for this episode are coming soon.

Screaming in the Cloud
A Beginner's Guide to Surviving AWS re:Invent with Chris Hill


Mar 7, 2024 · 28:01


Episode Summary

Corey Quinn is joined by HumblePod CEO Chris Hill to dissect Chris's debut experience at AWS re:Invent. Together, they tackle the challenges of attending one of the biggest conferences in the IT industry, discussing its immense reach, logistical hurdles, and invaluable insights for anyone considering attending in the future. Beyond the event itself, Chris provides an intimate glimpse into the crucial behind-the-scenes efforts involved in producing exceptional content amid the chaos of AWS re:Invent, emphasizing the importance of kindness, professionalism, and superior audio quality. Discover how partnering with an experienced podcast production team can elevate any content to new heights of polish and engagement.

Full Description / Show Notes

(00:00) - Introduction to the Episode
(01:25) - Chris's First Impressions of AWS re:Invent
(02:09) - The Surprising Scale of AWS re:Invent
(04:13) - Lessons Learned and Things Chris Would Do Differently at Future AWS re:Invent Events
(07:52) - Balancing Content Creation, Networking, and Professionalism Under Stress
(13:42) - Chris and Corey's Humorous Encounters with Security While Filming at AWS re:Invent
(15:35) - Exploring AWS Services and Billing Surprises
(21:12) - Significance of Professional Podcast Production
(25:04) - Closing Thoughts & HumblePod Contact Information
(26:19) - Closing Thoughts

About Chris

Chris Hill is a Knoxville, TN native and owner of the podcast production company, HumblePod. He helps his customers create, develop, and produce podcasts and is working with clients in Knoxville as well as startups and entrepreneurs across the United States, Silicon Valley, and the world.

In addition to producing podcasts for nationally recognized thought leaders, Chris is the co-host and producer of the award-winning Our Humble Beer Podcast. He also lectures at the University of Tennessee, where he leads non-credit courses on podcasts and marketing.

He received his undergraduate degree in business at the University of Tennessee at Chattanooga, where he majored in Marketing & Entrepreneurship, and he later received his MBA from King University. Chris currently serves his community as the President of the American Marketing Association in Knoxville. In his spare time, he enjoys hanging out with the local craft beer community, international travel, exploring the great outdoors, and his many creative pursuits.

Links:

HumblePod: https://www.humblepod.com/
Twitter: https://twitter.com/HumblePod
LinkedIn: https://www.linkedin.com/in/chrisdhill1/
TikTok: https://www.tiktok.com/@thechristopholies
WBTB TikTok: https://www.tiktok.com/@webuiltthisbrand
HumblePod IG: https://www.instagram.com/humblepod/?hl=en

Screaming in the Cloud
How Tech Will Influence the Future of Podcasting with Chris Hill


Oct 31, 2023 · 34:35


Chris Hill, owner of HumblePod and host of the We Built This Brand podcast, joins Corey on Screaming in the Cloud to discuss the future of podcasting and the role emerging technologies will play in the podcasting space. Chris describes why AI is struggling to make a big impact in the world of podcasting, and also emphasizes the importance of authenticity and finding a niche when producing a show. Corey and Chris discuss where video podcasting works and where it doesn't, and why it's more important to focus on the content of your podcast than the technical specs of your gear. Chris also shares insight on how to gauge the health of your podcast audience with his Podcast Listener Lifecycle evaluation tool.

About Chris

Chris Hill is a Knoxville, TN native and owner of the podcast production company, HumblePod. He helps his customers create, develop, and produce podcasts and is working with clients in Knoxville as well as startups and entrepreneurs across the United States, Silicon Valley, and the world.

In addition to producing podcasts for nationally recognized thought leaders, Chris is the co-host and producer of the award-winning Our Humble Beer Podcast and the host of the newly launched We Built This Brand podcast. He also lectures at the University of Tennessee, where he leads non-credit courses on podcasts and marketing. He received his undergraduate degree in business at the University of Tennessee at Chattanooga, where he majored in Marketing & Entrepreneurship, and he later received his MBA from King University.

Chris currently serves his community as the President of the American Marketing Association in Knoxville.
In his spare time, he enjoys hanging out with the local craft beer community, international travel, exploring the great outdoors, and his many creative pursuits.

Links Referenced:

HumblePod: https://www.humblepod.com/
HumblePod Quick Edit: https://humblepod.com/services/quick-edit
Podcast Listener Lifecycle: https://www.humblepod.com/podcast/grow-your-podcast-with-the-listener-lifecycle/
Twitter: https://twitter.com/christopholies

Transcript:

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Are you navigating the complex web of API management, microservices, and Kubernetes in your organization? Solo.io is here to be your guide to connectivity in the cloud-native universe!

Solo.io, the powerhouse behind Istio, is revolutionizing cloud-native application networking. They brought you Gloo Gateway, the lightweight and ultra-fast gateway built for modern API management, and Gloo Mesh Core, a necessary step to secure, support, and operate your Istio environment.

Why struggle with the nuts and bolts of infrastructure when you can focus on what truly matters: your application. Solo.io's got your back with networking for applications, not infrastructure. Embrace zero trust security, GitOps automation, and seamless multi-cloud networking, all with Solo.io.

And here's the real game-changer: a common interface for every connection, in every direction, all with one API. It's the future of connectivity, and it's called Gloo by Solo.io.

DevOps and Platform Engineers, your journey to a seamless cloud-native experience starts here. Visit solo.io/screaminginthecloud today and level up your networking game.

Corey: Welcome to Screaming in the Cloud.
I'm Corey Quinn. My returning guest probably knows more about this podcast than I do. Chris Hill is not only the CEO of HumblePod, but he's also the producer of a lot of my various media endeavors, ranging from the psychotic music videos that I wind up putting out to mock executives on their birthdays, to more normal videos that I wind up recording when I'm forced into the studio and can't escape because they bar the back exits, to this show. Chris, thank you for joining me; it's nice to see you step into the light.

Chris: It's a pleasure to be here, Corey.

Corey: So, you have been, effectively, producing this entire podcast after I migrated off of a previous vendor, what, four years ago? Five?

Chris: About four or five years ago now, yeah. It's been a while.

Corey: Time is a flat circle. It's hard to keep track of all of that. But it's weird that you and I don't get to talk nearly as much as we used to, just because, frankly, the process is working and therefore, you disappear into the background.

Chris: Yeah.

Corey: One of the dangerous parts of that is that the only time I ever wind up talking to you is when something has gone wrong somewhere and frankly, that does not happen anymore. Which means we don't talk.

Chris: Yeah. And I'm okay with that. I'm just kidding. I love talking to you, Corey.

Corey: Oh, I tolerate you. And every once in a while, you irritate me massively, which is why I'm punishing you this year by—

Chris: [laugh].

Corey: Making you tag along for re:Invent.

Chris: I'm really excited about that one. It's going to be fun to be there with you and Jeremy and Mike and everybody. Looking forward to it.

Corey: You know how I can tell that you've never been to re:Invent before?

Chris: “I'm looking forward to it.”

Corey: Exactly. You still have life in your eyes and a spark in your step. And yeah… that'll change. That'll change.
So, a lot of this show is indirectly your fault because this is a weird thing for a podcaster to admit, but I genuinely don't listen to podcasts. I did when I was younger, back when I had what the kids today call ‘commute' or ‘RTO' as they start slipping into the office, but I started working from home almost a decade ago, and there aren't too many podcasts that fit into the walk from the kitchen to my home office. Like great, give me everything you want me to know in about three-and-a-half seconds. Go… and we're done. It doesn't work. So, I'm a producer, but I don't consume my own content, which I think generally is something you only otherwise see in, you know, drug dealers.

Chris: Yeah. Well, and I mean, I think a lot of professional media, like, you get to a point where you're so busy and you're creating so much content that it's hard to sit down and review your own stuff. I mean, even at HumblePod, I'm in a place where we're producing our own show now called We Built This Brand, and I end up in a place where some weeks I'm like, “I can't review this. I approve it. You send it out, I trust you.” So, Corey, I'm starting to echo you in a lot of ways and it's just—it makes me laugh from time to time.

Corey: Somewhat recently, I wound up yet again having to do a check on, “Hey, you use HumblePod for your podcasting work. Do you like them?” And it's fun. It's almost like when someone reaches out about someone you used to work with. Like, “We're debating hiring this person. Should we?” And I love being able to give the default response for the people I've worked with for this long, which is, “Shut up and hire them. Why are you talking to me and not hiring them faster? Get on with it.”

Because I'm a difficult customer. I know that. The expectations I have are at times unreasonably high. And the fact that I don't talk to you nearly as much as I used to shows that this all has been working.
Because there was a time we talked multiple times a day back—

Chris: Mm-hm.

Corey: When I had no idea what I was doing. Now, 500-some-odd episodes in, I still have no idea what I'm doing, but by God, I've gotten it down to a science.

Chris: Absolutely you have. And you know, technically we're over 1000 episodes together, I think, at this point because if you combine what you're doing with Screaming in the Cloud, with Last Week in AWS slash AWS Morning Brief, yeah, we've done a lot with you. But yes, you've come a long way.

Corey: Yes, I have become the very whitest of guys. It works out well. It's like, one podcast isn't enough. We're going to have two of them. But it's easy to talk about the past. Let's talk instead about the future a little bit. What does the future of podcasting look like? I mean, one easy direction to go in with this, as you just mentioned, there's over 1000 episodes of me flapping my gums in the breeze. That feels like it's more than enough data to train an AI model to basically be me without all the hard work, but somehow I kind of don't see it happening anytime soon.

Chris: Yeah, I think listeners still value authenticity a lot and I think that's one of the hard things you're seeing in podcasting as a whole is that these organizations come in and they're like, “We're going to be the new podcast killer,” or, “We're going to be the next thing for podcasting,” and if it's too overproduced, too polished, like, I think people can detect that and see that inauthenticity, which is why, like, AI coming in and taking over people's voices is so crazy. One of the things that's happening right now at Spotify is that they are beta testing translation software so that Screaming in the Cloud could automatically be in Spanish or Last Week in AWS could automatically be in French or what have you. It's just so surreal to me that they're doing this, but they're doing exactly what you said.
It's language learning models that understand what the host is saying and then they're translating it into another language.

The problem is, what if that automation gets that word wrong? You know how bad one wrong word could be, translating from Spanish or French or any other language from English. So, there's a lot of challenges to be met there. And then, of course, you know, once they've got your voice, what do they do with it? There's a lot of risk there.

Corey: The puns don't translate very well, most of the time, either.

Chris: Oh, yes.

Corey: Especially when I intentionally mispronounce words like Ku-BER-netees.

Chris: Exactly. I mean, it's going to be auto-translated into text at some point before it's then put out as, you know, an audio source, and so if you say something wrong, it's going to be an issue. And Ku-BER-netees or Chat-Gippity or any of those great terms that you have, they're going to also be translated wrong as well, and that creates its own can of worms, so to speak.

Corey: Well, let me ask you something because you have always been one to embrace emerging technologies. It's one of the things I appreciate about you; you generally don't recommend solutions from the Dark Ages when it comes to what equipment should I have and how should I maintain it and the rest. But there are a lot of services out there that will now do automatic transcription, and the service that you use at the moment remains a woman named Cecilia, who's remarkably good at what she does. But why have you not replaced her with a robot?

Chris: [laugh]. Very simply put, I mean, it kind of goes back to what I was just saying about language translation. AI does not understand context for human words as well as humans do, and so words are wrong a lot of times in auto transcription.
I mean, I can remember a time when, you know, we first started working with you all where, if there was one thing wrong in a transcript, an executive at AWS would potentially make fun of you on Twitter for it. And so, we knew we had to be on our A-game when it came to that, so finding someone who had that niche expertise of being able to translate not just words and understand words, but also understand tech terminology, you know, I think that that's its own animal and its own challenge. So yeah, I mean, you could easily get away with something—

Corey: Especially with my intentional mispronunciation where she's, “I don't quite know what you're saying here, and neither does the entire rest of the industry.” Like, “Postgres-squ—do you mean Postgres? Who the hell calls it Postgres-squeal?” I do. I call it that. Two warring pronunciations, I will unify them by coming up with a third that is far worse. It's kind of my shtick. The problem is, at some point, it becomes too inside-jokey when I have 15 words that I'm doing that to, and suddenly no one knows what the hell I'm talking about and the joke gets old quickly.

Chris: Yep.

Corey: So, I've tried to scale that back. But there are still a few that I… I can't help but play with.

Chris: Yeah. And it's always fun bringing someone new in to work with you all because they're always like, “What is he saying? Does he mean this?” And [laugh] it's always an adventure.

Corey: It keeps life fun though.

Chris: Absolutely.

Corey: So, one thing that you did for a while, back when I was starting out, it almost felt like you were in cahoots with Big Microphone because once I would wind up getting a setup all working and ready for the recording, like, “Great. Everything working terrifically? Cool, throw it away. It's time for generation three of this.” I think I'm on, like, gen six or gen seven now, but it's been relatively static for the past few years. Are the checks not as big as they used to be?
I mean, have we hit a point of equilibrium? What's going on?

Chris: Yeah, unfortunately, Big Microphone isn't paying what they used to. The economy and interest rates and all that, it's just making it hard. But once you get to a certain level of gear, it's going to be more important that you have good content than better and better gear. Could we keep going? Sure. If you wanted to buy a studio and you wanted to get Neumann microphones or something like that, we could keep going. But again, Big Microphone is not paying what they used to.

Corey: When people reach out because they're debating starting a podcast and they ask me for advice, other than hire HumblePod, the next question they usually get around to is gear. And I don't think that they are expecting my answer, which is: it does not matter. Because if the content is good, the listeners will forgive an awful lot. You could record it into your iPhone in a quiet room and they will put up with that. Whereas if the content isn't good, it doesn't matter what the production value is because people are constantly being offered better things to do with their time. You've got to grab them, you have to be compelling to your target audience, or the rest of it does not matter.

Chris: Yeah. And I think that's the big challenge with audio is a lot of people get excited, and I find this especially true of people in the tech industry, of like, “Okay, I want to learn all the tech stuff, I love all the cool tech stuff, and so I'm going to go out and buy all this equipment first.” And then they spend $5,000 on equipment and they never record a single episode because they put all their time and energy into researching and buying gear and never thought about the content of the show. The truth is, you could start with your iPhone and that's it.
And while I don't necessarily advise that, you'd be surprised at the quality of audio on an iPhone. I've had a client have to re-record something while they were traveling remotely and I said, “You just need to get your iPhone out.” They took their AirPods and plugged them in, and I said, “No. Take them out, use the microphone on the iPhone.” And you can start with something as simple as that. Now, once you want to start making it better, sure, that's a great way to grow and that does influence people staying with your podcast over time, but I think in the long run, content trumps all.

Corey: One of the problems I keep seeing is that people also want to record a podcast because they have a great idea for a few episodes. My rule of thumb—because I've gotten this wrong before—is, okay, if you want to do a whole new podcast, come up with the first 12 episodes. Because two, three, four, of course, you've got your ideas. And then by the—you'll find in many cases, you're going to have a problem by the end of it.

Years ago, I did a mini-series inside of AMB called “Networking in the Cloud” where it was sponsored by, at the time, ThousandEyes, before Cisco bought them and froze them in amber for all eternity. But it was fun for the first six episodes and then I realized I'd said all I needed to say about networking, and I was on the hook for six more. And Ivan Pepelnjak, who's his own influencer type in the BGP IP space, was like, “This is why you should stay in your lane. He's terrible. He got it all wrong.” Like, “Great. Come on and tell me exactly how I got it wrong,” because I was trying to approach it from a very surface topical area, but BGP is one of those areas where I get very wrapped around my own axle just because I've never used it in anger. Being able to pivot the show format is what saved me on that.
But if I had started doing this as its own individual podcast and launched, it would have died on the vine, just because it would not have had enough staying power and I didn't have the interest to continue working on it. Could someone else come up with a networking-in-the-cloud podcast that had hundreds of episodes? Absolutely, but those people are what we call competent and good at things in a way that I very much am not.

Chris: Yep. And I completely agree. I mean, 12 is my default number, so—I'm not going to take credit for your saying 12, but I know we've talked about that before. And—

Corey: It was a 12-episode miniseries is why. And I remember by ten, I had completely scraped the bottom of the barrel. Then Ivan saved me on one of them, and then I did, I think, a mini-series-in-review, which is cheating but worked.

Chris: Yeah. I remember that, the trials and travails of getting that out. It was fun, though. But with that, yeah, like, 12 is a good number because, like, to your point, if you have 12 and you want to do a monthly show, you've got a year's worth of content; if you do bi-weekly, that's six months; and if it's a weekly show, it's at least a quarter's worth of content. So, it does help you think through and at least come up with any potential roadblocks you might have by at least listing out: here's what episodes one, two, three, four, five, and so on would be. And so, I do think that's a great approach.

Corey: And don't be an idiot like I was and launch a newsletter and then podcast that focus on last week's news, because you can't work ahead on that. If you can, why are you not a multi-billionaire for playing the markets? If you can predict the future, there's a more lucrative career for you than podcasting, I promise. But that means that I have to be on the treadmill on some level. I've gotten it down to a point where I can stretch it to ten days.
I can take ten days off if I preload, do it as early as I possibly can beforehand and then as late as I possibly can when I return. Anything more than that, I'm either skipping a week or delaying the show or have to get a guest author or artist in.

Chris: Yeah. And you definitely need that time off, and so that's the one big challenge, I think, with podcasting, too, is like you create this treadmill for yourself that you constantly have to fill, content after content after content. I think that's one of the big challenges in podcasting and one of the reasons we see so many podcasts fade out. I don't know if you're familiar, but there is a term called podfade, which is just that: people burning out, fading out in their excitement for a podcast. And most podcasters fade out by episode seven or eight, somewhere in that range, so to see someone go for, say, like, you have 500 episodes plus, we're talking about a ton of good content. You've found your rhythm, you've found your groove. That can do it. But yeah, it's always, always a challenge staying motivated.

Corey: One thing that consistently surprises me is that the things I care about as the creator and the things the audience cares about are not the same. And you have to be respectful of your audience's time. I've done the numbers on the shows that I put out and it's something on the order of over a year of human time for every episode that I put out. If I'm going to take a year from humanity's collective lifetimes in order to say my inane thoughts, then I have to be respectful of the audience's time. Which means, “Oh, I'm going to have a robot do it so I don't have to put the work in.” It doesn't work that way. That's not how you sustain.

Chris: Right. And again, it takes out that humanity that makes podcasting so special and makes that connection with even the listener so special. And I'm sure you've experienced this too.
When you go to re:Invent, like we're going to have here in just a few short months, people know you, and they probably say things and bring up things that you haven't even thought about. And you're like, “Where did you even learn that I did that?” And then you realize, “Oh, I said that on a podcast episode.”

Corey: Yeah. What's weird is I don't get much feedback online for it, but people will talk to me in depth about the show. They'll come up to me near constantly and talk about it. They don't reach out the same way, which I guess makes sense. There are a couple of podcasts that I've really admired and listened to on and off in the car for years, but I've never reached out to the creators because I feel like I would sound ridiculous. It's not true. I know intellectually it's not true, but it feels weird to do it.

Chris: One of the ways I got into podcasting was a podcast that just invited me to—you know, invited their listeners to sign up and engage with them. And I think that's something in the medium that does make it interesting is once you do engage, you find out that these creators respond. And where else do you get that, you know? If you're watching a big TV show and you tweet at somebody online that you admire in the show, the chance of them even liking what you said about them online is very slim to none. But with podcasting, there's just a different level of accessibility I find with most productions and most shows that makes it really something special.

Corey: One thing that still surprises me—and I don't think I've ever been this explicit about it on the show, but why the hell not, I have nothing to hide—Thursday evening, 5 p.m. Pacific time. That's when the automation fires and rotates everything for the newsletter and the AWS Morning Brief. Anything that comes in after that, unless I manually do an override, will not be in the next week's issue; it'll be the week after.

That applies to Security as well, which means 5 p.m.
on Thursday, it seals it, I write and record it, and it goes ou—that particular one goes out Thursday morning the following week. And no one has ever said anything about this seems awfully late. Occasionally, there's been news the day before and someone said, “Oh, why didn't you include this?”

And it's because, believe it or not, I don't just type this in and hit the send button. There's a bit more to it than that these days. But people don't need the sense of immediacy. This idea of striving to be first is not sustainable and it leads to terrible outcomes. My entire philosophy has not been to have the first take but rather the best take.

Chris: Mm-hm.

Corey: Sometimes I even get it right.

Chris: And I mean, in podcasting, too, like, it's about, you serve a certain niche, right? Like, the people who are interested in AWS services and in this world of cloud computing listen to what you say, listen to the people you interview, and really enjoy those conversations. But that's not everybody in the world. That's not a very broad audience. And so, I think that those niches really serve a purpose.

And the way I've always thought about it is, like, if you go to the grocery store, you know how you always have that rack of magazines with the most random interests? That's essentially what podcasting is. It's like each podcast is a different magazine that serves someone's random—and hyper-specific sometimes—niche interest in things. I mean, the number of things you can find podcasts on is just ridiculous. And I think the same is true for this. But the people who do follow, they're very serious, they're very dedicated, they do listen, and yeah, I think it's just a fascinating, fascinating thing.

Corey: The way that I see it has been that I've been learning more from the audience and the things that people say than most people would believe. I make a lot of mistakes doing this, but talking to people does tend to shine a light on a lot of this. But enough about the past.
Most of my episodes are about things that have previously happened. What does the future of podcasting look like? Where's it going from here?

Chris: Oh, man. Well, I think the big question on everybody's mind is, do I need a video podcast? And I think that for most people, that's where the big question lies right now. I get a lot of questions about it, I get people reaching out, and I think the short answer to that is… not really. Or to answer a question I know you love, Corey: it depends.

And the reason for that is, there's a lot with the tech of podcasting that just isn't going to distribute to everywhere, all at once anymore. The beauty of podcasting is that it's all based on an RSS feed. If you build an RSS feed and you put it in Apple Podcasts and Spotify, that RSS feed will distribute everywhere and it will distribute your audio everywhere. And what we see happening right now, and really one of the bigger challenges in podcasting, is that the RSS feed only provides audio. Technically, that's not accurate, but it does for most services.

So, YouTube has recently come out and said that they are going to start integrating RSS feeds, so you'll be able to do those audiogram-esque things that a lot of people have done through apps like Headliner and stuff for a long time, or even their podcast host may automatically translate a version of their audio podcast into a video and just do, like, a waveform. They're going to have that in YouTube. TikTok is taking a similar approach. And they're both importing just the audio. And the reason I said earlier that's technically not accurate is because RSS feeds can also support MP4s, but neither service is going to accept that or ingest it directly into their service from what you provide outbound.

So, it's a very interesting time because it feels like we're getting there with video, but we're still not there, and we're still probably several years off from it.
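Chris's point about RSS here can be sketched in a few lines. A podcast feed carries its media as `<enclosure>` elements, and the RSS 2.0 spec is media-agnostic about the `type` attribute, but an audio-only ingester simply skips anything that isn't `audio/*`. This is a hedged illustration: the feed below is invented for the example, and real directory ingestion is far more involved.

```python
# Sketch: why a feed's video/mp4 enclosure gets dropped by an
# audio-only directory, even though RSS itself allows it.
# The feed is a made-up example, not a real HumblePod feed.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """\
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1 (audio)</title>
      <enclosure url="https://example.com/ep1.mp3"
                 type="audio/mpeg" length="12345678"/>
    </item>
    <item>
      <title>Episode 2 (video)</title>
      <enclosure url="https://example.com/ep2.mp4"
                 type="video/mp4" length="98765432"/>
    </item>
  </channel>
</rss>
"""

def enclosures(feed_xml):
    """Return (episode title, media URL, MIME type) for every item."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enc = item.find("enclosure")
        if enc is not None:
            found.append((item.findtext("title"),
                          enc.get("url"),
                          enc.get("type")))
    return found

def audio_only(feed_xml):
    """What an audio-only ingester would keep from the same feed."""
    return [e for e in enclosures(feed_xml) if e[2].startswith("audio/")]
```

Here `audio_only(SAMPLE_FEED)` keeps only the `audio/mpeg` episode, which mirrors the behavior described above: the feed advertises both media types outbound, but only the audio survives ingestion.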
So, there's a lot of interest in video and I think the future is going to be video, but I think it's going to be a combination, too, with audio, because who wants to sit and watch something for an hour and a half when you're used to listening to it on your commute or while you do the dishes or any number of other things that don't involve having your eyeballs directly on the content?

Corey: We've tried it with this show. I found that it made the recording process a bit more onerous because everyone is suddenly freaking out about how they look and I can't indulge my constant nose-picking habit. Kidding. So, it was more work, I had to gussy myself up a bit more than dressing like a slob like I do some mornings because I do have young children and a deadline to get them to school by. But I never saw the audience materialize there and be worth it. Because watching a video of two people talking with each other feels too much like a Zoom call that you can't participate in, so what's the point?

Chris: Right.

Corey: So, there's that. There's the fact that I also have very intentionally built most of what I do around newsletters and podcasts because, at least so far, those are not dependent upon algorithmic discovery in the same way. I don't have to bias for things that YouTube likes this month. Instead, I can focus on the content that people originally signed up to hear me put out and I don't have to worry about it in the same way. Email predates me, it'll be here long after I'm gone, and that seems to make sense.

I also look at how I have consumed podcasts, and the times when I do, it's almost always while I'm doing something else. And if I have to watch a screen, that becomes significantly more distracting and harder for me to find the time to do it with.

Chris: I think what you're seeing is that, like, there's some avenues where video podcasting is really good and really interesting, and I think the real place where that works best right now is in-person interviews.
So, Corey, if you went out and interviewed Andy Jassy in person in Seattle, that to me would be something that would warrant bringing the cameras out for and putting online because people would want to see you in the office interacting with him. That would be interesting. To your point, during the Zoom calls and things like that, you end up in a place where people just aren't as interested in sitting and watching the Zoom call. And I think that's something that is a clear distinction to make.

Entertainment, comedy, doing things in person, I think that's where the real interest in video is and that's why I don't think video will be for everybody all the time. The thing that is starting to come up as well is discoverability, and that has always been a challenge, but as we get into—and we probably don't want to go down this rabbit hole, but you know, what's happened to Twitter and X, like, discoverability is becoming more of a challenge because they're limiting access to that platform. They're limiting discoverability if you're not willing to pay for a blue checkmark. They're doing all these things to make it harder for small independent podcasts to grow.

And the places that are opening up for it to grow are places like YouTube, places like TikTok, that have the ability to not only just put your full podcasts online now, but you can actually do, like, YouTube shorts or highlighted clips, and directly link those back to the long-form content that you're producing. So, there is some value in it, there is a technology and a future there for it, but it's just a very complicated time to be in podcasting and figuring out where to go to grow.
That's probably the biggest challenge that we face and I think ultimately, that just comes down to developing an audience outside of these social media channels.

Corey: One thing that you were talking about a while back in a conversation that I don't think I've ever followed up with you on—and there's no time like in front of a bunch of public people to do that—

Chris: [laugh].

Corey: You were talking to me about something that you were calling the Podcast Listener Lifecycle.

Chris: Yes.

Corey: What's your point on that?

Chris: So, the Listener Lifecycle is something I developed, just to be frank, working with you guys, learning from you all, and also my background in marketing, and in building audiences and things, from my own podcasts and other things that I did prior to building HumblePod, led me to a place of going, how can we best explain to a client where their podcast is? How does it exist? Where does it exist? All that good stuff. And basically, the Listener Lifecycle is just that.

It's a design—and we'll have links to it because I actually did a whole podcast season on the Listener Lifecycle from beginning to end, so that's probably the easiest way to talk about it. But essentially, it's the idea of, you're curious about a show, and how do you go from being curious about a show to exploring a podcast, to then becoming a follower of the podcast, literally clicking the Follow button. What does it take to get through each one of those stages? How can you identify where your audience is? And basically, it's a tool you can use to say, “Well, this is where my listener is in the stages.” And then once they get to be a follower, how do I build them into something more?

Well, get them to be a subscriber, subscribe to a newsletter, subscribe to a Patreon or Substack or whatever that subscription service is that you prefer to use, and get them off of just being on social media and following you there and following you in a podcast audio form.
Because things can happen: your podcast host could break and you'd lose your audience, right? We've seen Twitter, which we may have thought years ago that it would never go away, and now we don't know how long it's going to be there. It could be gone by the time we're done with this conversation for all we know. I've got all my notifications turned off, so we're basically in a liminal space at this point.

But with that said, there's a lot of risk in audiences changing and things like that, so audience portability is really important. So, the more you can collect email addresses, collect contact information, and communicate with that group of people, the better your audience is going to be. And so, that's what it's about is helping people get to that stage where they can do that so that they don't lose audiences and so that they can even build and grow audiences beyond that to the point where they get to the last phase, which is the ‘true fan' phase. And that's where you get people who love your show, retweet everything you do, repost everything you do, and share it with all their friends every time you're creating new content. And that's ultimately what you want: those die-hard people that come up to you and know everything about you at re:Invent, those are the people that you want to create more of because they're going to help you grow your show and your audience, ultimately. So, that's what it's about. I know that's a lot. But again, like, we'll have a link in the show notes to where you can learn more about it.

Corey: Indeed, we will. Normally I'm the one that says, “And we'll include a link to that in the show notes.” But you're the one that has to actually make all that happen. Here's another glimpse behind the curtain. I have a Calendly link that I pass out to people to book time on the show.
They fill out the form, which is relatively straightforward and low effort by design, and the next time I think about it is ten minutes beforehand when it pops up with, “Hey, you have a recording to go to.” Great. I book an hour for a half-hour recording. I wind up going through this entire conversation. When we're done, we close out the episode, we chat a bit, I close the tab, and I don't think about it again, it's passed off to you folks entirely. It is the very whitest of white glove treatments. Because I, once again, am the very whitest of white guys.

Chris: We aim to please [laugh].

Corey: Exactly. Because I remember before this, I used to have things delayed by months because I would forget to copy the freaking file into Dropbox, of all things. And that was just wild to me.

Chris: And we stay on you about that because we want to make sure that your show gets out and—

Corey: And now it automatically transfers and I—when the automation works—I don't have to think about it again. What is fun to me is despite all the time that I spend in enterprise cloud services, we still use things that are prosumer, like Dropbox and other things that are human-centric because for some reason, most of your team are not also highly competent cloud developers. And I still think it is such a miss that something like S3, which would be perfect for this, requires that level of engineering. And I have more self-respect than that. I'd have to build some stuff in order to make that work effectively on my end, let alone folks who have actual jobs that don't involve messing around with cloud services all day.

But it blows my mind that there's still such a gulf between things that sound like you would have one of your aging parents deal with versus something that is extraordinarily capable and state-of-the-art.
I know they're launching a bunch of things like Amazon's IVS, which is a streaming offering, a lot of their Elemental offerings for media packaging, but I look at it, it's like wow, not only is this expensive, it doesn't solve any problems that we actually have and would add significant extra steps to every part of it. Thanks, but no thanks. And sure, maybe we're not the target market, but I can't shake the feeling that there are an awful lot of people like us that fit that profile.

Chris: Yeah. And I mean, you bring up a good point about not using S3, things like that. It has occurred to me as well that, hey, maybe we should find somebody to help us develop a technology like this to make it easier on us on the back end to do all the recording and the production in one place, one database, and be able to move on. So, at some point I would love to get there. That's probably a conversation for after the podcast, Corey, but definitely is something that we've been thinking about at HumblePod is, how do we reach that next step of making it even easier on our clients?

Corey: Well, it is certainly appreciated. But again, remember, your task is to continue to perform the service excellently, not be the poster child for cloud services with dumb names.

Chris: [laugh]. Yes, yes. And I'm sure we could come up with a bunch.

Corey: One last question before we wind up calling it an episode. I know that I've been emphasizing the white glove treatment that I get—and let's be clear, you are not inexpensive, but you're also well worth it; you deliver value extraordinarily for our needs—do you offer things that are not quite as, we'll call it, high-touch and comprehensive?

Chris: Yes, we do actually. We just recently launched a new service called Quick Edit and it's just that. It's still humans touching the service, so it's not a bunch of automated, hey, we're just running this through an AI program and it's going to spit it out on the other end.
We actually have a human that touches your audio, cleans it up, and sends it back. And yeah, we're there to make sure that we can clean things up quickly and easily and affordably for those folks that are just in a pinch.

Maybe you edit most weeks and you're just tired of doing the editing, maybe you're close to podfading and you just want an extra boost to see if you can keep the show going. That's what we have the Quick Edit service for. And that starts at $150 an episode and we'll edit up to 45 minutes of audio for you within that. And yeah, there's some other options available as well if you start to add more stuff, but just come check us out. You can go to humblepod.com/services/quick-edit and find that there.

Corey: And we will, of course, put links to that in the show notes. Or at least you will. I certainly won't.

Chris: [laugh].

Corey: Chris, thank you so much for taking the time to speak with me. If people want to learn more, other than hunting you down at re:Invent, which they absolutely should do, where's the best place for them to find you?

Chris: I mean, @HumblePod anywhere is the quickest, easiest way to find me anywhere—or at least find the business—and you can find me at @christopholies. And we'll have a link to that in the show notes for sure because it's not worth spelling out on the podcast.

Corey: I would have pronounced it chris-to-files, but that's all right. That's how it works.

Chris: [laugh].

Corey: Thank you so much, Chris, for everything that you do, as well as suffering my nonsensical slings and arrows for the last half hour. We'll talk soon.

Chris: You're welcome, Corey.

Corey: Chris Hill, CEO at HumblePod. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this episode, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that I'm sure Chris or one of his colleagues will spend time hunting down from all corners of the internet to put into a delightful report, which I will then never read.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Screaming in the Cloud
The Role of DevRel at Google with Richard Seroter

Aug 8, 2023 · 34:07


Richard Seroter, Director of Outbound Product Management at Google, joins Corey on Screaming in the Cloud to discuss what's new at Google. Corey and Richard discuss how AI can move from a novelty to truly providing value, as well as the importance of people maintaining their skills and abilities rather than using AI as a black box solution. Richard also discusses how he views the DevRel function, and why he feels it's so critical to communicate expectations for product launches with customers.

About Richard

Richard Seroter is Director of Outbound Product Management at Google Cloud. He's also an instructor at Pluralsight, a frequent public speaker, and the author of multiple books on software design and development. Richard maintains a regularly updated blog (seroter.com) on topics of architecture and solution design and can be found on Twitter as @rseroter.

Links Referenced:

Google Cloud: https://cloud.google.com
Personal website: https://seroter.com
Twitter: https://twitter.com/rseroter
LinkedIn: https://www.linkedin.com/in/seroter/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Human-scale teams use Tailscale to build trusted networks. Tailscale Funnel is a great way to share a local service with your team for collaboration, testing, and experimentation. Funnel securely exposes your dev environment at a stable URL, complete with auto-provisioned TLS certificates. Use it from the command line or the new VS Code extensions. In a few keystrokes, you can securely expose a local port to the internet, right from the IDE.

I did this in a talk I gave at Tailscale Up, their inaugural developer conference.
I used it to present my slides and only revealed that that's what I was doing at the end of it. It's awesome, it works! Check it out!

Their free plan now includes 3 users & 100 devices. Try it at snark.cloud/tailscalescream

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. We have returning guest Richard Seroter here who has apparently been collecting words to add to his job title over the years that we've been talking to him. Richard, you are now the Director of Product Management and Developer Relations at Google Cloud. Do I have all those words in the correct order and I haven't forgotten any along the way?

Richard: I think that's all right. I think my first job was at Andersen Consulting as an analyst, so my goal is to really just add more words to whatever these titles—

Corey: It's an adjective collection, really. That's what a career turns into. It's really the length of a career and success is measured not by accomplishments but by word count on your resume.

Richard: If your business card requires a comma, success.

Corey: So, it's been about a year or so since we last chatted here. What have you been up to?

Richard: Yeah, plenty of things here, still, at Google Cloud as we took on developer relations. But you know, Google Cloud proper, I think AI has—I don't know if you've noticed, AI has kind of taken off, with some folks spending a lot of the last year… juicing up services and getting things ready there. And you know, myself and the team kind of remaking DevRel for a 2023 sort of worldview. So, yeah, we spent the last year just scaling and growing and covering some new areas like AI, which has been fun.

Corey: You became profitable, which is awesome. I imagined at some point, someone wound up, like, basically realizing that you need to, like, patch the hole in the pipe and suddenly the water bill is no longer $8 billion a quarter. And hey, that works super well. Like, wow, that explains our utility bill and a few other things as well.
I imagine the actual cause is slightly more complex than that, but I am a simple creature.

Richard: Yeah. I think we made more than YouTube last quarter, which was a good milestone when you think of—I don't think anybody who says Google Cloud is a fun side project of Google is talking seriously anymore.

Corey: I misunderstood you at first. I thought you said that you're pretty sure you made more than I did last year. It's like, well, yes, if a multi-billion dollar company's hyperscale cloud doesn't make more than I personally do, then I have many questions. And if I make more than that, I have a bunch of different questions, all of which could be terrifying to someone.

Richard: You're killing it. Yeah.

Corey: I'm working on it. So, over the last year, another trend that's emerged has been a pivot away—thankfully—from all of the Web3 nonsense and instead embracing the sprinkle some AI on it. And I'm not—people are about to listen to this and think, wait a minute, is he subtweeting my company? No, I'm subtweeting everyone's company because it seems to be a universal phenomenon. What's your take on it?

Richard: I mean, it's countercultural now to not start every conversation with let me tell you about our AI story. And hopefully, we're going to get past this cycle. I think the AI stuff is here to stay. This does not feel like a hype trend to me overall. Like, this is legit tech with real user interest. I think that's awesome.

I don't think a year from now, we're going to be competing over who has the biggest model anymore. Nobody cares. I don't know if we're going to hopefully lead with AI the same way as much as, what is it doing for me? What is my experience? Is it better? Can I do this job better? Did you eliminate this complex piece of toil from my day two stuff? That's what we should be talking about. But right now it's new and it's interesting.
So, we all have to rub some AI on it.

Corey: I think that there is also a bit of a passing of the buck going on when it comes to AI where I've talked to companies that are super excited about how they have this new AI story that's going to be great. And, “Well, what does it do?” “It lets you query our interface to get an answer.” Okay, is this just cover for being bad UX?

Richard: [laugh]. That can be true in some cases. In other cases, this will fix UXes that will always be hard. Like, do we need to keep changing… I don't know, I'm sure if you and I go to our favorite cloud providers and go through their documentation, it's hard to have docs for 200 services and millions of pages. Maybe AI will fix some of that and make it easier to discover stuff.

So in some cases, UIs are just hard at scale. But yes, I think in some cases, this papers over other things not happening by just rubbing some AI on it. Hopefully, for most everybody else, it's actually interesting, new value. But yeah, that's a… every week it's a new press release from somebody saying they're about to launch some AI stuff. I don't know how any normal human is keeping up with it.

Corey: I certainly don't know. I'm curious to see what happens but it's kind of wild, too, because there you're right. There is something real there where you ask it to draw you a picture of a pony or something and it does, or give me a bunch of random analysis of this. I asked one recently to go ahead and rank the US presidents by absorbency and with a straight face, it did it, which is kind of amazing. I feel like there's a lack of imagination in the way that people talk about these things and a certain lack of awareness that you can make this a lot of fun, and in some ways, make that a better showcase of the business value than trying to do the straight-laced thing of having it explain Microsoft Excel to you.

Richard: I think that's fair. I don't know how much sometimes whimsy and enterprise mix.
Sometimes that can be a tricky part of the value prop. But I'm with you that some of this hopefully returns to some more creativity of things. I mean, I personally use things like Bard or what have you that, “Hey, I'm trying to think of this idea. Can you give me some suggestions?” Or—I just did a couple weeks ago—“I need sample data for my app.”

I could spend the next ten minutes coming up with Seinfeld and Bob's Burgers characters, or just give me the list in two seconds in JSON. Like that's great. So, I'm hoping we get to use this for more fun stuff. I'll be fascinated to see if when I write the keynote for—I'm working on the keynote for Next, if I can really inject something completely off the wall. I guess you're challenging me and I respect that.

Corey: Oh, I absolutely am. And one of the things that I believe firmly is that we lose sight of the fact that people are inherently multifaceted. Just because you are a C-level executive at an enterprise does not mean that you're not also a human being with a sense of creativity and a bit of whimsy as well. Everyone is going to compete to wind up boring you to death with PowerPoint. Find something that sparks the imagination and sparks joy.

Because yes, you're going to find the boring business case on your own without too much in the way of prodding for that, but isn't it great to imagine what if? What if we could have fun with some of these things? At least to me, that's always been the goal is to get people's attention. Humor has been my path, but there are others.

Richard: I'm with you. I think there's a lot to that. And the question will be… yeah, I mean, again, to me, you and I talked about this before we started recording, this is the first trend for me in a while that feels purely organic where our customers, now—and I'll tell our internal folks—our customers have much better ideas than we do. And it's because they're doing all kinds of wild things.
They're trying new scenarios, they're building apps purely based on prompts, and they're trying to, you know, do this.

And it's better than what we just come up with, which is awesome. That's how it should be, versus just some vendor-led hype initiative where it is just boring corporate stuff. So, I like the fact that this isn't just us talking; it's the whole industry talking. It's people talking to my non-technical family members, giving me ideas for what they're using this stuff for. I think that's awesome. So yeah, but I'm with you, I think companies can also look for more creative angles than just what's another way to left-align something in a cell.

Corey: I mean, some of the expressions on this are wild to me. The Photoshop beta with its generative AI play has just been phenomenal. Because it's weird stuff, like, things that, yeah, I'm never going to be a great artist, let's be clear, but being able to say remove this person from the background, and it does it, as best I can tell, seamlessly is stuff where yeah, that would have taken me ages to find someone who knows what the hell they're doing on the internet somewhere and then pay them to do it. Or basically stumble my way through it for two hours and it somehow looks worse afterwards than before I started. It's the baseline stuff of, I'm never going to be able to have it—to my understanding—go ahead just build me a whole banner ad that does this and hit these tones and the rest, but it is going to help me refine something in that direction, until I can then, you know, hand it to a professional who can take it from my chicken scratching into something real.

Richard: If it will. I think that's my only concern personally with some of this is I don't want this to erase expertise or us to think we can just get lazy. I think that I get nervous, like, can I just tell it to do stuff and I don't even check the output, or I don't do whatever. So, I think that's when you go back to, again, enterprise use cases.
If this is generating code or instructions or documentation or what have you, I need to trust that output in some way.

Or more importantly, I still need to retain the skills necessary to check it. So, I'm hoping people like you and me and all the users out there of this stuff don't just offload responsibility to the machine. Like, just always treat it like a kind of slightly drunk friend sitting next to you with good advice and always check it out.

Corey: It's critical. I think that there's a lot of concern—and I'm not saying that people are wrong on this—but that people are now going to let it take over their jobs, it's going to wind up destroying industries. No, I think it's going to continue to automate things that previously required human intervention. But this has been true since the Industrial Revolution, where opportunities arise and old jobs that used to be critical are no longer centered in quite the same way. The one aspect that does concern me is not that kids are going to use it to cheat on essays like, okay, great, whatever. That seems to be floated mostly by academics who are concerned about the appropriate structure of academia.

For me, the problem is, there's a reason that we have people go through 12 years of English class in the United States and that is, it's not to dissect the work of long-dead authors. It's to understand how to write and how to tell a story and how to frame ideas cohesively. And, “The computer will do that for me,” I feel like that potentially might not serve people particularly well. But as a counterpoint, I was told when I was going to school my entire life that you're never going to have a calculator in your pocket all the time that you need one. No, but I can also speak now to the open air, ask it any math problem I can imagine, and get a correct answer spoken back to me.
That also wasn't really on the bingo card that I had back then either, so I am hesitant to try and predict the future.

Richard: Yeah, that's fair. I think it's still important for a kid to know how to make change or do certain things. I don't want to just offload to calculators or—I want to be able to understand, as you say, literature or things, not just ever print me out a book report. But that happens with us professionals, too, right? Like, I don't want to just atrophy all of my programming skills because all I'm doing is accepting suggestions from the machine, or that it's writing my emails for me. Like, that still weirds me out a little bit. I like to write an email or send a tweet or do a summary. To me, I enjoy those things still. I don't want to—that's not toil to me. So, I'm hoping that we just use this to make ourselves better and we don't just use it to make ourselves lazier.

Corey: You mentioned a few minutes ago that you are currently working on writing your keynote for Next, so I'm going to pretend, through a vicious character attack here, that this is—you know, it's 11 o'clock at night, the day before the Next keynote and you found new and exciting ways to procrastinate, like recording a podcast episode with me. My question for you is, how is this Next going to be different than previous Nexts?

Richard: Hmm. Yeah, I mean, for the first time in a while it's in person, which is wonderful. So, we'll have a bunch of folks at Moscone in San Francisco, which is tremendous. And I [unintelligible 00:11:56] it, too, I definitely have online events fatigue. So—because absolutely no one has ever just watched the screen entirely for a 15 or 30 or 60-minute keynote. We're all tabbing over to something else and multitasking. And at least when I'm in the room, I can at least pretend I'll be paying attention the whole time. The medium is different. So, first off, I'm just excited—

Corey: Right.
It feels a lot ruder to get up and walk out of the front row in the middle of someone's talk. Now, don't get me wrong, I'll still do it because I'm a jerk, but I'll feel bad about it as I do. I kid, I kid. But yeah, a tab away is always a thing. And we seem to have taken the same structure that works in those events and tried to force it into more or less a non-interactive Zoom call, and I feel like that is just very hard to distinguish.

I will say that Google did a phenomenal job of online events, given the constraints it was operating under. Production value is great, the fact that you took advantage of being in different facilities was awesome. But yeah, it'll be good to be back in person again. I will be there with bells on in Moscone myself, mostly yelling at people, but you know, that's what I do.

Richard: It's what you do. But we missed that hallway track. You missed this sort of bump into people. Do hands-on labs, purposely have nothing to do where you just walk around the show floor. Like we have been missing, I think, society-wise, a little bit of just that intentional boredom. And so, sometimes you need that at conference events, too, where you're like, “I'm going to skip that next talk and just see what's going on around here.” That's awesome. You should do that more often.

So, we're going to have a lot of spaces for just, like, go—like, 6,000 square feet of even just going and looking at demos or doing hands-on stuff or talking with other people. Like that's just the fun, awesome part. And yeah, you're going to hear a lot about AI, but plenty about other stuff, too. Tons of announcements. But the key is that to me, community stuff, learn from each other stuff, that energy in person, you can't replicate that online.

Corey: So, an area that you have expanded into has been DevRel, where you've always been involved with it, let's be clear, but it's becoming a bit more pronounced.
And as an outsider, I look at Google Cloud's DevRel presence and I don't see as much of it as your staffing levels would indicate, to the naive approach. And let's be clear, that means from my perspective, all public-facing humorous, probably performative content in different ways, where you have zany music videos that, you know, maybe, I don't know, parody popular songs to celebrate some exec's birthday they didn't know was coming—[fake coughing].

Or creative nonsense on social media. And the lack of seeing a lot of that could in part be explained by the fact that social media is wildly fracturing into a bunch of different islands which, on balance, is probably a good thing for the internet, but I also suspect it comes down to a common misunderstanding of what DevRel actually is.

It turns out that, contrary to what many people wanted to believe in the before times, it is not getting paid as much as an engineer, spending three times that amount of money on travel expenses every year to travel to exotic places, get on stage, party with your friends, and then give a 45-minute talk that spends two minutes mentioning where you work and the rest talking about, I don't know, how to pick the right standing desk. That has, in many cases, been the perception of DevRel and I don't think that's particularly defensible in our current macroeconomic climate. So, what are all those DevRel people doing?

Richard: [laugh]. That's such a good loaded question.

Corey: It's always good to be given a question where it's very clear there are right answers and wrong answers, and oh, wow. It's a fun minefield. Have fun. Go catch.

Richard: Yeah. No, that's terrific. Yeah, and your first part, we do have a pretty well-distributed team globally, who does a lot of things. Our YouTube channel has, you know, we just crossed a million subscribers who are getting this stuff regularly. It's more than Amazon and Azure combined on YouTube.
So, in terms of like that, audience—

Corey: Counterpoint, you definitionally are YouTube. But that's neither here nor there, either. I don't believe you're juicing the stats, but it's also somehow… not as awesome if, say, I were to do it, which I'm working on it, but I have a face for radio and it shows.

Richard: [laugh]. Yeah, but a lot of this has been… the quality and quantity. Like, you look at the quantity of video, it overwhelms everyone else because we spend a lot of time, we have a specific media team within my DevRel team that does the studio work, that does the production, that does all that stuff. And it's a concerted effort. That team's amazing. They do really awesome work.

But, you know, a lot of DevRel as you say, [sigh] I don't know about you, I don't think I've ever truly believed in the sort of halo effect of if super smart person works at X company, even if they don't even talk about that company, that somehow presents good vibes and business benefits to that company. I don't think we've ever proven that's really true. Maybe you've seen counterpoints, where [crosstalk 00:16:34]—

Corey: I can think of anecdata examples of it. Often though, on some level, for me at least, it's been okay, someone I tremendously respect in the industry has gone to work at a company that I've never heard of. I will be paying attention to what that company does as a direct result. Conversely, when someone who is super well known and has been working at a company for a while leaves and then either trashes the company on the way out or doesn't talk about it, it's a question of, what's going on? Did something horrible happen there? Should we no longer like that company? Are we not friends anymore? It's—and I don't know if that's necessarily constructive, either, but it also, on some level, feels like it can shorthand to, oh, to be working DevRel, you have to be an influencer, which frankly, I find terrifying.

Richard: Yeah. Yeah.
I just—the modern DevRel, hopefully, is doing a little more of product-led growth style work. They're focusing specifically on how are we helping developers discover, engage, scale, become advocates themselves in the platform, increasing that flywheel through usage, but that has very discrete metrics, it has very specific ownership. Again, personally, I don't even think DevRel should do as much with sales teams because sales teams have hundreds and sometimes thousands of sales engineers and sales reps. It's amazing. They have exactly what they need.

I don't think DevRel is a drop in the bucket to that team. I'd rather talk directly to developers, focus on people who are self-service signups, people who are developers in those big accounts. So, I think the modern DevRel team is doing more in that respect. But when I look at—I just looked, Corey, this morning at what my team did last week—so the average DevRel team, I look at what advocacy does: teams writing code labs, they're building tutorials. Yes, they're doing some in-person events. They wrote some blog posts, published some videos, shipped a couple open-source projects that they contribute to in, like, the gaming sector—we ship—we have a couple projects there.

They're actually usually customer zero in the product. They use the product before it ships, provide bugs and feedback to the team, we run DORA workshops—because again, we're the DevOps Research and Assessment gang—we actually run the tutorial and docs platform for Google Cloud. We have people who write code samples and reference apps. So, sometimes you see things publicly, but you don't see the 20,000 code samples in the docs, many written by our team. So, a lot of the times, DevRel is doing work to just enable on some of these different properties, whether that's blogs or docs, whether that's guest articles or event series, but all of this should be in service of having that credible relationship to help devs use the platform easier.
And I love watching this team do that.

But I think there's more to it now than years ago, where maybe it was just, let's do some amazing work and try to have some second-, third-order effect. I think DevRel teams can have very discrete metrics around leading indicators of long-term cloud consumption. And if you can't measure that successfully, you've probably got to rethink the team.

[midroll 00:19:20]

Corey: That's probably fair. I think that there's a tremendous series of… I want to call it thankless work. Like, having done some of those ridiculous parody videos myself, people look at it and they chuckle and they wind up, that was clever and funny, and they move on to the next one. And they don't see the fact that, you know, behind the scenes for that three-minute video, there was a five-figure budget to pull all that together with a lot of people doing a bunch of disparate work. Done right, a lot of this stuff looks like it was easy or that there was no work at all.

I mean, at some level, I'm as guilty of that as anyone. We're recording a podcast now that is going to be handed over to the folks at HumblePod. They are going to produce this into something that sounds coherent, they're going to fix audio issues, all kinds of other stuff across the board, a full transcript, and the rest. And all of that is invisible to me. It's like AI; it's the magic box I drop a file into and get podcast out the other side.

And that does a disservice to those people who are actively working in that space to make things better. Because the good stuff that they do never gets attention, but then the company makes an interesting blunder in some way or another and suddenly, everyone's out there screaming and wondering why these people aren't responding on Twitter in 20 seconds when they're finding out about this stuff for the first time.

Richard: Mm-hm. Yeah, that's fair. You know, different internal, external expectations of even DevRel.
We've recently launched—I don't know if you caught it—something called Jump Start Solutions, which are executable reference architectures. You can come into the Google Cloud Console or hit one of our pages and go, “Hey, I want to do a multi-tier web app.” “Hey, I want to do a data processing pipeline.” Like, use cases.

One click, we blow out the entire thing in the platform, use it, mess around with it, turn it off with one click. Most of those are built by DevRel. Like, my engineers have gone and built that. Tons of work behind the scenes. Really, like, production-grade quality type architectures, really, really great work. There's going to be—there's a dozen of these. We'll GA them at Next—but really, really cool work. That's DevRel. Now, that's behind-the-scenes work, but it's engineering work.

That can be some of the thankless work of setting up projects, deployment architectures, Terraform, all of them also dropped into GitHub, ton of work documenting those. But yeah, that looks like behind-the-scenes work. But that's what—I mean, most of DevRel is engineers. These are folks often just building the things that then devs can use to learn the platforms. Is it the flashy work? No. Is it the most important work? Probably.

Corey: I do have a question I'd be remiss not to ask. Since the last time we spoke, relatively recently from this recording, Google—well, I'd say ‘Google announced,' but they kind of didn't—Squarespace announced that they'd be taking over Google Domains. And there was a lot of silence, which I interpret, to be clear, as people at Google being caught by surprise; at large companies, communication is challenging. And that's fine, but I don't think it was anything necessarily nefarious.

And then it came out further in time with an FAQ that Google published on their site, that Google Cloud Domains was a part of this as well.
And that took a lot of people aback, in the sense—not that it's hard to migrate a domain from one provider to another, but it brought up the old question of, if you're building something in cloud, how do you pick what to trust? And I want to be clear before you answer that, I know you work there. I know that there are constraints on what you can or cannot say.

And for people who are wondering why I'm not hitting you harder on this, I want to be very explicit: I can ask you a whole bunch of questions that I already know the answer to, and that answer is that you can't comment. That's not constructive or creative. So, I don't want people to think that I'm not intentionally asking the hard questions, but I also know that I'm not going to get an answer and all I'll do is make you uncomfortable. But I think it's fair to ask, how do you evaluate what services or providers or other resources you're using when you're building in cloud that are going to be around, that you can trust building on top of?

Richard: It's a fair question. Not everyone's on… let's update our software on a weekly basis and I can just swap things out left and right. You know, there's a reason that even Red Hat is so popular with Linux, because as a government employee, I can use that Linux and know it's backwards compatible for 15 years. And they sell that. Like, that's the value, that this thing works forever.

And Microsoft does the same with a lot of their server products. Like, you know, for better or for worse, [laugh] they will always kind of work with a component you wrote 15 years ago in SharePoint and somehow it runs today. I don't even know how that's possible. Love it. That's impressive.

Now, there's a cost to that. There's a giant tax in the vendor space to make that work. But yeah, there's certain times where even with us, look, we are trying to get better and better at things like comms. And last year we announced—I checked them recently—you know, we have 185 Cloud products in our enterprise APIs.
Meaning they have a very, very tight way we would deprecate with very, very long notice, they've got certain expectations on guarantees of how long you can use them, quality of service, all the SLAs.

And so, for me, like, I would bank on, first off, for every cloud provider, what are their anchor services? Build on those, right? You know, S3 is not going anywhere from Amazon. Rock-solid service. BigQuery? Goodness gracious, it's the center of Google Cloud.

And you look at a lot of services: what can you bet on that are the anchors? And then you can take bets on things that sit around it. There's times to be edgy and say, “Hey, I'll use Service Weaver,” which we open-sourced earlier this year. It's kind of a cool framework for building apps and we'll deconstruct it into microservices at deploy time. That's cool.

Would I literally build my whole business on it? No, I don't think so. It's early stuff. Now, would I maybe use it also with some really boring VMs and boring API Gateway and boring storage? Totally. Those are going to be around forever.

I think for me, personally, I try to think of how do I isolate things that have some variability to them. Now, to your point, sometimes you don't know there's variability. You would have just thought that service might be around forever. So, how are you supposed to know that that thing could go away at some point? And that's totally fair. I get that.

Which is why we have to keep being better at comms, making sure more things are in our enterprise APIs, which is almost everything. So, you have some assurances: when I build this thing, I've got a multi-year runway if anything ever changes. Nothing's going to stay the same forever, but nothing should change tomorrow on a dime. We need more trust than that.

Corey: Absolutely. And I agree. And the problem, too, is hidden dependencies. Let's say—what is something very simple? I want to log in to a [unintelligible 00:25:34] brand new AWS account and spin up a single EC2 instance. The end.
Well, I can trust that EC2 is going to be there. Great. But that's not the only service in that critical path. It is a bare minimum of six, possibly as many as twelve, depending upon what it is exactly you're doing.

And it's the, you find out after the fact that oh, there was that hidden dependency in there that I wasn't fully aware of. That is a tricky and delicate balance to strike. And, again, no one is going to ever congratulate you—at all—on the decision to maintain a service that is internally painful and engineering-ly expensive to keep going, but as soon as you kill something, even if it's for a thing that doesn't have any customers, the narrative becomes, “They're screwing over their customers.” It's—they just said that it didn't have any. What's the concern here?

It's a messaging problem; it is a reputation problem. Conversely, everyone knows that Amazon does not kill AWS services. Full stop. Yeah, it turns out everyone's wrong. By my count, they've killed ten full-on AWS services and counting at the moment. But that is not the reputation that they have.

Conversely, I think that the reputation that Google is going to kill everything that it touches is probably not accurate, though I don't know that I'd want to have them over to babysit either. So, I don't know. But it is something that it feels like you're swimming uphill on in many respects, just due to not even deprecation decisions, historically, so much as poor communication around them.

Richard: Mm-hm. I mean, communication can always get better, you know. And that's, it's not our customers' problem to make sure that they can track every weird thing we feel like doing. It's not their challenge. If our business model changes or our strategy changes, that's not technically the customer's problem. So, it's always our job to make this as easy as possible. Anytime we don't, we have made a mistake.

So, you know, even DevRel, hey, look, it puts teams in a tough spot. We want our customers to trust us.
We have to earn that; you will never just give it to us. At the same time, as you say, “Hey, we're profitable. It's great. We're growing like weeds,” it's amazing to see how many people are using this platform. I mean, even services you don't talk about are—I mean, doing really, really well. But I've got to earn that. And you've got to earn, more importantly, the scale. I don't want you to just kick the tires on Google Cloud; I want you to bet on it. But we're only going to earn that with really good support, really good price stability, really good feeling like these services are rock solid. Have we totally earned that? We're getting there, but not as mature as we'd like to get yet, but I like where we're going.

Corey: I agree. And reputations are tricky. I mean, recently InfluxDB deprecated two regions and wound up turning them off and deleting data. And they wound up getting massive blowback for this, which, to their credit, their co-founder and CTO, Paul Dix—who has been on the show before—wound up talking about and saying, “Yeah, that was us. We're taking ownership of this.”

But the public announcement said that they had—that data in AWS was not recoverable and they're reaching out to see if the data in GCP was still available. At which point, I took the wrong impression from this. Like, whoa, whoa, whoa. Hang on. Hold the phone here. Does that mean that data that I delete from a Google Cloud account isn't really deleted?

Because I have a whole bunch of regulators that would like a word if so. And Paul jumped onto that with, “No, no, no, no, no. I want to be clear, we have a backup system internally that we were using that has that set up. And we deleted the backups on the AWS side; we don't believe we did on the Google Cloud side.
It's purely us, not a cloud provider problem.” It's like, “Okay, first, sorry for causing a fire drill.” Secondly, “Okay, that's great.” But the reason I jumped in that direction was just because it becomes so easy, when a narrative gets out there, to believe the worst about companies without even realizing you're doing it.

Richard: No, I understand. It's reflexive. And I get it. And look, B2B is not B2C, you know? In B2B, it's not, “Build it and they will come.” I think we have the best cloud infrastructure, the best security posture, and the most sophisticated managed services. I believe that—I use all the clouds. I think that's true. But it doesn't matter unless you also do the things around it, around support, security, you know, usability, trust. You have to go sell these things and bring them to people. You can't just sit back and say, “It's amazing. Everyone's going to use it.” You've got to earn that. And so, that's something that we're still on the journey of, but our foundation is terrific. We just got to do a better job on some of these intangibles around it.

Corey: I agree with you, when you s—I think there's a spirited debate you could have on any of those things you said you believe Google Cloud is the best at, with the exception of security, where I think that is unquestionably true. I think that is a lot less variable than the others. The others are more or less, “Who has the best cloud infrastructure?” Well, depends on who had what for breakfast today. But the simplicity and the approach you take to security is head and shoulders above the competition.

And I want to make sure I give credit where due: it is because of that simplicity and default posturing that customers wind up better for it as a result. Otherwise, you wind up in this hell of, “You must have at least this much security training to responsibly secure your environment.” And that is never going to happen. People read far less than we wish they would.
I want to make very clear that Google deserves the credit for that security posture.

Richard: Yeah, and the other thing, look, I'll say that, from my observation, where we do something that feels a little special and different is we do think in platforms. We think in both how we build and how we operate, and how the console is built by a platform team—singularly. How—[is 00:30:51] we're doing Duet AI that we pre-announced at I/O and are shipping. That is a full platform experience covering a dozen services. That is really hard to do if you have a lot of isolation. So, we've done a really cool job thinking in platforms and giving that simplicity at that platform level. Hard to do, but again, we have to bring people to it. You're not going to discover it by accident.

Corey: Richard, I will let you get back to your tear-filled late-night writing of tomorrow's Next keynote, but if people want to learn more—once the dust settles—where's the best place for them to find you?

Richard: Yeah, hopefully, they continue to hang out at cloud.google.com and using all the free stuff, which is great. You can always find me at seroter.com. I read a bunch every day and then I write a blog post every day about what I read, so if you ever want to tune in on that, just see what wacky things I'm checking out in tech, that is good. And I still hang out on different social networks, Twitter at @rseroter and LinkedIn and things like that. But yeah, join in and yell at me about anything I said.

Corey: I did not realize you had a daily reading list of what you put up there. That is news to me and I will definitely tune in, and then of course, yell at you from the cheap seats when I disagree with anything that you've chosen to include. Thank you so much for taking the time to speak with me and suffer the uncomfortable questions.

Richard: Hey, I love it.
If people aren't talking about us, then we don't matter, so I would much rather people be yelling about us than the opposite there.

Corey: [laugh]. As always, it's been a pleasure. Richard Seroter, Director of Product Management and Developer Relations at Google Cloud. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that you had an AI system write for you because you never learned how to structure a sentence.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Happenstance
42: Humble Happenstance with Chris Hill


Apr 14, 2023 · 34:03


Humble Happenstance. The things we never expected, the threads that weave throughout experiences, and the moments that lead us to the career path we love. Let's see where Happenstance takes us...

Chris Hill, a Knoxville native, is the owner of podcast production company HumblePod, assisting clients worldwide in creating and developing podcasts. In addition to producing podcasts for renowned thought leaders, he co-hosts and produces the award-winning Our Humble Beer Podcast. As a University of Tennessee lecturer, Chris leads non-credit courses on podcasts and marketing. He holds an undergraduate degree in Marketing & Entrepreneurship from the University of Tennessee at Chattanooga and an MBA from King University. Chris serves as the President-Elect of the American Marketing Association in Knoxville and enjoys spending time with his family, international travel, outdoor exploration, and engaging in creative pursuits.

Let's Connect: @HappenstanceThePodcast · @CareerCoachCassie

Screaming in the Cloud
The Realities of Working in Data with Emily Gorcenski


Mar 7, 2023 · 36:22


Emily Gorcenski, Data & AI Service Line Lead at Thoughtworks, joins Corey on Screaming in the Cloud to discuss how big data is changing our lives, both the benefits it brings and the challenges that come with it. Emily explains how data is only important if you know what to do with it and have a plan to work with it, and why it's crucial to understand the use-by date on your data. Corey and Emily also discuss how big data problems aren't universal problems for the rest of the data community, how to address the ethics around AI, and the barriers to entry when pursuing a career in data.

About Emily
Emily Gorcenski is a principal data scientist and the Data & AI Service Line Lead of ThoughtWorks Germany. Her background in computational mathematics and control systems engineering has given her the opportunity to work on data analysis and signal processing problems from a variety of complex and data-intensive industries. In addition, she is a renowned data activist and has contributed to award-winning journalism through her use of data to combat extremist violence and terrorism. The opinions expressed are solely her own.

Links Referenced:
ThoughtWorks: https://www.thoughtworks.com/
Personal website: https://emilygorcenski.com
Twitter: https://twitter.com/EmilyGorcenski
Mastodon: https://mastodon.green/@emilygorcenski@indieweb.social

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Emily Gorcenski, who is the Data and AI Service Line Lead over at ThoughtWorks. Emily, thank you so much for joining me today. I appreciate it.

Emily: Thank you for having me.
I'm happy to be here.

Corey: What is it you do, exactly? Take it away.

Emily: Yeah, so I run the data side of our business at ThoughtWorks Germany. That means data engineering work, data platform work, data science work. I'm a data scientist by training. And you know, we're a consulting company, so I'm working with clients and trying to help them through the, sort of, morphing landscape that data is these days. You know, should we be migrating to the cloud with our data? What can we migrate to the cloud with our data? What should we be doing with our data scientists and how do we make our data analysts' lives easier? So, it's a lot of questions like that and trying to figure out the strategy and all of those things.

Corey: You might be one of the most perfectly positioned people to ask this question because one of the challenges that I've run into consistently and persistently—because I watch a lot of AWS keynotes—is that they always come up with the same talking point: that data is effectively the modern gold, and data is what unlocks value to your busin—“Every business agrees,” because someone who's dressed in what they think is a nice suit on stage is saying it. Okay, you're trying to sell me something. What's the deal here? Then I check my email and I discover that Amazon has sent me the same email about the same problem for every region I've deployed things to in AWS. And, “Oh, you deployed this to one of the Japanese regions. We're going to send that to you in Japanese as a result.”

And it's like, okay, for a company that says data is important, they have no idea who any of their customers are at this point—that is the takeaway here. How real is, “Data is important,” versus, “We charge by the gigabyte so you should save all of your data and then run expensive things on top of it”?

Emily: I think data is very important, if you know what you're going to do with it and if you have a plan for how to work with it.
I think if you look at the history of computing, of technology, if you go back 20 years to maybe the early days of the big data era, right? Everyone's like, “Oh, we've got big data. Data is going to be big.” And for some reason, we never questioned why, like, we were thinking that the ‘big' in ‘big data' meant big as in volume and not ‘big' as in ‘big pharma.'

This sort of revolution never really happened for most companies. Sure, some companies got a lot of value from the, sort of, data mining and just gather everything and collect everything, and if you hit it with a big computational hammer, insights will come out, and somehow these insights will make you money through magic. The reality is much more prosaic. If you want to make money with data, you have to have a plan for what you're going to do with data. You have to know what you're looking for and you have to know exactly what you're going to get when you look at your data and when you try to answer questions with it.

And so, when we see somebody like Amazon not being able to correlate the fact that you're the account owner for all of these different accounts and that the language should be English and all of these things, that's part of the operational problem, because it's annoying to try to do joins across multiple tables in multiple regions and all of those things, but it's also part—you know, nobody has figured out how this adds value for them to do that, right? There's a part of it where it's like, this is just professionalism, but there's a part of it where it's also like… whatever. You've got Google Translate. Figure it out yourself. We're just going to get through it.

I think that… as time has evolved from the initial waves of the big data era into the data science era, and now we're in, you know, all sorts of different architectures and principles and all of these things, most companies still haven't figured out what to do with data, right?
They're still investing a ton of money to answer the same analytics questions that they were answering 20 years ago. And for me, I think that's a disappointment in some regards because we do have better tools now. We can do so many more interesting things if you give people the opportunity.

Corey: One of the things that always seemed a little odd was, back when I wielded root credentials in anger—‘anger,' of course, being my name for the production environment, as opposed to ‘theory,' which is what I call staging because it works in theory, but not in production. I digress—it always felt like I was getting constant pushback from folks of, “You can't delete that data. It's incredibly important because one day, we're going to find a way to unlock the magic of it.” And it's, “These are web server logs that are 15 years old, and 98% of them by volume are load balancer health checks because it turns out that back in those days, baby seals got more hits than our website did, so that's not really a thing that's going to add much value to it.” And then from my perspective, at least, given that I tend to live, eat, sleep, breathe cloud these days, AWS did something that was refreshingly customer-obsessed when they came out with Glacier Deep Archive.

Because the economics of that are, if you want to store a petabyte of data, with a 12-hour latency on request for things like archival logs and whatnot, it's $1,000 a month per petabyte, which is okay, you have now hit a price point where it is no longer worth my time to argue with you. We're just not going to delete anything ever again. Problem solved. Then came GDPR, which is neither here nor there, and we actually want to get rid of those things for a variety of excellent legal reasons. And the dance continues.

But my argument against getting rid of data because it's super expensive no longer holds water in the way that it once did for anything remotely resembling a reasonable amount of data.
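(The back-of-envelope arithmetic behind that $1,000-per-petabyte figure can be sketched in a few lines. The price constant below is an assumed approximation of the Deep Archive list rate, not a number quoted on the show; actual pricing varies by region.)

```python
# Sanity check on the "$1,000 a month per petabyte" figure.
# Assumed list price for S3 Glacier Deep Archive storage, in USD
# per GB-month; treat this as an approximation, not a quote.
PRICE_PER_GB_MONTH = 0.00099

def monthly_storage_cost(gigabytes: float) -> float:
    """Return the approximate monthly storage cost in USD."""
    return gigabytes * PRICE_PER_GB_MONTH

# One petabyte in decimal gigabytes.
one_petabyte_gb = 1_000_000
print(round(monthly_storage_cost(one_petabyte_gb)))  # 990, i.e. roughly $1,000/month
```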
Then again, that's getting reinvented all the time. I used to be very, I guess we'll call it, a data minimalist. I don't want to store a bunch of data, mostly because I'm not a data person. I am very bad at thinking in that way.

I consider SQL to be the chess of the programming world and I'm not particularly great at it. And I'm also unlucky and have an aura, so if I destroy a bunch of stateless web servers, okay, we can all laugh about that, but let's keep me the hell away from the data warehouse if we still want a company tomorrow morning. And that was sort of my experience. And I understand my bias in that direction. But I'm starting to see magic get unlocked.

Emily: Yeah, I think, you know, you said earlier, there's, like, this mindset, like, data is the new gold or data is the new oil or whatever. And I think it's actually more true that data is the new milk, right? It goes bad if you don't use it, you know, before a certain point in time. And at a certain point in time, it's not going to be very offensive if you just leave it locked in the jug, but as soon as you try to open it, you're going to have a lot of problems. Data is very, very cheap to store these days. It's very easy to hold data; it's very expensive to process data.

And I think that's where the shift has gone, right? There's sort of this, like, Oracle DBA legacy of, like, “Don't let the software developers touch the prod database.” And they've kind of kept their, like, arcane witchcraft to themselves, and that mindset has persisted. But now it's sort of shifted into all of these other architectural patterns that are just abstractions on top of this don't-let-the-software-engineers-touch-the-data-store mindset, right? So, we have these, like, streaming-first architectures, which are great. They're great for software devs.
And they're great for data engineers who like to play with big, powerful technology.

They're terrible if you want to answer a question like, “How many customers did I have yesterday?” And these are the things that I think are some of the central challenges, right? A Kappa architecture—you know, a streaming-first architecture—is amazing if you want to improve your application developer throughput. And it's amazing if you want to build real-time analytics or streaming analytics into your platform. But it's terrible if you want your data lake to be navigable. It's terrible if you want to find the right data that makes sense to do the more complex things. And it becomes very expensive to try to process it.

Corey: One of the problems I think I have is that if I take a look at the data volumes that I work with in my day-to-day job, I'm dealing with AWS billing data as spit out by the AWS billing system. And there isn't really a big data problem here. If you take a look at some of the larger clients, okay, maybe I'm trying to consume a CSV that's ten gigabytes. Yes, Excel is going to violently scream itself to death if I try to load it there, and then my computer smells like burning metal all afternoon. But if it fits in RAM, it doesn't really feel like it's a big data problem, on some level.

And it just feels like when I look at the landscape of all the different tools you can use for things like this, they're more or less, hmm, “I have a loose thread on my shirt. Could you pass me that chainsaw for a second?” It just seems like stupendous overkill for anything that I'm working with. Counterpoint: the clients I'm working with have massive data farms, and my default response when I meet someone who's very good at an area that I don't do a lot of work in is—counterintuitively to what a lot of people apparently do on Twitter—not the default assumption of, oh, “I don't know anything about that space.
It must be worthless and they must be dumb.”

No. That is not the default approach to take to anything, from my perspective. So, it's clear there's something very much there that I just don't see slash understand. That is a very roundabout way of saying what could be uncharitably distilled down to, “So, is your entire career bullshit?” But no, it is clearly not.

There is value being extracted from this and it's powerful. I just think that there's been an industry-wide, relatively poor job done of explaining that value in ways that don't come across as contrived or profoundly disturbing.

Emily: Yeah, I think there's a ton of value in doing things right. It gets very complicated to try to explain the nuances of when and how data can actually be useful, right? Oftentimes, your historical data really only tells you about what happened in the past. And you can throw some great mathematics at it and try to use it to predict the future in some sense, but it's not necessarily great at what happens when you hit really hard changes, right?

For example, when the Coronavirus pandemic hit and purchaser and consumer behavior changed overnight, there was no data in the data set that explained that consumer behavior. And so, what you saw is a lot of these things like supply chain issues, which are very heavily data-driven in a normal circumstance; there was nothing in that data that allowed those algorithms to optimize for the reality that we were seeing at that scale, right? Even if you look at advanced logistics companies, they know what to do when there's a hurricane coming or when there's been an earthquake or things like that. They have disaster scenarios.

But nobody has ever done anything like this at the global scale, right? And so, what we saw was this hard reset that we're still feeling the repercussions of today.
Yes, there were people who couldn't work, and we had lockdowns and all that stuff, but we also have an effect from the impact of the way that we built the systems to work with the data that we need to shuffle around. And so, I think that there is value in being able to process these really, really large datasets, but I think that actually, there's also a lot of value in being able to solve smaller, simpler problems, right? Not everything is a big data problem; not everything requires a ton of data to solve.

It's more about the mindset that you use to look at the data, to explore the data, and what you're doing with it. And I think the challenge here is that, you know, everyone wants to believe that they have a big data problem, because it feels like you have to have a big data problem if you—

Corey: All the cool kids are having this kind of problem.

Emily: You have to have big data to sit at the grownups' table. And so, what's happened is we've optimized a lot of tools around solving big data problems, and oftentimes these tools are really poor at solving normal data problems. And there's a lot of money being spent and a lot of overkill engineering in the data space.

Corey: On some level, it feels like there has been a dramatic misrepresentation of this. I had an article that went out last year where I called machine learning selling pickaxes into a digital gold rush. And someone I know at AWS responded to that in probably the best way possible—she works over on their machine-learning group—she sent me a foam Minecraft pickaxe that now is hanging on my office wall. And that gets more commentary than anything, including the customized oil painting I have of Billy the Platypus fighting an AWS Billing Dragon. No, people want to talk about the Minecraft pickaxe.

It's amazing. First, where is this creativity in any of the marketing that this department is putting out? But two, it's clearly not accurate. And what it took for me to see that was a couple of things that I built myself. I built a Twitter thread client that would create Twitter threads, back when Twitter was a place that wasn't overrun by some of the worst people in the world and turned into BirdChan.

But that was great. It would automatically do OCR on images that I uploaded; it would describe the image to you using Azure's Cognitive Vision API. And that was magic. And now I see things like ChatGPT, and that's magic. But you take a look at the way that the cloud companies have been describing the power of machine learning and AI: they wind up getting someone with a doctorate whose first language is math on stage for 45 minutes, just yelling at you in Star Trek technobabble to the point where you have no idea what the hell they're saying.

And occasionally other data scientists say, “Yeah, I think he's just shining everyone on at this point. But yeah, okay.” It still remains unclear. It takes seeing the value of it for it to finally click. People make fun of it, but the Hot Dog, Not A Hot Dog app is the kind of valuable breakthrough that suddenly makes this intangible thing very real for people.

Emily: I think there's a lot of impressive stuff, and ChatGPT is fantastically impressive. I actually used ChatGPT to write a letter to some German government agency to deal with some bureaucracy. It was amazing. It did it, it was grammatically correct, it got me what I needed, and it saved me a ton of time. I think that these tools are really, really powerful.

Now, the thing is, not every company needs to build its own ChatGPT. Maybe they need to integrate it; maybe there's an application for it somewhere in their landscape of products, in their landscape of services, in the landscape of their internal tooling. And I would be thrilled, actually, to see some of that be brought into reality in the next couple of years.
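Emily's letter-writing example hints at the lighter-weight integration she describes: wrapping a reusable prompt around whatever model API a team already uses. A minimal sketch, where `complete()` is a hypothetical stand-in for a real LLM client call, not an actual library function:

```python
def draft_letter_prompt(recipient: str, issue: str, language: str = "German") -> str:
    """Build a reusable prompt for drafting formal correspondence."""
    return (
        f"Write a formal letter in {language} to {recipient}. "
        f"The matter: {issue}. "
        "Use a polite, bureaucratic register and keep it under 300 words."
    )

prompt = draft_letter_prompt(
    recipient="the Berlin immigration office",
    issue="a question about the status of my visa application",
)
# response = complete(prompt)  # hypothetical call to whatever LLM API you use
print(prompt.startswith("Write a formal letter in German"))  # True
```

The point is less the model call than the plumbing around it: a small, testable helper is often all the "AI integration" an internal tool needs.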
But you also have to remember that ChatGPT is not something that came about because we had, like, a really great breakthrough in AI last year or something like that. It's stacked upon 40 years of research. We've gone through three new waves of neural networking in that time to get to this point, and it solves one class of problem, which is honestly a fairly narrow class of problem. And so, what I see is a lot of companies that have much more mundane problems, but where data can actually still really help them. Like, how do you process Cambodian driver's licenses with OCR, right? These are the types of things where, even if you had a training data set that was every Cambodian person's driver's license for the last ten years, you're still not going to get the data volumes that even a day's worth of Amazon's marketplace generates, right? And so, you need to be able to solve these problems with data without resorting to the cudgel that is a big data solution, right?

So, there's still a niche, a valuable niche, for solving problems with data without necessarily having to resort to, we have to load the entire internet into our stream and throw GPUs at it all day long and spend tens of millions of dollars in training. I don't know, maybe hundreds of millions; however much ChatGPT just raised. There's an in-between that I think is vastly underserved by what people are talking about these days.

Corey: There is so much attention being given to this, and it feels almost like there has been a concerted and defined effort to talk in circles and remove people from the humanity and the human consequences of what it is that they're doing. When I was younger, in my more reckless years, I was never much of a fan of the idea of government regulation. But now it has become abundantly clear that our industry—however you want to define industry, or describe a society—cannot self-regulate when it comes to data that has the potential to ruin people's lives. I mean, I spent a fair bit of my career working in financial services in a bunch of different ways. And at least in those jobs, it was only money.

The scariest thing I ever dealt with, from a data perspective, is when I did a brief stint at Grindr, because that was the sort of problem where, if that data gets out, people will die. And I have not had to think about things with that level of import before or since, for which I'm eternally grateful. “It's only money,” which is a weird thing for a guy who fixes cloud bills for a living to say. And if I say that in a client call, it's not going to go very well. But it's the truth. Money is one of those things that can be fixed. It can be addressed in due course. There are always opportunities there. Someone who's just been outed to their friends and family and feels their life is now in shambles around them? You can't unring that particular bell.

Emily: Yeah. And in some countries, it can lead to imprisonment, or—

Corey: It can lead to death sentences, yes. It's absolutely not acceptable.

Emily: There's a lot to say about the ethics of where we are. And I think that as a lot of these high-profile AI tools have come out over the last year or so—you know, Stable Diffusion and ChatGPT and all of this stuff—there's been a lot of conversation that is sort of trying to put some counterbalance on what we're seeing. And I don't know that it's going to be successful. I think that, you know, I've been speaking about ethics and technology for a long time, and I think that we need to mature and get to the next level of actually addressing the ethical problems in technology. Because it's so far beyond things like, “Oh, you know, if there's a biased training data set, then the algorithm is biased,” right?

Everyone knows that by now, right? And the people who don't know that don't care. We need to get much beyond where these conversations about ethics and technology are going, because it's a manifold problem. We have issues where the people labeling this data are paid, you know, pennies per hour to deal with some of the most horrific content you've ever seen. I mean, I'm somebody who has immersed myself in a lot of horrific content for some of the work that I have done, and this is, you know, so far beyond what I've had to deal with in my life that I can't even imagine it. You couldn't pay me enough money to do it, and we're paying people in developing nations, you know, a buck-thirty-five an hour to do this. I think—

Corey: But you must understand, Emily, that given the standard of living where they are, that is perfectly normal, and we wouldn't want to distort local market dynamics. So, if they make a buck-fifty a day, we are going to be generous gods and pay them a whopping dollar-seventy a day, and now we feel good about ourselves. And no, it's not about exploitation. It's about raising up an emerging market. And other happy horseshit lies people tell themselves.

Emily: Yes, it is. Yes, it is. And we've built—you know, the industry has built its back on that. It's raised itself up on this type of labor. It's raised itself up on taking text and images without permission of the creators. And, you know, I'm not a lawyer and I'm not going to play one, but I do know that derivative use is something that, at least under American law, can be safely done. It would be a bad world if derivative use was not something that we had freely available, I think, on the balance.

But our laws don't account for the scale. Our laws about things like fair use and derivative use are for if you see a picture and you want to take your own interpretation, or if you see an image and you want to make a parody, right? It's a one-to-one thing. You can't make five million parody images based on somebody's art yourself. These laws were never built for this scale.

And so, I think that where AI is exploiting society is that it's exploiting a set of ethics, a set of laws, and a set of morals that are built around behavior at normal human interaction scales: you know, one person standing in front of a lecture hall, or friends talking with each other, or things like that. The world was not meant for a single person to be able to speak to hundreds of thousands of people, or to manipulate hundreds of thousands of images per day. I actually find it terrifying. Like, the fact that me, a normal person, has a Twitter following where, you know, if I wanted to, I could have 50 million impressions in a month. This is not a normal thing for a normal human being to have.

And so, I think that as we build this technology, we also have to say: we're changing the landscape of human ethics by our ability to act at scale. And yes, you're right, regulation is possibly one thing that can help, but I think that we also need to embed cultural values in how we're using the technology and how we're shaping our businesses to use the technology. It can be used responsibly. I mean, like I said, ChatGPT helped me with a visa issue, sending an email to the immigration office in Berlin. That's a fantastic thing. That's a net positive for me; hopefully, for humanity. I wasn't about to pay a lawyer to do it. But where's the balance, right? And it's a complex topic.

Corey: It is. It absolutely is. There is one last topic that I would like to talk to you about that's a little less heavy. And I've got to be direct with you: I'm not trying to be unkind, but you've disappointed me.
Because you mentioned to me at one point, when I asked how things were going in your AWS universe, you said, “Well, aside from the bank heist, reasonably well.” And I thought that you were blessed with something I always look for, which is the gift of glorious metaphor. Unfortunately, as I said, you've disappointed me. It was not a metaphor; it was the literal truth. What the hell kind of bank heist could possibly affect an AWS account? This sounds like something out of a movie. Hit me with it.

Emily: Yeah, you know, I think in the SRE world, we tell people to focus on the high-probability, low-impact things, because that's where it's going to really hurt your business, and let the experts deal with the black swan events, because they're pretty unlikely. You know, a normal business doesn't have to worry about terrorists breaking into the Google data center, or a gang of thieves breaking into a bank vault. Apparently, that is something that I have to worry about, because I have some data in my personal life that I needed to protect, like all other people. And I decided, like a reasonable and secure and smart human being who has a little bit of extra spending cash, that I would do the safer thing and take my backup hard drive and my old phones and put them in a safety deposit box at an old private bank that has, you know, a vault behind a meter-and-a-half-thick steel door, two guards at all times, cameras everywhere. And I said, “What is the safest possible thing that you can do to store your backups?” Obviously, you put it in a secure storage location, right? And then, you know, I don't use my personal AWS account so much anymore. I have work accounts. I have test accounts—

Corey: Oh, yeah. Honestly, the best way to have an AWS account is to have someone else's payment instrument attached to it, because otherwise, oh God, you're on the hook for that yourself, and nobody wants that.

Emily: Absolutely. And you know, creating new email addresses for new trial accounts is really just a pain in the ass. So, you know, I have my phone from five years ago sitting in this bank vault, and I figured that was pretty secure. Until I got an email [laugh] from the Berlin Polizei saying, “There has been a break-in.” And I went and I looked at the news, and apparently a gang of thieves had pulled off the most epic heist in recent European history.

This is barely in the news. Like, unless you speak German, you're probably not going to find any news about this. But a gang of thieves broke into this bank vault and broke open the safety deposit boxes. And it turns out that this vault was also the location where a luxury watch consigner had been storing his watches. So, they made off with some, like, tens of millions of dollars of luxury watches. And then also the phone that had my 2FA for my Amazon account. So, the total potential value of this theft was probably somewhere in the $500 million range, if they set up a SageMaker instance on my account, perhaps.

Corey: This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your. Engineers. Are. Burned. Out. They're tired from pagers waking them up at 2 am for something that could have waited until after their morning coffee. Ring ring, who's there? It's Nagios, the original call of duty! They're fed up with relying on two or three different “monitoring tools” that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there's a better way. Observability tools like Honeycomb (and very little else, because they do admittedly set the bar) show you the patterns and outliers of how users experience your code in complex and unpredictable environments, so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and, most importantly, great for your customers.
Try FREE today at honeycomb.io/screaminginthecloud. That's honeycomb.io/screaminginthecloud.

Corey: The really annoying part that you are going to kick yourself about—and I'm not kidding—is that I've looked up the news articles on this event, and it happened something like two or three days after AWS put out the best release of last year's re:Invent, or any other re:Invent—past, present, or future—which is finally allowing multiple MFA devices on root accounts. So finally, we can stop having safes with these things, or you can have two devices, or you can have multiple people in Covid times on remote sides of different parts of the world and still get into the thing. But until then, nope. It's either no MFA, or you have to store it somewhere ridiculous like that, and access becomes a freaking problem in the event that the device is lost or, in this case, stolen.

Emily: [laugh]. I will just beg the thieves: if you're out there, if you're secretly actually a bunch of cloud engineers who needed to break into a luxury watch consignment storage vault so that you can pay your cloud bills, please have mercy on my poor AWS account. But also, I'll tell you that the credit card attached to it is expired, so you won't have any luck.

Corey: Yeah. The really sad part: the expired credit card just means that the charge won't go through. They're still going to hold you responsible for it. It's the worst advice I see people—

Emily: [laugh].

Corey: —well-intentioned—giving each other on places like Reddit, where the other children hang out. And it's, “Oh, just use a prepaid gift card so it can only charge you so much.” Yeah, and then you get exploited like someone recently was, and start accruing $60,000 a day in Lambda charges on an otherwise idle account, and Amazon will come after you with a straight face after a week. And, like, “Yes, we'd like our $360,000, please.”

Emily: Yes.

Corey: “We tried to charge the credit card and, wouldn't you know, it expired. Could you get on that, please? We'd like our money faster, if you wouldn't mind.” And then you wind up in absolute hell. Now, credit where due: in every case I am aware of that doesn't look like fraud's close cousin, they have made it right, on some level. But it takes three weeks of back and forth and interminable waiting.

And you're sitting there freaking out, especially if you're someone who does not have a spare half-million dollars sitting around. Imagine: “You sound poor. Have you tried not being that?” And I'm firmly convinced that it's a matter of time until someone does something truly tragic, because they don't understand that it takes forever, but it will go away. And from my perspective, there's no bigger problem that AWS needs to fix than surprise lifelong-earnings bills going to some poor freaking student who is just trying to stand up a website as part of a class.

Emily: All of the clouds have these missing stairs in them. One of the things that a lot of the cloud providers do is make it really easy for you to spin things up to test them, and really, really hard to find where to shut it all down. The data science space is awful at this. As a data scientist, I work with a lot of data science tools, and every cloud has, like, the spin-up-your-magical-data-science-computing-environment thing so that your data scientists can, like, bang on the data with, you know, high-performance compute for a while.

And you know, it's one click of a button: you type in a couple of names—name your service or whatever, name your resource—you click a couple buttons and you spin it up. But behind the scenes, it's setting up a Kubernetes cluster, and it's setting up some storage bucket, and it's setting up some data pipelines, and some monitoring stuff, and a VM in order to run all of this stuff.

And the next thing you know, you're burning 100, 200 euro a day just to, like, figure out if you can load a CSV into pandas using a Jupyter notebook. And when you try to shut it all down, you can't. You have to figure out: oh, there's a networking thing set up. Well, nobody told me there's a networking thing set up. You know? How do I delete that?

Corey: You didn't say please, so here you go. Whereas for me, it's not even the giant bill going from $4 a month in S3 charges to half a million bucks, because that is pretty obvious from the outside just what the hell's been happening. It's the little stuff. I am still—since last summer—waiting for a refund on $260 of ‘because we said so' SageMaker credits because of a change to their billing system, for a 45-minute experiment I had done eight months before that.

Emily: Yep.

Corey: Wild stuff. Wild stuff. And I have no tolerance for people saying, “Oh, you should just read the pricing page and understand it better.” Yeah, listen, jackhole. I do this for a living. If I can fall victim to it, anyone can. I promise. It is not that I don't know how the billing system works and what to do to avoid unexpected charges.

And I'm just lucky—because if I hadn't caught it with my systems three days into the month, it would have been a $2,000 surprise. And yeah, I run a company. I can live with that. I wouldn't be happy, but whatever. It is immaterial compared to, you know, payroll.

Emily: I think it's kind of a rite of passage, you know, to have the $150 surprise Redshift bill at the end of the month from your personal test account. And it's sad, you know? I think that there's so much better that they can do and that they should do. As a bit of a tangent, one of the challenges that I see in the data space is that it's so hard to break into data, because the tooling is so complex and it requires so much extra knowledge, right?
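For a sense of how modest many "data" questions really are: a question like Corey's earlier "how many customers did I have yesterday?" needs no cluster, and for moderately sized files not even pandas. A stdlib-only sketch over an invented toy order log, streamed row by row so memory use stays flat regardless of file size:

```python
import csv
import io

# Toy order log; in practice this would be a file opened with open(path, newline="").
raw = io.StringIO(
    "order_id,customer_id,date\n"
    "1,alice,2023-01-01\n"
    "2,bob,2023-01-02\n"
    "3,alice,2023-01-02\n"
)

yesterday = "2023-01-02"
# DictReader yields one row at a time, so even a multi-gigabyte file never
# has to fit in RAM to answer a simple counting question.
customers = {row["customer_id"] for row in csv.DictReader(raw) if row["date"] == yesterday}
print(len(customers))  # 2
```

None of this requires a warehouse, a notebook server, or a running meter.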
If you want to become a software developer, you can develop a microservice on your machine, you can build a web app on your machine, you can set up Ruby on Rails, or Flask, or, you know, .NET, or whatever you want. And you can do all of that locally. And you can learn everything you need to know about React, or Terraform, or whatever, running locally. You can't do that with data stuff. You can't do that with BigQuery. You can't do that with Redshift. The only way that you can learn this stuff is if you have an account with that set up and you're paying the money to execute on it. And that makes it a really high barrier to entry for anyone to get into this space. It makes it really hard to learn. Because if you want to learn anything by doing, like many of us in the industry have done, it's going to cost you a ton of money just to [BLEEP] around and find out.

Corey: Yes. And no one likes the find-out part of those stories.

Emily: Nobody likes to find out when it comes to your bill.

Corey: And to tie it back to the data story: the billing system is clearly some form of batch processing, because it tries to be an eight-hour consistency model. Yeah, I assume for everything, it's 72. But what that means is that you are significantly far removed from doing a thing and finding out what that thing costs. And that's just the direct charges. There's always the, “Oh, I'm going to set things up, and it isn't going to screw me over on the bill.” You're just planting a beautiful landmine you're going to stumble blindly into in three months, when you do something else and didn't realize what that means.

And the worst part is, it feels victim-blamey. I guess this is one of the reasons I'm so down on data, even now: it's because I contextualize it in the sense of the AWS bill. No one's happy dealing with that. You ever met a happy accountant? You have not.

Emily: Nope. Nope [laugh]. Especially when it comes to cloud stuff.

Corey: Oh yeah.

Emily: Especially these days, when we're all looking to save energy, save money in the cloud.

Corey: Ideally, save the planet. Sustainability and saving money align on the axis of ‘turn that shit off.' It's great. We can hope for a brighter tomorrow.

Emily: Yep.

Corey: I really want to thank you for being so generous with your time. If people want to learn more, where can they find you? Apparently filing police reports after bank heists, which, you know, is a great place to meet people.

Emily: Yeah. You know, the site of the largest criminal act in Berlin is certainly the place you want to go to get your cloud advice. You can find me: I have a website. It's my name, emilygorcenski.com. You can find me on Twitter, but I don't really post there anymore. And I'm on Mastodon someplace, because Mastodon is weird and kind of a mess. But if you search me, I'm really not that hard to find. My name is harder to spell, but you'll see it in the podcast description.

Corey: And we will, of course, put links to all of this in the show notes. Thank you so much for your time. I really appreciate it.

Emily: Thank you for having me.

Corey: Emily Gorcenski, Data and AI Service Line Lead at ThoughtWorks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insipid, insulting comment talking about why data doesn't actually matter at all. And then the comment will disappear into the ether, because your podcast platform of choice feels the same way about your crappy comment.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The 4D Approach to Cloud Sustainability with Catharine Strauss


Play Episode Listen Later Feb 7, 2023 40:37


About Catharine

Catharine brings more than fifteen years of experience building global networks and large-scale data center infrastructure to the challenge of scaling quickly and safely. She loves building engaged and curious teams, providing insightful forecasting tools, and thinking about how to build to scale in a sustainable way to preserve a humane quality of life on this swiftly tilting planet. When not trying to predict the future as a capacity planner, she's often knitting extremely complicated sweaters and coming up with ridiculous puns.

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Tailscale SSH is a new, and arguably better, way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically, you're SSHing the same way that you're already managing your network.

So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now—it's free forever for personal use.

Corey: Kentik provides Cloud and NetOps teams with complete visibility into hybrid and multi-cloud networks.
Ensure an amazing customer experience, reduce cloud and network costs, and optimize performance at scale—from internet to data center to container to cloud. Learn how you can get control of complex cloud networks at www.kentik.com, and see why companies like Zoom, Twitch, New Relic, Box, eBay, Viasat, GoDaddy, booking.com, and many, many more choose Kentik as their network observability platform.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. As a cloud economist, I wind up talking to an awful lot of folks about optimizing their AWS bills. That is what it says on the tent. It's what I do. Increasingly, I'm having discussions around the idea of sustainability, because the number-one rule of cloud economics is also the number-one rule for sustainability. Step one: turn that shit off. If you're not using it, turn that shit off. If it doesn't add value commensurate with what it costs, turn that shit off. Because the best way to optimize something is to get rid of it. Today, to go into a bit more depth on that, my guest is Catharine Strauss. Catharine, thank you for joining me.

Catharine: Thank you. I'm excited.

Corey: So, you have a long and storied career of effectively running global-scale network operations, in terms of capacity planning, in terms of building out world-spanning networks and the logistics of doing that. You know, the stuff that's completely invisible to most people, except when it breaks. So, it's more or less a digital-plumbing type of role. How did you go from there to thinking about sustainability in a networking context?

Catharine: Yeah. Thank you. I got dropped into networking as a career option completely from the physical side, building out global networks. And all of the constraints that we were dealing with were largely physical, logistical, or legal. So, we would do things like ship things through customs and have items stopped because they were miscategorized as munitions, because they were lasers: “Pew, pew.” We had things like contract negotiations for data centers to do trenching into them that needed easements with the railroad. Like, just weird stuff that you don't normally think of as a cloud-project constraint. So, all of these physical constraints made it just more interesting to me, because they were just so tactile.

Corey: There's so much out there in the world that is completely divorced from anything you have to think about in terms of building out networks and software. Until, suddenly, it's very much there, and you're learning that there's an entire universe/industry/ecosystem that you know nothing about that you now need to get into. Railroad easements are a terrific example of that. It's, “Wait, what? We're building the cloud here. What the hell does the railroad have to do with it? Is there actually a robber baron I need to go fight somewhere? How does this work?” The old saw about the cloud just being someone else's computer is not particularly helpful, but it is true. There's a tremendous amount of work that goes into building out the physical footprint for a data center—let alone a hyperscale cloud provider's data center—that does not have to be something the vast majority of us need to think about anymore. And that's kind of glorious and magical. But it does mean that there are people who very much need to think about that.

Increasingly, we're seeing the sustainability and climate story of cloud extend beyond those folks. There are now carbon-footprint tools and dashboards in all the major cloud providers that I'm aware of. Well, I'd say it's a good start, but in some cases, it's barely that. It feels like this is something that people are at least starting to take semi-seriously in the context of cloud. How have you seen that evolving?

Catharine: So, when I think about a data center, I see it as a factory where you take heavy metals and electricity—

Corey: And turn them into YAML. Sorry, sorry. Go ahead.

Catharine: [laugh] You turn them into spreadsheets, cat videos, and waste heat, right? So, when I was looking at, you know, this tremendous global network, I started to look into the environmental cost of it. And what I found was kind of surprising. Like, three percent of our total global emissions is coming from computing and the internet and all of these things that I spent my career building. And I started to have waves of regret. And I started looking at that in the context of: how can we make things better? How can we make things more efficient, and how can we operate better within the physical constraints of electricity and energy grids, and what they are struggling to do to provide us with what we need, or to manage this beast of an internet?

Corey: Right now, it feels like there's an awful lot of—I don't know what the term is—greenwashing, cloud washing: basically, making your problem someone else's problem. I feel like the cloud providers are in a position where they have to walk something of a tightrope. Because on the one hand, yeah, there are choices I can make as a customer that will absolutely improve the carbon footprint of what it is that I'm doing. On the other, they never invite me to have conversations to negotiate with their energy providers around a lot of these things. So, it feels like, “Oh, yeah. Make sure that the cloud you're using is green enough.” “Wasn't that what I'm paying you for?” That feels like a really weird dichotomy that I'm still struggling to reconcile exactly how to approach.

Catharine: Yeah. You know, I looked at the Amazon sustainability platform, and they've got those two parts of it. They've got sustainability in the cloud and sustainability of the cloud.
And, you know, I've worked with enough Google SREs to know that they, and Amazon's data center providers, and Azure, all have a vested interest in making it as cheap as possible to operate their data centers. And that goes far beyond individual server performance. It goes to the way that they do cooling. And, like, the innovations there are tremendous.

But they're not doing that out of the goodness of their heart; they're doing that because it makes business sense for them. It reduces the cost for them to provide these services. And, you know, in some cases, it really obscures things because they will sign energy contracts and then keep them super-secret. There's very little transparency because these are industry secrets, and they don't want to damage their negotiation positions for the next deal that they sign. So, Amazon, you know, will put PR releases out there about all of their solar farms that they are sustaining in Virginia. But they don't talk about what percent that is of their total energy consumption, and they don't talk about, you know, what the total footprint is because that is considered either a security risk or an economic risk if people were to find out, you know, exactly how much energy they're pulling.

Corey: I am, somewhat, sympathetic, but only to the reality that the more carbon transparency that a cloud provider gives around the relative greenness of a given service that they offer in a given region, the closer they get to exposing a significant component of their per-service margins. And they're, understandably, extraordinarily reluctant to do that because then people will do things like figure out exactly how much they are up-charging things like data egress and ongoing per-hour session charges for some SageMaker nonsense.

There's an awful lot out there that I don't think they want to have out there. On the one hand, the small concern, the one that's easy to deal with, is the customer uprising.
But more so, they don't want to expose this to their competitors.

Catharine: Yeah, I don't know that I have a ton of sympathy. If the service is cheaper because they're running off of green energy, as we have increasingly seen in the market where solar and wind are just the cheapest alternative, and if it's cheaper for Amazon and Google, I, kind of, feel like they should convey that, so that people can take advantage of those savings.

We've got a demand issue, where, I think, the demand for these renewable energy sources is outstripping supply. But they're planning for the next five years, where that decreasingly becomes an issue. So, why not let people operate according to their values, or even, you know, their own best interests in choosing data centers that are emitting fewer emissions into the world?

Corey: There seems to be a singular focus between all of these providers in what they're displaying through their tools. And that is on carbon footprint, and it is also suspiciously tightly bounded to what looks like compute. There are a lot of other climate-impacting effects of large-scale cloud providers. There is significant disruption to local waterways. There are tremendous questions around the sustainability of manufacturing the various components that get turned into equipment that gets sold to these providers and then integrated into other things. There are an awful lot of downstream effects. And I can't shake the feeling that focusing on how renewable the energy is that powers the compute focuses on a very small part of the story. How do you land on that one?

Catharine: I would agree with that.
I think people will often say, “Oh, what you should do if you're managing, you know, your data center resources is, for efficiency, you should be updating your hardware once a year or putting out the resources that are the most powerful.” The tipping point might be later than you actually think because what happens to those resources when they go back out into the environment when you decommission them? It's so hard to resell them, especially globally. The reuse of gear is becoming harder and harder, and so extending the lifetime of that gear, that equipment, those servers, routers, whatnot, all of that is becoming harder and harder to do. And the disposal of those materials has a tremendous impact.

So, I do think the energy is a big part of it, and it feels like the thing that we can control the most. But, like, if you really want to change the world, go work on carbon-neutral cement or batteries made out of rust and sand to store solar energy. You know, go work on low-heat steel. Those are the things where you're really making an impact.
What we need to do in the market is really transform our notion of the cloud as this infinite, nebulous, weightless item into something that is physical and has a physical impact on our lives.

So, when you're trying to decide what your retention policy is for your data in your company, when you're trying to decide where to replicate data, how long to hold it in active storage, you're really thinking about the megawatts that it takes, and the impact of that on the full picture.

Corey: Well, a question that I've had as I look across my customer base of large companies doing interesting and exciting things with cloud, is I would love—absolutely love—to see a comparative analysis done by each provider that, in very human terms, says what the relative climate impact is of taking all of their different storage services, on a per-petabyte basis, where I say, “Okay, if I want to store this in their object storage, or if I want to put this on disk volumes, or I want to use their deep-archive storage that looks an awful lot like tape, I don't care so much about the cost of those things, but I want to know what is the climate impact of this,” because I think that would be revelatory on a whole bunch of different levels. But it seems it's compute where they tend to focus instead.

Catharine: Yeah, it would be really nice if, as businesses, we started to look at the fuller impact of our actions. And it isn't just about the money saved. But my genuine belief is that it will get cheaper to do the right thing. And it is getting cheaper every day to use fewer resources. But the market has not caught up to that, and you can see that in how many companies are still giving away free, unlimited storage, right?
You know, how many GoPro videos of someone's backyard, how many hours of that kind of footage is there out there in the world that's never going to get viewed again, but is sitting out there taking up energy at the same time that we're having brownouts, and people are suffering and having to turn off their air conditioning?

Corey: I think that we would do well as a society to get rid of a heck of a lot more data just because it sits there; it burns energy; it costs money, and I'm sorry, you're going to really have to reach to convince me that the web server access logs from 2012 are in any way business valuable or relevant to, basically, anyone out there.

But I want to take it one step further because now that we know that we're definitely burning the planet to wind up storing a petabyte of data here, I'm very curious as to the climate footprint of then going into your world, taking that data, and throwing it somewhere else across the internet. Because I can tell you, almost to the penny, what that's going to cost, and it's an astonishingly large number because yeah, egress fees are what they are, but I couldn't tell you what the climate footprint of that is.

Catharine: Yeah. When I was working at Fastly, we did a lot of optimizations across our network to avoid peak traffic because that was how we were built. You know, we had to build out to a certain network capacity, and then we could use, essentially, the area under our diurnal curve. But we don't have to, necessarily, serve it from the absolute closest data center. If we could serve it from a nearby data center, or a provider that was three milliseconds of wait or more away, we could potentially use resources that we have elsewhere in the cloud to serve that request more efficiently.

And I think we have an opportunity to do that with data centers scattered around the globe.
Why aren't we load balancing so that we're pushing traffic toward the data centers that are off-peak—you know, have energy to spare—to accommodate for the data centers that are reaching capacity and don't have enough energy on the grid? Why aren't we using these resources more efficiently?

Corey: I've often lamented, from an economic perspective, that if I want to spend less money and optimize things, I can wind up trading out my instance types. Okay, I have a super-fast, high-end processor that costs a lot of money. I can get shittier compute by spending less. The same story with storage. I can get slower storage for less money that's a lot less performant, and it has some latencies added, but, “Great,” I can make that decision.

With networking, it's all or nothing. There is no option for me to say, “I want to pay half of what the normal data rates are, but in return, I really only care that this data gets to where it's going by next Tuesday.” I don't need it done at sub-second latency speeds. There's no way to turn that off or to make that election. Increasingly, I really am coming around to the idea that cloud economics and sustainability are one and the same.

Catharine: Yeah. For me, it makes a lot of sense. And, you know, when I look at people in their careers, focusing on cloud economics feels like a very, very easy win if you also care about sustainability.
And it feels like once you have the data and the reporting tools—and, you know, we talked about the big gaps there—but if you're reporting on both your costs and the carbon footprint, you're developing a plan for how to optimize on both of them at the same time, and you're bringing that back to your management, bringing that back to your teammates, and really making sustainability an active value in your organization.

I feel like there's not only a benefit to you, the finances of your company, and your personal career, but there's also a social impact where, you know, maybe you can feel a little less guilty about eating that steak. Maybe you can offset some travel that is increasing your carbon footprint; maybe you can do a trade-off; maybe you can do everything in little bursts across a broad scope, instead of us needing, you know, some big solution that's going to save us. There's no one solution.

I think the main thing I've discovered in my education on sustainability is it has to be 50,000 small things; the ‘magic buckshot' rather than the ‘magic bullet' is the term that I see used a lot. Carbon removal from the sky is coming, but while we wait for it, we've got to slow the pace of digging the hole, and really give our solutions a chance to work.

Corey: I despair at times at the lack of corporate will, I suppose, to wind up pursuing cloud sustainability as a customer of one of the cloud providers. I get people reaching out to me, pretty frequently, to help optimize the cost of their AWS bill. That is, definitely, what I do for a living. If I don't have people reaching out on that, something is going wrong somewhere. And even then, there have been months that have been relatively slow in recent years. Because, well, it turns out when money is free, you don't really care that much about saving money.
Now, people are tightening their belts and have to think about it a lot more, but that is a direct incentive: if you go ahead and optimize your cloud-spend bill, you will have more money.

That is, sort of, what our capitalist system is supposed to optimize for in many respects. “Great,” you can have more money. But it's still not exciting for folks, and it's not what they really wind up chasing after. I despair at getting them to think larger than money, because that's the only thing that companies generally tend to think about in the abstract, and to start worrying about the future and climate, and to invest significant effort in doing climate optimization. I don't know that there is a business today in greening your cloud workloads that could be started the way that I have for fixing the AWS bill.

Catharine: Yeah, I don't think there's a business in it; I think it's a movement. It's like accessibility; it's like security; it's like a lot of other movements that have happened recently in tech where it becomes everybody's job. And it's important to people. And it becomes part of your company's brand, and you use it for recruitment; you use it for advancing your own career; you use it for making people feel like they're making a better decision.

When I look at the three big cloud providers, and I look at the ways that they are marketing their sustainability, it is so slick. You go to their sustainability page and it's all, you know, beautiful, flashy graphics and information on all these feel-good things. Because they know, if they don't do it, they're going to be passed over because somebody is going to bring this up when they're evaluating their choices. Because we want it; we all want it. We just don't quite know how to get there. And until recently, it was more expensive, and you did have a green tax that made the sustainable options more expensive. We're turning the page on that. Solar is cheaper than coal.
And that's all you really—all you have to say to justify some of these advancements. It's all going to flow out of that simple fact.

Corey: Cloud native just means you've got more components or microservices than anyone (even a mythical 10x engineer) can keep track of. With OpsLevel, you can build a catalog in minutes and forget needing that mythical 10x engineer. Now, you'll have a 10x service catalog to accompany your 10x service count. Visit OpsLevel.com to learn how easy it is to build and manage your service catalog. Connect to your git provider and you're off to the races with service import, repo ownership, tech docs, and more.

Corey: I think that there's a tremendous opportunity here to think about this. And I think you're right. It absolutely takes on aspects of looking like a movement to do that. I'm optimistic about that. The counterpoint is that individuals are often not tremendously effective at altering the behavior of trillion-dollar companies, or even the relatively small ‘only' 50-billion-dollar companies out there.

I can see where it starts, and I can see the outcome that you want there. I just have no idea what it looks like in between. It's, like, “Step two, we'll figure this part out later. Step three, climate.”

Catharine: Yeah. If I were going to do it at my company, I would go to HR. And I would say, “I would like to form an employee-resource group around sustainability. Do you know anyone on the executive team who is interested in sustainability?” Get them to sponsor it; talk to that sponsor, and say, “What are the co-benefits here? What do you see as things that we absolutely need to do from a corporate-strategy standpoint that are aligned with this?”

And then start having meetings—open meetings—where you invite people concerned about climate change, and you start to talk cross-functionally about, “What can we do? Can we change our retention policies? Can we change the way that we bill for services?
Can we individually delete data on our Wiki that hasn't been accurate for seven years?” And, you know, start to talk and share successes. Then take it out to the larger industry and start giving talks because people want to be able to do something. Climate despair is real, but we, as cloud technologists, are so powerful in the resources that we have stewardship over. But I have to think that there is a possibility of making real change here.

Corey: There's a certain point of scale at which having a sustainability conversation becomes productive. There are further points of scale where it becomes mandatory, let's be clear here. But when I'm building something in the off hours—mostly for shit-posting purposes—it generally tends to wind up costing maybe seven cents or so when all is said and done because I'm using Lambda functions and other things that don't take a whole lot of compute resources out there. Googling what the most climate-effective way to implement that would be is one of those exercises where the Google search has a bigger carbon footprint than the entire start-to-finish of what it is that I'm building. It's not worth me looking into that.

There is some inflection point between that and “we run 500,000 servers around the world,” or 500,000 instances, where, yeah, there's a definite on-ramp where you need to start thinking about these things. What is, I guess, that first initial point of, “I should be thinking about this,” for a given workload?

Catharine: So, I've been trying to get data on this, and my best calculation is that an average server in a hyperscale data center, where you're using the whole thing for an entire year, is one to two tons of CO2 per year. So, when you start to look at other initiatives that you're seeing, I think the tipping point is around ten tons per year. And for some people, that's a lot; that's a lot of resources that you need to get up to that point.

Corey: That feels directionally right.
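Catharine's back-of-envelope arithmetic is easy to sketch in a few lines. To be clear, the one-to-two-tons-per-server-year figure and the ten-ton "tipping point" are her rough estimates from the conversation, not published provider data, and the function names below are invented for this illustration:

```python
# Sketch of the estimate discussed above: roughly 1-2 tons of CO2 per
# fully utilized hyperscale server per year, with ~10 tons/year as the
# rough threshold where sustainability work starts to be worth the effort.
# These constants are the guest's rough estimates, not provider data.

TONS_CO2_PER_SERVER_YEAR = (1.0, 2.0)  # (low, high) estimate
ACTION_THRESHOLD_TONS = 10.0

def estimated_footprint(server_years: float) -> tuple[float, float]:
    """Return (low, high) estimated tons of CO2 for N server-years."""
    low, high = TONS_CO2_PER_SERVER_YEAR
    return (server_years * low, server_years * high)

def worth_optimizing(server_years: float) -> bool:
    """True once even the low estimate crosses the ~10-ton threshold."""
    return estimated_footprint(server_years)[0] >= ACTION_THRESHOLD_TONS

for fleet in (1, 5, 10, 500_000):
    lo, hi = estimated_footprint(fleet)
    print(f"{fleet} server-years: {lo:.0f}-{hi:.0f} t CO2, "
          f"worth optimizing: {worth_optimizing(fleet)}")
```

By this crude yardstick, a single always-on dev instance (a small fraction of one server) is noise, while a dedicated fleet of ten servers is already at the threshold, which is the point the conversation lands on next.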
I think that is absolutely around where it starts to make sense. I mean, right now, I'm also in the uncomfortable creeping-awareness position of: I've run a medium-sized EC2 instance persistently. That is my developer environment. I have it running all the time because having a Linux box is, sort of, handy. And whether I need it or not, it's there. If I were to turn it off when I go to sleep at night, for example, I do not believe that would have any climate impact whatsoever, from the perspective of: this is a medium-size instance. There are a bunch of those on any individual server.

Amazon is not going to turn off a rack right now because my instance is there or it's not. It is well within the margin of error for anything they have as far as provisioning or de-provisioning something. So, that, sort of, comes back to the term you used a few minutes ago, climate despair; that's what this feels like. It's one of those, “Well, okay. So, it makes no actual difference if I were to spend time instrumenting that thing to turn itself off at night and turn itself back on in the morning; it doesn't change a damn thing. I'm just doing something that is effectively meaningless in order to make myself feel better.”

The enormity of the problem and the task, and doing it at scale, well, I'm not going to convince customers to do that. And in some cases, maybe that's for the better; maybe it's not. But I feel like for whatever I do, there's nothing I can do to make a difference in that sense, in my small-scale personal environment.

Catharine: Yeah, yeah. I definitely appreciate that. This feels to me like the same concept of—I don't know, a couple of months ago, if you remember, California had a heat wave, and there were rolling brownouts. And we got a text that said, “Energy is at a high right now. Please turn off any unnecessary devices,” trying to avoid additional impact to the energy grid.
And if you go and you look at the graph, there was an immediate decrease of 1500 megawatts in that moment because enough people got the text and took a small action, and it had the necessary impact. We avoided the brownouts, and the power, generally, kept flowing because it's such a big system.

You know, if we're talking about three percent of global emissions, we're talking about, you know, power that's the size of the aviation industry. We're talking about power that's, roughly, the size of Switzerland just on data centers. You, as an individual, are not going to be able to make an impact; you, as an individual talking about this to as many people as possible—as we're doing right now—that starts to move the needle. And the thing I like about forming a grassroots group inside of your company is that it's not just about the data centers. Maybe it's also about the service that comes in and brings you food and uses disposable containers; maybe it's about people talking about their electric cars; maybe it's about installing a heat pump; maybe it's about talking about solutions instead of just talking about creeping dread all the time.

Like, my move into sustainability has been largely in response to the fact that I can't keep doom-scrolling. I have to find the people who are making the solutions happen. And I just got out of a program with Climatebase where what I did for nine weeks is talk about the solutions. And all of the people in the companies that are actually doing something, they're so much more optimistic than the people I talk to who are just reading the headlines.
And I definitely wanted to dive into that. What do you mean? I'm not disagreeing, but I also have a really hard time seeing that. Help?

Catharine: Yeah. So, I used to do capacity planning for Fastly. And so, we would spend all day staring at the diurnal curve of our network throughput because we had to plan for the peak. Whatever our traffic throughput was, our global network needed to be able to handle it. And every day—maybe we got close to that peak; maybe we didn't—but every day it would dip down into just the doldrums as people went to sleep and weren't using the internet.

So, when I moved into looking at energy markets, specifically smart grids, and the way that renewables affect the available supply of electricity, I saw that same curve; it's called the duck curve in electricity markets, where you have this diurnal pattern and a point every day where the grid has electricity available but no demand.

So, when I was managing costs for our network, we would be trying, as much as possible, to fill that trough every day because it was free for us; we had already built out the infrastructure to fulfill that demand.

And the energy markets are the same way. We have built out the infrastructure. We just need the demand to meet the timing of the day. Put another way, you have to think fourth-dimensionally. It's like Doc Brown in Back to the Future III. Marty says, “If we continue along this track, the bridge isn't built yet. We're going to plunge into the canyon and die.” And Doc Brown says, “No, no, no. You're not thinking fourth-dimensionally.
When we travel through time, we will be in the future, and the bridge will be there.” So, if we can shift the load from one region where energy is being consumed at its peak and move the traffic over to a region in the Pacific Northwest, or a different time zone, where they haven't yet hit their energy-consumption peak, we can more efficiently use the infrastructure that has already been built out.

Corey: I really wish things were a lot easier to move around in that context. Data transfer fees make that very challenging, even if you can get around the latency challenges—which for many workloads is fine; that is not a prohibitive challenge. It's the moving things around; moving data to those other regions, especially, in the sense of, “But, okay. You're making it worse because now you have the data living in two different places instead of only one. You've doubled the carbon footprint of it, too.”

For some workloads, it absolutely has significant merit. I just don't know exactly what that's going to look like—actually, I take that back—the more I think about that, the more I realize that on some level, that's what SDNs do already, where, “Great, if this has to be built into something; if I hit an AWS endpoint or an API Gateway or something, I want to have an option when I'm building that out to be able to have that do more or less a follow-the-sun style pattern where it's homed wherever energy markets are inexpensive.” And that certainly is going to break things for a lot of workloads, but not all of them, not by far.

Catharine: Yeah, and I think that is where my context is coming from. You know, working at Fastly, that was the notion, you know, “We're caching your data close to your end-users, so you don't have to operate resources in that area.” And we have a certain amount of leeway in how we serve that traffic.
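The follow-the-sun routing being described here can be sketched as a toy region picker. Everything below is invented for illustration: the carbon-intensity numbers, the crude day/night buckets, and the fixed time-zone offsets are placeholders (a real system would pull live grid-intensity data from the local grid operator or a carbon-data API); only the region names are borrowed from AWS.

```python
from datetime import datetime, timezone

# Hypothetical average grid carbon intensity (gCO2/kWh) per region,
# split into crude day/night buckets. Placeholder numbers only.
REGION_INTENSITY = {
    "us-west-2":  {"day": 120, "night": 80},   # hydro-heavy Pacific NW
    "us-east-1":  {"day": 380, "night": 300},
    "eu-north-1": {"day": 50,  "night": 40},
}

# Rough fixed UTC offsets for the example regions (ignores DST).
TZ_OFFSET = {"us-west-2": -8, "us-east-1": -5, "eu-north-1": 1}

def bucket(hour_utc: int, tz_offset: int) -> str:
    """Classify a region's local hour as 'day' (08:00-20:00) or 'night'."""
    local = (hour_utc + tz_offset) % 24
    return "day" if 8 <= local < 20 else "night"

def greenest_region(now: datetime) -> str:
    """Pick the region with the lowest estimated intensity right now."""
    hour = now.astimezone(timezone.utc).hour
    return min(
        REGION_INTENSITY,
        key=lambda r: REGION_INTENSITY[r][bucket(hour, TZ_OFFSET[r])],
    )
```

With these made-up numbers, a latency-tolerant batch job would always land in eu-north-1, which is the point: once per-region intensity is visible, "where should this run?" becomes a one-line scheduling decision, subject to the data-transfer and duplication caveats raised above.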
But it is a more globally distributed model, and spinning up servers only when you need them is also a model that takes advantage of not having idle services around just in case you need them, actually responding to demand in real-time.

If you look at what the future holds for, you know, smart grids, energy networks, there's this tremendous ability—and I would be very surprised if the big providers are not working on this—to integrate the two, so that electricity availability and how our network traffic is served is just built into the big providers.

Corey: I really hope that one of these big providers leads the way on that. That's the kind of thing that they should really want to see come out of these folks. We are recording this before AWS re:Invent. So, if they did come out with something like this, good for them, and also, I have no idea, at the time of this recording, whether they are or not. So, if I got it right, no, I'm not breaking any confidentiality agreements. I feel I need to call that out explicitly because everyone assumes that I—that I have magic insight into everything they're going to come out with. Not really; usually it's all after the fact.

Catharine: What I'm really hoping is that by the time this airs, Amazon has already released version two of their carbon footprint tool, where they have per-data-center visibility and it's no longer three months in arrears, so that you can actually do experimentation and see how differences in the way you implement your cloud impact your carbon footprint. Rather than just, like, sort of, the receipt of, “Yep, here's your carbon footprint.” Like, “No, no, no; I want to make it better. How do I make it better?”

So, I'm very much hoping they make an announcement of that kind, and then I'll come back.
If people want to learn more, where's the best place for them to find you?

Catharine: Yeah. You can find me on my website, Summerstir.com. And also, I hang out an awful lot with some very smart people on ClimateAction.tech. Their Slack is a great repository for people concerned about exactly these issues.

Corey: And we will, of course, put links to that in the [show notes 00:37:21]. Thank you so much for being so generous with your time. I appreciate it.

Catharine: This has been delightful. Thank you.

Corey: Catharine Strauss, budding digital sustainability consultant. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that also includes the cloud sustainability metrics for that podcast platform of choice.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Saving the World through Cloud Sustainability with Aerin Booth

Screaming in the Cloud

Play Episode Listen Later Jan 26, 2023 35:56


About Aerin

Aerin is a Cloud Sustainability Advocate and neurodiverse founder in tech on a mission to help developers understand the real impact that cloud computing has on the world and reduce their carbon emissions in the cloud. Did you know that the internet and cloud computing contribute over 4% of annual carbon emissions? Twice that of the airline industry!

Aerin also hosts "Public Cloud for Public Good," a podcast targeted towards developers and senior leaders in tech. Every episode, they also donate £500 to charities and highlight organisations that are working towards a better future. Listen and learn how you can contribute towards making the world a better place through the use of public cloud services.

Links Referenced:
Twitter: https://twitter.com/aerincloud
LinkedIn: https://www.linkedin.com/in/aerinb/
Public Cloud for Public Good: https://publicgood.cloud/
duckbillgroup.com: https://duckbillgroup.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: Cloud native just means you've got more components or microservices than anyone (even a mythical 10x engineer) can keep track of.
With OpsLevel, you can build a catalog in minutes and forget needing that mythical 10x engineer. Now, you'll have a 10x service catalog to accompany your 10x service count. Visit OpsLevel.com to learn how easy it is to build and manage your service catalog. Connect to your git provider and you're off to the races with service import, repo ownership, tech docs, and more.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn, and I am joined, what feels like roughly a year later, by a returning guest, Aerin Booth. How have you been?

Aerin: I've been really great. You know, it's been a journey of a year, I think, since we sort of did this podcast even, like, you know, a year and a bit since we met, and, like, I'm doing so much and I think it's making, like, a big difference. And yeah, I can't wait for everything else. It's just, yeah, a lot of work right now, but I'm really enjoying it. So, I'm really well, thank you.

Corey: Normally, I like to introduce people by giving their job title and the company in which they work because, again, that's a big deal for an awful lot of people. But a year ago, you were independent. And now you still are. And back when I was doing my own consulting independently, it felt very weird to do that, so I'm just going to call you the Ted Lasso of cloud at this point.

Aerin: [laugh].

Corey: You've got the mustache, you've got the, I would say, obnoxiously sunny disposition. It's really, there's a certain affinity right there. So, there we go. I feel like that is the best descriptor for what you have become.

Aerin: I—do you know what, I only just watched Ted Lasso over Christmas and I really found it so motivational in some ways because, wow, like, isn't that just who we'd want to be in a lot of ways?
And I think, you know, for the work that I do, which is focused on sustainability, like, I want to present a positive future, I want to encourage people to achieve more and collaborate, and yeah, basically work on all these problems that need to be worked on. And yeah, I think that's [laugh] [crosstalk 00:02:02]—

Corey: One of the challenges of talking to you sometimes is you talk about these depressing things, but there's such a—you take such an upbeat, positive approach to it that I, by comparison, invariably come away from our conversations feeling like I'm Surly McBastard over here.

Aerin: [laugh]. Yeah, you can be the bad cop of cloud computing and I'll try and be the good cop. Do you know, you say that the stuff I talk about is depressing, and it is true, and people do worry about climate change. Like, I did an online conference recently, focused on FinOps, and we had a survey: “Do you worry about climate change?” 70% of the people that responded said they worry about it.

So, we all know it's something we worry about and we care about. And, you know, I guess what I'm really trying to do is encourage people to care a bit more and start taking action and look after yourself. Because, you know, when you do start taking action towards it, when you join those communities that are also working on it, it is good, it is helpful. And, you know, I've gone through some ups and downs in some of this, like, just, do I throw in the towel because no one cares about it? Like, we spoke last year; I had attended re:Invent for the first time.

This year, I was able to speak at re:Invent. So, I did a talk on being ethical in tech. And it was fun, it was good. I enjoyed what I delivered, but I had about 35 people sign up to that. I'm pretty sure if I talked about serverless or the next Web3 blockchain product, I would have got hundreds more. But what I'm starting to realize is that I think people just aren't ready to, sort of, want to do this yet.
And yeah, I'm hoping that'll change. Corey: Let's first talk about, I guess, something that is more temporally pressing than some other things. Not that it is more important than climate change, mind you, but it feels like it's on a shorter timeline, which is, relatively soon after this recording, there is a conference that you are kicking off called The State of Open. Ajar, Aerin. The State of Open is ajar. What is this conference? Is it in person? Is it virtual? Is it something where you and three friends are going to show up and basically talk to each other? How big? How small? What is it? What's it about? Tell me more, please. I'm riveted. Aerin: So, State of Open conference is a conference that's been in the works now for maybe about two weeks—a little bit longer in the planning, but the work we've been putting in over the last two weeks. It'll be on the seventh and eighth of February in London as a physical event in the QEII Conference Centre, but it will also be available online. And you know, when we talk about the State of Open, it's that question: what is the state of open? The state of open-source, the state of open hardware, and the state of open data. And it is going to be probably the first and hopefully the biggest open-source conference in the UK. We already have over 100 confirmed guest speakers, from Jimmy Wales, the co-founder of Wikipedia, to many of our great guests and headliners who haven't even been announced yet for the plenary. So, I'm really excited. And the reason why I wanted to get involved with this is because one of the coolest things about this conference—compared to some others like re:Invent, for example—is that sustainability and diversity run through every single thing that we do. So, as the content director, I reviewed every single CFP for both of these things.
I mean, you couldn't get a better person than someone like me, the queer person who won't shut up about sustainability, to sort of do this thing. So, you know, I looked after the scoring for the CFPs in support of the CFP chairs, and now I'm working with those individual speakers on their content and making sure that diversity is included in the content. It's not just the diversity of the speaker, for example; it's, who are the other people whose voices you're raising? What other people have you worked on this with? Is there anyone that you've mentored? Like, you know, actually, you know, let's have this as a wider conversation. Corey: Thank God. I thought you were about to say diversity of thought, and I was about to reach through the screen to strangle you. Aerin: [laugh]. No, no. I mean, we're doing really well: of the announced speakers online, we are 40% non-male and about 18% non-white, which is honestly good for a first-year conference, when we didn't really do that much to specifically call this out. But I would probably attribute this to Amanda Brock, who is the CEO of OpenUK. You know, she has built a community in the UK and around the world over the last few years which has been putting women forward and building these links. And that's why we've had such a great response for our first-year conference—the work she's put in. It's hard. Like, this isn't easy. You know, we've had to do a lot of work to make sure that it is representative, at least better than other conferences, at least. So, I'm really excited. And like, there's so much, like, open-source is probably going to be the thing that saves the world. If we're going to end up looking at two different futures, with monopolies and closed systems and all the money going towards cloud providers versus a fair and equitable society, open-source is the thing that's going to get us closer to that. So yeah, this conference will be a great event. Corey: Is it all in person? Is it being live-streamed as well?
What is the deal here? Aerin: So, in person, we have loads of different things going on, but what will be streamed online, if you sign up for a virtual ticket, is five different tracks: our platform engineering track, our security track, government law and policy, open data, and open hardware. And of course, the keynote and plenaries. But one of the things I'm also really proud about with this conference is that we're really focusing on the developer experience, like, you know, what is your experience at the conference? So, we also have an unconference, we have a sub-conference run by Sustain OSS focused on workshops related to climate change and sustainability. We have loads of developer experience halls in the event itself. And throughout the day, over the two days, we have two one-hour blocks with no speaking content at all so that we can really make sure that people have that hallway track and are out there meeting each other and having a good time. And obviously, of course, like any good conference, the all-hands party on the first night. So, it really is a conference that's doing things differently, from diversity to sustainability to that experience. So, it's awesome. Corey: One of the challenges that I've seen historically around things aiming at the idea of open conferences—and when we talk open-source, et cetera, et cetera—'open' seems like it is a direction parallel to 'we haven't any money,' where it's, “Yes, we're a free software foundation,” and it turns out conferences themselves are not free. And you wind up with a whole bunch of folks showing up to it who are, in many cases, around the fringes of things. There are individual hobbyists who are very passionate about a thing but do not have a position in the corporate world. I'm looking through the lengthy list of speakers you have here and that is very much not this. These are serious people at serious companies.
Not that there are not folks who are individual practitioners and passionate advocates and hobbyists and the rest. This is, by virtually any way you look at it, a remarkably diverse conference. Aerin: Mmm. You know, you are right about, like, that problem in open-source. It's like, you know, we look at open and whether we want to do open and we just go, “Well, it won't make me any money. I can't do that. I don't have the time. I need to bring in some money.” And one of the really unique things, again, about this conference is—I have not even mentioned it yet—we have an entrepreneurship room. So, we have 20 tables filled with entrepreneurs and CEOs and founders of open-source companies throughout the two days, where you can book in time to sit at that table and have conversations with them. Ask them the questions that you want to ask about, whether it's something that you want to work on or a company you want to found, and you'll be able to get that time. I had a very similar experience in some ways. It was at re:Invent. I was a peer talk expert and, you know, I had 15 or so conversations with some really interesting people just because they were able to put that time in and they were able to find me on the website. So, that's something we are replicating to get those 20 or so entrepreneurs and co-founders out to everyone else. They want to be able to help you and support you. Corey: That is an excellent segue if I do say so myself. Let's talk about re:Invent. It's the one time of the year you and I get to spend time in the same room. One thing that I got wrong is that I overbooked myself as I often do, and I didn't have time to do anything on their peer talk expert program, which is more or less a way that any rando can book time to sit down and chat with you. Now, in my case, I have assassination concerns because it turns out Amazon employees can read that thing too and some of them might work on billing.
One wonders. So yeah, I have to be a little careful for personal reasons, but for most people, it's a non-issue. I didn't get as much time as I wanted to talk to folks in the community. That is not going to repeat itself at the end of this year. But what was your take on re:Invent, because I was in meetings for most of it? Aerin: So, comparing this re:Invent to the re:Invent I went to—my first re:Invent, when we met in 2021—you know, that was the re:Invent that inspired me to get into sustainability. They'd announced stuff to do with the shared responsibility model. A few months later, they released their carbon calculator, and I was like, “Yeah, this is the problem. This is the thing I want to work on and it will make me happy.” And a lot of that goes into, you know, finding a passion that keeps me motivated when things aren't that great. When maybe not a lot of money is coming in, at least I know I'm doing everything I can to help save the world. So, re:Invent 2021 really inspired me to get involved with sustainability. When I look at re:Invent 2022, you might have Adam Selipsky on the main stage saying that sustainability is the problem of our generation, but that is just talk and bluster compared to what they were putting out in terms of content and their experience of, like, let's say the sustainability—I don't know what to call it—tiny little square in the back of the MGM Grand compared to the paid hall in the expo. Like, you know, that's the sort of thing where you can already see the prioritization of money.
Let's put the biggest sponsors and all the money that we can bring in, in the big hall where everyone is, and then put the thing we care about the most, apparently—sustainability—in the back of the MGM. And that in itself was annoying, but then you get into the content, and it was like a massive Rivian van, like, an advert for, “Oh, Amazon has done all this to electrify Rivian and deliver you Prime.” But where were the people working on sustainability in the cloud? You know, we had a couple of teams who were talking about the customer carbon footprint tool, but there was just not much. And I spoke to a lot of people and they were saying similar things, like, “Where are the announcements? Where are the actual interesting things?” Rather than just—which is kind of what I'm starting to realize—a lot of the conversations about sustainability are about selling yourself as sustainable. Use me rather than my competitors because we're, kind of, 88% more carbon-efficient compared to traditional data centers, not because we are really going to solve these problems. And not to say that Amazon isn't doing innovative, amazing things that no one else can do, because that is true, and cloud is part of the solution, but you know, sustainability shouldn't be about making more sales and growing your business, it should be about making the world a better place, not just in terms of carbon emissions, but, you know, our life, the tech that we can access. Three billion people on this planet have never accessed the internet. And as we continue to grow all of our services, like AI and machine learning and new Web3, bloody managed services come online, that's going to be more carbon, more compute power going towards the already rich and the already westernized people, rather than solving the problems we need to solve in the face of climate change. So, I was a little bit disappointed. And I did put a tweet thread out about it afterwards.
And I just hope it can be different next year and I hope more people will start to ask for this. And also, what I'm starting to realize is that until more Amazon customers put this as their number one priority and say, “I'm not going to do business with you because of this issue,” or, you know, “This is what we really care about,” they're not going to make a change. Unless it starts to impact their bottom lines and people start to choose other cloud providers, they're not going to prioritize it. And I think up until this point, we're not seeing that from customers. We're kind of getting some people like me shouting about it, but across the board, sustainability isn't the number one priority right now. It's, like Amazon says, security or resiliency or something else. Corey: And I think that, at least from where I sit, the challenge is that if you asked me what I got out of re:Invent, and what the conversations I had—going into it, what are my expectations, what do I hope to get, and how's it going to end up—and then ask you that same question—though maybe you are a poor example of this—and then you ask someone who works as an engineer at a company that uses AWS and is two or three years into their career, or you talk to a manager or director or someone else. The problem is, if you start polling the entire audience, you'll find that this becomes—you're going to wind up with 20 different answers, at least. The conference doesn't seem like it has any idea of what it wants to be, and to whom, and in that vacuum, it tries to be all things to all people.
And surprise, just like the shitty multifunction printer some of us have in our homes, it doesn't do well with any of those things because it's trying to stand in too many worlds at the same time. Aerin: You know, let's not, like, look at this in a way that says, you know, re:Invent is crap and, like, all the work that everyone puts in is wasted, because it is a really great event for a lot of different things for a lot of different people. And to be honest, the work that the Amazon staff put into it is pretty out of this world. I feel sorry, though, because, you know, in the rush for AWS to sell more and do this massive event, they put people through the grinder. And I feel like, I don't know, we could see the cracks in some of that, the way that works. But, you know, there's so many people that I speak to who were like, “Yeah, I'm definitely not going again. I'm not even going to go anywhere near submitting a talk.” And, sort of, the thing is, like, I can imagine if the conference was something different: it was focused on sustainability as number one, it was about making the world a better place through everything that they do, it was about bringing diverse communities together. Like, you know, bringing these things up the list would make the whole thing a lot better. And to be honest, it would probably make it a lot more enjoyable [laugh] for the Amazon staff who end up talking at it. Because, you know, I guess it can feel a bit soulless over time if all you're doing is making money for someone else and selling more things. And, yeah, I think there's a lot more… different things we can do and a lot more things we can talk about if people just start to talk about it, like, you know, if you care about this as well and you work at Amazon, then start saying that as well. It'll really make a difference if you say we want re:Invent to look different.
I mean, even Amazon staff. [laugh] And we've not even mentioned this one, because I got Covid straight after re:Invent, and nine days of staring at a wall in a hotel room in Vegas was not my idea of a good time post-conference. So, that was a horrible, horrible experience. But, you know, I've had people call it re:Infect. Like, where was the Covid support? Like, there was hardly any conversation about that. It was sort of like, “Don't mention it because oh, s”—whatever else. But imagine if you just did something a little bit differently to look like you care about your customers. Just say, “We recommend people mask or take a test,” or even provide tests and masks. Like, even if it's not mandatory, they could have done a lot more to make it safer for everyone. Because, yeah, imagine having the reputation of re:Infect rather than re:Invent? Corey: I can only imagine how that would play out. Aerin: Only imagine. Corey: Yeah, it feels like we've all collectively decided to pretend that the pandemic is over. Because, yeah, that's a bummer. I don't want to think about it. You know, kind of like we approach climate change. Aerin: Yeah. At the end of the day—like, and I keep coming across this more and more—you know, my thinking has changed over the last year because, like, you know, initially I was like a hyperactive puppy. Why aren't we caring about this? Like, yeah, if I say it, people will come, but the reality is, we have to blinker ourselves in order to deal with a lot of this stuff. We can't always worry about all of this stuff all of the time. And that's fine. That's acceptable. We do that in so many different parts of our life. But there comes a point when you kind of think, “How much do I care about this?” And for a lot of people, it's because they have kids.
Like, anyone who has kids right now must have to think, “Wow, what's the future going to look like?” And if you worry about what the future is going to look like, make sure you're taking steps to make the world a better place and make it the future you want it to look like. You know, I made the decision a long time ago not to have kids because I don't think I'd want to bring anyone into the world, given what it might actually end up being, but you know, when I speak to people who are older, in their 60s, and they're like, “Oh, you've got 100 years. You don't need to worry about it.” Like, “Maybe you can say that because you're closer to dying than I am.” But yeah, I have to worry about this now because I'll still be eighty when all this shit is kicking off [laugh]. Corey: This episode is sponsored in part by our friends at Strata. Are you struggling to keep up with the demands of managing and securing identity in your distributed enterprise IT environment? You're not alone, but you shouldn't let that hold you back. With Strata's Identity Orchestration Platform, you can secure all your apps on any cloud with any IDP, so your IT teams will never have to refactor for identity again. Imagine modernizing app identity in minutes instead of months, deploying passwordless on any tricky old app, and achieving business resilience with always-on identity, all from one lightweight and flexible platform. Want to see it in action? Share your identity challenge with them on a discovery call and they'll hook you up with a complimentary pair of AirPods Pro. Don't miss out, visit Strata.io/ScreamingCloud. That's Strata dot io slash ScreamingCloud. Corey: That, I guess, is one of the big fears I have—and I think it's somewhat unfounded—that every year starts to look too much like the year before it.
Because it's one of those ideas where we start to see the pace of innovation is slowing at AWS—and I'm not saying that to piss people at Amazon off and have them come after me with pitchforks and torches again—but they're not launching new services at the rate they once did, which is good for customers, but it starts to feel like, oh, have we hit peak cloud? Is this what it's going to look like? Absolutely not. I don't get the sense that the world is like, “Well, everything's been invented. Time to shut down the patent office,” anytime soon. And in the short term, it feels like, oh, there's not a lot exciting going on, but you look back over even the last five years and look at how far we've come in that period of time and—what is it? “The days are long, but the years are short.” It becomes a very macro thing of, as things ebb and flow, you start to see the differences, but on a micro basis, on a year-to-year perspective, it seems harder to detect. So, longer term, I think we're going to see what the story looks like. And it's going to be a satisfying one. Just right now, it's like, well, this wasn't as entertaining as I would have hoped, so I'm annoyed. Which I am, because it wasn't, but that's not the biggest problem in the world. Aerin: It's not. And, you know, you look at, okay, cool, there weren't all these new flashy services. There were a few things announced that, I mean, hopefully are going to contribute towards tackling climate change. One of them is called AWS Supply Chain.
And the irony of seeing, sort of, AWS Supply Chain—where a company that already has issues with data and conversations around competition is saying to everyone, “Hey, trust us and give us all of your supply chain information and put it into one of our AWS products,” while at the same time their customer carbon footprint tool won't even show the full scope of emissions from their own supply chain—is not lost on me. And you do say, “Maybe we should start seeing things at a macro level,” but unless Amazon and other cloud hyperscalers start pulling the finger out and showing us how they have got a vision between now and 2040, and now and 2050, of how they're going to get there, it kind of just feels like they're saying, “It'll all be fine as long as we continue to grow, as long as we keep sucking up the market.” And, you know, an interesting thing that just kicked off in the UK back in November was the Competition and Markets Authority have started an investigation into the cloud providers on how they are basically sucking up all these markets, and how the growth of things that are not hyperscalers is going. So, in the UK, the percentage of cloud has obviously gone up—more and more cloud spending has gone up—but kind of usage across non-hyperscalers has gone down over that same period. And they really are at risk of sucking up the world. Like, I have got involved in a lot of different things. I'm an AWS community builder; like, I do promote AWS. And, you know, the reason why I promote cloud, for example, is serverless. We need serverless as the way we run our IT because that's the only way we'll do things like time shifting or demand shifting.
So, when we look at renewable energy on the grid, if that's really high—the wind is blowing and the sun is shining—we want more workloads to be running then, and when they're tiny, and they're [unintelligible 00:21:03], and what do we call it, serverless generally, uh— Corey: Hype? Aerin: Function as a Code? Corey: Function—yeah, Function as a Service and all kinds of other nonsense. But I have to ask, when you're talking about serverless in this context, is a necessary prerequisite of serverless that it scale to zero when it's [unintelligible 00:21:19]? Aerin: [laugh]. I kind of go back to marketing. What is Amazon releasing these days, when it relates to serverless, that isn't just marketing and saying, “Oh, it's serverless”? Because, yeah, there were a few products this year that do not scale to zero, right? There's a 100-pound minimum. And when you're looking at the number of accounts that you have, that can add up really quickly and it excludes people from using it. Corey: It's worse than that because it's not the number of accounts. I consider DynamoDB to be serverless, by any definition of the term. Because it is. And what I like about it is I can have a separate table for every developer, for every service or microservice or project that they have, and in fact, each branch can have its own stuff like that. I look at some of the stuff that I build with multi-branch testing and whatnot, and, “Oh, wow. That would cost more than the engineer if they were to do that with some of the serverless offerings that AWS has put out.” Which makes that entire philosophy a complete non-starter, which means that invariably, as soon as you start developing down that path, you are making significant trade-offs. That's just from an economics slash developer ergonomics slash best practices point of view. But there's a sustainability story to it as well. Aerin: Yeah.
I mean, this sustainability thing is like, if you're not going to encourage this new way of working, like, if you're not going to move everyone to this point of view and this is how we need to do things, then you're kind of just propagating the old world, putting it into your data center. For every managed service or VMware-migrated piece of crap that just lands in the cloud, it's not making a real difference in the world because that's still going to exist. And we mentioned this just before the podcast and, you know, a lot of focus these days, for a lot of people, is, “Okay, green energy is the problem. We need to solve green energy.” And Amazon is the biggest purchaser of power purchase agreements in renewable energy around the world, more than most governments. Or, I think, the biggest corporate purchaser of it, anyway. And that all might sound great, like, “Oh, the cloud is going to solve this problem for me and Amazon is going to solve it for me even better because they're bigger.” But at the end of the day, when we think about a data center, it exists in the real world. It's made of concrete. You know, when you pour concrete and when you make concrete, it releases CO2. It's got racks of servers that are all running. So, those individual servers had to be made by whoever it is in Asia, or mined from rare earth metals, and end up in the supply chain and then get transported into the data centers in us-east-1. And then things go wrong: you have to repair, you have to replace, and you have to maintain them. Unless we get these circular economies going in a closed system, we can't just continue to grow like this. Because carbon emissions related to Scope 3—all those things I've just been talking about, basically anything that isn't the energy—is about 80 to 90% of all the carbon emissions. So, when Amazon says, “Oh, we're going to go green and get energy done by 2030”—which is seven years away—they've then got ten years to solve 90% of the problem.
And we cannot all just continue to grow and think of tech as neutral and better for the world if we've still got that 90% problem, which we do right now. And it really frustrates me when you look at the world and the way we've jumped on technology, just going, “Oh, it must be good.” Like Bitcoin, for example. Bitcoin has released 200 million metric tons of CO2 since its inception. And for something that is basically a glorified Ponzi scheme, I can't see how that is making the world a better place. So, when cloud providers are making managed services for Web3 and for blockchain, and they're selling more and more AI and machine learning, basically so they can keep on selling GPU access, I do worry about whether our path to infinite growth with all of these hyperscalers is probably the wrong way of looking at things. So, linking back to, you know, the conference, open-source and, you know, thinking about things differently is really important in tech right now. And not just for your own well-being and being able to sleep at night, but this is how we're going to solve our problems. When all companies on the planet want people to be sustainable, and we have to start tackling this because there's a financial cost related to it, then you're going to be in vogue. If you're a really good developer who can think about things differently and be efficient, then yeah, you're the developer that's going to win in the future. You might be assisted by ChatGPT-3 or whatever else, but yeah, sustainability and efficiency can really be the number one priority because it's a win, win, win. We save the world, we make ourselves better, we sleep better at night, and you just become a better developer. I keep monologuing at this point, but, you know, when it comes to stuff like games design, we look at things like Quake and Pokemon and all these things and ask, “How did they get these amazing games and these amazing experiences in such small sizes?” They had boundaries.
They had boundaries to innovate within because they had to. They couldn't release the game if it couldn't fit into the cartridge; therefore, they made it work. When the cloud is sold as infinitely scalable and horizontally scalable and no one needs to worry about this stuff because you can get your credit card out, people stop caring about being innovative and being more efficient. So yeah, let's get some more boundaries in the cloud. Corey: What I find super helpful has been, like, if I can, like, descri—like, 'Instagram is down, describe your lunch to me'-style meme description: like, the epic handshake where you have two people clasping hands, and one side is labeled, in this case, 'sustainability advocates,' and the other side should be labeled 'cloud economists,' and in the middle, it's, “Turn that shit off.” Because it's not burning carbon if it's not running, and it's not costing you anything—ideally—if it's not running, so it's one of those ideas where we meet in the middle. And that's important, not just because it makes both of us independently happy; it's both good for the world and you'll get companies on board with this because, “Wait. We can do this thing and it saves us money?” Suddenly, you're getting them aligned because that is their religion. If companies could be said to have a religion, it is money. That's the way it works. So, you have to make it worth money for them to do the right thing or you're always going to be swimming upstream like a depressed salmon. Aerin: I mean, look at why [unintelligible 00:27:11] security is near the top: because there's so many big fines related to security breaches. It will cost them money not to be secure. Right now, it doesn't cost companies money to be inefficient or to release all this carbon, so they get away with it or they choose to do it. And I think that's going to change.
We're seeing regulations across Europe coming out. So, you know, if you work for a big multinational that operates in Europe, by next year, you'll have to report on all of your Scope 3 carbon emissions. If you're a customer of AWS right now, you have no ability to do that. So, you know, this is going to be crunch time over the next 18 months to two years for a lot of big businesses, and for Amazon and the other hyperscalers, to really start demonstrating that they can do this. And I guess that's my big push. And, you know, I want to work with anyone, and it's funny because I have been running this business for about, you know, a couple of years now, it's been going really well, I did my podcast, I'm on this path. But I did, last year, take some time, and I applied to AWS. And you know, I was like, “Okay, maybe I'll apply for this big tech company and help Amazon out.” Because I'll take that salary and I'll do something really good with it afterwards; I'll do my time for three years and attend re:Invent and deliver 12 talks and never sleep, but, you know, at the end of it, I'll say, “Okay, I've done that and now I can do something really good.” Unfortunately, I didn't get the role—or fortunately—but, you know, when I applied for that role, what I said to them is, “I really care about sustainability. I want to make the world a better place. I want to help your customers be more sustainable.” And they didn't want me to join. So, I'm just going to continue doing that, but from the outside. And whether that means working with politicians or developers or anyone else to try and make the world better and to kind of help fight against climate change, then, yeah, that's definitely what I'm doing. Corey: So, one last question before we wind up calling it an episode. How do we get there? What is the best next step that folks can take? Because it's easy to look at this as a grand problem and realize it's too big to solve. Well, great. You don't need to solve the entire problem.
You need to take the first step. What is that first step? Aerin: For individuals, I would say it's just realizing that you do care about it and you want to take action. And you're going to say to yourself, “Even if I do little things, I'm going to move forward towards that point.” So, whether that is being a more sustainable engineer, or getting more conversations going about climate change, or even just doing other things in your community to make the world a better place, it is taking that action. But one thing that I can definitely help with and talk a bit more about is that at the conference itself, I'll be running a panel with some great experts called “The Next Generation of Cloud Education.” So, I really think we need to—like I said earlier in the podcast—think differently about the cloud and IT. So, I am doing this panel and I'm bringing together someone like Simon Wardley to help people do Wardley Mapping. Like, that is a tool that allows you to see the landscape that you're operating in. You know, if you use that sort of tool to understand the real-world impact of what you're doing, then you can start caring about it a bit more. I'm bringing in somebody called Anne Currie, who is a tech ethicist and speaker and lecturer, and she's actually written some [laugh] really great fiction books, which I'd recommend everyone reads. It starts with Utopia Five. And that's about asking, “Well, is this ethical? Can we continue to do these things?” It talks about things like sustainability. If it's not sustainable for everyone, it's not ethical. So, when I mentioned 3 billion people currently don't use the internet, it's like, can we continue to just keep on doing things the same way? And then John Booth, who is a data center expert, to help us really understand what the reality is on the ground. What do these data centers really look like?
And then Amanda Brock, from OpenUK, will be joining the conference as well to talk about, kind of, open-source and how we can make the world kind of a better place by getting involved in these communities. So, that'll be a really great panel.

But what I'm also doing is releasing this as an online course. So, for people who want to get involved, it will be very intimate, about 15 seats on each cohort, so three weeks for you to actually work and talk directly with some of these experts and me to figure out what you want to do in the world of climate change and how you can take those first steps. So, it'll be a journey that even starts with an ecotherapist to help us deal with climate grief and wonder about the things we can do as individuals to feel better ourselves and be happier. So, I think that'd be a really great thing for a lot of people. And, yeah, not only that, but… it'll be great for you, but it also goes towards making the world a better place.

So, 50% of the course fees will be donated: 25% to charity and 25% to supporting open-source projects. So, I think it's kind of just win, win, win. And that's the story of sustainability in general. It's a win, win, win for everyone. If you start seeing the world through a lens of sustainability, you'll save money, you'll sleep better at night, you'll get involved with some really great communities, and meet some really great people who care about this as well. And yeah, it'll be a brighter future.

Corey: If people want to learn more, where can they find you?

Aerin: So, if you want to learn more about what I'm up to, I'm on Twitter under @aerincloud, that's A-E-R-I-N cloud. And then you can also find me on LinkedIn. But I also run my own podcast that was inspired by Corey, called Public Cloud for Public Good, talking about cloud sustainability and how to make the world a better place through the use of public cloud services.
Thank you so much for your time. I appreciate it, as always.

Aerin: Thank you.

Corey: Aerin Booth, the Ted Lasso of cloud. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this episode, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment that I will immediately scale to zero in true serverless fashion.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Solving for Cloud Security at Scale with Chris Farris

Jan 24, 2023 35:39


About Chris

Chris Farris has been in the IT field since 1994, primarily focused on Linux, networking, and security. For the last 8 years, he has focused on public cloud and public cloud security. He has built and evolved multiple cloud security programs for major media companies, focusing on enabling the broader security team's objectives of secure design, incident response, and vulnerability management. He has developed cloud security standards and baselines to provide risk-based guidance to development and operations teams. As a practitioner, he's architected and implemented multiple serverless and traditional cloud applications focused on deployment, security, operations, and financial modeling.

Chris now does cloud security research for Turbot and evangelizes for the open-source tool Steampipe. He is one of the organizers of the fwd:cloudsec conference (https://fwdcloudsec.org) and has given multiple presentations at AWS conferences and BSides events.

When not building things with AWS's building blocks, he enjoys building Legos with his kid and figuring out what interesting part of the globe to travel to next. He opines on security and technology on Twitter and his website https://www.chrisfarris.com

Links Referenced:
Turbot: https://turbot.com/
fwd:cloudsec: https://fwdcloudsec.org/
Steampipe: https://steampipe.io/
Steampipe blog: https://steampipe.io/blog

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Tailscale SSH is a new, and arguably better, way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest.
So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network.

So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.

Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it - Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworks.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I have been meaning to invite slash drag onto this show for a number of years. We first met at re:Inforce the first year that they had such a thing, Amazon's security conference for cloud, as is Amazon's tradition, named after an email subject line. Chris Farris is a cloud security nerd at Turbot. He's also one of the organizers for fwd:cloudsec, another security conference named after an email subject line with a lot more self-awareness than any of Amazon's stuff.
Chris, thank you for joining me.

Chris: Oh, thank you for dragging me on. You can let go of my hair now.

Corey: Wonderful, wonderful. That's why we're all having the thinning hair going on. People just use it to drag us to and fro, it seems. So, you've been doing something that I'm only going to describe as weird lately because your background—not that dissimilar from mine—is as a practitioner. You've been heavily involved in the security space for a while and lately, I keep seeing an awful lot of things with your name on them getting sucked up by the giant app surveillance apparatus deployed to the internet, looking for basically any mention of AWS that I wind up using to write my newsletter and feed the content grist mill every year. What are you doing and how'd you get there?

Chris: So, what am I doing right now is, I'm in marketing. It's kind of a, you know, "Oops, I'm sorry I did that."

Corey: Oh, the running gag is, you work in DevRel; that means, "Oh, you're in marketing, but they're scared to tell you that." You're self-aware.

Chris: Yeah.

Corey: Good for you.

Chris: I'm willing to address that I'm in marketing now. And I've been a cloud practitioner since probably 2014, cloud security since about 2017. And then just decided, the problem that we have in the cloud security community is a lot of us are just kind of sitting in a corner in our companies and solving problems for our companies, but we're not solving the problems at scale. So, I wanted a job that would allow me to reach a broader audience and help a broader audience. Where I see cloud security—you know, or cloud in general—falling down is Amazon makes it really hard for you to do your side of shared responsibility, and so we need to be out there helping customers understand what they need to be doing.
So, I am now at a company called Turbot and we're really trying to promote cloud security.

Corey: One of the first promoted guest episodes of this show was David Boeke, your CTO, and one of the things that I regret is that I've sort of lost track of Turbot over the past few years because, yeah, one or two things might have been going on during that timeline as I look back at having kids in the middle of a pandemic and the deadly plague o'er land. And suddenly, every conversation takes place over Zoom, which is like, "Oh, good, it's like a happy hour only instead, now it's just like a conference call for work." It's like, 'Conference Calls: The Drinking Game' is never a great direction to go in. But it seems the world is recovering. We're going to be able to spend some time together at re:Invent by all accounts that I'm actively looking forward to.

As of this recording, you're relatively new to Turbot, and I figured out that you were going there because, once again, content hits my filters. You wrote a fascinating blog post that hits on an interest of mine that I don't usually talk about much because it's off-putting to some folk, and these days, I don't want to get yelled at any more than I have to, about the experience of traveling, I believe it was to an all-hands on the other side of the world.

Chris: Yep. So, my first day on the job at Turbot, I was landing in Kuala Lumpur, Malaysia, having left the United States 24 hours—or was it 48? It's hard to tell when you go to the other side of the planet and the time zones have also shifted—and then having left my prior company the day before that. But yeah, so Turbot traditionally has an annual event where we all get together in person. We're a completely remote company, but once a year, we all get together in person at our integrate event.

And so, that was my first day on the job.
And then, you know, it was basically two weeks of reasonably intense hackathons, building out a lot of stuff that hopefully will show up open-source shortly. And then, yeah, meeting all of my coworkers. And that was nice.

Corey: You've always had a focus, through all the time that I've known you and all the public content that you've put out there that has come across my desk, that seems to center around security. It's sort of an area that I give a nod to more often than I would like, on some level, but that tends to be your bread and butter. Your focus seems to be almost overwhelmingly on what I would call AWS security. Is that fair to say, or is that a mischaracterization of how you view it slash what you actually do? Because, again, we have these parasocial relationships with voices on the internet. And it's like, "Oh, yeah, I know all about that person." Yeah, you've met them once and all you know other than that is what they put on Twitter.

Chris: You follow me on Twitter. Yeah, I would argue that yes, a lot of what I do is AWS-related security because in the past, a lot of what I've been responsible for is cloud security in AWS. But I've always worked for companies that were multi-cloud; it's just that 90% of everything was Amazon, and so therefore 90% of my time, 90% of my problems, 90% of my risk was all in AWS. I've been trying to break out of that. I've been trying to understand the other clouds.

One of the nice aspects of this role and working on Steampipe is I am now experimenting with other clouds. The whole goal here is to be able to scale our ability as an industry and as security practitioners to support multiple clouds. Because whether we want to or not, we've got it.
And so, even though 90% of my spend, 90% of my resources, 90% of my applications may be in AWS, that 10% that I'm ignoring is probably more than 10% of my risk, and we really do need to understand and support major clouds equally.

Corey: One post you had recently that I find myself in wholehearted agreement with is on the adoption of Tailscale in the enterprise. I use it for all of my personal nonsense and it is transformative. I like the idea of what that portends for multi-cloud, or poly-cloud, or whatever the hell we're calling it this week, sorts of architectures. Historically, one of the biggest problems in getting two clouds to speak to one another and managing them in an intelligent way is that the security models are different, the user identity stuff is different as well, and the network stuff has always been nightmarish. Well, with Tailscale, you don't have to worry about that in the same way at all. You can, more or less, ignore it, turn on host-based firewalls for everything, and just allow Tailscale. And suddenly, okay, I don't really have to think about this in the same way.

Chris: Yeah. And you get the micro-segmentation out of it, too, which is really nice. I will agree that I had not looked at Tailscale until I was asked to look at Tailscale, and then it was just like, "Oh, I am completely redoing my home network on that." But looking at it, it's going to scare some old-school network engineers; it's going to impact their livelihoods, and that is going to make them very defensive. And so, what I wanted to do in that post was kind of address, as a practitioner, if I was looking at this with an enterprise lens, what are the concerns you would have on deploying Tailscale in your environment?

A lot of those were, you know, around user management.
I think the big one—it's a new thing in enterprise security, but it's kind of this host profiling, which is: hey, before I let your laptop on the network, I'm going to go make sure that you have antivirus and some kind of EDR, XDR, blah-DR agents, so that, you know, we have a reasonable thing that you're not going to just go and drop [unintelligible 00:09:01] on the network and next thing you know, we're Maersk. For Tailscale, that's going to be the biggest thing that they are going to have to figure out: how do they work with some of these enterprise concerns and things along those lines. But I think it's an excellent technology, it was super easy to set up, and the ability to fine-tune and microsegment is great.

Corey: Wildly so. They occasionally sponsor my nonsense. I have no earthly idea whether this episode is one of them because we have an editorial firewall—they're not paying me to say any of this stuff, like, "And this is brought to you by whatever." Yeah, that's the sponsored ad part. This is just, I'm in love with the product.

One of the most annoying things about it to me is that I haven't found a reason to give them money yet because the free tier for my personal stuff is very comfortably sized, and I don't have a traditional enterprise network or anything like that people would benefit from over here. One area in cloud security where I think I have potentially been misunderstood, so I want to take at least this opportunity to clear the air on it a little bit, has been that, by all accounts, I've spent the last, mmm, few months or so just absolutely beating the crap out of Azure. Before I wind up adding a little nuance and context to that, I'd love to get your take on what, by all accounts, has been a pretty disastrous year-and-a-half for Azure security.

Chris: I think it's been a disastrous year-and-a-half for Azure security. Um—[laugh].

Corey: [laugh]. That was something of a leading question, wasn't it?

Chris: Yeah, no, I mean, it is.
And if you think back, though, Microsoft's repeatedly had this ebb and flow of security disasters. You know, Code Red back in, whatever, the 2000s; NT 4.0 patching back in the '90s. So, I think we're just hitting one of those peaks again, or hopefully, we're hitting the peak and not [laugh] just starting the uptick. A lot of what Azure has built is stuff that they already had, commercial off-the-shelf software; they wrapped multi-tenancy around it, gave it a new SKU under the Azure name, and called it cloud. So, am I super-surprised that somebody figured out how to leverage a Jupyter notebook to find the back-end credentials to drop the firewall tables to go find the next guy over's Cosmos DB? No, I'm not.

Corey: I find their failures to be less egregious on a technical basis because, let's face it, let's be very clear here, this stuff is hard. I am not pretending for even a slight second that I'm a better security engineer than the very capable, very competent people who work there. This stuff is incredibly hard. And I'm not—

Chris: And very well-funded people.

Corey: Oh, absolutely, yeah. They make more than I do, presumably. But it's one of those areas where I'm not sitting here trying to dunk on them, their work, their efforts, et cetera, and I don't do a good enough job of clarifying that. My problem is the complete radio silence coming out of Microsoft on this. If AWS had a series of issues like this, I'm hard-pressed to imagine a scenario where they would not have much more transparent communications; they might very well trot out a number of their execs to go on a tour to wind up talking about these things and what they're doing systemically to change it.

Because six of these in, it's like, okay, this is now a cultural problem. It's not one rando engineer wandering around the company screwing things up on a rotational basis. It's, what are you going to do? It's unlikely that firing Steven is going to be your fix for these things.
So, that is part of it.

And then most recently, they wound up having a blog post on the MSRC (the Microsoft Security Response Center, I believe that acronym is? The [mrsth], whatever; it sounds like a virus you pick up in a hospital)—but the problem that I have with it is that they spent most of that being overly defensive and dunking on SOCRadar, the vulnerability researcher who found this and reported it to them. And they had all kinds of quibbles with how it was done, what they did with it, et cetera, et cetera. It's, "Excuse me, you're the ones that left customer data sitting out there in the Azure equivalent of an S3 bucket and you're calling other people out for basically doing your job for you? Excuse me?"

Chris: But it wasn't sensitive customer data. It was only the contract information, so therefore it was okay.

Corey: Yeah, if I put my contract information out there and try and claim it's not sensitive information, my clients will laugh and laugh as they sue me into the Stone Age.

Chris: Yeah well, clearly, you don't have the same level of clickthrough terms that Microsoft is able to negotiate because, you know, [laugh].

Corey: It's awful as well; it doesn't even work, because, "Oh, it's okay, I lost some of your data, but that's okay because it wasn't particularly sensitive." Isn't that kind of up to you?

Chris: Yes. And if, say, I'm actually, you know, a big AWS shop and then I'm looking at Azure and I've got my negotiations in there and Amazon gets wind that I'm negotiating with Azure, that's not going to do well for me and my business. So no, this kind of material is incredibly sensitive. And that was an incredibly tone-deaf response on their part. But you know, to some extent, it was more of a response than we've seen from some of the other Azure multi-tenancy breakdowns.

Corey: Yeah, at least they actually said something. I mean, there is that. It's just—it's wild to me. And again, I say this as an Azure customer myself.
Their computer vision API is basically just this side of magic, as best I can tell, and none of the other providers have anything like it.

That's what I want. But, you know, it almost feels like that service is under NDA because no one talks about it when they're using this service. I did a whole blog post singing its praises and no one from that team reached out to me to say, "Hey, glad you liked it." Not that they owe me anything, but at the same time it's incredible. Why am I getting shut out? It's like, does this company just have an entire policy of not saying anything ever to anyone at any time? It seems it.

Chris: So, a long time ago, I came to this realization that even if you just look at the terminology of the three providers, Amazon has accounts. Why does Amazon have Amazon—or AWS accounts? Because they're a retail company and that's what you signed up with to buy your underwear. Google has projects because they were, I guess, a developer-first thing and that was how they thought about it is, "Oh, you're going to go build something. Here's your project."

What does Microsoft have? Microsoft Azure Subscriptions. Because they are still about the corporate enterprise IT model of it's really about how much we're charging you, not really about what you're getting. So, given that you're not a big enterprise IT customer, you don't—I presume—do lots and lots of golfing at expensive golf resorts, you're probably not fitting their demographic.

Corey: You're absolutely not. And that's wild to me. And yet, here we are.

Chris: Now, what's scary is they are doing so many interesting things with artificial intelligence… that if… their multi-tenancy boundaries are as bad as we're starting to see, then what else is out there? And more and more, we as carbon-based life forms are relying on Microsoft and other cloud providers to build AI. That's kind of a scary thing.
Go watch Satya's keynote at Microsoft Ignite and he's showing you all sorts of ways that AI is going to start replacing the gig economy. You know, it's not just Tesla and self-driving cars at this point. DALL-E is going to replace the independent graphic designer.

They've got things coming out in their office suite that are going to replace the mom-and-pop marketing shops that are generating menus and doing marketing plans for your local restaurants or whatever. There's a whole slew of things where they're really trying to replace people.

Corey: That is a wild thing to me. And part of the problem I have in covering AWS is that I have to differentiate in a bunch of different ways between AWS and its Amazon corporate parent. And they have that problem, too, internally. Part of the challenge they have, in many cases, is that perks you give to employees have to scale to one-and-a-half million people, many of them in fulfillment center warehouse things. And that is a different type of problem than at a company like, for example, Google, where most of their employees tend to be in office-job-style environments.

That's a weird thing and I don't know how to even start conceptualizing things operating at that scale. Everything that they do is definitionally a very hard problem when you have to make it scale to that point. What all of the hyperscale cloud providers do is, from where I sit, complete freaking magic. The fact that it works as well as it does is nothing short of a modern-day miracle.

Chris: Yeah, and it is more than just throwing hardware at the problem, which was my on-prem solution to most of the things. "Oh, hey. We need higher availability? Okay, we're going to buy two of everything." We called it the Noah's Ark model, and we have an A side and a B side.

And, "Oh, you know what?
Just in case, we're going to buy some extra capacity and put it in a different city so that, you know, we can just fail from our primary city to our secondary city." That doesn't work at the cloud provider scale. And really, we haven't seen a major cloud outage—I mean, like, a bad one—in quite a while.

Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.

Corey: The outages are always fascinating, just from the way that they are reported in the mainstream media. And again, this is hard, I get it. I am not here to crap on journalists. They, for some ungodly, unknowable reason, have decided not to spend their entire career focusing on the nuances of one very specific, very deep industry. I don't know why.

But as [laugh] a result, they wind up getting a lot of their baseline facts wrong about these things. And that's fair. I'm not here to necessarily act as an Amazon spokesperson when these things happen. They have an awful lot of very well-paid people who can do that.
But it is interesting just watching the blowback and the reaction whenever there's an outage: the conversation is never "Does Amazon or Azure or Google suck?" It's, "Does cloud suck as a whole?"

That's part of the reason I care so much about Azure getting their act together. If it were just torpedoing Microsoft's reputation, then well, that's sad, but okay. But it extends far beyond that to a point where it's almost where the enterprise groundhog sees the shadow of a data breach and then we get six more years of data center build-outs instead of moving things to a cloud. I spent too many years working in data centers and I have the scars from the cage nuts and crimping patch cables frantically in the middle of the night to prove it. I am thrilled at the fact that I don't believe I will ever again have to frantically drive across town in the middle of the night to replace a hard drive before the rest of the array degrades. Cloud has solved those problems beautifully. I don't want to go back to the Dark Ages.

Chris: Yeah, and I think that there's a general potential that we could start seeing this big push towards going back on-prem for effectively sovereign data reasons. Whether it's this country has said, "You cannot store your data about our citizens outside of our borders," and either they're doing that because they do not trust the US Silicon Valley privacy or whatever, or because if it's outside of our borders, then our secret police agents can come knocking on the door at two in the morning to go find out what some dissident's viewing habits might have been, I see sovereign cloud as this thing that may be a step back from this ubiquitous thing that we have right now in Amazon, Azure, and Google. And so, as we start getting to the point in the history books where we start seeing maps with lots of flags, I think we're going to start seeing a bifurcation of cloud as a whole. We see it already right now.
The AWS China partition is not owned by Amazon, it is not run by Amazon, it is not controlled by Amazon. It is controlled by the communist government of China. And nobody is doing business in Russia right now, but if they had not done what they had done earlier this year, we might very well have seen somebody spinning up a cloud provider that is completely controlled by and in the Russian government.

Corey: Well, yes or no, but I want to challenge that assessment for a second because I've had conversations with a number of folks about this where people say, "Okay, great. Like, is the alt-right, for example, going to have better options now that there might be a cloud provider spinning up there?" Or, "Well, okay, what about a new cloud provider to challenge the dominance of the big three?" And there are all these edge cases, either geopolitically or politically based upon—or folks wanting to wind up approaching it from a particular angle—but if we were hired to build out an MVP of a hyperscale cloud provider, like, the budget for that MVP would look like $100 billion at this point to get started and just get up to a point of critical mass before you could actually see if this thing has legs. And we'd probably burn through almost all of that before doing a single dime in revenue.

Chris: Right. And then you're doing that in small markets. Outside of the China partition, these are not massively large markets. I think Oracle is going down an interesting path with its idea of Dedicated Cloud and Oracle Alloy [unintelligible 00:22:52].

Corey: I like a lot of what Oracle's doing, and if younger me heard me say that, I don't know how hard I'd hit myself, but here we are. Their free tier for Oracle Cloud is amazing, their data transfer prices are great, and their entire approach of, "We'll build an entire feature-complete region in your facility and charge you what, from what I can tell, is a very reasonable amount of money," works.
And it is feature complete, not, "Well, here are the three services that we're going to put in here and everything else is, well… it's just sort of a toehold there so you can start migrating it into our big cloud." No. They're doing it right from that perspective.

The biggest problem they've got is the word Oracle at the front end and their, I would say, borderline addiction to big-E enterprise markets. I think the future of cloud looks a lot more like cloud-native companies being founded because those big enterprises are starting to describe themselves in similar terminology. And as we've seen in the developer ecosystem, as go startups, so go big companies a few years later. Walk around any big company that's undergoing a digital transformation, you'll see a lot more Macs on desktops, for example. You'll see CI/CD processes in place as opposed to, "Well, oh, you want something new? It's going to be eight weeks to get a server rack downstairs and accounting is going to have 18 pages of forms for you to fill out." No, it's "click the button," or—

Chris: Don't forget the six months of just getting the financial CapEx approvals.

Corey: Exactly.

Chris: You have to go through the finance thing before you even get to start talking to techies about when you get your server. I think Oracle is in an interesting place, though, because it is embracing the fact that it is number four, and so therefore, it's like: we are going to work with AWS, we are going to work with Azure, our database can run in AWS or it can run in our cloud, we can interconnect directly, natively, seamlessly with Azure. If I were building a consumer-based thing and I was moving into one of these markets where one of these governments was demanding something like a sovereign cloud, Oracle is a great place to go and throw—okay, all of our front-end consumer whatever is all going to sit in AWS because that's what we do for all other countries.
For this one country, we're just going to go and build this thing in Oracle and we're going to leverage Oracle Alloy or whatever, and now suddenly, okay, their data is in their country and it's subject to their laws, but I don't have to re-architect to go into one of these, you know, little countries with tinhorn dictators.

Corey: It's the way to do multi-cloud right, from my perspective. I'll use a component service in a different cloud, but I'm under no illusions that in doing that I'm increasing my resiliency. I'm not removing single points of failure; I'm adding them. And I make that trade-off on a case-by-case basis, knowingly. But there is a case for some workloads—probably not yours if you're listening to this; assume not, but when you have more context, maybe so—where, okay, we need to be across multiple providers for a variety of strategic or contextual reasons for this workload.

That does not mean everything you build needs to be able to do that. It means you're going to make trade-offs for that workload, and understanding the boundaries of where that starts and where that stops is going to be important. That is not the worst idea in the world for a given appropriate workload, that you can optimize stuff into a container that can then run, more or less, anywhere that can take a container. But that is also not the majority of most people's workloads.

Chris: Yeah. And I think what that comes back to from the security practitioner standpoint is you have to support not just your primary cloud, your favorite cloud, the one you know; you have to support any cloud. And whether that's, you know, hey, congratulations.
Your developers want to use Tailscale because it bypasses a ton of complexity in getting these remote island VPCs from this recent acquisition integrated into your network, or because you're going into a new market and you have to support Oracle Cloud in Saudi Arabia—then you as a practitioner have to kind of support any cloud.

And so, one of the reasons that I've joined, and am working on, and am so excited about Steampipe is that it kind of does give you that. It is a uniform interface to not just AWS, Azure, and Google, but all sorts of clouds, whether it's GitHub or Oracle or Tailscale. So, that's kind of the message I have for security practitioners at this point: I tried, I fought, I screamed and yelled and ranted on Twitter against, you know, doing multi-cloud, but at the end of the day, we were still multi-cloud.

Corey: The way I see these things evolving is that, yeah, as practitioners, we're increasingly having to work across multiple providers, but not to a stupendous depth, which is the intimidating thing that scares the hell out of people. I still remember my first time with the AWS console, being so overwhelmed with the number of services, and there were 12. Now, there are hundreds, and I still feel that same sense of being overwhelmed, but I also have the context now to realize that over half of all customer spend globally is on EC2. That's one service. Yes, you need, like, five more to get it to work, but okay.

And once you go through learning that to get started—and there's a lot of moving parts around it, like, “Oh, God, I have to do this for every service?” No. Take Route 53—my favorite database, but most people use it as a DNS service—you can go start to finish on basically everything that service does that a human being is going to use in less than four hours, and then you're more or less ready to go. Everything is not the hairy beast that is EC2.
And most of those services are not for you, whoever you are, whatever you do. Most AWS services are not for you. Full stop.

Chris: Yes and no. I mean, as a security practitioner, you need to know what your developers are doing, and I've worked in large organizations with lots of things, and I would joke that, oh yeah, I'm sure we're using every service but the IoT ones—and then I go and look at our bill, and I was like, “Oh, why are we dropping that much on IoT?” Oh, because they wanted to use the managed MQTT service.

Corey: Ah, I start with the bill because the bill is the source of truth.

Chris: Yes, they wanted to use the managed MQTT service. Okay, great. So, we're now in IoT. But how many of those things have resource policies, how many of those things can be made public, and how many of those things is your CSPM actually checking for and telling you that, hey, a developer has gone out somewhere and made this SageMaker notebook public, or this MQTT topic public? And so, that's where, you know, you need to have that level of depth, and then you've got to have that level of depth in each cloud. To some extent, if the cloud is just the core basics—VMs, object storage, maybe some networking, and a managed relational database—it's super simple to understand what all you need to do to build a baseline to secure that. As soon as you start adding in all of the fancy services that AWS has… I re—

Corey: Yeah, migrating your Step Functions workflow to another cloud is going to be a living goddamn nightmare. Migrating something that you stuffed into a container and run on EC2 or Fargate is probably going to be a lot simpler. But there are always nuances.

Chris: Yep. But the security profile of a Step Function is significantly different. So, you know, there's not much you can do there wrong, yet.

Corey: You say that now, but wait for their next security breach, and then we start calling them Stumble Functions instead.

Chris: Yeah. I say that.
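As an aside, the kind of automated check Chris describes a CSPM running—flagging resource policies that grant public access—can be sketched in a few lines. This is a simplified, hypothetical example, not any particular product's logic: real checkers handle many more policy shapes (conditions, NotPrincipal, service-specific quirks for S3, SageMaker, IoT topics, and so on), while this only flags the plain wildcard-principal case.

```python
# Simplified sketch of an "is this resource policy public?" check, the kind
# of test a CSPM runs against S3 buckets, SageMaker notebooks, MQTT topics,
# etc. Real policies have many more shapes (Condition, NotPrincipal, ...);
# this only flags the plain wildcard-principal case.
import json

def policy_is_public(policy_json: str) -> bool:
    """Return True if any Allow statement grants access to all principals."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a bare dict
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            return True
        if isinstance(principal, dict):
            aws = principal.get("AWS")
            if aws == "*" or (isinstance(aws, list) and "*" in aws):
                return True
    return False
```

Multiply a check like that across every policy-bearing service, in every cloud you support, and the depth problem Chris is describing becomes concrete.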
And the next thing you know, we're going to have something like Lambda [unintelligible 00:30:31] show up, and I'm just going to be able to put my Step Function on the internet unauthenticated. Because, you know, that's what Amazon does: they innovate, but they don't necessarily warn security practitioners ahead of their innovation that, hey, we're about to release this thing. You might want to prepare for it and adjust your baselines, or talk to your developers, or here's a service control policy that you can drop in place to, you know, suppress it for a little bit. No, it's like, “Hey, these things are there,” and by the time you see the tweets or read the documentation, you've got some developer who's put it in production somewhere. And then it becomes a lot more difficult for you as a security practitioner to put the brakes on it.

Corey: I really want to thank you for spending so much time talking to me. If people want to learn more and follow your exploits—as they should—where can they find you?

Chris: They can find me at steampipe.io/blog. That is where all of my latest rants, raves, research, and how-tos show up.

Corey: And we will, of course, put a link to that in the [show notes 00:31:37]. Thank you so much for being so generous with your time. I appreciate it.

Chris: Perfect, thank you. You have a good one.

Corey: Chris Farris, cloud security nerd at Turbot. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment, and be sure to mention exactly which Azure communications team you work on.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Becoming a Rural Remote Worker with Chris Vermilion


Jan 19, 2023 · 33:01


About Chris

Chris is a mostly-backend mostly-engineer at Remix Labs, working on visual app development. He has been in software startups for ten years, but his first and unrequited love was particle physics. Before joining Remix Labs, he wrote numerical simulation and analysis tools for the Large Hadron Collider, then co-founded Roobiq, a clean and powerful mobile client for Salesforce back when the official ones were neither.

Links Referenced:
Remix Labs: https://remixlabs.com/
Twitter: https://twitter.com/chrisvermilion

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Tailscale SSH is a new, and arguably better, way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network. So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.

Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc.
Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it - Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworks

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. When I was nine years old, one of the worst tragedies that can ever befall a boy happened to me. That's right, my parents moved me to Maine. And I spent the next ten years desperately trying to get out of the state.

Once I succeeded and moved to California, I found myself in a position where almost nothing can drag me back there. One of the exceptions—basically, the only exception—is Monktoberfest, a conference put on every year by the fine folks at RedMonk. It is unquestionably the best conference that I have ever been to, and it continually amazes me every time I go. The last time I was out there, I met today's guest. Chris Vermilion is a Senior Software Developer at Remix Labs. Chris, now that I've finished insulting the state that you call home, how are you?

Chris: I'm great. I'm happy to be in a state that's not California.

Corey: I hear you. It's, uh—I talk a lot of smack about Maine. But to be perfectly direct, my problem with it is that I grew up there, and that was a difficult time in my life because I, really, I guess, never finished growing up, according to most people. And all right, we'll accept it. No one can hate a place in the same way that you can hate it if you grew up there and didn't enjoy the experience.

So, it's not Maine that's the problem; it's me.
I feel like I should clarify that because I'm going to get letters, and people in Maine will write those letters and then have to ride their horses to Massachusetts to mail them. But we know how that works.

Chris: [laugh].

Corey: So, what is Remix Labs? Let's start there. Because Remix sounds like… well, it sounds like a term that is overused. I see it everywhere in the business space. I know there was a Remix thing that recently got sold to, I think it was, Shopify or Spotify; I keep getting those two confused. And—

Chris: One of the two, yeah.

Corey: Yeah, exactly: one of them plays music and one of them sells me things, except now I think they both do both, and everything has gone wonky and confusing. But what do you folks do over there?

Chris: So, we work on visual app development for everybody. So, the goal is to have kind of a spreadsheet-on-steroids-like development environment where you can build interactively—you have live coding, you have a responsive experience in building interactive apps, websites, mobile apps, a little bit of everything—and providing an experience where you can build systems of engagement. So tools, mobile apps, that kind of work with whatever back-end resources you're trying to do. You can collaborate across different people, pass things around, and you can do that all with a nice kind of visual app developer, where you can sort of drop nodes around and wire them together, and build in a way that's hopefully accessible to non-developers, to project managers, to domain experts, to, you know, whatever stakeholders are interested in modifying that final product.

Corey: I would say that I count as one of those. I use something similar to build the tool that assembles my newsletter every week, and that was solving a difficult problem for me. I can write back-ends reasonably well, using my primary tool, which is sheer brute force. I am not much of a developer, but it turns out that with enough enthusiasm, you can overcome most limitations.
And that's great, but I know nothing about front end; it does not make sense to me, it does not click in the way that other things have clicked.

So, I was fourth and inches from just retaining a contractor to build out a barely serviceable internal app. And then I discovered, oh, use this low-code tool to drag and drop things—and that basically was Visual Basic for internal apps. And that was awesome, but they're still positioned squarely in the space of internal apps only. There's no mobile app story, and while it works well enough for what I do, I have other projects I want to get out the door that are not strictly for internal use, that would benefit from having a serviceable interface slapped onto them. It doesn't need to be gorgeous, it doesn't need to win awards, it just needs to be, “Cool, it can display the output of a table in a variety of different ways. It has a button, and when I click a button, it does a thing, generally represented as an API call to something.”

And it doesn't take much, but being able to have something like that, even for an internal app, has been absolutely transformative just for workflow stuff internally, for making things accessible to people that are not otherwise going to be able to do those sorts of things, by which I mean me.

Chris: Yeah. I mean, exactly. I think that is the kind of use case that we are aiming for: making this accessible to everybody, building tools that work for people that aren't necessarily software developers, that don't want to dive into code—although they can if they want; it's extensible in that way—that aren't necessarily front-end developers or designers, although it's accessible to designers, and if you want to start from that end, you can do it.
And it's amenable to collaboration, so you can have somebody that understands the problem build something that works, you can have somebody that understands design build something that works well and looks nice, and you can have somebody that understands the code, or is more of a back-end developer, then go back in and maybe fine-tune the API calls because they realize that you're doing the same thing over and over again, and so there's a better way to structure the lower parts of things. But you can pass around that experience between all these different stakeholders, and you can construct something that everybody can modify to sort of suit their own needs and desires.

Corey: Many years ago, Bill Clinton wound up coining the phrase ‘The Digital Divide' to talk about people who had basically internet access and who didn't—those who got it or did not—and I feel like we have a modern form of that: the technology haves and have-nots. Easy example of this for a different part of my workflow here: this podcast, as anyone listening to it is probably aware by now, is sponsored by awesome folks who want to tell you about the exciting services or tools or products that they are building. And sometimes some of those sponsors will say things like, “Okay, here's the URL I want you to read into the microphone during the ad read,” and my response is a polite form of, “Are you serious?” It's seven different subdirectories on the web server, followed by a series of UTM tracking codes that, yeah, I promise, none of you are going to type in. I'm not even going to read it into the microphone because my attention span trips out a third of the way through.

So, I needed a URL shortener. So, I set up snark.cloud for this. For a long time, that was relatively straightforward because I just used an S3 bucket with redirect objects inside of it.
But then you have sort of the problem of being a victim of your own success, to some extent, and I was at a point where, oh, I can have people who aren't me control some of these things; I don't need to be the person that sets up the link redirection work.

Yeah, the challenge is now that you have a business user who is extraordinarily good at what he does, but he's also not someone who has deep experience in writing code, and trying to sit here and explain to him, here's how to set up a redirect object in an S3 bucket—like, why don't I save time and tell him to go screw himself? It's awful. So, I've looked for a lot of different answers for this, and the one that I found lurking on GitHub—and I've talked about it a couple of times now—runs on Google Cloud Run, and the front end for the business user—which sounds ridiculous, but it's also kind of clever—is a Google Sheet. Because every business user knows how to work a Google Sheet. There's one column labeled ‘slug' and the other one labeled ‘URL' that it points to.

And every time someone visits snark.cloud slash whatever the hell the slug happens to be, it automatically does a redirect. And it's glorious. But I shouldn't have to go digging into the depths of GitHub to find stuff like that. This feels like a perfect use case for a no-code, low-code tool.

Chris: Yeah. No, I agree. I mean, that's a cool use case. And I… as always, our competitor is Google Sheets. I think everybody in software development in enterprise software's only real competitor is the spreadsheet.

Corey: Oh, God, yes. I fix AWS bills for a living, and my biggest competitor is always Microsoft Excel. It's, “Yeah, we're going to do it ourselves internally,” is what most people do. It seems like no matter what business line I've worked in, it's the same: companies that did robo-advising for retirement planning? Yeah, some people do it themselves in Microsoft Excel.
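For what it's worth, the core of the Sheet-driven shortener described above fits on a page. This is a hedged sketch, not the actual GitHub project: a dict stands in for the sheet's ‘slug' and ‘URL' columns, and the slugs and URLs are invented for illustration.

```python
# Minimal sketch of a slug -> URL redirect service of the kind described
# above. The real tool reads its table from a Google Sheet; here a dict
# stands in for the sheet's 'slug' and 'URL' columns (values invented).
from http.server import BaseHTTPRequestHandler, HTTPServer

REDIRECTS = {
    "example": "https://www.example.com/",
    "docs": "https://www.example.com/docs/",
}

def resolve(path: str):
    """Map a request path like '/example' to its target URL, or None."""
    return REDIRECTS.get(path.strip("/"))

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve(self.path)
        if target is None:
            self.send_response(404)  # unknown slug
        else:
            self.send_response(301)  # permanent redirect to the target URL
            self.send_header("Location", target)
        self.end_headers()

# To run locally:
# HTTPServer(("127.0.0.1", 8080), RedirectHandler).serve_forever()
```

Swapping the dict for a periodic fetch of the sheet (published as CSV, say) is the only extra moving part, which is roughly why the whole thing fits comfortably on Cloud Run.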
I worked for an expense reporting company; everyone does that in Microsoft Excel. And so on and so forth.

There are really very few verticals where that's not an option. It's like, but what about a dating site? Oh, there are certain people who absolutely will use Microsoft Excel for that. Personally, I think it's a bad idea to hook up where you VLOOKUP, but what do I know?

Chris: [laugh]. Right, right.

Corey: Before you wound up going into the wide world of low-code development over at Remix, you—well, a lot of people have different backstories when I talk to them on this show. Yours is definitely one of the more esoteric, because the common case that most people talk about is, “I went to Stanford and then became a software engineer.” “Great. What did you study?” “Computer Science,” or something like it. Alternately, they drop out of school and go do things in their backyard. You have a PhD in particle physics, is it?

Chris: That's right. Yeah.

Corey: Which, first, is wild in its own right, but we'll get back to that. How did you get here from there?

Chris: Ah. Well, it's kind of the age-old story of academia. So, I started in electrical engineering and ended up double majoring in physics because you had to take a lot of physics to be an engineer, and I said, you know, this is more fun. This is interesting. Building things is great, but sitting around reading papers is really where my heart's at.

And I ended up going to graduate school, which is about the best gig you can ever get. You get paid to sit in an office and read and write papers, and occasionally go out drinking with other grad students, and that's really about it.

Corey: I only just now, for the first time in my life, realized how much some aspects of my career resemble being a [laugh] grad student. Please, continue.

Chris: It doesn't pay very well, is the catch, you know?
It's very hard to support a lifestyle that exists outside of your office, or, you know, involves a family and children, which is certainly one downside. But it's a lot of fun and it's very low stress—as long as you are, let's say, not trying to get a job afterward. Because where this all breaks down is that, you know, as I recall, at the time I was a graduate student, there were roughly as many graduate students graduating every year as there were professors total in the field of physics, at least in the United States. That was something like the scale of the relationship.

And so, if you do the math—and unfortunately, we were relatively good at doing math—you could see, you know, most of us were not going to go on. This was the path to becoming a professor, but—

Corey: You look at the number of students and the number of professorships available in the industry, I guess we'll call it, and yeah, basic arithmetic does not seem like something anyone in that department would struggle with.

Chris: Exactly. So, you're right, we were all, I think, more or less qualified to be academic professors, certainly at research institutions, where the only qualification, really, is to be good at doing research—and you have to tolerate teaching students sometimes. But there tends to be very little training on how to do that, or meaningful evaluation of whether you're doing it well.

Corey: I want to dive into that a bit, because I think that's something we see a lot in this industry, where there's no training on how to do a lot of different things. Teaching is one very clear example; another one is interviewing people for jobs. So people are making it up as they go along, and despite there being decades and decades of longitudinal studies of people figuring out what works and what doesn't, tech has always loved to just sort of throw it all out and start over.
It's odd to me that academia would follow similar patterns around not having a clear structure for, “Oh, so you're a grad student. You're going to be teaching a class. Here's how to be reasonably effective at it.” Given that higher education was not the place for me, I have very little insight into this. Is that how it plays out?

Chris: I don't want to be too unfair to academia as a whole, and actually, I was quite lucky: I was a student at the University of Washington, and we had a really great physics education group, so we did actually spend a fair amount of time thinking about effective ways to teach undergraduates, and there was this great tutorial system there. But my sense was that in the field as a whole, for people on the track to become professors at research institutions, there was typically not much in the way of training as a teacher; there was not really a lot of thought about pedagogy or the mechanics of delivering lectures. You know, you're sort of given a box full of chalk and a classroom and told, “You have freshman physics this quarter. The last teacher used this textbook and it seems to be okay”—that tended to be the sort of preparation that you would get. You know, and I think it varies institution to institution what kind of support you get, you know, the level of graduate students helping you out, but I think in lots of places in academia, the role of professors as teachers was a second thought, if it was indeed thought of at all.

And similarly, the role of professors as mentors to graduate students, which, if anything, is sort of their primary job: guiding graduate students through their early career. And again, I mean, much like in software, that was all very ad hoc.
You know, and I think there are some similarities in terms of how academics and how tech workers think of themselves as sort of inventing the universe: we're at the forefront, the bleeding edge of human knowledge, and therefore, because I'm being innovative in this one particular aspect, I can justify being innovative in all of them. I mean, that's the disruptive thing to do, right?

Corey: And it's a shame that you're such a nice person, because you would be phenomenal at being the most condescending person in all of tech if you wanted to. Because think about this: you have people saying, “Oh, what do you do?” “I'm a full-stack engineer.” And then some of the worst people in the world—of which I admit I used to be one—are, “Oh, full-stack. Really? When's the last time you wrote a device driver?”

And you can keep on going with that. You work in particle physics, so you're all, “That's adorable. Hold my tea. When's the last time you created matter from energy?” And yeah, it then becomes very hard to beat you in that particular game of [who'd 00:15:07] wore it better.

Chris: Right. One of my fond memories of being a student—back when I got to spend more time thinking about these things and actually still remembered them, you know, in my electrical engineering days and physics days—is that I really had studied all the way down from the particle physics to semiconductor physics to how to lay out silicon chips and, you know, how to build ALUs and CPUs and whatnot from basic transistor gates. And then all the way up to, you know, writing compilers and programming languages. And it really did seem like you could understand all those parts. I couldn't tell you how any of those things work anymore; sadly, that part of my brain is now taken up with Go's lexical scoping rules and borrow checker fights with Rust.
But there was a time when I was a smart person and knew those things.

Corey: This episode is sponsored in part by our friends at Strata. Are you struggling to keep up with the demands of managing and securing identity in your distributed enterprise IT environment? You're not alone, but you shouldn't let that hold you back. With Strata's Identity Orchestration Platform, you can secure all your apps on any cloud with any IDP, so your IT teams will never have to refactor for identity again. Imagine modernizing app identity in minutes instead of months, deploying passwordless on any tricky old app, and achieving business resilience with always-on identity, all from one lightweight and flexible platform.

Want to see it in action? Share your identity challenge with them on a discovery call and they'll hook you up with a complimentary pair of AirPods Pro. Don't miss out, visit Strata.io/ScreamingCloud. That's Strata dot io slash ScreamingCloud.

Corey: I want to go back to what sounded like a throwaway joke at the start of the episode. In seriousness, one of the reasons—at least that I told myself at the time—that I left Maine was that it was pretty clear there was no significant, lasting opportunity in industry when I was there. In fact, the girl that I was dating at the time in college graduated, and the paper of record for the state, The Maine Sunday Telegram—which during the week is called The Portland Press Herald—did a front-page story on her about how she went to school on a pulp and paper scholarship, was valedictorian of her chemical engineering class at the University of Maine, and had to leave the state to get a job.
And every year they would roll out the governor, whoever that happened to be, to the University of Maine to give a commencement speech that's, “Don't leave Maine, don't leave Maine, don't leave Maine,” but without any real answer to, “Well, for what jobs?”

Now that Covid has been this plague o'er the land that has been devastating society for a while, work-from-home has become much more of a cohesive thing, and an awful lot of companies are fully embracing it. How have you seen Maine change based upon that, for one? And for another, how have you found that community has developed in the local sense? Because there was none of that in Maine when I was there. Even in the brief time I was visiting for a conference for a week, I saw definite signs of a strong local community in the tech space. What happened? I love it.

Chris: It's great. Yeah, so I moved to Maine eight years ago, in 2014. And yeah, I was lucky enough to pretty early on meet up with a few of the local nerds, and we have a long-running Slack group—which I just saw is about to turn nine, so I guess I was there in the early days—called Computers Anonymous. It was a spinoff, I think, from a project somebody else had started in a few other cities. The joke was that it was a sort of confessional group: you know, we're here to commiserate over our relationships with technology, about which all of us have our complaints.

Corey: Honestly, the tech community is more of a support group than most other areas, I think.

Chris: Absolutely. All you have to do is just name a technology and somebody will pipe up: “Okay, you know, I've got a horror story about that one.” But it has over the years turned into, you know, a very active Slack group of people that meet up once a month for beers and chats, and, you know, we all know each other's kids.
And when the pandemic hit, it was absolutely a lifeline that we were all still talking to each other every day and passing around tips of, you know, which restaurants were doing takeout, and which ones were doing takeout and takeout booze—all kinds of local knowledge was being spread around that way.

So, it was a lucky thing that when that hit, we had this community, because it existed already as this community of, you know, people that were remote workers. And I think over the time that I've been here, I've really seen a growth in people coming here to work somewhere else, because it's a lovely place to live, and it's a much cheaper place to live than almost anywhere else I've ever been. You know, I think it's pretty attractive to the folks who come up from Boston or New York or Connecticut for the summer, and they say, “Ah, you know, this doesn't seem so bad a place to live.” And then they come here for a winter, and then they think, “Well, okay, maybe I was wrong,” and go back. But I've really enjoyed my time here, and the tools for communicating and working remotely have really taken off.

You know, a decade ago, with my first startup—actually, you know, in kind of a similar situation, similar story—we were starting a company in Louisville, Kentucky. It was where we happened to live. We had a tech community there that was asking those same questions: “Why is anybody leaving? Why is everybody leaving?”

And we started this company, and we did an accelerator in San Francisco, and every single person we talked to—and this is 2012—said, you have to bring the company to San Francisco. It's the only way you'll ever hire anybody, it's the only way you'll ever raise any money; this is the only place in the world that you could ever possibly run a tech company. And you know, we tried and failed.

Corey: Oh, we're one of those innovative industries in the world.
We've taken a job that can be done from literally anywhere that has internet access and created a land crunch on eight square miles, located in an earthquake zone.

Chris: Exactly. We're going to take a ton of VC money and then spend 90% of it on rent in the Bay Area. The rent gets paid back to the LPs of our VC funds, and the circle of life continues.

Corey: Oh, yeah. When I started this place as an independent consultant six years ago, I looked around: should I rent space in an office so I have a place where I go and work? And I saw how much it cost to sublet even, like, a closed-door office in an existing tech startup's office space, saw the price tag, laughed myself silly, and nope, nope, nope. Instead, I installed a door on my home office, and my spare room is now transformed into my home office slash recording studio. And yeah, “Well, wasn't it expensive to do that kind of stuff?” Not compared to the first three days of rent in a place like that, it wasn't. I feel like what's driving a lot of the return-to-office stories is, I guess, an expression of the sunk cost fallacy.

Chris: Exactly. And it's a variation of nobody ever got fired for choosing IBM, you know? Nobody ever got fired for saying we should work in the office. It's the way we've always done things, people are used to it, and there really are difficulties to collaborating effectively remotely, you know? You do lose something with the lack of day-to-day contact, the lack of in-person contact; people really do get kind of burned out on interacting over screens. But I think there are ways around that, and the benefits, in my mind and in my experience, you know, working remotely for the last ten years or so, tend to outweigh the costs.

Corey: Oh, yeah. If I were 20 years younger, I would absolutely have been much more amenable to staying in the state. There are a lot of things that recommend it.
I mean, I don't want people listening to this to think I actually hate Maine. It's become a running joke, but also, there was remarkably little opportunity in tech back when I lived there.

And now, globally, I think we're seeing the rise of opportunity. There's a line I heard in a talk once that stuck with me: talent is evenly distributed, but opportunity isn't. And there are paths forward now for folks who—I'm told—somehow don't live in that same eight square miles of the world, where they too can build tech companies and do interesting things and work intelligently with other folks. I mean, the thing that always struck me as so odd before the pandemic was this insistence on, “Oh, we don't allow remote work.” It's, “Well, hang on a minute. Aren't we all telecommuting in from wherever offices happen to be to AWS?” Because I've checked thoroughly; they will not let you work from us-east-1. In fact, they're very strict on that rule.

Chris: [laugh]. Yeah. And it's remarkable how long I think the attitude persisted that we can solve any problem except how to work somewhere other than SoMa.

Corey: Part of the problem too in the startup space, and one of the things I'm so excited about seeing with what you're doing over at Remix Labs, is that so many of the tech startups for a long time felt like they were built almost entirely around problems that young, usually single men had in their 20s when they worked in tech and didn't want to deal with the inconveniences of having to take care of themselves. Think food delivery, think laundry services, think dating apps, et cetera, et cetera. It feels like now we're getting into an era where there's a lot of development and focus and funding being aimed at things that are a lot more substantial, like how would we make it possible for someone to build an app internally or externally without making them go through a trial-by-fire hazing ritual of going to a boot camp for a year first?

Chris: Yeah. No, I think that's right.
I think there's been an evolution toward building tools for broader problems, for building tools that work for everybody. I think there was a definite startup ouroboros in the, kind of, early days of this past tech boom of so much money being thrown at early-stage startups with a couple of young people building them, and they solved a zillion of their own problems. And there was so much money being thrown at them that they were happy to spend lots of money on the problems that they had, and so it looked like there was this huge market for startups to solve those problems.

And I think we'll probably see that dry up a little bit. So, it's nice to get back to what are the problems that the rest of us have. You know, or maybe the rest of you. I can't pretend that I'm not one of those startup people that wants on-demand laundry. But.

Corey: Yet you wake up one day and realize, oh, yeah. That does change things a bit. Honestly, one of the weirdest things for me about moving to California from Maine was just the sheer level of convenience in different areas.

Chris: Yes.

Corey: And part of it is city living, true, but Maine is one of those places where if you're traveling somewhere, you're taking a car, full stop. And living in a number of cities like San Francisco, it's, oh great, if I want to order food, there's not “the restaurant that delivers”; it's, I can have basically anything that I want showing up here within the hour. Just that alone was a weird, transformative moment. You know, I still feel, 20 years in, like I'm “Country Boy Discovers City for the First Time; Loses Goddamn Mind.” Like, that is where I still am. It's still magic. I became an urban creature just by not being one for my formative years.

Chris: Yeah. No, I mean, absolutely. I grew up in Ann Arbor, which is sort of a smallish college town, and certainly more urban than the areas around it, but visiting the big city of Detroit or Lansing was exciting.
And, you know, as I got older, I really sort of thought of myself as a city person. And I lived in San Francisco for a while and loved it, and Seattle for a while and loved it.

Portland has been a great balance: there's city, a five-minute drive from my house, that has amazing restaurants and concerts and a great art scene and places to eat and roughly 8,000 microbreweries, but it's still a relatively small community. I know a lot of the people here. I sort of drive across town from one end to the other in 20 minutes, pick up my kids from school pretty easily. So, it makes for a nice balance here.

Corey: I am very enthused on, well, the idea of growing community in localized places. One thing that I think we did lose a bit during the pandemic was, every conference went online, so therefore every conference became the same, and it's all the same crappy Zoom-esque experience. It's, oh, it's like work with a slightly different topic, and for once the people on this call can't fire me… directly. So, it's one of those areas of, just, there's not enough differentiation.

I didn't realize until I went back to Monktoberfest a month or so ago, at the time of this recording, just how much I'd missed that sense of local community.

Chris: Yeah.

Corey: Because before that, the only conferences I'd been to since the pandemic hit were big corporate affairs, and yeah, you find community there, but there's a very different element to it; it has a different feeling. It's impossible to describe unless you've been to some of these community conferences, I think.

Chris: Yeah. I mean, I think a smallish conference like that where you see a lot of the same people every year—credit to Stephen, the whole RedMonk team, for Monktoberfest—that they put on such a great show that every year, you see lots and lots of faces that you've seen the last several, because everybody knows it's such a great conference, they come right back. And so, it becomes kind of a community.
As I've gotten older, a year between meetings doesn't seem like that long a time anymore, so these are the friends I see from time to time, and, you know, we have a Slack where we chat from time to time. So, finding those ways to sort of cultivate small groups that are in regular contact and have that kind of specific environment and culture to them within the broader industry has been super valuable, I think. To me, certainly.

Corey: I really enjoyed so much of what has come out of the pandemic in some ways, which sounds like a weird thing to say, but I'm trying to find the silver linings where I can. I recently met someone who'd worked here with me for a year-and-a-half that I'd never met in person. Other people that I'd spoken to at length for the last few years in various capacities, I finally meet them in person and, “Huh. Somehow it never came up in conversation that they're six foot eight.” Like, “Yeah, okay, that definitely is one of those things that you notice about them in person.” Ah, but here we are.

I really want to thank you for spending as much time as you have to talk about what you're up to, what your experiences have been like. If people want to learn more, where's the best place for them to find you? And please don't say Maine.

Chris: [laugh]. Well, as of this recording, you can find me on Twitter at @chrisvermilion, V-E-R-M-I-L-I-O-N. That's probably easiest.

Corey: And we will, of course, put links to that in the [show notes 00:28:53]. Thank you so much for being so generous with your time. I appreciate it.

Chris: No, thanks for having me on. This was fun.

Corey: Chris Vermilion, Senior Software Developer at Remix Labs. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment, and since you're presumably from Maine when writing that comment, be sure to ask a grown-up to help you with the more difficult spellings of some of the words.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Defining and Nurturing a Self-Supporting Community with Alyss Noland


Jan 17, 2023 33:48


About Alyss

Alyss Noland is the head of Developer Relations Relations and Product Marketing at Common Room, an intelligent community-led growth platform. She previously led product marketing for Developer Experience at GitHub where she focused on open source community investment and helping engineering teams find success through development metrics and developer-focused research. She's been working in tech since 2012 in various roles from Sales Engineering and Developer Advocacy to Product Marketing with companies such as GitHub, Box, Atlassian, and BigCommerce, as well as being an advisor at Heavybit.

Links Referenced:

Common Room: https://www.commonroom.io/
Heavybit: https://www.heavybit.com/
Twitter: https://twitter.com/PreciselyAlyss
Twitch: https://www.twitch.tv/PreciselyAlyss

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Tailscale SSH is a new, and arguably better way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're managing your network.

So what's the benefit? You'll get built-in key rotation, the ability to manage permissions as code, connectivity between two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security.
Try Tailscale now - it's free forever for personal use.

Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it - Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworks.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I often wonder how to start these conversations, but sometimes it's just handed to me and I don't even have to do a whole lot of work. My guest today is Alyss Noland, who's the Head of Developer Relations Relations and Product Marketing at Common Room. Alyss, thank you for joining me.

Alyss: Thanks for having me, Corey. I'm really excited to be here.

Corey: So, developer relations relations. It feels like an abstraction that has been forced to be built on top of another abstraction that has gotten too complicated, so as best I can tell, you are walking around as a human equivalent of Kubernetes.

Alyss: Oh, gosh, I would really hope not to be a human equivalent of Kubernetes. I think that would make me an octopus. But—

Corey: Yeah, “What did you say about me?” Yeah.

Alyss: [laugh].

Corey: “I didn't come here to be insulted, Quinn.” Yeah.

Alyss: No, like listen, I love octopodes. Which [tattoo 00:01:24] is which? So, developer relations relations. Yes, it's an abstraction on an abstraction. At a really critical level, it is: how do I relate?
Can I relate to people that are in the developer relations profession at large?

We are at the point at which this is a somewhat poorly-defined area that is continuing to grow. And there's a lot of debates in that space, and so I'm really excited to be at an organization that will give me a platform to try and move the industry forward.

Corey: Your relatively recent career history is honestly fascinating to me. You spent about a year and a half as a senior developer advocate at Box. And as anyone who's ever tried it knows, it's very hard to beat Box [beatboxing noises]. But you tried and went to GitHub, in which case, you basically transitioned pretty quickly from a Senior Product Marketing Manager to Director of Product Marketing, where you were the go-to-market lead for GitHub Copilot.

Alyss: Yeah, that was a really interesting project to be on. I started off at the technical preview back in 2021, launching that too—it ended up being a little over a million, two million folks in technical preview. And it's fairly new to the market. There was nothing else—or at the time, there had been nothing else that was using a descendant of GPT-3 to generate suggestions for code. There were a couple that were using GPT-2, but the amount of language coverage they had was a little bit limited, and what they were suggesting was a little bit limited.

And it's hard to say, like, highlight of my career, but at that point in time, I would say it was probably the highlight of my career to be able to work on something with that opportunity for impact.

Corey: As someone who was in the technical preview and then tried to be a paying customer of it, but couldn't because of my open-source work (they wound up giving it to me for free), I found it to be absolutely transformative.
And I know I'm going to get letters and I don't even slightly care, because it's not, “I'm going to tab-complete my application.” If a tool can do that, your application is probably not that complex. No, for me, what I find incredibly valuable is the ability to tab-complete through obnoxious boilerplate. CloudFormation, I am not subtweeting you; I am calling you out directly. You are wordy and obnoxious. Fix yourself.

And especially in languages that I don't deal with day-to-day—because I'm not a full-time developer—I forget certain parameters or argument order or things like that, and being able to effectively tab-complete is awesome for that use case. It's not doing my job; it's automating the crappy part of my job. And I absolutely love it for that.

Alyss: Yeah. And it was really interesting—a common portion of product marketing work is that we build messaging houses. We try to identify where's the value to the user, to the organization at large, depending on, like, who it is we're trying to sell to; how does that ladder up from, like, an IC to a manager. And so, one of the things that I got really excited about as we started to see it—and there's some great work that Dr. Eirini Kalliamvakou has published that I would definitely refer to if you're interested in diving deeper into it—is the way in which Copilot and this, like, ability to improve the boilerplate experience, improve the boring shit—automate the boring shit, if you will—is about developer satisfaction. It's not about making you build your commits faster or about having more lines of code that you, like, get deployed out; it's about making your jobs suck less.

Corey: Well, you spent, what was it, roughly two years, give or take, at GitHub between your various roles—and yes, I'm going to pronounce it ‘GIF-ub' because that's my brand of obnoxious, so I'm going to go for it—and then you went to Common Room. Let's begin there.
What does Common Room do, exactly?

Alyss: So, Common Room is an intelligent community-led growth platform. And there's a few things kind of packed into that really short description, but the idea is that we've seen all of these product-led growth businesses. But at a critical point, and something we've seen at GitHub, which is a product-led growth company, it's something that we've seen at Atlassian, Asana, you name half a dozen different, like, SaaS companies, self-hosted software, open-source, community is at the heart of it. And so, how do you nurture that community? How do you measure that community? How do you prove that the work that you're doing is valuable?

And that's what Common Room is setting out to do. And so, when I saw—like, they're not the only person or organization in the market that's doing this, but I think they're doing it exceptionally well, and with really great goals in mind. And so, I'm enthused to try and facilitate that investment in community for more organizations.

Corey: One of the challenges that I have seen with products in the community space is that they tended, historically, to go in really, I guess I'll call them, uncomfortable directions. In the before times, I used to host dinner parties near constantly here, and someone confided in me once—after, you know, six beers or so, because that's when people get excitingly honest—they mentioned that, “Yeah, I'm supposed to wind up putting these dinners into Salesforce”—or whatever the hell it was—“to track the contacts we have with influencers in this space.” And that made me feel so profoundly uncomfortable. It's, you're invited here to spend time with my friends and my family.
You're meeting my kids; it's, yeah, this is just a go-to-market motion and you can [BLEEP] on out of here and never come back.

And I did not get that sense, to be clear, and I'm told the company wound up canceling that horrifying program, but it does feel like it's very easy to turn an authentic relationship into something that feels remarkably sleazy. That said, Common Room has been around for a while and I have yet to hear a single accusation that you folks have come within a thousand miles of doing that. How do you avoid the trap?

Alyss: It's a slippery slope, and I can't say that Common Room creates any kind of, like, enforcement or silos or prevents organizations from falling into this trap. Fundamentally, the way in which community can be abused, the way in which these relationships can be taken advantage of, at least from the perception of the parties that initially built the relationship, is to take the context out of them, to take the empathy out of them, take the people out of them. And so, that is fundamentally left to the organization's principles; it's left to how much authority does community have within the business relative to a sales team. And so first, being able to elevate community in such a way to show that they are having that impact already without having to turn the community into a prospect pool is, I think, one of the critical first steps, and it's something that we've been able to break through initially by connecting things like Slack, Discord, Twitter to show: here's all these people talking about you, here's all the things that they're saying, here's the sentiment analysis, and also, now we're going to push that into Salesforce. So, you can see that this started out in community and it was fostered there.
Now, you can see the ROI; you don't need to go hitting up our community contacts to try and sell to them, because we're doing it on your behalf in a very real way.

Corey: Part of the challenge, I think, is that—and you've talked to me about this in previous conversations we've had—so much of community is distilled down to a sales motion, which, let's be direct, kind of sucks on some levels, because it's, okay, great, I'm here to talk to you about how community works. Well, in the AWS community, for example, the reason that formed, and is as broad and vast as it is, is because AWS's documentation is Byzantine, and there's a sort of shared suffering that we all get to commiserate over. And whenever AWS tries to take “ownership,” quote-unquote, of its community, that doesn't actually work that way. They have community watering holes, but to my understanding, the largest AWS-centric Slack team is the Open Guide to AWS's Slack team, which now has, at last count, 15,000 people in it. I'm lucky enough to be the community lead for that project.

But it was pre-existing before I got there, and it's great to be able to go and talk to people who are using these things. It doesn't feel like it is owned, run, or controlled—because it's not—by AWS themselves. It's clear from the way that your product has evolved that you feel similarly: it's about being aware of the community rather than controlling the community. And that's important.

Alyss: Absolutely. And one of the ways in which we, like, highlight this as soon as you're in the product is being able to show community responsiveness, and then what percentage of those responses are coming from my team members.
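As a rough illustration of the measure Alyss describes (what share of replies in a community channel come from community members rather than the vendor's own team), here is a minimal Python sketch. The message shape, author names, and team roster are assumptions for the example, not Common Room's actual API or data model.

```python
# Hypothetical sketch of a "community responsiveness" ratio: of all
# replies in a support channel, what fraction came from community
# members rather than the vendor's own team? The "author" field and
# the team roster below are illustrative assumptions.

def community_response_share(replies, team_members):
    """Fraction of replies authored by people outside the vendor's team."""
    if not replies:
        return 0.0
    from_community = sum(1 for r in replies if r["author"] not in team_members)
    return from_community / len(replies)

replies = [
    {"author": "alice"},    # community member
    {"author": "bob"},      # community member
    {"author": "devrel-1"}, # vendor team member
    {"author": "carol"},    # community member
]
team = {"devrel-1", "devrel-2"}
print(community_response_share(replies, team))  # 0.75
```

A real implementation would pull messages from the Slack or Discord APIs and track the trend per channel over time; the point is simply that the headline number is a ratio, and a healthy community pushes it toward self-support.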
And frankly, as someone who's previously set strategy for developer relations teams, for developer communities, what I want to see is community members responding to each other, community members knowing what's the right place to look, what's the right answer; how am I ensuring that they have the resources that they need, the answers that they need? Because at the end of the day, I can't scale one-to-one; no one can. And so, the community being able to support itself is at the heart of the definition of community.

Corey: One of the other problems that I've seen historically, and I'll call it the Chef problem, because Chef had an incredibly strong community, and as someone who is deep in the configuration management space myself, but never used Chef, it was the one that I avoided for a variety of reasons at the time, it was phenomenal. I wound up going to ChefConf, despite not being a Chef user, just to spend time with some of the great people that were involved. The blunder that they made before they were acquired into irrelevance by Progress—and to be fair, the industry changed direction toward immutable infrastructure in ways that were hard to foresee—was hiring their entire community. And it doesn't sound like that would be a bad thing, but suddenly, everyone who was talking about the product had a Chef email address, and that hits very differently.

Alyss: It does. And it goes back to that point of trying to maintain those authentic relationships. And if we're to step outside of tech, I have a background prior to tech in the video game industry, and that was a similar problem.
Nearly every single community-made application or extension ends up getting acquired by some organization, like Curse, and then piped full of ads; or the person that you thought you could ask, or hoped would build some other, better experience of version control software or a Git client, ends up getting consumed into a large business, and then the project never sees the light of day. And frankly, that's not how you run community, in my estimation.

My estimation is, if the community is doing things better than you are, take notes. Product management, pay attention. That's another aspect of doing developer relations: it's about checking in with those teams, about showing them evidence. And, like, it so often ends up being qualitative in a way that doesn't change people's minds or their feelings, where people want to see quantitative numbers in order to say, “Oh, this is the business justification. Like, this is the ROI. This proves that this is the thing we should invest in.” And frankly, no. Like, sometimes it is a little bit more about stepping back and letting the organic empathy and participation happen without having to own it.

Corey: There's a sense, I think, that a lot of companies feel the need to own every conversation that happens around them, their product, et cetera, and you can't. You just can't, unless—to be direct—your company is failing, because if no one's talking about you, then great, you're the only ones talking about you. And you can see this from time to time, and it's depressing as hell when you have people who work for a company all tweeting the same cookie-cutter statement, and they get zero interaction except from a bot account. It's sad.

Alyss: Yeah. And I've unfortunately seen this more times than I can count in community Slacks where people just, like, copy-paste whatever marketing handed to them, and I would be shocked if they got any engagement at all. Because that's… cool. What do I know about you? Why do I care about this event?
Have you personalized it to me?

And yeah, you don't want the organization to be the only one talking about you. If you are, then you've already failed in this, you know, product-led growth motion. You've kind of—if we want to get into the murky water of NPS, like, nobody's going and telling their friends about your product [laugh]. And the thing that's so valuable is the authentic voice. It's the, “I'm excited to talk about this and I like it enough to tell you what I like about it.” I like it enough to tell you about this use case that might never see the light of day, but because we're having a conversation between ourselves, it can all be personalized. It can all be about what's going on between us and about our shared experiences. And that is ten times more powerful than most Twitter-promoted ads you'll ever see.

Corey: So, I want to unpack a little bit about not developer relations as such, but developer relations relations, because I can mostly understand—badly—what product marketing is, but developer relations relations—or, as you'd like to call it, developer relations squared—that's something new. I've always called DevRel folks “devrelopers,” and people get annoyed enough at that. What is that newfound layer of abstraction on top of it?

Alyss: Well, there's several things that I'm going to end up—and I say “end up”; I'm six weeks into the role, so I have a lot of high hopes for where I hope this goes. And one of those things is, like, we don't have a very shared understanding and shared definition of what developer advocacy even is. What is developer relations? Does developer marketing belong under that umbrella? How should organizations approach developer relations? How should they value it? Where should it, you know, belong in terms of business strategy?

And there's an opportunity for a company whose business it is to elevate this industry, this career path, if you will, where we can spend the time, we can spend the money to say, here's what success looks like.
We've interviewed all these groups, we've talked with the leaders in this space that are making it their jobs to think about this. Here's a set of group-developed recommendations for how the industry should mature. Or here's an open-source set of job descriptions and requirements. And, like, let's get to some level of shared understanding.

So, as an example of, kind of, where I'm leading to with all of this, and some of the challenges that developer relations faces, there's the State of Developer Relations report that just came out. There's a significant number of people that are coming into developer advocate, developer relations roles for the first time; they have one to two years of experience, and they're coming into programs that have been around for one to two years. And so what does that tell you? That tells you you're bringing in people with no experience to try to establish brand-new programs that they're being asked to by their business, and they don't have the vocabulary, the tools, the frameworks in which to establish that for themselves. And so, they're going to be swayed by, you know, the tides of business, by the influences of their leadership, without having their own pre-built notions. And so, how do we give them that equipment and how do we elevate the practice?

Corey: Cloud native just means you've got more components or microservices than anyone (even a mythical 10x engineer) can keep track of. With OpsLevel, you can build a catalog in minutes and forget needing that mythical 10x engineer. Now, you'll have a 10x service catalog to accompany your 10x service count. Visit OpsLevel.com to learn how easy it is to build and manage your service catalog. Connect to your git provider and you're off to the races with service import, repo ownership, tech docs, and more.

Corey: It feels like so much of the DevRel discourse has turned into, one, we define it by what it is not, and two, it doesn't matter how you're measuring it, you're measuring it wrong.
I feel like that is, I guess we'll call it, counterproductive, for lack of a better descriptor. It feels like there's such a short-sighted perspective on all of this, but at the same time, you've absolutely got to find ways to articulate the value of DevRel slash community to the business; otherwise, it turns into a really uncomfortable moment when, okay, time to cut costs. Why should we keep your function over a different function? If there's not a revenue or upside or time-to-market or some form of value story tied to that, that the business can understand, that isn't just touchy-feely, it's a very difficult path forward from there. How do you see it?

Alyss: I agree with you, and I've, frankly, run into this problem several times in my career—every time I've been a developer advocate. It's, you know—and where I've found the most success is not in saying, “Here's exactly the numbers that I'm going to be constantly looking at. I'm going to try to produce this many pieces of content, or I'm absolutely not speaking at events, and that's not my job. Or I'm not writing code. That's not my job.”

It's about understanding what is driving the business forward. Who do I need participation and buy-in from, and where am I hoping to go? Like, what does a year out from this look like? What does three years out from this look like? At Box, we do not want to be the API governance standard. That is not our job. That's not where we sit within engineering.

That's, frankly, if you really want to get into it, internal developer advocacy, because it can influence the impact on the community. It is not the core focus, and there are probably people better equipped and better educated on the core application.
BigCommerce: platform ecosystem, platform flywheel. Developers are fundamentally a part of continuing to grow the business, and how do I go make that point to sales, how do I go make that point to partners, how do I go make that point to customer success, so that I can build a function that has more than one person? And so, I think, to kind of bring it back to the larger question, that is where I see our greatest challenge: we haven't given ourselves the vocabulary or the framework to understand the level of complexity that DevRel has become, in being across so many industries, and being in B2B, and being in business-to-developer, and being in business-to-consumer. No one size fits all, and we need to stop trying to treat it as though it can be.

Corey: I think that there is a, how to put it, a problem in terms of how Twitter views a lot of these things. Someone wound up finally distilling it down for me in relatively recent times with a very resonant quote, which was, simply put, that Twitter is not where you go for nuance. Twitter is where you go to be righteous. And I realized, oh, my God, that describes a good 80% of the things I've put up there. Like, when I talk about how when companies do this thing to their staff and it's crappy, I am not necessarily looking for a nuanced debate, although of course there's always nuance and edge cases and the rest.

As a counterpoint, whenever I wind up talking about things on Twitter and speak in generalities, I get a whole bunch of people pushing back with a, “Well, what about this edge case? That renders your entire point invalid.” And, ugh, not really. It feels like one of the casualties of the pandemic has been a sense of community and a sense of humans relating to other humans.
I think we're all tired of the Zoom calls from hell. I got to see you a couple of weeks before this recording at Monktoberfest in Portland, Maine, and oh, my God, dealing with people face to face, it was so much richer, at least from my perspective, compared to everything that we've been able to do during the pandemic. Am I alone on that? Are you seeing this across the board? Where companies are talking about this?

Alyss: I will say with confidence, you're not alone in this. Whether or not companies are talking about it is also across the board. How rich are those understandings? How rich are those conversations? Because trying to step back as a brand is not really a way.

Like, having nuance, being real, being community members, like that's not a way in which I think companies can participate in a way that feels truly authentic. That's why you need faces. That's why you need people. That's why you need folks whose job it is to do this. But in terms of things are lost, like, Twitter is not the right place to be having these conversations. It's not the right place in which to necessarily relate to people, absolutely.

When you get distilled down all of your interactions into oh, I've got a notification. Oh, I have a checkmark, and so I have, like, better moderation tools. Oh, like, I made a statement and I don't want to hear a solution for it. We get all of these uncurated experiences that are so dissatisfying that it does make us miss being around people who can read body language, that can understand my immediate relationship to them in spaces that we choose to be in, whereas Twitter is this big panopticon where we can just get yelled at and yell at each other.
And it loves to amplify those conversations far more than any of the touchy-feely, good-news success stories.

Corey: When you take a look across the entire landscape of managing DevRel programs and ensuring that companies are receiving value for it, and—by which I mean, nurturing the long-term health of communities because yes, I am much more interested in that than I am in next quarter's numbers—how do you see that evolving, particularly with the recent economic recession or correction or drawback or everything's on fire, depending upon who it is you talk to? How do you see that evolving?

Alyss: It goes back to what I said earlier about, I can speak in generalities, there will be specifics to various organizations, but at a fundamental part, like, I'll kind of take a step back and maybe make some very strong statements about what I think DevRel is, in a regard, which is, without documentation, without support, you don't have a product. And if you don't have folks going out and understanding what it is your customers need, and especially when those customers are maybe all the time or sometimes developers, and understanding what it is that they're saying and truly having empathy for what's going on in their day-to-day, what task are they trying to complete, how relevant is this to them—if you don't invest in that, you've lost the plot.

And so, in those instances, unfortunately, that's a conversation with the leadership team. Your leadership doesn't fundamentally understand the value, and maybe it's worth it to make the argument in favor, to illustrate that without this feedback loop, without this investment in the educational journey of developers, without the investment in what is going on in our product, and where have we allowed ourselves to remain ignorant of what is happening in the day-to-day of our users—we need those folks.

Product managers are in sprints, they're in standups.
They're doing, like, strategic planning and their yearly planning. We need a group who is rewarded to care about this but also is innately driven to do so as well. And that's not something that you can make. And it's not something that we otherwise see. It's part of why we have such an absence in good developer marketing: marketers aren't paid well enough to ever have learned the skills to be developers, and so there's no skills transfer.

Corey: One last topic that I want to get into, something you've only been doing for a short while, but you've become an advisor at Heavybit, which is a VC firm. How did that come about and what do you do?

Alyss: So currently, I—I'll do the super-high level. What I do right now is I host office hours with seed startups and Series A that are in the dev tool space. And we generally talk about developer relations, a little bit in developer marketing go-to-market strategies. And it's super enriching for me because I love hearing about different experiences and problems and, like, areas of practice. But it was really interesting, and a little bit of a make-your-own-luck-and-opportunity type deal.

Where I live in Austin, Texas; I do not live in the Bay Area, I don't have all those connections, I've been a bit distant from it. And I saw someone who had accepted a role that I had interviewed for end up in some of their content. And I was like, “They're doing a great job. They definitely deserve to be there, but I also had similar qualifications, so why shouldn't I also be there?” And I found someone, his name's Tim, on LinkedIn, who runs their events. And I reached out and I said, “Hey, Tim, how would you like a new advisor?” And so, Tim responded back and we—

Corey: Knock knock. Who's there? It's me.

Alyss: Yeah, exactly. It's—and it was just, I want this thing to happen. How do I make it happen? I ask.

Corey: And what does the day-to-day look like? How much time does it take? What do you do exactly?

Alyss: Yeah.
I mean, right now, it's about five hours every quarter. So, I spend anywhere between 30 minutes to an hour with various organizations that are a part of Heavybit's portfolio, talking with them through their motion to go general availability, or they want to start participating in events, or they want to discover what are the right events for them to—or, like, DevOpsDays, should we participate in that? Should we hire a DevRel person? Should we hire a product marketing person? Just helping them sort wheat from chaff in terms of, like, how to proceed.

And so, it's relatively, for me, lightweight. And Heavybit also gives us the opportunity to contribute back in blog posts, participate in podcasts and be able to have some of those richer conversations. So, I have a set of bookmarks, over 100 bookmarks long, that is fully curated across several different categories. My first blog post was diving into a few of those where I think are critical areas of developer relations. What are some of the conversations on DevRel metrics? How do I think about setting a DevRel strategy for the first time? How do I do my first DevRel hire? And so, I wouldn't even call it a second job. It's more of a getting to, again, enrich my own experience, see a wider variety of different problems in this space and expand my own understanding.

Corey: I really want to thank you for being so generous with your time. If people want to learn more about what you're up to, how you view the world, and basically just come along for the ride as you continue to demonstrate a side of tech that I don't think we get to see very often, where can they find you?

Alyss: I am @PreciselyAlyss on Twitter, as well as Twitch. Aside from that, I would not recommend looking for me.

Corey: Excellent. Always a good decision. I will put links to that in the [show notes 00:30:00]. Thank you so much for your time.
I appreciate it.

Alyss: Thanks, Corey.

Corey: Alyss Noland, Head of Developer Relations and Product Marketing at Common Room. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment belittling community and letting the rest of us know by observation just why you've been thrown out of every community to which you've ever been a part.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Return of re:Invent with Pete Cheslock


Jan 12, 2023 · 41:45


About Pete

Pete is currently the Head of Growth And Community for AppMap, the open source dynamic runtime code analyzer. Pete also works with early stage startups, helping them navigate the complex world of early stage new product development.

Pete also fully acknowledges his profile pic is slightly out of date, but has been too lazy to update it to reflect current hair growth trends.

Links:

AppMap: https://appmap.io/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup. It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it. My thanks to Google Cloud for sponsoring this ridiculous podcast.

Corey: Cloud native just means you've got more components or microservices than anyone (even a mythical 10x engineer) can keep track of. With OpsLevel, you can build a catalog in minutes and forget needing that mythical 10x engineer. Now, you'll have a 10x service catalog to accompany your 10x service count. Visit OpsLevel.com to learn how easy it is to build and manage your service catalog. Connect to your git provider and you're off to the races with service import, repo ownership, tech docs, and more.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn and this is probably my favorite recurring episode slash tradition, every year. I drag Pete Cheslock on who talks with me about his experience at re:Invent. Last year, Pete, you didn't go. The year before, none of us went because it was all virtual, but it feels like we're finally getting back into the swing of things. How are you, Pete?

Pete: I am doing great. It is always a pleasure. It was amazing to see other humans in person at an industry event. As weird as it sounds to say that, you know, it was great to be in Vegas [laugh], it was mostly great, just because there were other humans there too that I wanted to see.

Corey: Because this is going to confuse folks who haven't been following our various adventures, these days, you are the Head of Growth and Community at AppMap. But you and I have been talking for years and you did a stint working at The Duckbill Group here with us as a cloud economist. Ah, I miss those days. It was fun working with you and being able to bother you every day as opposed to just on special occasions like this.

Pete: Yeah, I know. I got to slide into your Slack DMs in addition, and then when I didn't get a response, I would slide into your Twitter DMs. It worked out perfectly. So yeah, it's been a wild ride. I mean, I took an interlude from my startup journey by continually working at tech startups.

And yeah, I got to join onboard the Duckbill and have, you know, a really wonderful time cutting bills and diving into all of the amazing parts of people's Amazon usage. But I am also equally broken in my brain, and continually said to myself, “Maybe I'll do another startup.” [laugh].

Corey: Right. And it turns out that we're not a startup. Everyone likes to think we are. It's like, oh, okay—like Amazon, for example, has us historically in their startup division as far as how they—the buckets as they put different accounts into.
And if you look at us through that lens, it's yeah, we're a specific kind of startup, specifically a failing startup—or failed—because to us growth is maybe we'll hire one or two people next year, as opposed to, “Oh, yeah, we're going to 10x this place.” No, yeah, we're building a lifestyle business very much by design.

Pete: I'd be very curious how many account managers Duckbill has actually kind of churned through because usually, you get to keep your account manager if you're growing at a pretty incredible clip. And it's kind of a bellwether for, like, how fast are we—are we growing so fast that we have kept our account manager for multiple years?

Corey: Your timing is apt. We're a six-year-old company and I just met our fourth account manager last week.

Pete: [laugh].

Corey: No it's, honestly, what happens with AWS account managers is the only way you get to keep them is if your spend trajectory on AWS matches their career trajectory inside of AWS. Because if you outpace them, they'll give you to someone that they view as being more senior, whereas if they outpace you, they're going to stop dealing with the small accounts and move on to the bigger ones. Honestly, at this point, I've mostly stopped dealing with my account managers. I had one that was just spectacular. It was sad to see him get promoted; good for him.

But I get tired of trying to explain the lunacy that is me to someone on the AWS side every year. It just doesn't make sense because my accounts are super weird and when they try and suggest the usual things that work for 99.995% of AWS customers and things they care about, it falls to custard when it comes to me specifically. And that's not on them; it's because I'm weird and broken.

Pete: I'm remembering now one of the best account managers that I ever worked with at a startup, years and years ago. She was with us for a couple of years, pretty solidly.
And then, you know, because careers are long and jobs are short, when I was at The Duckbill Group again, doing work, turns out she was the account manager on this other thing, you know? Which, like, looking at the company she was account manager for was like 500x [laugh] my previous company, so I was like, “Oh, yeah. You're clearly moving up in the world because my company did not 500x.” So, sometimes you got to chase the ones who are.

Corey: So, let's talk about re:Invent. This felt like the first re:Invent post-pandemic. And let's be clear, I wound up getting Covid by the end, so I don't recommend that to everyone. But let's be clear, this was not a re:Invent where anyone officially accepted that Covid existed. I was one of the only people wearing masks to most of the events I was at. Great load of good that did me.

But it was big. It was the usual sixty-some-odd-thousand people that had been in previous years, as opposed to the hard cap of 30,000 or so that they had last year, so it felt smaller and more accessible. Nope. Right back to bizarre numbers of people. But fewer sponsors than most years, so it felt like their budget was severely constrained. And they were trying to have not as many sponsors, but still an overwhelming crush of attendees. It felt odd, but definitely very large scale.

Pete: Yeah, I can echo that a hundred percent. I'm sure we've talked about this in previous ones, but I've had the pleasure—well, I don't know, some might call it not a pleasure, but it's been a pleasure to watch re:Invent grow over so many years. I went to the first re:Invent. A company I was at actually sponsored it. And remembering back to that first re:Invent, it was kind of quaint by comparison.

There were 4000 people at the first re:Invent, which again, it's a big conference, especially when a lot of the conferences that I think I was really attending at the time were like, you know, 600, 1000, maybe tops.
To go to a 4000-person event in Vegas especially, it's again, in the same Expo Hall it's been since that first one, it still felt big. But every person stayed in the Venetian. Pretty much everyone was in the same hotel, all of the attendees that year. All the talks were there.

There was, you know, a lot [laugh]—I mean, a lot less of everything that was there. And so, watching it grow over time, not only as a sponsor because I've actually been—kind of worked re:Invent as a, like, a booth person for many of these years for multiple different sponsors and had to coordinate that aspect of it, but then also a couple of times just being more, like, attendee, right, just someone who could go and kind of consume the content. This year was more on the side of being more of an attendee where I got to just kind of experience the Expo Hall. You know, I actually spent a lot of time in the Expo Hall because a big part of why I was there was—

Corey: To get t-shirts.

Pete: Yeah, we'll get to—I was running low on not only t-shirts but socks. My socks were really worse for wear the last few years. I had to, like, re-up that, right [laugh]?

Corey: Yeah, you look around. It's like, “Well, none of you people have, like, logoed pants? What's the deal here? Like, I have to actually buy those myself. I don't—I'm not here to spend money.”

Pete: Yeah, I know. So. And so yeah, this year, it felt—it was like Covid wasn't a thing. It wasn't in anyone's mind. Just walking around—Vegas in general, obviously, it's kind of in its own little bubble, but, you know, I've been to other events this year that were much more controlled and had a lot more cautious attendees and this was definitely not like that at all. It felt very much like the last one I was at. The last one I was at was 2019 and it was a big huge event with probably 50,000-plus people.
And this one, to me at least, attendee-wise, definitely felt bigger than that one in a lot of ways.

Corey: I think that when all is said and done, it was a good event, but it wasn't necessarily what a lot of folks were expecting. What was your take on the content and how the week played out?

Pete: Yeah, so I do, in many ways, kind of miss [laugh] the event of yore that was a little bit more of a targeted, focused event. And I understand that it will never be that kind of event anymore. Maybe they start splitting it off to be, you know, there's—it just felt much more like a builder event in previous years. The content in the keynotes, you know, the big keynotes and things like that would be far more, these big, iterative improvements to the cloud. That's something that always felt kind of amazing to see. I mean, for years and years, it was like, “Who's ready for another re:Invent price drop?” Right? It was all about, like, what's the next big price drop going to be?

Corey: Was it, though? Because I never was approaching it with an eye toward, “Oh, great. What are they going to cut prices on now?” That feels like the least interesting thing that ever came out of re:Invent, at least for me. It's, what are they doing architecturally that lets me save money, yes. Or at least do something interesting architecturally, great. I didn't see Lambda when it first came out, for example, as a cost opportunity, although, of course, it became one. I saw it as this is a neat capability that I'm looking forward to exploring.

Pete: Yeah, and I think that's what was really cool about some of those early ones is these, like, big things would get released. Like, Lambda was a big thing that got released. There was just these larger types of services coming out. And I think it's one of your quotes, right?
Like, there's a service for everyone, but every service isn't for everyone.

Corey: Yeah.

Pete: And I feel like, you know, again, years ago, looking back, it felt like more of the services were geared towards the operational, the early adopters of Amazon; a lot of those services were for those people. And it makes sense. They've got to spread out further, they've got to have kind of a wider reach to grow into all of these different areas. And so, when they come out with things that, yeah, to me, I'm like, “This is a ridiculous feature. Who would ever use this?” Like, there's probably a dozen other people at different companies that are obscenely excited because they're at some enterprise that has been ignored for years and now finally they're getting the exact tooling that they need, right?

Corey: That made sense for a long time. I think that now, the idea that we're going to go and see an Andy Jassy-era style feature drop of, “Here's five new databases and a whole new compute platform and 17,000 more ways to run containers,” is not necessarily what is good for the platform, certainly not good for customers. I think that we're seeing an era of consolidation where, okay, you have all these services to do a thing. How do I pick which one to use? How do I get onto a golden path that I can also deviate from without having to rebuild everything? That's where customers seem to be. And it feels like AWS has been relatively slow to acknowledge or embrace that to me.

Pete: Yeah, a lot of the services, you know, are services they're probably building for just their own internal purposes, as well. You know, I know they were for a while very motivated to get off anything Oracle-related, so they started building these services that would help migrate, you know, away from Oracle because they were trying to do it themselves.
But also, it's like, there's still—I mean, I talk with friends of mine who have worked at Amazon for many years and I'm always fascinated by how excited they are still to be there because they're operating at a scale that just doesn't exist anywhere else, right? It's like, they're off on their lone island, where going to work somewhere else is almost going backwards because you've already solved problems at this lower level of scale. That's obviously not what you want to be doing anymore.

And at the scale that they're at for some of these services, even like the core services, the small improvements they're making, they seem so simple and basic, like a tiny EBS improvement, you're like, “Ugh, that's so boring.” But at their level of scale for, like, something like an EBS, like one of those top five services, the impact of that tiny little change is probably even so amazingly impactful. Like it's just so huge [laugh], you know, inside that scope of the business, that if you really start pulling the thread, you're like, “Wow, actually, that is a massive improvement.” It just doesn't feel that way because it's just oh, it's just this tiny little thing [laugh]. It's like, just almost—it's too simple. It's too simple to be complex, except at massive scale.

Corey: Exactly. The big problem I ran into is, I should have noticed it last year, but it was Adam Selipsky's first re:Invent and I didn't want to draw too many conclusions on it, but now we have enough dots to make a line—specifically two—where he is not going to do the Andy Jassy thing of getting on stage and reading off of a giant 200-item list of new feature and service announcements, which in AWS parlance, are invariably the same thing, and they wind up rolling all of that out. And me planning my content schedule for re:Quinnvent around that was a mistake.
I had to cancel a last-minute rebuttal to his keynote because there was so little there of any substance that it all would have been a half-hour-long personal attack, and I try not to do that.

Pete: Mmm. Yeah, the online discussion, I feel like, around the keynote was really, like, lackluster. It was, yeah, like you said, very devoid of… not value; that's not really the right word, but just substance and heft to it. And maybe, look, we were just blessed with many, many years of these dense, really, really dense, full keynotes that were, yeah, just massive feature drops, where here's a thing and here's a thing, and it was almost that, like, Apple-esque style kind of keynote where it was like, we're just going to bombard you with so many amazing things that kind of is in a cohesive storyline. I think that's the thing that they were always very good about in the past: having a cohesive story to tell about all of these crazy features.

All of these features that they were just coming out with at this incredible velocity, they could weave the story around it. And you felt like, yeah, the keynote was whatever, hour, two hours long, but it would go by—it always felt like it would go by quickly because they had down kind of really tight messaging and kept your attention the whole way through because you were kind of like, “Well, what's next? There's always—there's more. There's got to be more.” And there would be, right? There would be that payoff.

Corey: I'm glad that they recognized that what got them here won't get them there, but I do wish that they had done a better job of signaling that to us in more effective ways. Does that make any sense?

Pete: Yeah, that's an interesting… it's kind of an interesting thought exercise. I mean, you kind of mentioned earlier, before we started recording, the CMO job is still available, it's still open [laugh] at AWS.
So, if this was a good way to attract a top-tier CMO, I'd almost feel like if you were that person to come in and be like, “Hey, this did not work. Here are the following reasons and here's what you need to do to improve it.” Like, you might have a pretty solid shot of landing that role [laugh].

Corey: Yeah, I'm not trying to make people feel intentionally bad over it. This stuff is very hard, particularly at scale. The problem I had with his keynote was not in fact that he was a bad speaker, far from it. He was good a year ago, and he's clearly put work and energy into becoming better over the past year. From a technical analysis of how Adam Selipsky is as a public speaker: straight A's as far as I'm concerned, and I spend a lot of time focusing on this stuff as a professional speaker myself. I have no problems with how he wound up delivering any of the content. My problem was with the content itself. It feels like he was let down by the content team.

Pete: Yeah, it definitely felt not as dense or as rich as we had come to expect in previous years. I don't think it was that the content didn't exist. It's not like they didn't build just as much, if not way, way more than they have in previous years. It just seemed to just not be part of the talk.

I don't know. I always kind of wonder, too, is this just an audience thing? Which is, like, maybe I'm just not the audience for his talk, right? Was there someone else in that Expo Hall, someone else watching the stream, that was just kept on the edge of their seat hearing these stories? I don't know. I'm really kind of curious. Like, you know, are we only representing this one slice of the pie, basically?

Corey: I think part of the problem is that re:Invent has grown so big that it doesn't know what it wants to be anymore. Is it a sales event? By the size of the Expo Hall, yeah, it kind of is. Is it a partner expo where they talk about how they're going to milk various companies? Possibly.
There's certainly one of those going on. There was an analyst summit that I attended for a number of days during re:Invent this year. They have a whole separate thing for press. The community has always been a thriving component of re:Invent, at least for me, and seeing those folks is always terrific. Is it supposed to be where they dump a whole bunch of features and roadmap information? Is it time for them to wind up having executive meetings with their customers? It tries to be all of those things and as a result, at this scale it feels like it is failing to be any of them, at least in an effective, cohesive way.

Pete: Yeah, and you really nailed each of the personas, like, of a re:Invent attendee. I've talked to many people who are considering going to re:Invent, and they're like, “I don't really know if I want to go, but I really want to go to some sessions, and I really want to do this.” And I always have to kind of push back and say, “Look, if you're only going there to attend talks,” like, just don't bother, right? As everyone knows, the talks are all recorded, you can watch them later. I did have conversations with some engineer, principal engineer level software folks that were there and the prevailing consensus from chatting with those folks, kind of anecdotally, is that, like, they had actually a lot of struggles even getting into some of these sessions, which for anyone who has been to re:Invent in the last, I don't know, four or five years, like, it's still a challenge, right?

There's—you got to register for a lot of these talks way far in advance, there'll be a standby list, there'll be a standby line. It's a lot of a lot. And so, there's not usually a ton of value there.
And so, I always try to say, like, “If you're going to re:Invent, your, kind of, main purpose to go would be more for networking,” or just you're going because of the human interaction that you hope to get out of it, right, the high-bandwidth conversations that are really hard to do in other areas. And I think you've nailed a bunch of those, right? Like, an analyst briefing is really efficient if you can get all the analysts in a room versus doing one-off analyst meetings.

Meeting with big enterprises and hearing their thoughts and feelings and needs and requirements, you can get a lot of those conversations. And especially, too, if, like, talking to an enterprise and they've got a dozen people all spread over the world, well, you can get them all in one room, like, that's pretty amazing in this world. And then on the sales side, I feel like, granted, I spent most of my time in the Expo Hall, but that was probably the area that I think you said earlier which I really picked up on, which was the balance between sponsors and attendees felt out of whack. Like, it felt like there were way more attendees than the sponsors that I would have kind of expected to see.

Now, there were a lot of sponsors in that Expo Hall and it took days. I mean, I was on the Expo Hall for days walking around and chatting with different companies and people. But one of the things that I saw that I have never seen before was a number of sponsorship booths, right—and these are booths that are, like, prebuilt, ten by ten-foot size or the smaller ones—that were blanks. They were, you know, like in a low-quality car where you have blank buttons that, like, if you paid more you'd get that feature. Walking around, there was a nonzero number of just straight-up empty booths, blank booths around, which, I don't know, like, that felt kind of telling. Like, did they not sell all their sponsorships? Has that ever happened? I don't even know.
But this was, I felt like, the first I've had—

Corey: Or did companies buy the sponsorship and then realize that it was so expensive to go on top of it, throwing bad money after good might not have made sense. Because again, when people—

Pete: Right.

Corey: —bought these sponsorships, in many cases, it was in the very early days of the growing recession we've seen now. And they may have been more optimistic; their circumstances may certainly have changed. I do know that pricing for re:Invent sponsorships was lower this past year than it had been in previous years. In 2019, for example, they had two Expo Halls, one at the ARIA Quad and the other at the Venetian. They had just one this year, which made less running around on my part, but still.

Pete: Yeah, I do remember that, that they had so many sponsors. What I would say about the sponsors is that there's two parts of this that were actually interesting. One, you're definitely right. As someone who has sponsored re:Invent before and has had to navigate that world, you are likely going to commit to the sponsorship as early as June, you know, could even be earlier than June depending on how big of a thing that you're doing. But it's early. It's usually in the summertime that you're—if you haven't made a decision by the summertime, like, you could actually not get a booth, right?

And this was, I remember, the last one that I had sponsored was maybe 2018, 2019. And, like, you don't want those last few booths. Like, they put you in the back, and not in a good way. But going there, were a lot of—I did notice a lot of booths from companies that had pretty massive layoffs who still had the booths, you know, and again, large booths, large companies, which again, same thing. I kind of am like, “Wow, like, how many employees did that booth cost you, right?” Because, like [laugh], some of these booths are hundreds of thousands of dollars to sponsor.
And then the other thing that I actually noticed, too, which I was honestly a little surprised by, with the exception of the Datadog booth; I love my friends at Datadog, they have the most amazingly aggressive booth BDRs who are always just, they'll get you if you're, like, hovering near them. And there's always someone to talk to over there. Like, they staff it really, really well. But there were some other booths where I was actually really interested in talking to some of the people to learn about their technology, where I actually waited to talk to someone. Like, I waited for someone to talk to, and then finally I'm like, “You know what? I'm going to come back.”

And then I came back and waited again. So, it's like, how many of these sponsors obviously spent a lot of money to go there, and then months later, they start looking at the people that they have to support this with, they've already had some layoffs, and they probably sent a much smaller team there to actually, like, operate the booth.

Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud.
Observability: it's more than just hipster monitoring.

Corey: One bright light that I did enjoy, and I always enjoy, though I'm not certain how actionable it is in the direct sense, was Peter DeSantis' Monday Night Live keynote. It was great. I mean, the one criticism I had of it—on Twitter at the time, before that thing melted and exploded—was that it was a bit mislabeled, because it really should have been what it turned into midway through: surprise computer science lecture with Professor DeSantis. And I was totally there for it. But it was fun just watching some of the deep magic that makes this all work be presented in a way that most of us normally don't get to see.

Pete: Well, this was the first year they did not do their Midnight Madness over-the-top kind of thing. And I also don't recall that I saw them doing one of the other things I feel like happens at night, their, like, giant wing-eating competition. Am I wrong? Did they do that this year and I just missed hearing about it?

Corey: They did not. Turns out that competitive gluttony is not as compelling as it once was. But they also canceled their Midnight Madness event a month or two before re:Invent itself. What was super weird to me was that there was no event—community or otherwise—that sprung forth to seize that mantle. So, you had a whole bunch of people who were used to going for several hours that night to a big event with nothing to do.

And at 9 p.m. they started just dumping a whole bunch of service releases in their blog and RSS feeds and the rest, and it just felt very weird and discordant. Like, do they think that we have nothing better to do than sit here and read through this on a Sunday night when we would have otherwise been at a party? Well, yeah, in my case, I'm super sad and of course I had nothing better to do that night. But most people had things going on.

Pete: Yeah. Yeah, exactly.
I think also, if you—maybe it's a little bit better now, but I don't know when you have to buy that many chicken wings in advance—but with supply chains being what they are and the cost of chicken wings, I mean, not that I track the cost of chicken wings, except I absolutely do every time I go to Costco, they're up substantially. So, that was probably a contributing factor to skipping the wing-eating contest: supply chain pain and suffering.

But yeah, it's really interesting, just even what some of the sponsors were kind of doing this year over previous years. I doubt they did this in 2021—but maybe, I don't know—but definitely not in 2019. Something that I don't recall to this level was the sponsors essentially booking out entire restaurants near the venue every single day of the conference.

And so again, if you were at this event like we were, and at the end of the thing you were just like, I just want to sit, and I've got a handful of friends, I want to sit and, like, have a drink, and just, like, chat and catch up and hear how the day went and everything else, finding a place to actually go to do that was very, very hard to do. And the thing that I noticed, which again seemed like it was new this year—I don't recall it in 2019 to this level—is there were a lot of the big sponsors that had just booked a whole restaurant, breakfast, lunch, and dinner, like, from open to close, fully booked it, which was, honestly, brilliant.

Corey: Oh, yeah. If you bring 200, 300 people to an event, you've got to feed them somehow. And, “Hey, can we just rent out your restaurant for the entirety of this week?” is not out of the question compared to what you'd even spend just reimbursing that sea of people to go and eat somewhere else.

Pete: Exactly.
The reason—I'm approaching this from, like, a business perspective—is if I had a large group of enterprise salespeople and they need a place to book meetings, well, it's super compelling if I'm being courted by one of these salespeople and they're like, “Hey, come and have breakfast. Come and grab a coffee.” You know, and there's a place where you can sit down and quietly enjoy that meal or coffee while having a sales conversation. Like, I'll have that sales conversation, and I'm going to be way more motivated to show up to it because you're telling me, like, this is where we're going to meet.

Versus some of my friends were trying to, like, coordinate a lunch or a coffee, and it's like, do we want to go to the Starbucks that has 500 people in line, or do we want to walk four hotels, you know, down the street to find a bar with video poker that no one will be sitting at and where we can just sit down and talk, right? It kind of felt like those were your two options.

Corey: One thing that MongoDB did is rent out the Sugarcane restaurant. And they did this a couple of years in a row, and they wound up effectively making it available to community leaders, heroes, and whatnot, for meetings or just a place to sit down and catch your breath. And I think that was a brilliant approach. You've gone to the trouble of setting this thing up for meetings for your execs and whatnot. Why not broaden it out?

You can't necessarily do it for everyone, for obvious reasons, but it was nice to just reach out to folks in your orbit and say, “Yeah, this is something available to you.” I thought that was genius. And I—

Pete: Oh yeah.

Corey: —wish I'd thought of doing something like that. Let's be clear, I also wish I had rent-out-Sugarcane-for-a-week budget. But you know—

Pete: [laugh].

Corey: —we take what we can get.

Pete: Yeah. That'll be a slight increase to the Spite Budget to support that move.

Corey: Just a skosh, yeah.

Pete: Yeah, MongoDB, they were one that I do remember had done it similarly.
I don't know if they had done it, kind of, full-time before, but a friend of mine who works there had invited me over and said, “Hey, like, come by, let's grab a drink. You know, we've got this hotel, you know, this restaurant kind of booked out.” And that was back in 2019. Really enjoyed it.

And yeah, I noticed it was, like, you know, basically, they had this area available, again, a place to sit down, to open your laptop, to respond to some emails. Making it available to community people should have been a no-brainer for, really, all of these other sponsors that may have times of, kind of, less attendance, right? So obviously, at any of the big meals, maybe that's when you can't make it available for all the people you want to, but there's going to be off-hours and in-between times where making that available and offering that up generates a supreme amount of goodwill, you know, in the community. Because, you know, you're just looking for a place to sit down and drink some water [laugh].

Corey: Yeah, that was one challenge that I saw across the board. There were very few places to just sit and work on something. And I'm not saying a lounge around every corner was needed necessarily, or even advisable. No, the problem I've got was that I just wanted to sit down for two or three minutes and just type up an email quickly, or a tweet or something, and nope, you're walking and moving the whole time.

Pete: Yeah. Now, honestly, this was a big missed opportunity for the Amazon event-planning folks. There was a lot of unnecessary space usage, though I understand why they had it. Here's an area you could play foursquare, here's an area that had seesaws that you could sit on. Like, just, I don't know, kitschy stuff like that, and it was kind of off to the side or whatever.

Those areas, honestly, like, were kind of off to the side; they were a little bit quieter.
Would have been a great spot to just, like, load up some chairs and couches and little coffee tables, and just have places that people could sit down. Because what ended up happening—and I'm sure you saw it just like I did—is that any hallway that had somewhere you could lean your back against had a line of people just sitting there on their laptops. Because again, a lot of us are at this event, but we also have jobs that we're working at, too, and at some point during the day, you need to check in, you need to check some stuff out. It felt like a lack of that kind of casual space that you can just relax in. And when you add on top of that all the restaurants nearby being essentially fully booked, it really, really leaves you hanging for any sort of area to sit and relax and just check a thing or talk to a person or anything like that.

Corey: Yeah, I can't wait to see what lessons get learned from this and how they map to next year, across the board. Like, I have a laundry list of things that I'm going to do differently at re:Invent next year. I do every year. And sometimes it works out; sometimes it really doesn't. And it's a constant process of improvement.

I mean, one of the key breakthroughs for me was when I finally internalized the idea that, yeah, this isn't going to be like most jobs where I get fired in the next six months; when I'm planning to go to re:Invent, this is not the last re:Invent I will be at in my current capacity, doing what I do professionally. And that was no small thing. Where, oh yeah, so I'm already making plans, not just for next re:Invent, but laying the groundwork for the re:Invent after that.

Pete: Yeah, I mean, that's a smart way to do it. And especially, too, when you don't consider yourself an analyst, even though you obviously are an analyst.
Maybe you do consider yourself an analyst, but you're [laugh] more, you know, you're also the analyst who will go and actually use the product and start being like, “Why does this work the way it does?” But you're kind of a little bit the re:Invent target audience in a lot of ways, right? You're kind of equal parts analyst, expert, and user as well. It's like you kind of touch on a bunch of those areas.

But yeah, I mean, I would say the one part that I definitely enjoyed was the nature walk that you did. And just seeing the amount of people that also enjoyed that and came by, it was kind of surreal to watch you in, like, full safari garb, basically meandering through the Expo Hall with this, like, trail of, like, backpacks [laugh] following you around. It was a lot of fun. And, you know, it's stuff like that, where people are looking for interesting takes on, kind of, the state of something that is its own organism. Like, the Expo Hall is kind of its own thing that is outside of re:Invent's control. It's kind of whatever is made up by the people who are actually sponsoring it.

Corey: Yeah, it was neat to see it play out. I'm curious to see how it winds up continuing to evolve in future years. Like, right now, the Nature Walk is a blast, but it was basically at the top and I had something like 50 people following me around at one point. And that is too big for the Expo Hall. And I'm not there to cause a problem for AWS. Truly, I'm not. So, I need to find ways to embrace that in ways that don't kill the mojo or the energy, but also don't create problems for, you know, the company whose back I am perched upon, yelling more or less ridiculous things.

Pete: [laugh].
I was particularly interested in how many people I'd be walking by, and every once in a while I would see, like, a friend of mine, someone actually working one of the booths, and just be like, “What's going on here?” Like, I know one of my friends even said, “Yeah, like, nothing draws a crowd like a crowd.” And you could almost see more people [laugh] just, like, connecting themselves onto this safari train moving its way through. Yeah, it's a sight to see, that's for sure [laugh].

Corey: Yeah, I'll miss aspects of this. Again, nothing can ever stay the same, on some level. You've got to wind up continuing to evolve and grow, or you wind up more or less just frozen in place and nothing great ever happens for you.

Pete: Yeah, I mean, again, the Expo Hall has gone through these different iterations, and I—you know, when it does come to the event, as I kind of think back, I probably have spent most of my time actually in the Expo Hall, usually just related to the fact that, like, when you're a sponsor, like, you're just—that's where you're at. For better or worse, you're going to be in there. And especially if you're a sponsor, you want to check out what other sponsors are doing because you want to get ideas around things that you might want to try in later years. I mentioned Datadog before because Datadog to this day continues to have the best-designed booth ever, right? Like, when it comes to a product that is highly demoable, I've been there myself as a sponsor, and it has always been a struggle to have a very effective demo setup.

And I actually remember, kind of, recommending to a startup that I was at years ago doing a demo setup that was very, very similar to how Datadog did it, because it was brilliant: where you have this, like, octagon around a main area of tables, with double-sided demo stations.
A lot more people are doing this now, but again, as I walked by, I was again reminded just how effective that setup is. Because not only do you have people that just don't want to talk, they just want to look, and they can kind of safely stand there and look, but you also have enough people staffing the booth for conversations that, for, like, me, who actually might want to ask questions, I don't have to wait and I can get an answer and be taken care of right away, versus some other booths.

This year, one of the areas that I actually really enjoyed—and I don't even know the details of, like, how it all came about—but it looked like some sort of, like, Builders' Expo. I don't know if you remember walking by there, but there was a whole area of different people who had these little IoT or variously powered things. One of them was, like, a marble-sorting thing that was set up with a bunch of AWS services. I think there was, like, the Simple Beer Service V… four or five at this point. They had one of those iterations.

It was some sort of mixture of Amazon software services that were powering these, like, physical things that you could interact with. But what was interesting is, like, I have no idea, like, how it was set up, and who—I'm assuming it was Amazon-specific—but each of these little booths was, like, chock-full of information about who they were and what they built, which gave it a feel of, like, this was a last-minute builder-event thing. It didn't feel like it was a highly produced thing. It had a much more casual feel to it, which honestly got me more interested to spend time there and check out the different booths.

Corey: It was really nice to be able to go, and I feel like you got to see all of the booths and whatnot. I know in previous years, it feels like you'd go looking for specific companies and never find them.
And you thought, “Oh”—

Pete: [laugh].

Corey: —“They must not have been here.” You find out after the fact, oh no, you were just looking in the wrong direction because there was so much to see.

Pete: There were definitely still a couple of those. I had a list of a handful of booths I wanted to stop by, either to say hi to someone I knew was going to be there or just to chat with them in general; there were a couple that I had to do a couple loops to really track down. But yeah, I mean, it didn't feel as overly huge as previous ones. I don't know, maybe it was, like, the way they designed it; the layout was maybe a little bit more efficient, so that you could do loops through, like, an outer loop and an inner loop and actually see everything. Or it just was that they didn't have enough sponsors to truly fill it out, and maybe that's why it felt like it was a little bit more approachable.

I mean, it was still massive. I mean, it was still completely over the top, and loud, and shiny lights and flashing things and millions of people. But it is kind of funny that, like, if you do enough of these, you can start to say, “Oh well, I don't know, it still felt… a little bit less [laugh] for some reason.”

Corey: Yeah, just a smidgen. Yeah. Pete, it is always a pleasure to get your take on re:Invent and see what you saw that I didn't, and vice versa. And same time next year, same place?

Pete: Yeah. I mean, like I said, one of my favorite parts of re:Invent is, you know, we always try to schedule, like, an end-of-event breakfast when we're both just supremely exhausted. Most of us don't even have a voice by the end. But just being able to, like, catch up and do our quick little recap, and then obviously to be able to get on a podcast and talk about it, is always a lot of fun. And yeah, thanks again for having me. This is—it's always, it's always a blast.

Corey: It really is. Pete Cheslock, Head of Growth and Community at AppMap.
I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice and then put something insulting about me in the next keynote because you probably work on that content team.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Exposing Vulnerabilities in the World of Cloud Security with Tim Gonda
Jan 10, 2023 · 33:23
About Tim

Tim Gonda is a Cloud Security professional who has spent the last eight years securing and building Cloud workloads for commercial, non-profit, government, and national defense organizations. Tim currently serves as the Technical Director of Cloud at Praetorian, influencing the direction of its offensive-security-focused Cloud Security practice and the Cloud features of Praetorian's flagship product, Chariot. He considers himself lucky to have the privilege of working with the talented cyber operators at Praetorian and considers it the highlight of his career.

Tim is highly passionate about helping organizations fix Cloud Security problems as they are found, the first time, and most importantly, the People/Process/Technology challenges that cause them in the first place. In his spare time, he embarks on adventures with his wife and ensures that their two feline bundles of joy have the best playtime and dining experiences possible.

Links Referenced:

Praetorian: https://www.praetorian.com/
LinkedIn: https://www.linkedin.com/in/timgondajr/
Praetorian Blog: https://www.praetorian.com/blog/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Thinkst Canary. Most companies find out way too late that they've been breached. Thinkst Canary changes this. Deploy Canaries and Canarytokens in minutes and then forget about them. Attackers tip their hand by touching them, giving you the one alert, when it matters. With zero admin overhead and almost no false positives, Canaries are deployed (and loved) on all seven continents.
Check out what people are saying at canary.love today!

Corey: Kentik provides Cloud and NetOps teams with complete visibility into hybrid and multi-cloud networks. Ensure an amazing customer experience, reduce cloud and network costs, and optimize performance at scale — from internet to data center to container to cloud. Learn how you can get control of complex cloud networks at www.kentik.com, and see why companies like Zoom, Twitch, New Relic, Box, Ebay, Viasat, GoDaddy, booking.com, and many, many more choose Kentik as their network observability platform.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while, I like to branch out into new and exciting territory that I've never visited before. But today, no, I'd much rather go back to complaining about cloud security, something that I tend to do an awful lot. Here to do it with me is Tim Gonda, Technical Director of Cloud at Praetorian. Tim, thank you for joining me on this sojourn down what feels like an increasingly well-worn path.

Tim: Thank you, Corey, for having me today.

Corey: So, you are the Technical Director of Cloud, which I'm sort of short-handing to: okay, everything that happens on the computer is henceforth going to be your fault. How accurate is that in the grand scheme of things?

Tim: It's not too far off. But at Praetorian, we like to call it “nebula.” The nebula meaning that it's Schrödinger's problem: it both is and is not the problem. Here's why. We have a couple key focuses at Praetorian, some of them focusing on more traditional pen testing, where we're looking at hardware: hit System A, hit System B, branch out, get to goal.

On the other side, we have hitting web applications and [unintelligible 00:01:40]. This insecure app leads to this XYZ vulnerability, or this medical appliance is insecure and therefore we're able to do XYZ item.
One of the things that frequently comes up is that more and more organizations are no longer putting their applications or infrastructure on-prem, so some part of the assessment ends up being in the cloud. And that is the unique rub that I'm in: I'm responsible for leading the direction of the cloud security focus group, who may not dive into a specific specialty that some of these other teams might dig into, but may have similar responsibilities or a similar engagement style.

And in this case, if we discover something in the cloud as an issue, or even in your own organization, where you have a cloud security team, a web application security team, and your core information security team that defends your environment in many different methods, many different means, you'll frequently find that the cloud security team is the hot button for: hey, the server was misconfigured at one certain level; however, the cloud security team didn't quite know that this web application was vulnerable. We did know that it was exposed to the internet, but we can't necessarily turn off all web applications from the internet, because that would no longer serve the purpose of a web application. And we also may not know that a particular underlying host's patch is out of date, because technically, that would be siloed off into another problem.

So, what ends up happening is that on almost every single incident that involves a cloud infrastructure item, you might find that cloud security will be right there alongside the incident responders. And yep, this [unintelligible 00:03:20] is here, it's exposed to the internet via here, and it might have the following application on it. And they get cross-exposure with other teams that say, “Hey, your web application is vulnerable.
We didn't quite inform the cloud security team about it; otherwise this wouldn't be allowed to go to the public internet,” or, on the infrastructure side, “Yeah, we didn't know that there was a patch underneath it. We figured that we would let the team handle it at a later date, and therefore this is also vulnerable.” And what ends up happening sometimes is that the cloud security team might be the onus, or might be the hot button in the room, saying, “Hey, it's broken. This is now your problem. Please fix it by changing cloud configurations or directing a team to make this change on our behalf.”

So, in essence, sometimes cloud becomes—it both is and is not your problem when a system is either vulnerable, or exposed, or, at some point, worst-case scenario, ends up being breached and you're performing incident response. That's one of the cases why it's important to know—or important to involve others in the cloud security problem, or to be very specific about what the role of a cloud security team is, or where cloud security has to have certain boundaries or has to have certain extra parties involved in the process. Or, when it does its own threat modeling process, to say that, okay, we have to take a look at certain cloud findings, or findings that are within our security realm, and say that for these misconfigurations or these items, we have to treat the underlying components as if they are vulnerable, whether or not they are, and we have to report on them as if they are vulnerable, even if it means that a certain component of the infrastructure has to already be assumed to either have a vulnerability or have some sort of misconfiguration that allows an outside attacker to execute attacks against whatever the [unintelligible 00:05:06] is.
And we have to adjust our security posture and response accordingly.

Corey: One of the problems that I keep running into, and I swear it's not intentional, but people would be forgiven for understanding or believing otherwise, is that I will periodically inadvertently point out security problems via Twitter. And that was never my intention because, “Huh, that's funny, this thing isn't working the way that I would expect that it would,” or, “I'm seeing something weird in the logs in my test account. What is that?” And, “Oh, you found a security vulnerability or something akin to one in our environment. Oops. Next time, just reach out to us directly at the security contact form.” That's great, if I'd known I was stumbling blindly into a security approach, but it feels like the discovery of these things is not heralded by an, “Aha, I found it,” but a, “Huh, that's funny.”

Tim: Of course. Absolutely. And that's where some of the best vulnerabilities come from: where you accidentally stumble on something that says, “Wait, does this work how—what I think it is?” Click click. Like, “Oh boy, it does.”

Now, I will admit that certain cloud providers are really great with proactive security reach-outs. If you either just file a ticket or some other form of notification, or just even flag your account rep and say, “Hey, when I was working on this particular cloud environment, the following occurred. Does this work the way I think it does? Is this a problem?” they usually get back to you after reporting it to their internal team, so on and so forth.
But let's say it's applications or open-source frameworks, or even just organizations at large where you might have stumbled upon something: the best thing to do is either look up whether they have a public bug bounty program, whether they have a security contact or a reach-out form that you can email, or whether you know someone at the organization that you can just send a quick email saying, “Hey, I found this.”

And some combination of those is usually the best way to go. And be able to provide context to the organization, being, “Hey, the following exists.” And the most important thing to consider when you're sending this sort of information is that they get these sorts of emails almost daily.

Corey: One of my favorite genres of tweet is when Tavis Ormandy of Google's Project Zero winds up doing a tweet like, “Hey, do I know anyone over at the security apparatus at insert company here?” It's like, “All right. I'm sure people are shorting stocks now [laugh], based upon whatever he winds up doing.”

Tim: Of course.

Corey: It's kind of fun to watch. But there's no cohesive way of getting in touch with companies on these things, because as soon as you'd have something like that, it feels like it's subject to abuse, where: Comcast hasn't fixed my internet for three days, now I'm going to email their security contact instead of going through the normal preferred process of waiting in the customer queue so they can ignore you.

Tim: Of course. And that's something else you want to consider. If you broadcast that a security vulnerability exists without letting the entity or company know, you're also almost giving a green light, where other security researchers are going to go dive in on this and see, like, one, does this work how you described? But that actually is a positive thing at some point, where either you're unable to get the company's attention, or maybe it's an open-source organization, or maybe you're not fully sure that something is the case.
However, when you do submit something to the customer and you want them to take it seriously, here's a couple of key things that you should consider.

One, provide evidence that whatever you're talking about has actually occurred. Two, provide repeatable steps, in layman's terms, that even an IT support person can attempt to follow, so that they can repeat the same vulnerability or the same security condition. And three, most importantly, detail why this matters. Is this something where I can adjust a user's password? Is this something where I can extract data? Is this something where I'm able to extract content from your website that I otherwise shouldn't be able to? And that's important for the following reason.

You need to inform the business of the financial value: why leaving this unpatched becomes an issue for them. And if you do that, that's how those security vulnerabilities get prioritized. It's not necessarily because the coolest vulnerability exists; it's because it costs the company money, and therefore the security team is going to immediately jump on it and try to contain it before it costs them any more.

Corey: One of my least favorite genres of security report are the ones that I get where, “I found a vulnerability.” It's like, that's interesting. I wasn't aware that I ran any public-facing services, but all right, I'm game; what have you got? And it's usually something along the lines of, “You haven't enabled SPF to hard-fail email that doesn't wind up originating explicitly from this list of IP addresses. Bug bounty, please.” And it's, “No, genius. That is very much an intentional choice. Thank you for playing.”

It comes down to an idea of: whenever I have reported security vulnerabilities in the past, the pattern I always take is, “I'm seeing something that I don't fully understand.
I suspect this might have security implications, but I'm also more than willing to be proven wrong." Because showing up with, "You folks are idiots and have a security problem," is a terrific invitation to be proven wrong and look like an idiot. Because the first time you get that wrong, no one will take you seriously again.

Tim: Of course. And you'll find with most bug bounty programs, if you participate in those, that for the first couple you submit, the customer might even tell you, "Yeah, we're aware that that vulnerability exists, however, we don't view it as a core issue and it cannot affect the functionality of our site in any meaningful way, therefore we're electing to ignore it." Fair.

Corey: Very fair. But then when people write up about those things, well, they've decided this is not an issue, so I'm going to do a write-up on it. Like, "You can't do that. The NDA doesn't let you expose that." "Really? Because you just said it's a non-issue. Which is it?"

Tim: And the key to that, I guess, would also be: is there an underlying technology that doesn't necessarily have to be attributed to said organization? Can you also say that, if I provide a write-up or if I put up my own personal blog post—let's say we go back to some of the OpenSSL vulnerabilities, including OpenSSL 3.0, that came out not too long ago, but since that's an open-source project, it's fair game—let's just say that if there was a technology such as that, or maybe there's a wrapper around it that another organization could be using or implementing a certain way, you don't necessarily have to call the company out by name, but rather just say, here's the core technology reason, here's the core technology risk, and here's the way I've demoed exploiting this.
And if you publish an open blog post like that and then you tweet about it, you can actually gain security support around the issue and then further the research.

An example would be that I know a couple of pen testers who have reported things in the past, and while the first time they reported it, the company was like, "Yeah, we'll fix it eventually," later, when another researcher reported this exact same finding, the company was like, "We should probably take this seriously and jump on it." Sometimes it's just getting in front of that and providing frequency, or providing enough people to say, "Hey, this really is an issue in the security community and we should probably fix this item," and keep pushing organizations on it. A lot of times, they just need additional feedback. Because as you said, somebody runs an automated scanner against your email and says, "Oh, you're not checking SPF as strictly as the scanner would have liked," because it's a benchmarking tool. It's not necessarily a security vulnerability; rather, it's just how you've chosen to configure something, and if it works for you, it works for you.

Corey: How does cloud change this? Because a lot of what we talked about so far could apply to anything. Go back in time to 1995 and a lot of what we're talking about mostly holds true. It feels like cloud acts as a significant level of complexity on top of all of this. How do you view the differentiation there?

Tim: So, I think it differentiates two things. One, certain services or certain vulnerability classes that are handled by the shared responsibility model—for the most part—are probably secured better than you might be able to do yourself, just because there's a lot of research and the team has [experimented 00:13:03] a lot of time on this.
An example: if there's a particular, like, spoofing or network interception vulnerability that you might see on a local LAN network, you probably are not going to have the same level of access to be able to execute that on a virtual private cloud or VNet, or some other virtual network within a cloud environment. Now, something that does change with the paradigm of cloud is the fact that if you accidentally publicly expose something, or don't set a setting to be private or specific to your resources, a couple of things can happen. The vulnerability's exploitability increases: something that used to be just, "Hey, I left a port open on my own network. Somebody from HR or somebody from IT could possibly interact with it," is now exposed to the entire world, to people that might have the resources or motivations to go after this product. And using services like Shodan—which are continually mapping the internet for open resources—they can quickly grab that and say, "Okay, I'm going to attack these targets today," and might continue to poke a little bit further. Maybe it's an internal person that might be bored at work, or a pen tester just on one specific engagement.
Especially in the case where, let's say, what you're working on has sparked the interest of a nation-state and they want to dig in a little bit further, they have the resources to dedicate time, people, and maybe tools and tactics against whatever vulnerability you've exposed—maybe there's a specific ID in a URL that just needs to be guessed right to give them access to something—and they might spend the time trying to brute force that URL, brute force that value, and eventually try to go after what you have.

The main paradigm shift here is that there are certain things we might consider less of a priority because the cloud has already taken care of them with the shared responsibility model, and rightfully so, and there are other times we have to take heightened awareness, where we've either exposed something to the entire internet or to all cloud accounts in creation. And that's actually something we see commonly. In fact, one thing we see very commonly is: all AWS users, regardless of whether it's in your account or somewhere else, might have access to your SNS topic or SQS queue. Which doesn't seem like that big of a vulnerability: sure, I can change your messages, delete your messages, view your messages. But rather, what's connected to those? Let's talk about Lambda functions, where I've got source code that a developer has written to handle those messages and may not have built in logic to handle them safely—maybe there was a piece of code that could be abused as part of this message that might allow an attacker to send something to your Lambda function and then execute something on that attacker's behalf.

You weren't aware of it, you weren't thinking about it, and now you've exposed it to almost the entire internet.
And since anyone can go sign up for an AWS account—or an Azure or GCP account—they're able to start poking at that same piece of code that you might have developed thinking, "Well, this is just for internal use. It's not a big deal. That one static code analysis finding probably isn't too relevant." Now, it becomes hyper-relevant, something you have to consider with a little more attention and dedicated time, making sure the things you've written or deployed are, in fact, safe. Because once something is misconfigured or mis-exposed, suddenly the entire world starts knocking at it, and it may very well become a problem. The severity of that issue could increase dramatically.

Corey: As you take a look across, let's call it the hyperscale clouds, the big three—which presumably I don't need to define out—how do you wind up ranking them in terms of security from top to bottom? I have my own rankings that I like to dole out and basically, this is the, let's offend someone at every one of these companies, no matter how we wind up playing it. Because I will argue with you just on principle on them. How do you view them stacking up against each other?

Tim: So, an interesting view on that is based on who's been around longest and who has encountered the most technical debt. A lot of these security vulnerabilities or security concerns may stem from a decision made long ago that might have made sense at the time, and now the company is kind of stuck with that particular technology or decision or framework, and is having to apply security Band-Aids to that process until it gets resolved.
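The SQS misconfiguration Tim describes can be checked mechanically: an Allow statement whose Principal is the wildcard "*", with no Condition narrowing it, is open to any AWS account. A minimal sketch of that check in Python (the queue policy below is a hypothetical example, not any real queue's):

```python
import json

def open_to_any_aws_account(policy_json: str) -> list:
    """Return the Sids of Allow statements that grant access to any
    principal without a Condition restricting who can call them."""
    policy = json.loads(policy_json)
    risky = []
    for stmt in policy.get("Statement", []):
        principal = stmt.get("Principal")
        wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*"
        )
        if stmt.get("Effect") == "Allow" and wildcard and not stmt.get("Condition"):
            risky.append(stmt.get("Sid", "<no Sid>"))
    return risky

# Hypothetical world-readable queue policy: the misconfiguration Tim describes.
queue_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAll",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "sqs:SendMessage",
        "Resource": "arn:aws:sqs:us-east-1:123456789012:demo-queue",
    }],
})
print(open_to_any_aws_account(queue_policy))  # ['AllowAll']
```

Real-world scanners do considerably more (condition-key analysis, cross-account role chains); this only flags the blunt case of an unconditioned wildcard principal.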
I would say, ironically, AWS is actually at the top of having that technical debt, and it has so many different types of access policies that are very complex to configure and not very user-intuitive unless you intuitively speak JSON or YAML or some other markup language, to be able to tell whether or not something was actually set up correctly. Now, there are a lot of security experts who make their money based on knowing how to configure these, or being able to assess whether or not they are actually the issue. So by default, by design, I would actually place AWS on the lower end of the big three, based on complexity and ease of configuration.

The next one that would also go into that pile, I would say, is probably Microsoft Azure, who [sigh] admittedly decided to say, "Okay, let's take something that was very complicated and that everyone really loved to use as an identity provider, Active Directory, and try to use that as a model." Even though they made it extensively different—it is not the same as on-prem Active Directory—they used it as the framework for how people would configure their identity provider for a new cloud provider.

The one I would actually say comes out on top, just based on use and based on complexity, might be Google Cloud. They came to a lot of these security features first. They're acquiring new companies on a regular basis, with the acquisition of Mandiant, the creation of their own security tooling, their own unique security approaches. In fact, they probably wrote the book on Kubernetes security. They would be on top, I guess, from a usability standpoint, as in saying that I don't want to have to manage all these different types of policies; here are some buttons I would like to flip, and I'd like my resources, for the most part by default, to be configured correctly.
And Google does a pretty good job of that.

Also, one of the things they do really well is entity-based role assumption. Inside of AWS, you provide access keys by default, or I have to provide a role ID—or in Azure, I'm going to say, "Here's a [unintelligible 00:19:34] policy for something specific that I want to grant access to a specific resource." Google does a pretty good job of saying, okay, everything is treated as an email address. This email address can be associated in a couple of different ways; it can be given the following permissions, it can have access to the following things, but, for example, if I want to remove access to something, I just take that email address off of whatever access policy I had somewhere, and then it's taken care of. But they do have some other items, such as their design of least privilege, which is something to be expected when you consider their hierarchy.

I'm not going to say that they're without fault in that area. Until something more recent, as far as finding certain key pieces of, like, say, tags or something within a specific sub-project in their hierarchy, there were cases where you might have granted access at a higher level and that same level of access came all the way down, where least privilege is required to be enforced, otherwise you break their security model. So, I like them for how simple it is to set up security at times; however, they've also made it unnecessarily complex at other times, so they don't have the flexibility that the other cloud service providers have.
On the flip side of that, the level of flexibility also leads to complexity at times, which I also view as a problem, where customers think they've done something correctly based on their best knowledge, the best documentation, the best Medium articles they've been researching, and what they've actually done is inadvertently made assumptions that led to core anti-patterns in, [unintelligible 00:21:06], what they've deployed.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: I think you're onto something here. When I've been asked historically and personally to rank security, I have viewed Google Cloud as number one, and AWS as number two. And my reasoning behind that has been that, from an absolute security of their platform and a pure, let's call it math, perspective, it really comes down to which of the two of them had what for breakfast on any given day; they're that close. But in a project that I spin up in Google Cloud, everything inside of it can talk to each other by default and I can scope that down relatively easily, whereas over in AWS land, by default, nothing can talk to anything. And that means that every permission needs to be explicitly granted, which in an absolutist sense and in a vacuum, yeah, that makes sense, but here in reality, people don't do that. We've seen a number of AWS blog posts over the last 15 years—they don't do this anymore—but it started off with, "Oh, yeah, we're just going to grant [* on * 00:22:04] for the purposes of this demo."

"Well, that's horrible.
Why would you do that?" "Well, if we wanted to specify the IAM policy, it would take up the first third of the blog post." How about that? Because customers go through that exact same thing. I'm trying to build something and ship.

I mean, the biggest lie in any environment or any codebase ever is the comment that starts with "TODO." Yeah, that is load-bearing. You will retire with that TODO still exactly where it is. You have to make doing things the right way at least the least frictionful path, because no one is ever going to come back and fix this after the fact. It's never going to happen, as much as we wish that it did.

Tim: At least until the week after the breach, when it is highlighted by the security team: "Hey, this was the core issue." Then it will be fixed in short order. Usually. Or a Band-Aid is applied to say that this can no longer be exploited in this specific way again.

Corey: My personal favorite thing—like, I wouldn't say it's a lie, but the favorite thing that I see in all of these announcements right after the, "Your security is very important to us," right after it very clearly has not been sufficiently important to them—is they say, "We show no signs of this data being accessed." Well, that can mean a couple different things. It can mean, "We have looked through the audit logs for a service going back to its launch and have verified that nothing has ever done this except the security researcher who found it." Great. Or it can mean, "What even are logs, exactly? We're just going to close our eyes and assume things are great." No, no.

Tim: So, one thing to consider there is that that entire communication has probably been vetted by the legal department to make sure that the company is not opening itself up to liability.
I can say from personal experience, when that has occurred, unless it can be proven that the breach was attributable to your user specifically, the default response is, "The security response team of XYZ organization has determined that your data was not at risk at any point during this incident." Which might be true—and we're quoting Star Wars on this one—from a certain point of view.

And unfortunately, in the case of a post-breach, their security, at least from a regulation standpoint where they might be facing a really large fine, absolutely probably is their top priority at this very moment, but it hadn't come to the surface before because, for most organizations, until there is a financial reason to act, until their reputation is on the line, they're not necessarily incentivized to fix it. They're incentivized to push more products, push more features, keep the clients happy. And a lot of the time, going back and saying, "Hey, we have this piece of technical debt," doesn't really excite the user base or help the company gain a competitive edge in the market; it's considered an afterthought until the crisis occurs. And then the information security team rejoices, because this is the time they actually get to see their stuff fixed, even though it might be a super painful time for them in the short run. They get to see these things fixed, they get to see them put to bed. And if there's ever a happy medium where, hey, maybe there was a legacy feature that wasn't being very well taken care of, or maybe this feature was also causing the security team a lot of pain, we get to see that feature, that item, that service get better, as well as the security team not having to be woken up on a regular basis because XYZ incident happened or XYZ item keeps coming up in a vulnerability scan. If it finally is put to bed, we consider that a win for all.
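Corey's "what even are logs" point is the crux of any "no signs of this data being accessed" claim: it is only verifiable against an actual audit trail. A toy sketch of that verification, assuming simplified event records rather than the real CloudTrail schema:

```python
# Hypothetical, simplified audit records; real CloudTrail events carry many more fields.
events = [
    {"principal": "ci-deployer", "action": "s3:GetObject", "resource": "customer-data"},
    {"principal": "security-researcher", "action": "s3:GetObject", "resource": "customer-data"},
    {"principal": "ci-deployer", "action": "s3:PutObject", "resource": "build-artifacts"},
]

def unexpected_access(events, resource, expected_principals):
    """Return events touching `resource` from principals we can't account for."""
    return [
        e for e in events
        if e["resource"] == resource and e["principal"] not in expected_principals
    ]

hits = unexpected_access(events, "customer-data", {"ci-deployer"})
print([e["principal"] for e in hits])  # ['security-researcher']
```

In practice this would run as a query over CloudTrail data in Athena or a SIEM; the point is that "no signs of access" should be the result of a query, not just a press-release sentence.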
And one thing to consider in security as well, when we talk about the relationship between developers and security, or product managers and security, is that if we can make it a win-win-win situation for all, that's the happy path we really want to get to. If there's a way we can make the experience better for customers, the security team doesn't have to be woken up on a regular basis because an incident happened, and the developers receive less friction when they want to go implement something, you find that that secure feature, function, whatever, tends to be the happy path forward and the path of least resistance for everyone around it. And those are sometimes the happiest stories that can come out of some of these incidents.

Corey: It's weird to think of there being any happy stories coming out of these things, but it's definitely one of those areas where there are learnings to be had, if we're willing to examine them. The biggest problem I see so often is that so many companies just try and hide these things. They give the minimum possible amount of information so the rest of us can't learn from it. Honestly, some of the moments where I've gained the most respect for the technical prowess of some of these cloud providers have been after there's been a security issue and they have disclosed either their response or why it was a non-issue because they took a defense-in-depth approach. It's really one of those transformative moments that I think is an opportunity, if companies are bold enough to chase them down.

Tim: Absolutely. And in a similar vein, think of certain cloud provider outages that exposed a major core flaw of their design—and these outages can be similar and analogous to an incident or a security flaw, meaning that it affected us. It was something that actually happened.
In the case of, let's say, the S3 outage of, I don't know, 2017 or 2018, it turns out there was a core DNS system inside of us-east-1—which is actually very close to where I live—that, for whatever reason, malfunctioned and caused a major outage. Coming out of that specific example, they had to look at ways of not having a single point of failure, even in a very robust system, to make sure this doesn't happen again.

And there were a lot of learnings to be had, a lot of in-depth investigation, probably a lot of development, a lot of research. Sometimes on the outside of an incident, you really get to understand why a system was built a certain way or why a condition exists in the first place. And it can be fascinating to dig into that deeper and really understand what the core problem is. And now that we know what the issue is, we can actually work to address it. That's actually one of the best parts about working at Praetorian in some cases: a lot of the items we find, we get to find early, before they become one of these issues, and the most important thing is we get to learn so much about why a particular issue is such a big problem. And you have to really solve the core business problem, or maybe even help inform, "Hey, this is an issue for reasons like this." However, this isn't necessarily all bad, in that if you make these adjustments, you get to retain this really cool feature, this really cool thing that you built, and there are some extra added benefits to the customers that weren't there before. And—much like the old adage of, "It's not a bug, it's a feature"—sometimes it's exactly what you pointed out. It's not necessarily all bad in an incident. It's also a learning experience.

Corey: Ideally, we can all learn from these things.
I want to thank you for being so generous with your time and talking about how you view this increasingly complicated emerging space. If people want to learn more, where's the best place to find you?

Tim: You can find me on LinkedIn, which will be included in this podcast's description. You can also go look at the articles the team is putting together at praetorian.com. Unfortunately, I'm not very big on Twitter.

Corey: Oh, well, you must be so happy. My God, what a better decision you're making than the rest of us.

Tim: Well, I like to run a little bit under the radar, except on opportunities like this where I can talk about something I'm truly passionate about. But I try not to pollute the airwaves too much. LinkedIn is a great place to find me, the Praetorian blog for stuff the team is building, and if anyone wants to reach out, feel free to hit the contact page at praetorian.com. That's one of the best places to get my attention.

Corey: And we will, of course, put links to that in the [show notes 00:30:19]. Thank you so much for your time. I appreciate it. Tim Gonda, Technical Director of Cloud at Praetorian. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment talking about how no one disagrees with you based upon a careful examination of your logs.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Life of a Fellow Niche Internet Micro Celebrity with Matt Margolis


Jan 5, 2023 · 36:36


About Matt

Matt is the head of community at Lawtrades, a legal tech startup that connects busy in-house legal departments with flexible on-demand legal talent. Prior to this role, Matt was the director of legal and risk management at a private equity group down in Miami, Florida.

Links Referenced:

Lawtrades: https://www.lawtrades.com/
Instagram: https://www.instagram.com/itsmattslaw/
TikTok: https://www.tiktok.com/@itsmattslaw
Twitter: https://twitter.com/ItsMattsLaw
LinkedIn: https://www.linkedin.com/in/flattorney/
duckbillgroup.com: https://duckbillgroup.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup. It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it. My thanks to Google Cloud for sponsoring this ridiculous podcast.

Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. Things break, configurations drift, technology advances, and organizations, frankly, need to evolve.
How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it. Logicworks combines cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworks.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Something that I've learned in my career as a borderline full-time shitposter is that as the audience grows, people tend to lose sight of the fact that no, no, the reason that I have a career is because I'm actually good at one or two specific things, and that empowers the rest of the shitposting, gives me a basis from which to stand. Today's guest is Matt Margolis, Head of Community at Lawtrades. And I would say he is also a superior shitposter, but instead of working in the cloud space, he works in the legal field. Matt, thank you for joining me.

Matt: That was the nicest intro I've ever received in my entire career.

Corey: Well, yes, usually because people realize it's you and slam the door in your face, I assume, just based upon some of your TikToks. My God. Which is—I should point out—where I first encountered you.

Matt: You found me on TikTok?

Corey: I believe so. It sends me down these really weird rabbit holes, and at first, I was highly suspicious of the entire experience. Like, it's showing ADHD videos all the time, and as far as advertisements go, it's, "Oh, my God, they're doing this really weird tracking." Like, no, no, they just realize I'm on TikTok. It's that dopamine hit that works out super well. For a while, it drifted me into lesbian TikTok—which is great—because apparently, I follow a lot of creators who are not men, but I also don't go for the whole thirst trap thing. Like, who does that? That's right. Must be lesbians. Which, great, I'm in good company.
And it really doesn't know what to make of me. But you show up on my feed with fairly consistent frequency. Good work.

Matt: That is fac—I appreciate that. I don't know if that's a compliment, though. But I [laugh]—no, I appreciate it. You know, for me, not to plug a friend, but Alex Su's TikToks are probably, like, one in two, and then the other person is—maybe I'm also on lesbian TikTok as well. I think maybe we have earned the similar vote here.

Corey: In fact, there are cohorts that they slot people into and I feel like we're right there together. Though Alex Su, who has been on the show as well—talk about a source of frustration. I mentioned in passing to my wife, who's an attorney, that I was going to be chatting with him. And she lit up. Like, "Oh, my God, you know him? My girlfriends and I talk about him all the time." And I was sitting there going, well, there better damn well be a subculture out there that talks about me in those glowing terms, because he's funny, yes, but he's not that funny. My God. And don't tell him that. It'll go to his head.

Matt: I say the same thing. I got a good one for you. I was once on a sales call, and I remember saying, "You know, I'm, like, pretty decent on Twitter. I'm pretty decent on LinkedIn"—which I don't think anyone brags about, but I do—"and I'm okay on, like, Instagram and TikTok." And he goes, "That's cool. That's really cool. So, are you kind of like Alex? Like, Alex Su?" And I go, "Uh, yeah." He goes, "Yeah, because he's really funny. He's probably the best lawyer out there that, you know, shitposts and posts funny things on the internet." And I just sat there—and I love Alex; he's a good friend—I just sat there, and I'm like, "All right. All right. This is a conversation about Alex. This isn't a conversation about Matt." And I took it in stride. I called Alex immediately after. I'm like, "Hey, you want to hear something funny?" And he got a kick out of it.
He certainly got a kick out of it.

Corey: It's always odd to me, just watching my own reputation come back to me filtered through other people's perceptions whenever I wind up encountering people in the wild, and they say, "Oh, you're Corey Quinn"—which is usually my clue to look at them very carefully with my full attention, because if their next words are, "I work at Amazon," that's my cue to duck before I get punched in the face. Whereas in other cases, they're like, "Oh, yeah, you're hilarious on the Twitters," or, "I saw you give a conference talk years ago," or whatever it is. But no one ever says the stuff that's actually intellectually rigorous. No one ever says, "Yeah, I read some of your work on AWS contract negotiation," or, "In-depth bill analysis as mapped to architecture." Yeah, yeah. That is not the stuff that sticks in people's heads. It's, "No, no, the funny guy with his mouth wide open on the internet." It's, "Yep, that's me. The human flytrap."

Matt: Yeah, I feel that. I've been described, I think, as a party clown. That comes up from time to time. And to your point, Corey, I get that all the time, where someone will say, "Matt, I really enjoyed that meme you posted, the TikTok, the funny humor." And then every so often, I'll post, gosh, like, an article about something we're doing, maybe a white paper on commercial contracting, or some sort of topic that really fits into my wheelhouse, and people are like, "That's… I guess that's cool.
I just thought you were a party clown." And you know, I make the balloon animals but… not all the time.

Corey: That's the weirdest part to me of all of this: just this weird experience where we become the party clowns, and that is what people view us as. But peeling away the humor and the jokes and the things we do for engagement—while we're sitting here each trying to figure out the best way to light ourselves on fire and survive the experience because the views would be enormous—you do have a legal background. You are an attorney yourself—still are, if I understand the process properly. I personally have an eighth-grade education, so basically, what I know of bars is a little bit of a different context.

Matt: I also know those bars. I'm definitely a fan of those bars as well. I am still an attorney. I was in private practice, I worked in the government, and I then went in-house in private equity down in Miami, Florida. And now, though I am a shitposter, you are right, I am still a licensed attorney in the state of Florida. I could not take a bar exam anywhere else because I would probably light myself on fire. But yeah, I am. I am still an attorney.
I mean, most of my interaction with lawyers in a professional context when it comes to content takes a lot more of the form of a cease and desist than it does conversations like this. Thanks for not sending one of those, by the way, so far. It's appreciated.

Matt: [laugh]. No worries, no worries. The day is not over yet. First off, Corey, I'm going to do a thing that attorneys love doing is I'm going to steal what you just said and I'm going to use it later because that was stellar.

Corey: They're going to license it, remember?

Matt: License it.

Corey: That's how this works.

Matt: Copy and paste it. I'm going to re—it's precedent now. I agree with you wholeheartedly. I see it online, I see it on Link—LinkedIn is probably the best example of it; I sometimes see it on Twitter—older attorneys, attorneys that are part of that old guard, see what we're doing, what we're saying, the jokes we're making—because behind every joke is a real issue, a real thing, right? The reason why we laugh, at least for some of these jokes, is we commiserate over it. We're like, "That's funny because it hurts."

And a lot of these old-guard attorneys hate it. Do not want to talk about it. They've been living good for years. They've been living under this regime for years and they don't want to deal with it. 
And attorneys like myself who are making these jokes, who are shitposting, who are bringing light to these kinds of things are really, I would say dis—I hate to call myself a disrupter, but are disrupting the traditional buttoned-up attorney lifestyle and world.

Corey: It's wild to me, just to see how much of this winds up echoing my own experiences in dealing with, shall we say, some of the more—I don't use legacy, which is a condescending engineering term for ‘it makes money'—but some of the older enterprise companies that had the temerity to found themselves before five years ago in somewhere that wasn't San Francisco and build things on computers that weren't rented by the gigabyte-month from various folks in Seattle. It's odd talking to some of those folks, and I've heard from a number of people, incidentally, that they considered working with my company, but decided not to because I seem a little too lighthearted and that's not how they tend to approach things. One of the nice things about being a boutique consultant is that you get to build things like this to let the clients that are not likely to be a good fit self-select out of working with you.

Matt: It's identical to law.

Corey: Yeah. "Aren't you worried you're losing business?" Like, "Oh, don't worry. It's not business I would want."

Matt: I'm okay with it. I'll survive. Yeah, like, the clients that are great clients, you're right, will be attracted to it. The clients that you never wanted to approach, they probably were never going to approach you anyways, are not [laugh] going to approach you. So, I agree wholeheartedly. I was always told lawyers are not funny. I've been told that at jobs, conferences, events—

Corey: Who are you hanging out with, doctors?

Matt: [laugh]. Dentists. The funniest of doctors. And I've been told that just lawyers aren't funny, right? So, lawyers shouldn't be funny; that's not how they should present themselves.

You're never going to attract clients. 
You're never going to engage in business development. And then I did. And then I did because people are attracted by funny. People like the personality. Just like you, Corey, people enjoy you, enjoy your company, enjoy what you have to do because they enjoy being around you and they want to continue via, you know, like, a business relationship.

Corey: That's part of the weird thing from where I sit, where it's this—no matter what you do or where you sit, people remain people. And one of the big eye-openers for me that happened, fortunately early in my career, was discovering that a number of execs at name brand, publicly traded companies—not all of them, but a good number; the ones you'd want to spend time with—are in fact, human beings. I know, it sounds wild to admit that, but it's true. And they laugh, they tell stories themselves, they enjoy ridiculous levels of nonsense that tends to come out every second time I open my mouth. But there's so much that I think people lose sight of. "Oh, they're executives. They only do boring and their love language is PowerPoint." Mmm, not really. Not all of them.

Matt: It's true. Their love language sometimes is Excel. So, I agree [laugh].

Corey: That's my business partner.

Matt: I'm not good at Excel, I'll tell you that. But I hear that as well. I hear that in my own business. So, I'm currently at a place called Lawtrades, and for the listeners out there, if you don't know who Lawtrades is, this is the—I'm not a salesperson, but this is my sales spiel.

Corey: It's a dating site for lawyers, as best I can tell.

Matt: [laugh]. It is. Well, I guess close. I mean, we are a marketplace. 
If you're a company and you need an attorney on a fractional basis, right—five hours, ten hours, 15 hours, 20 hours, 40 hours—I don't care, you connect.

And what we're doing is we're empowering these freelance attorneys and legal professionals to kind of live their life, right, away from the old guard, having to work at these big firms to work at big clients. So, that's what we do. And when I'm in these conversations with general counsels, deputy general counsels, heads of legal at these companies, they don't want to talk like you're describing, this boring, nonsense conversation. We commiserate, we talk about the practice, we talk about stories, war stories, funny things about the practice that we enjoy. It's not a conversation about business; it's a conversation about being a human being in the legal space. It's always a good time, and it always results in a long-lasting relationship that I personally appreciate more than—probably more than they do. But [laugh].

Corey: It really comes down to finding the watering holes where your humor works. I mean, I made the interesting choice one year to go and attend a conference for CFOs and the big selling point of this conference was that it counts as continuing professional education, which as you're well aware, in regulated professions, you need to attend a certain number of those every so often, or you lose your registration slash license slash whatever it is. My jokes did not work there. Let's put it that way.

Matt: [laugh]. That's unfortunate because I'm having trouble keeping a straight face as we do this podcast.

Corey: It was definitely odd. I'm like, "Oh, so what do you do?" Like, "Oh, I'm an accountant." "Well, that's good. I mean, assume you don't bring your work home with you and vice versa. I mean, it's never a good idea to hook up where you VLOOKUP."

And instead of laughing—because I thought as Excel jokes go, that one's not half bad—instead, they just stared at me and then walked away. All right. 
Sorry, buddy, I didn't mean to accidentally tell a joke in your presence.

Matt: [laugh]. You're setting up all of my content for Twitter. I like that one, too. That was really good.

Corey: No, no, it comes down to just being a human being. And one of the nice things about doing what I've done—I'm curious to get your take on this—is that for the first time in my career doing what I do now, I feel like I get to bring my whole self to work. That is not what it means in a lot of the ways it's commonly used. It doesn't mean I get to be problematic and make people feel bad as individuals. That's just being an asshole; that's not bringing your whole self to work.

But it also means I feel like I don't have to hide, I can bring my personality with me, front and center. And people are always amazed by how much like my Twitter personality I am in real life. And yeah, because I can't do a bit for this long. I don't have that kind of attention span, for one. But the other side of that, too, is it does exaggerate certain elements, and it's always my highs, never my lows.

I'm curious to know how you wind up viewing how you present online with who you are as a person.

Matt: That is a really good question. Similar. Very similar. I do some sort of exaggeration. The character I like to play is ‘Bad Associate.' It's, like, one of my favorite characters to play where it's like, if I was the worst version of myself, in practice, what would I look like?

And those jokes to me always make me laugh because I always—you know, you have a lot of anxiety when you practice. That's just an aspect of the law. So, for me, I get to make jokes about things that I thought I was going to do or sound like or be like, so it honestly makes me feel a little better. But for the humor itself and how I present online, especially on Twitter, my boss, one of my co-founders, put it perfectly. 
And we had met for a conference, and—first time in person—and he goes, "You're no different than Twitter, are you?" I go, "Nope." And he goes, "That's great."

And he really appreciated that. And you're right. I felt like I presented my whole personality, my whole self, where in the legal profession, in private practice, it was not the case. Definitely not the case.

Corey: Yeah, and sometimes I talk in sentences that are more than 280 characters, which is, you know, a bad habit.

Matt: Sometimes. I have a habit from private practice that I can't get rid of, and I ask very aggressive depo questions like I'm deposing somebody. If you're listening in, can you write me on Twitter and tell me if you're a litigator and you do the same thing? Because, like, I will talk to folks, and they're like, "This isn't an interview or like a deposition." I'm like, "Why? Why isn't it?" And it [laugh] gets really awkward really quickly. But I'm trying to break that habit.

Corey: I married a litigator. That pattern tracks, let's be clear. Not that she doesn't so much, but her litigator friends, if litigators could be said to have friends, yeah, absolutely.

Matt: My wife is a former litigator. Transactional attorney.

Corey: Yes. Much the same. She's grown out of the habit, thankfully.

Matt: Oh, yeah. But when we were in the thick of litigation, we were actually at competing law firms. It was very much so, you come home, and it's hard to take—right, it's hard to not take your work home, so there was definitely occasions where we would talk to each other and I thought the judge had to weigh in, right, because there were some objections thrown, some of the questions were leading, a little bit of compound questions. So, all right, that's my lawyer joke of the day. I'm sorry, Corey. I won't continue on the schtick.

Corey: It works, though. It's badgering the witness, witnessing the badger, et cetera. 
Like, all kinds of ridiculous nonsense and getting it wrong, just to be, I guess, intentionally obtuse, works out well. Something you said a minute ago does tie into what you do professionally, where you mentioned that your wife was a litigator and now is a transactional attorney. One thing they never tell you when you start a business is how many lawyers you're going to be working with.

And that's assuming everything goes well. I mean, we haven't been involved in litigation, so that's a whole subset of lawyer we haven't had to deal with yet. But we've worked with approximately six—if memory serves—so far, not because we're doing anything egregious, just because—rather because so many different aspects of the business require different areas of specialty. We also, to my understanding—and I'm sure my business partner will correct me slash slit my throat if I'm wrong—I've not had to deal with criminal attorneys in any interesting ways. Sorry, criminal defense attorneys; criminal attorneys is a separate setup for a separate story.

But once I understood that, realizing, oh, yeah, Lawtrades. You can find specialist attorneys to augment your existing staff. That is basically how I view that. Is that directionally accurate?

Matt: Yeah. So, like, common issue I run into, right, is, like, a general counsel is a corporate attorney, right? That's their background. And they're very aware that they're not an employment attorney. They're not a privacy attorney. Maybe they're not an IP attorney or a patent attorney.

And because they realize that, because they're not like that old school attorney that thinks they can do everything and solve everyone's problems, they come to Lawtrades and they say, "Look, I don't need an employment attorney for 40 hours a week. I just need ten hours. That's all I need, right? That's the amount of work that I have." Or, "I don't have the budget for an attorney for 40 hours, but I need somebody. 
I need somebody here because that's not my specialty."

And that happens all the time where all of a sudden, a solo general counsel becomes a five or six-attorney legal department, right, because you're right, attorneys add up very quickly. We're like rabbits. So, that's where Lawtrades comes in to help out these folks, and help out freelance attorneys, right, that also are like, "Hey, listen, I know employment law. I can help."

Corey: Do you find that the vast slash entire constituency of your customers pretend to be attorneys themselves, or is this one of those areas where, "I'm a business owner. I don't know how these law things work. I had a firm handshake and now they're not paying as agreed. What do I do?" Do you wind up providing, effectively, introduction services—since I do view you as, you know, match.com for dating with slightly fewer STDs—do you wind up then effectively acting as an—[unintelligible 00:18:47] go to talk to find a lawyer in general? Or does it presuppose that I know which end of a brief is up?

Matt: There's so many parts of what you just said I want to take as well. I also liked that you didn't just say no STDs. That was very lawyerly of you. It's always, like, likely, right?

Corey: Oh, yes. So, the answer to any particular level of seniority and every aspect of being an attorney is, "It depends."

Matt: That's right. That's right. It triggers me for you to say it. Ugh. So, our client base, generally speaking, our companies ranging from, like, an A round company that has a solo GC all the way up to a publicly traded company that has a super robust legal department that maybe needs a bunch of paralegals, bunch of legal operations professionals, contract managers, attorneys for very niche topics, niche issues, that they're just, that is not what they want to do.

So, generally speaking, that's who we service. We used to be in the SMB space. 
There was a very public story—my founders are really cool because they built in public and we almost went broke, actually in that space. Which, Corey, I'm happy to share that article with you. I think you'll get a kick out of it.

Corey: I would absolutely look forward to seeing that article. In fact, if you send me the link, we will definitely make it a point to throw it into the [show notes 00:19:58].

Matt: Awesome. Happy to do it. Happy to do it. But it's cool. The clients, I tell you what, when I was in private practice, when I was in-house, I would always deal with an adverse attorney. That was always what I was dealing with.

No one was ever—or a business person internally that maybe wasn't thrilled to be on the phone. I tell you what, now, when I get to talk to some of these folks, they're happy to talk to me; it's a good conversation. It really has changed my mentality from being a very adverse litigator attorney to—I mean, it kind of lends itself to a shitposter, to a mean guy, to a party clown. It's a lot of fun.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: One area that I think is going to be a point of commonality between us is in what the in-and-out of our day jobs look like. Because looking at it from a very naive perspective, why on earth does what is effectively an attorney referral service—yes, which may or may not run afoul of how you describe yourselves; I know, lawyers are very particular about wording—

Matt: Staffing [laugh].

Corey: Exactly. Legal staffing. There we are. 
It doesn't seem to lend itself to having a, "Head of Community," quote-unquote, which really translates into, "I shitpost on the internet." The same story could be said to apply to someone who fixes AWS bills because in my part of the industry, obviously, there is a significant problem with people who have large surprise bills from their cloud provider, but they generally don't talk about them in public as soon as they become an even slightly serious company.

You don't find someone at a Fortune 500 complaining on Twitter about how big their AWS bill is because that does horrifying things to their stock price as well as them personally, once the SEC gets involved. So, for me, it was always I'm going to be loud and noisy and have fun in the space so that people hear about me, and then when they have this problem, in they come. Is that your approach to this, or is it more or less the retconning story that I just told, and it really had its origins in, "I'm just going to shitpost. I feel like good things will happen."

Matt: Funnily enough, it's both. That's how it started. So, when I was in private practice, I was posting like crazy on—I'm going to say LinkedIn for the third time—and again, I hope somebody sends a nasty message to me about how bad LinkedIn is, which I don't think it's that bad. I think it's okay—so I was shitposting on LinkedIn before probably many folks were shitposting on LinkedIn, again like Alex, and I was doing it just because I was tired of attorneys being what we described, this old guard, buttoned up, just obnoxiously perfect version of themselves. And it eventually led itself into this career. The whole journey was wild, how I got here. Best way to describe it was a crazy trip.

Corey: It really is. You also have a very different audience in some ways. 
I mean, for example, when you work in the legal field, to my understanding from the—or being near to it, but not within it—where you go to school is absolutely one of those things that people still bring up as a credential decades later; it's the first thing people scroll to on LinkedIn. And in tech, we have nothing like that at all. I mean, just ask any one of the random engineers who talk about where they used to work in their Twitter bio: ex-Google, ex-Uber, et cetera.

Not quite as bad as the VC space where it's, "Oh, early investor in," like, they list their companies, which of course to my mind, just translates directly into, the most interesting thing about you is that once upon a time, you wrote a check. Which, yeah, with some VCs that definitely tracks.

Matt: That's right. That's a hundred percent right. It's still like that. I actually saw a Twitter post, not necessarily about education, but about big law, about working in big law, where folks were saying, "Hey, I've heard a rumor that you cannot go in-house at a company unless you worked in big law." And I immediately—I have such a chip on my shoulder because I am not a big law attorney—I immediately jumped to it to say, "Listen, I talk to in-house attorneys all the time. I'm a former in-house attorney. You don't have to work in big law. You don't have to go to a T-14 law school." I didn't. I went to Florida State University in Tallahassee.

But I hear that to this day. And you're right, it drives me nuts because that is a hallmark of the legal industry, bragging about credentials, bragging about where I came from. 
Because it also goes back to that old guard of, "Oh, I came from Harvard, and I did this, and I did that," because we love to show how great and special we are not by our actual merits, but where we came from.

Corey: When someone introduces themselves to me at a party—which has happened to me before—and in their introduction, they mention where they went to law school, I make it a point to ask them about it and screw it up as many times in the rest of the evening as I can work it in. It's like they went to Harvard. Like so, "Tell me about your time at Yale." "Oh, sorry. I must have forgotten about that." Or, "What was the worst part about living in DC when you went to law school?" "Oh, I'm sorry. I missed that. You went to Harvard. How silly of me."

Matt: There's a law school at Dartmouth [laugh]?

Corey: I know. I'm as surprised as anyone to discover these things. Yeah. I mean, again, on the one hand, it does make people feel a little off and that's not really what I like doing. But on the other, ideally, it's a little bit of a judgment nudge as far as, this may not sound the way that you think it sounds when you introduce yourself to people that way.

Matt: All the time. I hear that all the time. Every so often, I'll have someone—and I think a lot of the industry, maybe just the industry where I'm in, it's not brought up anymore. I usually will ask, right? "Hey, where do you come from?" Just as a conversation starter. "What firm did you practice at? Did you practice in big law? Small law?"

Someone once called it insignificant law to me, which hurts because I'm part of insignificant law. I get those and it's just to start a conversation, but when it's presented to me initially, "Hey, yeah, I was at Harvard," unprompted. Or, "I went to Yale," or went to whatever in the T-14, you're right, it's very off-putting. At least it's off-putting to me. 
Maybe if someone wants to tell me otherwise, online, if you went to Harvard, and someone said, "Hey, I went to Harvard," and that's how they started the conversation, and you enjoy it, then… so be it. But I'll tell you, it's a bit off-putting to me, Corey.

Corey: It definitely seems it. I guess, on some level, I think it's probably rooted in some form of insecurity. Hmm, it's easy to think, "Oh, they're just completely full of themselves," but that stuff doesn't spring fully formed from nowhere, like the forehead of some God. That stuff gets built into people. Like, the constant pressure of you are not good enough.

Or if you've managed to go to one of those schools and graduate from it, great. The constant, like, "Not everyone can go here. You should feel honored." It becomes, like, a cornerstone of their personality. For better or worse. Like, it made me a more interesting adult if it made my 20s challenging. I don't have any big-name companies on my resume. Well, I do now because I make fun of one, but that's a separate problem entirely. It just isn't something I ever got to leverage, so I didn't.

Matt: I feel that completely. I come from—again, someone once told me I worked in insignificant law. And if I ever write a book, that's what I'm going to call it: Insignificant Law. But I worked at small law firms, regional law firms, in Tallahassee and I worked in South Florida, and nothing that anyone would probably recognize in conversation, right? So, it never became something I bring up.

I just say, "I'm an attorney. I do these things," if you ask me what I do. So, I think honestly, my personality, and probably the shitposting, sprung out of that as well, where I just had a different thing to talk about. I didn't talk about the prestige. I talked about the practice, I talked about what I didn't like about the practice, I didn't talk about being on Wall Street doing these crazy deals, I talked about getting my ass kicked in Ponce, Florida, up in the panhandle. 
For me, I've got a chip on my shoulder, but a different kind of chip.

Corey: It's amazing to me how many—well, let's call this what we are: shitposters—I talk to where their brand and the way that they talk about their space is, I don't want to say rooted in trauma, but definitely built from a place of having some very specific chips on their shoulder. I mean, when I was running DevOps teams and as an engineer myself, I wound up continually tripping over the AWS bill of, "Ha, ha. Now, you get to pay your tax for not reading this voluminous documentation, and the fine print, and all of the appendices, and the bibliography, and tracking down those references. Doesn't it suck to be you? Da da." And finally, it was all right, I snapped. Okay. You want to play? Let's play.

Matt: That's exactly right. There's, like, a meme going around. I think I actually saw it from the accounting meme account, TB4—which is stellar—and it was like, "Ha, I'm laughing because it hurts." And it's true. That's why we all laugh at the jokes, right?

I'll make jokes about origination credit, which is always an issue in the legal industry. I make jokes about the toxic work environment, the partner saying, "Please fix," at three o'clock in the morning. And we make fun of it because everyone's had to deal with it. Everyone's had to deal with it. And I will say that making fun of it brings light to it and hopefully changes the industry because we all can see how ridiculous it is. 
But at least at the very beginning, we all look at it and we say, "That's funny because it hurts."

Corey: There's an esprit de corps of shared suffering that I think emerges from folks who are in the trenches, and I think that the rise of—I mean, some places call them micro-influencers, but that makes me want to just spit a rat when I hear it; I hate the term—but the rise of these niche personalities is because there are a bunch of in-jokes that you don't have to be very far in to appreciate and enjoy, but if you aren't in the space at all, they just make zero sense. Like when I go to family reunions and start ranting about EC2 instance pricing, I don't get to talk to too many people anymore because oh my God, I've become the drunk uncle I always wanted to be. Goal achieved.

Matt: [laugh].

Corey: You have to find the right audience.

Matt: That's right. There is a term, I think coin—I think it was coined by Taylor Lorenz at Washington Post—and it's called a nimcel, which is, like, a niche micro-influencer. It's the worst term I've ever heard in my entire life. The nimcel [laugh]. Sorry, Taylor, it's terrible.

But so I don't want to call myself a nimcel. I guess I have a group of people that enjoy the content, but you are so right that the group of people, once you get it, you get it. And if you don't get it, you may think some parts of it—like, you can kind of piece things together, but it's not as funny. But there's plenty of litigation jokes I'll make—like, where I'm talking to the judge. It's always these hypothetical scenarios—and you can maybe find it funny.

But if you're a litigator who's gotten their ass kicked by a judge in a state court that just does not like you, you are not a local, they don't like the way you're presenting yourself, they don't like your argument, and they just dig you into the ground, you laugh. You laugh because you're, like, I've been there. 
I've had—or on the flip, you're the attorney that watched your opposing counsel go through it, you're like, "I remember that." And you're right, you really get such a great reaction from these folks, such great feedback, and they love it. They absolutely love it. But you're right, if you're outside, you're like, "Eh, it's kind of funny, but I don't really get all of it."

Corey: My mother approaches it this way whenever she talks to me: like, "I have no idea what you're talking about, but you seem to really know what you're talking about, so I'm proud of you." It's like, "No, Mom, that is, like, the worst combination of everything." It's like, "Well, are you any good at this thing?" "No. But I'm a white man, so I'm going to assume yes and the world will agree with me until proven otherwise." So yeah, maybe nuclear physics ain't for you in that scenario.

But yeah, the idea of finding your people, finding your audience—before the rise of the internet, none of this stuff would have worked just because you live in a town; how many attorneys are really going to be within the sound of your voice, hearing these stories? Not to mention the fact that everyone knows everyone's business in some of those places, and oh, you can't really subtweet the one person because they're also in the room. The world changes.

Matt: The world changes. I've never had this happen. So, when I really started to get aggressive on, like, Twitter, I had already left private practice; I was in-house at that point. And I've always envisioned, I've always, I always want to, like, go back to private practice for one case: to go into a courtroom in, like, Miami, Florida, and sit there and commiserate and tell the stories of people again like I used to do—just like what you're saying—and see what everyone says. Say, "Hey, I saw you on Twitter. Hey, I saw this story on Twitter."

But in the same breath, like, you can't talk like you talk online in person, to some degree, right? 
Like, I can't make fun of opposing counsel because the judge is right there and opposing counsel is right there, and honestly, knowing my luck, I'm about to get my ass kicked by opposing counsel. So, I probably should watch myself in that courtroom.

Corey: But I'm going to revise the shit out of this history when it comes time to do my tweet after the fact. "And then everybody clapped."

Matt: [laugh]. I found five dollars outside the courtroom.

Corey: Exactly. I really want to thank you for spending so much time chatting with me. If people want to learn more and follow your amazing shitpost antics on the internet, where's the best place for them to do it?

Matt: Corey, it's been an absolute pleasure. Instagram, TikTok, Twitter, LinkedIn. For everything but LinkedIn: @ItsMattsLaw. LinkedIn, just find me by my name: Matt Margolis.

Corey: And we will put links to all of it in the [show notes 00:33:04]. Thank you so much for being so generous with your time. It's appreciated.

Matt: I have not laughed as hard in a very, very long time. Corey, thank you so much.

Corey: Matt Margolis, Head of Community at Lawtrades. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that you drafted, then realized—oh wait, you're not literate—and then hired someone off of Lawtrades to help you write in an articulate fashion.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Building Trust in the World of DevRel with Taylor Barnett

Play Episode Listen Later Jan 3, 2023 29:41


About Taylor

Taylor Barnett is a Staff Developer Advocate at PlanetScale. She is passionate about building great developer experiences emphasizing empathy within product, documentation, and other developer-facing projects. For the past decade, Taylor has worked at various data and API-focused startups in software development and developer relations. In her free time, as a firm believer in "touching grass," she's either gardening, taking long walks, climbing rocks with friends, trying to find the funkiest sour beers, or hanging out with her corgi, Yoda, and spouse in Austin, Texas.

Links Referenced:

PlanetScale: https://planetscale.com/

Twitter: https://twitter.com/taylor_atx

Personal website: https://taylorbar.net

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup. It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it. My thanks to Google Cloud for sponsoring this ridiculous podcast.

Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. 
Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it: Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworks.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Taylor Barnett, Staff Developer Advocate at PlanetScale. Taylor, you're one of those people that I'm ashamed I haven't had on the show before now. Thanks for joining me.

Taylor: You're welcome. Yeah, I'm glad to be here.

Corey: We've been traveling in similar circles for a while now. And I lost track of a lot of those areas when the pandemic hit, you know, the global plague o'er the land. And during that time, it seemed like there were a lot of questions that folks had about what developer advocacy is. What does DevRel become now? And now that we're largely on the other side of it—at least business is pretending that we're behind it—do we have an answer yet?

Taylor: I hope so. I mean, I have an answer. Not sure if other businesses have figured that out yet. But no, I mean, to me, advocacy is still just that glue between company and a community. But I think one of the things that the pandemic has really, like, pushed, you know, when there were no in-person events, was questioning what those activities actually look like.

You know, I see advocacy as a ton of different levers and you can tweak those levers to different levels. Before, it was largely a lot of in-person stuff—I will say I was doing less in-person, actually, before than most; I was doing a little bit more content—then it had to become so content-focused.
And I think now we're in this awkward place where in-person events have come back and we're still, like, figuring out, like, how do we do those? What does that look like? And we've actually—I think part of it is we've over-indexed now on content. I think part of that is because it is visible and it is measurable, and that's always a big topic [laugh] in developer relations, metrics. But also, I think we've lost track of the actual advocacy part: how do we actually advocate for users internally? It's just disappeared a little bit because we were so content-focused during the pandemic.

Corey: I would say that's been a recurring theme with every DevRel person that I've spoken to, that metrics are the bane of their existence. And I want to be clear, I'm not just talking about developer advocates, I'm talking about people who manage and run developer advocacy teams, I'm talking about executives who are trying to bring the appropriate context to strategic-level discussions around these things. All of the metrics that I have been able to uncover are wrong. But it's like the 'all models are wrong, but some models are useful' type of approach, where—

Taylor: Yeah.

Corey: Every time you start putting a metric around it and measuring people based upon the outcome of that metric, it ends in disaster. My one and only interview for a DevRel job in my past, my question for them was how do you measure success? "Well, we want to see you have talks accepted at some of the big tier-one conferences." And they list a few examples, and it's, yeah, "I've spoken at the ones you just listed in the past year, so… do I get a raise?" It's one of those areas where there's no right answer, but a lot of wrong ones.

Taylor: Yeah. And one of the other troubling patterns that I've started to see more is that in these cloud startups, they have DevRel programs now that are fairly young, we're talking not even a year old.
So, in the recent DevRel survey results, it was about, like, 29% of programs that are less than a year old. Within those programs, 43% of those people have not even been in a DevRel role for more than a year. So, not only do we have folks that haven't done this before, the startup has not done this before.

And so, the metrics conversation is basically a shit show. People with the right experiences aren't in these roles and so they're not able to craft strategies and actually look at good metrics. And so, then we over-index on things like, "Oh, you wrote a blog post." Great, you know, that's, like, some kind of metric. "It got X number of page views." Great, that's some kind of metric.

And it often incentivizes some of the wrong things. And so, then it just incentivizes more and more of this content creation, to just get those pageviews up. And it's scary to me because then we're just going back to the more evangelist type of developer relations and less of the advocacy-type stuff where we're actually advocating for users internally.

Corey: I would agree. I'd say that there's a problem where we have an almost across-the-board lack of understanding about—let's even start at the very beginning of when DevRel is required or when it's not. I mean, take where you work now at PlanetScale. You're effectively managed Vitess-as-a-service. That's a little on the technical side and is not the sort of thing that's going to necessarily lend itself to a mass-market marketing approach.

This is not something to put on billboards outside of most highways, for example, but it does require engaging with people on a technical level. I keep joking but also serious when I refer to DevRel as meaning you work in marketing, but they're scared to tell you.

Taylor: Yeah.
No, I mean, I actually sometimes say, "Well, like, I'm secretly probably a pretty good product marketer, but I don't want developers to know that because then I'll lose my street cred from my actual development and engineering background." And I have a computer science degree and, like, I'm actually, like [laugh], very, very technical. But the reality is, like, you know, somebody's got to write the words, sometimes.

Corey: The words are harder when they go into people than they are into computers. At least with computers—

Taylor: Exactly.

Corey: It's pretty—it's a bounded problem space, to some extent. With people, oh no, no, there's no consistency at all.

Taylor: Yeah. And like, words mean different things to different people. Especially, like, my favorite one lately is, like, what does edge mean? Nobody actually has one [laugh] definition of that word.

Corey: Oh, I think most of them do. Edge always means, "Oh, it's a way of describing the thing that we've been doing for 15 years, but now want to sell into a hype cycle."

Taylor: Yeah, yeah. I mean, CDNs have been around for a while. You know, and that's really, like, what PlanetScale is, in some ways: we're challenging what people expect from their database. We think you can actually expect more from your database platform, and so there are things, you know, to teach people about some of these newer ways of working with a database. And that requires needing to think about how we present that to users, but also hearing back from users how we work within their applications, their stacks.

We're MySQL. That's, you know, a trusted standard. It's been around for a while, so it works with many, but also, we're in this whole new paradigm of how to use a database.
These are all new ideas and they require a two-way street of both putting things out there—so content, not bad; it's still needed—but also things coming in, and taking that, making it actionable, and talking about it internally.

Corey: When you take a look at the DevRel world, what do you think that most organizations are missing or getting wrong about it? And yes, I understand that I'm basically asking you to start beef with a whole bunch of companies out there, but that's all right. It's what we do here.

Taylor: Yeah, one of the things I love: [Matty 00:07:44] Stratton had this thing where I tweeted out a few months ago that we've over-indexed on content, and Matty's reply was that we've over-indexed on being able to do cool shit that isn't connected to revenue, because somehow it's dirty for DevRel to be connected to revenue. I think, you know, a lot of times, there are ways that we can look at how users actually get value from our products. Like, are they actually getting value? One way they express that is by paying for it. So therefore, we are then somehow connected to revenue.

I mean, I want to build things, I want to work on platforms that deliver value, that people actually want to pay for because they see this makes my life easier, somehow. But to do that, again, we've got to talk to our users. We've got to figure out where they actually see value. What are the things that are just fluff? There's a lot of fluff out there.

Sometimes if we don't listen to them, then we don't find out that what we're building is fluff. So, that's probably the part that could start some beefs. But it's the reality of lots of VC money and tooling and being able to build things super easily; it's a bunch of different factors coming together in this time.
I don't have a terrific answer to this, but I do know that most of the answers I get from practitioners in the space are deeply dissatisfying. It seems that—not to be unkind or cast aspersions where they don't belong, but whenever I ask the question, everyone has a whole laundry list of wrong answers and very few right ones.

Taylor: I honestly will say I don't care [laugh]. I mean, that's the reality.

Corey: Corporate IT. Got it.

Taylor: Do I want to be on a team that makes me directly responsible for qualified leads? No. That does not necessarily say anything about the team itself. That is just a metric. That is—you know, and that team exists in a larger system that has put certain pressures on it.

It's more about how a team looks at doing the DevRel stuff and doing marketing in general, or how they do sales. You know, I know lots of developers love to hate on sales—marketing, too—and I don't necessarily think sales and marketing are a bad thing; I think it's the way we incentivize those roles that creates bad behaviors, and so maybe we should look at how we incentivize them. And so, I don't care what team I'm on, honestly, most of the time. I've been on a few different ones. As long as I get to do the developer advocacy work that I actually think is impactful for developers and actually making developers' lives better, I'm cool.

Corey: It's my belief, on some level, that it's very easy to internalize a bad expression of it. You can have phenomenally well-empowered DevRel teams working in marketing—

Taylor: Yep.

Corey: —at some companies, and in other places, it can be an absolute disaster because they start putting metrics like number of qualified leads around you. And I can't shake the feeling that people internalize, "Well, we've reported to marketing once and it was terrible," without realizing the context of yeah, but in a terrible way, and in an org that didn't really understand what you do.
That doesn't necessarily mean that you should throw the whole baby out with the bathwater.

Taylor: Yeah, I mean, we've all had bad managers. So, we're not going to say we're just never going to have a manager.

Corey: Some people try that.

Taylor: Is that what you've done [laugh]?

Corey: Indirectly. No, I was talking more about the holacracy companies where, oh yeah, no one reports to anyone. Is it really? Because everyone makes different amounts of money, so one wonders about that.

Taylor: Yeah. But by and large, we just go find better managers is what we often do, you know? And there's the whole phrase that, like, people don't leave companies, they leave managers. It's very true in my experience. And we don't just say, "All marketing teams bad, so I'm never going to join a marketing team." We should say, "Let's just go find one that fits better."

Corey: I was very frustrated in my last couple of real jobs because so much of what I was doing was DevRel-like, but this was before that was an established and accepted thing in the places that I worked, so there were questions like, "Well, what is the value of you going to give a keynote at this conference?" And the honest answer was, "Yeah, I have no idea how to quantify it, but I know that if I do it, good things come out of it." And that was a difficult battle to fight, whereas now, when I decided to go work for myself, it's, "Yeah, I'm going to go speak there. I don't know what the ROI is. I know good things and maybe some useful things will come out of it. Maybe I'll learn something, but this is how we experiment and learn." And that looks an awful lot, to most traditional management types, like I'm trying to justify a trip somewhere.

Taylor: Yeah. And I think, you know, what's been also interesting, as I've noticed, is some people are starting to notice a lot of more junior people wanting to get into developer relations.
And we sometimes actually are wondering, some of us in developer relations, if we've not always shown, like, the negative parts of that. What happens when you go do that keynote? What does that mean for your week leading up to that keynote? What does travel look like? What does, like, running across an airport wearing a mask and carrying your luggage look like?

I think we don't always get to see that, and so the role looks a little bit less glamorous when people see that. And maybe they would be slightly less interested in the role. Or just, like, how do you handle working with, like, five different teams across a company to try to be that glue piece between all of them to get something done? Like, there's a lot of less glamorous parts that I'm hoping more people talk about because, like you said, otherwise it just looks like you're trying to go get a trip somewhere.

I think the other thing is, like, even if you are having a keynote, I think one of the things that some people—they think one keynote is going to just wreck a budget. The reality is, for our business, it will not do that, so why can't we, like, have a better balance of extremes? Like, you're not going to be giving ten of those keynotes in a year; maybe experiment doing two and see what comes out of doing two of them. But the other thing is, it's a long-term game, and so you're not going to see something maybe the week after. It could be six months later. I had this one experience where someone actually told me—it was probably, like, a whole year after I had given a talk—that him and his teammates—this was back when people, you know, went into offices—sat in an office and watched one of my old talks together. And I was just like, what, like, y'all, like, got together and did that?

Corey: Yeah, you could have invited me and I could have delivered it for you in person and answered questions, but all right.

Taylor: Yeah. It was like, what? I was just like, oh my gosh, that has literally never happened to me. This was a few years ago.
And then, too, I was like, that just made it worth it. If you asked a CEO, "Would you like to have an advocate go give a talk for a whole team at a company?" they'd be like, "Yes, I want you—" especially if that's a big company and the name is shiny and they would love to have that as a customer, they would be, like, a hundred percent, "Go give that talk."

And so, I think many times, leadership needs to actually kind of check in on, like, is this really that much of a cost if it's just, like, one keynote? I've seen battles over what really feels like stupid things sometimes. But everything in moderation is kind of the way I approach it.

[midroll 00:15:17]

Corey: One problem that I tended to see, and I don't know how closely your experience mirrors my own, but it seemed, especially in the before times, right before the pandemic hit, that we were almost trapped in a downward spiral at a lot of the conferences because it felt like it was mostly becoming DevRel speaking to DevRel. And that wasn't the most inclusive thing for folks who used to wind up going to a lot of local conferences to learn from their local community and see how other people were solving the problems that they were solving. Instead, it felt like a bunch of DevRel types getting up there, in most cases giving a talk that was heavily alluding to why you should buy their product, if not an outright sales pitch for it. And it just felt like we were losing something. Do you think that's something that we've avoided, that we've pressed pause on, with the pandemic and now the recession, or do you think there's something else afoot?

Taylor: I think that's still happening today, especially with, like, engineers wanting sometimes to travel less, you know, some people still have personal and family reasons for not traveling, so even fewer of them are wanting to speak. I don't think I saw, like, a huge swath of engineers, like, really excited to speak once conferences started in person again.
They thought, "Oh, my gosh, I have to go talk to people in person again?" And so, it's still happening. I've seen it from an organizer's perspective.

I used to organize the API Specifications Conference. There were tons of DevRel submissions in there, so, you know, we really tried to spend time reaching out to companies that were member companies of the OpenAPI Initiative and get them to actually have member engineers from their teams come speak. I think DevRel has a role to internally advocate for engineers who are doing the day-to-day work to go speak at conferences. You know, I think many times engineers feel like, "Oh, what I have to talk about is not very interesting." And I have to tell them, it is very interesting, and I would love to have you speak, and I'm here to help you. And, you know, need help writing a CFP? I'm there. You need help putting together slides, practicing talks? I'm there.

And I think DevRel can be kind of like these coaches for folks to go speak at conferences because the reality is attendees want to hear from them. They want to hear engineers, especially from major companies or companies just doing really interesting engineering challenges, speaking. And I think DevRel has a part in helping that happen. I've personally backed away from speaking the last six months, partially because I'm kind of not seeing as much value for myself doing it; before, I was doing a lot more. So I'm using that effort to advocate internally and help people with CFPs. Last week, I helped a bunch of people with KubeCon submissions, and then next week, I have other conferences I would love to—I have engineers that I've kind of picked out that I would love to have speak. And yeah, I'm glad to play a part in trying to improve that. And I think other advocates should, too.

Corey: Where do you think that we're going as an industry?
Because it became pretty clear for a couple of years that so much of what we were doing and how we were discussing it, it felt like there was a brief moment in time when we could really transform what we were doing and start to have a broader awareness that DevRel was more than giving talks on stage at conferences. And it feels like we squandered that opportunity and it mostly turned into: oh, now we're going to give the same talks, we're just going to do it to webcams, either pre-recorded—which was the better approach—or we're going to do it live, even though there's no interactive component to it, which just introduced a whole bunch of different failure modes. I was disappointed. I liked some of the early stuff I saw coming out, like Desert Island DevOps, where they did it inside of Animal Crossing. Like, I wanted to see more stuff like that, but it just seems like we didn't.

Taylor: Yeah, I mean, the reality is, I think a lot of the online events have disappeared in the last three or four months. And we're also seeing events trying to be hybrid. To me, a hybrid event is, like, throwing two events. Do you have an organizing team that can actually handle two concurrent events? It's hard.

And the API Specifications Conference, we did two years in person. Pretty niche conference. It's like the API nerds of the API nerds. And so, we still had pretty engaged attendees because there weren't any other sources of this, but then when everyone was starting to do the same content, attendees started checking out. They got tired of sitting in front of their monitors and watching talks.

You know, we're seeing things coming back in person. I think it's going to be very interesting for the spring because the fall, for me, was probably one of my busiest conference seasons in terms of us just also sponsoring things. And I'm unsure of the return on investment today.
We will see over time how that return on investment comes out, but I think it's going to change the way we look at the spring, it's going to change the way we look at next fall, and I think other companies are having the same conversations, too. And so, it's going to be like, okay, what do we do instead if we don't focus on conferences? I don't know. For me, that's focusing on the actual advocacy part: the user feedback, talking to users, building a product that people find value in. But other teams might not be in the place to do that. They might be expected to still produce this content in different ways: in person, written, online.

Corey: So, one of the burning questions that I think is not asked or addressed particularly well in the space has been, how do you get users to trust you? And to be clear, I am not saying you personally. It's like, "Well, given your history of flagrant lying and misleading people and scam after scam after scam, that is honestly impressive—" No, no, no, none of that. It's how do you—the indefinite you—build user trust?

Taylor: Yeah, I think this is something we've seen lots of companies of all sizes really struggle with. You know, the obvious thing I think many times companies think of is like, oh, if I'm open and transparent and have great docs, users will trust me. You know, I think that's part of it. I think the other thing that many often forget is that you need to listen to them, you need to take the feedback that they give you when you ask questions—and there's a whole art to asking questions; I'm learning myself, like, how to ask better questions—and figure out how you then make that actionable internally.

You know, you have to understand who makes product decisions. Who do I need to talk to about this feature versus this other feature? And there's all these internal dynamics that you're then wading into. So, you have to get good at that.
And then when you finally actually get some kind of change, whether that be some small paper cut of a thing related to a feature, or a big feature that you release, you actually go back to the user and you tell them, "Hey, look, we did this." And what blows my mind is I do this, I take notes on who told me what feedback, and when that issue gets closed out, I go back to them, and they're just shocked that I replied. They are shocked that I actually followed up. And to me, it's like such a basic thing, just following up. It doesn't seem, like, that hard.

But it actually is hard, but also useful. And you know, I think we've seen this so many times. This is one example that I think about a lot, and I think you're familiar with this one too, Corey: the Aurora Serverless Data API in V1, people loved that. Then they came out with V2. There was no Data API.

And if you search, people are upset everywhere. And AWS keeps on telling them, "Nope, it's not going to happen." And it's like, it's such an easy win if they actually listened to the user base. But there's countless examples of this, you know? There's things that we do at PlanetScale that we could improve on, you know, that users are telling us.

There's only so much time in the day, but I think it's part of an advocate's job to wade through this feedback and figure out where we can bring the biggest value and the most impact. And, you know, I think all companies could benefit just from listening more and doing something about it.

Corey: I wish that were a message that would get in front of the right people to make them a little bit more receptive. It feels like that's a message that is bandied around—to be direct—in DevRel circles an awful lot, but it doesn't seem to gain traction outside of that.

Taylor: This kind of goes back to what we were talking about earlier with what team you're on. Sometimes that makes a huge difference, especially in larger companies.
If you're siloed away in a marketing org—nothing bad about marketing, to be clear, but internally, you're seen as marketing—engineers, developers, see you as marketing. When you come with product feedback, they're kind of, "That's not your box. Go back to your box. Go back to your silo."

And you know, I think the reality is, we can't look at advocacy like that. I have users tell me things that they would never tell a salesperson, they would never tell someone on our leadership team; they might tell someone in support. They tell me things. They send me emails that are multiple paragraphs long, giving positive and negative feedback. Many times it's positive, but I'm just shocked they'll even write that much, you know, positive. Like, they actually took the time to do it.

And they trusted that it was worth their time. I've done something right there if they're willing to do that at that point. And I, you know, I make sure I respond to every single one of those emails. I had someone ask me, like, "Oh, do you want us to forward you all of them?" And I'm like, "Yes. Every single one. No matter what it says, I'm going to reply to this email."

Because if I lose that trust, it's everything for me as an advocate. It's how I can help them, you know, see the value in the product, and help them with adoption, and bring them along to eventually paying, potentially—dirty word, revenue—but otherwise, I wouldn't have a job. So, you know, I think it's really something that startups get wrong: they see DevRel advocacy as content farms and not enough of the part that actually helps them make money.

Corey: I really want to thank you for being so generous with your time. If people want to learn more, where's the best place for them to find you?

Taylor: So, for now, I'm on Twitter as @taylor_atx. But if anything happens with that, as we know right now, you can also find me at taylorbar.net, my website. I'll always try to keep links to where I am on there. Trying to write more.
We'll see if I accomplish that over the holidays. But yeah, those are the two places you can find me.

Corey: And we will, of course, include links to that in the [show notes 00:26:27]. Thank you so much for your time. I appreciate it.

Taylor: Yeah, thanks, Corey, for letting me rant, ramble, kind of have all these thoughts about advocacy. I'm hoping we can have a good 2023 in the world of DevRel and advocacy and make progress on some of these things.

Corey: I sure hope you're right.

Taylor: [laugh]. I hope I'm right, too, for the happiness of my job [laugh].

Corey: Taylor Barnett, Staff Developer Advocate at PlanetScale. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment channeling a late Christmas spirit to tell us what the true meaning of DevRel was all along. Which will be wrong. Because it includes metrics.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Holiday Replay Edition - Inside the Mind of a DevOps Novelist with Gene Kim


Play Episode Listen Later Dec 29, 2022 30:49


About Gene

Gene Kim is a multiple award-winning CTO, researcher, and author, and has been studying high-performing technology organizations since 1999. He was founder and CTO of Tripwire for 13 years. He has written six books, including The Unicorn Project (2019), The Phoenix Project (2013), The DevOps Handbook (2016), the Shingo Publication Award-winning Accelerate (2018), and The Visible Ops Handbook (2004-2006) series. Since 2014, he has been the founder and organizer of DevOps Enterprise Summit, studying the technology transformations of large, complex organizations.

Links:
The Phoenix Project: https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/1942788290/
The Unicorn Project: https://www.amazon.com/Unicorn-Project-Developers-Disruption-Thriving/dp/B0812C82T9
The DevOps Enterprise Summit: https://events.itrevolution.com/
@RealGeneKim

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup. It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it. My thanks to Google Cloud for sponsoring this ridiculous podcast.

Corey: This episode is brought to us by our friends at Pinecone.
They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey Quinn: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by a man who needs no introduction but gets one anyway. Gene Kim, most famously known for writing The Phoenix Project, but now the Wall Street Journal best-selling author of The Unicorn Project, six years later. Gene, welcome to the show.

Gene Kim: Corey, so great to be on. I was just mentioning before how delightful it is to be on the other side of the podcast. And it's so much smaller in here than I had thought it would be.

Corey Quinn: Excellent. It's always nice to wind up finally meeting people whose work was seminal and foundational. Once upon a time, when I was a young, angry Unix systems administrator—because it's not like there's a second type of Unix administrator—[laughing] The Phoenix Project was one of those texts that was transformational, as far as changing the way I tended to view a lot of what I was working on, and gave a glimpse into what could have been a realistic outcome for the world, or the company I was at, but somehow was simultaneously uplifting and incredibly depressing all at the same time. Now, The Unicorn Project does that exact same thing, only aimed at developers instead of traditional crusty ops folks.

Gene Kim: [laughing] Yeah, yeah. Very much so.
Yeah, The Phoenix Project was very much aimed at ops leadership. So, Bill Palmer, the protagonist of that book was the VP of Operations at Parts Unlimited, and the protagonist in The Unicorn Project is Maxine Chambers, Senior Architect, and Developer, and I love the fact that it's told in the same timeline as The Phoenix Project, and in the first scene, she is unfairly blamed for causing the payroll outage and is exiled to The Phoenix Project, where she recoils in existential horror and then finds that she can't do anything herself. She can't do a build, she can't run her own tests. She can't, God forbid, do her own deploys. And I just love the opening third of the book where it really does paint that tundra that many developers find themselves in where they're just caught in decades of built-up technical debt, unable to do even the simplest things independently, let alone be able to independently develop tests or create value for customers. So, it was fun, very much fun, to revisit the Parts Unlimited universe.Corey Quinn: What I found that was fun about—there are few things in there I want to unpack. The first is that it really was the, shall we say, retelling of the same story in, quote/unquote, “the same timeframe”, but these books were written six years apart.Gene Kim: Yeah, and by the way, I want to first acknowledge all the help that you gave me during the editing process. Some of your comments are just so spot on with exactly the feedback I needed at the time and led to the most significant lift to jam a whole bunch of changes in it right before it got turned over to production. Yeah, so The Phoenix Project is told, quote, “in the present day,” and in the same way, The Unicorn Project is also told—takes place in the present day. In fact, they even start, plus or minus, on the same day. 
And there is a little bit of suspension of disbelief needed, just because there are certain things that are in the common vernacular, very much in zeitgeist now, that weren't six years ago, like “digital disruption”, even things like Uber and Lyft that feature prominently in the book that were just never mentioned in The Phoenix Project, but yeah, I think it was the story very much told in the same vein as like Ender's Shadow, where it takes place in the same timeline, but from a different perspective.Corey Quinn: So, something else that—again, I understand it's an allegory, and trying to tell an allegorical story while also working it into the form of a fictional work is incredibly complicated. That's something that I don't think people can really appreciate until they've tried to do something like it. But I still found myself, at various times, reading through the book and wondering, asking myself questions that, I guess, say more about me than they do about anyone else. But it's, “Wow, she's at a company that is pretty much scapegoating her and blaming her for all of us. Why isn't she quitting? Why isn't she screaming at people? Why isn't she punching the boss right in their stupid, condescending face and storming out of the office?” And I'm wondering how much of that is my own challenges as far as how life goes, as well as how much of it is just there for, I guess, narrative devices. It needed to wind up being someone who would not storm out when push came to shove.Gene Kim: But yeah, I think she actually does the last of the third thing that you mentioned where she does slam the sheet of paper down and say, “Man, you said the outage is caused by a technical failure and a human error, and now you're telling me I'm the human error?” And just cannot believe that she's been put in that position. Yeah, so thanks to your feedback and the others, she actually does shop her resume around. 
And starts putting out feelers, because this is no longer feeling like the great place to work that attracted her eight years prior. The reality for most people is that it's sometimes difficult to get a new job overnight, even if you want to. But I think that Maxine stays because she believes in the mission. She takes a great deal of pride in what she's created over the years, and I think, like most great brands, they do create a sense of mission and there's a deep sense of the customers they serve. And there's something very satisfying about the work to her. And yeah, I think she is very much, for a couple of weeks, always thinking that she won't be here for long, one way or another, but by the time she stumbles into the rebellion, the crazy group of misfits, the ragtag bunch of misfits who are trying to find better ways of working and are willing to break whatever rules it takes to take over the very ancient, powerful order, she falls in love with the group. She's found a group of kindred spirits who, very much like her, believe that developer productivity is one of the most important things that we can do as an organization. So, by the time she links up with that group, I mean, I think all thoughts of leaving are gone. Corey Quinn: Right. And the idea of, if you stick around, you can theoretically change things for the better is extraordinarily compelling. The challenge I've seen is that as I navigate the world, I've met a number of very gifted employees who, frankly, wind up demonstrating that same kind of loyalty to companies that are absolutely not worthy of them. So my question has always been, when do I stick around versus when do I leave? I'm very far on the bail-out-as-early-as-humanly-possible side of that spectrum. It's why I'm a great consultant but an absolutely terrible employee. Gene Kim: [laughing] Well, so we were honored to have you at the DevOps Enterprise Summit.
And you've probably seen that The Unicorn Project book is really dedicated to the achievements of the DevOps Enterprise community. It's certainly inspired by and dedicated to their efforts. And I think what was so inspirational to me were all these courageous leaders who are—they know what the mission is. I mean, they viscerally understand what the mission is and understand that the ways of working aren't working so well and are doing whatever they can to create better ways of working that are safer, faster, and happier. And I think what is so magnificent about so many of their journeys is that their organization in response says, “Thank you. That's amazing. Can we put you in a position of even more authority that will allow you to even make a more material, more impactful contribution to the organization?” And so it's been my observation, having run the conference for, now, six years, going on seven years is that this is a population that is being out promoted—has been promoted at a rate far higher than the population at large. And so for me, that's just an incredible story of grit and determination. And so yeah, where does grit and determination becomes sort of blind loyalty? That's ultimately self-punishing? That's a deep question that I've never really studied. But I certainly do understand that there is a time when no amount of perseverance and grit will get from here to there, and that's a fact.Corey Quinn: I think that it's a really interesting narrative, just to see it, how it tends to evolve, but also, I guess, for lack of a better term, and please don't hold this against me, it seems in many ways to speak to a very academic perspective, and I don't mean that as an insult. Now, the real interesting question is why I would think, well—why would accusing someone of being academic ever be considered as an insult, but my academic career was fascinating. 
It feels like it aligns very well with The Five Ideals, which is something that you have been talking about significantly for a long time. And in an academic setting that seems to make sense, but I don't see it thought of or spoken of in the same way on the ground. So first, can you start off by giving us an intro to what The Five Ideals are, and I guess maybe disambiguate the theory from the practice?Gene Kim: Oh for sure, yeah. So The Five Ideals are— oh, let's go back one step. So The Phoenix Project had The Three Ways, which were the principles for which you can derive all the observed DevOps practices from and The Four Types of Work. And so in The Five Ideals I used the concept of The Five Ideals and they are—the first—Corey Quinn: And the next version of The Nine whatever you call them at that point, I'm sure. It's a geometric progression.Gene Kim: Right or actually, isn't it the pri—oh, no. four isn't, four isn't prime. Yeah, yeah, I don't know. So, The Five Ideals is a nice small number and it was just really meant to verbalize things that I thought were very important, things I just gravitate towards. One is Locality and Simplicity. And briefly, that's just, to what degree can teams do what they need to do independently without having to coordinate, communicate, prioritize, sequence, marshal, deconflict, with scores of other teams. The Second Ideal is what I think the outcomes are when you have that, which is Focus, Flow and Joy. And so, Dr. Mihaly Csikszentmihalyi, he describes flow as a state when we are so engrossed in the work we love that we lose track of time and even sense of self. And that's been very much my experience, coding ever since I learned Clojure, this functional programming language. Third Ideal is Improvement of Daily Work, which shows up in The Phoenix Project to say that improvement daily work is even more important than daily work itself. 
Fourth Ideal is Psychological Safety, which shows up in the State of DevOps Report, but showed up prominently in Google's Project Aristotle, and even in the Toyota production process where clearly it has to be—in order for someone to pull the andon cord that potentially stops the assembly line, you have to have an environment where it's psychologically safe to do so. And then Fifth Ideal is Customer Focus: really focus on core competencies that create enduring, durable business value that customers are willing to pay for, versus context, which is everything else. And yeah, to answer your question, where did it come from? Why do I think it is important? Why do I focus on that? For me, it's really coming from the State of DevOps Report that I did with Dr. Nicole Forsgren and Jez Humble. And so, beyond all the numbers and the metrics and the technical practices and the architectural practices and the cultural norms, for me, what that really tells the story of is The Five Ideals: one of them being very much the need for an architecture that allows teams to work independently, which is a strong predictor of continuous delivery. I love that. And then, from the individual perspective, the ideal being one that allows us to focus on the work we want to do to help achieve the mission with a sense of flow and joy. And then really elevating the notion that greatness isn't free: we need to improve daily work, we have to make it psychologically safe to talk about problems. And then the last one really being, can we really unflinchingly look at the work we do on an everyday basis and ask whether the customers care about it? And if customers don't care about it, can we question whether that work really should be done or not? So that's where, for me, it's really meant to speak to some more visceral emotions that were concretized and validated through the State of DevOps Report. But these notions I am just very attracted to. Corey Quinn: I like the idea of it.
The question, of course, is always how to put these into daily practice. How do you take these from an idealized—well, let's not call it a textbook, but something very similar to that—and apply it to the, I guess, uncontrolled chaos that is the day-to-day life of an awful lot of people in their daily jobs. Gene Kim: Yeah. Right. So, the protagonist is Maxine, and her role in the story, in the beginning, is just to recognize what not-great looks like. She's lived and created greatness for all of her career. And then she gets exiled to this terrible Phoenix project that chews up developers and spits them out, and they leave as these husks of the people they used to be. And so, she's not doing a lot of problem-solving. Instead, it's this recoiling from the inability for people to do builds, or do their own tests, or be able to do work without having to open up 20 different tickets, or not being able to do their own deploys. She just recoils from this, spending five days watching people do code merges, and for me, I'm hoping that after people read the book, they will see this all around them and, hopefully, have a similar kind of recoiling reaction where they say, "Oh my gosh, this is terrible. I should feel as bad about this as Maxine does, and then maybe even find my fellow rebels and see if we can create a pocket of greatness that can become like the sublimation event in Dr. Thomas Kuhn's book, The Structure of Scientific Revolutions." Create that kernel of greatness, which then finds itself surrounded by even more greatness. Corey Quinn: What I always found to be fascinating about your work is how you wind up tying so many different concepts together in ways you wouldn't necessarily expect. For example, when I was reviewing one of your manuscripts before this went to print, you did reject one of my suggestions, which was to just retitle the entire thing. Instead of calling it The Unicorn Project.
Instead, call it Gene Kim's Love Letter to Functional Programming. So what is up with that? Gene Kim: Yeah, to put that into context, for 25 years or more, I've self-identified as an ops person. The Phoenix Project was really an ops book. And that was despite getting my graduate degree in compiler design and high-speed networking in 1995. And the reason why I gravitated towards ops was because that was my observation: that that's where the saves were made. It was ops who saved the customer from horrendous, terrible developers who just kept on putting things into production that would then blow up and take everyone with it. It was ops protecting us from the bad adversaries who were trying to steal data because security people were so ineffective. But four years ago, I learned a functional programming language called Clojure and, without a doubt, it reintroduced the joy of coding back into my life, and now, in a good month, I spend half the time—in the ideal—writing, half the time hanging out with the best in the game, of which I would consider this to be a part, and then 20% of the time coding. And I find, for the first time in my career, in over 30 years of coding, I can write something for years on end without it collapsing in on itself like a house of cards. And that is an amazing feeling: to say that maybe it wasn't my inability, or my lack of experience, or my lack of sensibilities, but maybe it was just that I was sort of using the wrong tool to think with. That comes from the French anthropologist Claude Lévi-Strauss. He said of certain things, "Is it a good tool to think with?" And I just find functional programming is such a better tool to think with—notions like composability, like immutability. What I find so exciting is that these things aren't just for programming languages. Some other programming languages that follow in the same vein are OCaml, Lisp, ML, Elixir, and Haskell.
These all languages that are sort of popularizing functional programming, but what I find so exciting is that we see it in infrastructure and operations, too. So Docker is fundamentally immutable. So if you want to change a container, we have to make a new one. Kubernetes composes these containers together at the level of system of systems. Kafka is amazing because it usually reveals the desire to have this immutable data model where you can't change the past. Version control is immutable. So, I think it's no surprise that as our systems get more and more complex and distributed, we're relying on things like immutability, just to make it so that we can reason about them. So, it is something I love addressing in the book, and it's something I decided to double down on after you mentioned it. I'm just saying, all kidding aside is this a book for—Corey Quinn: Oh good, I got to make it worse. Always excited when that happens.Gene Kim: Yeah, I mean, your suggestion really brought to the forefront a very critical decision, which was, is this a book for technology leaders, or even business leaders, or is this a book developers? And, after a lot of soul searching, I decided no, this is a book for developers, because I think the sensibilities that we need to instill and the awareness we need to create these things around are the developers and then you just hope and pray that the book will be good enough that if enough engineers like it, then engineering leaders will like it. And if enough engineering leaders like it, then maybe some business leaders will read it as well. So that's something I'm eagerly seeing what will happen as the weeks, months, and years go by. Corey Quinn: This episode is sponsored in part by DataStax. The NoSQL event of the year is DataStax Accelerate in San Diego this May from the 11th through the 13th. I've given a talk previously called the myth of multi-cloud, and it's time for me to revisit that with... A sequel! 
Which is funny given that it's a NoSQL conference, but there you have it. To learn more, visit datastax.com, that's D-A-T-A-S-T-A-X.com, and I hope to see you in San Diego this May. Corey Quinn: One thing that I always admired about your writing is that you can start off trying to make a point about one particular aspect of things, and along the way you tie in so many different things, and the functional programming is just one aspect of this. At some point, by the end of it, I half expected you to just pick a fight over vi versus Emacs, just for the sheer joy you get in effectively drawing interesting and, I guess, shall we say, the right level of conflict into it, where it seems very clear that what you're talking about is something that has the potential to be transformative, and by throwing things like that in, you're, on some level, roping people in who otherwise wouldn't weigh in at all. But it's really neat to watch: once you have people's attention, just almost in spite of what they want, you teach them something. I don't know if that's a fair accusation or not, but I'm very much left with the sense that what you're doing has definite impact and reverberations throughout larger industries. Gene Kim: Yeah, I hope so. In fact, just to reveal this kind of insecurity: there's an author I've read a lot of, and I actually read this blog post that she wrote about the worst novel to write, and she called it The Yeoman's Tour of the Starship Enterprise.
And she says, "The book begins like this: there's a Yeoman on the Starship Enterprise, and all he does is admire the dilithium crystals, and the phaser, and talk about the specifications of the engine room." And I sometimes worry that that's what I've done in The Unicorn Project, but hopefully—I did want to have that technical detail there and share some things that I love about technology and the things I hate about technology, like YAML files, and integrate that into the narrative, because I think it is important. And I would like to think that people reading it appreciate things like our mutual distaste for YAML files, that we've all struggled trying to escape spaces in file names inside of makefiles. I mean, these are the things that are puzzles we have to solve, but they're so far removed from the business problem we're trying to solve that really, the purpose of that was trying to show the mistake of solving puzzles in our daily work instead of solving real problems. Corey Quinn: One thing that I found was really a one-two punch, for me at least, was that first I read and gave feedback on the book, and then relatively quickly thereafter, I found myself at my first DevOps Enterprise Summit. And I feel like, on some level, I may have been misinterpreted when I was doing my live-tweeting/shitposting-with-style during a lot of the opening keynotes and the rest, where I was focusing on how different of a conference it was. Unlike a typical DevOps Days or big cloud event, it wasn't a whole bunch of relatively recent software startups. There were serious institutions coming out to have conversations. We're talking USAA, we're talking the US Air Force, we're talking large banks, we're talking companies that have a 200-year history, where you don't get to just throw everything away and start over.
These are companies that by and large, have, in many ways, felt excluded to some extent, from the modern discussions of, well, we're going to write some stuff late at night, and by the following morning, it's in production. You don't get to do that when you're a 200-year-old insurance company. And I feel like that was on some level interpreted as me making fun of startups for quote/unquote, “not being serious,” which was never my intention. It's just this was a different conversation series for a different audience who has vastly different constraints. And I found it incredibly compelling and I intend to go back.Gene Kim: Well, that's wonderful. And, in fact, we have plans for you, Mr. Quinn.Corey Quinn: Uh-oh.Gene Kim: Yeah. I think when I say I admire the DevOps Enterprise community. I mean that I'm just so many different dimensions. The fact that these, leaders and—it's not leaders just in terms of seniority on the organization chart—these are people who are leading technology efforts to survive and win in the marketplace. In organizations that have been around sometimes for centuries, Barclays Bank was founded in the year 1634. That predates the invention of paper cash. HMRC, the UK version of the IRS was founded in the year 1200. And, so there's probably no code that goes that far back, but there's certainly values and—Corey Quinn: Well, you'd like to hope not. Gene Kim: Yeah, right. You never know. But there are certainly values and traditions and maybe even processes that go back centuries. And so that's what's helped these organizations be successful. And here are a next generation of leaders, trying to make sure that these organizations see another century of greatness. So I think that's, in my mind, deeply admirable.Corey Quinn: Very much so. 
And my only concern was, I was just hoping that people didn't misinterpret my snark and sarcasm as aimed at, "Oh, look at these crappy—these companies are real companies and all those crappy SaaS companies are just flashes in the pan." No, I don't believe that members of the Fortune 500 are flash-in-the-pan companies, with a couple of notable exceptions who I will not name now, because I might want some of them on this podcast someday. The concern that I have is that everyone's work is valuable. Everyone's work is important. And what I'm seeing historically, and something that you've nailed, is a certain lack of stories that apply to some of those organizations that are, for lack of a better term, ossified into their current process model, where there's no clear path for them to break into, quote/unquote, "doing the DevOps." Gene Kim: Yeah. And the business frame and the imperative for it is incredible. Tesla is now offering auto insurance bundled into the car. Banks are now having to compete with Apple. I mean, it is just breathtaking to see how competitive the marketplace is, and the need to understand the customer and deliver value to them quickly, and to be able to experiment and innovate and out-innovate the competition. I don't think there's any business leader on the planet who doesn't understand that software is eating the world, and that any level of investment they make involves software at some level. And so the question for them is, how do they get educated enough to invest and manage and lead competently? So, to me it really is like the sleeping giant awakening. And it's my genuine belief about the next 50 years: as much value as the tech giants have created (Facebook, Amazon, Netflix, Google, Microsoft have generated trillions of dollars of economic value), when we can get eighteen million developers as productive as an engineer at a tech giant is, that will generate tens of trillions of dollars of economic value per year.
And so, when you generate that much economic activity, all problems become solvable: you look at climate change, you take a look at the disparity between rich and poor. All things can be fixed when you significantly change the economy in this way. So, I'm extremely hopeful, and I know that the need for things like DevOps is urgent and important. Corey Quinn: I guess that that's probably the best way of framing this. So you wrote one version that was aimed at operators back in 2013, this one was aimed at developers, and it effectively retells and clarifies an awful lot of the same points. As a historical ops person, I didn't feel left behind by The Unicorn Project, despite not being its target market. So I guess the question on everyone's mind is: are you planning on doing a third iteration, and if so, for what demographic? Gene Kim: Yeah, nothing at this point, but there is one thing that I'm interested in, which is the role of business leaders. And Sarah is an interesting villain. One of my favorite pieces of feedback during the review process was, "I didn't think I could ever hate Sarah more. And yet, I did find her even to be more loathsome than before." She's actually based on a real person, someone that I worked with. Corey Quinn: That's the best part: these characters are relatable enough that everyone can map people they know onto various aspects of them, but can't ever disclose the entire list in public because that apparently has career consequences. Gene Kim: That's right. Yes, I will not say who the character is based on, but in the last scene of the book that went to print, Sarah has an interesting interaction with Maxine, where they meet for lunch. And I think the line was, "And it wasn't what Maxine had thought, and she's actually looking forward to the next meeting." I think that leaves room for it. So one of the things I want to do with some friends and colleagues is just understand: why does Sarah act the way she does?
I think we've all worked with someone like her. And there are some that are genuinely bad actors, but I think a lot of them are doing something based on genuine, real motives. And it would be fun, I thought, to do something with Elizabeth Henderson, with whom we decided to start having a conversation: what does she read? What is her background? What is she good at? What does her resume look like? And what caused her to—who in technology treated her so badly that she treats technology so badly? And why does she behave the way she does? And so I think she reads a lot of strategy books. I think she is not a great people manager. I think she maybe has come from the mergers and acquisitions route that views people as fungible. And yeah, I think she is definitely a creature of economics, who was lured by an external investor with how good it can be if you can extract value out of the company, squeeze every bit of—sweat every asset and sell the company for parts. So I would just love to have a better understanding of, when people say they work with someone like a Sarah, is there a commonality to that? And can we better understand Sarah so that we can both work with her and also compete better against her in our own organizations? Corey Quinn: I think that's probably a question best left for people to figure out on their own, in a circumstance where I can't possibly be blamed for it. Gene Kim: [laughing] That can be arranged, Mr. Quinn. Corey Quinn: All right. Well, if people want to learn more about your thoughts, ideas, and feelings around these things, or of course to buy the book, where can they find you? Gene Kim: If you're interested in the ideas that are in The Unicorn Project, I would point you to all of the freely available videos on YouTube. Just Google "DevOps Enterprise Summit"; anything that's on the plenary stage is a specifically chosen story that very much informed The Unicorn Project. And the best way to reach me is probably on Twitter.
I'm @RealGeneKim on Twitter, and feel free to just @ mention me, or DM me. Happy to be reached out in whatever way you can find me. Corey Quinn: You know where the hate mail goes then. Gene, thank you so much for taking the time to speak with me, I appreciate it.Gene Kim: And Corey, likewise, and again, thank you so much for your unflinching feedback on the book and I hope you see your fingerprints all over it and I'm just so delighted with the way it came out. So thanks to you, Corey. Corey Quinn: As soon as my signed copy shows up, you'll be the first to know.Gene Kim: Consider it done. Corey Quinn: Excellent, excellent. That's the trick, is to ask people for something in a scenario in which they cannot possibly say no. Gene Kim, multiple award-winning CTO, researcher, and author. Pick up his new book, The Wall Street Journal best-selling The Unicorn Project. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple Podcasts. If you hated this podcast, please leave a five-star review on Apple Podcasts and leave a compelling comment.Announcer: This has been this week's episode of Screaming in the Cloud. You can also find more Corey at ScreamingintheCloud.com or wherever fine snark is sold.This has been a HumblePod production. Stay humble.
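The immutability theme Gene describes in this episode (new Docker images instead of modified containers, Kafka's append-only topics, version control that never rewrites the past) can be sketched in a few lines of code. The example below is a hypothetical illustration only, not anything from the episode or the book: the Account type and the event names are invented for the sketch. The point is that state is never mutated in place; each event in an append-only log produces a new value, so the past stays intact.

```python
# A minimal sketch of working with immutable values: updates create new
# values, and current state is derived by folding an append-only event log.
from dataclasses import dataclass, replace
from functools import reduce

@dataclass(frozen=True)  # frozen=True makes instances immutable
class Account:
    owner: str
    balance: int

def apply_event(account: Account, event: tuple) -> Account:
    """Return a NEW Account; the original is never modified."""
    kind, amount = event
    if kind == "deposit":
        return replace(account, balance=account.balance + amount)
    if kind == "withdraw":
        return replace(account, balance=account.balance - amount)
    raise ValueError(f"unknown event: {kind}")

# An append-only event log, in the spirit of Kafka's immutable topics.
events = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]

initial = Account(owner="maxine", balance=0)
final = reduce(apply_event, events, initial)

print(initial.balance)  # the past is unchanged: 0
print(final.balance)    # current state derived from the log: 75
```

Because `initial` is never touched, any earlier state can be reconstructed by replaying a prefix of the log, which is the same property that makes immutable infrastructure and event logs easy to reason about.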

Screaming in the Cloud
Holiday Replay Edition - Burnout Isn't a Sign of Weakness with Dr. Christina Maslach, PhD
Dec 27, 2022 (33:50)

About Christina
Christina Maslach, PhD, is a Professor of Psychology (Emerita) and a researcher at the Healthy Workplaces Center at the University of California, Berkeley. She received her A.B. from Harvard and her Ph.D. from Stanford. She is best known as the pioneering researcher on job burnout, producing the standard assessment tool (the Maslach Burnout Inventory, MBI), books, and award-winning articles. The impact of her work is reflected in the official recognition of burnout, as an occupational phenomenon with health consequences, by the World Health Organization in 2019. In 2020, she received the award for Scientific Reviewing, for her writing on burnout, from the National Academy of Sciences. Among her other honors are: Fellow of the American Association for the Advancement of Science (1991, "For groundbreaking work on the application of social psychology to contemporary problems"), Professor of the Year (1997), and the 2017 Application of Personality and Social Psychology Award (for her research career on job burnout).

Links:
The Truth About Burnout: https://www.amazon.com/Truth-About-Burnout-Organizations-Personal/dp/1118692136

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup.
It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it. My thanks to Google Cloud for sponsoring this ridiculous podcast.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One subject that I haven't covered in much depth on this show has been a repeated request from the audience, and that is to talk a bit about burnout. So, when I asked the audience who I should talk to about burnout, there were really two categories of responses. The first was, “Pick me. I hate my job, and I'd love to talk about that.” And the other was, “You should speak to Professor Maslach.” Christina Maslach is a Professor of Psychology at Berkeley. She's a teacher and a researcher, particularly in the area of burnout. Professor, welcome to the show.Dr. Maslach: Well, thank you for inviting me.Corey: So, I'm going to assume from the outset that the reason that people suggest that I speak to you about burnout is because you've devoted a significant portion of your career to studying the phenomenon, and not just because you hate your job and are ready to go do something else. 
Is that directionally correct?

Dr. Maslach: That is directionally correct, yes. I first stumbled upon the phenomenon back in the 1970s—which is, you know, 45, almost 50 years ago now—and have been fascinated with trying to understand what is going on.

Corey: So, let's start at the very beginning, because I'm not sure, in, I guess, the layperson context that I use the term, that I fully understand it. What is burnout?

Dr. Maslach: Well, burnout as we have been studying it over many years, it's a stress phenomenon, okay, it's a response to stressors, but it's not just the exhaustion of stress. That's one component of it, but it actually has two other components that go along with it. One is this very negative, cynical, hostile attitude toward the job and the other people in it, you know, "Take this job and shove it," kind of feeling. And usually, people don't begin their job like that, but that's where they go if they become more burned out.

Corey: I believe you may have just inadvertently called out a decent proportion of the tech sector.

Dr. Maslach: [laugh].

Corey: Or at least, that might just be my internal cynicism rising to the foreground.

Dr. Maslach: No, it's not. Actually, I have heard from a number of tech people over the past decades about just this kind of issue. And so I think it's particularly relevant. The third component that we see going along with this, it usually comes in a little bit later, but I've heard a lot about this from tech people as well, and that is that you begin to develop a very negative sense of your own self, and competence, and where you're going, and what you're able to do.

So, the stress response of exhaustion, the negative cynicism towards the job, the negative evaluation of yourself: that's the trifecta of burnout.

Corey: You've spent a lot of your early research, at least, focusing on, I guess, occupations that you could almost refer to as industrial, in some respects: working with heavy equipment, working with a variety of different professionals in very stressful situations. It feels weird, on some level, to say, "Oh, yeah, my job is very stressful. In that vein, I have to sit in front of a computer all day, and sometimes I have to hop on a meeting with people." And it feels, on some level, like even saying, "I'm experiencing burnout," in my role is a bit of an overreach.

Dr. Maslach: Yeah, that's an interesting point because, in fact, yes, when we think about OSHA, you know, and occupational risks and hazards, we do think about the chemicals, and the big equipment, and the hazards, so having more psychological and social risk factors is something that probably a lot of people don't resonate to immediately, and they think, well, if you're strong, and if you're resilient, and whatever, anybody can handle that, and that's really a test, almost, of your ability to do your work. But what we're finding is that it has its own hazards, psychological and social as well. And so, burnout is something that we've seen in a lot of the more people-oriented professions from the beginning. Healthcare has had this for a long time. Various kinds of social services, teaching, all of these other things. So, it's actually not a sign of weakness as some people might think.

Corey: Right. And that's part of the challenge and, honestly, one of the reasons that I've stayed away from having in-depth discussions about the topic of burnout on the show previously: it feels that—rightly or wrongly, and I appreciate your feedback on this one either way—it feels like it's approaching the limits of what could be classified as mental health.
And I can give terrible advice on how computers work—in fact, I do on a regular basis; it's kind of my thing—and that's usually not going to have any lasting impact on people who don't see through the humor part of that. But when we start talking about mental health, I'm cautious, because it feels like an inadvertent story or advice that works for some but not all has the potential to do a tremendous bit of damage, and I'm very cautious about that. Is burnout a mental health issue? Is it a medical issue that is recognized? Where does it start and where does it stop on that spectrum?

Dr. Maslach: It is not a medical issue—and the World Health Organization, which just came out with a statement about this in 2019 on burnout, recognizing it as an occupational risk factor, made it very clear that this is not a medical thing. It is not a medical disease, it doesn't have a certain set of medical diagnoses, although people tend to sometimes go there. Can it have physical health outcomes? In other words, if you're burning out and you're not sleeping well, and you're not eating well, and not taking care of yourself, do you begin to impair your physical health down the road? Yes.

Could it also have mental health outcomes, that you begin to feel depressed, and anxious, and not knowing what to do, and afraid of the future? Yes, it could have those outcomes as well. So, it certainly is kind of like—I can put it this way—a stepping stone on a path to potential negative health: physical health, or mental health issues. And I think that's one of the reasons why it is so important.

But unfortunately, a lot of people still view it as: somebody who's burned out isn't tough enough, strong enough; they're wimpy, they're not good enough, they're not a hundred percent.

And so there's the stigma that is often attached to burnout—people not only indulge it, but they feel it directed towards them, and often they will try to hide the kinds of experiences they're having because they worry that they are going to be judged negatively, thrown under the bus, you know, let go from the job, whatever, if they talk about what's actually happening with them.

Corey: What do you see, as you look around, I guess, the wide varieties of careers that are susceptible to burnout—which I have a sneaking suspicion, based upon what you've said, rounds to all of them—what do you think are the most misunderstood aspects of burnout?

Dr. Maslach: I think what's most misunderstood is that people assume that it is a problem of the individual person. And if somebody is burned out, then they've got to just take care of themselves, or take a break, or eat better, or get more sleep, all of those kinds of things which cope with stressors. What's not as well understood or focused on is the fact that this is a response to other stressors, and these stressors are often in the workplace—this is where I've been studying it—but essentially in the larger social, physical environment that people are functioning in. They're not burning out all by themselves.

There's a reason why they are feeling the kind of exhaustion, developing that cynicism, beginning to doubt themselves, that we see with burnout. So, if you ever want to talk about preventing burnout, you really have to be focusing on what are the various kinds of things that seem to be causing the problem, and how do we modify those? Coping with stressors is a good thing, but it doesn't change the stressors.
And so we really have to look at that, as well as what people can bring about, you know, taking care of themselves or trying to do the job better or differently.

Corey: I feel like it's impossible to have a conversation like this without acknowledging the background of the past year that many of us have spent basically isolated, working from home. And for some folks, okay, they were working from home before, but it feels different now. At least that's the position I find myself in. Other folks are used to going into an office and now they're either isolated—and research shows that it has been worse, statistically, for single people versus married people, but married people are also trapped at home with their spouse, which sounds half-joking but it is very real. At some point, distance is useful.

And it feels like everyone is sort of a bit at their wit's end. It feels like things are closer to being frayed; there's a constant sense that there's this, I guess, pervasive dread for the past year. Are you seeing that that has a potential to affect how burnout is being expressed or perceived?

Dr. Maslach: I think it has, and one of the things that we clearly see is that people are using the word burnout more and more and more and more. It's almost becoming the word du jour, and people are using it to describe: things are going wrong and it's not good. And it may be overstretching the use of burnout, but I think the reason for the popularity of the term is that it has this kind of very vivid imagery of things going up in smoke, and can't handle it, and flames licking at your heels, and all this sort of stuff. I even got a comment from a colleague in France just a few days ago, where they're talking about, "Is burnout the malady of the century?" you know, kind of thing.

And it's being used a lot; it's sometimes maybe overused, but I think it's also striking a chord with people as a sign that things are going badly, and I don't know how to deal with it in some way.

Corey: It also feels, on some level, for those of us who are trapped inside, kind of almost like a tremendous expression of privilege, because who am I to have a problem with this? Oh, I have to go inside and order a lot of takeout and spend time with my family. And I look at how folks who are nowhere near as privileged have to go and be essential workers and show up in increasingly dangerous positions. And it almost feels like burnout isn't something that I'm entitled to, if that makes sense.

Dr. Maslach: [laugh]. Yeah. It's an interesting description of that, because I think there are ways in which people are looking at their experience and dealing with it, and like many things in life, I find that all of these things are a bit of a double-edged sword; there are positive and negative aspects to them. And so when I've talked with some people about now having to work from home rather than working in their office, they're also bringing up, "Well, hey, I've noticed that the interviews I'm doing with potential clients are actually going a little better"—you know, this is from a law office—"And trying to figure out how—are we doing it differently so that people can actually relate to each other as human beings instead of the suit and tie in the big office? What's going on in terms of how we're doing the work such that there may actually be a benefit here?"

For others, it's been, "Oh, my gosh. I don't have to commute, but endless meetings, and people are thinking I'm not doing my job, and I don't know how to get in touch, and how do we work together effectively?" And so there are other things that are much more difficult, in some sense.
I think another thing that you have to keep in mind is that it's not just about how you're doing your work, perhaps differently, or under different circumstances: so many people have lost their jobs, and are worried that they may lose their jobs.

We're actually finding that people are going into overdrive and working harder and more hours as a way of trying to protect against being the next one who won't have any income at all. So, there's a lot of other dynamics that are going on as a result of the pandemic, I think, that we need to be aware of.

Corey: One thing that I'd like to point out is that you are a Professor Emerita of Psychology at Berkeley, which means you presumably wound up formulating this based upon significant bodies of peer-reviewed research, as opposed to just coming up with a thesis, stating it as if it were fact, and then writing an entire series of books on it. I mean, that path, I believe, is called being a venture capitalist, but I may be mistaken on that front. How do you effectively study something like burnout? It feels like it is so subjective and situation-specific, but it has to have a normalization aspect to it.

Dr. Maslach: Uh, yeah, that's a good point. I think, in fact, the first time I ever wrote about some of the stuff that I was learning about burnout was back in the mid '70s—I think it was '75, '76 maybe—and it was in a magazine, it wasn't in a journal. It wasn't peer-reviewed, because not even peer-reviewed journals would review this; they thought it was pop psychology, and eh. So, I would get, in those days, snail mail by the sackful from people saying, "Oh, my God. I didn't know anybody else felt like this. Let me tell you my story."

You know, kind of thing. And so, after doing a lot of interviews with people, following them on the job when possible to, sort of, see how things were going, and then writing about the basic themes that were coming out of this, it turned out that there were a lot of people who responded and said, "I know that. I've been there. I'm experiencing it." Even though each of them was sort of thinking, "I'm the only one. What's wrong with me? Everybody else seems fine."

And so part of the research, in trying to get it out in whatever form you can, is trying to share it, because that gives you feedback from a wide variety of people: not only the peers reviewing the quality of the research, but the people who are actually trying to figure out how to deal effectively with this problem. So it's, how do I and my colleagues actually have a bigger, broader conversation with people, from which we learn a lot, and then try and say, okay, here's everything we've heard, and let's throw it back out and share it and see what people think.

Corey: You have written several books on the topic, if I'm not mistaken. And one thing that surprises me is how much what you talk about in those books seems to almost transcend time. I believe your first was published in 1982—

Dr. Maslach: Right.

Corey: —if I'm not mistaken—

Dr. Maslach: Yes.

Corey: —and an awful lot of what it talks about still feels very much like it could be written today. Is this just part of the quintessential human experience? Or has nothing new changed in the last 200 years since the Industrial Revolution? How is it progressing, if at all, and what does the future look like?

Dr. Maslach: Great questions, and I don't have a good answer for you. But we have sort of struggled with this, because if you look at older literature, if you even go back centuries, if you even go back to parts of the Bible or something, you're seeing phrases and descriptions that sound a lot like burnout, although we're not using that term.
So, it's not something that I think just somehow got invented; it wasn't invented in the '70s or anything like that. But trying to trace back those roots and get a better sense of what we are capturing here is fascinating, and I think we're still working on it.

People have asked, well, where did the term 'burnout'—as opposed to other kinds of terms—come from? And it's been around for a while, again, before the '70s or something. I mean, we have Graham Greene writing the novel A Burnt-Out Case back in the early '60s. My dad was an engineer, rarefied gas dynamics, so he was involved with the space program, and engineers talk about burnout all the time: ball bearings burn out, rocket boosters burn out. And when they started developing Silicon Valley, all those little startups and enterprises advertised as burnout shops. And that was, you know, the '60s, into the '70s, et cetera, et cetera. So, the more modern roots, I think, probably have some ties to that use of the term before I and other researchers even got started with it.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: This is one of those questions that is incredibly self-serving, and I refuse to apologize for it. How can I tell whether I'm suffering from burnout, versus I'm just a jerk with an absolutely terrible attitude? And that is not as facetious a question as it probably sounds like.

Dr. Maslach: [laugh]. Yeah. Well, part of the problem for me—or the challenge for me—is to understand what it is people need to know about themselves.

Can I take a diagnostic test which tells me if I am burned out or if I'm something else?

The more important question is, what is feeling right, and what is not feeling so good—or even wrong—about my experience? And usually, you can't figure that all out by yourself, and you need to get input from other people. And it could be a counselor or therapist, or it could be friends or colleagues with whom you can get to a point where you can talk about it, and hear each other, and get some feedback without putdowns; just sort of say, "Yeah, have you ever thought about the fact that when you get this kind of a task, you usually just go crazy for a while and not really settle down and figure out what you really need to do as opposed to what you think you have to do?" Part of this is what you bring to it in terms of the stress response, but what is it that you're not doing—or that you're doing not well—to figure out solutions, to get help or advice or better input from others? So, it takes time, but it really does take a lot of that kind of social feedback.

So, if I can stay with it a little bit more: when I first was writing and publishing about this, and all these people were writing back saying, "I thought I was the only one," that phenomenon of putting on a happy face and not letting anybody else see that you're going through some difficult challenges, or feeling bad, or depressed, or whatever, is something we call pluralistic ignorance; it means we don't have good knowledge about what is normal, or what is being shared, or how other people are, because we're all pretending to put on the happy face, to pretend and make sure that everybody thinks we're okay and is not going to come after us.
But if we all do that, then together we are creating a different social reality that people perceive, rather than what is actually happening behind that mask.

Corey: It feels, on some level, like this is an aspect of the social media problem, where we're comparing our actual lives, and all the bloopers that we see, to other people's highlight reels, because few people wind up talking very publicly about their failures.

Dr. Maslach: Oh, yeah. Yeah. And often for good reason, because they know they will be attacked and dumped on. And there could be some serious consequences, and you just say, "I'm going to figure out what I'm going to do on my own."

But one of the things, when I work with people and I'm asking them, "What do you think would help? What sort of things that don't happen could happen?" and so forth, one of the things that goes to the top of the list is having somebody else: a safe relationship, a safe place where we can talk, where we can unburden, where you're not going to spill the beans to everybody else, and you're getting advice, or you're getting a pat on the back, or a shoulder to cry on, and you're there for them for the same kind of reason. So, it's a different form of what we think of as a social network. It used to be that a network like that meant that you had other people—whether family, friends, neighbors, colleagues, whoever—that you knew you could go to; a mentor, an advisor, a trusted ally, and that you would perform that role for them and other people as well.

And what has happened, I think, to add to the emphasis on burnout these days, is that those social connections, that trust between people, has really been shredding—or getting cut off or broken apart. And so people are feeling isolated, even if they're surrounded by a lot of other people; they don't want to raise their hand, don't want to say, "Can we talk over coffee? I'm really having a bad day.

I need some help to figure out this problem." And so when one of the most valuable resources that human beings need—which is other people—is lacking, when we're working in environments where that gets pulled apart and shredded, that's a real risk factor for burnout.

Corey: What are the things that contribute to burnout? It doesn't feel, based upon what you've said so far, that it's one particular thing. There have to be points of commonality between all of this, I have to imagine.

Dr. Maslach: Yeah.

Corey: Is it possible to predict that, oh, this is a scenario in which either I or people who are in this role are likely to become burned out faster?

Dr. Maslach: Mm-hm. Yeah. Good question, and I don't know if we have a final answer, but at this point, in terms of all the research that's been done—not just on burnout, but on much larger issues of health, and wellbeing, and stress, and coping, and all the rest of it—there are clearly six areas in which the fit between people and their job environment is critical. And if the fit—or the match, or the balance—is better, they are going to be at less risk for burnout; they're more likely to be engaged with work.

But if some real bad fits, or mismatches, occur in one or more of these areas, that raises the risk factor for burnout. So, if I can just mention those six quickly. And these are not in any particular order, because I find that people assume the first one is the worst or the best, and it's not. At any rate, one of them has to do with that social environment I was just talking about; think of it as the workplace community. All the people whose paths you cross at various points—you know, coworkers, the people you supervise, your bosses, et cetera—so those social relationships, that culture. Do you have a supportive environment which really helps people thrive? Can you trust people? Is there respect, and all that kind of thing going on?
Or is it really what people are now describing as a socially toxic work environment?

A second area has to do with reward. And it turns out it's not so much salary and benefits; it's more about social recognition and the intrinsic reward you get from doing a good job. So, if you work hard, do some special things, and nothing positive happens—nobody even pats you on the back, nobody says, "Gee, why don't you try this new project? I think you're really good at it," anything that acknowledges what you've done—it's a very difficult environment to work in. When I ask people who are more at risk of burnout, "What is a good day for you? A good day. A really good day." the answer is often, "Nothing bad happens." But it's not the presence of good stuff happening, like people being glad that you did such good work, or something like that.

A third area has to do with values—and this is one that also often gets ignored, but sometimes this is the critical bottom line—that you're doing work that you think is meaningful, where your workplace has integrity, and you're in line with that, as opposed to value conflicts where you're doing things that you think are wrong: "I want to help people, I want to help cure patients, and here, I'm actually only supposed to be trying to help the hospital get more money." When they have that kind of value conflict, this is often where people have to say, "I don't want to sell my soul, and I'm leaving."

The fourth area is one of fairness. And this is really about whether the policies, the principles, et cetera, are administered fairly. When things are going badly here—the mismatch—this is where discrimination lives, this is where glass ceilings are going on: people are not being treated fairly in terms of the work they do, how they're promoted, or all of those kinds of things.

So, that interpersonal respect and, sort of, social justice is missing.

The next two areas—the fifth and sixth—are probably the two that have been the most well-known for a long time. One has to do with workload and how manageable it is. Given the demands that you have, do you have sufficient resources—like time, and tools, and whatever other kind of team support you need—to get the job done? And control is about the amount of autonomy and the opportunities you have to perhaps improvise, or innovate, or correct, or figure out how to do the job better in some way.

So, when people are having mismatches—work overload; a lack of control, where you cannot improvise; unfairness; values that are just incompatible with what you believe is right, a sort of moral issue; not getting any kind of positive feedback, even when it's deserved, for the kind of work you're doing; and working in socially toxic relationships where you can't trust people, you don't know who to turn to, and people are having unresolved conflicts all the time—those six areas are the markers, really, of risk factors for burnout.

Corey: I know that I'm looking back through my own career history listening to you recount those and thinking, "Oh, maybe I wasn't just a terrible employee in every one of those situations."

Dr. Maslach: Exactly.

Corey: I'm sure a lot of it did come from me, I want to be very clear here. But there's also that aspect of this that might not just be a 'me' problem.

Dr. Maslach: Yeah. That's a good way of putting it. In some sense, it's really more of a 'we' problem than a 'me' problem.
Because again, you're not working in isolation, and the reciprocal relationships you have with other people, and other policies, and other things that are happening in whatever workplace that is, are creating a kind of larger environment in which you and many others are functioning.

And we've seen instances where people begin to make changes in that environment—how do we do this differently? How can we do this better? Let's try it out for a while and see if this can work—and using those six areas, the value is not just saying, "Oh, it's really in bad shape. We have huge unfairness issues," but then saying, "It would be better if we could figure out a way to get rid of that fairness problem, or to make a modification so that we have a more fair process." So, they're like guideposts as well.

As people start thinking through these six areas, you can sort of say, "What's working well in terms of workload, and what's working badly? Where do we run into problems on control? How do we improve the social relationships between colleagues who have to work together on a team?" They're not just markers of what's gone wrong; if you flip it around and look at the other end, they're a path to getting better, to making it right.

Corey: If people want to learn more about burnout in general, and your work in it specifically, where can they go to find your work and learn more about what you have to say?

Dr. Maslach: Obviously, there have been a lot of articles, and now lots of things on the web, and past books that I've written. And as you said, in many ways, they are still pretty relevant. The Truth About Burnout came out, oh gosh, in '97.

So, that's 25 years ago and it still works.

But my colleague Michael Leiter from Canada and I have just written a new manuscript for a new book in which we really are trying to focus on sharing everything we have learned—you know, what burnout has taught us—and put that into the format of a book that will allow people to take what we've learned and figure out, how does this apply? How can this be customized to our situation? So, I'm hoping that that will be coming out within the next year.

Corey: And you are, of course, welcome back to discuss your book when it releases.

Dr. Maslach: I would be honored if you would have me back. That would be a wonderful treat.

Corey: Absolutely. But in return, I do expect a pre-release copy of the manuscript, so I have something intelligent to talk about.

Dr. Maslach: [laugh]. Of course, of course.

Corey: Thank you so much for your time. I really appreciate it.

Dr. Maslach: Well, thank you for having me. I appreciate the opportunity to share this, especially during these times.

Corey: Indeed. Professor Christina Maslach, Professor Emerita of Psychology at Berkeley. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment telling me why you're burned out on this show.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Holiday Replay Edition - Continuous Integration and Continuous Delivery Made Easy with Rob Zuber

Dec 22, 2022 • 38:53


About Rob

Rob Zuber is a 20-year veteran of software startups; a four-time founder, three-time CTO. Since joining CircleCI, Rob has seen the company through its Series B, Series C, and Series D funding and delivered on product innovation at scale. Rob leads a team of 150+ engineers who are distributed around the globe.

Prior to CircleCI, Rob was the CTO and Co-founder of Distiller, a continuous integration and deployment platform for mobile applications acquired by CircleCI in 2014. Before that, he cofounded Copious, an online social marketplace. Rob was the CTO and Co-founder of Yoohoot, a technology company that enabled local businesses to connect with nearby consumers, which was acquired by Appconomy in 2011.

Links:
- Twitter: @z00b
- LinkedIn URL: https://www.linkedin.com/in/robzuber/
- Personal site: https://www.crunchbase.com/person/rob-zuber#section-overview
- Company site: www.circleci.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host cloud economist, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup. It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it. My thanks to Google Cloud for sponsoring this ridiculous podcast.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Rob Zuber, CTO of CircleCI. Rob, welcome to the show.

Rob: Thanks. Thanks for having me. It's great to be here.

Corey: It really is, isn't it? So you've been doing the CTO dance, for lack of a better term, at CircleCI for about five, six years now at this point?

Rob: Yeah, that's right. I joined five and a half years ago. I actually came in through an acquisition. We were building a CI/CD platform for mobile, iOS specifically, and there were just a few of us. I came in an engineering role, but within, I think, a year, had taken over the CTO role and have been doing that since.

Corey: For those of us who've been living under a rock and recording podcasts, CI/CD, or Continuous Integration/Continuous Delivery, has gone through a bit of, shall we say, evolution since the term first showed up. My first exposure to it many moons ago was back when Jenkins was still called Hudson, and it was the box that you ran that would wait for some event to happen—whether it was the passing of time, a commit to a particular branch, or someone clicking a button—and then it would run a series of scripts, which sort of lent itself to the Hacker News anthem, "That doesn't look hard. I can build that in a weekend."
Now, we've seen a bit of growth in that space of not just, I guess, the systems you can run yourselves, but also a lot of the SaaS offerings around this. That's, I guess, the moron's journey from my perspective through CI/CD. That's almost certainly lacking nuance. What is it, I guess, in the real world with adults talking about it?

Rob: Yeah, so I think it's a good perspective, or it's a good description of the perspective that many people have. Many people enter into this feeling that way. I think, specifically when you talk about cloud providers in CircleCI, we do have an on-prem offering behind the firewall. No one really runs anything on-prem anymore, but we have an offering for that market. The real leverage, though, is for folks that can use our multi-tenant SaaS cloud offering. Because, ultimately, it's true: many people start with something simple from a codebase perspective, right? I'm starting out, I've got a small team. We have a pretty simple project, maybe a little Ruby on Rails monolith, something like that. Actually, that was more the case back at the start of CircleCI. Probably not too many people kick off the Rails monolith these days, because if you're not using Kubernetes and Docker, then you're probably not doing it right.

Corey: So, the Kubernetes and Docker people tell us?

Rob: Yeah, exactly. They will proudly tell you that. We'll come back around to that point if we want to, but so you have a simple project and you have simple CI, right? You may just have a simple script that you're putting in a Jenkins box or something like that, but what ultimately ends up happening is it gets complicated, and as it gets complicated, it becomes a bigger and bigger distraction from the thing that you're really trying to do, right? You're trying to build a business to ... I don't know, to do ride hailing, to do scooter sharing, what's big these days. You might be trying to do any of the ...

Corey: Oh, my project is Twitter for pets.
We're revolutionizing the world of pet communication.

Rob: Right. And do you want to spend your time working on pet communication or on CI/CD, right? CI/CD is a thing that we understand very well. We spend our time on it every day, we think about some of the depths of it, which we can go into in a second. One of the things that gets complicated, amongst others, is just scale. So you build a big team, you have multiple projects, and you have that one box under your desk where you said, "Oh, it's not that hard to build CI/CD." Now, everybody's waiting for their stuff to run because someone else got in there before them, and you're thinking, okay, well, how do I buy ... maybe you're not buying more boxes, you're building out something in a cloud provider, and then you're worrying about auto scaling because it starts to cost you too much to run those boxes, and how do you respond to the amount of load that you have on any given day? Because you're crunching for a deadline versus everybody's taken a week off.

Then, you want to get your build done as quickly as possible, so you start figuring out how to parallelize the work and spread it across those machines. The list goes on and on. This is the reality that everyone runs into as they scale their work. We do that for you. While it seems simple and ... I said I came in through an acquisition; we were building CI/CD for iOS, and I was that person. I said, "This seems really simple. We should build it and put it in the market." It didn't take us very long to get that first version to build, and it had to be generic to support many different types of customers and their particular builds.

It was a small start, but we started to run into the same problems, and then of course as a business, we ran into the problem of getting access to customers and all those things, and that's why we joined CircleCI, and that became what is now our iOS offering.
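The "spread it across those machines" problem Rob describes is commonly handled by splitting a test suite across parallel workers based on historical timing data. A minimal sketch in Python (this is an illustration, not CircleCI's actual implementation; the file names and durations are invented):

```python
# Greedy longest-first scheduling: assign each test file (slowest first)
# to whichever worker currently has the least total run time.

def split_tests(durations, workers):
    """durations: dict of test file -> seconds from previous runs."""
    buckets = [{"files": [], "total": 0.0} for _ in range(workers)]
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        target = min(buckets, key=lambda b: b["total"])  # least-loaded worker
        target["files"].append(name)
        target["total"] += secs
    return buckets

# Invented historical timings for four test files.
timings = {"test_api.py": 120, "test_ui.py": 90, "test_db.py": 60, "test_auth.py": 30}
for i, bucket in enumerate(split_tests(timings, 2)):
    print(i, bucket["files"], bucket["total"])
```

A greedy assignment like this keeps each worker's total run time roughly balanced, which is what actually shortens the wall-clock time of the build.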
But there is a lot of value that you can get quickly, to your point, but then you start focusing time and energy on that. I often refer to it, others in the industry refer to these sorts of things, as undifferentiated heavy lifting: something that becomes big and complex over time and is not the core of your business. Then as you start to invest in it, as we invest in it, we build capabilities that most people wouldn't bother to build when they write that first bash script off a trigger or whenever, around helping you get your project set up, handling the connection into hooks, handling authentication so that different users only have access to the code they should have access to, maybe isolating access to production secrets, for example, if you're doing deploys.

The kinds of things that keep coming up over and over in CI/CD that people don't think about on that first pass but end up haunting them down the road.

Corey: What do you think that people tend to misunderstand the most about CI/CD as you take a look at that throughout the ecosystem? From my perspective, when it was a box that you ran, behind the firewall as you say, the problem was that everyone talked about, "Oh yes, we use cattle, not pets, except the box that does the builds." Of course, that box has a bunch of hand-built stuff on it that's impossible to replicate. It has extraordinary permissions into production environments and can do horrifying things, and it was always the star of various security findings reports. There are a number of us who came up from an operations side viewing CI/CD as, in some ways, a liability, which I understand is a very biased and one-sided perspective. But going beyond that, what are people missing? What are they not seeing about the CI/CD landscape?

Rob: One thing that I think is really interesting there, well, one thing you called out there, was just resiliency, right? We think about that in the way that we operate that system.
We have a world of cattle because we've managed to think about that as a true offering. So, as you scale and you start to think, "Oh, how do I make this resilient inside my operation?" That's going to become a challenge that you face. The other thing that I think about, that I've noticed over the years, is what I want to call division of labor, or division of responsibilities. Many of those single-instance or even multi-instance self-managed CI/CD tools end up in a place where, past any size of team, honestly somebody needs to own it and manage it to make sure it's stable.

The changes that you want to make as a developer are often tied to basically being managed by that administrator. To be a little clear, if I have a group responsible for running CI/CD and I want to start building a different type of code or a different project, and it requires a plugin or an extension to the CI/CD platform or CI/CD tool, then I need to probably file a ticket and wait for another department, who is generally not super motivated to get my code out into production, to go make a change that they are going to evaluate and review and decide ... or maybe it creates conflict with something somebody else is doing on that system. And then you say, "Oh well, actually we can't have these co-installed, so now we need two systems." It's that division of responsibilities. Whereas, having built a multi-tenant cloud offering, we could never have that. There is no world in which our customers say to us, "Hey, we want this plugin installed. Can you go do that for us?"

Everything that is about how the development team thinks about their software and how they want their build to run, how they want their deploys to run, etc., needs to be in the hands of the developers, and everything that is about maintenance and operation and scale needs to be in our hands. It has created a very clear separation out of necessity, but one that even ...
I mentioned that you can deploy CircleCI yourself and run it within a team, and in large organizations, that separation really helps them get leverage. Does that make sense?

Corey: It really does. I think we're also seeing a change in perspective around resiliency and how this works. I once worked at a company I will not name where they were using either CircleCI or TeamCity. This was years and years ago, and I don't recall exactly which, but it doesn't matter, because at one point the service took an outage, and in typical knee-jerk reaction, well, that can never happen again. So they wound up doing all of the CI/CD work, for some godforsaken reason, on a Raspberry Pi that some developer brought in and left in the corner of the office. Surprise: it took an awfully long time for tests to run on what was basically an underpowered toy. The answer there was to just use fewer tests, because you generally don't need to run nearly as many.

I just stared at people for the longest time when it came to that. I think that one of the problems that we still see, and I know when I write code myself, I'm as guilty of this as anyone, is that I am a terrible developer and don't believe in tests. So, the CI/CD pipeline that I tend to look at is more or less a glorified script runner. Whenever I make a commit to this branch, go ahead and run the following three-line script that does a serverless deployment and puts it where it needs to go, and then I'll test it manually, or it's a pre-production environment so it's not that big of a deal. That can work for some use cases, but it's also a great thing that no one actually depends on the stuff that I write for day-to-day business operations or anything critical. At what point does it stop being a script runner?
It feels like one of these areas of software development, because I was around in a time when no one really understood what it was to do automated testing. I won't even go into TDD, but just, in general, why would I do that? We have this QA team; it's cost-effective to give it to a bunch of people. Thinking back on that, it all seems a little bit, well, wrong. But getting to the point where you've worked effectively with tests takes a little bit of effort. Once you have that, once you've sat and worked on something and had the feedback loop of, oh, this thing's not working; oh, I'll just change this, now it's working.

Really having that locally, as a developer, is super rewarding, in my mind, and enabling, I guess I would say, as well. Then you get to this place where you're excited about building tests, especially as you're working in a team, and then culturally you end up in a place where I put up a PR and someone else looks at it and says, "I see you're making an assumption, or I believe you're making an assumption here, but I don't see any way that that's being validated. So please add testing to ensure that is actually true." Both because I want to make sure it's true now, and because when we both forget that you ever wrote this and someone else makes a change, your assumptions hold, or someone can understand that you were making those assumptions and they can make appropriate changes to deal with it.

I think as you work in a team that's growing and scaling, beyond your pet project, once you've witnessed the value of that, you don't want to go back. So, people do end up writing more and more tests, and that's what drives the scale, at least on the testing and CI side, in a way that you need to then manage.
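The review comment Rob describes, "please add testing to ensure that is actually true," usually turns into a small test that pins the assumption down so it survives after everyone forgets who wrote the code. A hypothetical illustration (the function and the assumption here are invented for the example):

```python
# Hypothetical code under review: it assumed usernames are already
# lower-cased and trimmed before lookup. A reviewer asks for tests so
# the assumption is documented and enforced, not just remembered.

def normalize_username(raw):
    # The assumption being pinned: lookups are case-insensitive and
    # ignore surrounding whitespace.
    return raw.strip().lower()

# These assertions state the assumption for the next person who
# touches this code; if a change breaks it, CI fails immediately.
assert normalize_username("  Corey ") == "corey"
assert normalize_username("ROB") == "rob"
```

The point is less the trivial logic than the contract: the test turns an unstated belief into something the build verifies on every change.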
Going the opposite direction of what you're describing, which is, hey, let's just write fewer tests and use cheaper machines, people are recognizing the value and saying, "Okay, we want that value, but we don't want to bottleneck everyone with an hour-long build to run all these. So how do we get a system that's going to scale and support that?"

Corey: That's what's fascinating: watching that start to percolate beyond the traditional web applications with particular blessed languages and into other things. For example, in my copious spare time, I'm the community lead for the Open Guide to AWS, which is a GitHub project that has 25,000 stars or so, so you know it's good, where it's just a giant markdown document that lists the 10,000 tips and tricks that we all wish we'd known when we'd gotten started with AWS, in a format that's easily consumable. The CI/CD approach we have right now, which I believe is done through Travis, is that it just winds up running a giant link checker in parallel across the thousands of links that are ... sorry, I wanted to say 1,200 links, that are included within that document.

There's really not a lot else we can do in that type of environment. I mean, a spellchecker with all of the terms of art involved would more or less segfault itself to death as soon as it took a look, but other than making sure we don't have dead links, it feels like there's not a lot of automation or testing opportunity in something like that. Is that accurate? Am I completely wrong and missing something?
I think that going the other way, we often think about, before we kick off a large complex set of testing for a more complex application than a markdown document, a lot of people now will use things similar to what you're using. Like, maybe part of my application is a bunch of links to outside docs or outside sites that I'm referencing, or if I run into a problem, I link you to our help site or something, and making sure all that stuff is validated. Doing linting on the structure and format of code itself. One of the things that comes up as you scale out of the individual script runner is doing that work in parallel. I can say, you know what? Do the linting over here, do the link checking over here. Only use very small boxes for those.

We don't happen to have Raspberry Pis in our infrastructure, but we can give you a much smaller resource, which costs you less if you're not going to be pushing the limits of that. But then, if you have big integration tests or something which need more space, we can provide that as well, both in a single channel or pathway to give you the room to move faster, and then to break that out and break up your work. As an extreme example, and of course anyone who's done parallelization knows there are costs to splitting up work, like the management overhead: if you have 1,200 links, you could check them all at the same time. I doubt that would be a good use of our platform, but you could check 600 in one and 600 in another, or 300 at a time or whatever, and find the optimal path if you really cared about getting that done more quickly.

Corey: Right. Usually, it's not that big of a concern, and usually it winds up throwing errors on existing bad links, not something that has been included in the pull request in question. Again, there's nothing that is so awesome that I can't horribly misuse it for something ridiculous. It's my entire stock in trade.
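The chunk-and-parallelize approach Rob describes (check 600 links in one batch and 600 in another, or smaller chunks at a time) might look like the following sketch. This is an illustration rather than how the Open Guide's actual check works; the fetch function is injected so the splitting logic can be exercised without touching the network, and all names here are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def check_links(links, fetch, chunk_size=300):
    """Return the subset of links for which fetch() reports failure."""
    def check_chunk(part):
        return [url for url in part if not fetch(url)]

    dead = []
    # Each chunk is checked concurrently; results are gathered in order.
    with ThreadPoolExecutor() as pool:
        for result in pool.map(check_chunk, chunk(links, chunk_size)):
            dead.extend(result)
    return dead

links = [f"https://example.com/{i}" for i in range(1200)]
fake_fetch = lambda url: not url.endswith("7")   # pretend links ending in 7 are dead
print(len(chunk(links, 600)), "chunks")          # 2 chunks
print(len(check_links(links, fake_fetch)), "dead links found")
```

In a real run, `fetch` would issue an HTTP request; keeping it as a parameter makes the chunking and aggregation logic testable on its own, which is the same "thin boundary to the outside world" idea that comes up later in the conversation.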
It's why I believe Route 53 remains the best database option for everyone, but it's fun going through this space and just seeing how things have evolved. One question I do have, since you come from a background, by way of acquisition, that was aimed squarely at this: historically, it seems that running a lot of testing on mobile devices, specifically iOS devices, was the stuff of nightmares because you couldn't really run that in any meaningful way in a virtualized environment. So, it generally required an awful lot of devices. Is that still the case? Has that environment changed radically since I last worked at a mobile shop?
The reason we're all trying to test on all these different devices is often that we've baked a lot of business logic into the view layer. Does that make sense?

Corey: Yeah, it absolutely does. Please continue.

Rob: Instead of saying, well, all our logic's in the view layer, so let's get really good at testing the view layer, which means massive device farms and a bunch of people testing all these things, let's make that layer as thin as possible. There are analogies for this even in how we do service design these days and structure the architecture of systems: basically, make the boundaries as thin as possible and the interaction with the outside world as thin as possible. That gives you much more capability to effectively test the majority, or much larger portions, of your business logic. The device farm problem is still a problem. People still want to see how something specifically renders on a particular screen or whatever. But by minimizing that, the amount that you have to invest in it gets smaller.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: You mentioned device farm, which is an apt choice, given that that is the name of an AWS service that has a crap ton of mobile devices that you can log into, and it's one of my top candidates for the "did I make this service up to mess with you" competition. It does lead us to an interesting question. CI/CD has gotten an increased amount of attention lately from pretty much everyone.
AWS, as is typical for Amazon, tends to lie awake at night worrying that someone somehow is making money that isn't them, so their product strategy distills down to: yes. So, they wound up releasing a whole bunch of CI/CD-oriented products that at launch were, to be polite, terrible. Over time, they've gotten slightly better, but it's still a very confusing ecosystem there.

Then we see things like Azure DevOps, which it seems is aimed at a very similar type of problem, and they're also trying to challenge Amazon on the grounds of terrible names of services. But we're now seeing an increased focus from the first-party providers themselves around the CI/CD space. What does that mean for existing entrenched players who have been making a specialty out of this for a lot longer than these folks have been playing with it?

Rob: It's a great question. I think about the approaches very differently, which is probably unsurprising. Speaking of lying awake at night, or spending all day thinking about these things, this is what we do. You've used the term script runner a few times in the conversation. The thing that I see when someone like AWS looks at this problem is, the way that I think about it, maybe less the money, although it translates pretty quickly: people are using compute to do something, can we get them to do that with us? Oddly enough, a massive chunk of CircleCI runs on AWS, so it doesn't really matter to them one way or another, but they're effectively looking to drive compute hours and looking to drive a pathway onto their platform.

One thing about that is it doesn't really matter to them, in my perspective, whether people use that particular product or not. As a result, it gets the product investment that you put in when that's the case. So, it's sort of a check-the-box approach: hey, we have CI and we have CD like other people do.
Whereas, when we look at CI and CD, we've been talking about some of the factors like scaling it effectively and making it really easy for you to understand what's going on. We think about very much the core use case, what is one of our customers or users doing when they show up? How do we do that in a way that maximizes their flow? Minimizes the overhead to them of using our system, whether it's getting set up and running really quickly, like talk about being in the center of how much of the world is developing software.So we see patterns, we see mistakes that people are making and can use that to inform both how our product works and inform you directly as a user. "Hey, I see that you're trying to do this. It would go better if you did this." I think both from the, honestly, the years that we've been doing this and the amount that we've witnessed in terms of what works well for customers, what doesn't, what we see going through just from a data perspective, as we see hundreds of thousands of builds running, that rich perspective is unique to us. Because as you said, we're a player that's been doing this for a really long time and very focused on it. We treat the experience with, I guess I'm trying to figure out a way to say this that doesn't sound as bad as it might, but a lot of people have suffered a lot with CI/CD.There's a lot that goes into getting CI/CD to work effectively and getting it to work reliably over time as your system is constantly changing. Honestly, there's a lot of frustration, and we come in to work every day thinking about minimizing that frustration so that our customers can go spend their time doing what matters to them. Again, when I think you sort of ... a lot of these big players present you with a runtime in which you can execute a script of your choosing. It's not thinking about the problem in that way and I don't see them changing their perspective. Honestly, I just don't worry about them.Corey: Which is a very fair tack to take. 
It's interesting watching companies and how much time and energy they spend worrying about competition versus how much they focus instead on customers. To turn it around slightly, what makes what you do challenging in some respects, I would imagine, is that a lot of your target market are themselves developers. Developers, in my experience, are challenging customers in that, first, they tend to devalue their own time to the point where, oh, that doesn't sound hard, I'll build that overnight. Secondly, once you finally win them over to the idea of paying for something, it's challenging to get them to have the necessary signing authority. At best, they become champions. But what you do has to start with developers in order to win widespread adoption and technical buy-in. How does that wind up manifesting as an approach to, well, some people call it developer relations, developer advocacy. I refer to those folks as developers because I have problems, but how do you folks view that?

Rob: Yeah, it's a really insightful view, actually, because we do end up in most of our customers, or in the environments of our customers, however you want to describe it, as a result of the enthusiasm of individual developers and development teams, much more so than ... there are many products, certainly in enterprise software, and I don't really think purely in enterprise, but there are many products that can only be purchased by the CIO or the CTO or whatever, right? To your question of developer relations, we spend a lot of time out in the market talking to individuals, talking at conferences, writing content about how we think about this space and things that people can do. But we're a very product-driven company, meaning both that's what we think about first, and then we support it with these other things.

But second, we win on product, right? We don't win in the market because you thought the blog posts that we wrote were really cool.
That might make you aware of us, but if you don't love the product, I mean, developers, to your point, want to use things that they really enjoy using. When developers use the product and love the product, they champion it, and they get access because they might work on a side project or an open source project, or maybe they worked at another company that used CircleCI and then they go somewhere else and they say, "What are we doing? Life was so much better with CircleCI," those sorts of things. It very much comes from the bottom up. It's pretty difficult to go into an organization and say, "Hey, you should push this down to all of your developers."

There's a lot of rejection that comes from developers on mandated tooling. We have to provide knowledge, and we have to provide capabilities in our product that appeal to those other folks. For example, administrators of our tooling, or when it gets to the point where someone owns how you use CircleCI versus just being a regular user of the product. We have capabilities to support them around understanding what's happening, around creating shared capabilities that multiple teams can use, those sorts of things. But ultimately, we have to lead with product; we have to get into the hearts and minds of the developers themselves and then grow from there. Everything we do from a marketing and developer relations perspective (myself included: I spend a lot of time talking to customers who are out in the market) is all about propping up or helping raise awareness, effectively. But there's nothing that we can do if the product doesn't meet the needs of our customers.
That's a subject near and dear to my heart. It's not about annoying people by showing up at their office with the sales team. It's about understanding what their challenges and problems are and then positioning a solution that ideally solves them, in a place and in a way that they can be receptive to. Instead, people tend to equate marketing to this whole ridiculous statistics-driven nonsense that doesn't really resonate with anyone, and I think that's unfair to everyone involved.

That said, I will say that having spent a fair bit of time in this space, I've yet to see anything from CircleCI that has annoyed me to the point where I would have remembered it, which is awesome. I don't see it in in-flight magazines, generally. I don't see obnoxious people trying to tackle me as I walk through an expo hall and wanting to scan my badge. It just seems very well executed, and you have some very talented people working for you. To that end, you are largely a distributed company, which is fascinating. Did it start that way? Did it happen that way by a quirk of fate?

Rob: Yeah, I think those two things probably come together. The company, from very early days, and now, I wasn't there, but I think some of our earliest engineers were distributed, and the company started out basically entirely as engineers. It's a team solving problems of other engineers, which is ... it's a fun challenge. There were early participants who were distributed. Mostly, when you start a company and no one has ever heard of you and no one knows if you're going to be successful, going and recruiting is generally a different game than when you're, certainly, when you're where we are now. There were some personal relations that just happened to connect with people around the globe who wanted to participate.

We started out pretty early with some distribution, and that led to structuring the org in that way, both from a tooling and process perspective.
A lot of that sort of happens organically, but building a culture that really supported that. I personally am based in the Bay Area, so we have headquarters in San Francisco, but it doesn't really make a difference if I go in versus just stay and work from home on any given day because the company operates in such a way that that distribution is completely normal.Corey: We accidentally did the same thing. My business partner and I used to live across the street from each other and we decided to merge a week before he moved out of state to Portland. So awesome. Great. We have wonderful timing on all of these things. It's fun to build it from that way, build that way from the ground up. The challenge I've always seen is when you start off with having a centralized office and everyone's there, except this one person who, no matter how you try to work around it, is never as involved. So it feels like the sort of thing you've absolutely got to be building from day one, or otherwise, you're going to have a massive cultural growing pain as you try to get there.Rob: Yeah, I think that's true. So I've actually been that one person. I, at some point in my career prior to CircleCI, was helping out a company founded by some friends of mine based in Toronto. I grew up in Toronto. I kicked off a project and then the project grew and grew until I was the one person out of maybe 50 or 60 who wasn't in an office in Toronto. It got to the point where no one remembered who I was and I was like, "Cool, I think I'm done. I'm out." I was fine with that. It was always meant to be a temporary thing, but I really felt that transition for the organization. I would say in terms of growing, I mean, yes, if you start out, it goes both ways, if you start out distributed, you're going to remain distributed.There are certain things that get more challenging at scale, right? 
If everybody is sort of just in their home all over the globe, then the communication overhead continues to increase and increase in just understanding who people are, who you should be talking to. You need to focus-Corey: There's always the time zone hierarchy.Rob: Ooh, the time zones are a delight, yes. I would say like we talk a lot about, in this industry, Dunbar's number and sizes of teams and the points at which things get more complex. I think there's probably a different scale for distributed teams. It takes fewer people to reach a point where communication gets challenging, and trust and all the other things that go with Dunbar's views. You kind of have that challenge and then you start to think, oh well, then you have some offices, because we actually have maybe six physical offices, partly because in our go to market org, we've started to expand globally and put people in regional offices.There's this interesting disconnect. I don't know about disconnect, but there's a split in how we operate in different parts of the org. I think what I've seen people ... well, I don't know about succeed, but I've seen people try when you start out with one org, or sorry, one location is, let's not jump to that one person somewhere else and then one person somewhere else kind of thing, but build out a second office, build out another office, like pick another location where you think you ... it's often, certainly where we are, in the Bay Area, it's often driven by just this market. Finding talent, finding people who want to join you, hanging onto those people when there are so many other opportunities around tends to be much more challenging. 
When you offer people alternatives—you can stay where you are but have access to a cool and interesting company, or you can work from home, which a lot of people value—then there's different things that you bring to the table.

I see a lot of people trying to expand in that way, but when you are so office-centric, a second office, I think, is a smoother transition point than just suddenly distributing people, because the first and second remote people, unless you're hiring in a massive wave, are really going to struggle in that environment.

Corey: I think that's probably one of the more astute things that's been noticed on this show in the last couple of years. If people want to hear more about what you have to say and how you think about the world, where can they find you?

Rob: I would say, on our blog. I tend to write stuff there, as do other people. You talked about having great people in the organization. We have a lot of great people talking about how we think about engineering, how we think about both engineering teams and culture, and then some of the problems we're trying to solve. So, go to our site, circleci.com, and find our blog. Then, I tend to speak and hang out on podcasts and do guest writing. I think I'm pretty easy to find. You can find me on Twitter. My handle is z00b, Z-0-0-B. I know I'm not super prolific, but if someone wants to track me down and ask me something, I'd probably be more than happy to answer.

Corey: You can expect some engagement as soon as this goes out. Thank you so much for taking the time to speak with me today. I appreciate it.

Rob: Yeah, thanks for having me. This was a ton of fun.

Corey: Rob Zuber, CTO at CircleCI. I'm Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple Podcasts.
If you've hated this podcast, please leave a five-star review on Apple Podcasts along with something amusing for me to read later while I'm crying.

Announcer: This has been this week's episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com or wherever fine snark is sold.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Uptycs of Cybersecurity Requirements with Jack Roehrig

Dec 20, 2022 43:13


About Jack

Jack is Uptycs' outspoken technology evangelist. Jack is a lifelong information security executive with over 25 years of professional experience. He started his career managing security and operations at the world's first Internet data privacy company. He has since led unified Security and DevOps organizations as Global CSO for large conglomerates. This role involved individually servicing dozens of industry-diverse, mid-market portfolio companies.

Jack's breadth of experience has given him a unique insight into leadership and mentorship. Most importantly, it fostered professional creativity, which he believes is direly needed in the security industry. Jack focuses his extra time mentoring, advising, and investing. He is an active leader in the ISLF, a partner in the SVCI, and an outspoken privacy activist.

Links Referenced:

- UptycsSecretMenu.com: https://www.uptycssecretmenu.com
- Jack's email: jroehrig@uptycs.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup. It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it.
My thanks to Google Cloud for sponsoring this ridiculous podcast.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted guest episode is brought to us by our friends at Uptycs. And they have sent me their Technology Evangelist, Jack Charles Roehrig. Jack, thanks for joining me.

Jack: Absolutely. Happy to spread the good news.

Corey: So, I have to start. When you call yourself a technology evangelist, I feel—just based upon my own position in this ecosystem—the need to ask, I guess, the obvious question of, do you actually work there, or have you done what I do with AWS and basically inflicted yourself upon a company? Like, well, "I speak for you now." The running gag that becomes more true every year is that I'm AWS's chief marketing officer.

Jack: So, that is a great question. I take it seriously. When I say technology evangelist, you're speaking to Jack Roehrig. I'm a weird guy. So, I quit my job as CISO. I left a CISO career. For, like, ten years, I was a CISO. Before that, 17 years doing stuff.
Started my own thing, secondaries, investments, whatever.

Elias Terman, he hits me up and he says, "Hey, do you want this job?" It was an executive job, and I said, "I'm not working for anybody." And he says, "What about a technology evangelist?" And I was like, "That's weird." "Check out the software."

So, I'm going to check out the software. I went online, I looked at it. I had been very passionate about the space, and I was like, "How does this company exist in doing this?" So, I called him right back up, and I said, "I think I am." He said, "You think you are?" I said, "Yeah, I think I'm your evangelist. Like, I think I have to do this." I mean, it really was like that.

Corey: Yeah. It's like, "Well, we have an interview process and the rest." You're like, "Yeah, I have a goldfish. Now that we're done talking about stuff that doesn't matter, I'll start Monday." Yeah, I like the approach.

Jack: Yeah. It was more like I had found my calling. It was bizarre. I negotiated a contract with him that said, "Look, I can't just work for Uptycs and be your evangelist. That doesn't make any sense." So, I advise companies, I'm part of the SVCI, I do secondaries, investment, I mentor, I'm a steering committee member of the ISLF. We mentor security leaders.

And I said, "I'm going to continue doing all of these things because you don't want an evangelist who's just an Uptycs evangelist." I have to know the space. I have to have my ear to the ground. And I said, "And here's the other thing, Elias. I will only be your evangelist while I'm your evangelist. I can't be your evangelist when I lose passion. I don't think I'm going to."

Corey: The way I see it, authenticity matters in this space. You can sell out exactly once, so make it count because you're never going to be trusted again to do it a second time. It keeps people honest, at least the ones you actually want to be doing work with. So, you've been in the space a long time, 20 years give or take, and you've seen an awful lot.
So, I'm curious. I tend to see, you know, six or seven different companies in the RSA Sponsor Hall every year selling things; sure, there are hundreds of booths and a bunch of different marketing logos and products, but it all distills down to the same five or six things. What did you see about Uptycs that made you say, "This is different"? Because, to be very direct, looking at the website, it's, "Oh, what do you sell?" "Acronyms. A whole bunch of acronyms that, because I don't eat, sleep, and breathe security for a living, I don't know what most of them mean, but I'm sure they're very impressive and important." What does it actually do, for those of us who are practitioners but not swimming in the security vendor stream?

Jack: So, I've been obsessed with this space and I've seen the acronyms change over and over and over again. I'm always the first one to say, "What does that mean?" as the senior guy in the room a lot of the time. So, acronyms. What does Uptycs do? What drew me into them? They did HIDS, Host Intrusion Detection System. I don't know if you remember that. Turned into—

Corey: Oh, yeah. OSSEC was the one I always wound up using, the open-source version. OSSEC [kids 00:04:10]. It's like, oh, instead of paying a vendor, you can contribute it yourself because your time is free, right? Free as in puppy, or these days free as in tier when it comes to cloud.

Jack: Oh, I like that. So, yeah, I became obsessed with this HIDS stuff. I think it was evident I was doing it, that it was threat [unintelligible 00:04:27]. And these companies, great companies. I started this new job in an education technology company and I needed a lot of work, so I started to play around with more sophisticated HIDS systems, and I fell in love with it. I absolutely fell in love with it.

But there are all these limitations. I couldn't find this company that would build it right. And Uptycs has this reputation as being not very sexy, you know? People telling me, "Uptycs?
You're going to Uptycs?" Yeah—I'm like, "Yeah. They're doing really cool stuff."

So, Uptycs has, like, this brand name, and I had referred Uptycs before without even knowing what it was. So, here I am, like, one of the biggest XDR, I hope to say, activists in the industry, and I didn't know about Uptycs. I felt humiliated. When I heard about what they were doing, I felt like I wasted my career.

Corey: Well, that's a strong statement. Let's begin with XDR. To my understanding, that's some form of audio cable standard that I use to plug into my microphone. Some would say it, "X-L-R." I would say sounds like the same thing. What is XDR?

Jack: What is it, right? So, [audio break 00:05:27] implement it, but you install an agent, typically on a system, and that agent collects data on the system: what processes are running, right? Well, maybe it's system calls, maybe it's [unintelligible 00:05:37] as regular system calls. Some of them use the extended Berkeley Packet Filter daemon to get stuff, but one of the problems is that when we are obtaining low-level data on an operating system, it's got to be highly specific. So, you collect all this data: who's logging in, which passwords are changing, all the stuff that a hacker would do as you're typing on the computer. You're maybe monitoring vulnerabilities. It's a ton of data that you're monitoring.

Well, one of the problems that these companies face is they try to monitor too much. Then some came around and they tried to monitor too little, so they weren't as real-time.

Corey: Sounds like a little pig story here.

Jack: Yeah [laugh], exactly. Another company came along with a fantastic team, but you know, I think they came in a little late in the game, and it looks like they're folding now. They were a wonderful company, but one of the biggest problems I saw was the agent, the compatibility. You know, it was difficult to deploy.
I ran DevOps and security, and my DevOps team uninstalled the agent because they thought there was a problem with it. We proved there wasn't, and four months later, they hadn't completely reinstalled it.

So, a CISO who manages the DevOps org couldn't get his own DevOps guy to install this agent. For good reason, right? So, this is kind of where I'm going with all of this XDR stuff. What is XDR? It's an agent on a machine that produces a ton of data.

It's like omniscience. As I started to tune it in, I would ping developers, I was like, "Why did you just run sudo on that machine?" Right? I mean, I knew everything going on in the space, I had a good intro to all the assets—they technically run on the on-premise data center and the quote-unquote, "cloud." I like to just say the production estate. But it's omniscience. It's insights, you can create rules, it's one of the most powerful security tools that exists.

Corey: I think there's a definite gap as far as—let's narrow this down to cloud for just a second before we expand this into the joy that is data centers—where you can instrument a whole bunch of different security services in any cloud provider—I'm going to pick on AWS because they're the 800-pound gorilla in the room and, frankly, they could use taking down a peg or two—and you wind up configuring all the different security services that in some cases seem totally unaware of each other, but that's the AWS product portfolio for you. And you do the math out and realize that it theoretically would cost you—to enable all these things—about three times as much as the actual data breach you're ideally trying to prevent. So, on some level, it feels like a "heads, I win; tails, you lose" style scenario.

And the answer is that people have started reaching out to third-party vendors to wind up tying all of this together into some form of cohesive narrative that a human being has a hope in hell of understanding.
But everything I've tried to this point still feels like it is relatively siloed, focused on the whole fear, uncertainty, and doubt that is so inherent to so much of the security world's marketing. And it's almost like cost control, where you can spend an almost limitless amount of time, energy, money, et cetera, trying to fix these things, but it doesn't advance your company to the next milestone. It's like buying fire insurance on your building. You can spend all the money on fire insurance. Great; it doesn't get you to the next milestone that propels your company forward. It's all reactive instead of proactive. So, it feels like it is never the exciting, number-one priority for companies until right after it should have been higher on the list than it was.

Jack: So, when I worked at Turnitin, we had saturated the market. And we worked in the education technology space globally. Compliance everywhere. So, I just worked on the Australian Data Infrastructure Act of 2020. I'm very familiar with the 27 data privacy regulations that are [laugh] in scope for schools. I'm a FERPA expert, right? I know that there's only one P in HIPAA [laugh].

So, all of these compliance regulations drove schools and universities, consortiums, and government agencies to say, "You need to be secure." So, security at Turnitin was the number one—number one—key performance indicator of the company for one-and-a-half years. And these cloud security initiatives didn't just make things more secure. They also allowed me to implement a reasonable control framework to get various compliance certifications. So, I'm directly driving sales by deploying these security tools.

And the reason why that worked out so great is, by getting the certifications and by building a sensible control framework layer, I was taking these compliance requirements and translating them into real mitigations of business risk. So, the customers are driving security, as they should.
I'm implementing sane security controls by acting as the chief security officer, the company becomes more secure, I save money by using the correct toolset, and we increased our business by, like, 40% in a year. This is a multibillion-dollar company.

Corey: That is definitely a story that resonates, especially with organizations that are—or should be—compliance-forward and having to care about the nature of what it is that they're doing. But I have a somewhat storied history in working in FinTech and large-scale financial services. One of the nice things about that job, which is sort of a weird thing to say there if you don't want to get ejected from the room, has been, "Yeah well, it's only money," in the final analysis. Because no one dies if you wind up screwing that up. People's kids don't get exposed.

It's just, okay, people have to fill out a bunch of forms and you get sued into oblivion and you're not there anymore, because the first role of a CISO is to be ablative and get burned away whenever there's a problem. But it still doesn't feel like it does more for a number of clients than, on some level, checking a box that they feel needs to be checked. Not that it shouldn't be, necessarily, but I have a hard time finding people that get passionately excited about security capabilities. Where are they hiding?

Jack: So, one of the biggest problems that you're going to face is there are a lot of security people that have moved up in the ranks through technology and not through compliance and technology. These people will implement control frameworks based on audit requirements that are not bespoke to their company. They're doing it wrong. So, we're not ticking boxes; I'm creating boxes that need to be ticked to secure the infrastructure. And Turnitin was a company that people were forced to use to submit their work in school.

So, imagine that you have to submit a sensitive essay, right? And that sensitive essay goes to this large database.
We have the Taiwanese government submitting confidential data there. I had the chief scientist at NASA submitting pre-publication data there. We've got corporate trade secrets that are popped in there. We have all kinds of FDA pre-approval stuff. This is plagiarism detection software being used by large companies, governments, and 12-year-old girls, right, who don't want their data leaked.

So, if you look at it, like, this is an ethical thing that is required for us to do; our customers drive that, but truly, I think it's ethics that drive it. So, when we implemented a control framework, I didn't do the minimum. I didn't run an [unintelligible 00:12:15] scan that nobody ran. I looked for tools that satisfied many boxes. And one of the things about telemetry at scale, [unintelligible 00:12:22], XDR, whatever you want to call it—the agent-based systems that monitor all of this run-state data—is they can take on a lot of your technical SOC controls.

Furthermore, you can use these tools to improve your processes, like incident response, right? You can use them to log things. You can eliminate your SIEM by using this for your DLP. The problem with companies in the past is they wouldn't deploy on the entire infrastructure. So, you'd get one company where it would just be on-prem, or one company that would just run on CentOS.

One of the reasons why I really liked this Uptycs company is because they built it on osquery. Now, if you mention osquery, a lot of people glaze over, myself included before I worked at Uptycs. But what it is, is this platform to collect a ton of data on the run state of a machine in real time, pop it into a normalized SQL database, and it runs on a ton of stuff: macOS, Windows, like, tons of versions of Linux, because it's open-source, so people are porting it to their infrastructure. And that was one of these unique differentiators. So, what is the cloud?
I mean, AWS is a place where you can rapidly prototype, there's tons of automation, you can go in and build something quickly, and then it scales.

But I view the cloud as just a simple abstraction to refer to all of my assets, be they PoPs, on-premise data machines, you know, the corporate environment, laptops, desktops, the stuff that we buy in the public clouds, right? These things are all part of the greater cloud. So, when I think cloud security, I want something that does it all. That's very difficult, because if you had one tool to run on your cloud, one tool to run on your corporate environment, and one tool to run for your production environment, those tools are difficult to manage. And the data needs to be ETL'd, you know? It needs to be normalized. And that's very difficult to do.

Our company is doing [unintelligible 00:14:07] security right now as a company that's taking all these data signals and normalizing them, right, so that you can have one dashboard. That's a big trend in security right now, because we're buying too many tools. So, I guess the answer really is, I don't see the cloud as just AWS. I think AWS shouldn't call themselves the cloud; they're the cloud with everything. You can come in, you can rapidly prototype your software, and you know what? You want to run at the largest scale possible? You can do that too. It's just the governance problem that we run into.

Corey: Oh, yes. The AWS product strategy is pretty clearly, in a word, "Yes," written on a Post-it note somewhere. That's the easiest job in the world, running their strategy. The challenge, too, is that we don't live in a world where monocultures are a thing anymore, because regardless—if you use AWS for the underlying infrastructure, great, that makes a lot of sense.
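The osquery model Jack describes—run-state data on processes, logins, and so on, normalized into queryable SQL tables—can be sketched in miniature. This is an illustrative toy, not osquery's real schema or agent: the table name, columns, and all rows except our own process ID are invented for the example.

```python
import os
import sqlite3

# Toy imitation of the osquery model: snapshot host run-state into a
# normalized SQL table, then answer security questions with plain SQL.
# (Table name and columns are invented for this sketch.)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE processes (
        pid INTEGER PRIMARY KEY,
        name TEXT,
        username TEXT,
        listening_port INTEGER  -- NULL if the process is not listening
    )""")

# A real agent would populate this continuously; here we insert one real
# fact (our own pid) plus two synthetic rows for illustration.
rows = [
    (os.getpid(), "python", os.environ.get("USER", "someuser"), None),
    (101, "sshd", "root", 22),
    (202, "cryptominer", "www-data", 4444),
]
conn.executemany("INSERT INTO processes VALUES (?, ?, ?, ?)", rows)

# "Which non-root processes are listening on a network port?"
suspects = conn.execute(
    "SELECT name, listening_port FROM processes "
    "WHERE listening_port IS NOT NULL AND username != 'root'"
).fetchall()
print(suspects)  # [('cryptominer', 4444)]
```

The appeal of this design, as described in the conversation, is that every question about a machine's run state becomes an ordinary SQL query instead of a bespoke parser per data source.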
Use it for a lot of the higher-up-the-stack, SaaS-y type things that you don't want to have to build yourself—by going to Home Depot and picking up components, you're doing something relatively foolish in most cases.

They're a plumbing company, not a porcelain company, in many respects. And regardless of what your intention is around multiple clouds, people wind up using different things. In most cases, you're going to be storing your source code in GitHub, not in AWS CodeCommit, because CodeCommit doesn't really have any customers, for reasons that become blindingly apparent the first time you try to use it for something. So, you always wind up with these cross-cloud, cross-infrastructure stories. For any company that had the temerity to be founded before 2010, they probably have an on-premises data center as well—or six or more—and you're starting to try to view a whole bunch of different abstractions through the same lenses in terms of either observability or control plane or governance, or—dare I say it—security. And it feels like there are multiple approaches, all of which have their drawbacks, which of course means it's complicated. What's your take on it?

Jack: So, I think it was two years ago we started to see tools to do signal consumption. They would aggregate those signals and they would try to produce meaningful results that were actionable, rather than you having to go and look at all this granular data. And I think that's phenomenal. I think a lot of companies are going to start to do that more and more. One of the other trends is people eliminated data and went to machine learning and anomaly detection. And that didn't work.

It missed a lot of things, right, or generated a lot of false positives.
I think one of the next big things we're going to see—and I know it's been done for two years—is the consumption of events, the categorization into alerts based on synthetic data classification policies; we'll look at the severity classifications of those, they'll be actionable in a priority queue, and we'll eliminate the need for people that don't like their jobs to sit in a SOC all day and analyze a SIEM. I never run a SIEM, but I think this diversity can be a good thing. Sometimes it's turned out to be a bad thing, right? We want diversity; we don't want all the data to be homogenous. We don't need data standards, because that limits things. But we do want competition. But I would ask you this, Corey: why do you think AWS? We remember 2007, right?

Corey: I do. Oh, I've been around at least that long.

Jack: Yeah, you remember when S3 came up. Was that 2007?

Corey: I want to say 2004, 2005 in beta, and then relaunched as the first generally available service. The first beta service was SQS, so there's always some question about which one was first. I don't get in the middle of those fights because all I'm going to do is upset people.

Jack: But S3 was awesome. It still is awesome, right?

Corey: Oh yes.

Jack: And you know what I saw? I worked for a much older company with very strict governance. You know, with SOX compliance, which is a joke, but we also had SOC compliance. I did HIPAA compliance for them. Tons of compliance there.

I'm not a compliance officer by trade, though. So, I started seeing [x cards 00:17:54], you know, these company personal cards, and people would go out and [unintelligible 00:17:57] platform, because if they worked with my teams internally, if they wanted to get a small app deployed, it was, like, a two-, three-month process. That process was long because of CFO overhead, approvals, vendor data security vetting, racking machines.
It wasn't a problem that was inherent to the technology. I actually built a self-service cloud in that company. The problem was governance. It was financial approvals, it was product justification.

So, I think AWS is really what made the internet inflect and scale and innovate amazingly. But I think one of the things it sacrificed was governance. So, if you tie a lot of what we're saying back together: by using some sort of tool that you can pop into a cloud environment that can access a hundred percent of the infrastructure and look for risks, what you're doing is kind of X-ray visioning into all these nodes that were deployed rapidly and kept around because they were crown jewels, and you're determining the risks that lie on them. So, let's say that 10 or 15% of your estate is prototype things that grew to scale and can't be pulled back into your governance infrastructure. A lot of times people think that those types of machines are probably pretty locked down and probably low risk.

If you throw a company on the side scanner or something like that, you'll see they have 90% of the risk, 80% of the risk. They're unpatched and they're old. So, I remember at one point in my career, right, I'm thinking Amazon's great. I'm—[unintelligible 00:19:20] on Amazon because they've made the internet go, they inflected. I mean, they've scaled us up like crazy.

Corey: Oh, the capability story is phenomenal. No argument there.

Jack: Yeah. The governance problem, though, you know—there's a lot of hacks because of people using AWS poorly.

Corey: And to be clear, that's everyone. We all are. I take a look at some of the horrible technical decisions I made even a couple of years ago; based upon what I know now, it's difficult to back out and wind up doing things the proper way. I wrote an article a while back, "17 Ways to Run Containers on AWS," and listed all the services.
And I think it was a little on the nose, but then I wrote "17 More Ways to Run Containers on AWS," with different services. And I'm about three-quarters of the way through the third in the series. I just need a couple more releases and we're good to go.

Jack: The more and more complexity you add, the more security risk exists. And I've heard horror stories. Dictionary.com lost a lot of business once because a couple of former contractors deleted some instances in AWS. Before that, they had a secret machine turned into a pixel [unintelligible 00:20:18] and had to take down their iPhone app.

I've seen some stuff. But one of the interesting things about deploying one of these tools in AWS—they can just, you know, look with X-ray vision into all your compute and all your storage and say, "You have PII stored here, you have personal data stored here, you have this vulnerability, that vulnerability, this machine has already been compromised"—is you can take that to your CEO as a CISO and say, "Look, we were wrong; there's a lot of risk here." And then what I've done in the past is use that as justification for deploying HIDS—XDR, telemetry at scale, whatever you want to call it—these agent-based solutions. Now, the problem with the solutions that are agentless is almost all of them are just in the cloud. So, just a portion of your infrastructure.

So, if you're a hybrid environment and you have data centers, you're ignoring the data centers. It's interesting, because I've seen these companies position themselves as competitors when really, they're in complementary spaces, but one of them justified the other for me. So, I mean, what do you think about that awkward competition? Why does this competition exist between these people if they do completely different things?

Corey: I'll take it a step further.
I'm a big believer that security for the cloud providers should not be a revenue generator in any meaningful sense, because at that point, they wind up with an inherent conflict of interest: when they start charging—especially trying to do value-based pricing as they move up the stack—what they're inherently saying is, great, you can get our version of our services that is less secure. What they're doing is making security on their platform an investment decision. And I've never been a big believer in that approach.

Jack: The SSO tax.

Corey: Oh, yes. And many others.

Jack: Yeah. So, I was one of the first SSO tax contributors. That started it.

Corey: You want data plane audit logging? Great, that'll cost you. But they finally gave in a couple of years back and made the first management trail for CloudTrail audit logging free for everyone. And people still inadvertently build second ones and then wonder why they're paying through the nose. Like, "Oh, that's 40 grand a month. That should be zero." Great. Send that to your SIEM and then have that pass it out to where it needs to go. But so much of it is just these weird configuration taxes that people aren't fully aware exist.

Jack: It's the market, right? The market is—so look at Amazon's IAM. It is amazing, right? It's totally robust. But who is using it correctly? I know a lot of people are. I've been the CISO for over 100 companies, and IAM was one of those things that people don't know how to use, and I think the reason is because people aren't paying for it, so AWS can continue to innovate on it.

So, we find ourselves with this huge influx of IAM tools in the startup scene. We all know Uptycs does some CIAM and some identity management stuff. But that's a great example of what you're talking about, right? These cloud companies are not making the things inherently secure, but they are giving some optionality.
The products don't grow because they're not being consumed. And AWS doesn't tend to advertise them as much as the folks in the security industry do. It's been one complaint of mine, right? And I absolutely agree with you. Most of the breaches are coming out of AWS. That's not AWS's fault. AWS's infrastructure isn't getting breached.

It's the way that the customers are configuring the infrastructure. That's going to change a lot soon. We're starting to see a lot of change. But the fundamental issue here is that security needs to be invested in for short-term initiatives, not just for long-term initiatives. Customers need to care about security, not compliance. Customers need to see proof of security. A customer should be demanding that they're using a secure company. If you've ever been on the vendor approval side, you'll see it's very hard to push back on an insecure company going through the vendor process.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: Oh, yes. I've wound up giving probably about 100 companies now S3 Bucket Negligence Awards for being public about failing to secure their data, and put that out into the world. I had one physical bucket made, the S3 Bucket Responsibility Award, and presented it to the then director of security over at the Pokémon Company, because there was a Wall Street Journal article talking about how, in their security reviews—given the fact that they are a gaming company that has children as their primary customer—they take it very seriously.
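The misconfiguration point is concrete: as Jack says, it is usually the customer's configuration, not the platform, that gets breached. As a hypothetical sketch—not how any real scanner or AWS API works, and covering only one of many failure modes—a check for the classic wildcard-principal mistake in an S3 bucket policy document might look like:

```python
import json

def find_public_statements(policy_json: str):
    """Return the Sids of Allow statements open to everyone.

    A toy check for one classic misconfiguration: an Allow statement
    whose Principal is "*" (or {"AWS": "*"}). Real tools evaluate far
    more: conditions, ACLs, account-level public access blocks, etc.
    """
    policy = json.loads(policy_json)
    public = []
    for i, stmt in enumerate(policy.get("Statement", [])):
        principal = stmt.get("Principal")
        if isinstance(principal, dict):
            principal = principal.get("AWS")
        if stmt.get("Effect") == "Allow" and principal == "*":
            public.append(stmt.get("Sid", f"statement-{i}"))
    return public

# Example policy (bucket and account values are made up): one scoped
# statement and one world-readable statement.
policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TeamRead", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::demo/*"},
        {"Sid": "Oops", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::demo/*"},
    ],
})
print(find_public_statements(policy))  # ['Oops']
```

The point of the toy is the asymmetry Corey describes: the mistake is one line of JSON, while catching it across an estate requires exactly the kind of always-on scanning being discussed.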
And they cited that the reason they chose not to do business with one unnamed vendor was in part due to the lackadaisical approach around S3 bucket control. So, that was the one time I've seen in public a reference where, "Yeah, we were going to use a vendor and their security story was terrible, and we decided not to."

It's, why is that news? That should be a much more common story, but these days, it feels like procurement is rubber-stamping it and, like, "Okay, great. Fill out the form." And, "Okay, you gave some wrong answers on the form. Try it again and tell the story differently until it gets shoved through." It feels like it's a rubber stamp rather than a meaningful control.

Jack: It's not a rubber stamp for me when I worked in it. And I'm a big guy, so they come to me, you know, like—that's how, being, like, career law, it's just being big and intimidating. Because that's—I mean, security kind of is that way. But, you know, I've got a story for you. This one's a little more bleak.

I don't know if there's a company called Ask.fm—and I'll mention them by name—right, because, well, I worked for a company that did, like, a hostile takeover of this company. And that's when I started working with [unintelligible 00:25:23]. [unintelligible 00:25:24]. I speak Russian and I learned it for work. I'm not Russian, but I learned the language so that I could do my job.

And I was working for a company with a similar name. And we were in board meetings and we were crying, literally shedding tears in the boardroom, because this other company was being mistaken for us. And the reason why we were shedding tears is because young women—you know, 11 to 13—were committing suicide because of online bullying. They had no health and safety department, no security department.
We were furious.

So, the company was hosted in Latvia, and we went over there and we installed one—I lived in Latvia for quite a bit, working as the CISO to install a security program, along with the health and safety person to install the moderation team. This is what we need to do in the industry, especially when it comes to children, right? Will regulation solve it? I don't know.

But what you're talking about with the Pokémon video game, I remember that, right? We can't have that kind of data being leaked. These are children. We need to protect them with information security. And in education technology, I'll tell you, it's just not a budget priority.

So, the parents need to demand the security, we need to demand these audit certifications, and we need to demand that our audit firms are audited better. Our audit firms need to be explaining to security leaders that the control frameworks are something that they're responsible for creating bespoke. I did a presentation with Al Kingsley recently about security compliance, comparing FERPA and COPPA to the GDPR. And it was very interesting because FERPA has very little teeth; it's very long code, and GDPR is relatively brilliant. GDPR made some changes. FERPA was so ambiguous and vague—it made a lot of changes, but they were kind of, like, in any direction ever, because nobody knows what FERPA is. So, I don't know, what's the answer to that? What do we do?

Corey: Yeah. The challenge is, you can see a lot of companies in specific areas doing the right thing, when they're intentionally going out on day one to, for example, service kids as a primary user base demographic. The challenge that you see with this is that, that's great, but then you have things that are not starting off with that point of view.
And they started running into population limits and realized, okay, we've got to start expanding our user base somewhere, and then they wound up bolting on those things almost as an afterthought, where, "Oh, well, we've been basically misusing people's data for our entire existence, but now—now—we're suddenly magically going to do the right thing where kids are concerned." I wish, but unfortunately that philosophy assumes a better take on humanity than is readily apparent.

Jack: I wonder why they do that though, right? Something's got to—you know, news happened or something, and that's why they're doing it. And that's not okay. But I have seen companies—one of the founders of Scantron—do you know what a Scantron is?

Corey: Oh, yes. I'm much older than I look.

Jack: Yeah, I'm much older than I look, too. I like to think that. But for those that don't know, with a Scantron, you used a number two pencil and you filled in these little dots. And it was for taking tests. So, the guy who started Scantron created a small two-person company.

And AWS did something magnificent. They recognized that it was an education technology company, and they gave them, for free, security consultation services, security implementation services. And when we bought this company—I'm heavily involved in M&A, right—I'm sitting down with the two founders of the company, and my jaw is on the desk. They were more secure than a lot of the companies that I've worked with that had robust security departments. And I said, "How did you do this?"

They said, "AWS provided us with this free service because we're education technology." I teared up. My heart was—you know, that's amazing. So, there are companies that are doing this right, but then again, look at Grammarly. I hate to pick on Grammarly. LanguageTool is an open-source, I believe privacy-centric, Grammarly competitor, but Grammarly, invest in your security a little more, man. Y'all were breached.
They store a lot of data, they [unintelligible 00:29:10] lot of the data.

Corey: Oh, and it scared the living hell out of companies realizing that they had business users using Grammarly as an extension to work on internal documents and just sending proprietary data to some third-party service that they clicked through the terms on. And I don't know that it was ever shown that Grammarly was misusing any of that, but the potential for that is massive.

Jack: Do you know what they were doing with it?

Corey: Well, using AI to learn these things. Yeah, but the supervision story always involves humans reading it.

Jack: They were building a—and I think—nobody knows, the rumor—but I've worked in the industry, right, pretty heavily. They're doing something great for the world. I believe they're building a database of works submitted to do various things with them. One of those things is plagiarism detection. So, in order to do that, they've got to store, like, all of the data that they're processing.

Well, if you have all the data that you've done for your company sitting in this Grammarly database and they get hacked—luckily, that's a lot of data. Maybe you'll be overlooked. But I've got a data breach database sitting here on my desk. Do you know how many rows it's got? [pause]. Yes, breach database.

Corey: Oh, I wouldn't even begin to guess. I know the data volumes that Troy Hunt's Have I Been Pwned? site winds up dealing with, and it is… significant.

Jack: How many billions of rows do you think it is?

Corey: Ah, I'd say 20 as an argument?

Jack: 34.

Corey: Okay. Yeah, directionally right. Fermi estimation saves us yet again.

Jack: [laugh]. The reason I built this breach database is because I thought Covid would slow down and I wanted to use it to do executive protection. Companies in the education space also suffer from [active 00:30:42] shooters and that sort of thing. So, that's another thing about security, too: it transcends all these interesting areas, right?
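Corey's guess of 20 billion rows against the actual 34 billion is "directionally right" in exactly the Fermi-estimation sense: it lands within one order of magnitude. A quick sketch of that check (the threshold of one decade is the usual Fermi convention, not anything from the conversation):

```python
import math

def same_order_of_magnitude(estimate, actual):
    """Fermi-style check: are the two positive values within one order of magnitude?"""
    return abs(math.log10(estimate) - math.log10(actual)) < 1.0

# Corey's guess vs. Jack's breach database, in rows
print(same_order_of_magnitude(20e9, 34e9))  # True
```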
Like here, I'm doing executive risk protection by looking at open-source data.

Protect the executives, show the executives that security is a concern—these executives will realize security's real. Then they pass that security down the list of priorities, and next thing you know, the 50 million active students that are using Turnitin are getting better security. Because an executive realized, "Hey, wait a minute, this is a real thing." So, there's a lot of ways around this, but I don't know, it's a big space, there's a lot of competition. There's a lot of companies that are coming in that are a flash in the pan.

A lot of companies are coming in and building snake oil. How do people know how to determine the right things to use? How do people know what to implement? How do people understand that when they deploy a program that only applies to their cloud environment, it doesn't touch their on-prem where a lot of data might be at risk? And how do we work together? How do we get teams like DevOps, IT, SecOps, to not fight each other over installing an agent for doing this?

Now, when I looked at Uptycs, I said, "Well, it does the EDR for corp stuff, it does the host intrusion detection—you know, the agent-based stuff—I think, well, because it uses a buzzword I don't like to use, osquery. It's got a bunch of cloud security configuration on it, which is pretty commoditized. It does agentless cloud scanning." And it—really, I spent a lot of my career just struggling to find these tools. I've written some myself.

And when I saw Uptycs, I was—I felt stupid. I couldn't believe that I hadn't used this tool. I think maybe they've increased their capabilities substantially, but it was kind of amazing to me that I had spent so much of my time and energy and hadn't found them. Luckily, I decided to joi—actually, I didn't decide to join; they kind of decided for me—and they started giving it away for free.
But I found that Uptycs needs a—you know, they need a brand refresh. People need to come and take a look and say, "Hey, this isn't the old Uptycs. Take a look."

And maybe I'm wrong, but I'm here as a technology evangelist, and I'll tell you right now, the minute I am no longer an evangelist for this technology, the minute I'm no longer passionate about it, I can't do my job. I'm going to go do something else. So, I'm the one guy who will get down to brass tacks with you. I want this thing to be the thing I've been passionate about for a long time. I want people to use it.

Contact me directly. Tell me what's wrong with it. Tell me I'm wrong. Tell me I'm right. I really just want to wrap my head around this from the industry perspective, and say, "Hey, I think that these guys are willing to make the best thing ever." And I'm the craziest person in security. Now, Corey, who's the craziest person in security?

Corey: That is a difficult question with many wrong answers.

Jack: No, I'm not talking about McAfee, all right. I'm not that level of crazy. But I'm talking about—I was obsessed with this XDR, CDR, all the acronyms. You know, we call it HIDS. I was obsessed with it for years. I worked for all these companies.

I quit doing, you know, a lot of very good entrepreneurial work to come work at this company. So, I really do think that they can fix a lot of this stuff. I've got my fingers crossed, but I'm still staying involved in other things to make these technologies better. And the software security space is going all over the place. Sometimes it's going in a bad direction, sometimes it's going in good directions. But I agree with you about Amazon producing tools. I think it's just all market-based. People aren't going to use the complex tools of Amazon when there's all this other flashy stuff being advertised.

Corey: It all comes down to marketing budget, and AWS has always struggled with telling a story. I really want to thank you for being so generous with your time.
If people want to learn more, where should they go?

Jack: Oh, gosh, everywhere. But if you want to learn more about Uptycs, why don't you just email me?

Corey: We will, of course, put your email address into the show notes.

Jack: Yeah, we'll do it.

Corey: Don't offer if you're not serious. There's also uptycssecretmenu.com, which is apparently not much of a secret, given the large banner all over Uptycs' website.

Jack: Have you seen this? Let me just tell you about this. This is not a catch. I was blown away by this; it's one of the reasons I joined. For a buck, if you have between 100 and 1,000 nodes, right, you get our agentless system and our agent-based system, right?

I think it's only on AWS. But that's, like, what, a $150, $180,000 value? You get it for a full year. You don't have to sign a contract to renew or anything. Like, you just get it for a buck. Anybody who doesn't go on to the secret menu website and pay $1 and check out this agentless solution that deploys in two minutes—come on, man.

I challenge everybody: go on there, do that, and tell me what's wrong with it. Go on there, do that, and give me the feedback. And I promise you I'll do everything in my best efforts to make it the best. I saw the engineering team in this company; they care. Ganesh, the CEO, he is not your average CEO.

This guy is a tinkerer. He's on there, hands on keyboard. He responds to me in the middle of the night. He's a geek just like me. But we need users to give us feedback. So, you got this dollar menu, you sign up before the 31st, right? You get the product for a buck. Deploy the thing in two minutes.

Then if you want to do the XDR, this agent-based system, you can deploy that at your leisure across whichever areas you want. Maybe you want the corporate network on laptops and desktops, your production infrastructure, your compute in the cloud—deploy it, take a look at it, tell me what's wrong with it, tell me what's right with it. Let's go in there and look at it together.
This is my job. I want this company to work, not because they're Uptycs but because I think that they can do it.

And this is my personal passion. So, if people hit me up directly, let's chat. We can build a Slack, Uptycs skunkworks. Let's get this stuff perfect. And we're also going to try and get some advisory boards together—like, maybe a CISO advisory board—just to get more feedback from folks, because I think the Uptycs brand has made a huge shift in a really positive direction.

And if you look at the great thing here, they're unifying this whole agentless and agent-based stuff. And a lot of companies are saying that they're competing with that; those two things need to be run together, right? They need to be run together. So, I think the next steps here: check out that dollar menu. It's unbelievable. I can't believe that they're doing it.

I think people think it's too good to be true. Y'all got nothing to lose. It's a buck. But if you sign up for it right now, before December 31st, you can just wait and act on it any month later. So, if you sign up for it, you're just locked into the pricing. And then you want to hit me up and talk about it? Is it three in the morning? You got me. Is it eight in the morning? You got me.

Corey: You're more generous than I am. It's why I work on AWS bills. It's strictly a business-hours problem.

Jack: This is not something that they pay me for. This is just part of my personal passion. I have struggled to get this thing built correctly because I truly believe not only is it really cool—and I'm not talking about Uptycs, I mean all the companies that are out there—but I think that this could be the most powerful tool in security that makes the world more secure. Like, in a way that keeps up with the security risks increasing.

We just need to get customers, we need to get critics, and if you're somebody who wants to come in and prove me wrong, I need help. I need people to take a look at it for me. So, it's free.
And if you're in the San Francisco Bay Area and you give me some good feedback and all that, I'll take you out to dinner, I'll introduce you to startup companies that I think, you know, you might want to advise. I'll help out your career.

Corey: So, it truly is a dollar menu then.

Jack: Well, I'm paying for the dinner out of my personal thing.

Corey: Exactly. Well, again, you're also paying for the infrastructure required to provide the service, so, you know, one way or another, it's all the best—it's just like cloud: there is no cloud. It's just someone else's cost center. I like that.

Jack: Well, yeah, we're paying for a ton of data hosting. This is a huge loss leader. Uptycs has a lot of money in the bank, I think, so they're able to do this. Uptycs just needs to get a little more bold in their marketing, because I think they've spent so much time building an awesome product, it's time that we get people to see it. That's why I did this.

My career was going phenomenally. I was traveling the world, traveling the country promoting things, just getting deals left and right, and then Elias—my buddy over at Orca; Elias, one of the best marketing guys I've ever met—I've never done marketing before. I love this. It's not just marketing. It's like I get to take feedback from people and make the product better, and this is what I've been trying to do.

So, you're talking to a crazy person in security. I will go well above and beyond. Sign up for that dollar menu. I'm telling you, it is no commitment; maybe you'll get some spam email or something like that. Email me directly, I'll kill the spam email.

You can do it anytime before the end of 2023. But it's only for 2023. So, you got a full year of the services for free. For free, right? And one of them takes two minutes to deploy, so start with that one. Let me know what you think. These guys ideate and they pivot very quickly. I would love to work on this.
This is why I came here.

So, I haven't had a lot of opportunity to work with the practitioners. I'm there for you. I'll create a Slack; we can all work together. I'll invite you to my Slack if you want to get involved in secondaries investing and startup advisory. I'm a mentor and a leader in this space, so for me to be able to stay active, this is like a quid pro quo with me working for this company.

Uptycs is the company that I've chosen now because I think that they're the ones that are doing this. But I'm doing this because I think I found the opportunity to get it done right, and I think it's going to be the one thing in security that, when it is perfected, has the biggest impact.

Corey: We'll see how it goes over the coming year, I'm sure. Thank you so much for being so generous with your time. I appreciate it.

Jack: I like you. I like you, Corey.

Corey: I like me too.

Jack: Yeah? All right. Okay. I'm telling [unintelligible 00:39:51] something. You and I are very weird.

Corey: It works out.

Jack: Yeah.

Corey: Jack Charles Roehrig, Technology Evangelist at Uptycs. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment that we're going to be able to pull the exact details of where you left it from, because your podcast platform of choice clearly just treated security as a box check.

Jack: [laugh].

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Holiday Replay Edition - The Staying Power of Kubernetes with Kelsey Hightower
Dec 15, 2022 43:04
About Kelsey

Kelsey Hightower is the Principal Developer Advocate at Google, the co-chair of KubeCon, the world's premier Kubernetes conference, and an open source enthusiast. He's also the co-author of Kubernetes Up & Running: Dive into the Future of Infrastructure.

Links:
Twitter: @kelseyhightower
Company site: Google.com
Book: Kubernetes Up & Running: Dive into the Future of Infrastructure

Transcript

Announcer: Hello and welcome to Screaming in the Cloud, with your host Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I'm joined this week by Kelsey Hightower, who claims to be a principal developer advocate at Google, but based upon various keynotes I've seen him in, he basically gets on stage and plays video games like Tetris in front of large audiences. So I assume he is somehow involved with e-sports. Kelsey, welcome to the show.

Kelsey: You've outed me. Most people didn't know that I am a full-time e-sports Tetris champion at home.
And the technology thing is just a side gig.

Corey: Exactly. It's one of those things you do just to keep the lights on, like you're waiting to get discovered, but in the meantime, you're waiting tables. Same type of thing. Some people wait tables; you more or less sling Kubernetes, for lack of a better term.

Kelsey: Yes.

Corey: So let's dive right into this. You've been a strong proponent for a long time of Kubernetes and all of its intricacies and all the power that it unlocks, and I've been pretty much the exact opposite of that, as far as saying it tends to be overcomplicated, that it's hype-driven and a whole bunch of other, shall we say, criticisms that are sometimes grounded in reality and sometimes just because I think it'll be funny when I put them on Twitter. Where do you stand on the state of Kubernetes in 2020?

Kelsey: So, I want to make sure it's clear what I do. Because when I started talking about Kubernetes, I was not working at Google. I was actually working at CoreOS, where we had a competitor to Kubernetes called Fleet. And Kubernetes coming out kind of put this, like, fork in our roadmap—like, where do we go from here? What people saw me doing with Kubernetes was basically learning in public. Like, I was really excited about the technology because it's attempting to solve a very complex thing. I think most people will agree building a distributed system is what cloud providers typically do, right? With VMs and hypervisors. Those are very big, complex distributed systems.
And before Kubernetes came out, the closest I'd gotten to a distributed system before working at CoreOS was just reading the various white papers on the subject and hearing stories about how Google had systems like Borg, and tools like Mesos were being used by some of the largest hyperscalers in the world, but I was never going to have the chance to ever touch one of those unless I would go work at one of those companies.

So when Kubernetes came out—and the fact that it was open source and I could read the code to understand how it was implemented, to understand how schedulers actually work, and then bonus points for being able to contribute to it—those early years, what you saw me doing was just being so excited about systems that I had attempted to build on my own becoming this new thing, just like Linux came up. So I kind of agree with you that a lot of people look at it as more of a hype thing. They're looking at it regardless of their own needs, regardless of understanding how it works and what problems it's trying to solve. My stance on it: it's a really, really cool tool for the level that it operates in, and in order for it to be successful, people can't know that it's there.

Corey: And I think that might be where part of my disconnect from Kubernetes comes into play. I have a background in ops, more or less the grumpy Unix sysadmin, because it's not like there's a second kind of Unix sysadmin you're ever going to encounter. Where everything in development works in theory, but in practice things pan out a little differently. I always joke that ops is the difference between theory and practice. In theory, devs can do everything and there's no ops needed. In practice, well, it's been a burgeoning career for a while.
The challenge with this is Kubernetes at times exposes certain levels of abstraction—sorry, certain levels of detail—that generally people would not want to have to think about or deal with, while papering over other things with other layers of abstraction on top of it. That obscures valuable troubleshooting information when running something in an operational context. It absolutely is a fascinating piece of technology, but it feels today like it is overly complicated for the use a lot of people are attempting to put it to. Is that a fair criticism from where you sit?

Kelsey: So I think the reason why it's a fair criticism is because there are people attempting to run their own Kubernetes cluster, right? So when we think about the cloud—unless you're in OpenStack land—for the people who look at the cloud, you say, "Wow, this is much easier." There's an API for creating virtual machines, and I don't see the distributed state store that's keeping all of that together. I don't see the farm of hypervisors. So we don't necessarily think about the inherent complexity in a system like that, because we just get to use it. So on one end, if you're just a user of a Kubernetes cluster, maybe using something fully managed or you have an ops team that's taking care of everything, your interface to the system becomes this Kubernetes configuration language where you say, "Give me a load balancer, give me three copies of this container running." And if we do it well, then you'd think it's a fairly easy system to deal with because you say, "kubectl apply," and things seem to start running.

Just like in the cloud where you say, "AWS, create this VM," or "gcloud compute instances create." You just submit API calls and things happen. I think the fact that Kubernetes is very transparent to most people is, now you can see the complexity, right? Imagine everyone driving with the hood off the car.
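Kelsey's "give me a load balancer, give me three copies of this container" is literally the interface: a declarative manifest you hand to `kubectl apply`. Kubernetes accepts JSON as well as YAML, so a minimal Deployment can be sketched as plain data; the app name and image below are placeholders, not anything from the conversation:

```python
import json

# Declarative spec: the desired state, not the steps to reach it.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,  # "three copies of this container"
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]},
        },
    },
}

print(deployment["spec"]["replicas"])  # 3
# Serialized, this is what you would pipe to `kubectl apply -f -`;
# the "give me a load balancer" half would be a Service of type LoadBalancer.
manifest = json.dumps(deployment, indent=2)
```

The point of the sketch is the shape of the contract: you state the desired replicas and container, and the scheduler's job is to make reality match.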
You'd be looking at a lot of moving things, but we have hoods on cars to hide the complexity, and all we expose is the steering wheel and the pedals. That car is super complex, but we don't see it. So therefore we don't attribute its complexity to the driving experience.

Corey: This to some extent feels like it's on the same axis as serverless, with just a different level of abstraction piled onto it. And while I am a large proponent of serverless—I think it's fantastic for a lot of greenfield projects—the constraints inherent to the model mean that it is almost completely untenable for a tremendous number of existing workloads. Some developers like to call it legacy, but when I hear the term legacy I hear, "it makes actual money." So just treating it as, "Oh, it's a science experiment we can throw into a new environment, spend a bunch of time rewriting it for minimal gains," is just not going to happen as companies undergo digital transformations, if you'll pardon the term.

Kelsey: Yeah, so I think you're right. So let's take Amazon's Lambda, for example. It's a very opinionated high-level platform that assumes you're going to build apps a certain way. And if that's you, look, go for it. Now, one or two levels below that there is this distributed system. Kubernetes decided to play in that space because everyone that's building other platforms needs a place to start. The analogy I like to think of is like in the mobile space: iOS and Android deal with the complexities of managing multiple applications on a mobile device, security aspects, app stores, that kind of thing. And then you as a developer, you build your thing on top of those platforms and APIs and frameworks. Now, it's debatable; someone would say, "Why do we even need an open-source implementation of such a complex system? Why not just everyone move to the cloud?" And then everyone that's not in a cloud, on-premise, gets left behind.

But typically that's not how open source works, right?
The reason why we have Linux, the precursor to the cloud, is because someone looked at the big proprietary Unix systems and decided to re-implement them in a way that anyone could run those systems. So when you look at Kubernetes, you have to look at it from that lens. It's the ability to democratize these platform layers in a way that other people can innovate on top. That doesn't necessarily mean that everyone needs to start with Kubernetes, just like not everyone needs to start with a Linux server, but it's there for you to build the next thing on top of, if that's the route you want to go.

Corey: It's been almost a year now since I made an original tweet about this, that in five years, no one will care about Kubernetes. So now I guess I have four years running on that clock, and that attracted a bit of, shall we say, controversy. There were people who thought that I meant that it was going to be a flash in the pan and it would dry up and blow away. But my impression of it is that in, well, four years now, it will have become more or less systemd for the data center, in that there's a bunch of complexity under the hood. It does a bunch of things. No one sensible wants to spend all their time mucking around with it in most companies. But it's not something that people have to think about on an ongoing basis the way it feels like we do today.

Kelsey: Yeah, I mean, to me, I kind of see this as the natural evolution, right? It's new, it gets a lot of attention, and kind of the assumption you make in that statement is there's something better that should be able to arise, given that checkpoint. If this is what people think is hot, within five years surely we should see something else that can be deserving of that attention, right? Docker comes out, and almost four or five years later you have Kubernetes. So it's obvious that there should be a progression here that steals some of the attention away from Kubernetes, but I think where it's so new, right?
It's only five years in. Linux is, like, over 20 years old now at this point, and it's still top of mind for a lot of people, right? Microsoft is still porting a lot of Windows-only things into Linux, so we still discuss the differences between Windows and Linux.

The idea that the cloud, for the most part, is driven by Linux virtual machines—I think the majority of workloads run on virtual machines still to this day—so it's still front and center, especially if you're a system administrator managing VMs, right? You're dealing with tools that target Linux, you know the Cisco interface, and you're thinking about how to secure it and lock it down. Kubernetes is just at the very first part of that life cycle where it's new. We're all interested in even what it is and how it works, and now we're starting to move into that next phase, which is the distro phase. Like in Linux, you had Red Hat, Slackware, Ubuntu, special-purpose distros.

Some will consider Android a special-purpose distribution of Linux for mobile devices. And now that we're in this distro phase, that's going to go on for another 5 to 10 years where people start to align themselves around—maybe it's OpenShift, maybe it's GKE, maybe it's Fargate for EKS. These are now distributions built on top of Kubernetes that start to add a little bit more opinionation about how Kubernetes should be put together. And then we'll enter another phase where you'll build a platform on top of Kubernetes, but it won't be worth mentioning that Kubernetes is underneath, because people will be more interested in the thing above.

Corey: I think we're already seeing that now, in terms of people no longer really caring that much what operating system they're running, let alone which distribution of that operating system. The things that you have to care about slip below the surface of awareness, and we've seen this for a long time now.
Originally to install a web server, it wound up taking a few days and an intimate knowledge of GCC compiler flags, then RPM or dpkg and then yum on top of that, then "ensure installed," once we had configuration management that was halfway decent. Then Docker run, whatever it is. And today feels like it's with serverless technologies being what they are, it's effectively a push of a file to S3 or its equivalent somewhere else and you're done. The things that people have to be aware of and the barrier to entry continually lowers. The downside to that of course, is that things that people specialize in today and effectively make very lucrative careers out of are going to be not front and center in 5 to 10 years the way that they are today. And that's always been the way of technology. It's a treadmill to some extent.

Kelsey: And on the flip side of that, look at all of the new jobs that are centered around these cloud-native technologies, right? So you know, we're just going to make up some numbers here, imagine if there were only 10,000 jobs around just Linux system administration. Now when you look at this whole Kubernetes landscape where people are saying we can actually do a better job with metrics and monitoring. Observability is now a thing culturally that people assume you should have, because you're dealing with these distributed systems. The ability to start thinking about multi-regional deployments, when I think that would've been infeasible with the previous tools or you'd have to build all those tools yourself. So I think now we're starting to see a lot more opportunities, where instead of 10,000 people, maybe you need 20,000 people because now you have the tools necessary to tackle bigger projects where you didn't see that before.

Corey: That's what's going to be really neat to see. But the challenge is always to people who are steeped in existing technologies. What does this mean for them?
I mean I spent a lot of time early in my career fighting against cloud because I thought that it was taking away a cornerstone of my identity. I was a large-scale Unix administrator, specifically focusing on email. Well, it turns out that there aren't nearly as many companies that need to have that particular skill set in-house as there were 10 years ago. And what we're seeing now is this sort of forced evolution of people's skillsets, or they hunker down on a particular area of technology or particular application to try and make a bet that they can ride that out until retirement. It's challenging, but at some point it seems that some folks like to stop learning, and I don't fully pretend to understand that. I'm sure I will someday where, "No, at this point technology has come far enough. We're just going to stop here, and anything after this is garbage." I hope not, but I can see a world in which that happens.

Kelsey: Yeah, and I also think one thing that we don't talk a lot about in the Kubernetes community is that Kubernetes makes hyper-specialization worth doing because now you start to have a clear separation of concerns. Now the OS can be hyper-focused on security system calls and not necessarily packaging every programming language under the sun into a single distribution. So we can kind of move part of that layer out of the core OS and start to just think about the OS being a security boundary where we try to lock things down. And for some people that play at that layer, they have a lot of work ahead of them in locking down these system calls, improving the idea of containerization, whether that's something like Firecracker or some of the work that you see VMware doing, that's going to be a whole class of hyper-specialization. And the reason why they're going to be able to focus now is because we're starting to move into a world, whether that's serverless or the Kubernetes API.

We're saying we should deploy applications that don't target machines.
I mean just that step alone is going to allow for so much specialization at the various layers, because even the networking front, which arguably has been a specialization up until this point, can truly specialize, because now the IP assignments, how networking fits together, has also been abstracted away one more step where you're not asking for interfaces or binding to a specific port or playing with port mappings. You can now let the platform do that. So I think for some of the people who may be not as interested in moving up the stack, they need to be aware that the number of people we need being hyper-specialized at Linux administration will definitely shrink. And a lot of that work will move up the stack, whether that's Kubernetes or managing a serverless deployment and all the configuration that goes with that. But if you are a Linux person, like that is your bread and butter, I think there's going to be an opportunity to go super deep, but you may have to expand into things like security and not just things like configuration management.

Corey: Let's call it the unfulfilled promise of Kubernetes. On paper, I love what it hints at being possible. Namely, if I build something that runs well on top of Kubernetes, then we truly have a write once, run anywhere type of environment. Stop me if you've heard that one before, 50,000 times in our industry... or history. But in practice, as has happened before, it seems like it tends to fall down for one reason or another. Now, Amazon is famous for many reasons, but the one that I like to pick on them for is, you can't say the word multi-cloud at their events. Right. That'll change people's perspective, good job.
People tend to see multi-cloud through a couple of different lenses. I've been rather anti multi-cloud from the perspective of the idea that you're setting out day one to build an application with the idea that it can be run on top of any cloud provider, or even on-premises if that's what you want to do, is generally not the way to proceed. You wind up having to make certain trade-offs along the way, you have to rebuild anything that isn't consistent between those providers, and it slows you down. Kubernetes on the other hand hints at, if it works and fulfills this promise, you can suddenly abstract an awful lot beyond that and just write generic applications that can run anywhere. Where do you stand on the whole multi-cloud topic?

Kelsey: So I think we have to make sure we talk about the different layers that are kind of ready for this thing. So for example, like multi-cloud networking, we just call that networking, right? What's the IP address over there? I can just hit it. So we don't make a big deal about multi-cloud networking. Now there's an area where people say, how do I configure the various cloud providers? And I think the healthy way to think about this is, in your own data centers, right, so we know a lot of people have investments on-premises. Now, if you were to take the mindset that you only need one provider, then you would try to buy everything from HP, right? You would buy HP storage devices, you buy HP racks, power. Maybe HP doesn't sell air conditioners. So you're going to have to buy an air conditioner from a vendor who specializes in making air conditioners, hopefully for a data center and not your house.

So now you've entered this world where one vendor doesn't make every single piece that you need. Now in the data center, we don't say, "Oh, I am multi-vendor in my data center."
Typically, you just buy the switches that you need, you buy the power racks that you need, you buy the ethernet cables that you need, and they have common interfaces that allow them to connect together, and they typically have different configuration languages and methods for configuring those components. The cloud on the other hand also represents the same kind of opportunity. There are some people who really love DynamoDB and S3, but then they may prefer something like BigQuery to analyze the data that they're uploading into S3. Now, if this was a data center, you would just buy all three of those things and put them in the same rack and call it good.

But the cloud presents this other challenge. How do you authenticate to those systems? And then there's usually this additional networking cost, egress or ingress charges, that makes it prohibitive to say, "I want to use two different products from two different vendors." And I think that's-

Corey: ...winds up causing serious problems.

Kelsey: Yes, so that data gravity, the associated cost becomes a little bit more in your face. Whereas, in a data center you kind of feel that the cost has already been paid. I already have a network switch with enough bandwidth, I have an extra port on my switch to plug this thing in, and they're all standard interfaces. Why not? So I think the multi-cloud gets lost in the chew problem, which is the barrier to entry of leveraging things across two different providers because of networking and configuration practices.

Corey: That's often the challenge, I think, that people get bogged down in. On an earlier episode of this show we had Mitchell Hashimoto on, and his entire theory around using Terraform to wind up configuring various bits of infrastructure was not the idea of workload portability, because that feels like the windmill we all keep tilting at and failing to hit, but instead the idea of workflow portability, where different things can wind up being interacted with in the same way.
So if this one division is on one cloud provider, the others are on something else, then you at least can have some points of consistency in how you interact with those things. And in the event that you do need to move, you don't have to effectively redo all of your CI/CD process, all of your tooling, et cetera. And I thought that there was something compelling about that argument.

Kelsey: And that's actually what Kubernetes does for a lot of people. For Kubernetes, if you think about it, when we start to talk about workflow consistency, if you want to deploy an application, kubectl apply, some config, you want the application to have a load balancer in front of it. Regardless of the cloud provider, because Kubernetes has an extension point we call the cloud provider. And that's where Amazon, Azure, Google Cloud, we do all the heavy lifting of mapping the high-level ingress object that specifies, "I want a load balancer, maybe a few options," to the actual implementation detail. So maybe you don't have to use four or five different tools, and that's where that kind of workload portability comes from. Like if you think about Linux, right? It has a set of system calls, for the most part, even if you're using a different distro at this point, Red Hat or Amazon Linux or Google's container-optimized Linux.

If I build a Go binary on my laptop, I can SCP it to any of those Linux machines and it's going to probably run. So you could call that multi-cloud, but that doesn't make a lot of sense because it's just because of the way Linux works. Kubernetes does something very similar because it sits right on top of Linux, so you get the portability just from the previous example, and then you get the other portability and workload, like you just stated, where I'm calling kubectl apply, and I'm using the same workflow to get resources spun up on the various cloud providers.
Even if that configuration isn't one-to-one identical.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: One thing I'm curious about is you wind up walking through the world and seeing companies adopting Kubernetes in different ways. How are you finding the adoption of Kubernetes is looking inside of big-E enterprise style companies? I don't have as much insight into those environments as I probably should. That's sort of a focus area for the next year for me. But in startups, it seems that it's either someone goes in and rolls it out and suddenly it's fantastic, or they avoid it entirely and do something serverless. In large enterprises, I see a lot of Kubernetes and a lot of Kubernetes stories coming out of it, but what isn't usually told is, what's the tipping point where they say, "Yeah, let's try this." Or, "Here's the problem we're trying to solve for. Let's chase it."

Kelsey: What I see is enterprises buy everything. If you're big enough and you have a big enough IT budget, most enterprises have a POC of everything that's for sale, period. There's some team in some pocket, maybe they came through via acquisition. Maybe they live in a different state. Maybe it's just a new project that came out. And what you tend to see, at least from my experiences, if I walk into a typical enterprise, they may tell me something like, "Hey, we have a POC of Pivotal Cloud Foundry, OpenShift, and we want some of that new thing that we just saw from you guys. How do we get a POC going?" So there's always this appetite to evaluate what's for sale, right?
So, that's one case. There's another case where, when you start to think about an enterprise, there's a big range of skillsets. Sometimes I'll go to some companies like, "Oh, my insurance is through that company, and there's ex-Googlers that work there." They used to work on things like Borg, or something else, and they kind of know how these systems work.

And they have a slightly better edge at evaluating whether Kubernetes is any good for the problem at hand. And you'll see them bring it in. Now that same company, I could drive over to the other campus, maybe it's five miles away, and that team doesn't even know what Kubernetes is. And for them, they're going to be chugging along with what they're currently doing. So then the challenge becomes, if Kubernetes is a great fit, how wide of a fit is it? How many teams at that company should be using it? So what I'm currently seeing is there are some enterprises that have found a way to make Kubernetes the place where they do a lot of new work, because that makes sense. A lot of enterprises, to my surprise though, are actually stepping back and saying, "You know what? We've been stitching together our own platform for the last five years. We had the Netflix stack, we got some Spring Boot, we got Consul, we got Vault, we got Docker. And now this whole thing is getting a little more fragile because we're doing all of this glue code. We've been trying to build our own Kubernetes, and now that we know what it is and we know what it isn't, we know that we can probably get rid of this kind of bespoke stack ourselves, just because of the ecosystem, right?"

If I go to HashiCorp's website, I would probably find the word Kubernetes as much as I find the word Nomad on their site, because they've made things like Consul and Vault become first-class offerings inside of the world of Kubernetes.
So I think it's that momentum that you see across even people like Oracle, Juniper, Palo Alto Networks; they all seem to have a Kubernetes story. And this is why you start to see the enterprise able to adopt it, because it's so much in their face and it's where the ecosystem is going.

Corey: It feels like a lot of the excitement and the promise and even the same problems that Kubernetes is aimed at today could have just as easily been talked about half a decade ago in the context of OpenStack. And for better or worse, OpenStack is nowhere near where it once was. It felt like it had such promise and such potential, and when it didn't pan out, that left a lot of people feeling relatively sad, burnt out, depressed, et cetera. And I'm seeing a lot of parallels today, at least between what was said about OpenStack and what was said about Kubernetes. How do you see those two diverging?

Kelsey: I will tell you the big difference that I saw, personally. Just for my personal journey outside of Google, just having that option. And I remember I was working at a company and we were like, "We're going to roll our own OpenStack. We're going to buy a FreeBSD box and make it a file server. We're going all open source," like do whatever you want to do. And that was just having so many issues in terms of first-class integrations, education, people with the skills to even do that. And I was like, "You know what, let's just cut the check for VMware." We want virtualization. VMware, for the cost and what it does, it's good enough. Or we can just actually use a cloud provider. That space in many ways was a purely solved problem. Now, let's fast forward to Kubernetes, and also when you get OpenStack finished, you're just back where you started.

You got a bunch of VMs and now you've got to go figure out how to build the real platform that people want to use, because no one just wants a VM. If you think Kubernetes is low level, just having OpenStack, even if OpenStack was perfect.
You're still at square one for the most part. Maybe you can just say, "Now I'm paying a little less money for my stack in terms of software licensing costs," but from an abstraction and automation and API standpoint, I don't think OpenStack moved the needle in that regard. Now in the Kubernetes world, it's solving a huge gap.

Lots of people had virtual machine sprawl, then they had Docker sprawl, and when you bring in this thing like Kubernetes, it says, "You know what? Let's rein all of that in. Let's build some first-class abstractions, assuming that the layer below us is a solved problem." You got to remember when Kubernetes came out, it wasn't trying to replace the hypervisor, it assumed it was there. It also assumed that the hypervisor had APIs for creating virtual machines and attaching disks and creating load balancers, so Kubernetes came out as a complementary technology, not one looking to replace. And I think that's why it was able to stick, because it solved a problem at another layer where there was not a lot of competition.

Corey: I think a more cynical take, at least one of the ones that I've heard articulated and I tend to agree with, was that OpenStack originally seemed super awesome because there were a lot of interesting people behind it, fascinating organizations, but then you wound up looking through the backers of the foundation behind it and the rest. And there were something like 500 companies behind it, an awful lot of them were these giant organizations that ... they were big-E corporate IT enterprise software vendors, and you take a look at that, I'm not going to name anyone because at that point, oh, will we get letters.

But at that point, you start seeing so many of the patterns being worked into it that it almost feels like it has to collapse under its own weight.
I don't, for better or worse, get the sense that Kubernetes is succumbing to the same thing, despite the CNCF having an awful lot of those same backers behind it and, as far as I can tell, significantly more money; they seem to have all the money to throw at these sorts of things. So I'm wondering how Kubernetes has managed to effectively sidestep, I guess, the open-source miasma that OpenStack didn't quite manage to avoid.

Kelsey: Kubernetes gained its own identity before the foundation existed. Its purpose, if you think back from the Borg paper almost eight years prior, maybe even 10 years prior, defined this problem really, really well. I think Mesos came out and also had a slightly different take on this problem. And you could just see at that time there was a real need; you had choices between Docker Swarm, Nomad. It seems like everybody was trying to fill in this gap because, across most verticals or industries, this was a true problem worth solving. What Kubernetes did was play in the exact same sandbox, but it kind of got put out with experience. It's not like, "Oh, let's just copy this thing that already exists, but let's just make it open."

And in that case, you don't really have your own identity. It's you versus Amazon; in the case of OpenStack, it's you versus VMware. And that's just really a hard place to be in, because you don't have an identity that stands alone. Kubernetes itself had an identity that stood alone. It comes from this experience of running a system like this. It comes from research and white papers. It comes after previous attempts at solving this problem. So we agree that this problem needs to be solved. We know what layer it needs to be solved at. We just didn't get it right yet, so Kubernetes didn't necessarily try to get it right.

It tried to start with only the primitives necessary to focus on the problem at hand. Now to your point, the extension interface of Kubernetes is what keeps it small.
Years ago I remember plenty of meetings where we all got in rooms and said, "This thing is done." It doesn't need to be a PaaS. It doesn't need to compete with serverless platforms. The core of Kubernetes, like Linux, is largely done. Here are the core objects, and we're going to make a very great extension interface. We're going to make one for the container runtime level so that way people can swap that out if they really want to, and we're going to do one that makes other APIs as first-class as the ones we have, and we don't need to try to boil the ocean in every Kubernetes release. Everyone else has the ability to deploy extensions, just like Linux, and I think that's why we're avoiding some of this tension in the vendor world, because you don't have to change the core to get something that feels like a native part of Kubernetes.

Corey: What do you think is currently the most misinterpreted or misunderstood aspect of Kubernetes in the ecosystem?

Kelsey: I think the biggest thing that's misunderstood is what Kubernetes actually is. And the thing that made it click for me, especially when I was writing the tutorial Kubernetes The Hard Way, I had to sit down and ask myself, "Where do you start trying to learn what Kubernetes is?" So I start with the database, right? The configuration store isn't Postgres, it isn't MySQL, it's etcd. Why? Because we're not trying to be this generic data store platform. We just need to store configuration data. Great. Now, do we let all the components talk to etcd? No. We have this API server, and between the API server and the chosen data store, that's essentially what Kubernetes is. You can stop there. At that point, you have a valid Kubernetes cluster and it can understand a few things. Like I can say, using the Kubernetes command-line tool, create this configuration map that stores configuration data, and I can read it back.

Great. Now I can't do a lot of things that are interesting with that.
Maybe I just use it as a configuration store, but then if I want to build a container platform, I can install the Kubernetes kubelet agent on a bunch of machines and have it talk to the API server looking for other objects; you add in the scheduler, all the other components. So what that means is that Kubernetes' most important component is its API, because that's how the whole system is built. It's actually a very simple system when you think about just those two components in isolation. If you want a container management tool, then you need a scheduler, controller manager, cloud provider integrations, and now you have a container tool. But let's say you want a service mesh platform. Well, in a service mesh you have a data plane that can be Nginx or Envoy, and that's going to handle routing traffic. And you need a control plane. That's going to be something that takes in configuration and it uses that to configure all the things in a data plane.

Well, guess what? Kubernetes is 90% there in terms of a control plane, with just those two components, the API server and the data store. So now when you want to build control planes, if you start with the Kubernetes API, we call it the API machinery, you're going to be 95% there. And then what do you get? You get a distributed system that can handle kind of failures on the back end, thanks to etcd. You're going to get RBAC, where you can have permissions on top of your schemas, and there's a built-in framework, we call it custom resource definitions, that allows you to articulate a schema and then your own control loops provide meaning to that schema. And once you do those two things, you can build any platform you want.
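The "control loops provide meaning to the schema" idea described above can be sketched in a few lines. This is a toy model only, not the real Kubernetes machinery; the resource names and the in-memory dicts standing in for etcd and the cluster are invented for illustration:

```python
# Toy reconciliation loop: the core pattern behind Kubernetes controllers.
# Desired state lives in a store (etcd, behind the API server, in the real
# system); a controller repeatedly diffs it against actual state and issues
# the minimal operations needed to converge the two.

desired = {"web": 3, "worker": 2}   # replica counts declared via the API
actual = {"web": 1}                 # what is currently running

def reconcile(desired, actual):
    """One pass of the control loop: scale or delete as needed."""
    actions = []
    for name, replicas in desired.items():
        current = actual.get(name, 0)
        if current != replicas:
            actions.append(f"scale {name}: {current} -> {replicas}")
            actual[name] = replicas
    for name in list(actual):        # anything running that is no longer declared
        if name not in desired:
            actions.append(f"delete {name}")
            del actual[name]
    return actions

print(reconcile(desired, actual))  # first pass converges the state
print(reconcile(desired, actual))  # second pass: nothing left to do, []
```

A real controller is the same shape conceptually: watch the API server for changes to a resource kind (built-in or a custom resource definition), compare declared state with observed state, and act until they match.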
And I think that's one thing that it takes a while for people to understand, that part of Kubernetes, that the thing we talk about today, for the most part, is just the first system that we built on top of this.

Corey: I think that's a very far-reaching story with implications that I'm not entirely sure I am able to wrap my head around. I hope to see it, I really do. I mean you mentioned writing Kubernetes The Hard Way, your tutorial, which I'll link to in the show notes. I mean my, of course, sarcastic response to that recently was to register the domain Kubernetes the Easy Way and just re-point it to Amazon's ECS, which is in no way shape or form Kubernetes and basically has the effect of irritating absolutely everyone, as is my typical pattern of behavior on Twitter. But I have been meaning to dive into Kubernetes on a deeper level, and the stuff that you've written, not just the online tutorial, both the books, have always been my first port of call when it comes to that. The hard part, of course, is there's just never enough hours in the day.

Kelsey: And one thing that I think about too is like the web. We have the internet, there's webpages, there's web browsers. Web browsers talk to web servers over HTTP. There's verbs, there's bodies, there's headers. And if you look at it, that's like a very big complex system. If I were to extract out the protocol pieces, this concept of HTTP verbs, get, put, post and delete, this idea that I can put stuff in a body and I can give it headers to give it other meaning and semantics. If I just take those pieces, I can build RESTful APIs.

Hell, I can even build GraphQL, and those are just different systems built on the same API machinery that we call the internet or the web today. But you have to really dig into the details and pull that part out, and you can build all kinds of other platforms, and I think that's what Kubernetes is.
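The API machinery analogy above, generic verbs over pluggable resource schemas, can be made concrete with a toy sketch. The resource kinds and the dict standing in for the data store are invented for illustration; this is not the real Kubernetes or HTTP stack:

```python
# Minimal "API machinery": a handful of generic verbs dispatched over any
# resource kind. RESTful APIs and the Kubernetes API server are both
# elaborations of this same shape.

store = {}  # (kind, name) -> object; stands in for etcd or a database

def handle(verb, kind, name, body=None):
    """Dispatch a generic verb against any resource kind."""
    key = (kind, name)
    if verb == "PUT":
        store[key] = body
        return body
    if verb == "GET":
        return store.get(key)
    if verb == "DELETE":
        return store.pop(key, None)
    raise ValueError(f"unknown verb: {verb}")

# The same machinery serves very different "platforms" without changing:
handle("PUT", "ConfigMap", "app-settings", {"color": "blue"})
handle("PUT", "GameScore", "match-42", {"home": 3, "away": 1})  # a custom kind

print(handle("GET", "ConfigMap", "app-settings"))  # {'color': 'blue'}
```

The point of the sketch is that nothing in `handle` knows what a ConfigMap or a GameScore is; meaning comes from whatever control loops or clients act on those objects, which is the separation Kelsey describes.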
It's going to probably take people a little while longer to see that piece, but it's hidden in there, and that piece, like you said, is probably going to be the foundation for building more control planes. And when people build control planes, I think if you think about it, maybe Fargate for EKS represents another control plane for making a serverless platform that talks the Kubernetes API, even though the implementation isn't what you find on GitHub.

Corey: That's the truth. Whenever you see something as broadly adopted as Kubernetes, there's always the question of, "Okay, there's an awful lot of blog posts: getting started with it, learn it in 10 minutes." I mean at some point, I'm sure there are some people still convinced Kubernetes is, in fact, a breakfast cereal based upon some of the stuff the CNCF has gotten up to. I wouldn't necessarily bet against it: socks today, breakfast cereal tomorrow. But it's hard to find a decent level of quality; finding a trusted source that clears a certain quality bar to get started with is important. Some people believe in the hero's journey story of narrative building.

I always prefer to go with the moron's journey because I'm the moron. I touch technologies, I have no idea what they do, and figure it out and go careening into edge and corner cases constantly. And by the end of it I have something that vaguely sort of works and my understanding's improved. But I've gone down so many terrible paths just by picking a bad point to get started. So everyone I've talked to who's actually good at things has pointed to your work in this space as being something that is authoritative and largely correct and, given some of these people, that's high praise.

Kelsey: Awesome. I'm going to put that on my next performance review as evidence of my success and impact.

Corey: Absolutely. Grouchy people say, "It's all right," you know, for the right people that counts.
If people want to learn more about what you're up to and see what you have to say, where can they find you?

Kelsey: I aggregate most of my outward interactions on Twitter, so I'm @KelseyHightower and my DMs are open, so I'm happy to field any questions and I attempt to answer as many as I can.

Corey: Excellent. Thank you so much for taking the time to speak with me today. I appreciate it.

Kelsey: Awesome. I was happy to be here.

Corey: Kelsey Hightower, Principal Developer Advocate at Google. I'm Corey Quinn. This is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple Podcasts. If you've hated this podcast, please leave a five-star review on Apple Podcasts and then leave a funny comment. Thanks.

Announcer: This has been this week's episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com or wherever fine snark is sold.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Winning Hearts and Minds in Cloud with Brian Hall


Dec 13, 2022 37:51


About Brian

Brian leads the Google Cloud Product and Industry Marketing team. This team is focused on accelerating the growth of Google Cloud by establishing thought leadership, increasing demand and usage, enabling their sales teams and partners to tell their product stories with excellence, and helping their customers be the best advocates for them.

Before joining Google, Brian spent over 25 years in product marketing or engineering in different forms. He started his career at Microsoft and had a very non-traditional path for 20 years. Brian worked in every product division except for cloud. He did marketing, product management, and engineering roles. And, early on, he was the first speechwriter for Steve Ballmer and worked on Bill Gates' speeches too. His last role was building up the Microsoft Surface business from scratch as VP of the hardware businesses. After Microsoft, Brian spent a year as CEO at a hardware startup called Doppler Labs, where they made a run at transforming hearing, and then spent two years as VP at Amazon Web Services leading product marketing, developer advocacy, and a bunch more marketing teams.

Brian has three kids still at home, Barty, Noli, and Alder, who are all named after trees in different ways. He and his wife Edie met right at the beginning of their first year at Yale University, where Brian studied math, econ, and philosophy and was the captain of the Swim and Dive team his senior year. Edie has a PhD in forestry and runs a sustainability and forestry consulting firm she started, aptly named "Three Trees Consulting."
As a family they love the outdoors, tennis, running, and adventures in Brian's 1986 Volkswagen Van, which is his first and only car, and which he can't bring himself to get rid of.

Links Referenced:
Google Cloud: https://cloud.google.com
@isforat: https://twitter.com/IsForAt
LinkedIn: https://www.linkedin.com/in/brhall/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you.
Check out Veeam, that's V-E-E-A-M for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This episode is brought to us by our friends at Google Cloud and, as a part of that, they have given me someone to, basically, harass for the next half hour. Brian Hall is the VP of Product Marketing over at Google Cloud. Brian, welcome back.Brian: Hello, Corey. It's good to be here, and technically, we've given you time to harass me by speaking with me because you never don't have the time to harass me on Twitter and other places, and you're very good at it.Corey: Well, thank you. Again, we first met back when you were doing, effectively, the same role over at AWS. And before that, you spent only 20 years or so at Microsoft. So, you've now worked at all three of the large hyperscale cloud providers. You probably have some interesting perspectives on how the industry has evolved over that time. So, at the time of this recording, it is after Google Next and before re:Invent. There was also a Microsoft event there that I didn't pay much attention to. Where are we as a culture, as an industry, when it comes to cloud?Brian: Well, I'll start with it is amazing how early days it still is. I don't want to be put on my former Amazon cap too much, and I think it'd be pushing it a little bit to say it's complete and total day one with the cloud. But there's no question that there is a ton of evolution still to come. I mean, if you look at it, you can kind of break it into three eras so far. And roll with me here, and happy to take any dissent from you.But there was kind of a first era that was very much led by Amazon. We can call it the VM era or the component era, but being able to get compute on-demand, get nearly unlimited or actually unlimited storage with S3 was just remarkable. 
And it happened pretty quickly that startups, new tech companies, had to—like, it would be just wild to not start with AWS and actually start ordering servers and all that kind of stuff. And so, I look at that as kind of the first phase. And it was remarkable how long Amazon had a run really as the only player there. And maybe eight years ago—six years ago—we could argue on timeframes, things shifted a little bit because the enterprises, the big companies, and the governments finally realized, “Holy crow. This thing has gotten far enough that it's not just for these startups.”

Corey: Yeah. There was a real change. There was an eye-opening moment there where it isn't just, “I want to go and sell things online.” It's, “And I also want to be a bank. Can we do that with you?” And, “Huh.”

Brian: My SAP—like, I don't know how big that darn thing is going to get. Could I put it in your cloud? And, “Oh, by the way, CapEx forecasting stinks. Can you get me out of that?” And so, it became like the traditional IT infrastructure. All of the sudden, the IT guys showed up at the party, which I know is—it sounds fun to me, but that doesn't sound like the best addition to a party for many people. And so essentially, old-school IT infrastructure finally came to the cloud, and Microsoft couldn't miss that happening when it did. But it was a major boon for AWS just because of the position that they had already.

Corey: And even Google as well. All three of you now are pivoting in a lot of the messaging to talk to the big-E enterprises out there. And I've noticed for the last few years, and I'm not entirely alone: when I go to re:Invent and I look at the announcements they're making, sure, they have the serverless stuff and how to run websites and EC2 nonsense. And then they're talking about IoT things and other things that just seem very oriented on a persona I don't understand. Everyone's doing stuff with mainframes now, for example.
And it feels like, “Oh, those of us who came here for the web services like it says on the name of the company aren't really feeling like it's for us anymore.” It's the problem of trying to be for everyone and pivoting to where the money is going, but Google's done this at least as much as anyone has in recent years. Are those of us who don't have corporate IT-like problems no longer the target market for folks or what's changed?Brian: It's still the target market, so like, you take the corporate IT, they're obviously still moving to the cloud. And there's a ton of opportunity. Just take existing IT spending and see a number over $1 trillion per year, and if you take the run rates of Microsoft, Amazon, Google Cloud, it's certainly over $100 billion, but that means it's still less than ten percent of what is existing IT spending. There are many people that think that existing IT spend number is significantly higher than that. But to your point on what's changing, there's actually a third wave that's happening.So, if the first wave was you start a company. You're a tech company, of course, you start it on AWS or on the Cloud. Second wave is all the IT people, IT departments, the central organizations that run technology for all the people that are not technology people come to the cloud. This third wave is everybody has to become a technology person. If you're a business leader, like you're at a fast-food restaurant and you're responsible for the franchisee relations, before, like, you needed to get an EDI system running or something, and so you told your IT department to figure out.Now, you have to actually think about what apps do we want to provide to our customers. How do I get the right data to my franchisees so that they can make business decisions? How can I automate all that? And you know, whereas before I was a guy wearing a suit or a gal wearing a suit who didn't need to know technology, I now have to. And that's what's changing the most. 
And it's why the Total Addressable Market—or the TAM, as business folk sometimes say—is really hard to estimate looking forward if every business really needs to become a technology business in many ways. And it didn't dawn on me, honestly—and you can give me all the ribbing that I probably deserve for this—but it didn't really dawn on me until I came to Google and kept hearing the transformation word, “Digital transformation, digital transformation,” and honestly, having been in software for so long, I didn't really know what digital transformation meant until I started seeing all of these folks, like every company, have to become a tech company effectively.

Corey: Yeah. And it turns out there aren't enough technologists to go around, so it's very challenging to wind up getting the expertise in-house. It's natural to start looking at, “Well, how do we effectively outsource this?” And well, you can absolutely have a compression algorithm for experience. It's called, “Buying products and services and hiring people who have that experience already baked in, either to the product or they show up knowing how to do something because they've done this before.”
And so, having more services, more people that can help as every company goes through a transition like that—and it's interesting, it's why during Covid, the cloud did really well, and some people kind of said, “Well, it's because they—people didn't want to send their people into their data centers.” No. That wasn't it. It was really because it just forced the change to digital. Like the person to, maybe, batter the analogy a little bit—the person who was previously responsible for all of the physical banks, which are—a bank has, you know, that are retail locations—the branches—they have those in order to service the retail customers.Corey: Yeah.Brian: That person, all of the sudden, had to figure out, “How do I do all that service via phone, via agents, via an app, via our website.” And that person, that entire organization, was forced digital in many ways. And that certainly had a lot of impact on the cloud, too.Corey: Yeah. I think that some wit observed a few years back that Covid has had more impact on your digital transformation than your last ten CIOs combined.Brian: Yeah.Corey: And—yeah, suddenly, you're forcing people into a position where there really is no other safe option. And some of that has unwound but not a lot of it. There's still seem to be those same structures and ability to do things from remote locations then there were before 2020.Brian: Yeah. Since you asked, kind of, where we are in the industry, to bring all of that to an endpoint, now what this means is people are looking for cloud providers, not just to have the primitives, not just to have the IT that they—their central IT needed, but they need people who can help them build the things that will help their business transform. It makes it a fun, new stage, new era, a transformation era for companies like Google to be able to say, “Hey, here's how we build things. Here's what we've learned over a period of time. 
Here's what we've most importantly learned from other customers, and we want to help be your strategic partner in that transformation.” And like I said, it'd be almost impossible to estimate what the TAM is for that. The real question is how quickly can we help customers and innovate in our Cloud solutions in order to make more of the stuff more powerful and faster to help people build.Corey: I want to say as well that—to be clear—you folks can buy my attention but not my opinion. I will not say things if I do not believe them. That's the way the world works here. But every time I use Google Cloud for something, I am taken aback yet again by the developer experience, how polished it is. And increasingly lately, it's not just that you're offering those low-lying primitives that composed together to build things higher up the stack, you're offering those things as well across a wide variety of different tooling options. And they just tend to all make sense and solve a need rather than requiring me to build it together myself from popsicle sticks.And I can't shake the feeling that that's where the industry is going. I'm going to want someone to sell me an app to do expense reports. I'm not going to want—well, I want a database and a front-end system, and how I wind up storing all the assets on the backend. No. I just want someone to give me something that solves that problem for me. That's what customers across the board are looking for as best I can see.Brian: Well, it certainly expands the number of customers that you can serve. I'll give you an example. We have an AI agent product called Call Center AI which allows you to either build a complete new call center solution, or more often it augments an existing call center platform. And we could sell that on an API call basis or a number of agent seats basis or anything like that. But that's not actually how call center leaders want to buy. 
Imagine we come in and say, “This many API calls or $4 per seat or per month,” or something like that. There's a whole bunch of work for that call center leader to go figure out, “Well, do I want to do this? Do I not? How should I evaluate it versus others?” It's quite complex. Whereas, if we come in and say, “Hey, we have a deal for you. We will guarantee higher customer satisfaction. We will guarantee higher agent retention. And we will save you money. And we will only charge you some percentage of the amount of money that you're saved.”Corey: It's a compelling pitch.Brian: Which is an easier one for a business decision-maker to decide to take?Corey: It's no contest. I will say it's a little odd that—one thing—since you brought it up, one thing that struck me as a bit strange about Contact Center AI, compared to most of the services I would consider to be Google Cloud, instead of, “Click here to get started,” it's, “Click here to get a demo. Reach out to contact us.” It feels—Brian: Yeah.Corey: —very much like the deals for these things are going to get signed on a golf course.Brian: [laugh]. They—I don't know about signed on a golf course. I do know that there is implementation work that needs to be done in order to build the models because it's the model for the AI, figuring out how your particular customers are served in your particular context that takes the work. And we need to bring in a partner or bring in our expertise to help build that out. But it sounds to me like you're looking to go golfing since you've looked into this situation.Corey: Just like painting, I'm no good at golfing either.Brian: [laugh].Corey: Honestly, it's—it just doesn't have the—the appeal isn't there for me for whatever reason. I smile; I nod; I tend to assume that, “Yeah, that's okay. I'll leave some areas for other people to go exploring in.”Brian: I see. 
I see.Corey: So, two weeks before Google Cloud Next occurred, you folks wound up canceling Stadia, which had been rumored for a while. People had been predicting it since it was first announced because, “Just wait. They're going to Google Reader it.” And yeah, it was consumer-side, and I do understand that that was not Cloud. But it did raise the specter of—for people to start talking once again about, “Oh, well, Google doesn't have any ability to focus on things long-term. They're going to turn off Cloud soon, too. So, we shouldn't be using it at all.” I do not agree with that assessment.But I want to get your take on it because I do have some challenges with the way that your products and services go to market in some ways. But I don't have the concern that you're going to turn it all off and decide, “Yeah, that was a fun experiment. We're done.” Not with Cloud, not at this point.Brian: Yeah. So, I'd start with at Google Cloud, it is our job to be a trusted enterprise platform. And I can't speak to before I was here. I can't speak to before Thomas Kurian, who's our CEO, was here before. But I can say that we are very, very focused on that. And deprecating products in a surprising way or in a way that doesn't take into account what customers are on it, how can we help those customers is certainly not going to help us do that. And so, we don't do that anymore.Stadia you brought up, and I wasn't part of starting Stadia. I wasn't part of ending Stadia. I honestly don't know anything about Stadia that any average tech-head might not know. But it is a different part of Google. And just like Amazon has deprecated plenty of services and devices and other things in their consumer world—and Microsoft has certainly deprecated many, many, many consumer and other products—like, that's a different model. And I won't say whether it's good, bad, or righteous, or not.But I can say at Google Cloud, we're doing a really good job right now. Can we get better? Of course. Always. 
We can get better at communicating, engaging customers in advance. But we now have a clean deprecation policy with a set of enterprise APIs that we commit to for stated periods of time. We also—like people should take a look. We're doing ten-year deals with companies like Deutsche Bank. And it's a sign that Google is here to last and Google Cloud in particular. It's also at a market level, just worth recognizing.We are a $27 billion run rate business now. And you earn trust in drips. You lose it in buckets. And we're—we recognize that we need to just keep every single day earning trust. And it's because we've been able to do that—it's part of the reason that we've gotten as large and as successful as we have—and when you get large and successful, you also tend to invest more and make it even more clear that we're going to continue on that path. And so, I'm glad that the market is seeing that we are enterprise-ready and can be trusted much, much more. But we're going to keep earning every single day.Corey: Yeah. I think it's pretty fair to say that you have definitely gotten yourselves into a place where you've done the things that I would've done if I wanted to shore up trust that the platform was not going to go away. Because these ten-year deals are with the kinds of companies that, shall we say, do not embark on signing contracts lightly. They very clearly, have asked you the difficult, pointed questions that I'm basically asking you now as cheap shots. And they ask it in very serious ways through multiple layers of attorneys. And if the answers aren't the right answers, they don't sign the contract. That is pretty clearly how the world works.The fact that companies are willing to move things like core trading systems over to you on a ten-year time horizon, tells me that I can observe whatever I want from the outside, but they have actual existential risk questions tied to what they're doing. And they are in some ways betting their future on your folks. 
You clearly know what those right answers are and how to articulate them. I think that's the side of things that the world does not get to see or think about very much. Because it is easy to point at all the consumer failings and the hundreds of messaging products that you continually replenish just in order to kill.Brian: [laugh].Corey: It's—like, what is it? The tree of liberty must be watered periodically from time to time, but the blood of patriots? Yeah. The logo of Google must be watered by the blood of canceled messaging products.Brian: Oh, come on. [laugh].Corey: Yeah. I'm going to be really scared if there's an actual, like, Pub/Sub service. I don't know. That counts as messaging, sort of. I don't know.Brian: [laugh]. Well, thank you. Thank you for the recognition of how far we've come in our trust from enterprises and trust from customers.Corey: I think it's the right path. There's also reputational issues, too. Because in the absence of new data, people don't tend to change their opinion on things very easily. And okay, there was a thing I was using. It got turned off. There was a big kerfuffle. That sticks in people's minds. But I've never seen an article about a Google service saying, “Oh, yeah. It hasn't been turned off or materially changed. In fact, it's gotten better with time. And it's just there working reliably.” You're either invisible, or you're getting yelled at.It feels like it's a microcosm of my early career stage of being a systems administrator. I'm either invisible or the mail system's broke, and everyone wants my head. I don't know what the right answer is—Brian: That was about right to me.Corey: —in this thing. Yeah. I don't know what the right answer on these things is, but you're definitely getting it right. I think the enterprise API endeavors that you've gone through over the past year or two are not broadly known. 
And frankly, you definitely are ex-AWS, because “enterprise APIs” is a terrible name for what these things are.

Brian: [laugh].

Corey: I'll let you explain it. Go ahead. And bonus points if you can do it without sounding like a press release. Take it away.

Brian: There are a set of APIs that developers and companies should be able to know are going to be supported for the period of time that they need in order to run their applications and truly bet on them. And that's what we've done.

Corey: Yeah. It's effectively a commitment that there will not be meaningful deprecations or changes to the API that are breaking changes without significant notice periods.

Brian: Correct.

Corey: And to be clear, that is exactly what all of the cloud providers have in their enterprise contracts. There are always notice periods around those things. There are always, at least, certain amounts of time and significant breach penalties in the event that, “Yeah, today, I decided that we were just not going to spin up VMs in that same way as we always have before. Sorry. Sucks to be you.” I don't see that happening on the Google Cloud side of the world very often, not like it once did. And again, we do want to talk about reputations.

There are at least four services that I'm aware of that AWS has outright deprecated. One, Sumerian, has said we're sunsetting the service in public. But on the other end of the spectrum, RDS on VMware has been completely memory-holed. There's a blog post or two, but nothing else remains in any of the AWS stuff, I'm sure, because that's an “enterprise-y” service; they wound up having one-on-one conversations with customers or there would have been a hue and cry. But every cloud provider does, in the fullness of time, turn some things off as they learn from their customers.
And I love that just from a product marketing perspective. I mean, you know way more about that field than I do given that it's your job, and I'm just sitting here in the cheap seats throwing peanuts at you. But I love the idea of customers just coming up and making up a product one day in your space, and then the storytelling that immediately happens thereafter. Most companies would kill for something like that, just because you would expect on some level to learn so much about how your reputation actually works. When there's a platonic ideal of a service that isn't bothered by pesky things like, “It has to exist,” what do people say about it? And how does that work?

And I'm sort of surprised there wasn't more engagement from Amazon on that. It always seems like they're scared to say anything. Which brings me to a marketing question I have for you. You and Amazon have similar challenges—you being Google in this context, not you personally—in that your customers take themselves deadly seriously. And as a result, you have to take yourselves with at least that same level of seriousness. You can't go on Twitter and be the Wendy's Twitter account when you're dealing with enterprise buyers of cloud platforms. I'm kind of amazed, and I'd love to know: how can you manage to say anything at all? Because it just seems like you are so constrained, and there's no possible thing you can say that someone won't take issue with. And yes, some of the time, that someone is me.

Brian: Well, let's start with going back to Infinidash a little bit. Yes, you identified one interesting thing about that episode, if I can call it an episode. The thing that I'll tell you, though, that didn't surprise me is it shows how much of cloud is actually learned from other people, not from the cloud provider itself. I—you're going to be going to re:Invent. You were at Google Cloud Next. The best thing about the industry conferences is not what the provider does.
It's the other people that are there that you learn from. The folks that have done something that you've been trying to do and couldn't figure out how to do, and then they explain it to you; just the relationships that you get that help you understand what's going on in this industry that's changing so fast and has so much going on.

And so, that part didn't surprise me. And that gets a little bit to the second part of what we're talking about. “How do you say anything?” As long as you're helping a customer say it. As long as you're helping someone who has been a fan of a product and has done interesting things with it say it, that's how you communicate for the most part: putting a megaphone in front of the people who already understand what's going on and helping their voice be heard. Which is a lot more fun, honestly, than creating TV ads and banner ads and all of the stuff that a lot of consumer and traditional companies do. We get to celebrate our customers and our creators much, much more.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: I think that it's not super well understood by a lot of folks out there that the official documentation that any cloud provider puts out there is kind of a last resort. Or I'm looking for the specific flag to a specific parameter of a specific command. Great. Sure. But what I really want to do whenever I'm googling how to do something—and yes, that—we're going to be googling—welcome. You've successfully owned that space to the point where it's become common parlance.
Good work. What I want is to see what other people have said. I want to find blog posts, ideally recent ones, talking about how to do the thing that I'm trying to do. If I'm trying to do something relatively not that hard or not that uncommon—say I spin up three web servers behind a load balancer—and I can't find any community references on how to do that thing, either I'm trying to do something absolutely bizarre and I should rethink it, or there is no community/customer base for the product talking about how to do things with it.

And I have noticed a borderline Cambrian explosion over the last few years of the Google Cloud community. I'm seeing folks who do not work at Google, and also who have never worked at Google, and sometimes still think they work at Google in some cases. It's not those folks. It is people who are just building things as a customer. And they, in turn, become very passionate advocates for the platform. And they start creating content on these things.
And Google, which we try and respect specifically what people are trying to accomplish and how they know how to do it, we also many ways have taken a more opinionated approach, if you will, to say, “Hey, here's how this could be done in a different way.” And when people find something that's unexpectedly different and also delightful, it's more likely that they're going to be strong advocates and share that passion with the world.Corey: It's a virtuous cycle that leads to the continued growth and success of a platform. Something I've been wondering about in the broader sense, is what happens after this? Because if, let's say for the sake of argument, that one of the major cloud providers decided, “Okay. You know, we're going to turn this stuff off. We've decided we don't really want to be in the cloud business.” It turns out that high-margin businesses that wind up turning into cash monsters as soon as you stop investing heavily in growing them, just kind of throw off so much that, “We don't know what to do with. And we're running out of spaces to store it. So, we're getting out of it.” I don't know how that would even be possible at some point. Because given the amount of time and energy some customers take to migrate in, it would be a decade-long project for them to migrate back out again.So, it feels on some level like on the scale of a human lifetime, that we will be seeing the large public cloud providers, in more or less their current form, for the rest of our lives. Is that hopelessly naïve? Am I missing—am I overestimating how little change happens in the sweep of a human lifetime in technology?Brian: Well, I've been in the tech industry for 27 years now. And I've just seen a continual moving up the stack. Where, you know, there are fundamental changes. I think the PC becoming widespread, fundamental change; mobile, certainly becoming primary computing experience—what I know you call a toilet computer, I call my mobile; that's certainly been a change. 
Cloud has certainly been a change. And so, there are step functions for sure. But in general, what has been happening is things just keep moving up the stack. And as things move up the stack, there are companies that evolve and learn to do that and provide more value, and more value to new folks. Like I talked about how businesspeople are leaders in technology now in a way that they never were before. And you need to give them the value in a way that they can understand it, and they can consume it, and they can trust it. And it's going to continue to move in that direction.

And so, what happens then, as things move up the stack, is the abstractions start happening. And so, there are companies that were just major players in the '90s, whether it's Novell or Sun Microsystems or—I was actually getting a tour of the Sunnyvale/Mountain View Google campuses yesterday. And the tour guide said, “This used to be the site of a company that was called Silicon Graphics. They did something around, like, making things for Avatar.” I felt a little aged at that point.

But my point is, there are these companies that were amazing in their time. They didn't move up the stack in a way that met the next set of needs. And it's not like that cratered the industry or anything; it's just people were able to move off of it and move up. And I do think that's what we'll see happening.
And that doesn't mean it's not still a great business.Corey: Yeah.Brian: It just means it's not front of mind for maybe the problems you're trying to solve or the opportunities we're trying to capture at that point in time.Corey: So, my last question for you goes circling back to Google Cloud Next. You folks announced an awful lot of things. And most of them, from my perspective, were actually pretty decent. What do you think is the most impactful announcement that you made that the industry largely overlooked?Brian: Most impactful that the industry—well, overlooked might be the wrong way to put this. But there's this really interesting thing happening in the cloud world right now where whereas before companies, kind of, chose their primary cloud writ large, today because multi-cloud is actually happening in the vast majority of companies have things in multiple places, people make—are making also the decision of, “What is going to be my strategic data provider?” And I don't mean data in the sense of the actual data and meta-data and the like, but my data cloud.Corey: Mm-hmm.Brian: How do I choose my data cloud specifically? And there's been this amazing profusion of new data companies that do better ETL or ELT, better data cleaning, better packaging for AI, new techniques for scaling up/scaling down at cost. A lot of really interesting stuff happening in the dataspace. But it's also created almost more silos. 
And so, the most important announcement that we made probably didn't seem like a really big announcement to a lot of people, but it really was about how we're connecting together more of our data cloud: BigQuery with unstructured and structured data support; support for data lakes, including new formats such as Iceberg and Delta, with Hudi to come; and how Looker is increasingly working with BigQuery, in order to make it so that if you put data into Google Cloud, you not only have these super first-class services that you can use, ranging from databases like Spanner to BigQuery to Looker to AI services like Vertex AI, but it's also now supporting all these different formats so you can bring third-party applications into that one place. And so, at the big cloud events, it's a new service that is the biggest deal. For us, the biggest deal is how this data cloud is coming together in an open way to let you use the tool that you want to use, whether it's from Google or a third party, all by betting on Google's data cloud.

Corey: I'm really impressed by how Google is rather clearly thinking about this from the perspective of the data has to be accessible by a bunch of different things, even though it may take wildly different forms. It is making the data more fluid in that it can go to where the customer needs it to be rather than expecting the customer to come to it where it lives. That, I think, is a trend that we have not seen before in this iteration of the tech industry.

Brian: I think you got that—you picked that up very well. And to some degree, if you step back and look at it, it maybe shouldn't be that surprising that Google is adept at that. When you think of what Google Search is, how YouTube is essentially another search engine producing videos that deliver on what you're asking for, how information is used with Google Maps, with Google Lens—it is all about taking information and making it as universally accessible and helpful as possible.
And if we can do that for the internet's information, why can't we help businesses do it for their business information? And that's a lot of where Google certainly has a unique approach with Google Cloud.

Corey: I really want to thank you for being so generous with your time. If people want to learn more about what you're up to, where's the best place for them to find you?

Brian: cloud.google.com for Google Cloud information, of course. And, if it's still running when this podcast goes out, @isforat, I-S-F-O-R-A-T, on Twitter.

Corey: And we will put links to both of those in the show notes. Thank you so much for your time. I appreciate it.

Brian: Thank you, Corey. It's been good talking with you.

Corey: Brian Hall, VP of Product Marketing at Google Cloud. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas, if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting, angry comment dictating that, "No. Large companies make ten-year-long commitments casually all the time."

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Making Sense of Data with Harry Perks

Screaming in the Cloud

Play Episode Listen Later Dec 8, 2022 30:48


About Harry

Harry has worked at Sysdig for over 6 years, helping organizations mature their journey to cloud native. He's witnessed the evolution of bare metal, VMs, and finally Kubernetes establishing itself as the de facto standard for container orchestration. He is part of the product team building Sysdig's troubleshooting and cost offering, helping customers increase their confidence operating and managing Kubernetes.

Previously, Harry ran, and later sold, a cloud hosting provider where he was working hands-on with systems administration. He studied information security and lives in the UK.

Links Referenced:

Sysdig: https://sysdig.com/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves!
You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode has been brought to us by our friends at Sysdig, and they have sent one of their principal product managers to suffer my slings and arrows. Please welcome Harry Perks.

Harry: Hey, Corey, thanks for hosting me. Good to meet you.

Corey: An absolute pleasure, and thanks for basically being willing to suffer all of the various nonsense I'm about to throw your direction. Let's start with origin stories; I find that those tend to resonate the most. Back when I first noticed Sysdig coming into the market, because it was just launching at that point, it seemed like it was a… we'll call it an innovative approach to observability, though I don't recall that we used the term observability back then. It more or less took a look at whatever an application was doing almost at a system call level, traced what was going on as those requests worked on an individual system, and then provided those in a variety of different forms to reason about. Is that directionally correct as far as the origin story goes, or am I misremembering an evening event I went to what feels like half a lifetime ago?

Harry: I'd say the latter, but just because it's a funnier answer. But that's correct. So, Sysdig was created by Loris Degioanni, one of the founders of Wireshark.
And when containers and Kubernetes were being incepted, you know, it kind of created this problem where you lacked visibility into what's going on inside these opaque boxes, right? These black boxes, which are containers.

So, we started using system calls as a source of truth for… I don't want to say observability, but observability, and using those system calls to essentially see what's going on inside containers from the outside. And leveraging system calls, we were able to pull up metrics such as the golden signals of applications running in containers, and network traffic. So, it's a very simple way to instrument applications. And that was really how monitoring started. And then Sysdig kind of morphed into a security product.

Corey: What was it that drove that transformation? Because generally speaking, a product that's in a particular space, aimed at a particular niche, pivoting into something that feels as orthogonal as security isn't something you see all that often. What did you folks see that wound up driving that change?

Harry: The same challenges that were being presented by containers and microservices for monitoring were the same challenges for security. So, for runtime security, it was very difficult for our customers to be able to understand what the heck is going on inside the container. Is a crypto miner being spun up? Is there malicious activity going on? So, it made logical sense to use that same data source - system calls - to understand the monitoring and the security posture of applications.

Corey: One of the big challenges out there is that security tends to be one of those pervasive things—I would argue that observability does too—where once you have a position of being able to see what is going on inside of an environment, you can reason about it.
And this goes double for inside of containers, which from a cloud provider perspective, at least, seems to be, "Oh, yeah, just give us the containers; we don't care what's going on inside, so we're never going to ask, notice, or care." And being able to bridge that lack of visibility between the outside of container land and the inside of container land has been a perennial problem. There are security implications, there are cost implications, there are observability challenges to be sure, and of course, reliability concerns that flow directly from that, which is, I think, how most people, at least historically, contextualize observability. It's a fancy word to describe "is the site about to fall over and crash into the sea," at least in my experience. Is that your definition of observability, or have I basically been hijacked by a number of vendors who have decided to relabel what they'd been doing for 15 years as observability?

Harry: [laugh]. I think observability is one of those things that is down to interpretation, depending on the most recent vendor you've been speaking with. But to me, observability is: am I happy? Am I sad? Are my applications happy? Are they sad?

Am I able to complete business-critical transactions that keep me online and keep me afloat? So, it's really as simple as that. There are different ways to implement observability, but you can't improve the performance, and you can't improve the security posture, of things you can't see, right? So, how do I make sure I can see everything? And what do I do with that data? That's really what observability means to me.

Corey: The entire observability space across the board is really one of those areas that is defined, on some level, by the outliers within it. It's easy to wind up saying that any given observability tool will—oh, it alerts you when your application breaks.
The problem is that the interesting stuff is often found in the margins, in the outlier products that wind up emerging from it. What is the specific area of that space where Sysdig tends to shine the most?

Harry: Yeah, so you're right. The outliers typically cause problems, and often you don't know what you don't know. And I think if you look at Kubernetes specifically, there is a whole bunch of new problems and challenges and things that you need to be looking at that didn't exist five to ten years ago, right? There are new things that can break. You know, you've got a pod that's stuck in a CrashLoopBackOff.

And hey, I'm a developer who's running my application on Kubernetes. I've got this pod in a CrashLoopBackOff. I don't know what that means. And then suddenly I'm being expected to alert on these problems. Well, how can I alert on things that I didn't even know were a problem?

So, one of the things that Sysdig is doing on the observability side is we're looking at all of this data and we're actually presenting opinionated views that help customers make sense of that data. Almost like, you know, I could present this data and give it to my grandma, and she would say, "Oh, yeah, okay. You've got these pods in CrashLoopBackOff, you've got these pods that are being CPU throttled. Hey, you know, I didn't know I had to worry about CPU limits, or, you know, memory limits, and now I'm suffering OOM kills." So, I think one of the things that's quite unique about Sysdig on the monitoring side, that a lot of customers are getting value from, is demystifying some of those challenges and making a lot of that data actionable.

Corey: At the time of this recording, I've not yet bothered to run Kubernetes in anger, by which I, of course, mean production. My production environment is of course called 'Anger,' similarly to the way that my staging environment is called 'Theory,' because things work in theory, but not in production.
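As a rough illustration of the kind of opinionated rollup Harry describes, a tool might bucket pods into plain-language problem categories from their status and throttling counters. This is a hedged sketch with a made-up record shape, not Sysdig's actual data model:

```python
# Sketch: flag common Kubernetes workload problems from simplified pod
# status records. The field names here are illustrative assumptions.

def triage(pods):
    """Group pods into human-readable problem buckets."""
    findings = {"crashloop": [], "cpu_throttled": [], "no_limits": []}
    for pod in pods:
        # Container stuck restarting.
        if pod.get("waiting_reason") == "CrashLoopBackOff":
            findings["crashloop"].append(pod["name"])
        # Treat as throttled if the CPU scheduler delayed the container
        # in more than 25% of its CFS periods (threshold is arbitrary).
        periods = pod.get("cfs_periods", 0)
        if periods and pod.get("cfs_throttled_periods", 0) / periods > 0.25:
            findings["cpu_throttled"].append(pod["name"])
        # No CPU limit set at all, so usage is unbounded.
        if pod.get("cpu_limit") is None:
            findings["no_limits"].append(pod["name"])
    return findings

pods = [
    {"name": "api-1", "waiting_reason": "CrashLoopBackOff", "cpu_limit": 0.5,
     "cfs_periods": 100, "cfs_throttled_periods": 2},
    {"name": "worker-1", "waiting_reason": None, "cpu_limit": 1.0,
     "cfs_periods": 100, "cfs_throttled_periods": 40},
    {"name": "cron-1", "waiting_reason": None, "cpu_limit": None,
     "cfs_periods": 0, "cfs_throttled_periods": 0},
]
print(triage(pods))
```

The point of a view like this is exactly what Harry describes: a developer who has never heard of CFS throttling still gets a list of named pods with a plain description of what is wrong.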
That is going to be changing in the first quarter of next year, give or take. The challenge with that, though, is that so much has changed—we'll say—since the evolution of Kubernetes into something that is mainstream production in most shops. I stopped working in production environments before that switch really happened, so I'm still at a relatively amateurish level of understanding around a lot of these things.

I'm still thinking about old-school problems, like, "Okay, how big do I make each one of the nodes in my Kubernetes cluster?" Yeah, if I get big systems, it's likelier that there will be economies of scale that start factoring in, fewer nodes to manage, but it does increase the blast radius if one of those nodes gets affected by something that takes it offline for a while. I'm still at the very early stages of trying to wrap my head around the nuances of running these things in a production environment. Cost is, of course, a separate argument. My clients run it everywhere, and I can reason about it surprisingly well for something that does not lend itself to easy understanding in any sense of the word; you almost have to intuit its existence just by looking at the AWS bill.

Harry: No, I like your observations. And I think the last part there, around costs, is something that I'm seeing a lot in the industry and in our customers. Okay, suddenly, you know, I've got a great monitoring posture, or observability posture, whatever that really means. I've got a great security posture. As customers are maturing in their journey to Kubernetes, suddenly there are a bunch of questions being asked from the top—and we've kind of seen this internally—such as, "Hey, what is the ROI of each customer?"

Or, "What is the ROI of a specific product line or feature that we deliver to our customers?"
And we couldn't answer those problems. And we couldn't answer them because we're running a bunch of applications and software on Kubernetes, and when we receive our billing reports from the multiple different cloud providers we use—Azure, AWS, and GCP—we just received a big fat bill that was compute, and we were unable to break that down by the different teams and business units, which is a real problem. And it's one of the problems that we really wanted to start solving, both for internal use, but also for our customers as well.

Corey: Yeah, when you have a customer coming in, the easy part of the equation is how much revenue are we getting from a customer? That's easy enough: just poll your finance group. "Yeah, how much have they paid us this year?" "Great. Good to know." Then it gets really confusing on the cost side, because it gets into a unit economic model that I think most shops don't have a particularly advanced understanding of.

If we have another hundred customers sign up this month, what will it cost us to service them? And what are the variables that change those numbers? It really gets into a fascinating model where people more or less do some gut checks and some rounding, but there are a bunch of areas where people get extraordinarily confused, start to finish. Kubernetes is very much one of them, because from a cloud provider's perspective, it's just a single-tenant app that is really gnarly in terms of its behavior; it does a bunch of different things, and from the bill alone, it's hard to tell that you're even running Kubernetes unless you ask.

Harry: Yeah, absolutely. And there was a survey from the CNCF recently that said 68% of folks are seeing increased Kubernetes costs—of course—and 69% of respondents said that they have no cost monitoring in place, or just cost estimates, which is simply not good enough, right? People want to break down that line item to individual business units and teams.
Which is a huge challenge that cloud providers aren't meeting today.

Corey: Where do you see most of the cost issue breaking down? I mean, there's some of the stuff we're never allowed to talk about when it comes to cost, which is the realistic assessment that the people who work on technology cost more than the technology itself. There's a certain—how do we put this—unflattering perspective that a lot of people are deploying Kubernetes into environments because they want to bolster their own resume, not because it's the actual right answer to anything they have going on. So, that's a little hit or miss, on some level. I don't know that I necessarily buy into that, but you take a look at the compute and storage, you look at the data transfer side—which it seems almost everyone mostly ignores, despite the fact that Kubernetes itself has no zone affinity, so it has no idea whether its internal communication is free or expensive—and it just adds up to a giant question mark.

Then you look at Kubernetes architecture diagrams, or God forbid the CNCF landscape diagram, and realize, oh, my God, they have more of these things than they do Pokemon, and people give up any hope of understanding it other than just saying, "It's complicated," and accepting that that's just the way it is. I'm a little less fatalistic, but I also think it's a heck of a challenge.

Harry: Absolutely. I mean, the economics of cloud, right? Why is ingress free, but egress is not? Why is it so difficult to [laugh] understand that intra-AZ traffic is billed completely separately from public traffic, for example? And I think network costs are one thing that is extremely challenging for customers.

One, they don't even have visibility into what the network traffic is: what is internal traffic, what is public traffic. But then there's also a whole bunch of other challenges that are causing Kubernetes costs to rise, right?
You've got folks that struggle with setting the right requests for Kubernetes, which ultimately blows up the scale of a Kubernetes cluster. You've got the complexity of AWS economics—instance types, for example. You know, I don't know whether I need to be running ten m5.xlarge instances versus four Graviton instances.

And this ability to size a cluster correctly, as well as size a workload correctly, is very, very difficult, and customers are not able to establish that baseline today. And obviously, you can't optimize what you can't see, right? So, I think a lot of customers struggle with that visibility, but then the complexity means that it's incredibly difficult to optimize those costs.

Corey: You folks are starting to dip your toes into the Kubernetes costing space. What approach are you taking?

Harry: Sysdig builds products for Kubernetes first. So, if you look at what we're doing in the monitoring space, we really kind of pioneered what customers want to get out of Kubernetes observability, and then we did similar things for security, making sure our security product is, I want to say, Kubernetes-native. And what we're doing on the cost side of things is, of course, there are a lot of cost products out there that will give you the ability to slice and dice by AWS service, for example, but they don't give you that Kubernetes context to break those costs down by teams and business units.

So at Sysdig, we've already been collecting usage information—resource usage information: requests, container CPU, memory usage—and a lot of customers have been using that data for right-sizing. But one of the things they said was, "Hey, I need to quantify this. I need to put a big fat dollar sign in front of some of these numbers we're seeing, so I can go to these teams and management and actually prompt them to right-size." So, it's quite simple.
We're essentially augmenting that resource usage information with cost data from cloud providers. So, instead of customers saying, "Hey, I'm wasting one terabyte of memory," they can say, "Hey, I'm wasting 500 bucks on memory each month." So, it's very much Kubernetes-specific, using a lot of Kubernetes context and metadata.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: Part of the whole problem that I see across the space is that the way to solve some of these problems internally, when you start trying to divide costs between different teams, has been, well, we're just going to give each one their own cluster, or their own environment. That does definitely solve the problem of shared services. The counterpoint is it solves them by making every team incur them individually. That doesn't necessarily seem like the best approach in every scenario. One thing I have learned, though, is that for some customers, that is the right approach. Sounds odd, but that's the world we live in, where context absolutely matters a lot. I'm very reluctant these days to say at a glance, "Oh, you're doing it wrong." You eat a whole lot of crow when you do that, it turns out.

Harry: I see this a lot. And I see customers giving each of their business units their own AWS account, which I kind of feel is a step backwards, right? I don't think you're properly harnessing the power of Kubernetes and creating this kind of shared tenancy model when you're giving a team their own AWS account. I think it's important we break down those silos.
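Harry's earlier point about putting a big fat dollar sign on waste can be sketched in a few lines: take the gap between what a workload requests and what it actually uses, and multiply by a memory price. The per-GiB-hour rate below is an invented placeholder for illustration, not any provider's actual pricing:

```python
# Sketch: price unused memory requests. The rate is a hypothetical
# blended memory price, not a real cloud provider's price sheet.

HOURS_PER_MONTH = 730
PRICE_PER_GIB_HOUR = 0.005  # assumed, for illustration only

def monthly_memory_waste(request_gib, p95_usage_gib):
    """Dollar cost per month of the requested-but-unused memory slice."""
    wasted_gib = max(request_gib - p95_usage_gib, 0)
    return wasted_gib * PRICE_PER_GIB_HOUR * HOURS_PER_MONTH

# A workload requesting 4 GiB of memory but peaking (p95) at 1.5 GiB:
print(monthly_memory_waste(4.0, 1.5))
```

Summed across a team's workloads, a figure like this turns "you're wasting a terabyte" into a number a business unit can act on.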
You know, there's so much operational overhead with maintaining these different accounts, but there must be a better way to address some of these challenges.

Corey: It's one of those areas where "it depends" becomes the appropriate answer to almost anything. I'm a fan of having almost every workload have its own AWS account within the same shared AWS organization, then with shared VPCs, which tend to work out. But that does add some complexity to observing how things interact there. One of the pieces of guidance I've given people is to assume, in any architecture diagram you ever put up, that in the future there will be an AWS account boundary between any two resources, because someone's going to be doing it somewhere. And that seems to be something that AWS themselves are just slowly starting to awaken to as well. It's getting easier and easier every week to work with multiple accounts in a more complicated structure.

Harry: Absolutely. And I think when you start to adopt a multi-cloud strategy, suddenly you've got so many more dimensions. I'm running an application in AWS, Azure, and GCP, and now suddenly I've got all of these subaccounts. That is an operational overhead that doesn't jive very well, considering there is such a shortage of folks that are real experts in operating these environments. And that's really, you know, I think one of the challenges that isn't being spoken about enough today.

Corey: It feels like so much of the time, Kubernetes winds up being an expression of the same thing that getting into microservices was: "Well, we have a people problem, and we're going to solve it with this approach." Great, but then you wind up with people adopting it without the context that applied when the stuff was originally built and designed. Like with monorepos.
Yeah, it was a problem when you had 5,000 developers all trying to work on the same thing and stomping on each other, so breaking that apart made sense. But the counterpoint is, when you wind up with companies with 20 developers and 200 microservices, it starts to be a little… okay, has this pendulum swung too far?

Harry: Yeah, absolutely. And I think that when you've got so many people being thrown at a problem, there are lots of changes being made, there are new deployments, and I think things can spiral out of control pretty quickly, especially when it comes to costs. "Hey, I'm a developer, and I've just made this change. How do I understand the financial impact of this change?" "Has this blown up my network costs because suddenly I'm not traversing the right network path?" Or suddenly I'm consuming so much more CPU, and actually, there is a physical compute cost to this. There are a lot of cooks in the kitchen, and I think that is causing a lot of challenges for organizations.

Corey: You've been working in product for a while, and one of my favorite parts of being in a position where you are so close to the core of what your company does is that you find it's almost impossible not to keep learning things, just based upon how customers take what you built and the problems they experience—both the ones they bring you in to solve, and of course, the new and exciting problems you wind up causing for them. Or, to be more charitable, surfacing problems they didn't realize already existed.
What have you learned lately from your customers that you didn't see coming?

Harry: One of the biggest things I've been seeing—and I've maybe spoken to 40 or 50 customers over the last few months, about a variety of topics, whether it's observability in general or the financial side, Kubernetes costs—and what I hear about time and time again, regardless of the vertical or the size of the organization, is that the platform teams, the people closest to Kubernetes, know their stuff. They get it. But a lot of their internal customers, so the internal business units and teams, of course don't have the same kind of clarity and understanding, and these are the people that are getting the most frustrated. I've been shipping software for 20 years, and now I'm modernizing applications, I'm starting to use Kubernetes, and I've got so many new things to learn about that I'm simply drowning in cloud-native problems.

And I think we forget about that, right? Too often, we spend time throwing fancy technology at the people such as the DevOps engineers and the platform teams, but a lot of internal customers are struggling to leverage that technology to actually solve their own problems. They can't make sense of this data, and they can't make the right changes based off of it.

Corey: I would say that is a very common affliction of Kubernetes, where so often it winds up handling things that are abstracted away to the point where we don't need to worry about them. That's true right up until the point where they break, and now you have to go diving into the magic. That's one of the reasons I was such a fan of Sysdig when it first came out: the idea that it was getting into what I viewed at the time as operating system fundamentals and actually seeing what was going on, abstracted away from the vagaries of the code and a lot more into what system calls it was making.
Great, okay, now I'm starting to see a lot of calls that it shouldn't necessarily be making, or it's thrashing in a particular way. And it's almost impossible—historically—to get to that level of insight through traditional observability tools, but being able to take a look at what's going on from a more fundamental point of view was extraordinarily helpful.

I'm optimistic if you can get to a point where you're able to do that with Kubernetes, given its enraging ecosystem, for lack of a better term. Whenever you roll out Kubernetes, you've also got to pick some service delivery stuff, some observability tooling, some log routers, and so on and so forth. It feels like by the time you're running anything in production, you've made so many choices along the way that the odds that anyone else has made the same choices are vanishingly small, so you're running your own bespoke unicorn somewhere.

Harry: Absolutely. Flip a coin, and that's probably one [laugh] of the solutions that you're going to throw at a problem, right? And you keep flipping that coin, and suddenly you're going to reach a combination that nobody else has done before. And you're right, the knowledge that you have gained from, I don't know, Corey Quinn Enterprises is probably not going to ring true at Harry Perks Enterprises Limited, right?

There is a whole different set of problems and technology and people. Of course, you can bring some of that knowledge along—there are some common denominators—but every organization is ultimately using technology in different ways. Which is problematic, right, for the people that are actually pioneering some of these cloud-native applications.

Corey: Given my professional interest, I am curious about what it is you're doing as you start moving a little bit away from the security and observability sides and into cost observability. How are you approaching that?
What are the mistakes you see people making, and how are you meeting them where they are?

Harry: The biggest challenge that I am seeing is with sizing workloads and sizing clusters. And I see this time and time again. Our product shines a light on the capacity utilization of compute. And what it really boils down to is two things: platform teams are not using the correct instance types, or the correct combination of instance types, to run the workloads for their application teams; but also, application developers are not setting things like requests correctly.

Which makes sense. Again, I flip a coin, and maybe that's the request I'm going to set. I used to size a VM with one gig of memory, so now I'm going to size my pod with one gig of memory. But it doesn't really work like that. And of course, the request is essentially my slice of the pizza that's been carved out, and even if I don't eat that entire slice of pizza, it's for me; nobody else can use it.

So, what we're trying to do is really help customers with that challenge. If I'm a developer, I would look at the historical usage of my workloads—maybe it's the maximum usage, or the p99 or the p95—and then set my workload request to that. You keep doing that over the course of the different teams' applications you have, and suddenly you start to establish this baseline of what is the compute actually needed to run all of these applications.

And that helps me answer the question: what should I size my cluster to?
And that's really important, because until you've established that baseline, you can't start to do things like cluster reshaping to pick a different combination of instance types to power your cluster.

Corey: On some level, a lack of diversity in instance types is a bit of a red flag, just because it generally means that someone said, "Oh, yeah, we're going to start with this default instance size and then we'll adjust as time goes on," and, spoilers, just like anything else labeled 'TODO' in your codebase, it never gets done. So, you find yourself pretty quickly in a scenario where some workloads are struggling to get the resources they need inside of whatever that default instance size is, and on the other hand, you wind up with some things that are more or less running a cron job once a day but sitting there completely idle, yet running, the whole time regardless. Optimization and right-sizing in a lot of these scenarios is a little bit tricky. I've been something of a, I'll say, pessimist when it comes to the idea of right-sizing EC2 instances, just because so many historical workloads are challenging to get recertified on newer instance families and the rest, whereas when we're running on Kubernetes already, presumably everything's built in such a way that it can stop existing in a stateless way and the service still continues to work. If not, it feels like there are some necessary Kubernetes prerequisites that may not have fully circulated internally yet.

Harry: Right. And to make this even more complicated, you've got applications that may be more memory-intensive or CPU-intensive, so understanding the ratio of CPU to memory requirements for their applications, depending on how they've been architected, makes this even more challenging, right?
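The baseline-building approach Harry outlined (set each workload's request to a high percentile of its observed usage, then sum the requests to get what the cluster actually needs) might look roughly like this. The usage samples and the 10% headroom are illustrative assumptions, not a prescribed method:

```python
# Sketch: derive workload memory requests from historical usage samples
# using a nearest-rank percentile, then sum them into a cluster baseline.
import math

def percentile(samples, pct):
    """Nearest-rank percentile; avoids pulling in numpy for a sketch."""
    ordered = sorted(samples)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]

def suggested_request(usage_samples, pct=95, headroom=1.1):
    """Size the request to the p95 of observed usage plus ~10% headroom."""
    return percentile(usage_samples, pct) * headroom

# Per-workload memory usage samples in GiB (made-up data, e.g. one
# sample per scrape interval):
workloads = {
    "api": [0.8, 0.9, 1.1, 1.0, 2.4, 0.9, 1.0, 1.2, 1.1, 0.9],
    "worker": [3.0, 3.2, 2.9, 3.1, 3.3, 3.0, 2.8, 3.1, 3.2, 3.0],
}
requests = {name: suggested_request(s) for name, s in workloads.items()}
baseline_gib = sum(requests.values())  # what the cluster actually needs
print(requests, baseline_gib)
```

With a baseline like `baseline_gib` in hand, cluster reshaping becomes a concrete exercise: pick the combination of instance types whose total allocatable memory covers the baseline with the least waste.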
I mean, pods are jumping around and that makes it incredibly difficult to track these movements and actually pick the instances that are going to be most appropriate for my workloads and for my clusters.Corey: I really want to thank you for being so generous with your time. If people want to learn more, where's the best place for them to find you?Harry: sysdig.com is where you can learn more about what Sysdig is doing as a company and our platform in general.Corey: And we will, of course, put a link to that in the show notes. Thank you so much for your time. I appreciate it.Harry: Thank you, Corey. Hope to speak to you again soon.Corey: Harry Perks, principal product manager at Sysdig. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that we will lose track of because we don't know where it was automatically provisioned.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Multi-Cloud in Sanity with Simen Svale Skogsrud


Dec 6, 2022 (34:34)


About Simen

Ever since he started programming simple games on his 8-bit computer back in the day, Simen has been passionate about how software can deliver powerful experiences. Throughout his career he has been a sought-after creator and collaborator for companies seeking to push the envelope with their digital end-user experiences.

He co-founded Sanity because the state-of-the-art content tools were consistently holding him, his team, and his customers back in delivering on their vision. He is now serving as the CTO of Sanity.

Simen loves mountain biking and rock climbing with child-like passion and unwarranted enthusiasm. Over the years he has gotten remarkably good at going over the bars without taking serious damage.

Links Referenced:

  • Sanity: https://www.sanity.io/
  • Simen's Twitter: https://twitter.com/svale/
  • Slack community for Sanity: https://slack.sanity.io/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode.
Visit Pinecone.io to understand more.

Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's guest is here to tell a story that I have been actively searching for, for years, and I have picked countless fights in pursuit of it. And until I met today's guest, I was unconvinced that it actually exists. Simen Svale is the co-founder and CTO of a company called Sanity. Simen, thank you for joining me. What is Sanity? What do you folks do over there?

Simen: Thank you, Corey. Thank you. So, we used to be this creative agency that would, kind of, Black Hawk Down into a company and help them innovate; that would be our thing. And these were usually content projects: media companies, corporate communications, these kinds of companies. We would come in and we would develop some ideas with them. And they would love those ideas, and then invariably, we wouldn't ever be able to do those ideas because we couldn't change the workflows in their CMS, we couldn't extend their content models, we couldn't really do anything meaningful.

So, then we would end up setting up separate tools next to those content tools, and they would invariably get lost and never be used after a while. So, we were like, we need to solve this problem, we need to solve it at the source.

So, we decided we wanted a new kind of content platform. It would be a content platform consisting of two parts. There would be the, kind of, workspace where you create the content and do the workflows and all that; that would be like an open-source project that you can really customize to build the exact workspace that you need for your company.

And then on the other side, you would have this, kind of, content cloud; we call it the content lake. And the point of this is that very often you bring in several different sources: you have the content that you create specifically for a project, but very often you have content from an ERP system, availability of products, time schedules. Let's say you're a real estate agent; you have data about your properties that comes from other systems. So, this is a system to bring all that together. And then there was another thing that really frustrated me: content systems had content APIs, and content APIs are really particular, and specific, about a certain way of using content, whereas we thought content is just data.

It should be data, and the API should be a database query language. So, these are, kind of, the components of Sanity: a very customizable workspace for working with content and running your content workflows, and this content lake, which is this, kind of, cloud for your content.

Corey: The idea of a content lake is fascinating, on some level, where it goes beyond the data lake story, which I've always found to be a little on the weird side when cloud companies get up and talk about it. I remember this distinctly a few years ago at a re:Invent keynote, when Andy Jassy, then the CEO of AWS, got up and talked about customers' data lakes, and here are tools for using them.
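Simen's "content is just data, and the API should be a database query language" point can be illustrated with a toy in-memory sketch. Sanity's real query language is GROQ; the document shapes and the `query` helper below are invented for this example and are not Sanity's API.

```python
# A toy "content lake": documents from several sources in one queryable pool.
documents = [
    {"_type": "property", "city": "Oslo", "price": 450_000, "source": "erp"},
    {"_type": "property", "city": "Bergen", "price": 300_000, "source": "erp"},
    {"_type": "article", "title": "Open houses this week", "source": "studio"},
]

def query(docs, predicate, projection=None):
    """Filter documents with a predicate; optionally project out named fields."""
    hits = [d for d in docs if predicate(d)]
    if projection is None:
        return hits
    return [{key: d.get(key) for key in projection} for d in hits]

# ERP-sourced data and editorial content answer to the same query primitive:
cheap = query(
    documents,
    lambda d: d["_type"] == "property" and d["price"] < 400_000,
    projection=["city", "price"],
)
print(cheap)  # [{'city': 'Bergen', 'price': 300000}]
```

The design point is that nothing here is specific to one "kind" of content; a content-specific delivery API would instead hard-code the shapes it knows how to serve.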
And I mentioned it to one of my clients, and they looked at me like I was a very small, very simple child and said, "Yeah, that would be great, genius, if we had a data lake, but we don't." It's like, "You… you have many petabytes of data hanging out in S3. What do you think that is?" "Oh, that's just the logs and the assets and stuff." It's… yeah.

Simen: [laugh].

Corey: So, it turns out that people don't think about what they have in the same terms, and meeting customers with their terms is challenging. Do you find that people have an idea of what a content cloud or a content lake is before you talk to them about it?

Simen: I mean, that's why it took us some time to come up with the term content lake. But we realized, like, our thinking was, the content lake is where you bring all your content to make it queryable and to make it deliverable. So that's, like—you should think, like, as long as I need to present this to end-users, I need to bring it into the content lake. And it's kind of analogous to a data lake. Of course, if you can't curate your data in the data lake, it isn't a data lake, even if you have all the data there. You have to be able to analyze it and deliver it in the format you need it in.

So, it's kind of an analogy for the same kind of thinking. And the crux of a content lake is it gives you one, kind of, single API that works for all of your content sources. It kind of brings them all together under one umbrella, which is, kind of, the key here: teams can then leverage that without learning new APIs and without ordering up new APIs from the other teams.

Corey: The story that really got me pointed in your direction is when a mutual friend of ours looked at me and said, "Oh, you haven't talked to them yet?" Because it was in response to a story I've told repeatedly, at length, to anyone who will listen, and by that I include anyone unfortunate enough to share an elevator ride with me. I'll talk to strangers about this; it doesn't matter.

And my argument has been for a long time that multi-cloud, in the sense of, "Oh yeah, we have this one workload and we can just seamlessly deploy it anywhere," is something that is like cow tipping, as Ben Kehoe once put it, in that it doesn't exist, and you know it doesn't exist because there are no videos of it happening on YouTube. There are no keynote stories where someone walks out on stage and says, "Oh, yeah, thanks to this company's great product, I had my thing that I built entirely on AWS, and I can seamlessly flip a switch, and now it's running on Google Cloud, and flip the switch again, and now it's running on Azure."

And the idea is compelling, but there are very rarely individual workloads that are built from the beginning to be able to run like that, and it takes significant engineering work. And in practice, no one ever takes advantage of that optionality in most cases. It is vanishingly rare. And our mutual friend said, "Oh, yeah. You should talk to Simen. He's done it."

Simen: [laugh]. Yeah.

Corey: Okay, shenanigans on that, but why not? I'm game. So, let me be very direct. What the hell have you done?

Simen: [laugh]. So, we didn't know it was hard until I saw his face when I told him. That helps, right? Like, ignorance is bliss. What we wanted was—we were blessed with getting very, very big enterprise customers very early in our startup journey, which is fantastic, but also very demanding.

And one thing we saw was, either for compliance reasons or for, kind of, strategic partnership reasons, there were reasons that big, big companies wanted to be on specific service providers. And in a sense, we don't care. Like, we don't want to care. We want to support whatever makes sense.
And we are very, let's call it, principled architects, so actually, like, the lower levels of Sanity don't know they are part of Sanity; they don't even know about customers.

Like, we already had the, kind of, separation of concerns that makes the lower, kind of, workload-specific systems of Sanity not know a lot about what they are doing. They are basically just, kind of, processing content and CDN requests, and just doing that, with no idea about billing or anything like that. So, when we saw the need for that, we thought, okay: we have what we call the color charts, which is, kind of, the light bulbs—we have hundreds and hundreds of them and we can just switch them off and the service still works. And then there's the control plane, which is, kind of, the admin interface that users use to administrate the resources. We wanted customers to just be able to say, "I want this workload, this kind of content store, to run on Azure, and I want this one on Google Cloud." I wanted that to feel the same way regions do. Like, you just choose that and we'll migrate it to wherever you want it. And of course, charge you for that privilege.

Corey: Even that is hard to do, because when companies say, "Oh, yeah, we have a multi-cloud strategy here," it's: okay, if your multi-cloud strategy involves having this thing on multiple clouds, then as step one, if you're on AWS—which is where this conversation usually takes place when I'm having it with people, given the nature of what I do for a living—it's: great, first deploy it to a second AWS region and go active-active between those two. You should—theoretically—have full service and API compatibility between them, which removes a whole bunch of problems. Just go ahead and do that and show us how easy it is. And then for step two, then talk about other cloud providers.

And spoiler, there's never a step two, because that stuff is way more difficult than people who have not done it give it credit for being.

How did you build your application in such a way that you aren't taking individual dependencies on things that only exist in one particular cloud, either in terms of the technology itself or the behaviors? For example, load balancers come up with different inrush times, and RDS instances provision databases at different speeds, with different guarantees around certain areas, than their counterparts on other cloud providers. At some point, it feels like you have to go back to the building blocks of rolling everything yourself in containers and taking only internal dependencies. How do you square that circle?

Simen: Yeah, I think it's a good point. Like, I guess we had a fear of—my biggest fear in terms of single cloud was just the leverage you give your cloud provider if you use too many of those kinds of super-specific services, the ones that only they run. So, our initial architecture was based on the fact that we would be able to migrate—like, not necessarily multi-cloud, just: if someone really ups the price or behaves terribly, we can say, "Oh, yeah. Then we'll leave for another cloud provider." So, we only used super generic services, like queue services and blob services; these are pretty generic across the providers.

And then we used generic databases like Postgres or Elastic, and we ran them pretty generically. So, anyone who can provide, like, a Postgres-style API, we can run on that. We don't use any exotic features. Let's say picking boring technologies was the most, kind of, important choice. And then this also goes into our business model, because we are a highly integrated database provider.

Like, in one sense, Sanity is a content database with this weird go-to-market. Like, people think of us as a CMS, but it is actually the database we charge for.

So also, we can't use these very highly integrated services, because that's our margin. Like, we want that money, right [laugh]? So, we create that value, and then we build it on very simple, very basic building blocks, if that makes sense.

So, when we wanted to move to a different cloud, for everything we needed access to, we could basically build a platform inside Azure that looks exactly like the one we built inside Google, to the applications.

Corey: There is something to be said for the approach of using boring technologies. Of course, there's also the story of, "Yeah, I use boring technologies." "Like what?" "Oh, like, Kubernetes," is one of the things that people love to say. It's like, "Oh, yes."

My opinion on Kubernetes historically has not been great. Basically, I look at it as: if you want to cosplay working at Google but can't pass their technical screen, then Kubernetes is the answer for you. And that's more than a little unfair. And starting early next year, I'm going to be running a production workload myself in Kubernetes, just so I can make fun of it with greater accuracy, honestly, but I'm going to learn things as I go. It is sort of the exact opposite of boring.

Even my early experiments with it so far have been, I guess we'll call it, unsettling, as far as some of the non-deterministic behaviors that have emerged and the rest. How did you go about deciding to build on top of Kubernetes in your situation? Or was it one of those things that just sort of happened to you?

Simen: Well, we had been building microservice-based products for a long time internal to our agency, so we kind of knew about all the pains of coordinating, orchestrating, scaling those—

Corey: "We want to go with microservices because we're tired of being able to find the problem. We want this to be much more of an exciting murder mystery when something goes down."

Simen: Oh, I've heard that. But I think if you carve up the services the right way, every service becomes simple.
It's just so much easier to develop and to reason about. And I've been involved in so many monoliths before that, where every refactor is like guts on the table, a month-long, kind of, ordeal, super high risk. With microservices, everything becomes a simple, manageable affair.

And you can basically rebuild your whole stack service by service. Like, it's a realistic thing, because all of them are pretty simple. But it's kind of complicated when they are all running inside instances; there's crosstalk with configuration, like, you change a library and everything kind of breaks. So, Docker was obvious.

Like, Docker, that kind of isolation, being able to have different images but sharing the machine resources, was amazing. And then, of course, Kubernetes, being about orchestrating that, made a lot of sense. But that was also compatible with a few things we had already discovered, because workloads in Kubernetes need to be incredibly boring. We talk about boring stuff—like, for example, in the beginning, we had services that start up, do some, kind of, sanity check, validate their environment, and then go into action.

That in itself breaks the whole experience, because what you want a Kubernetes-based service to do is basically just do one thing all the time in the same way: use the same amount of memory, the same amount of resources, and just do that one thing at that rate, always. So, we broke those things apart; even the same service runs in different containers depending on its state. Like, this is the state for doing the sanity check, this is the state for [unintelligible 00:13:05], this is the state for doing mutations. Same service. So, there are ways around that.

I absolutely adore the whole thing. It saved—like, I haven't heard about those pains we used to have in the past ever again. But also, it wasn't an easy choice for me, because my single SRE at the time said it was either Kubernetes or he'd quit.

So, it was a very simple decision.

Corey: Exactly. Resume-driven development is very much a thing. I'm not one to turn up my nose at that; that's functionally what I've done my entire career. How long had your product been running in an environment like that before "Well, we're going multi-cloud" was on the table?

Simen: So, that would be three-and-a-half years, I think, yeah. And then we started building it out in Azure.

Corey: That's a sizable period of time in the context of trying to understand how something works. If I built something two months ago and now I have to pick it up and move it somewhere else, that is generally a much easier task, as far as migrations go, than if the thing has been sitting there for ten years. Because whenever you leave something in an environment like that, it tends to grow roots and take a number of dependencies, both explicit and implicit, on the environment it runs in. Like, in the early days of AWS, you sort of knew that local disks on the instances were ephemeral because in the early days, that was the only option you had. So, every application had to be written in such a way that it did not presume that there was going to be local disk persistence forever.

Docker containers take that a significant step further, where when that container is gone, it's gone. There is no persistent disk there without some extra steps. And in the early days of Docker, that wasn't really a thing either. Did you discover that you'd taken a bunch of implicit dependencies like that on the original cloud that you were building on?

Simen: I'm an old-school developer. I go all the way back to C. And in C, you need to be incredibly, incredibly careful with your dependencies, because basically your whole dependency mapping happens inside your mind. The language doesn't help you at all.

So, I'm always thinking about my kind of project as layers of abstraction. If someone talks to Postgres during a request that is supposed to be handled by the index, then I'm [laugh] pretty angry. Like, that breaks the whole point. The whole point is that this service doesn't need to know about Postgres. So, we have been pretty hardcore on, like, not having any crosstalk, making sure we had a clear idea of which services were allowed to talk to which services. And we were using JWT tokens internally to make sure that authentication and rights management was handled at the ingress point and then just passed along with requests.

So, no one was able to talk to user stores or authentication services. That all happens at the ingress. So, in essence, it was a very pure, kind of, layered platform already. And then, like I said, it was also built on super boring technologies. So, it wasn't really a dramatic thing.

The drama was more that we didn't maybe, like [laugh], like these sorts of cloud services that much. But as you grow older in this industry, you kind of realize that you just hate the technologies differently. And some of the time, you hate a little bit less than others. And that's just how it goes. That's fine. So, that was the pain. We didn't have a lot of pain with our own platform because of these things.

Corey: It's so nice watching people who have been around in the ecosystem long enough to have made all the classic mistakes and realized, oh, that's why common wisdom is what common wisdom is: because, generally speaking, that shit works, and you learn it yourself from first principles when you decide—poorly, in most cases—to go and reimplement things. Like, oh, DNS goes down a lot, so we're just going to rsync around an /etc/hosts file on all of our Linux servers. Yeah, we tried that collectively back in the '70s. It didn't work so well then, either.
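The ingress pattern Simen describes, authenticating once at the edge and passing signed claims along with each request so downstream services never call the auth service themselves, is what JWTs are typically used for. Below is a minimal HS256 sketch using only the standard library; the claim names and shared secret are invented for the example, and a production ingress would also check expiry and use a vetted JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _unb64url(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def sign_token(claims: dict, secret: bytes) -> str:
    """Issued once at the ingress after the caller has been authenticated."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"

def verify_token(token: str, secret: bytes) -> dict:
    """Downstream services trust the signature instead of calling auth."""
    header, payload, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64url(sig)):
        raise PermissionError("invalid signature")
    return json.loads(_unb64url(payload))

token = sign_token({"sub": "user-1", "role": "editor"}, b"shared-secret")
print(verify_token(token, b"shared-secret")["role"])  # editor
```

Because the signature, not a network call, establishes who the caller is, the user store and auth service can stay reachable from the ingress alone, exactly the layering Simen enforces.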
But every once in a while, some startup founder feels the need to speed-run learning those exact same lessons.

What I'm picking up from you is a distinct lack of the traditional startup founder vibe of, "Oh well, the reason that most people don't do things this way is because most people are idiots. I'm smarter than they are. I know best." I'm getting the exact opposite of that from you, where you seem to want to stick to things that are tried and true and, as you said earlier, not exciting.

Simen: Yeah, at least for these kinds of [unintelligible 00:17:15]. Like, so we had a similar platform for our customers that we, kind of, used internally before we created Sanity, and when we decided to basically redo the whole thing, but as a, kind of, self-serve thing, and make a product, I went around the developer team and I just asked them, like, "In your experience, what systems that we use are you not thinking about, or not having any problems with? Like, just make a list of those." And there was a short list that are pretty well known. And some of them have turned out, at the scale we're running now, pretty problematic still.

So, it's not like it's all roses. We picked Elasticsearch for some things, and that can be pretty painful. I'm in the market for a better indexing service, for example. And then sometimes you get—let's talk about some mistakes. Like, sometimes you—I still am totally on the microservices train, and if you make sure you design your workloads clearly and have a clear idea about the abstractions and who gets to talk to whom, it works.

But then if you make a wrong split—so we had a split between a billing service and a, kind of, user and resource management service that now keep talking back and forth all the time. Like, they have to know about what each other is. And as I said, if two services need to know about each other reciprocally, then you're in trouble; then those should be the same service, in my opinion.

Or you can split it some other way. So, this is stuff that we've been struggling with.

But you're right. My last, kind of, rah-rah thing was Rails and Ruby, and then when I weaned off of that, I was like, these technologies work for me. For example, I use Golang a lot. It's a very ugly language. It's very, very useful. You can't argue against the productivity you have in Go, but also the syntax is kind of ugly. And then I realized, like, yeah, I kind of hate everything now, but also, I love the productivity of this.

Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.

Corey: There's something to be said for having been in the industry long enough to watch today's exciting new thing become tomorrow's legacy garbage that you've got to maintain and support. And I think after a few cycles of that, you wind up becoming almost cynical and burned out on a lot of the things that arise that leave everyone breathless. I am generally one of the last adopters of something. I was very slow to get on virtualization. I was a doomsayer on cloud itself for many years.

I turned my nose up at Docker. I mostly skipped the whole Kubernetes thing and decided to be early to serverless, which does not seem to be taking off the way that I wanted it to, so great. It's one of those areas where, having been on the operations side particularly, having to run things and fix them at two in the morning when they inevitably break, when some cron job fires off in the middle of the night because no one will be around then to bother… Yeah, great plan.
It really, at least in my case, makes me cynical and tired to the point where I got out of running things in anger.

You seem to have gone a different direction, where oh, you're still going to build and run things; you're just going to do it in ways that are a lot more well-understood. I think there's a lot of value to that, and I don't think that we give enough credit as an industry to people making those decisions.

Simen: You know, I was big into Drum and Bass back in the '90s. I just loved that thing. And then it went away, and then something came along called dubstep. It's the same thing. And it's just better. It's a better Drum and Bass.

Corey: Oh yeah, the part where it goes doof, doof, doof, doof, doof, doof, doof—

Simen: [laugh]. Exactly.

Corey: Has always been—it's, yeah, we call it different things, but the doof, doof, doof, doof, doof music is always there. Yeah.

Simen: Yeah, yeah, yeah. And I think the thing to recognize is, you could either be cynical and say, like, you kids, you're just making the same music we did, like, 20 years ago, or you can recognize that actually it—

Corey: Kids love that, being told that. It's their favorite thing, telling them, "Oh yeah, back when I was your age…" That's a signifier of a story that they're going to be riveted to and be really interested in hearing.

Simen: [laugh]. Exactly. And I don't think like that, because I think you need to recognize that this thing came back, and it came back better and stronger. And I think Mark Twain probably didn't say that history doesn't repeat itself, but it rhymes. And this is a similar thing.

Right now I have to contend with the fact that server-based rendering is coming back as a completely new thing, which was, like, the thing, always, but also it comes back with new abstractions and new ways of thinking about it, and it comes back better, with better tooling.

And kind of—I think the one thing you can take away from that kind of journey is that you can be stronger by not being excited by shiny new things and not being, kind of, a champion for one specific thing over every other thing. You can just, kind of, see the utility of it. And then when these things come back and pretend to be new, you can see both the, kind of, tradition of it and maybe see it clearer than most people. But also, like you said, don't bore the kids, because you should also see how it is new, how it is solving new things, and how these kids, coming back with the same old thing as a new thing, saw it differently and framed it slightly differently, and we are better for it.

Corey: There's so much in this industry that we take from others. We all stand on the shoulders of giants, and I think that is part of what makes this industry so fantastic in different ways. Some of the original computer scientists who built the things that everyone takes for granted these days are still alive. It's not like the world of physics, for example, where some of the greats made their discoveries hundreds of years ago. No, it's all evolved within living memory.

That means that we can talk to people, we can humanize them, on some level. It's not some lofty great sitting around, and who knows what they would have wanted or how they would have intended this. Now, you have people who helped build the TCP stack stand up and say, "Oh yeah, that was a dumb. We did a dumb. We should not have done it that way." Oh, great.

It's a constant humbling experience watching people evolve things. You mentioned that Go was a really neat language. Back when I wound up failing out of school, before I did that, I took a few classes in C, and it was challenging and obnoxious. About like you would expect.
And at the beginning of this year, I did a deep-dive into learning go over the course of a couple days enough to build a binary that winds up controlling my internet camera in my home office.And I've learned an awful lot and how to do things and got a lot of things wrong, and it was a really fun language. It was harder to do a lot of the ill-considered things that get people into trouble with C.Simen: Hmm.Corey: The idea that people are getting nice things in a way that we didn't have them back when we were building things the first time around is great. If you're listening to this, it is imperative—listen to me—it is imperative. Do not email me about Rust. I don't want to hear it.Simen: [laugh].Corey: But I love the fact that our tools are now stuff that we can use in sensible ways. These days, as you look at using sensible tools—which in this iteration, I will absolutely say that using a hyperscale public cloud provider is the right move; that's the way to go—do you find that, given that you started over hanging out on Google Cloud, and now you're running workloads everywhere, do you have an affinity for one as your primary cloud, or does everything you've built wind up seamlessly flowing back and forth?Simen: So, of course, we have a management interface that our end-users, kind of, use to monitor, and it has to be—at least has to have a home somewhere, even though the data can be replicated everywhere. So, that's in Google Cloud because that's where we started. And also, I think GCP is what our team likes the most. They think it's the most solid platform.Corey: Its developer experience is far and away the best of all the major cloud providers. Bar none. I've been saying that for a while. When I first started using it, I thought I was going to just be making fun of it, but this is actually really good was my initial impression, and that impression has never faded.Simen: Yeah. No, it's like it's terrible, as well, but it's the least terrible platform of them all. 
But I think we would not make any decisions based on that. As long as it's solid, as long as it's stable, and as long as, kind of, price is reasonable and business practices are, kind of, sound, we would work with any provider. And hopefully, we would also work with less… let's call it less famous, more niche providers in the future to provide, let's say, specific organizations that need very, very specific policies or practices, we will be happy to support. I want to go there in the future. And that might require some exotic integrations and ways of building things.

Corey: A multi-cloud story that I used to tell—in the broader sense—used PagerDuty as an example because that is the service that does one thing really well, and that is wake you up when something sends the right kind of alert. And they have multiple cloud providers historically that they use. And the story that came out of it was, yeah, as I did some more digging into what they've done and how they talked about this, it's clear that the thing that wakes you up in the middle of the night absolutely has to work across a whole bunch of different providers because if it's on one, what happens when that's the one that goes down? We learned that when AWS took an outage in 2011 or 2012, and PagerDuty went down as a result of that. So, the thing that wakes you up absolutely lives in a bunch of different places on a bunch of different providers.

But their marketing site doesn't have to. Their user control panel doesn't have to. If there's an outage in their primary cloud that is sufficiently gruesome, okay, they can have a degraded mode where you're not able to update and set up new alerts and add new users into your account because everything's on fire in those moments anyway; that's an acceptable trade-off. But the thing that wakes you up absolutely must work all the time.
So, it's the idea of this workload has got to live in a bunch of places, but not every workload looks like that. As you look across the various services and things you have built that comprise a company, do you find that you're biasing for running most things in a single provider, or do you take that default everywhere approach?

Simen: No, I think that to us, it is—and we're not—that's something we haven't—work we haven't done yet, but architecturally, it will work fine. Because as long as we serve queries, like, we have to—like components, like, people write stuff, they create new content, and that needs to be up as much as possible. But of course, when that goes down, if we still serve queries, their properties are still up, right? Their websites or whatever is still serving content.

So, if we were to make things kind of cross-cloud redundant, it would be the CDN, like, indexes and the Varnish caches and have those [unintelligible 00:27:23]. But it is a challenge in terms of how you do routing. And let's say the routing provider is down. How do you deal with that? Like, there's been a number of DNS outages and I would love to figure out how to get around that. We just, right now, people would have to manually, kind of, change their—we have backup ingress points with the—yeah, that's a challenge.

Corey: One of the areas where people get into trouble with multi-cloud as well, that I've found, has been that people do it with that idea of getting rid of single points of failure, which makes a lot of sense. But in practice, what so many of them have done is inadvertently added multiple points of failure, all of which are single-tracked. So okay, now we're across two cloud providers, so we get exposure to everyone's outages, is how that winds up looking. I've seen companies that have been intentionally avoiding AWS because great, when they go down and the internet breaks, we still want our store to be up.
Great, but they take a dependency on Stripe, who is primarily in AWS, so depending on the outage, people may very well not be able to check out of their store, so what did they gain by going to another provider? Because now when that provider goes down, their site is down then too.

Simen: Mmm. Yeah. It's interesting that anything works at all, actually, like, seeing how intertwined everything is. But I think that is, to me, the amazing part, like you said, someone's marketing site doesn't have to be moved to the cloud, or maybe some of it does. And I find it interesting that, like, in the serverless space, even if we provide a very—like, we have super advanced engineers and we do complex orchestration over cloud services, we don't run anything else, right?

Like, all of our, kind of, web properties are run with highly integrated, basically on Vercel, mostly, right? Like, we don't want to know about—like, we don't even know which cloud that's running on, right? And I think that's how it should be because most things, like you said, most things are best outsourced to another company and have them worry, like, have them worry when things are going down. And that's how I feel about these things that, yes, you cannot be totally protected, but at least you can outsource some of that worry to someone who really knows what—like, if Stripe goes down, most people don't have the resources to worry at the level that Stripe would worry, right? So, at least you have that.

Corey: Exactly. Yeah, if you ignore the underlying cloud provider stuff, they do a lot of things I don't want to have to become an expert in. Effectively, you wind up getting your payment boundary through them; you don't have to worry about PCI yourself at all; you can hand it off to them. That's value.

Simen: Exactly. Yeah.

Corey: Like, the infrastructure stuff is just table stakes compared to a lot of the higher-up-the-stack value that companies in that position enjoy.
Yeah, I'm not sitting here saying don't use Stripe. I want to be very clear on that.

Simen: No, no, no. No, I got you. I got you. I just remember, like, so we talked about maybe you hailing all the way back to Seattle, so hail all the way back to having your own servers in a, kind of, place somewhere that you had to drive to, to replace a security card because when the hard drive was down. Or like, oh, you had to scale up and now you have to buy five servers, you have to set them up and drive them to the—and put them into the slots.

Like, yes, you can fix any problem yourself. Perfect. But also, you had to fix every problem yourself. I'm so happy to be able to pay Google or AWS or Azure to have that worry for me, to have that kind of redundancy on hand. And clearly, we are down less time now that we have less control, [laugh] if that makes sense.

Corey: I really want to thank you for being so generous with your time. If people want to learn more, where's the best place for them to find you?

Simen: So, I'm at @svale—at Svale—on Twitter, and my DMs are open. And also, we have a Slack community for Sanity, so if you want to kind of engage with Sanity, you can join our Slack community, and that will be on there as well. And you find it in the footer on all of the sanity.io webpages.

Corey: And we will put links to that in the show notes.

Simen: Perfect.

Corey: Thank you so much for being so generous with your time. I really appreciate it.

Simen: Thank you. This was fun.

Corey: Simen Svale, CTO and co-founder at Sanity. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment, and make sure you put that insulting comment on all of the different podcast platforms that are out there because you have to run everything on every cloud provider.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Rapid Rise of Vector Databases with Ram Sriharsha


Dec 2, 2022 | 31:41


About Ram

Dr. Ram Sriharsha held engineering, product management, and VP roles at the likes of Yahoo, Databricks, and Splunk. At Yahoo, he was both a principal software engineer and then research scientist; at Databricks, he was the product and engineering lead for the unified analytics platform for genomics; and, in his three years at Splunk, he played multiple roles including Sr. Principal Scientist, VP Engineering, and Distinguished Engineer.

Links Referenced:

Pinecone: https://www.pinecone.io/
XKCD comic: https://www.explainxkcd.com/wiki/index.php/1425:_Tasks

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked into a vendor due to proprietary data collection, querying, and visualization? Modern-day, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open-source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be, at snark.cloud/chronosphere. That's snark.cloud/chronosphere.

Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves!
You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's promoted guest episode is brought to us by our friends at Pinecone, and they have given their VP of Engineering and R&D over to suffer my various slings and arrows, Ram Sriharsha. Ram, thank you for joining me.

Ram: Corey, great to be here. Thanks for having me.

Corey: So, I was immediately intrigued when I wound up seeing your website, pinecone.io, because it says right at the top—at least as of this recording—in bold text, "The Vector Database." And if there's one thing that I love, it is using things that are not designed to be databases as databases, or inappropriately referring to things—be they JSON files or senior engineers—as databases as well. What is a vector database?

Ram: That's a great question. And we do use this term correctly, I think. You can think of customers of Pinecone as having all the data management problems that they have with traditional databases; the main difference is twofold. One is there is a new data type, which is vectors. Vectors, you can think of them as arrays of floats, floating point numbers, and there is a new pattern of use cases, which is search.

And what you're trying to do in vector search is you're looking for the nearest, the closest vectors to a given query. So, these two things fundamentally put a lot of stress on traditional databases. So, it's not like you can take a traditional database and make it into a vector database.
That is why we coined this term vector database, and we are building a new type of vector database. But fundamentally, it has all the database challenges on a new type of data and a new query pattern.

Corey: Can you give me an example of what, I guess, an idealized use case would be of what the data set might look like and what sort of problem a vector database would solve?

Ram: A very great question. So, one interesting thing is there's many, many use cases. I'll just pick the most natural one, which is text search. So, if you're familiar with Elastic or any other traditional text search engines, you have pieces of text, you index them, and the indexing that you do is traditionally an inverted index, and then you search over this text. And what this sort of search engine does is it matches for keywords.

So, if it finds a keyword match between your query and your corpus, it's going to retrieve the relevant documents. And this is what we call text search, right, or keyword search. You can do something similar with technologies like Pinecone, but what you do here is instead of searching over text, you're searching over vectors. Now, where do these vectors come from? They come from taking deep-learning models, running your text through them, and these generate these things called vector embeddings.

And now, you're taking a query as well, running it through deep-learning models, generating these query embeddings, and looking for the closest record embeddings in your corpus that are similar to the query embeddings. This notion of proximity in this space of vectors tells you something about semantic similarity between the query and the text. So suddenly, you're going beyond keyword search into semantic similarity. An example is if you had a whole lot of text data, and maybe you were looking for 'soda,' and you were doing keyword search. Keyword search will only match on variations of soda.
It will never match 'Coca-Cola' because Coca-Cola and soda have nothing to do with each other.

Corey: Or Pepsi, or pop, as they say in the American Midwest.

Ram: Exactly.

Corey: Yeah.

Ram: Exactly. However, semantic search engines can actually match the two because they're matching for intent, right? If they find in this piece of text enough intent to suggest that soda and Coca-Cola or Pepsi or pop are related to each other, they will actually match those and score them higher. And you're very likely to retrieve those sorts of candidates that traditional search engines simply cannot. So, this is a canonical example, what's called semantic search, and it's known to be done better by these vector search engines. There are also other examples in, say, image search. Just if you're looking for near-duplicate images, you can't even do this today without a technology like vector search.

Corey: What is the, I guess, translation or conversion process of an existing dataset into something that a vector database could use? Because you mentioned that an array of floats was the natural vector datatype. I don't think I've ever seen even the most arcane markdown implementation that expected people to wind up writing in arrays of floats. What does that look like? How do you wind up, I guess, internalizing or ingesting existing bodies of text for your example use case?

Ram: Yeah, this is a very great question. This used to be a very hard problem, and what has happened over the last several years in deep-learning literature, as well as in deep learning as a field itself, is that there have been these large, publicly trained models—examples will be OpenAI, examples will be the models that are available in Hugging Face like Cohere—and a large number of these companies have come forward with very well trained models through which you can pass pieces of text and get these vectors.
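Once text has been mapped to vectors this way, the semantic matching Ram describes reduces to proximity in the embedding space. The toy sketch below illustrates that idea: the three-dimensional embeddings, the cosine scoring, and the linear scan are all illustrative assumptions, not Pinecone's internals, and real models emit hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, near 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(query: np.ndarray, corpus: dict) -> str:
    """Brute-force nearest neighbor: return the key whose vector is closest."""
    return max(corpus, key=lambda key: cosine_similarity(query, corpus[key]))

# Pretend embeddings: a real model places 'soda' and 'coca-cola' near each
# other because their meanings are related; 'laptop' lands far away.
corpus = {
    "soda":      np.array([0.90, 0.10, 0.05]),
    "coca-cola": np.array([0.85, 0.15, 0.10]),
    "laptop":    np.array([0.05, 0.10, 0.90]),
}

query = np.array([0.80, 0.20, 0.10])  # embedding of a soft-drink query
print(nearest(query, corpus))  # 'coca-cola': closest by meaning, zero keyword overlap
```

A keyword engine could never return that match; here it falls out of vector arithmetic. The part a vector database adds is answering the same query without the exhaustive scan.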
So, you no longer have to actually train these sorts of models, you don't have to really have the expertise to deeply figure out how to take pieces of text and build these embedding models. What you can do is just take a stock model—if you're familiar with OpenAI, you can just go to OpenAI's homepage and pick a model that works for you, Hugging Face models, and so on. There's a lot of literature to help you do this.

Sophisticated customers can also do something called fine-tuning, which is built on top of these models to fine-tune for their use cases. The technology is out there already; there's a lot of documentation available. Even Pinecone's website has plenty of documentation to do this. Customers of Pinecone do this [unintelligible 00:07:45], which is they take pieces of text, run them through either these pre-trained models or through fine-tuned models, get the series of floats which represent them, vector embeddings, and then send it to us. So, that's the workflow. The workflow is basically a machine-learning pipeline that either takes a pre-trained model and passes these pieces of text or images or what have you through it, or actually has a fine-tuning step in it.

Corey: Is that ingest process something that not only benefits from but also requires the use of a GPU or something similar to that to wind up doing the in-depth, very specific type of expensive math for data ingestion?

Ram: Yes, very often these run on GPUs. Sometimes, depending on budget, you may have compressed models or smaller models that run on CPUs, but most often they do run on GPUs. Most often, we actually find people make just API calls to services that do this for them. So, very often, people are actually not deploying these GPU models themselves; they are maybe making a call to Hugging Face's service, or to OpenAI's service, and so on. And by the way, these companies also democratized this quite a bit. It was much, much harder to do this before they came around.

Corey: Oh, yeah.
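In code, the workflow Ram lays out (embed each document, store the floats, embed the query with the same model, rank by similarity) looks roughly like the sketch below. The `embed` function is a deterministic stand-in for a real hosted model call such as OpenAI's or Hugging Face's, and the dict stands in for the vector index; both are assumptions for illustration only, and a real pipeline would upsert the vectors into a service like Pinecone instead.

```python
import hashlib
import numpy as np

DIM = 8  # stand-in size; real embedding models emit 768, 1536, or more dimensions

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a hosted embedding model. A real model maps
    similar meanings to nearby vectors; this hash-based stub only shows the
    pipeline's shape: text in, unit-length float vector out."""
    raw = hashlib.sha256(text.encode("utf-8")).digest()[: DIM * 4]
    vec = np.frombuffer(raw, dtype=np.uint32).astype(np.float64)
    return vec / np.linalg.norm(vec)

# Ingest: embed every document once and store id -> vector.
docs = {"doc-1": "soda and soft drinks", "doc-2": "kubernetes cluster upgrades"}
index = {doc_id: embed(text) for doc_id, text in docs.items()}

# Query: embed with the SAME model, then rank stored vectors by similarity.
query_vec = embed("soda and soft drinks")
scores = {doc_id: float(np.dot(query_vec, vec)) for doc_id, vec in index.items()}
best = max(scores, key=scores.get)
print(best)  # 'doc-1': identical text embeds to an identical vector
```

The point Ram makes about GPUs and API calls lives entirely inside `embed`: swap the stub for a model-service call and nothing downstream changes.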
I mean, I'm reminded of the old XKCD comic from years ago, which was, "Okay, I want to give you a picture. And I want you to tell me it was taken within the boundaries of a national park." Like, "Sure. Easy enough. Geolocation information is attached. It'll take me two hours." "Cool. And I also want you to tell me if it's a picture of a bird." "Okay, that'll take five years and a research team."

And sure enough, now we can basically do that. The future is now, and it's kind of wild to see that unfolding in a human-perceivable timespan on these things. But I guess my question now is, so that is what a vector database does? What does Pinecone specifically do? It turns out that as much as I wish it were otherwise, not a lot of companies are founded on, "Well, we have this really neat technology, so we're just going to be here, well, in a foundational sense to wind up ensuring the uptake of that technology." No, no, there's usually a monetization model in there somewhere. Where does Pinecone start, where does it stop, and how does it differentiate itself from typical vector databases? If such a thing could be said to exist yet.

Ram: Such a thing doesn't exist yet. We were the first vector database, so in a sense, building this infrastructure, scaling it, and making it easy for people to operate it in a SaaS fashion is our primary core product offering. On top of that, we very recently started also enabling people who actually have raw text to not just be able to get value from these vector search engines and so on, but also be able to take advantage of traditional what we call keyword search or sparse retrieval and do a combined search better, in Pinecone. So, there's value-add on top of this that we do, but I would say the core of it is building a SaaS managed platform that allows people to actually easily store their data, scale it, and query it in a way that's very hands-off and doesn't require a lot of tuning or operational burden on their side.
This is, like, our core value proposition.

Corey: Got it. There's something to be said for making something accessible when previously it had only really been available to people who completed the Hello World tutorial—which generally resembled a doctorate at Berkeley or Waterloo or somewhere else—and turning it into something that's fundamentally, click the button. Where on that, I guess, spectrum of evolution do you find that Pinecone is today?

Ram: Yeah. So, you know, prior to Pinecone, we didn't really have this notion of a vector database. For several years, we've had libraries that are really good that you can pre-train on your embeddings, generate this thing called an index, and then you can search over that index. There is still a lot of work to be done even to deploy that and scale it and operate it in production and so on. Even that was not being, kind of, offered as a managed service before.

What Pinecone does which is novel is you no longer have to have this pre-training be done by somebody, you no longer have to worry about when to retrain your indexes, what to do when you have new data, what to do when there are deletions, updates, and the usual data management operations. You can just think of this as, like, a database that you just throw your data in. It does all the right things for you; you just worry about querying. This has never existed before, right? This is—it's not even like we are trying to make the operational part of something easier. It is that we are offering something that hasn't existed before, and at the same time making it operationally simple.

So, we're solving two problems, which is we're building a better database that hasn't existed before. So, if you really had these sorts of data management problems and you wanted to build an index that was fresh, that you didn't have to super manually tune for your own use cases, that simply couldn't have been done before.
But at the same time, we are doing all of this in a cloud-native fashion; it's easy for you to just operate and not worry about.

Corey: You've said that this hasn't really been done before, but this does sound like it is more than passingly familiar specifically to the idea of nearest neighbor search, which has been around since the '70s in a bunch of different ways. So, how is it different? And let me, of course, ask my follow-up to that right now: why is this even an interesting problem to start exploring?

Ram: This is a great question. First of all, nearest neighbor search is one of the oldest forms of machine learning. It's been known for decades. There's a lot of literature out there, there are a lot of great libraries, as I mentioned in passing before. All of these problems have primarily focused on static corpuses. So basically, you have a set of some amount of data, you want to create an index out of it, and you want to query it.

A lot of literature has focused on this problem. Even there, once you go from small numbers of dimensions to large numbers of dimensions, things become computationally far more challenging. So, traditional nearest neighbor search actually doesn't scale very well. What do I mean by large numbers of dimensions? Today, deep-learning models that produce image representations typically operate in 2048 dimensions of photos [unintelligible 00:13:38] dimensions. Some of the OpenAI models are even 10,000-dimensional and above. So, these are very, very large dimensions.

Most of the literature prior to maybe even less than ten years back has focused on less than ten dimensions. So, it's like a scale apart in dealing with small-dimensional data versus large-dimensional data. But even as of a couple of years back, there hasn't been enough, if any, focus on what happens when your data rapidly evolves. For example, what happens when people add new data? What happens if people delete some data? What happens if your vectors get updated?
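Those add, delete, and update scenarios map onto a small set of index operations. The brute-force sketch below makes the freshness requirement concrete: every query must reflect the index's current contents. The class name and API shape are invented for illustration, not Pinecone's client; the hard part Ram is describing is preserving this behavior at scale without rescanning or retraining.

```python
import numpy as np

class ToyVectorIndex:
    """Illustrative brute-force index with the data-management operations a
    vector database must support: upsert, delete, and always-fresh queries."""

    def __init__(self):
        self.vectors: dict[str, np.ndarray] = {}

    def upsert(self, item_id: str, vec: np.ndarray) -> None:
        self.vectors[item_id] = vec  # insert new, or overwrite a drifting vector

    def delete(self, item_id: str) -> None:
        self.vectors.pop(item_id, None)

    def query(self, vec: np.ndarray) -> str:
        # Exhaustive scan; a real index must answer without touching every record.
        return min(self.vectors,
                   key=lambda k: float(np.linalg.norm(self.vectors[k] - vec)))

index = ToyVectorIndex()
index.upsert("user-a", np.array([1.0, 0.0]))
index.upsert("user-b", np.array([0.0, 1.0]))
q = np.array([0.9, 0.1])
print(index.query(q))  # 'user-a'

index.delete("user-a")
print(index.query(q))  # 'user-b': the deletion is visible immediately

index.upsert("user-c", np.array([0.95, 0.05]))
print(index.query(q))  # 'user-c': so is the newly added record
```

In a relational database this freshness is table stakes; the novelty is getting the same guarantee from an approximate high-dimensional index.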
These aren't just theoretical problems; they happen all the time. Customers of ours face this all the time.

In fact, the classic example is in recommendation systems where user preferences change all the time, right, and you want to adapt to that, which means your user vectors change constantly. When even these sorts of things change constantly, you want your index to reflect it because you want your queries to catch on to the most recent data. [unintelligible 00:14:33] have to reflect the recency of your data. This is a solved problem for traditional databases. Relational databases are great at solving this problem. A lot of work has been done for decades to solve this problem really well.

This is a fundamentally hard problem for vector databases, and that's one of the core focus areas [unintelligible 00:14:48] painful. Another problem that is hard for these sorts of databases is simple things like filtering. For example, you have a corpus of, say, product images, and you want to only look at images that maybe are for the Fall shopping line, right? Seems like a very natural query. Again, databases have known and solved this problem for many, many years.

The moment you do nearest neighbor search with these sorts of constraints, it's a hard problem. So, it's just the fact that nearest neighbor search, and lots of research in this area, has simply not focused on what happens when those sorts of techniques are combined with data management challenges, filtering, and all the traditional challenges of a database. So, when you start doing that, you enter a very novel area to begin with.

Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open-source database. If you're tired of managing open-source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered.
Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit snark.cloud/redis. That's snark dot cloud slash R-E-D-I-S.

Corey: So, where's this space going, I guess, is sort of the dangerous but inevitable question I have to ask. Because whenever you talk to someone who is involved in a very early stage of what is potentially a transformative idea, it's almost indistinguishable from someone who is, whatever the polite term for being wrapped around their own axle is, in a technological sense. It's almost a form of reverse Schneier's Law of anyone can create an encryption algorithm that they themselves cannot break. So, the possibility that this may come back to bite us in the future if it turns out that this is not potentially the revelation that you see it as, where do you see the future of this going?

Ram: Really great question. The way I think about it is, and the reason why I keep going back to databases and these sorts of ideas is, we have a really great way to deal with structured data and structured queries, right? This is the evolution of the last maybe 40, 50 years: to come up with relational databases, come up with SQL engines, come up with scalable ways of running structured queries on large amounts of data. What I feel like this sort of technology does is it takes it to the next level, which is you can actually ask unstructured questions on unstructured data, right? So, even the couple of examples we just talked about, doing near-duplicate detection of images, that's a very unstructured question. What does it even mean to say that two images are nearly duplicate of each other? I couldn't even phrase it as, kind of, a concrete thing.
I certainly cannot write a SQL statement for it, but I cannot even phrase it properly. With these sorts of technologies, with the vector embeddings, with deep learning and so on, you can actually mathematically phrase it, right? The mathematical phrasing is very simple once you have the right representation that understands your image as a vector. Two images are nearly duplicate if they are close enough in the space of vectors. Suddenly you've taken a problem that was even hard to express, let alone compute, and made it precise to express, precise to compute. This is going to happen not just for images, not just for semantic search; it's going to happen for all sorts of unstructured data, whether it's time series, whether it's anomaly detection, whether it's security analytics, and so on.

I actually think that fundamentally, a lot of fields are going to get disrupted by this sort of way of thinking about things. We are just scratching the surface here with semantic search, in my opinion.

Corey: What is, I guess, your barometer for success? I mean, if I could take a very cynical point of view on this, it's, "Oh, well, whenever there's a managed vector database offering from AWS." They'll probably call it Amazon Basics Vector or something like that. Well, that is a—it used to be a snarky observation that, "Oh, we're not competing, we're just validating their market." Lately, with some of their competitive database offerings, there's a lot more truth to that than I suspect AWS would like.

Their offerings are nowhere near as robust as what they pretend to be competing against. How far away do you think we are from the larger cloud providers starting to say, "Ah, we got the sense there was money in here, so we're launching an entire service around this?"

Ram: Yeah. I mean, this is a—first of all, this is a great question. There's always something that's constantly, things that any innovator or disrupter has to be thinking about, especially these days.
I would say that having a multi-year head start in the use cases, in thinking about how this system should even look, what sort of use cases should it [unintelligible 00:19:34], what the operating points for the [unintelligible 00:19:37] database even look like, and how to build something that's cloud-native and scalable, is very hard to replicate. Meaning if you look at what we have already done and kind of tried to base the architecture off of that, you're probably already a couple of years behind us in terms of just where we are at, right, not just in the architecture, but also in the use cases in where this is evolving forward.

That said, I think it is, for all of these companies—and I would put—for example, Snowflake is a great example of this, which is Snowflake needn't have existed if Redshift had done a phenomenal job of being cloud-native, right, and kind of done that before Snowflake did it. In hindsight, it seems like it's obvious, but when Snowflake did this, it wasn't obvious that that's where everything was headed. And Snowflake built something that's very technologically innovative, in the sense that it's even now hard to replicate. Plus, it takes a long time to replicate something like that. I think that's where we are at.

If Pinecone does its job really well and if we simply execute efficiently, it's very hard to replicate that. So, I'm not super worried about cloud providers, to be honest, in this space; I'm more worried about our execution.

Corey: If it helps anything, I'm not very deep into your specific area of the world, obviously, but I am optimistic when I hear people say things like that.
Whenever I find folks who are relatively early along in their technological journey being very concerned about oh, the large cloud provider is going to come crashing in, it feels on some level like their perspective is that they have one weird trick, and they were able to crack that, but they have no defensive moat because once someone else figures out the trick, well, okay, now we're done. The idea of sustained and lasting innovation in a space, I think, is the more defensible position to take, with the counterargument, of course, that that's a lot harder to find.

Ram: Absolutely. And I think for technologies like this, that's the only solution, which is, if you really want to avoid being disrupted by cloud providers, I think that's the way to go.

Corey: I want to talk a little bit about your own background. Before you wound up as the VP of R&D over at Pinecone, you were in a bunch of similar… I guess, similar-styled roles—if we'll call it that—at Yahoo, Databricks, and Splunk. I'm curious as to what your experience in those companies wound up impressing on you that made you say, "Ah, that's great and all, but you know what's next? That's right, vector databases." And off you went to Pinecone. What did you see?

Ram: So, first of all, in some way or another, I have been involved in machine learning and systems and the intersection of these two for maybe the last decade-and-a-half. So, it's always been something, like, in between the two, and that's been personally exciting to me. So, I'm kind of very excited by trying to think about new types of databases, new types of data platforms that really leverage machine learning and data. This has been personally exciting to me. I obviously learned very different things from different companies.
I obviously learned very different things from different companies. I would say that Yahoo was just the learning in cloud to begin with because prior to joining Yahoo, I wasn't familiar with Silicon Valley cloud companies at that scale, and Yahoo is a big company and there's a lot to learn from there. It was also my first introduction to Hadoop, Spark, and even machine learning, where I really got into machine learning at scale, in online advertising and areas like that, which was a massive scale. And I got into that at Yahoo, and it was personally exciting to me because there are very few opportunities where you can work on machine learning at that scale, right?

Databricks was very exciting to me because it was an earlier-stage company than I had been at before. Extremely well run, and I learned a lot from Databricks: just the team, the culture, the focus on innovation, and the focus on product thinking. I joined Databricks as a product manager. I hadn't worn the product manager hat before that, so it was very much a learning experience for me, and I think I learned from some of the best in that area. And even at Pinecone, I carry that forward, which is to think about how my learnings at Databricks inform how we should be thinking about products at Pinecone, and so on. So, I think I learned—if I had to pick one company I learned a lot from, I would say it's Databricks. The most [unintelligible 00:23:50].

Corey: I would also like to point out, normally when people say, "Oh, the one company I've learned the most from," and they pick one of them out of their history, it's invariably the most recent one, but you left there in 2018—

Ram: Yeah.

Corey: —then went to go spend the next three years over at Splunk, where you were a Senior Principal Scientist, a Senior Director and Head of Machine Learning, and then you decided, okay, that's enough hard work. You're going to do something easier and be the VP of Engineering, which is just wild at a company of that scale.

Ram: Yeah.
At Splunk, I learned a lot about management. I think managing large teams, managing multiple different teams, while working on very different areas is something I learned at Splunk. You know, I was at this point in my career when I was right around trying to start my own company. Basically, I was at a point where I'd taken enough learnings and I really wanted to do something myself.

That's when Edo—you know, the CEO of Pinecone—and I started talking. And we had worked together for many years; we started working together at Yahoo. We kept in touch with each other. And we started talking about the sort of problems that I was excited about working on, and then I came to realize what he was working on and what Pinecone was doing. And we thought it was a very good fit for the two of us to work together.

So, that is kind of how it happened. It sort of happened by chance, as many things do in Silicon Valley, where a lot of things just happen by network and chance. That's what happened in my case. I was just thinking of starting my own company at the time when just a chance encounter with Edo led me to Pinecone.

Corey: It feels, from my admittedly uninformed perspective, that a lot of what you're doing right now in the vector database area follows the trajectory of machine learning, in that for a long time, the only people really excited about it were either sci-fi authors or folks who had trouble explaining it to someone without a degree in higher math. And then it turned into—a couple of big stories from the mid-2010s stick out at me from when people were trying to sell this to me in a variety of different ways. One of them was, "Oh, yeah, if you're a giant credit card processing company and trying to detect fraud with this kind of transaction volume—" it's, yeah, there are maybe three companies in the world that fall into that exact category. The other was WeWork, where they did a lot of computer vision work.
And they used this to determine that at certain times of day there was congestion in certain parts of the buildings and that this was best addressed by hiring a second barista. Which distilled down to, "Wait a minute, you're telling me that you spent how much money on machine learning and advanced analyses and data scientists and the rest to figure out that people like to drink coffee in the morning?" Like, that is a little on the ridiculous side.

Now, I think that it is past the time for skepticism around machine learning when you can go to a website and type in a description of something and it paints a picture of the thing you just described. Or you can show it a picture and it describes what is in that picture fairly accurately. At this point, the only people who are skeptics, from my position on this, seem to be holding out for some sort of either next-generation miracle or are just being bloody-minded. Do you think that there's a tipping point for vector search where it's going to become blindingly obvious to, if not the mass market, at least the more run-of-the-mill, more prosaic level of engineer who hasn't specialized in this?

Ram: Yeah. It's already, frankly, started happening. So, two years back, I wouldn't have suspected this fast of an adoption for this new of a technology from this varied a number of use cases. I just wouldn't have suspected it because, you know, I still thought it's going to take some time for this field to mature and, kind of, everybody to really start taking advantage of this. This has happened much faster than even I assumed.

So, to some extent, it's already happening. A lot of it is because the barrier to entry is quite low right now, right? So, it's very easy and cost-effective for people to create these embeddings. There is a lot of documentation out there, things are getting easier and easier, day by day. Some of it is by Pinecone itself, by a lot of work we do.
Some of it is by, like, companies that I mentioned before who are building better and better models, making it easier and easier for people to take these machine-learning models and use them without having to even fine-tune anything.

And as technologies like Pinecone really mature and dramatically become cost-effective, the barrier to entry is very low. So, what we tend to see people do, it's not so much about confidence in this new technology; it is connecting something simple that I need this sort of value out of, and finding the least critical path or the simplest way to get going on this sort of technology. And as long as we can make that barrier to entry very small and make this cost-effective and easy for people to explore, this is going to start exploding. And that's what we are seeing. And a lot of Pinecone's focus has been on ease of use and simplicity in connecting the zero-to-one journey for precisely this reason. Because not only do we strongly believe in the value of this technology, it's becoming more and more obvious to the broader community as well. The remaining work to be done is just the ease of use and making things cost-effective. And cost-effectiveness is also what we focus on a lot. Like, this technology can be even more cost-effective than it is today.

Corey: I think that it is one of those never-mistaken ideas to wind up making something more accessible to folks than keeping it in a relatively rarefied environment. We take a look throughout the history of computing in general and cloud in particular, where formerly very hard things have largely been reduced down to click the button. Yes, yes, and then get yelled at because you haven't done infrastructure-as-code, but click the button is still possible. I feel like this is on that trendline based upon what you're saying.

Ram: Absolutely.
And the more we can do here, both Pinecone and the broader community, I think the better, the faster the adoption of this sort of technology is going to be.

Corey: I really want to thank you for spending so much time talking me through what it is you folks are working on. If people want to learn more, where's the best place for them to go to find you?

Ram: Pinecone.io. Our website has a ton of information about Pinecone, as well as a lot of standard documentation. We have a free tier as well where you can play around with small data sets, really get a feel for vector search. It's completely free. And you can reach me at Ram at Pinecone. I'm always happy to answer any questions. Once again, thanks so much for having me.

Corey: Of course. I will put links to all of that in the show notes. This promoted guest episode is brought to us by our friends at Pinecone. Ram Sriharsha is their VP of Engineering and R&D. And I'm Cloud Economist Corey Quinn. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that I will never read because the search on your podcast platform is broken because it's not using a vector database.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Complexities of AWS Cost Optimization with Rick Ochs


Dec 1, 2022 • 46:56


About Rick

Rick is the Product Leader of the AWS Optimization team. He previously led the cloud optimization product organization at Turbonomic, and previously was the Microsoft Azure Resource Optimization program owner.

Links Referenced:

AWS: https://console.aws.amazon.com
LinkedIn: https://www.linkedin.com/in/rick-ochs-06469833/
Twitter: https://twitter.com/rickyo1138

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying, and visualization? Modern-day containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be, at snark.cloud/chronosphere. That's snark.cloud/chronosphere.

Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups.
If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. For those of you who've been listening to this show for a while, a theme has probably emerged, and that is that one of the key values of this show is to give the guest a chance to tell their story. It doesn't beat the guests up about how they approach things, it doesn't call them out for being completely wrong on things because honestly, I'm pretty good at choosing guests, and I don't bring people on that are, you know, walking trash fires. And that is certainly not a concern for this episode.

But this might devolve into a screaming loud argument, despite my best effort. Today, I'm joined by Rick Ochs, Principal Product Manager at AWS. Rick, thank you for coming back on the show. The last time we spoke, you were not here; you were at, I believe it was, Turbonomic.

Rick: Yeah, that's right. Thanks for having me on the show, Corey. I'm really excited to talk to you about optimization and my current role and what we're doing.

Corey: Well, let's start at the beginning. Principal product manager. It sounds like one of those corporate titles that can mean a different thing in every company or every team that you're talking to. What is your area of responsibility? Where do you start and where do you stop?

Rick: Awesome. So, I am the product manager lead for all of the AWS Optimization team. So, I lead the product team.
That includes several other product managers that focus in on Compute Optimizer, Cost Explorer, right-sizing recommendations, as well as Reservation and Savings Plan purchase recommendations.

Corey: In other words, you are the person who effectively oversees all of the AWS cost optimization tooling and approaches to same?

Rick: Yeah.

Corey: Give or take. I mean, you could argue that oh, every team winds up focusing on helping customers save money. I could fight that argument just as effectively. But you effectively start and stop with respect to helping customers save money or understand where the money is going on their AWS bill.

Rick: I think that's a fair statement. And I also agree with your comment that I think a lot of service teams do think through those use cases and provide capabilities, you know? There's, like, S3 Storage Lens. You know, there's all sorts of other products that do offer optimization capabilities as well, but as far as, like, the unified purpose of my team, it is unilaterally focused on how do we help customers safely reduce their spend and not hurt their business at the same time.

Corey: Safely being the key word. For those who are unaware of my day job, I am a partial owner of The Duckbill Group, a consultancy where we fix exactly one problem: the horrifying AWS bill. This is all that I've been doing for the last six years, so I have some opinions on AWS bill reduction as well. So, this is going to be a fun episode for the two of us to wind up, mmm, more or less smacking each other around, but politely, because we are both professionals. So, let's start with a very high level. How does AWS think about AWS bills from a customer perspective? You talk about optimizing it, but what does that mean to you?

Rick: Yeah. So, I mean, there's a lot of ways to think about it, especially depending on who I'm talking to, you know, where they sit in our organization. I would say I think about optimization in four major themes.
The first is how do you scale correctly, whether that's right-sizing or architecting things to scale in and out? The second thing I would say is, how do you do pricing and discounting, whether that's Reservation management, Savings Plan management, coverage, how do you handle the expenditures of prepayments and things like that?

Then I would say suspension. What that means is turn the lights off when you leave the room. We have a lot of customers that do this and I think there's a lot of opportunity for more. Turning EC2 instances off when they're not needed if they're non-production workloads, or other, sort of, stateful services that charge by the hour, I think there's a lot of opportunity there.

And then the last of the four methods is clean up. And I think it's maybe one of the lowest-hanging fruit, but essentially, are you done using this thing? Delete it. And there's a whole opportunity of cleaning up, you know, IP addresses, unattached EBS volumes, sort of, these resources that hang around in AWS accounts that get lost and forgotten as well. So, those are the four kind of major thematic strategies for how to optimize a cloud environment that we think about and spend a lot of time working on.

Corey: I feel like there's—or at least the way that I approach these things—that there are a number of different levels you can look at AWS billing constructs on. The way that I tend to structure most of my engagements when I'm working with clients is we come in and, step one: cool. Why do you care about the AWS bill? It's a weird question to ask because most of the engineering folks look at me like I've just grown a second head. Like, "So, why do you care about your AWS bill?" Like, "What? Why do you? You run a company doing this?"

It's no, no, no, it's not that I'm being rhetorical and I don't—I'm trying to be clever somehow and pretend that I don't understand all the nuances around this, but why does your business care about lowering the AWS bill?
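Of Rick's four pillars, "clean up" is the most scriptable. Here's a rough sketch in Python: the helper functions are illustrative names of my own, not an AWS API, and the sample data only mirrors the shape of EC2 `DescribeVolumes`/`DescribeAddresses` responses (the live boto3 calls are left in comments since they require credentials):

```python
# Sketch of the "clean up" pillar: flag unattached EBS volumes and
# unassociated Elastic IPs. In a real script, the inputs would come from:
#   boto3.client("ec2").describe_volumes()["Volumes"]
#   boto3.client("ec2").describe_addresses()["Addresses"]

def unattached_volume_ids(volumes):
    """EBS volumes in the 'available' state are attached to nothing."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]

def idle_eip_allocations(addresses):
    """Elastic IPs with no AssociationId are allocated but unused."""
    return [a["AllocationId"] for a in addresses if "AssociationId" not in a]

# Example with response-shaped sample data:
volumes = [
    {"VolumeId": "vol-0aaa", "State": "in-use"},
    {"VolumeId": "vol-0bbb", "State": "available"},  # forgotten volume
]
addresses = [
    {"AllocationId": "eipalloc-111", "AssociationId": "eipassoc-999"},
    {"AllocationId": "eipalloc-222"},  # idle EIP, still billed
]
print(unattached_volume_ids(volumes))   # ['vol-0bbb']
print(idle_eip_allocations(addresses))  # ['eipalloc-222']
```

A cron job or Lambda that reports (not auto-deletes) these candidates is a common first step, since cleanup is the one pillar where a wrong guess destroys data.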
Because very often, the answer is they kind of don't. What they care about from a business perspective is being able to accurately attribute costs for the service or good that they provide, being able to predict what that spend is going to be, and also, yes, a sense of being good stewards of the money that has been entrusted to them via investors, public markets, or the budget allocation process of their companies, and making sure that they're not doing foolish things with it. And that makes an awful lot of sense. It is rare at the corporate level that the stated number one concern is make the bills lower.

Because at that point, well, easy enough. Let's just turn off everything you're running in production. You'll save a lot of money in your AWS bill. You won't be in business anymore, but you'll be saving a lot of money on the AWS bill. The answer is always deceptively nuanced and complicated.

At least, that's how I see it. Let's also be clear that I talk with a relatively narrow subset of the AWS customer totality. The things that I do are very much intentionally things that do not scale. Definitionally, everything that you do has to scale. How do you wind up approaching this in ways that will work for customers spending billions versus independent learners who are paying for this out of their own personal pocket?

Rick: It's not easy [laugh], let me just preface that. The team we have is incredible and we spent so much time thinking about scale and the different personas that engage with our products and what their experience is when they interact with a bill or the AWS platform at large. There's also a couple of different personas here, right?
We have a persona that focuses in on that cloud cost, the cloud bill, the finance—whether that's, if an organization has created a FinOps organization, if they have a Cloud Center of Excellence—versus an engineering team that maybe has started to go towards decentralized IT and has some accountability for the spend that they attribute to their AWS bill. And so, these different personas interact with us in really different ways, whether that's Cost Explorer, downloading the CUR, and taking a look at the bill.

And one thing that I always kind of imagine is somebody putting a headlamp on and going into the caves in the depths of their AWS bill and kind of, like, spelunking through their bill sometimes, right? And so, you have these FinOps folks and billing people that are deeply interested in making sure that the spend they do have meets their business goals, meaning this is providing high value to our company, it's providing high value to our customers, and we're spending on the right things, we're spending the right amount on the right things. Versus the engineering organization that's like, "Hey, how do we configure these resources? What types of instances should we be focused on using? What services should we be building on top of that maybe are more flexible for our business needs?"

And so, there's really, like, two major personas that I spend a lot of time—our organization spends a lot of time wrapping our heads around. Because they're really different, very different approaches to how we think about cost. Because you're right, if you just wanted to lower your AWS bill, it's really easy. Just size everything to a t2.nano and you're done and move on [laugh], right? But you're [crosstalk 00:08:53]—

Corey: Aw, t3 or t4.nano, depending upon whether regional availability is going to save you less. I'm still better at this. Let's not kid ourselves. I kid. Mostly.

Rick: For sure. So t4.nano, absolutely.

Corey: T4g.
Remember, now the way forward is everything has an explicit letter designator to define which processor company made the CPU that underpins the instance itself, because that's a level of abstraction we certainly wouldn't want the cloud provider to take away from us, any.

Rick: Absolutely. And actually, the performance differences of those different processor models can be pretty incredible [laugh]. So, there's huge decisions behind all of that as well.

Corey: Oh, yeah. There's so many factors that factor in all these things. It's gotten to a point—you see this usually with lawyers and very senior engineers—but the answer to almost everything is, "It depends." There are always going to be edge cases. Easy example: if you check a box and enable an S3 gateway endpoint inside of a private subnet, suddenly, you're not passing traffic through a 4.5-cent-per-gigabyte managed NAT Gateway; it's being sent over that endpoint for no additional cost whatsoever.

Check the box, save a bunch of money. But there are scenarios where you don't want to do it, so always double-checking and talking to customers about this is critically important. Because the first time you make a recommendation that does not work for their constraints, you lose trust. And make a few of those and it looks like you're more or less just making naive recommendations that don't add any value, and they learn to ignore you. So, down the road, when you make a really high-value, great recommendation for them, they stop paying attention.
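The gateway-endpoint savings Corey describes are easy to put numbers on. A back-of-the-envelope sketch, using the $0.045/GB NAT Gateway data-processing rate quoted above (actual rates vary by region, and the separate per-hour NAT Gateway charge is ignored here):

```python
# Rough savings from routing S3 traffic over a gateway VPC endpoint
# instead of through a managed NAT Gateway. Gateway endpoints for S3
# carry no data-processing charge; NAT Gateways bill per GB processed.

NAT_PROCESSING_RATE = 0.045  # USD per GB, the "4.5 cent" figure above

def nat_processing_cost(gb_per_month, rate=NAT_PROCESSING_RATE):
    """Monthly NAT Gateway data-processing charge for this traffic."""
    return gb_per_month * rate

def gateway_endpoint_savings(gb_per_month):
    """Endpoint processing is free, so the savings equal the NAT charge."""
    return nat_processing_cost(gb_per_month)

# 10 TB/month of S3 traffic pushed through a NAT Gateway:
print(round(gateway_endpoint_savings(10_000), 2))  # 450.0 USD/month
```

Which is also why the "It depends" caveat matters: the endpoint only covers S3 (and DynamoDB) traffic, so workloads whose NAT traffic is mostly something else see little of this.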
Essentially, if a customer takes a recommendation and it breaks an application, they're probably never going to take another right-sizing recommendation again [laugh]. And so, this bar of trust must be exceptionally high. That's also why, out of the box, the Compute Optimizer recommendations can be a little bit mild, a little tame, because the first order of business is do no harm; focus on the performance requirements of the application first because we have to make sure that the reason you build these workloads in AWS is served.

Now ideally, we do that without overspending and without overprovisioning the capacity of these workloads, right? And so, for example, like, if we make these right-sizing recommendations from Compute Optimizer, we're taking a look at the utilization of CPU, memory, disk, network, throughput, IOPS, and we're vending these recommendations to customers. And when you take that recommendation, you must still have great application performance for your business to be served, right? It's such a crucial part of how we optimize and run long-term. Because optimization is not a one-time Band-Aid; it's an ongoing behavior, so it's really critical for that accuracy to be exceptionally high so we can build business process on top of it as well.

Corey: Let me ask you this. How do you contextualize what the right approach to optimization is? What is your entire—there are certain tools that you have… by "you," I mean, of course, as an organization—have repeatedly gone back to and different approaches that don't seem to deviate all that much from year to year, and customer to customer. How do you think about the general things that apply universally?

Rick: So, we know that EC2 is a very popular service for us. We know that sizing EC2 is difficult. We think about that optimization pillar of scaling. It's an obvious area for us to help customers.
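Consuming those vended findings programmatically might look like the sketch below. The field names mirror Compute Optimizer's `GetEC2InstanceRecommendations` response (where `finding` is one of `Optimized`, `Overprovisioned`, `Underprovisioned`), but the filtering helper is my own illustration, not part of any AWS SDK:

```python
# Sketch: pull the over-provisioned instances out of Compute Optimizer
# findings so a right-sizing campaign can start with the safest wins.
# In a live script, `recs` would come from:
#   boto3.client("compute-optimizer").get_ec2_instance_recommendations()
#       ["instanceRecommendations"]

def overprovisioned_instances(recs):
    """Return (ARN, current type, top recommended type) for instances
    flagged as Overprovisioned."""
    out = []
    for r in recs:
        if r.get("finding") == "Overprovisioned":
            options = r.get("recommendationOptions", [])
            top = options[0]["instanceType"] if options else None
            out.append((r["instanceArn"], r["currentInstanceType"], top))
    return out

sample = [
    {
        "instanceArn": "arn:aws:ec2:us-east-1:123:instance/i-aaa",
        "currentInstanceType": "m5.2xlarge",
        "finding": "Overprovisioned",
        "recommendationOptions": [{"instanceType": "m5.large"}],
    },
    {
        "instanceArn": "arn:aws:ec2:us-east-1:123:instance/i-bbb",
        "currentInstanceType": "c5.large",
        "finding": "Optimized",
        "recommendationOptions": [],
    },
]
print(overprovisioned_instances(sample))
```

Starting with the `Overprovisioned` set, rather than everything, is one way to honor the "do no harm" bar Rick describes: shrinking an idle instance is far less likely to break anything than growing a busy one is to surprise the budget.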
We run into this sort of industry-wide experience where, whenever somebody picks the size of a resource, they're going to pick one generally larger than they need. It's almost like asking a new employee at your company, "Hey, pick your laptop. We have a 16 gig model or a 32 gig model. Which one do you want?" That person [laugh] making the decision on capacity, hardware capacity, they're always going to pick the 32 gig model laptop, right? And so, we have this sort of human nature in IT of, we don't want to get called at two in the morning for performance issues, we don't want our apps to fall over, we want them to run really well, so we're going to size things very conservatively and we're going to oversize things.

So, we can help customers by providing those recommendations to say, you can size things up in a different way using math and analytics based on the utilization patterns, and we can provide and pick different instance types. There's hundreds and hundreds of instance types in all of these regions across the globe. How do you know which is the right one for every single resource you have? It's a very, very hard problem to solve and it's not something that is lucrative to solve one by one if you have 100 EC2 instances. Trying to pick the correct size for each and every one can take hours and hours of IT engineering resources to look at utilization graphs, look at all of these types available, look at what is the performance difference between processor models and providers of those processors, is there application compatibility constraints that I have to consider? The complexity is astronomical.

And then not only that, as soon as you make that sizing decision, one week later, it's out of date and you need a different size. So, [laugh] you didn't really solve the problem.
So, we have to programmatically use data science and math to say, "Based on these utilization values, these are the sizes that would make sense for your business, that would have the lowest cost and the highest performance together at the same time." And it's super important that we provide this capability from a technology standpoint because it would cost so much money to try to solve that problem that the savings you would achieve might not be meaningful. Then at the same time… you know, that's really from an engineering perspective, but when we talk to the FinOps and the finance folks, the conversations are more about Reservations and Savings Plans.

How do we correctly apply Savings Plans and Reservations across a high percentage of our portfolio to reduce the costs on those workloads, but not so much that dynamic capacity levels in our organization mean we all of a sudden have a bunch of unused Reservations or Savings Plans? And so, a lot of organizations that engage with us and we have conversations with, we start with the Reservation and Savings Plan conversation because it's much easier to click a few buttons and buy a Savings Plan than to go institute an entire right-sizing campaign across multiple engineering teams. That can be very difficult, a much higher bar. So, some companies are ready to dive into the engineering task of sizing; some are not there yet. And they're maybe a little earlier in their FinOps journey, or in building optimization technology stacks, or achieving higher value out of their cloud environments, so starting with kind of the low-hanging fruit can vary depending on the company: size of company, technical aptitudes, skill sets, all sorts of things like that.

And so, those finance-focused teams are definitely spending more time looking at and studying what are the best practices for purchasing Savings Plans, covering my environment, getting the most out of my dollar that way.
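The trade-off Rick describes, committing enough to save meaningfully without stranding unused commitment when capacity shrinks, is often approximated by committing at a low percentile of historical hourly on-demand spend. The following is a toy illustration of that idea only; it is not AWS's actual recommendation algorithm:

```python
# Toy Savings Plan sizing: commit at a low percentile of observed hourly
# on-demand spend, so the commitment is almost always fully utilized.
# A lower percentile is safer (less risk of unused commitment) but
# covers less of the bill.

def conservative_commitment(hourly_spend, percentile=0.2):
    """Pick an hourly commitment such that most observed hours spent at
    least this much on-demand (~80% of hours for percentile=0.2)."""
    if not hourly_spend:
        return 0.0
    ordered = sorted(hourly_spend)
    idx = int(len(ordered) * percentile)
    return ordered[min(idx, len(ordered) - 1)]

# Fake hourly spend: a steady base load with daytime spikes.
history = [10, 10, 11, 12, 30, 42, 35, 12, 10, 11]
print(conservative_commitment(history))  # 10
```

The interesting knob is the percentile: a finance team chasing maximum coverage would push it up, while a team with volatile or shrinking workloads would keep it low for exactly the "unused Reservations" reason Rick raises.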
Then they don't have to engage the engineering teams; they can kind of take a nice chunk off the top of their bill and sort of have something to show for that amount of effort. So, there's a lot of different approaches to start in optimization.

Corey: My philosophy runs somewhat counter to this because everything you're saying does work globally, it's safe, it's non-threatening, and then also really, on some level, feels like it is an approach that can be driven forward by finance or business. Whereas my worldview is that cost and architecture in cloud are one and the same. And there are architectural consequences of cost decisions and vice versa that can be adjusted and addressed. Like, one of my favorite party tricks—although I admit, it's a weird party—is I can look at the exploded PDF view of a customer's AWS bill and describe their architecture to them. And people have questioned that a few times, and now I have a testimonial on my client website that mentions, "It was weird how he was able to do this."

Yeah, it's real, I can do it. And it's not a skill I would recommend cultivating for most people. But it does also mean that I think I'm onto something here, where there's always context that needs to be applied. It feels like there's an entire ecosystem of product companies out there trying to build what amounts to a better Cost Explorer that also is not free the way that Cost Explorer is. So, the challenge I see there is they all tend to look more or less the same; there is very little differentiation in that space. And in the fullness of time, Cost Explorer does—ideally—get better. How do you think about it?

Rick: Absolutely. If you're looking at ways to understand your bill, there's obviously Cost Explorer and the CUR, and a very common approach is to take the CUR and put a BI front-end on top of it. That's a common experience.
A lot of companies that have chops in that space will do that themselves instead of purchasing a third-party product that does bill breakdown and dissemination. There's also the cross-charge, show-back, organizational breakdown and boundaries, because you have these super large organizations that have fiefdoms.

You know, you have HR IT and sales IT, and [laugh] you know, product IT—you have all these different IT departments that are fiefdoms within your AWS bill and construct, whether they have different AWS accounts or, say, different AWS organizations sometimes, right? It can get extremely complicated. And some organizations require the ability to break down their bill based on those organizational boundaries. Maybe tagging works, maybe it doesn't. Maybe they do that by using a third-party product that lets them set custom scopes on their resources based on organizational boundaries. That's a common approach as well.

We do also have our first-party solutions that can do that, like the CUDOS dashboard as well. That's something that's really popular and highly used across our customer base. It allows you to have kind of a dashboard and customizable view of your AWS costs and, kind of, split it up based on tag, organizational value, account name, and things like that as well. So, you mentioned that you feel like the architectural and cost problem is the same problem. I really don't disagree with that at all.

I think what it comes down to is some organizations are prepared to tackle the architectural elements of cost and some are not. And it really comes down to how does the customer view their bill? Is it somebody in the finance organization looking at the bill? Is it somebody in the engineering organization looking at the bill? Ideally, it would be both.
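That kind of show-back split can also be pulled straight from the Cost Explorer API rather than a dashboard. The sketch below aggregates a `GetCostAndUsage`-style response by group key; the nesting follows Cost Explorer's documented response structure, but the aggregation helper and the sample team tags are my own illustration:

```python
# Sketch: show-back by team tag from a Cost Explorer-shaped response.
# In a live script, `results` would come from:
#   boto3.client("ce").get_cost_and_usage(
#       TimePeriod={...}, Granularity="MONTHLY",
#       Metrics=["UnblendedCost"],
#       GroupBy=[{"Type": "TAG", "Key": "team"}],
#   )["ResultsByTime"]
from collections import defaultdict

def cost_by_group(results, metric="UnblendedCost"):
    """Sum the given metric across all time periods, per group key."""
    totals = defaultdict(float)
    for period in results:
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            totals[key] += float(group["Metrics"][metric]["Amount"])
    return dict(totals)

sample = [
    {"Groups": [
        {"Keys": ["team$payments"],
         "Metrics": {"UnblendedCost": {"Amount": "1200.50"}}},
        {"Keys": ["team$search"],
         "Metrics": {"UnblendedCost": {"Amount": "310.00"}}},
    ]},
    {"Groups": [
        {"Keys": ["team$payments"],
         "Metrics": {"UnblendedCost": {"Amount": "1100.25"}}},
    ]},
]
print(cost_by_group(sample))
```

The "maybe tagging works, maybe it doesn't" caveat shows up here too: untagged spend lands under an empty tag key, which is usually the first number worth driving down before any show-back report is trusted.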
But then at the same time, there are organizations that are like, “Hey, we need to go to cloud. Our CIO told us to go to cloud. We don't want to pay the lease renewal on this building.” There's a lot of reasons why customers move to cloud, a lot of great reasons, right? Three major reasons you move to cloud: agility, [crosstalk 00:20:11]—

Corey: And several terrible ones.

Rick: Yeah, [laugh] and some not-so-great ones, too. So, there's so many different dynamics that get exposed when customers engage with us that they might or might not be ready to engage on the architectural element of how to build hyperscale systems. So, many of these customers are bringing legacy workloads and applications to the cloud, and something like a re-architecture to use stateless resources or something like Spot, that's just not possible for them. So, how can they take 20% off the top of their bill? Savings Plans or Reservations are kind of that easy, low-hanging fruit answer to just say, “We know these are fairly static environments that don't change a whole lot, that are going to exist for some amount of time.”

They're legacy, you know, we can't turn them off. It doesn't make sense to rewrite these applications because they just don't change, they don't have high business value, or something like that. And so, the architecture part of that conversation doesn't always come into play. Should it? Yes.

The long-term maturity and approach for cloud optimization does absolutely account for architecture: thinking strategically about how you do scaling, what services you're using, are you going down the Kubernetes path, which I know you're going to laugh about, but you know, how do you take these applications and componentize them? What services are you using to do that? How do you get that long-term scale and manageability out of those environments? Like you said at the beginning, the complexity is staggering and there's no one unified answer.
That's why there's so many different entrance paths into, “How do I optimize my AWS bill?”There's no one answer, and every customer I talk to has a different comfort level and appetite. And some of them have tried suspension, some of them have gone heavy down Savings Plans, some of them want to dabble in right-sizing. So, every customer is different and we want to provide those capabilities for all of those different customers that have different appetites or comfort levels with each of these approaches.Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill. That's R - E - D - I - S dot com slash duckbill.Corey: And I think that's very fair. I think that it is not necessarily a bad thing that you wind up presenting a lot of these options to customers. But there are some rough edges. An example of this is something I encountered myself somewhat recently and put on Twitter—because I have those kinds of problems—where originally, I remember this, that you were able to buy hourly Savings Plans, which again, Savings Plans are great; no knock there. I would wish that they applied to more services rather than, “Oh, SageMaker is going to do its own Savings Pla”—no, stop keeping me from going from something where I have to manage myself on EC2 to something you manage for me and making that cost money. You nailed it with Fargate. You nailed it with Lambda. Please just have one unified Savings Plan thing. 
But I digress.But you had a limit, once upon a time, of $1,000 per hour. Now, it's $5,000 per hour, which I believe in a three-year all-up-front means you will cheerfully add $130 million purchase to your shopping cart. And I kept adding a bunch of them and then had a little over a billion dollars a single button click away from being charged to my account. Let me begin with what's up with that?Rick: [laugh]. Thank you for the tweet, by the way, Corey.Corey: Always thrilled to ruin your month, Rick. You know that.Rick: Yeah. Fantastic. We took that tweet—you know, it was tongue in cheek, but also it was a serious opportunity for us to ask a question of what does happen? And it's something we did ask internally and have some fun conversations about. I can tell you that if you clicked purchase, it would have been declined [laugh]. So, you would have not been—Corey: Yeah, American Express would have had a problem with that. But the question is, would you have attempted to charge American Express, or would something internally have gone, “This has a few too many commas for us to wind up presenting it to the card issuer with a straight face?”Rick: [laugh]. Right. So, it wouldn't have gone through and I can tell you that, you know, if your account was on a PO-based configuration, you know, it would have gone to the account team. And it would have gone through our standard process for having a conversation with our customer there. That being said, we are—it's an awesome opportunity for us to examine what is that shopping cart experience.We did increase the limit, you're right. And we increased the limit for a lot of reasons that we sat down and worked through, but at the same time, there's always an opportunity for improvement of our product and experience, we want to make sure that it's really easy and lightweight to use our products, especially purchasing Savings Plans. 
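For context on the numbers: a Savings Plan commitment is an hourly dollar figure that applies for the entire term, so the total for one maximum-size, three-year, all-upfront plan works out as in this quick sketch (assuming a 365-day year):

```python
# One maximum-size Savings Plan at the raised cart limit:
hourly_commitment = 5_000        # dollars per hour
term_hours = 3 * 365 * 24        # three-year term, ignoring leap days

all_upfront_total = hourly_commitment * term_hours
print(f"${all_upfront_total:,}")  # $131,400,000
```

At roughly $131.4 million per plan, stacking eight of them in a cart is indeed “a little over a billion dollars” away from a single click.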
Savings Plans are already kind of fraught with mental concern and the risk of purchasing something so expensive and large that has a big impact on your AWS bill, so we don't really want to add any more friction to the process necessarily, but we do want to build an awareness and make sure customers understand, “Hey, you're purchasing this. This has a pretty big impact.” And so, we're also looking at other ways we can kind of improve the Savings Plan shopping cart experience to ensure customers don't put themselves in a position where you have to unwind or make phone calls and say, “Oops.” Right? We [laugh] want to avoid those sorts of situations for our customers. So, we are looking at quite a few additional improvements to that experience as well that I'm really excited about but that I really can't share here, so stay tuned.

Corey: I am looking forward to it. I will say the counterpoint to that is, having worked with customers who do make large eight-figure purchases at once, there's a psychology element that plays into it. Everyone is very scared to click the button on the ‘Buy It Now' thing or the ‘Approve It.' So, what I've often found is at that scale, one, you can reduce what you're buying by half of it, and then see how that treats you and then continue to iterate forward rather than doing it all at once, or reach out to your account team and have them orchestrate the buy. In previous engagements, I had a customer do this religiously and at one point, the concierge team bought the wrong thing in the wrong region, and from my perspective, I would much rather have AWS apologize for that and fix it on their end than have us, on the customer side, going, “Oh crap, oh, crap. Please be nice to us.”

Not that I doubt you would do it, but that's not the nervous conversation I want to have in quite the same way. It just seems odd to me that someone would want to make that scale of purchase without ever talking to a human. I mean, I get it.
I'm as antisocial as they come some days, but for that kind of money, I kind of just want another human being to validate that I'm not making a giant mistake.Rick: We love that. That's such a tremendous opportunity for us to engage and discuss with an organization that's going to make a large commitment, that here's the impact, here's how we can help. How does it align to our strategy? We also do recommend, from a strategic perspective, those more incremental purchases. I think it creates a better experience long-term when you don't have a single Savings Plan that's going to expire on a specific day that all of a sudden increases your entire bill by a significant percentage.So, making staggered monthly purchases makes a lot of sense. And it also works better for incremental growth, right? If your organization is growing 5% month-over-month or year-over-year or something like that, you can purchase those incremental Savings Plans that sort of stack up on top of each other and then you don't have that risk of a cliff one day where one super-large SP expires and boom, you have to scramble and repurchase within minutes because every minute that goes by is an additional expense, right? That's not a great experience. And so that's, really, a large part of why those staggered purchase experiences make a lot of sense.That being said, a lot of companies do their math and their finance in different ways. And single large purchases makes sense to go through their process and their rigor as well. So, we try to support both types of purchasing patterns.Corey: I think that is an underappreciated aspect of cloud cost savings and cloud cost optimization, where it is much more about humans than it is about math. 
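The staggered purchasing Rick recommends can be sketched as a simple ladder: rather than one large commitment that expires on a single cliff date, you buy a small incremental one-year commitment each month as usage grows, so expirations are spread out. The numbers below are hypothetical; real purchases would track actual usage rather than a fixed growth rate.

```python
from datetime import date

def staggered_purchases(start, months, base, growth):
    """Buy an incremental one-year commitment each month as usage grows.

    Returns a list of (purchase_month, hourly_commitment, expiry_month).
    """
    purchases = []
    target = base       # hourly commitment the estate needs this month
    committed = 0.0     # hourly commitment already purchased
    for i in range(months):
        month = date(start.year + (start.month - 1 + i) // 12,
                     (start.month - 1 + i) % 12 + 1, 1)
        expiry = date(month.year + 1, month.month, 1)  # 12-month term
        increment = round(target - committed, 2)
        if increment > 0:
            purchases.append((month, increment, expiry))
            committed += increment
        target *= 1 + growth  # usage grows ~growth fraction per month
    return purchases

# Hypothetical estate: $100/hr baseline, growing 5% month over month.
ladder = staggered_purchases(date(2022, 1, 1), months=6, base=100.0, growth=0.05)
for bought, commit, expires in ladder:
    print(bought, f"${commit}/hr", "expires", expires)
```

Because each rung expires a month apart, no single expiry date suddenly leaves the whole estate uncovered, which is exactly the cliff Rick warns about.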
I see this most notably when I'm helping customers negotiate their AWS contracts with AWS, where there are often perspectives such as, “Well, we feel like we really got screwed over last time, so we want to stick it to them and make them give us a bigger percentage discount on something.” And it's like, look, you can do that, but I would much rather, if it were me, go for something that moves the needle on your actual business and empowers you to move faster, more effectively, and leads to an outcome that is a positive for everyone, versus the “well, we're just going to be difficult on this one point because they were difficult on something last time.” But ego is a thing. Human psychology is never going to have an API for it. And again, customers get to decide their own destiny in some cases.

Rick: I completely agree. I've actually experienced that. So, this is the third company I've been working at on cloud optimization. I spent several years at Microsoft running an optimization program. I went to Turbonomic for several years, building out the right-sizing and Savings Plan and Reservation purchase capabilities there, and now here at AWS.

And through all of these journeys and experiences working with companies to help optimize their cloud spend, I can tell you that moving the psychological needle is significantly harder than the technology stack of sizing something correctly or deleting something that's unused. We can solve the technology part. We can build great products that identify opportunities to save money. There's still this psychological component: IT, for the last several decades, has gone through this maturity curve of “if it's not broken, don't touch it.” Five-nines, Six Sigma, all of these methods of IT sort of rationalizing “do no harm, don't touch anything, everything must be up.”

And it even kind of goes back several decades, back when, if you rebooted a physical server, the motherboard capacitors would pop, right?
So, there's even this anti—or this stigma against even rebooting servers sometimes. The cloud really does away with a lot of that stuff because we have live migration and we have all of these, sort of, stateless designs and capabilities, but we still carry along with us this mentality of “don't touch it; it might fall over.” And we have to really get past that.

And that brings us back to the trust conversation, where we talk about how the recommendations must be incredibly accurate. You're risking your job, in some cases; if you are a DevOps engineer, and your commitments on your yearly goals are uptime, latency, response time, load time, these sorts of things, these operational metrics, KPIs that you use, you don't want to take a downsize recommendation. It has a severe risk of harming your job and your bonus.

Corey: “These instances are idle. Turn them off.” It's like, yeah, these instances are the backup site, or the DR environment, or—

Rick: Exactly.

Corey: —something that takes very bursty but occasional traffic. And yeah, I know it costs us some money, but here's the revenue figures for having that thing available. Like, “Oh, yeah. Maybe we should shut up and not make dumb recommendations around things,” is the human response, but computers don't have that context.

Rick: Absolutely. And so, the accuracy and trust component has to be the highest bar we meet for any optimization activity or behavior. We have to circumvent or supersede the human aversion, the risk aversion, that IT is built on, right?

Corey: Oh, absolutely. And let's be clear, we see this all the time where I'm talking to customers and they have been burned before because “we tried to save money and then we took a production outage as a side effect of a change that we made, and now we're not allowed to try to save money anymore.”
And there's a hidden truth in there, which is auto-scaling is something that a lot of customers talk about, but very few have instrumented true auto-scaling, because they interpret it as “we can scale up to meet demand.” Because yeah, if you don't do that you're dropping customers on the floor.

Well, what about scaling back down again? And the answer there is like, yeah, that's not really a priority because it's just money. We're not disappointing customers or harming brand reputation, and we're still able to take people's money when that happens. It's only money; we can fix it later. COVID shined a real light on a lot of this stuff just because there are customers that we've spoken to whose user traffic dropped off a cliff while infrastructure spend remained constant day over day.

And yeah, they believe, genuinely, they were auto-scaling. The most interesting lies are the ones that customers tell themselves, but the bill speaks. So, getting a lot of modernization traction from things like that was really neat to watch. But customers, I don't think, necessarily intuitively understand most aspects of their bill because it is a multidisciplinary problem. It's engineering, it's finance, it's accounting—which is not the same thing as finance—and you need all three of those constituencies to be able to communicate effectively using a shared and common language. It feels like we're marriage counseling between engineering and finance, most weeks.

Rick: Absolutely, we are. And it's important we get it right, that the data is accurate, that the recommendations we provide are trustworthy. If the finance team gets their hands on the savings potential they see out of right-sizing, takes it to engineering, and then engineering comes back and says, “No, no, no, we can't actually do that. We can't actually size those,” right, we have problems. And they're cultural, they're transformational.
Organizations' appetite for these things varies greatly and so it's important that we address that problem from all of those angles. And it's not easy to do.Corey: How big do you find the optimization problem is when you talk to customers? How focused are they on it? I have my answers, but that's the scale of anec-data. I want to hear your actual answer.Rick: Yeah. So, we talk with a lot of customers that are very interested in optimization. And we're very interested in helping them on the journey towards having an optimal estate. There are so many nuances and barriers, most of them psychological like we already talked about.I think there's this opportunity for us to go do better exposing the potential of what an optimal AWS estate would look like from a dollar and savings perspective. And so, I think it's kind of not well understood. I think it's one of the biggest areas or barriers of companies really attacking the optimization problem with more vigor is if they knew that the potential savings they could achieve out of their AWS environment would really align their spend much more closely with the business value they get, I think everybody would go bonkers. And so, I'm really excited about us making progress on exposing that capability or the total savings potential and amount. It's something we're looking into doing in a much more obvious way.And we're really excited about customers doing that on AWS where they know they can trust AWS to get the best value for their cloud spend, that it's a long-term good bet because their resources that they're using on AWS are all focused on giving business value. And that's the whole key. How can we align the dollars to the business value, right? And I think optimization is that connection between those two concepts.Corey: Companies are generally not going to greenlight a project whose sole job is to save money unless there's something very urgent going on. 
What will happen is, as they iterate forward on the next generation of services or a migration of a service from one thing to another, they will make design decisions that benefit those optimizations. There's low-hanging fruit we can find, usually of the form, “Turn that thing off,” or, “Configure this thing slightly differently,” that doesn't take a lot of engineering effort to put in place. But, on some level, it is not worth the engineering effort it takes to do an optimization project. We've all met those engineers—speaking as one of them myself—who, left to our own devices, will spend two months just knocking a few hundred bucks a month off of our AWS developer environment.

We steal more than office supplies. I'm not entirely sure what the business value of doing that is, in most cases. For me, yes, okay, things that work in small environments work very well in large environments, generally speaking, so I learned how to save 80 cents here and that's a few million bucks a month somewhere else. Most folks don't have that benefit happening, so it's a question of meeting them where they are.

Rick: Absolutely. And I think the scale component is huge, which you just touched on. When you're talking about a hundred EC2 instances versus a thousand, optimization becomes kind of a different component of how you manage that AWS environment. And while, for single-decision recommendations to scale an individual server, the dollar amount might be different, the percentages are just about the same when you look at what it is to be sized correctly, what it is to be configured correctly. And so, it really does come down to priority.
And so, as long as we're meeting that need and we're supporting our customers to make sure they understand the commitment we have to ensuring that their AWS spend is valuable, it is meaningful, right, they're not spending money on things that are not adding value, that's really important to us.Corey: I do want to have as the last topic of discussion here, how AWS views optimization, where there have been a number of repeated statements where helping customers optimize their cloud spend is extremely important to us. And I'm trying to figure out where that falls on the spectrum from, “It's the thing we say because they make us say it, but no, we're here to milk them like cows,” all the way on over to, “No, no, we passionately believe in this at every level, top to bottom, in every company. We are just bad at it.” So, I'm trying to understand how that winds up being expressed from your lived experience having solved this problem first outside, and then inside.Rick: Yeah. So, it's kind of like part of my personal story. It's the main reason I joined AWS. And, you know, when you go through the interview loops and you talk to the leaders of an organization you're thinking about joining, they always stop at the end of the interview and ask, “Do you have any questions for us?” And I asked that question to pretty much every single person I interviewed with. Like, “What is AWS's appetite for helping customers save money?”Because, like, from a business perspective, it kind of is a little bit wonky, right? But the answers were varied, and all of them were customer-obsessed and passionate. And I got this sense that my personal passion for helping companies have better efficiency of their IT resources was an absolute primary goal of AWS and a big element of Amazon's leadership principle, be customer obsessed. 
Now, I'm not a spokesperson, so [laugh] we'll see, but we are deeply interested in making sure our customers have a great long-term experience and a high-trust relationship. And so, when I asked these questions in these interviews, the answers were all about, “We have to do the right thing for the customer. It's imperative. It's also in our DNA. It's one of the most important leadership principles we have to be customer-obsessed.”And it is the primary reason why I joined: because of that answer to that question. Because it's so important that we achieve a better efficiency for our IT resources, not just for, like, AWS, but for our planet. If we can reduce consumption patterns and usage across the planet for how we use data centers and all the power that goes into them, we can talk about meaningful reductions of greenhouse gas emissions, the cost and energy needed to run IT business applications, and not only that, but most all new technology that's developed in the world seems to come out of a data center these days, we have a real opportunity to make a material impact to how much resource we use to build and use these things. And I think we owe it to the planet, to humanity, and I think Amazon takes that really seriously. And I'm really excited to be here because of that.Corey: As I recall—and feel free to make sure that this comment never sees the light of day—you asked me before interviewing for the role and then deciding to accept it, what I thought about you working there and whether I would recommend it, whether I wouldn't. And I think my answer was fairly nuanced. And you're working there now and we still are on speaking terms, so people can probably guess what my comments took the shape of, generally speaking. So, I'm going to have to ask now; it's been, what, a year since you joined?Rick: Almost. I think it's been about eight months.Corey: Time during a pandemic is always strange. But I have to ask, did I steer you wrong?Rick: No. Definitely not. 
I'm very happy to be here. The opportunity to help such a broad range of companies get more value out of technology—and it's not just cost, right, like we talked about. It's actually not about the dollar number going down on a bill. It's about getting more value and moving the needle on how do we efficiently use technology to solve business needs.And that's been my career goal for a really long time, I've been working on optimization for, like, seven or eight, I don't know, maybe even nine years now. And it's like this strange passion for me, this combination of my dad taught me how to be a really good steward of money and a great budget manager, and then my passion for technology. So, it's this really cool combination of, like, childhood life skills that really came together for me to create a career that I'm really passionate about. And this move to AWS has been such a tremendous way to supercharge my ability to scale my personal mission, and really align it to AWS's broader mission of helping companies achieve more with cloud platforms, right?And so, it's been a really nice eight months. It's been wild. Learning AWS culture has been wild. It's a sharp diverging culture from where I've been in the past, but it's also really cool to experience the leadership principles in action. They're not just things we put on a website; they're actually things people talk about every day [laugh]. And so, that journey has been humbling and a great learning opportunity as well.Corey: If people want to learn more, where's the best place to find you?Rick: Oh, yeah. Contact me on LinkedIn or Twitter. My Twitter account is @rickyo1138. Let me know if you get the 1138 reference. That's a fun one.Corey: THX 1138. Who doesn't?Rick: Yeah, there you go. And it's hidden in almost every single George Lucas movie as well. 
You can contact me on any of those social media platforms and I'd be happy to engage with anybody that's interested in optimization, cloud technology, bills, anything like that. Or even not [laugh]. Even anything else, either.

Corey: Thank you so much for being so generous with your time. I really appreciate it.

Rick: My pleasure, Corey. It was wonderful talking to you.

Corey: Rick Ochs, Principal Product Manager at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment, rightly pointing out that while AWS is great and all, Azure is far more cost-effective for your workloads because, given their lax security, it is trivially easy to just run your workloads in someone else's account.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
Screaming in the Cloud
Crafting a Modern Data Protection Strategy with Sam Nicholls
Nov 30, 2022 · 37:26
About Sam

Sam Nicholls: Veeam's Director of Public Cloud Product Marketing, with 10+ years of sales, alliance management, and product marketing experience in IT. Sam has evolved from his on-premises storage days and is now laser-focused on spreading the word about cloud-native backup and recovery, packing in thousands of viewers on his webinars, blogs, and webpages.

Links Referenced:
Veeam AWS Backup: https://www.veeam.com/aws-backup.html
Veeam: https://veeam.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying, and visualization? Modern, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be, at snark.cloud/chronosphere. That's snark.cloud/chronosphere.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out.
Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted guest episode is brought to us by and sponsored by our friends over at Veeam. And as a part of that, they have thrown one of their own to the proverbial lion. My guest today is Sam Nicholls, Director of Public Cloud over at Veeam. Sam, thank you for joining me.Sam: Hey. Thanks for having me, Corey, and thanks for everyone joining and listening in. I do know that I've been thrown into the lion's den, and I am [laugh] hopefully well-prepared to answer anything and everything that Corey throws my way. Fingers crossed. [laugh].Corey: I don't think there's too much room for criticizing here, to be direct. I mean, Veeam is a company that is solidly and thoroughly built around a problem that absolutely no one cares about. I mean, what could possibly be wrong with that? You do backups; which no one ever cares about. Restores, on the other hand, people care very much about restores. And that's when they learn, “Oh, I really should have cared about backups at any point prior to 20 minutes ago.”Sam: Yeah, it's a great point. It's kind of like taxes and insurance. 
It's almost like, you know, something that you have to do that you don't necessarily want to do, but when push comes to shove, and something's burning down, a file has been deleted, someone's made their way into your account and, you know, running a right mess within there, that's when you really, kind of, care about what you mentioned, which is the recovery piece, the speed of recovery, the reliability of recovery.Corey: It's been over a decade, and I'm still sore about losing my email archives from 2006 to 2009. There's no way to get it back. I ran my own mail server; it was an iPhone setting that said, “Oh, yeah, automatically delete everything in your trash folder—or archive folder—after 30 days.” It was just a weird default setting back in that era. I didn't realize it was doing that. Yeah, painful stuff.And we learned the hard way in some of these cases. Not that I really have much need for email from that era of my life, but every once in a while it still bugs me. Which gets speaks to the point that the people who are the most fanatical about backing things up are the people who have been burned by not having a backup. And I'm fortunate in that it wasn't someone else's data with which I had been entrusted that really cemented that lesson for me.Sam: Yeah, yeah. It's a good point. I could remember a few years ago, my wife migrated a very aging, polycarbonate white Mac to one of the shiny new aluminum ones and thought everything was good—Corey: As the white polycarbonate Mac becomes yellow, then yeah, all right, you know, it's time to replace it. Yeah. So yeah, so she wiped the drive, and what happened?Sam: That was her moment where she learned the value and importance of backup unless she backs everything up now. I fortunately have never gone through it. But I'm employed by a backup vendor and that's why I care about it. But it's incredibly important to have, of course.Corey: Oh, yes. 
My spouse has many wonderful qualities, but one that drives me slightly nuts is she's something of a digital packrat where her hard drives on her laptop will periodically fill up. And I used to take the approach of oh, you can be more efficient and do the rest. And I realized no, telling other people they're doing it wrong is generally poor practice, whereas just buying bigger drives is way easier. Let's go ahead and do that. It's small price to pay for domestic tranquility.And there's a lesson in that. We can map that almost perfectly to the corporate world where you folks tend to operate in. You're not doing home backup, last time I checked; you are doing public cloud backup. Actually, I should ask that. Where do you folks start and where do you stop?Sam: Yeah, no, it's a great question. You know, we started over 15 years ago when virtualization, specifically VMware vSphere, was really the up-and-coming thing, and, you know, a lot of folks were there trying to utilize agents to protect their vSphere instances, just like they were doing with physical Windows and Linux boxes. And, you know, it kind of got the job done, but was it the best way of doing it? No. And that's kind of why Veeam was pioneered; it was this agentless backup, image-based backup for vSphere.And, of course, you know, in the last 15 years, we've seen lots of transitions, of course, we're here at Screaming in the Cloud, with you, Corey, so AWS, as well as a number of other public cloud vendors we can help protect as well, as a number of SaaS applications like Microsoft 365, metadata and data within Salesforce. So, Veeam's really kind of come a long way from just virtual machines to really taking a global look at the entirety of modern environments, and how can we best protect each and every single one of those without trying to take a square peg and fit it in a round hole?Corey: It's a good question and a common one. 
We wind up with an awful lot of folks who are confused by the proliferation of data. And I'm one of them, let's be very clear here. It comes down to a problem where backups are a multifaceted, deep problem, and I don't think that people necessarily think of it that way. But I take a look at all of the different, even AWS services that I use for my various nonsense, and which ones can be used to store data? Well, all of them. Some of them, you have to hold it in a particularly wrong sort of way, but they all store data. And in various contexts, a lot of that data becomes very important. So, what service am I using, in which account am I using, and in what region am I using it, and you wind up with data sprawl, where it's a tremendous amount of data that you can generally only track down by looking at your bills at the end of the month. Okay, so what am I being charged, and for what service? That seems like a good place to start, but where is it getting backed up? How do you think about that? So, some people, I think, tend to ignore the problem, which we're seeing less and less, but other folks tend to go to the opposite extreme and we're just going to back up absolutely everything, and we're going to keep that data for the rest of our natural lives. It feels to me that there's probably an answer that is more appropriate somewhere nestled between those two extremes.Sam: Yeah, snapshot sprawl is a real thing, and it gets very, very expensive very, very quickly. You know, your snapshots of EC2 instances are stored on those attached EBS volumes. Five cents per gig per month doesn't sound like a lot, but when you're dealing with thousands of snapshots for thousands of machines, it gets out of hand very, very quickly. And you don't know when to delete them.
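Sam's back-of-the-envelope figure is easy to turn into code. The sketch below compares what a pile of retained snapshots costs at EBS snapshot rates versus cheaper object-storage tiers. The per-gigabyte rates and fleet sizes are illustrative assumptions (rough US list prices from around the time of this conversation), and it deliberately ignores incremental-snapshot savings, so treat it as a worst-case illustration rather than a quote:

```python
# Rough monthly cost of retained snapshot data at different storage rates.
# Rates are ballpark assumptions, not current AWS pricing:
#   EBS snapshot $0.05/GB-mo, S3 Standard $0.023/GB-mo, Glacier ~$0.004/GB-mo.
RATES_PER_GB_MONTH = {
    "ebs_snapshot": 0.05,
    "s3_standard": 0.023,
    "s3_glacier": 0.004,
}

def monthly_cost(total_gb: float, tier: str) -> float:
    """Return the monthly storage bill for total_gb stored in the given tier."""
    return total_gb * RATES_PER_GB_MONTH[tier]

# A hypothetical fleet: 1,000 machines x 30 daily snapshots x 100 GB each,
# counted as full copies (worst case; real snapshots are incremental).
retained_gb = 1_000 * 30 * 100

for tier in RATES_PER_GB_MONTH:
    print(f"{tier}: ${monthly_cost(retained_gb, tier):,.0f}/month")
```

Even with made-up numbers, the gap between tiers rather than the absolute figures is the point: the same retained data is roughly an order of magnitude cheaper once it leaves snapshot storage.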
Like you say, folks are just retaining them forever and dealing with this unfortunate bill shock. So, you know, where to start is automating the lifecycle of a snapshot, right, from its creation—how often do we want to be creating them—from the retention—how long do we want to keep these for—and where do we want to keep them because there are other storage services outside of just EBS volumes. And then, of course, the ultimate: deletion. And that's important even from a compliance perspective as well, right? You've got to retain data for a specific number of years, I think healthcare is like seven years, but then you've—Corey: And then not a day more.Sam: Yeah, and then not a day more because that puts you out of compliance, too. So, policy-based automation is your friend and we see a number of folks building these policies out: gold, silver, bronze tiers based on criticality of data, compliance, and really just kind of letting the machine do the rest. And you can focus on not babysitting backup.Corey: What was it that led to the rise of snapshots? Because back in my very early days, there was no such thing. We wound up using a bunch of servers stuffed in a rack somewhere and virtualization was not really in play, so we had file systems on physical disks. And how do you back that up? Well, you have an agent of some sort that basically looks at all the files and according to some ruleset that it has, it copies them off somewhere else. It was slow, it was fraught, it had a whole bunch of logic that was pushed out to the very edge, and forget about restoring that data in a timely fashion or even validating a lot of those backups worked other than via checksum. And God help you if you had data that was constantly in the state of flux, where anything changing during the backup run would leave your backups in an inconsistent state. That on some level seems to have largely been solved by snapshots. But what's your take on it?
You're a lot closer to this part of the world than I am.Sam: Yeah, snapshots, I think folks have turned to snapshots for the speed, the lack of impact that they have on production performance, and again, just the ease of accessibility. We have access to all different kinds of snapshots for EC2, RDS, EFS throughout the entirety of our AWS environment. So, I think the snapshots are kind of like the default go-to for folks. They can help deliver those very, very quick RPOs, especially in, for example, databases, like you were saying, that change very, very quickly and we all of a sudden are stranded with a crash-consistent backup or snapshot versus an application-consistent snapshot. And then they're also very, very quick to recover from. So, snapshots are very, very appealing, but they absolutely do have their limitations. And I think, you know, it's not a one or the other; it's that they've got to go hand-in-hand with something else. And typically, that is an image-based backup that is stored in a separate location to the snapshot because that snapshot is not independent of the disk that it is protecting.Corey: One of the challenges with snapshots is most of them are created in a copy-on-write sense. It takes basically an instant frozen point in time. Back once upon a time when we ran MySQL databases on top of the NetApp Filer—which works surprisingly well—we would have a script that would automatically quiesce the database so that it would be in a consistent state, snapshot the file and then un-quiesce it, which took less than a second, start to finish. And that was awesome, but then you had this snapshot type of thing. It wasn't super portable, it needed to reference a previous snapshot in some cases, and AWS takes the same approach where the first snapshot captures every block, then subsequent snapshots wind up only taking up as much size as there have been changes since the first snapshots.
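The copy-on-write accounting Corey describes, a full first snapshot followed by change-only increments, can be modeled in a few lines. This is a toy model of the bookkeeping only, not EBS's actual block-level implementation:

```python
def snapshot_chain_gb(volume_gb: float, daily_change_gb: float, days: int) -> float:
    """Total storage consumed by a chain of daily incremental snapshots.

    Toy model: the first snapshot stores the full volume; each later
    snapshot stores only that day's changed blocks. Not the real EBS
    block-dedup algorithm, just the accounting idea behind it.
    """
    if days <= 0:
        return 0.0
    return volume_gb + daily_change_gb * (days - 1)

# A 500 GB volume that churns 2 GB/day: 30 daily snapshots consume
# 500 + 2*29 = 558 GB, versus 15,000 GB if each were a full copy.
chain = snapshot_chain_gb(500, 2, 30)
naive = 500 * 30
print(chain, naive)
```

The gap between `chain` and `naive` is why low-churn volumes have, as Corey puts it, remarkably small subsequent snapshot sizes: after the first snapshot, each additional one is nearly free.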
So, large quantities of data that generally don't get access to a whole lot have remarkably small subsequent snapshot sizes. But that's not at all obvious from the outside looking at these things. They're not the most portable thing in the world. But it's definitely the direction that the industry has trended in. So, rather than having a cron job fire off an AWS API call to take snapshots of my volumes as sort of the baseline approach that we all started with, what is the value proposition that you folks bring? And please don't say it's, “Well, cron jobs are hard and we have a friendlier interface for that.”Sam: [laugh]. I think it's really starting to look at the proliferation of those snapshots, understanding what they're good at, and what they are good for within your environment—as previously mentioned, low RPOs, low RTOs, how quickly can I take a backup, how frequently can I take a backup, and more importantly, how quickly can I restore—but then looking at their limitations. So, I mentioned that they were not independent of that disk, so that certainly does introduce a single point of failure as well as being not so secure. We've kind of touched on the cost component of that as well. So, what Veeam can come in and do is then take an image-based backup of those snapshots, right—so you've got your initial snapshot and then your incremental ones—we'll take the backup from that snapshot, and then we'll start to store that elsewhere. And that is likely going to be in a different account. We can look at the Well-Architected Framework, AWS deeming accounts as a security boundary, so having that cross-account function is critically important so you don't have that single point of failure. Locking down with IAM roles is also incredibly important so we haven't just got a big wide open door between the two. But that data is then stored in a separate account—potentially in a separate region, maybe in the same region—in Amazon S3 storage.
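The cross-account pattern Sam describes is commonly wired up with a bucket policy on the backup account's bucket. The sketch below only builds the policy document; the account IDs, role name, and bucket name are hypothetical placeholders, and a real setup would carry more statements (encryption requirements, versioning conditions, and so on):

```python
import json

# Hypothetical layout: account 111111111111 owns the backup bucket; the
# production account's backup role (in 222222222222) may only write
# objects, never delete them. A sketch of the cross-account pattern,
# not a complete, hardened policy.
BACKUP_BUCKET = "example-backup-vault"
WRITER_ROLE = "arn:aws:iam::222222222222:role/backup-writer"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBackupWrites",
            "Effect": "Allow",
            "Principal": {"AWS": WRITER_ROLE},
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BACKUP_BUCKET}/*",
        },
        {
            # Explicit Deny wins over any Allow, so even the writer role
            # (or a compromised production credential) cannot delete backups.
            "Sid": "DenyDeletes",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": f"arn:aws:s3:::{BACKUP_BUCKET}/*",
        },
    ],
}

print(json.dumps(bucket_policy, indent=2))
```

The explicit Deny statement is what closes the "big wide open door" Sam mentions: a principal allowed to write backups still has no path to destroying them.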
And S3 has the wonderful benefit of being still relatively performant, so we can have quick recoveries, but it is much, much cheaper. You're dealing with 2.3 cents per gig per month, instead of—Corey: To start, and it goes down from there with sizeable volumes.Sam: Absolutely, yeah. You can go down to S3 Glacier, where you're looking at, I forget how many points and zeros and nines it is, but it's fractions of a cent per gig per month, but it's going to take you a couple of days to recover that da—Corey: Even infrequent access cuts that in half.Sam: Oh yeah.Corey: And let's be clear, these are snapshot backups; you probably should not be accessing them on a consistent, sustained basis.Sam: Well, exactly. And this is where it's kind of almost like having your cake and eating it as well. Compliance or regulatory mandates or corporate mandates are saying you must keep this data for this length of time. Keeping that—you know, let's just say it's three years' worth of snapshots in an EBS volume is going to be incredibly expensive. What's the likelihood of you needing to recover something from two years—actually, even two months ago? It's very, very small. So, the performance part of S3 is, you don't need to take it as much into consideration. Can you recover? Yes. Is it going to take a little bit longer? Absolutely. But it's going to help you meet those retention requirements while keeping your backup bill low, avoiding that bill shock, right, spending tens and tens of thousands every single month on snapshots. This is what I mean by kind of having your cake and eating it.Corey: I somewhat recently have had a client where EBS snapshots are one of the driving costs behind their bill. It is one of their largest single line items. And I want to be very clear here because if one of those people who listen to this is thinking, “Well, hang on.
Wait, they're telling stories about us, even though they're not naming us by name?” Yeah, there were three of you in the last quarter. So, at that point, it becomes clear it is not about something that one individual company has done but more about an overall driving trend. I am personalizing it a little bit by referring to it as one company when there were three of you. This is a narrative device, not me breaking confidentiality. Disclaimer over. Now, when you talk to people about, “So, tell me why you've got 80 times more snapshots than you do EBS volumes?” The answer is, “Well, we wanted to back things up and we needed to get hourly backups to a point, then daily backups, then monthly, and so on and so forth. And when this was set up, there wasn't a great way to do this natively and we don't always necessarily know what we need versus what we don't. And the cost of us backing this up, well, you can see it on the bill. The cost of us deleting too much and needing it as soon as we do? Well, that cost is almost incalculable. So, this is the safe way to go.” And they're not wrong in anything that they're saying. But the world has definitely evolved since then.Sam: Yeah, yeah. It's a really great point. Again, it just folds back into my whole having your cake and eating it conversation. Yes, you need to retain data; it gives you that kind of nice, warm, cozy feeling, it's a nice blanket on a winter's day that that data, irrespective of what happens, you're going to have something to recover from. But the question is does that need to be living on an EBS volume as a snapshot?
Why can't it be living on much, much more cost-effective storage that's going to give you the warm and fuzzies, but is going to make your finance team much, much happier [laugh].Corey: One of the inherent challenges I think people have is that snapshots by themselves are almost worthless, in that I have an EBS snapshot, it is sitting there now, it's costing me an undetermined amount of money because it's not exactly clear on a per-snapshot basis exactly how large it is, and okay, great. Well, I'm looking for a file that was not modified since X date, as it was on this time. Well, great, you're going to have to take that snapshot, restore it to a volume and then go exploring by hand. Oh, it was the wrong one. Great. Try it again, with a different one. And after, like, the fifth or sixth in a row, you start doing a binary search approach on this thing. But it's expensive, it's time-consuming, it takes forever, and it's not a fun user experience at all. Part of the problem is it seems that historically, backup systems have no context or no contextual awareness whatsoever around what is actually contained within that backup.Sam: Yeah, yeah. I mean, you kind of highlighted two of the steps. It's more like a ten-step process to do, you know, granular file or folder-level recovery from a snapshot, right? You've got to, like you say, you've got to determine the point in time when that, you know, you knew the last time that it was around, then you're going to have to determine the volume size, the region, the OS, you're going to have to create an EBS volume of the same size, region, from that snapshot, create the EC2 instance with the same OS, connect the two together, boot the EC2 instance, mount the volume, search for the files to restore, download them manually, at which point you have your file back. It's not back in the machine where it was, it's now been downloaded locally to whatever machine you're accessing that from.
And then you got to tear it all down. And that is again, like you say, predicated on the fact that you knew exactly that that was the right time. It might not be and then you have to start from scratch from a different point in time. So, backup tooling from backup vendors that have been doing this for many, many years, knew about this problem long, long ago, and really seek to not only automate the entirety of that process but make the whole e-discovery, the search, the location of those files, much, much easier. I don't necessarily want to do a vendor pitch, but I will say with Veeam, we have explorer-like functionality, whereby it's just a simple web browser. Once that machine is all spun up again, automatic process, you can just search for your individual file, folder, locate it, you can download it locally, you can inject it back into the instance where it was through Amazon Kinesis or AWS Kinesis—I forget the right terminology for it; some of it's AWS, some of it's Amazon. But by-the-by, the whole recovery process, especially from a file or folder level, is much more pain-free, but also much faster. And that's ultimately what people care about: how reliable is my backup? How quickly can I get stuff online? Because the time that I'm down is costing me an indescribable amount of time or money.Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill.
That's R - E - D - I - S dot com slash duckbill.Corey: Right, the idea of RPO versus RTO: recovery point objective and recovery time objective. With an RPO, it's great, disaster strikes right now, how long is it acceptable to have been since the last time we backed up data to a restorable point? Sometimes it's measured in minutes, sometimes it's measured in fractions of a second. It really depends on what we're talking about. Payments databases, that needs to be—the RPO basically asymptotically approaches zero. The RTO is, okay, how long is acceptable before we have that data restored and are back up and running? And that is almost always a longer time, but not always. And there's a different series of trade-offs that go into that. But both of those also presuppose that you've already dealt with the existential question of is it possible for us to recover this data. And that's where I know that you are obviously—you have a position on this that is informed by where you work, but I don't, and I will call this out as what I see in the industry: AWS Backup is compelling to me except for one fatal flaw that it has, and that is it starts and stops with AWS. I am not a proponent of multi-cloud. Lord knows I've gotten flack for that position a bunch of times, but the one area where it makes absolute sense to me is backups. Have your data in a rehydrate-the-business level state backed up somewhere that is not your primary cloud provider because you're otherwise single point of failure-ing through a company, through the payment instrument you have on file with that company, in the blast radius of someone who can successfully impersonate you to that vendor. There has to be a gap of some sort for the truly business-critical data. Yes, egress to other providers is expensive, but you know what also is expensive? Irrevocably losing the data that powers your business. Is it likely? No, but I would much rather do it than have to justify why I'm not doing it.Sam: Yeah.
Wasn't likely that I was going to win that 2 billion or 2.1 billion on the Powerball, but [laugh] I still play [laugh]. But I understand your standpoint on multi-cloud and I read your newsletters and understand where you're coming from, but I think the reality is that we do live in at least a hybrid cloud world, if not multi-cloud. The number of organizations that are sole-sourced on a single cloud and nothing else is a relatively small, single-digit percentage. It's around 80-some percent that are hybrid, and the remainder of them are your favorite: multi-cloud. But again, having something that is one hundred percent sole-sourced on a single platform or a single vendor does expose you to a certain degree of risk. So, having the ability to do cross-platform backups, recoveries, migrations, for whatever reason, right, because it might not just be a disaster like you'd mentioned, it might also just be… I don't know, the company has been taken over and all of a sudden, the preference is now towards another cloud provider and I want you to refactor and re-architect everything for this other cloud provider. If all that data is locked into one platform, that's going to make your job very, very difficult. So, we mentioned at the beginning of the call, Veeam is capable of protecting a vast number of heterogeneous workloads on different platforms, in different environments, on-premises, in multiple different clouds, but the other key piece is that we always use the same backup file format. And why that's key is because it enables portability. If I have backups of EC2 instances that are stored in S3, I could copy those onto on-premises disk, I could copy those into Azure, I could do the same with my Azure VMs and store those on S3, or again, on-premises disk, and any other endless combination that goes with that. And it's really kind of centered around, like, control and ownership of your data. We are not prescriptive by any means. Like, you do what is best for your organization.
We just want to provide you with the toolset that enables you to do that without steering you one direction or the other with fee structures, disparate feature sets, whatever it might be.Corey: One of the big challenges that I keep seeing across the board is just a lack of awareness of what data actually matters, where you see people backing up endless fleets of web server instances that are auto-scaled into existence and then removed, but you can create those things at will; why do you care about the actual data that's on these things? It winds up almost at the library management problem, on some level. And in that scenario, snapshots are almost certainly the wrong answer. One thing that I saw previously that really changed my way of thinking about this was back many years ago when I was working at a startup that had just started using GitHub and they were paying for a third-party service that wound up backing up Git repos. Today, that makes a lot more sense because you have a bunch of other stuff on GitHub that goes well beyond the stuff contained within Git, but at the time, it was silly. It was, why do that? Every Git clone is a full copy of the entire repository history. Just grab it off some developer's laptop somewhere. It's like, “Really? You want to bet the company, slash your job, slash everyone else's job on that being feasible and doable or do you want to spend the 39 bucks a month or whatever it was to wind up getting that out the door now so we don't have to think about it, and they validate that it works?” And that was really a shift in my way of thinking because, yeah, backing up things can get expensive when you have multiple copies of the data living in different places, but what's really expensive is not having a company anymore.Sam: Yeah, yeah, absolutely. We can tie it back to my insurance dynamic earlier where, you know, it's something that you know that you have to have, but you don't necessarily want to pay for it.
Well, you know, just like with insurances, there's multiple different ways to go about recovering your data and it's only in crunch time that you really care about what it is that you've been paying for, right, when it comes to backup? Could you get your backup through a git clone? Absolutely. Could you get your data back—how long is that going to take you? How painful is that going to be? What's going to be the impact to the business while you're trying to figure that out versus, like you say, the 39 bucks a month, a year, or whatever it might be to have something purpose-built for that, that is going to make the recovery process as quick and painless as possible and just get things back up online.Corey: I am not a big fan of the fear, uncertainty, and doubt approach, but I do practice what I preach here in that yeah, there is a real fear of data loss. It's not, “People are coming to get you, so you absolutely have to buy whatever it is I'm selling,” but it is something you absolutely have to think about. My core consulting proposition is that I optimize the AWS bill. And sometimes that means spending more. Okay, that one S3 bucket is extremely important to you and you say you can't sustain the loss of it ever, so one zone is not an option. Where is it being backed up? Oh, it's not? Yeah, I suggest you spend more money and back that thing up if it's as irreplaceable as you say. It's about doing the right thing.Sam: Yeah, yeah, it's interesting, and it's going to be hard for you to prove the value of doing that when you are driving their bill up when you're trying to bring it down. But again, you have to look at something that's not itemized on that bill, which is going to be the impact of downtime. I'm not going to pretend to try and recall the exact figures because it also varies depending on your business, your industry, the size, but the impact of downtime is massive financially.
Tens of thousands of dollars for small organizations per hour, millions and millions of dollars per hour for much larger organizations. The backup component of that is relatively small in comparison, so it's worth having something that is purpose-built, that is going to protect your data and help mitigate that impact of downtime. Because that's ultimately what you're trying to protect against. It is the recovery piece that you're buying that is the most important piece. And like you, I would say, at least be cognizant of it and evaluate your options and what can you live with and what can you live without.Corey: That's the big burning question that I think a lot of people do not have a good answer to. And when you don't have an answer, you either back up everything or nothing. And I'm not a big fan of doing either of those things blindly.Sam: Yeah, absolutely. And I think this is why we see varying different backup options as well, you know? You're not going to try and apply the same data protection policies to each and every single workload within your environment because they've all got different types of workload criticality. And like you say, some of them might not even need to be backed up at all, just because they don't have data that needs to be protected. So, you need something that is going to be able to be flexible enough to apply across the entirety of your environment, protect it with the right policy, in terms of how frequently do you protect it, where do you store it, how often, or when are you eventually going to delete that and apply that on a workload-by-workload basis. And this is where the joy of things like tags come into play as well.Corey: One last thing I want to bring up is that I'm a big fan of watching for companies saying the quiet part out loud. And one area in which they do this—because they're forced to by brevity—is in the title tag of their website.
I pull up veeam.com and I hover over the tab in my browser, and it says, “Veeam Software: Modern Data Protection.” And I want to call that out because you're not framing it as explicitly backup. So, the last topic I want to get into is the idea of security. Because I think it is not fully appreciated on a lived-experience basis—although people will of course agree to this when they're having ivory tower whiteboard discussions—that every place your data lives is a potential place for a security breach to happen. So, you want to have your data living in a bunch of places ideally, for backup and resiliency purposes. But you also want it to be completely unworkable or illegible to anyone who is not authorized to have access to it. How do you balance those trade-offs yourself given that what you're fundamentally saying is, “Trust us with your Holy of Holies when it comes to things that power your entire business?” I mean, I can barely get some companies to agree to show me their AWS bill, let alone this is the data that contains all of this stuff to destroy our company.Sam: Yeah. Yeah, it's a great question. Before I explicitly answer that piece, I will just go to say that modern data protection does absolutely have a security component to it, and I think that backup absolutely needs to be a—I'm going to say this in air quotes—a “first class citizen” of any security strategy. I think when people think about security, their mind goes to the preventative, like how do we keep these bad people out? This is going to be a bit of the FUD that you love, but ultimately, the bad guys on the outside have an infinite number of attempts to get into your environment and only have to be right once to get in and start wreaking havoc. You on the other hand, as the good guy with your cape and whatnot, you have got to be right each and every single one of those times. And we as humans are fallible, right?
None of us are perfect, and it's incredibly difficult to defend against these ever-evolving, more complex attacks. So, backup, if someone does get in, having a clean, verifiable, recoverable backup is really going to be the only thing that is going to save your organization, should that actually happen. And what's key to a secure backup? I would say separation, isolation of backup data from the production data; I would say utilizing things like immutability, so in AWS, we've got Amazon S3 Object Lock, so it's that write once, read many state for whatever retention period that you put on it. So, the data that they're seeking to encrypt, whether it's in production or in their backup, they cannot encrypt it. And then the other piece that I think is coming more and more into play, and it's almost table stakes, is encryption, right? And we can utilize things like AWS KMS for that encryption. But that's there to help defend against the exfiltration attempts. Because these bad guys are realizing, “Hey, people aren't paying me my ransom because they're just recovering from a clean backup, so now I'm going to take that backup data, I'm going to leak the personally identifiable information, trade secrets, or whatever on the internet, and that's going to put them in breach of compliance and give them a hefty fine that way unless they pay me my ransom.” So encryption, so they can't read that data. So, not only can they not change it, but that they can't read it is equally important. So, I would say those are the three big things for me on what's needed for backup to make sure it is clean and recoverable.Corey: I think that is one of those areas where people need to put additional levels of thought in. I think that if you have access to the production environment and have full administrative rights throughout it, you should definitionally not—at least with that account and ideally not you at all personally—have access to alter the backups. Full stop.
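Sam's immutability point maps directly onto S3 Object Lock parameters. The snippet below only constructs the request a client such as boto3 would send; the bucket and key are hypothetical placeholders, and the bucket itself must have been created with Object Lock enabled for the call to succeed:

```python
from datetime import datetime, timedelta, timezone

# Write once, read many: in COMPLIANCE mode, no principal, including the
# account root, can shorten the retention window or delete the locked
# object version before the retain-until date passes.
retain_until = datetime.now(timezone.utc) + timedelta(days=7 * 365)  # ~7 years

put_object_args = {
    "Bucket": "example-backup-vault",        # hypothetical bucket
    "Key": "backups/db/2022-11-29.vbk",      # hypothetical backup file
    "Body": b"...backup bytes...",
    "ObjectLockMode": "COMPLIANCE",
    "ObjectLockRetainUntilDate": retain_until,
    "ServerSideEncryption": "aws:kms",       # encrypt at rest with AWS KMS
}

# With boto3, this would be sent as: s3_client.put_object(**put_object_args)
print(put_object_args["ObjectLockMode"])
```

Combined with keeping the bucket in a separate account, this is what makes a backup copy safe even from a compromised administrator: the credentials that can write production data have no way to unlock or rewrite the retained versions.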
I would say, on some level, there should not be the ability to alter backups for some particular workloads, the idea being that if you get hit with a ransomware infection, it's pretty bad, let's be clear, but if you can get all of your data back, it's more of an annoyance than it is, again, the existential business crisis that becomes something that redefines you as a company if you still are a company.Sam: Yeah. Yeah, I mean, we can turn to a number of organizations. Code Spaces always springs to mind for me; I love Code Spaces. It was kind of one of those precursors to—Corey: It's amazing.Sam: Yeah, but they were running on AWS and they had everything, production and backups, all stored in one account. The attackers got into the account. “We're going to delete your data if you don't pay us this ransom.” They were like, “Well, we're not paying you the ransoms. We got backups.” Well, they deleted those, too. And, you know, unfortunately, Code Spaces isn't around anymore. But it really kind of goes to show just the importance of at least logically separating your data across different accounts and not having that god-like access to absolutely everything.Corey: Yeah, when you talked about Code Spaces, I was in [unintelligible 00:32:29] talking about GitHub Codespaces specifically, where they have their developer workstations in the cloud. They're still very much around, at least last time I saw, unless you know something I don't.Sam: Precursor to that. I can send you the link—Corey: Oh oh—Sam: You can share it with the listeners.Corey: Oh, yes, please do. I'd love to see that.Sam: Yeah. Yeah, absolutely.Corey: And it's been a long and strange time in this industry. Speaking of links for the show notes, I appreciate you're spending so much time with me. Where can people go to learn more?Sam: Yeah, absolutely. I think veeam.com is kind of the first place that people gravitate towards.
Me personally, I'm kind of like a hands-on learning kind of guy, so we always make free product available. And then you can find that on the AWS Marketplace. Simply search ‘Veeam' through there. A number of free products; we don't put time limits on it, we don't put feature limitations. You can back up ten instances, including your VPCs, which we actually didn't talk about today, but I do think is important. But I won't waste any more time on that.Corey: Oh, configuration of these things is critically important. If you don't know how everything was structured and built out, you're basically trying to re-architect from first principles based upon archaeology.Sam: Yeah [laugh], that's a real pain. So, we can help protect those VPCs and we actually don't put any limitations on the number of VPCs that you can protect; it's always free. So, if you're going to use it for anything, use it for that. But hands-on, marketplace, if you want more documentation, want to learn more, want to speak to someone, veeam.com is the place to go.Corey: And we will, of course, include that in the show notes. Thank you so much for taking so much time to speak with me today. It's appreciated.Sam: Thank you, Corey, and thanks for all the listeners tuning in today.Corey: Sam Nicholls, Director of Public Cloud at Veeam. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry insulting comment that takes you two hours to type out but then you lose it because you forgot to back it up.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS.
We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Need for Speed in Time-Series Data with Brian Mullen

Nov 29, 2022 · 32:55


About Brian

Brian is an accomplished dealmaker with experience ranging from developer platforms to mobile services. Before InfluxData, Brian led business development at Twilio. Joining at just thirty-five employees, he built over 150 partnerships globally from the company's infancy through its IPO in 2016. He led the company's international expansion, hiring its first teams in Europe, Asia, and Latin America. Prior to Twilio, Brian was VP of Business Development at Clearwire and held management roles at Amp'd Mobile, Kivera, and PlaceWare.

Links Referenced:

InfluxData: https://www.influxdata.com/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users.
AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It's been a year, which means it's once again time to have a promoted guest episode brought to us by our friends at InfluxData. Joining me for a second time is Brian Mullen, CMO over at InfluxData. Brian, thank you for agreeing to do this a second time. You're braver than most.

Brian: Thanks, Corey. I'm happy to be here. Second time is the charm.

Corey: So, it's been an interesting year, to put it mildly, and I tend to have the attention span of a goldfish most days, so for those who are similarly flighty, let's start at the very top. What is an InfluxDB slash InfluxData slash Influx—when you're not sure which one to use, just shorten it and call it good—and why might someone need it?

Brian: Sure. So, InfluxDB is what most people understand our product as, a pretty popular open-source product, been out for quite a while. And then our company, InfluxData, is the company behind InfluxDB. And InfluxDB is where developers build IoT, real-time analytics, and cloud applications, typically all based on time series. It's a time-series data platform specifically built to handle time-series data, which we think about as any type of data that is stamped in time in some way. It could be metrics, like, taken every one second, every two seconds, every three seconds, or some kind of event that occurs and is stamped in time in some way.
So, our product and platform is really specialized to handle that technical problem.

Corey: When last we spoke, I contextualized that in the realm of an IoT sensor that winds up reporting its device ID and its temperature at a given timestamp. That is sort of baseline stuff that I think aligns with what we're talking about. But over the past year, I started to see it in a bit of a different light, specifically viewing logs as time-series data, which hadn't occurred to me until relatively recently. And it makes perfect sense, on some level. It's weird to contextualize what Influx does as being a logging database, but there's absolutely no reason it couldn't be.

Brian: Yeah, it certainly could. So typically, we see the world of time-series data in kind of two big realms. One is, as you mentioned, the—think of it as the hardware or, you know, physical realm: devices and sensors. These are things that are going to show up in a connected car, in a factory deployment, in renewable energy, you know, a wind farm. And those are real devices and pieces of hardware that are out in the physical world, collecting data and emitting, you know, time series every one second, or five seconds, or ten minutes, or whatever it might be.

But it also, as you mentioned, applies to, call it the virtual world, which is really all of the software and infrastructure that is being stood up to run applications and services. And so, in that world, it could be the same—it's just a different type of source, but it is really kind of the same technical problem. It's still time-series data being stamped, you know, every one second, every five seconds, in some cases, every millisecond, but it is coming from a source that is actually in the infrastructure. Could be, you know, virtual machines, it could be containers, it could be microservices running within those containers.
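The sensor example above—a device ID, a temperature, and a timestamp—maps naturally onto InfluxDB's line protocol, which encodes each point as a measurement, a tag set, a field set, and an epoch-nanosecond timestamp. Here is a minimal sketch in Python of what such a point looks like; the helper function and sample values are illustrative, not part of any official InfluxDB client:

```python
from datetime import datetime, timezone

def to_line_protocol(measurement: str, tags: dict, fields: dict, ts: datetime) -> str:
    """Format one time-series point as a line-protocol string:
    measurement,tag=... field=... <epoch-nanosecond timestamp>"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    ns = int(ts.timestamp()) * 1_000_000_000  # line protocol uses epoch nanoseconds
    return f"{measurement},{tag_str} {field_str} {ns}"

point = to_line_protocol(
    "temperature",
    {"device_id": "sensor-42"},  # tags: indexed metadata identifying the series
    {"value": 21.5},             # fields: the measured values themselves
    datetime(2022, 11, 1, tzinfo=timezone.utc),
)
print(point)
# temperature,device_id=sensor-42 value=21.5 1667260800000000000
```

The same shape applies whether the source is a physical sensor or a container emitting infrastructure metrics; only the measurement, tags, and fields change.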
And so, all of those things together, both in the physical world and this infrastructure world, are all emitting time-series data.

Corey: When you take a look at the broader ecosystem, what is it that you see that has been the most misunderstood about time-series data as a whole? For example, when I saw AWS talking about a lot of things that they did in the realm of data lakes, I talked to clients of mine about this and their response is, "Well, that'd be great, genius, if we had a data lake." It's, "What do you think those petabytes of nonsense in S3 are?" "Oh, those are the logs and the assets and a bunch of other nonsense." "Yeah, that's what other people are calling a data lake." "Oh." Do you see a similar lights-go-on moment when you talk to clients and prospective clients about what it is that they're doing that they just hadn't considered to be time-series data previously?

Brian: Yeah. In fact, that's exactly what we see with many of our customers: they didn't realize that all of a sudden, they are now handling a pretty sizable time-series workload. And if you kind of take a step back and look at a couple of pretty obvious but sometimes unrecognized trends in technology, the first is cloud applications in general are expanding, both horizontally and vertically. So, that means, like, the workloads that are being run in the Netflixes of the world, or all the different infrastructure that's being spun up in the cloud to run these various, you know, applications and services—those workloads are getting bigger and bigger; those companies and their subscriber bases, and the amount of data they're generating, are getting bigger and bigger. They're also expanding horizontally by region and geography.

So Netflix, for example, running not just in the US, but in every continent and probably every cloud region around the world.
So, that's happening in the cloud world, and then also, in the IoT world, there's this massive growth of connected devices, both net-new devices that are being developed—kind of, you know, the next Peloton or the next climate control unit that goes in an apartment or house—and also these longtime legacy devices that have been on the factory floor for a couple of decades, but now are being kind of modernized and coming online.

So, if you look at all of that growth of the data sources now being built up in the cloud, and you look at all that growth of these connected devices, both new and existing, that are kind of coming online, there's a huge, now exponential, growth in the sources of data. And all of these sources are emitting time-series data. You can just think about a connected car—not even a self-driving car, just a connected car, your everyday, kind of, 2022 model—and nearly every element of the car is emitting time-series data: its engine components, you know, your tires, like, what the climate inside of the car is, statuses of the engine itself, and it's all doing that in real-time, so every one second, every five seconds, whatever.

So, I think in general, people just don't realize they're already dealing with a substantial workload of time series. And in most cases, unless they're using something like Influx, they're probably not, you know, especially tuned to handle it from a technology perspective.

Corey: So, it's been a year. What has changed over on your side of the world since the last time we spoke? It seems that, well, things continue and they're up and to the right. Well, sure, generally speaking, you're clearly still in business. Good job, always appreciative of your custom, as well as the fact that, oh good, even in a world where it seems like there's a macro recession in progress, there are still companies out there that continue to persist and in some cases, dare I say, even thrive?
What have you folks been up to?

Brian: Yeah, it's been a big year. So first, we've seen quite a bit of expansion across the use cases. We've seen even further expansion in IoT, kind of expanding into consumer, industrial, and now sustainability and clean energy, and that pairs with what we've seen in FinTech and cryptocurrency, gaming and entertainment applications, network telemetry—including some of the biggest names in telecom—and then a little bit more on the cloud side with cloud services, infrastructure, and dev tools and APIs. So, quite a bit broader set of use cases we're now seeing across the platform.

And the second thing—you might have seen it in the last month or so—is a pretty big announcement we had of our new storage engine. So, this was just announced earlier this month in November and was previously introduced to our community as what we call IOx, which is how it was known in the open-source. And think of this really as a rebuilt and reimagined storage engine, built on that open-source project, InfluxDB IOx, that allows us to deliver faster queries, and now—pretty exciting, for the first time—unlimited time series, or cardinality as it's known in the space. And then also we introduced SQL for writing queries and BI tool support. This is the first time we're introducing SQL, the world's most popular data programming language, to our platform, enabling developers to query via the API, alongside our language Flux and InfluxQL.

Corey: A long time ago, it really seems that the cloud took a vote, for lack of a better term, and decided that when it comes to storage, object store is the way forward.
It was a bit of a reimagining from how we all considered using storage previously, but the economics are at minimum ten to one in favor of object store, the latency is far better, the durability is off the charts better, you don't have to deal—at least in AWS-land—with the concept of availability zones and the rest. Just from an economic and performance perspective, provided the use case embraces it, there's really no substitute.

Brian: Yeah, I mean, the way we think about storage is, you know, obviously, it varies quite a bit from customer to customer with our use cases. Especially in IoT, we see some use cases where customers want to have data around for months and, in some cases, years. So, it's a pretty substantial data set you're often looking at. And sometimes those customers want to downsample those; they don't necessarily need every single piece of minutia that they may need in real-time, but not in summary, looking backward. So, we're in this kind of world where we're dealing with both high-fidelity—usually in-the-moment—data and lower-fidelity data, where people can downsample and have a little bit more of a summarized view of what happened.

So, pretty unique for us, and we have to kind of design the product in a way that is able to balance both of those, because that's what, you know, the customer use cases demand. It's a super hard problem to solve. One of the reasons that you have a product like InfluxDB, which is specialized to handle this kind of thing, is so that you can actually manage that balance in your application or service, setting your retention policy, et cetera.

Corey: That's always been something that seemed a little on the odd side to me when I'm looking at a variety of different observability tools, where it seems that one of the key dimensions that they all tend to, I guess, operate on and price on is retention period.
And I get it; you might not necessarily want to have your load balancer logs from 2012 readily available and be paying for the privilege, but it does seem that, given the dramatic fall of archival storage pricing, on some level, people do want to be able to retain that data just on the off chance that it will be useful. Maybe that's my internal digital packrat chiming in at this point, but I do believe strongly that there is a correlation between how recent the data is and how useful it is, for a variety of different use cases. But that's also not a global truth. How do you view the divide? And what do you actually see people saying they want versus what they're actually using?

Brian: It's a really good question, and not a simple problem to solve. So, first of all, I would say it probably really depends on the use case and the extent to which that use case is touching real-world applications and services. In a pure observability setting, where you're looking at perhaps more of a, kind of, operational view of infrastructure monitoring—you want to understand kind of what happened and when—those tend to be a little bit more focused on real-time and recent. So, for example, you of course want to know exactly what's happening in the moment, zero in on whatever anomaly and kind of surrounding data there is; perhaps that means you're digging into something that happened in, you know, fairly recent time. So, those do tend to be—not all of them, but they do tend to be—a little bit more real-time and recent-oriented.

I think it's a little bit different when we look at IoT. Those generally tend to be longer timeframes that people are dealing with.
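The downsampling Brian describes—keeping raw high-fidelity points for recent data while retaining only summaries longer-term—can be sketched as a windowed aggregation. This is an illustrative pure-Python version of the idea, not InfluxDB's actual task engine, which would typically do this with a scheduled query:

```python
from statistics import mean

def downsample(points, window_s):
    """Collapse (timestamp_seconds, value) points into one mean value per window.
    Returns (window_start, mean_value) tuples, sorted by time."""
    buckets = {}
    for ts, value in points:
        bucket = ts - (ts % window_s)  # align timestamp to the start of its window
        buckets.setdefault(bucket, []).append(value)
    return [(b, mean(vs)) for b, vs in sorted(buckets.items())]

# Raw 1-second readings downsampled to 60-second averages.
raw = [(0, 20.0), (1, 22.0), (59, 24.0), (60, 30.0), (75, 32.0)]
print(downsample(raw, 60))
# [(0, 22.0), (60, 31.0)]
```

A retention policy then amounts to keeping the raw list for, say, 30 days and the downsampled list for years—a much smaller data set for the long tail.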
They're physical, out-in-the-field devices; you know, many times those devices are kind of coming online and offline depending on the connectivity, depending on the environment. You can imagine a connected smart agriculture setup—I mean, those are a pretty wide array of devices out in, you know, who knows what kind of climate and environment—so they tend to be a little bit longer in retention policy, kind of, being able to dig into the data, what's happening. The time frame that people are dealing with is just, in general, much longer in some of those situations.

Corey: One story that I've heard a fair bit about observability data and event data is that they inevitably compose down into metrics rather than events or traces or logs, and I have a hard time getting there, because I can definitely see a bunch of log entries showing the web server's return codes—okay, here's the number of 500 errors and number of different types of successes that we wind up seeing in the app. Yeah, all right, how many per minute, per second, per hour, whatever it is that makes sense, so that you can look at aberrations there. But in the development process at least, I find that having detailed log messages tells me about things I didn't see and need to understand, or to continue building the dumb thing that I'm in the process of putting out. It feels like once something is productionalized and running, its behavior is a lot more well understood, and at that point, metrics really seem to take over. How do you see it, given that you fundamentally live at that intersection where one can become the other?

Brian: Yeah, we are right at that intersection, and our answer probably would be both. Metrics are super important to understand and have that regular cadence and be kind of measuring that state over time, but you can miss things depending on how frequently those metrics are coming in.
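Brian's point about missing things between metric intervals can be shown with a toy example—coarse periodic sampling versus recording discrete events. The data and threshold here are hypothetical, not tied to any InfluxDB API:

```python
def sample_metrics(series, interval_s):
    """Keep only the readings that fall on the periodic scrape interval."""
    return [(ts, v) for ts, v in series if ts % interval_s == 0]

# One reading per second; a transient spike happens at t=7.
series = [(t, 1.0) for t in range(0, 21)]
series[7] = (7, 99.0)  # the anomaly

scraped = sample_metrics(series, 10)                 # metrics scraped every 10 s
events = [(ts, v) for ts, v in series if v > 10.0]   # event capture above a threshold

print(scraped)  # [(0, 1.0), (10, 1.0), (20, 1.0)] — the spike is invisible
print(events)   # [(7, 99.0)] — the event records it
```

The 10-second scrape sees a perfectly flat line, while the event stream preserves the spike—hence the argument for storing both in the same time-series platform.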
And increasingly, when you have the amount of data that you're dealing with coming from these various sources, the measurement interval is getting smaller and smaller. So, unless you have, you know, perfect metrics coming in every half-second, or, you know, in some sub-partition of that, in milliseconds, you're likely to miss something. And so, events are really key to understanding those things that pop up and then maybe come back down, which in a pure metric setting, on your regular interval, you would have just completely missed. So, we see in most of our use cases that a balance of the two is kind of the most effective. And from a product perspective, that's how we think about solving the problem: addressing both.

Corey: One of the things that I struggled with is it seems that—again, my approach to this is relatively outmoded. I was a systems administrator back when that title was not considered disparaging by a good portion of the technical community the way that it is today. Even though the job is the same, we call them something different now. Great. Okay, whatever: smile, nod, and accept the larger paycheck.

But my way of thinking about things is: okay, you have the logs, they live on the server itself. And maybe if you want to be fancy, you wind up putting them into a centralized rsyslog cluster or whatnot. Yes, you might send them as well to some other processing system for visibility or a third-party monitoring system, but the canonical truth slash source of logs tends to live locally. That said, I got out of running production infrastructure before this idea of ephemeral containers or serverless functions really became a thing.
Do you find that these days you are the source of truth slash custodian of record for these log entries, or do you find that you are more of a secondary source for better visibility and analysis, but not what they're going to bust out when the auditor comes calling in three years?

Brian: I think, again, it—[laugh] I feel like I'm answering the same way [crosstalk 00:15:53]—

Corey: Yeah, oh, and of course, let's be clear, use cases are going to vary wildly. This is not advice on anyone's approach to compliance and the rest [laugh]. I don't want to get myself in trouble here.

Brian: Exactly. Well, you know, we kind of think about it in terms of profiles. And we see a couple of different profiles of customers using InfluxDB. The first—and this was kind of what we saw most often early on, and we still see quite a bit of them—is kind of more of that operator profile. These are folks who are building some sort of monitoring, kind of, source of truth that's internally facing, to monitor applications or services, perhaps that other teams within their company built.

And so that's, kind of like, a little bit more of your pure operator. Yes, they're building up the stack themselves, but it's to pay attention to essentially something that another team built. And then what we've seen more recently, especially as we've moved more prominently into the cloud and offered a usage-based service with, you know, APIs and endpoints people can hit, is we see more people come into it from a builder's perspective. It's similar in some ways, except that they're still building kind of a, you know, source of truth for handling this kind of data, but they're also building the applications and services themselves that are taken out to market and are in the hands of customers.

And so, it's a little bit different mindset.
Typically, there's, you know, a little bit more comfort with using one of many services to kind of, you know, be part of the thing that they're building. And so, we've seen a little bit more comfort from that type of profile, using our service running in the cloud, using the API, and not worrying too much about the kind of, you know, underlying setup of the implementation.

Corey: Love how serverless helps you scale big and ship fast, but hate debugging your serverless apps? With Lumigo's serverless observability, it's fast and easy (and maybe a little fun, too). End-to-end distributed tracing gives developers full clarity into their most complex serverless and containerized applications, connecting every service from AWS Lambda and Amazon ECS to DynamoDB, API Gateways, Step Functions and more. Try Lumigo free and debug 3x faster, reduce error rate and speed up development. Visit snark.cloud/lumigo. That's snark.cloud/L-U-M-I-G-O.

Corey: So, I've been on record a lot saying that the best database is TXT records stuffed into Route 53, which works super well as a gag—let's be clear, don't actually build something on top of this; that's a disaster waiting to happen. I don't want to destroy anyone's career as I do this. But you do have a much more viable competitive threat on the landscape, and that is quite simply using the open-source version of InfluxDB. What is the tipping point where, "Huh, I can run this myself," turns into, "But I shouldn't. I should instead give money to other people to run it for me"?

Because having been an engineer, where I believe I'm the world's greatest everything when it comes to my environment—a fact provably untrue, but that hubris never quite goes away entirely—at what point am I basically being negligent not to start dealing with you in a more formalized business context?

Brian: First of all, let me say that we have many customers, many developers out there who are running open-source, and it works perfectly for them.
The workload is just right, the deployment makes sense. And so, there are many production workloads that are using open-source. But typically, the kind of big turning point for people is on scale, and overall performance related to that. And so, that's typically when they come and look at one of the two commercial offers.

So, to start, open-source is a great place to, you know, kind of begin the journey, check it out, do that level of experimentation and kind of proof of concept. We also have 60,000-plus developers using our introductory cloud service, which is a free service. You simply sign up and can begin immediately putting data into the platform and building queries, and you don't have to worry about any of the setup and running servers to deploy software. So, both of those—the open-source and our cloud product—are excellent ways to get started. And then when it comes time to really think about building in production and moving up in scale, we have our two commercial offers.

The first of those is InfluxDB Cloud, which is our cloud-native, fully-managed-by-InfluxData offering. We run this not only in AWS but also in Google Cloud and Microsoft Azure. It's a usage-based service, which means you pay exactly for what you use, and the three components that people pay for are data in, number of queries, and the amount of data you keep in storage. For those who are interested in actually managing it themselves, we also have InfluxDB Enterprise, which is a software subscription-based model, and it is self-managed by the customer in their environment. Now, that environment could be their own private cloud; it also could be on-premises in their own data center.
In the case of Cloud, I mentioned earlier with the new storage engine, you can hit unlimited cardinality, which means you have no limit on the number of time series you can put into the platform, which is a pretty big game-changing concept. And so, that means however many time-series sources you have and however many series they're emitting, you can run that without a problem without any sort of upper limit in our cloud product. Over on the enterprise side with our self-managed product, that means you can deploy a cluster of whatever size you want. It could be a two-by-four, it could be a four-by-eight, or something even larger. And so, it gives people that are managing in their own private cloud or in a data center environment, really their own options to kind of construct exactly what they need for their particular use case.Corey: Does your object storage layer make it easier to dynamically change clusters on the fly? I mean, historically, running things in a pre-provisioned cluster with EBS volumes or local disk was, “Oh, great. You want to resize something? Well, we're going to be either taking an outage or we're going to be building up something, migrating data live, and there's going to be a knife-switch cutover at some point that makes things relatively unfortunate.” It seems that once you abstract the storage layer away from anything resembling an instance that you would be able to get away from some of those architectural constraints.Brian: Yeah, that's really the promise, and what is delivered in our cloud product is that you no longer, as a developer, have to think about that if you're using that product. You don't have to think about how big the cluster is going to be, you don't have to think about these kind of disaster scenarios. It is all kind of pre-architected in the service. 
And so, that's the thing that we really want to deliver to people, in addition to the elimination of that concern for what the underlying infrastructure looks like and how it's operating. And so, with infrastructure concerns kind of out of the way, what we want to deliver on are kind of the things that people care most about: real-time query speed.

So, now with this new storage engine, you can query data across any time series within milliseconds—100-times-faster queries against high-cardinality data, which was previously impossible. And we also have unlimited time-series volume. Again, any total number of time series you have, which is known as cardinality, is now able to run without a problem in the platform. And then we're also kind of opening up the aperture a bit for developers with SQL language support. And so, this is just a whole new world of flexibility for developers to begin building on the platform. And again, this is all in the way that people are using the product, without having to worry about the underlying infrastructure.

Corey: For most companies—and this does not apply to you—their core competency is not running time-series databases and the infrastructure attendant thereof, so it seems like it is absolutely a great candidate for, "You know, we really could make this someone else's problem and let us instead focus on the differentiated thing that we are doing or building or complaining about."

Brian: Yeah, that's a true statement. Typically what happens with time-series data is that people, first of all, don't realize they have it, and then when they realize they have time-series data, you know, the first thing they'll do is look around and say, "Well, what do I have here?" You know, I have this relational database over here, or this document database over here, maybe even this, kind of, search database over here; maybe that thing can handle time series. And in a light manner, it probably does the job.
But like I said, the sources of data and just the volume of time series are expanding, really across all these different use cases, exponentially. And so, pretty quickly, people realize that the thing that may be able to handle time series in some minor manner is quickly no longer able to do it. They're just not purpose-built for it. And so, that's where they really come to a product like Influx to handle this specific problem. We're built specifically for this purpose, and so as the time-series workload expands, when it kind of hits that tipping point, you really need a specialized tool.

Corey: Last question, before I turn you loose to prepare for re:Invent, of course—well, I guess we'll ask you a little bit about that afterwards, but first, we can talk a lot theoretically about what your product could or might theoretically do. What are you actually seeing? What are the use cases—other than the stereotypical ones we've talked about—what have you seen people using it for that surprised you?

Brian: Yeah, some of it is—it's just really interesting how it connects to, you know, things you see every day and/or use every day. I mean, chances are, many people listening have probably used InfluxDB and, you know, perhaps didn't know it. You know, if anyone has been to a home that has Tesla Powerwalls—Tesla is a customer of ours—then they've seen InfluxDB in action. Tesla's pulling time-series data from these connected Powerwalls that are in solar-powered homes, and they monitor things like health and availability and performance of those solar panels and the battery setup, et cetera. And they're collecting this at the edge and then sending that back into the hub, where InfluxDB is running on their back end.

So, if you've ever seen this deployed, that's InfluxDB running behind the scenes. Same goes—I'm sure many people have a Nest thermostat in their house. Nest monitors the infrastructure that actually powers that IoT data collection.
So, you can think of this as InfluxDB running behind the scenes to monitor what infrastructure is standing up that back-end Nest service. And this includes their use of Kubernetes and other software infrastructure that's run in their platform for collecting, managing, transforming, and analyzing all of this aggregate device data that's out there.

Another one, especially for those of us that streamed our minds out during the pandemic: Disney+—entertainment, streaming, and delivery of that to applications and to devices in the home. And so, you know, this hugely popular Disney+ streaming service is essentially a global content delivery network for distributing all these, you know, movies and video series to all the users worldwide. And they monitor the movement and performance of that video content through this global CDN using InfluxDB. So, those are a few where you probably walk by something like this multiple times a week—or, in the case of Disney+, probably watching it once a day. And it's great to see InfluxDB kind of working behind the scenes there.

Corey: It's one of those things where it's, I guess we'll call it plumbing, for lack of a better term. It's not the sort of thing that people are going to put front-and-center into any product or service that they wind up providing, you know, except for you folks. Instead, it's the thing that empowers a capability behind that product or service that is often taken for granted, just because until you understand the dizzying complexity, particularly at scale, of what these things have to do under the hood, it's just—well yeah, of course it works that way. Why shouldn't it? That's an expectation I have of the product because it's always had that. Yeah, but this is how it gets there.

Brian: Our thesis really is that data is best understood through the lens of time.
And as this data is expanding exponentially, time becomes increasingly the, kind of, common element, the common component that you're using to kind of view what happened. That could be what's running through a telecom network, what's happening with the devices that are connected to that network, the movement of data through that network and when, what's happening with subscribers and content pushing through a CDN on a streaming service, what's happening with climate and home data in hundreds of thousands, if not millions of homes through a common device like a Nest thermostat. All of these things attach to some real-world collection of data, and as long as that's happening, there's going to be a place for time-series data and tools that are optimized to handle it.Corey: So, my last question—for real this time—we are recording this the week before re:Invent 2022. What do you hope to see, what do you expect to see, what do you fear to see?Brian: No fears. Even though it's Vegas, no fears.Corey: I do have the super-spreader event fear, but that's a separate—Brian: [laugh].Corey: That's a separate issue. Neither one of us are deep into the epidemiology weeds, to my understanding. But yeah, let's just bound this to tech, let's be clear.Brian: Yeah, so first of all, we're really excited to go there. We'll have a pretty big presence. We have a few different locations where you can meet us. We'll have a booth on the main show floor, we'll be in the marketplace pavilion—as I mentioned, InfluxDB Cloud is offered across the marketplaces of each of the clouds, AWS, obviously in this case, but also in Azure and Google.
But we'll be there in the AWS Marketplace pavilion, showcasing the new engine and a lot of the pretty exciting new use cases that we've been seeing.And we'll have our full team there, so if you're looking to kind of learn more about InfluxDB, or you've checked it out recently and want to understand kind of what the new capability is, we'll have many folks from our technical teams there: from our development team, some of our field folks like the SEs, and some of the product managers will be there as well. So, we'll have a pretty great collection of experts on InfluxDB to answer any questions and walk people through, you know, demonstrations and use cases.Corey: I look forward to it. I will be doing my traditional Wednesday afternoon tour through the expo halls and nature walk, so if you're listening to this and it's before Wednesday afternoon, come and find me. I am kicking off and ending at the [unintelligible 00:29:15] booth, but I will make it a point to come by the Influx booth and give you folks a hard time because that's what I do.Brian: We love it. Please. You know, being on the tour—on the walking tour—is excellent. We'll be mentally prepared. We'll have some comebacks ready for you.Corey: Therapists are standing by on both sides.Brian: Yes, exactly. Anyway, we're really looking forward to it. This will be my third year on your walking tour. So, the nature walk is one of my favorite parts of AWS re:Invent.Corey: Well, I appreciate that. Thank you. And thank you for your time today. I will let you get back to your no doubt frenzied preparations. At least they are on my side.Brian: We will. Thanks so much for having me and really excited to do it.Corey: Brian Mullen, CMO at InfluxData, I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that you naively believe will be stored as a TXT record in a DNS server somewhere rather than what is almost certainly a time-series database.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Couchbase and the Evolving World of Databases with Perry Krug
Nov 28, 2022 (34:21)

About Perry
Perry Krug currently leads the Shared Services team, which is focused on building tools and managing infrastructure and data to increase the productivity of Couchbase's Sales and Field organisations. Perry has been with Couchbase for over 12 years and has served in many customer-facing technical roles, helping hundreds of customers understand, deploy, and maintain Couchbase's NoSQL database technology. He has been working with high-performance caching and database systems for over 15 years.

Links Referenced:
Couchbase: https://www.couchbase.com/
Perry's LinkedIn: https://www.linkedin.com/in/perrykrug/

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: InfluxDB is the smart data platform for time series. It's built from the ground-up to handle the massive volumes and countless sources of time-stamped data produced by sensors, applications, and systems.
You probably think of these as logs. InfluxDB is programmable and performant, has a common API across the platform, and handles high granularity data–at scale and with high availability. Use InfluxDB to build real-time applications for analytics, IoT, and cloud-native services, all in less time and with less code. So go ahead–turn your apps up to 11 and start your journey to Awesome for free at InfluxData.com/screaminginthecloud Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's episode is a promoted guest episode brought to us by our friends at Couchbase. Now, I want to start off by saying that this week is AWS re:Invent. And there is Last Week in AWS swag available at their booth. More on that to come throughout the next half hour or so of conversation. But let's get right into it. My guest today is Perry Krug, Director of Shared Services over at Couchbase. Perry, thanks for joining me.Perry: Hey, Corey, thank you so much for having me. It's a pleasure.Corey: So, we're recording this before re:Invent, so the fact that we both have, you know, personality and haven't lost our voices yet should probably be a bit of a giveaway on this. But I want to start at the very beginning because unlike people who are academically successful, I tend to suck at doing the homework, across the board. Couchbase has been around for a long time. We've seen the company do a bunch of different things, most importantly and notably, sponsoring my ridiculous nonsense for which I thank you. But let's start at the beginning. What is Couchbase?Perry: Yeah, you're very welcome, Corey. And it's again, it's a pleasure to be here. So, Couchbase is an enterprise database company at the very top level. We make database software and we distribute that to our customers.
We have two flavors, two ways of getting your hands on it.One is the kind of legacy, what we call self-managed, where you the user, the customer, downloads the software, installs it themselves, sets it up, manages the cluster monitoring, scaling all of that. And that's, you know, a big part of our business. Over the last few years we've identified, and certainly others in the industry have, as well the desire for users to access database and other technology in a hosted Software-as-a-Service pay-as-you-go, cloud-native, buzzword, et cetera, et cetera, vehicle. And so, we've released the Couchbase Capella, which is our fully managed, fully hosted database-as-a-service, running in—currently—Amazon and Google, soon to be Azure as well. And it wraps and extends our core Couchbase Server product into a, as I mentioned, hosted and managed platform that our users can now come to and consume as developers and build their applications while leaving all of the operational and administration—monitoring, managing failover expansion, all of that—to us as the experts.Corey: So, you folks are non-relational database, NoSQL in the common parlance, which is odd because they call it NoSQL, yet. They keep making more of them, so I feel like that's sort of the Hollywood model where okay, that was so good. We're going to do it again. Where did NoSQL come from? Because back when I was learning databases, when dinosaurs roamed the earth, it was all about relational models, like we're going to use a relational database because when the only tool you have is an axe, every problem looks like hours of fun. What gave rise to this, I guess, Cambrian explosion that we've seen of NoSQL options that proliferate o'er the land?Perry: Yeah, a really, really good question, and I like the axe-throwing metaphor. So sure, 20, 30, 40 now years ago, as digital applications needed a place to store their data, the world invented relational databases. 
And those were used and continue to be used very well for what they were designed for: for data that follows a very strict structure that doesn't need to be served at significant scale, does not need to be replicated geographically, does not need to handle data coming in from different sources and those sources changing their formats of things all the time. And so, I'm probably as old as you are and been around when the dinosaurs were there. We remember this term called ‘Web 2.0.' Kids, you're going to have to go look that up in the dictionary or TikTok it or something.But Web 2.0 really was the turning point when websites became web applications. And suddenly, there was the introduction of MySpace and Facebook and Amazon and Google and LinkedIn, and a number of others, and they realized that relational databases were not going to meet their needs, whether it be performance, whether it be flexibility, whether it be changing of data models, whether it be introducing new features at a rapid pace. They tried; they stretched them, they added a bunch of different databases together, and it really was not going to be a viable solution. So, 10, now maybe 15 years ago, you started to see the rise of these tech giants—although we didn't call them tech giants back then but they were the precursors to today's—inventing their own new databases.So, Amazon had theirs, Google had theirs, LinkedIn, and a number of others. These companies had reached a level of scale and reached a level of user base, had reached a level of data requirement, had reached a level of expectation with their customers. These customers, us, the users, us consumers, we expect things to be fast, we expect them to be always available. We expect Facebook to give us our news feed in milliseconds. We expect Google to give us our website or our search results immediately, with more and more information coming along with them.And so, it was these companies that hit those requirements first.
The only solution for them was to start from scratch and rewrite their own databases. Fast forward five, six, seven years, and we as consumers turned around and said, “Look, I really liked the way Facebook does things. I really like the way Google does things. I really like the way Amazon does things.“Bank of America, can you do the same? IRS, can you do the same? Health care vendor number one, two, three, and four, government body, can you all give me the same experience? I want my taxi to tell me exactly where it's going to take me from one place to another, I want it to give me a receipt immediately after I finish my ride. Actually, I want to be able to change my payment method after I paid for that ride because I used the wrong one.”All of these are expectations that we as consumers have taken from the tech giants—Apple, LinkedIn, Facebook—and turned around to nearly every other service that we interact with on a daily basis. And all of a sudden, the requirements that Facebook had, that Google had, that no other company had, you know, outside of the top five, suddenly were needed by every single industry, nearly every single company, in order to be competitive in their markets.Corey: And there's no way to scale relational to get to a point where it can wind up handling those type workloads efficiently?Perry: Correct, correct. And it's not just that the technology cannot do it—everything is technically feasible—but the cost both financially and time-to-market-wise in order to do that in a relational database was untenable. It either cost too much money, or it costs too much developers time, or cost too much of everybody's time to try to shoehorn something into it. And then you have the rise of cloud and containers, which relational databases, you know, never even had the inkling of a thought that they might need to be able to handle someday. 
And so, these requirements that consumers have placed on everything else that they interact with really led to the rise of NoSQL as a commodity or as a database for the masses.LinkedIn is not in the business of developing a database and then selling it to everybody else to use as a database, right? They built it for themselves, they made their service better. And so, what you see is some of those founding fathers created databases, but then had no desire to sell them to others. And then after that followed the rise of companies like Couchbase and a number of others who said, “Look, we think we can provide those capabilities, we think we can meet those requirements for everybody.” And thereby rose the plethora of NoSQL databases because everybody had a little bit different of an approach to it.If you ask ten people what NoSQL is about, you're going to get eleven or twelve different answers. But you can kind of distill that into two categories. One is performance and operations. So, I need it to be faster, I need it to be scalable, I need it to be replicated geographically. And that's what NoSQL is to me. And that's the right answer.And so, you have things like Cassandra and Redis that are meant to be fast and scalable and replicated. You ask another group and they're going to tell you, “No, no, no. NoSQL needs to be flexible. I need to get rid of the rigid database schemas, I need to bring JSON or other data formats in and munge all this data together and create something cool and new out of it.” And thereby you have the rise of things like MongoDB, who focused nearly exclusively on the developer experience of working with data.And for a long time, those two were in opposite camps, where you have the databases that did performance and the databases that did flexibility.
I'm not here to say that Couchbase is the ultimate kitchen sink for everything, but we've certainly tried to approach both of those challenges together so that you can have something that scales and performs and can be flexible enough in data model. And everybody else is trying to do the same thing, right? But all these databases are competing for that same nirvana of the best of both worlds.Corey: And it almost feels like there's a convergence play in place where everything now is trying to go away from the idea of, “Oh, yeah, we started off as a purpose-built database, but you can use this for everything.” And I don't necessarily know that is going to be the path that a lot of companies want to go down. Where do you view Couchbase as, I guess, falling down? In other words, what workloads is Couchbase inappropriate for?Perry: Yeah, that's a good question. And my [crosstalk 00:10:35]—Corey: Anyone who can't answer that one is a zealot and that's one of those okay, let's be very careful and not take our eyes off you for one second, while smiling and backing away slowly.Perry: Let's cut to commercial. No, I mean, there certainly are workloads that, you know, in the past we've not been good for, that we've made improvements to address. There are workloads that we have not addressed well to date that we will try to address in the future, and there are workloads that we may never see as fitting in our wheelhouse. The biggest category that comes to mind is that Couchbase is not an archival database. We are not meant to have data put in us that you don't care about, that you don't want to—that you just need to keep around, but you don't ever need to access.And there are systems that do that well, they do that at a solid total cost of ownership. Couchbase is meant for operational data. It's meant for data that needs to be interacted with, read and/or written, at scale and at a reasonable performance to serve a user-facing or system-facing application.
And we call ourselves a general-purpose database. Mongo and others call themselves that as well. Oracle calls itself a general-purpose database, and yet, not everybody uses Oracle for everything.So, there are reasons that you—Corey: Who could afford that?Perry: Who could? Exactly. It comes down to cost, ultimately. So, I'm not here to say that Couchbase does everything. We like to think, and we're trying to target and strive towards, an 80%, right? If we can do 80% of an application or an organization's workloads, there is certainly room for 20% of other workloads, other applications, other requirements that can be met or need to be met by purpose-built databases.But if you rewind four or five years, there was this big push towards polyglot persistence. It's a buzzword that came and kind of has gone out of fashion, but it presented the idea that everybody is going to use 15 different databases and everybody is going to pick the right one for exactly the workload and they're going to somehow stitch them all together. And that really hasn't come to fruition either. So, I think there's some balance, where it's not one to rule them all, but it's also not 15 for every company. Some organizations just have a set of requirements that they want to be met and our database can do that.Corey: Let's continue our tour of the competitive landscape here now that we've handled the relational side of the world. The best database, as anyone who's listened to this show knows, is of course, Amazon's Route 53 TXT records stuffed into DNS, especially in the NoSQL land. Clearly, you're all fighting for second place after that. How do you stack up against the idea of legitimately using that approach? And for those who are not in on the joke, please don't do this. It is not the right answer. But I'm curious to get your take as to why DNS TXT records are an inappropriate NoSQL option.Perry: Well, it's a joke, right? And let's be clear about that.
But—Corey: I have to say that because otherwise, someone tries it in production. I've gotten that wrong a few times, historically, so now I put a disclaimer in because yeah, it's only funny, so long as people are in on the joke. If not, and I lead someone down the primrose path to disaster, I feel bad. So, let's be very clear. We're kidding.Perry: And I'm laughing. I'm laughing here behind the camera. I am. I am.Corey: Yeah.Perry: So, the element of truth that I think Couchbase is in a position, or I'm in a position to kind of talk about is, 12 years ago, when Couchbase started, we were a key-value database and that's where we saw the best part of the market in those days, and where we were able to achieve the best scale and replication and performance, and fairly quickly realized that simple key-value, though extremely valuable and easy to manage, was not broad enough in requirements-meeting. And that's where we set our sights on and identified the larger, kind of, document database group, which is really just a derivative of key-value, where still everything is a key and a value; it's just now a document that you can reason about, that you can create an index on, that you can query, that you can run full-text search on, you can do much more with the data. So, at our core, we are still a key-value database. When that value is JSON, we become a document database. And so, if Route 53 decided that they wanted to enter into the document database market, they would need to be adding things that allowed you to introspect and ask questions of the data within that text which you can't, right?Corey: Well, not with that attitude. But yeah, I agree with you.Perry: [laugh].Corey: Moving up the stack, let's talk about a much more fearsome competitor here that I'm certain you see an awful lot of deals that you wind up closing, specifically your own open-source product. 
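Perry's framing—that a document database is still a key-value store, just one whose values you can introspect and query—can be sketched with a toy in-memory model. The dictionary and field-filter below are a hypothetical illustration, not Couchbase's actual API:

```python
# Toy illustration: a key-value store becomes a "document" store once
# the values are structured (JSON-like) and can be queried by field.
# Hypothetical in-memory sketch, not Couchbase's API.

store = {
    "user::1": {"name": "Ada", "city": "London"},
    "user::2": {"name": "Grace", "city": "Arlington"},
}

# Key-value access: opaque lookup by key.
doc = store["user::1"]

# Document access: introspect the values, e.g. "find users in London".
londoners = [k for k, v in store.items() if v.get("city") == "London"]
print(londoners)  # ['user::1']
```

A real document database adds indexes, a query language, and full-text search on top of exactly this shape of data, which is the "much more with the data" Perry alludes to.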
You historically have wound up selling software into environments, I believe, you referred to as your legacy offering, where it's the self-hosted version of your commercial software. And now of course, you also have Capella, your cloud-hosted version. But open-source looks surprisingly compelling for an awful lot of use cases and an awful lot of folks. What's the distinction?Perry: Sure. Just to correct the distinction a little bit, we have Couchbase Server, which we provide as what we call self-managed, where you can download it and install it yourself. Now, you could do that with the open-source version or you could do that with our Enterprise Edition. What we've then done is wrapped that Enterprise Edition in a hosted model, and that's Capella. So, the open-source version is something we've long been supporters of; it's been a core part of our go-to-market for the last 12 or 13 years or so and we still see it as a strong offering for organizations that don't need the added features, the added capabilities, don't need the support of the experts that wrote the software behind them.Certainly, we contribute and support our community through our forums and Discord and other channels, but that's a very big difference from two o'clock in the morning, something's not working, and I need a ticket to track. We don't do that for our community edition. So, we see lots of users downloading that, picking it up and building it into their applications, especially applications that are in their infancy or with organizations that simply can't afford the added cost and therefore don't get the added benefit. We're not here to gouge and carve out every dollar that we can, but if you need the benefit that we can provide, we think there's value in that and that's what we're trying to run a business as.Corey: Oh, absolutely.
It doesn't work when you're trying to wind up charging a license fee for something that someone is doing as a spare-time project for funsies just to learn the technology. It's like, and then you show up. It's like, “That'll be $700. Surprise.”Yeah, that's sort of the AWS billing model approach, where—it's not a viable onramp for most folks. So, the open-source direction down there makes sense. Counterpoint. If you're running a bank on top of it, “Well, we're running it ourselves and really hoping for the best. I mean, we have access to the code and all.” Great, but there are times you absolutely want some of the best minds in the world, with respect to that particular product, able to help troubleshoot so the ATMs start working again before people riot in the streets.Perry: Yeah, yeah. And ultimately, it's a question of core competencies. Are you an organization that wants to be in the database development market? Great, by all means, we'd love to support you in that. If you want to focus on doing what you do best, be it a bank or an e-commerce website, you worry about your application, you let us worry about the database, and everybody gets along very well.Corey: There's definitely something to be said for outsourcing some of the pain, some of the challenge around an awful lot of it.Perry: There's a natural progression to the cloud for that and Software-as-a-Service, database-as-a-service, where you're now outsourcing even more by running on our hosting platform. No longer do you have to download the binary and install it yourself, no longer do you have to set up the cluster and watch it in case it has a blip or a statistic goes up too far. We're taking care of that for you. So yes, you're paying for that service, but you're getting the value of not having to be a database manager, let alone database developer, for them.Corey: Love how serverless helps you scale big and ship fast, but hate debugging your serverless apps?
With Lumigo's serverless observability, it's fast and easy (and maybe a little fun, too). End-to-end distributed tracing gives developers full clarity into their most complex serverless and containerized applications, connecting every service from AWS Lambda and Amazon ECS to DynamoDB, API Gateways, Step Functions and more. Try Lumigo free and debug 3x faster, reduce error rate and speed up development. Visit snark.cloud/lumigo That's snark.cloud/L-U-M-I-G-OCorey: What is the point of distinction between Couchbase Server and Couchbase Capella? To be clear, your self-hosted versus managed cloud offerings. When is one appropriate versus the other?Perry: Well, I'm supposed to say that Capella is always the appropriate choice, but there are currently a number of situations where Capella is not available in particular regions or cloud providers, and so downloading and running the software yourself, certainly in your own—yes, there are people who still run their own data centers. I know it's taboo and we don't like to talk about that, but there are people who have on-premise. And so, Couchbase Capella is not available for them. But Couchbase Server is the original Couchbase database and it is the core of Couchbase Capella. So, wrapping is not giving it enough credit; we use Couchbase Server to power Couchbase Capella.And so, there's an enormous amount of value added around the core database, but ultimately, it's what's behind the scenes of Couchbase Capella. Which I think is a nice benefit in that when an application is connecting to either one, it gets the same experience. You can point an application at one versus the other and because it's the same database running behind the scenes, the behavior, the data model, the query language, the APIs are all the same, so it adds a nice level of flexibility for customers that are either moving from one to another or have to have some sort of hybrid approach, which we see in the market today.Corey: Let's talk economics for a second.
I can see scenarios where especially you have a high volume environment where you're sending tremendous amounts of data back and forth and as soon as it crosses an availability zone boundary or a region boundary, or God forbid, goes out to the internet via standard egress fees over in AWS-land, there's a radically different economic modeling that comes into play as opposed to having something in the same availability zone, in the same subnet just where that—or all traffic back and forth is free. Do you see that in your customer base, that that is a model that is driving people towards self-hosting?Perry: No. And I'd say no because Capella allows you to peer and run your application in the same availability zone as the database. And so, as long as that's an option for you—that we have, you know, our offering in the right region, in the right AZ, and you can put your application there—then that's not an issue. We did have a customer not too long ago that didn't set that up correctly, they thought they did, and we noticed some high data transfer charges. Again, the benefit of running a hosted service: we detected that for them and were able to turn around and say, “Hmm, you might want to change this to over there so that we all save some money in doing so.”If we were not there watching it, they might not have noticed that themselves if they were running it self-managed; they might not have known what to do about it. And so, there's a benefit to working with us and using that hosted platform that we can keep an eye out. And we can apply all of our learning and best practices and bug fixes, we give that to everybody, rather than each person having to stumble across those hurdles themselves.Corey: That's one of those fun, weird corner-case trivia things about AWS data transfer. When you're transferring data within the same region between availability zones, it costs a penny on the sending side and a penny on the receiving side. Everything else is one side or the other that winds up getting the charge. And what makes this especially fun is that when it shows up on your bill, if you transfer a petabyte, it shows as cross-AZ data transfer: two petabytes.
Everything else is one side or the other that winds up getting the charge. And what makes this especially fun is that when it shows up on your bill, if you transfer a petabyte, it shows as cross-AZ data transfer: two petabytes.Perry: Two. Yeah.Corey: So, it double-counts so they can bill for it appropriately, but it leads to some really weird hunting it down, like, “Okay, well, we found half of it, but where's the other half hiding?” It's always obnoxious to trace this stuff down. The fact that you see it on your bill, well, that's testament to the fact that yeah, they're using the service. Good for them and good for you. Being able to track it down on a per-customer basis that does speak to your level of insight into what exactly is going on in your environment and where. As someone who does this for a living, let me confirm that is absolutely non-trivial.Perry: No, definitely not trivial. And you know, we've learned over the last four or five years, we've learned an enormous amount about how cloud providers work, how AWS works, but guess what, Google does it completely differently. And Azure does it—Corey: Yep.Perry: —completely differently. And so, on the surface level, they're all just cloud providers and they give you a VM, and you put some stuff on it, but integrating with the APIs, integrating with the different systems and naming of things, and then understanding the intricacies of the ins and outs, and, yeah, these cloud providers have their own bugs as well. And so, sometimes you stumble across that for them. And it's been a significant learning exercise that I think we're all better off for, having Couchbase gone through it for you.Corey: Let's get this a little bit more germane for this week for those of you who are listening to this during re:Invent. 
You folks are clearly here at the show—it's funny to talk about ‘here,' even though when we're recording this, it is not near here; we're actually home and enjoying ourselves, but welcome to temporal dislocation; here we are—here at the show, you folks are—among other things—being kind enough to pass out the Last Week in AWS swag from your booth, which, thank you. So, that is obviously the primary reason that you were at the show. What are the other reasons? What are the secondary reasons that you decided to come here?Perry: Yeah [laugh]. Well, I guess I have to think about this now since you already called out the primary reason.Corey: Exactly. Wait, we can have more than one reason for things? My God.Perry: Can we? Can we? AWS has long been a huge partner of ours, even before Capella itself was released. I remember sometime in, you know, five years or so ago, some 30% of our customers were running Couchbase inside of AWS, and some of our largest were some of your largest at times, like Viber, the messaging platform. And so, we've always had a very strong relationship with AWS, and the better that we can be presenting ourselves to your customers, and our customers can feel that we are jointly supporting them, the better. And so, you know, coming to re:Invent is a testament to that long-standing and very solid partnership, and also it's meant to get more exposure for us to let it be clear that Couchbase runs very well on AWS.Corey: It's one of those areas where when someone says, “Oh yeah, this is a great service offering, but it doesn't run super well on AWS.” It's like, “Okay, so are you bad computers or is what you have built so broken and Byzantine that it has to live somewhere else?” Or occasionally, the use case is absolutely not supported by AWS. Not to beat them up some more on their egress fees, but I'm absolutely about to if you're building a video streaming site, you don't want it living in AWS. It won't run super well there. 
Well, it'll run well, it'll just run extortionately expensively and that means that it's a non-starter.
Perry: Yeah, why do you think Netflix raises their fees?
Corey: Netflix, to their credit, has been really rather public about this, where they do all of their egress via their Open Connect, custom-built CDN appliances that they drop all over the place. They don't stream a single byte from AWS, and we know this from the outside because they are clearly still solvent.
Perry: [laugh].
Corey: I do the math on that. So, if I had been streaming at on-demand prices one month with my Netflix usage, I would have wound up spending four times my subscription fee just in their raw costs for data transfer. And I have it on good authority that is not just data transfer that is their only bill in the entire company; they also have to pay people and content and the analytics engine and whatnot. And it's kind of a weird, strange world.
Perry: Real estate.
Corey: Yeah. Because it's one of those strange stories because they are absolutely a showcase customer for AWS. They've been a marquee customer trotted out year after year to talk about what they're doing. But if you attempt to replicate their business purely on top of AWS, it will not work. Full stop. The economics preclude that happening.
What is your philosophy these days on what historically has felt like an existential threat to most vendors that I've spoken to in a variety of ways: what if Amazon decides to enter your market? I'd ask you the same thing. Do you have fears that they're going to wind up effectively taking your open-source offering and turning it into Amazon Basics Couchbase, for lack of a better term?
Is that something that is on your threat radar, or is that not really something you concern yourselves about?
Perry: So, I mean, there's no arguing, there's no illusion that Amazon and Google and Microsoft are significant competitors in the database space, along with Oracle and IBM and Mongo and a handful of others.
Corey: Anything's a database if you hold it wrong.
Perry: This is true. This specific point of open-source is something that we have addressed in the same ways that others have addressed. And that's by choosing and changing our license model so that it precludes cloud providers from using the open-source database to produce their own service on the back of it. Let me be clear, it does not impact our existing open-source users and anybody that wants to use the Community Edition or download the software, the source code, and build it themselves. It's only targeted at Amazon because they have a track record of doing that to things like Elastic and Redis and Mongo, all of whom have made moves similar to Couchbase's to prevent that by the licensing of the open-source code.
Corey: So, one of the things I do see at re:Invent every year is—and I believe wholeheartedly this comes historically from a lot of AWS's requirements for vendors on the show floor that have become public through a variety of different ways—where for a long time, you were not allowed to mention multi-cloud or reference the fact that you work on any other cloud provider there. So, there's been a theme of this is why, for whatever it is we sell or claim to sell or hope one day to sell, AWS is the absolute best place for you to run it, full stop. And in some cases, that's absolutely true because people build primarily for a certain cloud provider and then when they find customers in other places, they learn to run it over there, too.
If I'm approaching this from the perspective of I have a database problem—because looking at my philosophy on databases, it's hard to imagine I don't have database problems—then is my experience going to be better or even materially different between any of the cloud providers if I become a Couchbase Capella customer?
Perry: I'd like to say no. We've done our best to abstract and to leverage the best of all of the cloud providers underneath to provide Couchbase in the best form that they will allow us to. And as far as I can see, there's no difference amongst those. Your application and what you do with the data, that may be better suited to one provider or another, but it's always been Couchbase's philosophy—sort of, say, strategy—to make our software available wherever our customers and users want to consume it. And that goes everything from physical hardware running in a data center, virtual machines on top of that, containers, cloud, and different cloud providers, different regions, different availability zones, all the way through to edge and other infrastructures. We're not in a position to say, “If you want Couchbase, you should use AWS.” We're in a position to say, “If you are using AWS, you can have Couchbase.”
Corey: I really want to thank you for being so generous with your time, and of course, your sponsorship dollars, which are deeply appreciated. Once again, swag is available at the Couchbase booth this week at re:Invent. If people want to learn more and if for some unfathomable reason, they're not at re:Invent, probably because they make good life choices, where can they go to find you?
Perry: couchbase.com. That'll be the best place to land on. That takes you to our documentation, our resources, our getting help, our contact pages, directly into Capella if you want to sign in or log in. I would go there.
Corey: And we will, of course, put links to that in the show notes. Thank you so much for your time.
I really appreciate it.
Perry: Corey, it's been a pleasure. Thank you for your questions and banter, and I really appreciate the opportunity to come and share some time with you.
Corey: We'll have to have you back in the near future. Perry Krug, Director of Shared Services at Couchbase. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment berating me for being nowhere near musical enough when referencing [singing] Couchbase Capella.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Art and Science of Database Innovation with Andi Gutmans

Nov 23, 2022 37:07


About Andi
Andi Gutmans is the General Manager and Vice President for Databases at Google. Andi's focus is on building, managing and scaling the most innovative database services to deliver the industry's leading data platform for businesses. Before joining Google, Andi was VP Analytics at AWS running services such as Amazon Redshift. Before his tenure at AWS, Andi served as CEO and co-founder of Zend Technologies, the commercial backer of open-source PHP.
Andi has over 20 years of experience as an open source contributor and leader. He co-authored open source PHP. He is an emeritus member of the Apache Software Foundation and served on the Eclipse Foundation's board of directors. He holds a bachelor's degree in Computer Science from the Technion, Israel Institute of Technology.
Links Referenced:
LinkedIn: https://www.linkedin.com/in/andigutmans/
Twitter: https://twitter.com/andigutmans
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn.
This promoted episode is brought to us by our friends at Google Cloud, and in so doing, they have gotten a guest to appear on this show that I have been low-key trying to get here for a number of years. Andi Gutmans is VP and GM of Databases at Google Cloud. Andi, thank you for joining me.
Andi: Corey, thanks so much for having me.
Corey: I have to begin with the obvious. Given that one of my personal passion projects is misusing every cloud service I possibly can as a database, where do you start and where do you stop as far as saying, “Yes, that's a database,” so it rolls up to me and, “No, that's not a database, so someone else can deal with the nonsense?”
Andi: I'm in charge of the operational databases, so that includes both the managed third-party databases such as MySQL, Postgres, SQL Server, and then also the cloud-first databases, such as Spanner, Big Table, Firestore, and AlloyDB. So, I suggest that's where you start because those are all awesome services. And then what doesn't fall underneath, kind of, that purview are things like BigQuery, which is an analytics, you know, data warehouse, and other analytics engines. And of course, there's always folks who bring in their favorite, maybe, lesser-known or less popular database and self-manage it on GCE, on Compute.
Corey: Before you wound up at Google Cloud, you spent roughly four years at AWS as VP of Analytics, which is, again, one of those very hazy type of things. Where does it start? Where does it stop? It's not at all clear from the outside. But even before that, you were, I guess, something of a legendary figure, which I know is always a weird thing for people to hear.
But you were partially at least responsible for the Zend Framework in the PHP world, which I didn't realize what the heck that was, despite supporting it in production at a couple of jobs, until after I, for better or worse, was no longer trusted to support production environments anymore.
Which, honestly, if you can get out, I'm a big proponent of doing that. You sleep so much better without a pager. How did you go from programming languages all the way on over to databases? It just seems like a very odd mix.
Andi: Yeah. No, that's a great question. So, I was one of the core developers of PHP, and you know, I had been in the PHP community for quite some time. I also helped ideate the Zend Framework. Zend Technologies, the company that, you know, I co-founded, was kind of the company behind PHP.
So, like Red Hat supports Linux commercially, we supported PHP. And I was very much focused on developers, programming languages, frameworks, IDEs, and that was, you know, really exciting. I had also done quite a bit of work on interoperability with databases, right, because behind every application, there's a database, and so a lot of what we focused on is great connectivity to MySQL, to Postgres, to other databases, and I got to kind of learn the database world from the outside from the application builders. We sold our company in, I think it was 2015, and so I had to kind of figure out what's next. And so, one option would have been, hey, stay in programming languages, but what I learned over the many years that I worked with application developers is that there's a huge amount of value in data.
And frankly, I'm a very curious person; I always like to learn, so there was this opportunity to join Amazon, to join the non-relational database side, and take myself completely out of my comfort zone. And actually, I joined AWS to help build the graph database Amazon Neptune, which was even more out of my comfort zone than even probably a relational database. So, I kind of like to do different things and so I joined and I had to learn, you know, how to build a database pretty much from the ground up.
I mean, of course, I didn't do the coding, but I had to learn enough to be dangerous, and so I worked on a bunch of non-relational databases there such as, you know, Neptune, Redis, Elasticsearch, DynamoDB Accelerator. And then there was the opportunity for me to actually move over from non-relational databases to analytics, which was another way to get myself out of my comfort zone.
And so, I moved to run the analytics space, which included services like Redshift, like EMR, Athena, you name it. So, that was just a great experience for me where I got to work with a lot of awesome people and learn a lot. And then the opportunity arose to join Google and actually run the Google transactional databases including their older relational databases. And by the way, I actually have two jobs. One job is running Spanner and Big Table for Google itself—meaning, you know, Search, Ads, and YouTube and everything runs on these databases—and then the second job is actually running external-facing databases for external customers.
Corey: How alike are those two? Is it effectively the exact same thing, just with different API endpoints? Are they two completely separate universes? It's always unclear from the outside when looking at large companies that effectively eat versions of their own dog food, where their internal usage of these things starts and stops.
Andi: So, great question. So, Cloud Spanner and Cloud Big Table do actually use the internal Spanner and Big Table. So, at the core, it's exactly the same engine, the same runtime, same storage, and everything. However, you know, kind of, internally, the way we built the database APIs was kind of good for scrappy, you know, Google engineers, and you know, folks who are kind of okay learning how to fit into the Google ecosystem, but when we needed to make this work for enterprise customers, we needed cleaner APIs, we needed authentication that was external, right, and so on, so forth.
So, think about it: we had to add an additional set of APIs on top of it, and management, right, to really make these engines accessible to the external world.
So, it's running the same engine under the hood, but it is a different set of APIs, and a big part of our focus is continuing to expose to enterprise customers all the goodness that we have on the internal system. So, it's really about taking these very, very unique differentiated databases and democratizing access to them to anyone who wants to.
Corey: I'm curious to get your position on the idea that seems to be playing out—I guess, a battle that's been playing itself out in a number of different customer conversations. And that is, I guess, the theoretical decision between, do we go towards general-purpose databases and more or less treat every problem as a nail in search of a hammer or do you decide that every workload gets its own custom database that aligns the best with that particular workload? There are trade-offs in either direction, but I'm curious where you land on that given that you tend to see a lot more of it than I do.
Andi: No, that's a great question. And you know, just for the viewers who maybe aren't aware, there's kind of two extreme points of view, right? There's one point of view that says, purpose-built for everything, like, every specific pattern, like, build bespoke databases, it's kind of a best-of-breed approach. The problem with that approach is it becomes extremely complex for customers, right? Extremely complex to decide what to use, they might need to use multiple for the same application, and so that can be a bit daunting as a customer. And frankly, there's kind of a law of diminishing returns at some point.
Corey: Absolutely. I don't know what the DBA role of the future is, but I don't think anyone really wants it to be, “Oh, yeah.
We're deciding which one of these three dozen managed database services is the exact right fit for each and every individual workload.” I mean, at some point it feels like certain cloud providers believe that not only should every workload have its own database, but almost every workload should have its own database service. At some point, you're allowed to say no and stop building these completely, what feel like to me, Byzantine, esoteric database engines that don't seem to have broad applicability to a whole lot of problems.
Andi: Exactly, exactly. And maybe the other extreme is what folks often talk about as multi-model where you say, like, “Hey, I'm going to have a single storage engine and then map onto that the relational model, the document model, the graph model, and so on.” I think what we tend to see is if you go too generic, you also start having performance issues, you may not be getting the right level of abilities and trade-offs around consistency, and replication, and so on. So, I would say Google, like, we're taking a very pragmatic approach where we're saying, “You know what? We're not going to solve all customer problems with a single database, but we're also not going to have two dozen.” Right?
So, we're basically saying, “Hey, let's understand the main characteristics of the workloads that our customers need to address, build the best services around those.” You know, obviously, over time, we continue to enhance what we have to fit additional models.
And then frankly, we have a really awesome partner ecosystem on Google Cloud where if someone really wants a very specialized database, you know, we also have great partners that they can use on Google Cloud and get great support and, you know, get the rest of the benefits of the platform.
Corey: I'm very curious to get your take on a pattern that I've seen alluded to by basically every vendor out there except the couple of very obvious ones for whom it does not serve their particular vested interests, which is that there's a recurring narrative that customers are demanding open-source databases for their workloads. And when you hear that, at least, people who came up the way that I did, spending entirely too much time on Freenode, back when that was not a deeply problematic statement in and of itself, where, yes, we're open-source, I guess, zealots is probably the best terminology, and yeah, businesses are demanding to participate in the open-source ecosystem. Here in reality, what I see is not ideological purity or anything like that and much more to do with, “Yeah, we don't like having a single commercial vendor for our databases that basically plays the insert-quarter-to-continue dance whenever we're trying to wind up doing something new. We want the ability to not have licensing constraints around when, where, how, and how quickly we can run databases.” That's what I hear when customers are actually talking about open-source versus proprietary databases. Is that what you see or do you think that plays out differently?
Because let's be clear, you do have a number of database services that you offer that are not open-source, but are also absolutely not tied to weird licensing restrictions either?
Andi: That's a great question, and I think for years now, customers have been in a difficult spot because the legacy proprietary database vendors, you know, knew how sticky the database is, and so as a result, you know, the prices often went up and it was not easy for customers to kind of manage costs and agility and so on. But I would say that's always been somewhat of a concern. I think what I'm seeing changing and happening differently now is as customers are moving into the cloud and they want to run hybrid cloud, they want to run multi-cloud, they need to prove to their regulator that they can do a stressed exit, right, open-source is not just about reducing cost, it's really about flexibility and kind of being in control of when and where you can run the workloads. So, I think what we're really seeing now is a significant surge of customers who are trying to get off legacy proprietary databases and really kind of move to open APIs, right, because they need that freedom. And that freedom is far more important to them than even the cost element.
And what's really interesting is, you know, a lot of these are the decision-makers in these enterprises, not just the technical folks. Like, to your point, it's not just open-source advocates, right? It's really the business people who understand they need the flexibility. And by the way, even the regulators are asking them to show that they can flexibly move their workloads as they need to. So, we're seeing a huge interest there and, as you said, like, some of our services, you know, are open-source-based services, some of them are not.
Like, take Spanner, as an example, it is heavily tied to how we build our infrastructure and how we build our systems.
Like, I would say, it's almost impossible to open-source Spanner, but what we've done is we've basically embraced open APIs and made sure if a customer uses these systems, we're giving them control of when and where they want to run their workloads. So, for example, Big Table has an HBase API; Spanner now has a Postgres interface. So, our goal is really to give customers as much flexibility and also not lock them into Google Cloud. Like, we want them to be able to move out of Google Cloud so they have control of their destiny.
Corey: I'm curious to know what you see happening in the real world because I can sit here and come up with a bunch of very well-thought-out logical reasons to go towards or away from certain patterns, but I spent years building things myself. I know how it works, you grab the closest thing handy and throw it in and we all know that there is nothing so permanent as a temporary fix. Like, that thing is load-bearing and you'll retire with that thing still in place. In the idealized world, I don't think that I would want to take a dependency on something like—easy example—Spanner or AlloyDB because despite the fact that they have Postgres-squeal—yes, that's how I pronounce it—compatibility, the capabilities of what they're able to do under the hood far exceed and outstrip whatever you're going to be able to build yourself or get anywhere else. So, there's a dataflow architectural dependency lock-in, despite the fact that it is at least on its face, Postgres-compatible. Counterpoint, does that actually matter to customers in what you are seeing?
Andi: I think it's a great question. I'll give you a couple of data points. I mean, first of all, even if you take a complete open-source product, right, running them in different clouds, different on-premises environments, and so on, fundamentally, you will have some differences in performance characteristics, availability characteristics, and so on.
So, the truth is, even if you use open-source, right, you're not going to get a hundred percent of the same characteristics where you run that. But that said, you still have the freedom of movement, and with, I would say, not a huge amount of engineering investment, right, you're going to make sure you can run that workload elsewhere.
I kind of think of Spanner in a similar way where yes, I mean, you're going to get all those benefits of Spanner that you can't get anywhere else, like unlimited scale, global consistency, right, no maintenance downtime, five-nines availability, like, you can't really get that anywhere else. That said, not every application necessarily needs it. And you still have that option, right, that if you need to, or want to, or we're not giving you a reasonable price or reasonable price performance, or we're starting to neglect you as a customer—which of course we wouldn't, but let's just say hypothetically, that you know, that could happen—that you still had a way to basically go and run this elsewhere. Now, I'd also want to talk about some of the upsides something like Spanner gives you. Because you talked about, you want to be able to just grab a few things, build something quickly, and then, you know, you don't want to be stuck.
The counterpoint to that is with Spanner, you can start really, really small, and then let's say you're a gaming studio, you know, you're building ten titles hoping that one of them is going to take off. So, you can build ten of those, you know, with very minimal spend on Spanner and if one takes off overnight, it's really only the database where you don't have to go and re-architect the application; it's going to scale as big as you need it to. And so, it does enable a lot of this innovation and a lot of cost management as you try to get to that overnight success.
Corey: Yeah, overnight success. I always love that approach.
It's one of those, “Yeah, I became an overnight success after only ten short years.” It becomes this idea people believe it's in fits and starts, but then you see, I guess, on some level, the other side of it where it's a lot of showing up and doing the work. I have to confess, I didn't do a whole lot of admin work in my production years that touched databases because I have an aura and I'm unlucky, and it turns out that when you blow away some web servers, everyone can laugh and we'll reprovision stateless things.
Get too close to the data warehouse, for example, and you don't really have a company left anymore. And of course, in the world of finance that I came out of, transactional integrity is also very much a thing. A question that I had [centers 00:17:51] really around one of the predictions you gave recently at Google Cloud Next, which is your prediction for the future is that transactional and analytical workloads from a database perspective will converge. What's that based on?
Andi: You know, I think we're really moving to a world where customers are trying to make real-time decisions, right? If there's model drift from an AI and ML perspective, they want to be able to retrain their models as quickly as possible. So, everything is fast moving into streaming. And I think what you're starting to see is, you know, customers don't have that time to wait for analyzing their transactional data.
Like in the past, you do a batch job, you know, once a day or once an hour, you know, move the data from your transactional system to your analytical system, but that's just not how always-on businesses run anymore, and they want to have those real-time insights.
So, I do think that what you're going to see is transactional systems more and more building analytical capabilities, analytical systems building more transactional capabilities, and then ultimately, cloud platform providers like us helping fill that gap and really making data movement seamless across transactional, analytical, and even AI and ML workloads. And so, that's an area that I think is a big opportunity. I also think that Google is best positioned to solve that problem.
Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.
Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive? Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.
Corey: On some level, I've found that, at least in my own work, that once I wind up using a database for something, I'm inclined to try and stuff as many other things into that database as I possibly can just because getting a whole second data store, taking a dependency on it for any given workload tends to be a little bit on the, I guess, challenging side. Easy example of this.
I've talked about it previously in various places, but I was talking to one of your colleagues, [Sarah Ellis 00:19:48], who wound up at one point making a joke that I, of course, took way too far. Long story short, I built a Twitter bot on top of Google Cloud Functions that every time the Azure brand account tweets, it simply quote-tweets that, translates their tweet into all caps, and then puts a boomer-style statement in front of it if there's room. This account is @cloudboomer.
Now, the hard part that I had while doing this is everything stateless works super well. Where do I wind up storing the ID of the last tweet that it saw on its previous run? And I was fourth and inches from just saying, “Well, I'm already using Twitter so why don't we use Twitter as a database?” Because everything's a database if you're either good enough or bad enough at programming. And instead, I decided, okay, we'll try this Firebase thing first.
And I don't know if it's Firestore, or Datastore or whatever it's called these days, but once I wrapped my head around it, incredibly effective, very fast to get up and running, and I feel like I made at least a good decision, for once in my life, involving something touching databases. But it's hard. I feel like I'm consistently drawn toward the thing I'm already using as a default database. I can't shake the feeling that that's the wrong direction.
Andi: I don't think it's necessarily wrong. I mean, I think, you know, with Firebase and Firestore, that combination is just extremely easy and quick to build awesome mobile applications. And actually, you can build mobile applications without a middle tier, which is probably what attracted you to that. So, we just see, you know, huge amount of developers and applications. We have over 4 million databases in Firestore with just developers building these applications, especially mobile-first applications.
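For the curious, the checkpoint pattern Corey describes (a stateless function whose only persistent state is the last tweet ID it handled) can be sketched like this. The real bot would read and write the checkpoint with the google-cloud-firestore client, roughly `db.collection("state").document("cloudboomer").get()` and `.set()`; here a plain dict stands in for the document so the sketch runs without credentials, and the document name and tweet shape are assumptions for illustration.

```python
# Sketch of a stateless function plus a persisted checkpoint. The store dict
# stands in for a Firestore document; swap it for get()/set() calls on a
# google.cloud.firestore document to make it durable between invocations.

def unseen_tweets(tweets, last_seen_id):
    """Return tweets newer than the checkpoint, oldest first.

    Twitter IDs are snowflakes, so numeric order is chronological order.
    """
    fresh = [t for t in tweets if t["id"] > last_seen_id]
    return sorted(fresh, key=lambda t: t["id"])

def run_once(store, fetched):
    """One bot invocation: process any new tweets, then advance the checkpoint."""
    last = store.get("last_tweet_id", 0)
    fresh = unseen_tweets(fetched, last)
    if fresh:
        store["last_tweet_id"] = fresh[-1]["id"]
    return fresh

if __name__ == "__main__":
    store = {}
    batch = [{"id": 3, "text": "c"}, {"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
    print([t["id"] for t in run_once(store, batch)])  # all three, oldest first
    print([t["id"] for t in run_once(store, batch)])  # nothing new on the rerun
```

The point of the pattern is exactly what Corey found: everything except that one small document stays stateless and disposable.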
So, I think, you know, if you can get your job done and get it done effectively, absolutely stick to them.
And by the way, one thing a lot of people don't know about Firestore is it's actually running on Spanner infrastructure, so Firestore has the same five-nines availability, no maintenance downtime, and so on, that Spanner has, and the same kind of ability to scale. So, it's not just that it's quick, it will actually scale as much as you need it to and be as available as you need it to. So, that's on that piece. I think, though, to the same point, you know, there's other databases that we're then trying to make sure kind of also extend their usage beyond what they've traditionally done. So, you know, for example, we announced AlloyDB, which I kind of call Postgres on steroids; we added analytical capabilities to this transactional database so that as customers do have more data in their transactional database, as opposed to having to go somewhere else to analyze it, they can actually do real-time analytics within that same database, and it can actually do up to 100 times faster analytics than open-source Postgres.
So, I would say both Firestore and AlloyDB are kind of good examples of, if it works for you, right, we'll also continue to make investments so the amount of use cases you can use these databases for continues to expand over time.
Corey: One of the weird things that I noticed just looking around this entire ecosystem of databases—and you've been in this space long enough to, presumably, have seen the same type of evolution—back when I was transiting between different companies a fair bit, sometimes because I was consulting and other times because I'm one of the greatest in the world at getting myself fired from jobs based upon my personality, I found that the default standard was always, “Oh, whatever the database is going to be, it started off as MySQL and then eventually pivots into something else when that starts falling down.” These days, I can't
shake the feeling that almost everywhere I look, Postgres is the answer instead. What changed? What did I miss in the ecosystem that's driving that renaissance, for lack of a better term?

Andi: That's a great question. And, you know, I have been involved in—I'm going to date myself a bit—but in PHP since 1997, pretty much, and one of the things we kind of did is we built a really good connector to MySQL—and you know, I don't know if you remember, before MySQL, there was mSQL. So, the MySQL API actually came from mSQL—and we bundled the MySQL driver with PHP. And so, kind of that LAMP stack really took off. And kind of to your point, you know, the default in the web, right, was like, you're going to start with MySQL because it was super easy to use, just fun to use.

By the way, I actually wrote—co-authored—the tab completion in the MySQL client. So like, a lot of these kinds of, you know, fun, simple ways of using MySQL were there, and frankly, it was super fast, right? And so, kind of those fast reads and everything, it just was great for web and for content. And at the time, Postgres kind of came across more like a science project. Like, the folks who were using Postgres were kind of the outliers, right, you know, the less pragmatic folks.

I think what's changed over the past—how many years has it been now, 25 years? I'm definitely dating myself—is a few things: one, MySQL is still awesome, but it didn't kind of go in the direction of really, kind of, trying to catch up with the legacy proprietary databases on features and functions. Part of that may just be that from a roadmap perspective, that's not where the owner wanted it to go. So, MySQL today is still great, but it didn't go into that direction. In parallel, right, customers were wanting to move more to open-source.
And so, what they found is that the thing that actually looks and smells more like legacy proprietary databases is actually Postgres; plus, you saw an increase of investment in the Postgres ecosystem, and also a very liberal license.

So, you have lots of other databases, including commercial ones, that have been built off the Postgres core. And so, I think you are today in a place where, for mainstream enterprise, Postgres is it because that is the thing that has all the features that the enterprise customer is used to. MySQL is still very popular, especially in, like, content and web, and mobile applications, but I would say that Postgres has really become kind of that de facto standard API that's replacing the legacy proprietary databases.

Corey: I've been on the record way too much as saying, with some justification, that the best database in the world that should be used for everything is Route 53, specifically, TXT records. It's a key-value store and then anyone who's deep enough into DNS or databases generally gets a slightly greenish tinge and feels ill. That is my simultaneous best and worst database. I'm curious as to what your most controversial opinion is about the worst database in the world that you've ever seen.

Andi: This is the worst database? Or—

Corey: Yeah. What is the worst database that you've ever seen? I know, at some level, since you manage all things database, I'm asking you to pick your least favorite child, but here we are.

Andi: Oh, that's a really good question. No, I would say probably the, “Worst database,” double-quotes, is just the file system, right? When folks are basically using the file system as a regular database. And that can work for, you know, really simple apps, but as apps get more complicated, that's not going to work. So, I've definitely seen some of that.

I would say the most awesome database that is also file system-based, kind of embedded, I think was actually SQLite, you know? And SQLite is actually still very, very popular.
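The contrast Andi draws between “file system as a database” and a real embedded database can be shown in a few lines. This is an illustrative sketch using Python's standard-library sqlite3 module; the table and rows are invented for the example:

```python
import sqlite3

# SQLite runs in-process against a single file (":memory:" keeps this
# example self-contained). There is no server to deploy or operate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Unlike ad-hoc flat files, writes are transactional: the "with" block
# commits on success and rolls back if anything raises.
with conn:
    conn.executemany("INSERT INTO notes (body) VALUES (?)",
                     [("first",), ("second",)])

# And reads get real SQL: filtering, ordering, joins, aggregates.
rows = conn.execute("SELECT body FROM notes ORDER BY id").fetchall()
```

Swapping `":memory:"` for a file path gives the durable, single-file embedded database that ships on phones; the application code does not change.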
I think it sits on every mobile device pretty much on the planet. So, I actually think it's awesome, but it's, you know, not a database server. It's kind of an embedded database, but it's something that I, you know, I've always been pretty excited about. And, you know, there's [unintelligible 00:27:43] kind of new, interesting databases emerging that are also embedded, like DuckDB is quite interesting. You know, it's kind of the SQLite for analytics.

Corey: We've been using it for a few things around a bill analysis ourselves. It's impressive. I've also got to say, people think that we had something to do with it because we're The Duckbill Group, and it's DuckDB. “Have you done anything with this?” And the answer is always, “Would you trust me with a database? I didn't think so.” So no, it's just a weird coincidence. But I liked that a lot.

It's also counterintuitive from where I sit because I'm old enough to remember when Microsoft was teasing the idea of WinFS, where they teased a future file system that fundamentally was a database—I believe it's an index or journal for all of that—and I don't believe anything ever came of it. But ugh, that felt like a really weird alternate world we could have lived in.

Andi: Yeah. Well, that's a good point. And by the way, you know, if I actually take a step back, right, I kind of half-jokingly said, you know, file system, and obviously, you know, all the popular databases persist on the file system. But if you look at what's different in cloud-first databases, right—like, if you look at legacy proprietary databases, the typical setup is write to the local disk and then do asynchronous replication with some kind of bounded replication lag to somewhere else, to a different region, or so on.
If you actually start to look at what the cloud-first databases look like, they actually write the data in multiple data centers at the same time.

And so, kind of joke aside, as you start to think about, “Hey, how do I build the next generation of applications and how do I really make sure I get the resiliency and the durability that the cloud can offer,” it really does take a new architecture. And so, that's where things like, you know, Spanner and Bigtable, and, kind of, AlloyDB databases are truly architected for the cloud. That's where they actually think very differently about durability and replication, and what it really takes to provide the highest level of availability and durability.

Corey: On some level, I think one of the key things for me to realize was that in my own experiments, whenever I wind up doing something that is either for fun or I just want to see how it works and what's possible, the scale of what I'm building is always inherently a toy problem. It's like the old line that if it fits in RAM, you don't have a big data problem. And then I'm looking at things these days that are having most of a petabyte's worth of RAM sometimes; okay, that definition continues to extend and get ridiculous. But I still find that most of what I do in a database context can be done with almost any database. There's no reason for me not to, for example, use a SQLite file or an object store—there's a little latency, but whatever—or even a text file on disk.

The challenge I find is that as you start scaling and growing these things, you start to run into limitations left and right, and only then it's one of those, oh, I should have made different choices or I should have built in abstractions. But so many of these things come to nothing; it just feels like extra work.
What guidance do you have for people who are trying to figure out how much effort to put in upfront when they're just more or less puttering around to see what comes out of it?

Andi: You know, we like to think about ourselves at Google Cloud as really having a unique value proposition that really helps you future-proof your development. You know, if I look at both Spanner and I look at BigQuery, you can actually start with a very, very low cost. And frankly, not every application has to scale. So, you can start at low cost, you can have a small application, but everyone wants two things: one is availability, because you don't want your application to be down, and number two is, if you have to scale, you want to be able to without having to rewrite your application. And so, I think this is where we have a very unique value proposition, both in how we built Spanner and then also how we built BigQuery, is that you can actually start small. For example, on Spanner, you can go from one-tenth of what we call an instance—like, a small instance that is, you know, under $65 a month—to a petabyte-scale OLTP environment with thousands of instances in Spanner, with zero downtime.

And so, I think that is really the unique value proposition. We're basically saying you can hold the stick at both ends: you can basically start small, and then if that application does need to scale, does need to grow, you're not reengineering your application and you're not taking any downtime for reprovisioning. So, I think that's—if I had to give folks, kind of, advice, I say, “Look, what's done is done. You have workloads on MySQL, Postgres, and so on. That's great.”

Like, they're awesome databases, keep on using them.
But if you're truly building a new app, and you're hoping that app is going to be successful at some point—whether it's, like you said, that all overnight successes take at least ten years—if you at least built it on something like Spanner, you don't actually have to think about that anymore or worry about it, right? It will scale when you need it to scale and you're not going to have to take any downtime for it to scale. So, that's how we see a lot of these industries that have these potential spikes, like gaming, retail, also some use cases in financial services; they basically gravitate towards these databases.

Corey: I really want to thank you for taking so much time out of your day to talk with me about databases and your perspective on them, especially given my profound level of ignorance around so many of them. If people want to learn more about how you view these things, where's the best place to find you?

Andi: Follow me on LinkedIn. I tend to post quite a bit on LinkedIn, I still post a bit on Twitter, but frankly, I've moved more of my activity to LinkedIn now. I find it's—

Corey: That is such a good decision. I envy you.

Andi: It's a more curated [laugh], you know, audience and so on. And then also, you know, we just had Google Cloud Next. I recorded a session there that kind of talks about databases and just some of the things that are new in database-land at Google Cloud. So, that's another thing: if folks are more interested to get more information, that may be something that could be appealing to you.

Corey: We will, of course, put links to all of this in the [show notes 00:34:03]. Thank you so much for your time. I really appreciate it.

Andi: Great. Corey, thanks so much for having me.

Corey: Andi Gutmans, VP and GM of Databases at Google Cloud. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment; then I'm going to collect all of those angry, insulting comments and use them as a database.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Security for Speed and Scale with Ashish Rajan

Play Episode Listen Later Nov 22, 2022 35:24


About Ashish

Ashish has over 13 years' experience in the cybersecurity industry, with the last seven focused primarily on helping enterprises manage security risk at scale in a cloud-first world; he was the CISO of a global cloud-first tech company in his last role. Ashish is also a keynote speaker, the host of the widely popular Cloud Security Podcast, and a SANS trainer for Cloud Security & DevSecOps. Ashish currently works at Snyk as a Principal Cloud Security Advocate. He is a frequent contributor on topics related to public cloud transformation, cloud security, DevSecOps, security leadership, future tech, and the associated security challenges for practitioners and CISOs.

Links Referenced:
Cloud Security Podcast: https://cloudsecuritypodcast.tv/
Personal website: https://www.ashishrajan.com/
LinkedIn: https://www.linkedin.com/in/ashishrajan/
Twitter: https://twitter.com/hashishrajan
Cloud Security Podcast YouTube: https://www.youtube.com/c/CloudSecurityPodcast
Cloud Security Podcast LinkedIn: https://www.linkedin.com/company/cloud-security-podcast/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Thinkst Canary. Most folks find out way too late that they've been breached. Thinkst Canary changes this. Deploy canaries and canary tokens in minutes, and then forget about them. Attackers tip their hand by touching them, giving you one alert, when it matters. With zero administrative overhead to this and almost no false positives, Canaries are deployed and loved on all seven continents. Check out what people are saying at canary.love today.
Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is brought to us once again by our friends at Snyk. Snyk does amazing things in the world of cloud security and terrible things with the English language because, despite raising a whole boatload of money, they still stubbornly refuse to buy a vowel in their name. I'm joined today by Principal Cloud Security Advocate from Snyk, Ashish Rajan. Ashish, thank you for joining me.

Corey: Your history is fascinating to me because you've been around for a while on a podcast of your own, the Cloud Security Podcast. But until relatively recently, you were a CISO. As has become relatively accepted in the industry, the primary job of the CISO is to get themselves fired, and then, “Well, great. What's next?” Well, failing upward is really the way to go wherever possible, so now you are at Snyk, helping the rest of us fix our security. That's my headcanon on all of that anyway, which I'm sure bears scant, if any, resemblance to reality. What's your version?

Ashish: [laugh]. Oh, well, fortunately, I wasn't fired. And I think I definitely find that it's a great way to look at the CISO job, to walk towards the path where you're no longer required, because then I think you've definitely done your job.
I moved into the media space because we got an opportunity to go full-time. I spoke about this offline, but an incident inspired us to go full-time into the space, so that's what made me leave my CISO job and go full-time into democratizing cloud security as much as possible for anyone and everyone. It's every day now, almost, so it's almost like I dream about cloud security as well now.

Corey: Yeah, I dream of cloud security too, but my dreams are of a better world in which people didn't tell me how much they really care about security in emails that demonstrate how much they failed to care about security until it was too little too late. I was in security myself for a while and got out of it because I was tired of being miserable all the time. But I feel that there's a deep spiritual alignment between people who care about cost and people who care about security when it comes to cloud—or business in general—because you can spend infinite money on those things, but it doesn't really get your business further. It's like paying for fire insurance. It isn't going to get you to your next milestone, whereas shipping faster, being more effective at launching a feature into markets, that can multiply revenue. That's what companies are optimized around. It's, “Oh, right. We have to do the security stuff,” or, “We have to fix the AWS billing piece.” It feels, on some level, like it's a backburner project most of the time and it's certainly invested in that way. What's your take on that?

Ashish: I tend to disagree with that, for a couple of reasons.

Corey: Excellent. I love arguments.

Ashish: I feel this in a healthy way as well. A, I love the analogy of spirit animals, where there's cost optimization as well as risk aversion. I think where I normally stand—and this is what I had to unlearn after doing years of cybersecurity—was that initially, we always used to be—when I say ‘we,' I mean cybersecurity folks—we always used to be like police officers.
In that every time there's an incident, it turns into a crime scene, and suddenly we're all like, “Pew, pew, pew,” trying to get all the evidence together: let's make this as isolated as possible from the rest of the environment, and let's try and resolve this.

I feel like the cloud has asked people to become more collaborative, which is a good problem to have. It also encourages that—I don't know how many people know this, but the reason we have brakes in our cars is not so that we can slow down the car; it's so that we can go faster. And I feel security is the same thing. The guardrails we talk about, the risks that you're trying to avert, the reason you're trying to have security is not to slow down but to go faster. Say, for example, in an ideal world, to quote what you were saying earlier: if we were to do the right kind of encryption—I'm just going to use the most basic example—if we just do encryption, right, and just ensure that as a guardrail, the entire company needs to have encryption at rest, encryption in transit, period, nothing else, no one cares about anything else.

But if you just lay that out as a framework—this is our guardrail, no one breaks this, and whoever does, hey, we, you know, slap on the wrist and come back onto the actual track—you keep going forward. That just means any project that comes in that meets [unintelligible 00:04:58] criteria keeps going forward, as many times as we want to go into production. Doesn't matter. So, that is the new world of security that we are being asked to move towards, where Amazon re:Invent is coming in and there will be another, I don't know, three, four hundred services that will be released. How many people, irrespective of security, would actually know all of those services? They would not. So, [crosstalk 00:05:20]—
No one keeps them on their head. Except me because I'm sad.Ashish: Oh, no, but I think you're right, though. I can't remember who was it—maybe Andrew Vogel or someone—they didn't release a service which didn't exist, and became, like, a thing on Twitter. Everyone—Corey: Ah, AWS's Infinidash. I want to say that was Joe Nash out of Twilio at the time. I don't recall offhand if I'm right on that, but that's how it feels. Yeah, it was certainly not me. People said that was my idea. Nope, nope, I just basically amplified it to a huge audience.But yeah, it was a brilliant idea, just because it's a fake service so everyone could tell stories about it. And amazing product feedback, if you look at it through the right lens of how people view your company and your releases when they get this perfect, platonic ideal of what it is you might put out there, what do people say about it?Ashish: Yeah. I think that's to your point, I will use that as an example as well to talk about things that there will always be a service which we will be told about for the first time, which we will not know. So, going back to the unlearning part, as a security team, we just have to understand that we can't use the old ways of, hey, I want to have all the controls possible, cover all there is possible. I need to have a better understanding of all the cloud services because I've done, I don't know, 15 years of cloud, there is no one that has 10, 15 years of cloud unless you're I don't know someone from Amazon employee yourself. Most people these days still have five to six years experience and they're still learning.Even the cloud engineering folks or the DevOps folks, they're all still learning and the tooling is continuing to evolve. 
So yeah, I think I definitely find that security in this cloud world is a lot more collaborative, and it's being looked at as serving the same function as a brake would in a car: to help you go faster, not to just slam the brakes every time—like, oh, my God, is the situation isolated?—and to police people.

Corey: One of the points I find that is so aligned between security and cost—and you alluded to it a minute ago—is the idea of helping companies go faster safely. To that end, guardrails have to be at least as easy as just going off and doing it cow-person style. Because if it's not—if it's more work in any way, shape, or form—people won't do it. People will not tag their resources by hand, people will not go through and use the dedicated account structure you've got that gets in their way and screams at them every time they try to use one of the native features built into the platform. It has to get out of their way and make things easier, not worse, or people fight it, they go around it, and you're never going to get buy-in.

Ashish: Do you feel like cost is something that a lot more people pay a lot more attention to because, you know, that creeps into your budget? Like, as people who've been leaders before—and this was the conversation—they would just go, “Well, I only have, I don't know, 100,000 to spend this quarter,” or, “This year.” Some of them, I remember—I used to have this manager once, a CTO, who would always be conscious about the spend. It's almost like, if you overspend, where do you get the money from? There's no money to bring in extra. Like, no. There's a set amount of money that people plan for any year for a budget. And to your point about keeping an eye on how we are spending this in the AWS context—because it's very easy to spend the entire budget in one day in the cloud context. So, I wonder if that is also a big driver for people to put cost above security?
Where do you stand on that?

Corey: When it comes to cost, one of the nice things about it—and this is going to sound sarcastic, but I swear to you it's not—it's only money.

Ashish: Mmm.

Corey: Think about that for a second, because it's true. Okay, we wound up screwing up and misconfiguring something and overspending. Well, there are ways around that. You can call AWS, you can get credits, you can get concessions made for mistakes, you can sign larger contracts and get a big pile of proof-of-concept credit, et cetera, et cetera. There are ways to make that up, whereas with security, there are no do-overs on security breaches.

Ashish: No, that's a good point. I mean, you can always get more money—use a credit card, worst-case scenario—but you can't do the same for—there's a security breach, and suddenly now—hopefully, you don't have to call the New York Times and say, “Can you undo that article that you just posted? It was a mistake; we rewound what we did.”

Corey: I'm curious to know what your take is these days on the state of the cloud security community. And the reason I bring that up is, well, I started about a year-and-a-half ago now doing a podcast every Thursday, which is Last Week in AWS: Security Edition, because everything else I found in the industry when I went looking was aimed explicitly at either—was driven by the InfoSec community, which is toxic, with a whole bunch of assumed knowledge already built in, that looks an awful lot like gatekeeping—which is the reason I got out of InfoSec in the first place—or alternately was completely vendor-captured, where, okay, great, we're going to go ahead and do a whole bunch of interesting content, and it's all brought to you by this company, and strangely, all of the content is directly aligned with doing some pretty weird things that you wouldn't do unless you're trying to build a business case for that company's product. And it just feels hopelessly compromised.
I wanted to find something that was aimed at people who had to care about security but didn't have security as part of their job title. Think DevOps types and you're getting warmer.

That's what I wound up setting out to build. And when all was said and done, I wasn't super thrilled with, honestly, how alone it still felt. You've been doing this for a while, and you're doing a great job at it, don't get me wrong, but there is the question that—and I understand they're sponsoring this episode, but the nice thing about promoted guest episodes is that they can buy my attention, not my opinion. How do you retain creative control of your podcast while working for a security vendor?

Ashish: So, that's a good question. So, Snyk by themselves have not ever asked us to change any piece of content; we have been working with them for the past few months now. The reason we kind of came along with Snyk was the alignment. We were talking about this earlier: I totally believe that DevSecOps and cloud security are ultimately going to come together one day. That may not be today, that may not be tomorrow, that may not be in 2022, or maybe 2023, but there will be a future where these two will sit together.

And the developer-first security mentality that they had—in this context, from a cloud perspective, developers being the cloud engineers, the DevOps people, as you called out, the reason you went in that direction—I definitely wanted to work with them. And ultimately, there would never be enough people in security to solve the problem. That is the harsh reality. There would never be enough people. So, whether it's cloud security or not—like, for people who were at AWS re:Inforce, the first 15 minutes by Steve Schmidt, CSO of Amazon, was: get a security guardian program.

So, I've been talking about it, everyone else is talking about it right now; Amazon has become the first CSP to even talk about this publicly as well, that we should have security guardians.
Which, by the way, I don't know why, but you can still call it—it is technically DevSecOps what you're trying to do—they spoke about a security champion program as part of the keynote that they were running. Nothing to do with cloud security, but the idea being: how much of this workload can we share? What we can do, as a security team—for people who may be from a security background listening to this—is elevate the risk in front of the right people, who are the decision-makers. That is our role.

We help them with the governance, we help with managing it, but we don't know how to solve the risk or close off a risk, or close off a vulnerability, because you might be the best person, because you work in that application every day—you know the bandages that are put in, you know all the holes that are there—so the best threat model can be performed by the person who works on it day-to-day, not a security person who spent, like, an hour with you once a week because that's the only time they could manage. So, going back to the Snyk part, that's the mission that we've had with the podcast; we want to democratize cloud security and build a community around neutral information, where there is no biased information. And I agree with what you said as well, where a lot of the podcasts outside of what we were finding were more focused on, “Hey, this is how you use AWS. This is how you use Azure. This is how you use GCP.”

But none of them were unbiased in their opinion. Because in real life, let's just say, even if I use the AWS example—because we are coming close to AWS re:Invent—they don't have all the answers from a security perspective. They don't have all the answers from an infrastructure perspective or cloud-native perspective. So, there are some times—or even most times—people are making a call where they're going outside of it.
So, unbiased information is definitely required, and it is not there enough.

So, I'm glad that at least people like yourself are joining in and, you know, creating the world where more people are trying to be relatable to DevOps people as well as the security folks. Because it's hard for a security person to be a developer, but it's easy for a developer or an engineer to understand security. And the simplest example I use is: when people walk out of their house, they lock the door. They're already doing security. This is the same thing we're asking for when we talk about security in the cloud or in the [unintelligible 00:14:49] as well. Everyone is doing it; it just hasn't been pointed out in the right way.

Corey: I'm curious as to what it is that gets you up in the morning. Now, I know you work in security, but you're also not a CISO anymore, so I'm not asking what gets you up at 2 a.m. because we know what happens in the security space, then. There's a reason that my area of business focus is strictly a business-hours problem. But I'd love to know what it is about cloud security as a whole that gets you excited.

Ashish: I think it's an opportunity for people to get into the space without the—you know, you said gatekeeper earlier; those gatekeepers who used to have that 25 years' experience in cybersecurity, 15 years' experience in cybersecurity—cloud has challenged that norm. Now, none of that experience helps you do AWS services better. It definitely helps you with the foundational pieces, definitely helps you do identity, networking, all of that, but you still have to learn something completely new, a new way of working, which allows for a lot of people who earlier were struggling to get into cybersecurity; now they have an opening. That's what excites me about cloud security: it has opened up a door which is beyond your CCNA, CISSP, and whatever other certification people want to get.
By the way, I don't have a CISSP, so I can totally throw CISSP under the bus.

But I definitely find that cloud security excites me every morning because it has shown me light where, to what you said, it was always a gated community. Although that's a very huge generalization; there are a lot of nice people in cybersecurity who want to mentor and help people get in. But cloud security has pushed through that door, made it even wider than it was before.

Corey: I think there's a lot to be said for the concept of sending the elevator back down. I really have remarkably little patience for people who take the perspective of, “Well, I got mine, so screw everyone else.” The next generation should have it easier than we did, figuring out where we land in the ecosystem, where we live in the space. And there are folks who do a tremendous job of this, but there are also areas where I think there is significant need for improvement. I'm curious to know what you see as lacking in the community ecosystem for folks who are just dipping their toes into the water of cloud security.

Ashish: I think that, one, there's misinformation as well. The first one being that if you have never done IT before, you can get into cloud security and, you know, you will do a great job. I think that is definitely a mistake to just accept if Amazon re:Invent tells you to do all these certifications, or Azure does the same, or GCP does the same. If I'm being really honest—and I feel like I can be honest, this is a safe space—for people who are listening in, if you're coming to the space for the first time, whether it's cloud or cloud security, if you haven't had much exposure to the foundational pieces of it, it would be a really hard call. You would know all the AWS services, you will know all the Azure services because you have your certification, but if I was to ask you, “Hey, help me build an application.
What would the architecture look like so it can scale?”

“So, right now we are a small pizza-size ten-people team”—I'm going to use the Amazon term there—“But we want to grow into a Facebook tomorrow, so please build me an architecture that can scale.” And if you regurgitate what Amazon has told you, or Azure has told you, or GCP has told you, I can definitely see that you would struggle in the industry because that's not how, say, every application is built. Because the cloud service provider would ask you to drink the Kool-Aid and say they can solve all your problems, even though they don't have all the servers in the world. So, that's the first misinformation.

The other one too, for people who are transitioning, who used to be in IT or in cybersecurity and trying to get into the cloud security space, the challenge over there is that outside of Amazon, Google, and Microsoft, there is not a lot of formal education which is unbiased. It is a great way to learn AWS security on how amazing AWS is from AWS people, the same way Microsoft will be [unintelligible 00:19:10], however, when it comes down to actual formal education, like the kind that you and I are trying to provide through a podcast, me with the Cloud Security Podcast, you with Last Week in AWS in the Security Edition, that kind of unbiased formal education, like free education, like what you and I are doing does definitely exist, and I guess I'm glad we have company, that you and I both exist in this space, but formal education is very limited. It's always behind, say, an expensive paid wall sometimes, and rightly so because it's information that would be helpful. So yeah, those two things.

Corey: This episode is sponsored in part by our friends at Uptycs. Attackers don't think in silos, so why would you have siloed solutions protecting cloud, containers, and laptops distinctly?
Meet Uptycs, the first unified solution that prioritizes risk across your modern attack surface—all from a single platform, UI, and data model. Stop by booth 3352 at AWS re:Invent in Las Vegas to see for yourself and visit uptycs.com. That's U-P-T-Y-C-S.com.

Corey: One of the problems that I have with the way a lot of cloud security stuff is situated is that you need to have something running to care about the security of. Yeah, I can spin up a VM in the free tier of most of these environments, and okay, “How do I secure a single Linux box?” Okay, yes, there are a lot of things you can learn there, but it's very far from a holistic point of view. You need to have the infrastructure running at reasonable scale first, in order to really get an effective lab that isn't contrived.

Now, Snyk is a security company. I absolutely understand and have no problem with the fact that you charge your customers money in order to get security outcomes that are better than they would have otherwise. I do not get why AWS and GCP charge extra for security. And I really don't get why Azure charges extra for security and then doesn't deliver security by dropping the ball on it, which is neither here nor there.

Ashish: [laugh].

Corey: It feels like there's an economic form of gatekeeping, where you must spend at least this much money—or work for someone who does—in order to get exposure to security the way that grownups think about it. Because otherwise, all right, I hit my own web server, I have ten lines in the logs. Now, how do I wind up doing an analysis run to figure out what happened? I pull it up on my screen and I look at it. You need a point of scale before anything that the modern world revolves around doesn't seem ludicrous.

Ashish: That's a good point. Also because we don't talk about the responsibility that the cloud service provider has themselves for security. Like the encryption example that I used earlier, as a guardrail, it doesn't take much for them to enable by default.
But how many do that by default? I feel foolish sometimes to tell people that, “Hey, you should have encryption enabled on your storage which is at rest, or in transit.”

It should be—like, we have services like Let's Encrypt and other services, which are trying to make this easily available to everyone so everyone can do SSL or HTTPS. And also, the same goes for encryption. It's free, and you're given the choice that you can go customer-managed keys or your own key or whatever, but it should be something that is default. We don't have to remind people, especially if you're the providers of the service. I agree with you on the, you know, very basic principle of why do I pay extra for security, when you should have already covered this for me as part of the service.

Because hey, technically, aren't you also responsible in this conversation? But the way I see shared responsibility is that—someone on the podcast mentioned it and I think it's true—shared responsibility means no one's responsible. And this is the kind of world we're living in because of that.

Corey: Shared responsibility has always been an odd concept to me because AWS is where I first encountered it and they, from my perspective, turn what fits into a tweet into a 45-minute dog-and-pony show around, “Ah, this is how it works. This is the part we're responsible for. This is the part where the customer responsibility is. Now, let's have a mind-numbingly boring conversation around it.” Whereas, yeah, there's a compression algorithm here.
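Ashish's point, that encryption at rest should be a platform default rather than something to remind people about, can be illustrated with a toy audit check. The configuration shape below is invented for illustration; it is not any real provider's API or data model:

```python
# Toy audit: flag storage buckets that rely on the provider's default
# (no explicit server-side encryption). The config dicts are hypothetical,
# not a real cloud provider's data model.

def find_unencrypted(buckets):
    """Return the names of buckets with no encryption explicitly enabled."""
    return [
        b["name"]
        for b in buckets
        if not b.get("encryption", {}).get("enabled", False)
    ]

buckets = [
    {"name": "audit-logs", "encryption": {"enabled": True, "key": "customer-managed"}},
    {"name": "user-uploads"},  # nothing set, so it silently falls back to the default
]

print(find_unencrypted(buckets))  # flags "user-uploads"
```

A guardrail like this only has to exist because the platform default is "off"; Ashish's argument is that if the provider flipped that default, the check would be unnecessary.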
Basically, if the cloud gets breached, it is overwhelmingly likely that you misconfigured something on your end, not the provider doing it, unless it's Azure, which is neither here nor there, once again.

The problem with that modeling, once you get a little bit more business sophistication than I had the first time I made the observation, is that you can't sit down with a CISO at a company that just suffered a data breach and have your conversation be, “Doesn't it suck to be you—[singing] duh, duh—because you messed up. That's it.” You need that dog-and-pony show of being able to go in-depth and nuance because otherwise, you're basically calling out your customer, which you can't really do. Which I feel occludes a lot of clarity for folks who are not in that position who want to understand these things a bit better.

Ashish: You're right, Corey. I think definitely I don't want to be in a place where we're definitely just educating people on this, but I also want to call out that we are in a world where it is true that Amazon, Azure, Google Cloud, they all have vulnerabilities as well. Thanks to research by all these amazing people on the internet from different companies out there, they've identified that, hey, these are not pristine environments that you can go into. Azure, AWS, Google Cloud, they themselves have vulnerabilities, and sometimes some of those vulnerabilities cannot be fixed until the customer intervenes and upgrades their services. We do live in a world where there is not enough education about this as well, so I'm glad you brought this up because for people who are listening in, I mean, I was one of those people who would always say, “When was the last time you heard Amazon had a breach?” Or, “Microsoft had a breach?” Or, “Google Cloud had a breach?”

That was the idea when people were just buying into the concept of cloud and did not trust cloud. Every cybersecurity person that I would talk to, they're like, “Why would you trust cloud?
Doesn't make sense.” But this is, like, seven, eight years ago. Fast-forward to today, it's almost default, “Why would you not go into cloud?”

So, for people who tend to forget that part, I guess, there is definitely a journey that people came through. With the same example of multi-factor authentication, it was never a, “Hey, let's enable password and multi-factor authentication.” It took a few stages to get there. Same with this as well. We're at that stage where now cloud service providers are showing the kinks in the armor, and now people are questioning, “I should update my risk matrix for what if there's actually a breach in AWS?”

Now, Capital One is a great example where the Amazon employee who was sentenced, she did something which has—never even [unintelligible 00:25:32] on before, opened up the door for that [unintelligible 00:25:36] CISO being potentially sentenced. There was another one. Because it became more primetime news, now people are starting to understand, oh, wait. This is not the same as it used to be. Cloud security breaches have evolved as well.

And just sticking to the Uber point, when Uber had that recent breach where they were talking about, “Hey, so many data records were gone,” what a lot of people did not talk about in that same message, it also mentioned the fact that, hey, they also got access to the AWS console of Uber. Now, that to me—my risk metrics has already gone higher than where it was before because it's just not your data, but potentially your production, your pre-prod, any development work that you were doing for, I don't know, self-driving cars or whatever that Uber [unintelligible 00:26:18] is doing, all that is out on the internet. But who was talking about all of that? That's a much worse breach than what was portrayed on the internet.
I don't know, what do you think?

Corey: When it comes to trusting providers, where I sit is that I think, given their scale, they need to be a lot more transparent than they have been historically. However, I also believe that if you do not trust that these companies are telling you the truth about what they're doing, how they're doing it, what their controls are, then you should not be using them as a customer, full stop. This idea of confidential computing drives me nuts because so much of it is, “Well, what if we assume our cloud provider is lying to us about all of these things?” Like, hypothetically there's nothing stopping them from building an exact clone of their entire control plane that they redirect your requests to that does something completely different under the hood. “Oh, yeah, of course, we're encrypting it with that special KMS key.” No, they're not. Or, “Yeah, sure we're going to put that into this region.” Nope, it goes right back to Virginia. If you believe that's what's going on and that they're willing to do that, you can't be in cloud.

Ashish: Yeah, a hundred percent. I think foundational trust needs to exist and I don't think the cloud service providers themselves do a great job of building that trust. And maybe that's where the drift comes in because the business has decided they're going to cloud. The cybersecurity people are trying to be more aware and asking the question, “Hey, why do we trust it so blindly? I don't have a pen test report from Amazon saying they have tested the service.”

Yes, I do have a certificate saying it's PCI compliant, but how do I know—to what you said—they haven't cloned our services? Fortunately, businesses are getting smarter. Like, Walmart would never have their resources in AWS because they don't trust them. It's a business risk if suddenly they decide to go into that space. But the other way around, Microsoft may decide tomorrow that they want to start their own Walmart.
Then what do you do?

So, I don't know how many people actually consider that as a real business risk, especially because there's a word that was floating around the internet called supercloud. And the idea behind this was—oh, I can already see your reaction [laugh].

Corey: Yeah, don't get me started on that whole mess.

Ashish: [laugh]. Oh no, I'm the same. I'm like, “What? What now?” Like, “What are you—” So, one thing I took away which I thought was still valuable was the fact that if you look at the cloud service providers, they're all like an octopus; they all have tentacles everywhere.

Like, if you look at the Amazons of the world, they're not only a bookstore, they have a grocery store, they have a delivery service. So, they are into a lot of industries, the same way Google Cloud, Microsoft, they're all in multiple industries. And they can still have enough money to choose to go into an industry that they had never been into before because of the access that they would get with all this information that they have, potentially—assuming that they [unintelligible 00:29:14] information. Now, “shared responsibility,” quote-unquote, they should not do it, but there is nothing stopping them from actually starting a Walmart tomorrow if they wanted to.

Corey: So, because a podcast and a day job aren't enough, what are you going to be doing in the near future given that, as we record this, re:Invent is nigh?

Ashish: Yeah. So, podcasting and being in the YouTube space has definitely opened up the creative mindset for me. And I think for my producer as well. We're doing all these exciting projects. We have something called Cloud Security Villains that is coming up for AWS re:Invent, and it's going to be released on our YouTube channel as well as my social media.

And we'll have merchandise for it across re:Invent as well. And I'm just super excited about the possibility that media as a space provides for everyone.
So, for people who are listening in and thinking that, I don't know, I don't want to write for a blog or email newsletter or whatever the thing may be, I just want to put it out there that I used to be excited about AWS re:Invent just to understand, hey, hopefully they will release a new security service. Now, I get excited about these events because I get to meet the community, help them, share what they have learned on the internet, and sound smarter [laugh] as a result of that as well, and get interviewed by people like yourself. But I definitely find that at the moment, with AWS re:Invent coming in, a couple of things that are exciting for me is the release of the Cloud Security Villains, which I think would be an exciting project, especially—hint, hint—for people who are into comic books, you will definitely enjoy it, and I think your kids will as well. So, just in time for Christmas.

Corey: We will definitely keep an eye out for that and put a link to that in the show notes. I really want to thank you for being so generous with your time. If people want to learn more about what you're up to, where's the best place for them to find you?

Ashish: I think I'm fortunate enough to be at that stage where normally if people Google me—and it's simply Ashish Rajan—they will definitely find me [laugh]. It'll be really hard for them not to find me on the internet. But if you are looking for a source of unbiased cloud security knowledge, you can definitely hit up cloudsecuritypodcast.tv or our YouTube and LinkedIn channel.

We go live stream every week with a new guest talking about cloud security, which could be companies like LinkedIn, Twilio, to name a few that have come on the show already, and a lot more that have come in and been generous with their time and shared how they do what they do. And we're fortunate that we get ranked top 100 in America, US, UK, as well as Australia. I'm really fortunate for that.
So, we're doing something right, so hopefully, you get some value out of it as well when you come and find me.

Corey: And we will, of course, put links to all of that in the show notes. Thank you so much for being so generous with your time. I really appreciate it.

Ashish: Thank you, Corey, for having me. I really appreciate this a lot. I enjoyed the conversation.

Corey: As did I. Ashish Rajan, Principal Cloud Security Advocate at Snyk, who is sponsoring this promoted guest episode. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment pointing out that not every CISO gets fired; some of them successfully manage to blame the intern.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Snyk and the Complex World of Vulnerability Intelligence with Clinton Herget

Nov 17, 2022 · 38:39


About Clinton

Clinton Herget is Field CTO at Snyk, the leader in Developer Security. He focuses on helping Snyk's strategic customers on their journey to DevSecOps maturity. A seasoned technologist, Clinton spent his 20-year career prior to Snyk as a web software developer, DevOps consultant, cloud solutions architect, and engineering director. Clinton is passionate about empowering software engineers to do their best work in the chaotic cloud-native world, and is a frequent conference speaker, developer advocate, and technical thought leader.

Links Referenced:
  • Snyk: https://snyk.io/
  • duckbillgroup.com: https://duckbillgroup.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us in part by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out.

Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves!
You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the fun things about establishing traditions is that the first time you do it, you don't really know that that's what's happening. Almost exactly a year ago, I sat down for a previous promoted guest episode much like this one, with Clinton Herget at Snyk—or Synic; however you want to pronounce that. He is apparently a scarecrow of some sort because when last we spoke, he was a principal solutions engineer, but like any good scarecrow, he was outstanding in his field, and now, as a result, is a Field CTO. Clinton, thanks for coming back, and let me start by congratulating you on the promotion. Or consoling you depending upon how good or bad it is.

Clinton: You know, Corey, a little bit of column A, a little bit of column B. But very glad to be here again, and frankly, I think it's because you insist on mispronouncing Snyk as Synic, and so you get me again.

Corey: Yeah, you could add a couple of new letters to it and just call the company [Synack 00:01:27]. Now, it's a hard pivot to a networking company. So, there's always options.

Clinton: I acknowledge what you did there, Corey.

Corey: I like that quite a bit. I wasn't sure you'd get it.

Clinton: I'm a nerd going way, way back, so we'll have to go pretty deep in the stack for you to stump me on some of this stuff.
Chalk another one up for me and the networking pun wars. Great, we'll loop back for that later.Clinton: I don't even know where I am right now.Corey: [laugh]. So, let's go back to a question that one would think that I'd already established a year ago, but I have the attention span of basically a goldfish, let's not kid ourselves. So, as I'm visiting the Snyk website, I find that it says different words than it did a year ago, which is generally a sign that is positive; when nothing's been updated including the copyright date, things are going really well or really badly. One wonders. But no, now you're talking about Snyk Cloud, you're talking about several other offerings as well, and my understanding of what it is you folks do no longer appears to be completely accurate. So, let me be direct. What the hell do you folks do over there?Clinton: It's a really great question. Glad you asked me on a year later to answer it. I would say at a very high level, what we do hasn't changed. However, I think the industry has certainly come a long way in the past couple years and our job is to adapt to that Snyk—again, pronounced like a pair of sneakers are sneaking around—it's a developer security platform. So, we focus on enabling the people who build applications—which as of today, means modern applications built in the cloud—to have better visibility, and ultimately a better chance of mitigating the risk that goes into those applications when it matters most, which is actually in their workflow.Now, you're exactly right. Things have certainly expanded in that remit because the job of a software engineer is very different, I think this year than it even was last year, and that's continually evolving over time. As a developer now, I'm doing a lot more than I was doing a few years ago. And one of the things I'm doing is building infrastructure in the cloud, I'm writing YAML files, I'm writing CloudFormation templates to deploy things out to AWS. 
And what happens in the cloud has a lot to do with the risk to my organization associated with those applications that I'm building.

So, I'd love to talk a little bit more about why we decided to make that move, but I don't think that represents a watering down of what we're trying to do at Snyk. I think it recognizes that the developer security vision fundamentally can't exist without some understanding of what's happening in the cloud.

Corey: One of the things that always scares me—and sets the spidey sense tingling—is when I see a company who has a product, and I'm familiar—ish—with what they do. And then they take their product name and slap the word cloud at the end, which almost always codes to, “Okay, so we took the thing that we sold in boxes in data centers, and now we're making a shitty hosted version available because it turns out you rubes will absolutely pay a subscription for it.” Yeah, I don't get the sense that at all is what you're doing. In fact, I don't believe that you're offering a hosted managed service at the moment, are you?

Clinton: No, the cloud part fundamentally refers to a new product, an offering that looks at the security, or potentially the risks, being introduced into cloud infrastructure by the engineers who are now the ones writing infrastructure as code. We previously had an infrastructure-as-code security product, and that served alongside our static analysis tool, which is Snyk Code, our open-source tool, our container scanner, recognizing that the kinds of vulnerabilities you can potentially introduce in writing cloud infrastructure are not only bad to the organization on their own—I mean, nobody wants to create an S3 bucket that's wide open to the world—but also, those misconfigurations can increase the blast radius of other kinds of vulnerabilities in the stack. So, I think what it does is it recognizes that, as you and I think your listeners well know, Corey, there's no such thing as the cloud, right?
The cloud is just a bunch of fancy software designed to abstract away from the fact that you're running stuff on somebody else's computer, right?

Corey: Unfortunately, in this case, the fact that you're calling it Snyk Cloud does not mean that you're doing what so many other companies in that same space do, which would have led to a really short interview because I have no faith that it's the right path forward, especially for you folks, where it's, “Oh, you want to be secure? You've got to host your stuff on our stuff instead. That's why we called it cloud.” That's the direction that I've seen a lot of folks try and pivot in, and I always find it disastrous. It's, “Yeah, well, at Snyk, if we run your code or your shitty applications here in our environment, it's going to be safer than if you run it yourself on something untested like AWS.” And yeah, those stories hold absolutely no water. And may I just say, I'm gratified that's not what you're doing?

Clinton: Absolutely not. No, I would say we have no interest in running anyone's applications. We do want to scan them though, right? We do want to give the developers insight into the potential misconfigurations, the risks, the vulnerabilities that you're introducing. What sets Snyk apart, I think, from others in that application security testing space is we focus on the experience of the developer, rather than just being another tool that runs and generates a bunch of PDFs and then throws them back to say, “Here's everything you did wrong.”

We want to say to developers, “Here's what you could do better. Here's how that default in a CloudFormation template that leads to your bucket being, you know, wide open on the internet could be changed.
Here's the remediation that you could introduce.” And if we do that at the right moment, which is inside that developer workflow, inside the IDE, on their local machine, before that gets deployed, there's a much greater chance that remediation is going to be implemented and it's going to happen much more cheaply, right? Because you no longer have to do the round trip all the way out to the cloud and back.

So, the cloud part of it fundamentally means completing that story, recognizing that once things do get deployed, there's a lot of valuable context that's happening out there that a developer can really take advantage of. They can say, “Wait a minute. Not only do I have a Log4Shell vulnerability, right, in one of my open-source dependencies, but that artifact, that application is actually getting deployed to a VPC that has ingress from the internet,” right? So, not only do I have remote code execution in my application, but it's being put in an enclave that actually allows it to be exploited. You can only know that if you're actually looking at what's really happening in the cloud, right?

So, not only does Snyk Cloud allow us to provide an additional layer of security by looking at what's misconfigured in that cloud environment and help your developers make remediations by saying, “Here's the actual IaC file that caused that infrastructure to come into existence,” but we can also say, here's how that affects the risk of other kinds of vulnerabilities at different layers in the stack, right? Because it's all software; it's all connected. Very rarely does a vulnerability translate one-to-one into risk, right? They're compound because modern software is compound. And I think what developers lack is the tooling that fits into their workflow that understands what it means to be a software engineer and actually helps them make better choices rather than punishing them after the fact for guessing and making bad ones.

Corey: That sounds awesome at a very high level.
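The CloudFormation misconfiguration Clinton describes can be made concrete with a minimal sketch: walk a parsed template and flag publicly readable S3 buckets, along with the one-line remediation. The resource and property names mirror real CloudFormation conventions, but the checker itself is a toy for illustration, not Snyk's implementation:

```python
# Toy IaC check: walk a parsed CloudFormation-style template (a dict, as
# yaml.safe_load would produce) and flag S3 buckets with a public ACL.
# Property names mirror CloudFormation; the checker is illustrative only.

def public_buckets(template):
    """Return (logical_id, advice) pairs for publicly readable buckets."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        acl = resource.get("Properties", {}).get("AccessControl", "Private")
        if acl in ("PublicRead", "PublicReadWrite"):
            findings.append((logical_id, f"AccessControl is {acl}; change it to Private"))
    return findings

template = {
    "Resources": {
        "AssetBucket": {"Type": "AWS::S3::Bucket",
                        "Properties": {"AccessControl": "PublicRead"}},
        "LogBucket": {"Type": "AWS::S3::Bucket",
                      "Properties": {"AccessControl": "Private"}},
    }
}

for logical_id, advice in public_buckets(template):
    print(f"{logical_id}: {advice}")
```

Catching this while the template is still in the editor is exactly the "right moment" Clinton is talking about: the fix is one line, and no round trip out to the cloud is needed.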
It is very aligned with how executives and decision-makers think about a lot of these things. Let's get down to brass tacks for a second. Assume that I am the type of developer that I am in real life, by which I mean shitty. What am I going to wind up attempting to do that Snyk will flag and, in other words, protect me from myself and warn me that I'm about to commit a dumb?

Clinton: First of all, I would say, look, there's no such thing as a non-shitty developer, right? And I built software for 20 years and I decided that's really hard. What's a lot easier is talking about building software for a living. So, that's what I do now. But fundamentally, the reason I'm at Snyk is I want to help people who are in the kinds of jobs that I had for a very long time, which is to say, you have a tremendous amount of anxiety because you recognize that the success of the organization rests on your shoulders, and you're making hundreds, if not thousands, of decisions every day without the right context to understand fully how the results of that decision are going to affect the organization that you work for.

So, I think every developer in the world has to deal with this constant cognitive dissonance of saying, “I don't know that this is right, but I have to do it anyway because I need to clear that ticket because that release needs to get into production.” And it becomes really easy to short-sightedly do things like pull an open-source dependency without checking whether it has any CVEs associated with it because that's the version that's easiest to implement with your code that already exists. So, that's one piece. Snyk Open Source is designed to traverse that entire tree of dependencies in open-source all the way down, all the hundreds and thousands of packages that you're pulling in, to say not only, here's a vulnerability that you should really know is going to end up in your application when it's built, but also here's what you can do about it, right?
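The dependency-tree traversal Clinton describes can be sketched in a few lines. The package names and the advisory record below are made up; a real tool would resolve the tree from a lockfile and match it against a vulnerability database:

```python
# Sketch of walking a transitive dependency tree and reporting the path
# to each vulnerable package, plus the suggested fix. All package names
# and the advisory data are invented for illustration.

deps = {
    "my-app": ["web-framework", "md-parser"],
    "web-framework": ["http-lib"],
    "md-parser": [],
    "http-lib": [],
}
advisories = {"http-lib": "CVE-XXXX-0001: upgrade to >=2.1.5"}

def vulnerable_paths(root, tree, advisories, path=()):
    """Yield (dependency chain, advice) for every vulnerable node reachable from root."""
    path = path + (root,)
    if root in advisories:
        yield " -> ".join(path), advisories[root]
    for child in tree.get(root, []):
        yield from vulnerable_paths(child, tree, advisories, path)

for chain, advice in vulnerable_paths("my-app", deps, advisories):
    print(chain, "|", advice)
```

The chain matters as much as the finding: it shows that the vulnerable package arrives transitively through `web-framework`, which is what tells you where the "minimum viable change" has to happen.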
Here's the upgrade you can make, here's the minimum viable change that actually gets you out of this problem, and to do so when it's in the right context, which is, you know, as you're making that decision for the first time, right, inside your developer environment.

That also applies to things like container vulnerabilities, right? I have even less visibility into what's happening inside a container than I do inside my application. Because I know, say, I'm using an Ubuntu or a Red Hat base image. I have no idea what are all the Linux packages that are on it, let alone what are the vulnerabilities associated with them, right? So, being able to detect, I've got a version of OpenSSL 3.0 that has a potentially serious vulnerability associated with it before I've actually deployed that container out into the cloud very much helps me as a developer.

Because I'm limiting the rework or the refactoring I would have to do by otherwise assuming I'm making a safe choice or guessing at it, and then only finding out after I've written a bunch more code that relies on that decision that I have to go back and change it, and then rewrite all of the things that I wrote on top of it, right? So, it's identifying the layer in the stack where that risk could be introduced, and then also seeing how it's affected by all of those other layers, because modern software is inherently complex. And that complexity is what drives both the risk associated with it and also things like efficiency, which I know your audience is, for good reason, very concerned about.

Corey: I'm going to challenge you on an aspect of this because on the tin, the way you describe it, it sounds like, “Oh, I already have something that does that.
It's the GitHub Dependabot story where it winds up sending me a litany of complaints every week.” And we are talking, if I did nothing other than read this email in that day, that would be a tremendously efficient processing of that entire thing because so much of it is stuff that is ancient and archived, and specific aspects of the vulnerabilities are just not relevant. And you talk about the OpenSSL 3.0 issues that just recently came out.

I have no doubt that somewhere in the most recent email I've gotten from that thing, it's buried two-thirds of the way down, like all the complaints like the dishwasher isn't loaded, you forgot to take the trash out, that baby needs a change, the kitchen is on fire, and the vacuuming, and the r—wait, wait. What was that thing about the kitchen? Seems like one of those things is not like the others. And it just gets lost in the noise. Now, I will admit to putting my thumb a little bit on the scale here because I've used Snyk before myself and I know that you don't do that. How do you avoid that trap?

Clinton: Great question. And I think really, the key to the story here is, developers need to be able to prioritize, and in order to prioritize effectively, you need to understand the context of what happens to that application after it gets deployed. And so, this is a key part of why getting the data out of the cloud and bringing it back into the code is so important. So, for example, take an OpenSSL vulnerability. Do you have it on a container image you're using, right? So, that's question number one.

Question two is, is there actually a way that code can be accessed from the outside? Is it included or is it called? Is the method activated by some other package that you have running on that container? Is that container image actually used in a production deployment? Or does it just go sit in a registry and no one ever touches it?

What are the conditions required to make that vulnerability exploitable?
You look at something like Spring Shell, for example: yes, you need a certain version of spring-beans in a JAR file somewhere, but you also need to be running a certain version of Tomcat, and you need to be packaging those JARs inside a WAR in a certain way.

Corey: Exactly. I have a whole bunch of Lambda functions that provide the pipeline system that I use to build my newsletter every week, and I get screaming concerns about issues in, for example, a version of the markdown parser that I've subverted. Yeah, sure. I get that, on some level, if I were just giving it random untrusted input from the internet and random ad hoc users, but I'm not. It's just me when I write things for that particular Lambda function.

And I'm not going to be actively attempting to subvert the thing that I built myself and no one else should have access to. And looking through the details of some of these things, it doesn't even apply to the way that I'm calling the libraries, so it's just noise, for lack of a better term. It is not something that basically ever needs to be adjusted or fixed.

Clinton: Exactly. And I think cutting through that noise is so key to creating developer trust in any kind of tool that scans an asset and provides you with what are, in theory, a list of actionable steps, right? I need to be able to understand what is the thing, first of all. There are a lot of tools that do that, right, and we tend to mock them by saying things like, “Oh, it's just another PDF generator. It's just another thousand pages that you're never going to read.”

So, getting the information in the right place is a big part of it, but so is filtering out all of the noise by saying, we looked at not just one layer of the stack, but multiple layers, right? We know that you're using this open-source dependency, and we also know that the method that contains the vulnerability is actively called by your application in your first-party code, because we ran our static analysis tool against that.
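As an editorial aside, the multi-signal triage Clinton is describing (is the package present, is the vulnerable method reachable, does the workload actually run, is it exposed) can be sketched in a few lines of Python. Everything here is invented for illustration; the field names and severity buckets are not Snyk's actual API or scoring model.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    package: str
    vulnerable_function_called: bool  # static analysis: is the vulnerable method reachable?
    image_deployed: bool              # is the container image actually running anywhere?
    internet_exposed: bool            # cloud context: is there an inbound path to the service?

def priority(f: Finding) -> str:
    """Combine signals from code, container, and cloud into one triage bucket."""
    signals = sum([f.vulnerable_function_called, f.image_deployed, f.internet_exposed])
    if signals == 3:
        return "critical"  # reachable, running, and exposed: fix now
    if signals == 2:
        return "high"
    if signals == 1:
        return "medium"
    return "low"           # e.g., sitting in a registry image nobody runs

findings = [
    Finding("openssl@3.0.0", True, True, True),
    Finding("spring-beans@5.3.0", False, True, False),
]
for f in findings:
    print(f.package, priority(f))  # prints: openssl@3.0.0 critical / spring-beans@5.3.0 medium
```

The point of the sketch is that no single scanner output decides the bucket; only the combination of layers does.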
Furthermore, we know, because we looked at your cloud context—we connected to your AWS API; we're big partners with AWS and very proud of that relationship—that there's inbound internet access available to that service, right? So, you start to build a compound case that maybe this is something that should be prioritized, right? Because there's a way into the asset from the outside world, there's a way into the vulnerable functions through the labyrinthine, you know, spaghetti of my code to get there, and the conditions required to exploit it actually exist in the wild.

But you can't just run a single tool; you can't just run Dependabot to get that prioritization. You actually have to look at the entire holistic application context, which includes not just your dependencies, but what's happening in the container, what's happening in your first-party, your proprietary code, what's happening in your IaC, and I think most importantly for modern applications, what's actually happening in the cloud once it gets deployed, right? And that's sort of the holy grail of completing that loop: bringing the right context back from the cloud into code to understand what change needs to be made, and where, and most importantly why. Because it's a priority that actually translates into organizational risk to get a developer to pay attention, right? I mean, that is the key to, I think, any security concern: how do you get engineering mindshare and trust that this is actually what you should be paying attention to, and not a bunch of rework that doesn't actually make your software more secure?

Corey: One of the challenges that I see across the board is that—well, let's back up a bit here. I have in previous episodes talked in some depth about my position that when it comes to the security of various cloud providers, Google is number one, and AWS is number two. Azure is a distant third because it's busy figuring out which Crayons taste the best; I don't know.
But the reason is not because of any inherent attribute of their security models, but rather that Google massively simplifies an awful lot of what happens. It automatically assumes that resources in the same project should be able to talk to one another, so I don't have to painstakingly configure that.

In AWS-land, all of this must be done explicitly; no one has time for that, so we over-scope permissions massively and never go back and rein them in. It's a configuration vulnerability more than an underlying inherent weakness of the platform. Because complexity is the enemy of security in many respects. If you can't fit it all in your head to reason about it, how can you understand the security ramifications of it? AWS offers a tremendous number of security services. Many of them, when taken in some totality of their pricing, cost more than any breach they could be expected to prevent. Adding more stuff that adds more complexity in the form of Snyk sounds like it's the exact opposite of what I would want to do. Change my mind.

Clinton: I would love to. I would say, fundamentally, I think you and I—and by ‘I,' I mean Snyk and, you know, Corey Quinn Enterprises Limited—I think we fundamentally have the same enemy here, right, which is the cyclomatic complexity of software, right, which is how many different pathways the bits have to travel down to reach the same endpoint, right, the same goal. The more pathways there are, the more risk is introduced into your software, and the more inefficiency is introduced, right? And then I know you'd love to talk about how many different ways there are to run a container on AWS, right? It's either 30 or 400 or eleventy-million.

I think you're exactly right that that complexity is great for, first of all, selling cloud resources, but also, I think, for innovating, right, for building new kinds of technology on top of that platform. The cost that comes along with that is a lack of visibility.
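To make Corey's over-scoping point concrete, here is a toy checker that flags the wildcard grants that tend to accumulate and never get reined in. It is a sketch, not a real IAM analyzer (tools like AWS IAM Access Analyzer do this properly), and the policy shown is a made-up example.

```python
def overscoped_statements(policy: dict) -> list:
    """Flag Allow statements that use '*' actions or resources.

    This catches only the crudest over-scoping pattern; real policy
    analysis also has to reason about conditions, NotAction, and so on.
    """
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if any(a == "*" or a.endswith(":*") for a in actions) or "*" in resources:
            flagged.append(stmt)
    return flagged

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}
print(len(overscoped_statements(policy)))  # prints 1
```

The first statement is scoped to one action on one bucket and passes; the second is the kind of grant that never gets revisited.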
And I think we are just now, as we approach the end of 2022 here, coming to recognize that fundamentally, the complexity of modern software is beyond the ability of a single engineer to understand. And that is really important from a security perspective, from a cost control perspective, especially because software now creates its own infrastructure, right? You can't just secure the artifact and secure the perimeter that it gets deployed into and say, “I've done my job. Nobody can breach the perimeter and there are no vulnerabilities in the thing because we scanned it, and that thing is immutable forever because it's pets, not cattle.”

Where I think the complexity story comes in is to recognize, like, “Hey, I'm deploying this based on a quickstart or CloudFormation template that is making certain assumptions that make my job easier,” right, in a very similar way that choosing an open-source dependency makes my job easier as a developer, because I don't have to write all of that code myself. But what it does mean is I lack the visibility into, well, hold on. How many different pathways are there for getting things done inside this dependency? How many other dependencies are brought on board? In the same way that when I create an EKS cluster, for example, from a CloudFormation template: what is it creating in the background? How many VPCs are involved? What are the subnets, right? How are they connected to each other? Where are the potential ingress points?

So, I think fundamentally, getting visibility into that complexity is step number one, but understanding those pathways and how they could potentially translate into risk is critically important. But that prioritization has to involve looking at the software holistically and not just individual layers, right?
I think we lose when we say, “We ran a static analysis tool and an open-source dependency scanner and a container scanner and a cloud config checker, and they all came up green, therefore the software doesn't have any risks,” right? That ignores the fundamental complexity in that all of these layers are connected together. And from an adversary's perspective, if my job is to go in and exploit software that's hosted in the cloud, I absolutely do not see the application model that way.

I see it as inherently complex, and that's a good thing for me, because it means I can rely on the fact that those engineers had tremendous anxiety, were making a lot of guesses, and crossing their fingers and hoping something would work and not be exploitable by me, right? So, the only way I think we get around that is to recognize that our engineers are critical stakeholders in that security process, and you fundamentally lack that visibility if you don't do your scanning until after the fact. If you take that traditional audit-based approach that assumes a very waterfall, legacy approach to building software, you miss that, hey, we're all on this infinite loop race track now. We're deploying every three-and-a-half seconds, everything's automated, it's all built at scale, but the ability to do that inherently implies all of this additional complexity that ultimately will, you know, end up haunting me, right, if I don't do anything about it to make my engineers stakeholders in, you know, what actually gets deployed and what risks it brings on board.

Corey: This episode is sponsored in part by our friends at Uptycs. Attackers don't think in silos, so why would you have siloed solutions protecting cloud, containers, and laptops distinctly? Meet Uptycs, the first unified solution that prioritizes risk across your modern attack surface—all from a single platform, UI, and data model. Stop by booth 3352 at AWS re:Invent in Las Vegas to see for yourself and visit uptycs.com.
That's U-P-T-Y-C-S.com. My thanks to them for sponsoring my ridiculous nonsense.

Corey: When I wind up hearing you talk about this—I'm going to divert us a little bit, because you're dancing around something that it took me a long time to learn. When I first started fixing AWS bills for a living, I thought that it would be mostly math, by which I mean arithmetic. That's the great secret of cloud economics. It's addition, subtraction, and occasionally multiplication and division. No, it turns out it's much more psychology than it is math. You're talking in many aspects about, I guess, what I'd call the psychology of a modern cloud engineer and how they think about these things. It's not a technology problem. It's a people problem, isn't it?

Clinton: Oh, absolutely. I think it's the people that create the technology. And I think the longer you persist in what we would call the legacy viewpoint, right, not recognizing what the cloud is—which is fundamentally just software all the way down, right? It is abstraction layers that allow you to ignore the fact that you're running stuff on somebody else's computer—once you recognize that, you realize, oh, if it's all software, then the problems that it introduces are software problems that need software solutions, which means that it must involve activity by the people who write software, right? So, now that you're in that developer world, it unlocks, I think, a lot of potential to ask, well, why don't developers tend to trust the security tools they've been provided with, right?

I think a lot of it comes down to the question you asked earlier in terms of the noise, the lack of understanding of how those pieces are connected together, or the lack of context, or not even, frankly, caring about looking beyond the single-point solution of the problem that solution was designed to solve.
But more importantly than that, not recognizing what it's like to build modern software, right: all of the decisions that have to be made on a daily basis with very limited information, right? I might not even understand where that container image I'm building is going in the universe, let alone what's being built on top of it and how much critical customer data is being touched by the database that that container now has the credentials to access, right? So, I think in order to change anything, we have to back way up and say, problems in the cloud are software problems, and we have to treat them that way.

Because if we don't, if we continue to represent the cloud as some evolution of the old environment, where you just have this perimeter of pre-existing infrastructure that you're deploying things onto, and there's a guy with a neckbeard in the basement who is unplugging cables from a switch and plugging them back in, and that's how networking problems are solved, I think you miss the idea that all of these abstraction layers introduce the very complexity that needs to be solved back in the build space. But that requires visibility into what actually happens when it gets deployed. The way I tend to think of it is, there's this firewall in place. Everybody wants to say, you know, we're doing DevOps or we're doing DevSecOps, right? And that's a lie a hundred percent of the time, right? No one is actually, I think, adhering completely to those principles.

Corey: That's why one of the core tenets of ClickOps is lying about doing anything in the console.

Clinton: Absolutely, right? And that's why shadow IT becomes more and more prevalent the deeper you get into modern development, not less and less prevalent, because it's fundamentally hard to recognize the entirety of the potential implications, right, of a decision that you're making. So, it's a lot easier to just go in the console and say, “Okay, I'm going to deploy one EC2 to do this.
I'm going to get it right at some point.” And that's why every application that's ever been produced by human hands has a comment in it that says something like, “I don't know why this works, but it does. Please don't change it.”

And then three years later, because that developer has moved on to another job, someone else comes along, looks at that comment, and says, “That should really work. I'm going to change it.” And they do, and everything fails, and they have to go back and fix it the original way, and then add another comment saying, “Hey, the person above me, they were right. Please don't change this line.” I think every engineer listening right now knows exactly where that weak spot is in the applications that they've written, and they're terrified of that.

And I think any tool that's designed to help developers fundamentally has to get into the mindset, get into the psychology of what that is, like, of not fundamentally being able to understand what those applications are doing all of the time, but having to write code against them anyway, right? And that's what leads to, I think, the fear that you're going to get woken up because your pager is going to go off at 3 a.m. because the building is literally on fire, and it's because of code that you wrote. We have to solve that problem, and it has to be those people whose psychology we get into to understand: how are you working, and how can we make your life better, right? And I really do think it comes with the noise reduction, the understanding of complexity, and really just being humble and saying, like, “We get that this job is really hard, and that the only way it gets better is to begin admitting that to each other.”

Corey: I really wish that there were a better way to articulate a lot of these things. This is the reason that I started doing a security newsletter; it's because cost and security are deeply aligned in a few ways.
One of them is that you care about them a lot right after you failed to care about them sufficiently, but the other is that you've got to build guardrails in such a way that doing the right thing is easier than doing it the wrong way, or you're never going to gain any traction.

Clinton: I think that's absolutely right. And you used the key term there, which is guardrails. And I think that's where, in their heart of hearts, every security professional wants to be, right? They want to be defining policy, they want to be understanding the risk posture of the organization and nudging it in a better direction, right? They want to be talking up to the board, to the executive team, and creating confidence in that risk posture, rather than talking down or off to the side—depending on how that org chart looks—to the engineers and saying, “Fix this, fix that, and then fix this other thing.” A, B, and C, right?

I think the problem is that everyone in a security role in an organization of any size at this point is doing 90% of the latter and only about 10% of the former, right? They're acting as gatekeepers, not as guardrails. They're not defining policy; they're spending all of their time creating Jira tickets, and all of their time tracking down who owns the piece of code that got deployed to this pod on EKS that's throwing all these errors on my console, and how can I get that person to make a decision, to actually take an action that stops these notifications from happening, right? So, all they're doing is throwing footballs down the field without knowing if there's a receiver there, right, and I think that takes away from the job that our security analysts really should be doing, which is creating those guardrails, which is having confidence that the policy they set is readily understood by the developers making decisions, and that's happening in an automated way without them having to create friction by bothering people all the time.
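One hedged sketch of what "guardrails, not gatekeepers" can mean in practice: policy checks that run automatically and hand the developer a ready-to-apply remediation instead of a ticket. The rule shape, config keys, and fix strings below are invented for illustration; this is not Snyk output, Terraform syntax, or any real policy engine.

```python
# A guardrail here is a policy check paired with a concrete fix, so the
# secure path is also the easy path. All names are hypothetical.
GUARDRAILS = [
    {
        "id": "bucket-encryption",
        "applies": lambda cfg: cfg.get("kind") == "bucket",
        "ok": lambda cfg: cfg.get("encryption") == "aws:kms",
        "fix": 'add: encryption = "aws:kms"',
    },
    {
        "id": "bucket-public-access",
        "applies": lambda cfg: cfg.get("kind") == "bucket",
        "ok": lambda cfg: not cfg.get("public", False),
        "fix": "add: public = false",
    },
]

def evaluate(cfg: dict) -> list:
    """Return (rule id, suggested fix) pairs rather than bare findings."""
    return [(g["id"], g["fix"])
            for g in GUARDRAILS
            if g["applies"](cfg) and not g["ok"](cfg)]

cfg = {"kind": "bucket", "name": "logs", "public": True}
for rule_id, fix in evaluate(cfg):
    print(rule_id, "->", fix)
```

The design choice mirrors the conversation: the security team writes the rules once, and the developer sees the fix in their own environment instead of a finding in someone else's dashboard.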
I don't think security people want to be [laugh] hated by the development teams that they work with, but they are. And the reason they are is, I think, fundamentally, we lack the tooling, we lack—

Corey: They are the barrier method.

Clinton: Exactly. And we lack the processes to get the right intelligence in a way that's consumable by the engineers when they're doing their job, and not after the fact, which is typically when the security people have done their jobs.

Corey: It's sad but true. I wish that there were a better way to address these things, and yet here we are.

Clinton: If only there were a better way to address these things.

Corey: [laugh].

Clinton: Look, I wouldn't be here at Snyk if I didn't think there were a better way, and I wouldn't be coming on shows like yours to talk to the engineering communities, right, people who have walked the walk, right, who have built those Terraform files that contain these misconfigurations, not because they're bad people, or because they're lazy, or because they don't do their jobs well, but because they lacked the visibility, they didn't have the understanding that that default is actually insecure. Because how would I know that otherwise, right? I'm building software; I don't see myself as an expert on infrastructure, right, or on Linux packages, or on cyclomatic complexity, or on any of these other things. I'm just trying to stay in my lane and do my job. It's not my fault that the software has become too complex for me to understand, right?

But my management doesn't understand that, and so I constantly have white knuckles worrying that, you know, the next breach is going to be my fault. So, I think the way forward really has to be: how do we make our developers stakeholders in the risk being introduced by the software they write to the organization?
And that means everything we've been talking about: it means prioritization; it means understanding how the different layers of the stack affect each other, especially the cloud pieces; it means an extensible platform that lets me write code against it to inject my own reasoning, right? The piece that we haven't talked about here is that risk calculation doesn't just involve technical aspects; there's also business intelligence that's involved, right? What are my critical applications, right? What actually causes me to lose significant amounts of money if those services go offline?

We at Snyk can't tell that. We can't run a scanner to say these are your crown-jewel services that can't ever go down, but you can know that as an organization. So, where we're going with the platform is opening up the extensible process, creating APIs for you to be able to affect that risk triage, right, so that as the creators of guardrails, as the security team, you are saying, “Here's how we want our developers to prioritize.
Here are all of the factors that go into that decision-making.” And then you can be confident that in their environment, back over in developer-land, when I'm looking at IntelliJ, or, you know, on my local command line, I am seeing the guardrails that my security team has set for me, and I am confident that I'm fixing the right thing, and frankly, I'm grateful, because I'm fixing it at the right time, and I'm doing it in such a way and with a toolset that actually is helping me fix it, rather than just telling me I've done something wrong, right? Because everything we do at Snyk focuses on identifying the solution, not necessarily identifying the problem.

It's great to know that I've got an unencrypted S3 bucket, but it's a whole lot better if you give me the line of code and tell me exactly where I have to copy and paste it so I can go on to the next thing, rather than spending an hour trying to figure out, you know, where I put that line and what I actually have to change it to, right? I often say that the most valuable currency for a developer, for a software engineer, isn't money, isn't time, isn't compute power or anything like that; it's the right context, right? I actually have to understand the implications of the decision that I'm making, and I need that to be in my own environment, not after the fact, because that's what creates friction within an organization: when I could have known earlier and I could have known better, but instead, I had to guess, I had to write a bunch of code that relies on the thing that was wrong, and now I have to redo it all for no good reason other than the tooling just hadn't adapted to the way modern software is built.

Corey: So, one last question before we wind up calling it a day here.
We are now heavily into what I will term pre:Invent, where we're starting to see a whole bunch of announcements come out of the AWS universe in preparation for what I'm calling Crappy Cloud Hanukkah this year, because I'm spending eight nights in Las Vegas. What are you doing these days with AWS specifically? I know I keep seeing your name in conjunction with their announcements, so there's something going on over there.

Clinton: Absolutely. No, we're extremely excited about the partnership between Snyk and AWS. Our vulnerability intelligence is utilized as one of the data sources for AWS Inspector, particularly around open-source packages. We're doing a lot of work around things like the code suite, building Snyk into CodePipeline, for example, to give developers using that code suite earlier visibility into those vulnerabilities. And really, I think the story kind of expands from there, right?

So, we're moving forward with Amazon, recognizing that it is, you know, sort of the de facto. When we say cloud, very often we mean AWS. So, we're going to have a tremendous presence at re:Invent this year; I'm going to be there as well. I think we're actually going to have a bunch of handouts with your face on them, is my understanding. So, please stop by the booth; we'd love to talk to folks, especially because we've now released the Snyk Cloud product and really completed that story. So, anything we can do to talk about how that additional context of the cloud helps engineers, because it's all software all the way down, those are absolutely conversations we want to be having.

Corey: Excellent. And we will, of course, put links to all of these things in the [show notes 00:35:00] so people can simply click, and there they are. Thank you so much for taking all this time to speak with me. I appreciate it.

Clinton: All right. Thank you so much, Corey. Hope to do it again next year.

Corey: Clinton Herget, Field CTO at Snyk.
I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment telling me that I'm being completely unfair to Azure, along with your favorite-tasting color of Crayon.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Non-Magical Approach to Cloud-Based Development with Chen Goldberg

Nov 15, 2022 · 40:13


About Chen

Chen Goldberg is GM and Vice President of Engineering at Google Cloud, where she leads the Cloud Runtimes (CR) product area, helping customers deliver greater value, effortlessly. The CR portfolio includes both Serverless and Kubernetes based platforms on Google Cloud, private cloud and other public clouds. Chen is a strong advocate for customer empathy, building products and solutions that matter. Chen has been core to Google Cloud's open core vision since she joined the company six years ago. During that time, she has led her team to focus on helping development teams increase their agility and modernize workloads. Prior to joining Google, Chen wore different hats in the tech industry, including leadership positions in IT organizations, SI teams and SW product development, contributing to Chen's broad enterprise perspective. She enjoys mentoring IT talent both in and outside of Google. Chen lives in Mountain View, California, with her husband and three kids. Outside of work she enjoys hiking and baking.

Links Referenced:

Twitter: https://twitter.com/GoldbergChen

LinkedIn: https://www.linkedin.com/in/goldbergchen/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.

Basically, you're SSHing the same way you manage access to your app. What's the benefit here?
Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive?

Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. When I get bored and the power goes out, I find myself staring at the ceiling, figuring out how best to pick fights with people on the internet about Kubernetes. Because, well, I'm basically sad and have a growing collection of personality issues. My guest today is probably one of the best people to have those arguments with. Chen Goldberg is the General Manager of Cloud Runtimes and VP of Engineering at Google Cloud. Chen, thank you for joining me today.

Chen: Thank you so much, Corey, for having me.

Corey: So, Google has been doing a lot of very interesting things in the cloud, and the more astute listener will realize that interesting is not always necessarily a compliment. But from where I sit, I am deeply invested in the idea of a future where we do not have a cloud monoculture. As I've often said, I want, “What cloud should I build something on in five to ten years?” to be a hard question to answer, and not just because everything is terrible. I think that Google Cloud is absolutely a bright light in the cloud ecosystem and has been for a while, particularly with this emphasis around developer experience. All of that said, Google Cloud is sort of a big, unknowable place, at least from the outside. What is your area of responsibility? Where do you start? Where do you stop? In other words, what can I blame you for?

Chen: Oh, you can blame me for a lot of things if you want to. I [laugh] might not agree with that, but that's—

Corey: We strive for accuracy in these things, though.

Chen: But that's fine.
Well, first of all, I joined Google about seven years ago to lead the Kubernetes and GKE team, and ever since, I've continued in the same area. So, evolving, of course, Kubernetes and Google Kubernetes Engine, and leading our hybrid and multi-cloud strategy as well with technologies like Anthos. And now I'm responsible for the entire container runtime, which includes Kubernetes and the serverless solutions.

Corey: A while back, I, in fairly typical sarcastic form, wound up doing a whole inadvertent start of a meme where I joked about there being 17 ways to run containers on AWS. And then as that caught on, I wound up listing out 17 services you could use to do that. A few months went past and then I published a sequel of 17 more services you can use to run Kubernetes. And while that was admittedly tongue-in-cheek, it does lead to an interesting question that's ecosystem-wide. If I look at Google Cloud, I have Cloud Run, I have GKE, I have GCE if I want to do some work myself.

It feels like more and more services are supporting Docker in a variety of different ways. How should customers and/or people like me—though, I am sort of a customer as well, since I do pay you folks every month—how should we think about containers and services in which to run them?

Chen: First of all, I think there's a lot of credit that needs to go to Docker, which made containers approachable. And so, Google has been running containers forever. Everything within Google is running on containers, even our VMs, even our cloud is running on containers, but what Docker did was create a packaging mechanism to improve developer velocity. So, that on its own is great. And one of the things, by the way, that I love about Google Cloud's approach to containers and Docker is that yes, you can take your Docker container and run it anywhere.

And it's actually really important to ensure what we call interoperability, or a low barrier to entry to a new technology.
So, I can take my Docker container, I can move it from one platform to another, and so on. So, that's just to start with on containers. Between the different solutions: first of all, I'm all about managed services. You are right, there are many ways to run Kubernetes. I'm taking a lot of pride—

Corey: The best way is always to have someone else run it for you. Problem solved. Great, the best kind of problems are always someone else's.

Chen: Yes. And I'm taking a lot of pride in what our team is doing with Kubernetes. I mean, we've been working on that for so long. And it's something that, you know, we've coined that term, I think back in 2016, so there is a success disaster, but there's also what we call sustainable success. So, thinking about how to set ourselves up for success and scale. Very proud of that service.

Saying that, not everybody, and not for all your workloads, do you need the flexibility that Kubernetes and all its ecosystem give you. So, if you're starting with containers for the first time, you should start with Cloud Run. It's the easiest way to run your containers. That's one. If you are already in love with Kubernetes, we won't take it away from you. Start with GKE. Okay [laugh]? Go all-in. Okay, we are all in loving Kubernetes as well. But what my team and I are working on is to make sure that those will work really well together. And we actually see a lot of customers do that.

Corey: I'd like to go back a little bit in history to the rise of Docker. I agree with you it was transformative, but containers had been around in various forms—depending upon how you want to define it—dating back to the '70s with logical partitions on mainframes. Well, is that a container? Is it not? Well, sort of. We'll assume yes for the sake of argument.

The revelation that I found from Docker was the developer experience, start to finish.
Suddenly, it was a couple of commands and you were just working, where previously it had taken tremendous amounts of time and energy to get containers working in that same context. And I don't even know today whether the right way to contextualize containers is as sort of a lite version of a VM, as a packaging format, or as any number of other things you could reasonably call them. How do you think about containers?

Chen: So, I'm going to do, first of all, a small [unintelligible 00:06:31]. I actually started my career as a mainframe systems engineer—

Corey: Hmm.

Chen: And I will share that when I learned Kubernetes, I was like, “Huh, we already have done all of that, in orchestration, in workload management, on the mainframe,” just as an aside. The way I think about containers is as two things: one, it is a packaging of an application, but the other thing, which is also critical, is the decoupling between your application and the OS. So, having that kind of abstraction allows you to be portable and move between environments. Those are the two things I think about with containers. And what technologies like Kubernetes and serverless give on top of that is the manageability: making sure that we take care of everything else that is needed for you to run your application.
Today, if I go out and want to go back to my days of running production services in anger—and by ‘anger,' I of course mean in production—then it would be difficult for me to find a role that did not at least touch upon Kubernetes. But people who can work with that technology effectively are in high demand and they tend to be expensive. Not to mention, when you think about all of the intricacies and complexities that Kubernetes brings to the foreground, that is what doesn't feel sustainable to me. The idea that it's going to have to collapse down into something else is, by necessity, going to have to emerge. How are you seeing that play out? And also, feel free to disagree with the prediction. I am thrilled to wind up being told that I'm wrong; it's how I learn the most.

Chen: I don't know if I agree with the time horizon of when that will happen, but I would actually think it's a failure on us if that doesn't become the truth: that the majority of people will not need to know about Kubernetes and its internals. And you know, we keep saying that, like, hey, we need to make it more, like, boring and easy, and I just said, “Hey, you should use managed.” And we have lots of customers that say they're just using GKE and it scales on their behalf and they don't need to do anything for that and it's just like magic. But from a technology perspective, there is still a way to go until we can make that disappear.

And there will be two things that will push us in that direction. One is—you mentioned this as well—the talent shortage is real. All the customers I speak with, even if they can find those great people who are experts, there are actually more interesting things for them to work on, okay? You don't need to take, like, all the people in your organization and put them on building the infrastructure. You don't care about that. You want to build innovation and promote your business.

So, that's one.
The second thing is that I do expect that the technology will continue to evolve and our managed solutions will be better and better. So hopefully, with these two things happening together, people will not care that what's under the hood is Kubernetes. Or maybe not even, right? I don't know exactly how things will evolve.

Corey: From where I sit, the early criticisms I had about Docker, which I guess translate pretty well to Kubernetes, are that they solve a few extraordinarily painful problems. In the case of Docker, it was, “Well, it works on my machine.” As a grumpy sysadmin, the way I used to be, the only real response we had to that was, “Well. Time to back up your email, Skippy, because your laptop is going into production, then.” Now, you can effectively have a high-fidelity copy of production basically anywhere, and we've solved the problem of making your Mac laptop look like a Linux server. Great, okay, awesome.

With Kubernetes, it also feels, on some level, like it solves for very large-scale, Google-type problems where you want to run things across at least a certain point of scale. It feels like even today, it suffers from lacking an easy Hello World-style application to deploy on top of it. Using it for WordPress, or some other form of blogging software, for example, is stupendous overkill as far as the Hello World story tends to go. Increasingly, as a result, it feels like it's great for large-scale enterprise-y applications, but what's the getting-started story of how do I have a service I could reasonably run in production? How do I contextualize that in the world of Kubernetes? How do you respond to that type of perspective?

Chen: We'll start with maybe a short story. I started my career in the Israeli army. I was head of a department in one of the lead technology units, and I was responsible for building a PaaS. In essence, it was 20-plus years ago, so we didn't really call it a PaaS, but that's what it was.
And then at some point, it was amazing: developers were very productive, and we got innovation again and again. And then there was some new innovation, just at the beginning of the web [laugh], at some point.

And it was actually—so, two things I noticed back then. One, it was really hard to evolve the platform to allow new technologies and innovation. And the second thing, from a developer perspective, it was like a black box. The other development teams couldn't really troubleshoot the environment; they were not empowered to make decisions or [unintelligible 00:12:29] in the platform. And you know, when we just started with Kubernetes—by the way, in the beginning, it only supported 100 nodes, and then 1000 nodes. Okay, it was actually not about scale; it actually solved those two problems, which I'm—this is where I spend most of my time.

So, the first one: we don't want magic, okay? To be clear on, like, what's happening, I want to make sure that things are consistent and that I can get the right observability. So, that's one. The second thing is that we invested so much in the extensibility and the environment that it's, I wouldn't say easy, but doable to evolve Kubernetes. You can change the models, you can extend it, you can—there is an ecosystem.

And you know, when we were building it, I remember I used to tell my team, there won't be a Kubernetes 2.0. Which, for a developer, is [laugh] frightening. But if you think about it and you prepare for that, you're like, “Huh. Okay, what does that mean for how I build my APIs? What does that mean for how we build the system?” So, that was one. The second thing I keep telling my team: “Please don't get too attached to your code, because if it's still there in 5, 10 years, we did something wrong.”

And you can see areas within Kubernetes, again, all the extensions. I'm very proud of all the interfaces that we've built, but let's take networking.
This keeps evolving all the time in the API and the surface area that allows us to introduce new technologies. I love it. So, those are the two things that have nothing to do with scale, are unique to Kubernetes, and I think are very empowering and critical for its success.

Corey: One thing that you said that resonates most deeply with me is the idea that you don't want there to be magic, where I just hand it to this thing and it runs it as if by magic. Because, again, we've all run things in anger in production, and what happens when the magic breaks? When you're sitting around scratching your head with no idea how it starts or how it stops, that is scary. I mean, I recently wound up re-implementing Google Cloud Distinguished Engineer Kelsey Hightower's “Kubernetes the Hard Way” because he gave a terrific tutorial that I ran through in about 45 minutes on top of Google Cloud. It's like, “All right, how do I make this harder?”

And the answer is to do it on AWS, re-implement it there. And my experiment there can be found at kubernetesthemuchharderway.com because I have a vanity domain problem. And it taught me an awful lot, but one of the challenges I had as I went through that process was, at one point, the nodes were not registering with the controller.

And I ran out of time that day and turned everything off—because surprise bills are kind of what I spend my time worrying about—turned it on the next morning to continue, and then it just worked. And that was sort of the spidey-sense-tingling moment of, “Okay, something wasn't working and now it is, and I don't understand why. But I just rebooted it and it started working.” Which is terrifying in the context of a production service.
It was understandable—kind of—and I think that's the sort of thing you understand a lot better the more you work with it in production. But a counterargument to that is—and I've talked about it on this show before—for this podcast, I wind up having sponsors from time to time who want to give me fairly complicated links to go check them out, so I have the snark.cloud URL redirector.

That's running as a production service on top of Google Cloud Run. It took me half an hour to get that thing up and running; I haven't had to think about it since, aside from a three-second latency that was driving me nuts and turned out to be a sleep hidden in the code, which I can't really fault Google Cloud Run for so much as my crappy nonsense. But it just works. It's clearly running atop Kubernetes, but I don't have to think about it. That feels like the future. It feels like a glimpse of a world to come that we're just starting to dip our toes into. That, at least to me, feels like a lot more of the abstractions being collapsed into something easily understandable.

Chen: [unintelligible 00:16:30], I'm happy you say that. When talking with customers and showing them, like, you know, yes, they're all in on Kubernetes, and talking about Cloud Run and serverless, I feel there is a confidence level that they need to overcome. And that's why it's really important for us in Google Cloud to make sure that you can mix and match. Because sometimes, you know, a big retail customer of ours, some of their teams, it's really important for them to use a Kubernetes-based platform because they have their workloads also running on-prem and they want to use the same playbooks, for example, right? How do I address issues, how do I troubleshoot, and so on?

So, that's one set of things. But some want cloud only, as simple as possible. So, can I use both of them and still have a similar developer experience, and so on? So, I do think that we'll see more of that in the coming years.
And as the technology evolves, we'll have more and more, of course, serverless solutions.

By the way, it doesn't end there. Like, we also see, you know, databases and machine learning, and like, there are so many more managed services that are making things easy. And that's what excites me. I mean, that's what's awesome about what we're doing in cloud. We are building platforms that enable innovation.

Corey: I think that there's an awful lot of power behind unlocking innovation from a customer perspective. The idea that I can use a cloud provider to wind up doing an experiment to build something in the course of an evening, and if it works, great, I can continue to scale up without having to replace, you know, the crappy Raspberry Pi-level hardware in my spare room with serious enterprise servers in a data center somewhere. The on-ramp and the capability and the lack of long-term commitments are absolutely magical. What I'm also seeing that is contributing to that is the de facto standard that's emerged of most things these days supporting Docker, for better or worse. There are many open-source tools that I see where, “Oh, how do I get this up and running?”

“Well, you can go over the river and through the woods and way past grandmother's house to build this from source, or run this Dockerfile.” I feel like that is the direction the rest of the world is going. And as much fun as it is to sit on the sidelines and snark, I'm finding a lot more capability stories emerging across the board. Does that resonate with what you're seeing, given that you are inherently working at very large scale, given the [laugh] nature of where you work?

Chen: I do see that. And I actually want to double down on the open standards, which I think is also something that is happening. At the beginning, we talked about how I don't want it to be very hard when I choose a cloud provider.
But innovation doesn't only come from cloud providers; there are a lot of companies and a lot of innovation happening, building new technologies on top of those cloud providers, and I don't think this is going to stop. Innovation is going to come from many places, and it's going to be very exciting.

And by the way, things are moving super fast in our space. So, the investment in open standards is critical for our industry. So, Docker is one example. Google is, [unintelligible 00:19:46] speaking, investing a lot in building those open standards. So, we have Docker, we have things like, of course, Kubernetes, but we are also investing in open standards for security, so we are working with other partners around [unintelligible 00:19:58], defining how you can secure the software supply chain, which is also critical for innovation. So, all of those things that reduce the barrier to entry are something that I'm personally passionate about.

Corey: Scaling containers and scaling Kubernetes is hard, but a whole ‘nother level of difficulty is scaling humans. You've been at Google for, as you said, seven years, and you did not start as a VP there. Getting promoted from Senior Director to VP at Google is a, shall we say, heavy lift. You also mentioned that you previously started with, I believe it was, a seven-person team at one point. How have you been able to do that? Because I can see a world in which, “Oh, we just write some code and we can scale the computers pretty easily,” but I've never found a way to do that for people.

Chen: So yes, I started actually—well, not 7; the team was 30 people [laugh]. And you can imagine how surprised I was when I joined Google Cloud, with Kubernetes and GKE, that it was a pretty small team in those beginning days. But the team was actually already on the edge of burning out. You know, pings on Slack, the GitHub issues, there were so many things happening 24/7.
Everybody were doing everything. And one of the things I've done on my second month on the team—I did an off-site, right, all managers; that's what we do; we do off-sites—and I brought the team in to talk about—the leadership team—to talk about our team values. And in the beginning, they were a little bit pissed, I would say, “Okay, Chen. What's going on? You're wasting two days of our lives to talk about those things. Why we are not doing other things?”And I was like, “You know guys, this is really important. Let's talk about what's important for us.” It was an amazing it worked. By the way, that work is still the foundation of the culture in the team. We talked about the three values that we care about and how that will look like.And the reason it's important is that when you scale teams, the key thing is actually to scale decision-making. So, how do you scale decision-making? I think there are two things there. One is what you're trying to achieve. So, people should know and understand the vision and know where we want to get to.But the second thing is, how do we work? What's important for us? How do we prioritize? How do we make trade-offs? And when you have both the what we're trying to do and the how, you build that team culture. And when you have that, I find that you're set up more for success for scaling the team.Because then the storyteller is not just the leader or the manager. The entire team is a storyteller of how things are working in this team, how do we work, what you're trying to achieve, and so on. So, that's something that had been a critical. So, that's just, you know, from methodology of how I think it's the right thing to scale teams. Specifically, with a Kubernetes, there were more issues that we needed to work on.For example, building or [recoding 00:23:05] different functions. It cannot be just engineering doing everything. So, hiring the first product managers and information engineers and marketing people, oh my God. 
Yes, you have to have marketing people because there are so many events. And so, that was one thing, just, you know, on people and skills.

And the second thing is that it was an open-source project and a product, but what I was personally doing—with the team—was bringing some product engineering practices into the open-source. So, can we say, for example, that we are going to focus on user experience this next release, and we're not going to do all the rest? And I remember my team was worried, like, “Hey, what about that, and what about this, and we have—” you know, they were juggling everything together. And I remember telling them, “Imagine that everything is on the floor. All the balls are on the floor. I know they're on the floor, you know they're on the floor. It's okay. Let's just make sure that every time we pick something up, it never falls again.” And that idea is a principle that then evolved into ‘No Heroics,' and it evolved into ‘Sustainable Success.' But building things towards sustainable success is a principle which has been very helpful for us.

Corey: This episode is sponsored in part by our friends at Uptycs. Attackers don't think in silos, so why would you have siloed solutions protecting cloud, containers, and laptops distinctly? Meet Uptycs: the first unified solution that prioritizes risk across your modern attack surface—all from a single platform, UI, and data model. Stop by booth 3352 at AWS re:Invent in Las Vegas to see for yourself, and visit uptycs.com. That's U-P-T-Y-C-S.com. My thanks to them for sponsoring my ridiculous nonsense.

Corey: When I take a look back, it's very odd to me to see the current reality that is Google, where you're talking about empathy and No Heroics and the rest of that, which is not the reputation that Google enjoyed back when a lot of this stuff got started. It was always, oh, engineers should be extraordinarily bright and gifted, and therefore it felt at the time like our customers should be as well.
There was almost an arrogance built in of, well, if you wrote your code more the way Google does, then maybe your code wouldn't be so terrible in the cloud. And somewhat cynically, I thought for a while that, oh, Kubernetes is Google's attempt to wind up making the rest of the world write software in a way that's more Google-y. I don't think that observation has aged very well. I think it's solved a tremendous number of problems for folks.

But the complexity has absolutely been high throughout most of Kubernetes' life. I would argue, on some level, that it feels like it's become successful almost in spite of that, rather than because of it. But I'm curious to get your take. Why do you believe that Kubernetes has been as successful as it clearly has?

Chen: [unintelligible 00:25:34] two things. One, about empathy. So yes, Google engineers are brilliant and amazing and all great. And our customers are amazing and brilliant as well. And going back to the earlier point, everyone has their job where they need to be successful, and we, as you say, need to make things simpler and enable innovation. And our customers are driving innovation on top of our platform.

So, that's the way I think about it. And yes, it's not as simple as it can be—probably—yet, but since the early days of Kubernetes, we have been investing a lot in what we call empathy, and the customer empathy workshop, for example. So, I partnered with Kelsey Hightower—and you mentioned yourself trying to start a cluster. The first time we did a workshop with my entire team, so then it was like 50 people [laugh], their task was to spin up a cluster without using any of the scripts that we had internally.

And unfortunately, not many folks succeeded in this task. And out of that came the—what do you call it—an OKR, which was our goal for that quarter: that you are able to spin up a cluster in three commands and troubleshoot if something goes wrong. Okay, that came out of that workshop.
So, I do think that a lot of the foundation is that empathetic engineering, and the open-source community helped our Google teams to be more empathetic and understand the different use cases that people are trying to solve.

And that actually brings me to why I think Kubernetes is so successful. People might be surprised, but the amount of investment we're making in orchestration or placement of containers within Kubernetes is actually pretty small. And it's been very small for the last seven years. Where do we invest time? One area, as I mentioned before, is what we call the API machinery.

So, Kubernetes has introduced a way of working that is really suitable for cloud-native technologies: the idea of the reconciliation loop. Kubernetes is, like, a powerful automation machine, which can automate, of course, workload placement, but can automate other things as well. Think about it this way: the Kubernetes API machinery is observing what the current state is, comparing it to the desired state, and working towards it. Think of, like, a thermostat, which is a different kind of automation versus the ‘if this, then that,' where you need to anticipate different events. So, this idea of the API machinery and the way that you can extend it made it possible for different teams to use that mechanism to automate other things in that space.

So, that has been one very powerful mechanism of Kubernetes. And that enabled a lot of innovation; even if you think about things like Istio, as an example, that's how it started: by leveraging that kind of mechanism to separate storage and so on. So, there are a lot of operators; the way people are managing their databases or stateful workloads on top of Kubernetes, they're extending this mechanism. So, that's one thing that I think is key and built that ecosystem.
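The thermostat analogy above can be sketched in a few lines. This is a hypothetical, minimal Python illustration of the level-triggered observe/compare/act pattern being described, not the real Kubernetes controller machinery (which is Go code running against the API server); the `State` and `reconcile` names are made up for the example:

```python
from dataclasses import dataclass

@dataclass
class State:
    replicas: int

def reconcile(desired: State, current: State) -> State:
    # One pass of the loop: nudge the observed state one step toward the desired state.
    if current.replicas < desired.replicas:
        return State(current.replicas + 1)  # analogous to "create a pod"
    if current.replicas > desired.replicas:
        return State(current.replicas - 1)  # analogous to "delete a pod"
    return current

def run_to_convergence(desired: State, current: State) -> State:
    # Level-triggered, like a thermostat: keep comparing observed vs. desired
    # until they match, regardless of which events led to the current state.
    while current != desired:
        current = reconcile(desired, current)
    return current

print(run_to_convergence(State(3), State(0)))  # → State(replicas=3)
```

Operators generalize this same loop: swap `replicas` for any observed/desired resource state (a database, a certificate, a stateful workload) and the pattern is unchanged, which is why the API machinery could be extended so broadly.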
The second thing: I am very proud of the community of Kubernetes.

Corey: Oh, it's a phenomenal community success story.

Chen: It's not easy to build a community, definitely not in open-source. I feel that the idea of values, you know, that I was talking about within my team was actually a big deal for us as we were building the community: how we treat each other, how we help people start. You know, we were talking before, like, am I going to talk about DEI and inclusivity, and so on. One of the things that I love about Kubernetes is that it's a new technology. There is actually—[unintelligible 00:29:39] no, even today, there is no one with ten years' experience in Kubernetes. And if anyone says they have that, then they are lying.

Corey: Time machine. Yes.

Chen: That creates an opportunity for a lot of people to become experts in this technology. And by having it in open-source and making everything available, you can actually do it from your living room sofa. That excites me, you know, the idea that you can become an expert in this new technology and you can get involved, and you'll get people that will mentor you and help you through your first PR. And there are some roles within the community where you can start, you know, dipping your toes in the water. It's exciting. So, that makes me really happy, and I know that this community has changed the trajectory of many people's careers, which I love.

Corey: I think that's probably one of the most impressive things that it's done. One last question I have for you: we've talked a fair bit about the history and how we see it progressing, through the view toward the somewhat recent past. What do you see coming in the future? What does the future of Kubernetes look like to you?

Chen: It will continue to be more and more boring. The promise of hybrid and multi-cloud, for example, is only possible with technologies like Kubernetes.
So, I do think that, as a technology, it will continue to be important by ensuring portability and interoperability of workloads. I see a lot of edge use cases. If you think about it, the edge is just lagging a bit behind, like, the innovation that we've seen in the cloud. Can we bring that innovation to the edge? This will require more development within the Kubernetes community as well.

And that really excites me. I think there are a lot of things that we're going to see there. And by the way, you've seen it also at KubeCon. I mean, there were some announcements in that space. In Google Cloud, we just announced, like, work with customers like Wendy's and Rite Aid as well. So, taking advantage of this technology to allow innovation everywhere.

But beyond that, my hope is that we'll continue to hide the complexity. And our challenge will be to not make it a black box. Because that will be, in my opinion, a failure pattern; it doesn't help those kinds of platforms. So, that will be the challenge: can we scope the project and ensure that we have the right observability? And from a use case perspective, I do think edge is super interesting.

Corey: I would agree. There are a lot of workloads out there that are simply never going to be hosted in a cloud provider region, for a variety of reasons of varying validity, but it is the truth. I think that the focus on addressing customers where they are has been an emerging best practice for cloud providers, and I'm thrilled to see Google leading the charge on that.

Chen: Yeah. And you just reminded me, the other thing that we see more and more is definitely AI and ML workloads running on Kubernetes, which is part of that, right? So, Google Cloud is investing a lot in making AI/ML easy. And I don't know if many people know, but, like, even Vertex AI, our own platform, is running on GKE.
So, that's part of seeing how we make sure the platform is suitable for these kinds of workloads and really helps customers do the heavy lifting.

So, that's another set of workloads that is very relevant at the edge. And with one of our customers—MLB, for example—two things are interesting there. The first one: I think a lot of people sometimes say, “Okay, I'm going to move to the cloud and I want to know everything right now, how that will evolve.” And one of the things that's been really exciting about working with MLB for the last four years is the journey and the iterations. So, they started, like, at one phase, and then they saw what's possible, and then moved to the next one, and so on. So, that's one. The other thing is that, really, they have so much ML running at the stadium with Google Cloud technology, which is very exciting.

Corey: I'm looking forward to seeing how this continues to evolve and progress, particularly in light of the recent correction we're seeing in the market, where a lot of hype-driven ideas are being stress tested, maybe not in the way we might have hoped they would, but it'll be really interesting to see what shakes out as far as things that deliver business value and are clear wins for customers versus a lot of the speculative stories that we've been hearing for a while now. Maybe I'm totally wrong on this, and this is going to be a temporary bump in the road, and we'll see no abatement in the ongoing excitement around so many of these emerging technologies, but I'm curious to see how it plays out. But that's the beautiful part about getting to be a pundit—or whatever it is people call me these days that's at least polite enough to say on a podcast—is that when I'm right, people think I'm a visionary, and when I'm wrong, people don't generally hold it against you. It seems like futurist is the easiest job in the world, because if you predict and get it wrong, no one remembers.
Predict and get it right, you look like a genius.

Chen: So, first of all, I'm optimistic, so usually my predictions are positive. I will say that, you know, what we are seeing, also what I'm hearing from our customers, is that technology is not for the sake of technology. Actually, nobody cares [laugh]. Even today.

Okay, so nothing needs to change for, like, nobody would c—even today, nobody cares about Kubernetes. They need to care, unfortunately, but what I'm hearing from our customers is, “How do we create new experiences? How do we make things easy?” The talent shortage is not just with tech people. It's also with people working in the warehouse or working in the store.

Can we use technology to help with inventory management? There are so many amazing things. So, when there is a real business opportunity, things are so much simpler. People have the right incentives to make it work. Because one thing we didn't talk about—right, we talked about all these new technologies and we talked about scaling teams and so on—a lot of the time, the challenge is not the technology.

A lot of the time, the challenge is the process. A lot of the time, the challenge is the skills, the culture; there are so many things. But when you have something—going back to what I said before about how you unite teams—when there's a clear goal, a clear vision that everybody's excited about, they will make it work. So, I think this is where having a purpose for the innovation is critical for any successful project.

Corey: I think, and I hope, that you're right. I really want to thank you for spending as much time with me as you have. If people want to learn more, where's the best place for them to find you?

Chen: So, first of all, on Twitter. I'm there, or on LinkedIn. I will say that I'm happy to connect with folks. Generally speaking, at some point in my career, I recognized that I have a voice that can help people, and that my experience can also help people build their careers.
I'm happy to share that and [unintelligible 00:36:54] folks both in the company and outside of it.

Corey: I think that's one of the obligations on a lot of us, once we've gotten into a certain position in our careers, to send the ladder back down, for lack of a better term. I've never appreciated the perspective, “Well, screw everyone else. I got mine.” The whole point is that the next generation should have it easier than we did.

Chen: Yeah, definitely.

Corey: Chen Goldberg, General Manager of Cloud Runtimes and VP of Engineering at Google. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry rant of a comment talking about how LPARs on mainframes are absolutely not containers, making sure it's at least far too big to fit in a reasonably-sized Docker container.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Quest to Make Edge Computing a Reality with Andy Champagne

Screaming in the Cloud

Play Episode Listen Later Nov 10, 2022 46:56


About Andy

Andy is on a lifelong journey to understand, invent, apply, and leverage technology in our world. Both personally and professionally, technology is at the root of his interests and passions.

Andy has always had an interest in understanding how things work at their fundamental level. In addition to figuring out how something works, the recursive journey of learning about enabling technologies and underlying principles is a fascinating experience which he greatly enjoys.

The early Internet afforded tremendous opportunities for learning and discovery. Andy's early work focused on network engineering and architecture for regional Internet service providers in the late 1990s – a time of fantastic expansion on the Internet.

Since joining Akamai in 2000, Andy has had countless opportunities for learning and curiosity through its practically limitless globally distributed compute platform. Throughout his time at Akamai, Andy has held a variety of engineering and product leadership roles, resulting in the creation of many external and internal products, features, and intellectual property.

Andy's role today at Akamai – Senior Vice President within the CTO Team – offers broad access and input to the full spectrum of Akamai's applied operations, from detailed patent filings to strategic company direction. Working to grow and scale Akamai's technology and business from a few hundred people to roughly 10,000 with a world-class team is an amazing environment for learning and creating connections.

Personally, Andy is an avid adventurer, observer, and photographer of nature, marine, and astronomical subjects. Hiking, typically in the varied terrain of New England, with his family is a common endeavor.
He enjoys compact/embedded systems development and networking with a view towards their applications in drone technology.

Links Referenced:
Macrometa: https://www.macrometa.com/
Akamai: https://www.akamai.com/
LinkedIn: https://www.linkedin.com/in/andychampagne/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH. Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive? Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.

Corey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy.
No matter your cloud provider, get going for free at gomomento.co/screaming That's GO M-O-M-E-N-T-O dot co slash screaming.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I like doing promoted guest episodes like this one. Not that I don't enjoy all of my promoted guest episodes. But every once in a while, I generally have the ability to wind up winning an argument with one of my customers. Namely, it's great to talk to you folks, but why don't you send me someone who doesn't work at your company? Maybe a partner, maybe an investor, maybe a customer. And Macrometa, who's sponsoring this episode, said okay. My guest today is Andy Champagne, SVP at the CTO office at Akamai. Andy, thanks for joining me.

Andy: Thanks, Corey. Appreciate you having me. And appreciate Macrometa letting me come.

Corey: Let's start with talking about you, and then we'll get around to the Macrometa discussion in the fullness of time. You've been at Akamai for 22 years, which in tech company terms is like staying at a normal job for 75 years. What's it been like being in the same place for over two decades?

Andy: Yeah, I've got several gold watches. I've been retired twice. Nobody—you know, Akamai—so in the late-90s, I was in the ISP universe, right? So, I was in network engineering at regional ISPs, you know, kind of cutting teeth on, you know, trying to scale networks and deal with the flux of user traffic coming in from the growth of the web. And, you know, frankly, it wasn't working, right? Companies were trying to scale up at the time by adding bigger and bigger servers, and buying literally, you know, servers the size of refrigerators. And all of a sudden, there was this company that was coming together out in Cambridge, I'm from Massachusetts, and Akamai started in Cambridge, Massachusetts, still headquartered there. And Akamai was forming up and they had a totally different solution to how to solve this, which was amazing.
And it was compelling and it drew me there, and I am still there, 22-odd years in, trying to solve challenging problems.

Corey: Akamai is one of those companies that I often will describe to people who aren't quite as inclined in the network direction as I've been previously, as one of the biggest companies of the internet that you've never heard of. You are—the way that I think of you historically, I know this is not how you folks frame yourself these days, but I always thought of you as the CDN that you use when it really mattered, especially in the earlier days of the internet where there were not a whole lot of good options to choose from. And the failure mode that Akamai had when I was looking at it many years ago is that, well, it feels enterprise-y. Well, what does that mean exactly? Because that's usually used as a disparaging term by any developer in San Francisco. What does that actually unpack to? And to my mind, it was, well, it was one of the more expensive options, which yes, that's generally not a terrible thing, and also that it felt relatively stodgy, for lack of a better term, where it felt like updating things through an API was more of a JSON API—namely a guy named Jason—who would take a ticket, possibly from Jira if they were that modern or not, and then implement it by hand. I don't believe that it is quite that bad these days because, again, this was circa 2012 that we're talking here. But how do you view what Akamai is and does in 2022?

Andy: Yeah. Awesome question. There's a lot to unpack in there, including a few clever jabs you threw in. But all good.

Corey: [laugh].

Andy: [laugh]. I think Akamai has been through a tremendous, tremendous series of evolutions on the internet. And really the one that, you know, we're most excited about today is, you know, earlier this year, we kind of concluded our acquisition of Linode.
And if we think about Linode, which brings compute into our platform, you know, ultimately Akamai today is a compute company that has a security offering and has a delivery offering as well. We do more security than delivery, so you know, delivery is kind of something that was really important during our first ten or twelve years, and security during the last ten, and we think compute during the next ten.

The great news there is that if you look at Linode, you can't really find a more developer-focused company than Linode. You essentially fall into a virtual machine; you may accidentally set up a virtual machine inadvertently, it's so easy. And that is how we see the interface evolving. We see a compute-centric interface becoming standard for people as time moves on.

Corey: I'm reminded of one of those ancient advertisements, I forget, I think it would have been Sun, that put it out where the network is the computer or the computer is the network. The idea that a computer sitting by itself unplugged was basically just this side of useless, whereas a bunch of interconnected computers was incredibly powerful. That, today in 2022, sounds like an extraordinarily obvious statement, but it feels like this is sort of a natural outgrowth of that, where, okay, you've wound up solving the CDN piece of it pretty effectively. Now, you're expanding out into, as you say, compute through the Linode acquisition and others, and the question I have is, is that because there's a larger picture that's currently unfolding, or is this a scenario where well, we nailed the CDN side of the world, well, on that side of the universe, there's no new worlds left to conquer. Let's see what else we can do. Next, maybe we'll start making toasters.

Andy: Bunch of bored guys in Cambridge, and we're just like, “Hey, let's go after compute. We don't know what we're doing.” No. There's a little bit more—

Corey: Exactly. “We have money and time.
Let's combine the two and see what we can come up with.”

Andy: [laugh]. Hey, folks, compute: it's the new thing. No, it's more than that. And you know, Akamai has a very long history with the edge, right? And Akamai started—and again, arrogantly saying, we invented the concept of the edge, right, out there in '99, 2000, deploying hundreds and then to thousands of different locations, which is what our CDN ran on top of. And that was a really new, novel concept at the time. We extended that. We've always been flirting with what is called edge computing, which is how do we take pieces of application logic and move them from a centralized point and move them out to the edge. And I mean, cripes, if you go back and Google, like, ‘Akamai edge computing,' we were working on that in 2003, which is a bit like ancient history, right? And we are still on a quest.

And literally, we think about it in the company this way: we are on a quest to make edge computing a reality, which is how do you take applications that have centralized chokepoints? And how do you move as much of those applications as possible out to the edge of the network to unblock user performance and experience, and then see what developers can enable with that kind of platform?

Corey: For me, it seems that the rise of AWS—which is, by extension, the rise of cloud—has been, okay, you wind up building whatever you want for the internet and you stuff it into an AWS region, and oh, that's far away from your customers and/or your entire architecture is terrible so it has to make 20 different calls to the data center in series rather than in parallel. Great, how do we reduce the latency as much as possible? And their answer has largely seemed to be, ah, we'll build more regions, ever closer to you. One of these days, I expect to wake up and find that there's an announcement that they're launching a new region in my spare room here. It just seems to get closer and closer and closer.
You look around, and there's a cloud construction crew stalking you to the mall and whatnot. I don't believe that is the direction that the future necessarily wants to be going in.

Andy: Yeah, I think there's a lot there. And I would say it this way, which is, you know, having two-ish dozen uber-large data centers is probably not the peak technology of the internet, right? There's more we need to do to be able to get applications truly distributed. And, you know, just to be clear, I mean, Amazon AWS's done amazing stuff, they've projected phenomenal scale and they continue to do so. You know, but at Akamai, the problem we're trying to solve is really different than how do we put a bunch of stuff in a small number of data centers? It's, you know, obviously, there's going to be a centralized aspect, but there also needs to be incredibly integrated and seamless, moves through a gradient of compute, where hey, maybe you're in a very large data center for your AI/ML, kind of, you know, offline data lake type stuff. And then maybe you're in hundreds of locations for mid-tier application processing, and, you know, reconciliation of databases, et cetera. And then all the way out at the edge, you know, in thousands of locations, you should be there for user interactivity. And when I say user interactivity, I don't just mean, you know, read-only, but you've got to be able to do a read-write operation in synchronous fashion with the edge. And that's what we're after is building ultimately a platform for that and looking at tools, technology, and people along the way to help us with it.

Corey: I've built something out, my lasttweetinaws.com threading Twitter client, and that's… it's fine. It's stateless, but it's a little too intricate to effectively run in the Lambda@Edge approach, so using their CloudFront offering is simply a non-starter.
So, in order to get low latency for people using it around the world, I now have to deploy it simultaneously to 20 different AWS regions. And that is, to be direct, a colossal pain in the ass. No one is really doing stuff like that, that I can see. I had to build a whole lot of custom tooling just to get a CI/CD system up and working. Their strong regional isolation is great for containing blast radii, but obnoxious when you're trying to get something deployed globally. It's not the only way.

Combine that with the reality that ingress data transfer to any of their regions is free—generally—but sending data to the internet is a jewel beyond price because all my stars, that is egress bandwidth; there is nothing more valuable on this planet or any other. And that doesn't quite seem right. Because if that were actively true, a whole swath of industries and apps would not be able to exist.

Andy: Yeah, you know, Akamai, a huge part of our business is effectively distributing egress bandwidth to the world, right? And that is a big focus of ours. So, when we look at customers that are well positioned to do compute with Akamai, candidly, the filtering question that I typically ask with customers is, “Hey, do you have a highly distributed audience that you want to engage with, you know, a lot of interactivity or you're pushing a lot of content, video, updates, whatever it is, to them?” And that notion of highly distributed applications that have high egress requirements is exactly the sweet spot that we think Akamai has, you know, just a great advantage with, between our edge platform that we've been working on for the last 20-odd years and obviously, the platform that Linode brings into the conversation.

Corey: Let's talk a little bit about Macrometa.

Andy: Sure.

Corey: What is the nature of your involvement with those folks?
Because it seems like you sort of crossed into a whole bunch of different areas simultaneously, which is fascinating and great to see, but to my understanding, you do not own them.

Andy: No, we don't. No, they're an independent company doing their thing. So, one of the fun hats that I get to wear at Akamai is, I'm responsible for our Akamai Ventures Program. So, we do our corporate investing and all this kind of thing. And we work with a wide array of companies that we think are contributing to the progression of the internet. So, there's a bunch of other folks out there that we work with as well. And Macrometa is on that list, which is we've done an investment in Macrometa, we're board observers there, so we get to sit in and give them input on, kind of, how they're doing things, but they don't have to listen to us since we're only observers. And we've also struck a preferred partnership with them. And what that means is that as our customers are building solutions, or as we're building solutions for our customers, utilizing the edge, you know, we're really excited and we've got Macrometa at the table to help with that. And Macrometa is—you know, just kind of as a refresher—is trying to solve the problem of distributed data access at the edge in a high-performance and almost non-blocking, developer-friendly way. And that is very, very exciting to us, so that's the context in which they're interesting to our continuing evolution of how the edge works.

Corey: One of the questions I always like to ask, and it's usually not considered a personal attack when I asked the question—

Andy: Oh, good.

Corey: But it's, “Describe what the company does.” Now, at some places like the latter days of Yahoo, for example, it's very much a personal attack. But what is it that Macrometa does?

Andy: So, Macrometa provides a worldwide, high-speed distributed database that is resident on what today, you could call the edge of the network.
And the advantage here is, instead of having one SQL server sitting somewhere, or what you would call a distributed SQL Server, which is two SQL Servers sitting next to one another, Macrometa has a high-speed data store that allows you to, instead of having that centralized SQL Server, have it run natively at the edge of the network. And when you're building applications that run on the edge or anywhere, you need to try to think about how do you have the data as close to the user or to the access point as possible. And that's the problem Macrometa is after and that's what their products today solve. It's an incredibly bright team over there, a fantastic founder-CEO team, and we're really excited to be working with him.

Corey: It wasn't intentionally designed this way as a setup when I mentioned a few minutes ago, but yeah, my Twitter client works across the 20-some-odd AWS regions, specifically because it's stateless. All of the state, other than a couple of API keys at provision time, wind up living in the user's browser. If this was something that needed to retain state in any way, like, you know, basically every real application under the sun, this strategy would absolutely not work unless I wound up with some heinous form of circular replication, and then you wind up with a single region going down and everything explodes. Having a cohesive, coherent data layer that spans all of that is key.

Andy: Yeah, and you're on to the classical, you know, CompSci issue here around edge, which is if you have 100 edge regions, how do you have consistent state storage between applications running on N of those? And that is the problem Macrometa is after, and, you know, Akamai has been working on this and other variants of the edge problem for some time. We're very excited to be working with the folks at Macrometa. It's a cool group of folks. And it's an interesting approach to the technology.
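[Editor's note: the classical consistency problem Andy names here, keeping state coherent across N edge regions without calling back to a central chokepoint, is often attacked with conflict-free replicated data types (CRDTs). The sketch below is a generic illustration of one of the simplest CRDTs, a last-writer-wins register; it is not Macrometa's actual implementation, and every name in it is invented for the example.]

```python
import time

class LWWRegister:
    """Last-writer-wins register: each edge location writes locally, and
    replicas converge by keeping the write with the highest timestamp,
    breaking ties by node id so every replica picks the same winner."""

    def __init__(self):
        self.value = None
        self.timestamp = 0.0
        self.node_id = ""

    def set(self, value, node_id, timestamp=None):
        # A production system would use hybrid logical clocks rather than
        # trusting wall clocks to agree across distant locations.
        self.value = value
        self.timestamp = time.time() if timestamp is None else timestamp
        self.node_id = node_id

    def merge(self, other):
        # Merging is commutative and idempotent, so replicas can exchange
        # state in any order, any number of times, and still converge.
        if (other.timestamp, other.node_id) > (self.timestamp, self.node_id):
            self.value = other.value
            self.timestamp = other.timestamp
            self.node_id = other.node_id
```

Because `merge` never depends on the order replicas gossip in, reads and writes can stay local to each edge location and still converge globally, which is the property the 100-edge-regions problem demands.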
And from what we've seen so far, it's been working great.

Corey: The idea of how do I wind up having persistent, scalable state across a bunch of different edge locations is not just a hard computer science problem; it's also a hard cloud economics problem, given the cost of data transit in a bunch of different directions between different providers. It turns “How much does it cost?” in most cases into a question that can only be answered by, well, let's run it for a few days and find out. Which is not usually the best way to answer some questions. Like, “Is that power socket live?” “Let's touch it and find out.” Yeah, there are ways you learn that are extraordinarily painful.

Andy: Yeah no, nobody should be doing that with power sockets. I think this is one of these interesting areas, which is this is really right in Akamai's backyard but it's not realized by a lot of folks. So, you know, Akamai has, for the last 20-odd years, been all about how do we egress as much as possible to the entire internet. The weird areas, the big areas, the small areas, the up-and-coming areas, we serve them all. And in doing that, we've built a very large global fabric network, which allows us to get between those locations at a very low cost because we have to move our own content around. And hooking those together, having an essentially private network fabric that hooks the vast majority of our big locations together and then having very high-speed egress out of all of the locations to the internet, you know, that's been how we operate our business at scale effectively and economically for years, and utilizing that for compute data replication, data synchronization tasks is what we're doing.

Corey: There are a lot of different solutions that could be used to solve a lot of the persistent data layer question. For example, when you had to solve a similar problem with compute, you had a few options in front of you.
Well, we could buy a whole bunch of computers and stuff them in a rack somewhere because, eh, cloud; how hard could it be? Saner heads prevailed, and no, no, no, we're going to buy Linode, which was honestly a genius approach on about three different levels, and I'm still unconvinced the industry sees that for the savvy move that it was. I'm confident that'll change in time.

Why not build it yourself? Or alternately, acquire another company that was working on something similar? Instead, you're an investor in a company that's doing this effectively, but not buying them outright?

Andy: Yeah, you know, and I think that's—Akamai is beyond at this point in thinking that it's just about ownership, right? I think that this—we don't have to own everything in order to have a successful ecosystem. You know, certainly, we're going to want to own key parts of it and that's where you saw the Linode acquisition, where we felt that was kind of core. But ultimately, we believe in promoting customer choice here. And there's a pretty big role that we have that we think we can help with companies, such as folks like Macrometa where they have, you know, really interesting technology, but they can use leverage, they can use some of our go-to-market, they can use, you know, some of our, you know, kind of guidance and expertise on running a startup—which, by the way, it's not an easy job for these folks—and that's what we're there to do. So, with things like Linode, you know, we want to bring it in, and we want to own it because we think it's just so compelling, and it fits so well with where we want to go. With folks like Macrometa, you know, that's still a really young area. I mean, you know, Linode was in business for many, many, many years and was a good-sized business, you know, before we bought them.

Corey: Yeah, there's something to be said for letting the market shake something out rather than having to do it all yourself as trailblazers.
I'm a big believer in letting other companies do things. I mean, one of the more annoying things, from my position, is this idea where AWS takes a product strategy of, “Yes.” That becomes a bit of a challenge when they're trying to wind up building compete decks, and how do we defeat the competition? And it's like, “Wh—oh, you're talking about the other hyperscalers?” “No, we're talking with the service team one floor away.”

That just seems a little on the strange side to—some companies get too big and too expensive on some level. I think that there's a very real risk of Akamai trying to do everything on the internet if you continue to expand and start listing out things that are not currently in your portfolio. And, oh, we should do that, too, and we should do that, too, and we should do that, too. And suddenly, it feels pretty closely aligned with you're trying to do everything.

Andy: Yeah. I think we've been a company who has been really disciplined and not doing everything. You know, we started with CDN. And you know, we're talking '98 to 2010, you know, CDN was really our thing, and we feel we executed really well on that. We probably executed quite quietly and well, but feel we executed pretty well on that. Really from 2010, 2012 to 2020, it was all about security, right? And, you know, we built, you know, pretty amazing security business, hundred percent of SaaS business, on top of our CDN platform with security. And now we're thinking about—we did that route relatively quietly, as well, and now we're thinking about the next ten years and how do we have that same kind of impact on cloud. And that is exciting because it's not just centralized cloud; it's about a distributed cloud vision.
And that is really compelling and that's why you know, we've got great folks that are still here and working on it.

Corey: I'm a big believer in the idea that you can start getting distilled truth out of folks, particularly companies, the more you compress the space they have to wind up saying something. That's why Twitter very often lets people tip their hands. But a common place that I look for is the title field on a company's website. So, when I go over to akamai.com, you position yourself as something that fits in a small portion of a tweet, which is good. When you have a Tolstoy-length paragraph in the tooltip title for the browser tab, that's a problem. But you say simply, “Security, cloud delivery, performance. Akamai.” Which is beautifully well done, but security comes first. I have a mental model of Akamai as being a CDN and some other stuff that I don't fully understand. But again, I first encountered you folks in the early-2000s. It turns out that it's hard to change existing opinions. Are you a CDN company or are you a security company?

Andy: Oh, super—

Corey: In other words, if someone winds up mis-alphabetizing that and they're about to get censured after this show because, “No, we're a CDN, first; why did you put security first?”

Andy: You know, so all those things feed off each other, right? And this has been a question where it's like, you know, our security layer and our distributed WAF and other security offerings run on top of the CDN layer. So, it's all about building a common compute edge and then leveraging that for new applications. CDN was the first application. The next and second application was security. And we think the third application, but probably not the final one, is compute. So, I think I don't think anyone in marketing will be fired by the ordering that they did on that. I think that ultimately now, you know, for—just if we look at it from a monetary perspective, right, we do more security than we do CDN.
So, there's a lot that we have in the security business. And you know, compute's got a long way to go, especially because it's not just one big data center of compute; it is a different flavor than I think folks have seen before.

Corey: When I was at RSA, you folks were one of the exhibitors there. And I like to make the common observation that there are basically six companies that exhibit at RSA. Yeah, there are hundreds of booths, but it's the same six products, all marketed under different logos with different words. And they all seem to approach it from a few relatively expectable personas and positions. I've always found myself agreeing with the things that you folks say, and maybe it's because of my own network-centric background, but it doesn't seem like you take the same approach that a number of other companies do, where it's, “Oh, it has to start with the way that developers write their first line of code.” Instead, it seems to take a holistic view that comes from the starting position of everything talks to each other on a network basis, and from here, let's move forward. Is that accurate to how you view the security space?

Andy: Yeah, you know, our view of the security space is—again, it's a network-centric one, right? And our work in the security space initially came from really big DDoS attacks, right? And how do we stop Distributed Denial of Service attacks from impacting folks? And that was the initial benefit that we brought. And from there, we evolved our story around, you know, how do we have a more sophisticated WAF? How do we have predictive capabilities at the edge?

So ultimately, we're not about ingraining into your process of how your thing was written or telling you how to write it.
We're about, you know, essentially being that perimeter edge that is watching and monitoring everything that comes into you to make sure that, you know, hey, we're not seeing Log4j-type exploits coming at you, and we'll let you know if we do, or to block malicious activity. So, we fit on anything, which is why our security business has been so successful. If you have an application on the edge, you can put Akamai Security in front of it and it's going to make your application better. That's been super compelling for the last, you know, again, last decade or so that we've really been focused on security.

Corey: I think that it is a mistake to take a security model that starts with a view of what people have in front of them day-to-day—like, I look at my laptop and say, “Oh, this is what I spend my time on. This is where all security must start and stop.” Because yeah, okay, great. If you get physical access to my laptop, it's pretty much game over on some level. But yeah, if you're at a point where you're going to bust into my house and threaten me in order to get access to my laptop, here you go. There are no secrets that I am in possession of that are worth dying for. It's just money and that's okay. But looking at it through a lens of the internet has gone from science experiment to thing that the nerds love to use to a cornerstone of the fabric of modern society. And that's not because of the magic supercomputer that we all have in our pockets, but rather because those magic supercomputers can talk to the sum total of human knowledge and any other human anywhere on the planet, basically, ever. And I don't know that that evolution has been really appreciated by society at large as far as just how empowering that can be. But it completely changes the entire security paradigm from back in the '80s when I got started, don't put untrusted floppy disks into your computer or it might literally explode on your desk.

Andy: [laugh]. So, we're talking about floppy disks now?
Yes. So, first of all, the scope of impact of the internet has increased, meaning what you can do with it has increased. And directly proportional to that increase, the threat vectors have increased, right? And the more systems are connected, the more vulnerabilities there are.

So listen, it's easy to scare anybody about security on the internet. It is a topic that is an infinite well of scariness. At the same time, you know, and not just Akamai, but there's a lot of companies out there that can, whether it's making your development more secure, making your pipeline, your digital supply chain more secure, or then you know where Akamai is, we're at the end, which is you know, helping to wrap around your entire web presence to make it more secure, there's a variety of companies that are out there really making the internet work from a security perspective. And honestly, there's also been tremendous progress on the operating system front in the last several years, which previously was not as good—that's probably the way to characterize it—as it is today. So, and you know, at the end of the day, the nerds are still out there working, right? We are out here still working on making the internet, you know, scale better, making it more secure, making it more robust because we're probably not done, right? You know, phones are awesome, and tablet devices, et cetera, are awesome, but we've probably got more coming. We don't quite know what that is yet, but we want to have the capacity, safety, and compute to power it.

Corey: How does Macrometa as a persistent data layer tie into your future vision of security first as what Akamai does? I can see a few directions, but I'm going to go out on a limb and guess that before you folks decided to make an investment in such a thing, you probably gave it more than the 30 seconds or so of thought that I've had to wind up putting these pieces together.

Andy: So, a few things there.
First of all, Macrometa, ultimately, we see them coming in the front door with our compute solution, right? Because as folks are building capabilities on the edge, “Hey, I want to run compute on the edge. How do I interoperate with data?” The worst answer possible is, “Well, call back to the centralized data store.”So, we want to ensure that customers have choice and performance options for distributed data access. Macrometa fits great there. However, now pause that; let's transition back to the security point you raised, which is, you know, coordinating an edge data security platform is a really complicated thing. Because you want to make sure that threats that are coming in on one side of the network, or you know, in one given country, you know, are also understood throughout the network. And there's a definite role for a data platform in doing that.We obviously, you know, for the last ten years have built several that help accomplish that at scale for our network, but we also recognize that, you know, innovation in data platforms is probably not done. And you know, Macrometa's got some pretty interesting approaches. So, we're very interested in working with them and talking jointly with customers, which we've done a bunch of, to see how that progresses. But there's tie-ins, I would say, mostly on compute, but secondarily, there's a lot of interesting areas with real-time security intel, they can be very useful as well.Corey: Since I have you here, I would love to ask you something that's a little orthogonal to the rest of this conversation, but I don't even care about that because that's why it's my show; I can ask what I want.Andy: Oh, no.Corey: Talk to me a little bit about the Linode acquisition. Because when it first came out, I thought, “Oh, Linode must not be doing well, so it's an acqui-hire scenario.” Followed by, “Wait a minute, that doesn't seem quite right.” And I dug deeper, and suddenly, I started to see a bunch of things that made sense. 
But that's just my outside perspective. I prefer to see you justify what it is that you've done.Andy: Justify what we've done. Well, with that positive framing—Corey: Exactly. “Explain yourself. How dare you, sir?”Andy: [laugh]. “What are you doing?” So, to take that, which is first of all, Linode was doing great when we bought them and they're continuing to do great now. You know, backstory here is actually a fun one. So, I personally have been a customer of Linode for about 13 years, and you know, super familiar with their offerings, as we're a bunch of other folks at Akamai.And what ultimately attracted us to Linode was, first of all, from a strategic perspective, is we talked about how Akamai thinks about Compute being a gradient of compute: you've got the edge, you've got kind of a middle tier, and you've got more centralized locations. Akamai has the edge, we've got the middle, we didn't have the central. Linode has got the central. And obviously, you know, we're going to see some significant expansion of capacity and scale there, but they've got the central location. And, you know, ultimately, we feel that there's a lot of passion in Linode.You know, they're a Linux open-source-centric company, and believe it or not Akamai is, too. I mean, you know, that's kind of how it works. And there was a great connection between the sorts of folks that they had and how they think about customers. Linode was a really customer-driven company. I mean, they were fanatical.I mean, I as a, you know, customer of $30 a month personally, could open a ticket and I'd get an answer in five minutes. And that's very similar to kind of how Akamai is driven, which is we're very customer-centric, and when a customer has a problem or need something different, you know, we're on it. So, there's literally nothing bad there and it's a super exciting beginning of a new chapter for Akamai, which is really how do we tackle compute? We're super excited to have the Linode team. 
You know, they're still mostly down in Philadelphia doing their thing.And, you know, we've hired substantially and we're continuing to do so, so if you want to work there, drop a note over. And it's been fantastic. And it's one of our, you know, really large acquisitions that we've done, and I think we were really lucky to find a great company in such a good position and be able to make it work.Corey: From my perspective, one of the areas that has me excited about the acquisition stems from what I would consider to be something of a customer-base culture misalignment between the two companies. One of the things that I have always enjoyed about Linode—and in the interest of full transparency, they have been a periodic sponsor over the last five or six years of my ridiculous nonsense. I believe that they are not at the moment which I expect you to immediately rectify after this conversation, of course.Andy: I'll give you my credit card. Yeah.Corey: Excellent. Excellent. We do not get in the way of people trying to give you money. But it was great because that's exactly it. I could take a credit card in the middle of the night and spin up things on Linode.And it was one of those companies that aligned very closely to how I tended to view cloud infrastructure from the perspective of, I need a Linux box, or I need a bunch of Linux boxes right there, right now, and I don't have 12 weeks to go to cloud school to learn the intricacies of a given provider. It more or less just worked in a whole bunch of easy ways. Whereas if I wanted to roll out at Akamai, it was always I would pull up the website, and it's, “Click here to talk to our enterprise sales team.” And that tells me two things. 
One, it is probably going to be outside of my signing authority because no one trusts me with money for obvious reasons, when I was an employee, and two, you will not be going to space today because those conversations always take time.And it's going to be—if I'm in a hurry and trying to get something out the door, that is going to act as a significant drag on capability. Now, most of your customers do not launch things by the seat of their pants, three hours after the idea first occurs to them, but on Linode, that often seems to be the case. The idea of addressing developers early on in the ‘it's just an idea' phase. I can't shake the feeling that there's a definite future in which Linode winds up being able to speak much more effectively to enterprise, while Akamai also learns to speak to, honestly, half-awake shitposters at 2 a.m. when we're building something heinous.Andy: I feel like you've been sitting in on our strategy presentations. Maybe not the shitposters, but the rest of it. And I think the way that I would couch it, my corporate-speak of that, would be that there's a distinct yin and yang, there a complementary nature between the customer bases of Akamai, which has, you know, an incredible list of enterprise customers—I mean, the who's-who of enterprise customers, Akamai works with them—but then, you know, Linode, who has really tremendous representation of developers—that's what we'll use for the name posts—like, folks like myself included, right, who want to throw something together, want to spin up a VM, and then maybe tear it down and never do it again, or maybe set up 100 of them. And, to your point, the crossover opportunities there, which is, you know, Linode has done a really good job of having small customers that grow over time. 
And by having Akamai, you know, you can now grow, and never have to leave because we're going to be able to bring enough scale and throughput and, you know, professional help services as you need it to help you stay in the ecosystem.And similarly, Akamai has a tremendous—you know, the benefit of a tremendous set of enterprise customers who are out there, you know, frankly, looking to solve their compute challenges, saying, “Hey, I have a highly distributed application. Akamai, how can you help me with this?” Or, “Hey, I need presence in x or y.” And now we have, you know, with Linode, the right tools to support that. And yes, we can make all kinds of jokes about, you know, Akamai and Linode and different, you know, people and archetypes we appeal to, but ultimately, there's an alignment between Akamai and Linode on how we approach things, which is about Linux, open-source, it's about technical honesty and simplicity. So, great group of folks. And secondly, like, I think the customer crossover, you're right on it. And we're very excited for how that goes.Corey: I also want to call out that Macrometa seems to have split this difference perfectly. One of the first things I visit on any given company's page when I'm trying to understand them is the pricing page. It's one of those areas where people spend the least time, early on, but it's also where they tend to be the most honest. Maybe that's why. And I look for two things, and Macrometa has both of them.The first is a ‘try it for free, right now, get started.' It's a free-tier approach. Because even if you charge $10 or whatnot, there are many developers working on things in odd hours where they don't necessarily either have the ability to make that purchase decision, know that they have the ability to make that purchase decision, or are willing to do that by the seat of their pants. So, ‘get started for free' is important; it means you can develop right now. 
Conversely, there are a bunch of enterprise procurement departments out there who will want a whole bunch of custom things.Custom SLAs, custom support responses, custom everything, and they also don't know how to sign a check that doesn't have two commas in it. So, you don't probably want to avoid those customers, but what they're looking for is an enterprise offering that is no price. There should not be a price tag on that because you will never get it right for everyone, but what they want to see is ‘click here to contact sales.' That is coded language for, “We are serious professionals and know who you are and how you like to operate.” They've got both and I think that is absolutely the right decision.Andy: It do—Corey: And whatever you have in between those two is almost irrelevant.Andy: No, I think you're on it. And Macrometa, their pricing philosophy allows you to get in and try it with zero friction, which is super important. Like, I don't even have to use a credit card. I can experiment for free, I can try it for free, but then as I grow their pricing tier kind of scales along with that. And it's a—you know, that is the way that folks try applications.I always try to think about, hey, you know, if I'm on a team and we're tasked with putting together a proof of concept for something in two days, and I've got, you know, a couple folks working with me, how do I do that? And you don't have time for procurement, you might need to use the free thing to experiment. So, there is a lot that they can do. And you know, their pricing—this transparency of pricing that they have is fantastic. Now, Linode, also very transparent, we don't have a free tier, but you know, you can get in for very low friction and try that as well.Corey: Yeah, companies tend to go through a maturity curve evolution on these things. I've talked to companies that purely view it is how much money a given customer is spending determines how much attention they get. 
And it's like, “Yeah, maybe take a look through some of your smaller users or new signups there.” Yeah, they're spending $10 a month or whatnot, but their email address is@cocacola.com. Just spitballing here; maybe you might want a white-glove a few of those folks, just because not everyone comes in the door via an RFP.Andy: Yep. We look at customers for what your potential is, right? Like, you know, how much could you end up spending with us, right? You know, so if you're building your application on Linode, and you're going to spend $20, for the first couple months, that's totally fine. Get in there, experiment, and then you know, in the next several years, let's see where it goes. So, you're exactly right, which is, you know, that username@enterprisedomain.com is often much more indicative than what the actual bill is on a monthly basis.Corey: I always find it a little strange when I have a vendor that I'm doing business with, and then suddenly, an account person reaches out, like, hey, let's just have a call for half an hour to talk about what you're doing and how you're doing it. It's my immediate response to that these days, just of too many years doing that, as, “I really need to look at that bill. How much are we spending, again?” And I honestly, usually not that much because believe it or not, when you focus on cloud economics for a living, you pay attention to your credit card bills, but it is always interesting to see who reaches out and who doesn't. That's been a strange approach, and there is no one right answer for all of this.If every free tier account user of any given cloud provider wound up getting constant emails from their account managers, it's how desperate are you to grow revenue, and what are you about to do to pricing? At some level of becomes… unhelpful.Andy: I can see that. 
I've had, personally, situations where I'm a trial user of something, and all of a sudden I get emails—you know, using personal email addresses, no Akamai involvement—all of a sudden, I'm getting emails. And I'm like, “Really? Did I make the priority list for you to call me and leave me a voicemail, and then email me?” I don't know how that's possible.So, from a personal perspective, totally see that. You know, from an account development perspective, you know, kind of with the Akamai hat on, it's challenging, right? You know, folks are out there trying to figure out where business is going to come from. And I think if you're able to get an indicator that somebody, you know, maybe you're going to call that person at enterprisedomain.com to try to figure out, you know, hey, is this real and is this you with a side project or is this you with a proof of concept for something that could be more fruitful? And, you know, Corey, they're probably just calling you because you're you.Corey: One of the things that I was surprised by where I saw the exact same thing. I started getting a series of emails from my account manager for Google Workspaces. Okay, and then I really did a spit-take when I realized this was on my personal address. Okay… so I read this carefully because what the hell is happening? Oh, they're raising prices and it's a campaign. Great.Now, my one-user vanity domain is going to go from $6 a month to $8 a month or whatever. Cool, I don't care. This is not someone actively trying to reach out as a human being. It's an outreach campaign. Cool, fair. But that's the problem, on some level, for super-tiny customers. It's a, what is it, is it a shakedown? What are they about to yell at me for?Andy: No, I got the same thing. My Google Workspace personal account, which is, like, two people, right? Like, and I got an email and then I think, like, a voicemail. 
And I'm like, I read the email and I'm like—you know, it's going—again, it's like, it was like six something and now it's, like, eight something a month. So, it's like, “Okay. You're all right.”Corey: Just go—that's what you have a credit card for. Go ahead and charge it. It's fine. Now, yeah, counterpoint if you're a large company, and yeah, we're just going to be raising prices by 20% across the board for everyone, and you look at this and like, that's a phone number. Yeah, I kind of want some special outreach and conversations there. But it's odd.Andy: It's interesting. Yeah. They're great.Corey: Last question before we call this an episode. In 22 years, how have you seen the market change from your perspective? Most people do not work in the industry from one company's perspective for as long as you have. That gives you a somewhat privileged position to see, from a point of relative stability, what the industry has done.Andy: So—Corey: What have you noticed?Andy: —and I'm going to give you an answer, which is about, like, the sales cycle, which is it used to be about meetings and about everybody coming together and used to have to occasionally wear a suit. And there would be, you know, meetings where you would need to get a CEO or CFO to personally see a presentation and decide something and say, “Okay, we're going with X or Y. We're going to make a decision.” And today, those decisions are, pretty far and wide, made much, much further down in the organization. They're made by developers, team leads, project managers, program managers.So, the way people engage with customers today is so different. First of all, like, most meetings are still virtual. I mean, like, yeah, we have physical meetings and we get together for things, but like, so much more is done virtually, which is cool because we built the internet so we wouldn't have to go anywhere, so it's nice that we got that landed. 
It's unfortunate that we had to do with Covid to get there, but ultimately, I think that purchasing decisions and technology decisions are distributed so much more deeply into the organization than they were. It used to be a, like, C-level thing. We're now seeing that stuff happened much further down in the organization.We see that inside Akamai and we see it with our customers as well. It's been, honestly, refreshing because you tend to be able to engage with technical folks when you're talking about technical products. And you know, the business folks are still there and they're helping to guide the discussions and all that, but it's a much better time, I think, to be a technical person now than it probably was 20 years ago.Corey: I would say that being a technical person has gotten easier in a bunch of ways; it's gotten harder in a bunch of ways. I would say that it has transformed. I was very opposed to the idea that oh, as a sysadmin, why should I learn to write code? And in retrospect, it was because I wasn't sure I could do it and it felt like the rising tide was going to drown me. And in hindsight, yeah, it was the right direction for the industry to go in.But I'm also sensitive to folks who don't want to, midway through their career, pick up an entirely new skill set in order to remain relevant. I think that it is a lot easier to do some things. Back when Akamai started, it took an intimate knowledge of GCC compiler flags, in most cases, to host a website. Now, it is checking a box on a web page and you're done. Things have gotten easier.The abstractions continue to slip below the waterline, so the things we have to care about getting more and more meaningful to the business. We're nowhere near our final form yet, but I'm very excited about how accessible this industry is to folks that previously would not have been, while also disheartened by just how much there is to know. 
Otherwise, “Oh yeah, that entire aspect of the way that this core thing that runs my business, yeah, that's basically magic and we just hope the magic doesn't stop working, or we make a sacrifice to the proper God, which is usually a giant trillion-dollar company.” And the sacrifice is, of course, engineering time combined with money.Andy: You know, technology is all about abstraction layers, right? And I think—that's my view, right—and we've been spending the last several decades, not, ‘we' Akamai; ‘we' the technology industry—on, you know, coming up with some pretty solid abstraction layers. And you're right, like, the, you know, GCC j6—you know, -j6—you know, kind of compiler tags not that important anymore, we could go back in time and talk about inetd, the first serverless. But other than that, you know, as we get to the present day, I think what's really interesting is you can contribute technically without being a super coding nerd. There's all kinds of different technical approaches today and technical disciplines that aren't just about development.Development is super important, but you know, frankly, the sysadmin skill set is more valuable today if you look at what SREs have become and how important they are to the industry. I mean, you know, those are some of the most critical folks in the entire piping here. So, don't feel bad for starting out as a sysadmin. I think that's my closing comment back to you.Corey: I think that's probably a good place to leave it. I really want to thank you for being so generous with your time.Andy: Anytime.Corey: If people want to learn more about how you see the world, where can they find you?Andy: Yeah, I mean, I guess you could check me out on LinkedIn. Happy to shoot me something there and happy to catch up. 
I'm pretty much read-only on social, so I don't pontificate a lot on Twitter, but—Corey: Such a good decision.Andy: Feel free to shoot me something on LinkedIn if you want to get in touch or chat about Akamai.Corey: Excellent. And of course, our thanks goes well, to the fine folks at Macrometa who have promoted this episode. It is always appreciated when people wind up supporting this ridiculous nonsense that I do. My guest has been Andy Champagne SVP at the CTO office over at Akamai. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment that will not post successfully because your podcast provider of choice wound up skimping out on a provider who did not care enough about a persistent global data layer.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
A Cloud Economist is Born - The AlterNAT Origin Story
Nov 9, 2022 · 34:45

About Ben
Ben Whaley is a staff software engineer at Chime. Ben is co-author of the UNIX and Linux System Administration Handbook, the de facto standard text on Linux administration, and is the author of two educational videos: Linux Web Operations and Linux System Administration. He has been an AWS Community Hero since 2014. Ben has held Red Hat Certified Engineer (RHCE) and Certified Information Systems Security Professional (CISSP) certifications. He earned a B.S. in Computer Science from the University of Colorado, Boulder.

Links Referenced:
Chime Financial: https://www.chime.com/
alternat.cloud: https://alternat.cloud
Twitter: https://twitter.com/iamthewhaley
LinkedIn: https://www.linkedin.com/in/benwhaley/

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn’t need to manage PKI or rotate SSH keys every time someone leaves. That’d be pretty sweet, wouldn’t it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.
Basically, you’re SSHing the same way you manage access to your app. What’s the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there’s a lot more, but there’s a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive?
Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale.
Again, that's snark.cloud/tailscaleCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn and this is an episode unlike any other that has yet been released on this august podcast. Let's begin by introducing my first-time guest somehow because apparently an invitation got lost in the mail somewhere. Ben Whaley is a staff software engineer at Chime Financial and has been an AWS Community Hero since Andy Jassy was basically in diapers, to my level of understanding. Ben, welcome to the show.Ben: Corey, so good to be here. Thanks for having me on.Corey: I'm embarrassed that you haven't been on the show before. You're one of those people that slipped through the cracks and somehow I was very bad at following up slash hounding you into finally agreeing to be here. But you certainly waited until you had something auspicious to talk about.Ben: Well, you know, I'm the one that really should be embarrassed here. You did extend the invitation and I guess I just didn't feel like I had something to drop. But I think today we have something that will interest most of the listeners without a doubt.Corey: So, folks who have listened to this podcast before, or read my newsletter, or follow me on Twitter, or have shared an elevator with me, or at any point have passed me on the street, have heard me complain about the Managed NAT Gateway and it's egregious data processing fee of four-and-a-half cents per gigabyte. And I have complained about this for small customers because they're in the free tier; why is this thing charging them 32 bucks a month? And I have complained about this on behalf of large customers who are paying the GDP of the nation of Belize in data processing fees as they wind up shoving very large workloads to and fro, which is I think part of the prerequisite requirements for having a data warehouse. 
And you are no different than the rest of these people who have those challenges, with the singular exception that you have done something about it, and what you have done is so, in retrospect, blindingly obvious that I am embarrassed the rest of us never thought of it.
Ben: It’s interesting because when you are doing engineering, it’s often the simplest solution that is the best. I’ve seen this repeatedly. And it’s a little surprising that it didn’t come up before, but I think it’s, in some way, just a matter of timing. But what we came up with—and is this the right time to get into it, do you want to just kind of name the solution here?
Corey: Oh, by all means. I’m not going to steal your thunder. Please, tell us what you have wrought.
Ben: We’re calling it AlterNAT, and it’s an alternative high-availability NAT solution. As everybody knows, NAT Gateway is sort of the default choice; it certainly is what AWS pushes everybody towards. But there is, in fact, a legacy solution: NAT instances. These were around long before NAT Gateway made an appearance. And like I said, they’re considered legacy, but with the help of lots of modern AWS innovations and technologies like Lambdas, auto-scaling groups with max instance lifetimes, and the latest generation of instances with enhanced networking, it turns out that we can maybe not get quite as effective as a NAT Gateway, but we can save a lot of money and skip those data processing charges entirely by having a NAT instance solution with a failover NAT Gateway, which I think is kind of the key point behind the solution. So, are you interested in diving into the technical details?
Corey: That is very much the missing piece right there. You’re right. What we used to use was NAT instances. That was the thing that we used because we didn’t really have another option.
And they had an interface in the public subnet where they lived and an interface hanging out in the private subnet, and they had to be configured to wind up passing traffic to and fro.
Well, okay, that’s great and all, but isn’t that kind of brittle and dangerous? I basically have a single instance as a single point of failure, and these are the days early on when individual instances did not have the level of availability and durability they do now. Yeah, it’s kind of awful, but here you go. I mean, the most galling part of the Managed NAT Gateway service is not that it’s expensive; it’s that it’s expensive, but also incredibly good at what it does. You don’t have to think about this whole problem anymore, and as of recently, it also supports IPv6-to-IPv4 (NAT64) translation as well.
It’s not that the service is bad. It’s that the service is stonkingly expensive, particularly at scale. And everything that we’ve seen before is either oh, run your own NAT instances, or bend your knee and pay your money. And a number of folks have come up with different options where this is ridiculous. Just go ahead and run your own NAT instances.
Yeah, but what happens when I have to take it down for maintenance or replace it? It’s like, well, I guess you’re not going to the internet today. This has the, in hindsight, obvious solution: you still run the Managed NAT Gateway, because the 32 bucks a month in instance-hour charges don’t actually matter at any point of scale when you’re doing this, but you wind up using the NAT instance for day-in, day-out traffic, and the failover mode is simply that you’ll use the expensive Managed NAT Gateway until the instance is healthy again and then automatically change the route table back and forth.
Ben: Yep. That’s exactly it. So, the auto-scaling NAT instance solution has been around for a long time, well before even NAT Gateway was released.
You could have NAT instances in an auto-scaling group where the size of the group was one, and if the NAT instance failed, it would just replace itself. But this left a period in which you’d have no internet connectivity while the NAT instance was swapped out.
So, the solution here is that when auto-scaling terminates an instance, it fails over the route table to a standby NAT Gateway, rerouting the traffic. So, there’s never a point at which there’s no internet connectivity, right? The NAT instance is running, processing traffic, gets terminated after a certain period of time—configurable: 14 days, 30 days, whatever makes sense for your security strategy; could be never, right? You could choose that you want to have your own maintenance window in which to do it.
Corey: And let’s face it, this thing is more or less sitting there as a network traffic router, for lack of a better term. There is no need to ever log into the thing and make changes to it until and unless there’s a vulnerability that you can exploit via somehow just talking to the TCP stack when nothing’s actually listening on the host.
Ben: You know, you can run your own AMI that has been pared down to almost nothing, and that instance doesn’t do much. It’s using just a Linux kernel to sit on two networks and pass traffic back and forth. It has a translation table that kind of keeps track of the state of connections, and so you don’t need to have any service running. To manage the system, we have SSM so you can use Session Manager to log in, but frankly, you can just disable that. You almost never even need to get a shell. And that is, in fact, an option we have in the solution: to disable SSM entirely.
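Mechanically, the failover described above comes down to one EC2 ReplaceRoute call per private route table. Here is a minimal Python sketch of that swap; it illustrates the idea rather than reproducing AlterNAT's actual code, and the resource IDs used in the examples are made up. The helpers expect a boto3 EC2 client (e.g., `boto3.client("ec2")`) to be passed in.

```python
# Hypothetical sketch of the AlterNAT-style route swap; not the project's
# actual implementation. default_route_params builds the arguments for
# EC2's ReplaceRoute API; fail_over/fail_back apply them via a boto3
# EC2 client supplied by the caller.
from typing import Optional


def default_route_params(route_table_id: str,
                         nat_gateway_id: Optional[str] = None,
                         instance_id: Optional[str] = None) -> dict:
    """Parameters that repoint a route table's 0.0.0.0/0 route at either
    the standby NAT Gateway or the NAT instance."""
    if (nat_gateway_id is None) == (instance_id is None):
        raise ValueError("pass exactly one of nat_gateway_id or instance_id")
    params = {"RouteTableId": route_table_id,
              "DestinationCidrBlock": "0.0.0.0/0"}
    if nat_gateway_id is not None:
        params["NatGatewayId"] = nat_gateway_id  # failover target
    else:
        params["InstanceId"] = instance_id       # failback target
    return params


def fail_over(ec2, route_table_id: str, nat_gateway_id: str) -> None:
    """On NAT instance termination, route through the standby NAT Gateway."""
    ec2.replace_route(**default_route_params(route_table_id,
                                             nat_gateway_id=nat_gateway_id))


def fail_back(ec2, route_table_id: str, instance_id: str) -> None:
    """Once the replacement NAT instance is healthy, route through it again."""
    ec2.replace_route(**default_route_params(route_table_id,
                                             instance_id=instance_id))
```

In a real deployment, logic like this would live in a Lambda fired by the auto-scaling group's termination lifecycle hook: `fail_over(ec2, "rtb-0abc", "nat-0def")` on the way down, then `fail_back(ec2, "rtb-0abc", "i-0123")` once the replacement passes health checks.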
And in the event that the instance becomes unhealthy, great, it fails traffic over to the Managed NAT Gateway while it terminates the old node and replaces it with a healthy one and then fails traffic back. Now, I do need to ask, what is the story of network connections during that failover and failback scenario?Ben: Right, that's the primary drawback, I would say, of the solution is that any established TCP connections that are on the NAT instance at the time of a route change will be lost. So, say you have—Corey: TCP now terminates on the floor.Ben: Pretty much. The connections are dropped. If you have an open SSH connection from a host in the private network to a host on the internet and the instance fails over to the NAT Gateway, the NAT Gateway doesn't have the translation table that the NAT instance had. And not to mention, the public IP address also changes because you have an Elastic IP assigned to the NAT instance, a different Elastic IP assigned to the NAT Gateway, and so because that upstream IP is different, the remote host is, like, tracking the wrong IP. So, those connections, they're going to be lost.So, there are some use cases where this may not be suitable. We do have some ideas on how you might mitigate that, for example, with the use of a maintenance window to schedule the replacement, replaced less often so it doesn't have to affect your workflow as much, but frankly, for many use cases, my belief is that it's actually fine. In our use case at Chime, we found that it's completely fine and we didn't actually experience any errors or failures. But there might be some use cases that are more sensitive or less resilient to failure in the first place.Corey: I would also point out that a lot of how software is going to behave is going to be a reflection of the era in which it was moved to cloud. Back in the early days of EC2, you had no real sense of reliability around any individual instance, so everything was written in a very defensive manner. 
These days, with instances automatically being able to flow among different hardware so we don't get instance interrupt notifications the way we once did on a semi-constant basis, it more or less has become what presents as bulletproof, so a lot of people are writing software that's a bit more brittle. But it's always been a best practice that when a connection fails, okay, what happens at failure? Do you just give up and throw your hands in the air and shriek for help or do you attempt to retry a few times, ideally backing off exponentially?

In this scenario, those retries will work. So, it's a question of how well you've built your software. Okay, let's say that you made the worst decisions imaginable, and okay, if that connection dies, the entire workload dies. Okay, you have the option to refactor it to be a little bit better behaved, or alternately, you can keep paying the Managed NAT Gateway tax of four-and-a-half cents per gigabyte in perpetuity forever. I'm not going to tell you what decision to make, but I know which one I'm making.

Ben: Yeah, exactly. The cost savings potential of it far outweighs the potential maintenance troubles, I guess, that you could encounter. But the fact is, if you're relying on Managed NAT Gateway and paying the price for doing so, it's not as if there's no chance for connection failure. NAT Gateway could also fail. I will admit that I think it's an extremely robust and resilient solution. I've been really impressed with it, especially so after having worked on this project, but it doesn't mean it can't fail.

And beyond that, upstream of the NAT Gateway, something could in fact go wrong. Like, internet connections are unreliable, kind of by design. So, if your system is not resilient to connection failures, like, there's a problem to solve there anyway; you're kind of relying on hope.
So, it's a kind of a forcing function in some ways to build architectural best practices, in my view.

Corey: I can't stress enough that I have zero problem with the capabilities and the stability of the Managed NAT Gateway solution. My complaints about it start and stop entirely with the price. Back when you first showed me the blog post that is releasing at the same time as this podcast—and you can visit that at alternat.cloud—you sent me an early draft of this and what I loved the most was that your math was off because of a not complete understanding of the gloriousness that is just how egregious the NAT Gateway charges are.

Your initial analysis said, “All right, if you're throwing half a terabyte out to the internet, this has the potential of cutting the bill by”—I think it was $10,000 or something like that. It's, “Oh no, no. It has the potential to cut the bill by an entire twenty-two-and-a-half thousand dollars.” Because this processing fee does not replace any egress fees whatsoever. It's purely additive. If you forget to have a free S3 Gateway endpoint in a private subnet, every time you put something into or take something out of S3, you're paying four-and-a-half cents per gigabyte on that, despite the fact there's no internet transitory work, and it's not crossing availability zones. It is simply a four-and-a-half cent fee to retrieve something that has only cost you—at most—2.3 cents per month to store in the first place. Flip that switch, that becomes completely free.

Ben: Yeah. I'm not embarrassed at all to talk about the lack of education I had around this topic. The fact is I'm an engineer primarily and I came across the cost stuff because it kind of seemed like a problem that needed to be solved within my organization. And if you don't mind, I might just linger on this point and kind of think back a few months. I looked at the AWS bill and I saw this egregious ‘EC2 Other' category. It was taking up the majority of our bill.
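The S3 example above is easy to put numbers on. A back-of-the-envelope sketch using the list prices mentioned in this conversation ($0.045/GB NAT Gateway data processing, 2.3 cents/GB-month for S3 storage); actual bills vary by region and storage tier:

```python
# Rough cost of pulling 1 TiB out of S3 from a private subnet, using the
# list prices discussed here: $0.045/GB NAT Gateway data processing and
# $0.023/GB-month S3 standard storage. Regions and tiers vary.
NAT_GW_PROCESSING_PER_GB = 0.045
S3_STORAGE_PER_GB_MONTH = 0.023

def s3_retrieval_cost_via_nat_gateway(gigabytes: float) -> float:
    # Same-region S3 traffic routed through a NAT Gateway still pays the
    # processing fee, even though it never touches the internet.
    return gigabytes * NAT_GW_PROCESSING_PER_GB

def s3_retrieval_cost_via_gateway_endpoint(gigabytes: float) -> float:
    # An S3 Gateway endpoint in the VPC makes the same retrieval free.
    return 0.0

gb = 1024
print(round(s3_retrieval_cost_via_nat_gateway(gb), 2))  # 46.08
print(s3_retrieval_cost_via_gateway_endpoint(gb))       # 0.0
print(round(gb * S3_STORAGE_PER_GB_MONTH, 2))           # 23.55 per month to store it
```

In other words, one retrieval through the NAT Gateway costs roughly double what storing that data for a month does, and the Gateway endpoint eliminates the fee entirely.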
Like, the single biggest line item was EC2 Other. And I was like, “What could this be?”

Corey: I want to wind up flagging that just because that bears repeating, because I often get people pushing back of, “Well, how bad—it's one Managed NAT Gateway. How much could it possibly cost? $10?” No, it is the majority of your monthly bill. I cannot stress that enough.

And that's not because the people who work there are doing anything that they should not be doing or didn't understand all the nuances of this. It's because for the security posture that is required for what you do—you are at Chime Financial, let's be clear here—putting everything in public subnets was not really a possibility for you folks.

Ben: Yeah. And not only that, but there are plenty of services that have to be on private subnets. For example, AWS Glue services must run in private VPC subnets if you want them to be able to talk to other systems in your VPC; like, they cannot live in a public subnet. So essentially, if you want to talk to the internet from those jobs, you're forced into some kind of NAT solution. So, I dug into the EC2 Other category and I started trying to figure out what was going on there.

There's no way—natively—to look at what traffic is transiting the NAT Gateway. There's not an interface that shows you what's going on, what are the biggest talkers over that network. Instead, you have to have flow logs enabled and have to parse those flow logs. So, I dug into that.

Corey: Well, you're missing a step first because in a lot of environments, people have more than one of these things, so you get to first do the scavenger hunt of, okay, I have a whole bunch of Managed NAT Gateways, and first I need to go diving into CloudWatch metrics and figure out which are the heavy talkers. It's usually one or two followed by a whole bunch of small stuff, but not always, so figuring out which VPC you're even talking about is a necessary prerequisite.

Ben: Yeah, exactly.
The data around it is almost missing entirely. Once you come to the conclusion that it is a particular NAT Gateway—like, that's a set of problems to solve on its own—but first, you have to go to the flow logs, you have to figure out what are the biggest upstream IPs that it's talking to. Once you have the IP, it still isn't apparent what that host is. In our case, we had all sorts of outside parties that we were talking to a lot and it's a matter of sorting by volume and figuring out, well, this IP, what is the reverse IP? Who is potentially the host there?

I actually had some wrong answers at first. I set up VPC endpoints to S3 and DynamoDB and SQS because those were some top talkers and that was a nice way to gain some security and some resilience and save some money. And then I found, well, Datadog; that's another top talker for us, so I ended up creating a nice private link to Datadog, which they offer for free, by the way, which is more than I can say for some other vendors. But then I found some outside parties, there wasn't a nice private link solution available to us, and yet, it was by far the largest volume. So, that's what kind of started me down this track is analyzing the NAT Gateway myself by looking at VPC flow logs. Like, it's shocking that there isn't a better way to find that traffic.

Corey: It's worse than that because VPC flow logs tell you where the traffic is going and in what volumes, sure, on an IP address and port basis, but okay, now you have a Kubernetes cluster that spans two availability zones. Okay, great. What is actually passing through that? So, you have one big application that just seems awfully chatty, you have multiple workloads running on the thing. What's the expensive thing talking back and forth? The only way that you can reliably get the answer to that I found is to talk to people about what those workloads are actually doing, and failing that you're going code spelunking.

Ben: Yep. You're exactly right about that.
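The flow-log triage Ben describes, sorting destinations by byte volume to find the top talkers, is mechanical enough to script. A minimal sketch over the default VPC flow log v2 format, where `dstaddr` is the fifth field and `bytes` the tenth; the sample records below are fabricated for illustration:

```python
from collections import Counter

# Aggregate bytes by destination address from VPC flow log records in the
# default v2 format: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status.
# The sample lines are invented for illustration.
SAMPLE_RECORDS = """\
2 123456789012 eni-0a1b 10.0.1.5 52.94.133.131 44321 443 6 120 9000000 1670000000 1670000060 ACCEPT OK
2 123456789012 eni-0a1b 10.0.1.7 52.94.133.131 44400 443 6 80 4000000 1670000000 1670000060 ACCEPT OK
2 123456789012 eni-0a1b 10.0.1.5 34.201.10.20 44500 443 6 10 500000 1670000000 1670000060 ACCEPT OK
"""

def top_talkers(records: str, n: int = 10):
    """Return the n destination IPs responsible for the most bytes."""
    byte_counts = Counter()
    for line in records.strip().splitlines():
        fields = line.split()
        dstaddr, nbytes = fields[4], int(fields[9])
        byte_counts[dstaddr] += nbytes
    return byte_counts.most_common(n)

print(top_talkers(SAMPLE_RECORDS))
# [('52.94.133.131', 13000000), ('34.201.10.20', 500000)]
```

This gets you the IPs; as the conversation notes, mapping each IP back to a host (reverse DNS, whois, asking teams) is still manual work.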
In our case, it ended up being apparent because we have a set of subnets where only one particular project runs. And when I saw the source IP, I could immediately figure that part out. But if it's a K8s cluster in the private subnets, yeah, how are you going to find it out? You're going to have to ask everybody that has workloads running there.

Corey: And we're talking about, in some cases, millions of dollars a month. Yeah, it starts to feel a little bit predatory as far as how it's priced and the amount of work you have to put in to track this stuff down. I've done this a handful of times myself, and it's always painful unless you discover something pretty early on, like, oh, it's talking to S3 because that's pretty obvious when you see that. It's, yeah, flip the switch and this entire engagement just paid for itself a hundred times over. Now, let's see what else we can discover.

That is always one of those fun moments because, first, customers are super grateful to learn that, oh, my God, I flipped that switch, and I'm saving a whole bunch of money. Because it starts with gratitude. “Thank you so much. This is great.” And it doesn't take a whole lot of time for that to alchemize into anger of, “Wait. You mean, I've been being ridden like a pony for this long and no one bothered to mention that if I click a button, this whole thing just goes away?”

And when you mention this to your AWS account team, like, they're solicitous, but they either have to present as, “I didn't know that existed either,” which is not a good look, or, “Yeah, you caught us,” which is worse. There's no positive story on this. It just feels like a tax on not knowing trivia about AWS. I think that's what really winds me up about it so much.

Ben: Yeah, I think you're right on about that as well. My misunderstanding about the NAT pricing was that data processing is additive to data transfer.
I expected when I replaced the NAT Gateway with a NAT instance that I would be substituting data transfer costs for NAT Gateway data processing costs. But in fact, NAT Gateway incurs both data processing and data transfer. NAT instances only incur data transfer costs. And so, this is a big difference between the two solutions.

Not only that, but if you're in the same region, if you're egressing out of your, say, us-east-1 region and talking to another hosted service also within us-east-1—never leaving the AWS network—you don't actually even incur data transfer costs. So, if you're using a NAT Gateway, you're paying data processing.

Corey: To be clear you do, but it is cross-AZ in most cases, billed at one penny egressing, and on the other side, that hosted service generally pays one penny ingressing as well. Don't feel bad about that one. That was extraordinarily unclear and the only reason I know the answer to that is that I got tired of getting stonewalled by people that later turned out didn't know the answer, so I ran a series of experiments designed explicitly to find this out.

Ben: Right. As opposed to the five cents to nine cents that is data transfer to the internet. Which, add that to data processing on a NAT Gateway and you're paying between nine-and-a-half cents and thirteen-and-a-half cents for every gigabyte egressed. And this is a phenomenal cost. And at any kind of volume, if you're doing terabytes to petabytes, this becomes a significant portion of your bill. And this is why people hate the NAT Gateway so much.

Corey: I am going to short-circuit an angry comment I can already see coming on this where people are going to say, “Well, yes. But at multi-petabyte scale, nobody's paying on-demand retail price.” And they're right. Most people who are transmitting that kind of data have a specific discount rate applied to what they're doing that varies depending upon usage and use case.

Sure, great.
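Putting those per-gigabyte figures together: NAT Gateway egress stacks the $0.045/GB processing fee on top of the roughly $0.05–$0.09/GB internet transfer charge, while a NAT instance pays transfer alone. A quick sanity check of the numbers quoted above, using list prices:

```python
# Per-GB internet egress cost through each NAT option, using the list
# prices discussed in this episode. Internet data transfer runs roughly
# $0.09/GB in the first tiers down to $0.05/GB at higher volumes.
NAT_GW_PROCESSING = 0.045  # $/GB, additive to transfer
TRANSFER_HIGH = 0.09       # $/GB internet egress, first tiers
TRANSFER_LOW = 0.05        # $/GB internet egress, higher tiers

def per_gb_via_nat_gateway(transfer_rate: float) -> float:
    # NAT Gateway charges data processing on top of data transfer.
    return transfer_rate + NAT_GW_PROCESSING

def per_gb_via_nat_instance(transfer_rate: float) -> float:
    # NAT instances incur only the data transfer charge.
    return transfer_rate

print(round(per_gb_via_nat_gateway(TRANSFER_HIGH), 3))  # 0.135, the 13.5 cents/GB
print(round(per_gb_via_nat_gateway(TRANSFER_LOW), 3))   # 0.095, the 9.5 cents/GB
print(per_gb_via_nat_instance(TRANSFER_HIGH))           # 0.09
```

The processing fee is the part a NAT instance eliminates, which is also the part that, as Ben notes next, rarely gets discounted.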
But I'm more concerned with the people who are sitting around dreaming up ideas for a company where I want to wind up doing some sort of streaming service. I talked to one of those companies very early on in my tenure as a consultant around the billing piece and they wanted me to check their napkin math because they thought that, at their numbers, when they wound up scaling up, if their projections were right, they were going to be spending $65,000 a minute, and what did they not understand? And the answer was, well, you didn't understand this other thing, so it's going to be more than that, but no, you're directionally correct. So, that idea that started off on a napkin, of course, they didn't build it on top of AWS; they went elsewhere.

And last time I checked, they'd raised well over a quarter-billion dollars in funding. So, that's a business that AWS would love to have on a variety of different levels, but they're never going to even be considered because by the time someone is at scale, they either have built this somewhere else or they went broke trying.

Ben: Yep, absolutely. And we might just make the point there that while you can get discounts on data transfer, you really can't—or it's very rare—to get discounts on data processing for the NAT Gateway. So, any kind of savings you can get on data transfer would apply to a NAT instance solution, you know, saving you four-and-a-half cents per gigabyte inbound and outbound over the NAT Gateway equivalent solution. So, you're paying a lot for the benefit of a fully-managed service there. Very robust, nicely engineered fully-managed service as we've already acknowledged, but an extremely expensive solution for what it is, which is really just a proxy in the end. It doesn't add any value to you.

Corey: The only way to make that more expensive would be to route it through something like Splunk or whatnot.
And Splunk does an awful lot for what they charge per gigabyte, but it just feels like it's rent-seeking in some of the worst ways possible. And what I love about this is that you've solved the problem in a way that is open-source, you have already released it in Terraform code. I think one of the first to-dos on this for someone is going to be, okay, now also make it CloudFormation and also make it CDK so you can drop it in however you want.

And anyone can use this. I think the biggest mistake people might make in glancing at this is, well, I'm looking at the hourly charge for the NAT Gateways and that's 32-and-a-half bucks a month and the instances that you recommend are hundreds of dollars a month for the big network-optimized stuff. Yeah, if you care about the hourly rate of either of those two things, this is not for you. That is not the problem that it solves. If you're an independent learner annoyed about the $30 charge you got for a Managed NAT Gateway, don't do this. This will only add to your billing concerns.

Where it really shines is once you're at, I would say, probably about ten terabytes a month, give or take, in Managed NAT Gateway data processing is where it starts to make sense to consider this. The breakeven is around six or so, but there is value to not having to think about things. Once you get to that level of spend, though, it's worth devoting a little bit of infrastructure time to something like this.

Ben: Yeah, that's effectively correct. The total cost of running the solution, like, all-in, there's eight Elastic IPs, four NAT Gateways, if you're—say you're in four zones; could be less if you're in fewer zones—like, n NAT Gateways, n NAT instances, depending on how many zones you're in, and I think that's about it. And I said right in the documentation, if any of those baseline fees are a material number for your use case, then this is probably not the right solution. Because we're talking about saving thousands of dollars.
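The breakeven Corey cites ("around six" terabytes a month) falls out of simple arithmetic: divide the monthly cost of the replacement instances by the $0.045/GB processing fee they eliminate. A sketch with an assumed instance cost; the $300/month figure is illustrative ("hundreds of dollars a month" per the conversation), not a quoted price:

```python
# Breakeven traffic volume at which running your own NAT instances beats
# NAT Gateway data processing. The $300/month instance figure below is an
# assumed illustration, not an actual quoted price.
NAT_GW_PROCESSING_PER_GB = 0.045

def breakeven_tb_per_month(instance_monthly_cost: float) -> float:
    """TB/month of NAT traffic above which the instance pays for itself."""
    gigabytes = instance_monthly_cost / NAT_GW_PROCESSING_PER_GB
    return gigabytes / 1000

print(round(breakeven_tb_per_month(300), 1))  # 6.7, roughly the "around six" figure

# At the ten-terabytes-a-month mark, the monthly processing fees avoided:
print(round(10_000 * NAT_GW_PROCESSING_PER_GB, 2))  # 450.0
```

Below that volume, the baseline instance and gateway hourly fees dominate and the managed service is simpler; above it, the savings compound every month.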
Any of these small numbers for NAT Gateway hourly costs, NAT instance hourly costs, that shouldn't be a factor, basically.

Corey: Yeah, it's like when I used to worry about costing my customers a few tens of dollars in Cost Explorer or CloudWatch or request fees against S3 for their Cost and Usage Reports. It's, yeah, that does actually have a cost, there's no real way around it, but look at the savings they're realizing by going through that. Yeah, they're not going to come back and complain about their five-figure consulting engagement costing an additional $25 in AWS charges and then lowering it by a third. So, there's definitely a difference as far as how those things tend to be perceived. But it's easy to miss the big stuff when chasing after the little stuff like that.

This is part of the problem I have with an awful lot of cost tooling out there. They completely ignore cost components like this and focus only on the things that are easy to query via API, of, oh, we're going to cost-optimize your Kubernetes cluster, when they think about compute and RAM. And, okay, that's great, but you're completely ignoring all the data transfer because there's still no great way to get at that programmatically. And it really is missing the forest for the trees.

Ben: I think this is key to any cost reduction project or program that you're undertaking. When you look at a bill, look for the biggest spend items first and work your way down from there, just because of the impact you can have. And that's exactly what I did in this project. I saw that ‘EC2 Other' slash NAT Gateway was the big item and I started brainstorming ways that we could go about addressing that. And now I have my next targets in mind: now that we've reduced this cost to effectively… nothing, extremely low compared to what it was, we have other new line items on our bill that we can start optimizing.
But in any cost project, start with the big things.

Corey: You have come a long way around to answer a question I get asked a lot, which is, “How do I become a cloud economist?” And my answer is, you don't. It's something that happens to you. And it appears to be happening to you, too. My favorite part about the solution that you built, incidentally, is that it is being released under the auspices of your employer, Chime Financial, which is immune to being acquired by Amazon just to kill this thing and shut it up.

Because Amazon already has something shitty called Chime. They don't need to wind up launching something else or acquiring something else and ruining it because they have a Slack competitor of sorts called Amazon Chime. There's no way they could acquire you [unintelligible 00:27:45] going to get lost in the hallways.

Ben: Well, I have confidence that Chime will be a good steward of the project. Chime's goal and mission as a company is to help everyone achieve financial peace of mind and we take that really seriously. We even apply it to ourselves, and that was kind of the impetus behind developing this in the first place. You mentioned earlier we have Terraform support already and you're exactly right. I'd love to have CDK, CloudFormation, and Pulumi support, and other kinds of contributions are more than welcome from the community.

So, if anybody feels like participating, if they see a feature that's missing, let's make this project the best that it can be. I suspect we can save many companies hundreds of thousands or millions of dollars. And this really feels like the right direction to go in.

Corey: This is easily a multi-billion dollar savings opportunity, globally.

Ben: That's huge. I would be flabbergasted if that was the outcome of this.

Corey: The hardest part is reaching these people and getting them on board with the idea of handling this.
And again, I think there's a lot of opportunity for the project to evolve in the sense of different settings depending upon risk tolerance. I can easily see a scenario where, in the event of a disruption to the NAT instance, it fails over to the Managed NAT Gateway, but failback becomes manual so you don't have a flapping route table back and forth or a [hold 00:29:05] downtime or something like that. Because again, in that scenario, the failure mode is just, well, you're paying four-and-a-half cents per gigabyte for a while until you wind up figuring out what's going on, as opposed to the failure mode of you wind up disrupting connections on an ongoing basis, and for some workloads, that's not tenable. This is absolutely, for the common case, the right path forward.

Ben: Absolutely. I think it's an enterprise-grade solution, and the more knobs and dials that we add to tweak to make it more robust or adaptable to different kinds of use cases, the better. The best outcome here would actually be that the entire solution becomes irrelevant because AWS fixes the NAT Gateway pricing. If that happens, I will consider the project a great success.

Corey: I will be doing backflips like you wouldn't believe. I would sing their praises day in, day out. I'm not saying reduce it to nothing, even. I'm not saying it adds no value. I would change the way that it's priced because, honestly, the fact that I can run an EC2 instance and be charged $0 on a per-gigabyte basis, yeah, I would pay a premium on an hourly charge based upon traffic volumes, but don't meter per gigabyte. That's where it breaks down.

Ben: Absolutely. And why is it additive to data transfer, also? Like, I remember first starting to use VPC when it was launched and reading about the NAT instance requirement and thinking, “Wait a minute. I have to pay this extra management and hourly fee just so my private hosts could reach the internet?
That seems kind of janky.”

And Amazon established a norm here because Azure and GCP both have their own equivalent of this now. This is a business choice. This is not a technical choice. They could just run this under the hood and not charge anybody for it or build in the cost and it wouldn't be this thing we have to think about.

Corey: I almost hate to say it, but Oracle Cloud does, for free.

Ben: Do they?

Corey: It can be done. This is a business decision. It is not a technical capability issue where, well, it does incur cost to run these things. I understand that and I'm not asking for things for free. I very rarely say that this is overpriced when I'm talking about AWS billing issues. I'm talking about it being unpredictable, I'm talking about it being impossible to see in advance, but the fact that it costs too much money is rarely my complaint. In this case, it costs too much money. Make it cost less.

Ben: If I'm not mistaken, GCP's equivalent solution is the exact same price. It's also four-and-a-half cents per gigabyte. So, that shows you that there's business games being played here. Like, Amazon could get ahead and do right by the customer by dropping this to a much more reasonable price.

Corey: I really want to thank you both for taking the time to speak with me and building this glorious, glorious thing. Where can we find it? And where can we find you?

Ben: alternat.cloud is going to be the place to visit. It's on Chime's GitHub, which will be released by the time this podcast comes out. As for me, if you want to connect, I'm on Twitter. @iamthewhaley is my handle. And of course, I'm on LinkedIn.

Corey: Links to all of that will be in the podcast notes. Ben, thank you so much for your time and your hard work.

Ben: This was fun. Thanks, Corey.

Corey: Ben Whaley, staff software engineer at Chime Financial, and AWS Community Hero. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry rant of a comment that I will charge you not only four-and-a-half cents per word to read, but four-and-a-half cents to reply, because I am experimenting myself with being a rent-seeking schmuck.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
How To Effectively Manage Your Co-Founder with Mike Julian

Nov 3, 2022 · 31:34


About Mike

Besides his duties as The Duckbill Group's CEO, Mike is the author of O'Reilly's Practical Monitoring, and previously wrote the Monitoring Weekly newsletter and hosted the Real World DevOps podcast. He was previously a DevOps Engineer for companies such as Taos Consulting, Peak Hosting, Oak Ridge National Laboratory, and many more. Mike is originally from Knoxville, TN (Go Vols!) and currently resides in Portland, OR.

Links Referenced:

Twitter: https://twitter.com/Mike_Julian
mikejulian.com: https://mikejulian.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full-stack observability for modern infrastructure and applications at every scale. Datadog enables teams to see everything: dashboarding, alerting, application performance monitoring, infrastructure monitoring, UX monitoring, security monitoring, dog logos, and log management, in one tightly integrated platform. With 600-plus out-of-the-box integrations with technologies including all major cloud providers, databases, and web servers, Datadog allows you to aggregate all your data into one platform for seamless correlation, allowing teams to troubleshoot and collaborate together in one place, preventing downtime and enhancing performance and reliability. Get started with a free 14-day trial by visiting datadoghq.com/screaminginthecloud, and get a free t-shirt after installing the agent.

Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves.
That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.

Basically, you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive?

Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn and I'm having something of a crisis of faith based upon a recent conversation I've had with my returning yet again guest, Mike Julian, my business partner and CEO of The Duckbill Group. Welcome back, Mike.

Mike: Hi, everyone.

Corey: So, the revelation that had surfaced unexpectedly was, based upon a repeated talking point where I am a terrible employee slash expensive to manage, et cetera, et cetera, and you pointed out that you've been managing me for four years or so now, at which point I did a spit take, made all the more impressive by the fact that I wasn't drinking anything at the time, and realized, “Oh, my God, you're right, but I haven't had any of the usual problems slash friction with you that I have with basically every boss I've ever had in my entire career.” So, I'm spiraling. Let's talk about that.

Mike: My recollection of that conversation is slightly different than yours. Mine is that you called me and said, “Mike, I just realized that you're my boss.” And I'm like, “How do you feel about that?” He's like, “I'm not really sure.”

Corey: And I'm still not entirely sure how I feel, if I'm being fully honest with you, just because it's such a weird thing to have to deal with.
Because historically, I always view a managerial relationship as starting from a place of a power imbalance. And that is the one element that is missing from our relationship. We each own half the company, we can fire each other, but it takes the form of tearing the company apart, and that isn't something that we're really set up to entertain.

Mike: And you know, I actually think it's deeper than that because you owning the other half of the company is not really… it's not really power in itself. Like, yeah, it is, but you could easily own half the company and have no power. Because, like, really when we talk about power, we're talking about political power, influence, and I think the reason that there is no power imbalance is because each of us does something in the company that is just as important as the other. And they're both equally valuable to the company and we both recognize the other's contributions as being equally valuable to the company. It's less to do about how much we own and more about the work that we do.

Corey: Oh, of course. The ownership starts and stops entirely with the fact that neither one of us can force the other out. So it's, as opposed to, well, I own 51% of the company, so when I'm tired of your bullshit, you're leaving. And that is a dynamic that's never entered into it. I'm also going to add one more thing onto what you just said, which is, both of us would sooner tear off our own skin than do the other's job.

Mike: Yeah.
God, I would hate to do your job, but I know you'd hate to do mine.

Corey: You look at my calendar on a busy meeting day and you have a minor panic attack just looking at it: “Oh, my God, talking to that many people.” And you go away for a while and come back with a whole analytical model where your first love language feels like it's spreadsheets on some days, and I look at this and it's like, “Yeah, I know what some of those numbers mean.” And it just drives me up a wall, the idea of building out a plan and an execution thing and then delegating a lot of it to other people; it does not work for my worldview in so many different ways. It's the reason I think that you and I get along. That and our shared values.

Mike: I remember the first time that you and I did a consulting engagement together. We went on a multi-day trip. And at the end of, like, three days of nonstop conversations, you made a comment, it was like, “Cool. So, when are we going to do that again?” Like, you were excited by it. I could tell you were energized. And I was just thinking, “Please, for the love of God, I want to die right now.”

Corey: One of the weirdest parts about all of it, though, is neither one of us is in a scenario where what we do for a living and how we go about it works without the other.

Mike: Right. Yeah, like, this is one of the interesting things about the company we have built is that it would not work with just you or just me; it's us being co-founders that makes it successful.

Corey: The thing that I do not understand, and I don't think I ever will, is the idea of co-founder speed dating, where you basically go to some big networking mixer event, pick some rando off the street, and congratulations, that's your business partner. Have fun. It is not that much of an exaggeration to say that co-founding a company with someone else is like a marriage.
You are creating a legal entity that, without very specific controls and guidelines, opens you up to massive liability issues if the other person decides to screw you over. That is part of the reason that the values match was so important for us.

Mike: Yeah, it is surprising to me how similar being co-founders and business partners is to being married. I did not expect how close those two things were. You and I spend an incredible amount of time just on the relationship for each of us, which I never expected, but makes sense in hindsight.

Corey: That's, I think, part of what makes the whole you-managing-me type of relationship work: not only can you not, “fire me,” quote-unquote, but I can't quit without—

Mike: [laugh].

Corey: Leaving behind a giant pile of effort with nothing to show for it over the last four years. So, it's one of those conversation styles where we go into the conversation knowing, regardless of how heated it gets or how annoyed we are with each other, that we are not going to blow the company up because one of us is salty that week.

Mike: Right. Yeah, I remember from the legal perspective, when we put together a partnership agreement, our attorneys were telling us that we really needed to have someone as the 51% owner, and we were both adamant that no, that doesn't work for us. And finally, the way that we handled it is if you and I could not handle a dispute, then the only remedy left was to shut the entire thing down. And that would be an automatic trigger. We've never ever, ever even got close to that point.

But like, I like that structure because it really means that if you and I can't agree on something and it's a substantial thing, then there's no business, which really kind of sets the stage for how important the conversations that we have are. And of course, you and I, we're close, we have a great relationship, so that's never come up.
But I do like that it's still there.

Corey: I like the fact that there's always going to be an option to get out. It's not a suicide pact, for lack of a better term. But it's also something that neither one of us would ever entertain lightly. And credit where due, there have been countless conversations where you and I were diametrically opposed; we each talk through it, and one or the other of us will just do a complete one-eighty on our position where, “Okay, you convinced me,” and that's it. What's so odd about that is because we don't have too many examples of that in public society, it just seems like there's now this entire focus on, “Oh, if you make an observation or a point that's wrong, you've got to double down on it.” Why would you do that? That makes zero sense. When you've considered something from a different angle and changed your mind, why waste more time on it?

Mike: I think there's other interesting ones, too, where you and I have come at something from a different angle and one of us will realize that we just actually don't care as much as we thought we did. And we'll just back down because it's not the hill we want to die on.

Corey: Which brings us to a good point. What hill do we want to die on?

Mike: Hmm. I think we've only got a handful. I mean, as it should be; like, there should not be many of them.

Corey: No, no, because most things can change, in the fullness of time. Just because it's not something we believe is right for the business right now does not mean it never will be.

Mike: Yeah. I think all of them really come down to questions of values, which is why you and I work so well together, in that we don't have a lot of common interests, we're at completely different stages in our lives, but we have very tightly aligned values. Which means that when we go into a discussion about something, we know where the other stands right away; like, we could generally make a pretty good guess about it.
And there's often very little question about how some values discussion is going to go. Like, do we take on a certain client that is, I don't know, they build landmines? Is that a thing that we're going to do? Of course not. Like—

Corey: I should clarify, we're talking here about physical landmines; not whatever disastrous failure mode your SaaS application has.

Mike: [laugh]. Yeah.

Corey: We know what those are.

Mike: Yeah, and like, that sort of thing, you and I would never even pose the question to each other. We would just make the decision. And maybe we tell each other later, like, “Hey, haha, look what happened,” but there will never be a discussion around it because it just—our values are so tightly aligned that it wouldn't be necessary.

Corey: Whenever we're talking to someone that's in a new sector or a company that has a different expression, we always like to throw it past each other just to double-check, you don't have a problem with—insert any random thing here; the breadth of our customer base just astounds me—and very rarely has either one of us thrown a flag on something, just because we do have this affinity for saying yes and making money.

Mike: Yeah. But you actually wanted to talk about the terribleness of managing you.

Corey: Yeah. I am very curious as to what your experience has been.

Mike: [laugh].

Corey: And before we dive into it, I want to call out a couple of things that make me a little atypical for your typical problem employee. I am ADHD personified. My particular expression of that means that my energy level is very different at different times of day, there are times where I will get nothing done for a day or two, and then in four hours, get three weeks of work done. It is hard to predict and it's hard to schedule around and it's never clear exactly what that energy level is going to be at any given point in time. That's the starting point of this nonsense. Now, take it away.

Mike: Yeah.
What most people know about Corey is what everyone sees on Twitter, which is what I would call the high highs. Everyone sees you at your most energetic, or at least perceived as the most energetic. If they see you in person at a conference, it's the same sort of thing. What people don't see are your lows, which are really, really low lows.

And it's not a matter of, like, you don't get anything done. Like, you know, we can handle that; it's that you disappear. And it may be for a couple hours, it may be for a couple of days, and we just don't really know what's going on. That's really hard. But then, in your high highs, they're really high, but they're also really unpredictable.

So, what that means is that because you have ADHD, like, the way that your brain thinks, the way your brain works, is that you don't control what you're going to focus on, and you never know what you're going to focus on. It may be exactly what you should be focusing on, which is a huge win for everyone involved, but sometimes you focus on stuff that doesn't matter to anyone except you. Sometimes really interesting stuff comes out of that, but oftentimes it doesn't. So, helping build a structure to work around those sorts of things and to also support those sorts of things has been one of the biggest challenges that I've had. And most of my job is really about building a support structure for you and enabling you to do your best work.

So, that's been really interesting and really challenging because I do not think that way. Like, if I need to focus on something, I just say, “Great. I'm just going to focus on this thing,” and I'll focus on it until I'm done. But you don't work that way, and you couldn't conceivably work that way, ever. So, it's always been hard because I say things like, “Hey, Corey, I need you to go write this series of emails.” And you'll write them when your brain decides it wants to write them, which might be never.

Corey: That's part of the problem.
I've also found that if I have an idea floating around too long, it'll linger for years and I'll never write anything about it, whereas there are times when inspiration strikes. I write a 1,000- to 2,000-word blog post every week that goes out, and there are times it takes me hours and there are times I bust out the entire thing in first-draft form in 20 minutes or less. Like, if it's Domino's, like, there's not going to be a refund on it. So, it's kind of wild and I wish I could harness that somehow. I don't know how, but… that's one of the biggest challenges.

Mike: I wish I could too, but it's one of the things that you learn to get used to. And with that, because we've worked together for so long, I've gotten to be able to tell what state of mind you are in. Like, are you in a state where if I put something in front of you, you're going to go after it hard and, like, great things are going to happen, or are you more likely to ignore that I said anything? And I can generally tell within the first sentence or so of bringing something up. But that also means that I have to be careful with how I structure requests that I have for you.

In some cases, I come with a punch list of, like, here's six things I need to get through and I'm going to sit on this call while we go through them. In other cases, I have to drip them out one at a time over the span of a week just because that's how your mind is those days. That makes it really difficult because that's not how most people are managed and it's not how most people expect to manage. So, coming up with different ways to do that has been one of the trickiest things I've done.

Corey: Let's move on a little bit other than managing my energy levels because that does not sound like a particularly difficult employee to manage. “Okay, great. We've got to build some buffer room into the schedule in case he winds up not delivering for a few days.
Okay, we can live with that.” But oh, working with me gets so much worse.

Mike: [laugh]. It absolutely does.

Corey: This is my performance review. Please hit me with it.

Mike: Yeah. The other major concern that has been challenging to work through, and that makes you really frustrating to work with, is that you hate conflict. Actually—let me clarify that further. You avoid conflict, except your definition of conflict is more broad than most. Because when most people think of conflict, it's like, “Oh, I have to go have this really hard conversation, it's going to be uncomfortable, and, like—”

Corey: “Time to go fire Steven.”

Mike: Right, or things like, “I have to have a performance conversation with someone.” Like, everyone hates those, but, like, there's good ways and bad ways to do them; like, it's uncomfortable even at the best of times. But with you, it's more than that, it's much more broad. You avoid giving direction because you perceive giving direction as potential for conflict, and because you're so conflict-avoidant, you don't give direction to people.

Which means that if someone does something you don't like, you don't say anything, and then it leaves everyone on the team to say, like, “I really wish Corey would be more explicit about what he wants. I wish he was more vocal about the direction he wanted to go.” Like, “Please tell us something more.” But you're so conflict-avoidant that you don't, and no amount of begging or asking for it has really changed that, so we end up with this situation where you're doing most of the work yourself because you don't want to direct other people to do it.

Corey: I will push back slightly on one element of that, which is when I have a strong opinion about something, I am not at all hesitant about articulating that. I mean, this is not—like, my Twitter is not performance art; it's very much what I believe.
The challenge is that for so much of what we talk about internally on a day-to-day basis, I don't really have a strong opinion. And what I've always shied away from is the idea of telling people how to do their jobs. So, I want to be very clear that I'm not doing that, except when it's important.

Because we've all been in environments in the corporate world where the president of the company wanders past or your grand-boss walks into the room and asks an idle question, or, “Maybe we should do this,” and it never feels like it's really just idle pondering. It's, “Welp, new strategic priority just dropped from on high.”

Mike: Right.

Corey: And every senior manager has a story about screwing that one up. And I have led us down that path once or twice previously. So—

Mike: That's true.

Corey: When I don't have a strong opinion, I think what I need to get better at is saying, “I don't give a shit,” but when I frame it like that it causes different problems.

Mike: Yeah. Yeah, that's very true. I still don't completely agree with your disagreement there, but I understand your perspective. [laugh].

Corey: Oh, it's not like you can fire me, so it doesn't really matter. I kid. I kid.

Mike: Right. Yeah. So, I think those are the two major areas that make you a real challenge to manage and a challenge to direct. But one of the reasons why I think we've been successful at it, or at least I'll say I've been successful at managing you, is I do so with such a gentle touch that you don't realize that I'm doing anything, and I have all these different—

Corey: Well, it did take me four years to realize what was going on.

Mike: Yeah, like, I have all these different ways of getting you to do things, and you don't realize I'm doing them. And, like, I've shared many of them here for you for the first time. And that really is what has worked out well. Like, a lot of the ways that I manage you, you don't realize are management.

Corey: Managing shards. Maintenance windows. Overprovisioning.
ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming. That's GO M-O-M-E-N-T-O dot co slash screaming.

Corey: What advice would you have for someone for whom a lot of these stories are resonating? Because, “Hey, I have a direct report who's driving me to distraction,” and a lot of this sounds like what you're describing. What do you wish you'd known sooner about how to coax performance out of me, for lack of a better phrasing?

Mike: When we first started really working together, I knew what ADHD was, but I knew it from a high school paper that I did on ADHD, and it's um—oh, what was it—“The Overdiagnosis of ADHD,” which was a thing when you and I were in high school. That's all I knew: just that ADHD was suspected to be grossly overdiagnosed and that most people didn't have it. What I have learned is that yeah, that might have been true—maybe; I don't know—but for people that do have ADHD, it's a real thing. Like, it does have some pretty substantial impact.

And I wish I had known more about how that manifests, and particularly how it manifests in different people. And I wish I'd known more earlier on about the coping mechanisms that different people create for themselves and how they manage and how they—[sigh], I'm struggling to come up with the right word here, but many people who are neurodivergent in some way create coping mechanisms and ways to shift themselves to appear more neurotypical. And I wish I had understood that better. Particularly, I wish I had understood that better for you when we first started because I've kind of learned about it over time.
And I spent so much time trying to get you to work the way that I work rather than understand that you work differently. Had I spent more time trying to understand how you work and what your coping mechanisms were, the earlier years of Duckbill would have been so much smoother.

Corey: And again, having this conversation has been extraordinarily helpful. On my side of it, one of the things that was absolutely transformative and caused a massive reduction in our interpersonal conflict was the very simple tool of: it's not necessarily a problem when I drop something on the floor and don't get to it, as long as I throw a hand up and say, “I'm dropping this thing,” so someone else can catch it as we go. I don't know how much of this is ADHD speaking versus how much of it is just my own brokenness in some ways, but I feel like everyone has this neverending list of backlog tasks that they'll get to someday that generally doesn't ever seem to happen. More often than not, every few months I wind up just looking at my ever-growing list, resetting it to zero, and we'll start over. And every once in a while, I'll be really industrious and knock a thing or two off the list. But so many of them don't necessarily matter or need to be me doing them, and it drives people to distraction when something hits my email inbox and just dies there, for example.

Mike: Yeah. One of the systems that we set up here is that if there's something that Corey does not immediately want to do, I have you send it to someone else. And generally it's to me, and then I become a router for you. But making that more explicit and making that easier for you—I'm just like, “If this is not something that you're going to immediately take care of yourself, forward it to me.” And that was huge. But then other things, like when you take time off, no one knows you're taking time off.
And it's an—the easiest thing is, no one cares that you're taking time off; just, you know, tell us you're doing it.

Corey: Yeah, there's a difference between, “I'm taking three days off,” and—in your case, the answer is generally, “Oh, thank God. He's finally using some of that vacation.”

Mike: [laugh].

Corey: The problem is there's a world of difference between, “Oh, I'm going to take these three days off,” and just not showing up that day. That tends to cause problems for people.

Mike: Yeah. If you're just waving a hand in the air and saying, “Hey, this is happening,” that's great. But not waving it, not saying anything at all, that's where the pain really comes from.

Corey: When you take a look across your experience managing people—which, to my understanding, your first outing with it was at this company—

Mike: Yeah.

Corey: What about managing me is the least surprising and the most surprising that you've picked up during that pattern? Because again, the story has always been, “Oh, yeah, you're a terrible manager because you've never done it before,” but I look back and you're clearly the best manager I've ever had, if for no other reason than neither one of us can rage-quit. But there's a lot of artistry to how you've handled a lot of the challenges that I present to you.

Mike: I'm the best manager you've had because I haven't fired you. [laugh].

Corey: And also, some of the best ones I have had fired me. That doesn't necessarily disqualify someone.

Mike: Yeah. I want to say, I am by no means experienced as a manager. As you mentioned, this is my first outing into doing management. As my coach tells me, I'm getting better every day. I am not terrible. [laugh].

The—let's see—most surprising; least surprising. I don't think I have anything for least surprising. I think most surprising is how easy it is for you to accept feedback and how quickly you do something about it, how quickly you take action on that feedback.
I did not expect that, given all your other proclivities for not liking managers, not liking to be managed—for someone to give feedback to you and you say, “Yep, that sounds good,” and then do it, like, that was incredibly surprising.

Corey: It's one of those areas where if you're not embracing or at least paying significant attention to how you are being perceived, maybe that's a problem, maybe it's not—let's be very clear. However, there's also a lot of propensity there to just assume, “Oh, I'm right and screw everyone else.” You can do an awful lot of harm that way. And that is something I've had to become incredibly aware of, especially during the pandemic, as the size of my audience more than quadrupled from the start of the pandemic. These are a bunch of people now who have never met me in person; they have no context on what I do.

And I tend to view the world the way you might expect a dog to behave who caught a car that he has absolutely no idea how to drive, and he's sort of winging it as he goes. Like, step one, let's not kill people. Step two, eh, we'll figure that out later. Like, step one is the most important.

Mike: Mm-hm. Yeah.

Corey: And feedback is hard to get, past a certain point. I often lament that it's become more challenging for me to weed out who the jerks are because when you're perceived to have a large platform and more or less have no problem calling large companies and powerful folk to account, everyone's nice to you. And, well, “Really? He's terrible and shitty to women? That's odd. He's always been super nice to me,” is not the glowing defense that so many people seem to think it is. I have learned to listen a lot more closely the more I speak.

Mike: That's a challenge for me as well because, as we've mentioned, this is my first foray into management.
As we've had more people in the company, that has become more of a challenge: I have to watch what I say because my word carries weight on its own, by virtue of my position. And you have the same problem, except yours is much more about your weight in public, rather than your weight internally.

Corey: I see it as different sides of the same coin. I take it as a personal bit of a badge of honor that almost every person I meet, including the people who've worked here, has come away very surprised by just how true to life my personality on Twitter is to how I actually am when I interact with humans. You're right, they don't see the low sides, but I also try not to take that out on the staff either.

Mike: [laugh]. Right.

Corey: We do the best with what we have, I think, and it's gratifying to know that I can still learn new tricks.

Mike: Yeah. And I'm not firing you anytime soon.

Corey: That's right. Thank you again for giving me the shotgun performance review. It's always appreciated. If people want to learn more, where can they find you, to get their own performance review, perhaps?

Mike: Yeah, you can find me on Twitter at @Mike_Julian. Or you can sign up for our newsletter, where I'm talking about my upcoming book on consulting, at mikejulian.com.

Corey: And we will put links to that into the show notes. Thanks again, sir.

Mike: Thank you.

Corey: Mike Julian, CEO of The Duckbill Group, my business partner, and apparently my boss. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that demonstrates the absolute worst way to respond to a negative performance evaluation.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Computing on the Edge with Macrometa's Chetan Venkatesh
Nov 1, 2022 40:29

About Chetan

Chetan Venkatesh is a technology startup veteran focused on distributed data, edge computing, and software products for enterprises and developers. He has 20 years of experience in building primary data storage, databases, and data replication products. Chetan holds a dozen patents in the area of distributed computing and data storage. Chetan is the CEO and Co-Founder of Macrometa – a Global Data Network featuring a Global Data Mesh, Edge Compute, and In-Region Data Protection. Macrometa helps enterprise developers build real-time apps and APIs in minutes – not months.

Links Referenced:
Macrometa: https://www.macrometa.com
Macrometa Developer Week: https://www.macrometa.com/developer-week

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH. Basically, you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive? Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.

Corey: Managing shards.
Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming. That's GO M-O-M-E-N-T-O dot co slash screaming.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today, this promoted guest episode is brought to us basically so I can ask a question that has been eating at me for a little while. That question is, what is the edge? Because I have a lot of cynical, sarcastic answers to it, but that doesn't really help understanding. My guest today is Chetan Venkatesh, CEO and co-founder at Macrometa. Chetan, thank you for joining me.

Chetan: It's my pleasure, Corey. You're one of my heroes. I think I've told you this before, so I am absolutely delighted to be here.

Corey: Well, thank you. We all need people to sit on the curb and clap as we go by and feel like giant frauds in the process. So let's start with the easy question that sets up the rest of it. Namely, what is Macrometa, and what puts you in a position to be able to speak at all, let alone authoritatively, on what the edge might be?

Chetan: I'll answer the second part of your question first, which is, you know, what gives me the authority to even talk about this? Well, for one, I've been trying to solve the same problem for 20 years now, which is build distributed systems that work really fast and can answer questions about data in milliseconds. And my journey's sort of been like the spiral staircase journey, you know, I keep going around in circles, but the view just keeps getting better every time I do one of these things.
So I'm on my fourth startup doing distributed data infrastructure, and this time really focused on trying to provide a platform that's the antithesis of the cloud. It's kind of like taking the cloud and flipping it on its head because instead of having a single-region application where all your stuff runs in one place, on us-west-1 or us-east-1, what if your apps could run everywhere? Like, they could run in hundreds and hundreds of cities around the world, much closer to where your users and devices and, most importantly, where interesting things in the real world are happening.

And so we started Macrometa about five years back to build a new kind of distributed cloud—let's call it the edge—that kind of looks like a CDN, a Content Delivery Network, but really brings very sophisticated platform-level primitives for developers to build applications in a distributed way, around primitives for compute, primitives for data, but also some very interesting things that you just can't do in the cloud anymore. So that's Macrometa. And we're doing something with edge computing, which is a big buzzword these days, but I'm sure you'll ask me about that.

Corey: It seems to be. Generally speaking, when I look around and companies are talking about edge, it feels almost like it is a redefining of what they already do to use a term that is currently trending and deep in the hype world.

Chetan: Yeah. You know, I think humans, just being biologically social beings, tend to be herd-like, and so when we see a new trend, we like to slap it on everything we have. We did that 15 years back with cloud, if you remember, you know? Everybody was very busy trying to stick the cloud label on everything that was on-prem. Edge is sort of having that edge-washing moment right now. But I define edge very specifically as very different from the cloud.
You know, the cloud is defined by centralization, i.e., you've got a giant hyperscale data center somewhere far, far away, where typically electricity, real estate, and those things are reasonably cheap, i.e., not in urban centers, where those things tend to be expensive. You know, you have platforms where you run things at scale; it's sort of a your-mess-for-less business in the cloud, and somebody else manages that for you. The edge is actually defined by location. And there are three types of edges.

The first edge is the CDN edge, which is historically where we've been trying to make things faster with the internet and make the internet scale. So Akamai came about, about 20 years back, and created this thing called the CDN that allowed the web to scale. And that was the first killer app for edge, actually. So that's the first location that defines the edge, where a lot of the peering happens between different network providers and the on-ramp around the cloud happens.

The second edge is the telecom edge. That's actually right next to you in terms of, you know, the logical network topology because every time you do something on your computer, it goes through that telecom layer. And now we have the ability to actually run web services, applications, data, directly from that telecom layer.

And then the third edge is—sort of, people have been familiar with this for 30 years. The third edge is your device, just your mobile phone. It's your internet gateway and, you know, the things that you carry around in your pocket or that sit on your desk, where you have some compute power, but it's very restricted and it only deals with things that are interesting or important to you as a person, not in a broad range. So those are sort of the three things. And it's not the cloud.
And these three things are now becoming important as a place for you to build and run enterprise apps.

Corey: Something that I think is often overlooked here—and this is sort of a natural consequence of the cloud's own success and the fact that we live in a system where companies are required to always grow and expand and find new markets—historically, for example, when I went to AWS re:Invent, which is a cloud service carnival in the desert that no one in their right mind should ever want to attend but somehow we keep doing, it used to be that, oh, these announcements are generally all aligned with people like me, where I have specific problems and they look a lot like what they're talking about on stage. And now they're talking about things that, from that perspective, seem like Looney Tunes. Like, I'm trying to build Twitter for Pets or something close to it, and I don't understand why there's so much talk about things like industrial IoT and, “machine learning,” quote-unquote, and other things that just do not seem to align with, “I'm trying to build a web service, like it says in the name of the company; what gives?”

And part of that, I think, is that it's difficult to remember, for most of us—especially me—that what they're coming out with is not your shopping list. Every service is for someone, not every service is for everyone, so figuring out what it is that they're talking about and what those workloads look like is something that I think is getting lost in translation. And in our defense—collective defense—Amazon is not the best at telling stories in a way that helps you realize that, oh, this is not me they're talking to; I'm going to opt out of this particular thing. You figure it out by getting it wrong first. Does that align with how you see the market going?

Chetan: I think so. You know, I think of Amazon Web Services, or even Google, or Azure as sort of Costco and, you know, Sam's Wholesale Club or whatever, right?
They cater to a very broad audience and they sell a lot of stuff in bulk and cheap. And you know, so it's sort of a lowest-common-denominator type of a model. And so emerging applications, and especially emerging needs that enterprises have, don't necessarily get solved in the cloud. You've got to go and build it up yourself on sort of the crude primitives that they provide.

So okay, go use your bare basic EC2, your S3, and build your own edgy, or whatever, you know, cutting-edge thing you want to build over there. And if enough people are doing it, I'm sure Amazon and Google start to pay interest and, you know, develop something that makes it easier. So you know, I agree with you, they're not the best at this sort of a thing. The edge is also a phenomenon that's orthogonal, and diametrically opposite, to the architecture of the cloud and the economics of the cloud.

And we do centralization in the cloud in a big way. Everything is in one place; we make giant piles of data in one database or data warehouse, slice and dice it, and almost all our computer science is great at doing things in a centralized way. But when you take data and chop it into 50 copies and keep it in 50 different places on Earth, and you have this thing called the internet or the wide area network in the middle, trying to keep all those copies in sync is a nightmare. So you start to deal with some very basic computer science problems like distributed state and how do you build applications that have a consistent view of that distributed state?
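[Editor's note: Chetan doesn't name a specific technique here, but one textbook building block for keeping many replicas convergent without central coordination is a conflict-free replicated data type (CRDT). The grow-only counter below is a minimal illustrative sketch, not a claim about how Macrometa actually implements replication.]

```python
# Minimal G-Counter CRDT sketch: each replica increments only its own slot,
# and merging takes the element-wise max. Because max is commutative,
# associative, and idempotent, replicas converge to the same value no matter
# the order (or repetition) of syncs over the WAN.
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        # The global count is the sum of every replica's contribution.
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max: safe to apply in any order, any number of times.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

# Two edge locations (hypothetical names) update independently, then sync.
sfo, fra = GCounter("sfo"), GCounter("fra")
sfo.increment(3)
fra.increment(2)
sfo.merge(fra)
fra.merge(sfo)
assert sfo.value() == fra.value() == 5
```

CRDTs only give eventual convergence, so they address one corner of the consistency problem Chetan describes; stronger guarantees require coordination protocols with their own latency costs.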
So you know, there have been attempts to solve these problems for 15, 18 years, but none of those attempts have really cracked the intersection of three things: a way for programmers to do this in a way that doesn't blow their heads with complexity, a way to do this cheaply and effectively enough where you can build real-world applications that serve billions of users concurrently at a cost point that actually is economical and makes sense, and third, a way to do this with adequate levels of performance where you don't die waiting for the spinning wheel on your screen to go away.

So these are the three problems with edge. And as I said, you know, me and my team, we've been focused on this for a very long while. And me and my co-founder have come from this world and we created a platform very uniquely designed to solve these three problems: the problem of complexity for programmers to build in a distributed environment like this where data sits in hundreds of places around the world and you need a consistent view of that data, being able to operate and modify and replicate that data with consistency guarantees, and then a third one, being able to do that at high levels of performance, which translates to what we call ultra-low latency, which is below human perception. The threshold of human perception, visually, is about 70 milliseconds. Our finest athletes, the best esports players, are about 70 to 80 milliseconds in their twitch, in their ability to twitch when something happens on the screen. The average human is about 100 to 110 milliseconds.

So in a second, we can maybe do seven things at rapid rates. You know, that's how fast our brain can process it. Anything that falls below 100 milliseconds—especially if it falls into 50 to 70 milliseconds—appears instantaneous to the human mind and we experience it as magic.
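The latency thresholds Chetan cites map directly onto physical distance, since light in fiber travels at roughly two-thirds the speed of light in vacuum. A rough propagation-only calculation (the distances and fiber speed here are illustrative assumptions, not measured figures) shows why staying under the ~100 ms perception threshold pushes compute toward regional points of presence:

```python
SPEED_IN_FIBER_KM_S = 200_000  # light in optical fiber moves at roughly two-thirds of c

def round_trip_ms(one_way_km: float) -> float:
    """Propagation-only round trip in milliseconds; routing, queuing, and
    serialization all add more on top, so real latency is strictly worse."""
    return 2 * one_way_km / SPEED_IN_FIBER_KM_S * 1000

# Distances are rough assumptions for illustration:
for label, km in [("same metro", 50), ("cross-country", 4_000), ("transpacific", 12_000)]:
    rtt = round_trip_ms(km)
    feel = "instantaneous" if rtt < 100 else "perceptible lag"
    print(f"{label}: {rtt:.1f} ms -> {feel}")
```

Even before any server-side work, a transpacific hop alone burns the entire perception budget, which is the physics behind putting data within 50 milliseconds of users.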
And so where edge computing and where my platform comes in is that it literally puts data and applications within 50 milliseconds of 90% of humans and devices on Earth and allows now a whole new set of applications where latency and location and the ability to control those things with really fine-grained capability matter. And we can talk a little more about what those apps are in a bit.

Corey: And I think that's probably an interesting place to dive into at the moment because whenever we talk about the idea of new ways of building things that are aimed at decentralization, first, people at this point automatically have a bit of an aversion to, “Wait, are you talking about some of the Web3 nonsense?” It's one of those look around the poker table and see if you can spot the sucker, and if you can't, it's you. Because there are interesting aspects to that entire market, let's be clear, but it also seems to be occluded by so much of the grift and nonsense and spam and the rest that, again, sort of characterized the early internet as well. The idea, though, of decentralizing out of the cloud is deeply compelling just to anyone who's really ever had to deal with the egress charges, or even the data transfer charges inside of one of the cloud providers. The counterpoint is it feels that historically, you either get to pay the tax and go all-in on a cloud provider and get all the higher-level niceties, or otherwise, you wind up deciding you're going to have to more or less go back to physical data centers, give or take, and other than the very baseline primitives that you get to work with of VMs and block storage and maybe a load balancer, you're building it all yourself from scratch. It seems like you're positioning this as setting up for a third option. I'd be very interested to hear it.

Chetan: Yeah. And a quick comment on decentralization: good; not so sure about the Web3 pieces around it. We tend to talk about computer science and not the ideology of distributing data.
There are political reasons, there are ideological reasons around data and sovereignty and individual human rights, and things like that. There are people far smarter than me who should explain that.

I fall personally into the Nicholas Weaver school of skepticism about Web3 and blockchain and those types of things. And for listeners who are not familiar with Nicholas Weaver, please go online. He teaches at UC Berkeley and is just one of the finest minds of our time. And I think he's broken down some very good reasons why we should be skeptical about, sort of, Web3 and, you know, things like that. Anyway, that's a digression.

Coming back to what we're talking about, yes, it is a new paradigm, but that's the challenge, which is I don't want to introduce a new paradigm. I want to provide a continuum. So what we've built is a platform that looks and feels very much like Lambdas, and a poly-model database. I hate the word multi. It's a pretty dumb word, so I've started to substitute ‘multi' with ‘poly' everywhere, wherever I can find it.

So it's not multi-cloud; it's poly-cloud. And it's not multi-model; it's poly-model. Because what we want is a world where developers have the ability to use the best paradigm for solving problems. And it turns out when we build applications that deal with data, data doesn't just come in one form, it comes in many different forms, it's polymorphic, and so you need a data platform that's also, you know, polyglot and poly-model to be able to handle that. So that's one part of the problem, which is, you know, we're trying to provide a platform that provides continuity by looking like a key-value store like Redis. It looks like a document database—

Corey: Or the best database in the world, Route 53 TXT records. But please, keep going.

Chetan: Well, we've got that too, so [laugh] you know? And then we've got a streaming graph engine built into it that kind of looks and behaves like a graph database, like Neo4j, for example.
And, you know, it's got columnar capabilities as well. So it's sort of a really interesting data platform that is not open-source; it's proprietary because it's designed to solve these problems of being able to distribute data, put it in hundreds of locations, keep it all in sync, but it looks like a conventional NoSQL database. And it speaks PostgreSQL, so if you know PostgreSQL, you can program it, you know, pretty easily.

What it's also doing is taking away the responsibility for engineers and developers to understand how to deal with very arcane problems like conflict resolution in data. I made a change in Mumbai; you made a change in Tokyo; who wins? Our systems in the cloud—you know, DynamoDB, and things like that—have very crude answers for this, something called last-writer-wins. We've done a lot of work to build a protocol that brings you ACID-like consistency in these types of problems and makes it easy to reason about state change when you've got an application that's potentially running in 100 locations and each of those places is modifying the same record, for example.

And then the second part of it is it's a converged platform. So it doesn't just provide data; it provides a compute layer that's deeply integrated directly with the data layer itself. So think of it as Lambdas running, like, stored procedures inside the database. That's really what it is. We've built a very, very specialized compute engine that exposes containers and functions as stored procedures directly on the database.

And so they run inside the context of the database and so you can build apps in Python, Go, your favorite language; it compiles down into a [unintelligible 00:15:02] kernel that actually runs inside the database among all these different polyglot interfaces that we have. And the third thing that we do is we provide an ability for you to have very fine-grained control on your data.
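The last-writer-wins behavior Chetan contrasts his protocol with is easy to sketch; this is a generic illustration of the technique and its failure mode, not Macrometa's or DynamoDB's actual implementation:

```python
import dataclasses

@dataclasses.dataclass
class Write:
    value: str
    timestamp: float  # wall-clock time at the replica; clock skew makes this unreliable
    replica: str

def last_writer_wins(a: Write, b: Write) -> Write:
    """Crude conflict resolution: the write with the later timestamp wins.

    The losing write is silently discarded, which is why LWW can drop data
    when two regions update the same record concurrently."""
    return a if a.timestamp >= b.timestamp else b

# Concurrent updates to the same record from two regions, a millisecond apart:
mumbai = Write(value="balance=100", timestamp=1000.000, replica="mumbai")
tokyo = Write(value="balance=150", timestamp=1000.001, replica="tokyo")

winner = last_writer_wins(mumbai, tokyo)
print(winner.replica)  # tokyo wins; Mumbai's update is lost with no error raised
```

The silent loss in the last line is the problem stronger protocols (CRDTs, consensus, or the ACID-like scheme described here) are built to avoid.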
Because today, data's become a political tool; it's become something that nation-states care a lot about.

Corey: Oh, do they ever.

Chetan: Exactly. And [unintelligible 00:15:24] regulated. So here's the problem. You're an enterprise architect, your application is going to be consumed in 15 countries, and there are 13 different frameworks to deal with. What do you do? Well, you spin up 13 different versions, one for each country, and, you know, build 13 different teams, and have 13 zero-day attacks and all that kind of craziness, right?

Well, data protection is actually one of the most important parts of the edge because, with something like Macrometa, you can build an app once, and we'll provide all the necessary localization for in-region processing, data protection with things like tokenization of data so you can exfiltrate data securely without violating potentially PII-sensitive data exfiltration laws within countries, things like that. In other words, it's solving some really hard problems by providing an opinionated platform that does these three things. And I'll summarize it thus, Corey, and we can kind of dig into each piece. Our platform is called the Global Data Network. It's not a global database; it's a global data network. It looks like a frickin' database, but it's actually a global network available in 175 cities around the world.

Corey: The challenge, of course, is where does the data actually live at rest—and there are two reasons people care about that. One is the data residency locality stuff, which has always, honestly for me, felt a little bit like a cloud provider shakedown. Yeah, build a data center here or you don't get any of the business of anything that falls under our regulation. The other is, what does the egress cost of that look like?
Because yeah, I can build a whole multicenter data store on top of AWS, for example, but minimum, we're talking two cents a gigabyte of transfer, even inside of a region in some cases, and many times that externally.

Chetan: Yeah, that's the real shakedown: the egress costs [laugh] more than the other example that you talked about over there. But it's a reality of how cloud pricing works and things like that. What we have built is a network that is completely independent of the cloud providers. We're built on top of five different service providers. Some of them are cloud providers, some of them are telecom providers, some of them are CDNs.

And so we're building our global data network on top of routes and capacity provided by transfer providers who have different economics than the cloud providers do. So our cost for egress falls somewhere between two and five cents, for example, depending on which edge locations, which countries, and things that you're going to use over there. We've got a pretty generous egress policy where, you know, for certain thresholds, there's no egress charge at all, but over certain thresholds, we start to charge between two to five cents. But even if you were to take it at the higher end of that spectrum, five cents per gigabyte for transfer, the amount of value our platform brings in architecture and reduction in complexity and the ability to build apps is, frankly, mind-boggling—one of my customers is a SaaS company in marketing that uses us to inject offers while people are on their website, you know, browsing. Literally, you hit their website, you do a few things, and then boom, there's a customized offer for them.

In banking that's used, for example, you know, you're making your minimum payments on your credit card, but you have a good payment history and you've got a decent credit score, well, let's give you an offer to give you a short-term loan, for example.
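The egress arithmetic being discussed is easy to run. The rates and free-tier threshold below are illustrative, taken only from the ballpark figures mentioned in the conversation, not from any published price sheet:

```python
def egress_cost(gigabytes: float, cents_per_gb: float, free_tier_gb: float = 0.0) -> float:
    """Back-of-envelope monthly egress cost in dollars.

    Assumes a simple model: everything above a free tier is billed at a flat
    per-gigabyte rate. Real cloud pricing has tiers, regions, and exceptions.
    """
    billable = max(0.0, gigabytes - free_tier_gb)
    return billable * cents_per_gb / 100.0

# 10 TB/month of egress at the low and high ends of the two-to-five-cent range:
low = egress_cost(10_000, 2.0)   # dollars at 2 cents/GB
high = egress_cost(10_000, 5.0)  # dollars at 5 cents/GB
print(low, high)  # 200.0 500.0
```

At those figures, even the high end lands in the hundreds of dollars for 10 TB, which is the sense in which egress becomes a small fraction of the cost of a revenue-generating app.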
So those types of new applications, you know, are really at this intersection where you need low latency, you need in-region processing, and you also need to comply with data regulation. So when you're building a high-value, revenue-generating app like that, egress cost, even at five cents, tends to be very, very cheap, and the smallest part of, you know, the complexity of building them.

Corey: One of the things that I think we see a lot of is that the tone of this industry is set by the big players, and they have done a reasonable job, by and large, of making anything that isn't running in their blessed environments—let me be direct—sound kind of shitty, where it's like, “Oh, do you want to be smart and run things in AWS?”—or GCP? Or Azure, I guess—“Or do you want to be foolish and try and build it yourself out of popsicle sticks and twine?” And, yeah, on some level, if I'm trying to treat everything like it's AWS and run a crappy analog version of DynamoDB, for example, I'm not going to have a great experience, but if I also start from a perspective of not using the things that are higher up the stack, that experience starts to look a lot more reasonable as we start expanding out. But it still does present to a lot of us as, well, we're just going to run things in a VM somewhere and treat them just like we did back in 2005. What's changed in that perspective?

Chetan: Yeah, you know, I can't talk for others but for us, we provide a high-level Platform-as-a-Service, and that platform, the global data network, has three pieces to it—and none of this will translate into anything that AWS or GCP has, because this is the edge, Corey, which is completely different, right? So the global data network that we have is composed of three technology components. The first one is something that we call the global data mesh. And this is Pub/Sub and event processing on steroids.
We have the ability to connect data sources across all kinds of boundaries; you've got some data in Germany and you've got some data in New York. How do you put these things together and get them streaming so that you can start to do interesting things with correlating this data, for example?

And you might have to get across not just physical boundaries, like, they're sitting in different systems in different data centers; they might be logical boundaries, like, hey, I need to collaborate with data from my supply chain partner and we need to be able to do something that's dynamic in real-time, you know, to solve a business problem. So the global data mesh is a way to very quickly connect data wherever it might be—in legacy systems, in flat files, in streaming databases, in data warehouses, what have you—you know, we have 500-plus types of connectors—but most importantly, it's not just getting the data streaming, it's then turning it into an API and making that data fungible. Because the minute you put an API on it and it becomes fungible, that data has actually got a lot of value. And so the data mesh is a way to very quickly connect things up and put an API on it. And that API can now be consumed by front-ends, it can be consumed by other microservices, things like that.

Which brings me to the second piece, which is edge compute. So we've built a compute runtime that is Docker-compatible, so it runs containers, and it's also Lambda-compatible, so it runs functions. Let me rephrase that; it's not Lambda-compatible, it's Lambda-like. So no, you can't take your Lambda and dump it on us; it won't just work. You have to do some things to make it work on us.

Corey: But so many of those things are so deeply integrated into the ecosystem that they're operating within, and—

Chetan: Yeah.

Corey: That, on the one hand, is presented by cloud providers as, “Oh, yes. This shows how wonderful these things are.” In practice, talk to customers.
“Yeah, we're using it as spackle between the different cloud services that don't talk to one another despite being made by the same company.”

Chetan: [laugh] right.

Corey: It's fun.

Chetan: Yeah. So the second piece, edge compute, allows you now to build microservices that are stateful, i.e., they have data that they interact with locally, and schedule them along with the data on our network of 175 regions around the world. So you can build distributed applications now.

Now, your microservice back-end for your banking application or for your HR SaaS application or e-commerce application is not running in us-east-1 in Virginia; it's running literally in 15, 18, 25 cities where your end-users are, potentially. And to take an industrial IoT case, for example, you might be ingesting data from the electricity grid in 15, 18 different cities around the world; you can do all of that locally now. So that's what the edge functions do; they flip the cloud model around because instead of sending data to where the compute is in the cloud, you're actually bringing compute to where the data is originating, or the data is being consumed, such as through a mobile app. So that's the second piece.

And the third piece is global data protection, which is, hey, now I've got a distributed infrastructure; how do I comply with all the different privacy and regulatory frameworks that are out there? How do I keep data secure in each region? How do I potentially share data between regions in such a way that, you know, I don't break the model of compliance globally and create a billion-dollar headache for my CIO and CEO and CFO, you know? So that's the third piece of capabilities that this provides.

All of this is presented as a set of serverless APIs. So you simply plug these APIs into your existing applications. Some of your applications work great in the cloud. Maybe there are just parts of that app that should be on our edge.
And that's usually where most customers start; they take a single web service or two that's not doing so great in the cloud because it's too far away; it has data sensitivity, location sensitivity, time sensitivity, and so they use us as a way to just deal with that on the edge.

And there are other applications where it's completely what I call edge-native, i.e., no dependency on the cloud; it comes and runs completely distributed across our network, consumes primarily the edge's infrastructure, and just maybe sends some data back to the cloud for long-term storage or long-term analytics.

Corey: And ingest does remain free. The long-term analytics, of course, means that once that data is there, good luck convincing a customer to move it because that gets really expensive.

Chetan: Exactly, exactly. It's a speciation—as I like to say—of the cloud, into a fast tier where interactions happen, i.e., the edge. So systems of record are still in the cloud; we still have our transactional systems over there, our databases, data warehouses.

And those are great for historical types of data, as you just mentioned, but for things that are operational in nature, that are interactive in nature, where you really need to deal with them because they're time-sensitive, they're depleting in value in seconds or milliseconds, they're location-sensitive, there's a lot of noise in the data and you need to get to just those bits of data that actually matter and throw the rest away, for example—which is what you do with a lot of telemetry in cybersecurity, for example, right—those are all the things that require a new kind of a platform: not a system of record, a system of interaction, and that's what the global data network is, the GDN. And these three primitives—the data mesh, edge compute, and data protection—are the way that our APIs are shaped to help our enterprise customers solve these problems.
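The data mesh primitive described earlier—connectors publishing streams that then get exposed as an API—can be sketched as a toy. This is a generic pub/sub illustration of the pattern, not Macrometa's API; a real mesh adds durability, ordering, geo-replication, and the 500-plus connector types mentioned above:

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Toy in-memory pub/sub bus: connectors publish records from wherever
    data lives; subscribers (such as an API layer) consume them."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(record)

# The "API on top of the stream": a materialized view updated as events arrive.
latest_order: dict = {}

def index_order(record: dict) -> None:
    latest_order[record["id"]] = record

bus = EventBus()
bus.subscribe("orders", index_order)

# Two connectors, say one in Germany and one in New York, publish into the mesh:
bus.publish("orders", {"id": "o1", "region": "eu-de", "total": 42})
bus.publish("orders", {"id": "o2", "region": "us-ny", "total": 7})

print(latest_order["o1"]["region"])  # eu-de
```

The point of the pattern is the last few lines: once records flow through the bus, serving them is a lookup against the materialized view rather than a query against each distant source system.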
So, put another way, imagine ten years from now what DynamoDB and global tables, with a really fast Lambda, and Kinesis with event processing built directly into it, might be like. That's Macrometa today, available in 175 cities.

Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full-stack observability for modern infrastructure and applications at every scale. Datadog enables teams to see everything: dashboarding, alerting, application performance monitoring, infrastructure monitoring, UX monitoring, security monitoring, dog logos, and log management, in one tightly integrated platform. With 600-plus out-of-the-box integrations with technologies including all major cloud providers, databases, and web servers, Datadog allows you to aggregate all your data into one platform for seamless correlation, allowing teams to troubleshoot and collaborate together in one place, preventing downtime and enhancing performance and reliability. Get started with a free 14-day trial by visiting datadoghq.com/screaminginthecloud, and get a free t-shirt after installing the agent.

Corey: I think it's also worth pointing out that it's easy for me to fall into a trap that I wonder if some of our listeners do as well, which is, I live in, basically, downtown San Francisco. I have gigabit internet connectivity here, to the point where when it goes out, it is suspicious and more than a little bit frightening because my ISP—Sonic.net—is amazing and deserves every bit of praise that you never hear any ISP ever get. But when I travel, it's a very different experience.
When I go to, oh, I don't know, the conference center at re:Invent last year and find that the internet is patchy at best, or to downtown San Francisco on Verizon today, where I discover that the internet is almost non-existent, suddenly applications that I had grown accustomed to just working suddenly didn't.

And there are a lot more people who live far away from these data center regions and tier-one backbones than people who live right next to them. So I think that there are a lot of mistaken ideas around exactly what the lower-bandwidth experience of the internet is today. And that is something that feels inadvertently classist, if that makes sense, or maybe geographically bigoted?

Chetan: Yeah. No, I think those two points are very well articulated. I wish I could articulate it that well. But yes, if you can afford 5G, some of those things get better. But again, 5G is not everywhere yet. It will be, but 5G can in many ways democratize at least one part of it, which is provide an overlay network at the edge, where if you left home and you switched networks onto wireless, you can still get the same quality of service that you're used to getting from Sonic, for example. So I think it can solve some of those things in the future. But the second part of it—what did you call it? What bigoted?

Corey: Geographically bigoted. And again, that's maybe a bit of a strong term, but it's easy to forget that you can't get around the speed of light. I would say that the most poignant example of that I had was when I was—in the before times—giving a keynote in Australia. So ah, I know what I'll do, I'll spin up an EC2 instance for development purposes—because that's how I do my development—in Australia. And then I would just pay my provider for cellular access for my iPad and that was great.

And I found the internet was slow as molasses for everything I did. Like, how do people even live here? Well, turns out that my provider would backhaul traffic to the United States.
So to log into my session, I would wind up having to connect with a local provider, backhaul to the US, then connect back out from there to Australia across the entire Pacific Ocean, talk to the server, get the response, and follow that return path back. It's yeah, turns out that doing laps around the world is not the most efficient way of transferring any data whatsoever, let alone in sizable amounts.

Chetan: And that's why we decided to call our platform the global data network, Corey. In fact, it's really built that way for sort of a very simple reason, which is that we have our own network underneath all of this, and we stop this whole ping-pong effect of data going around and help create deterministic guarantees around latency, around location, around performance. We're trying to democratize latency and these types of problems in a way that programmers shouldn't have to worry about all this stuff. You write your code, you push publish, it runs on a network, and it all gets there with a guarantee that 95% of all your requests will happen within 50 milliseconds round-trip time, from any device, you know, in these population centers around the world.

So yeah, it's a big deal. It's sort of one of our je ne sais quoi pieces in our mission and charter, which is to just democratize latency and access, and sort of get away from this geographical nonsense of, you know, how networks work, where they dynamically switch topology and just make everything slow in a very non-deterministic way.

Corey: One last topic that I want to ask you about—because I'm near certain, given your position, that you will have an opinion on this—what's your take on, I guess, the carbon footprint of clouds these days? Because a lot of people have been talking about it; there has been a lot of noise made about it, justifiably so. I'm curious to get your take.

Chetan: Yeah, you know, it feels like we're in the '30s and the '40s of the carbon movement when it comes to clouds today, right?
Maybe there's some early awareness of the problem, but you know, frankly, there's very little we can do other than just sort of put a wet finger in the air, compute some carbon offset, and plant some trees. I think these are good building blocks; they're not necessarily the best ways to solve this problem, ultimately. But one of the things I care deeply about, and you know, my company cares a lot about, is helping make developers more aware of what kind of carbon footprint their code tangibly has on the environment. And so we've started two things inside the company. We've started a foundation that we call the Carbon Conscious Computing Consortium—the four C's. We're going to announce that publicly next year, and we're going to invite folks to come and join us and be a part of it.

The second thing that we're doing is we're building a completely open-source, carbon-conscious computing platform that is built on real data that we're collecting about, to start with, how Macrometa's platform emits carbon in response to different types of things you build on it. So for example, you wrote a query that hits our database and queries, you know, I don't know, 20 billion objects inside of our database. It'll tell you exactly how many micrograms or how many milligrams of carbon—it's an estimate; not exactly. I got to learn to throttle myself down. It's an estimate, you know; you can't really measure these things exactly because the cost of carbon is different in different places, you know, there are different technologies, et cetera.

It gives you a decent estimate, something that reliably tells you, “Hey, you know that query that you have over there, that piece of SQL? That's probably going to emit this many micrograms of carbon at this scale.” You know, if this query was called a million times every hour, this is how much it costs. A million times a day, this is how much it costs, and things like that.
But the most important thing that I feel passionate about is that when we give developers visibility, they do good things.

I mean, when we give them good debugging tools, the code gets better, the code gets faster, the code gets more efficient. And Corey, you're in the business of helping people save money; when we give them good visibility into how much their code costs to run, they make the code more efficient. So we're doing the same thing with carbon. We know there's a cost to run your code—whether it's a function, a container, a query, what have you, every operation has a carbon cost. And we're on a mission to measure that and provide accurate tooling directly in our platform so that along with your debug lines, right, where you've got all these print statements that are spitting out stuff about what's happening there, we can also print out, you know, what it cost in carbon.

And you can set budgets. You can basically say, “Hey, I want my application to consume this much of carbon.” And down the road, we'll have AI and ML models that will help us optimize your code to be able to fit within those carbon budgets, for example. I'm not a big fan of planting trees—don't get me wrong, I love planting trees, but we live in California and those trees get burned down.

And I was reading this heartbreaking story about how we returned a giant amount of carbon back into the atmosphere because the forest reserve that had been planted, you know, that was capturing carbon, essentially got burned down in a forest fire.
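A carbon-per-query estimate of the kind described can be sketched with simple arithmetic. Every input figure below is made up for illustration—this is not Macrometa's model or measured data, just the shape of such an estimate:

```python
def query_carbon_ug(cpu_seconds: float, watts: float, grid_g_per_kwh: float) -> float:
    """Estimate one query's carbon cost in micrograms of CO2.

    energy (kWh) = watts * seconds / 3,600,000; carbon = energy * grid intensity.
    Grid intensity varies by region and hour, so any single number is rough.
    """
    kwh = watts * cpu_seconds / 3_600_000
    return kwh * grid_g_per_kwh * 1_000_000  # grams -> micrograms

# Hypothetical inputs: a 5 ms query on a core drawing 10 W, on a 400 gCO2/kWh grid.
per_query_ug = query_carbon_ug(0.005, 10, 400)          # a few micrograms per call
per_million_calls_g = per_query_ug * 1_000_000 / 1e6    # same number, read as grams
print(round(per_query_ug, 2), round(per_million_calls_g, 2))
```

Per query the number is tiny, but at a million calls an hour it adds up to grams per hour, which is the scale at which a carbon budget on a piece of SQL starts to mean something.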
So, you know, we're trying to just basically say, let's try and reduce the amount of carbon, you know, that we can potentially create by having better tooling.

Corey: That would be amazing, and I think it also requires something that I guess acts almost as an exchange, where there's a centralized voice that can make sure that, well, one, the provider is being honest, and two, you're doing an apples-to-apples comparison and not just discounting a whole lot of negative externalities. Because, yes, we're talking about carbon released into the environment. Okay, great. What about water effects from where your data centers are located? That can have significant climate impact as well. It's about trying to avoid the picking and choosing. It's a hard, hard problem, but I'm unconvinced that there's anything more critical in the entire ecosystem right now to worry about.

Chetan: So as a startup, we care very deeply about starting with the carbon part. And I agree, Corey, it's a multi-dimensional problem; there are lots of tentacles. The hydrocarbon industry goes very deeply into all parts of our lives. I'm a startup, what do I know? I can't solve all of those things, but I wanted to start with the philosophy that if we provide developers with the right tooling, they'll have the right incentives then to write better code. And as we open-source more of what we learn and, you know, our tooling, others will do the same. And I think in ten years, we might have better answers. But someone's got to start somewhere, and this is where we'd like to start.

Corey: I really want to thank you for taking as much time as you have for going through what you're up to and how you view the world. If people want to learn more, where's the best place to find you?

Chetan: Yes, so two things on that front. Go to www.macrometa.com—M-A-C-R-O-M-E-T-A dot com—and that's our website. And you can come and experience the full power of the platform.
We've got a playground where you can come, open an account, and build anything you want for free, and you can try and learn. You just can't run it in production because we've got a giant network, as I said, of 175 cities around the world. But there are tiers available for you to purchase and build and run apps. Like, I think about 80 different customers, some of the biggest ones in the world, some of the biggest telecom customers, retail, e-tail customers, [unintelligible 00:34:28] tiny startups, are building some interesting things on it.

And the second thing I want to talk about is November 7th through 11th of 2022, just a couple of weeks—or maybe by the time this recording comes out, a week from now—is developer week at Macrometa. And we're going to be announcing some really interesting new capabilities, some new features like real-time complex event processing with ultra-low latency, data connectors, and a search feature that allows you to build search directly on top of your applications without needing to spin up a giant Elastic Cloud Search cluster, and that provides search locally and regionally so that, you know, you can have search running in 25 cities that is instant, rather than sending all your search requests back to one location. There's all kinds of very cool things happening over there.

And we're also announcing a partnership with the original, the OG of the edge, one of the largest, most impressive, interesting CDN players that has become a partner for us as well. And then we're also announcing some very interesting experimental work where you as a developer can build apps directly on the 5G telecom cloud as well. And then you'll hear from some interesting companies that are building apps that are edge-native, that are impossible to build in the cloud because they take advantage of these three things that we talked about: geography, latency, and data protection, in some very, very powerful ways.
So you'll hear actual customer case studies from real customers in the flesh, not anonymous BS, no marchitecture. It's a week of technical talks by developers, for developers. And so, you know, come and join the fun and let's learn all about the edge together, and let's go build something together that's impossible to do today.Corey: And we will, of course, put links to that in the [show notes 00:36:06]. Thank you so much for being so generous with your time. I appreciate it.Chetan: My pleasure, Corey. Like I said, you're one of my heroes. I've always loved your work. The Snark-as-a-Service is a trillion-dollar market cap company. If you're ever interested in taking that public, I know some investors that I'd happily put you in touch with. But—Corey: Sadly, so many of those investors lack senses of humor.Chetan: [laugh]. That is true. That is true [laugh].Corey: [laugh]. [sigh].Chetan: Well, thank you. Thanks again for having me.Corey: Thank you. Chetan Venkatesh, CEO and co-founder at Macrometa. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment about why we should build everything on the cloud provider that you work for, and then attempt to challenge Chetan for the title of Edgelord.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Infinite Possibilities of Amazon S3 with Kevin Miller

Screaming in the Cloud

Play Episode Listen Later Oct 26, 2022 33:17


About KevinKevin Miller is currently the global General Manager for Amazon Simple Storage Service (S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. Prior to this role, Kevin has had multiple leadership roles within AWS, including as the General Manager for Amazon S3 Glacier, Director of Engineering for AWS Virtual Private Cloud, and engineering leader for AWS Virtual Private Network and AWS Direct Connect. Kevin was also Technical Advisor to the Senior Vice President for AWS Utility Computing. Kevin is a graduate of Carnegie Mellon University with a Bachelor of Science in Computer Science.Links Referenced: snark.cloud/shirt: https://snark.cloud/shirt aws.amazon.com/s3: https://aws.amazon.com/s3 TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full-stack observability for modern infrastructure and applications at every scale. Datadog enables teams to see everything: dashboarding, alerting, application performance monitoring, infrastructure monitoring, UX monitoring, security monitoring, dog logos, and log management, in one tightly integrated platform. With 600-plus out-of-the-box integrations with technologies including all major cloud providers, databases, and web servers, Datadog allows you to aggregate all your data into one platform for seamless correlation, allowing teams to troubleshoot and collaborate together in one place, preventing downtime and enhancing performance and reliability. 
Get started with a free 14-day trial by visiting datadoghq.com/screaminginthecloud, and get a free t-shirt after installing the agent.Corey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming. That's GO M-O-M-E-N-T-O dot co slash screaming.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Right now, as I record this, we have just kicked off our annual charity t-shirt fundraiser. This year's shirt showcases S3 as the eighth wonder of the world. And here to either defend or argue the point—we're not quite sure yet—is Kevin Miller, AWS's vice president and general manager for Amazon S3. Kevin, thank you for agreeing to suffer the slings and arrows that are no doubt going to be interpreted, misinterpreted, et cetera, for the next half hour or so.Kevin: Oh, Corey, thanks for having me. And happy to do that, and really flattered for you to be thinking about S3 in this way. So more than happy to chat with you.Corey: It's absolutely one of those services that is foundational to the cloud. It was the first AWS service that was put into general availability, although the beta folks are going to argue back and forth about no, no, that was SQS instead. I feel like now that Mai-Lan handles both SQS and S3 as part of her portfolio, she is now the final arbiter of that. I'm sure that's an argument for a future day. But it's impossible to imagine cloud without S3.Kevin: I definitely think that's true. It's hard to imagine cloud, actually, with many of our foundational services, including SQS, of course, but we are—yes, we were the first generally available service with S3. 
And pretty happy with our anniversary being Pi Day, 3/14.Corey: I'm also curious, your own personal trajectory has been not necessarily what folks would expect. You were the general manager of Amazon Glacier, and now you're the general manager and vice president of S3. So, I've got to ask, because there are conflicting reports on this depending upon what angle you look at, are Glacier and S3 the same thing?Kevin: Yes, I was the general manager for S3 Glacier prior to coming over to S3 proper, and the answer is no, they are not the same thing. We certainly have a number of technologies where we're able to use those technologies both on S3 and Glacier, but there are certainly a number of things that are very distinct about Glacier and give us that ability to hit the ultra-low price points that we do for Glacier Deep Archive being as low as $1 per terabyte-month. And so, that definitely—there's a lot of actual ingenuity up and down the stack, from hardware to software, everywhere in between, to really achieve that with Glacier. But then there's other spots where S3 and Glacier have very similar needs, and then, of course, today many customers use Glacier through S3 as a storage class in S3, and so that's a great way to do that. So, there's definitely a lot of shared code, but certainly, when you get into it, there's [unintelligible 00:04:59] to both of them.Corey: I ran a number of obnoxiously detailed financial analyses, and they all came away with, unless you have a very specific very nuanced understanding of your data lifecycle and/or it is less than 30 or 60 days depending upon a variety of different things, the default S3 storage class you should be using for virtually anything is Intelligent Tiering. That is my purely economic analysis of it. Do you agree with that? Disagree with that? 
And again, I understand that all of these storage classes are like your children, and I am inviting you to tell me which one of them is your favorite, but I'm absolutely prepared to do that.Kevin: Well, we love Intelligent Tiering because it is very simple; customers are able to automatically save money using Intelligent Tiering for data that's not being frequently accessed. And actually, since we launched it a few years ago, we've already saved customers more than $250 million using Intelligent Tiering. So, I would say today, it is our default recommendation in almost every case. I think that the cases where we would recommend another storage class as the primary storage class tend to be specific to the use case—and particularly for use cases where customers really have a good understanding of the access patterns. And we do see that with some customers: for a certain dataset, they know that it's going to be heavily accessed for a fixed period of time, or the data is actually for archival and will never be accessed, or very rarely if ever accessed, just maybe in an emergency.In those kinds of use cases, I think, customers are probably best served choosing one of the specific storage classes where they're, sort of, paying the lower cost from day one. But again, I would say for the vast majority of cases that we see, the data access patterns are unpredictable and customers like the flexibility of being able to very quickly retrieve the data if they decide they need to use it. But in many cases, they'll save a lot of money as the data is not being accessed, and so, Intelligent Tiering is a great choice for those cases.Corey: I would take it a step further and say that even when customers believe that they are going to be doing a deeper analysis and they have a better understanding of their data flow patterns than Intelligent Tiering would, in practice, I see that they rarely do anything about it.
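As an aside, the "switch it over and never think about it again" approach is typically done with a bucket lifecycle rule. Here is a minimal sketch using boto3-style request shapes; the bucket name and prefix are hypothetical, the Days=0 transition is my reading of the lifecycle rules, and the actual API call is left commented out:

```python
def intelligent_tiering_lifecycle(prefix=""):
    """Build a lifecycle configuration that transitions matching objects
    to INTELLIGENT_TIERING as soon as the rule allows (Days=0)."""
    return {
        "Rules": [
            {
                "ID": "default-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }

# Applying it would look roughly like this (not executed here;
# "example-bucket" is a made-up name):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket",
#     LifecycleConfiguration=intelligent_tiering_lifecycle(),
# )
```

New objects can also land in the class directly by passing `StorageClass="INTELLIGENT_TIERING"` to `put_object`, skipping the transition entirely.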
It's one of those things where they're like, “Oh, yeah, we're going to set up our own lifecycle policies real soon now,” whereas, just switch it over to Intelligent Tiering and never think about it again. People's time is worth so much more than the infrastructure they're working on in almost every case. It doesn't seem to make a whole lot of sense unless you have a very intentioned, very urgent reason to go and do that stuff by hand in most cases.Kevin: Yeah, that's right. I think I agree with you, Corey. And certainly, that is the recommendation we lead with customers.Corey: In previous years, our charity t-shirt has focused on other areas of AWS, and one of them was based upon a joke that I've been telling for a while now, which is that the best database in the world is Route 53 and storing TXT records inside of it. I don't know if I ever mentioned this to you or not, but the first iteration of that joke was featuring around S3. The challenge that I had with it is that S3 Select is absolutely a thing where you can query S3 with SQL which I don't see people doing anymore because Athena is the easier, more, shall we say, well-articulated version of all of that. And no, no, that joke doesn't work because it's actually true. You can use S3 as a database. Does that statement fill you with dread? Regret? Am I misunderstanding something? Or are you effectively running a giant subversive database?Kevin: Well, I think that certainly when most customers think about a database, they think about a collection of technology that's applied for given problems, and so I wouldn't count S3 as providing the whole range of functionality that would really make up a database. But I think that certainly a lot of the primitives and S3 Select as a great example of a primitive are available in S3. And we're looking at adding, you know, additional primitives going forward to make it possible to, you know, to build a database around S3. 
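For readers curious what the S3 Select primitive discussed here looks like in practice, this is a hedged sketch of the request shape for boto3's `select_object_content`. Only the request construction runs here; the bucket, key, and query are invented for illustration, and the streaming call itself is commented out:

```python
def build_select_request(bucket, key, sql):
    """Assemble keyword arguments for s3.select_object_content
    against a CSV object that has a header row."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"JSON": {}},
    }

# The call itself streams results back as an event stream (not executed here):
# import boto3
# resp = boto3.client("s3").select_object_content(
#     **build_select_request(
#         "example-bucket", "people.csv",
#         "SELECT s.name FROM s3object s WHERE s.city = 'Portland'",
#     )
# )
# for event in resp["Payload"]:
#     if "Records" in event:
#         print(event["Records"]["Payload"].decode())
```

The point Kevin makes about connector libraries is visible in this shape: a client can quietly swap a full-object GET for a Select like this when it knows only a projection of the data is needed.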
And as you see, other AWS services have done that in many ways. For example, obviously with Amazon Redshift having a lot of capability now to just directly access and use data in S3 and make that a super seamless so that you can then run data warehousing type queries on top of S3 and on top of your other datasets.So, I certainly think it's a great building block. And one other thing I would actually just say that you may not know, Corey, is that one of the things over the last couple of years we've been doing a lot more with S3 is actually working to directly contribute improvements to open-source connector software that uses S3, to make available automatically some of the performance improvements that can be achieved either using both the AWS SDK, and also using things like S3 Select. So, we started with a few of those things with Select; you're going to see more of that coming, most likely. And some of that, again, the idea there as you may not even necessarily know you're using Select, but when we can identify that it will improve performance, we're looking to be able to contribute those kinds of improvements directly—or we are contributing those directly to those open-source packages. So, one thing I would definitely recommend customers and developers do is have a capability of sort of keeping that software up-to-date because although it might seem like those are sort of one-and-done kind of software integrations, there's actually almost continuous improvement now going on, and around things like that capability, and then others we come out with.Corey: What surprised me is just how broadly S3 has been adopted by a wide variety of different clients' software packages out there. Back when I was running production environments in anger, I distinctly remember in one Ubuntu environment, we wound up installing a specific package that was designed to teach apt how to retrieve packages and its updates from S3, which was awesome. 
I don't see that anymore, just because it seems that it is so easy to do it now, just with the native features that S3 offers, as well as an awful lot of software under the hood has learned to directly recognize S3 as its own thing, and can react accordingly.Kevin: And just do the right thing. Exactly. No, we certainly see a lot of that. So that's, you know—I mean, obviously making that simple for end customers to use and achieve what they're trying to do, that's the whole goal.Corey: It's always odd to me when I'm talking to one of my clients who is looking to understand and optimize their AWS bill to see outliers in either direction when it comes to S3 itself. When they're driving large S3 bills as in a majority of their spend, it's, okay, that is very interesting. Let's dive into that. But almost more interesting to me is when it is effectively not being used at all. When, oh, we're doing everything with EBS volumes or EFS.And again, those are fine services. I don't have any particular problem with them anymore, but the problem I have is that the cloud long ago took what amounts to an economic vote. There's a tax savings for storing data in an object store the way that you—and by extension, most of your competitors—wind up pricing this, versus the idea of on a volume basis where you have to pre-provision things, you don't get any form of durability that extends beyond the availability zone boundary. It just becomes an awful lot of, “Well, you could do it this way. But it gets really expensive really quickly.”It just feels wild to me that there is that level of variance between S3 just sort of raw storage basis, economically, as well as then just the, frankly, ridiculous levels of durability and availability that you offer on top of that. How did you get there? Was the service just mispriced at the beginning? Like oh, we dropped to zero and probably should have put that in there somewhere.Kevin: Well, no, I wouldn't call it mispriced. 
I think that the S3 came about when we took a—we spent a lot of time looking at the architecture for storage systems, and knowing that we wanted a system that would provide the durability that comes with having three completely independent data centers and the elasticity and capability where, you know, customers don't have to provision the amount of storage they want, they can simply put data and the system keeps growing. And they can also delete data and stop paying for that storage when they're not using it. And so, just all of that investment and sort of looking at that architecture holistically led us down the path to where we are with S3.And we've definitely talked about this. In fact, in Peter's keynote at re:Invent last year, we talked a little bit about how the system is designed under the hood, and one of the things you realize is that S3 gets a lot of the benefits that we do by just the overall scale. The fact that it is—I think the stat is that at this point more than 10,000 customers have data that's stored on more than a million hard drives in S3. And the way you get that scale and capability is through massive parallelization. Customers that are, I would say, building more traditional architectures end up with inherently much more siloed architectures at a relatively small scale overall, with a lot of resources provisioned in small chunks, so they never get to the scale where they can start to take advantage of the whole being more than the sum of the parts.And so, I think that's what the recognition was when we started out building S3. And then, of course, we offer that as an API on top of that, where customers can consume whatever they want.
That is, I think, where S3, at the scale it operates, is able to do certain things, including on the economics, that are very difficult or even impossible to do at a much smaller scale.Corey: One of the more egregious clown-shoe statements that I hear from time to time has been when people will come to me and say, “We've built a competitor to S3.” And my response is always one of those, “Oh, this should be good.” Because when people say that, they generally tend to be focusing on one or maybe two dimensions that doesn't work for a particular use case as well as it could. “Okay, what was your story around why this should be compared to S3?” “Well, it's an object store. It has full S3 API compatibility.” “Does it really because I have to say, there are times where I'm not entirely convinced that S3 itself has full compatibility with the way that its API has been documented.”And there's an awful lot of magic that goes into this too. “Okay, great. You're running an S3 competitor. Great. How many buildings does it live in?” Like, “Well, we have a problem with the s at the end of that word.” It's, “Okay, great. If it fits on my desk, it is not a viable S3 competitor. If it fits in a single zip code, it is probably not a viable S3 competitor.” Now, can it be an object store? Absolutely. Does it provide a new interface to some existing data someone might have? Sure why not. But I think that, oh, it's S3 compatible, is something that gets tossed around far too lightly by folks who don't really understand what it is that drives S3 and makes it special.Kevin: Yeah, I mean, I would say certainly, there's a number of other implementations of the S3 API, and frankly we're flattered that customers recognize and our competitors and others recognize the simplicity of the API and go about implementing it. 
But to your point, I think that there's a lot more; it's not just about the API, it's really around everything surrounding S3: as you mentioned, the fact that the data in S3 is stored in three independent availability zones, all of which are separated by kilometers from each other, and the resilience, the automatic failover, and the ability to withstand an unlikely impact to one of those facilities, as well as the scalability and, you know, the fact that we put a lot of time and effort into making sure that the service continues scaling with our customers' needs. And so, I think there's a lot more that goes into what is S3. And oftentimes, a straight-up comparison is purely based on just the APIs, and generally a small set of APIs, leaving out those intangibles—or not intangibles, but all of the ‘-ilities,' right, the elasticity and the durability, and so forth that I just talked about. In addition to all that, certainly what we're seeing from customers is that as they get into the petabyte and tens of petabytes, hundreds of petabytes scale, their need for the services that we provide to manage that storage, whether it's lifecycle and replication, or things like our batch operations to help update and maintain all the storage, those become really essential to customers wrapping their arms around it, as well as visibility, things like Storage Lens to understand: what storage do I have? Who's using it? How is it being used?And those are all things that we provide to help customers manage at scale. And certainly, you know, oftentimes when I see claims around S3 compatibility, a lot of those advanced features are nowhere to be seen.
But what was wild to me about that is having done that for services that are orders of magnitude less complex, it absolutely is like changing the engine on a car without ever slowing down on the highway. Customers didn't know that any of this was happening until she got on stage and announced it. That is wild to me. I would have said before this happened that there was no way that would have been possible, except it clearly was. I have to ask, how did you do that in the broad sense?Kevin: Well, it's true. A lot of the underlying infrastructure that's been part of S3, both hardware and software, is, you know—if someone from S3 in 2006 came and looked at the system today, they would probably be very disoriented in terms of understanding what was there because so much of it has changed. To answer your question, the long and short of it is a lot of testing. In fact, a lot of novel testing most recently, particularly with the use of formal logic and what we call automated reasoning. It's also something we've talked a fair bit about at re:Invent.And that is essentially where you prove the correctness of certain algorithms. And we've used that to spot some very interesting, one-in-a-trillion-type cases that at S3's scale happen regularly, that you have to be ready for, and you have to know how the system reacts, even in all those cases. I mean, I think one of our engineers did some calculations that, you know, the number of potential states for S3, sort of, exceeds the number of atoms in the universe or something so crazy. But yet, using methods like automated reasoning, we can test that state space, we can understand what the system will do, and have a lot of confidence as we begin to swap, you know, pieces of the system.And of course, nothing at S3's scale happens instantly.
It's all, you know, I would say that for a typical engineering effort within S3, there's a certain amount of effort, obviously, in making the change or in preparing the new software, writing the new software and testing it, but there's almost an equal amount of time that goes into, okay, and what is the process for migrating from System A to System B, and that happens over a timescale of months, if not years, in some cases. And so, there's just a lot of diligence that goes into not just the new systems, but also the process of, you know, literally, how do I swap that engine on the system. So, you know, it's a lot of really hard working engineers that spent a lot of time working through these details every day.Corey: I still view S3 through the lens of it is one of the easiest ways in the world to wind up building a static web server because you basically stuff the website files into a bucket and then you check a box. So, it feels on some level though, that it is about as accurate as saying that S3 is a database. It can be used or misused or pressed into service in a whole bunch of different use cases. What have you seen from customers that has, I guess, taught you something you didn't expect to learn about your own service?Kevin: Oh, I'd say we have those [laugh]  meetings pretty regularly when customers build their workloads and have unique patterns to it, whether it's the type of data they're retrieving and the access pattern on the data. You know, for example, some customers will make heavy use of our ability to do [ranged gets 00:22:47] on files and [unintelligible 00:22:48] objects. And that's pretty good capability, but that can be one where that's very much dependent on the type of file, right, certain files have structure, as far as you know, a header or footer, and that data is being accessed in a certain order. Oftentimes, those may also be multi-part objects, and so making use of the multi-part features to upload different chunks of a file in parallel. 
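The ranged-GET pattern Kevin describes comes down to Range header arithmetic: carve the object into byte spans and issue one request per span. A small sketch with illustrative sizes; the boto3 call is shown commented out:

```python
def byte_ranges(object_size, part_size):
    """Yield HTTP Range header values covering the whole object,
    one per part; Range end offsets are inclusive."""
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        yield f"bytes={start}-{end}"
        start = end + 1

# Each range backs one parallel GET, e.g. (not executed here;
# bucket and key names are made up):
# for rng in byte_ranges(size, 8 * 1024 * 1024):
#     s3.get_object(Bucket="example-bucket", Key="big.bin", Range=rng)
```

The same arithmetic underlies multipart upload part boundaries, which is why the two features pair so naturally.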
And you know, also certainly when customers get into things like our batch operations capability where they can literally write a Lambda function and do what they want, you know, we've seen some pretty interesting use cases where customers are running large-scale operations across, you know, billions, sometimes tens of billions of objects, and this can be pretty interesting as far as what they're able to do with them.So, for something that is, in some sense, as simple and basic as a GET and PUT API, all the capability around it ends up being pretty interesting as far as how customers apply it and the different workloads they run on it.Corey: So, if you squint hard enough, what I'm hearing you tell me is that I can view all of this as, “Oh, yeah. S3 is also compute.” And it feels like a fast track to getting a question wrong on one of the certification exams. But I have to ask, from your point of view, is S3 storage? And whether it's yes or no, what gets you excited about the space that it's in?Kevin: Yeah well, I would say S3 is not compute, but we have some great compute services that are very well integrated with S3, which excites me, as do things like S3 Object Lambda, where we actually handle that integration with Lambda. So, you're writing Lambda functions, we're executing them on the GET path. And so, that's a pretty exciting feature for me. But you know, to sort of take a step back, what excites me is I think that customers around the world, in every industry, are really starting to recognize the value of data and data at large scale. You know, I think that actually many customers in the world have terabytes or more of data that sort of flows through their fingers every day that they don't even realize.
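A sketch of the S3 Object Lambda flow mentioned above: a handler invoked on the GET path that rewrites the object before the caller sees it. The redaction logic is a toy example, and while the event fields follow the documented Object Lambda event shape, treat the whole thing as an illustration rather than production code:

```python
import urllib.request

def redact(text):
    """Toy transform: mask anything after 'ssn=' on each line."""
    return "\n".join(
        line.split("ssn=")[0] + "ssn=***" if "ssn=" in line else line
        for line in text.splitlines()
    )

def handler(event, context):
    """Object Lambda entry point: fetch the original via the presigned
    inputS3Url, transform it, and return it to the waiting GET caller."""
    ctx = event["getObjectContext"]
    original = urllib.request.urlopen(ctx["inputS3Url"]).read().decode()
    import boto3  # only needed inside the Lambda environment
    boto3.client("s3").write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=redact(original),
    )
    return {"statusCode": 200}
```

The key property is that the caller issues an ordinary GET against an Object Lambda access point; the transform is invisible to them.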
You know, I think that actually many customers in the world have terabytes or more of data that sort of flows through their fingers every day that they don't even realize.And so, as customers realize what data they have, and they can capture and then start to analyze and make ultimately make better business decisions that really help drive their top line or help them reduce costs, improve costs on whether it's manufacturing or, you know, other things that they're doing. That's what really excites me is seeing those customers take the raw capability and then apply it to really just to transform how they not just how their business works, but even how they think about the business. Because in many cases, transformation is not just a technical transformation, it's people and cultural transformation inside these organizations. And that's pretty cool to see as it unfolds.Corey: One of the more interesting things that I've seen customers misunderstand, on some level, has been a number of S3 releases that focus around, “Oh, this is for your data lake.” And I've asked customers about that. “So, what's your data lake strategy?” “Well, we don't have one of those.” “You have, like, eight petabytes and climbing in S3? What do you call that?” It's like, “Oh, yeah, that's just a bunch of buckets we dump things into. Some are logs of our assets and the rest.” It's—Kevin: Right.Corey: Yeah, it feels like no one thinks of themselves as having anything remotely resembling a structured place for all of the data that accumulates at a company.Kevin: Mm-hm.Corey: There is an evolution of people learning that oh, yeah, this is in fact, what it is that we're doing, and this thing that they're talking about does apply to us. But it almost feels like a customer communication challenge, just because, I don't know about you, but with my legacy AWS account, I have dozens of buckets in there that I don't remember what the heck they're for. 
Fortunately, you folks don't charge by the bucket, so I can smile, nod, remain blissfully ignorant, but it does make me wonder from time to time.Kevin: Yeah, no, I think that what you hear there is actually pretty consistent with what the reality is for a lot of customers, which is in distributed organizations, I think that's bound to happen, you have different teams that are working to solve problems, and they are collecting data to analyze, they're creating result datasets and they're storing those datasets. And then, of course, priorities can shift, and you know, and there's not necessarily the day-to-day management around data that we might think would be expected. I feel [we 00:26:56] sort of drew an architecture on a whiteboard. And so, I think that's the reality we are in. And we will be in, largely forever.I mean, I think that at a smaller-scale, that's been happening for years. So, I think that, one, I think that there's a lot of capability just being in the cloud. At the very least, you can now start to wrap your arms around it, right, where used to be that it wasn't even possible to understand what all that data was because there's no way to centrally inventory it well. In AWS with S3, with inventory reports, you can get a list of all your storage and we are going to continue to add capability to help customers get their arms around what they have, first off; understand how it's being used—that's where things like Storage Lens really play a big role in understanding exactly what data is being accessed and not. 
We're definitely listening to customers carefully around this, and I think when you think about broader data management story, I think that's a place that we're spending a lot of time thinking right now about how do we help customers get their arms around it, make sure that they know what's the categorization of certain data, do I have some PII lurking here that I need to be very mindful of?And then how do I get to a world where I'm—you know, I won't say that it's ever going to look like the perfect whiteboard picture you might draw on the wall. I don't think that's really ever achievable, but I think certainly getting to a point where customers have a real solid understanding of what data they have and that the right controls are in place around all that data, yeah, I think that's directionally where I see us heading.Corey: As you look around how far the service has come, it feels like, on some level, that there were some, I guess, I don't want to say missteps, but things that you learned as you went along. Like, back when the service was in beta, for example, there was no per-request charge. To my understanding that was changed, in part because people were trying to use it as a file system, and wow, that suddenly caused a tremendous amount of load on some of the underlying systems. You originally launched with a BitTorrent endpoint as an option so that people could download through peer-to-peer approaches for large datasets and turned out that wasn't really the way the internet evolved, either. And I'm curious, if you were to have to somehow build this off from scratch, are there any other significant changes you would make in how the service was presented to customers in how people talked about it in the early days? 
Effectively given a mulligan, what would you do differently?Kevin: Well, I don't know, Corey, I mean, just given where it's grown to in macro terms, you know, I definitely would be worried taking a mulligan, you know, that I [laugh] would change the sort of the overarching trajectory. Certainly, I think there's a few features here and there where, for whatever reason, it was exciting at the time and really spoke to what customers at the time were thinking, but over time, you know, sort of quickly those needs move to something a little bit different. And, you know, like you said things like the BitTorrent support is one where, at some level, it seems like a great technical architecture for the internet, but certainly not something that we've seen dominate in the way things are done. Instead, you know, we've largely kind of have a world where there's a lot of caching layers, but it still ends up being largely client-server kind of connections. So, I don't think I would do a—I certainly wouldn't do a mulligan on any of the major functionality, and I think, you know, there's a few things in the details where obviously, we've learned what really works in the end. I think we learned that we wanted bucket names to really strictly conform to rules for DNS encoding. So, that was the change that was made at some point. And we would tweak that, but no major changes, certainly.Corey: One subject of some debate while we were designing this year's charity t-shirt—which, incidentally, if you're listening to this, you can pick up for yourself at snark.cloud/shirt—was the is S3 itself dependent upon S3? Because we know that every other service out there is as well, but it is interesting to come up with an idea of, “Oh, yeah. 
We're going to launch a whole new isolated region of S3 without S3 to lean on.” That feels like it's an almost impossible bootstrapping problem.Kevin: Well, S3 is not dependent on S3 to come up, and it's certainly a critical dependency tree that we look at and we track and make sure that we'd like to have an acyclic graph as we look at dependencies.Corey: That is such a sophisticated way to say what I learned the hard way when I was significantly younger and working in production environments: don't put the DNS servers needed to boot the hypervisor into VMs that require a working hypervisor. It's one of those oh, yeah, in hindsight, that makes perfect sense, but you learn it right after that knowledge really would have been useful.Kevin: Yeah, absolutely. And one of the terms we use for that, as well as is the idea of static stability, or that's one of the techniques that can really help with isolating a dependency is what we call static stability. We actually have an article about that in the Amazon Builder Library, which there's actually a bunch of really good articles in there from very experienced operations-focused engineers in AWS. So, static stability is one of those key techniques, but other techniques—I mean, just pure minimization of dependencies is one. And so, we were very, very thoughtful about that, particularly for that core layer.I mean, you know, when you talk about S3 with 200-plus microservices, or 235-plus microservices, I would say not all of those services are critical for every single request. Certainly, a small subset of those are required for every request, and then other services actually help manage and scale the kind of that inner core of services. And so, we look at dependencies on a service by service basis to really make sure that inner core is as minimized as possible. 
And then the outer layers can start to take some dependencies once you have that basic functionality up.Corey: I really want to thank you for being as generous with your time as you have been. If people want to learn more about you and about S3 itself, where should they go—after buying a t-shirt, of course.Kevin: Well, certainly buy the t-shirt. First, I love the t-shirts and the charity that you work with to do that. Obviously, for S3, it's aws.amazon.com/s3. And you can actually learn more about me. I have some YouTube videos, so you can search for me on YouTube and kind of get a sense of myself.Corey: We will put links to that into the show notes, of course. Thank you so much for being so generous with your time. I appreciate it.Kevin: Absolutely. Yeah. Glad to spend some time. Thanks for the questions, Corey.Corey: Kevin Miller, vice president and general manager for Amazon S3. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, ignorant comment talking about how your S3 compatible service is going to blow everyone's socks off when it fails.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
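An aside on the "acyclic graph" of dependencies Kevin mentions: the property is straightforward to check mechanically. Below is a minimal sketch — the service names are hypothetical and this is not AWS's actual tooling — using depth-first search to detect a cycle in a service dependency graph:

```python
# Sketch: verify a service dependency graph is acyclic (a prerequisite for
# the kind of dependency minimization described above). Service names are
# hypothetical, not real S3 internals.
def has_cycle(deps):
    """deps maps a service name to the list of services it depends on."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    color = {s: WHITE for s in deps}

    def visit(node):
        color[node] = GRAY
        for dep in deps.get(node, []):
            if color.get(dep, WHITE) == GRAY:   # back edge: cycle found
                return True
            if color.get(dep, WHITE) == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[s] == WHITE and visit(s) for s in deps)
```

Static stability then builds on top of an acyclic core: each service in the inner layer is additionally designed to keep serving from known-good state even when something it depends on is degraded.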

Screaming in the Cloud
The Man Behind the Curtain at Zoph with Victor Grenu

Screaming in the Cloud

Play Episode Listen Later Oct 25, 2022 28:28


About VictorVictor is an Independent Senior Cloud Infrastructure Architect working mainly on Amazon Web Services (AWS), designing secure, scalable, reliable, and cost-effective cloud architectures, dealing with large-scale and mission-critical distributed systems. He also has long experience in Cloud Operations, Security Advisory, Security Hardening (DevSecOps), Modern Applications Design, Micro-services and Serverless, Infrastructure Refactoring, and Cost Saving (FinOps).Links Referenced: Zoph: https://zoph.io/ unusd.cloud: https://unusd.cloud Twitter: https://twitter.com/zoph LinkedIn: https://www.linkedin.com/in/grenuv/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full-stack observability for developers, IT operations, security, and business teams in the cloud age. Datadog's platform, along with 500-plus vendor integrations, allows you to correlate metrics, traces, logs, and security signals across your applications, infrastructure, and third-party services in a single pane of glass.Combine these with drag-and-drop dashboards and machine-learning-based alerts to help teams troubleshoot and collaborate more effectively, prevent downtime, and enhance performance and reliability. Try Datadog in your environment today with a free 14-day trial and get a complimentary T-shirt when you install the agent.To learn more, visit datadoghq.com/screaminginthecloud to get. That's www.datadoghq.com/screaminginthecloudCorey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills.
I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming That's GO M-O-M-E-N-T-O dot co slash screamingCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the best parts about running a podcast like this and trolling the internet of AWS things is every once in a while, I get to learn something radically different than what I expected. For a long time, there's been this sort of persona or brand in the AWS space, specifically the security side of it, going by Zoph—that's Z-O-P-H—and I just assumed it was a collective or a whole bunch of people working on things, and it turns out that nope, it is just one person. And that one person is my guest today. Victor Grenu is an independent AWS architect. Victor, thank you for joining me.Victor: Hey, Corey, thank you for having me. It's a pleasure to be here.Corey: So, I want to start by diving into the thing that first really put you on my radar, though I didn't realize it was you at the time. You have what can only be described as an army of Twitter bots around the AWS ecosystem. And I don't even know that I'm necessarily following all of them, but what are these bots and what do they do?Victor: Yeah. I have a few bots on Twitter that I push some notification, some tweets, when things happen on AWS security space, especially when the AWS managed policies are updated from AWS. And it comes from an initial project from Scott Piper. He was running a Git command on his own laptop to push the history of AWS managed policy. And it told me that I can automate this thing using a deployment pipeline and so on, and to tweet every time a new change is detected from AWS. 
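For readers curious what such a bot reduces to: after fetching each managed policy (e.g., via `aws iam list-policies` and `get-policy-version`) and committing it to a git repository, the interesting step is summarizing what changed between two versions of a policy document. A minimal sketch of that comparison step — the function shape is my assumption for illustration, not Victor's actual code:

```python
# Sketch: summarize what changed between two versions of an IAM managed
# policy document. The fetch-and-commit plumbing is omitted; this is only
# the comparison a change-tweeting bot would run on each new version.
def diff_policy_actions(old_doc, new_doc):
    """Return (added, removed) sets of Action strings across all statements."""
    def actions(doc):
        acts = set()
        for stmt in doc.get("Statement", []):
            a = stmt.get("Action", [])
            # Action may be a single string or a list of strings.
            acts.update([a] if isinstance(a, str) else a)
        return acts

    old_acts, new_acts = actions(old_doc), actions(new_doc)
    return new_acts - old_acts, old_acts - new_acts
```

The (added, removed) pair is what would end up in a notification tweet; Resource and Condition changes could be diffed the same way.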
So, the idea is to monitor every change on these policies.Corey: It's kind of wild because I built a number of somewhat similar Twitter bots, only instead of trying to make them into something useful, I'd make them into something more than a little bit horrifying and extraordinarily obnoxious. Like there's a Cloud Boomer Twitter account that winds up tweeting every time Azure tweets something only it quote-tweets them in all caps and says something insulting. I have an AWS releases bot called AWS Cwoud—so that's C-W-O-U-D—and that winds up converting it to OwO speak. It's like, “Yay a new auto-scawowing growp.” That sort of thing is obnoxious and offensive, but it makes me laugh.Yours, on the other hand, are things that I have notifications turned on for just because when they announce something, it's generally fairly important. The first one that I discovered was your IAM changes bot. And I found some terrifying things coming out of that from time to time. What's the data source for that? Because I'm just grabbing other people's Twitter feeds or RSS feeds; you're clearly going deeper than that.Victor: Yeah, the data source is the official AWS managed policy. In fact, I run AWS CLI in the background and I'm doing just a list policy, the list policy command, and with this list I'm doing git of each policy that is returned, so I can enter it in a git repository to get the full history of the time. And I also craft a list of deprecated policy, and I also run, like, a dog-food initiative, the policy analysis, validation analysis from AWS tools to validate the consistency and the accuracy of the own policies. So, there is a policy validation with their own tool. [laugh].Corey: You would think that wouldn't turn up anything because their policy validator effectively acts as a linter, so if it throws an error, of course, you wouldn't wind up pushing that. 
And yet, somehow the fact that you have bothered to hook that up and have findings from it indicates that that's not how the real world works.Victor: Yeah, there is some, let's say, some false positive because we are running the policy validation with their own linter on their own policies, but this is something that is documented from AWS. So, there is an official page where you can find why the linter is not working on each policy and why. There is an explanation for each finding. I'm thinking of [unintelligible 00:05:05] managed policy, which is too long, and the policy analyzer is crashing because the policy is too long.
Why not?”And it always seems to me—not to be overly judgy—but the independent consultants just seem happier about it because for better or worse, we get to choose what we focus on in a way that I don't think you do at a larger company.Victor: Yeah. It's the same in France or in Europe; there is a lot of consulting firms. But with the pandemic and with the market where we are working, in the cloud, in the cloud-native solution and so on, that there is a lot of demands. And the natural path is to start by working for a consulting firm and then when you are ready, when you have many AWS certification, when you have the experience of the customer, when you have a network of well-known customer, and you gain trust from your customer, I think it's natural to go by yourself, to be independent and to choose your own project and your own customer.Corey: I'm curious to get your take on what your perception of being an AWS consultant is when you're based in Paris versus, in my case, being based in the West Coast of the United States. And I know that's a bit of a strange question, but even when I travel, for example, over to the East Coast, suddenly, my own newsletter sends out three hours later in the day than I expect it to and that throws me for a loop. The AWS announcements don't come out at two or three in the afternoon; they come out at dinnertime. And for you, it must be in the middle of the night when a lot of those things wind up dropping. The AWS stuff, not my newsletter. I imagine you're not excitedly waiting on tenterhooks to see what this week's issue of Last Week in AWS talks about like I am.But I'm curious is that even beyond that, how do you experience the market? From what you're perceiving people in the United States talking about as AWS consultants versus what you see in Paris?Victor: It's difficult, but in fact, I don't have so much information about the independent in the US. I know that there is a lot, but I think it's more common in Europe. 
And yeah, it's an advantage to whoever ten-hour time [unintelligible 00:08:56] from the US because a lot of stuff happen on the Pacific time, on the Seattle timezone, on San Francisco timezone. So, for example, for this podcast, my Monday is over right now, so, so yeah, I have some advantage in time, but yeah.Corey: This is potentially an odd question for you. But I find an awful lot of the AWS documentation to be challenging, we'll call it. I don't always understand exactly what it's trying to tell me, and it's not at all clear that the person writing the documentation about a service in some cases has ever used the service. And in everything I just said, there is no language barrier. This documentation was written—theoretically—in English and I, most days, can stumble through a sentence in English and almost no other language. You obviously speak French as a first language. Given that you live in Paris, it seems to be a relatively common affliction. How do you find interacting with AWS in French goes? Or is it just a complete nonstarter, and it all has to happen in English for you?Victor: No, in fact, the consultants in Europe, I think—in fact, in my part, I'm using my laptop in English, I'm using my phone in English, I'm using the AWS console in English, and so on. So, the documentation for me is a switch on English first because for the other language, there is sometimes some automated translation that is very dangerous sometimes, so we all keep the documentation and the materials in English.Corey: It's wild to me just looking at how challenging so much of the stuff is. Having to then work in a second language on top of that, it just seems almost insurmountable to me. It's good they have automated translation for a lot of this stuff, but that falls down in often hilariously disastrous ways, sometimes. 
It's wild to me that even taking most programming languages that folks have ever heard of, even if you program and speak no English, which happens in a large part of the world, you're still using if statements even if the term ‘if' doesn't mean anything to you localized in your language. It really is, in many respects, an English-centric industry.Victor: Yeah. Completely. Even in French for our large French customer, I'm writing the PowerPoint presentation in English, some emails are in English, even if all the folks in the thread are French. So yeah.Corey: One other area that I wanted to explore with you a bit is that you are very clearly focused on security as a primary area of interest. Does that manifest in the work that you do as well? Do you find that your consulting engagements tend to have a high degree of focus on security?Victor: Yeah. In my design, when I'm doing some AWS architecture, my main objective is to design some security architecture and security patterns that apply best practices and least privilege. But often, I'm working for engagement on security audits, for startups, for internal customer, for diverse company, and then doing some accommodation after all. And to run my audit, I'm using some open-source tooling, some custom scripts, and so on. I have a methodology that I'm running for each customer. And the goal is to sometime to prepare some certification, PCI DSS or so on, or maybe to ensure that the best practice are correctly applied on a workload or before go-live or, yeah.Corey: One of the weird things about this to me is that I've said for a long time that cost and security tend to be inextricably linked, as far as being a sort of trailing reactive afterthought for an awful lot of companies. They care about both of those things right after they failed to adequately care about those things. 
At least in the cloud economic space, it's only money as opposed to, “Oops, we accidentally lost our customers' data.” So, I always found that I find myself drifting in a security direction if I don't stop myself, just based upon a lot of the cost work I do. Conversely, it seems that you have come from the security side and you find yourself drifting in a costing direction.Your side project is a SaaS offering called unusd.cloud, that's U-N-U-S-D dot cloud. And when you first mentioned this to me, my immediate reaction was, “Oh, great. Another SaaS platform for costing. Let's tear this one apart, too.” Except I actually like what you're building. Tell me about it.Victor: Yeah, and unusd.cloud is a side project for me and I was working since, let's say one year. It was a project that I've deployed for some of my customer on their local account, and it was very useful. And so, I was thinking that it could be a SaaS project. So, I've worked at [unintelligible 00:14:21] so yeah, a few months on shifting the product to assess [unintelligible 00:14:27].The product aims to detect the waste on AWS accounts on all AWS regions, and it scans all your AWS accounts and all your regions, and it tries to detect unused EC2, RDS, Glue [unintelligible 00:14:45], SageMaker, and so on, and attached EBS and so on. I don't craft a new dashboard, a new Cost Explorer, and so on. It's just cost awareness, it's just a notification on email or Slack or Microsoft Teams. And you just add your AWS account on the project and you schedule, let's say, once a day, and it scans, and it sends you a cost awareness, a [unintelligible 00:15:17] detection, and you can act by turning off what is not used.
When you need to spin something up and it's not there, you're very highly incentivized to spin that thing up. When you're not using it, you have to remember that thing exists, otherwise it just sort of sits there forever and doesn't do anything.It just costs money and doesn't generate any value in return for that. What you got right is you've also eviscerated my most common complaint about tools that claim to do this, which is you build in either an explicit rule of ignore this resource or ignore resources with the following tags. The benefit there is that you're not constantly giving me useless advice, like, “Oh, yeah, turn off this idle thing.” It's, yeah, that's there for a reason, maybe it's my dev box, maybe it's my backup site, maybe it's the entire DR environment that I'm going to need at little notice. It solves for that problem beautifully. And though a lot of tools out there claim to do stuff like this, most of them really failed to deliver on that promise.Victor: Yeah, I just want to keep it simple. I don't want to add an additional console and so on. And you are correct. You can apply a simple tag on your asset, let's say an EC2 instance: you apply the tag “in use” and a value, and then the alerting is disabled for this asset. And the detection is based on the CPU [unintelligible 00:17:01] and the network health metrics, so when the instance is not used in the last seven days, with a low CPU every [unintelligible 00:17:10] and low network out, it comes up as a suspect. [laugh].[midroll 00:17:17]Corey: One thing that I like about what you've done, but also have some reservations about, is that you have not done what so many of these tools do, which is, “Oh, just give us all the access in your account. It'll be fine. You can trust us. Don't you want to save money?” And yeah, but I also still want to have a company left when all is said and done.You are very specific on what it is that you're allowed to access, and it's great.
I would argue, on some level, it's almost too restrictive. For example, you have the ability to look at EC2, Glue, IAM—just to look at account aliases, great—RDS, Redshift, and SageMaker. And all of these are simply list and describe. There's no gets in there other than in Cost Explorer, which makes sense. You're not able to go rummaging through my data and see what's there. But that also bounds you, on some level, to being able to look only at particular types of resources. Is that accurate or are you using a lot of the CloudWatch stuff and Cost Explorer stuff to see other areas?Victor: In fact, it's the least privilege and read-only permission because I don't want too much question for the security team. So, it's full read-only permission. And I've only added the detection that I'm currently supports. Then if in some weeks, in some months, I'm adding a new detection, let's say for Snapshot, for example, I will need to update, so I will ask my customer to update their template. There is a mechanisms inside the project to tell them that the template is obsolete, but it's not a breaking change.So, the detection will continue, but without the new detection, the new snapshot detection, let's say. So yeah, it's least privilege, and all I need is the get-metric-statistics from CloudWatch to detect unused assets. And also checking [unintelligible 00:19:16] Elastic IP or [unintelligible 00:19:19] EBS volume. So, there is no CloudWatching in this detection.Corey: Also, to be clear, I am not suggesting that what you have done is at all a mistake, even if you bound it to those resources right now. 
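The tag-based opt-out Victor describes is, at its core, just a filter applied before any notification goes out. A sketch of that filter — the tag key `in use` and the resource dict shape are illustrative assumptions, not the product's real schema:

```python
# Sketch: suppress alerts for any resource carrying the opt-out tag.
# The tag key and the resource shape are illustrative assumptions,
# not unusd.cloud's actual schema.
OPT_OUT_TAG = "in use"

def alertable(resources):
    """Return only the resources that do not carry the opt-out tag."""
    return [r for r in resources if OPT_OUT_TAG not in r.get("tags", {})]
```

Everything that survives this filter then goes through the unused-resource heuristics; everything tagged stays silent, which is what keeps the tool from nagging about dev boxes and DR standby environments.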
But just because everyone loves to talk about these exciting, amazing, high-level services that AWS has put up there, for example, oh, what about DocumentDB or all these other—you know, Amazon Basics MongoDB; same thing—or all of these other things that they wind up offering, but you take a look at where customers are spending money and where they're surprised to be spending money, it's EC2, it's a bit of RDS, occasionally it's S3, but that's a lot harder to detect automatically whether that data is unused. It's, “You haven't been using this data very much.” It's, “Well, you see how the bucket is labeled ‘Archive Backups' or ‘Regulatory Logs?'” imagine that. What a ridiculous concept.Yeah. Whereas an idle EC2 instance sort of can wind up being useful on this. I am curious whether you encounter in the wild in your customer base, folks who are having idle-looking EC2 instances, but are in fact, for example, using a whole bunch of RAM, which you can't tell from the outside without custom CloudWatch agents.Victor: Yeah, I'm not detecting this behavior for larger usage of RAM, for example, or for maybe there is some custom application that is low in CPU and don't talk to any other services using the network, but with this detection, with the current state of the detection, I'm covering large majority of waste because what I see from my customer is that there is some teams, some data scientists or data teams who are experimenting a lot with SageMaker with Glue, with Endpoint and so on. And this is very expensive at the end of the day because they don't turn off the light at the end of the day, on Friday evening. 
So, what I'm trying to solve here is to notify the team—so on Slack—when they forgot to turn off the most common waste on AWS, so EC2, RDS, Redshift.Corey: I just now wound up installing it while we've been talking on my dedicated shitposting account, and sure enough, it already spat out a single instance it found, which, yeah, was an EC2 instance running on the East Coast when I was just there, so that I had a DNS server that was a little bit more local. Okay, great. And it's a T4g.micro, so it's not exactly a whole lot of money, but it does exactly what it says on the tin. It didn't wind up nailing the other instances I have in that account that I'm using for a variety of different things, which is good.And it further didn't wind up falling into the trap that so many things do, which is the, “Oh, it's costing you zero and your spend this month is zero because this account is where I dump all of my AWS credit codes.” So, many things say, “Oh, well, it's not costing you anything, so what's the problem?” And then that's how you accidentally lose $100,000 in Activate credits because someone left something running way too long. It does a lot of the right things that I would hope and expect it to do, and the fact that you don't do that is kind of amazing.Victor: Yeah. It was a need from my customer and an opportunity. It's a small bet for me because I'm trying to do some small bets, you know, the small bets approach, so the idea is to try a new thing. It's also an excuse for me to learn something new because building a SaaS is challenging.Corey: One thing that I am curious about: in this account, I'm also running the controller for my home WiFi environment. And that's not huge. It's a T3.small, but it is still something out there that sits there because I need it to exist. But it's relatively bored.If I go back and look over the last week of CloudWatch metrics, for example, it doesn't look like it's usually busy.
I'm sure there's some network traffic in and out as it updates itself and whatnot, but the CPU peaks out at a little under 2% used. It didn't warn on this and it got it right. I'm just curious as to how you did that. What is it looking for to determine whether this instance is unused or not?Victor: It's the magic [laugh]. There is some intelligence artif—no, I'm just kidding. It's just statistics. And I'm getting two metrics, the CPU average from the last seven days and the network out. And I'm getting the average on those metrics and I'm doing some assumption that this EC2, this specific EC2, is not used because of these metrics, this seven-day average.Corey: Yeah, it is wild to me just that this is working as well as it is. It's just… like, it does exactly what I would expect it to do. It's clear that—and this is going to sound weird, but I'm going to say it anyway—that this was built by someone who was looking to answer the question themselves and not from the perspective of, “Well, we need to build a product and we have access to all of this data from the API. How can we slice and dice it and add some value as we go?” I really liked the approach that you've taken on this. I don't say that often or lightly, particularly when it comes to cloud costing stuff, but this is something I'll be using in some of my own nonsense.Victor: Thanks. I appreciate it.Corey: So, I really want to thank you for taking as much time as you have to talk about who you are and what you're up to. If people want to learn more, where can they find you?Victor: Mainly on Twitter, my handle is @zoph [laugh]. And, you know, on LinkedIn or on my company website, at zoph.io.Corey: And we will, of course, put links to that in the [show notes 00:25:23]. Thank you so much for your time today. I really appreciate it.Victor: Thank you, Corey, for having me. It was a pleasure to chat with you.Corey: Victor Grenu, independent AWS architect.
I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that is going to cost you an absolute arm and a leg because invariably, you're going to forget to turn it off when you're done.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
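The detection logic Victor describes in this episode — two CloudWatch averages over the last seven days — condenses to a single predicate. The threshold values below are illustrative guesses, not unusd.cloud's actual tuning:

```python
# Sketch: flag an instance as a "suspect" when both its seven-day average
# CPU utilization and average network-out are low. Thresholds here are
# illustrative guesses, not the product's real configuration.
def is_suspect(cpu_avg_percent, net_out_avg_bytes,
               cpu_threshold=2.0, net_threshold=50_000):
    """Return True when both metrics sit below their thresholds."""
    return cpu_avg_percent < cpu_threshold and net_out_avg_bytes < net_threshold
```

In practice the two inputs would come from CloudWatch `get-metric-statistics` for `CPUUtilization` and `NetworkOut`, which is consistent with the read-only permissions discussed in the episode — a bored-but-needed box like Corey's WiFi controller escapes the flag when its network traffic stays above the cutoff.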

Screaming in the Cloud
Consulting the Aspiring Consultant with Mike Julian

Screaming in the Cloud

Play Episode Listen Later Oct 20, 2022 30:33


About MikeBesides his duties as The Duckbill Group's CEO, Mike is the author of O'Reilly's Practical Monitoring, and previously wrote the Monitoring Weekly newsletter and hosted the Real World DevOps podcast. He was previously a DevOps Engineer for companies such as Taos Consulting, Peak Hosting, Oak Ridge National Laboratory, and many more. Mike is originally from Knoxville, TN (Go Vols!) and currently resides in Portland, OR.Links Referenced: @Mike_Julian: https://twitter.com/Mike_Julian mikejulian.com: https://mikejulian.com duckbillgroup.com: https://duckbillgroup.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves.
That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive?Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscaleCorey: Welcome to Screaming in the Cloud. I'm Cloud Economist Corey Quinn, and my guest is a returning guest on this show, my business partner and CEO of The Duckbill Group, Mike Julian. Mike, thanks for making the time.Mike: Lucky number three, I believe?Corey: Something like that, but numbers are hard. I have databases for that of varying quality and appropriateness for the task, but it works out. Anything's a database. If you're brave enough.Mike: With you inviting me this many times, I'm starting to think you'd like me or something.Corey: I know, I know. So, let's talk about something that is going to put that rumor to rest.Mike: [laugh].Corey: Clearly, you have made some poor choices in the course of your career, like being my business partner being the obvious one. But what's really in a dead heat for which is the worst decision is that you've written a book previously. And now you are starting the process of writing another book because, I don't know, we don't keep you busy enough or something. What are you doing?Mike: Making very bad decisions. When I finished writing Practical Monitoring—O'Reilly, and by the way, you should go buy a copy if you're interested in monitoring—I finished the book and said, “Wow, that was awful.
I'm never doing it again.” And about a month later, I started thinking of new books to write. So, that was 2017, and Corey and I started Duckbill and kind of stopped thinking about writing books because small companies are basically small children. But now I'm going to write a book about consulting.
Corey: Oh, thank God. I thought you were going to go down the observability path a second time.
Mike: You know, I'm actually dreading the day that O'Reilly asks me to do a second edition because I don't really want to.
Corey: Yeah. Effectively turn it into an entire story where the only monitoring tool you really need is the AWS bill. That'll go well.
Mike: [laugh]. Yeah. So yeah, like, basically, I've been doing consulting for such a long time, and most of my career is consulting in some form or fashion, and I head up all the consulting at Duckbill. I've learned a lot about consulting. And I've found that people have a lot of questions about consulting, particularly at the higher-end levels. Once you start getting into advisory sort of stuff, there's not a lot of great information out there aimed at engineering.
Corey: There's a bunch of different views on what consulting is. You have independent contractors billing by the hour as staff replacement who call what they do consulting; you have the big consultancies, like Bain or BCG; you've got what we do in an advisory sense, and of course, you have a bunch of MBA new grads going to a lot of the big consultancies who are going to see a book on consulting and think that it's potentially for them. I don't know that you necessarily have a lot of advice for the new grad type, so who is this for? What is your target customer for this book?
Mike: If you're interested in joining McKinsey out of college, I don't have a lot to add; I don't have a lot to tell you. The reason for that is kind of twofold.
One is that shops like McKinsey and Deloitte and Accenture and BCG and Bain, all those, are playing very different games than what most of us think about when we think consulting. Their entire model revolves around running a process. And it's the same process for every client they work with. But, like, you're buying them because of their process.
And that process is nothing new or novel. You don't go to those firms because you want the best advice possible. You go to those firms because it's the most defensible advice. It's sort of those things like, “No one gets fired for buying Cisco,” no one got fired for buying IBM, like, that sort of thing, it's a very defensible choice. But you're not going to get great results from it.
But because of that, their entire model revolves around throwing dozens, in some cases, hundreds of new grads at a problem and saying, “Run this process. Have fun. Let us know if you need help.” That's not consulting I have any experience with. It's honestly not consulting that most of us want to do.
Most of that is staffed by MBAs and accountants. When I think consulting, I think about specialized advice and providing that specialized advice to people. And I wager that most of us think about that in the same way, too. In some cases, it might just be, “I'm going to write code for you as a freelancer,” or I'm just going to tell you like, “Hey, put the nail in here instead of over here because it's going to be better for you.” Like, paying for advice is good.
But with that, I also have a… one of the first things I say in the beginning of the book, which [laugh] I've already started writing because I'm a glutton for punishment, is I don't think junior people should be consultants. I actually think it's a really bad idea because to be a consultant, you have to have expertise in some area, and junior staff don't. They haven't been in their careers long enough to develop that yet. So, they're just going to flounder.
So, my advice is generally aimed at people that have been in their careers for quite some time, generally, people that are 10, 15, 20 years into their career, looking to do something.
Corey: One of the problems that we see whenever we talk about these things on Twitter is that we get an awful lot of people telling us that we're wrong, that it can't be made to work, et cetera, et cetera. But following this model, I've been independent for—well, I was independent and then we became The Duckbill Group; add them together because figuring out exactly where that divide happened is always a mental leap for me, but it's been six years at this point. We've definitely proven our ability to not go out of business every month. It's kind of amazing. Without even an exception case of, “That one time.”
Mike: [laugh]. Yeah, we are living proof that it does work, but you don't really have to take just our word for it because there are a lot of other firms that exist entirely on an advisory-only, high-expertise model. And it works out really well. We've worked with several of them, so it does work; it just isn't very common inside of tech and particularly inside of engineering.
Corey: So, one of the things that I find is what differentiates an expert from an enthusiastic amateur is, among other things, the number of mistakes that they've made. So, I guess a different way of asking this is what qualifies you to write this book, but instead, I'm going to frame it in a very negative way. What have you screwed up on that puts you in a position of, “Ah, I'm going to write a book so that someone else can make better choices.”
Mike: One of my favorite stories to tell—and Corey, I actually think you might not have heard this story before—
Corey: That seems unlikely, but give it a shot.
Mike: Yeah. So, early in my career, I was working for a consulting firm that did ERP implementations. We worked with mainly large, old-school manufacturing firms.
So, my job there was to do the engineering side of the implementation. So, a lot of rack-and-stack, a lot of Windows Server configuration, a lot of pulling cables, that sort of thing. So, I thought I was pretty good at this. I quickly learned that I was actually not nearly as good as I thought I was.
Corey: A common affliction among many different people.
Mike: A common affliction. But I did not realize that until this one particular incident. So, me and my boss are both on site at this large manufacturing facility, and the CFO pulls my boss aside and I can hear them talking and, like, she's pretty upset. She points at me and says, “I never want this asshole in my office ever again.” So, he and I have a long drive back to our office, like an hour and a half.
And we had a long chat about what that meant for me. I was not there for very long after that, as you might imagine, but the thing is, I still have no idea to this day what I did to upset her. I know that she was pissed and he knows that she was pissed. And he never told me exactly what it was, only that you take care of your client. And the client believes that I screwed up so massively that she wanted me fired.
Him not wanting to argue—he didn't; he just kind of went with it—and put me on other clients. But as a result of that, it really got me thinking that I screwed something up so badly to make this person hate me so much and I still have no idea what it was that I did. Which tells me that even at the time, I did not understand what was going on around me. I did not understand how to manage clients well, and to really take care of them.
That was probably the first really massive mistake that I've made in my career—or, like, the first time I came to the realization that there's a whole lot I don't know and it's really costing me.
Corey: From where I sit, there have been a number of things that we have done as we've built our consultancy, and I'm curious—you know, let's get this even more personal—in the past, well, we'll call it four years that we have been The Duckbill Group—which I think is right—what have we gotten right and what have we gotten wrong? You are the expert; you're writing a book on this for God's sake.
Mike: So, what I think we've gotten right is one of my core beliefs is never bill hourly. Shout out to Jonathan Stark. He wrote a really good book that is a much better explanation of that than I've ever been able to come up with. But I've always had the belief that billing hourly is just a bad idea, so we've never done that and that's worked out really well for us. We've turned down work because that's the model they wanted and it's like, “Sorry, that's not what we do. You're going to have to go work for someone else—or hire someone else.”
Other things that I think we've gotten right is a focus on staying on the advisory side and not doing any implementation. That's allowed us to get really good at what we do very quickly because we don't get mired in long-term implementation detail-level projects. So, that's been great. Where we went a little wrong, I think—or what we have gotten wrong, lessons that we've learned. I had this idea that we could build out a junior and mid-level staff and have them overseen by very senior people.
And, as it turns out, that didn't work for us, entirely because it didn't work for me. That was really my failure. I went from being an IC to being the leader of a company in one single step. I'd never been a manager before Duckbill.
So, that particular mistake was really about my lack of abilities in being a good manager and being a good leader.
So, building that out, that did not work for us because it didn't work for me and I didn't know how to do it. So, I made way too many mistakes that were kind of amateur-level stuff in terms of management. So, that didn't work. And the other major mistake that I think we've made is not putting enough effort into marketing. So, we get most of our leads by inbound or referral, as is common with boutique consulting firms, but a lot of the income that we get comes through Last Week in AWS, which is really awesome.
But we don't put a whole lot of effort into content or any marketing stuff related to the thing that we do, like cost management. I think a lot of that is just that we don't really know how, aside from just creating content and publishing it. We don't really understand how to market ourselves very well on that side of things. I think that's a mistake we've made.
Corey: It's an effective strategy against what's a very complicated problem because unlike most things, if—let's go back to your old life—if we have an observability problem, we will talk about that very publicly on Twitter and people will come over and get—“Hey, hey, have you tried to buy my company's product?” Or they'll offer consulting services, or they'll point us in the right direction, all of which is sometimes appreciated. Whereas when you have a big AWS bill, you generally don't talk about it in public, especially if you're a serious company because that's going to, uh, I think the phrase is, “Shake investor confidence,” when you're actually live tweeting slash shitposting about your own AWS bill.
And our initial thesis was therefore, since we can't wind up reaching out to these people when they're having the pain because there's no external indication of it, instead what we have to do is be loud enough and notable in this space, where they find us where it shouldn't take more than them asking one or two of their friends before they get pointed to us. What's always fun is the stories we hear: “Okay, so I asked some other people because I wanted a second opinion, and they told us to go to you, too.” Word of mouth is where our customers come from. But how do you bootstrap that? I don't know. I'm lucky that I got it right the first time.
Mike: Yeah, and as I mentioned a minute ago, a lot of that really comes through your content, which is not really cost management-related. It's much more AWS broad. We don't put out a lot of cost management specific content. And honestly, I think that's to our detriment. We should and we absolutely can. We just haven't. I think that's one of the really big things that we've missed on doing.
Corey: There's an argument that the people who come to us do not spend their entire day thinking about AWS bills. I mean, I can't imagine what that would be like, but they don't for whatever reason; they're trying to do something ridiculous, like, you know, run a profitable company. So, getting in front of them when they're not thinking about the bills means, on some level, that they're going to reach out to us when the bill strikes. At least that's been my operating theory.
Mike: Yeah, I mean, this really just comes down to content strategy and broader marketing strategy. Because one of the things you have to think about with marketing is how do you meet a customer at the time that they have the problem that you solve? And what most marketing people talk about here is what's called the triggering event. Something causes someone to take an action. What is that something?
Who is that someone, and what is that action?
And for us, one of the things that we thought early on is that, well, the bill comes out the first week of the month, every month, so people are going to open the bill, freak out, and a big influx of leads are going to come our way, and that's going to happen every single month. The reality is that never happened. That, it turns out, was not a triggering event for anyone.
Corey: And early on, when we didn't have that many leads coming in, it was a statistical aberration that I thought I saw, like, “Oh, out of the three leads this month, two of them showed up in the same day. Clearly, it's an AWS billing day thing.” No. It turns out that every company's internal cadence is radically different.
Mike: Right. And I wish I could say that we have found what our triggering events are, but I actually don't think we have. We know who the people are and we know what they reach out for, but we haven't really uncovered that triggering event. And it could also be that there isn't one. Or at least, if there is one, it's not one that we could see externally, which is kind of fine.
Corey: Well, for the half of our consulting that does contract negotiation for large-scale commitments with AWS, when it comes up for renewal or the initial discount contract gets offered, those are very clear triggering events but the challenge is that we don't—
Mike: You can't see them externally.
Corey: —really see that from the outside. Yeah.
Mike: Right. And this is one of those things where there are triggering events for basically everything and it's probably going to be pretty consistent once you get down to specific services. Like, we provide cost optimization services and contract negotiation services. I'm willing to bet that I can predict exactly what the triggering events for both of those will be pretty well.
The problem is, you can never see those externally, which is kind of fine.
Ideally, you would be able to see it externally, but you can't, so we roll with it, which means our entire strategy has revolved around always being top-of-mind because at the time where it happens, we're already there. And that's a much more difficult strategy to employ, but it does work.
Corey: All it takes is time and being really lucky and being really prolific, and, and, and. It's one of those things where if I were to set out to replicate it, I don't even know how I'd go about doing it.
Mike: People have been asking me. They say, “I want to create The Duckbill Group for X. What do I do?” And I say, “First step, get yourself a Corey Quinn.” And they're like, “Well, I can't do that. There's only one.” I'm like, “Yep. Sucks to be you.” [laugh].
Corey: Yeah, we called the Jerk Store. They're running out of him. Yeah, it's a problem. And I don't think the world needs a whole lot more of my type of humor, to be honest, because the failure mode that I have experienced brutally and firsthand is not that people don't find me funny; it's that it really hurts people's feelings. I have put significant effort into correcting those mistakes and not repeating them, but it sucks every time I get it wrong.
Mike: Yeah.
Corey: Another question I have for you around the book targeting: are you aiming this at individual independent consultants or are you looking to advise people who are building agencies?
Mike: Explicitly not the latter. My framing around this is that there are a number of people who are doing consulting right now and they've kind of fallen into it. Often, they'll leave one job and do a little consulting while they're waiting on their next thing. And in some cases, that might be a month or two. In some cases, it might go on years, but that whole time, they're just like, “Oh, yeah, I'm doing consulting in between things.”
But at some point, some of those think, “You know what?
I want this to be my thing. I don't want there to be a next thing. This is my thing. So therefore, how do I get serious about doing consulting? How do I get serious about being a consultant?”
And that's where I think I can add a lot of value because casually consulting of, like, taking whatever work just kind of falls your way is interesting for a while, but once you get serious about it, you have to start thinking, well, how do I actually deliver engagements? How do I do that consistently? How do I do it repeatedly? How do I do it profitably? How do I price my stuff? How do I package it? How do I attract the leads that I want? How do I work with the customers I want?
And turning that whole thing from a casual, “Yeah, whatever,” into, “This is my business,” is a very different way of thinking. And most people don't think that way because they didn't really set out to build a business. They set out to just pass time and earn a little bit of money before they went off to the next job. So, the framing that I have here is that I'm aiming to help people that are wanting to get serious about doing consulting. But they generally have experience doing it already.
Corey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming. That's GO M-O-M-E-N-T-O dot co slash screaming.
Corey: We went from effectively being the two of us on the consulting delivery side, to scaling up to, I believe, at one point we were six of us, and now we have scaled back down to largely the two of us, aided by very specific external folk, when it makes sense.
Mike: And don't forget April.
Corey: And of course.
I'm talking delivery.
Mike: [laugh].
Corey: There's a reason I—
Mike: Delivery. Yes.
Corey: —prefaced it that way. There's a lot of support structure here, let's not kid ourselves, and they make this entire place work. But why did we scale up? And then why did we scale down? Because I don't believe we've ever really talked about that publicly.
Mike: No, not publicly. In fact, most people probably don't even notice that it happened. We got pretty big for—I mean, not big. So, we hit, I think, six full-time people at one point. And that was quite a bit.
Corey: On the delivery side. Let's be clear.
Mike: Yeah. No, I think actually with support structure, too. Like, if you add in everyone that we had with the sales and marketing as well, we were like 11 people. And that was a pretty sizable company. But then in July this year, it kind of hit a point where I found that I just wasn't enjoying my job anymore.
And I looked around and noticed that a lot of other people were kind of feeling the same way; it's just that things had gotten harder. And the business wasn't suffering at all, it was just everything felt more difficult. And I finally realized that, for me personally at least, I started Duckbill because I love working with clients, I love doing consulting. And what I have found is that as the company grew larger and larger, I spent most of my time keeping the trains running and taking care of the staff. Which is exactly what I should be doing when we're that size, like, that is my job at that size, but I didn't actually enjoy it.
I went into management as, like, this job, going from having never done it before. So, I didn't have anything to compare it to. I didn't know if I would like it or not. And once I got here, I realized I actually don't. And I spent a lot of effort to get better at it and I think I did. I've been working with a leadership coach for years now.
But it finally came to a point where I just realized that I wasn't actually enjoying it anymore.
I wasn't enjoying the job that I had created. And I think that really panned out for you as well. So, we decided, we had kind of an opportune time where one of our team decided that they were also wanting to go back to do independent consulting. I'm like, “Well, this is actually a pretty good time. Why don't we just start scaling things back?” And like, maybe we'll scale it up again in the future; maybe we won't. But like, let's just buy ourselves some breathing room.
Corey: One of the things that I think we didn't spend quite enough time really asking ourselves was what kind of place do we want to work at. Because we've explicitly stated that you and I both view this as the last job either of us is ever going to have, which means that we're not trying to do the get-big-quickly to get acquired, or we want to raise a whole bunch of other people's money to scale massively. Those aren't things either of us enjoy. And it turns out that handling the challenges of a business with as many people working here as we had wasn't what either one of us really wanted to do.
Mike: Yeah. You know what—[laugh] it's funny because a lot of our advisors kept asking the same thing. Like, “So, what kind of company do you want?” And like, we had some pretty good answers for that, in that we didn't want to build a VC-backed company, we didn't ever want to be hyperscale. But there's a wide gulf of things between a two-person company and hyperscale and we didn't really think too much about that.
In fact, being a ten-person company is very different than being a three-person company, and we didn't really think about that either. We should have really put a lot more thought into what it means to be a ten-person company, and is that what we want? Or is three, four, or five-person more our style?
But then again, I don't know that we could have predicted that as a concern had we not tried it first.
Corey: Yeah, that was very much something that, for better or worse, we pay advisors for their advice—that's kind of definitionally how it works—and then we ignored it, on some level, though we thought we were doing something different at the time, because there are some lessons you've just got to learn by making the mistake yourself.
Mike: Yeah, we definitely made a few of those. [laugh].
Corey: And it's been an interesting ride and I've got zero problem with how things have shaken out. I like what we do quite a bit. And honestly, the biggest fear I've got going forward is that my jackass business partner is about to distract the hell out of himself by writing a book, which is never as easy as even the most pessimistic estimates would be. So, that's going to be awesome and fun.
Mike: Yeah, just wait until you see the dedication page.
Corey: Yeah, I wasn't mentioned at all in the last book that you wrote, which I found personally offensive. So, if I'm not mentioned this time, you're fired.
Mike: Oh, no, you are. It's just I'm also adding an anti-dedication page, which just has a photo of you.
Corey: Oh, wonderful, wonderful. This is going to be one of those stories of the good consultant and the bad consultant, and I'm going to be the Goofus to your Gallant, aren't I?
Mike: [laugh]. Yes, yes. You are.
Corey: “Goofus wants to bill by the hour.”
Mike: It's going to have a page of, like, “Here's this [unintelligible 00:25:05] book is dedicated to. Here's my acknowledgments. And [BLEEP] this guy.”
Corey: I love it. I absolutely love it. I think that there is definitely a bright future for telling other people how to consult properly. May I just suggest as a subtitle for the book: Consulting—subtitle—You Have Problems and Money. We'll Take Both.
Mike: [laugh]. Yeah. My working title for this is Practical Consulting, but only because my previous book was Practical Monitoring.
Pretty sure O'Reilly would have a fit if I did that. I actually have no idea what I'm going to call the book, still.
Corey: Naming things is super hard. I would suggest asking people at AWS who name services and then doing the exact opposite of whatever they suggest. Like, take their list of recommendations and sort by reverse order and that'll get you started.
Mike: Yeah. [laugh].
Corey: I want to thank you for giving us an update on what you're working on and why you have less hair every time I see you because you're mostly ripping it out due to self-inflicted pain. If people want to follow your adventures, where's the best place to keep updated on this ridiculous, ridiculous nonsense that I cannot talk you out of?
Mike: Two places. You can follow me on Twitter, @Mike_Julian, or you can sign up for the newsletter on my site at mikejulian.com where I'll be posting all the updates.
Corey: Excellent. And I look forward to skewering the living hell out of them.
Mike: I look forward to ignoring them.
Corey: Thank you, Mike. It is always a pleasure.
Mike: Thank you, Corey.
Corey: Mike Julian, CEO at The Duckbill Group, and my unwilling best friend. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, annoying comment in which you tell us exactly what our problem is, and then charge us a fixed fee to fix that problem.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Evolution of Cloud Services with Richard Hartmann
Oct 18, 2022 45:26
About Richard
Richard "RichiH" Hartmann is the Director of Community at Grafana Labs, Prometheus team member, OpenMetrics founder, OpenTelemetry member, CNCF Technical Advisory Group Observability chair, CNCF Technical Oversight Committee member, CNCF Governing Board member, and more. He also leads, organizes, or helps run various conferences from hundreds to 18,000 attendees, including KubeCon, PromCon, FOSDEM, DENOG, DebConf, and Chaos Communication Congress. In the past, he made mainframe databases work, ISP backbones run, kept the largest IRC network on Earth running, and designed and built a datacenter from scratch. Go through his talks, podcasts, interviews, and articles at https://github.com/RichiH/talks or follow him on Twitter at https://twitter.com/TwitchiH for musings on the intersection of technology and society.
Links Referenced:
Grafana Labs: https://grafana.com/
Twitter: https://twitter.com/TwitchiH
Richard Hartmann list of talks: https://github.com/richih/talks
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready.
This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.
Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full-stack observability for developers, IT operations, security, and business teams in the cloud age. Datadog's platform, along with 500-plus vendor integrations, allows you to correlate metrics, traces, logs, and security signals across your applications, infrastructure, and third-party services in a single pane of glass.
Combine these with drag-and-drop dashboards and machine-learning-based alerts to help teams troubleshoot and collaborate more effectively, prevent downtime, and enhance performance and reliability. Try Datadog in your environment today with a free 14-day trial and get a complimentary T-shirt when you install the agent.
To learn more, visit datadoghq.com/screaminginthecloud. That's www.datadoghq.com/screaminginthecloud.
Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. There are an awful lot of people who are incredibly good at understanding the ins and outs and the intricacies of the observability world. But they didn't have time to come on the show today. Instead, I am talking to my dear friend of two decades now, Richard Hartmann, better known on the internet as RichiH, who is the Director of Community at Grafana Labs, here to suffer—in a somewhat atypical departure for the theme of this show—personal attacks for once. Richie, thank you for joining me.
Richard: And thank you for agreeing on personal attacks.
Corey: Exactly. It was one of your riders. Like, there have to be the personal attacks back and forth or you refuse to appear on the show. You've been on before.
In fact, the last time we did a recording, I believe you were here in person, which was a long time ago. What have you been up to?
You're still at Grafana Labs. And in many cases, I would point out that, wow, you've been there for many years; that seems to be an atypical thing, which is an American tech industry perspective because every time you and I talk about this, you look at folks who—wow, you were only at that company for five years. What's wrong with you—you tend to take the longer view and I tend to have the fast twitch, time to go ahead and leave jobs because it's been more than 20 minutes approach. I see that you're continuing to live what you preach, though. How's it been?
Richard: Yeah, so there's a little bit of Covid brain, I think. When we talked in 2018, I was still working at SpaceNet, building a data center. But the last two-and-a-half years didn't really happen for many people, myself included. So, I guess [laugh] that includes you.
Corey: No, no, you're right. You've only been at Grafana Labs a couple of years. One would think I would check the notes before shooting my mouth off. But then, one wouldn't know me.
Richard: What notes? Anyway, I've been around Prometheus and Grafana since 2015. But it's like, real, full-time everything is 2020. There was something in between. Since 2018, I contracted to do vulnerability handling and everything for Grafana Labs because they had something and they didn't know how to deal with it.
But no, full time is 2020. But as to the space in the [unintelligible 00:02:45] of itself, it's maybe a little bit German of me, but trying to understand the real world and trying to get an overview of systems and how they actually work, and if they are working correctly and as intended, and if not, how they're not working as intended, and how to fix this is something which has always been super important to me, in part because I just want to understand the world.
And this is a really, really good way to automate understanding of the world. So, it's basically a work-saving mechanism. And that's why I've been sticking to it for so long, I guess.

Corey: Back in the early days of monitoring systems—so we called it monitoring back then because, you know, using simple words that lack nuance was sort of de rigueur back then—we wound up effectively having tools. Nagios is the one that springs to mind, and it was terrible in all the ways you would expect a tool written in janky Perl in the early 2000s to be. But it told you what was going on. It tried to do a thing, generally reach a server or query it about things, and when things fell out of certain specs, it screamed its head off, which meant that when you had things like the core switch melting down—thinking of one very particular incident—you didn't get a Nagios alert; you got 4,000 Nagios alerts. But start to finish, you could wrap your head rather fully around what Nagios did and why it did the sometimes strange things that it did.

These days, when you take a look at Prometheus, which we hear a lot about, particularly in the Kubernetes space, and Grafana, which is often mentioned in the same breath, it's never been quite clear to me exactly where those start and stop. It always feels like it's a component in a larger system to tell you what's going on, rather than a one-stop shop that's going to, you know, shriek its head off when something breaks in the middle of the night. Is that the right way to think about it? The wrong way to think about it?

Richard: It's a way to think about it. So personally, I use the terms monitoring and observability pretty much interchangeably. Observability is a relatively well-defined term, even though most people won't agree. But if you look back into the '70s, into control theory where the term is coming from, it is the measure of how much you're able to determine the internal state of a system by looking at its inputs and its outputs.
Depending on the definition, some people don't include the inputs, but that is the OG definition as far as I'm aware.

And from this, there flow a lot of things. This question of—or this interpretation of the difference between telling that, yes, something's broken versus why something's broken. Or if you can't ask new questions on the fly, it's not observability. Like, all of those things are fundamentally mapped to this definition of: I need enough data to determine the internal state of whatever system I have just by looking at what is coming in and what is going out. And that is, at the core, the thing. Now, obviously, it's become a buzzword, which is oftentimes the fate of successful things. So, it's become a buzzword, and you end up with cargo culting.

Corey: I would argue periodically that observability is hipster monitoring. If you call it monitoring, you get yelled at by Charity Majors. Which is tongue in cheek, but she has opinions, made nonetheless frustrating, shall I say, by the fact that she is invariably correct in those opinions, which just somehow makes it so much worse. It would be easy to dismiss things she says if she weren't always right. And the world is changing, especially as we get into the world of distributed systems.

"Is the server that runs the app working or not working" loses meaning when we're talking about distributed systems, when we're talking about containers running on top of Kubernetes, which turns every outage into a murder mystery. We start having distributed applications composed of microservices, so you have no idea necessarily where an issue is. Okay, is this one microservice having an issue related to the request coming into a completely separate microservice?
And it seems that for those types of applications, the answer has been tracing for a long time now, where originally that was something that felt like it was sprung, fully-formed, from the forehead of some god known as one of the hyperscalers, but now is available to basically everyone, in theory.

In practice, it seems that instrumenting applications is still one of the hardest parts of all of this. I tried hooking up one of my own applications to be observed via OTel, the OpenTelemetry project, and it turns out that right now, OTel and AWS Lambda have an intersection point that makes everything extremely difficult to work with. It's not there yet; it's not baked yet. And someday, I hope that changes because I would love to interchangeably just throw metrics and traces and logs to all the different observability tools and see which ones work and which ones don't, but that still feels very far away from the current state of the art.

Richard: Before we go there, maybe one thing which I don't fully agree with. You said that previously, you were told if a service is up or down, that's the thing which you cared about, and I don't think that's what people actually cared about. At that time also, what they fundamentally cared about: is the user-facing service up, or down, or impacted? Is it slow? Does it return errors for X percent of requests, something like this?

Corey: Is the site up? And—you're right, I was hand-waving over a whole bunch of things. It was, "Okay. First, the web server is returning a page, yes or no? Great. Can I ping the server?" Okay, well, there are ways a server can crash and still leave enough of the TCP/IP stack up that it can respond to pings and do little else.

And then you start adding things to it. But the Nagios thing that I always wanted to add—and had to—was, is the disk full? And that was annoying. And, on some level, like, why should I care in the modern era how much stuff is on the disk, because storage is cheap and free and plentiful?
The problem is, after the third outage in a month because the disk filled up, you start to not have a good answer for, well, why aren't you monitoring whether the disk is full?

And that was one of the contributors to taking down the server. When the website broke, there were what felt like a relatively small number of reasonably well-understood contributors to that at small to midsize applications, which is what I'm talking about: the only things that people would let me touch. I wasn't running hyperscale stuff where you have a fleet of 10,000 web servers and, "Is the server up?" Yeah, in that scenario, no one cares. But when we're talking about the database server and the two application servers and the four web servers talking to them, you think about it more in terms of pets than you do cattle.

Richard: Yes, absolutely. Yet, I think that was a mistake back then, and I tried to do it differently; take the disk as a specific example. And I'm absolutely agreeing that previous-generation tools limit you in how you can actually work with your data. In particular, once you're with metrics, where you can do actual math on the data, it doesn't matter if the disk is almost full. It matters if that disk is going to be full within X amount of time.

If that disk is 98% full and it sits there at 98% for ten years and provides the service, no one cares. The thing is, will it actually run out in the next two hours, in the next five hours, what have you. Depending on this, if it is currently or imminently customer-impacting or user-impacting, then yes, alert on it, raise hell, wake people, make them fix it, as opposed to: this thing can be dealt with during business hours on the next workday, and you don't have to wake anyone up.

Corey: Yeah. The big filer with massive amounts of storage has crossed the 70% line. Okay, now it's time to start thinking about that; what do you want to do? Maybe it's time to order another shelf of disks for it, which is going to take some time.
That's a radically different scenario than the 20-gigabyte root volume on your server just started filling up dramatically; the rate of change is such that it'll be full in 20 minutes.

Yeah, one of those is something you want to wake people up for. Generally speaking, you don't want to wake people up for what is fundamentally a longer-term strategic business problem. That can be sorted out in the light of day, versus, "[laugh] we're not going to be making money in two hours if I don't wake up and fix this now." That's the kind of thing you generally want to be woken up for. Well, let's be honest, you don't want that to happen at all, but if it does happen, you kind of want to know in advance rather than after the fact.

Richard: You're literally describing linear predict from Prometheus, which is precisely for this, where I can look back over X amount of time and make a linear prediction, because everything else breaks down at scale, blah, blah, blah, to detail. But the thing is, I can draw a line with my pencil by hand on my data and I can predict when this thing is going to get there. Which is obviously precisely correct if I have a TLS certificate. It's a little bit more hand-wavy when it's a disk. But still, you can look into the future and say, "What will be happening in Y amount of time if current trends for the last X amount of time continue?" And that's precisely a thing where you get this more powerful ability of doing math with your data.

Corey: See, when you say it like that, it sounds like it actually is a whole term of art, where you're focusing on an in-depth field where salaries are astronomical. Whereas the tools that I had to talk about this stuff back in the day made me sound like, effectively, the sysadmin that I was, grunting and pointing: "This is gonna fill up." And that is how I thought about it.
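[Editor's note: the "linear predict" Richard mentions is PromQL's predict_linear() function. The idea can be sketched outside Prometheus too; the following is a minimal illustration, not Prometheus code, and the sample numbers are invented for the example.]

```python
# Minimal sketch of the idea behind PromQL's predict_linear():
# fit a least-squares line to recent free-space samples and
# extrapolate to estimate when the disk will run out.

def seconds_until_full(samples):
    """samples: list of (timestamp_seconds, free_bytes) tuples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    # Least-squares slope: change in free bytes per second
    # (negative slope means the disk is filling up).
    num = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = num / den
    if slope >= 0:
        return None  # not trending toward full
    intercept = mean_b - slope * mean_t
    # Solve 0 = slope * t + intercept for the time free space hits zero.
    t_full = -intercept / slope
    return t_full - samples[-1][0]

# Invented data: a disk losing 1 MiB/s, sampled once a minute for
# ten minutes, starting with 3 GiB free.
now = 1_000_000
samples = [(now + i * 60, 3 * 1024**3 - i * 60 * 1024**2) for i in range(10)]
eta = seconds_until_full(samples)  # seconds left after the last sample
```

The equivalent PromQL would alert when the extrapolated value goes below zero within some horizon; this sketch just shows why the math is more useful than a static "disk is 90% full" threshold.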
And this is the challenge where it's easy to think about these things in narrow, defined contexts like that, but at scale, things break.

Like the idea of anomaly detection. Well, okay, great: if normally the CPU and these things are super bored and suddenly it gets really busy, that's atypical. Maybe we should look into it, assuming that it signals a problem. The problem is, that is a lot harder than it sounds because there are so many factors that factor into it. And as soon as you have something, quote-unquote, "intelligent," making decisions on this, it doesn't take too many false positives before you start ignoring everything it has to say, and missing legitimate things. It's this weird and obnoxious conflation of both hard technical problems and human psychology.

Richard: And the breaking up of old service boundaries. Of course, when you say microservices and such—functionally a microservice, or nanoservice, or picoservice, though the pendulum is already swinging back to larger units of complexity—it fundamentally does not make any difference if I have a monolith on some mainframe or if I have a bunch of microservices. Yes, I can scale differently, I can scale horizontally a lot more easily, vertically it's a little bit harder, blah, blah, blah, but fundamentally, the logic and the complexity which is being packaged is fundamentally the same. More users, everything, but it is fundamentally the same. What's happening again and again is I'm breaking up those old boundaries, which means the old tools, which have assumptions built in about certain aspects of how I can actually get an overview of a system, just start breaking down. When my complexity unit, or my service, or what have you, is congruent with a physical piece of hardware, or several services are congruent with that piece of hardware, it absolutely makes sense to think about things in terms of this one physical server.
The fact that you have different considerations in cloud, and microservices, and blah, blah, blah, does not inherently mean that it is more complex.

On the contrary, it is fundamentally the same thing. It scales with users, everything, but it is fundamentally the same thing; I just have different boundaries of where I put interfaces onto my complexity, which basically allow me to hide all of this complexity from the downstream users.

Corey: That's part of the challenge that I think we're grappling with across this entire industry from start to finish. Where we originally looked at these things and could reason about it because it's the computer and I know how those things work. Well, kind of, but okay, sure. But then we start layering levels of complexity on top of layers of complexity on top of layers of complexity, and suddenly, when things stop working the way that we expect, it can be very challenging to unpack and understand why. One of the ways I got into this whole space was understanding, to some degree, how system calls work, how the kernel wound up interacting with userspace, and how Linux systems worked from start to finish. And these days, that isn't particularly necessary most of the time for the care and feeding of applications.

The challenge is when things start breaking, suddenly having that in my back pocket to pull out could be extremely handy. But I don't think it's nearly as central as it once was, and I don't know that I would necessarily advise someone new to this space to spend a few years as a systems person, digging into a lot of those aspects. And "this is why you need to know what inodes are and how they work"? Not really, not anymore. It's not front and center the way that it once was, in most environments, at least in the world that I live in. Agree? Disagree?

Richard: Agreed. But it's very much unsurprising.
You probably can't tell me how to precisely grow sugar cane or corn, you can't tell me how to refine the sugar out of it, but you can absolutely bake a cake. But you will not be able to tell me even a third—and for the record, I'm also not able to tell you even a third—about the supply chain which goes from "I have a field and some seeds" to "I have a package of refined sugar." Yet you're absolutely enabled to do any of this. The thing is, you've been part of the previous generation of infrastructure where you know how this underlying infrastructure works, so you have more ability to reason about this, but it's not needed for cloud services nearly as much.

You need different types of skill sets, but that doesn't mean the old skill set is completely useless, at least not as of right now. It's much more a case of you need fewer of those people and you need them in different places, because those things have become infrastructure. Which is basically the cloud play, where a lot of this is just becoming infrastructure more and more.

Corey: Oh, yeah. Back then I distinctly remember my elders looking down their noses at me because I didn't know assembly, and how could I possibly consider myself a competent systems admin if I didn't at least have a working knowledge of assembly? Or at least C, which I, over time, learned enough about to know that I didn't want to be a C programmer. And you're right, this is the value of cloud; going back to those days, getting a web server up and running just to compile Apache's httpd took a week and an in-depth knowledge of GCC flags.

And then in time, oh, great, we're going to have rpm or debs. Great, okay, then in time you have apt, if you're in the deb land, because I know you are a Debian developer, but over in Red Hat land we had yum and other tools. And then in time, it became oh, we can just use something like Puppet or Chef to wind up ensuring that thing is installed. And then, oh, just docker run.
And now it's a checkbox in a web console for S3.

These things get easier with time, and step by step we're standing on the shoulders of giants. Even in the last ten years of my career, I used to have a great challenge question that I would interview people with: "Do you know what TinyURL is? It takes a short URL and then expands it to a longer one. Great, on the whiteboard, tell me how you would implement that." And you could go up one side and down the other, and then you could add constraints: multiple data centers, now one goes offline, how do you not lose data? Et cetera, et cetera.

But these days, there are so many ways to do that using cloud services that it almost becomes trivial. It's, okay: multiple data centers, API Gateway, a Lambda, and a global DynamoDB table. Now, what? "Well, now it gets slow. Why is it getting slow?" "Well, in that scenario, probably because of something underlying the cloud provider." "And so now, you lose an entire AWS region. How do you handle that?" "Seems to me when that happens, the entire internet's kind of broken. Do people really need longer URLs?"

And that is a valid answer, in many cases. The question doesn't really work without a whole bunch of additional constraints that make it sound fake. And that's not a weakness. That is the fact that computers and cloud services have never been as accessible as they are now. And that's a win for everyone.

Richard: There's one aspect of accessibility which is actually decreasing—or two. A, you need to pay for them on an ongoing basis. And B, you need an internet connection which is suitably fast, low latency, what have you. And those are things which actually do make things harder for a variety of reasons. If I look at our back-end systems—as in Grafana—all of them have single-binary modes where you literally compile everything into a single binary and you can run it on your laptop, because if you're stuck on a plane, you can't do any work on it.
That kind of is not the best of situations.

And if you have a huge CI/CD pipeline, everything is in this cloud and fine and dandy, but your internet breaks. Yeah, so I do agree that it is becoming generally more accessible. I disagree that it is becoming more accessible along all possible axes.

Corey: I would agree. There is a silver lining to that as well, where yes, they are fraught and dangerous and I would preface this with a whole bunch of warnings, but from a cost perspective, all of the cloud providers do have a free-tier offering where you can kick the tires on a lot of these things in return for no money. Surprisingly, the best one of those is Oracle Cloud, where they have an unlimited free tier: use whatever you want in this subset of services, and you will never be charged a dime. As opposed to the AWS model of free tier where, well, okay, it suddenly got very popular or you misconfigured something, and surprise, you now owe us enough money to buy Belize. That doesn't usually lead to a great customer experience.

But you're right, you can't get away from needing an internet connection of at least some level of stability and throughput in order for a lot of these things to work. The stuff you would do locally on a Raspberry Pi, for example, if you're budget-constrained and want to get something out there, or on your laptop. Great, that's not going to work in the same way as a full-on cloud service will.

Richard: It's not free unless you have hard guarantees that you're not going to ever pay anything. It's fine to send warnings, it's fine to switch the thing off, it's fine to have you hit random hard and soft quotas. It is not a free service if you can't guarantee that it is free.

Corey: I agree with you. I think that there needs to be a free offering where, "Well, okay, you want us to suddenly stop serving traffic to the world?" "Yes.
When the alternative is you have to start charging me through the nose, yes, I want you to stop serving traffic." That is definitionally what it says on the tin.

And as an independent learner, that is what I want. Conversely, if I'm an enterprise, yeah, I don't care about money; we're running our Super Bowl ad right now, so whatever you do, don't stop serving traffic. Charge us all the money. And there's been a lot of hand-wringing about, well, how do we figure out which direction to go in? And it's, have you considered asking the customer?

So, on a scale of one to bank, how serious is this account going to be [laugh]? Like, what are your big concerns: never charge me, or never go down? Because we can build for either of those. Just let's make sure that all of those expectations are aligned. Because if you guess, you're going to get it wrong, and then no one's going to like you.

Richard: I would argue this: all those services from all cloud providers are actually built to address both of those. It's a deliberate choice not to offer certain aspects.

Corey: Absolutely. When I talk to AWS, it's like, "Yeah, but there is an eventual-consistency challenge in the billing system where it takes"—as anyone who's looked at the billing system can see—"multiple days, sometimes, for usage data to show up. So, how would we be able to stop things if the usage starts climbing?" To which my relatively direct response is: that sounds like a huge problem. I don't know how you'd fix that, but I do know that if suddenly you decide, as a matter of policy, okay, if you're in the free tier, we will not charge you, or even we will not charge you more than $20 a month. So, you build yourself some headroom, great. And anything that people are able to spin up, well, you're just going to have to eat the cost as a provider. I somehow suspect that would get fixed super quickly if that were the constraint.
The fact that it isn't is a conscious choice.

Richard: Absolutely.

Corey: And the reason I'm so passionate about this, about the free space, is not because I want to get a bunch of things for free. I assure you I do not. I mean, I spend my life fixing AWS bills and looking at AWS pricing, and my argument is very rarely, "It's too expensive." It's that the billing dimension is hard to predict, or doesn't align with a customer's experience, or prices a service out of a bunch of use cases where it'll be great. But very rarely do I just sit here shaking my fist and saying, "It costs too much."

The problem is when you scare the living crap out of a student with a surprise bill that's more than their entire college tuition. Even if you waive it a week or so later, do you think they're ever going to be as excited as they once were to go and use cloud services and build things for themselves and see what's possible? I mean, you and I met on IRC 20 years ago because back in those days, the failure mode and the financial risk were extremely low. The biggest concern that I had back then when I was doing some of my Linux experimentation was that if I typed the wrong thing, I'd break my laptop. And yeah, that happened once or twice, and I learned not to make those same kinds of mistakes, or put guardrails in so the blast radius was smaller, or use a remote system instead. Yeah, someone else's computer that I can destroy. Wonderful. But that was all "we live and we learn" as we were coming up. There was never an opportunity for us, to my understanding, to wind up accidentally running up an $8 million charge.

Richard: Absolutely. And psychological safety is one of the most important things in what most people do. We are social animals. Without this psychological safety, you're not going to have long-term, self-sustaining groups. You will not make someone really excited about it. There's two basic ways to sell: trust or force. Those are the two ones.
There's none else.

Corey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomemento.co/screaming. That's GO M-O-M-E-N-T-O dot co slash screaming.

Corey: Yeah. And it also looks ridiculous. I was talking to someone somewhat recently who's used to spending four bucks a month on their AWS bill for some S3 stuff. Great. Good for them. That's awesome. Their credentials got compromised. Yes, that is on them to some extent. Okay, great.

But now after six days, they were told that they owed $360,000 to AWS. And I don't know how, as a cloud company, you can sit there and ask a student to do that. That is not a realistic thing. They are what is known, in the United States at least, in the world of civil litigation as, quote-unquote, "judgment proof," which means, great, you could wind up finding that someone owes you $20 billion. Most of the time, they don't have that, so you're not able to recoup it. Yeah, the judgment feels good, but you're never going to see it.

That's the problem with something like that. It's, yeah, I would declare bankruptcy long before, as a student, I wound up paying that kind of money. And I don't hear any stories about them releasing the collection-agency hounds against people in that scenario. But I couldn't guarantee that. I would never urge someone to ignore that bill and see what happens.

And it's such an off-putting thing that, from my perspective, is beneath the company. And let's be clear, I see this behavior at times on Google Cloud, and I see it on Azure as well. This is not something that is unique to AWS, but they are the 800-pound gorilla in the space, and that's important.
Or, as I'll just mention right now—because I was about to give you crap for this, too—if I go to grafana.com, it says, and I quote, "Play around with the Grafana Stack. Experience Grafana for yourself, no registration or installation needed."

Good. I was about to yell at you if it was, "Oh, just give us your credit card and go ahead and start spinning things up and we won't charge you. Honest." Even your free account does not require a credit card; you're doing it right. That tells me that I'm not going to get a giant surprise bill.

Richard: You have no idea how much thought and work went into our free offering. There was a lot of math involved.

Corey: None of this is easy, I want to be very clear on that. Pricing is one of the hardest things to get right, especially in cloud. And when you get it right, it doesn't look like it was that hard for you to do. But I fix [sigh] people's AWS bills for a living, and still, five or six years in, one of the hardest things I still wrestle with is pricing engagements. It's incredibly nuanced, incredibly challenging, and at least for services in the cloud space where you're doing usage-based billing, that becomes a problem.

But glancing at your pricing page, you do hit the two things that are incredibly important to me. The first one is: use something for free. As an added bonus, you can use it forever. And I can get started with it right now. Great. When I go and look at your pricing page, or I want to use your product, and it tells me to "click here to contact us," that tells me it's an enterprise sales cycle, it's got to be really expensive, and I'm not solving my problem tonight.

Whereas the other side of it, the enterprise offering, needs to be "contact us," and you do that. That speaks to the enterprise procurement people who don't know how to sign a check that doesn't have two commas in it, and they want to have custom terms and all the rest, and they're prepared to pay for that.
If you don't have that, you look too small-time. And it doesn't matter what price you put on it; when you wind up offering your enterprise tier at some large number, yeah, for some companies, that's a small number. You don't necessarily want to box yourself in, depending upon what the specific needs are. You've gotten that right.

Every common criticism that I have about pricing, you folks have gotten right. And I definitely can pick up on your fingerprints on a lot of this. Because it sounds like a weird thing to say, "Well, he's the Director of Community, why would he weigh in on pricing?" It's, "I don't think you understand what community is when you ask that question."

Richard: Yes, I fully agree. It's super important to get pricing right, or to get many things right. And usually the things which just feel naturally correct are the ones which took the most effort and the most time and everything. And yes, at least—like, I was in those conversations or part of them, and the one thing which was always clear is: when we say it's free, it must be free. When we say it is forever free, it must be forever free. No games, no lies; do what you say and say what you do, basically.

We have things where initially you get certain pro features, and you can keep paying and keep using them, or after X amount of time they go away. Things like these are built in because that's what people want. They want to play around with the whole thing and see, hey, is this actually providing me value? Do I want to pay for this feature which is nice, or this and that plugin, or what have you? And yeah, you're also absolutely right that once you leave these constraints of basically self-serve cloud, you are talking about bespoke deals, but you're also talking about, okay, let's sit down, let's actually understand what your business is: what are your business problems? What are you going to solve today?
What are you trying to solve tomorrow?

Let us find a way of actually supporting you and investing in a mutual partnership, and not just grab the money and run. We have extremely low churn for, I would say, pretty good reasons. Because this thing about our users, our customers, being successful, we do take extremely seriously.

Corey: It's one of those areas that I just can't shake the feeling is underappreciated industry-wide. And the reason I say I see your fingerprints on it is because if this had been wrong, you have a lot of… we'll call them idiosyncrasies, where there are certain things you absolutely will not stand for, and misleading people and tricking them into paying money is high on that list. One of the reasons we're friends. So yeah, when I say I see your fingerprints on this, it's: if this hadn't been worked out the way that it is, you would not still be there.

One other thing that I wanted to call out about, well, I guess it's a confluence of pricing and logging and the rest: I look at your free tier, and it offers up to 50 gigabytes of ingest a month. And it's easy for me to sit here and compare that to other services, other tools, and other logging stories, and then I have to stop and think for a minute that, yeah, disks have gotten way bigger, and internet connections have gotten way faster, and even the logs have gotten way wordier. I still am not sure that most people can really contextualize just how much logging fits into 50 gigs of data. Do you have any, I guess, ballpark examples of what that looks like? Because it's been long enough since I've been playing in these waters that I can't really contextualize it anymore.
You need better tooling and you need proper tooling.And some of this is more modern. Some of this is where we actually pushed the state of the art. But I'm also biased. But I, for myself, do claim that we did push the state of the art here. But at the same time you come back to those absolute fundamentals of how humans deal with data.If you look back basically as far as we have writing—literally 6000 years ago, is the oldest writing—humans have always dealt with information with the state of the world in very specific ways. A, is it important enough to even write it down, to even persist it in whatever persistence mechanisms I have at my disposal? If yes, write a detailed account or record a detailed account of whatever the thing is. But it turns out, this is expensive and it's not what you need. So, over time, you optimize towards only taking down key events and only noting key events. Maybe with their interconnections, but fundamentally, the key events.As your data grows, as you have more stuff, as this still is important to your business and keeps being more important to—or doesn't even need to be a business; can be social, can be whatever—whatever thing it is, it becomes expensive, again, to retain all of those key events. So, you turn them into numbers and you can do actual math on them. And that's this path which you've seen again, and again, and again, and again, throughout humanity's history. Literally, as long as we have written records, this has played out again, and again, and again, and again, for every single field which humans actually cared about. At different times, like, power networks are way ahead of this, but fundamentally power networks work on metrics, but for transient load spike, and everything, they have logs built into their power measurement devices, but those are only far in between. Of course, the main thing is just metrics, time-series. 
And you see this again, and again. You also were a sysadmin; in internet-related infrastructure, all switches have been metrics-based or metrics-first for basically forever, for 20, 30 years. But that stands to reason. Of course the internet is running roughly 20 years, scale-wise, in front of the cloud, because obviously you need the internet—without it you wouldn't be having a cloud. So, all of those growing pains, and why metrics are all of a sudden the thing—"or have been for a few years now"—is basically, of course, that people who were writing software, providing their own software services, hit the scaling limitations which internet service providers hit two decades, three decades ago. But fundamentally, you have this complete system. Basically profiles, or distributed tracing, depending on how you view distributed tracing.

You can also argue that distributed tracing is key events which are linked to each other. Logs sit firmly in the key event thing, and then you turn this into numbers and that is metrics. And that's basically it. You have extremes at the ends where you can have valid—depending on your circumstances—engineering trade-offs of where you invest the most, but fundamentally, that is why those always appear again in humanity's dealing with data, and observability is no different.

Corey: I take a look at last month's AWS bill. Mine is pretty well optimized. It's a bit over 500 bucks. And right around 150 of that is various forms of logging and detecting change in the environment. And on the one hand, I sit here, and I think, "Oh, I should optimize that," because the value of those logs to me is zero.

Except that whenever I have to go in and diagnose something or respond to an incident or have some forensic exploration, they then are worth an awful lot. And I am prepared to pay 150 bucks a month for that because the potential value of having that when the time comes is going to be extraordinarily useful.
And it basically just feels like a tax on top of what it is that I'm doing. The same thing happens with application observability where, yeah, you just want the big substantial stuff—until you're trying to diagnose something. But in some cases, yeah, okay, then crank up the verbosity and then look for it.

But if you're trying to figure it out after an event that isn't likely or hopefully won't recur, you're going to wish that you spent a little bit more on collecting data out of it. You're always going to be wrong, you're always going to be unhappy, on some level.

Richard: Ish. You could absolutely be optimizing this. I mean, for $500, it's probably not worth your time unless you take it as an exercise, but outside of due diligence where you need specific logs tied to—or specific events tied to specific times, I would argue that a lot of the problems with logs are just from dealing with them wrong. You have this one extreme of full-text indexing everything, and you have this other extreme of a data lake—which is just a euphemism of never looking at the data again—to keep storage vendors happy. There is an in-between.

Again, I'm biased, but for example, with Loki, you have those same label sets as you have on your metrics with Prometheus—literally the same label sets—which means you only index that part and you only extract on ingestion time. If you don't have structured logs yet, only put the metadata about whatever you care about, extracted, into your label set and store this, and that's the only thing you index. But it goes further than just this. You can also turn those logs into metrics.

And to me this is a path of optimization. Where previously I logged this and that error. Okay, fine, but it's just a log line telling me it's HTTP 500. No one cares that this is at this precise time.
Log levels are also basically an anti-pattern because they're just trying to deal with the amount of data which I have, and try to get a handle on this on that level, whereas it would be much easier if I just counted: every time I have an HTTP 500, I just up my counter by one. And again, and again, and again.

And all of a sudden, I have literally—and I did the math on this—over 99.8% of the data which I have to store just goes away. It's just magic the way—and we're only talking about the first time I'm hitting this logline. The second time I'm hitting this logline is functionally free if I turn this into metrics. It becomes cheap enough that one of the mantras which I have, if you need to onboard your developers on modern observability, blah, blah, blah, blah, blah, the whole bells and whistles: usually people have logs, like that's what they have, unless they were from ISPs or power companies, or so; there they usually start with metrics.

But most users, which I see both with my Grafana and with my Prometheus [unintelligible 00:38:46], tend to start with logs. They have issues with those logs because they're basically unstructured and useless and you need to first make them useful to some extent. But then you can leverage on this and instead of having a debug statement, just put a counter. Every single time you think, "Hey, maybe I should put a debug statement," just put a counter instead. In two months' time, see if it was worth it, or if you delete that line and just remove that counter.

It's so much cheaper; you can just throw this on and just have it run for a week or a month or whatever timeframe and done. But it goes beyond this because all of a sudden, if I can turn my logs into metrics properly, I can start rewriting my alerts on those metrics. I can actually persist those metrics and can more aggressively throw my logs away.
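Richard's counter-instead-of-a-log-line pattern is easy to sketch. This is a toy illustration of the aggregation idea, not the Prometheus client library; in a real service you would use an instrumentation library's counter type, but the principle—repeated events collapsing into one number—is the same:

```python
from collections import Counter

# Toy sketch of "just up my counter by one": instead of emitting a
# timestamped log line for every HTTP 500, aggregate one counter per
# status code. (A real service would use a metrics library's counter;
# this only demonstrates the data reduction.)
status_counts = Counter()

def record_response(status_code: int) -> None:
    # One integer increment replaces an entire log line.
    status_counts[status_code] += 1

# Simulate a burst of traffic: 10,000 requests, 1 in 50 of them an error.
for i in range(10_000):
    record_response(500 if i % 50 == 0 else 200)

print(status_counts[200], status_counts[500])  # 9800 200
```

Ten thousand would-be log lines become two integers, which is the shape of the 99.8% reduction he describes.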
But also, I have this transition made a lot easier where I don't have this huge lift, where this day in three months is to be the cutover and we're going to release the new version of this and that software and it's not going to have that, it's going to have 80% fewer logs and everything will be great—and then you missed the first maintenance window, or someone is ill, or what have you, and then the next Big Friday is coming so you can't actually deploy there. I mean Black Friday. But we can also talk about deploying on Fridays.

But the thing is, you have this huge thing, whereas if you have this as a continuous improvement process, I can just look at: this is the log which is coming out. I turn this into a number, I start emitting metrics directly, and I see that those numbers match. And so, I can just start—I build new stuff, I put it into a new data format, I actually emit the new data format directly from my code instrumentation, and only then do I start removing the instrumentation for the logs. And that allows me to, with full confidence, with psychological safety, just move a lot more quickly, deliver much more quickly, and also cut down on my costs more quickly because I'm just using more efficient data types.

Corey: I really want to thank you for spending as much time as you have. If people want to learn more about how you view the world and figure out what other personal attacks they can throw your way, where's the best place for them to find you?

Richard: Personal attacks, probably Twitter. It's, like, the go-to place for this kind of thing. For actually tracking, I stopped maintaining my own website. Maybe I'll do so again, but if you go on github.com/ritchieh/talks, you'll find a reasonably up-to-date list of all the talks, interviews, presentations, panels, what have you, which I did over the last whatever amount of time. [laugh].

Corey: And we will, of course, put links to that in the [show notes 00:41:23]. Thanks again for your time.
It's always appreciated.

Richard: And thank you.

Corey: Richard Hartmann, Director of Community at Grafana Labs. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment. And then when someone else comes along with an insulting comment they want to add, we'll just increment the counter by one.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Raising Awareness on Cloud-Native Threats with Michael Clark

Play Episode Listen Later Oct 13, 2022 38:44


About Michael

Michael is the Director of Threat Research at Sysdig, managing a team of experts tasked with discovering and defending against novel security threats. Michael has more than 20 years of industry experience in many different roles, including incident response, threat intelligence, offensive security research, and software development at companies like Rapid7, ThreatQuotient, and Mantech. Prior to joining Sysdig, Michael worked as a Gartner analyst, advising enterprise clients on security operations topics.

Links Referenced:
Sysdig: https://sysdig.com/
"2022 Sysdig Cloud-Native Threat Report": https://sysdig.com/threatreport

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Something interesting about this particular promoted guest episode that is brought to us by our friends at Sysdig is that when they reached out to set this up, one of the first things out of their mouth was, "We don't want to sell anything," which is novel. And I said, "Tell me more," because I was also slightly skeptical. But based upon the conversations that I've had, and what I've seen, they were being honest. So, my guest today—surprising as though it may be—is Mike Clark, Director of Threat Research at Sysdig. Mike, how are you doing?

Michael: I'm doing great. Thanks for having me. How are you doing?

Corey: Not dead yet. So, we take what we can get sometimes.
You folks have just come out with the "2022 Sysdig Cloud-Native Threat Report", which on the one hand feels like kind of a wordy title; on the other hand, it actually encompasses everything that it is, and you need every single word of that report. At a very high level, what is that thing?

Michael: Sure. So, this is our first threat report we've ever done, and it's kind of a rite of passage, I think, for any security company in the space; you have to have a threat report. And the cloud-native part: Sysdig specializes in cloud and containers, so we really wanted to focus in on those areas when we were making this threat report, which talks about, you know, some of the common threats and attacks we were seeing over the past year, and we just wanted to let people know what they are and how to protect themselves.

Corey: One thing that I've found about a variety of threat reports is that they tend to excel at living in the fear, uncertainty, and doubt space. And invariably, they paint a very dire picture of the internet about to come cascading down. And then at the end, there's always a, "But there is hope. Click here to set up a meeting with us." It's basically a very thinly-veiled cover around what is fundamentally a fear, uncertainty, and doubt-driven marketing strategy, and then it tries to turn into a sales pitch.

This does absolutely none of that. So, I have to ask, did you set out to intentionally make something that added value in that way and contributed to the body of knowledge, or is it because it's your inaugural report; you didn't realize you were supposed to turn it into a terrible sales pitch?

Michael: We definitely went into that on purpose. There's a lot of ways to fix things, especially these days with all the different technologies, so we can easily talk about the solutions without going into specific products. And that's kind of the way we went about it. There's a lot of ways to fix each of the things we mentioned in the report.
And hopefully, the person reading it finds a good way to do it.

Corey: I'd like to unpack a fair bit of what's in the report. And let's be clear, I don't intend to read this report into a microphone; that is generally not a great way of conveying information, I have found. But I want to highlight a few things that leapt out to me that I find interesting. Before I do that, I'm curious to know: most people who write reports, especially ones of this quality, are not sitting there cogitating in their office by themselves, setting pen to paper and emerging four days later with the finished treatise. There's a team involved; there's more than one person that weighs in. Who was behind this?

Michael: Yeah, it was a pretty big team effort across several departments. But mostly, it came down to the Sysdig threat research team. It's about ten people right now. It's grown quite a bit through the past year. And, you know, it's made up of all sorts of backgrounds and expertise.

So, we have machine learning people, data scientists, data engineers, former pen-testers and red team, a lot of blue team people, people from the NSA, people from other government agencies as well. And we're also a global research team, so we have people in Europe and North America working on all of this. So, we try to get perspectives on how these threats are viewed by multiple areas, not just Silicon Valley, and express fixes that appeal to them, too.

Corey: Your executive summary on this report starts off with a cloud adversary analysis of TeamTNT. And my initial throwaway joke on that was going to be, "Oh, when you start off talking about any entity that isn't you folks, they must have gotten the platinum sponsorship package." But then I read the rest of that paragraph and I realized that, wait a minute, this is actually interesting and germane to something that I see an awful lot.
Specifically, they are—and please correct me if I'm wrong on any of this; you are definitionally the expert whereas I am, obviously, the peanut gallery—but you talk about TeamTNT as being a threat actor that focuses on targeting the cloud via cryptojacking, which is a fanciful word for, "Okay, I've gotten access to your cloud environment; what am I going to do with it? Mine Bitcoin and other various cryptocurrencies." Is that generally accurate or have I missed the boat somewhere fierce on that? Which is entirely possible.

Michael: That's pretty accurate. We also think it's just one person, actually, and they are very prolific. So, it was pretty hard for them to get that platinum sponsorship package because they are everywhere. And even though it's one person, they can do a lot of damage; especially with all the automation people can make now, one person can appear like a dozen.

Corey: There was an old t-shirt that basically encompassed everything that was wrong with the culture of the sysadmin world back in the aughts, that said, "Go away, or I will replace you with a very small shell script." But, on some level, you can get a surprising amount of work done on computers, just with things like for loops and whatnot. What I found interesting was that you have put numbers and data behind something that I've always taken for granted and just implicitly assumed that everyone knew. This is a common failure mode that we all have. We all have blind spots where we assume the things that we spend our time on are easy, and the stuff that other people are good at and you're not good at, those are the hard things.

It has always been intuitively obvious to me as a cloud economist that when you wind up spending $10,000 in cloud resources to mine cryptocurrency, it does not generate $10,000 of cryptocurrency on the other end. In fact, the line I've been using for years is that it's totally economical to mine Bitcoin in the cloud; the only trick is you have to do it in someone else's account.
And you've taken that joke and turned it into data. Something that you found was that in one case, you were able to attribute $8,100 of cryptocurrency that was generated by stealing $430,000 of cloud resources to do it. And oh, my God, we now have a number and a ratio, and I can talk intelligently and sound four times smarter. So, ignoring anything else in this entire report, congratulations, you have successfully turned this into what is beginning to become a talking point of mine. Value unlocked. Good work. Tell me more.

Michael: Oh, thank you. Cryptomining is kind of like viruses in the old on-prem environment. Normally it's just cleaned up and never thought of again; the antivirus software does its thing, life goes on. And I think cryptominers are kind of treated like that. Oh, there's a miner; let's rebuild the instance or bring a new container online or something like that.

So, it's often considered a nuisance rather than a serious threat. It also doesn't have the, you know, the dangerous ransomware connotation to it. So, a lot of people generally just think of it as a nuisance, as I said. So, what we wanted to show was, it's not really a nuisance and it can cost you a lot of money if you don't take it seriously. And what we found was for every dollar that they make, it costs you $53. And, you know, as you mentioned, it really puts it into view of what it could cost you by not taking it seriously. And that number can scale very quickly, just like your cloud environment can scale very quickly.

Corey: They say this cloud scales infinitely and that is not true. First, tried it; didn't work. Secondly, it scales, but there is an inherent limit, which is your budget, on some level. I promise they can add hard drives to S3 faster than you can stuff data into it. I've checked.

One thing that I've seen recently was—speaking of S3—I had someone reach out in what I will charitably refer to as a blind panic because they were using AWS to do something.
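The ratio Michael cites can be sanity-checked against the figures Corey quotes from the report: $8,100 of cryptocurrency attributed to $430,000 of stolen cloud resources.

```python
# Sanity-check the report's cost ratio: attacker revenue vs. victim cost.
# Both figures are the ones quoted in this conversation.
attacker_revenue = 8_100     # dollars of cryptocurrency mined
victim_cloud_bill = 430_000  # dollars of cloud resources consumed

cost_per_dollar_mined = victim_cloud_bill / attacker_revenue
print(round(cost_per_dollar_mined))  # 53
```

That is the "$53 for every dollar they make" number, give or take rounding—the attacker keeps the $8,100 and the victim eats the bill.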
Their bill was largely $4 a month in S3 charges. Very reasonable. That carries us surprisingly far. And then they had a credential leak and they had a threat actor spin up Lambda functions in all of the regions, and it went from $4 a month to $60,000 a day, and it wasn't caught for six days.

And then AWS, as they tend to do, very straight-faced, says, "Yeah, we would like our $360,000, please." At which point, people start panicking because a lot of the people who experience this are not themselves sophisticated customers; they're students, they're learning how this stuff works. And when I'm paying $4 a month for something, it is logical and intuitive for me to think that, well, if I wind up being sloppy with my credentials, they could run that bill up to possibly $25 a month, and that wouldn't be great, so I should keep an eye on it. Yeah, you dropped a whole bunch of zeros off the end of that. Here you go. And as AWS spins up more and more regions and as they spin up more and more services, the ability to exploit this becomes greater and greater. This problem is not getting better, it is only getting worse, by a lot.

Michael: Oh, yeah, absolutely. And I feel really bad for those students who do have that happen to them. I've heard on occasion that the cloud providers will forgive some debts from breaches, but there's no guarantee of that happening. And you know, the more that breaches happen, the less likely they are to forgive it because they still have to pay for it; someone's paying for it in the end. And if you don't improve and fix your environment and it keeps happening, one day, they're just going to stick you with the bill.
I don't have intimate visibility into it and of course, they have a threat model themselves of, okay, I'm going to spin up a bunch of stuff, mine cryptocurrency for a month—cry and scream and pretend I got hacked because fraud is very much a thing, there is a financial incentive attached to this—and they mostly seem to get it right. But the danger that I see for the cloud provider is not that they're going to stop being nice and giving money away, but assume you're a student who just winds up getting more than your entire college tuition as a surprise bill for this month from a cloud provider. Even assuming at the end of that everything gets wiped and you don't owe anything. I don't know about you, but I've never used that cloud provider again because I've just gotten a firsthand lesson in exactly what those risks are, it's bad for the brand.Michael: Yeah, it really does scare people off of that. Now, some cloud providers try to offer more proactive protections against this, try to shut down instances really quick. And you know, you can take advantage of limits and other things, but they don't make that really easy to do. And setting those up is critical for everybody.Corey: The one cloud provider that I've seen get this right, of all things, has been Oracle Cloud, where they have an always free tier. Until you affirmatively upgrade your account to chargeable, they will not charge you a penny. And I have experimented with this extensively, and they're right, they will not charge you a penny. They do have warnings plastered on the site, as they should, that until you upgrade your account, do understand that if you exceed a threshold, we will stop serving traffic, we will stop servicing your workload. And yeah, for a student learner, that's absolutely what I want. For a big enterprise gearing up for a giant Superbowl commercial or whatnot, it's, “Yeah, don't care what it costs, just make sure you continue serving traffic. 
We don't get a redo on this." And without understanding exactly which profile a given customer falls into, whenever the cloud provider tries to make an assumption and a default in either direction, they're wrong.

Michael: Yeah, I'm surprised it's Oracle Cloud, of all clouds. It's good to hear that they actually have a free tier. Now, we've seen attackers use free tiers quite a bit. It all depends on how people set it up. And it's actually a little outside the threat report, but the CI/CD pipelines in DevOps—anywhere there's free compute, attackers will try to get their miners in because it's all about scale and not quality.

Corey: Well, that is something I'd be curious to know. Because you talk about focusing specifically on cloud and containers as a company, which puts you in a position to be authoritative on this. That Lambda story that I mentioned, about a surprise $60,000 a day in cryptomining—what struck me about that and caught me by surprise was not what I think would catch most people who don't swim in this world by surprise of, "You can spend that much?" In my case, what I'm wondering about is, well, hang on a minute. I did an article a year or two ago, "17 Ways to Run Containers On AWS," and listed 17 AWS services that you could use to run containers.

And a few months later, I wrote another article called "17 More Ways to Run Containers On AWS." And people thought I was belaboring the point and making a silly joke, and on some level, of course I was. But I was also highlighting very clearly that every one of those containers running in a service could be mining cryptocurrency.
So, if you get access to someone else's AWS account, when you see those breaches happen, are people using just the one or two services they have things ready to go for, or are they proliferating as many containers as they can through every service that borderline supports it?

Michael: From what we've seen, they usually just go after compute, like EC2 for example, as it's most well understood, it gets the job done, it's very easy to use, and then they get their miner set up. So, if they happen to compromise your credentials—versus the other method cryptominers or cryptojackers use, which is exploitation—they'll try to spread throughout all the EC2 they can and spin up as much as they can. But the other interesting thing is if they get into your system, maybe via an exploit or some other misconfiguration, they'll look for the IAM metadata service as soon as they get in, to try to get your IAM credentials and see if they can leverage them to also spin up things through the API. So, they'll spin up on the thing they compromised and then actively look for other ways to get even more.

Corey: Restricting the permissions that anything has in your cloud environment is important. I mean, from my perspective, if I were to have my account breached, yes, they're going to cost me a giant pile of money, but I know the magic incantations to say to AWS and worst case, everyone has a pet or something they don't want to see unfortunate things happen to, so they'll waive my fee; that's fine. The bigger concern I've got—in seriousness—I think most companies do is the data. It is the access to things in the account. In my case, I have a number of my clients' AWS bills, given that that is what they pay me to work on.

And I'm not trying to undersell the value of security here, but on the plus side that helps me sleep at night, that's only money. There are datasets that are far more damaging and valuable than that.
The worst sleep I ever had in my career came during a very brief stint I had about 12 years ago when I was the director of TechOps at Grindr, the gay dating site. In that scenario, if that data had been breached, people could very well have died. They live in countries where that winds up not being something that is allowed, or their family now winds up shunning them, and whatnot. And that's the stuff that keeps me up at night. Compared to that, it's, "Well, you cost us some money and embarrassed a company." It doesn't really rank on the same scale to me.

Michael: Yeah. I guess the interesting part is, data requires a lot of work to do something with, for a lot of attackers. They may be opportunistic and come across interesting data, but they need to do something with it; there's a lot more risk once they start trying to sell the data, or, like you said, if it turns into something very unfortunate, then there's a lot more risk from law enforcement coming after them. Whereas with cryptomining, there's very little risk of being chased down by the authorities. Like you said, people rebuild things and ask AWS for credit, or whoever, and move on with their lives. So, that's one reason I think cryptomining is so popular among threat actors right now. It's just the low risk compared to other ways of doing things.

Corey: It feels like it's a nuisance. One thing that I was dreading when I got this copy of the report was that there was going to be what I see so often, which is, "Let's talk about ransomware in the cloud," where people talk about encrypting data in S3 buckets and sneakily polluting the backups that go into different accounts and how you're air-gapping and the rest. And I don't see that in the wild.
I see that in the fear-driven marketing from companies that have a thing that they say will fix that, but in practice, when you hear about ransomware attacks, it's much more frequently that it is their corporate network, it is on-premises environments, it is servers—perhaps running in AWS, but being treated like servers would be on-prem—and that is what winds up getting encrypted. I just don't see the attacks that everyone is warning about. But again, I am not primarily in the security space. What do you see in that area?

Michael: You're absolutely right. We don't see that at all, either. It's certainly theoretically possible and it may have happened, but there just doesn't seem to be that appetite to do that. Now, the reasoning? I'm not a hundred percent sure why, but I think it's easier to make money with cryptomining, even with the crypto markets the way they are. It's essentially free money, no expenses on your part.

So, maybe they're not looking because, again, that requires more effort—to understand, especially if it's not targeted, what data is important. And then it's not exactly the same method to do the attack. There's versioning, there's all these other hoops you have to jump through to do an extortion attack with buckets and things like that.

Corey: Oh, it's high risk and feels dirty, too. Whereas if you're just, I guess, on some level, psychologically, if you're just going to spin up a bunch of coin mining somewhere and then some company finds it and turns it off, whatever. You're not, as in some cases, shaking down a children's hospital. Like, that's one of those great—I can't imagine how you deal with that as a human being, but I guess it takes all types. This gets us to sort of the second tentpole of the report that you've put together, specifically around the idea of supply chain attacks against containers.
There have been such a tremendous number of think pieces—thought pieces, whatever they're called these days—talking about a software bill of materials and supply chain threats. Break it down for me. What are you seeing?

Michael: Sure. So, containers are very fun because, you know, you can define things as code about what gets put on them, and they've become so popular that sharing sites have popped up, like Docker Hub and other public registries, where you can easily share your container; it has everything built, set up, so other people can use it. But you know, attackers have kind of taken notice of this, too. Where anything's easy, an attacker will be. So, we've seen a lot of malicious containers be uploaded to these systems.

A lot of times, they're just hoping for a developer or user to come along and use them. Because Docker Hub does have the official designation, while they can try to pretend to be like Ubuntu, they won't be the official one. But instead, they may try to seed theirs in links and things like that to entice people to use theirs instead. And then when they do, it's already pre-loaded with a miner or, you know, other malware. So, we see quite a bit of these containers in Docker Hub. And they're disguised as many different popular packages.

They don't stand up to too much scrutiny, but enough that, you know, a casual looker, even at the Dockerfile, may not see it. So yeah, we see a lot of these—and embedded credentials are another big part of what we see in these containers. That could be an organizational issue, like just a leaked credential, but you can put malicious credentials into Dockerfiles, too—like, say, an SSH private key that, you know, if they start this up, the attacker can now just log—SSH in. Or other API keys, or other AWS-changing commands you can put in there. You can put really anything in there, and wherever you load it, it's going to run.
So, you have to be really careful.

[midroll 00:22:15]

Corey: Years ago, I gave a talk on the conference circuit called "Terrible Ideas in Git" that purported to teach people how to use Git through hilarious examples of misadventure. And the demos that I did on that were, well, this was fun and great, but it was really annoying resetting them every time I gave the talk, so I stuffed them all into a Docker image and then pushed that up to Docker Hub. Great. It was awesome. I didn't publicize it and talk about it, but I also just left it as an open repository there because what are you going to do? It's just a few directories in the root that have very specific contrived scenarios with Git, set up and ready to go.

There's nothing sensitive there. And the thing is called "Terrible Ideas." And I just kept watching the download numbers continue to increment week over week, and I took it down because it's, I don't know what people are going to do with that. Like, you see something on there and it says "Terrible Ideas." For all I know, some bank is like, "And that's what we're running in production now." So, who knows?

But the idea of—not that there was necessarily anything wrong with that, but the fact that there's this theoretical possibility someone could use that, or put the wrong string in if I give an example, and then wind up running something that is fairly compromisable in a serious environment, was just something I didn't want to be a part of. And you see that again, and again, and again. This idea of what Docker unlocks is amazing, but there's such a tremendous risk to it. I mean, I've never understood—15 years ago, how you're going to go and spin up a Linux server on top of EC2 and just grab a community AMI and use that? It's, yeah, I used to take provisioning hardware very seriously to make sure that I wasn't inadvertently using something compromised.
Here, it's like, “Oh, just grab whatever seems plausible from the catalog and go ahead and run that.” But it feels like there's so much of that, turtles all the way down.Michael: Yeah. And I mean, even if you've looked at the Dockerfile, with all the dependencies of the things you download, it really gets to be difficult. So, I mean, to protect yourself, it really becomes about, like, you know, you can do the static scanning of it, looking for bad strings in it or bad version numbers for vulnerabilities, but it really comes down to runtime analysis. So, when you start a Docker container, you really need the tools to have visibility into what's going on in the container. That's the only real way to know if it's safe or not in the end because you can't eyeball it and really see all that, and there could be binaries in an assortment of layers, too, that'll get run and things like that.Corey: Hell is other people's workflows, as I'm sure everyone's experienced themselves, but one of mine has always been that if I'm doing something as a proof of concept to build it up on a developer box—and I do keep my developer environments for these sorts of things isolated—I will absolutely go and grab something that is plausible-looking from Docker Hub as I go down that process. But when it comes time to wind up putting it into a production environment, okay, now we're going to build our own resources. Yeah, I'm sure the Postgres container or whatever it is that you're using is probably fine, but just so I can sleep at night, I'm going to take the public Dockerfile they have, and I'm going to go ahead and build that myself. And I feel better about doing that rather than trusting some rando user out there and whatever it is that they've put up there. 
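The static scanning Michael mentions—looking for bad strings before anything runs—can be sketched as a simple pass over a Dockerfile. This is a hypothetical illustration only (the patterns and function names are invented, and this is not Sysdig's tooling); as Michael notes, static checks like this complement rather than replace runtime analysis:

```python
import re

# Patterns that commonly indicate embedded secrets in a Dockerfile or
# build context; this list is illustrative, not exhaustive.
SUSPICIOUS_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),  # private keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                                   # AWS access key IDs
    re.compile(r"(?i)(?:password|secret|token)\s*=\s*\S+"),            # hard-coded credentials
]

def scan_dockerfile(text: str) -> list:
    """Return (line_number, line) pairs that match any suspicious pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SUSPICIOUS_PATTERNS):
            findings.append((lineno, line.strip()))
    return findings

dockerfile = """FROM ubuntu:22.04
ENV DB_PASSWORD=hunter2
RUN echo '-----BEGIN OPENSSH PRIVATE KEY-----' >> /root/.ssh/id_ed25519
"""
for lineno, line in scan_dockerfile(dockerfile):
    print(f"line {lineno}: {line}")
```

A scanner like this catches only the careless cases; an attacker who downloads a payload at build time or hides it in a base layer leaves nothing for the regexes to find, which is exactly why runtime visibility matters.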
Which on the one hand feels like a somewhat responsible thing to do, but on the other, it feels like I'm only fooling myself because some rando putting things up there is kind of what the entire open-source world is, to a point.Michael: Yeah, that's very true. At some point, you have to trust some product or some foundation to have done the right thing. But what's also true about containers is they're attacked and used for attacks, but they're also used to conduct attacks quite a bit. And we saw a lot of that with the Russian-Ukrainian conflict this year. Containers were released that were preloaded with denial-of-service software that automatically collected target lists from, I think, GitHub, where they were hosted.So, all a user had to do to get involved was really just get the container and run it. That's it. And now they're participating in this cyberwar kind of activity. And they could also use this to put on a botnet or if they compromise an organization, they could spin up all these instances with that Docker container on it. And now that company is implicated in that cyber war. So, they can also be used for evil.Corey: This gets to the third point of your report: “Geopolitical conflict influences attacker behaviors.” Something that happened in the early days of the Russian invasion was that a bunch of open-source maintainers would wind up either disabling what their software did or subverting it into something actively harmful if it detected it was running in the Russian language and/or in a Russian timezone. And I understand the desire to do that, truly I do. I am no Russian apologist. 
Let's be clear.But the counterpoint to that as well is that, well, to make a reference I made earlier, Russia has children's hospitals, too, and you don't necessarily know the impact of fallout like that, not to mention that you have completely made it untenable to use anything you're doing for a regulated industry or anyone else who gets caught in that and discovers that is now in their production environment. It really sets a lot of stuff back. I've never been a believer in that particular form of vigilantism, for lack of a better term. I'm not sure that I have a better answer, let's be clear. I just, I always knew that, on some level, the risks of opening that Pandora's box were significant.Michael: Yeah. Even if you're doing it for the right reasons. It still erodes trust.Corey: Yeah.Michael: Especially it erodes trust throughout open-source. Like, not just the one project because you'll start thinking, “Oh, how many other projects might do this?” And—Corey: Wait, maybe those dirty hippies did something in our—like, I don't know, they've let those people anywhere near this operating system Linux thing that we use? I don't think they would have done that. Red Hat seems trustworthy and reliable. And it's yo, [laugh] someone needs to crack open a history book, on some level. It's a sticky situation.I do want to call out something here because it might be easy to get the wrong idea from the summary that we just gave. Very few things wind up raising my hackles quite like companies using tragedy to wind up shilling whatever it is they're trying to sell. And I'll admit when I first got this report, and I saw, “Oh, you're talking about geopolitical conflict, great.” I'm not super proud of this, but I was prepared to read you the riot act, more or less when I inevitably got to that. And I never did. Nothing in this entire report even hints in that direction.Michael: Was it you never got to it, or, uh—Corey: Oh, no. I've read the whole thing, let's be clear. 
You're not using that to sell things in the way that I was afraid you were. And simultaneously I want to say—I want to just point that out because that is laudable. At the same time, I am deeply and bitterly resentful that that even is laudable. That should be the common state.Capitalizing on tragedy is just not something that ever leaves any customer feeling good about one of their vendors, and you've stayed away from that. I just want to call that out as doing the right thing.Michael: Thank you. Yeah, it was actually a big topic about how we should broach this. But we have a good data point: right after it started, there was a huge spike in denial-of-service installs. And we have a bunch of data collection technology, honeypots and other things, and the day after, we saw cryptomining start going down and denial-of-service installs start going up. So, it was just interesting how that community changed their behaviors, at least for a time, to participate in whatever you want to call it, the hacktivism.Over time, though, it kind of has gone back to the norm where maybe they've gotten bored or something or, you know, run out of funds, but they're starting cryptomining again. But these events can cause big changes in the hacktivism community. And like I mentioned, it's very easy to get involved. We saw over 150,000 downloads of those pre-canned denial-of-service containers, so it's definitely something that a lot of people participated in.
And one thing I do want to call out here that I think often gets overlooked in the larger ecosystem and industry as a whole is, “Well, no one's going to bother to hack my nonsense. I don't have anything interesting for them to look at.”And it's, on some level, an awful lot of people running tools like this aren't sophisticated enough themselves to determine that. And combined with your first point in the report as well that, well, you have an AWS account, don't you? Congratulations. You suddenly have enormous piles of money—from their perspective—sitting there relatively unguarded. Yay. Security has now become everyone's problem, once again.Michael: Right. And it's just easier now. It means, it was always everyone's problem, but now it's even easier for attackers to leverage almost everybody. Like before, you had to get something on your PC. You had to download something. Now, your search of GitHub can find API keys, and then that's it, you know? Things like that will make it game over or your account gets compromised and big bills get run up. And yeah, it's very easy for all that to happen.Corey: Ugh. I do want to ask at some point, and I know you asked me not to do it, but I'm going to do it anyway because I have this sneaking suspicion that given that you've spent this much time on studying this problem space, that you probably, as a company, have some answers around how to address the pain that lives in these problems. What exactly, at a high level, is it that Sysdig does? Like, how would you describe that in an elevator without sabotaging the elevator for 45 minutes to explain it in depth to someone?Michael: So, I would describe it as threat detection and response for cloud containers and workloads in general. And all the other kind of acronyms for cloud, like CSPM, CIEM.Corey: They're inventing new and exciting acronyms all the time. 
And honestly, at this point, I want to have almost an acronym challenge of, “Is this a cybersecurity acronym or is it an audio cable? Which is it?” Because it winds up going down that path, super easily. I was at RSA walking the expo floor and I counted, I think, 15 different companies pitching XDR, without a single one bothering to explain what that meant. Okay, I guess it's just the thing we've all decided we need. It feels like security people selling to security people, on some level.Michael: I was a Gartner analyst.Corey: Yeah. Oh… that would do it then. Terrific. So, it's partially your fault, then?Michael: No. I was going to say, I don't know what it means either.Corey: Yeah.Michael: So, I have no idea [laugh]. I couldn't tell you.Corey: I'm only half kidding when I say in many cases, from the vendor perspective, it seems like what it means is whatever it is they're trying to shoehorn the thing that they built into filling. It's kind of like observability. Observability means what we've been doing for ten years already, just repurposed to catch the next hype wave.Michael: Yeah. The only thing I really understand is detection and response, which is very clear: detect things and respond to things. So, that's a lot of what we do.Corey: It's got to beat the default detection mechanism for an awful lot of companies who in years past have found out that they have gotten breached in the headline of The New York Times. Like it's always fun when that, “Wait, what? What? That's u—what? How did we not know this was coming?”It's when a third party tells you that you've been breached, it's never as positive—not that it's a positive experience anyway—as discovering it yourself internally. 
And this stuff is complicated, the entire space is fraught, and it always feels like no matter how far you go, you could always go further, but left to its inevitable conclusion, you'll burn through the entire company budget purely on security without advancing the other things the company does.Michael: Yeah.Corey: It's a balance.Michael: It's tough because there's a lot to know in the security discipline, so you have to balance how much you're spending and how much your people actually know and can use the things you've spent money on.Corey: I really want to thank you for taking the time to go through the findings of the report for me. I had skimmed it before we spoke, but talking to you about this in significantly more depth, every time I start going to cite something from it, I find myself coming away more impressed. This is now actively going on my calendar to see what the 2023 version looks like. Congratulations, you've gotten me hooked. If people want to download a copy of the report for themselves, where should they go to do that?Michael: They could just go to sysdig.com/threatreport. There's no email blocking or gating, so you just download it.Corey: I'm sure someone in your marketing team is twitching at that. Like, why can't we wind up using this as a lead magnet? But ugh. I look at this and my default is, oh, wow, you definitely understand your target market. Because we all hate that stuff. Every mandatory field you put on those things makes it less likely I'm going to download something here. Click it and have a copy that's awesome.Michael: Yep. And thank you for having me. It's a lot of fun.Corey: No, thank you for coming. Thanks for taking so much time to go through this, and thanks for keeping it to the high road, which I did not expect to discover because no one ever seems to. Thanks again for your time. I really appreciate it.Michael: Thanks. Have a great day.Corey: Mike Clark, Director of Threat Research at Sysdig. 
I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment pointing out that I didn't disclose the biggest security risk of all to your AWS bill, an AWS Solutions Architect who is working on commission.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Dynamic Configuration Through AWS AppConfig with Steve Rice

Play Episode Listen Later Oct 11, 2022 35:54


About Steve:Steve Rice is Principal Product Manager for AWS AppConfig. He is surprisingly passionate about feature flags and continuous configuration. He lives in the Washington DC area with his wife, 3 kids, and 2 incontinent dogs.Links Referenced:AWS AppConfig: https://go.aws/awsappconfig TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your Features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH. Basically you're SSHing the same way you manage access to your app. 
What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive? Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This is a promoted guest episode. What does that mean? Well, it means that some people don't just want me to sit here and throw slings and arrows their way, they would prefer to send me a guest specifically, and they do pay for that privilege, which I appreciate. Paying me is absolutely a behavior I wish to endorse.Today's victim who has decided to contribute to slash sponsor my ongoing ridiculous nonsense is, of all companies, AWS. And today I'm talking to Steve Rice, who's the principal product manager on AWS AppConfig. Steve, thank you for joining me.Steve: Hey, Corey, great to see you. Thanks for having me. Looking forward to a conversation.Corey: As am I. Now, AppConfig does something super interesting, which I'm not aware of any other service or sub-service doing. You are under the umbrella of AWS Systems Manager, but you're not going to market with Systems Manager AppConfig. You're just AWS AppConfig. Why?Steve: So, AppConfig is part of AWS Systems Manager. Systems Manager has, I think, 17 different features associated with it. Some of them have an individual name that is associated with Systems Manager, some of them don't. We just happen to be one that doesn't. AppConfig is a service that's been around for a while internally before it was launched externally a couple years ago, so I'd say that's probably the origin of the name and the service. I can tell you more about the origin of the service if you're curious.Corey: Oh, I absolutely am. 
But I just want to take a bit of a detour here and point out that I make fun of the sub-service names in Systems Manager an awful lot, like Systems Manager Session Manager and Systems Manager Change Manager. And part of the reason I do that is not just because it's funny, but because almost everything I found so far within the Systems Manager umbrella is pretty awesome. It aligns with how I tend to think about the world in a bunch of different ways. I have yet to see anything lurking within the Systems Manager umbrella that has led to a tee-hee-hee bill surprise level that rivals, you know, the GDP of Guam. So, I'm a big fan of the entire suite of services. But yes, how did AppConfig get its name?Steve: [laugh]. So, AppConfig started about six years ago, now, internally. So, we actually were part of the region services department inside of Amazon, which is in charge of launching new services around the world. We found that a centralized tool for configuration associated with each service launching was really helpful. So, a service might be launching in a new region and have to enable and disable things as it moved along.And so, the tool was sort of built for that, turning on and off things as the region developed and was ready to launch publicly; then the regions launch publicly. It turned out that our internal customers, which are a lot of AWS services and then some Amazon services as well, started to use us beyond launching new regions, and started to use us for feature flagging. Again, turning on and off capabilities, launching things safely. And so, it became massively popular; we were actually a top 30 service internally in terms of usage. And two years ago, we thought we really should launch this externally and let our customers benefit from some of the goodness that we put in there, and some of—those all come from the mistakes we've made internally. And so, it became AppConfig. 
In terms of the name itself, we specialize in application configuration, so that's kind of a mouthful, so we just changed it to AppConfig.Corey: Earlier this year, there was a vulnerability reported around I believe it was AWS Glue, but please don't quote me on that. And as part of its excellent response that AWS put out, they said that from the time that it was disclosed to them, they had patched the service and rolled it out to every AWS region in which Glue existed in a little under 29 hours, which at scale is absolutely magic fast. That is superhero speed and then some because you generally don't just throw something over the wall, regardless of how small it is when we're talking about something at the scale of AWS. I mean, look at who your customers are; mistakes will show. This also got me thinking that when you have Adam, or previously Andy, on stage giving a keynote announcement and then they mention something on stage, like, “Congratulations. It's now a very complicated service with 14 adjectives in its name because someone's paid by the syllable. Great.”Suddenly, the marketing pages are up, the APIs are working, it's showing up in the console, and it occurs to me only somewhat recently to think about all of the moving parts that go on behind this. That is far faster than even the improved speed of CloudFront distribution updates. There's very clearly something going on there. So, I've got to ask, is that you?Steve: Yes, a lot of that is us. I can't take credit for a hundred percent of what you're talking about, but that's how we are used. We're essentially used as a feature-flagging service. And I can talk generically about feature flagging. Feature flagging allows you to push code out to production, but it's hidden behind a configuration switch: a feature toggle or a feature flag. And that code can be sitting out there, nobody can access it until somebody flips that toggle. Now, the smart way to do it is to flip that toggle on for a small set of users. 
Maybe it's just internal users, maybe it's 1% of your users. And so, the features available, you can—Corey: It's your best slash worst customers [laugh] in that 1%, in some cases.Steve: Yeah, you want to stress test the system with them and you want to be able to look and see what's going to break before it breaks for everybody. So, you release us to a small cohort, you measure your operations, you measure your application health, you measure your reputational concerns, and then if everything goes well, then you maybe bump it up to 2%, and then 10%, and then 20%. So, feature flags allow you to slowly release features, and you know what you're releasing by the time it's at a hundred percent. It's tempting for teams to want to, like, have everybody access it at the same time; you've been working hard on this feature for a long time. But again, that's kind of an anti-pattern. You want to make sure that on production, it behaves the way you expect it to behave.Corey: I have to ask what is the fundamental difference between feature flags and/or dynamic configuration. Because to my mind, one of them is a means of achieving the other, but I could also see very easily using the terms interchangeably. Given that in some of our conversations, you have corrected me which, first, how dare you? Secondly, okay, there's probably a reason here. What is that point of distinction?Steve: Yeah. Typically for those that are not eat, sleep, and breathing dynamic configuration—which I do—and most people are not obsessed with this kind of thing, feature flags is kind of a shorthand for dynamic configuration. It allows you to turn on and off things without pushing out any new code. 
So, your application code's running, it's pulling its configuration data, say every five seconds, every ten seconds, something like that, and when that configuration data changes, then that app changes its behavior, again, without a code push or without restarting the app.So, dynamic configuration is maybe a superset of feature flags. Typically, when people think feature flags, they're thinking of, “Oh, I'm going to release a new feature, so it's almost like an on-off switch.” But we see customers using feature flags—and we use this internally—for things like throttling limits. Let's say you want to be able to throttle TPS, transactions per second. Or let's say you want to throttle the number of simultaneous background tasks, and say, you know, I just really don't want this creeping above 50; bad things can start to happen.But in a period of stress, you might want to actually bring that number down. Well, you can push out these changes with dynamic configuration—which is, again, any type of configuration, not just an on-off switch—you can push this out and adjust the behavior and see what happens. Again, I'd recommend pushing it out to 1% of your users, and then 10%. But it allows you to have these dials and switches to do that. And, again, generically, that's dynamic configuration. It's not as fun a term as feature flags; feature flags is sort of a good mental picture, so I do use them interchangeably, but if you're really into the whole world of this dynamic configuration, then you probably will care about the difference.
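The polling pattern Steve describes—re-read the configuration data on an interval and change behavior without a restart—can be sketched like this. The class, the JSON shape, and the throttle key are all invented for illustration; this is not AppConfig's API:

```python
import json
import time

class DynamicConfig:
    """Poll a configuration source on an interval; callers always read the
    most recently fetched values, with no restart needed when they change."""

    def __init__(self, fetch, interval_seconds=5.0):
        self._fetch = fetch              # callable returning the raw config JSON
        self._interval = interval_seconds
        self._last_poll = None
        self._values = {}

    def get(self, key, default=None):
        now = time.monotonic()
        if self._last_poll is None or now - self._last_poll >= self._interval:
            self._values = json.loads(self._fetch())
            self._last_poll = now
        return self._values.get(key, default)

# Simulated config source: an operator dials max_background_tasks from 50
# down to 10 under stress, and the running app picks it up on its next poll.
source = {"doc": json.dumps({"max_background_tasks": 50})}
config = DynamicConfig(lambda: source["doc"], interval_seconds=0.0)
print(config.get("max_background_tasks"))  # 50
source["doc"] = json.dumps({"max_background_tasks": 10})
print(config.get("max_background_tasks"))  # 10
```

The interval is set to zero here purely so the change is visible immediately; in practice it would be the five-to-ten-second cadence Steve mentions.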
And in the fullness of time, it's made a lot more sense to me, specifically through a lens of, you want to be able to have the freedom to change how something works under the hood. And if you've made no particular guarantee about the implementation detail, you can do that without potentially worrying about breaking a whole bunch of customer expectations that you've inadvertently set. And that makes an awful lot of sense.The idea of rolling out changes to your infrastructure has evolved over the last decade. Once upon a time you'd have EC2 instances, and great, you want to go ahead and make a change there—or this actually predates EC2 instances. Virtual machines in a data center or heaven forbid, bare metal servers, you're not going to deploy a whole new server because there's a new version of the code out, so you separate out your infrastructure from the code that it runs. And that worked out well. And increasingly, we started to see ways of okay, if we want to change the behavior of the application, we'll just push out new environment variables to that thing and restart the service so it winds up consuming those.And that's great. You've rolled it out throughout your fleet. With containers, which is sort of the next logical step, well, okay, this stuff gets baked in, we'll just restart containers with a new version of code because that takes less than a second each and you're fine. And then Lambda functions, it's okay, we'll just change the deployment option and the next invocation will wind up taking the brand new environment variables passed out to it. How do feature flags feature into those, I guess, three evolving methods of running applications in anger, by which I mean, of course, production?Steve: [laugh]. Good question. And I think you really articulated that well.Corey: Well, thank you. I should hope so. I'm a storyteller. At least I fancy myself one.Steve: [laugh]. Yes, you are. 
Really what you talked about is the evolution of you know, at the beginning, people were—well, first of all, people probably were embedding their variables deep in their code and then they realized, “Oh, I want to change this,” and now you have to find where in my code that is. And so, it became a pattern. Why don't we separate everything that's configuration data into its own file? But it'll get compiled at build time and sent out all at once.There was kind of this breakthrough that was, why don't we actually separate out the deployment of this? We can separate the deployment from code from the deployment of configuration data, and have the code be reading that configuration data on a regular interval, as I already said. So now, as the environments have changed—like you said, containers and Lambda—that ability to make tweaks at microsecond intervals is more important and more powerful. So, there certainly is still value in having things like environment variables that get read at startup. We call that static configuration as opposed to dynamic configuration.And that's a very important element in the world of containers that you talked about. Containers are a bit ephemeral, and so they kind of come and go, and you can restart things, or you might spin up new containers with a slightly different config and have them operate in a certain way. And again, Lambda takes that to the next level. 
I'm really excited where people are going to take feature flags to the next level because already today we have people just fine-tuning to very targeted small subsets, different configuration data, different feature flag data, and it allows them to do this at a scale we've never seen before—turning this on, seeing how it reacts, seeing how the application behaves, and then being able to roll that out to all of your audience.Now, you got to be careful, you really don't want to have completely different configurations out there and have 10 different, or you know, 100 different configurations out there. That makes it really tough to debug. So, you want to think of this as I want to roll this out gradually over time, but eventually, you want to have this sort of state where everything is somewhat consistent.Corey: That, on some level, speaks to a level of operational maturity that my current deployment adventures generally don't have. A common reference I make is to my lasttweetinaws.com Twitter threading app. And anyone can visit it, use it however they want.And it uses a Route 53 latency record to figure out, ah, which is the closest region to you because I've deployed it to 20 different regions. Now, if this were a paid service, or I had people using this in large volume and I had to worry about that sort of thing, I would probably approach something that is very close to what you describe. In practice, I pick a devoted region that I deploy something to, and cool, that's sort of my canary where I get things working the way I would expect. And when that works the way I want it to, I then just push it to everything else automatically. 
Given that I've put significant effort into getting deployments down to approximately two minutes to deploy to everything, it feels like that's a reasonable amount of time to push something out.Whereas if I were, I don't know, running a bank, for example, I would probably have an incredibly heavy process around things that make changes to things like payment or whatnot. Because despite the lies we all like to tell, both to ourselves and in public, anything that touches payments does go through waterfall, not agile iterative development because that mistake tends to show up on your customer's credit card bills, and then they're also angry. I think that there's a certain point of maturity you need to be at as either an organization or possibly as a software technology stack before something like feature flags even becomes available to you. Would you agree with that, or is this something everyone should use?Steve: I would agree with that. Definitely, a small team that has communication flowing between the two probably won't get as much value out of a gradual release process because everybody kind of knows what's going on inside of the team. Once your team scales, or maybe your audience scales, that's when it matters more. You really don't want to have something blow up with your users. You really don't want to have people getting paged in the middle of the night because of a change that was made. And so, feature flags do help with that.So typically, the journey we see is people start off in a maybe very small startup. They're releasing features at a very fast pace. They grow and they start to build their own feature flagging solution—again, companies I've been at previously have done that—and you start using feature flags and you see the power of it. Oh, my gosh, this is great. I can release something when I want without doing a big code push. I can just do a small little change, and if something goes wrong, I can roll it back instantly. 
That's really handy.

And so, the basics of feature flagging might be a homegrown solution that you all have built. If you really lean into that and start to use it more, then you probably want to look at a third-party solution because there's so many features out there that you might want. A lot of them are around safeguards that make sure that releasing a new feature is safe. You know, again, pushing out a new feature to everybody could be similar to pushing out untested code to production. You don't want to do that, so you need to have, you know, some checks and balances in your release process of your feature flags, and that's what a lot of third parties do.

It really depends—to get back to your question about who needs feature flags—it depends on your audience size. You know, if you have enough audience out there to want to do a small rollout to a small set first and then have everybody hit it, that's great. Also, if you just have, you know, one or two developers, then feature flags are probably something that you're just kind of, you're doing yourself, you're pushing out this thing anyway on your own, but you don't need it coordinated across your team.

Corey: I think that there's also a bit of—how to frame this—misunderstanding on someone's part about where AppConfig starts and where it stops. When it was first announced, feature flags were one of the things that it did. And that was talked about on stage, I believe at re:Invent, but please don't quote me on that, when it wound up getting announced. And then in the fullness of time, there was another announcement of AppConfig now supports feature flags, which I'm sitting there and I had to go back to my old notes. Like, did I hallucinate this? Which again, would not be the first time I'd imagine such a thing.
But no, it was originally how the service was described, but now it's extra feature flags, almost like someone would, I don't know, flip on a feature-flag toggle for the service and now it does a different thing. What changed? What was it that was misunderstood about the service initially versus what it became?

Steve: Yeah, I wouldn't say it was a misunderstanding. I think what happened was we launched it, guessing what our customers were going to use it as. We had done plenty of research on that, and as I mentioned before we had—

Corey: Please tell me someone used it as a database. Or am I the only nutter that does stuff like that?

Steve: We have seen that before. We have seen something like that before.

Corey: Excellent. Excellent, excellent. I approve.

Steve: And so, we had done our due diligence ahead of time about how we thought people were going to use it. We were right about a lot of it. I mentioned before that we have a lot of usage internally, so you know, that was kind of maybe cheating even for us to be able to sort of see how this is going to evolve. What we did announce, I guess it was last November, was an opinionated version of feature flags. So, we had people using us for feature flags, but they were building their own structure, their own JSON, and there was not a dedicated console experience for feature flags.

What we announced last November was an opinionated version that structured the JSON in a way that we think is the right way, and that afforded us the ability to have a smooth console experience. So, if we know what the structure of the JSON is, we can have things like toggles and validations in there that really specifically look at some of the data points. So, that's really what happened. We're just making it easier for our customers to use us for feature flags.
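Retrieving one of these opinionated flag documents at runtime goes through the AppConfig data plane: start a configuration session, then poll for the latest configuration. A sketch using boto3's `appconfigdata` client; the application, environment, and profile names below are placeholders, and the parsing helper is my own, not part of the SDK:

```python
import json

def flag_enabled(config_document: bytes, flag_name: str) -> bool:
    """Parse AppConfig's opinionated flag JSON: {"flag-name": {"enabled": bool, ...}}."""
    flags = json.loads(config_document or b"{}")
    return bool(flags.get(flag_name, {}).get("enabled", False))

def fetch_flags() -> bytes:
    """Pull the latest flag document from AWS AppConfig (requires AWS credentials)."""
    import boto3  # deferred so the parser above works without the SDK installed
    client = boto3.client("appconfigdata")
    session = client.start_configuration_session(
        ApplicationIdentifier="my-app",             # placeholder
        EnvironmentIdentifier="prod",               # placeholder
        ConfigurationProfileIdentifier="my-flags",  # placeholder
    )
    resp = client.get_latest_configuration(
        ConfigurationToken=session["InitialConfigurationToken"]
    )
    # Note: the payload is empty when nothing changed since the last poll,
    # so a real client should cache the previous document.
    return resp["Configuration"].read()

doc = b'{"show-new-console": {"enabled": true}}'
assert flag_enabled(doc, "show-new-console")
```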
We still have some customers that are kind of building their own solution, but we're seeing a lot of them move over to our opinionated version.

Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full stack observability for developers, IT operations, security, and business teams in the cloud age. Datadog's platform, along with 500 plus vendor integrations, allows you to correlate metrics, traces, logs, and security signals across your applications, infrastructure, and third party services in a single pane of glass.

Combine these with drag and drop dashboards and machine learning based alerts to help teams troubleshoot and collaborate more effectively, prevent downtime, and enhance performance and reliability. Try Datadog in your environment today with a free 14 day trial and get a complimentary T-shirt when you install the agent.

To learn more, visit datadoghq.com/screaminginthecloud. That's www.datadoghq.com/screaminginthecloud.

Corey: Part of the problem I have when I look at what it is you folks do, and your use cases, and how you structure it is, it's similar in some respects to how folks perceive things like FIS, the fault injection service, or chaos engineering, as is commonly known, which is, “We can't even get the service to stay up on its own for any [unintelligible 00:18:35] period of time. What do you mean, now let's intentionally degrade it and make it work?” There needs to be a certain level of operational stability or operational maturity. When you're still building a service before it's up and running, feature flags seem awfully premature because there's no one depending on it. You can change configuration however your little heart desires. In most cases. I'm sure at certain points of scale of development teams, you have a communications problem internally, but it's not aimed at me trying to get something working at 2 a.m.
in the middle of the night.Whereas by the time folks are ready for what you're doing, they clearly have that level of operational maturity established. So, I have to guess on some level, that your typical adopter of AppConfig feature flags isn't in fact, someone who is, “Well, we're ready for feature flags; let's go,” but rather someone who's come up with something else as a stopgap as they've been iterating forward. Usually something homebuilt. And it might very well be you have the exact same biggest competitor that I do in my consulting work, which is of course, Microsoft Excel as people try to build their own thing that works in their own way.Steve: Yeah, so definitely a very common customer of ours is somebody that is using a homegrown solution for turning on and off things. And they really feel like I'm using the heck out of these feature flags. I'm using them on a daily or weekly basis. I would like to have some enhancements to how my feature flags work, but I have limited resources and I'm not sure that my resources should be building enhancements to a feature-flagging service, but instead, I'd rather have them focusing on something, you know, directly for our customers, some of the core features of whatever your company does. And so, that's when people sort of look around externally and say, “Oh, let me see if there's some other third-party service or something built into AWS like AWS AppConfig that can meet those needs.”And so absolutely, the workflows get more sophisticated, the ability to move forward faster becomes more important, and do so in a safe way. I used to work at a cybersecurity company and we would kind of joke that the security budget of the company is relatively low until something bad happens, and then it's, you know, whatever you need to spend on it. 
It's not quite the same with feature flags, but you do see when somebody has a problem on production, and they want to be able to turn something off right away or make an adjustment right away, then the ability to do that in a measured way becomes incredibly important. And so, that's when, again, you'll see customers starting to feel like they're outgrowing their homegrown solution and moving to something that's a third-party solution.Corey: Honestly, I feel like so many tools exist in this space, where, “Oh, yeah, you should definitely use this tool.” And most people will use that tool. The second time. Because the first time, it's one of those, “How hard could that be out? I can build something like that in a weekend.” Which is sort of the rallying cry of doomed engineers who are bad at scoping.And by the time that they figure out why, they have to backtrack significantly. There's a whole bunch of stuff that I have built that people look at and say, “Wow, that's a really great design. What inspired you to do that?” And the absolute honest answer to all of it is simply, “Yeah, I worked in roles for the first time I did it the way you would think I would do it and it didn't go well.” Experience is what you get when you didn't get what you wanted, and this is one of those areas where it tends to manifest in reasonable ways.Steve: Absolutely, absolutely.Corey: So, give me an example here, if you don't mind, about how feature flags can improve the day-to-day experience of an engineering team or an engineer themselves. Because we've been down this path enough, in some cases, to know the failure modes, but for folks who haven't been there that's trying to shave a little bit off of their journey of, “I'm going to learn from my own mistakes.” Eh, learn from someone else's. What are the benefits that accrue and are felt immediately?Steve: Yeah. So, we kind of have a policy that the very first commit of any new feature ought to be the feature flag. 
That's that sort of on-off switch that you want to put there so that you can start to deploy your code and not have a long-lived branch in your source code. But you can have your code there, it reads whether that configuration is on or off. You start with it off.And so, it really helps just while developing these things about keeping your branches short. And you can push the mainline, as long as the feature flag is off and the feature is hidden to production, which is great. So, that helps with the mess of doing big code merges. The other part is around the launch of a feature.So, you talked about Andy Jassy being on stage to launch a new feature. Sort of the old way of doing this, Corey, was that you would need to look at your pipelines and see how long it might take for you to push out your code with any sort of code change in it. And let's say that was an hour-and-a-half process and let's say your CEO is on stage at eight o'clock on a Friday. And as much as you like to say it, “Oh, I'm never pushing out code on a Friday,” sometimes you have to. The old way—Corey: Yeah, that week, yes you are, whether you want to or not.Steve: [laugh]. Exactly, exactly. The old way was this idea that I'm going to time my release, and it takes an hour-and-a-half; I'm going to push it out, and I'll do my best, but hopefully, when the CEO raises her arm or his arm up and points to a screen that everything's lit up. Well, let's say you're doing that and something goes wrong and you have to start over again. Well, oh, my goodness, we're 15 minutes behind, can you accelerate things? 
And then you start to pull away some of these blockers to accelerate your pipeline or you start editing it right in the console of your application, which is generally not a good idea right before a really big launch.So, the new way is, I'm going to have that code already out there on a Wednesday [laugh] before this big thing on a Friday, but it's hidden behind this feature flag, I've already turned it on and off for internals, and it's just waiting there. And so, then when the CEO points to the big screen, you can just flip that one small little configuration change—and that can be almost instantaneous—and people can access it. So, that just reduces the amount of stress, reduces the amount of risk in pushing out your code.Another thing is—we've heard this from customers—customers are increasing the number of deploys that they can do per week by a very large percentage because they're deploying with confidence. They know that I can push out this code and it's off by default, then I can turn it on whenever I feel like it, and then I can turn it off if something goes wrong. So, if you're into CI/CD, you can actually just move a lot faster with a number of pushes to production each week, which again, I think really helps engineers on their day-to-day lives. The final thing I'm going to talk about is that let's say you did push out something, and for whatever reason, that following weekend, something's going wrong. The old way was oop, you're going to get a page, I'm going to have to get on my computer and go and debug things and fix things, and then push out a new code change.And this could be late on a Saturday evening when you're out with friends. If there's a feature flag there that can turn it off and if this feature is not critical to the operation of your product, you can actually just go in and flip that feature flag off until the next morning or maybe even Monday morning. 
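The "first commit is the feature flag" and Saturday-night kill-switch ideas only work if the old code path stays intact behind the flag, so flipping it off really does restore the previous behavior. A hypothetical sketch of that guard (the flag store and function names are invented for illustration):

```python
FLAGS = {"new-recommendations": False}  # ships off by default: deploy != release

def get_flag(name: str, default: bool = False) -> bool:
    """Stand-in for a real flag lookup (homegrown store, AppConfig, etc.)."""
    return FLAGS.get(name, default)

def legacy_recommendations(user_id: str) -> list:
    return ["top-sellers"]

def new_recommendation_engine(user_id: str) -> list:
    return ["personalized-picks"]

def recommendations(user_id: str) -> list:
    if get_flag("new-recommendations"):
        return new_recommendation_engine(user_id)  # dark-launched path
    return legacy_recommendations(user_id)         # untouched fallback

assert recommendations("u1") == ["top-sellers"]    # flag off: old behavior
FLAGS["new-recommendations"] = True                # the launch-day flip
assert recommendations("u1") == ["personalized-picks"]
```

Because the code was merged to mainline days earlier with the flag off, the launch itself is a one-line configuration change rather than a ninety-minute pipeline run, and the same line flipped back is the 2 a.m. remediation.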
So, in theory, you kind of get your free time back when you are implementing feature flags. So, I think those are the big benefits for engineers in using feature flags.Corey: And the best way to figure out whether someone is speaking from a position of experience or is simply a raving zealot when they're in a position where they are incentivized to advocate for a particular way of doing things or a particular product, as—let's be clear—you are in that position, is to ask a form of the following question. Let's turn it around for a second. In what scenarios would you absolutely not want to use feature flags? What problems arise? When do you take a look at a situation and say, “Oh, yeah, feature flags will make things worse, instead of better. Don't do it.”Steve: I'm not sure I wouldn't necessarily don't do it—maybe I am that zealot—but you got to do it carefully.Corey: [laugh].Steve: You really got to do things carefully because as I said before, flipping on a feature flag for everybody is similar to pushing out untested code to production. So, you want to do that in a measured way. So, you need to make sure that you do a couple of things. One, there should be some way to measure what the system behavior is for a small set of users with that feature flag flipped to on first. And it could be some canaries that you're using for that.You can also—there's other mechanisms you can do that to: set up cohorts and beta testers and those kinds of things. But I would say the gradual rollout and the targeted rollout of a feature flag is critical. You know, again, it sounds easy, “I'll just turn it on later,” but you ideally don't want to do that. The second thing you want to do is, if you can, is there some sort of validation that the feature flag is what you expect? 
So, I was talking about on-off feature flags; there are also things like throttling limits, which I mentioned when talking about dynamic configuration, where you actually want to make sure that you put in some other safeguards that say, “I never want my TPS to go above 1200 and never want to set it below 800,” for whatever reason, for example. Well, you want to have some sort of validation of that data before the feature flag gets pushed out. Inside Amazon, we actually have the policy that every single flag needs to have some sort of validation around it so that we don't accidentally fat-finger something out before it goes out there. And we have fat-fingered things.

Corey: Typing the wrong thing into a command structure into a tool? “Who would ever do something like that?” He says, remembering times he's taken production down himself, exactly that way.

Steve: Exactly, exactly, yeah. And we've done it at Amazon and AWS, for sure. And so yeah, if you have some sort of structure or process to validate that—because oftentimes, what you're doing is you're trying to remediate something in production. Stress levels are high, it is especially easy to fat-finger there. So, that check-and-balance of a validation is important.

And then ideally, you have something to automatically roll back whatever change that you made, very quickly. So AppConfig, for example, hooks up to CloudWatch alarms. If an alarm goes off, we're actually going to roll back instantly whatever that feature flag was to its previous state so that you don't even need to really worry about validating against your CloudWatch. It'll just automatically do that against whatever alarms you have.

Corey: One of the interesting parts about working at Amazon and seeing things at Amazonian scale is that one-in-a-million events happen thousands of times every second for you folks. What lessons have you learned by deploying feature flags at that kind of scale?
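The range check Steve gives ("never above 1200, never below 800") is what the conversation later calls a syntactic validation. A minimal hand-rolled equivalent rejects a bad value before it ever reaches production; the config key name here is made up for the example:

```python
def validate_throttle_limit(config: dict) -> None:
    """Reject a flag update whose TPS limit falls outside the allowed band."""
    tps = config.get("max_tps")
    if not isinstance(tps, int):
        raise ValueError(f"max_tps must be an integer, got {tps!r}")
    if not 800 <= tps <= 1200:
        raise ValueError(f"max_tps must be between 800 and 1200, got {tps}")

validate_throttle_limit({"max_tps": 1000})  # fine
try:
    validate_throttle_limit({"max_tps": 12000})  # the classic fat-finger: one extra zero
except ValueError as err:
    print(err)
```

Running the validator in the deployment path, rather than trusting the human at the keyboard, is exactly the check-and-balance that matters when stress levels are high mid-incident.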
Because one of my problems and challenges with deploying feature flags myself is that in some cases, we're talking about three to five users a day for some of these things. That's not really enough usage to get insights into various cohort analyses or A/B tests.Steve: Yeah. As I mentioned before, we build these things as features into our product. So, I just talked about the CloudWatch alarms. That wasn't there originally. Originally, you know, if something went wrong, you would observe a CloudWatch alarm and then you decide what to do, and one of those things might be that I'm going to roll back my configuration.So, a lot of the mistakes that we made that caused alarms to go off necessitated us building some automatic mechanisms. And you know, a human being can only react so fast, but an automated system there is going to be able to roll things back very, very quickly. So, that came from some specific mistakes that we had made inside of AWS. The validation that I was talking about as well. We have a couple of ways of validating things.You might want to do a syntactic validation, which really you're validating—as I was saying—the range between 100 and 1000, but you also might want to have sort of a functional validation, or we call it a semantic validation so that you can make sure that, for example, if you're switching to a new database, that you're going to flip over to your new database, you can have a validation there that says, “This database is ready, I can write to this table, it's truly ready for me to switch.” Instead of just updating some config data, you're actually going to be validating that the new target is ready for you. So, those are a couple of things that we've learned from some of the mistakes we made. 
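The semantic (functional) validation Steve describes, confirming a new database target is truly ready before flipping the flag, can be sketched as a readiness probe run during validation. The probe interface below is invented for illustration (sqlite stands in for the real database), not an AppConfig API:

```python
import sqlite3

def target_is_ready(dsn: str, table: str = "orders") -> bool:
    """Functional check: can we actually write to the new target before cutover?"""
    try:
        conn = sqlite3.connect(dsn)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER)")
        conn.execute(f"INSERT INTO {table} (id) VALUES (1)")
        conn.rollback()  # probe only; leave no data behind
        conn.close()
        return True
    except sqlite3.Error:
        return False

def flip_database_flag(flags: dict, dsn: str) -> None:
    """Only flip the cutover flag once the semantic validation passes."""
    if not target_is_ready(dsn):
        raise RuntimeError("semantic validation failed: new target not writable")
    flags["use-new-database"] = True

flags = {"use-new-database": False}
flip_database_flag(flags, ":memory:")
assert flags["use-new-database"]
```

The point is that the validator exercises the behavior the flag will enable, writing to the new table, rather than merely checking that the config value is well-formed.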
And again, not saying we aren't making mistakes still, but we always look at these things inside of AWS and figure out how we can benefit from them and how our customers, more importantly, can benefit from these mistakes.

Corey: I would say that I agree. I think that you have threaded the needle of not talking smack about your own product, while also presenting it as not the global panacea that everyone should roll out, willy-nilly. That's a good balance to strike. And frankly, I'd also say it's probably a good point to park the episode. If people want to learn more about AppConfig, how you view these challenges, or even potentially want to get started using it themselves, what should they do?

Steve: We have an informational page at go.aws/awsappconfig. That will tell you the high-level overview. You can search for our documentation and we have a lot of blog posts to help you get started there.

Corey: And links to that will, of course, go into the [show notes 00:31:21]. Thank you so much for suffering my slings, arrows, and other assorted nonsense on this. I really appreciate your taking the time.

Steve: Corey, thank you for the time. It's always a pleasure to talk to you. Really appreciate your insights.

Corey: You're too kind. Steve Rice, principal product manager for AWS AppConfig. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment. But before you do, just try clearing your cookies and downloading the episode again. You might be in the 3% cohort for an A/B test, and you [want to 00:32:01] listen to the good one instead.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
HeatWave and the Latest Evolution of MySQL with Nipun Agarwal
Oct 6, 2022 38:43


About Nipun
Nipun Agarwal is a Senior Vice President, MySQL HeatWave Development, Oracle. His interests include distributed data processing, machine learning, cloud technologies, and security. Nipun was part of the Oracle Database team, where he introduced a number of new features. He has been awarded over 170 patents.

Links Referenced:
Oracle: https://oracle.com
MySQL HeatWave info: https://www.oracle.com/mysql/
MySQL Service on AWS and OCI login (Oracle account required): https://cloud.mysql.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked.
If you wanna learn more about how they view this, check out their blog; it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is sponsored by our friends at Oracle, and back for a borderline historic third round going out and telling stories about these things, we have Nipun Agarwal, who, as opposed to his first appearance on the show, has been promoted to senior vice president of MySQL HeatWave. Nipun, thank you for coming back. Most people are not enamored enough with me to subject themselves to my slings and arrows a second time, let alone a third. So first, thanks. And are you okay, over there?

Nipun: Thank you, Corey. Yeah, very happy to be back.

Corey: [laugh]. So, since the last time we've spoken, there have been some interesting developments that have happened. It was pre-announced by Larry Ellison on a keynote stage or an earnings call, I don't recall the exact format, that HeatWave was going to be coming to AWS. Now, you've conducted a formal announcement, the usual media press blitz, et cetera, talking about it with an eye toward general availability later this year, if I'm not mistaken, and things seem to be—if you'll forgive the term—heating up a bit.

Nipun: That is correct. So, as you know, we have had MySQL HeatWave on OCI for just about two years now.
Very good reception, a lot of people who are using MySQL HeatWave, are migrating from other clouds, specifically from AWS, and now we have announced availability of MySQL HeatWave on AWS.Corey: So, for those who have not done the requisite homework of listening to the entire back catalog of nearly 400 episodes of this show, what exactly is MySQL HeatWave, just so we make sure that we set the stage for what we're going to be talking about? Because I sort of get the sense that without a baseline working knowledge of what that is, none of the rest of this is going to make a whole lot of sense.Nipun: MySQL HeatWave is a managed MySQL service provided by Oracle. But it is different from other MySQL-based services in the sense that we have significantly enhanced the service such that it can very efficiently process transactions, analytics, and in-database machine learning. So, what customers get with the service, with MySQL HeatWave, is a single MySQL database which can process OLTP, transaction processing, real-time analytics, and machine learning. And they can do this without having to move the data out of MySQL into some other specialized database services who are running analytics or machine learning. And all existing tools and applications which work with MySQL work as is because this is something that enhances the server. In addition to that, it provides very good performance and very good price performance compared to other similar services out there.Corey: The idea historically that some folks were pushing around the idea of multi-cloud was that you would have workloads that—oh, they live in one cloud, but the database was going to be all the way across the other side of the internet, living in a different provider. And in practice, what we generally tend to see is that where the data lives is where the compute winds up living. 
By and large, it's easier to bring the compute resources to the data than it is to move the data to the compute, just because data egress in most of the cloud providers—notably exempting yours—is astronomically expensive. You are, if I recall correctly, less than 10% of AWS's data egress charge on just retail pricing alone, which is wild to me. So first, thank you for keeping that up and not raising prices because I would have felt rather annoyed if I'd been saying such good things. And it was, haha, it was a bait and switch. It was not. I'm still a big fan. So, thank you for that, first and foremost.Nipun: Certainly. And what you described is absolutely correct that while we have a lot of customers migrating from AWS to use MySQL HeatWave and OCI, a class of customers are unable to, and the number one reason they're unable to is that AWS charges these customers all very high egress fees to move the data out of AWS into OCI for them to benefit from MySQL HeatWave. And this has definitely been one of the key incentives for us, the key motivation for us, to offer MySQL HeatWave on AWS so that customers don't need to pay this exorbitant data egress fees.Corey: I think it's fair to disclose that I periodically advise a variety of different cloud companies from a perspective of voice-of-the-customer feedback, which essentially distills down to me asking really annoying slash obnoxious questions that I, as a customer, legitimately want to know, but people always frown at me when I asked that in vendor pitches. For some reason, when I'm doing this on an advisory basis, people instead nod thoughtfully and take notes, so that at least feels better from my perspective. Oracle Cloud has been one of those, and I've been kicking the tires on the AWS offering that you folks have built out for a bit of time now. I have to say, it is legitimate. 
I was able to run a significant series of tests on this, and what I found going through that process was interesting on a bunch of different levels.I'm wondering if it's okay with you, if we go through a few of them, just things that jumped out to me as we went through a series of conversations around, “So, we're going to run a service on AWS.” And my initial answer was, “Is this Oracle? Are you sure?” And here we are today; we are talking about it and press releases.Nipun: Yes, certainly fine with me. Please go ahead.Corey: So, I think one of the first questions I had when you said, “We're going to run a database service on AWS itself,” was, if I'm true to type, is going to be fairly sarcastic, which is, “Oh, thank God. Finally, a way to run a MySQL database on AWS. There's never been one of those before.” Unless you count EC2 or Aurora or Redshift depending upon how you squint at it, or a variety of other increasingly strange things. It feels like that has been a largely saturated market in many respects.I generally don't tend to advise on things that I find patently ridiculous, and your answer was great, but I don't want to put words in your mouth. What was it that you saw that made you say, “Ah, we're going to put a different database offering on AWS, and no, it's not a terrible decision.”Nipun: Got it. Okay, so if you look at it, the value proposition which MySQL HeatWave offers is that customers of MySQL or customers have MySQL compatible databases, whether Aurora, or whether it's RDS MySQL, right, or even, like, you know, customers of Redshift, they have been migrating to MySQL HeatWave on OCI. Like, for the reasons I said: it's a single database, customers don't need to have multiple databases for managing different kinds of workloads, it's much faster, it's a lot less expensive, right? So, there are a lot of value propositions. 
So, what we found is that if you were to offer MySQL HeatWave on AWS, it will significantly ease the migration of other customers who might be otherwise thinking that it will be difficult for them to migrate, perhaps because of the high egress cost of AWS, or because of the high latency some of the applications in the AWS incur when the database is running somewhere else.Or, if they really have an ecosystem of applications already running on AWS and they just want to replace the database, it'll be much easier for them if MySQL HeatWave was offered on AWS. Those are the reasons why we feel it's a compelling proposition, that if existing customers of AWS are willing to migrate the cloud from AWS to OCI and use MySQL HeatWave, there is clearly a value proposition we are offering. And if we can now offer the same service in AWS, it will hopefully increase the number of customers who can benefit from MySQL HeatWave.Corey: One of the next questions I had going in was, “Okay, so what actually is this under the hood?” Is this you effectively just stuffing some software into a machine image or an AMI—or however they want to mispronounce that word over an AWS-land—and then just making it available to your account and running that, where's the magic or mystery behind this? Like, it feels like the next more modern cloud approach is to stuff the whole thing into a Docker container. But that's not what you wound up doing.Nipun: Correct. So, HeatWave has been designed and architected for scale-out processing, and it's been optimized for the cloud. So, when we decided to offer MySQL HeatWave on AWS, we have actually gone ahead and optimize our server for the AWS architecture. So, the processor we are running on, right, we have optimized our software for that instance types in AWS, right? So, the data plane has been optimized for AWS architecture.The second thing is we have a brand new control plane layer, right? 
So, it's not the case that we're just taking what we had in OCI and running it on AWS. We have optimized the data plane for AWS, we have a native control plane, which is running on AWS, which is using the respective services on AWS. And third, we have a brand new console which we are offering, which is a very interactive console where customers can run queries from the console. They can do data management from the console, they're able to use Autopilot from the console, and we have performance monitoring from the console, right? So, data plane, control plane, console. They're all running natively in AWS. And this provides for a very seamless integration or seamless experience for the AWS customers.Corey: I think it's also a reality, however much we may want to pretend otherwise, that if there is an opportunity to run something in a different cloud provider that is better than where you're currently running it now, by and large, customers aren't going to do it because it needs to not just be better, but so astronomically better in ways that are foundational to a company's business model in order to justify the tremendous expense of a cloud migration, not just in real, out of pocket, cost in dollars and cents that are easy to measure, but also in terms of engineering effort, in terms of opportunity cost—because while you're doing that you're not doing other things instead—and, on some level, people tend to only do that when there's an overwhelming strategic reason to do it. When folks already have existing workloads on AWS, as many of them do, it stands to reason that they are not going to want to completely deviate from that strategy just because something else offers a better database experience any number of axes. So, meeting customers where they are is one of the, I guess, foundational shifts that we've really seen from the entire IT industry over the last 40 years, rather than you will buy it from us and you will tolerate it. 
Now, customers have choice, and meeting them where they are and being much more, I guess, able to impedance-match with them has been critical. And I'm really optimistic about what the launch of this service portends for Oracle.

Nipun: Indeed, but let me give you another data point. We find a very large number of Aurora customers migrating to MySQL HeatWave on OCI, right? And this is the same workload they were running on Aurora, but now they want to run it on MySQL HeatWave on OCI. They are willing to undertake this journey of migration because their applications get much faster, and for a lot less price. Then the second aspect is, there's another class of customers who are, for instance, running transactional workloads on Aurora, but then they have to keep moving the data, performing an ETL process into some other service—whether it's Snowflake, or whether it's Redshift—for analytics.

Now, with this migration, when they move to MySQL HeatWave, customers don't need to, like, have multiple databases, and they get real-time analytics, meaning that if any data changes inside the OLTP database service and they were to run a query, that query is giving them the latest results, right? It's not stale, whereas with an ETL process, it gets to be stale.
So, given that we already found that there were so many customers migrating to OCI to use MySQL HeatWave, I think there's a clear value proposition for MySQL HeatWave, and there's a lot of demand. But like I was mentioning earlier, having MySQL HeatWave be offered on AWS makes the proposition even more compelling because, as you said, yes, there is some engineering work that customers will need to do to migrate between clouds, and if they don't want to, then absolutely, now they have MySQL HeatWave which they can use in AWS itself.

Corey: I think that one of the things I continually find myself careening into, perhaps unexpectedly, is a failure to really internalize just how vast this entire industry really is. Every time I think I've seen it all, all I have to do is talk to one more cloud customer and I learn something completely new and different. Sometimes it's an innovative, exciting use of a thing. Other times, it's people holding something fearfully wrong and trying to use it as a hammer instead. And you know, if it's dumb and it works, is it really dumb? There are questions around that.

And this in turn gave rise to one of my next obnoxious questions as I was looking at what you were building at the time, because a lot of your pricing and discussion and framing of this was targeting very large enterprise-style customers, and the price points reflected that. And then I asked the question that Big-E Enterprise never quite expects, for whatever reason: "That looks awesome if I have a budget with many commas in it. What can I get for $4?" And as of this recording, pricing has not been finalized slash published for the service, but everything that you have shown me so far absolutely makes developing on this for a proof of concept or an evening of puttering around completely tenable: it is not bound to a fixed period of licensing; it's use it when you want to use it, turn it off when you're done; and the hourly pricing is not egregious.
I think that is something that, historically, Oracle Database offerings have not really aligned with. OCI very much has, particularly with an eye toward its extraordinarily awesome free tier that's always free. But this feels like a weird blending of the OCI model with historical Oracle Database pricing models in a way that, honestly, I'm pretty excited about.

Nipun: So, we react to what the customer requirements and needs are. So, for this class of customers who are using, say, RDS MySQL or Aurora, we understand that they are very cost-sensitive, right? So, one of the things which we have done, in addition to offering MySQL HeatWave on AWS and based on customer feedback, is that we are now offering a small shape of the HeatWave instance in addition to the regular large shape. So, if customers want to just, you know, kick the tires, if developers just want to get started, they can get a MySQL node with HeatWave for less than ten cents an hour. So, for less than ten cents an hour, they get the ability to run transaction processing, analytics, and machine learning.

And if you were to compare the corresponding cost of Aurora for the same, like, you know, core count, it's, like, you know, 12-and-a-half cents. And that's just Aurora, without Redshift or without SageMaker. So yes, you're right that, based on the feedback, we have found that it would be much more attractive to have this low-end shape for the AWS developers, so we are offering this smaller shape. And yeah, it's very, very affordable: just shy of ten cents an hour.

Corey: This brings up another question that I raised pretty early on in the process, because you folks kept talking about shapes, and it turns out that is the Oracle Cloud term that applies to instance size over in AWS-land. And as we dug into this a bit further, it does make sense for how you think about these things and how you build them for customers.
Specifically, if I want to run this, I log into cloud.oracle.com and sign up for it there, and pay you over on that side of the world; this does not show up on my AWS bill. What drove that decision?

Nipun: Okay, so a couple of things. One clarification is that the site people log in to is cloud.mysql.com. So, that's where they come to: cloud.mysql.com.

Corey: Oh, my apologies. I keep forgetting that you folks have multiple cloud offerings and domains. They're kind of a thing. How do they work? Given I have a bad domain-buying habit myself, I have no room to judge.

Nipun: So, they come to cloud.mysql.com. From there, they can provision an instance. And we, as, like, you know, Oracle or MySQL, go ahead and create an instance in AWS, in the Oracle tenancy. From there, customers can then, you know, access their data on AWS and such. Now, what we want to provide the customers is a very seamless experience: they just come to cloud.mysql.com, and from there, they can do everything—provisioning an instance, running the queries, payment, and such. So, this is one of the reasons that we want customers just to be able to come to the site, cloud.mysql.com, and take care of the billing and such.

Now, the other thing is, okay, why not allow customers to pay from AWS, right? One of the things over there is that if you were to do that, a customer would be like, "Hey, I've got to pay something to AWS, something to Oracle; it'd be better to have a one-stop shop." And since many of these are already Oracle customers, it's helpful to do it this way.

Corey: Another approach you could have taken—and I want to be very clear here that I am not suggesting that this would have been a good idea—but an approach that you could have taken would have been to go down the weird AWS partner rabbit hole, and we're going to provide this to customers on the AWS Marketplace. Because according to AWS, that's where all of their customers go to discover new softwares.
Yeah, first, that's a lie. They do not. But aside from that, what was it about that Marketplace model that drove you to a decision point where, okay, at launch, we are not going to be offering this on the AWS Marketplace? And to be clear, I'm not suggesting that was the wrong decision.

Nipun: Right. The main reason is we want to offer the MySQL HeatWave service at the least expensive cost to the user, right, or, like, the least cost. If you were to, like, have MySQL HeatWave in the Marketplace, AWS charges a premium, which the customers would need to pay. So, we just didn't want the customers to have to pay this additional premium just because they can now source this thing from the Marketplace. So, it's really to, like, save costs for the customer.

Corey: The value of the Marketplace, from my perspective, has been effectively not having to deal as much with customer procurement departments, because, well, AWS is already on the procurement-approved list, so we're just going to go ahead and take the hit to wind up making it accessible from that perspective and calling it good. The downside to this is that increasingly, as customers are making larger and longer-term commitments that are tied to certain levels of spend on AWS, they're trying to drag every vendor with whom they do business into their AWS bill so they can check those boxes off. And the problem that I keep seeing with that is vendors who historically have been doing just fine and have great working relationships with a customer are reporting that suddenly customers are coming back with, "Yeah, so for our contract renewal, we want to go through the AWS Marketplace." In return, effectively, these companies are then just getting a haircut off whatever it is they're able to charge their customers, but receiving no actual value for any of this.
It attenuates the relationship by introducing a third party into the process, and it doesn't make anything better from the vendor's point of view because they already had something functional and working; now they just have to pay a commission on it to AWS, who, it seems, is pathologically averse to any transaction happening where they don't get a cut, on some level. But I digress. I just don't like that model very much at all. It feels coercive.

Nipun: That's absolutely right. That's absolutely right. And we thought that, yes, there is some value to going to the Marketplace, but it's not worth the additional premium customers would need to pay. Totally agree.

Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.

Corey: It's also worth pointing out that in Oracle's historical customer base, by which I mean the last 40 years that you folks have been in business, you do have significant customers with very sizable estates. A lot of your cloud efforts have focused around, I guess, we'll call it an Oracle-specific currency: Oracle Credits. Which is similar to the AWS style of currency, just for a different company in different ways.
One of the benefits that you articulated to me relatively early on was that by going through cloud.mysql.com, customers with those credits—which can be in sizable amounts based upon various differentiating variables that change from case to case—can apply them to their use of MySQL HeatWave on AWS.

Nipun: Right. So, in fact, just for starters, what we give to customers is some free credits, $300, to try the service on OCI. And that's the same experience we would like customers who are trying HeatWave on AWS to get. Yes, so you're right, this is the kind of consistency we want to have, and yet another reason why cloud.mysql.com makes sense as the entry point for customers to try the service.

Corey: There was a time where I would have struggled not to laugh in your face at the idea that we're talking about something in the context of an Oracle database and, well, there's $300 in credit. "What can I get for that? Hung up on?" No. A surprising amount, when it comes to these things.

I feel like that opens up an entirely new universe of experimentation. And, "Let's see how this thing actually works with this workload," and lets people kick the tires on it for themselves in a way that, "Oh, we have this great database. Well, can I try it? Sure, for $8 million, you absolutely can." "Well, it can stay great and awesome over there, because who wants to take that kind of a bet?" It feels like it's a new world in a bunch of different respects, and I just can't make enough noise about how glad I am to see this transformation happening.

Nipun: Yeah. Absolutely, right? So, just think about it. You're getting MySQL and HeatWave together for just shy of ten cents an hour, right? So, what you could get for $300 is 3,000 hours of a MySQL HeatWave instance, which is very good for people to try for free.
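A quick back-of-the-envelope check of the credit math Nipun cites here (the ten-cent and 12.5-cent figures are the approximate hourly rates quoted in the conversation, not official price-sheet numbers):

```python
# Back-of-the-envelope check of the free-credit math quoted above.
# Assumption: ~$0.10/hour for a MySQL node with HeatWave, the
# approximate rate mentioned in the conversation.
free_credits_usd = 300.00
heatwave_rate_per_hour = 0.10

free_hours = free_credits_usd / heatwave_rate_per_hour
print(f"{free_hours:.0f} hours of trial usage")  # → 3000 hours of trial usage

# For comparison, the "12-and-a-half cents" Aurora figure quoted above:
aurora_rate_per_hour = 0.125
print(f"{free_credits_usd / aurora_rate_per_hour:.0f} hours at the Aurora rate")
```

That is where the 3,000-hour figure in the conversation comes from.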
And then, you know, decide if they want to go ahead with it.

Corey: One other, I guess, obnoxious question that I love to ask—it's not really a question so much as a statement, and that's part of what makes it really obnoxious—but it always distills down to the following thing that upsets product people left and right, which is, "I don't get it." And one of the things that I didn't fully understand at the outset of how you were structuring things was the idea of separating out HeatWave from its constituent components. I believe it was Autopilot, if I'm not mistaken, and it was effectively different SKUs that you could wind up opting to go for. And okay, if I'm trying to kick the tires on this and contextualize it as someone for whom the world's best database is Route 53, then it really felt like an additional decision point that I wasn't clear on the value of. And I'm still not entirely sure on the differentiation point and the value there, but now you offer it bundled as a default, which I think is so much better from the user-experience perspective.

Nipun: Okay, so let me clarify a couple of things.

Corey: Please. Databases are not my forte, so expect me to wind up getting most of the details hilariously wrong.

Nipun: Sure. So, MySQL Autopilot provides machine-learning-based automation for various aspects of the MySQL service; very popular. There is no charge for it. It is built into MySQL HeatWave; there is no additional charge for it, right, so there never was any SKU for it. What you're referring to is that we have had a SKU for the MySQL node, or the MySQL instance, and there's a separate SKU for HeatWave.

The reason there is a need to have a different SKU for these two is because you always have only one node of MySQL. It could be, like, you know, running on one core or, like, you know, multiple cores, but it's always, like, you know, one node. But with HeatWave, it's a scale-out architecture, so you can have multiple nodes.
So, the users need to be able to express how many nodes of HeatWave they are provisioning, right? So, that's why there is a need to have two SKUs, and we continue to have those two SKUs.

What we are doing differently now is that when users instantiate a MySQL instance, by default, they always get the HeatWave node associated with it, right? So, they don't need to, like, you know, make the decision of when to add HeatWave; they always get HeatWave along with the MySQL instance, and, as I was saying, a combination of both of these is, you know, like, just about ten cents an hour. If, for whatever reason, they decide that they do not want HeatWave, they can turn it off, and then the price drops to half. But what we're providing in the AWS service is that HeatWave is turned on by default.

Corey: Which makes an awful lot of sense. It's something that lets people opt out if they decide they don't need this as they continue to scale out, but for the newcomer who does not, in many cases—in my particular case—have a nuanced understanding of where this offering starts and stops, it's clearly the right decision. Rather than, "Oh, yeah. The thing you were trying and it didn't work super well? Well, yeah. If you enable this other thing, it would have been awesome," it's, "Well, great. Please enable it for me by default and let me opt out later in time as my level of understanding deepens."

Nipun: That's right. And that's exactly what we are doing. Now, this was feedback we got, because many, if not most, of our customers would want to have HeatWave, and we're just, you know, saving them from going through one more step; it's always enabled by default.

Corey: As far as I'm aware, you folks are running this effectively as any other AWS customer might, where you establish a private link connection to your customers, in some cases, or give them a public or private endpoint where they can wind up communicating with this service.
It doesn't require any favoritism or special permissions from AWS themselves that they wouldn't give to any other random customer out there, correct?

Nipun: Yes, that is correct. So, for now, we are exposing this thing as a public endpoint. In the future, we have plans to support the private endpoint as well, but for now, it's public.

Corey: Which means that foundationally, what you're building out is something that fits into a model that could work extraordinarily well across a variety of different environments. How purpose-tuned is the HeatWave installation you have running on AWS for the AWS environment, versus something that is relatively agnostic and could be dropped into any random cloud provider, up to and including the terrifyingly obsolete rack I have in the spare room?

Nipun: So, as I mentioned, when we decided to offer MySQL HeatWave on AWS, the idea was that, okay, for the AWS customers, we now want to have an offering which is completely optimized for AWS and provides the best price-performance on AWS. So, we have determined which instance types underneath will provide the best price-performance, and that's what we have optimized for, right? For instance, take the case of the cache size: the underlying processor that we're using on AWS is different than what we're using for OCI. So, we have gone ahead and made these optimizations in our code, and we believe that our code is now really optimized for the AWS infrastructure.

Corey: I think that makes a fair deal of sense because, again, one of the big problems AWS has had is the proliferation of EC2 instance types, to the point now where the answer to "Are you using the correct instance type for your workload?" is super easy. Because that answer now is, "Of course not.
Who could possibly say that they were, with any degree of confidence?" But when you take the time to look at a very specific workload that's going to be scaled out, it's worth the time investment to figure out exactly how to optimize things for price and performance, given the constraints. Let's be very clear here: I would argue that the better price-performance for HeatWave is almost certainly not going to be on AWS itself, if for no other reason than the joy that is their data transfer pricing, even for internal things moving around from time to time.

Personally, I love getting charged data transfer for taking data from S3, running it through AWS Glue, putting it into a different S3 bucket, accessing it with Athena, then hooking that up to Tableau, as we go down and down and down the spiraling rabbit hole that never ends. It's not exactly what I would call well-optimized economically. Their entire system feels almost like it's a rigged game, on some level. But given those constraints, yeah, dialing it in and making it cost-effective is absolutely something that I've watched you folks put significant time and effort into.

Nipun: So, I'll make two points, right, to those questions. First, yes, I just want to, like, be clear about it: when a user provisions MySQL HeatWave via cloud.mysql.com and we create an instance in AWS, we don't give customers a multitude of things to, like, you know, choose from. We have determined which instance type is going to provide the customer the best price-performance, and that's what we provision. So, the customer doesn't even need to know or care: is it going to be, like, you know, AMD? Is it going to be Intel? Is it going to be, like, you know, ARM, right? It's something which we have predetermined and optimized for. That's first.

The second point is in terms of the price-performance.
So, you're absolutely right that for the class of customers who cannot migrate away from AWS, because of the egress costs or because of the high latency, right, sure, MySQL HeatWave on AWS will provide the best price-performance compared to other services out in AWS, like Redshift, or Aurora, or Snowflake. But if customers have the flexibility to choose a cloud of their choice, it is indeed the case that they are going to find that running MySQL HeatWave on OCI provides them, by far, the best price-performance, right? So, the price-performance of running MySQL HeatWave on OCI is indeed better than MySQL HeatWave on AWS. And that's just because of the fact that when we are running the service in AWS, we are paying the list price on AWS, right; that's how we get the gear. Whereas with OCI, like, you know, things are a lot less expensive for us.

But even when you're running on AWS, we are very, very price-competitive with other services. And you know, as you've probably seen from the performance benchmarks and such, what I'm very intrigued about is that we're able to run a standard workload, like, you know, TPC-H, and offer seven times better price-performance while running in AWS compared to Redshift. So, what this goes to show is that we are really passing the savings on to the customers. And clearly, either Redshift is not doing a good job on performance or, like, you know, they're charging too much. But the fact that we can offer seven times better price-performance than Redshift in AWS speaks volumes, both about the architecture and about how much of the savings we are passing to our customers.

Corey: What I love about this story is that it makes testing the waters of what it's like to run MySQL HeatWave a lot easier for customers, because the barrier to entry is so much lower. And everything you just said, I agree with: it is more cost-effective to run on Oracle Cloud. I think there are a number of workloads that are best placed on Oracle Cloud.
But unless you let people kick the tires on those things where they happen to be already, it's difficult to get them to a point where they're going to be able to experience that themselves. This is a massive step on that path.

Nipun: Yep. Right.

Corey: I really want to thank you for taking time out of your day to walk us through exactly how this came to be and what the future is going to look like around this. If people want to learn more, where should they go?

Nipun: Oh, they can go to oracle.com/mysql, and there they can get a lot more information about the capabilities of MySQL HeatWave, what we are offering in AWS, and price-performance. By the way, all the price-performance numbers I was talking about—all the scripts—are available publicly on GitHub. So, we welcome, we encourage customers to download the scripts from GitHub and try for themselves. All of this detailed information is available from oracle.com/mysql.

Corey: And we will, of course, put links to that in the show notes. Thank you so much for your time. I appreciate it.

Nipun: Sure thing, Corey. Thank you for the opportunity.

Corey: Nipun Agarwal, Senior Vice President of MySQL HeatWave. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment. You will then be overcharged for the data transfer to submit that insulting comment, and then AWS will take a percentage of that just because they're obnoxious and can.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
ChaosSearch and the Evolving World of Data Analytics with Thomas Hazel
Oct 4, 2022 · 35:21
About Thomas

Thomas Hazel is Founder, CTO, and Chief Scientist of ChaosSearch. He is a serial entrepreneur at the forefront of communication, virtualization, and database technology, and the inventor of ChaosSearch's patented IP. Thomas has also patented several other technologies in the areas of distributed algorithms, virtualization, and database science. He holds a Bachelor of Science in Computer Science from the University of New Hampshire, is a Hall of Fame Alumni Inductee, and founded both student and professional chapters of the Association for Computing Machinery (ACM).

Links Referenced:

ChaosSearch: https://www.chaossearch.io/
Twitter: https://twitter.com/ChaosSearch
Facebook: https://www.facebook.com/CHAOSSEARCH/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is brought to us by our returning sponsor and friend, ChaosSearch. And once again, the fine folks at ChaosSearch have seen fit to basically subject their CTO and Founder, Thomas Hazel, to my slings and arrows. Thomas, thank you for joining me. It feels like it's been a hot minute since we last caught up.

Thomas: Yeah, Corey. Great to be on the program again. I think it's been almost a year. So, I look forward to these. They're fun, they're interesting, and, you know, always a good time.

Corey: It's always fun to just take a look at companies' web pages in the Wayback Machine, archive.org, where you can see snapshots of them at various points in time. Usually, it feels like this is either used for long-gone things people want to remember from the internet of yesteryear, or alternately to deliver sick burns, retorting "This you?" when someone winds up making an unpopular statement. One of the approaches I like to use it for, which is significantly less nefarious—usually—is looking back in time at companies' websites, just to see how the positioning of the product evolves over time.

And ChaosSearch has had an interesting evolution in that direction. But before we get into that, assuming that there might actually be people listening who do not know the intimate details of exactly what it is you folks do: what is ChaosSearch, and what might you folks do?

Thomas: Yeah, well said, and I look forward to [laugh] doing the Wayback Time because some of our ideas, way back when, seemed crazy, but now they make a lot of sense. So, what ChaosSearch is all about is transforming customers' cloud object stores, like Amazon S3, into an analytical database that supports search and SQL-type use cases. Now, where's that apply?
In log analytics, observability, security, security data lakes, operational data, particularly at scale, where you just stream your data into your data lake, connect our SaaS service to that lake, and automagically we index it and provide well-known APIs: Elasticsearch APIs that integrate with Kibana or Grafana, and SQL APIs for something like, say, a Superset or Tableau or Looker into your data. So, you stream it in and you get analytics out. And the key thing is the time, cost, and complexity that we all know operational data, particularly at scale—terabytes a day and up—causes, and we all know how much it costs.

Corey: They certainly do. One of the things that I found interesting is that, as I've mentioned before, when I do consulting work at The Duckbill Group, we have absolutely no partners in the entire space. That includes AWS, incidentally. But it was easy in the beginning because I was well aware of what you folks were up to, and it was great when there was a use case that matched: you're spending an awful lot of money on Elasticsearch; consider perhaps migrating some of that—if it makes sense—to ChaosSearch. Ironically, when you started sponsoring some of my nonsense, that conversation got slightly trickier, where I had to disclose: yeah, our media arm does have sponsorships going on with them, but that has no bearing on what I'm saying.

And if they take their sponsorships away—please don't—then we would still be recommending them because it's the right answer, and it's what we would use if we were in your position. We receive no kickbacks or partner deal or any sort of reseller arrangement because it just clouds the whole conflict-of-interest perception. But you folks have been fantastic for a long time in a bunch of different ways.

Thomas: Well, you know, I would say that what you thought made a lot of sense made a lot of sense to us as well. So, the ChaosSearch idea just makes sense.
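To make the "well-known APIs" point concrete: an Elasticsearch-compatible backend is queried through the standard `_search` endpoint with a JSON query body, which is why existing tooling like Kibana or Grafana keeps working. The sketch below just builds such a request; the host and index name are illustrative placeholders, not ChaosSearch specifics.

```python
import json

# Build a standard Elasticsearch-style search request. Any
# Elasticsearch-compatible backend accepts this shape of query.
# The host and index below are hypothetical placeholders.
host = "https://search.example.com"  # hypothetical endpoint
index = "app-logs"                   # hypothetical index name

query = {
    "size": 10,
    "query": {
        "bool": {
            "must": [{"match": {"level": "error"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
        }
    },
}

url = f"{host}/{index}/_search"
body = json.dumps(query)
print(url)
print(body)
# An HTTP client would POST `body` to `url` with
# Content-Type: application/json to run the search.
```

The point of the architecture Thomas describes is that this request shape stays the same while the storage underneath is plain S3.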
Now, you had to crack some code, solve some problems, invent some technology, and create some new architecture, but the idea that Elasticsearch is a useful solution—with all the tooling, the visualization, the wonderful community around it—was a good place to start. But here's the problem: setting it up, scaling it out, keeping it up when things go bump in the night. All those are real challenges, and one of them was just the storing of the data. Well, what if you could make S3 the back-end store? One hundred percent; no SSDs or HDDs. Makes a lot of sense.

And then support the APIs that your tooling uses. So, it just made a lot of sense for what we were trying to do; just no one thought of it. Now, if you think about the North Star you were talking about five, six years ago, when I said "transforming cloud storage into an analytical database for search and SQL," people thought that was crazy and mad. Well, now everyone's using cloud storage, everyone's using S3 as a data lake. That's not in question anymore.

But it was a question five, six years ago. So, when we met up, you were like, "Well, that makes sense." It always made sense, but people either didn't think it was possible, or they were worried: "I'll just try to set up an Elastic cluster and deal with it." Because that's what happens, particularly when you deal with large-scale implementations. So, you know, to us, we would love the Elastic API and the tooling around it, but what we all know is the cost, the time, and the complexity to manage it and scale it out just almost make you want to pull your hair out. And so, that's where we come in: don't change what you do, just change how you do it.

Corey: Every once in a while, I'll talk to a client who's running an Amazon Elasticsearch cluster, and they have nothing but good things to say about it. Which, awesome.
On the one hand, part of me wishes that I had some of their secrets, but often what's happened is that they have this down to a science, they have a data lifecycle that's clearly defined and implemented, the cluster is relatively static, so resizes aren't really a thing, and it just works for their use cases. And in those scenarios, like, “Do you care about the bill?” “Not overly. We don't have to think about it.”

Great. Then why change? If there's no pain, you're not going to sell someone something, especially when, we're talking, this tends to be relatively smaller-scale as well. It's okay, great, they're spending $5,000 a month on it. It doesn't necessarily justify the engineering effort to move off.

Now, when you start looking at this, and, “Huh, that's a quarter million bucks a month we're spending on this nonsense, and it goes down all the time,” yeah, that's when it starts to be one of those logical areas to start picking apart and diving into. What's also muddied the waters since the last time we really went in-depth on any of this was, it used to be we would be talking about it exactly like we are right now, about how it's Elasticsearch-compatible. Technically, these days, we probably should be saying it's OpenSearch-compatible because of the trademark issues between Elastic and AWS and the schism of the OpenSearch fork of the Elasticsearch project. And now it feels like when you start putting random words in front of the word search, ChaosSearch fits right in. It feels like your star is rising.

Thomas: Yeah, no, well said. I appreciate that. You know, it's funny, when Elastic changed their license, we all didn't know what was going to happen. We knew something was going to happen, but we didn't know what. And Amazon, I say ironically, or, more importantly, decided they'll take up the open mantle of keeping an open, free solution.

Now, obviously, they recommend running that in their cloud. Fair enough.
But I would say we don't hear as much Elastic replacement as much as OpenSearch replacement with our solution, because of all the benefits that we talked about. Because the trigger points for when folks have an issue with the OpenSearch or Elastic stack are: it got too expensive, or it was changing so much and it was falling over, or the complexity of the schema changing, or all of the above. The pipelines were complex, particularly at scale.

That's both for Elasticsearch as well as OpenSearch. And so, to us, we want either to win, but we want to be the replacement because, you know, at scale is where we shine. But we have seen a real trend where we see less Elasticsearch and more OpenSearch because the community is worried about the rules that were changed, right? You see it day in, day out, where you have a community that was built around open and fair and free, and because of business models not working, or the big bad so-and-so taking advantage of it, there's a license change. And that's a trust change.

And to us, we're following the OpenSearch path because it's still open. The 600-pound gorilla or 900-pound gorilla of Amazon. But they really held the mantle, saying, “We're going to stay open, we assume for as long as we know, and we'll follow that path.” But again, at that scale, the time, the costs, we're here to help solve those problems. Again, whether it's on Amazon or, you know, Google, et cetera.

Corey: I want to go back to what I mentioned at the start of this with the Wayback Machine and looking at how things wound up unfolding in the fullness of time. The first time that it snapshotted your site was way back in the year 2018, which—

Thomas: Nice.
[laugh].

Corey: Some of us may remember, and at that point, like, I wasn't doing any work with you, and later in time I would make fun of you folks for this, but back then your brand name was in all caps, so I would periodically say things like, this episode is sponsored by our friends at [loudly] CHAOSSEARCH.

Thomas: [laugh].

Corey: And once you stopped capitalizing it and that had faded from the common awareness, it just started to look like I had the inability to control the volume of my own voice. Which, fair, but generally not mid-sentence. So, I remember those early days, but the positioning of it was, “The future of log management and analytics,” back in 2018. Skipping forward a year later, you changed this because apparently in 2019, the future was already here. And you were talking about, “Log search analytics, purpose-built for Amazon S3. Store everything, ask anything, all on your Amazon S3.”

Which is awesome. You were still—unfortunately—going by the all-caps thing, but by 2020, that wound up changing somewhat significantly. You were at that point talking about it as, “The data platform for scalable log analytics.” Okay, it's clearly heading in a log direction, and that made a whole bunch of sense. And now today, you are, “The data lake platform for analytics at scale.” So, good for you, first off. You found a voice?

Thomas: [laugh]. Well, you know, it's funny, as a product-minded person—I'll take my marketing hat off—we've been building the same solution with the same value points and benefits as we mentioned earlier, but the market resonates with different terminology. When we said something like, “Transforming your cloud object storage like S3 into an analytical database,” people were just, like, blown away. Is that even possible? Right? And so, that got some eyes.

Corey: Oh, anything is a database if you hold it wrong. Absolutely.

Thomas: [laugh]. Yeah, yeah. And then, you're saying, log analytics really resonated for a few years.
Data platform, you know, is broader because we do broader things. And now we see, over the last few years, observability, right? How do you fit in the observability viewpoint, the stack, where log analytics is one aspect of it?

Some of our customers use Grafana on us for that lens, and then for the analysis, alerting, dashboarding. You can say Kibana covers the hunting aspect, the log aspects. So, you know, to us, we're going to put a message out there that resonates with what we're hearing from our customers. For instance, we hear things like, “I need a security data lake. I need that. I need to stream all my data. I need to have all the data, because what happens today, I may need to know about a week, two weeks, 90 days from now.”

We constantly hear, “I need at least 90 days of forensics on that data.” And it happens time and time again. We hear in the observability stack where, “Hey, I love Datadog, but I can't afford it more than a week or two.” Well, that's where we come in. And we either replace Datadog for the use cases that we support, or we're auxiliary to it.

Sometimes we have an existing Grafana implementation, and then they store data in us for the long tail. That could be the scenario. So, to us, the message is around what resonates with our customers, but in the end, it's operational data. Whether you want to call it observability, log analytics, security analytics, or the data lake, to us, it's just access to your data, all your data, all the time, and supporting the APIs and the tooling that you're using. And so, to me, it's the same product, but the market changes with messaging and requirements. And this is why we always felt that having a search and SQL platform is so key, because what you'll see in Elastic or OpenSearch is, “Well, I only support the Elastic API. I can't do correlations. I can't do this. I can't do that. I'm going to move it over to, say, maybe Athena, but not so much.
Maybe a Snowflake or something else.”

Corey: “Well, Thomas, it's very simple. Once you learn our own purpose-built, domain-specific language, specifically for our product—well, why are you still sitting here? Go learn that thing.” People aren't going to do that.

Thomas: And that's what we hear. It was funny, I won't say what the company was, a big banking company that we're talking to, and we hear time and time again, “I only want to do it via the Elastic tooling,” or, “I only want to do it via the BI tooling.” I hear it time and time again. Both of these people are in the same company.

Corey: And that's legitimate as well, because there's a bunch of pre-existing processes pointing at things, and we're not going to change 200 different applications and their data model just because you want to replace a back-end system. I also want to correct myself. I was one tab behind. This year's branding is slightly different: “Search and analyze unlimited log data in your cloud object storage.” Which is, I really like the evolution on this.

Thomas: Yeah, yeah. And I love it. And what was interesting is the moving, the setting up, the doubling of your costs. Let's say you have—I mean, we deal with some big customers that have petabytes of data—doubling your petabytes: that means if your Elastic environment is costing you tens of millions, and then you put it into Snowflake, that's also going to be tens of millions. And with a solution like ours, you have really cost-effective storage, right? Your cloud storage: it's secure, it's reliable, it's elastic, and you attach Chaos to get the well-known APIs that your well-known tooling can analyze.

So, to us, our evolution has really been toward that end viewpoint we started with early, where the search and SQL live together—and, you know, in the future, we'll be coming out with more ML-type tooling—but we have two sides: we have the operational, security, observability. And a lot of the business side wants access to that data as well.
Maybe it's app data that they need to do analysis on for their shopping cart website, for instance.

Corey: The thing that I find curious is, the entire space has been iterating forward on trying to define observability, generally, as whatever people are already trying to sell, in many cases. And that has seemed to be a bit of a stumbling block for a lot of folks. I figured this out somewhat recently because I've built the—free for everyone to use—lasttweetinaws.com, Twitter threading client.

That's deployed to 20 different AWS regions because—the idea is that it should be snappy for people no matter where they happen to be on the planet, and I use it for conferences when I travel, so great, let's get ahead of it. But that also means I've got 20 different sources of logs. And given that it's an omnibus Lambda function, it's very hard to correlate that to users, or user sessions, or even figure out where it's going. The problem I've had is, “Oh, well, this seems like something I could instrument to spray logs somewhere pretty easily, but I don't want to instrument it for 15 different observability vendors. Why don't I just use otel—or OpenTelemetry—and then tell that to throw whatever I care about to various vendors and do a bit of a bake-off?” The problem, of course, is that OpenTelemetry and Lambda seem to be going in just the absolute wrong directions. A lot.

Thomas: So, we see the same trend of otel coming out, and, you know, this is another API that I'm sure we're going to go all-in on because it's getting more and more talked about. I won't say it's the standard, but I think it's trending that way, to all your points about needing to normalize a process. But as you mentioned, we also need to correlate across the data. And this is where, you know, there are times where search and hunting and alerting is awesome and wonderful and solves all your needs, and sometimes correlation.
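The correlation Thomas is pointing at is, at bottom, a join across log sources on a shared key. A toy sketch in SQLite (table, column, and value names are invented for illustration; in practice this would run through a SQL API against the data lake rather than an in-memory database):

```python
import sqlite3

# Two toy "log tables" standing in for separate log sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_logs (request_id TEXT, message TEXT);
CREATE TABLE audit_logs (request_id TEXT, action TEXT);
INSERT INTO app_logs VALUES ('r1', 'checkout failed'), ('r2', 'checkout ok');
INSERT INTO audit_logs VALUES ('r1', 'payment declined'), ('r2', 'payment approved');
""")

# The correlation: join one source to another on a shared key,
# instead of denormalizing everything into one table up front.
rows = conn.execute("""
SELECT a.request_id, a.message, b.action
FROM app_logs a
JOIN audit_logs b ON a.request_id = b.request_id
WHERE a.message LIKE '%failed%'
""").fetchall()

print(rows)
```

The point of the sketch is the query shape, not the engine: the same join is what a search-only API cannot express, and what a denormalized pipeline pre-computes at much greater cost.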
Imagine trying to denormalize all those logs, set up a pipeline, put it into some database, or just do a SELECT *, you know, join this to that to that, and get your answers.

And so, I think both OpenTelemetry and SQL and search all need to be played into one solution, or at least one capability, because if you're not doing that, you're creating some hodgepodge pipeline to move it around and ultimately get your questions answered. And if it takes weeks—maybe even months, depending on the scale—you may sometimes not choose to do it.

Corey: One other aspect that has always annoyed me about more or less every analytics company out there—and you folks are no exception to this—is the idea of charging per gigabyte ingested, because that inherently sets up a weird dichotomy of, well, this is costing a lot, so I should strive to log less. And that is sort of the exact opposite, not just of the direction you folks want customers to go in, but also of where customers themselves should be going. Where you diverge from an awful lot of those other companies, because of the nature of how you work, is that you don't charge them again for retention. And the fact that anything stored in ChaosSearch lives in your own S3 buckets, where you can set your own lifecycle policies and do whatever you want to do with that, is a phenomenal benefit, just because I've always had a dim view of short-lived retention periods around logs, especially around things like audit logs. And these days, getting rid of audit logging data and application logging data—especially if there's a correlation story—any sooner than three years feels like borderline malpractice.

Thomas: [laugh]. We—how many times—I mean, we've heard it time and time again: “I don't have access to that data because it was too costly.” No one says they don't want the data. They just can't afford the data.
And one of the key premises is that if you don't have all the data, you're at risk, particularly in security—I mean, even audits. I mean, so many times our customers ask us, you know, “Hey, what was this going on? What was that going on?” And because we can so cost-effectively monitor our own service, we can provide that information for them. And we hear this time and time again.

And retention is not a very sexy aspect, but it's so crucial. Anytime you look into problems with X solution or Y solution, it's the cost of the data. And this is something that we wanted to address officially. And the reason we made it so cost-effective, and free after you ingest it, was because we were using cloud storage. And it was just a great place to land the data cost-effectively, securely.

Now, with that said, there are two types of companies I've seen. Everybody needs at least 90 days; I see it time and time again. Sure, maybe daily or within a week they do a lot of their operations, but 90 days is where it lands. But there's also a bunch of companies that need it for years, for compliance, for audit reasons.

And imagine trying to rehydrate, trying to rebuild—we have one customer—again, I won't say who—that has two petabytes of data that they rehydrate when they need it. And they say it's a nightmare. And it's growing. What if you just had it always alive, always accessible? Now, as we move from search to SQL, there are use cases where, in the log world, they just want to pay upfront, fixed fee, this many dollars per terabyte, but as we get into the more ad hoc side of it, more and more folks are asking, “Can I pay per query?”

And so, you'll see, coming out soon, scenarios where we have a different pricing model. For logs, you typically want a very consistent, predetermined cost structure, but in the case of more security data lakes, where you want to go into the past and not really pay for something until you use it, that's going to be an option as well, coming out soon.
So, I would say you need both in the pricing models, but you need the data to have either side, right?

Corey: This episode is sponsored in part by our friends at ChaosSearch. You could run Elasticsearch or Elastic Cloud—or OpenSearch, as they're calling it now—or a self-hosted ELK stack. But why? ChaosSearch gives you the same API you've come to know and tolerate, along with unlimited data retention and no data movement. Just throw your data into S3 and proceed from there as you would expect. This is great for IT operations folks, for app performance monitoring, cybersecurity. If you're using Elasticsearch, consider not running Elasticsearch. They're also available now in the AWS Marketplace if you'd prefer not to go direct and have half of whatever you pay them count towards your EDP commitment. Discover what companies like Equifax, Armor Security, and Blackboard already have. To learn more, visit chaossearch.io and tell them I sent you just so you can see them facepalm, yet again.

Corey: You'd like to hope. I mean, you could always theoretically wind up just pulling what Ubiquiti apparently did—where this came out in an indictment that was unsealed against an insider—but apparently one of their employees wound up attempting to extort them—which, again, that's not their fault, to be clear—but what came out was that this person then wound up setting the CloudTrail audit log retention to one day, so there were no logs available. And then, as a customer, I got an email from them saying there was no evidence that any customer data had been accessed. I mean, yeah, if you want, like, the world's most horrifyingly devilish best practice, go ahead and set your log retention to nothing, and then you too can confidently state that you have no evidence of anything untoward happening.

Contrast this with what AWS did when there was a vulnerability reported in AWS Glue.
Their analysis of it stated explicitly, “We have looked at our audit logs going back to the launch of the service and have conclusively proven that the only time this has ever happened was in the security researcher who reported the vulnerability to us, in their own account.” Yeah, one of those statements breeds an awful lot of confidence. The other one makes me think that you're basically being run by clowns.

Thomas: You know what? CloudTrail is such a crucial service—particularly on Amazon, right—because of that. We see it time and time again. And the challenge of CloudTrail is that storing it for a long period of time is costly, and the messiness, the JSON complexity—every company struggles with it. And this is how uniquely—how we represent information, we can model it in all its permutations—but the key thing is we can store it forever, or you can store it forever. And time and time again, CloudTrail is a key aspect to correlate—to your question—correlate this happened to that. Or do an audit on, two years ago, this happened.

And I've got to tell you, to all our listeners out there, please store your CloudTrail data—ideally in ChaosSearch—because you're going to need it. Everyone always needs that. And I know it's hard. CloudTrail data is messy, nested JSON data that can explode; I get it. You know, there's tricks to do it manually, although quite painful. But CloudTrail—every one of our customers is indexing CloudTrail with us because of stories like that, as well as the correlation across what maybe their application log data is saying.

Corey: I really have never regretted having extra logs lying around, especially with, to be very direct, the almost ridiculously inexpensive storage classes that S3 offers, especially since you can wind up having some of the offline retrieval stuff as part of a lifecycle policy now with intelligent tiering.
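The lifecycle mechanics Corey refers to are a one-time bucket configuration. A rule along these lines (the prefix and rule ID are made up for illustration) ages log objects into Glacier and then Glacier Deep Archive instead of deleting them:

```json
{
  "Rules": [
    {
      "ID": "archive-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```

This is the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts; the same rule can be set in the S3 console or via the CLI. Once it is in place, retention costs drop automatically as objects age, with no pipeline to maintain.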
I'm a big believer in just—again—the Glacier Deep Archive tier, at the cost of $1,000 a month per petabyte, with admittedly up to 12 hours of retrieval latency. But that's still, for audit logs and stuff like that, why would I ever want to delete things ever again?

Thomas: You're exactly right. And we have a bunch of customers that do exactly that. And we automate the entire process with you. Obviously, it's your S3 account, but we can manage across those tiers. And it's just to a point where, why wouldn't you? It's so cost-effective.

And the moments where you don't have that information, you're at risk. Whether it's internal audits, or you're providing a service for somebody, it's critical data. With CloudTrail, it's critical data. And if you're not storing it, and if you're not making it accessible through some tool like an Elastic API or Chaos, it's not worth it. I think, to your point about your story, it's epically not worth it.

Corey: It's really not. It's one of those areas where that is not a place to overly cost-optimize. This is—I mean, we talked earlier about my business and perceptions of conflict of interest. There's a reason that I only ever charge fixed-fee and not percentage of savings or whatnot, because at some point, I'll be placed in a position of having to say nonsense like, “Do you really need all of these backups?” That doesn't make sense at that point.

I do point out things like, you have hourly disk snapshots of your entire web fleet, which has no irreplaceable data on it, dating back five years. Maybe cleaning some of that up might be the right answer. The happy answer is somewhere in between those two, and it's a business decision around exactly where that line lies. But I'm a believer in never regretting having kept logs almost in perpetuity. Until and unless I start getting more or less pillaged by some particularly rapacious vendor that's, oh yeah, we're going to charge you not just for ingest, but also for retention.
And for how long you want to keep it, we're going to treat it like we're carving it into platinum tablets. No. Stop that.

Thomas: [laugh]. Well, you know, it's funny, when we first came out, we were hearing stories that vendors were telling customers why they didn't need their data, to your point, like, “Oh, you don't need that,” or, “Don't worry about that.” And time and time again, they said, “Well, turns out we did need that.” You know, “Oh, don't index all your data because you just know what you know.” And the problem is that life doesn't work out that way; business doesn't work out that way.

And now what I see in the market is everyone's got tiering scenarios, but that data takes some time to get access to. And these are all workarounds and band-aids to what fundamentally is: if you design an architecture and a solution in such a way, maybe it's just always hot; maybe it's just always available. Now, we talked about tiering off to something very, very cheap, where it's like virtually free. But you know, their solution was, whether it's UltraWarm or this tiering that takes hours to rehydrate—hours—no one wants to live in that world, right? They just want to say, “Hey, on this date, in this year, what was happening? And let me go look, and I want to do it now.”

And it has to be part of the exact same system that I was using already. I didn't have to call up IT to say, “Hey, can you rehydrate this?” Or, “Can I go back to the archive and look at it?” Although, I guess we're talking about archiving with your website, viewing from days of old, I think that's kind of funny. I should do that more often myself.

Corey: I really wish that more companies would put themselves in the customers' shoes. And for what it's worth, periodically, I've spoken to a number of very happy ChaosSearch customers. I haven't spoken to any angry ones yet, which tells me you're either terrific at crisis comms, or the product itself functions as intended.
So, either way, excellent job. Now, which team of yours is doing that excellent job, of course, is going to depend on which one of those outcomes it is. But I'm pretty good at ferreting out stories on those things.

Thomas: Well, you know, it's funny, being a company that's driven by customer asks, it's so easy to build what the customer wants. And so, we really take every input of what the customer needs and wants. Now, there are cases where we replace Splunk. They're the Cadillac; they have all the bells and whistles. And there's times where we'll say, “Listen, that's not what we're going to do. We're going to solve these problems in this vector.” But they always keep on asking, right? You know, “I want this, I want that.”

But most of the feedback we get is exactly what we should be building. People need their answers, and how they get them. It's really helped us grow as a company, grow as a product. And I will say, ever since we went live now many, many years ago, all our roadmap—other than our North Star of transforming cloud storage into a search and SQL big data analytics database—has been customer-driven, market- and customer-driven: what our customers are asking for, whether it's observability and integrating with Grafana and Kibana, or, you know, security data lakes. It's just a huge theme that we're going to make sure we provide a solution that meets those needs.

So, I love when customers ask for stuff because the product just gets better. I mean, yeah, sometimes you have to have a thick skin, like, “Why don't you have this?” Or, “Why don't you have that?” Or we have customers—and not to complain about customers; I love our customers—but they sometimes do crazy things that we have to help them un-crazy-ify. [laugh]. I'll leave it at that. But customers do silly things and you have to help them out. I hope they remember that, so when they ask for a feature that maybe takes a month to make available, they're patient with us.

Corey: We sure can hope.
I really want to thank you for taking so much time to once again suffer all of my criticisms, slings and arrows, blithe market observations, et cetera, et cetera. If people want to learn more, where's the best place to find you?

Thomas: Well, of course, chaossearch.io. There's tons of material about what we do: use cases, case studies; we just published a big case study with Equifax recently. We're in Gartner and a whole bunch of Hype Cycles that you can pull down to see how we fit in the market.

Reach out to us. You can set up a trial, kick the tires, again, on your cloud storage like S3. And ChaosSearch is on Twitter; we have a Facebook; we have all the classic social medias. But our website is really where all the good content is, whether you want to learn about the architecture and how we've done it, or use cases, for people who want to say, “Hey, I have a problem. How do you solve it? How do I learn more?”

Corey: And we will, of course, put links to that in the show notes. For my own purposes, you could also just search for the term ChaosSearch in your email inbox and find one of their sponsored ads in my newsletter and click that link, but that's a little self-serving as we do it. I'm kidding. I'm kidding. There's no need to do that. That is not how we ever evaluate these things. But it is funny to tell that story. Thomas, thank you so much for your time. As always, it's appreciated.

Thomas: Corey Quinn, I truly enjoyed this time. And I look forward to the upcoming re:Invent. I'm assuming it's going to be live like last year, and this is where we have a lot of fun with the community.

Corey: Oh, I have no doubt that we're about to go through that particular path very soon. Thank you. It's been an absolute pleasure.

Thomas: Thank you.

Corey: Thomas Hazel, CTO and Founder of ChaosSearch. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that I will then set to have a retention period of one day, and then go on to claim that I have received no negative feedback.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Unseen Impact of Cloud Migration with Donovan Brady
Sep 29, 2022 35:13
About Donovan

Donovan Brady is the Director of Solutions Architecture at Logicworks. He began his career at Logicworks six years ago as a Solutions Architect; fast-forward to today, Donovan now manages a team of highly skilled and certified AWS and Azure Solutions Architects. During his time at Logicworks, Donovan has had the opportunity to work with companies in a variety of verticals to solve their most complex IT and business challenges. Donovan is originally from New York and has been a professional musician since the age of six. He is also a self-proclaimed '90s video game nerd.

Links Referenced:

Logicworks: https://www.logicworks.com/
Donovan's LinkedIn: https://www.linkedin.com/in/donovan-brady-9403a583/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest on this promoted episode of Screaming in the Cloud is Donovan Brady, director of solutions architecture at Logicworks, and something of a kindred spirit in that he also tends to focus on something that I more or less spend my time being obnoxious about on Twitter, which is, in many cases, going to cloud for the right reasons, with an outcome in mind, which is rarely having the most interesting and clever technical stack imaginable. Donovan, thank you for joining me today.

Donovan: Yeah, thanks for having me, Corey. Really excited for the conversation and looking forward to getting into it.

Corey: Let's start by establishing the bona fides, for lack of a better term here.
What does Logicworks do that they require a director of solutions architecture?

Donovan: Logicworks is a managed services and professional services cloud provider specializing in hyper-compliant workloads migrating, optimizing, and operating in the cloud, right? That means that we work primarily with Software-as-a-Service companies, or HealthTech or FinTech companies, with compliances like PCI, HIPAA, HITRUST, SOC, ISO, GDPR, pretty much you name it, and we help customers in their cloud journey operate and optimize in the cloud.

Corey: It's weird. When you talk about optimizing in cloud, people always hear that as, “Oh, we're going to fix the bill, we're going to—” because, again, that is the context in which I operate, where, “Oh, we're going to optimize your cloud bill.” Which makes sense, but I think that people have lost sight of the forest for the trees in many respects, where when they hear, “I'm going to optimize your bill,” that often comes across as, “Oh, we're going to make it smaller,” because generally, it's not very well optimized, and one of those optimal things you can do is turn something that's unneeded or unnecessary off, and surprise, there's a side effect of saving money. But it often means that in some cases, it's time to start spending more money on things like, oh, I don't know, backups and resiliency, and figuring out what it is that the business is aligned around. Because you can always cut the bill to zero by turning everything off. It seems like that's not really the alignment that—or the reason that—companies go to cloud in the first place. So, what's your take on that? Why do most companies say, “Ah. We have a problem, and we're going to go with cloud,” in the hopes that it fixes that problem?

Donovan: Yeah, it's an interesting point.
So, a lot of times we hear customers say exactly what you just mentioned: “We want to move to cloud so that we can save costs,” or, “Oh, we're in the cloud and we're primarily concerned about costs.” Unfortunately, that's the wrong motivation. Saving cost is definitely a byproduct of moving to the cloud if you do it the right way, but the primary business reasons why an organization would want to move to the cloud are slightly different, right? The business objectives most companies care about are increasing their agility, increasing their profitability, and decreasing their time-to-revenue.

Let's say—as I mentioned, I work with a lot of Software-as-a-Service companies—they have a monolithic application right now that takes a long time to update, a long time to patch. If you can modernize that application, if you can use some of the more cutting-edge or bleeding-edge tools provided in the public cloud, you'd be able to significantly decrease that timetable for deploying new updates, acquiring new companies, decreasing your time to revenue. Those are the primary business drivers that customers should be focused on. And then when you're optimizing in the cloud, you're really looking at the five categories of the Well-Architected Framework, which, as you mentioned, isn't just cost. There's security, there's reliability, there's performance, and there's operations, and all of these are kind of intertwined and can eventually lead to a decrease in [laugh] costs, right? But if you were to just lift-and-shift into the cloud, you're probably not going to save that much money.

Corey: Let's not forget the sixth pillar of the Well-Architected Framework, which was recently added, which is sustainability. Unfortunately, it does seem a little true to life where it seems like it's been bolted on after the fact. I would have expected that to also be the security pillar, but that's a little sensitive for some folks.
It's one of those areas where sustainability and cost optimization tend to go hand-in-glove because turning something off benefits everyone except the cloud providers hoping that you don't turn things off in some cases. I don't necessarily believe that's where the major hyperscalers are sitting today, but there's no denying that they do benefit from things sitting there going unused, as do most companies that charge money for a thing that they provide you.Donovan: Yeah, that's exactly true. There are some other tools that make it easier to save costs and still achieve your expected goals, and that's the more of those more cutting-edge technologies like a serverless deployment, right? A Lambda function is a point-in-time deployment of your code without needing to rely on an ongoing virtual machine, or database, or whatever it is that is running that application, and it runs for just a couple minutes, couple seconds, however long you need it. And I was actually just speaking with a customer recently, we did this CXO dinner, and we were talking about the benefits of the cloud. It was almost as if we planted him there.We didn't; he randomly showed up. But he was also promoting the cloud because he said he runs his entire organization almost in the free tier of AWS because the entire thing is serverless-based, you know? So, there are definitely ways to optimize your costs and therefore sustainability, as you mentioned: turning things off or maybe not have them permanently running in the first place. But you can only achieve that once you've actually done the shift after the lift.Corey: The website that hosts this podcast is lastweekinaws.com. The first version of that that I put up was entirely Lambda and S3-driven. It was traditionally serverless; it cost pennies a month to run.And I've migrated it a few years ago to where it currently is, living on WordPress at WP Engine. 
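The Lambda model Donovan describes above — code that exists only for the length of a single invocation, with no standing virtual machine behind it — boils down to a handler like the sketch below. The event shape and greeting are invented for illustration, not anything from the episode:

```python
# Minimal AWS-Lambda-style handler. Lambda calls a function like this once
# per invocation and tears everything down afterward; there is no
# always-on server to pay for between calls.
# The "name" key in the event is a made-up example field.
def handler(event, context=None):
    name = event.get("name", "world")
    # Shape mirrors an API-Gateway-proxy response: status code plus body.
    return {"statusCode": 200, "body": f"Hello, {name}!"}


# Locally, you can exercise it the same way Lambda would:
response = handler({"name": "Corey"})
print(response["body"])  # Hello, Corey!
```

Because billing stops the moment the function returns, a low-traffic workload like the one Donovan's dinner guest described can sit almost entirely inside the free tier.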
Now, a lot of the technology purists will look at that and say that I went in the exact wrong direction. Why would I ever do that? Well, because things like integrating a podcast feed into the website are, grab a plugin; call it good. I can find a universe of people who are better at working with WordPress from a development perspective than I am as opposed to building something myself out of basically popsicle sticks and string and spending all of my time maintaining that.Like, this website does not directly bring in any revenue to the business. It is ancillary. I need to have it in order to empower what the business does, but it is not a core competency of what we do, so outsourcing that to someone who does specialize in this makes an awful lot of business sense. And very often I'll see people who are missing the point in cloud when it comes to losing sight of what the actual business objective is.Donovan: Absolutely. Oftentimes, we hear the primary business goal is, “We want to decrease costs.” They hear for a number of reasons, “We can decrease costs.” “Oh, we can cut some of our operations teams because the cloud just magically runs.” You know, it doesn't exactly work that way. But they're missing sight of the actual business drivers that can help them grow the business, right?What I like to say is that the cloud is not just a data center to host your infrastructure or host your applications; it's actually a platform to grow your business. And because of these more streamlined and automated tools that AWS, Azure, Google, and these other hyperscaler clouds provide you, you can actually hit an immense amount of profitability by just leveraging these tools, but it requires you to transform your business. And that is what cloud adoption really is. Many people think that cloud adoption is a project, “Oh, I migrated to the cloud,” and then you leave it alone. Right?But if you do that you're subject to a lot of issues. 
Back to that Well-Architected Framework that we said, if you don't have any guardrails in place to make sure that you have the proper security posture or the proper high availability or reliability concerns, right, it leads to cloud sprawl. Now, cloud sprawl is when an environment lacks the necessary guardrails and governance to limit the deployment of resources, right? This has a number of impacts, including a larger attack surface—that's primarily security—there are tons of resources that can get deployed—there's a lot of cost there—and then a lot of management overhead, you know? So, this means that cloud adoption is not a point-in-time project; it's really a process; it's a methodology; it's an ongoing, continuous process, to make sure that you are cost-optimized, to make sure that you're reliable, to make sure that you're secure.And if you can maintain an optimized environment that's well-architected, you will inevitably grow your business because as we mentioned, it's going to lead to better innovation, more agile development teams that can go to market quicker, increasing your overall revenue.Corey: I think that people like to lose sight of the fact that in almost every case, unless you're doing something absolutely bizarre, payroll is going to cost more than your cloud bill. Now, please don't take that as a personal challenge if you're listening to this. The goal is not to run up the score and see how high you can get just for funsies. Although I do admit, I play that game occasionally with, you know, accounts that are not my own. But in practice, for an easy example, on this, people like to turn up their nose at RDS sometimes because well, that charges a lot of money to run MySQL and I can run MySQL on top of EC2 instances.Yes, you can, but what is your time worth comparatively? 
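Corey's "what is your time worth" question can be made concrete with a toy break-even calculation. Every dollar figure below is invented for illustration; real RDS, EC2, and payroll numbers vary wildly:

```python
# Toy break-even: self-managing MySQL on EC2 only beats the managed
# service if the engineering time it consumes stays under this bound.
# All dollar figures are hypothetical.
rds_monthly = 600      # managed MySQL, $/month (made-up)
ec2_monthly = 350      # equivalent self-managed instances, $/month (made-up)
engineer_hourly = 120  # loaded engineer cost, $/hour (made-up)

monthly_savings = rds_monthly - ec2_monthly
break_even_hours = monthly_savings / engineer_hourly
print(f"Self-managing wins only if patching, backups, and failover "
      f"testing cost under {break_even_hours:.1f} engineer-hours/month")
```

With these made-up numbers the sticker savings evaporate after roughly two engineer-hours a month, which is why the calculus usually only flips back at serious scale.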
And at certain points of scale, when you're making extraordinary demands on it, the economics do come back around where yeah, you probably should be running it on EC2 instead of as a managed service, but so much of that is going to be contextual. It's basically impossible to look at an AWS estate—or any cloud estate for that matter—and unequivocally say, “Oh, this is a bad design. This is something that should not be because of X, Y, and Z,” because invariably, there's context that you're missing. I look at cloud accounts all the time where I could, in a vacuum, go ahead and optimize the living hell out of it, except there are reasons that things are the way they are, and the best way to look like a junior consultant is to show up and start throwing shade without understanding why things are the way they are.Donovan: Exactly. And I think the number one issue that people run into, that organizations run into post-migration, is being able to track or tie back their migration and their optimization efforts to the business drivers. Logicworks actually ran an anonymous campaign, it was a survey. We hired some consultants, and some of the information that we found was absolutely shocking. They said that about 63% of organizations that have migrated to the cloud don't actually understand or see the value of the cloud.Now, again, if you're listening to this, that doesn't mean that there isn't the value; it means that these organizations haven't been able to track that, they haven't been able to identify, okay, what are the agility metrics? What is the intended time to market? How long do we want to go with patches? Is this going to be a 24-hour timetable or is it going to be a week-long timetable? How many pushes are we making a day?And then you can actually track that to the revenue that's been generated? And then you can understand. But that's a lot of work, and people are mostly concerned about the hard details of the technical. You know, “Okay, just put me in there. 
Let's see how it goes. We got to get out of our data center for one reason or another,” but they missed the overall idea of this adoption that we're referencing here.Corey: On some level, it feels like the real business value of a cloud migration has little to do with the cloud itself and everything to do with let's get out of the environment we're currently in and into a new environment because that turns over a whole bunch of rocks and lets you finally sunset some things that really should have been turned off decades ago. I don't know if that's too cynical of a take, but I can't shake the feeling there's some validity to it somewhere.Donovan: [laugh]. Yeah, there definitely is some validity to that. We've worked with a number of customers that we've migrated, and one perfect example. Last year, we were working with this disaster cleanup organization. They are one of the largest in the country. They have 1700 franchises and they needed to get out of their primary data center because there were a ton of issues.There was actually a case where a squirrel had chewed through one of their power cables to the data center and shut off their air conditioning, so their data center was overheating. We were talking to them about the entire scope that they understood the migration to include, it had about 100 distinct applications and about maybe 300 to 400 virtual machines. Once we actually got into the assessment, there were over 700 virtual machines and 320 distinct applications. There's a mixture of commercial off-the-shelf software and homegrown applications that they've been making themselves, but they didn't understand that they had over 60% of the IT estate that was just residing in that data center. Because sometimes it's difficult to keep track of all of that.That's one of the things that cloud helps with. Cloud provides a ton of management and visibility and observability and traceability tools, but again, they need to be enabled, you know?
And I think when companies are concerned about migrating and they're just concerned about the re-hosting or the re-platforming of their applications, the management of this as an afterthought, and the actual, “How does this change our business,” is kind of a, “Oh man,” question mark in their head because that wasn't something that was considered. And then that's usually when we get the call.Corey: It's always fun when the bat phone goes off because people are generally not calling because, “Hey, things are great here. We just really wanted to boast about it some.” It turns into a, “Oh crap, we have a problem.” That honestly is one of my favorite parts of consulting is when you wind up being able to solve what feels like a monumental problem for someone, and sure, it's relatively easy for you—presumably because when you do the same thing again and again and become a subject matter expert in it, it's just a question of style more than how do I solve this intractable problem. But it's nice to be able to walk away with a client saying, “That was awesome. I wish we could do it again.”Donovan: Yeah.Corey: Helping people is really what the game is about. I think that some folks tend to lose sight of that. And again, consulting doesn't have the best reputation in the world, due to some of the larger shops, in some cases, doing what presents as, “Oh, I don't know how to fix this, but I'm going to show up and prolong the problem forever.” I don't see that that is true consulting in the traditional sense, but maybe I'm just playing games with words.Donovan: No, I agree with that. I think one of my favorite parts of consulting is taking a step back and evaluating the entire landscape of what the problem is that we're trying to solve, right? Oftentimes, as you are aware, somebody comes to you and they say, “This is the problem and this is what I need you to do to fix it.” Eight out of ten times, the solution that they've come up with probably isn't the right solution. 
That will fix a symptom, right, but it's not going to fix the actual cause or the biggest issue that's plaguing them, right?They have this continuous issue and they know that that's going to cause them heartache and we need somebody to just do this, we don't have time to do this. But if you peel back the onion and understand why you're having that issue, that's where I think a better consulting company would come in and help them discern.Corey: There's value in perspective. Very often, when you are the client organization, you're too close to the problem, and/or you are extraordinarily familiar with your own context, but you're missing some of the larger context in the greater ecosystem. I mean, one of the ways that I tend to look like a wizard from the future when I see something odd on their AWS bill and can just call it out, like, “Oh yeah, that's this weird side effect of when you wind up having additional CloudTrail Management Events, yadda, yadda, yadda, yadda,” and they look at me like I'm a wizard from the future because I can just bust that out off the top of my head. Yeah, but the reason I can do that is because these things repeat themselves, these patterns continue to emerge, and the first time I saw that, it took me two weeks to get to the bottom of it. Now, when I see the symptoms of it, it's oh, yeah, it's that thing again.And I feel like consulting is just a collection of stories like that again, and again, and again. And again, part of the trick is you don't let the client ever see you sweat or do the research. You just, “All right, my next availability is in two weeks. I have a thing coming up.” And that thing, as it turns out, is spending two weeks of deep-dive research. But at least in my case I'd never charge by the hour, so it's all about being mysterious and perceived as being good at these things, at least back in the early days. Then for my sins, I actually became good at it. Oops.Donovan: [laugh]. Doesn't that always seem to happen? 
It's just like, “Oh, I didn't expect to be an expert in this,” and then I don't know where one day you're like, “Oh, I guess I am the expert.”Corey: Yeah. There's also value in being an outside voice where you're not beholden to individual stakeholders of, “Oh, yeah, we're never allowed to wind up talking about that one system because someone went empire-building and they're powerful here and you can't ever talk about that thing in any way that doesn't lead to more headcount and the rest.” Awesome. I don't have the energy or time for those things, and to be direct, I'm not very good at it, as an employee. I can mind my manners for the duration of a relatively short consulting engagement just fine, but I'm also not necessarily there to look at things that are clearly suboptimal and say, “Oh, yeah, this is the way that it should be.”But I'm also not the type to come in and say, “Well, why didn't you build this in Lambda?” “Well, genius, because when this thing was built, Lambda didn't exist, for one,” is a perfectly valid answer to that. And why would you go back and refactor something that's already doing its job, unless your primary business objective is to bolster your own resume? Which I would suggest it probably shouldn't be.Donovan: Yeah, exactly. And that's where that outside perspective comes in because there are seven Rs of migration, right? Used to be six Rs; now there's seven Rs of migration, and you need to continuously reevaluate all of your application portfolio and see which of the seven Rs is most applicable to you, right? This is why that cloud adoption idea is an ongoing process. Maybe you're beholden to some legacy applications that really did serve their purpose when you were first migrating and there was nothing wrong with them, but now a few years later, it's time to take a deeper look at all of your applications. Maybe we don't need that application anymore. Maybe we can repurchase that.
Maybe there's a SaaS version of that, that we can go leverage now.Or maybe it's completely unrelated. You know, I was working with a prospect recently, who came to us—they are currently hosted in AWS—they came to us by way of Microsoft because we're both an AWS and an Azure Partner, and Microsoft said, “Hey, they have some concerns. They wanted us to do a little assessment and see what's going on.” We talked to them, and they said that they were looking to migrate away from AWS and into Azure because they wanted better pricing—Corey: Oh, jeez.Donovan: —and that is—exactly [laugh]. And that is the exact reason why you don't migrate from one to the other because you're setting yourself up for failure, right? It's not about the cost; it's not about the sticker price, the retail price. At the end of the day, all the cloud providers are going to be plus or minus a 2% difference. It's, how did you architect this environment?And when we started peeling back that onion, we started realizing they didn't really have any guardrails or governance, and they started experiencing some of this cloud sprawl. And they expected that by transitioning cloud providers, they would have solved this magic wand solution, and now they're just going to have better costs without putting any processes or frameworks in place to manage the environment to ensure that doesn't happen again.Corey: Let's take it a step further. If you're migrating to cloud to save money, you are probably not going to achieve any cost savings within five years at the soonest. That ignores as well the opportunity cost of all that energy that could be spent on other projects. If you're moving to the cloud, it has to be based on a capability story, not because, “Oh, we're going to save money by moving to the cloud,” in almost every case. 
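The "guardrails or governance" whose absence Donovan keeps pointing to can take a form as simple as a required-tags policy enforced against the resource inventory. This is a toy sketch over plain dicts; a real check would read from a cloud inventory API, and the tag names are invented:

```python
# Toy tagging guardrail against cloud sprawl: every resource must carry
# the tags the governance policy demands, or it gets flagged for review.
REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical policy

def find_untagged(resources):
    """Return ids of resources missing any required tag key."""
    return [
        r["id"]
        for r in resources
        if not REQUIRED_TAGS.issubset(r.get("tags", {}))
    ]

inventory = [
    {"id": "i-001", "tags": {"owner": "web", "cost-center": "42"}},
    {"id": "i-002", "tags": {"owner": "batch"}},  # missing cost-center
    {"id": "i-003", "tags": {}},                  # fully untagged
]
print(find_untagged(inventory))  # ['i-002', 'i-003']
```

Run on a schedule, a check like this is part of what turns cloud adoption from a point-in-time project into the ongoing process the conversation describes.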
I mean, the one time that actually did result in saving money was my own story, where instead of renting a rack in downtown Los Angeles—or part of a rack—for what was it, I think $300 a month or whatnot, suddenly, I wound up just spinning up a couple of Gmail accounts, and oh, this is costing me $10 a month instead, that actually did save some money. On a hobby project where my time was effectively free.That is not usually the case for any functioning business. Don't lose sight of the fact that as technologists, we tend to view our time is free, but to our employer, it's more expensive than the AWS bill. It's spend the money where it makes sense to spend the money.Donovan: Exactly. And that's a great tie-back to those business drivers, right? I think the cloud is a no-brainer for an organization who is looking to expand. “We're right now only in these couple states and we want to be national,” or, “Now, we're going to have a global presence, and it's way easier to leverage the global footprint of the cloud than it is to build your own data centers or whatever your own solution would be.” Right?But those are the primary business drivers that you're looking to achieve. And I think, as a solutions architect, you know, solutions architects are really those consultants that take that higher-level approach and dig deeper to understand okay, but what are we trying to solve here? And this is how we can solve that, right? Not just, oh, I want to save some costs.Corey: I'm taking a look at one of the projects that I'm working on right now and from—this is objectively the wrong direction along almost every axis—I am building something new that I'm not quite ready to talk about yet, but it is going to be revenue-bearing and thus production, and tied to a web app that I'm in the process of constructing. I am intentionally setting out from the beginning to break from my usual serverless pattern and build this to run on top of Kubernetes. 
I have been, therefore, learning Kubernetes for the last few weeks, and I have many thoughts on it, few of them flattering. And I'm looking at this going, “This seems over-engineered,” and for my use case, it certainly is. However, the reason I'm doing this is because every client I'm dealing with these days runs some Kubernetes stuff themselves, and I should understand it better. It gives me a production workload that I can use for demos for a variety of different things. The staging environment will contain no personally identifiable information, so I can deploy that anywhere that there's a Kubernetes-esque environment or control plane as a dummy workload that I can use to kick the tires on this.And for those reasons, it makes an awful lot of sense. In practice, doing something serverlessly would be the better option. Or the best answer would be to find someone who provides this sort of thing as a white label-able service that I can just pay a few hundred bucks a month to and not think about this thing again. But those are the constraints that make what otherwise looks like a ridiculous idea make sense in this context. But oh God, looking at this for people who run Kubernetes just so they can host their blog, it's, what are you doing over there? Is that just sort of the Hello World-style application? Because not for nothing, the value of a blog is in the content, not in the magic that winds up making that content visible to readers in almost every case.Donovan: Yeah, [laugh] this is one of my favorite topics to talk about, actually, because so many times, we engage these customers, and they're like, “We want to containerize.” And I'm like, “Oh, that's awesome. I love the idea. There's so many benefits to containerization that pretty much checks every box within the Well-Architected Framework.” And then their next sentence is, “And so, we're going to move to Kubernetes.”And I'm like, “Well, why Kubernetes?” Right? 
Because as you said, this is just a blog, or this is just a static website, or whatever the application they have is. Containers are great, but do you need a full-fledged Kubernetes deployment? Do you need every open-source software possible so that you can integrate?Or would you be just fine with ECS that is pretty streamlined, pretty much out of the box just works from AWS? Sometimes it's necessary. Sometimes EKS or a Kubernetes deployment or maybe your own self-managed Kubernetes deployment is necessary, but it's another one of these misconceptions, I'd say, when people hear the new buzzwords, they're like, “Oh, we need to do that because that's the best thing to do.” And it's often best as you—I loved your word ‘perspective,' the outside perspective—it's usually best to get that outside perspective and understand really how—what specific solution might help you.Corey: From where I sit, I think that people often tend to skip over that. It gets to the idea of resume-driven development. And honestly, it's hard to tell people they're necessarily going in the wrong direction given how fever-pitched the hype around Kubernetes has gotten. Every company is using it for something, and on some level, wanting to get that on your resume is a logical next step. Looking at cloud bills, I would never suggest someone [laugh] wind up doing this on their own dime, so yeah, it does make sense in that context.The goal, I think of being a technology executive is to be able to understand that and, one, provide pathways for your team to develop, but also to make sure that their objectives align with the business's objectives. And often I do not see that leading in the Kubernetes direction, for better or worse. There's a reason that I own the domain kubernetestheeasyway.com and I repoint it to Amazon's ECS.Although to those listening, I am thrilled to repoint that to the highest bidder; my email is open. 
Jokes and witticisms aside, I am curious based upon your perspective in the market—which is broader than mine because imagine that, you don't focus on one very specific problem—what are most organizations looking at for business drivers that they generally tend to either not realize or not quite achieve, or, “Well, that didn't go quite the way that we wanted to?” Because we see the outcome of people moving to cloud; we don't necessarily see the reasons behind it.Donovan: Yeah, that's a great point. So, the first and foremost concern that I'd say most companies experience is how do we make sure that our website or application is always up and always able to generate revenue, right? Based on the types of customers that we get, they're hyper-compliant, they're mission critical, they're 24x7, they need to always be on, they need to make sure that whatever the platform that they're running their infrastructure on is able to be up 24x7. Much harder to do that in your own world than it is to do that in cloud, right, so that's one large business driver.Another large business driver, again, with hyper-compliance, is security. They've experienced some security incidents and they want to make sure that they have a more scalable, secure platform as they grow because, again, maybe they're becoming national, or they're becoming global, or they need to meet some compliance regulation, right? And then additionally, as I've mentioned a couple times here, they want to increase their overall agility, which will in turn decrease their time to market their time to revenue. That is an often miscalculated correlation, right? People say, “Okay, we'll increase our agility,” but without realizing why they're going to try to increase their agility.They want to move from pushing code 20 times a month to 20 times a day, but why, right? That's the agility piece, but why do you want to do that? Well, because it allows us to more quickly debug our code. 
It allows us to more quickly identify issues or rollbacks and improve our overall efficiency, increase customer retention, increase customer satisfaction, things like that.Corey: I really wish that people would, I guess, tell those stories more in conference talks, rather than, “We moved to the cloud because of reasons, and it was awesome.” And it's always depressing to me because you hear them tell this beautiful story, and you turn next to the person in the audience often and be like, “Oh, I wish I could work in a place that ran a project like that.” And they say, “Yeah, me too.” And you check their badge and they work at the same company the speaker does. It's the idea of telling this fanciful, imagined version rather than addressing the reality that the real world is very messy.Donovan: Exactly. And it's really hard, you know? I'm not going to sit here and say, “Well, you know, everything that we're talking about here and completely transforming your business is simple. And wow, you guys aren't smart for doing it or for not doing it.” You know? It is difficult.But back to that idea of the outside perspective, there are tried and true methods, right? Like we talked about with the Well-Architected Framework, with the Cloud Adoption Methodology, with the Seven Rs of Migration, right? There's a lot of content out there and a lot of people out there that can help you, but it's definitely a journey.
Thank you so much for being so generous with your time. I appreciate it.Donovan: Thank you so much, Corey. I had a great time, and looking forward to the next one.Corey: Donovan Brady, director of solutions architecture at Logicworks on this promoted guest episode. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry rant as a negative comment on this, presumably because you are Donovan's antithesis, the director of problems architecture.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Controversy of Cloud Repatriation With Amy Tobey of Equinix

Sep 27, 2022 38:34


About Amy

Amy Tobey has worked in tech for more than 20 years at companies of every size, working with everything from kernel code to user interfaces. These days she spends her time building an innovative Site Reliability Engineering program at Equinix, where she is a principal engineer. When she's not working, she can be found with her nose in a book, watching anime with her son, making noise with electronics, or doing yoga poses in the sun.

Links Referenced:

- Equinix: https://metal.equinix.com
- Twitter: https://twitter.com/MissAmyTobey

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn, and this episode is another one of those real profiles in shitposting type of episodes. I am joined again, from a few months ago, by Amy Tobey, who is a Senior Principal Engineer at Equinix, back for more. Amy, thank you so much for joining me.Amy: Welcome. To your show. [laugh].Corey: Exactly. So, one thing that we have been seeing a lot over the past year, and you struck me as one of the best people to talk about what you're seeing in the wilderness perspective, has been the idea of cloud repatriation. It started off with something that came out of Andreessen Horowitz toward the start of the year about the trillion-dollar paradox, how, at a certain point of scale, repatriating to a data center is the smart and right move. And oh, my stars, did that ruffle some feathers for people.Amy: Well, I spent all this money moving to the cloud. That was just mean.Corey: I know. Why would I want to leave the cloud? I mean, for God's sake, my account manager named his kid after me.
Wait a minute, how much am I spending on that? Yeah—Amy: Good question.Corey: —there is that ever-growing problem. And there have been the examples that people have given: Dropbox classically did a cloud repatriation exercise, and then there's a second example that no one can ever name. And it seems like okay, this might not necessarily be the direction that the industry is going. But I also tend to not be completely naive when it comes to these things. And I can see repatriation making sense on a workload-by-workload basis.What that implies is that yeah, but a lot of other workloads are not going to be going to a data center. They're going to stay in a cloud provider, who would like very much if you never read a word of this to anyone in public.Amy: Absolutely, yeah.Corey: So, if there are workloads repatriating, it would occur to me that there's a vested interest on the part of every major cloud provider to do their best to, I don't know if saying suppress the story is too strongly worded, but it is directionally what I mean.Amy: They aren't helping get the story out. [laugh].Corey: Yeah, it's like, “That's a great observation. Could you maybe shut the hell up and never make it ever again in public, or we will end you?” Yeah. You're Amazon. What are you going to do, launch a shitty Amazon Basics version of what my company does? Good luck. Have fun. You're probably doing it already.But the reason I want to talk to you on this is a confluence of a few things. One, as I mentioned back in May when you were on the show, I am incensed and annoyed that we've been talking for as long as we have, and somehow I never had you on the show. So, great. Come back, please. You're always welcome here. Secondly, you work at Equinix, which is, effectively—let's be relatively direct—it is functionally a data center as far as how people wind up contextualizing this. Yes, you have higher level—Amy: Yeah, I guess people contextualize it that way.
But we'll get into that.

Corey: Yeah, from the outside. I don't work there, to be clear. My talking points don't exist for this. But I think of, oh, Equinix. Oh, that means you basically have a colo or colo equivalent. The pricing dynamics are radically different; it looks a lot closer to a data center in my imagination than it does a traditional public cloud. I would also argue that if someone migrates from AWS to Equinix, that would be viewed—arguably correctly—as something of a repatriation. Is that directionally correct?

Amy: I would argue incorrectly. For Metal, right?

Corey: Ah.

Amy: So, Equinix is a data center company, right? Like, that's what everybody knows us as. Equinix Metal is a bare metal primitive service, right? So, it's a lot more of a cloud workflow, right, except that you're not getting the rich services that you get in a technically full cloud, right? Like, there's no RDS; there's no S3, even. What you get is bare metal primitives, right? With a really fast network that isn't going to—

Corey: Are you really a cloud provider without some ridiculous machine-learning-powered service that's going to wind up taking pictures, perform incredibly expensive operations on them, and then return something that's more than a little racist? I mean, come on. That's not—you're not a cloud until you can do that, right?

Amy: We can do that. We have customers that do that. Well, not specifically that, but um—

Corey: Yeah, but they have to build it themselves. You don't have the high-level managed service that basically serves as, functionally, bias laundering.

Amy: Yeah, you don't get it in a box, right? So, a lot of our customers are doing things that are unique, right, that maybe don't fit into the cloud exactly well. And it comes back down to a lot of Equinix's roots, which is—we talk about going into the cloud, and it's this kind of abstract environment we're reaching for, you know, up in the sky.
And it's like, we don't know where it is, except we have regions that—okay, so it's in Virginia. But the rule of real estate applies to technology as often as not, which is location, location, location, right?

When we're talking about a lot of applications, a challenge that we face, say in gaming, is that the latency from the customer, so that last mile to your data center, can often be extremely important, right, so a few milliseconds even. And a lot of, like, SaaS applications, the typical stuff that really the cloud was built on, 10 milliseconds, 50 milliseconds, nobody's really going to notice that, right? But in a gaming environment or some very low latency application that needs to run extremely close to the customer, it's hard to do that in the cloud. They're building this stuff out, right? Like, I see, you know, different ones [unintelligible 00:05:53] opening new regions but, you know, there's this other side of the cloud, which is, like, the edge computing thing that's coming alive, and that's more where I think about it.

And again, location, location, location. The speed of light is really fast, but as most of us in tech know, if you want to go across from the East Coast to the West Coast, you're talking about 80 milliseconds, on average, right? I think that's what it is. I haven't checked in a while. Yeah, that's just basic fundamental speed of light. And so, if everything's in us-east-1—and this is why we do multi-region, sometimes—the latency from the West Coast isn't going to be great. And so, we run the application on both sides.

Corey: It has improved though. If you want to talk old school things that are seared into my brain from over 20 years ago, every person who's worked in data centers—or in technology, as a general rule—has a few IP addresses seared. And the one that I've always had on my mind was 130.111.32.11.
Kind of arbitrary and ridiculous, but it was one of the two recursive resolvers provided at the University of Maine where I had my first help desk job. And it lives on-prem, in Maine. And generally speaking, I tended to always accept that no matter where I was—unless I was in a data center somewhere—it was about 120 milliseconds. And I just checked now; it is 85 and change from where I am in San Francisco. So, the internet or the speed of light have improved. So, good for whichever one of those it was. But yeah, you've just updated my understanding of these things. All of which is to say, yes, latency is very important.

Amy: Right. Let's forget repatriation, to be really honest. Even the Dropbox case or any of them, right? Like, there's an economic story here that I think all of us that have been doing cloud work for a while see pretty clearly, but that maybe isn't seen by everybody who's thinking from an on-prem kind of situation, which is that—you know, and I know you do this all the time, right—you don't just look at the cost of the data center and the servers and the network, the technical components, the bill of materials—

Corey: Oh, lies, damned lies, and TCO analyses. Yeah.

Amy: —but there's all these people on top of it, and the organizational complexity, and the contracts that you've got to manage. And it's this big, huge operation that is incredibly complex to do well, and that is almost nobody's core business. So, the way I look at this, right, and the way I even talk to customers about it is, like, "What is your produ—" and I talk to people internally about it this way, too—it's like, "What are you trying to build?" "Well, I want to build a SaaS." "Okay. Do you need data center expertise to build a SaaS?" "No." "Then why the hell are you putting it in a data center?" Like we—you know, and speaking for my employer, right, like, we have Equinix Metal right here.
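As an aside, the coast-to-coast latency figures tossed around above are easy to sanity-check. A rough sketch in Python, where the distance and in-fiber light speed are ballpark assumptions rather than measurements:

```python
# Sanity check of cross-country latency against the speed of light in fiber.
# Both constants are ballpark assumptions: ~4,500 km San Francisco -> Maine
# great-circle distance, and light in fiber at roughly two-thirds of c.

DISTANCE_KM = 4_500          # approximate great-circle distance
FIBER_SPEED_KM_S = 200_000   # ~0.66c, typical for single-mode fiber

one_way_ms = DISTANCE_KM / FIBER_SPEED_KM_S * 1_000
round_trip_ms = 2 * one_way_ms

print(f"theoretical one-way delay: {one_way_ms:.1f} ms")    # ~22.5 ms
print(f"theoretical round trip:    {round_trip_ms:.1f} ms")  # ~45 ms

# Real paths are longer than great-circle and add router and queueing
# delay, so a measured ~85 ms RTT to Maine, or the oft-quoted ~80 ms
# coast-to-coast figure, is right in the expected range.
```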
You can build on that, and you don't have to do the most complex part of this, at least in terms of, like, the physical plant, right? Like, getting a bare metal server available, we take care of all of that. Even at the primitive level, where we sit, it's higher level than, say, colo.

Corey: There's also the question of economics as it ties into it. It's never just a raw cost-of-materials type of approach. Like, my original job in a data center was basically to walk around and replace hard drives, and apparently, to insult people. Now, the cloud has taken one of those two aspects away, and you can follow my Twitter account and figure out which one of those two it is, but what I keep seeing now is there is value to having that task done, but in a cloud environment—and Equinix Metal, let's be clear—that has slipped below the surface level of awareness. And well, what are the economic implications of that?

Well, okay, you have a whole team of people at large companies whose job it is to do precisely that. Okay, we're going to upskill them and train them to use cloud. Okay. First, not everyone is going to be capable or willing to make that leap from hard drive replacement to, "Congratulations and welcome to JavaScript. You're about to hate everything that comes next."

And if they do make that leap, their baseline market value—by which I mean what the market is willing to pay for them—will approximately double. And whether they wind up being paid more by their current employer or they take a job somewhere else with those skills and get paid what they are worth, the company still has that economic problem. Like it or not, you will generally get what you pay for; that is the reality of it. And as companies are thinking about this—well, what gets into the TCO analysis and what doesn't—I have yet to see one where the outcome was not predetermined.
It's rarely, "Let's figure out in good faith whether it's going to be more expensive to move to the cloud, or move out of the cloud, or just burn the building down for insurance money." The outcome is generally the one that the person who commissioned the TCO analysis wants. So, when a vendor is trying to get you to switch to them, and they do one for you, yeah. And I'm not saying they're lying, but there's so much judgment that goes into this. And what do you include and what do you not include? That's hard.

Amy: And there are so many hidden costs. And that's one of the things that I love about working at a cloud provider: I still get to play with all that stuff, and, like, I get to see those hidden costs, right? Like you were talking about the person who goes around and swaps out the hard drives. Or early in my career, right, I worked with someone whose job it was, every day, to go into the data center, swap out the tapes, you know, and do a few other things around there and, like, take care of the billing system. And that was a job of going around and stewarding a whole bunch of things that kind of kept the whole machine running, but most people outside of being right next to the data center didn't have any idea that stuff even happened, right, that went into it.

And so, like you were saying, like, when you go to do the TCO analysis—I mean, I've been through this a couple of times prior in my career—people will look at it and go, like, "Well, of course we're not going to list—we'll put, like, two headcount on there." And it's always a lie because it's never just two headcount. It's never just the network person, or the SRE, or the person who's racking the servers. It's also, like, finance has to do all this extra work, and there's all the logistics work, and there is just so much stuff that is really hard to include.
Not only do people leave it out, but it's also just really hard for people to grapple with the complexity of all the things it takes to run a data center, which is, like, one of the most complex machines on the planet, any single data center.

Corey: I've worked in small-scale environments, maybe a couple of mid-sized ones, but never the type of hyperscale facility that you folks have, which I would say is, if it's not hyperscale, it's at least directionally close to it. We're talking thousands of servers, and hundreds of racks.

Amy: Right.

Corey: I've started getting into that, on some level. Now, I guess when we say 'hyperscale,' we're talking about AWS-size things where, oh, that's a region and it's going to have three dozen data center facilities in it. Yeah, I don't work in places like that because honestly, have you met me? Would you trust me around something that's that critical infrastructure? No, you would not, unless you have terrible judgment, which means you should not be working in those environments to begin with.

Amy: I mean, you're like a walking chaos exercise. Maybe I would let you in.

Corey: Oh, I bring my hardware destruction aura near anything expensive and things are terrible. It's awful. But as I looked at the cloud, regardless of cloud, there is another economic element that I think is underappreciated, and to be fair, this does, I believe, apply as much to Equinix Metal as it does to the public hyperscale cloud providers that have problems with naming things well. And that is, when you are provisioning something as a customer of one of these places, you have an unbounded growth problem.
When you're in a data center, you are not going to just absentmindedly sign an $8 million purchase order for new servers—you know, a second time—and then that means you eventually run out of power, space, places to put things, and you have to go find it somewhere.

Whereas in cloud, the only limit is basically your budget; there is no forcing function that reminds you to go and clean up that experiment from five years ago. You have people with three petabytes of data they were using for a project, but they haven't worked there in five years and nothing's touched it since. Because the failure mode of deleting things that are important, or disasters—

Amy: That's why Glacier exists.

Corey: Oh, exactly. But that failure mode of deleting things that should not be deleted is disastrous for a company, whereas if you leave them there, well, it's only money. And there's no forcing function to do that, which means you have this infinite growth problem with no natural limit slash predator around it. And that is the economic analysis that I do not see playing out basically anywhere. Because, oh, by the time that becomes a problem, we'll have good governance in place. Yeah, pull the other one. It has bells on it.

Amy: That's the funny thing, right: a lot of the early drive in the cloud was those of us who wanted to go faster, and we were up against the limitations of our data centers. And then we go out and go, like, "Hey, we got this cloud thing. I'll just, you know, put the credit card in there and I'll spin up a few instances, and 'hey, I delivered your product.'" And everybody goes, "Yeah, hey, happy." And then, like you mentioned, right, then we get down the road here, and it's like, "Oh, my God, how much are we spending on this?"

And then you're in that funny boat where you have both. But yeah, I mean, like, that's just a typical engineering problem, where, you know, we have to deal with our constraints. And the cloud has constraints, right?
Like when I was at Netflix, one of the things we would do frequently is bump up against instance limits. And then we'd go talk to our TAM and be like, "Hey, buddy. Can we have some more instance limit?" And then take care of it that way, right?

But there are some bounds on that. Of course, with my cloud provider shoes on, I don't necessarily want to set those limits too low because it's a business, and the business wants to hoover up all the money. That's what businesses do. So, I guess it's just a different constraint that is maybe much too easy to knock down, right? Because as you mentioned, in a data center or in a colo space, if I outgrow my cage and I've filled up all the space I have, I have to either order more space from my colo provider or expand to the cloud, right?

Corey: The scale I was always at, the limit was not the space, because I assure you, with enough shoving all things are possible. Don't believe me? Look at what people are putting in the overhead bin on any airline. Enough shoving, you'll get a Volkswagen in there. But it was always power constraints that I dealt with. And it's like, "Eh, they're just being conservative." And the whole building room dies.

Amy: You want blade servers because that's how you get blade servers, right? That movement was about bringing the density up and putting more servers in a rack. You know, there was some management stuff and [unintelligible 00:16:08], but a lot of it was just about, like, you know—I remember, I'm picturing it, right—

Corey: Even without that, I was still power constrained because, you have to remember, a lot of my experiences were not in, shall we say, data center facilities that you would call, you know, good.

Amy: Well, that brings up a fun thing that's happening, which is that the power envelope of servers is still growing.
The newest Intel chips, especially the ones they're shipping for hyperscale and stuff like that, with the really high core counts and the faster clock speeds, you know, these things are pulling, like, 300 watts. And they also have to egress all that heat. And so, that's one of the places where we're doing some innovations—I think there's a couple of blog posts out about it—around, like, liquid cooling or multimode cooling. And what's interesting about this from a cloud or data center perspective is that the tools and skills and everything have to come together to run this year's or next year's servers, where we're pushing thousands of kilowatts into a rack. Thousands; one rack, right?

The bar to actually bootstrap and run this stuff successfully is rising again, compared to—I take my pizza box servers, right—and I worked at a gaming company a long time ago, right, and they would just, like, stack them on the floor. It was just a stack of servers. Like, they were in between the rails, but they weren't screwed down or anything, right? And they would network them all up. Because basically, like, the game would spin up on the servers and if they died, they would just unplug that one and leave it there and spin up another one.

It was like you could just stack stuff up and, like, be slinging cables across the data center and stuff back then. I wouldn't do it that way now, but when you add, say, liquid cooling and some of these, like, extremely high power situations into the mix, now you need to have—for example, if you're using liquid cooling, you don't want that stuff leaking, right? And so, as good as the pressure fittings and blind mating and all this stuff that's coming around get, you still have that element of additional training, and skill, and possibility for mistakes.

Corey: The thing that I see as I look at this across the space is that, on some level, it's gotten harder to run a data center than ever before.
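The power math behind that point is worth making concrete. A back-of-the-envelope sketch, where every figure is an illustrative assumption rather than a vendor spec:

```python
# Back-of-the-envelope rack power budget, illustrating why power density,
# not floor space, tends to be the binding constraint. All numbers are
# illustrative assumptions, not vendor specifications.

CPU_WATTS = 300        # per-socket draw for a high-core-count server CPU
SOCKETS_PER_SERVER = 2
OTHER_WATTS = 400      # RAM, NICs, drives, fans; accelerators excluded
SERVERS_PER_RACK = 40  # dense 1U "pizza box" servers in a 42U rack

server_watts = CPU_WATTS * SOCKETS_PER_SERVER + OTHER_WATTS
rack_kw = server_watts * SERVERS_PER_RACK / 1_000

print(f"per-server draw: {server_watts} W")
print(f"per-rack draw:   {rack_kw} kW")

# A traditional colo cabinet is often provisioned for single-digit
# kilowatts, so a fully loaded rack of modern gear can blow the power
# budget long before it runs out of rack units—hence the push toward
# liquid and multimode cooling.
```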
Because again, another reason I wanted to have you on this show is that you do not carry a quota. Although you do often carry the conversation when you have boring people around you, but quotas, no. You are not here selling things to people. You're not actively incentivized to get people to see things a certain way.

You are very clearly an engineer in the right ways. I will further point out, though, that you do not sound like an engineer, by which I mean, you're not going to basically belittle people, in many cases, in the name of being technically correct. You're a human being with a frickin' soul. And believe me, it is noticed.

Amy: I really appreciate that. If somebody's just listening and hearing my voice and my name, right—like, I have a low voice. And in most of my career, I was extremely technical, like, to the point where, you know, if something was wrong technically, I would fight to the death to get the right technical solution, and maybe not see the complexity around the decisions, and why things were the way they were, in the way I can today. And that's changed how I sound. It's changed how I talk. It's changed how I look at and talk about technology as well, right? I'm just not that interested in Kubernetes. Because I've kind of started looking up the stack in this kind of pursuit.

Corey: Yeah, when I say you don't sound like an engineer, I am in no way, shape, or form—

Amy: I know.

Corey: —alluding in any respect to your technical acumen. I feel the need to clarify that statement for people who might be listening, and say, "Hey, wait a minute. Is he being a shithead?" No.

Amy: No, no, no.

Corey: Well, not the kind you're worried I'm being anyway; I'm a different breed of shithead and that's fine.

Amy: Yeah, I should remember that other people don't know we've had conversations that are deeply technical, that aren't on air, that aren't context anybody else has.
And so, like, I bring that deep technical knowledge—you know, the ability to talk about PCI Express, and kilovolts [unintelligible 00:19:58] rack, and top-of-rack switches, and network topologies—all of that together now. But what's really fascinating is where the really big impact is, for reliability, for security, for quality—the things that, me as a person, I'm driven by; products are cool, but, like, I like them to be reliable, that's the part that I like. It really comes down to more leadership, and business acumen, and understanding the business constraints, and then being able to get heard by an audience that isn't necessarily technical, that doesn't necessarily understand the difference between PCI, PCI-X, and PCI Express. There's a difference between those. It doesn't mean anything to the business, right? So, when we want to go and talk about why we're doing, for example, a multi-region deployment of our application, if I come in and say, "Well, because we want to use Raft," that's going to fall flat, right?

The business is going to go, "I don't care about Raft. What does that have to do with my customers?" Which is the right question to always ask. Instead, when I show up and say, "Okay, what's going on here is we have this application sitting in a single region—or in a single data center or whatever, right? I'm using region because that's probably what most of the people listening understand—you know, so I put my application in a single region, and if it goes down, our customers are going to be unhappy. We have the alternative to spend, okay, not a little bit more money, probably a lot more money, to build a second region, and the benefit we will get is that our customers will be able to access the service 24x7, and it will always work, and they'll have a wonderful experience.
And maybe they'll keep coming back and buy more stuff from us."

And so, when I talk about it in those terms, right—and it's usually more nuanced than that—then I start to get the movement at the macro level, right, at the systemic level of the business, in the direction I want it to go, which is for the product group to understand why reliability matters to the customer, you know? For the individual engineers to understand why it matters that we use secure coding practices.

[midroll 00:21:56]

Corey: Getting back to the reason I said that you are not quota-carrying and you are not incentivized to push things in a particular way: often we'll meet zealots, and I've never known you to be one. You have always been a strong advocate for doing the right thing, even if it doesn't directly benefit any given random employer that you might have. And as a result, one of the things that you've said to me repeatedly is, if you're building something from scratch, for God's sake, put it in cloud. What is wrong with you? Do that. The idea of building it yourself on low-lying, underlying primitives—for almost every modern SaaS-style workload, there's no reason to consider doing something else in almost any case. Is that a fair representation of your position on this?

Amy: It is. I mean, the simpler version, right, is, "Why the hell are you doing undifferentiated lifting?" Right? Things that don't differentiate your product, why would you do them?

Corey: The thing that this has empowered, then, is I can build an experiment tonight—I don't have to wait for provisioning and signed contracts and all the rest. I can spend 25 cents and get the experiment up and running. If it takes off, though, it has changed how I move going forward as well, because there's no difference in the way that there was back when we were in data centers. I'm going to try an experiment: I'm going to run it on this, I don't know, crappy Raspberry Pi or my desktop or something under my desk somewhere.
And if it takes off and I have to scale up, I've got to do a giant migration to real enterprise-grade hardware. With cloud, you are getting all of that out of the box, even if all you're doing with it is something ridiculous and nonsensical.

Amy: And you're often getting ridiculously better service. So, 20 years ago, if you and I sat down to build a SaaS app, we would have spun up a Linux box somewhere in a colo, and we would have spun up Apache, MySQL, maybe some Perl or PHP if we were feeling frisky. And the availability of that would be what one machine could do, what we could handle in terms of one MySQL instance. But today, if I'm spinning up a new stack for the same kind of SaaS, I'm going to probably deploy it into an ASG, and I'm probably going to have some kind of high-availability database behind it—and I'm going to use Aurora as an example—because, like, in terms of the availability of an Aurora instance, even if I'm building it myself with the very best kit available in databases, it's going to be really hard to hit the same availability that Aurora does. Because Aurora is not just a software solution, it's also got a team around it that stewards that 24/7. And it continues to evolve on its own.

And so, like, when we start that little tiny startup, instead of being that one machine, we're actually starting at a much higher level of quality, and availability, and even security sometimes, because of these primitives that are available. And I probably should go on to extend the thought of undifferentiated lifting, right, and coming back to the colo or the edge story, which is that there are still some little edge cases, right? Like, I think for SaaS, duh, right? Like, go straight to—
But there are still some really interesting things where there's, like, hardware innovations, where they're doing things with GPUs and stuff like that, where the colo experience may be better because you're trying to do, like, custom hardware, in which case you are in a colo. There are businesses doing some really interesting stuff with custom hardware that's behind an application stack. What's really cool about some of that, from my perspective, is that some of that might be sitting on, say, bare metal with us, and maybe the front-end is sitting somewhere else. Because the other thing Equinix does really well is this product we call Fabric, which lets us basically do peering with any of the cloud providers.

Corey: Yeah, the reason, I guess, I don't consider you as a quote-unquote, "cloud," is first and foremost rooted in the fact that you don't have a bandwidth model that is free ingress and criminally expensive to send it anywhere that isn't to you folks. Like, are you really a cloud if you're not just gouging the living piss out of your customers every time they want to send data somewhere else?

Amy: Well, we like to say we're part of the cloud. And really, that's actually my favorite feature of Metal is that you get, I think—

Corey: Yeah, this was a compliment, to be very clear. I'm a big fan of not paying 1998 bandwidth pricing anymore.

Amy: Yeah, but this is the part where I get to do a little bit of, like, showing off for Metal, in that, like, when you buy a Metal server, there's different configurations, right, but, like, I think on the lowest one, you have dual 10 Gig ports to the server that you can get either in a bonded mode, so that you have a single 20 Gig interface in your operating system, or you can actually do L3 and you can do BGP to your server. And so, this is a capability that you really can't get at all on the other clouds, right? This lets you do things with the network, not only the bandwidth, right, that you have available.
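The bandwidth-economics jab above is easy to put numbers on. A sketch comparing metered per-GB egress with a flat-rate port; both prices are hypothetical placeholders, not quoted rates from any provider:

```python
# Metered per-GB egress versus a flat-rate port, using hypothetical
# prices (neither figure is a quoted rate from any real provider).

PER_GB_RATE = 0.09         # $/GB, in the neighborhood of public-cloud egress
FLAT_PORT_MONTHLY = 2_000  # $/month for a committed port, made up for scale

def metered_cost(tb_per_month: float) -> float:
    """Monthly cost of egressing the given volume at the per-GB rate."""
    return tb_per_month * 1_000 * PER_GB_RATE

for tb in (1, 10, 50, 100):
    print(f"{tb:>4} TB/month: metered ${metered_cost(tb):>8,.0f}"
          f"  vs flat ${FLAT_PORT_MONTHLY:,}")

# Past some monthly volume, the metered line crosses the flat port,
# which is the whole economic argument behind colo cross-connects and
# the '1998 bandwidth pricing' jokes alike.
```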
Like, you want to stream out 25 gigs of bandwidth out of us? I think that's pretty doable. And the rates—I've only seen a couple of comparisons—are pretty good.

So, this is where some of the business opportunities are, right—and I can't get too much into it, but, like, this is all public stuff I've talked about so far—which is, that's part of the opportunity there: sitting at the crossroads of the internet, we can give you a server that has really great networking, and you can do all the cool custom stuff with it, like, BGP, right? Like, so that you can do Anycast, right? You can build Anycast applications.

Corey: I miss the days when that was a thing that made sense.

Amy: [laugh].

Corey: I mean that in the context of, you know, with the internet and networks. These days, it always feels like the network engineering has slipped away within the cloud because you have overlays on top of overlays, and it's all abstractions that are living out there right until suddenly you really need to know what's going on. But it has abstracted so much of this away. And that, on some level, is the surprise people are often in for when they wind up outgrowing the cloud for a workload and wanting to move it someplace that doesn't, you know, ride them like naughty ponies for bandwidth. And they have to rediscover things that we've mostly forgotten about.

I remember having to architect significantly around the context of hard drive failures. I know we've talked about that a fair bit as a thing, but yeah, it's spinning metal, it throws off heat, and if you lose the wrong one, your data is gone and you now have serious business problems. In cloud, at least AWS-land, that's not really a thing anymore. The way EBS is provisioned, there's a slight tick in latency if you're looking at just the right time for what I think is a hard drive failure, but it's there. You don't have to think about this anymore.

Migrate that workload to a pile of servers in a colo somewhere, guess what?
Suddenly your reliability is going to decrease. Amazon, and the other cloud providers as well, have gotten to a point where they are better at operations than you are at your relatively small company with your nascent sysadmin team. I promise. There is an economy of scale here.

Amy: And it doesn't have to be good or better, right? It's just simply better resourced—

Corey: Yeah.

Amy: —than most anybody else can hope for. Amazon can throw a billion dollars at it and never miss it. In most organizations out there, you know, and especially in the enterprise, people are scratching and trying to get resources wherever they can, right? They're all competing for people, for time, for engineering resources, and that's one of the things that gets freed up when you just basically bang an API and you get the thing you want. You don't have to go through that kind of old-world internal process that is usually slow and often painful.

Just because they're not resourced as well; they're not automated as well. Maybe they could be. I'm sure most of them could, in theory, be, but we come back to undifferentiated lifting. None of this helps, say—let me think of another random business—Claire's, whatever, like, any of the shops in the mall. They all have some kind of enterprise behind them for cash processing and all that stuff, point of sale, and none of this stuff is differentiating for them because it doesn't impact anything to do with where the money comes in. So again, we're back at: why are you doing this?

Corey: I think that's also the big challenge as well, when people start talking about repatriation and talking about this idea that, oh, that cloud is too expensive; we're going to move out. And they make the economics work. Again, I do firmly believe that, by and large, businesses do not intentionally go out and make poor decisions.
I think when we see a company doing something inscrutable, there's always context that we're missing, and I think, as a general rule of thumb, that these companies do not hire people who are fools. And there are always constraints that they cannot talk about in public.

My general position as a consultant, and ideally as someone who aspires to be a decent human being, is that when I see something I don't understand, I assume that there's simply a lack of context, not that everyone involved in this has been foolish enough to make giant blunders that I can pick out in the first five seconds of looking at it. I'm not quite that self-confident yet.

Amy: I mean, that's a big part of, like, the career progression into above senior engineer, right: you don't get to sit in your chair and go, like, "Oh, those dummies," right? You actually have—I don't know about 'have to,' but, like, the way I operate now, right, is—I remember in my youth, I used to be like, "Oh, those business people. They don't know nothing. Like, what are they doing?" You know, it's goofy what they're doing.

And now I have a different mode, which is, "Oh, that's interesting. Can you tell me more?" The feeling is still there, right? Like, "Oh, my God, what is going on here?" But then I get curious, and I go, "So, how did we get here?" [laugh]. And you get that story, and the stories are always fascinating, and they always involve, like, constraints, immovable objects, people doing the best they can with what they have available.

Corey: Always. And I want to be clear that very rarely is it the right answer to walk into a room, look at the architecture, and say, "All right, what moron built this?" Because invariably you're going to be asking that question of said moron. And it doesn't matter how right you are, they're never going to listen to another thing out of your mouth again. And have some respect for what came before, even if it's potentially the wrong answer. Well, great.
“Why didn't you just use this service to do this instead?” “Yeah, because this thing predates that by five years, jackass.”

There are reasons things are the way they are. If you take any architecture in the world and tell people to rebuild it greenfield, almost none of them would look the same as they do today because we learn things by getting it wrong. That's a great teacher, and it hurts. But it's also true.

Amy: And we got to build, right? Like, that's what we're here to do. If we just kind of cycle waiting for the perfect technology, the right choices—and again, to come back to the people who built it at the time used—you know, often we can fault people for this—used the things they know or the things that are nearby, and they make it work. And that's kind of amazing sometimes, right?

Like, I'm sure you see architectures frequently, and I see them too, probably less frequently, where you just go, how does this even work in the first place? Like how did you get this to work? Because I'm looking at this diagram or whatever, and I don't understand how this works. Maybe that's a thing that's more a me thing, like, because usually, I can look at a—skim over an architecture document and be, like, be able to build the model up into, like, “Okay, I can see how that kind of works and how the data flows through it.” I get that pretty quickly.

And comes back to that, like, just, again, asking, “How did we get here?” And then the cool part about asking how did we get here is it sets everybody up in the room, not just you as the person trying to drive change, but the people you're trying to bring along, the original architects, original engineers, when you ask, how did we get here, you've started them on the path to coming along with you in the future, which is kind of cool. But until—that storytelling mode, again, is so powerful at almost every level of the stack, right?
And that's why I just, like, when we were talking about how technical I bring things in, again, like, I'm just not that interested in, like, are you Little Endian or Big Endian? How did we get here is kind of cool. You built a Big Endian architecture in 2022? Like, “Ohh. [laugh]. How do we do that?”

Corey: Hey, leave me to my own devices, and I need to build something super quickly to get it up and running, well, what I'm going to do, for a lot of answers, is going to look an awful lot like the traditional three-tier architecture that I was running back in 2008. Because I know it, it works well, and I can iterate rapidly on it. Is it a best practice? Absolutely not, but given the constraints, sometimes it's the fastest thing to grab. “Well, if you built this in serverless technologies, it would run at a fraction of the cost.” It's, “Yes, but if I run this thing the way that I'm running it now, it'll be $20 a month, it'll take me two hours instead of 20. And what exactly is your time worth, again?” It comes down to the better economic model of all these things.

Amy: Any time you're trying to make a case to the business, the economic model is going to always go further. Just a general tip for tech people, right? Like, if you can make the better economic case and you go to the business with an economic case that is clear, businesses listen to that. They're not going to listen to us go on and on about distributed systems.

Somebody in finance trying to make a decision about, like, do we go and spend a million bucks on this, that's not really the material thing. It's like, well, how is this going to move the business forward? And how much is it going to cost us to do it? And what other opportunities are we giving up to do that?

Corey: I think that's probably a good place to leave it because there's no good answer. We can all think about that until the next episode. I really want to thank you for spending so much time talking to me again.
If people want to learn more, where's the best place for them to find you?

Amy: Always Twitter for me, MissAmyTobey, and I'll see you there. Say hi.

Corey: Thank you again for being as generous with your time as you are. It's deeply appreciated.

Amy: It's always fun.

Corey: Amy Tobey, Senior Principal Engineer at Equinix Metal. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that tells me exactly what we got wrong in this episode in the best dialect you have of condescending engineer with zero people skills. I look forward to reading it.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
How Data Discovery is Changing the Game with Shinji Kim
Sep 22, 2022 · 32:58

About Shinji

Shinji Kim is the Founder & CEO of Select Star, an automated data discovery platform that helps you to understand & manage your data. Previously, she was the Founder & CEO of Concord Systems, a NYC-based data infrastructure startup acquired by Akamai Technologies in 2016. She led the strategy and execution of Akamai IoT Edge Connect, an IoT data platform for real-time communication and data processing of connected devices. Shinji studied Software Engineering at University of Waterloo and General Management at Stanford GSB.

Links Referenced:
Select Star: https://www.selectstar.com/
LinkedIn: https://www.linkedin.com/company/selectstarhq/
Twitter: https://twitter.com/selectstarhq

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your Features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.

Corey: I come bearing ill tidings.
Developers are responsible for more than ever these days. Not just the code that they write, but also the containers and the cloud infrastructure that their apps run on. Because serverless means it's still somebody's problem. And a big part of that responsibility is app security from code to cloud. And that's where our friend Snyk comes in. Snyk is a frictionless security platform that meets developers where they are - Finding and fixing vulnerabilities right from the CLI, IDEs, Repos, and Pipelines. Snyk integrates seamlessly with AWS offerings like code pipeline, EKS, ECR, and more! As well as things you're actually likely to be using. Deploy on AWS, secure with Snyk. Learn more at Snyk.co/scream That's S-N-Y-K.co/scream

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while, I encounter a company that resonates with something that I've been doing on some level. In this particular case, that is what's happened here, but the story is slightly different. My guest today is Shinji Kim, who's the CEO and founder at Select Star.

And the joke that I was making a few months ago was that Select Stars should have been the name of the Oracle ACE program instead. Shinji, thank you for joining me and suffering my ridiculous, basically amateurish and sophomore database-level jokes because I am bad at databases. Thanks for taking the time to chat with me.

Shinji: Thanks for having me here, Corey. Good to meet you.

Corey: So, Select Star despite being the only query pattern that I've ever effectively been able to execute from memory, what you do as a company is described as an automated data discovery platform. So, I'm going to start at the beginning with that baseline definition. I think most folks can wrap their heads around what the idea of automated means, but the rest of the words feel like it might mean different things to different people. What is data discovery from your point of view?

Shinji: Sure.
The way that we define data discovery is finding and understanding data. In other words, think about how discoverable your data is in your company today. How easy is it for you to find datasets, fields, KPIs of your organization data? And when you are looking at a table, column, dashboard, report, how easy is it for you to understand that data underneath? Encompassing on that is how we define data discovery.

Corey: When you talk about data lurking around the company in various places, that can mean a lot of different things to different folks. For the more structured data folks—which I tend to think of as the organized folks who are nothing like me—that tends to mean things that live inside of, for example, traditional relational databases or things that closely resemble that. I come from a grumpy old sysadmin perspective, so I'm thinking, oh, yeah, we have a Jira server in the closet and that thing's logging to its own disk, so that's going to be some information somewhere. Confluence is another source of data in an organization; it's usually where insight and a knowledge of what's going on goes to die. It's one of those write once, read never type of things.

And when I start thinking about what data means, it feels like even that is something of a squishy term. From the perspective of where Select Star starts and stops, is it bounded to data that lives within relational databases? Does it go beyond that? Where does it start? Where does it stop?

Shinji: So, we started the company with an intention of increasing the discoverability of data and hence providing automated data discovery capability to organizations. And the part where we see this as the most effective is where the data is currently being consumed today. So, this is, like, where the data consumption happens.
So, this can be a data warehouse or data lake, but this is where your data analysts, data scientists are querying data, they are building dashboards, reports on top of, and this is where your main data mart lives.

So, for us, that is primarily a cloud data warehouse today, usually has a relational data structure. On top of that, we also do a lot of deep integrations with BI tools. So, that includes tools like Tableau, Power BI, Looker, Mode. Wherever these queries from the business stakeholders, BI engineers, data analysts, data scientists run, this is a point of reference that we use to auto-generate documentation, data models, lineage, and usage information, to give it back to the data team and everyone else so that they can learn more about the dataset they're about to use.

Corey: So, given that I am seeing an increased number of companies out there talking about data discovery, what is it that Select Star does that differentiates you folks from other folks using similar verbiage in how they describe what they do?

Shinji: Yeah, great question. There are many players popping up, and also, traditional data catalogs are definitely starting to offer more features in this area. The main differentiator that we have in the market today, we call it fast time-to-value. Any customer that is starting with Select Star, they get to set up their instance within 24 hours, and they'll be able to get all the analytics and data models, including column-level lineage, popularity, ER diagrams, and how other people are—top users and how other people are utilizing that data, like, literally in few hours, max to, like, 24 hours. And I would say that is the main differentiator.

And most of the customers have pointed out that setup and getting started has been super easy, which is primarily backed by a lot of automation that we've created underneath the platform. On top of that, just making it super easy and simple to use.
It becomes very clear to the users that it's not just for the technical data engineers and DBAs to use; this is also designed for business stakeholders, product managers, and ops folks to start using as they are learning more about how to use data.

Corey: Mapping this a little bit toward the use cases that I'm the most familiar with, this big source of data that I tend to stumble over is customer AWS bills. And that's not exactly a big data problem, given that it can fit in memory if you have a sufficiently exciting computer, but we wind up using Tableau to slice and dice it because at some point, Excel falls down. From my perspective, the problem with Excel is that it doesn't tend to work on huge datasets very well, and from the position of Salesforce, the problem with Excel is that it doesn't cost a giant pile of money every month. So, those two things combined, Tableau is the answer for what we do. But that's sort of the end-all for us of, that's where it stops.

At that point, we have dashboards that we build and queries that we run that spit out the thing we're looking at, and then that goes back to inform our analysis. We don't inherently feed that back into anything else that would then inform the rest of what we do. Now, for our use case, that probably makes an awful lot of sense because we're here to help our customers with their billing challenges, not take advantage of their data to wind up informing some giant model and mispurposing that data for other things. But if we were generating that data ourselves as a part of our operation, I can absolutely see the value of tying that back into something else. You wind up almost forming a reinforcing cycle that improves the quality of data over time and lets you understand what's going on there.
What are some of the outcomes that you find that customers get to by going down this particular path?

Shinji: Yeah, so just to double-click on what you just talked about, the way that we see this is how we analyze the metadata and the activity logs—system logs, user logs—of how that data has been used. So, as part of our auto-generated documentation for each table, each column, each dashboard, you're going to be able to see the full data lineage: where it came from, how it was transformed in the past, and where it's going to. You will also see what we call popularity score: how many unique users are utilizing this data inside the organization today, how often. And utilizing these two core models and analysis that we create, you can start looking at first mapping out the data flow, and then determining whether or not this dataset is something that you would want to continue keeping or running the data pipelines for. Because once you start mapping these usage models of tables versus dashboards, you may find that there are recurring jobs that create all these materialized views and tables that are feeding dashboards that are not being looked at anymore.

So, with this mechanism, by looking initially at data lineage as a concept, a lot of companies use data lineage in order to find dependencies: what is going to break if I make this change in the column or table, as well as just debugging any issues that are currently happening in their pipeline. So, especially when you will have to debug a SQL query or pipeline that you didn't build yourself but you need to find out how to fix it, this is a really easy way to instantly find out, like, where the data is coming from.
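Mechanically, the lineage lookup Shinji describes here, tracing a dashboard or table back through everything upstream of it, is a reachability query over a directed graph. The sketch below illustrates the idea only; all asset names are invented, and this is not Select Star's actual implementation:

```python
# Toy lineage graph: each downstream asset maps to the upstream
# assets it reads from. All names are hypothetical.
LINEAGE = {
    "looker:revenue_dashboard": ["warehouse:analytics.daily_revenue"],
    "warehouse:analytics.daily_revenue": [
        "warehouse:raw.orders",
        "warehouse:raw.payments",
    ],
    "warehouse:raw.orders": [],
    "warehouse:raw.payments": [],
}

def upstream(asset, lineage):
    """Walk the lineage graph and return every transitive upstream dependency."""
    seen, stack = set(), [asset]
    while stack:
        for parent in lineage.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(upstream("looker:revenue_dashboard", LINEAGE)))
# → ['warehouse:analytics.daily_revenue', 'warehouse:raw.orders', 'warehouse:raw.payments']
```

Traversing the same graph in the opposite direction answers the dependency question mentioned above: what breaks downstream if this column or table changes.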
But on top of that, if you start adding this usage information, you can trace through where the main compute is happening, which large raw tables are still being queried instead of the more summarized tables that should be used, versus which are the tables and datasets that continue to get created, feeding dashboards, and whether those dashboards are actually being used on the business side. So, with that, we have customers that have saved thousands of dollars every month just by being able to deprecate dashboards and pipelines that they were afraid of deprecating in the past because they weren't sure if anyone was actually using them or not. But adopting Select Star was a great way to kind of do a full spring clean of their data warehouse as well as their BI tool. And this is an additional benefit of decluttering so many old, duplicated, and outdated dashboards and datasets in their data warehouse.

Corey: That is, I guess, a recurring problem that I see in many different pockets of the industry as a whole. You see it in the user visibility space, you see it in the cost control space—I even made a joke about Confluence that alludes to it—this idea that you build a whole bunch of dashboards and use it to inform all kinds of charts and other systems, but then people are busy. It feels like there's no ‘and then.' Like, one of the most depressing things in the universe that you can see after having spent a fair bit of effort to build up those dashboards is the analytics for who internally has looked at any of those dashboards since the demo you gave showing it off to everyone else. It feels like in many cases, we put all these projects and amount of effort into building these things out that then don't get used.

People don't want to be informed by data; they want to shoot from their gut. Now, sometimes that's helpful when we're talking about observability tools that you use to trace down outages, and, “Well, our site's really stable.
We don't have to look at that.” Very awesome, great, awesome use case. The business insight level of dashboard just feels like that's something you should really be checking a lot more than you are. How do you see that?

Shinji: Yeah, for sure. I mean, this is why we also update these usage metrics and lineage every 24 hours for all of our customers automatically, so it's just up-to-date. And the part that more customers are asking for, where we are heading to—earlier, I mentioned that our main focus has been on analyzing data consumption and understanding the consumption behavior to drive better usage of your data, or making data usage much easier. The part that we are starting to now see is more customers wanting to extend those feature capabilities to the start of where the data is being generated. So, connecting the similar amount of analysis and metadata collection for production databases, Kafka queues, and where the data is first being generated is one of our longer-term goals. And then you'll really have more of that, up to the source level, of whether the data should even be collected or whether it should even enter the data warehouse phase or not.

Corey: One of the challenges I see across the board in the data space is that so many products tend to have a very specific point of the customer lifecycle where bringing them in makes sense. Too early and it's, “Data? What do you mean data? All I have are these logs, and their purpose is basically to inflate my AWS bill because I'm bad at removing them.” And on the other side, it's, “Great. We pioneered some of these things and have built our own internal enormous system that does exactly what we need to do.” It's like, “Yes, Google, you're very smart. Good job.” And most people are somewhere between those two extremes. Where are customers on that lifecycle or timeline when using Select Star makes sense for them?

Shinji: Yeah, I think that's a great question.
Also, the best place where customers would use Select Star is after they have their cloud data warehouse set up. Either they have finished their migration, or they're starting to utilize it with their BI tools, and they're starting to notice that it's not just, like, you know, ten to fifty tables that they're starting with; most of them have more than hundreds of tables. And they're feeling that this is starting to go out of control because we have all these data, but we are not a hundred percent sure what exactly is in our database. And this usually just happens more in larger companies, companies at thousand-plus employees, and they usually find a lot of value out of Select Star right away because, like, we will start pointing out many different things.

But we also see a lot of, like, forward-thinking, fast-growing startups that are at the size of a few hundred employees, you know, they now have between a five to ten-person data team, and they are really creating the right single source of truth of their data knowledge through Select Star. So, I think you can start anywhere from when your data team size is, like, beyond five and you're continuing to grow, because every time you're trying to onboard a data analyst or data scientist, you will have to go through, like, basically the same type of training of your data model, and it might actually look different because the data models and the new features, new apps that you're integrating change so quickly. So, I would say it's important to have that base early on and then continue to grow. But we do also see a lot of companies coming to us after having thousands of datasets or tens of thousands of datasets, where it's really, like, very hard to operate and onboard anyone.
And this is a place where we really shine to help their needs, as well.

Corey: Sort of the, “I need a database,” to the, “Help, I have too many databases,” pipeline, where [laugh] at some point people start to—wanting to bring organization to the chaos. One thing I like about your model is that you don't seem to be making the play that every other vendor in the data space tends to, which is, “Oh, we want you to move your data onto our systems. The end.” You operate on data that is in place, which makes an awful lot of sense for the kinds of things that we're talking about. Customers are flat out not going to move their data warehouse over to your environment, just because the data gravity is ludicrous. Just the sheer amount of money it would take to egress that data from a cloud provider, for example, is monstrous.

Shinji: Exactly. [laugh]. And security concerns. We don't want to be liable for any of the data—and this is, like, a very specific decision we've made very early on in the company—to not access data, to not egress any of the real data, and to provide as much value as possible just utilizing the metadata and logs. And depending on the types of data warehouses, it also can be really efficient because the query history or the metadata system tables are indexed separately. Usually, it's a much lighter load on the compute side. And that definitely has, like, worked well to our advantage, especially being a SaaS tool.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com.
And my thanks to them for their continued support of this ridiculous nonsense.

Corey: What I like is just how straightforward the integrations are. It's clear you're extraordinarily agnostic as far as where the data itself lives. You integrate with Google's BigQuery, with Amazon Redshift, with Snowflake, and then on the other side of the world with Looker, and Tableau, and other things as well. And one of the example use cases you give is find the upstream table in BigQuery that a Looker dashboard depends on. That's one of those areas where I see something like that, and, oh, I can absolutely see the value of that.

I have two or three DynamoDB tables that drive my newsletter publication system that I built—because I have deep-seated emotional problems and I take it out on everyone else via code—but as a small, contained system that I can still fit in my head. Mostly. And I still forget which table is which in some cases. Down the road, especially at scale, “Okay, where is the actual data source that's informing this, because it doesn't necessarily match what I'm expecting,” is one of those incredibly valuable bits of insight. It seems like that is something that often gets lost; the provenance of data doesn't seem to work.

And ideally, you know, you're staffing a company with reasonably intelligent people who are going to look at the results of something and say, “That does not align with my expectations. I'm going to dig.” As opposed to the, “Oh, yeah, that seems plausible. I'll just go with whatever the computer says.” There's an ocean of nuance between those two, but it's nice to be able to establish the validity of the path that you've gone down in order to set some of these things up.

Shinji: Yeah, and this is also super helpful if you're tasked to debug a dashboard or pipeline that you did not build yourself. Maybe the person has left the company, or maybe they're out-of-office, but this dashboard has been broken and you're quote-unquote, “On call,” for data.
What are you going to do? You're going to—without a tool that can show you a full lineage, you will have to start digging through somebody else's SQL code and try to map out, like, where the data is coming from, if this is calculating correctly. Usually takes, you know, few hours to just get to the bottom of the issue. And this is one of the main use cases that our customers bring up every single time, as more of, like, this is now the go-to place every time there is any data questions or data issues.

Corey: The first and golden rule of cloud economics is step one, turn that shit off.

Shinji: [laugh].

Corey: When people are using something, you can optimize the hell out of it however you want, but nothing's going to beat turning it off. One challenge is when we're looking at various accounts and we see a Redshift cluster, and it's, “Okay. That thing's costing a few million bucks a year and no one seems to know anything about it.” They keep pointing to other teams, and it turns into this giant, like, finger-pointing exercise where no one seems to have responsibility for it. And very often, our clients will choose not to turn that thing off because on the one hand, if you don't turn it off, you're going to spend a few million bucks a year that you otherwise would not have had to.

On the other, if you delete the data warehouse, and it turns out, oh, yeah, that was actually kind of important, now we don't have a company anymore. It's a question of which is the side you want to be wrong on. And in some levels, leaving something as it is and doing something else is always a more defensible answer, just because the first time your cost-saving exercises take out production, you're generally not allowed to save money anymore. This feels like it helps get to that source of truth a heck of a lot more effectively than tracing individual calls and turning into basically data center archaeologists.

Shinji: [laugh]. Yeah, for sure.
I mean, this is why, from the get-go, we try to give you all your tables, all of your databases, just ordered by popularity. So, you can also see overall, like, from all the tables, whether that's thousands or tens of thousands, you're seeing the most used, has the most number of dependencies on the top, and you can also filter it by all the database tables that haven't been touched in the last 90 days. And just having this, like, high-level view gives a lot of ideas to the data platform team about how they can optimize usage of their data warehouse.

Corey: From where I tend to sit, an awful lot of customers are still relatively early in their data journey. An awful lot of the marketing that I receive from various AWS mailing lists that I found myself on because I've had the temerity to open accounts has been along the lines of oh, data discovery is super important, but first, they presuppose that I've already bought into this idea that oh, every company must be a completely data-driven company. The end. Full stop.

And yeah, we're a small bespoke services consultancy. I don't necessarily know that that's the right answer here. But then it takes it one step further and starts to define the idea of data discovery as, ah, you will use it to find PII or otherwise sensitive or restricted data inside of your datasets so you know exactly where it lives. And sure, okay, that's valuable, but it also feels like a very narrow definition compared to how you view these things.

Shinji: Yeah. Basically, the way that we see data discovery is it's starting to become more of an essential capability in order for you to monitor and understand how your data is actually being used internally. It basically gives you the insights around, sure, like, what are the duplicated datasets, what are the datasets that have descriptions or not, what are something that may contain sensitive data, so on and so forth, but that's still around the characteristics of the physical datasets.
Whereas I think the part that's really important around data discovery that is not being talked about as much is how the data can actually be used better. So, having it as more of a forward-thinking mechanism, in order for you to actually encourage more people to utilize data or use the data correctly instead of trying to contain this within just one team, is really where I feel like data discovery can help.

And in regards to this, the other big part around data discovery is really opening up and having that transparency just within the data team. So, just within the data team, they always feel like they do have that access to the SQL queries and you can just go to GitHub and just look at the database itself, but it's so easy to get lost in the sea of metadata that is just laid out as a list; there isn't much context around the data itself. And that context, along with the analytics of the metadata, is what we're really trying to provide automatically. So eventually, like, this can be also seen as almost like a way to, like, monitor the datasets, like how you're currently monitoring your applications through Datadog or your website with your Google Analytics; this is something that can be also used as more of a go-to source of truth around what your state of the data is, how that's defined, and how that's being mapped to different business processes, so that there isn't much confusion around data. Everything can be called the same, but underneath it actually can mean very different things. Does that make sense?

Corey: No, it absolutely does. I think that this is part of the challenge in trying to articulate value that is, I guess, specific to this niche across an entire industry. The context that drives data is going to be incredibly important, and it feels like so much of the marketing in the space is aimed at one or two pre-imagined customer profiles.
And that has the side effect of making customers for whom that model doesn't align look and feel like they're either doing something wrong, or makes it look like the vendor who's pitching this is somewhat out of touch. I know that I work in a relatively bounded problem space, but I still learn new things about AWS billing on virtually every engagement that I go on, just because you always get to learn more about how customers view things and how they view not just their industry, but also the specificities of their own business and their own niche.

I think that is one of the challenges historically, with the idea of letting software do everything. Do you find the problems that you're solving tend to be global in nature or are you discovering strange depths of nuance on a customer-by-customer basis at this point?

Shinji: Overall, a lot of the problems that we solve and the customers that we work with are very industry-agnostic. As long as you have many different datasets that you need to manage, there are common problems that arise, regardless of the industry that you're in. We do observe some industry-specific issues, because your data is either unstructured, or your data is primarily events, or, you know, depending on how the data looks, but primarily because most of the BI solutions and data warehouses operate as relational databases, this is a part where we really try to build a lot of best practices and the common analytics that we can apply to every customer that's using Select Star.

Corey: I really want to thank you for taking so much time to go through the ins and outs of what it is you're doing these days. If people want to learn more, where's the best place to find you?

Shinji: Yeah, I mean, it's been fun [laugh] talking here. So, we are at selectstar.com. That's our website. You can sign up for a free trial.
It's completely self-service, so you don't need to get on a demo but, like, we'll also help you onboard and we're happy to give a free demo to whoever is interested.

We are also on LinkedIn and Twitter under selectstarhq. Yeah, I mean, we're happy to help any companies that have these issues around wanting to increase their discoverability of data, and want to help their data team and the rest of the company to be able to utilize data better.

Corey: And we will, of course, put links to all of that in the [show notes 00:28:58]. Thank you so much for your time today. I really appreciate it.

Shinji: Great. Thanks for having me, Corey.

Corey: Shinji Kim, CEO and founder at Select Star. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that I won't be able to discover because there are far too many podcast platforms out there, and I have no means of discovering where you've said that thing unless you send it to me.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Azul and the Current State of the Java Ecosystem with Scott Sellers


Play Episode Listen Later Sep 20, 2022 36:35


About Scott

With more than 28 years of successful leadership in building high technology companies and delivering advanced products to market, Scott provides the overall strategic leadership and visionary direction for Azul Systems.

Scott has a consistent proven track record of vision, leadership, and success in enterprise, consumer and scientific markets. Prior to co-founding Azul Systems, Scott founded 3dfx Interactive, a graphics processor company that pioneered the 3D graphics market for personal computers and game consoles. Scott served at 3dfx as Vice President of Engineering, CTO and as a member of the board of directors and delivered 7 award-winning products and developed 14 different graphics processors. After a successful initial public offering, 3dfx was later acquired by NVIDIA Corporation.

Prior to 3dfx, Scott was a CPU systems architect at Pellucid, later acquired by MediaVision. Before Pellucid, Scott was a member of the technical staff at Silicon Graphics where he designed high-performance workstations.

Scott graduated from Princeton University with a bachelor of science, earning magna cum laude and Phi Beta Kappa honors. Scott has been granted 8 patents in high performance graphics and computing and is a regularly invited keynote speaker at industry conferences.

Links Referenced:

Azul: https://www.azul.com/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: I come bearing ill tidings. Developers are responsible for more than ever these days. Not just the code that they write, but also the containers and the cloud infrastructure that their apps run on. Because serverless means it's still somebody's problem.
And a big part of that responsibility is app security from code to cloud. And that's where our friend Snyk comes in. Snyk is a frictionless security platform that meets developers where they are - Finding and fixing vulnerabilities right from the CLI, IDEs, Repos, and Pipelines. Snyk integrates seamlessly with AWS offerings like code pipeline, EKS, ECR, and more! As well as things you're actually likely to be using. Deploy on AWS, secure with Snyk. Learn more at Snyk.co/scream That's S-N-Y-K.co/scream

Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest on this promoted episode today is Scott Sellers, CEO and co-founder of Azul. Scott, thank you for joining me.

Scott: Thank you, Corey. I appreciate the opportunity to talk to you today.

Corey: So, let's start with what you're doing these days. What is Azul? What do you folks do over there?

Scott: Azul is an enterprise software and SaaS company that is focused on delivering more efficient Java solutions for our customers around the globe.
We've been around for 20-plus years, and as an entrepreneur, we've really gone through various stages of different growth and different dynamics in the market. But at the end of the day, Azul is all about adding value for Java-based enterprises, Java-based applications, and really endearing ourselves to the Java community.

Corey: This feels like the sort of space where there are an awful lot of great business cases to explore. When you look at what's needed in that market, there are a lot of things that pop up. The surprising part to me is that this is the direction that you personally went in. You started your career as a CPU architect, to my understanding. You were then one of the co-founders of 3dfx before it got acquired by Nvidia. You feel like you've spent your career more as a hardware guy than working on the SaaS side of the world. Is that a misunderstanding of your path, or have things changed, or is this just a new direction? Help me understand how you got here from where you were.

Scott: I'm not exactly sure what the math would say because I continue to—can't figure out a way to stop time. But you're correct that, in my academic background, I was an electrical engineer at Princeton, and I started my career at Silicon Graphics. And that was when I did a lot of fantastic and fascinating work building workstations and high-end graphics systems, you know, back in the day when Silicon Graphics really was the who's who here in Silicon Valley. And so, a lot of my career began in the context of hardware. As you mentioned, I was one of the founders of a graphics company called 3dfx that was, I think, arguably the pioneer in terms of bringing 3D graphics to the masses, if you will. And we had a great run of that. That was a really fun business to be a part of just because of what was going on in the 3D world. And we took that public and eventually sold that to Nvidia. And at that point, my itch, if you will, was really learning more about the enterprise segment.
I'd been involved with professional graphics with SGI, I had been involved with consumer graphics with 3dfx.

And I was fascinated just to learn about the enterprise segment. And I met a couple people through a mutual friend around the 2001 timeframe, and they started talking about this thing called Java. And you know, I had of course heard about Java, but as a consumer graphics guy, didn't have a lot of knowledge about it or experience with it. And the more I learned about it, I recognized that what was going on in the Java world—and credit to Sun for really creating, obviously, not only a language, but building a community around Java—and recognized that new evolutions of developer paradigms really only come around once a decade, if then, and was convinced and really got excited about the opportunity to ride the wave of Java and build a company around that.

Corey: One of the blind spots that I have throughout the entire world of technology—and to be fair, I have many of them, but the one most relevant to this conversation, I suppose—is the Java ecosystem as a whole. I come from a background of being a grumpy Unix sysadmin—because I've never met a happy one of those in my entire career—and as a result, scripting languages are where everything that I worked with started off. And on the rare occasions I worked in Java shops, it was, "Great. We're going to go—here's a WAR file. Go ahead and deploy this with Tomcat," or whatever else people are going to use. But basically, "Don't worry your pretty little head about that."

At most, I have to worry about how to configure a heap or whatnot. But it's from the outside looking in, not having to deal with that entire ecosystem as a whole.
And what I've seen from that particular perspective is that every time I start as a technologist, or even as a consumer trying to install some random software package from the depths of the internet, and I have to start thinking about Java, it always feels like I'm about to wind up in a confusing world. There are a number of software packages that I installed back in, I want to say, the early-2010s or whatnot. "Oh, you need to have a Java runtime installed on your Mac," for example. And okay, going through Oracle's site, do I need the JRE? Do I need the JDK? Oh, there's OpenJDK, which kind of works, kind of doesn't. Amazon got into the space with Corretto, which—because that sounds nothing whatsoever like Java, but strange names coming from Amazon is basically par for the course for those folks. What is the current state of the Java ecosystem, for those of us for whom, basically, the closest we've ever gotten is JavaScript, which is nothing alike except for the name?

Scott: And you know, frankly, given the protection around the name Java—and you know, that is a trademark that's owned by Oracle—it's amazing to me that JavaScript has been allowed to continue to be called JavaScript because, as you point out, JavaScript has nothing to do with Java per se.

Corey: Well, one thing they do have in common I found out somewhat recently is that Oracle also owns the trademark for JavaScript.

Scott: Ah, there you go. Maybe that's why it continues.

Corey: They're basically a law firm—three law firms in a trench coat, masquerading as a tech company some days.

Scott: Right. But anyway, it is a confusing thing because, you know, I think, arguably, JavaScript, by the numbers, probably has more programmers than any other language in the world, just given its popularity as a web language. But to your question about Java specifically, it's had an evolving life, and I think the state where it is today, I think it's in the most exciting place it's ever been.
And I'll walk you through kind of why I believe that to be the case.

But Java has evolved over time from its inception back in the days when it was called—I think it was Oak when it was originally conceived—and Sun had eventually branded it as Java. And at the time, it truly was owned by Sun, meaning it was proprietary code; it had to be licensed. And even though Sun gave it away, in most cases, at the end of the day it was still a commercially licensed product, if you will, and platform. And if you think about today's world, it would not be conceivable to create something that became so popular with programmers that was a commercially licensed product today. It almost would be mandated that it would be open-source to be able to really gain the type of traction that Java has gained.

And so, even though Java was really garnering interest, you know, not only within the developer community, but also amongst commercial entities—right, everyone—and the era now I'm talking about is around the 2000 era—all of the major software vendors, whether it was obviously Sun, but then you had Oracle, you had IBM, companies like BEA, were really starting to blossom at that point. It was a—you know, you could almost not find a commercial software entity that was not backing Java. But it was still all controlled by Sun. And all that success ultimately led to a strong outcry from the community saying this has to be open-source; this is too important to be beholden to a single vendor. And that decision was made by Sun prior to the Oracle acquisition: they actually open-sourced the Java runtime code and they created an open-source project called OpenJDK.

And to Oracle's credit, when they bought Sun—which, I think at the time when you really look back, Oracle really did not have a lot of track record, if you will, of being involved with an open-source community—I think when Oracle acquired Sun, there was a lot of skepticism as to what's going to happen to Java.
Is Oracle going to make this thing, you know, back to the old days, proprietary Oracle, et cetera? And really—

Corey: I was too busy being heartbroken over Solaris at that point to pay much attention to the Java stuff, but it felt like it was this—sort of the same pattern, repeated across multiple ecosystems.

Scott: Absolutely. And even though Sun had also open-sourced Solaris, with the OpenSolaris project, that was one of those kinds of things that was still developed very much in a closed environment, and then they would kind of throw some code out into the open world. And no one really ran OpenSolaris because it wasn't fully compatible with Solaris. And so, that was a faint attempt, if you will.

But Java was quite different. It was truly all open-sourced, and the big difference—and again, I give Oracle a lot of credit for this because this was a very important time in the evolution of Java—is that Oracle maintained Sun's commitment to not only continue to open-source Java but, most importantly, develop it in the open community. And so, you know, again—and this is the 2008, '09, '10 timeframe—the evolution of Java, the decisions, the standards, you know, what goes in the platform, what doesn't, decisions about updates and those types of things, truly became a community-led world and all done in the open-source. And credit to Oracle for continuing to do that. And that really began the transition away from proprietary implementations of Java to one that, very similar to Linux, has really thrived because of the true open-source nature of what Java is today.

And that's enabled more and more companies to get involved with the evolution of Java. If you go to the OpenJDK page, you'll see not only all of the incredibly talented individuals that are involved with the evolution of Java, but again, a who's who of pretty much every major commercial entity in the enterprise software world is also somehow involved in the OpenJDK community.
And so, it really is a very vibrant, evolving standard. And some of the tactical things that have happened along the way, in terms of changing how versions of Java are released, are still also very much in the context of maintaining compatibility and finding that careful balance of evolving the platform, but at the same time, recognizing that there are a lot of Java applications out there, so you can't just take a right-hand turn and forget about the compatibility side of things. But we as a community overall, I think, have addressed that very effectively, and the result has been, now, I think, that Java is more popular than ever and continues to—we liken it kind of to the mortar and the brick walls of the enterprise. It's a given that it's going to be used, certainly by most of the enterprises worldwide today.

Corey: There's a certain subset of folk who are convinced that Java is, "Oh, it's this legacy programming language, and nothing modern or forward-looking is going to be built in it." Yeah, those people generally don't know what the internal language stack looks like at places like, oh, I don't know, AWS, Google, and a few others; it is very much everywhere. But it also feels, on some level, like it's a bit below the surface-level of awareness for the modern full-stack developer in some respects, right up until suddenly it's very much not. How is Java evolving in a cloud these days?

Scott: Well, what we see happening—you know, this is true for—you know, I'm a techie, so I can talk about other techies. I mean, as techies, we all like the new thing, right? I mean, it's not that exciting to talk about a language that's been around for 20-plus years. But that doesn't take away from the fact that we still all use keyboards. I mean, no one really talks about what keyboard they use anymore—unless you're really into keyboards—but at the end of the day, it's still a fundamental tool that you use every single day.

And Java is kind of in the same situation.
The reason that Java continues to be so fundamental is that it really comes back to kind of the reinventing-the-wheel problem. Are there other languages that are more efficient to code in? Absolutely. Are there other languages that, you know, have some capabilities that Java doesn't have? Absolutely. But if you have the ability to reinvent everything from scratch, sure, go for it. And you also don't have to worry about—well, can I find enough programmers in this, you know, new hot language? Okay, good luck with that. You might be able to find dozens, but when you need to really scale a company into thousands or tens of thousands of developers, good luck finding, you know, everyone that knows whatever your favorite hot language of the day is.

Corey: It requires six years' experience in a four-year-old language. Yeah, it's hard to find that, sometimes.

Scott: Right. And you know, the reality is that really no application ever is developed from scratch, right? Even when an application is, quote, new, immediately what you're using is frameworks and other things that were written long ago and proven to be very successful.

Corey: And disturbing amounts of code copied and pasted from Stack Overflow.

Scott: Absolutely.

Corey: But that's one of those impolite things we don't say out loud very often.

Scott: That's exactly right. So, nothing really is created from scratch anymore. And so, it's all about building blocks. And this is really where this snowball of Java is difficult to stop, because there is so much third-party code out there—and by that, I mean, you know, open-source, commercial code, et cetera—that is just so leveraged and so useful to very quickly be able to take advantage of, and, you know, allows developers to focus on truly new things, not reinventing the wheel for the hundredth time.
And that's what's kind of hard about all these other languages catching up to Java: all of the things that are immediately available for developers to use freely, right, because most of it is open-source. That's a pretty fundamental Catch-22 when you start talking about the evolution of new languages.

Corey: I'm with you so far. The counterpoint, though, is that so much of what we're talking about in the world of Java is open-source; it is freely available. The OpenJDK, for example, says that right on the tin. You have built a company and you've been in business for 20 years. I have to imagine that this is not one of those stories where, "Oh, all the things we do, we give away for free. But that's okay. We make it up in volume." Even the venture capitalist mindset tends to run out of patience on those kinds of timescales. What is it you actually do as a business that clearly, obviously delivers value for customers but also results in, you know, being able to meet payroll every week?

Scott: Right, absolutely. And I think what time has shown is that, with one very notable exception and very successful example being Red Hat, there are very, very few pure open-source companies whose business is only selling support services for free software. Most successful businesses that are based on open-source are in one way, shape, or form adding value-added elements. And that's our strategy as well.

The heart of everything we do is based on free code from OpenJDK, and we have a tremendous amount of business where we are following the Red Hat business model: we are selling support and long-term access across a huge variety of different operating system configurations and older Java versions. Still all free software, though, right, but we're selling support services for that. And that is, in essence, the classic Red Hat business model.
And that business for us is incredibly high growth, very fast-moving. A lot of that business is because enterprises are tired of paying the very high price to Oracle for Java support, and they're looking for an open-source alternative that is exactly the same thing, but comes in pure open-source form and with a vendor that is as reputable as Oracle. So, a lot of our business is based on that.

However, on top of that, we also have value-added elements. And so, our product that is called Azul Platform Prime is rooted in OpenJDK—it is OpenJDK—but then we've added value-added elements to that. And what those value-added elements create is, in essence, a better Java platform. And better in this context means faster, quicker to warm up, and elimination of some of the inconsistencies of the Java runtime in terms of this nasty problem called garbage collection, which causes applications to kind of bounce around in terms of performance limitations. And so, creating a better Java is another way that we have monetized our company: value-added elements that are built on top of OpenJDK. And I'd say that part of the business is very typical for the majority of enterprise software companies that are rooted in open-source. They're typically adding value-added components on top of the open-source technology, and that's our similar strategy as well.

And then the third evolution for us, which again is very tried-and-true, is evolving the business also to add SaaS offerings. So today, the majority of our customers, even though they deploy in the cloud, are still customer-managed, and so they're responsible for where do I want to put my Java runtime, building out my stack, et cetera, et cetera. And of course, that could be on-prem, but like I mentioned, the majority are in the cloud.
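For readers who want to see the garbage-collection overhead Scott mentions on a stock JVM, here is a minimal, illustrative sketch (an editor's example, not Azul's code; the class name and the allocation workload are invented for illustration) that churns through short-lived allocations and then reads cumulative collection counts and times through the standard `java.lang.management` API:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative only: give the collector some work, then report how much
// cumulative time the JVM's collectors have spent so far.
public class GcOverheadDemo {
    // Allocate short-lived objects so the garbage collector has work to do.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] block = new byte[1024];
            block[0] = (byte) i;
            checksum += block[0];
        }
        return checksum;
    }

    // Sum cumulative collection counts and times across all collectors.
    static long[] gcTotals() {
        long count = 0, millis = 0;
        List<GarbageCollectorMXBean> beans =
                ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            count += Math.max(0, bean.getCollectionCount());
            millis += Math.max(0, bean.getCollectionTime());
        }
        return new long[] { count, millis };
    }

    public static void main(String[] args) {
        churn(5_000_000);
        long[] totals = gcTotals();
        System.out.println("GC collections=" + totals[0]
                + " totalPauseMillis=" + totals[1]);
    }
}
```

The numbers vary by collector and heap settings, which is exactly the kind of run-to-run inconsistency the conversation is describing.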
We're evolving our product offerings also to have truly SaaS-based solutions so that customers don't even need to manage those types of stacks on their own anymore.

Corey: On some level, it feels like we're talking about two different things when we talk about cloud and when we talk about programming languages, but increasingly, I'm starting to see across almost the entire ecosystem that different languages and different cloud providers are in many ways converging. How do you see Java changing as cloud-native becomes the default rather than the new thing?

Scott: Great question. And I think the thing to recognize about really most popular programming languages today—I can think of very few exceptions—is that these languages were created, envisioned, implemented, if you will, in a day when cloud was not top-of-mind, and in many cases—certainly in the case of Java—cloud didn't even exist when Java was originally conceived, nor was that the case when, you know, other languages, such as Python or JavaScript, were, and on and on. So, rethinking how these languages should evolve in very much the context of a cloud-native mentality is a really important initiative that we certainly are doing and I think the Java community is doing overall. And how you architect not only the application, but even the Java runtime itself, can be fundamentally different if you know that the application is going to be deployed in the cloud.

And I'll give you an example. Specifically, in the world of any type of runtime-based language—and JavaScript is an example of that; Python is an example of that; Java is an example of that—in all of those runtime-based environments, what that basically means is that when the application is run, there's a piece of software that's called the runtime that actually is running that application code. And so, you can think about it as a middleware piece of software that sits between the operating system and the application itself.
And so, that runtime layer is common across those languages and those platforms that I mentioned. That runtime layer is evolving, and it's evolving in a way that is becoming more and more cloud-native in its thinking.

The process itself of actually taking the application and compiling it into whatever underlying architecture it may be running on—it could be an x86 instance running on Amazon; it could be, for example, an ARM64, which Amazon has compute instances now that are based on an ARM64 processor that they call Graviton, which is really also kind of altering the price-performance of the compute instances on the AWS platform—that runtime layer magically takes an application that doesn't have to be aware of the underlying hardware and transforms it into a form that can be run. And that's a very expensive process; it's called just-in-time compiling. And that just-in-time compilation, in today's world—which wasn't really based on cloud thinking—happens on every instance; every compute instance that you deploy, that same JIT compilation process is happening over and over again. And even if you deploy 100 instances for scalability, every one of those 100 instances is doing that same work. And so, it's very inefficient and very redundant.

Contrast that to cloud-native thinking: that compilation process should be a service; that service should be done once. One instance of the application is actually run, and the other ninety-nine should just reuse that compilation work. And that shared compiler service should be scalable, and should be able to scale up when applications are launched and you need more compilation resources, and then scale right back down when you're through the compilation process and the application is moving into the runtime phase of the application lifecycle.
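The per-instance warm-up cost described above can be observed on any stock JVM with a small timing harness (again, an editor's illustrative sketch with invented names, not Azul's tooling): the first batch of calls pays for interpretation and JIT compilation, while later batches mostly run already-compiled code.

```java
// Illustrative only: time the same hot method in two batches. The first
// batch typically includes interpreter time and JIT compilation; the
// second mostly runs compiled code, so it is usually faster.
public class WarmupDemo {
    // A method hot enough for the JIT to eventually compile.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    static long timeBatch(int iterations) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += sumOfSquares(10_000);
        }
        long elapsed = System.nanoTime() - start;
        // Print the sink so the JIT cannot eliminate the work entirely.
        System.out.println("sink=" + sink + " elapsedNanos=" + elapsed);
        return elapsed;
    }

    public static void main(String[] args) {
        long cold = timeBatch(1_000); // pays interpretation + compilation cost
        long warm = timeBatch(1_000); // mostly compiled code
        System.out.println("coldNanos=" + cold + " warmNanos=" + warm);
    }
}
```

In a fleet of identical instances, each one repeats this warm-up independently, which is the redundancy a shared compilation service aims to remove.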
And so, these types of things are areas that we and others are working on in terms of evolving the Java runtime specifically to be more cloud-native.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.

Corey: This feels like it gets even more critical when we're talking about things like serverless functions across basically all the cloud providers these days, where there's the whole setup, everything in the stack, get it running, get it listening, ready to go, to receive a single request and then shut itself down. It feels like there are a lot of operational efficiencies possible once you start optimizing from a starting point of, yeah, this is what that environment looks like, rather than big metal servers sitting in a rack 15 years ago.

Scott: Yeah. I think the evolution of serverless appears to be headed more towards serverless containers as opposed to serverless functions. Serverless functions have a bunch of limitations when you think about them in the context of a complex, you know, microservices-based deployment framework. It's just not very efficient to spin up and spin down instances of a function if it is any sort of performance- or latency-sensitive type of application.
If you're doing something very rarely, sure, it's fine; it's efficient, it's elegant, et cetera. But any sort of thing that has real girth to it—and girth probably means that's what's driving your application infrastructure costs, that's what's driving your Amazon bill every month—those types of things typically are not going to be great for starting and stopping function instances. And so, serverless is evolving more towards thinking about the container itself, not having to worry about the underlying operating system or the instance on Amazon that it's running on. And that's where, you know, we see more and more of the evolution of serverless: thinking about it at a container level as opposed to a function level. And that appears to be a really healthy steady state, so it gets the benefits of not having to worry about all the underlying stuff, but at the same time doesn't have the downside of trying to start and stop function instances at a given point in time.

Corey: It seems to me that there are really two ways of thinking about cloud. The first is what I think a lot of companies do on their first outing when they're going into something like AWS. "Okay, we're going to get a bunch of virtual machines that they call instances in AWS, we're going to run things just like it's our data center except now data transfer to the internet is terrifyingly expensive." The more quote-unquote, "cloud-native" way of thinking about this is what you're alluding to, where there's, "Here's some code that I wrote. I want to throw it to my cloud provider and just don't tell me about any of the infrastructure parts. Execute this code when these conditions are met and leave me alone."

Containers these days seem to be one of our best ways of getting there with a minimum of fuss and friction. What are you seeing in the enterprise space as far as adoption of those patterns goes?
Or are we seeing cloud repatriation showing up as a real thing and I'm just not in the right place to see it?

Scott: Well, I think as the cloud journey evolves, there's no question that—and in fact it's even silly to say that cloud is here to stay, because I think that became a reality many, many years ago. So really, the question is, what are the challenges now with cloud deployments? Cloud is absolutely a given. And I think, as you stated earlier, it's rare that, whether it's a new company or a new application, at least in most businesses that don't have specific regulatory requirements, that application is highly, highly likely to be envisioned to be initially and only deployed in the cloud. That's a great thing, because you have so many advantages of not having to purchase infrastructure in advance, being able to tap into all of the various services that are available through the cloud providers. No one builds databases anymore; you're just tapping into the service that's provided by Azure or AWS, or what have you.

And, you know, just that specific example is a huge amount of savings in terms of just overhead, and license costs, and those types of things, and there's countless examples of that. And so, the services that are available in the cloud are unquestioned. So, there's countless advantages of why you want to be in the cloud. The downside of the cloud, however, is that, at the end of the day, AWS, Microsoft with Azure, and Google with GCP are making 30% margin on that cloud infrastructure. And in the days of hardware, when companies would actually buy their servers from Dell, or HP, et cetera, those businesses are 5% margin.
And that's not to say that the industry has made the wrong choice because there's so many advantages of being in cloud, there's no doubt about it. And there should be—you know, and the cloud providers deserve to take some amount of margin to provide the services that they provide; there's no doubt about that. The question is, how do you do the best of all worlds?And you know, there is a great blog by a couple of the partners in Andreessen Horowitz, they called this the Cloud Paradox. And the Cloud Paradox really talks about the challenges. It's really a Catch-22; how do you get all the benefits of cloud but do that in a way that is not overly taxing from a cost perspective? And a lot of it comes down to good practices and making sure that you have the right monitoring and culture within an enterprise to make sure that cloud cost is a primary thing that is discussed and metric, but then there's also technologies that can help so that you don't have to even think about what you really don't ever want to do: repatriating, which is about the concept of actually moving off the cloud back to the old way of doing things. So certainly, I don't believe repatriation is a practical solution for ongoing and increasing cloud costs. I believe technology is a solution to that.And there are technologies such as our product, Azul Platform Prime, that in essence, allows you to do more with less, right, get all the benefits of cloud, deploy in your Amazon environment, deploy in your Azure environment, et cetera, but imagine if instead of needing a hundred instances to handle your given workload, you could do that with 50 or 60. Tomorrow, that means that you can start savings and being able to do that simply by changing your JVM from a standard OpenJDK or Oracle JVM to something like Platform Prime, you can immediately start to start seeing the benefits from that. 
And so, a lot of our business now and our growth is coming from companies that are screaming under the ongoing cloud costs and trying to keep them in line, and using technology like Azul Platform Prime to help mitigate those costs.

Corey: I think that there is a somewhat foolish approach that I'm seeing taken by a lot of folks, where there are some companies that are existentially anti-cloud, if for no other reason than because if the cloud wins, then they don't really have a business anymore. The problem I see with that is that it seems that their solution across the board is to turn back the clock, where if I'm going to build a startup, it's time for me to go buy some servers and a rack somewhere and start negotiating with bandwidth providers. I don't see that that is necessarily viable for almost anyone. We aren't living in 1995 anymore, despite how much some people like to pretend we are. It seems like if there are workloads—for which I agree, cloud is not necessarily an economic fit—first, I feel like the market will fix that in the fullness of time, but secondly, an individual workload belonging in a certain place is radically different than, “Oh, none of our stuff should live on cloud. Everything belongs in a data center.” And I just think that companies lose all credibility when they start pretending that it's any other way.

Scott: Right. I'd love to see the reaction on the venture capitalist's face when an entrepreneur walks in and talks about how their strategy for deploying their SaaS service is going to be buying hardware and renting some space in the local data center.

Corey: Well, there is a good cost control method, if you think about it. I mean, very few engineers are going to accidentally spin up an $8 million cluster in a data center a second time, just because there's no space left for it.

Scott: And you're right; it does happen in the cloud as well.
It's just, I agree with you completely that part of the evolution of cloud in general is an ever-improving awareness of cost and building in technologies that help mitigate that cost. So, I think that will continue to evolve. If you really think about the cloud journey, cost, I would say, is still in the early phases in terms of technologies, practices, and processes that allow enterprises to really get their head around it. I'd still say it's a fairly immature part of the industry, but one that is evolving quickly, just given the importance of it.

And so, I think in the coming years, you're going to see a radical improvement in terms of cost awareness and technologies to help with costs, that again allow you the best of all worlds. Because, you know, if you go back to the Dark Ages and you start thinking about buying servers and infrastructure, then you are really getting back to a mentality of, “I've got to deploy everything. I've got to buy software for my database. I've got to deploy it. What am I going to do about my authentication service? So, I've got to buy this vendor's solution, et cetera.” All that stuff just goes away in the world of cloud, so it's just not practical, in this day and age, to think about building a business that's not cloud-native from the beginning.

Corey: I really want to thank you for spending so much time talking to me about how you view the industry, the evolution we've seen in the Java ecosystem, and what you've been up to. If people want to learn more, where's the best place for them to find you?

Scott: Well, there's a thing called a website that you may not have heard of, it's really cool.

Corey: Can I build it in Java?

Scott: W-W-dot—[laugh]. Yeah.
The Azul website obviously has an awful lot of information. Azul is spelled A-Z-U-L, and we sometimes get the question, “How in the world did you name a company—why did you name it Azul?”

And it's kind of a funny story, because back in the early days of Azul, when we thought about, hey, we want to be big and successful, at the time IBM was the gold standard in terms of success in the enterprise world. And you know, they were Big Blue, so we said, “Hey, we're going to be a little blue. Let's be Azul.” So, that's where we began. So obviously, go check out our site.

We're very present, also, in the Java community: many developer conferences and talks. We sponsor and run many of what are called the Java User Groups, which are very popular 10-, 20-person meetups that happen around the globe on a regular basis. And so, you know, come check us out. And I appreciate everyone's time in listening to the podcast today.

Corey: No, thank you very much for spending as much time with me as you have. It's appreciated.

Scott: Thanks, Corey.

Corey: Scott Sellers, CEO and co-founder of Azul. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an entire copy of the terms and conditions from Oracle's version of the JDK.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Future of Serverless with Allen Helton


Sep 15, 2022 · 39:06


About Allen

Allen is a cloud architect at Tyler Technologies. He helps modernize government software by creating secure, highly scalable, and fault-tolerant serverless applications. Allen publishes content regularly about serverless concepts and design on his blog, Ready, Set, Cloud!

Links Referenced:

- Ready, Set, Cloud blog: https://readysetcloud.io
- Tyler Technologies: https://www.tylertech.com/
- Twitter: https://twitter.com/allenheltondev
- LinkedIn: https://www.linkedin.com/in/allenheltondev/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.

Corey: I come bearing ill tidings. Developers are responsible for more than ever these days. Not just the code that they write, but also the containers and the cloud infrastructure that their apps run on.
Because serverless means it's still somebody's problem. And a big part of that responsibility is app security from code to cloud. And that's where our friend Snyk comes in. Snyk is a frictionless security platform that meets developers where they are: finding and fixing vulnerabilities right from the CLI, IDEs, repos, and pipelines. Snyk integrates seamlessly with AWS offerings like CodePipeline, EKS, ECR, and more, as well as things you're actually likely to be using. Deploy on AWS, secure with Snyk. Learn more at snyk.co/scream. That's S-N-Y-K.co/scream.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while I wind up stumbling into corners of the internet that I previously had not traveled. Somewhat recently, I wound up having that delightful experience again by discovering readysetcloud.io, which has a whole series of, I guess some people might call it thought leadership; I'm going to call it instead how I view it, which is just amazing opinion pieces on the context of serverless, mixed with APIs, mixed with some prognostications about the future.

Allen Helton by day is a cloud architect at Tyler Technologies, but that's not how I encountered you. First off, Allen, thank you for joining me.

Allen: Thank you, Corey. Happy to be here.

Corey: I was originally pointed towards your work by folks in the AWS Community Builder program, of which we both participate from time to time, and it's one of those, “Oh, wow, this is amazing. I really wish I'd discovered some of this sooner.” And every time I look through your back catalog and I click on a new post, I see things that either I really agree with or I can't stand this opinion and want to fight about it, but more often than not, it's one of those recurring moments that I love: “Damn, I wish I had written something like this.” So first, you're absolutely killing it on the content front.

Allen: Thank you, Corey, I appreciate that.
The content that I make is really about the stuff that I'm doing at work. It's stuff that I'm passionate about, stuff that I spend a decent amount of time on, and really the most important thing about it for me is it's stuff that I'm learning and forming opinions on and want to share with others.

Corey: I have to say, when I saw that you were—oh, you're Tyler Technologies, which sounds for all the world like, oh, it's a relatively small consultancy run by some guy presumably named Tyler, and you know, it's a petite team of maybe 20, 30 people on the outside. Yeah, then I realized, wait a minute, that's not entirely true. For example, for starters, you're publicly traded. And okay, that does change things a little bit. First off, who are you people? Secondly, what do you do? And third, why have I never heard of you folks until now?

Allen: Tyler is the largest company that focuses completely on the public sector. We have divisions and products for pretty much everything that you can imagine that's in the public sector. We have software for schools, software for tax and appraisal, we have software for police officers, for courts—everything you can think of that runs the government can be, and a lot of times is, run on Tyler software. We've been around for decades building our expertise in the domain, and the reason you probably haven't heard about us is because you might not have ever been in trouble with the law before. If you [laugh] if you have been—

Corey: No, no, I learned very early on in the course of my life—which will come as a surprise to absolutely no one who has spent more than 30 seconds with me—that I have remarkably little filter, and if ten kids were the ones doing something wrong, I'm the one that gets caught. So, I spent a lot of time in the principal's office, and this taught me to keep my nose clean. I'm one of those squeaky-clean types, just because I was always terrified of getting punished because I knew I would get caught.
I'm not saying this is the right way to go through life necessarily, but it did have the side benefit of, no, I don't really engage with law enforcement going throughout the course of my life.

Allen: That's good. That's good. But one exposure that a lot of people get to Tyler is if you look at the bottom of your next traffic ticket, it'll probably say Tyler Technologies on the bottom there.

Corey: Oh, so you're really popular in certain circles, I'd imagine?

Allen: Super popular. Yes, yes. And of course, you get all the benefits of writing that code that says ‘if defendant equals Allen Helton then return.'

Corey: I like that. You get to have the exception cases built in that no one's ever going to wind up looking into.

Allen: That's right. Yes.

Corey: The idea of what you're doing makes an awful lot of sense. There's a tremendous need for a wide variety of technical assistance in the public sector. What surprises me, although I guess it probably shouldn't, is how much of your content is aimed at serverless technologies and API design, which to my way of thinking isn't really something that the public sector has done a lot with. Clearly I'm wrong.

Allen: Historically, you're not wrong. There's an old saying that government tends to run about ten years behind on technology. Not just technology, but all over the board, it runs about ten years behind. And until recently, that's really been true. There was a situation last year where one of the state governments—I don't remember which one it was—was having a crisis because they couldn't find any COBOL developers to come in and maintain the software that runs the state.

And it's COBOL; you're not going to find a whole lot of people that have that skill. A lot of those people are retiring out. And what's happening is that we're getting new people sitting in positions of power in government that want innovation.
They know about the cloud and they want to be able to integrate with systems quickly and easily, with little to no onboarding time. You know, there are people in power that have grown up with technology and understand that, well, with everything else, I can be up and running in five or ten minutes. I cannot do this with the software I'm consuming now.

Corey: My opinion on it is admittedly conflicted because, on the one hand, yeah, I don't think that governments should be running on COBOL software that runs on mainframes that haven't been supported in 25 years. Conversely, I also don't necessarily want them being run like a seed-series startup, where, “Well, I wrote this code last night, and it's awesome, so off I go to production with it.” Because I can decide not to do business anymore with Twitter for Pets, and I can go on to something else, like PetFlicks, or whatever it is I choose to use. I can't easily opt out of my government. The decisions that they make stick, and that is going to have a meaningful impact on my life and everyone else's life who is subject to their jurisdiction. So, I guess I don't really know where I believe the proper pace of technological adoption should be for governments. Curious to get your thoughts on this.

Allen: Well, you certainly don't want anything that's bleeding edge. That's one of the things that we kind of draw fine lines around. Because when we're dealing with government software, we're dealing with, usually, critically sensitive information. It's not medical records, but it's your criminal record, and it's things like your social security number—things that you can't have leaking out under any circumstances. So, the things that we're building on are things that have proven themselves to be secure and have best practices around security, uptime, reliability, and, in a lot of cases, maintainability as well.
You know, if there are issues, then let's try to get those turned around as quickly as we can, because we don't want to have any sort of downtime, whether from the software side or the software vendor side.

Corey: I want to pivot a little bit to some of the content you've put out, because an awful lot of it seems to be, I think I'll call it, variations on a theme. For example, I just read some recent titles, and to illustrate my point: “Going API First: Your First 30 Days,” “Solutions Architect Tips: How to Design Applications for Growth,” “3 Things to Know Before Building a Multi-Tenant Serverless App.” And the common thread that I see running through all of these things is that these are things that you tend to have extraordinarily strong and vocal opinions about only after dismissing all of them the first time, slapping something together, and then sort of being forced to live with the consequences of the choices that you've made, in some cases choices you didn't realize you were making at the time. Are you one of those folks that has the wisdom to see what's coming down the road, or did you do what the rest of us do and basically learn all this stuff by getting it hilariously wrong and having to careen into rebound situations as a result?

Allen: [laugh]. I love that question. I would like to say that now I feel like I have the vision to see something like that coming. Historically, no, not at all. Let me talk a little bit about how I got to where I am, because that will shed a lot of context on that question.

A few years ago, I was put into a position at Tyler that said, “Hey, go figure out this cloud thing.” Let's figure out what we need to do to move into the cloud safely, securely, quickly, all that rigmarole. And so, I did. I got to hand-select a team of engineers from people that I had worked with at Tyler over the past few years, and we were basically given free rein to learn.
We were an R&D team, a hundred percent R&D, for about a year's worth of time, where we were learning about cloud concepts and theory and building little proofs of concept: CI/CD, serverless, APIs, multi-tenancy, a whole bunch of different stuff. NoSQL was another one of the things that we had to learn. And after that year of R&D, we were told, “Okay, now go do something with that. Go build this application.” And we did, building on our cursory theory knowledge. And we got pretty close to go-live, and then the business says, “What do you do in this scenario? What do you do in that scenario? What do you do here?”

Corey: “I update my resume and go work somewhere else. Where's the hard part here?”

Allen: [laugh].

Corey: Turns out, that's not a convincing answer.

Allen: Right. So, we moved quickly. And I wouldn't say we backpedaled, but we hardened for a long time prior to the go-live, with the lessons that we'd learned, with the eyes of Tyler, the mature enterprise company, saying, “These are the things that you have to make sure that you take into consideration in an actual production application.” One of the things that I always pushed—I was a manager for a few years of all these cloud teams—I always push: do it; do it right; do it better.

It's kind of like crawl, walk, run. And if you follow my writing from the beginning, just looking at the titles and reading them, kind of like what you were doing, Corey, you'll see that very much. You'll see how I talk about CI/CD, how I talk about authorization, how I talk about multi-tenancy. And I kind of go in waves where maybe a year passes and you see my content revisit some of the topics that I've done in the past. And they're like, “No, no, no, don't do what I said before.
It's not right.”

Corey: The problem when I'm writing all of these things that I do—for example, my entire newsletter publication pipeline is built on a giant morass of Lambda functions and API Gateways. It's microservices-driven—kind of—and each microservice is built, almost always, with a different framework. Lately, all the new stuff is CDK. I started off with the Serverless Framework. There are a few other things here and there.

And it's like going back in time, architecturally, as I have to make updates to these things from time to time. The problem with having done all that myself is that I already know the answer to, “What fool designed this?” It's, well, you're basically watching me learn what I was doing, bit by bit. I'm starting to believe that the right answer, on some level, is to build an inherent shelf life into some of these things. Great, in five years, you're going to come back and re-architect it now that you know how this stuff actually works, rather than patching together 15 blog posts by different authors, not all of whom are talking about the same thing, and hoping for the best.

Allen: Yep. That's one of the things that I really like about serverless—I view it as a giant pro of doing serverless—that when we revisit with the lessons learned, we don't have to refactor everything at once like we would if it was just one big, you know, MVC controller out there in the sky. We can refactor one Lambda function at a time if now we're using a new version of the AWS SDK, or we've learned about a new best practice that needs to go in place.
It's a “While you're in there, tidy up, please” kind of deal.

Corey: I know that the DynamoDB fanatics will absolutely murder me over this one, but one of the reasons that I have multiple Dynamo tables that contain, effectively, variations on the exact same data is because I want the dependency between the two different microservices to be the API, not, “Oh, and under the hood, it's expecting this exact same data structure all the time.” It just felt like that was the wrong direction to go in. That is the justification I use for myself for why I run multiple DynamoDB tables that [laugh] have the same content. Where do you fall on the idea of data store separation?

Allen: I'm a big single-table design person myself. I really like the idea of being able to store everything in the same table and being able to create queries that can return multiple different types of entity with one lookup. Now, that being said, one of the ambiguous areas when we were getting started with serverless was: what does single-table design mean when you're talking about microservices? We were wondering, does single table mean one DynamoDB table for an entire application that's composed of 15 microservices? Or is it one table per microservice? And the latter is ultimately what we ended up going with: a table per microservice. Even if multiple microservices are pushed into the same AWS account, we're still building that logical construct of a microservice and one table that houses similar entities in the same domain.

Corey: So, something I wish that every service team at AWS would do as a part of their design is draw the architecture of an application that you're planning to build. Great, now assume that every single resource on that architecture diagram lives in its own distinct AWS account, because somewhere, in some customer, there's going to be an account boundary at every interconnection point along the way.
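As a brief aside, the single-table pattern Allen describes (one table per microservice, several entity types returned by one lookup) can be sketched in plain Python. The composite key shapes below are a common illustrative convention, not Tyler's actual schema, and the in-memory list stands in for a real DynamoDB query:

```python
# Sketch of single-table design: several entity types share one table,
# distinguished by composite partition/sort keys. The ORDER#/ITEM# key
# formats are illustrative conventions, not a real production schema.

items = [
    {"pk": "ORDER#1001", "sk": "META",   "type": "Order",    "status": "shipped"},
    {"pk": "ORDER#1001", "sk": "ITEM#1", "type": "LineItem", "product": "widget"},
    {"pk": "ORDER#1001", "sk": "ITEM#2", "type": "LineItem", "product": "gadget"},
    {"pk": "ORDER#1002", "sk": "META",   "type": "Order",    "status": "pending"},
]

def query(pk: str) -> list:
    """Stand-in for a DynamoDB Query on the partition key: one lookup
    returns the order header and all of its line items together."""
    return [item for item in items if item["pk"] == pk]

result = query("ORDER#1001")
order = next(i for i in result if i["type"] == "Order")
line_items = [i for i in result if i["type"] == "LineItem"]
```

The point of the pattern is that the `Order` and its `LineItem`s come back in a single round trip, where a table-per-entity layout would need one query per entity type.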
And so many services don't do that, where it's, “Oh, that thing and the other thing have to be in the same account.” So, people have to write their own integration shims, and it makes doing the right thing, putting different services into distinct bounded AWS accounts for security or compliance reasons, way harder than I feel like it needs to be.

Allen: [laugh]. Totally agree with you on that one. That's one of the things that I feel like I'm still learning about: account-level isolation. I'm still kind of early on, personally, with my opinions on how we're structuring things right now, but I'm very much of the opinion that deploying multiple things into the same account is going to make it too easy to do something that you shouldn't. And I just try not to inherently trust people, in the sense that, “Oh, this is easy. I'm just going to cross that boundary real quick.”

Corey: For me, it's also come down to security risk exposure. Like, my lasttweetinaws.com Twitter shitposting thread client lives in a distinct AWS account that is separate from the AWS account that has all of our client billing data within it. The idea being that if you find a way to compromise my public-facing Twitter client, great, the blast radius should be constrained to, “Yay, now you can, I don't know, spin up some cryptocurrency mining in my AWS account and I get to look like a fool when I beg AWS for forgiveness.”

But that should be the end of it. It shouldn't be a security incident, because I should not have the credit card numbers living right next to the funny internet web thing. That sort of flies in the face of the original guidance that AWS gave at launch: right around the 2008 era, best practice was one customer, one AWS account. By 2012, they had changed their perspective, but once you've made a decision to build multiple services in a single account, unwinding and unpacking that becomes an incredibly burdensome thing.
It's about the equivalent of doing a cloud migration, in some ways.

Allen: We went through that. We started off building one application with the intent that it was going to be a siloed application, a one-off, essentially. And about a year into it, it's one of those moments of, “Oh, no. What we're building is not actually a one-off. It's a piece of a much larger puzzle.”

And we had a whole bunch of—unfortunately—tightly coupled things in there that were assuming that resources were going to be in the same AWS account. So, we ended up taking, I think, probably two months, which in the grand scheme of things isn't that long, kind of unwinding the pieces and decoupling what was possible at the time into multiple AWS accounts, segmented by domain, essentially. But that's hard. AWS puts it as one of those one-way door decisions. I think this one was a two-way door, but it locked, and you could kind of jimmy the lock on the way back out.

Corey: And you could buzz someone from the lobby to let you back in. Yeah, the biggest problem is not necessarily the one-way door decisions. It's the one-way door decisions that you don't realize you're passing through at the time that you make them. Which, of course, brings us to a topic near and dear to your heart—and I only recently started to have opinions on this myself—and that is the proper design of APIs, which I'm sure will incense absolutely no one who's listening to this. Like, my opinions on APIs start with, well, probably REST is the right answer in this day and age. I had people say, “Well, I don't know, GraphQL is pretty awesome.” Like, “Oh, I'm thinking SOAP,” and people look at me like I'm a monster from the Black Lagoon of centuries past in XML-land.
So, my particular brand of strangeness aside, what do you see people doing in the world of API design that is, I guess, the most common or easiest mistake to make that you really wish they would stop making?

Allen: If I could boil it down to one word: fundamentalism. Let me unpack that for you.

Corey: Oh, please, I absolutely want to get a definition on that one.

Allen: [laugh]. I approach API design from a developer experience point of view: how easy is it for both internal and external integrators to consume and satisfy the business processes that they want to accomplish? And a lot of times, REST guidelines are all about being entity-based, you know, drill into the appropriate entities and name your endpoints with nouns, not verbs. I'm actually very much on board with that one.

But something that you could easily do: let's say you have a business process that, given a fundamentally correct RESTful API design, takes ten API calls to satisfy. You could, in theory, boil that down to maybe three well-designed endpoints that aren't, quote-unquote, “RESTful,” that make that developer experience significantly easier. And if you were a fundamentalist, that option is not even on the table, but thinking about it pragmatically from a developer experience point of view, that might be the better call. So, that's one of those things that I know feels like a hot take. Every time I say it, I get a little bit of flack for it, but don't be a fundamentalist when it comes to your API designs.
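The trade-off Allen is describing can be sketched with a toy example. The "checkout" flow and its routes below are hypothetical, invented purely to show how a strictly entity-based design multiplies round trips compared to one purpose-built endpoint:

```python
# Illustrative sketch (hypothetical "order checkout" flow): a strictly
# entity-based API forces the client into many round trips, while one
# coarser, purpose-built endpoint satisfies the same business process
# in a single call. Routes here are made up for the example.

calls = []  # records simulated round trips

def api(route: str) -> dict:
    calls.append(route)
    return {"route": route, "ok": True}

def checkout_fundamentalist(cart_id: str) -> None:
    # One entity endpoint per step: the client orchestrates everything.
    api(f"GET /carts/{cart_id}")
    api(f"GET /carts/{cart_id}/items")
    api("GET /customers/me")
    api("GET /customers/me/payment-methods")
    api("POST /orders")
    api("POST /payments")
    api("POST /shipments")

def checkout_pragmatic(cart_id: str) -> None:
    # One coarse endpoint; the service orchestrates the steps internally.
    api(f"POST /carts/{cart_id}/checkout")

checkout_fundamentalist("abc")
fundamentalist_calls = len(calls)
calls.clear()
checkout_pragmatic("abc")
pragmatic_calls = len(calls)
```

Neither extreme is inherently right; the point is that the purely entity-based version pushes the orchestration burden, and the latency of every round trip, onto every integrator.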
Do something that makes it easier while staying within the guidelines to do what you want.

Corey: For me, the problem that I've kept smacking into with API design—and honestly, let me be very clear on this, my first real exposure to API design rather than being an API consumer (which of course I complain about constantly, especially in the context of AWS's inconsistent APIs between services)—was when I was building something out and reading the documentation for API Gateway: oh, this is how you wind up having this stage linked to this thing, and here's the endpoint. And okay, great, so I would just build out a structure or a schema that has the positional parameters I want to use as variables in my function. And that's awesome. And then I realized, “Oh, I might want to call this a different way. Aw, crap.” Sometimes it's easy; you just add a different endpoint. Other times, I have to significantly rethink things. And I can't shake the feeling that this is an entire discipline that exists that I just haven't had a whole lot of exposure to previously.

Allen: Yeah, I believe that. One of the things that you could tie a metaphor to, for what I'm saying and kind of what you're saying, is AWS SAM, the Serverless Application Model. All it does is basically macro CloudFormation resources. It's just a transform from a template into CloudFormation. CDK does the same thing. But what the developers of SAM have done is they've recognized these business processes that people do regularly, and they've made incredibly easy ways to satisfy those processes and tie them all together.

If I want to have a Lambda function that sits behind an API endpoint, I just have to add four or five lines of YAML or JSON that say, “This is the event trigger, here's the route, here's the API.” And then it goes and does four, five, six different things. Now, there's some engineers that don't like that because sometimes that feels like magic.
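The SAM shorthand Allen describes looks roughly like this; the function name, handler path, and route are placeholders, not anything from the episode:

```yaml
# Minimal SAM sketch: one Lambda function exposed through an API route.
# The few lines under Events expand into the full set of API Gateway
# resources, permissions, and wiring when the template is transformed.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  GetWidgetFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/get-widget.handler
      Runtime: nodejs18.x
      Events:
        GetWidgetApi:
          Type: Api
          Properties:
            Path: /widgets/{id}
            Method: get
```

That one `Events` entry is the "four or five lines" in question: the transform generates the API, the route, and the invoke permission on your behalf.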
Sometimes a little bit of magic is okay.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you want to learn more about how they view this, check out their blog; it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit sysdig.com and tell them that I sent you. That's S-Y-S-D-I-G.com. And my thanks to them for their continued support of this ridiculous nonsense.

Corey: I feel like one of the benefits I've had with the vast majority of APIs that I've built is that, because this is all relatively small-scale stuff for what amounts to basically shitposting for the sake of entertainment, I'm really the only consumer of an awful lot of these things. So, I get frustrated when I have to backtrack and make changes and teach other microservices to talk to this thing that has now changed. And it's frustrating, but I have the capacity to do that. It's just work for a period of time. I feel like that equation completely shifts when you have published this and it is now out in the world, and it's not just users but, in many cases, paying customers, where you can't really make those changes without significant notice, and every time you do, you're creating work for those customers. So, you have to be a lot more judicious about it.

Allen: Oh, yeah. There is a whole lot of governance and practice that goes into production-level APIs that people integrate with. You know, they say once you push something out the door into production, you're going to support it forever. I don't disagree with that. That seems like something that a lot of people don't understand.

And that's one of the reasons why I push API-first development so hard in all the content that I write: you need to be intentional about what you're letting out the door.
You need to go in and work, not just with the developers, but with your product people and your analysts, to say: what does this absolutely need to do, and what does it need to do in the future? And you take those things, and you work with the analysts who want specifics, you work with the engineers to actually build it out. And you're very intentional about what goes out the door that first time, because once it goes out with a mistake, you're either going to version it immediately or you're going to make some people very unhappy when you make a breaking change to something that they immediately started consuming.

Corey: It absolutely feels like that's one of those things that AWS gets astonishingly right. I mean, I had the privilege of interviewing, at the time, Jeff Barr and then Ariel Kelman, who was their head of marketing, to basically debunk a bunch of old myths. And one thing that they started talking about extensively was the idea that an API is fundamentally a promise to your customers. And when you make a promise, you'd better damn well intend on keeping it. It's why API deprecations from AWS are effectively unique whenever something happens.

It's this singular moment in time when they turn off a service or degrade old functionality in favor of new. They can add to it, they can launch a V2 of something and then start to wean people off by calling the old one classic or whatnot, but if I built something on AWS in 2008 and I wound up sleeping until today, and went and tried to do the exact same thing and deploy it now, it would almost certainly work exactly as it did back then. Sure, reliability is going to be a lot better and there's a crap ton of features and whatnot that I'm not taking advantage of, but that fundamental ability to do that is awesome. Conversely, it feels like Google Cloud likes to change around a lot of their API stories almost constantly.
And it's unplanned work that frustrates the heck out of me when I'm trying to build something stable and lasting on top of it.Allen: I think it goes to show the maturity of these companies as API companies versus just vendors. It's one of the things that I think AWS does [laugh]—Corey: You see the similar dichotomy with Microsoft and Apple. Microsoft's new versions of Windows generally still have functionalities in them to support stuff that was written in the '90s for a few use cases, whereas Apple's like, “Oh, your computer's more than 18 months old? Have you tried throwing it away and buying a new one? And oh, it's a new version of Mac OS, so yeah, maybe the last one would get security updates for a year and then get with the times.” And I can't shake the feeling that the correct answer is, in some way, both of those, depending upon who your customer is and what it is you're trying to achieve.If Microsoft adopted the Apple approach, their customers would mutiny, and rightfully so; the expectation has been set for decades that that isn't what happens. Conversely, if Apple decided now we're going to support this version of Mac OS in perpetuity, I think a lot of their application developers wouldn't quite know what to make of that.Allen: Yeah. I think it also comes from a standpoint of you better make it worth their while if you're going to move their cheese. I'm not a Mac user myself, but from what I hear from Mac users—and this could be rose-colored glasses—their stuff works phenomenally well. You know, when a new thing comes out—Corey: Until it doesn't, absolutely. It's—whenever I say things like that on this show, I get letters. And it's, “Oh, yeah, really? They'll come up with something that is a colossal pain in the ass on Mac.” Like, yeah, “Try building a system-wide mute key.”It's yeah, that's just a hotkey away on Windows and here in Mac land. It's, “But it makes such beautiful sounds.
Why would you want them to be quiet?” And it's, yeah, it becomes this back-and-forth dichotomy there. And you can even extend this to iPhones and the Android ecosystem, where it's, oh, you're going to support the last couple of versions of iOS.Well, as a developer, I don't want to do that. And Apple's position is, “Okay, great.” Almost half of the mobile users on the planet will be upgrading because they're in the ecosystem. Do you want to be able to sell things to those people or not? And they're at a point of scale where they get to dictate those terms.On some level, there are benefits to it, and in others, it is intensely frustrating. I don't know what the right answer is on the level of permanence at that level of platform. I only have slightly better ideas around the position of APIs. I will say that when AWS deprecates something, they reach out individually to affected customers, on some level, and invariably, when they say, “This is going to be deprecated as of August 31,” or whenever it is, yeah, it is going to slip at least twice in almost every case, just because they're not going to turn off a service that is revenue-bearing or critical-load-bearing for customers without massive amounts of notice and outreach, and in some cases, according to rumor, having engineers reach out to help restructure things so it's not as big of a burden on customers. That's a level of customer focus that I don't think most other companies are capable of matching.Allen: I think that comes with the size and the history of Amazon. And one of the things that they're doing right now: we've used Amazon Cloud Cams for years in my house. We use them as baby monitors. And they—Corey: Yeah, I saw this; I did something very similar with Nest. They didn't have the Cloud Cam at the right time that I was looking at it. And they just announced that they're going to be deprecating them. They're withdrawing them for sale. They're not going to support them anymore.
Which, oh at Amazon—we're not offering this anymore. But you tell the story; what are they offering existing customers?Allen: Yeah, so slightly upset about it because I like my Cloud Cams and I don't want to have to take them off the wall or wherever they are to replace them with something else. But what they're doing is, you know, they gave me—or they gave all the customers about eight months head start. I think they're going to be taking them offline around Thanksgiving this year, just mid-November. And what they said is as compensation for you, we're going to send you a Blink Cam—a Blink Mini—for every Cloud Cam that you have in use, and then we are going to gift you a year subscription to the Pro for Blink.Corey: That's very reasonable for things that were bought years ago. Meanwhile, I feel like not to be unkind or uncharitable here, but I use Nest Cams. And that's a Google product. I half expected if they ever get deprecated, I'll find out because Google just turns it off in the middle of the night—Allen: [laugh].Corey: —and I wake up and have to read a blog post somewhere that they put an update on Nest Cams, the same way they killed Google Reader once upon a time. That's slightly unfair, but the fact that joke even lands does say a lot about Google's reputation in this space.Allen: For sure.Corey: One last topic I want to talk with you about before we call it a show is that at the time of this recording, you recently had a blog post titled, “What does the Future Hold for Serverless?” Summarize that for me. Where do you see this serverless movement—if you'll forgive the term—going?Allen: So, I'm going to start at the end. I'm going to work back a little bit on what needs to happen for us to get there. I have a feeling that in the future—I'm going to be vague about how far in the future this is—that we'll finally have a satisfied promise of all you're going to write in the future is business logic. And what does that mean? 
I think what can end up happening, given the right focus, the right companies, the right feedback, at the right time, is we can write code as developers and have that get pushed up into the cloud.And a phrase that I know Jeremy Daly likes to say ‘infrastructure from code,' where it provisions resources in the cloud for you based on your use case. I've developed an application and it gets pushed up in the cloud at the time of deploying it, optimized resource allocation. Over time, what will happen—with my future vision—is when you get production traffic going through, maybe it's spiky, maybe it's consistently at a scale that outperforms the resources that it originally provisioned. We can have monitoring tools that analyze that and pick that out, find the anomalies, find the standard patterns, and adjust that infrastructure that it deployed for you automatically, where it's based on your production traffic for what it created, optimizes it for you. Which is something that you can't do on an initial deployment right now. You can put what looks best on paper, but once you actually get traffic through your application, you realize that, you know, what was on paper might not be correct.Corey: You ever noticed that whiteboard diagrams never show the reality, and they're always aspirational, and they miss certain parts? And I used to think that this was the symptom I had from working at small, scrappy companies because you know what, those big tech companies, everything they build is amazing and awesome. I know it because I've seen their conference talks. But I've been a consultant long enough now, and for a number of those companies, to realize that nope, everyone's infrastructure is basically a trash fire at any given point in time. And it works almost in spite of itself, rather than because of it.There is no golden path where everything is shiny, new and beautiful. And that, honestly, I got to say, it was really [laugh] depressing when I first discovered it. 
Like, oh, God, even these really smart people who are so intelligent they have to have extra brain packs bolted to their chests don't have the magic answer to all of this. The rest of us are just screwed, then. But we find ways to make it work.Allen: Yep. There's a quote, I wish I remembered who said it, but it was a military quote where, “No battle plan survives impact with the enemy—first contact with the enemy.” It's kind of that way with infrastructure diagrams. We can draw it out however we want and then you turn it on in production. It's like, “Oh, no. That's not right.”Corey: I want to mix the metaphors there and say, yeah, no architecture survives your first fight with a customer. Like, “Great, I don't think that's quite what they're trying to say.” It's like, “What, you don't attack your customers? Pfft, what's your customer service line look like?” Yeah, it's… I think you're onto something.I think that inherently everything beyond the V1 design of almost anything is an emergent property where this is what we learned about it by running it and putting traffic through it and finding these problems, and here's how it wound up evolving to account for that.Allen: I agree. I don't have anything to add on that.Corey: [laugh]. Fair enough. I really want to thank you for taking so much time out of your day to talk about how you view these things. If people want to learn more, where is the best place to find you?Allen: Twitter is probably the best place to find me: @AllenHeltonDev. I have that username on all the major social platforms, so if you want to find me on LinkedIn, same thing: AllenHeltonDev. My blog is always open as well, if you have any feedback you'd like to give there: readysetcloud.io.Corey: And we will, of course, put links to that in the show notes. Thanks again for spending so much time talking to me. I really appreciate it.Allen: Yeah, this was fun. This was a lot of fun. I love talking shop.Corey: It shows. 
And it's nice to talk about things I don't spend enough time thinking about. Allen Helton, cloud architect at Tyler Technologies. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that I will reject because it was not written in valid XML.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
The Ever-Changing World of Cloud Native Observability with Ian Smith

Sep 13, 2022 41:58


About Ian
Ian Smith is Field CTO at Chronosphere where he works across sales, marketing, engineering and product to deliver better insights and outcomes to observability teams supporting high-scale cloud-native environments. Previously, he worked with observability teams across the software industry in pre-sales roles at New Relic, Wavefront, PagerDuty and Lightstep.

Links Referenced:

Chronosphere: https://chronosphere.io
Last Tweet in AWS: lasttweetinaws.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while, I find that something I'm working on aligns perfectly with a person that I wind up basically convincing to appear on this show. Today's promoted guest is Ian Smith, who's Field CTO at Chronosphere. Ian, thank you for joining me.Ian: Thanks, Corey. Great to be here.Corey: So, the coincidental aspect of what I'm referring to is that Chronosphere is, despite the name, not something that works on bending time, but rather an observability company. Is that directionally accurate?Ian: That's true. Although you could argue it probably bends a little bit of engineering time. But we can talk about that later.Corey: [laugh]. So, observability is one of those areas that I think is suffering from too many definitions, if that makes sense.
And at first, I couldn't make sense of what it was that people actually meant when they said observability, this sort of clarified to me at least when I realized that there were an awful lot of, well, let's be direct and call them ‘legacy monitoring companies' that just chose to take what they were already doing and define that as, “Oh, this is observability.” I don't know that I necessarily agree with that. I know a lot of folks in the industry vehemently disagree.You've been in a lot of places that have positioned you reasonably well to have opinions on this sort of question. To my understanding, you were at interesting places, such as LightStep, New Relic, Wavefront, and PagerDuty, which I guess technically might count as observability in a very strange way. How do you view observability and what it is?Ian: Yeah. Well, a lot of definitions, as you said, common ones, they talk about the three pillars, they talk really about data types. For me, it's about outcomes. I think observability is really this transition from the yesteryear of monitoring where things were much simpler and you, sort of, knew all of the questions, you were able to define your dashboards, you were able to define your alerts and that was really the gist of it. And going into this brave new world where there's a lot of unknown things, you're having to ask a lot of sort of unique questions, particularly during a particular instance, and so being able to ask those questions in an ad hoc fashion layers on top of what we've traditionally done with monitoring. So, observability is sort of that more flexible, more dynamic kind of environment that you have to deal with.Corey: This has always been something that, for me, has been relatively academic. Back when I was running production environments, things tended to be a lot more static, where, “Oh, there's a problem with the database. I will SSH into the database server.” Or, “Hmm, we're having a weird problem with the web tier. 
Well, there are ten or 20 or 200 web servers. Great, I can aggregate all of their logs to Syslog, and worst case, I can log in and poke around.”Now, with a more ephemeral style of environment where you have Kubernetes or whatnot scheduling containers into place that have problems you can't attach to a running container very easily, and by the time you see an error, that container hasn't existed for three hours. And that becomes a problem. Then you've got the Lambda universe, which is a whole ‘nother world of pain, where it becomes very challenging, at least for me, to reason using the old-style approaches about what's actually going on in your environment.Ian: Yeah, I think there's that and there's also the added complexity of oftentimes you'll see performance or behavioral changes based on even more narrow pathways, right? One particular user is having a problem and the traffic is spread across many containers. Is it making all of these containers perform badly? Not necessarily, but their user experience is being affected. It's very common in, say, like, B2B scenarios for you to want to understand the experience of one particular user or the aggregate experience of users at a particular company, particular customer, for example.There's just more complexity. There's more complexity of the infrastructure and just the technical layer that you're talking about, but there's also more complexity in just the way that we're handling use cases and trying to provide value with all of this software to the myriad of customers in different industries that software now serves.Corey: For where I sit, I tend to have a little bit of trouble disambiguating, I guess, the three baseline data types that I see talked about again and again in observability. You have logs, which I think I can mostly wrap my head around. That seems to be the baseline story of, “Oh, great. Your application puts out logs. Of course, it's in its own unique, beautiful format.
Why wouldn't it be?” In an ideal scenario, they're structured. Things are never ideal, so great. You're basically tailing log files in some cases. Great. I can reason about those.Metrics always seem to be a little bit of a step beyond that. It's, okay, I have a whole bunch of log lines that are spitting out every 500 error that my app is throwing—and given my terrible code, it throws a lot—but I can then ideally count the number of times that appears, and that winds up incrementing a counter, similar to the way that we used to see with StatsD, for example, and Collectd. Is that directionally correct? As far as the way I reason about, well so far, logs and metrics?Ian: I think at a really basic level, yes. I think that, as we've been talking about, sort of greater complexity starts coming in when you have—particularly metrics in today's world of containers—Prometheus—you mentioned StatsD—Prometheus has become sort of like the standard for expressing those things, so you get situations where you have incredibly high cardinality, so cardinality being the interplay between all the different dimensions. So, you might have, my container is a label, but also the type of endpoint that is running on that container as a label, then maybe I want to track my customer organizations and maybe I have 5000 of those. I have 3000 containers, and so on and so forth. And you get this massive explosion, almost multiplicatively.For those in the audience who really live and breathe cardinality, there's probably someone screaming about well, it's not truly multiplicative in every sense of the word, but, you know, it's close enough from an approximation standpoint. As you get this massive explosion of data, which obviously has a cost implication but also has, I think, a really big implication on the core reason why you have metrics in the first place, which you alluded to: so a human being can reason about it, right?
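The multiplicative explosion Ian describes is easy to put numbers on. Here is a toy calculation using the figures from the conversation, plus an assumed 20 endpoints per container (that count is invented purely for illustration):

```python
# Toy illustration of label cardinality: to a first approximation,
# the number of distinct time series a single metric can produce is
# the product of each label's value count.

label_values = {
    "container": 3000,     # containers emitting the metric
    "endpoint": 20,        # assumed endpoints per container (made up)
    "customer_org": 5000,  # customer organizations tracked as a label
}

series = 1
for count in label_values.values():
    series *= count

print(f"worst-case series for one metric: {series:,}")
# prints: worst-case series for one metric: 300,000,000
```

As Ian notes, real traffic never fills every combination, so the true number is lower — but the multiplicative shape of the growth is the point.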
You don't want to go and look at 5000 log lines; you want to know that, out of those 5000 log lines, 4000 are errors and 1000 are OKs. It's very easy for human beings to reason about that from a numbers perspective. When your metrics start to explode out into thousands or millions of data points, unique time series, more numbers for you to track, then you're sort of losing that original goal of metrics.Corey: I think I mostly have wrapped my head around the concept. But then that brings us to traces, and that tends to be, I think, one of the hardest things for me to grasp, just because most of the apps I build, for obvious reasons—namely, I'm bad at programming and most of these are proof-of-concept type of things rather than anything that's large scale running in production—the difference between a trace and logs tends to get very muddled for me. But the idea being that as you have a customer session or a request that talks to different microservices, how do you collate across different systems all of the outputs of that request into a single place so you can see timing information, understand the flow that user took through your application? Is that, again, directionally correct? Have I completely missed the plot here? Which is, again, eminently possible. You are the expert.Ian: No, I think that's sort of the fundamental premise or expected value of tracing, for sure. We have something that's akin to a set of logs; they have a common identifier, a trace ID, that tells us that all of these logs essentially belong to the same request. But importantly, there's relationship information. And this is the difference between just having traces—sorry, logs—with just a trace ID attached to them.
So, for example, if you have Service A calling Service B and Service C, that's a relatively simple thing; you could use time to try to figure this out.But what if there are things happening in Service B at the same time there are things happening in Service C and D, and so on and so forth? So, one of the things that tracing brings to the table is it tells you what is currently happening and what called that. So, oh, I know that I'm Service D. I was actually called by Service B, and I'm not just relying on timestamps to try and figure out that connection. So, you have that information and ultimately, the data model allows you to fully sort of reflect what's happening with the request, particularly in complex environments.And I think this is where, you know, tracing needs to be looked at not as a tool for “just because I'm operating in a modern environment, I'm using some Kubernetes, or I'm using Lambda”; it needs to be used in a scenario where you really have trouble grasping, from a conceptual standpoint, what is happening with the request, because you need to actually fully document it. As opposed to, I have a few—let's say three Lambda functions. I maybe have some key metrics about them; I have a little bit of logging. You probably do not need to use tracing to solve, sort of, basic performance problems with those.
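The parent-child relationship Ian is describing can be modeled in a few lines. This is a deliberately stripped-down sketch of the idea, not the real OpenTelemetry data model: every span shares the trace_id, and parent_span_id is what lets Service D say it was called by Service B instead of inferring that from timestamps.

```python
# Minimal model of the relationship data tracing adds on top of
# "logs with a trace ID": each span records which span called it.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None for the root span
    service: str

spans = [
    Span("t1", "a", None, "service-a"),  # root: the inbound request
    Span("t1", "b", "a", "service-b"),   # called by A
    Span("t1", "c", "a", "service-c"),   # also called by A, concurrently
    Span("t1", "d", "b", "service-d"),   # called by B — not merely "after" C
]

def children_of(span_id: str) -> List[str]:
    # Reconstruct the call tree from parent links, not timestamps.
    return [s.service for s in spans if s.parent_span_id == span_id]

print(children_of("a"))  # ['service-b', 'service-c']
print(children_of("b"))  # ['service-d']
```

Even when B and C overlap in time, the parent links keep the call graph unambiguous — which is exactly what timestamps alone cannot guarantee.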
So, you can get yourself into a place where you're over-engineering, you're spending a lot of time with tracing instrumentation and tracing tooling, and I think that's the core of observability: using the right tool, the right data, for the job.But that's also what makes it really difficult, because you essentially need to have this, you know, huge set of experience or knowledge about the different data, the different tooling, your architecture, and the data you have available to be able to reason about that and make confident decisions, particularly when you're under a time crunch, which everyone is familiar with: the sort of, you know, PagerDuty-style experience of my phone is going off and I have a customer-facing incident. Where is my problem? What do I need to do? Which dashboard do I need to look at? Which tool do I need to investigate? And that's where I think the observability industry has ended up not serving the outcomes of its customers.Corey: I had a, well, I wouldn't say it's a genius plan, but it was a passing fancy: I've built this online, freely available Twitter client for authoring Twitter threads—because that's what I do instead of having a social life—and it's available at lasttweetinaws.com. I've used that as a testbed for a few things. It's now deployed to roughly 20 AWS regions simultaneously, and this means that I have a bit of a problem as far as how to figure out not even what's wrong or what's broken with this, but who's even using it?
I love getting user experience reports on this stuff.And I figured, ah, this is a perfect little toy application. It runs in a single Lambda function so it's not that complicated. I could instrument this with OpenTelemetry, which then, at least according to the instructions on the tin, I could then send different types of data to different observability tools without having to re-instrument this thing every time I want to kick the tires on something else. That was the promise.And this led to three weeks of pain because it appears that for all of the promise that it has, OpenTelemetry, particularly in a Lambda environment, is nowhere near ready for being able to carry a workload like this. Am I just foolish on this? Am I stating an unfortunate reality that you've noticed in the OpenTelemetry space? Or, let's be clear here, you do work for a company with opinions on these things. Is OpenTelemetry the wrong approach?Ian: I think OpenTelemetry is absolutely the right approach. To me, the promise of OpenTelemetry for the individual is, “Hey, I can go and instrument this thing, as you said and I can go and send the data, wherever I want.” The sort of larger view of that is, “Well, I'm no longer beholden to a vendor,”—including the ones that I've worked for, including the one that I work for now—“For the definition of the data. I am able to control that, I'm able to choose that, I'm able to enhance that, and any effort I put into it, it's mine. I own that.”Whereas previously, if you picked, say, for example, an APM vendor, you said, “Oh, I want to have some additional aspects of my information provider, I want to track my customer, or I want to track a particular new metric of how much dollars am I transacting,” that effort really going to support the value of that individual solution, it's not going to support your outcomes. Which is I want to be able to use this data wherever I want, wherever it's most valuable. So, the core premise of OpenTelemetry, I think, is great. 
I think it's a massive undertaking to be able to do this for at least three different data types, right? Defining an API across a whole bunch of different languages, across three different data types, and then creating implementations for those.Because the implementations are the thing that people want, right? You are hoping for the ability to, say, drop in something. Maybe one line of code or preferably just, like, attach a dependency, let's say in Java-land, at runtime, and be able to have the information flow through and have it complete. And this is the premise of, you know, vendors I've worked with in the past, like New Relic. That was what New Relic was built on: the ability to drop in an agent and get visibility immediately.So, having that out-of-the-box visibility is obviously a goal of OpenTelemetry where it makes sense—in Go, it's very difficult to attach things at runtime, for example—but then saying, well, whatever is provided—let's say your gRPC connections, database, all these things—well, now I want to go and instrument; I want to add some additional value. As you said, maybe you want to track something like, I want to have in my traces the email address of whoever it is, or the Twitter handle of whoever it is, so I can then go and analyze that stuff later. You want to be able to inject that piece of information or that instrumentation and then decide, well, where is it best utilized? Is it best utilized in some tooling from AWS? Is it best utilized in something that you've built yourself? Is it best utilized in an open-source project?
Is it best utilized in one of the many observability vendors? Or, as is becoming even more common, I want to shove everything in a data lake and run, sort of, analysis asynchronously, overlay observability data, for essentially business purposes.All of those things are served by having a very robust, open-source standard and a simple-to-implement way of collecting a really good baseline of data, and then making it easy for you to enhance that while still owning it—essentially, it's your IP, right? It's like, the instrumentation is your IP, whereas in the old world of proprietary agents and proprietary APIs, that IP was basically building it, but it was tied to that other vendor that you were investing in.Corey: One thing that I was consistently annoyed by in my days of running production infrastructures at places, like, you know, large banks, for example, one of the problems I kept running into is this idea that, “Oh, you want to use our tool. Just instrument your applications with our libraries or our instrumentation standards.” And it felt like I was constantly doing and redoing a lot of instrumentation for different aspects. It's not that we were replacing one vendor with another; it's that in an observability toolchain, there are remarkably few one-size-fits-all stories. It feels increasingly like everyone's trying to sell me a multifunction printer, which does one thing well, and a few other things just well enough to technically say they do them, but badly enough that I get irritated every single time.And having 15 different instrumentation packages in an application, that's either got security ramifications, for one, see large bank, and for another, it became this increasingly irritating and obnoxious process where it felt like I was spending more time seeing to the care and feeding of the instrumentation than I was the application itself. That's the goal—that's, I guess, the ideal light at the end of the tunnel for me in what OpenTelemetry is promising.
Instrument once, and then you're just adjusting configuration as far as where to send it.Ian: That's correct. The organizations, and you know, I keep in touch with a lot of companies that I've worked with, companies that have in the last two years really invested heavily in OpenTelemetry, they're definitely getting to the point now where they're generating the data once, they're using, say, pieces of the OpenTelemetry pipeline, they're extending it themselves, and then they're able to shove that data in a bunch of different places. Maybe they're putting it in a data lake for, as I said, business analysis purposes or forecasting. They may be putting the data into two different systems, even for incident and analysis purposes, but you're not having that duplication of effort. Also, potentially, that performance impact, right, of having two different instrumentation packages lined up with each other.Corey: There is a recurring theme that I've noticed in the observability space that annoys me to no end. And that is—I don't know if it's coming from investor pressure, from folks never being satisfied with what they have, or what it is, but there are so many startups that I have seen and worked with in varying aspects of the observability space that I think, “This is awesome. I love the thing that they do.” And invariably, every time, they start getting more and more features bolted onto them, where, hey, you love this whole thing that winds up just basically doing a tail -f on a log file, so it just streams your logs in the application and you can look for certain patterns. I love this thing. It's great.Oh, what's this? Now, it's trying to also be the thing that alerts me and wakes me up in the middle of the night. No. That's what PagerDuty does. I want PagerDuty to do that thing, and I want other things—I want you just to be the log analysis thing and the way that I contextualize logs.
And it feels like they keep bolting things on and bolting things on, where everything is more or less trying to evolve into becoming its own version of Datadog. What's up with that?Ian: Yeah, the sort of, dreaded platform play. I—[laugh] I was at New Relic when there were essentially two products that they sold. And then by the time I left, I think there was seven different products that were being sold, which is kind of a crazy, crazy thing when you think about it. And I think Datadog has definitely exceeded that now. And I definitely see many, many vendors in the market—and even open-source solutions—sort of presenting themselves as, like, this integrated experience.But to your point, even before about your experience of these banks it oftentimes become sort of a tick-a-box feature approach of, “Hey, I can do this thing, so buy more. And here's a shared navigation panel.” But are they really integrated? Like, are you getting real value out of it? One of the things that I do in my role is I get to work with our internal product teams very closely, particularly around new initiatives like tracing functionality, and the constant sort of conversation is like, “What is the outcome? What is the value?”It's not about the feature; it's not about having a list of 19 different features. It's like, “What is the user able to do with this?” And so, for example, there are lots of platforms that have metrics, logs, and tracing. The new one-upmanship is saying, “Well, we have events as well. And we have incident response. And we have security. And all these things sort of tie together, so it's one invoice.”And constantly I talk to customers, and I ask them, like, “Hey, what are the outcomes that you're getting when you've invested so heavily in one vendor?” And oftentimes, the response is, “Well, I only need to deal with one vendor.” Okay, but that's not an outcome. [laugh]. 
And it's like the business having a single invoice.

Corey: Yeah, that is something that's already attainable today. If you want to just have one vendor with a whole bunch of crappy offerings, that's what AWS is for. They have AmazonBasics versions of everything you might want to use in production. Oh, you want to go ahead and use MongoDB? Well, use AmazonBasics MongoDB, but they call it DocumentDB because of course they do. And so on, and so forth.

There are a bunch of examples of this, but those companies are still in business and doing very well because people often want the genuine article. If everyone was trying to do just everything to check a box for procurement, great. AWS has already beaten you at that game, it seems.

Ian: I do think that, you know, people are hoping for that greater value and those greater outcomes, so being able to actually provide differentiation in that market I don't think is terribly difficult, right? There are still huge gaps in, let's say, root cause analysis during an investigation. There are huge issues with vendors who don't think beyond sort of just the one individual who's looking at a particular dashboard or looking at whatever analysis tool there is. So, getting those things actually tied together—it's not just, “Oh, we have metrics, and logs, and traces together,” but even if you say we have metrics and tracing, how do you move between metrics and tracing?
One of the goals in the way that we're developing product at Chronosphere is that if you are alerted to an incident—you as an engineer; doesn't matter whether you are massively sophisticated, you're a lead architect who has been with the company forever and you know everything, or you're someone who's just come out of onboarding and it's your first time on call—you should not have to think, “Is this a tracing problem, or a metrics problem, or a logging problem?”

And this is one of those things that I mentioned before, of requiring that really heavy level of knowledge and understanding about the observability space and your data and your architecture to be effective. And so, with the—you know, particularly observability teams and all of the engineers that I speak with on a regular basis—you get this sort of circumstance where, well, I guess, let's talk about a real outcome and a real pain point, because people are like, okay, yeah, this is all fine; it's all coming from a vendor who has a particular agenda. But the thing that constantly resonates is, for large organizations that are moving fast—you know, big startups, unicorns, or even more traditional enterprises that are trying to undergo, like, a rapid transformation and go really cloud-native and make sure their engineers are moving quickly—a common question I will talk about with them is: who are the three people in your organization who always get escalated to? And it's usually, you know, between two and five people—

Corey: And you can almost pick those perso—you say that and you can—at least anyone who's worked in environments or through incidents like this more than a few times, already have thought of specific people in specific companies. And they almost always fall into some very predictable archetypes. But please, continue.

Ian: Yeah. And people think about these people, they always jump to mind.
And one of the things I ask about is, “Okay, so when you did your last innovation around observability”—it's not necessarily buying a new thing; maybe it was introducing a new data type, or you were doing some big investment in improving instrumentation—“what changed about their experience?” And oftentimes, the most that can come out is, “Oh, they have access to more data.” Okay, that's not great.

It's like, “What changed about their experience? Are they still getting woken up at 3 am? Are they constantly getting pinged all the time?” One of the vendors that I worked at, when they would go down, there were three engineers in the company who were capable of generating a list of customers who were actually impacted by the damage. And so, every single incident, one of those three engineers got paged into the incident.

And it became borderline intolerable for them because nothing changed. And it got worse, you know? The platform got bigger and more complicated, and so there were more incidents and they were the ones having to generate that. But from a business level, from an observability outcomes perspective, if you zoom all the way up, it's like, “Oh, were we able to generate the list of customers?” “Yes.”

And this is where I think the observability industry has sort of gotten stuck—you know, at least one of the ways—is that, “Oh, can you do it?” “Yes.” “But is it effective?” “No.” And by effective, I mean those three engineers become the focal point for an organization.

And when I say three—you know, two to five—it doesn't matter whether you're talking about a team of a hundred or you're talking about a team of a thousand. It's always the same number of people. And as you get bigger and bigger, it becomes more and more of a problem. So, does the tooling actually make a difference to them? And you might ask, “Well, what do you expect from the tooling? What do you expect it to do for them?” Is it you give them deeper analysis tools? Is it, you know, you do AIOps?
No. The answer is: how do you take the capabilities that those people have, and how do you spread them across a larger population of engineers? And that, I think, is one of those key outcomes of observability that no one, whether it be on the open-source or the vendor side, is really paying a lot of attention to. It's always about, like, “Oh, we can just shove more data in. By the way, we've got petabyte scale and we can deal with, you know, 2 billion active time series,” and all these other sorts of vanity measures. But we've gotten really far away from the outcomes. It's like, “Am I getting a return on investment from my observability tooling?”

And I think tracing is this—as you've said, it can be difficult to reason about, right? And people are not sure. They're feeling, “Well, I'm in a microservices environment; I'm in cloud-native; I need tracing because my older APM tools appear to be failing me. I'm just going to go and wriggle my way through implementing OpenTelemetry.” Which has significant engineering costs. I'm not saying it's not worth it, but there is a significant engineering cost. And then, “I don't know what to expect, so I'm going to go and put my data somewhere and see whether we can achieve those outcomes.”

And I do a pilot, and my most sophisticated engineers are in the pilot. And they're able to solve the problems. Okay, I'm going to go buy that thing. But I've just transferred my problems. My engineers have gone from solving problems in maybe logs, grepping through petabytes worth of logs, to using some sort of complex proprietary query language to go through your tens of petabytes of trace data, but I actually haven't solved any problem.
I've just moved it around, and probably just cost myself a lot, both in terms of engineering time and real dollars spent as well.

Corey: One of the challenges that I'm seeing across the board is that observability, for certain use cases—once you start to see what it is and its potential for certain applications—certainly not all; I want to hedge that a little bit—but it's clear that there is definite and distinct value versus other ways of doing things. The problem is that the value often becomes apparent only after you've already done it and can see what that other side looks like. But let's be honest here: instrumenting an application is going to take some significant level of investment, in many cases. How do you wind up viewing any return on investment, given the very real cost, if only in people's time, of instrumenting for observability in complex environments?

Ian: So, I think that you have to look at the fundamentals, right? You have to look at—pretend we knew nothing about tracing. Pretend that we had just invented logging, and you needed to start small. It's like, I'm not going to go and log everything about every application that I've had forever. What I need to do is find the points where that logging is going to be the most useful, most impactful, across the broadest audience possible.

And one of the useful things about tracing is that because it's built in distributed environments, primarily for distributed environments, you can look at, for example, the biggest intersection of requests. A lot of people have things like API gateways, or they have parts of a monolith which are still handling a lot of request routing; those tend to be areas to start digging into. And I would say that, just like for anyone who's used Prometheus or decided to move away from Prometheus: no one's ever gone and evaluated a Prometheus solution without having some sort of Prometheus data, right?
You don't go, “Hey, I'm going to evaluate a replacement for Prometheus or my StatsD without having any data, and I'm simultaneously going to generate my data and evaluate the solution at the same time.” It doesn't make any sense.

With tracing, you have decent open-source projects out there that allow you to visualize individual traces and understand sort of the basic value you should be getting out of this data. So, it's a good starting point to go, “Okay, can I reason about a single request? Can I go and look at my request end-to-end, even in a relatively small slice of my environment, and can I see the potential for this? And can I think about the things that I need to be able to solve with many traces?” Once you start developing these ideas, then you can have a better idea of, “Well, where do I go and invest more in instrumentation? Look, databases never appear to be a problem, so I'm not going to focus on database instrumentation. What's the real problem? It's my external dependencies. The Facebook API is the one that everyone loves to use. I need to go instrument that.”

And then you start to get more clarity. Tracing has this interesting network effect. You can basically just follow the breadcrumbs. Where is my biggest problem here? Where are my errors coming from? Is there anything else further down the call chain? And you can sort of take that exploratory approach rather than doing everything up front.

But it is important to do something before you start trying to evaluate what your end state is. End state is obviously a sort of nebulous term in today's world, but where do I want to be in two years' time? I would like to have a solution. Maybe it's an open-source solution, maybe it's a vendor solution, maybe it's one of those platform solutions we talked about, but how do I get there?
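The "follow the breadcrumbs" exploration Ian describes can be sketched in a few lines: given the spans of one trace, work out where the time actually goes and where the errors originate. The span data below is invented purely for illustration.

```python
from collections import defaultdict

# One trace's spans: (span_id, parent_id, service, duration_ms, is_error).
# Values are made up for the sake of the example.
spans = [
    ("a", None, "api-gateway",  420, False),
    ("b", "a",  "checkout",     390, False),
    ("c", "b",  "payments",     310, True),
    ("d", "b",  "inventory",     40, False),
    ("e", "c",  "external-api", 290, True),  # the culprit furthest down the chain
]

def self_time(spans):
    """Duration of each span minus the time spent in its direct children."""
    child_time = defaultdict(int)
    for _, parent, _, dur, _ in spans:
        if parent is not None:
            child_time[parent] += dur
    return {sid: dur - child_time[sid] for sid, _, _, dur, _ in spans}

def deepest_errors(spans):
    """Error spans none of whose children also errored: where errors originate."""
    errors = {sid: parent for sid, parent, _, _, err in spans if err}
    parents_of_errors = set(errors.values())
    return [sid for sid in errors if sid not in parents_of_errors]

times = self_time(spans)
print(max(times, key=times.get))  # span "e": most time spent in the external API
print(deepest_errors(spans))      # ["e"]: the error propagated up from there
```

Even on a single trace, this is the exploratory loop: find the hot span, find the originating error, then decide where to invest more instrumentation.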
It's really going to be that I need to take an iterative approach, and I need to be very clear about the value and outcomes.

There's no point in doing a whole bunch of instrumentation effort on things that are just working fine, right? You want to go and focus your time and attention on that. And also, you don't want to go and burn out singular engineers. The observability team's purpose in life is probably not to just write instrumentation or just deploy OpenTelemetry. Because then we get back into the land where engineers themselves know nothing about the monitoring or observability they're doing, and it just becomes a checkbox of, “I dropped in an agent. Oh, when it comes time for me to actually deal with an incident, I don't know anything about the data and the data is insufficient.”

So, a level of ownership supported by the observability team is really important. On that return on investment, though, it's not just the instrumentation effort. There's product training, and there are some very hard costs. People think oftentimes, “Well, I have the ability to pay a vendor; that's really the only cost that I have.” There are things like egress costs, particularly at volume. There are the infrastructure costs—a lot of the time there will be elements you need to run in your own environment; those can be very costly as well—and ultimately, they're sort of icebergs in this overall ROI conversation.

The other side of it—you know, return and investment—on the return side, there's a lot of difficulty in reasoning about, as you said, what the value of this is going to be if I go through all this effort. Everyone knows the sort of, you know, meme or archetype of, “Hey, here are three options; pick two, because there's always going to be a trade-off.” Particularly for observability, it's become an element of: I need to pick between performance, data fidelity, or cost. Pick two.
And when I say data fidelity—particularly in tracing—I'm talking about the ability to not sample, right?

If you have edge cases, if you have narrow use cases and ways you need to look at your data, and you heavily sample, you lose data fidelity. But oftentimes, cost is the reason why you do that. And then obviously, performance, as you start to get bigger and bigger datasets. So, there are a lot of different things you need to balance on that return. As you said, oftentimes you don't get to understand the magnitude of those until you've got the full data set in and you're trying to do this, sort of, for real. But being prepared and iterative as you go through this effort, and not saying, “Okay, well, I'm just going to buy everything from one vendor because I'm going to assume that's going to solve my problem,” is probably the undercurrent there.

Corey: As I take a look across the entire ecosystem, I can't shake the feeling—and my apologies in advance if this is an observation, I guess, that winds up throwing a stone directly at you folks—

Ian: Oh, please.

Corey: But I see that there's a strong observability community out there that is absolutely aligned with the things I care about and things I want to do, and then there's a bunch of SaaS vendors, where it seems that they are, in many cases, yes, advancing the state of the art—I am not suggesting for a second that money is making observability worse. But I do think that when the tool you sell is a hammer, then every problem starts to look like a nail—or in my case, like my thumb. Do you think that there's a chance that SaaS vendors are in some ways making this entire space worse?

Ian: As we've sort of gone into more cloud-native scenarios, and people are building things specifically to take advantage of cloud, from a complexity standpoint, from a scaling standpoint, you start to get, like, vertical issues happening.
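The fidelity trade-off Ian is pointing at is easy to see with a toy simulation (all numbers invented): with head-based sampling at 1%, a rare error that shows up in only a handful of traces can vanish from the sampled set entirely.

```python
import random

def head_sample(trace_ids, rate, seed=7):
    """Keep each trace with probability `rate`, decided up front (head-based)."""
    rng = random.Random(seed)
    return [t for t in trace_ids if rng.random() < rate]

# 100,000 traces, 20 of which (0.02%) carry a rare error. Invented numbers.
traces = list(range(100_000))
error_traces = set(range(0, 100_000, 5_000))  # 20 evenly spread "error" traces

kept = head_sample(traces, rate=0.01)
kept_errors = [t for t in kept if t in error_traces]

print(len(kept))         # roughly 1% of traces survive
print(len(kept_errors))  # few, possibly none, of the 20 error traces survive
```

The aggregate picture (latency percentiles, throughput) survives sampling fine; the narrow edge case you needed for the investigation may not. That is the fidelity you pay for the cost savings.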
So, you have things like, we're going to charge on a per-container basis; we're going to charge on a per-host basis; we're going to charge based off the number of gigabytes that you send us. These are sort of more horizontal pricing models, and the way the SaaS vendors have delivered this is they've made it pretty opaque, right? Everyone has experienced, or has heard horror stories about, overages from observability vendors—massive spikes. I've worked with customers who have used—accidentally used—some features, and they've been billed a quarter million dollars on a monthly basis for accidental overages from a SaaS vendor.

And these are all terrible things. Like, but we've gotten used to this. Like, we've just accepted it, right, because everyone is operating this way. And I really do believe that the move to SaaS was one of those things. Like, “Oh, well, you're throwing us more data, and we're charging you more for it.” As a vendor—

Corey: Which sort of erodes your own value proposition that you're bringing to the table. I mean, I don't mean to be sitting over here shaking my fist yelling, “Oh, I could build a better version in a weekend,” except that I absolutely know how to build a highly available rsyslog cluster. I've done it a handful of times already, and the technology is still there. Compare and contrast that with, at scale, the fact that I'm paying 50 cents per gigabyte ingested into CloudWatch Logs, or a multiple of that for a lot of other vendors; it's not that much harder for me to scale that fleet out and pay a much smaller marginal cost.

Ian: And so, I think the reaction that we're seeing in the market—we're starting to see the rise of, sort of, a secondary class of vendor. And by secondary, I don't mean that they're lesser; I mean that they're, sort of, specifically trying to address the problems of the primary vendors, right? Everyone's aware of vendors who are attempting to reduce—well, let's take the example you gave on logs, right?
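Corey's per-gigabyte point is easy to put numbers on. A back-of-the-envelope sketch (the rates and volumes are illustrative only; check current pricing for any real vendor):

```python
def monthly_ingest_cost(gb_per_day: float, price_per_gb: float) -> float:
    """Naive ingest-only cost model: volume times unit price over a 30-day month."""
    return gb_per_day * 30 * price_per_gb

# Illustrative numbers: 500 GB/day of logs at $0.50 per GB ingested.
saas_cost = monthly_ingest_cost(500, 0.50)
print(f"${saas_cost:,.0f}/month")  # $7,500/month for ingest alone
```

And that is ingest alone: retention, queries, egress, and the overage spikes Ian describes all sit on top of it, which is why the marginal cost of a self-run pipeline can look attractive at scale.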
There are vendors out there whose express purpose is to reduce the cost of your logging observability. They just sit in the middle; they are a middleman, right? Essentially: hey, use our tool, and even though you're going to pay us a whole bunch of money, it's going to generate an overall return that is greater than if you had just continued pumping all of your logs over to your existing vendor.

So, that's great. What we think really needs to happen—and one of the things we're doing at Chronosphere, unfortunate plug—is we're actually building those capabilities into the solution so it's actually end-to-end. And by end-to-end, I mean a solution where I can ingest my data, I can preprocess my data, I can store it, query it, visualize it, all those things, aligned with open-source standards, but I have control over that data, and I understand what's going on with, particularly, my cost and my usage. I don't just get a bill at the end of the month going, “Hey, guess what? You've spent an additional $200,000.”

Instead, I can know in real time what is happening with my usage. And I can attribute it: it's this team over here, and it's because they added this particular label. And here's a way for you, right now, to address that and cap it so it doesn't cost you anything, and it doesn't have a blast radius of, you know, maybe degraded performance or degraded fidelity of the data.

That, though, is diametrically opposed to the way that most vendors are set up. And unfortunately, the open-source projects tend to take a lot of their cues, at least recently, from what's happening in the vendor space. One of the ways that you can think about it is sort of like a speed-of-light problem. Everyone knows that, you know, there's basic fundamental latency; everyone knows how fast disk is; everyone knows that, sort of, you can't just make your computations happen magically—there's a cost to running things horizontally.
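The real-time attribution Ian describes ("it's this team over here, because they added this particular label") is, mechanically, a group-by over series labels with a per-team budget. A minimal sketch; every team name and quota here is hypothetical:

```python
from collections import Counter

# Active time series, each tagged with an owning team.
# Entirely made-up data for illustration.
series = [
    {"name": "http_requests_total", "team": "checkout", "container": f"c{i}"}
    for i in range(5000)
] + [
    {"name": "orders_total", "team": "payments", "container": f"p{i}"}
    for i in range(800)
]

QUOTA = 2000  # hypothetical per-team active-series budget

usage = Counter(s["team"] for s in series)
over_quota = {team: n for team, n in usage.items() if n > QUOTA}

print(dict(usage))  # who owns how many active series
print(over_quota)   # teams to cap or aggregate before the bill arrives
```

Running a check like this continuously, rather than discovering the number on next month's invoice, is the difference Ian is drawing.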
But a lot of the way that the vendors have presented efficiency to the market is, “Oh, we're just going to incrementally get faster as AWS gets faster. We're going to incrementally get better as compression gets better.”

And of course, you can't go and fit a petabyte worth of data into a kilobyte—unless you're really just doing some sort of weird dictionary stuff—so you're dealing with some fundamental constraints. And the vendors just go, “I'm sorry, you know, we can't violate the speed of light.” But what you can do is start taking a look at, well, how is the data valuable, and start giving people controls on how to make it more valuable.

So, one of the things that we do with Chronosphere is we allow you to reshape Prometheus metrics, right? You go and express Prometheus metrics—let's say it's a business metric about how many transactions you're doing as a business—you don't need that on a per-container basis, particularly if you're running 100,000 containers globally.

When you go and take a look at that number on a dashboard, or you alert on it, what is it? It's one number, one time series. Maybe you break it out per region. You have five regions; you don't need 100,000 data points every minute behind that. It's very expensive, it's not very performant, and as we talked about earlier, it's very hard to reason about as a human being.

So, giving you the tools to go and condense that data down and make it more actionable and more valuable—you get performance, you get cost reduction, and you get the value that you ultimately need out of the data. And it's one of the reasons why, I guess, I work at Chronosphere. Which I'm hoping is the last observability [laugh] venture I ever work for.

Corey: Yeah, for me, a lot of the data that I see in my logs, which is where a lot of this stuff starts and how I still contextualize these things, is nonsense that I don't care about and will never care about.
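The reshaping Ian describes is, at its core, dropping high-cardinality labels and summing what remains. In plain Prometheus this would be a recording rule along the lines of `sum by (region) (transactions_total)`; as a standalone sketch (series values invented):

```python
from collections import defaultdict

# Per-container samples of one business metric; values are made up.
samples = [
    {"region": "us-east", "container": f"c{i}", "value": 2.0}
    for i in range(4)
] + [
    {"region": "eu-west", "container": f"c{i}", "value": 3.0}
    for i in range(2)
]

def aggregate(samples, keep_labels):
    """Collapse series by dropping every label except `keep_labels`, summing values."""
    out = defaultdict(float)
    for s in samples:
        key = tuple((label, s[label]) for label in keep_labels)
        out[key] += s["value"]
    return dict(out)

per_region = aggregate(samples, keep_labels=("region",))
print(per_region)  # six per-container series collapsed into two per-region series
```

Same dashboard number, same alert, a fraction of the stored and queried series: that is where the performance and cost wins come from.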
I don't care about load balancer health checks. I don't particularly care about 200 results for the favicon when people visit the site. I care about other things, but just weed out the crap, especially when I'm paying by the pound—or at least by the gigabyte—to get that data into something. Yeah. It becomes obnoxious and difficult to filter out.

Ian: Yeah. And the vendors just haven't done any of that, because why would they, right? If you went and reduced the amount of log—

Corey: Put engineering effort into something that reduces how much I can charge you? That sounds like lunacy. Yeah.

Ian: Exactly. Their business models are entirely based off it. So, if you went and reduced everyone's logging bill by 30%—or everyone's logging volume by 30%, and reduced the bills by 30%—it's not going to be a great time if you're a publicly traded company that has built your entire business model on an essentially very SaaS, volume-driven—and in my eyes—relatively exploitative pricing and billing model.

Corey: Ian, I want to thank you for taking so much time out of your day to talk to me about this. If people want to learn more, where can they find you? I mean, you are a Field CTO, so clearly you're outstanding in your field. But, assuming that people don't want to go to farm country, where's the best place to find you?

Ian: Yeah. Well, it'll be a bunch of different conferences. I'll be at KubeCon this year. But chronosphere.io is the company website.
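Weeding out the health checks and favicon hits Corey mentions, before paying per gigabyte to ship them, is a few lines of filtering. A sketch (the patterns are examples only; a real pipeline would do this in the log agent or collector):

```python
import re

# Noise patterns to drop before ingest. Examples only; tune to your traffic.
NOISE = [
    re.compile(r'"GET /healthz? HTTP'),             # load balancer health checks
    re.compile(r'"GET /favicon\.ico HTTP.* 200 '),  # successful favicon hits
]

def keep(line: str) -> bool:
    """True if the log line is worth shipping to the backend."""
    return not any(p.search(line) for p in NOISE)

lines = [
    '10.0.0.1 - - "GET /health HTTP/1.1" 200 2',
    '10.0.0.2 - - "GET /favicon.ico HTTP/1.1" 200 998',
    '10.0.0.3 - - "POST /api/orders HTTP/1.1" 500 1412',
]
shipped = [l for l in lines if keep(l)]
print(shipped)  # only the POST /api/orders line survives
```

Note the favicon pattern only drops the 200s; a failing favicon (or a failing health check endpoint, for that matter) might still be worth keeping.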
I've had the opportunity to talk to a lot of different customers—not from a hard-sell perspective, but, you know, conversations like this about what are the real problems you're having and what are the things that you sort of wish that you could do.

One of my favorite things is to ask people, “If you could wave a magic wand, what would you love to be able to do with your observability solution?” That's, A, a really great part of the conversation, but oftentimes, B, being able to say, “Well, actually, that thing you want to do—I think I have a way to accomplish that,” is a really rewarding part of this particular role.

Corey: And we will, of course, put links to that in the show notes. Thank you so much for being so generous with your time. I appreciate it.

Ian: Thanks, Corey. It's great to be here.

Corey: Ian Smith, Field CTO at Chronosphere, on this promoted guest episode. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment, which is going to be super easy in your case, because it's just one of the things that the omnibus observability platform your company sells offers as part of its full suite of things you've never used.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.