Software to manage containers on a server-cluster
In the podcast, I spoke with Meenakshi Kaushik and Neelima Mukiri from the Cisco team about responsible AI, machine learning bias, and how to address those biases when using ML in our applications. Meenakshi Kaushik currently works in the product management team at Cisco. She leads Kubernetes and AI/ML product offerings in the organization. Meenakshi is interested in the AI/ML space and is excited about how the technology can enhance human wellbeing and productivity. Neelima Mukiri is a principal engineer at Cisco. She is currently working on Cisco's on-premise and software service container platforms. Read a transcript of this interview: https://bit.ly/2ZiRswO
Open Source and other source available projects have been a huge driver of progress in our industry, but building and maintaining an open source project is about a lot more than just writing the initial code and putting together a good README. On this episode of the maintenance mini-series, we'll be discussing open source and the maintenance required to keep it going.
Matt Holt is the creator of popular open source projects like Caddy Web Server, CertMagic, PapaParse, and others. If you've ever wondered what it would be like to maintain open source projects full-time, Matt is the man to speak with. We learn how he came up with the idea for Caddy, how it works, and the pros and cons of building a sustainable open source project. Want to show your support for Caddy? Become a sponsor on GitHub!
Connect with Matt:
https://twitter.com/mholt6
https://github.com/mholt
Mentioned in today's episode:
https://github.com/caddyserver/caddy
https://caddy.community
https://www.amazon.com/PHP-MySQL-Development-Luke-Welling/dp/0672317842
https://en.wikipedia.org/wiki/Automated_Certificate_Management_Environment
https://github.com/mholt/timeliner
Want more from Ardan Labs? You can learn Go, Kubernetes, Docker & more through our video training, live events, or through our blog!
Zac Smith, managing director of Equinix Metal, shares how Equinix Metal runs the best hardware and networking in the industry, why pairing magical software with the right hardware is the future, and what Open19 means for sustainability in the data centre. Think modular components that slot in (including CPUs), liquid cooling that converts heat into energy, and a few other solutions that minimise the impact on the environment. But first, Zac tells us about the transition from Packet to Equinix Metal, his reasons for doing what he does, as well as the things that he is really passionate about, such as the most efficient data centres in the world and building for the love of it. This is a great follow-up to episode 18 because it goes deeper into the reasons that make Gerhard excited about the work that Equinix Metal is doing. This conversation with Zac puts it all into perspective. By the way, did you know that Equinix stands for Equality in the Internet Exchange?
About Stephanie
Stephanie Wong is an award-winning speaker, engineer, pageant queen, and hip hop medalist. She is a leader at Google with a mission to blend storytelling and technology to create remarkable developer content. At Google, she's created over 400 videos, blogs, courses, and podcasts that have helped developers globally. You might recognize her as the host of the GCP Podcast. Stephanie is active in her community, fiercely supporting women in tech and mentoring students.
Links: Personal Website: https://stephrwong.com Twitter: https://twitter.com/stephr_wong
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. 
Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.
Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free. That's snark.cloud/oci-free.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the things that makes me a little weird in the universe is that I do an awful lot of… let's just call it technology explanation slash exploration in public, and turning it into a bit of a brand-style engagement play. What makes this a little on the weird side is that I don't work for a big company, which grants me a tremendous latitude. 
I have a whole lot of freedom that lets me be all kinds of different things, and I can't get fired, which is something I'm really good at. Inversely, my guest today is doing something remarkably similar, except she does work for a big company and could theoretically be fired if they were foolish enough to do so. But I don't believe that they are. Stephanie Wong is the head of developer engagement at Google. Stephanie, thank you for volunteering to suffer my slings and arrows about all of this.
Stephanie: [laugh]. Thanks so much for having me today, Corey.
Corey: So, at a very high level, you're the head of developer engagement, which is a term that I haven't seen a whole lot of. Where does that start and where does that stop?
Stephanie: Yeah, so I will say that it's a self-proclaimed title a bit because of the nuance of what I do. I would say at its heart, I am still a part of developer relations. If you've heard of developer advocacy or developer evangelist, I would say this slight difference in shade of what I do is that I focus on scalable content creation and becoming a central figure for our developer audiences to engage and enlighten them with content that, frankly, is remarkable, and that they'd want to share and learn about our technology.
Corey: Your bio is fascinating in that it doesn't start with the professional things that most people do with, “This is my title and this is my company,” is usually the first sentence people put in. Yours is, “Stephanie Wong is an award-winning speaker, engineer, pageant queen, and hip hop medalist.” Which is both surprising and more than a little bit refreshing because when I read a bio like that my immediate instinctive reaction is, “Oh, thank God. 
It's a real person for a change.” I like the idea of bringing the other aspects of what you are other than, “This is what goes on in an IDE, the end,” to your audience.
Stephanie: That is exactly the goal that I had when creating that bio because I truly believe in bringing more interdisciplinary and varied backgrounds to technology. I, myself have gone through a very unconventional path to get to where I am today and I think in large part, my background has had a lot to do with my successes, my failures, and really just who I am in tech as an uninhibited and honest, credible person today.
Corey: I think that there's a lack of understanding, broadly, in our industry about just how important credibility and authenticity are and even the source of where they come from. There are a lot of folks who are in the DevRel space—devrelopers, as I insist upon calling them, over their protests—where, on some level, the argument is, what is developer relations? “Oh, you work in marketing, but they're scared to tell you,” has been my gag on that one for a while. But they speak from a position of, “I know what's what because I have been in the trenches, working on these large-scale environments as an engineer for the last”—fill in the blank, however long it may have been—“And therefore because I have done things, I am going to tell you how it is.” You explicitly call out that you don't come from the traditional, purely technical background. Where did you come from? It's unlikely that you've sprung fully-formed from the forehead of some god, but again, I'm not entirely sure how Google finds and creates the folks that it winds up advancing, so maybe you did.
Stephanie: Well, to tell you the truth. We've all come from divine creatures. And that's where Google sources all employees. So. You know. But—[laugh].
Corey: Oh, absolutely. 
“We climbed to the top of Olympus and then steal fire from the gods.” “It's like, isn't that the origin story of Prometheus?” “Yeah, possibly.” But what is your background? Where did you come from?
Stephanie: So, I have grown up, actually, in Silicon Valley, which is a little bit ironic because I didn't go to school for computer science or really had the interest in becoming an engineer in school. I really had no idea.
Corey: Even more ironic than that because most of Silicon Valley appears to never have grown up at all.
Stephanie: [laugh]. So, true. Maybe there's a little bit of that with me, too. Everybody has a bit of Peter Pan syndrome here, right? Yeah, I had no idea what I wanted to do in school and I just knew that I had an interest in communicating with one another, and I ended up majoring in communication studies. I thought I wanted to go into the entertainment industry and go into production, which is very different and ended up doing internships at Warner Brothers Records, a YouTube channel for dance—I'm a dancer—and I ended up finding a minor in digital humanities, which is sort of this interdisciplinary minor that combines technology and the humanities space, including literature, history, et cetera. So, that's where I got my start in technology, getting an introduction to information systems and doing analytics, studying social media for certain events around the world. And it wasn't until after school that I realized that I could work in enterprise technology when I got an offer to be a sales engineer. Now, that being said, I had no idea what sales engineering was. 
I just knew it had something to do with enterprise technology and communications, and I thought it was a good fit for my background.
Corey: The thing that I find so interesting about that is that it breaks the mold of what people expect, when, “If someone's going to talk to me about technology—especially coming from a”—it's weird; it's one of the biggest companies on the planet, and people still on some level equate Google with the startup-y mentality of being built in someone's garage. That's an awfully big garage these days, if that's even slightly close to true, which it isn't. But there's this idea of, “Oh, you have to go to Stanford. You have to get a degree in computer science. And then you have to go and do this, this, this, this, and this.” And it's easy to look dismissively at what you're doing. “Communications? Well, all that would teach you to do is communicate to people clearly and effectively. What possible good is that in tech?” As we look around the landscape and figure out exactly why that is so necessary in tech, and also so lacking?
Stephanie: Exactly. I do think it's an underrated skill in tech. Maybe it's not so much anymore, but I definitely think that it has been in the past. And even for developers, engineers, data scientists, other technical practitioners, especially as a person in DevRel, I think it's such a valuable skill to be able to communicate complex topics simply and understandably to a wide variety of audiences.
Corey: The big question that I have for you because I've talked to an awful lot of folks who are very concerned about the way that they approach developer relations, where—they'll have ratios, for example—where I know someone and he insists that he give one deeply technical talk for every four talks that are not deeply technical, just because he feels the need to re-establish and shore up his technical bona fides. 
Now, if there's one thing that people on the internet love, it is correcting people on things that are small trivia aspect, or trying to pull out the card that, “Oh, I've worked on this system for longer than you've worked on this system, therefore, you should defer to me.” Do you find that you face headwinds for not having the quote-unquote, “Traditional” engineering technical background?
Stephanie: I will say that I do a bit. And I did, I would say when I first joined DevRel, and I don't know if it was much more so that it was being imposed on me or if it was being self-imposed, something that I felt like I needed to prove to gain credibility, not just in my organization, but in the industry at large. And it wasn't until two or three years into it, that I realized that I had a niche myself. It was to create stories with my content that could communicate these concepts to developers just as effectively. And yes, I can still prove that I can go into an hour-long or a 45-minute-long tech talk or a webinar about a topic, but I can also easily create a five to ten-minute video that communicates concepts and inspires audiences just the same, and more importantly, be able to point to resources, code labs, tutorials, GitHub repos, that can allow the audience to be hands-on themselves, too. So really, I think that it was over time that I gained more experience and realized that my skill sets are valuable in a different way, and it's okay to have a different background as long as you bring something to the table.
Corey: And I think that it's indisputable that you do. 
The content of yours that I've encountered from time to time has always been insightful, it has always been extremely illuminating, and—you wouldn't think of this as worthy of occasion and comment, but I feel it needs to be said anyway—at no point in any of your content did I feel like I was being approached in a condescending way, where at every point it was always about uplifting people to a level of understanding, rather than doing the, “Well, I'm smarter than you and you couldn't possibly understand the things that I've been to.” It is relatable, it is engaging, and you add a very human face to what is admittedly an area of industry that is lacking in a fair bit of human element.
Stephanie: Yeah, and I think that's the thing that many folks in DevRel continue to underline is the idea of empathy, empathizing with your audiences, empathizing with the developers, the engineers, the data engineers, whoever it is that you're creating content for, it's being in their shoes. But for me, I may not have been in those shoes for years, like many other folks historically have been in for DevRel, but I want to at least go through the journey of learning a new piece of technology. For example, if I'm learning a new platform on Google Cloud, going through the steps of creating a demo, or walking through a tutorial, and then candidly explaining that experience to my audience, or creating a video about it. I really just reject the idea of having ego in tech and I would love to broaden the opportunity for folks who came from a different background like myself. 
I really want to just represent the new world of technology where it wasn't full of people who may have had the privilege to start coding at a very early age, in their garages.
Corey: Yeah, privilege of, in many respects, also that privilege means, “Yes, I had the privilege of not having to have friends and deal with learning to interact with other human beings, which is what empowered me to build this company and have no social skills whatsoever.” It's not the aspirational narrative that we sometimes are asked to believe. You are similar in some respects to a number of things that I do—by which I mean, you do it professionally and well and I do it as basically performance shitpost art—but you're on Twitter, you make videos, you do podcasts, you write long-form and short-form as well. You are sort of all across the content creation spectrum. Which of those things do you prefer to do? Which ones of those are things you find a little bit more… “Well, I have to do it, but it's not my favorite?” Or do you just tend to view it as content is content; you just look at different media to tell your story?
Stephanie: Well, I will say any form of content is queen—I'm not going to say king, but—[laugh] content is king, content is queen, it doesn't matter.
Corey: Content is a baroness as it turns out.
Stephanie: [laugh]. There we go. I have to say, so given my background, I mentioned I was into production and entertainment before, so I've always had a gravitation towards video content. I love tinkering with cameras. Actually, as I got started out at Google Cloud, I was creating scrappy content using webcams and my own audio equipment, and doing my own research, and finding lounges and game rooms to do that, and we would just upload it to our own YouTube channel, which probably wasn't allowed at the time, but hey, we got by with it. And eventually, I got approached by DevRel to start doing it officially on the channel and I was given budget to do it in-studio. 
And so that was sort of my stepping stone to doing this full-time eventually, which I never foresaw for myself. And so yeah, I have this huge interest in—I'm really engaged with video content, but once I started expanding and realizing that I could repurpose that content for podcasting, I could repurpose it for blogs, then you start to realize that you can shard content and expand your reach exponentially with this. So, that's when I really started to become more active on social media and leverage it to build not just content for Google Cloud, but build my own brand in tech.
Corey: That is the inescapable truth of DevRel done right is that as you continue doing it, in time, in your slice of the industry, it is extremely likely that your personal brand eclipses the brand of the company that you represent. And it's in many ways a test of corporate character—if it makes sense—as to how they react to that. I've worked in roles before I started this place where I was starting to dabble with speaking a lot, and there was always a lot of insecurity that I picked up of, “Well, it feels like you're building your personal brand, not advancing the company here, and we as a company do not see the value in you doing that.” Direct quote from the last boss I had. And, well, that partially explains why I'm here, I suppose. But there's insecurity there. I'd see the exact opposite coming out of Google, especially in recent times. There's something almost seems to be a renaissance in Google Cloud, and I'm not sure where it came from. But if I look at it across the board, and you had taken all the labels off of everything, and you had given me a bunch of characteristics about different companies, I would never have guessed that you were describing Google when you're talking about Google Cloud. 
And perhaps that's unfair, but perceptions shape reality.
Stephanie: Yeah, I find that interesting because I think traditionally in DevRel, we've also hired folks for their domain expertise and their brand, depending on what you're representing, whether it's in the Kubernetes space or Python client library that you're supporting. But it seems like, yes, in my case, I've organically started to build my brand while at Google, and Google has been just so spectacular in supporting that for me. But yeah, it's a fine line that I think many people have to walk. It's like, do you want to continue to build your own brand and have that carry forth no matter what company you stay at, or if you decide to leave? Or can you do it hand-in-hand with the company that you're at? For me, I think I can do it hand-in-hand with Google Cloud.
Corey: It's taken me a long time to wrap my head around what appears to be a contradiction when I look at Google Cloud, and I think I've mostly figured it out. In the industry, there is a perception that Google as an entity is condescending and sneering toward every other company out there because, “You're Google, you know how to do all these great, amazing things that are global-spanning, and over here at Twitter for Pets, we suck doing these things.” So, Google is always way smarter and way better at this than we could ever hope to be. But that is completely opposed to my personal experiences talking with Google employees. Across the board, I would say that you all are self-effacing to a fault. And I mean that in the sense of having such a limited ego, in some cases, that it's, “Well, I don't want to go out there and do a whole video on this. It's not about me, it's about the technology,” are things that I've had people who work at Google say to me. And I appreciate the sentiment; it's great, but that also feels like it's an aloofness. It also fails to humanize what it is that you're doing. 
And you are a, I've got to say, a breath of fresh air when it comes to a lot of that because your stories are not just, “Here's how you do a thing. It's awesome. And this is all the intricacies of the API.” And yeah, you get there, but you also contextualize that in a, “Here's why it matters. Here's the problem that solves. Here is the type of customer's problem that this is great for,” rather than starting with YAML and working your way up. It's going the other way, of, “We want to sell some underpants,” or whatever it is the customer is trying to do today. And that is the way that I think is one of the best ways to drive adoption of what's going on because if you get people interested and excited about something—at least in my experience—they're going to figure out how the API works. Badly in many cases, but works. But if you start on the API stuff, it becomes a solution looking for a problem. I like your approach to this.
Stephanie: Thank you. Yeah, I appreciate that. I think also something that I've continued to focus on is to tell stories across products, and it doesn't necessarily mean within just Google Cloud's ecosystem, but across the industry as well. I think we need to, even at Google, tell a better story across our product space and tie in what developers are currently using. And I think the other thing that I'm trying to work on, too, is contextualizing our products and our launches not just across the industry, but within our product strategy. Where does this tie in? Why does it matter? What is our forward-looking strategy from here? When we're talking about our new data cloud products or analytics, [unintelligible 00:17:21], how does this tie into our API strategy?
Corey: And that's the biggest challenge, I think, in the AI space. 
My argument has been for a while—in fact, I wrote a blog post on it earlier this year—that AI and machine learning is a marvelously executed scam because it's being pushed by cloud providers and the things that you definitely need to do a machine learning experiment are a bunch of compute and a whole bunch of data that has to be stored on something, and wouldn't you know it, y'all sell that by the pound. So, it feels, from a cynical perspective, which I excel at espousing, that approach becomes one of you're effectively selling digital pickaxes into a gold rush. Because I see a lot of stories about machine learning how to do very interesting things that are either highly, highly use-case-specific, which great, that would work well, for me too, if I ever wind up with, you know, a petabyte of people's transaction logs from purchasing coffee at my national chain across the country. Okay, that works for one company, but how many companies look like that? And on the other side of it, “It's oh, here's how we can do a whole bunch of things,” and you peel back the covers a bit, and it looks like, “Oh, but what you really taught me here is bias laundering?” And, okay. I think that there's a definite lack around AI and machine learning of telling stories about how this actually matters, what sorts of things people can do with it that aren't incredibly—how do I put this?—niche or a problem in search of a solution?
Stephanie: Yeah, I find that there are a couple approaches to creating content around AI and other technologies, too, but one of them being inspirational content, right? Do you want to create something that tells the story of how I created a model that can predict what kind of bakery item this is? And we're going to do it by actually showcasing us creating the outcome. So, that's one that's more like, okay. 
I don't know how relatable or how appropriate it is for an enterprise use case, but it's inspirational for new developers or next gen developers in the AI space, and I think that can really help a company's brand, too. The other being highly niche for the financial services industry, detecting financial fraud, for example, and that's more industry-focused. I found that they both do well, in different contexts. It really depends on the channel that you're going to display it on. Do you want it to be viral? It really depends on what you're measuring your content for. I'm curious from you, Corey, what you've seen across, as a consumer of content?
Corey: What's interesting, at least in my world, is that there seems to be, given that what I'm focusing on first and foremost is the AWS ecosystem, it's not that I know it the best—I do—but at this point, it's basically Stockholm Syndrome where it's… with any technology platform when you've worked with it long enough, you effectively have the most valuable of skill sets around it, which is not knowing how it works, but knowing how it doesn't, knowing what the failure mode is going to look like and how you can work around that and detect it is incredibly helpful. Whereas when you're trying something new, you have to wait until it breaks to find the sharp edges on it. So, there's almost a lock-in through, “We failed you enough times,” story past a certain point. But paying attention to that ecosystem, I find it very disjointed. 
I find that there are still events that happen and I only find out when the event is starting because someone tweets about it, and for someone who follows 40 different official AWS RSS feeds, to be surprised by something like that tells me, okay, there's not a whole lot of cohesive content strategy here, that is at least making it easy for folks to consume the things that they want, especially in my case where even the very niche nature of what I do, my interest is everything. I have a whole bunch of different filters that look for various keywords and the rest, and of course, I have helpful folks who email me things constantly—please keep it up; I'm a big fan—worst case, I'd rather read something twice than nothing. So, it's helpful to see all of that and understand the different marketing channels, different personas, and the way that content approaches, but I still find things that slip through the cracks every time. The thing that I've learned—and it felt really weird when I started doing it—was, I will tell the same stories repeatedly in different forums, or even the same forum. I could basically read you a Twitter thread from a year ago, word-for-word, and it would blow up bigger than it did the first time. Just because no one reads everything.
Stephanie: Exactly.
Corey: And I've already told my origin story. You're always new to someone. I've given talks internally at Amazon at various times, and I'm sort of loud and obnoxious, but the first question I love to ask is, “Raise your hand if you've never heard of me until today.” And invariably, over three-quarters of the room raises their hand every single time, which okay, great. I think that's awesome, but it teaches me that I cannot ever expect someone to have, quote-unquote, “Done the reading.”
Stephanie: I think the same can be said about the content that I create for the company. 
You can't assume that people, A) have seen my tweets already or, B) understand this product, even if I've talked about it five times in the past. But yes, I agree. I think that you definitely need to have a content strategy and how you format your content to be more problem-solution-oriented. And so the way that I create content is that I let them fall into three general buckets. One being that it could be termed definition: talking about the basics, laying the foundation of a product, defining terms around a topic. Like, what is App Engine, or Kubeflow 101, or talking about Pub/Sub 101. The second being best practices. So, outlining and explaining the best practices around a topic, how do you design your infrastructure for scale and reliability. And the third being diagnosis: investigating; exploring potential issues, as you said; using scripts; Stackdriver logging, et cetera. And so I just kind of start from there as a starting point. And then I generally follow a very, very effective model. I'm sure you're aware of it, but it's called the five point argument model, where you are essentially telling a story to create a compelling narrative for your audience, regardless of the topic or what bucket that topic falls into. So, you're introducing the problem, you're sort of rising into a point where the climax is the solution. And that's all to build trust with your audience. And as it falls back down, you're giving the results in the conclusion, and that's to inspire action from your audience. So, regardless of what you end up talking about this problem-solution model—I've found at least—has been highly effective. And then in terms of sharing it out, over and over again, over the span of two months, that's how you get the views that you want.
Corey: This episode is sponsored in part by something new. Cloud Academy is a training platform built on two primary goals. 
Having the highest quality content in tech and cloud skills, and building a good community that is rich and full of IT and engineering professionals. You wouldn't think those things go together, but sometimes they do. It's useful for both individuals and large enterprises, but here's what makes it new. I don't use that term lightly. Cloud Academy invites you to showcase just how good your AWS skills are. For the next four weeks you'll have a chance to prove yourself. Compete in four unique lab challenges, where they'll be awarding more than $2000 in cash and prizes. I'm not kidding, first place is a thousand bucks. Pre-register for the first challenge now, one that I picked out myself on Amazon SNS image resizing, by visiting cloudacademy.com/corey. C-O-R-E-Y. That's cloudacademy.com/corey. We're gonna have some fun with this one!
Corey: See, that's a key difference right there. I don't do anything regular in terms of video as part of my content. And I do it from time to time, but you know, getting gussied up and whatnot is easier than just talking into a microphone. As I record this, it's Friday, I'm wearing a Hawaiian shirt, and I look exactly like the middle-aged dad that I am. And for me at least, a big breakthrough moment was realizing that my audience and I are not always the same.
Weird confession for someone in my position: I don't generally listen to podcasts. And the reason behind that is I read very quickly, and even if I speed up a podcast, I'm not going to be able to consume the information nearly as quickly as I could by reading it. That, amongst other reasons, is one of the reasons that every episode of this show has a full transcript attached to it. But I'm not my audience. Other people prefer to learn by listening and there's certainly nothing wrong with that.
My other podcast, the AWS Morning Brief, is the spoken word version of the stuff that I put out in my newsletter every week.
And that is—it's just a different area for people to consume the content because that's what works for them. I'm not one to judge. The hard part for me was getting over that hump of assuming the audience was like me.
Stephanie: Yeah. And I think the other key part of it is just mainly consistency. It's putting out the content consistently in different formats because everybody—like you said—has a different learning style. I myself do. I enjoy visual styles.
I also enjoy listening to podcasts at 2x speed. [laugh]. So, that's my style. But yeah, consistency is one of the key things in building content, and building an audience, and making sure that you are valuable to your audience. I mean, social media, at the end of the day, is about the people that follow you.
It's not about yourself. It should never be about yourself. It's about the value that you provide. Especially as somebody who's in DevRel in this position for a larger company, it's really about providing value.
Corey: One of the breakthrough moments that I had relatively early in my speaking career—and I think it's clear just from what you've already said that you've had a similar revelation at times—I gave a talk that was really one of my first talks that went semi-big, called “Terrible Ideas in Git.” It was basically, learn how to use Git via anti-pattern. What it secretly was, under the hood, was that I felt it was time I learned Git a bit better than I did, so I pitched it and I got a talk accepted. So well, that's what we call a forcing function. By the time I give that talk, I'd better be [laugh] able to have built a talk that does this intelligently, and we're going to hope for the best.
It worked, but the first version of that talk I gave was super deep into the plumbing of Git. And I'm sure that if any of the Git maintainers were in the audience, they would have found it great, but there aren't that many folks out there.
I redid the talk and instead approached it from a position of, “You have no idea what Git is. Maybe you've heard of it, but that's as far as it goes.” And then it gets a little deeper there.
And I found that making the subject more accessible as opposed to deeper into the weeds of it is almost always the right decision from a content perspective. Because at some level, when you are deep enough into the weeds, the only way you're going to wind up fixing something or having a problem that you run into get resolved, isn't by listening to a podcast or a conference talk; it's by talking to the people who built the thing because at that level, those are the only people who can hang at that level of depth. That stops being fodder for conference talks unless you turn it into an after-action report of here's this really weird thing I learned.
Stephanie: Yeah. And you know, to be honest, one of the most successful pieces of content I've created was about data center security. I visited a data center and I essentially unveiled what our security protocols were. And that wasn't a deeply technical video, but it was fun and engaging and easily understood by the masses. And that's what actually ended up resulting in the highest number of views.
On top of that, I'm now creating a video about our subsea fiber optic cables. Finding that having to interview experts from a number of different teams across engineering and our strategic negotiators, it was like a monolith of information that I had to take in. And trying to format that into a five-minute story, I realized that bringing it up a layer of abstraction to help folks understand this at a wider level was actually beneficial. And I think it'll turn into a great piece of content. I'm still working on it now. So, [laugh] we'll see how it turns out.
Corey: I'm a big fan of watching people learn and helping them get started.
The thing that I think gets lost a lot is it's easy to assume that if I look back in time at myself when I was first starting my professional career two decades ago, that I was exactly like I am now, only slightly more athletic and can walk up a staircase without getting winded. That's never true. It never has been true. I've learned a lot about not just technology but people as I go, and looking at folks who are entering the workforce today through the same lens of, “Well, that's not how I would handle that situation.” Yeah, no kidding. I have two decades of battering my head against the sharp edges and leaving dents in things to inform that opinion.
No, when I was that age, I would have handled it way worse than whatever it is I'm critiquing at the time. But it's important to me that we wind up building those pathways and building those bridges so that people coming into the space, first, have a clear path to get here, and secondly, have a better time than I ever did. Where does the next generation of talent come from has been a recurring question and a recurring theme on the show.
Stephanie: Yeah. And that's exactly why I've been such a fierce supporter of women in tech, and also, again, encouraging a broader community to become a part of technology. Because, as I said, I think we're in the midst of a new era of technology, of people from all these different backgrounds in places that historically have had more remote access to technology, now having the ability to become developers at an early age. So, with my content, that's what I'm hoping to drive to make this information more easily accessible.
Even if you don't want to become a Google Cloud engineer, that's totally fine, but if I can help you understand some of the foundational concepts of cloud, then I've done my job well.And then, even with women who are already trying to break into technology or wanting to become a part of it, then I want to be a mentor for them, with my experience not having a technical background and saying yes to opportunities that challenged me and continuing to build my own luck between hard work and new opportunities.Corey: I can't wait to see how this winds up manifesting as we see understandings of what we're offering to customers in different areas in different ways—both in terms of content and terms of technology—how that starts to evolve and shift. I feel like we're at a bit of an inflection point now, where today if I graduate from school and I want to start a business, I have to either find a technical co-founder or I have to go to a boot camp and learn how to code in order to build something. I think that if we can remove that from the equation and move up the stack, sure, you're not going to be able to build the next Google or Pinterest or whatnot from effectively Visual Basic for Interfaces, but you can build an MVP and you can then continue to iterate forward and turn it into something larger down the road. The other part of it, too, is that moving up the stack into more polished solutions rather than here's a bunch of building blocks for platforms, “So, if you want a service to tell you whether there's a picture of a hot dog or not, here's a service that does exactly that.” As opposed to, “Oh, here are the 15 different services, you can bolt together and pay for each one of them and tie it together to something that might possibly work, and if it breaks, you have no idea where to start looking, but here you go.” A packaged solution that solves business problems.Things move up the stack; they do constantly. 
The fact is that I started my career working in data centers and now I don't go to them at all because—spoiler—Google, and Amazon, and people who are not IBM Cloud can absolutely run those things better than I can. And there's no differentiated value for me in solving those global problems locally. I'd rather let the experts handle stuff like that while I focus on interesting problems that actually affect my business outcome. There's a reason that instead of running all the nonsense for lastweekinaws.com myself, because I've worked in large-scale WordPress hosting companies, I pay WP Engine to handle it for me, and they, in turn, host it on top of Google Cloud, but it doesn't matter to me because it's all just a managed service that I pay for. Because me running the website itself adds no value, compared to the shitpost I put on the website, which is where the value derives from. For certain odd values of value.
Stephanie: [laugh]. Well, two things there. I think we actually had a demo created on Google Cloud that did detect hot dogs or not hot dogs using our Vision API, years in the past. So, thanks for reminding me of that one.
Corey: Of course.
Stephanie: But yeah, I mean, I completely agree with that. I mean, this is constantly a topic in conversation with my team members, and with clients. It's about higher levels of abstraction. I just did a video series with our fellow, Eric Brewer, who helped build cloud infrastructure here at Google over the past ten years.
And I asked him what he thought the future of cloud would be in the next ten years, and he mentioned, “It's going to be these higher levels of abstraction, building platforms on top of platforms like Kubernetes, and having more services like Cloud Run, serverless technologies, et cetera.”
But at the same time, I think the value of cloud will continue to be providing optionality for developers to have more opinionated services, services like GKE Autopilot, et cetera, that essentially take away the management of infrastructure or nodes that people don't really want to deal with at the end of the day because it's not going to be a competitive differentiator for developers. They want to focus on building software and on keeping their services up and running. And so yeah, I think the future is going to be that, giving developers flexibility and freedom, and still delivering the best-of-breed technology. If it's covering something like security, that's something that should be baked in as much as possible.
Corey: You're absolutely right, first off. I'm also looking beyond it where I want to be able to build a website that is effectively Twitter, only for pets—because that is just a harebrained enough idea to probably raise a $20 million seed round these days—and I just want to be able to have the barks—those are like tweets, only surprisingly less offensive and racist—and have them just be stored somewhere, ideally presumably under the hood somewhere, it's going to be on computers, but whether it's in containers, or whether it's serverless, or however it's working is the sort of thing that, “Wow, that seems like an awful lot of nonsense that is not central nor core to my business succeeding or failing.” I would say failing, obviously, except you can lose money at scale with the magic of things like SoftBank.
Here we are.And as that continues to grow and scale, sure, at some point I'm going to have bespoke enough needs and a large enough scale where I do have to think about those things, but building the MVP just so I can swindle some VCs is not the sort of thing where I should have to go to that depth. There really should be a golden-path guardrail-style thing that I can effectively drag and drop my way into the next big scam. And that is, I think, the missing piece. And I think that we're not quite ready technologically to get there yet, but I can't shake the feeling and the hope that's where technology is going.Stephanie: Yeah. I think it's where technology is heading, but I think part of the equation is the adoption by our industry, right? Industry adoption of cloud services and whether they're ready to adopt services that are that drag-and-drop, as you say. One thing that I've also been talking a lot about is this idea of service-oriented networking where if you have a service or API-driven environment and you simply want to bring it to cloud—almost a plug-and-play there—you don't really want to deal with a lot of the networking infrastructure, and it'd be great to do something like PrivateLink on AWS, or Private Service Connect on Google Cloud.While those conversations are happening with customers, I'm finding that it's like trying to cross the Grand Canyon. Many enterprise customers are like, “That sounds great, but we have a really complex network topology that we've been sitting on for the past 25 years. Do you really expect that we're going to transition over to something like that?” So, I think it's about providing stepping stones for our customers until they can be ready to adopt a new model.Corey: Yeah. 
And of course, the part that never gets said out loud but is nonetheless true and at least as big of a deal, “And we have a whole team of people who've built their entire identity around that network because that is what they work on, and they have been ignoring cloud forever, and if we just uplift everything into a cloud where you folks handle that, sure, it's better for the business outcome, but where does that leave them?” So, they've been here for 25 years, and they will spend every scrap of political capital they've managed to accumulate to torpedo a cloud migration. So, any FUD they can find, any horse-trading they can do, anything they can do to obstruct the success of a cloud initiative, they're going to do because people are people, and there is no real plan to mitigate that. There's also the fact that unless there's a clear business value story about a feature velocity increase or opening up new markets, there's also not an incentive to do things to save money. That is never going to be the number one priority in almost any case short of financial disaster at a company because everything they're doing is building out increasing revenue, rather than optimizing what they're already doing.So, there's a whole bunch of political challenges. Honestly, moving the computer stuff from on-premises data centers into a cloud provider is the easiest part of a cloud migration compared to all of the people that are involved.Stephanie: Yeah. Yeah, we talked about serverless and all the nice benefits of it, but unless you are more a digitally-born, next-gen developer, it may be a higher burden for you to undertake that migration. That's why we always [laugh] are talking about encouraging people to start with newer surfaces.Corey: Oh, yeah. 
And that's the trick, too, is if you're trying to learn a new cloud platform these days—first, if you're trying to pick one, I'd be hard-pressed to suggest anything other than Google Cloud, with the possible exception of DigitalOcean, just because the new user experience is so spectacularly good. That was my first real, I guess, part of paying attention to Google Cloud a few years ago, where I was, “All right, I'm going to kick the tires on this and see how terrible this interface is because it's a Google product.” And it was breathtakingly good, which I did not expect. And getting out of the way to empower someone who's new to the platform to do something relatively quickly and straightforwardly is huge. And sure, there's always room to improve, but that is the right area to focus on. It's clear that the right energy was spent in the right places.
Stephanie: Yeah. I will say a story that we don't tell quite as well as we should is the One Google story. And I'm not talking about just between Workspace and Google Cloud, but our identity access management and knowing your Google account, which everybody knows. It's not like Microsoft, where you're forced to make an account, or it's not like AWS where you had a billion accounts and you hate them all.
Corey: Oh, my God, I dread logging into the AWS console every time because it is such a pain in the ass. I go to cloud.google.com sometimes to check something, it's like, “Oh, right. I have to dig out my credentials.” And, “Where's my YubiKey?” And get it. Like, “Oh. I'm already log—oh. Oh, right. That's right. Google knows how identity works, and they don't actively hate their customers. Okay.” And it's always a breath of fresh air. Though I will say that by far and away, the worst login experience I've seen yet is, of course, Azure.
Stephanie: [laugh]. That's exactly right. It's Google account. It's yours. It's personal. It's like an Apple iCloud account.
It's one click, you're in, and you have access to all the applications. You know, so it's the same underlying identity structure with Workspace and Gmail, and it's the same org structure, too, across Workspace and Google Cloud. So, it's not just this disingenuous financial bundle between GCP and Workspace; it's really strategic. And it's kind of like the idea of low code or no code. And it looks like that's what the future of cloud will be. It's not just buy VMs from us.
Corey: Yeah. And there are customers who want to buy VMs and that's great. Speed up what they're doing; don't get in the way of people giving you their money, but if you're starting something net-new, there's probably better ways to do it. So, I want to thank you for taking as much time as you have to wind up going through how you think about, well, the art of storytelling in the world of engineering. If people want to learn more about who you are, what you're up to, and how you approach things, where can they find you?
Stephanie: Yeah, so you can head to stephrwong.com where you can see my work and also get in touch with me if you want to collaborate on any content. I'm always, always, always open to that. And my Twitter is @stephr_wong.
Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me.
Stephanie: Thanks so much.
Corey: Stephanie Wong, head of developer engagement at Google Cloud. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment telling me that the only way to get into tech these days is, in fact, to graduate with a degree from Stanford, and I can take it from you because you work in their admissions office.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
We kick things off by weighing the merits of two gender-neutral regional pronouns: the familiar y'all and the underappreciated yinz. Now that's covered...
The global population of developers will hit 45 million by 2030, up from 26.9 million in 2021 (EDC). What platforms will they want to build on?
Did Kubernetes solve all your problems? Did it create new ones?
It seems there's always an XKCD relevant to our conversation. Today, it's How standards proliferate.
Maxwell, a solution architect at xMatters, took a winding road to get to where he is. After a computer engineering education, he held jobs as field support engineer, product manager, SRE, and finally his current role as a solutions architect, where he serves as something of an SRE for SREs, helping them solve incident management problems with the help of xMatters. When he moved to the SRE role, Maxwell wanted to get back to doing technical work. It was a lateral move within his company, which was migrating an on-prem solution into the cloud. It's a journey that plenty of companies are making now: breaking an application into microservices, running processes in containers, and using Kubernetes to orchestrate the whole thing. Non-production environments would go down and waste SRE time, making it harder to address problems in the production pipeline. At the heart of their issues was the incident response process. They had several bottlenecks that prevented them from delivering value to their customers quickly. Incidents would send emails to the relevant engineers, sometimes 20 on a single email, which made it easy for any one engineer to ignore the problem—someone else has got this. They had a bad silo problem, where escalating to the right person across groups became an issue of its own. And of course, most of this was manual. Their MTTR—mean time to resolve—was lagging. Maxwell moved over to xMatters because they managed to solve these problems through clever automation. Their product automates the scheduling and notification process so that the right person knows about the incident as soon as possible. At the core of this process was a different MTTR—mean time to respond. Once an engineer started working to resolve a problem, it was all down to runbooks and skill. But the lag between the initial incident and that start was the real slowdown. It's not just the response from the first SRE on call. 
It's the other escalations down the line—to data engineers, for example—that can eat away time. They've worked hard to make escalation configuration easy. It not only handles who's responsible for specific services and metrics, but who's in the escalation chain from there. When the incident hits, the notifications go out through a series of configured channels; maybe it tries a chat program first, then email, then SMS. The on-call process is often a source of dread, but automating the escalation process can take some of the sting out of it. Check out the episode to learn more.
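The escalation flow described above (an ordered chain of responders, each tried over an ordered list of channels) can be sketched roughly as follows. This is only an illustration under stated assumptions, not xMatters' actual product or API: the `Responder` class, the `notify` and `escalate` functions, and the `senders` mapping are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sender signature: (responder name, message) -> True if acknowledged.
Sender = Callable[[str, str], bool]


@dataclass
class Responder:
    """One person or team in the escalation chain (illustrative only)."""
    name: str
    channels: List[str]  # tried in order, e.g. ["chat", "email", "sms"]


def notify(responder: Responder, message: str,
           senders: Dict[str, Sender]) -> Optional[str]:
    """Try each configured channel in order; return the one that was acknowledged."""
    for channel in responder.channels:
        if senders[channel](responder.name, message):
            return channel
    return None


def escalate(chain: List[Responder], message: str,
             senders: Dict[str, Sender]) -> Optional[Tuple[str, str]]:
    """Walk the chain until someone acknowledges; return (responder, channel)."""
    for responder in chain:
        channel = notify(responder, message, senders)
        if channel is not None:
            return responder.name, channel
    return None  # nobody acknowledged; a real system would keep paging or alert a manager
```

In this sketch, the time spent walking the chain before anyone acknowledges is the "mean time to respond" lag the write-up refers to; a real system would also add per-channel timeouts and on-call schedules, which are left out here.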
Hi, Spring fans! Welcome to another installment of _A Bootiful Podcast_! This week - a week before Thanksgiving here in the US - [Josh Long (@starbuxman)](https://twitter.com/starbuxman) wants you to know he and the Spring team are _thankful_ for you, and then he talks to newly minted Java Champion and Java ecosystem legend [Kate Stanley (@KateStanley91)](https://twitter.com/KateStanley91) about Kubernetes and Apache Kafka.
About Micheal Benedict
Micheal Benedict leads Engineering Productivity at Pinterest. He and his team focus on developer experience, building tools and platforms for over a thousand engineers to effectively code, build, deploy and operate workloads on the cloud. Mr. Benedict has also built Infrastructure and Cloud Governance programs at Pinterest and previously, at Twitter -- focussed on managing cloud vendor relationships, infrastructure budget management, cloud migration, capacity forecasting and planning and cloud cost attribution (chargeback).
Links: Pinterest: https://www.pinterest.com Twitter: https://twitter.com/micheal LinkedIn: https://www.linkedin.com/in/michealb/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: You know how Git works, right?
Announcer: Sorta, kinda, not really. Please ask someone else!
Corey: That's all of us. Git is how we build things, and Netlify is one of the best ways I've found to build those things quickly for the web. Netlify's Git-based workflows mean you don't have to play slap and tickle with integrating arcane nonsense and webhooks, which are themselves about as well understood as Git. Give them a try and see what folks ranging from my fake Twitter for pets startup, to global Fortune 2000 companies are raving about. If you end up talking to them, because you don't have to, they get why self service is important—but if you do, be sure to tell them that I sent you and watch all of the blood drain from their faces instantly. You can find them in the AWS marketplace or at www.netlify.com. N-E-T-L-I-F-Y.com
Corey: This episode is sponsored in part by our friends at Vultr.
Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Sometimes when I have conversations with guests here, we run long. Really long. And then we wind up deciding it was such a good conversation, and there's still so much more to say that we schedule a follow-up, and that's what happened today. Please welcome back Micheal Benedict, who is, as of the last time we spoke and presumably still now, the head of engineering productivity at Pinterest. Micheal, how are you?
Micheal: I'm doing great, and thanks for that introduction, Corey.
Thankfully, yes, I am still the head of engineering productivity; I'm really glad to speak more about it today.
Corey: The last time that we spoke, we went up one side and down the other of large-scale environments running on AWS and billing aspects thereof, et cetera, et cetera. I want to stay away from that this time and instead focus on the rest of engineering productivity, which is always an interesting and possibly loaded term. So, what is productivity engineering? It sounds almost like it's an internal dev tools team, or is it something more?
Micheal: Well, thanks for asking because I get this question asked a lot of times. So, for one, our primary job is to enable every developer, at least at our company, to do their best work. And we want to do this by providing them a fast, safe, and reliable path to take any idea into production without ever worrying about the infrastructure. As you clearly know, learning anything about how AWS works—or any public cloud provider works—is a ton of investment, and we do want our product engineers, our mobile engineers, and all the other folks to be focused on delivering amazing experiences to our Pinners. So, we could be doing some of the hard work in providing those abstractions for them in such a way, and taking away the pain of managing infrastructure.
Corey: The challenge, of course, that I've seen is that a lot of companies take the approach of, “Ah. We're going to make AWS available to all of our engineers in its raw, unfiltered form.” And that lasts until the first bill shows up. And then it's, “Okay. We're going to start building some guardrails around that.” Which makes a lot of sense. There then tends to be a move towards internal platforms that effectively wrap cloud services.
And for a while now, I've been generally down on the concept and publicly so in the general sense.
That said, what I say that applies as a best practice or something that most people should consider does tend to fall apart when we talk about specific use cases. You folks are an extremely large environment; how do you view it? First off, do you do internal platforms like that? And secondly, would you recommend that other companies do the same thing?Micheal: I think that's such a great question because every company evolves with its own pace of development. And I wouldn't say Pinterest by itself had a developer productivity or an engineering productivity organization from the get-go. I think this happens when you start realizing that your core engineers who are working on product are now spending a certain fraction of time—which starts ballooning pretty fast—in managing the underlying systems and the infrastructure. And at that point in time, it's probably a good question to ask, how can I reduce the friction in those people's lives such that they could be focused more on the product. And, kind of, centralize or provide some sort of common abstractions through a central team which can take away all that pain.So, that is generally a good guiding principle to think about when your engineers are spending at least 30% of their time on operating the systems rather than building capabilities, that's probably a good time to revisit and see whether a central team would make sense to take away some of that. And just simple examples, right? This includes upgrading OS on your EC2 machines, or just trying to make sure you're patching all the right versions on your next big Kubernetes cluster you're running for serving x number of users. The moment you start seeing that, you want to start thinking about, if there is a central team who could take away that pain, what are the things they could be investing on to help up-level every other engineer within your organization. 
And I think that's one of the best ways to be thinking about it.And it was also a guiding principle for us within Pinterest to view what investments we could make in these central teams which can up-level each and every different type of engineer in the company as well. And just an example on that could be your mobile engineer would have very different expectations from your backend engineer who was working on certain aspects of code in your product. And it is truly important to understand where you want to centralize capabilities, which both these types of engineers could use, or you want to divest and have unique capabilities where it's going to make them productive. There's no one-size-fits-all solution for this, but I'm happy to talk about what we have at Pinterest, which has been reasonably working well. But I do think there's a lot more improvements we could be doing.Corey: Yeah, but let's also be clear that, as you've mentioned, you are heavily biased towards EC2 instances for a lot of what you do. If we look at the AWS console and we see hundreds of different services now, and it's easy to sit here and say, “Oh, internal platforms are terrible because all of those services are going to be enhanced in various ways and you're never going to be able to keep up with feature parity.” Yeah, but if you can wrap something like EC2 in an internal platform wrapper, that begins to be a different story because sure, someone's going to go and try something new with a different AWS service, they're going to need direct access. But the EC2 product across the board generally does not evolve in leaps and bounds with transformative changes overnight. 
Let's also not forget that at a company with the scale that Pinterest operates at, “Hey, AWS just dusted off a new feature and docs are still rolling out, and it's not in CloudFormation yet, but we're going to roll it out to production,” probably seems like the wrong direction to go in, I would assume.

Micheal: And yes, I think that brings up one of the key guardrails which these groups provide. So, when we start thinking about what centralized teams like engineering productivity, developer tools, developer platforms actually do, they help with a couple of things. The top three are: they can help pave a path for the most common use cases. Like to your point, provisioning EC2 does take a set of steps, all the time. If you're going to have a thousand people doing that every time they're building a new service or trying to expand capacity playing with their launch templates, those are things you can start streamlining and making simple with some wrapper because you want to address those 80% use cases which are usually common, and you can have a wrapper or could just automate that. And that's one of the key things: can you provide a paved path for those use cases?

The second thing is, can you do that by having the right guardrails in place? How often have you heard the story that, “I just clicked a button and that now spun up, like, a thousand-plus instances.” And now you have to juggle between trying to stop them or do something about it.

Corey: Back in 2013, you folks were still focusing on this a fair bit. I remember because Jeremy Carroll, who I believe was your first SRE there once upon a time, wound up doing a whole series of talks around how Pinterest approached doing an AMI Factory. And back in those days, the challenges were, “Okay. 
We have the baseline AMI, and that's great, but we also want to do deployments of things and we don't really want to do a new deploy of an entire fleet of EC2 instances for a single line of config change, so how do we wind up weighing off of when you bake a new AMI versus when you just change something that has—in what is deployed to them?” And it was really a complicated problem back then.

I'm not convinced it's not still a complicated problem, but the answers are a lot more cohesive. And making sure that every team—when you're talking about a company as large as Pinterest with that many teams—is doing things in the same way, seems like it's critically important otherwise you wind up with a whole bunch of unique-looking instances that each have to be managed by hand as opposed to something that can be reasoned around collectively.

Micheal: Yep. And that last part you mentioned is extremely crucial as well because like I said, our audience or our customers are not just the engineers; we do work with our product managers and business partners as well because at times, we have to tie or change our architecture based on certain cost optimizations which would make sense, like you just articulated. We don't want to have all the instance types. It does not add much value to a developer unless they're explicitly seeking a high-memory instance or a [GP-based instance in a 00:10:25] certain way. So, we can then work with our business partners to make sure that we're committing to only certain types of instances, and how we can abstract our tools to only give you that.
For example, our deployment system, Teletraan, which is an open-source system, actually condenses down all these instance types to a couple of categories like high-compute, high-memory—and you've probably seen that in many of the new cloud providers as well—so people don't have to learn or know the underlying instance type.

When we moved from c3 to c5, it was just called a high-compute system, so the next time someone provisioned a new service or deployed it using our system, they would just select high-compute as the de facto instance type and we would just automatically provision a c5 for them. So, that just reduces the extra complexity or the cognitive overhead individuals would have to go through in learning each instance type, what is the base AMI that comes on it, what are the different configurations that need to go in terms of setting up your AZ-scaling properties. We give them a good reasonable set of defaults to get started with, and they can then work on optimizing or making changes to it.

Corey: Ignoring entirely your mispronunciation of AMI, which is, of course, three syllables—and that is a petty hill upon which I will die—it occurs to me the more I work with AWS in various ways, the easier it gets. And I used to think in some respects, it was because the platform was so—it was improving so dramatically around me. But no, in many cases, it's because the first time you write some CloudFormation by hand, it's a nightmare and you keep smacking into weird issues. But the second or third time, it's super easy because you just copy the thing you've already built and change the relevant bits around.
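[Editor's sketch: the instance-category idea Micheal describes can be illustrated in a few lines. This is not Teletraan's actual code or API. The category names and the c3-to-c5 move come from the conversation; the mapping table, the `r5` family, the `plan_provision` function, and the per-request cap are assumptions invented for illustration.]

```python
from dataclasses import dataclass

# Hypothetical mapping from a workload category to the currently blessed
# instance family. Moving a fleet from c3 to c5 means editing one entry here;
# callers keep asking for "high-compute" and never see the change.
CATEGORY_TO_FAMILY = {
    "high-compute": "c5",
    "high-memory": "r5",  # assumed for illustration; not named in the interview
}

# Hypothetical guardrail against the "one click, a thousand instances" story.
MAX_INSTANCES_PER_REQUEST = 100


@dataclass(frozen=True)
class ProvisionPlan:
    category: str
    instance_type: str
    count: int


def plan_provision(category: str, count: int, size: str = "xlarge") -> ProvisionPlan:
    """Resolve a category to a concrete instance type and enforce the cap."""
    if category not in CATEGORY_TO_FAMILY:
        raise ValueError(f"unknown category: {category!r}")
    if not 1 <= count <= MAX_INSTANCES_PER_REQUEST:
        raise ValueError(
            f"count must be between 1 and {MAX_INSTANCES_PER_REQUEST}, got {count}"
        )
    family = CATEGORY_TO_FAMILY[category]
    return ProvisionPlan(category, f"{family}.{size}", count)
```

[In a sketch like this, an engineer only ever names a category and a count; the central team owns the table and the cap, which is roughly the division of responsibility described in the conversation.]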
And that was the learning curve that I went through playing around with a lot of these things.

When you start looking at this from a large-scale environment where it's not just about upskilling the people that you have to understand how these things integrate in AWS land, but also the consistent onboarding of engineers at a fairly progressive clip is, great, you effectively have to start doing trainings on all these things, and there's a lot of knobs and dials that can blow up and hurt people. At some point, building the guardrails or building the environment in which you are getting all the stuff abstracted away from where the application engineers have to think about this at all, it eventually reaches a tipping point where it starts to feel like it's no longer optional if you want to continue growing as a company because you don't have the luxury of spending six months of onboarding before you let someone touch the thing they were hired to build.

Micheal: And you will see that many companies very often have very similar programming practices like you just described. Even I learned it the same way: you have a base template, you just copy-paste it and start from there. And no one goes through the bootstrapping process manually anymore; you want to—I think we call it cargo-culting, but in general, just get something to bootstrap and start from there. But one of the things we learned the hard way is that it can also lead to, kind of, you pushing, you know, not great practices because people don't know what is a blessed version of a good template or what actually would make sense. So, some of those things, we have been working on.

And this is where centralized teams like engineering productivity are really helpful: we provide you with the blessed or the canonical way to do certain things. A case in point is a CI/CD pipeline or delivery of software services.
We have invested enough in experimenting on what works with some of the more nuanced use cases at Pinterest, in helping generate, sort of, a canonical version which would cover 80% of the use cases. Someone could just go and try to build a service and they could just use the same canonical pipeline without learning much or making changes to it. This also reduces that cargo-culting nature which I mentioned, rather than copying it from unknown sources and trying to like—again, it may cause havoc to our systems, so we can avoid a lot of that because of these practices.

Corey: So, let's step a little bit beyond AWS—I know I hate doing it, too—but I'm going to assume that your remit is broader than, oh, AWS whisperer-slash-Wrangler. So, tell me a little bit more about what it is that your day-to-day looks like if there is anything that could be said not to focus purely around AWS whispering.

Micheal: So, one of the challenges—and I want to talk about this a bit more—is our environments have become extremely complex over time. And it's the nature of, like, rising entropy. Like, we've just noticed that there's two things: we have a diverse customer base, and these include everyone trying to do different workloads or work service types. What that essentially translates into is that we realized that our solution may not fit all of them. For example, what works for a machine-learning engineer in terms of iterating on building a model and delivering a model is not the same as someone working on a long-running service and trying to deploy that. The same would apply for someone trying to operate a Kafka system.

And that has made, I think, definitely our job a bit challenging in trying to assess where do you actually draw the line on the abstraction? What is the right layer of abstraction across your local development experience, across when you move over to staging your code in a PR model and getting feedback and subsequently actually releasing it to production?
Because this changes dramatically based on what is the workload type you're working on. And we feel like that has been one of the biggest challenges where I know I spend my day-to-day and my team does too, in trying to help provide some of the right solutions for these individuals. Very often we'll also get asked by individuals trying to do a very nuanced thing.

Of late, we have been talking about how you operate functions, like provide Functions as a Service within the company. It just puts us in a difficult spot at times because we have to ask the hard question, “Is this required?” I know the industry is doing it; it's definitely there. I personally believe, yes, it could be a future, but is that absolutely important? Is that going to benefit Pinterest in any formal way if we invest in some core abstractions?

And those are difficult conversations to have because we have exciting engineers coming in trying to do amazing things; it puts us in a hard spot, as well, as to sometimes saying graciously, no. I know many companies deal with it when they have these centralized teams, but I think it's part of that job. Like when you say it's day-to-day, I would say I'm probably saying no a couple of times in that day.

Corey: Let's pretend for the sake of argument that I am, tomorrow morning, starting another company—Twitter for Pets—and over the next ten years, it grows to be larger than Pinterest in terms of infrastructure, probably not revenue because it turns out pets are not the lucrative source of ad revenue that I was hoping it would be but, you know, directionally the same thing. It seems to me that building out this sort of function with this sort of approach to things is dramatically early as far as optimizations go when it's just me puttering around on something. I'm always cognizant of the wrong people taking the wrong message when we're talking about things that happen like this at scale.
When does having an engineering productivity group begin to make sense?

Micheal: I mentioned this earlier; like, yeah, there is definitely not a right answer, but we can start small. For example, this group actually started more as a delivery team. You know, when we started, we realized that we had different ways of deploying services or software at Pinterest, so we first gathered together to figure out, okay, what are the different ways and can we start simplifying that part? And that's where it started expanding. Okay, we are doing button-based deployments right now, we have a thousand-plus microservices, and we are seeing more incidents than we wanted to because anything where there's a human involved means there's a potential gap for error. I myself was involved in a SEV 0 incident, and I will be honest; we ended up deploying a Hello World application in one of our production fleets. Not the thing I wanted to be associated with my name, but, you know—

Corey: And you were suddenly saying hello to the world, in fact—

Micheal: [laugh].

Corey: —and oops-a-doozy.

Micheal: Yeah. So—and that really prompted us to rethink how we need to enable guardrails to do safe production rollouts. And that's how those conversations start ballooning out.

Corey: And the healthy correct way. We've all broken production in various ways, and it's—you correctly are identifying, I believe, the direction you're heading in where this is a process problem and a tooling problem; it is not that you are secretly crap and should never have been allowed near anything in production. I mean, that's my excuse for me, but in your case, this is a common thing where it's, if someone can unintentionally cause issues like that, there needs to be better processes and procedures as the organization matures.

Micheal: Yep. And that's kind of like always the root or the starting point for these discussions.
And it starts growing from there on because, okay, you've helped improve the deploy process but now we're seeing an insane amount of slowness, say on the build processes, or even post-deploy, there's, like, issues on how we monitor and look into data.

And that I think forces these conversations, okay, where do we have these bespoke tools available? What are people doing today? And you have to ask those hard questions, like what can we actually remove from here? The goal is not to introduce yet another new system. Many a time, to be honest, bash just gets the job done. [laugh].

Personally, I'm okay with that as long as it's consistent and people, you know, are able to contribute to it and you have good practices in validating it. If it works, we should go for it rather than introducing yet another YAML [laugh] and some of those other aspects of doing that work. And that's what we encourage as well. That's how I think a lot of this starts connecting together in terms of, okay, now this is becoming a productivity group; they're focused on certain challenges where investing probably one person here may up-level a few other engineers who don't have to do that on a day-to-day basis. And I think that's one of the key items for, especially, folks who are running mid-sized companies to realize and start investing in these types of teams to really up-level, sort of, the rest of the engineering.

Corey: You've been doing this for a fair while. If you were to go back and start over again on day one—which is always a terrifying question, on some level—what would you have done differently about building out this function as Pinterest continued to scale out?

Micheal: Well, first, I must acknowledge that this was not just me, and there's, like, a ton of people involved in helping make this happen.

Corey: No, that's fair. We'll blame them for the missteps; that is—

Micheal: [laugh].

Corey: —just fine with me. I kid. I kid.

Micheal: I think, definitely the nuances.
If I look back at all the decisions that were made then at that point in time, I think it's very hard to always look back and say, “Oh, we could have chosen x at one point in time.” And I think in reality, that's how engineering organizations always evolve, that you have to make do with the information you have right now to make a decision that works for you over a couple of years.

And I'll give you a small example of this. There was a time when Pinterest was actually on GitHub Enterprise—this was like circa 2013, I would say—and there was a decision made to move to Phabricator, which was back then a great open-source code management system, with the information available at that point in time. And it really served us well for, like, five-plus years. Only then at a certain point, we realized that it's hard to hire PHP engineers to support a tool like that, and we had to rethink what is the ROI and the investments we've made here? Can we ever map up or match back to one of the offerings in the industry today? And that's when you make decisions that, okay, at this point in time, it's clear that business continuity talks, you know, and it's hard to operate a system, which is, at this moment not supported, and then you make a call about making a shift or moving.

And I think that's the key item. I don't think there's anything dramatically I would have changed since the start. Perhaps definitely investing a few more individuals into the group and going from there. But that said, I'm really, sort of, at least proud of the fact that usually these teams are extremely lean and small, and they always have an outsized impact, especially when they're working with other engineers, other [opinionated 00:22:13] engineers for what it's worth.

This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier.
It provides over 20 free services and infrastructure, networking, databases, observability, management, and security.

And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build.

With Always Free you can do things like run small scale applications, or do proof of concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.

Corey: Most folks show up intending to do good today, and you make the best decision at the time with the context and constraints that you have, but my question I think is less around, “Well, what were the biggest mistakes you made?” But more to do with the idea of, based upon what you've learned and as you have shown—as you've shined light on these dark areas, as you have been exploring it, has anything jumped out at you that is, “Oh, yeah. Now, that I know—if I had known then what I know now, I would definitely have made this other decision.” Ideally, something that applies a little more globally than specific within Pinterest, just because the whole idea, aspirationally, is that people might learn something from our conversation. At least I will, if nothing else.

Micheal: No, I think that's a great question. And I think there are three things that jump to me, top of mind. I think technology is a means to an end unless it gives you a competitive edge. And it's really hard to figure out at what point in time what technology and why we adopted it, it's going to make the biggest difference.
Humans always tend to have a bias towards aligning towards where we want to go. So, that's the first one in my mind.

The second one is, and we spoke about this last time, embrace your cloud provider as much as possible. You'd want to avoid taking on operational burden which is not going to add value to the business. If there is something you see you're operating which can be offloaded—because your provider can, trust me, do a way better job than you or your team of few can ever do—embrace that as soon as possible. It's better that way because then it frees up your time to focus on the most important thing, which I've realized over time is—I really think teams like ours are actually—we probably add the most value as a glue to all the different experiences a software engineer would go through as part of their SDLC lifecycle.

If we can simplify someone's life by giving them a clear view as to where their commit or the work is in this grand scheme of rolling out and giving them the right amount of data to take action when something goes wrong, trust me, they will love you for what you're doing because you're saving them a ton of time. Many times, we don't realize that when we publish 11 different ways for you to go and check to just get your basic validation of work done. We tend to focus so much on the technological aspect of what the tool does, rather than the experience of it, and I've realized, if you can bridge the experience, especially for teams like ours, people really don't even need to know whether you're running Kubernetes or any of those solutions behind the scenes. And I think that's one of the biggest takeaways I have.

Corey: I want to double down on something you said about the fact that you are not going to be able to run these services as effectively as your provider can. And relatively recently—in fact, since the first time we spoke—AWS has released an investment report in Virginia.
And from 2011 through 2020, they have invested in building AWS data centers there, $35 billion. I promise almost no company that employs people listening to this that are not themselves a cloud provider is going to make that kind of investment in running these things themselves.

Now, do cloud providers have sharp edges? Yes, absolutely. That is what my entire career is about, unfortunately. But you're not going to do a better job of running things more sustainably, more reliably, et cetera, et cetera. But there are other problems with this—and that's what I want to start exploring here—where in the olden days, when I ran things in data centers and they went down a lot more as a result, sometimes when there were outages, I would have the CEO of the company just standing there nervous worrying over my shoulder as I frantically typed to fix things.

Spoiler: my typing accuracy did not improve by having someone looming over me. Now, when there's an outage that your cloud provider takes, in many cases the thing that you are doing to fix it is reloading the status page and waiting for an update because it is completely out of your hands. Is that something that you've had to encounter? Because you can push buttons and turn dials when things are broken and you control it, but in an AWS—or other cloud provider—outage, all you can really do is wait unless you have a DR plan that is large-scale and effective enough that you won't feel foolish or have wasted a huge amount of time and energy migrating off and then—because then it gets repaired in ten minutes. How do you approach that, from your perspective? I guess, the expectation management piece?

Micheal: It's definitely I know something which keeps a lot of folks within infrastructure up at night because, like you just said, at times we can feel extremely powerless when we obviously don't have direct control—or visibility at times, as well—on what's happening.
One of the things we have realized over time as part of running on our cloud provider for over a decade now, it forces us to rethink a bit on our priority workflows, what we want our Pinners to always have access to, what they need to see, what is not important or critical. Because it puts into perspective, even for the infrastructure teams, as to what is the most important thing we should always have available and running, what is okay to be in a degraded state, until what time, right? So, it actually forces us to define SLOs and availability criteria within the team where we can broadcast that to the larger audience including the executives. So, none of this comes as a surprise at that point.

I mean, it's not the answer, probably, you're looking for because there's nothing we can do except set expectations clearly on what we can do and how you think about the business when these things do happen. So, I know people may have a different view on this; I'm definitely curious to hear as well, but I know at Pinterest at least we have converged on our priority workflows. When something goes out, how do we jump in to provide a degraded experience? We have very clear run books to do that, and especially when it's a SEV 0, we do have clear processes in place on how often we need to update our entire company on where things are. And especially this is where your partnership with the cloud provider is going to be a big, big boon because you really want to know or have visibility, at the minimum some predictability on when things can get resolved, and how you want to work with them on some creative solutions.
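[Editor's sketch: one way to write down "what must always be available, what may be degraded, and until what time" is a small policy table. Everything here is hypothetical: the workflow names, the time limits, and the `needs_escalation` helper are invented for illustration and do not describe Pinterest's actual SLOs or run books.]

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class WorkflowPolicy:
    name: str
    # How long the workflow may stay degraded before escalation.
    # None means any degradation of this workflow is unacceptable.
    max_degraded: Optional[timedelta]


# Hypothetical priority workflows; a real list would come from the business.
POLICIES = [
    WorkflowPolicy("view-home-feed", None),
    WorkflowPolicy("refresh-recommendations", timedelta(hours=4)),
]


def needs_escalation(policy: WorkflowPolicy, degraded_for: timedelta) -> bool:
    """Return True once a workflow has been degraded longer than agreed."""
    if policy.max_degraded is None:
        return degraded_for > timedelta(0)
    return degraded_for > policy.max_degraded
```

[Publishing a table like this ahead of time is one concrete form of the expectation-setting described above: during a provider outage, the question becomes which agreed limits have been crossed, not who is to blame.]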
This is outside the DR strategy, obviously; you should still be focused on a DR strategy, but these are just simple things we've learned over time on how to just make it predictable for individuals within the company, so not everyone is freaking out.

Corey: Yeah, from my perspective, I think the big things that I found that have worked, in my experience—mostly by getting them wrong the first time—is explain that someone else running the infrastructure when they take an outage; there's not much we can do. And no, it's not the sort of thing where picking up the phone and screaming at someone is going to help us, is the sort of thing that is best to communicate to executive stakeholders when things are running well, not in the middle of that incident.

Then when things break, it's one of those, “Great, you're an exec. You know what your job is? Literally anything other than standing in the middle of the engineering floor, making everyone freak out even more. We'll have a discussion later about what the contributing factors were when you demand that we fire someone because of an outage. Then we're going to have a long and hard talk about what kind of culture you're trying to build here again?” But there are no perfect answers here.

It's easy to sit here in the silver light of day with things working correctly and say, “Oh, yeah. This is how outages should be handled.” But then when it goes down, we're all basically an inch away at best from running around with our hair on fire, screaming, “Fix it, fix it, fix it, fix it, now.” And I am empathetic to that. There's a reason that I fix AWS bills for a living, and one of those big reasons is that it's a strictly business-hours problem and I don't have to run production infrastructure that faces anything that people care about, which is kind of amazing and freeing for someone who spent too many years on call.

Micheal: Absolutely.
And one of the things is that this is not only with the cloud provider, I think in today's nature of how our businesses are set up, there's probably tons of other APIs you are using or you're working with you may not be aware of. And we ended up finding that the hard way as well. There were a certain set of APIs or services we were using in the critical path which we were not aware of. When these outages happen, that's when you find that out.

So, you're not only beholden to your provider at that point in time; you have to have those SLO expectations set with your other SaaS providers as well, other folks you're working with. Because I don't think that's going to change; it's probably only going to get complicated with all the different types of tools you're using. And then that's a trade-off you need to really think about. An example here is just like—you know, like I said, we moved in the past from GitHub to Phabricator—I didn't close the loop on that because we're moving back to GitHub right now [laugh] and that's one of the key projects I'm working with. Yeah, it's circle of life.

But the thing is, we did a very strong evaluation here because we felt like, “Okay, there's a probability that GitHub can go down and that means people will not be productive for that couple of hours. What do we do then?” And we had to put a plan together on how we can mitigate that part and really build that confidence with the engineering teams, internally. And it's not the best solution out there; the other solution was just run our own, but how is that going to make any other difference because we do have libraries being pulled out of GitHub and so many other aspects of our systems which are unknowingly dependent on it anyways.
So, you have to still mitigate those issues at some point in your entire SDLC process.

So, that was just one example I shared, but it's not always on the cloud provider; I think there are just many aspects of—at least today how businesses are run, you're dependent; you have critical dependencies, probably, on some SaaS provider you haven't really vetted or evaluated. You will find out when they go down.

Corey: So, I don't think I've told this story before, but before I started this place, I was doing a fair bit of consulting work for other companies. And I was doing a project at Pinterest years ago. And this was one of the best things I've ever experienced at a company site, let alone a client site, where I was there early in the morning, eight o'clock or so, so you know, engineers love to show up at the crack of 11:30. But so I was working a little early; it was great. And suddenly my SSH session that I was using to remote into something or other hung.

And it's tap up, tap enter a couple of times, tap it a couple more. It was hung hard. “What's the—” and then someone gently taps me on the shoulder. So, I take the headphones off. It was someone from corporate IT coming around saying, “Hey, there's a slight problem with our corporate firewall that we're fixing. Here's a MiFi device just for you that you can tether to get back online and get work done until the firewall gets back.”

And it was incredible, just the level of just being on top of things, and the focus on keeping the people who were building things and doing expensive engineering work that was awesome—and also me—productive during that time frame was just something I hadn't really seen before. It really made me think about the value of where do you remove bottlenecks from people getting their jobs done? It was—it remains one of the most impressive things I've seen.

Micheal: That is great.
And as you were telling me that I did look up our [laugh] internal system to see whether a user called Corey Quinn existed, and I should confirm this with you. I do see entries over here, a couple of commits, but this was 2015. Was that the time you were around, or is this before that even?

Corey: That would have been around then, yes. I didn't start this place until late 2016.

Micheal: I do see your commits, like, from 2015, and I—

Corey: And they're probably terrible, I have no doubt. There's a reason I don't read code for a living anymore.

Micheal: Okay, I do see a lot of GIFs—and I hope it's pronounced as GIF—okay, this is cool. We should definitely have a chat about this separately, Corey?

Corey: Oh, yeah. “Would you explain this code?” “Absolutely not. I wrote it. Of course, I have no idea what it does. That's the rule. That's the way code always works.”

Micheal: Oh, you are an honorary Pinterest engineer at this point, and you have—yes—contributed to our API service and a couple of Puppet profiles I see over here.

Corey: Oh, yes—

Micheal: [Amazing 00:36:11]. [laugh].

Corey: You don't wind up thinking that's a risk factor that should be disclosed. I kid. I kid. It's, I made a joke about this when VMware acquired SaltStack and I did some analytics and found that 60 some odd lines of code I had written, way back when that were still in the current version of what was being shipped. And they thought, “Wait, is this actually a risk?”

And no, I am making a joke. The joke is, is my code is bad. Fortunately, there are smart people around me who review these things. This is why code review is so important. But there was a lot to admire when I was there doing various things at Pinterest. It was a fun environment to work in, the level of professionalism was phenomenal, and I was just a big fan of a lot of the automation stuff.

Phabricator was great.
I loved working with it, and, “Great, I'm going to use this at the next place I go.” And I did and then it was—I looked at what it took to get it up and running, and oh, yeah, I can see why GitHub is so popular these days. But it was neat. It was interesting seeing that type of environment up close.

Micheal: That is great to hear. You know, this is what I enjoy, like, hearing some of these war stories. I am surprised; you seem to have committed way more than I've ever done in my [laugh] duration here at Pinterest. I do managing for a living, but then again—Corey, the good news is your code is still running on production. And we—

Corey: Oh dear.

Micheal: —haven't—[laugh]. We haven't removed or made any changes to it, so that's pretty amazing. And thank you for all your contributions.

Corey: Oh, please, you don't have to thank me. I was paid, it was fine. That's the value of—

Micheal: [laugh].

Corey: —[work 00:37:38] for hire. It's kind of amazing. And the best part about consultants is, is when we're done with a project, we get the hell out and everyone's happy about it.

More happy when it's me that's leaving because of obvious personality-related reasons. But it was just an interesting company from start to finish. I remember one other time, I wound up opening a ticket about having a slight challenge with a flickering on my then Apple-branded display that everyone was using before they discontinued those. And I expected there to be, “Oh, okay. You're a consultant. Great. How did we not put you in the closet with a printer next to that thing, breathing the toner?” Like most consulting clients tend to do, and sure enough, three minutes later, I'm getting that tap on the shoulder again; they have a whole replacement monitor. “Can you go grab a cup of coffee? We'll run the cable for it. 
It'll just be about five minutes.” I started to feel actively bad about requesting things because I did a lot of consulting work for a lot of different companies, and not to be unkind, but treating consultants and contractors super well is not something that a lot of companies optimize for. I can't necessarily blame them for that. It just really stood out.

Micheal: Yep, I do hope we are keeping up with that right now because I know our team definitely has a lot of consultants working with us as well. And it's always amazing to see; we do want to treat them as FTEs. It doesn't even matter at that point because we're all individuals and we're trying to work towards common goals. Like you just said, I think I personally have learned a few items as well from some of these folks. Which, again, I think speaks to how we want to work and create a culture of, like, we're all engineers; we want to be solving problems together, and as you were doing it, we want to do it in such a way that it's still fun, and we're not having the restrictions of titles or roles and other pieces. But I think I digressed. It was really fun to see your commits, though. I do want to track this at some point before we move completely over to GitHub, at least keep this as a record, for what it's worth.

Corey: Yeah, basically, look at this graffiti in the codebase of, “A shit-poster was here,” and here I am. And that tends to be, on some level, the mark we leave on the universe. What's always terrifying is looking at things I did 15 years ago in my first Linux admin job. Can I still ping the thing that I built there? Yes, I can. And how is that even possible? That should not have outlived me; honestly, it should never have seen the light of day in production, but here we are. And you never know how long that temporary kluge you put together is going to last.

Micheal: You know, one of the things I was recalling, I was talking to someone in my team about this topic as well.
We always talk about 10x engineers. I don't know what your thoughts are on that, but the fact that you just mentioned you built something; it still pings. And there's a bunch of things, in my mind, when you are writing code or you're working on some projects, the fact that it can outlast you and live on, I think that's a big, big contribution. And secondly, if your code can actually help up-level, like, ten other people, I think you've really made the mark of 10x engineer at that point.

Corey: Yeah, the idea of the superhuman engineer has always been a strange and dangerous one. If for nothing else, from where I sit, excellence is inherently situational. Like we just talked about, someone at Pinterest is potentially going to be able to have that kind of impact specifically because—to my worldview—there's enough process and things around there that empower them to succeed. Then if you were to take that engineer and drop them into a five-person startup where none of those things exist, they might very well flounder. It's why I'm always a little suspicious of, “This is a startup founded by engineers from Google or Facebook,” or wherever it is.

It's, yeah, and what aspects of that culture do you think are one-to-one matches with the small scrappy startup in the garage? Right, I'm predicting some challenges here. Excellence is always situational. An amazing employee at one company can get fired at a second one for lack of performance, and that does not mean that there's anything wrong with them and it does not mean that they are a fraud. It means that what they needed to be successful was present in one of those shops, but not the other.

Micheal: This is so true. And I really appreciate you bringing this up because whenever we discuss any form of performance management, that is a—in my view personally—I think that's an incorrect term to be using.
It is really at that point in time, either you have outlived the environment you are in, or the environment is going in a different direction where I think your current skill set probably could be best used in the environment where it's going to work. And I know it's very fuzzy at that point, but like you said, yes, excellence really means you don't want to tie it to the number of commits you have pushed out, or any specific aspect of your deliverables or how you work.

Corey: There are no easy answers to any of these things, and it's always situational. It's why I think people are sometimes surprised when I will make comments about the general case of how things should be, then I talk to a specific environment where they do the exact opposite, and I don't yell at them for it. It's there—in a general sense, I have some guidance, but there are usually reasons things are the way they are, and I'm interested in hearing them out. Everything's situational. The worst consultant in the world is the one that shows up, has no idea what's going on, and then asks, “What moron set this up?” Invariably, to said, quote-unquote, “Moron.” And the engagement doesn't go super well from there. It's, “Okay, why is this the way that it is? What constraints shaped it? What was the context behind the problem you were trying to solve?” And, “Well, why didn't you use this AWS service?” “Because it didn't exist for another three years when we were building that thing,” is a—

Micheal: Yes.

Corey: —common answer.

Micheal: Yes, you should definitely appreciate that of all the decisions that have been made in the past. People tend to always forget why they were made. You're absolutely right; what worked back then will probably not work now, or vice versa, and it's always situational. So, I think I can go on about this for hours, but I think you hit the point, Corey.

Corey: Yeah, I do my best.
I want to thank you for taking another block of time out of your day to wind up talking with me about various aspects of what it takes to effectively achieve better levels of engineering productivity at large companies, with many teams, working on shared codebases. If people want to learn more about what you're up to, where can they find you?

Micheal: I'm definitely on Twitter. So, please note that I'm spelled M-I-C-H-E-A-L on Twitter. So, you can definitely read my tweets there. But otherwise, you can always reach out to me on LinkedIn, too.

Corey: Fantastic, and we will, of course, include a link to that in the [show notes 00:44:02]. Thanks once again for your time. I appreciate it.

Micheal: Thanks a lot, Corey.

Corey: Micheal Benedict, head of engineering productivity at Pinterest. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment telling me that you work at Pinterest, have looked at the codebase, and would very much like a refund and an apology.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
This week Gerhard is chatting with Romano Roth, Head of DevOps at Zühlke, a company founded by Gerhard Zühlke in 1968. Nowadays they help companies all over the world build, ship and run anything from factory robots, to AI assistants in complex regulatory environments, and even medical devices that perform autonomous robotic surgery. When Romano is not leading a team of 30 software engineers that specialise in operations, infrastructure and cloud, he is one of the organisers of DevOps Days Zürich, and also the DevOps Meetup group which is how Gerhard and Romano met in 2019. Having started his career as a .Net developer back in 2002, Romano had his fair share of dev and ops challenges, and he always enjoys seeing real business value delivered continuously in an automated way. In recent years, Romano's perspective broadened, and now he sees DevOps challenges across many companies. If you are curious about what good DevOps looks like, and what are the real challenges, then Romano has some good insights for you.
01:42 - Rin's Superpower: Writing, Public Speaking, and Being Neurodivergent + Awesome!

02:18 - GitHub Actions (https://github.com/features/actions)
* Concurrent Actions
* CI/CD (Continuous Integration, Continuous Deployment)
* Security
* Trivy (https://aquasecurity.github.io/trivy/v0.17.0/)
* Building Secure Open Source Communities From the Ground Up (https://www.youtube.com/watch?v=DtDitJyd-3s)
* Camunda Community Hub (https://github.com/camunda-community-hub)
* community-action-maven-release (https://github.com/camunda-community-hub/community-action-maven-release)

07:47 - Improving Developer Experience
* Kubernetes Community Contributor Experience Special Interest Group (https://github.com/kubernetes/community/blob/master/sig-contributor-experience/README.md)
* Contributing Code
* Kubernetes.dev (https://www.kubernetes.dev/)

11:33 - Neurodivergence + Autistic Burnout
* A Vulnerable Tale About Burnout - Julia Simon (https://www.youtube.com/watch?v=lpiXbfOTNYw)
* CNCF Slack (https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/attend/slack-guidelines/)
* Articles From Rin (https://muckrack.com/kiran-oliver/articles)
* John K. Sawers: Hacking Your Emotional API (https://www.youtube.com/watch?v=OGDRUI8biTc)
* CPTSD (https://www.healthline.com/health/cptsd)
* EMDR (https://www.emdr.com/)

17:04 - Mentoring and Reviewing for Kubernetes (https://kubernetes.io/)
* KubeCon + CloudNativeCon (https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/)

20:49 - Open Source Contribution
* Paying Maintainers
* Getting Hired Based on Contributions
* Getting Started with DevOps/DevSecOps Contributing
* MiniKube (https://minikube.sigs.k8s.io/docs/start/)
* The Diana Initiative (https://www.dianainitiative.org/)
* Trivy (https://aquasecurity.github.io/trivy/v0.17.0/)
* Auditing

29:04 - Mentoring (Cont'd)
* Pod Mentoring (https://github.com/kubernetes/community/blob/master/mentoring/programs/mentoring-events.md)
* Ruby Central Scholarship Program (http://rubycentral.org/scholarships)

32:46 - Evaluating Open Source Projects: Tips For Newbies
* Contributor Licence Agreements (CLAs)
* Codes of Conduct (CoCs)
* Evaluate the Community

Reflections:
John: Technical Mentorship vs Social Mentorship.
Mando: Providing a welcoming sense of community for people with non-traditional backgrounds.
Rin: Being intentional about helping others, but also helping others means helping yourself.
John 2: The distinction between technical and autistic burnout.

This episode was brought to you by @therubyrep (https://twitter.com/therubyrep) of DevReps, LLC (http://www.devreps.com/). To pledge your support and to join our awesome Slack community, visit patreon.com/greaterthancode (https://www.patreon.com/greaterthancode). To make a one-time donation so that we can continue to bring you more content and transcripts like this, please do so at paypal.me/devreps (https://www.paypal.me/devreps). You will also get an invitation to our Slack community this way.

Transcript: Coming Soon!

Special Guest: Rin Oliver.
Today's Full Stack Journey podcast delves into the open-source Argo project, a collection of tools to help ops team use Kubernetes. Our guest is Hong Wang. He's an Argo maintainer and co-founder and CEO of Akuity, a for-profit company built around Argo. The post Full Stack Journey 060: The Argo Project – Tools To Help You Run Kubernetes appeared first on Packet Pushers.
About Matthew

Matthew Prince is co-founder and CEO of Cloudflare. Cloudflare's mission is to help build a better Internet. Today the company runs one of the world's largest networks, which spans more than 200 cities in over 100 countries. Matthew is a World Economic Forum Technology Pioneer, a member of the Council on Foreign Relations, winner of the 2011 Tech Fellow Award, and serves on the Board of Advisors for the Center for Information Technology and Privacy Law. Matthew holds an MBA from Harvard Business School where he was a George F. Baker Scholar and awarded the Dubilier Prize for Entrepreneurship. He is a member of the Illinois Bar, and earned his J.D. from the University of Chicago and B.A. in English Literature and Computer Science from Trinity College. He's also the co-creator of Project Honey Pot, the largest community of webmasters tracking online fraud and abuse.

Links:
Cloudflare: https://www.cloudflare.com
Blog post: https://blog.cloudflare.com/aws-egregious-egress/
Bandwidth Alliance: https://www.cloudflare.com/bandwidth-alliance/
Announcement of R2: https://blog.cloudflare.com/introducing-r2-object-storage/
Blog.cloudflare.com: https://blog.cloudflare.com
Duckbillgroup.com: https://duckbillgroup.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Writing ad copy to fit into a 30 second slot is hard, but if anyone can do it the folks at Quali can. Just like their Torque infrastructure automation platform can deliver complex application environments anytime, anywhere, in just seconds instead of hours, days or weeks.
Visit Qtorque.io today and learn how you can spin up application environments in about the same amount of time it took you to listen to this ad.

Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate: is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards, while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other, which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at Honeycomb.io/screaminginthecloud. Observability, it's more than just hipster monitoring.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. Today, my guest is someone I feel a certain kinship with, if for no other reason than I spend the bulk of my time antagonizing AWS incredibly publicly. And my guest periodically descends into the gutter with me to do the same sort of things. The difference is that I'm a loudmouth with a Twitter account and Matthew Prince is the co-founder and CEO of Cloudflare, which is, of course, publicly traded. Matthew, thank you for deigning to speak with me today. I really appreciate it.

Matthew: Corey, it's my pleasure, and appreciate you having me on.

Corey: So, I'm mostly being facetious here, but not entirely, in that you have very publicly and repeatedly called out some of the same things I love calling out, which is AWS's frankly egregious egress pricing. In fact, that was a title of a blog post that you folks put out, and it was so well done I'm ashamed I didn't come up with it myself years ago.
But it's something that is resonating with a large number of people in very specific circumstances as far as what their company does. Talk to me a little bit about that. Cloudflare is a CDN company and increasingly looking like something beyond that. Where do you stand on this? What got you on this path?

Matthew: I was actually searching through really old emails to find something the other day, and I found a message from all the way back in 2009, so actually even before Michelle and I had come up with a name for Cloudflare. We were really just trying to understand the pricing on public clouds and breaking it all down. How much does the compute cost? How much does storage cost? How much does bandwidth cost?

And we kept running the numbers over and over and over again, and the storage and compute costs actually seemed relatively reasonable and you could understand it, but the economics behind the bandwidth just made no sense. It was clear that as bandwidth usage grew and you got scale that your costs eventually effectively went to zero. And I think it was that insight that led to us starting Cloudflare. And the self-service plans at Cloudflare have always been unlimited bandwidth, and from the beginning, we didn't charge for bandwidth. People told us at the time we were crazy to not do that, but I think that that realization, that over time and at scale, bandwidth costs do go to zero is really core to who Cloudflare is.

Cloudflare launched a little over 11 years ago now, and we've watched the various public clouds, and AWS in particular, over that same 11 years not only not follow the natural price of bandwidth down, but really hold their costs steady. At some point, we've got a lot of mutual customers and it's a complaint that we hear from our mutual customers all the time, and we decided that we should do something about it.
And so that started four years ago, when we launched the Bandwidth Alliance, and worked with almost all the major public clouds with the exception of Amazon, to say that if someone is sending traffic from a public cloud network to Cloudflare's network, we're not going to charge them for the bandwidth. It's going across a piece of fiber optic cable that yeah, there's some cost to put it in place and maybe there's some maintenance costs associated with it, but there's not—

Corey: And the equipment at the end costs money, but it's not cloud cost; it just cost on a per second, every hour of your lifetime basis. It's a capital expense that is amortized across a number of years et cetera, et cetera.

Matthew: And it's a fixed cost. It's not a variable cost. You put that fiber optic cable and you use a port on a router on each side. There's cost associated with that, but it's relatively de minimis. And so we said, “If it's not costing us anything and it's not costing a cloud provider anything, why are we charging customers for that?”

And I think it's an argument that resonated with almost every other provider that was out there. And so Google discounts traffic when it's sent to us, Microsoft discounts traffic when it's sent to us, and we just announced that Oracle has joined this, discounting their traffic, which was already some of the most cost-effective bandwidth from any cloud provider.

Corey: Oh, yeah. Oracle's fantastic. As you announced, I believe today, the fact that they're joining the Bandwidth Alliance is both fascinating and also, on some level, “Okay. It doesn't matter as much because their retail starting cost is 10% of Amazon's.” You have to start pushing an awful lot of traffic relative to what you would do on AWS before it starts to show up. It's great to see.

Matthew: And the fact that they're taking that down to effectively zero if you're using us is even better, right?
And I think it again just illustrates how Amazon's really alone in this at being so egregious in how they do that. And when we've done the math to calculate what their markups are, it's almost 80 times what reasonable assumptions on what their wholesale costs are. And so we really do believe in fighting for our customers and being customer-centric, and this seems like a place where—again, Amazon provides an incredible service and so many things, but the data transfer costs are just completely outrageous. And I'm glad that you're calling them out on it, and I'm glad we're calling them out on it and I think increasingly they look isolated and very anti-customer.

Corey: What's interesting to me is that ingress to AWS and all the large public tier-one cloud providers is free. Which has led, I think, to the assumption—real or not—that bandwidth doesn't actually cost anything, whereas going outbound, all I can assume is that one day, some Amazon VP was watching a rerun of Meet the Parents and they got to the line where Ben Stiller says, “Oh, you can milk anything with nipples,” and said, “Holy crap. Our customers all have nipples; we can milk them with egress charges.” And here we are. As much as I think the cloud empowers some amazing stuff, the egress charges are very much an Achilles heel to a point where it starts to look like people won't even consider public cloud for certain workloads based upon that.

People talk about how Netflix is a great representation of the ideal AWS customers. Yeah, but they don't stream a single byte to customers from AWS. They have their own CDN called Open Connect that they put all around the internet, specifically for that use case because it would bankrupt them otherwise.

Matthew: If you're a small customer, bandwidth does cost something because you have to pay someone to do the work of interconnecting with all of the various networks that are out there.
If you start to be, though, a large customer—like a Cloudflare, like an AWS, like an Azure—that is sending serious traffic to the internet, then it starts to actually be in the interest of ISPs to directly interconnect with you, and the costs of your bandwidth over time will approach zero. And that's just the economic reality of how bandwidth pricing works. I think that the confusion, to some extent, comes from all of us having bought our own home internet connection. And I think that because you get less bandwidth up in most internet connections than you get down, people think that there's some physics which is associated with that.

And there are; that turns out just to be the legacy of the cable system that was really designed to send pictures down to your—

Corey: It wasn't really a listening post. Yeah.

Matthew: Right. And so they have dedicated less capacity for up and again, in home network connections, that makes a ton of sense, but that's not how internet connections work globally. In fact, you pay—you get a symmetric connection. And so if they can demonstrate that it's free to take the traffic in, we can't figure out any reason that's not simply about customer lock-in why you would charge to take data out, but you wouldn't charge to put it in. Because it actually costs more to write data to a disk than to read it from a disk.

And so by all reasonable accounts, if they were actually charging based on what their costs were, they would charge for ingress but they wouldn't charge for egress. But the approach that we've taken is to say, “For standard bandwidth, we just aren't going to charge for it.” And we do charge if you use our premium routing services, which is something called Argo, but even then it's relatively cheap compared with what is just standard kind of internet connectivity that's out there.
And as we see more of the clouds like Microsoft and Google and Oracle show that this is a place where they can be much more customer-centric and customer-friendly, over time I'm hopeful that will put pressure on Amazon and they will eliminate their egress fees.

Corey: People also tend to assume that when I talk about this, that I'm somehow complaining about the level of discounting or whatnot, and they yell at me and say, “Oh, well, you should know by now, Corey, that no one at significant scale pays retail pricing.” “Thanks, professor. I appreciate that, but four years ago or so, I sat down with a startup founder who was sketching out the idea for a live video streaming service and said, ‘There's something wrong with my math because if I built this on AWS—which he knew very well, incidentally—it looks like it would cost me at our scale of where we're hoping to hit $65,000 a minute.’” And I checked and yep, sure enough, his math was not wrong, so he obviously did not build his proof of concept on top of AWS. And the last time I checked, they had raised several hundred million dollars in a bunch of different funding rounds.

That is a company now that will not be on AWS because it was never an option. I want to talk as well about your announcement of R2, which is just spectacular. It is—please correct me if I get any of this wrong—it's an object store that lives in your existing distributed-points-of-presence-slash-data-centers-slash-colo-slash-a-bunch-of-computers-in-fancy-warehouse-rooms-where-the-lights-are-always-on-and-it's-always-cold-and-noisy. And people can store data there—

Matthew: [crosstalk 00:10:23] aisles it's cold; in the other aisles, it's hot. But yes.

Corey: Exactly. But it turns out when you lurk around to the hot aisle, that's not where all the buttons are and the things you're able to plug into, so it's freeze or sweat, and there's never a good answer.
But it's an object store that costs a fair bit less than retail pricing for Amazon S3, or most other object stores out there. Which, okay, great. That's always good to see competition in the storage space, but specifically, you're not charging any data transfer costs whatsoever for doing this. First, where did this come from?

Matthew: So, we needed it ourselves. I think all of the great products at Cloudflare start with an internal need. If you look at why do we build our zero-trust solutions? It's because we said we needed a security solution that was fast and reliable and secure to protect our employees as they were going out and using the internet.

Why did we build Cloudflare Workers? Because we needed a very flexible compute platform where we could build systems ourselves. And that's not unique to us. I mean, why did Amazon build AWS? They built it because they needed those tools in order to continue to grow and expand as quickly as possible.

And in fact, I think if you look at the products that Google makes that are really great, it ends up being the ones that Google's employees use themselves. Gmail started as Caribou once upon a time, which was their internal email system. And so we needed an object store and the sometimes belligerent CEO of Cloudflare insisted that our team couldn't use any of the public cloud object stores. And so we had to build it.

That was the start of it and we've been using it internally for products over time. It powers, for example, Cloudflare Images, it powers a lot of our streaming video services, and it works great. And at some point, we said, “Can we take this and make it available to everyone?” The question that you've asked on Twitter, and I think a lot of people reasonably ask us, “What's the catch?”

Corey: Well, in my defense, I think it's fair. There was an example that I gave of, “Okay, I'm going to go ahead and keep—because it's new, I don't trust new object stores. Great.
I'm going to do the same experiment twice, keep one as the pure AWS story and the other, I'm just going to add Cloudflare R2 to the mix so that I have to transfer out of AWS once.” For a one gigabyte file that gets shared out for a petabyte's worth of bandwidth, on AWS it costs roughly $52,000 to do that. If I go with the R2 solution, it costs me 13 cents, all of which except for a penny-and-a-half are AWS charges. And that just feels—when you're looking at that big of a gap, it's easy to look at that and think, “Okay, someone is trying to swindle me somewhere. And when you can't spot the sucker, it's probably me. What's the catch?”

Matthew: I guess it's not really a catch; it's an explanation. We have been able to drive our bandwidth costs down low enough that in that particular use case, we have to store the file, and that, again, that—there's a hard disk in there and we replicate it to make sure that it's available so it's not just one hard disk, but it's multiple hard disks in various places, but that, amortized over time, isn't that big a cost. And then bandwidth is effectively zero. And so if we can do that, then that's great.

Maybe a different way of framing the question is like, “Why would we do that?” And I think what we see is that there is an opportunity for customers to be able to use the best of various cloud providers and hook the different parts together. So, people talk about multi-cloud all the time, and for a while, the way that I think people thought about that was you take the exact same workload and you run it in Azure and AWS. That turns out not to be—I mean, maybe some people do that, but it's super rare and it's incredibly hard.

Corey: It has been a recurring theme of most things I say where, by default, that is one of the dumbest things I can imagine.

Matthew: Yeah, that isn't good. But what people do want to do is they want to say, “Listen, there's some really great services that Amazon provides; we want to use those.
And there's some really great services that Azure provides, and we want to use those. And Google's got some great machine learning, and so does IBM. And I want to sort of mix and match the various pieces together.”

And the challenge in doing that is the egress fees. If everyone just had a detente and said there's going to be no egress fees for us to be able to hook these various [pits 00:14:48] together, then you would be able to take advantage of a lot of the different technologies and we would actually get stronger applications. And so the vision of what we're trying to build is how can we be the fabric that can stitch the various cloud providers together so that you can do that. And when we looked at that, and we said, “Okay, what's the path to getting there?” The big place where there's just the meatiest cost on egress fees is object stores.

And so if you could have a centralized object store, and you can say then from that object store, go use whatever the best service is at Amazon, go use whatever the best service is at Google, go use whatever the best service is at Azure, that then allows, I think, actually people to take advantage of the cloud in a way which is what people really should mean when they talk about multi-cloud. Which is, there should be competition on the various features themselves, and you should be able to pick and choose the best of all of the different bits. And I think we as consumers then benefit from that. And so when we're looking at how we can strategically enable that future, building an object store was a real key part of that, and that's part of what we're doing. Now, how do we make money off of that? Well, there's a little bit off the storage, and again, even [laugh]—

Corey: Well, that is the Amazonian answer there. It's like, “Your margin is my opportunity,” is a famous Bezos quote, and I figure you're sitting there saying, “Ah, it would cost $52,000 to do that in Amazon.
Ah, we can make a penny-and-a-half.” That's very Amazonian, you could probably get hired over there with that philosophy.

Matthew: Yeah. And this is a commodity service, just [laugh] storing data. If you look across the history of what Cloudflare has done, in 2014, we made encryption free because it's absurd to pay for math, right? I mean, it's just crazy, right?

Corey: Or to pay for security as a value-add. No, that should be baked into whatever you're doing, in an ideal world.

Matthew: Domain registration. Like, it's writing something down in a ledger. It's a commodity; of course it should go to whatever the absolute cost is. On the other hand, there are things that we do that aren't commodities where we are able to better protect people because we see so much traffic, and we've built the machine learning models, and we've done those things, and so we charge for those things. So commodities, we think, over time go to effectively whatever their cost is, and then the value is in the actual intelligent services that are on top of it.

But an object store is a commodity and so we should be trying to drive that pricing down. And in the case of bandwidth, it's effectively free for us. And so if we can be that fabric that connects the different clouds together, I think that makes sense as a strategy for us and that's why R2 made a ton of sense for us to build and to launch.

Corey: There seems to be a lack of ability for lots of folks, at least on the internet, to imagine a use case other than theirs. I cheated by being a consultant; I get to borrow other people's use cases at a high degree of turnover. But the question I saw raised was, “Well, how many workloads really do that much egress from static objects that don't change? Doesn't sound like there'd be a whole lot of them.” And it's, “Oh, my sweet summer child.
Sure, your app doesn't do a lot of that, but let me introduce it to my friends who are hosting videos on their website, for example, or large images that get accessed a whole bunch of times; things that are written once and then read forever by the internet.”

Matthew: And we sit in a position where, because of the role that Cloudflare plays, where we sit in front of a number of these different cloud providers, we could actually look at the use cases and the data, and then build products in order to solve that. And that's why we started with Workers; that's why we then built the KV store that was on top of that; we built the object store next. And so you can see as we're sort of marching through these things, it is very much being informed by the data that we actually see from real customers. And one of the things that I really like about R2 is in exactly the example that you gave, where you can keep everything in S3; you can set R2 in front of it and put it in slurp mode, and effectively, as those objects get pulled out, it just starts storing them there. And so the migration path is super easy; you don't have to actually change anything about your application, and it will cut your bills substantially.

And so I think that's the right thing to enable a multi-cloud world where, again, it's not that you're running the exact same workload in different places, but you get to take advantage of the really great tech that all of these companies are building and use that. And then the companies will compete on building that tech well. So, it's not just about how do I get the data in and then kind of underinvest in all of the different services that I provide. It's how can we make sure that on a service-by-service basis, you actually are having real competition over time.
And again, I think that's the right thing for customers, and absolutely R2 might not be the right thing for every use case that's out there, but I think that enabling more competition is going to make the cloud better for everyone.

Corey: Oh, yeah. It's always fun hearing it from Amazonians. It's, “You have a service that talks to satellites in orbit. You really think that's a general-purpose thing that every company out there has to deal with?” No. Well, not yet, anyway.

It also just feels to me like their transfer approach is antithetical to almost every other aspect of how they have built their cloud. Amazonians have told me repeatedly, and I believe them, that their network is effectively magic. The fact that you can get near line rate between any two points without melting various [unintelligible 00:20:14] shows that there was significant thought, work, effort, planning, technology, et cetera, put into the network. And I don't dispute that. But if I'm trying to build a workload and put it inside of AWS, I can control how it performs tied to budget; I can have a lot of RAM for things that are memory intensive, or I can have a little RAM; I can have great CPU performance or terrible CPU performance.

The challenge with data transfer is it is uniformly great. “I want to get that data over there super quickly.” Yeah, awesome. I'm fine paying a premium for that. But I have this pile of data right here. I want to get it over there, ideally by Tuesday. There's no good way to do that, even with their Snowball, or Snow Family, devices. When you fill them with data and send them into AWS, yeah, that's great. Then you just pay for the use of the device.

Use them to send data out of AWS, and they tack on an additional per-gigabyte fee for getting the data out. You're trained as a lawyer; you went to the same law school that my wife did, the University of Chicago, which, oh, interesting stories down that path.
But if we look at this, my argument is that the way to do an end-run around this is to sue Amazon for something, and then demand access to the data you have living in their environment during discovery. Make them give it to you for free, though they'd probably find a way to charge for it there, too. It's just a complete lack of vision and lack of awareness because it feels like they're milking a cash cow until it dies.

Matthew: Yeah, they probably would charge for it and you'd also have to pay a lot of lawyers. So, I'm not sure that's the cost [crosstalk 00:21:44]...

Corey: It only works above certain volumes, I figure.

Matthew: I do think that if your pricing strategy is designed to lock people in to prevent competition, then that does create other challenges. And there are certainly some University of Chicago law professors out there that have spent their careers arguing why antitrust laws don't make any sense, but I think that this is definitely one of those areas where you can see very clearly that customers are actually being harmed by the pricing strategy that's there. And the pricing strategy is not tied in any way to the underlying costs which are associated with that. And so I do think that, especially as you see other providers in the space, like Oracle, taking their bandwidth costs to effectively zero, that's the sort of thing that I think will have regulators start to scratch their heads. If tomorrow AWS took egress costs to zero, and as a result R2 was not as advantaged as it is today against them, you know, I think there are a lot of people who would say, “Oh, they showed Cloudflare.” I would do a happy dance because that's the best thing [thing they can do 00:22:52] for our customers.

Corey: Our long-term goals, it sounds like, are relatively aligned. People think that I want to see AWS reign ascendant; people also say I want to see them burning and crashing into the sea, and neither one of those is true.
What I want is, I want someone a few years from now to be doing a startup and trying to figure out which cloud provider they should pick, and I want that to be a hard decision. Ideally, if you wind up reducing data transfer fees enough, it doesn't even have to be only one. There's a story that starts to turn into an actual realistic multi-cloud story that isn't, on its face, ridiculous. But right now, you have to pick a horse and ride it, for a variety of reasons. And I don't like that.

Matthew: It's entirely egress-based. And again, I think that customers are better off if they are able to pick who is the best service at any time. And that is what encourages innovation. And over time, that's even what's good for the various cloud providers because it's what keeps them being valuable and keeps their customers thinking that they're building something which is magical and that they aren't trapped in the decision that they made, which is, when we talk to a lot of the customers today, how they feel. And it's, I think, part of why something like R2 and something like the Bandwidth Alliance has gotten so much attention, because it really touches a nerve on what's frustrating customers today. And if tomorrow Amazon announced that they were eliminating egress fees and going head-to-head with R2, again, I think that's a wonderful outcome. And one that I think is unlikely, but I would celebrate it if it happened.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security.

And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account.
This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers, needed to support the application that you want to build.

With Always Free you can do things like run small-scale applications, or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.

Corey: My favorite is people who don't do research on this stuff. They wind up saying, “Oh, yeah. Cloudflare is saying that bandwidth is a fixed cost. Of course not. They must be losing their shirt on this.”

You are a publicly-traded company. Your gross margins are 76% or 77%, depending upon whether we're talking about GAAP or non-GAAP. Point being, you are clearly not selling this at a loss and hoping to make it up in volume. That's what a VC-backed company does. This is something that is real and accurate.

I want to, on some level, I guess, low-key apologize because I keep viewing Cloudflare through a lens that is increasingly inaccurate, which is as a CDN. But you've had Cloudflare Workers for a while, effectively Functions as a Service that run at the edge, which has this magic aura around it, that do various things, which is fascinating to me. You're launching R2; it feels like you are in some ways aiming at becoming a cloud provider, but instead of taking the traditional approach of building it from the regions outward, you're building it from the outside in. Is that a fair characterization?

Matthew: I think that's right. I think fundamentally what Cloudflare is, is a network. And I remember early on in the pandemic, we did a series of fireside chats with people we thought we could learn from.
And so it was everyone from Andre Iguodala, the basketball player, to Mark Cuban, the entrepreneur, to we had a [unintelligible 00:25:56] governor and all kinds of things. And these were just internal and off the record.

And I got to do one with Eric Schmidt, the former CEO of Google. And I said, “You know, Eric, one of the things that we struggle with is describing what is Cloudflare.” And without hesitation, he said, “Oh, that's easy. You're the network I plug into and don't have to worry about anything else.” And I think that's better than I could say it myself, and I think that's what it is that we fundamentally are: we're the network that fits together.

Now, it turns out that in the process of being that network and enabling that network, we are going to build things like R2, which starts to be an object store and starts to sort of step into some of the cloud provider space. And Workers is really just a way of programming that network in order to do that, but it turns out that there are a bunch of workloads that, if you move them into the network itself, make sense. It's not going to be every workload, but there are a lot of workloads that make sense there. And again, I think that you can actually be very bullish on all of the big public cloud providers and bullish on Cloudflare at the same time because what we want to do is enable the ability for people to mix and match, and change, and be the fabric that connects all of those things together. And so over time, if Amazon says, “We're going to drop egress fees,” it may be that R2 isn't a product that exists (I don't think they're going to do that, so I think it's something that is going to be successful for us and get a lot of new users to us), but fundamentally, I think that where the traditional public clouds think of themselves as the place you put data and you process data, I think we think of ourselves as the place you move data.
And that's somewhat different.

That then translates, as we're building out the different pieces, to where it does feel like we're building from the outside in. And it may be that over time, that put versus move distinction becomes narrower and narrower as we build more and more services like R2, and Durable Objects, and KV, and we're working on a database, and all those things. And it could be that we converge in a similar place.

Corey: One thing I really appreciate about your vision, because it is so atypical these days, is that you aren't trying to build the multifunction printer of companies. You are not trying to be all things to all people in every scenario. Which is impossible to do, but companies are still trying their level best to do it. You are staking out the bounds of where you are willing to start and where you're willing to stop, in a variety of different ways. I would be (how do I put it?) surprised if you at some point in the next five years come out with, “And this is our own database that we have built out that directly competes with the following open-source project that we basically have implemented their API and gone down that particular path.” It does not sound like it is in your core wheelhouse at that point. You don't need, to my understanding, to write your own database engine in order to do what you do.

Matthew: Maybe. I mean, we actually are kind of working on a database because...

Corey: Oh, no, here we go again.

Matthew: [laugh] And yeah, in a couple of different ways. So, the first way is, we want to make sure that if you're using Workers, you can connect to whatever database you want to use anywhere in the world. And that's something that's coming and we'll be there. At the same time, the challenge of distributed computing turns out not to be the computing, it turns out to be the data and figuring out how to... CAP theorem is real, right?
Consistency, Availability, and Partition tolerance; you can pick any two out of the three, but you can't get all three.

And so there's always going to be some trade-off that's there. And so we don't see a lot of good examples. There are some really cool companies that are working on things in the space, but we don't see a lot of really good examples of who has built a database that can be run on a distributed workload system like Cloudflare and do it well. And so our team internally needs that, and so we're trying to figure out how to build it for ourselves, and I would imagine that after we build it for ourselves, if it works the way we expect it will, that will then be something that we open up.

Our motivation and the way we think about products is we need to build the tools for our own team. Our team itself is customer zero, and then some of those things are very specific to us, but every once in a while, when there are functions that make sense for others, then we'll build them as well. And that does maybe risk being the multifunction printer, but again, I think that because the customer for that starts with ourselves, that's how we think about it. And if someone else is making a great tool, we'll use that. But in this case, we don't see anyone that's built a multi-tenant, globally-distributed, ACID-compliant relational database.

Corey: I can't let that pass unchallenged. Sure they have, and you're running it yourself. DNS: the finest database in the world. You stuff whatever you want into TXT records, and now you have taken a finely crafted wrench and turned it into a barely acceptable hammer, which is what I love about doing that terrible approach. Yeah, relational is not going to quite work that way. But...

Matthew: Yes. That's a fancy key-value store, right? And we've had that for a long time.
As we're trying to build those things up, the good news is that, again, we've run data at scale for quite some time and proven that we can do it efficiently and reliably.

Corey: There's a lot that can be said about building the things you need to deliver your product to customers. And maybe a database is a poor example here, but I don't see that your motivation in this space is to step into something completely outside your areas of expertise solely because there's money to be made over there. Well, yeah, fortune passes everywhere. The question is, where are you best positioned to wind up delivering an actual transformative solution to that space, and what parts of it are just rent-seeking, where it's, “Okay, wherever the money is, we're going to go chase that down.”

Matthew: Yeah, we're still a for-profit business, and we've been able to grow revenue well, but I think it is that what motivates us and what drives us comes back to our mission, which is how do you help build a better internet? And you can look at every single thing that we've done, and we try to be very long-term-oriented. So, for instance, when we made encryption free in 2014, the number one reason at the time that people upgraded from the free version of our service to the paid version was that they got encryption with it. And so it was super scary to say, “Hey, we're going to take the biggest feature and give it away for free,” but it was clearly the direction of history and we wanted to be on the right side of history. And we considered it a bug that the internet wasn't built in an encrypted way from the beginning.

So, of course, that was going to head that direction. And so I think that we, and then subsequently Let's Encrypt and a bunch of others, have said it's absurd that you're charging for math. And again, I think that's a good example of how we think about products.
And we want to continue to disrupt ourselves and take the things that once upon a time were reserved for our customers that spend $10 million-plus with us, and we want to keep pushing those things down because, over time, the real opportunity is if you do right by customers, there will be plenty of ways that you can earn some of their budget. And again, we think that is the long-term winning strategy.

Corey: I would agree with this. You're not out there making sneakers and selling them because you see people spend a lot of money on that; you're delivering value for customers. I say this as one of your paying customers. I have zero problem paying you every month like clockwork, and it is the least cloud-like experience because I know exactly what the bill is going to be in advance, which is apparently not how things should be done in this industry, yadda, yadda, yadda. It is a refreshingly delightful experience every time.

The few times I've had challenges with the service, it has almost always been, I'll call it, a documentation gap, where the way it was explained in the formal documentation was not how I conceptualized things, which, again, explaining what these complex things are to folks who are not steeped in certain areas of them is always going to be a challenge. But I cannot think back to a single customer service failure I've had with you folks. I can't look back at any point where you have failed me as a customer, which is a strange thing to say, given how incredibly efficient I am at stumbling over weird bugs.

Matthew: Terrific to have you as a customer. We are hardly perfect and we make mistakes, but one of the things I think that we try to do, and one of the core values of Cloudflare, is transparency. If I think about, like, the original sins of tech, a lot of it is this bizarre secrecy which pervades the entire industry. When we make mistakes, we talk about them, and we explain them.
When there's an error, we don't throw up a white page; we put up a page that has our logo on it because we want to own it.

And that sometimes gets blowback because you're in front of it, but again, I think it's the right thing to do for customers. And I think it's incredibly important. One of the things that's interesting is you mentioned that you know what your bill is going to be. If you go back and look at the history of hosting on the internet, in the early days of internet hosting, it looks a lot like AWS.

Corey: Oh, 95th percentile transit billing; go over for one five-minute segment and boom, your bill explodes. Oh, I remember those days. Unkindly.

Matthew: And it was super complicated. And then what happened is the hosting world switched from this incredibly complicated billing to much more simplified, predictable, unlimited bandwidth with maybe some asterisks, but largely that was in place. And then it's strange that Amazon came along and then has brought us back to the more complicated world that's out there. I would have predicted that that's a sine wave...

Corey: It has to be. I mean...

Matthew: ...and it's going to go back and forth over time. But I would have predicted that we would be more in the direction of coming back toward simplified, everything included. And again, I think that's how we've priced our things from the beginning. I'm surprised that it has held on as long as it has, but I do think that there's going to be an opportunity (and I don't think Amazon will be the leader here) for one of the big clouds.

And again, I think Oracle is probably doing this the best of any of them right now, to say, “How can we go away from that complexity? How can we make bills predictable? How can we not nickel and dime everything, but allow you to actually forecast and budget?” And it just seems like that's the natural arc of history, and we will head back toward that.
And, again, I think we've done our part to push that along. And I'm excited that other cloud providers seem to be thinking about that now as well.

Corey: Oh, yeah. What I do with fixing AWS bills is the same thing folks were doing in the 70s and 80s with long-distance bills for companies. We're definitely hitting that sine wave. I know that if I were at AWS in a leadership role, I would be actively embarrassed that the company that is delivering a better customer experience around financial things is Oracle of all companies, given their history of audits and surprising people and the rest. It is ridiculous to me.

One last topic that I want to cover with you before we call it an episode is, back in college, you had a thesis that you have done an excellent job of effectively eliminating from the internet. And the theme of this, to my understanding, was that the internet is a fad. And I am so aligned with that because I'm someone who has said for years that emerging technologies are fads. I've said it about cloud, about virtualization, about containers. And I just skipped Kubernetes. And now I'm all-in on serverless, which means, of course, it's going to fail because I'm always wrong on these things. But tell me about that.

Matthew: When I was seven years old in 1980, my grandmother gave me an Apple ][+ computer for Christmas. And I took to it like an absolute duck to water and did things that made me very popular in junior high school, like going to computer camp. And my mom used to sign up for continuing education classes at the local university in computer science, and basically sneak me in, and I'd do all the homework and all that. And I remember when I got to college, there was a small group of students that would come around and help other students set their computer up, and I had it all set up and was involved.
And so, I got pretty deeply involved in the computer science program at college.

And then I remember there was a group of three other students, so there were four of us, and they wanted to start an online digital magazine. And at the time, this was pre-web, or right in the early days of the web; it was sort of nineteen… ninety-three. And we built it originally on old Apple technology called HyperCard. And we used to email out the old HyperCard stacks. And the HyperCard stacks kept getting bigger and bigger and bigger, and we'd send them out to the school, so [laugh] we kept crashing the mail servers.

But the college loved this, so they kept buying bigger and bigger mail servers. But at some point, they said, “This won't scale. You got to switch technologies.” And they introduced us to two different groups. One was a printer company based out in San Francisco that had this technology called PDF. And I was a really big fan of PDF. I thought PDF was the future; it was definitely going to be how everything got published.

And then the other was this group of dorky graduate students at the University of Illinois that had this thing called a browser, which was super flaky, and crashed all the time, and didn't work. And so of the four of us, I was the one who voted for PDF and the other three were like, “Actually, I think this HTML thing is going to be a hit.” And we built this. We won an award from Wired, which was only a print magazine at the time, that called us the first online-only weekly publication. And it was such a struggle to get anyone to write for it because browsers sucked and, you know, we were trying to get students on campus, but no one on campus cared.

We would get these emails from the other side of the world, and I remember really clearly this email from Japan, in broken English, saying, “I love the magazine.
Please keep writing more for the magazine.” And I remember thinking at the time, “Why do I care if someone in Japan is reading this if the girl down the hall who I have a crush on isn't?” Which is obviously what motivates dorky college students like myself. And at that same time, you saw all of this internet explosion.

I remember the moment when Netscape went public and just blew through all the expectations. And it was right around the time I was getting ready to graduate from college, and I was kind of just burned out on the entire thing. And I thought, “If I can't even get anyone to write for this dopey magazine and yet we're winning awards, like, this stuff has to all just be complete garbage.” And so I wrote a thesis. Ehh, it was not a very good [laugh] thesis, but one of the things I said was that largely the internet was a fad, and that if it wasn't, it had some real risks because if you enabled everyone to connect with whatever their weird interests and hobbies were, you would very quickly fall to the lowest common denominator. And I predicted some things that haven't come true. I thought for sure that you would have both a liberal and a conservative search engine. And it's a miracle, I think, that to this day that doesn't exist.

Corey: Now that you've said it, of course, it's going to.

Matthew: Well, I don't know, I've… [sigh] we'll see. But it is pretty amazing that Google has been able to, again, thread that line and stay largely apolitical. I'm surprised there aren't more national search engines; the fact that only Russia and China have national search engines and France and Germany don't is just strange to me. It seems like if you're controlling the source of truth and how people find it, that's something that governments would try and take over. There are some things that, in retrospect, look pretty wise, but there were a lot more things that looked really, really stupid.
And so I think at some level, I had to build Cloudflare to atone for that stupidity all those years ago.

Corey: There's something to be said for looking back and saying, “Yeah, I had an opinion, and with the light of new information, I am changing my opinion.” For some reason, in some circles, it feels like that gets interpreted as a sign of weakness, but I couldn't disagree more. It's, “Well, I had an opinion based upon what I saw at the time. Turns out, I was wrong, and here we are.” I really wish more people were capable of doing that.

Matthew: It's one of the things we test for in hiring. And I think the characteristic that describes people who can do that well is really empathy. The understanding that the experiences that you have lead you to have a unique set of insights, but they also create a unique set of blind spots. And it's rare that you find people that are able to do that. And whenever we do, we hire them.

Corey: To that end, as far as hiring and similar topics go, if people want to learn more about how you view things, and how you see the world, and what you're releasing, and maybe even potentially work with you, where can they find you?

Matthew: [laugh]. So, the joke, sometimes, internally at Cloudflare is that Cloudflare is a blogging company that runs this global network just to have something to write about. So, I think we're unlike most corporate blogs. If our corporate blog were typical, we'd have articles on, like, “Here are the top six reasons you need a fast website,” which would just be, you know, shoot me. But instead, I think we write about the things that are going on online and our unique view into them. And we have a core value of transparency, so we talk about that. So, if you're interested in Cloudflare, I'd encourage you, especially if you're of the sort of geekier variety, to check out blog.cloudflare.com, and I think that's a good place to learn about us.
And I still write for that occasionally.

Corey: You're one of the only non-AWS corporate blogs that I pay attention to, for that exact reason. It is not, “Oh, yay. More content marketing by folks who just feel the need to hit a quota as opposed to talking about something valuable and interesting.” So, it's appreciated.

Matthew: The secret to it was we realized at some point that the purpose of the blog wasn't to attract customers, it was to attract potential employees. And it turns out, if you sort of change that focus, then you talk to people like they're peers, and it turns out then that the content that you create is much more authentic. And that turns out to be a great way to attract customers as well.

Corey: I want to thank you for taking so much time out of your day to speak with me. I really appreciate it.

Matthew: Thanks for all you're doing. And we're very aligned, and keep fighting the good fight. And someday, again, we'll eliminate cloud egress fees, and we can share a beer when we do.

Corey: I will absolutely be there for it. Matthew Prince, CEO and co-founder of Cloudflare. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with a rambling comment explaining that while data packets into a cloud provider are cheap and crappy, the ones being sent to the internet are beautiful, bespoke, unicorn snowflakes, so of course they cost money.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
In this episode, we cover:
00:00:00 - Introduction
00:02:45 - Adopting the Cloud
00:08:15 - POC Process
00:12:40 - Infrastructure Team Building
00:17:45 - “Disaster Roleplay”/Communicating to the Non-Technical Side
00:20:20 - Leadership
00:22:45 - Tomas' Horror Story/Dashboard Organization
00:29:20 - Outro

Links:
Productboard: https://www.productboard.com
Scaling Teams: https://www.amazon.com/Scaling-Teams-Strategies-Successful-Organizations/dp/149195227X
Seeking SRE: https://www.amazon.com/Seeking-SRE-Conversations-Running-Production/dp/1491978864/

Transcript

Jason: Welcome to Break Things on Purpose, a podcast about failure and reliability. In this episode, we chat with Tomas Fedor, Head of Infrastructure at Productboard. He shares his approach to testing and implementing new technologies, and his experiences in leading and growing technical teams.

Today, we've got with us Tomas Fedor, who's joining us all the way from the Czech Republic. Tomas, why don't you say hello and introduce yourself?

Tomas: Hello, everyone. Nice to meet you all. My name is Tomas, or call me Tom. I've been working for Productboard for the past two and a half years as an infrastructure leader. All that time, my experience has been in the areas of DevOps, and for the most recent three or four years it has been about management within infrastructure teams. What I'm passionate about, technology-wise, is the cloud: mostly Amazon Web Services, Kubernetes, and Infrastructure as Code such as Terraform. Recently, I've also moved toward security compliance, such as SOC 2 Type 2.

Jason: Interesting. So, a lot of passions there, things that we actually love chatting about on the podcast. We've had other guests from HashiCorp, so we've talked plenty about Terraform. And we've talked about Kubernetes with some folks who are involved with the CNCF. I'm curious, with your experience, how did you first dive into these cloud-native technologies and adopting the cloud? Is that something you went straight for, or is that something you transitioned into?
Is that something you went straight for, or is that something you transitioned into?
Tomas: I actually slow transition to cloud technologies because my first career started at university when I was like, say, half developer and half Unix administrator. And I had experience with building very small data center. So, those times were amazing to understand all the hardware aspects of how it's going to be built. And then later on, I got opportunity to join a very famous startup at Czech Republic [unintelligible 00:02:34] called Kiwi.com [unintelligible 00:02:35]. And that time, I first experienced cloud technologies such as Amazon Web Services.
Jason: So, as you adopted Amazon, coming from that background of a university and having physical servers that you had to deal with, what was your biggest surprise in adopting the cloud? Maybe something that you didn't expect?
Tomas: So, that's great question, and what comes to my mind first, is switching to completely different [unintelligible 00:03:05] because during my university studies and career there, I mostly focused on networking [unintelligible 00:03:13], but later on, you start actually thinking about not how to build a service, but what service you need to use for your use case. And you don't have, like, one service or one use case, but you have plenty of services that can suit your needs and you need to choose wisely. So, that was very interesting, and it needed—and it take me some time to actually adopt towards new thinking, new mindset, et cetera.
Jason: That's an excellent point. And I feel like it's only gotten worse with the, “How do you choose?” If I were to ask you to set up a web service and it needs some sort of data store, at this point you've got, what, a half dozen or more options on Amazon? [laugh].
Tomas: Exactly.
Jason: So, with so many services on providers like Amazon, how do you go about choosing?
Tomas: After a while, we came up with a thing like RFCs.
That's like ‘Request For Comments,' where we tried to sum up all the goals, and all the principles, and all the problems and challenges we try to tackle. And with that, we also tried to validate all the alternatives. And once you went through all these information, you tried to sum up all the possible solutions. You typically had either one or two options, and those options were validated with all your team members or the whole engineering organization, and you made the decision then you try to run POC, and you either are confirmed, yeah this is the technology, or this is service you need and we are going to implement it, or you revised your proposal.
Jason: I really like that process of starting with the RFC and defining your requirements and really getting those set so that as you're evaluating, you have these really stable ideas of what you need and so you don't get swayed by all of the hype around a certain technology. I'm curious, who is usually involved in the RFC process? Is it a select group in the engineering org? Is it broader? How do you get the perspectives that you need?
Tomas: I feel we have very great established process at Productboard about RFCs. It's transparent to the whole organization, that's what I love the most. The first week, there is one or two reporters that are mainly focused on writing and summing up the whole proposal to write down goals, and also non-goals because that is going to define your focus and also define focus of reader. And then you're going just to describe alternatives, possible options, or maybe to sum up, “Hey, okay, I'm still unsure about this specific decision, but I feel this is the right direction.” Maybe I have someone else in the organization who is already familiar with the technology or with my use case, and that person can help me.
So, once—or we call it a draft state, and once you feel confident, you are going to change the status of RFC to open.
The time is open to feedback to everyone, and they typically geared, like, two weeks or three weeks, so everyone can give a feedback. And you have also option to present it on engineering all-hands. So, many engineers, or everyone else joining the engineering all-hands is aware of this RFC so you can receive a lot of feedback. What else is important to mention there that you can iterate over RFCs.
So, you mark it as resolved after through two or three weeks, but then you come up with a new proposal, or you would like to update it slightly with important change. So, you can reopen it and update version there. So, that also gives you a space to update your RFC, improve the proposal, or completely to change the context so it's still up-to-date with what you want to resolve.
Jason: I like that idea of presenting at engineering all-hands because, at least in my experience, being at a startup, you're often super busy so you may know that the RFC is available, but you may not have time to actually read through it, spend the time to comment, so having that presentation where it's nicely summarized for you is always nice. Moving from that to the POC, when you've selected a few and you want to try them out, tell me more about that POC process. What does that look like?
Tomas: So typically, in my infrastructure team, it's slightly different, I believe, as you have either product teams focus on POCs, or you have more platform teams focusing on those. So, in case of the infrastructure team, we would like to understand what code is actually going to be about because typically the infrastructure team has plenty of services to be responsible for, to be maintained, and we try to first choose, like, one specific use case and small use case that's going to suit the need.
For instance, I can share about implementation of HashiCorp Vault, like our adoption. We leveraged firstly only key-value engine for storing secrets.
And what was important to understand here, whether we want to spend hours of building the whole cluster, or we can leverage their cloud service and try to integrate it with one of our services. And we need to understand what service we are going to adopt with Vault.
So, we picked cloud solution. It was very simple, the experience that were seamless for us, we understood what we needed to validate. So, is developer able to connect to Vault? Is application able to connect to Vault? What roles does it offer? Was the difference for cloud versus on-premise solution?
And at the end, it's often the cost. So, in that case, POC, we spin up just cloud service integrated with our system, choose the easiest possible adaptable service, run POC, validate it with developers, and provide all the feedback, all the data, to the rest of the engineering. So, that was for us, some small POC with large service at the end.
Jason: Along with validating that it does what you want it to do, do you ever include reliability testing in that POC?
Tomas: It is, but it is in, like, let's say, it's in a later stage. For example, I can again mention HashiCorp Vault. Once we made a decision to try to spin up first on-premise cluster, we started just thinking, like, how many master nodes do we need to have? How many availability zones do we need to have? So, you are going to follow quorum?
And we are thinking, “Okay, so what's actually the reliability of Amazon Web Services regions and their availability zones? What's the reliability of multi-cross-region? And what actually the expectations that is going to happen? And how often they happen? Or when in the past, it happened?”
So, all those aspects were considered, and we ran out that decision. Okay, we are still happy with one region because AWS is pretty stable, and I believe it's going to be. And we are now successfully running with three availability zones, but before we jumped to the conclusion of having three availability zones, we run several tests.
So, we make sure that in case one availability zone being down, we are still fully able to run HashiCorp Vault cluster without any issues.
Jason: That's such an important test, especially with something like HashiCorp Vault because not being able to log into things because you don't have credentials or keys is definitely problematic.
Tomas: Fully agree.
Jason: You've adopted that during the POC process, or the extended POC process; do you continue that on with your regular infrastructure work continuing to test for reliability, or maybe any chaos engineering?
Tomas: I actually measure something about what we are working on, like, what we have so far improved in terms of post-mortem process that's interesting. So, we started two-and-a-half year ago, and just two of us as infrastructure engineers. At the time, there was only one incident response on-call team, our first iteration within the infrastructure team was with migration from Heroku, where we ran all our services, to Amazon Web Services. And that time, we needed to also start thinking about, okay, the infrastructure team needs to be on call as well. So, that required to update in the process because until then, it works great; you have one team, people know each other, people know the whole stack. Suddenly, you are going to add new people, you're going to add new people a separate team, and that's going to change the way how on-call should be treated, and how the process should look like.
You may ask why. You have understanding within the one team, you understand the expectations, but then you have suddenly different skill set of people, and they are going to be responsible for different part of the technical organization, so you need to align the expectation between two teams. And that was great because guys at Productboard are amazing, and they are always helpful. So, we sat down, we made first proposal of how new team is going to work like, what are going to be responsibilities.
We took inspirations from the already existing on-call process, and we just updated it slightly.
And we started to run with first test scenarios of being on call so we understand the process fully. Later on, it evolved to more complex process, but it's still very simple. What is more complex: we have more teams that's first thing being on call; we have better separation of all the alerts, so you're not going to route every alert to one team, but you are able to route it to every team that's responsible for its service; the team have also prepared a set of runbooks so anyone else can easily follow runbook and fix the incident pretty easily, and then we also added section about post-mortems, so what are our expectations of writing down post-mortem once incident is resolved.
Jason: That's a great process of documenting, really—right—documenting the process so that everybody, whether they're on a different team and they're coming over or new hires, particularly, people that know nothing about your established practices can take that runbook and follow along, and achieve the same results that any other engineer would.
Tomas: Yeah, I agree. And what was great to see that once my team grew—we are currently five and we started two—we saw excitement of the team members to update the process so everybody else we're going to join the on-call is going to be excited, is going to take it as an opportunity to learn more. So, we added disaster roleplay, and that section talks about you are new person joining on-call team, and we would like to make sure you are going to understand all the processes, all the necessary steps, and you are going to be aligned with all the expectations. But before you will actually going to have your first alerts of on-call, we would like to try to run roleplay. Imagine what a HashiCorp Vault cluster is going down; you should be the one resolving it.
So, what are the first steps, et cetera?
And that time you're going to realize whatever is being needs to be done, it's not only from a technical perspective, such as check our go to monitoring, check runbook, et cetera, but also communication-wise because you need to communicate not only with your shadowing buddy, but you also need to communicate internally, or to the customers. And that's going to change the perspective of how an incident should be handled.
Jason: That disaster roleplay sounds really amazing. Can you chat a little bit more about the details of how that works? Particularly you mentioned engaging the non-technical side—right—of communication with various people. Does the disaster roleplay require coordinating with all those people, or is it just a mock, you would pretend to do, but you don't actually reach out to those people during this roleplay?
Tomas: So, we would like to also combine the both aspects. We would like to make sure that person understands all the communication channels that are set within our organization, and what they are used for, and then we would like to make sure that that person understand how to involve other engineers within the organization. For instance, what was there the biggest difference is that you have plenty of options how to configure assigning or creating an alert. And so for those, you may have a different notification settings. And what happened is that some of the people have settings only for newly created alert, but when you made a change of assigned person of already existing alert, someone else, it might happen that that person didn't notice it because the notification setting was wrong. So, we encountered even these kind of issues and we were able to fix it, thanks to disaster roleplay.
So, that was amazing to be found out.
Jason: That's one of the favorite things that I like to do when we're using chaos engineering to do a similar thing to the disaster roleplay, is to really check those incident response processes, and validating those alerts is huge. There's so many times that I've found that we thought that someone would be alerted for some random thing, and turns out that nobody knew anything was going on. I love that you included that into your disaster roleplay process.
Tomas: Yeah, it was also great experience for all the engineers involved. Unfortunately, we run it only within our team, but I hope we are going to have a chance to involve all other engineering on-call teams, so the onboarding experience to the engineering on-call teams is going to rise and is going to be amazing.
Jason: So, one of the things that I'm really interested in is, you've gone from being a DevOps engineer, an SRE individual contributor role, and now you're leading a small team. I think a lot of folks, as they look at their career, and I think more people are starting to become interested in this is, what does that progression look like? This is sort of a change of subject, but I'm interested in hearing your thoughts on what are the skills that you picked up and have used to become an effective technical leader within Productboard? What's some of that advice that our listeners, as individual contributors, can start to gain in order to advance where they're going with their own careers?
Tomas: Firstly, it's important to understand what makes you passionate in your career, whether it's working with people, understanding their needs and their future, or you would like to be more on track as individual contributor and you would like to enlarge your scope of responsibilities towards leading more technical complex initiatives, that are going to take a long time to be implemented.
In case all the infrastructure, or in case of the platform leaders, I would say the position of manager or technical leader also requires certain technical knowledge so you can be still in close touch with your team or with your most senior engineers, so you can set the goals and set the strategic clearly. But still, it's important to be, let's say, people person and be able to listen because in that case, people are going to be more open to you, and you can start helping them, and you can start making their dreams true and achievable.
Jason: Making their dreams true. That's a great take on this idea because I feel like so many times, having done infrastructure work, that you start to get a mindset of maybe that people just are making demands of you, all the time. And it's sometimes hard to keep that perspective of working together as a team and really trying to excel to give them a platform that they can leverage to really get things done. We were talking about disaster roleplaying, and that naturally leads to a question that we like to ask of all of our guests and that's, do you have any horror stories from your career about an incident, some horror story or outage that you experienced and what you've learned from it?
Tomas: I have one, and it actually happened at the beginning of my career of DevOps engineer. What is interesting here that it was one of the toughest incidents I experienced. It happened after midnight. So, the time I was still new to a company, and we have received an alert informing about too many 502, 504 errors written from API.
At the time API process thousands of requests per second, and the incident had a huge impact on the services we were offering.
And as I was shadowing my on-call buddy, I tried to check our main alerting channel, see what's happening, what's going on there, how can I help, and I started with checking monitoring system, reviewing all the reports from the engineers of being on-call, and I initiated the investigation on my own. I realized that something is wrong or something is not right, and I realized I was just confused and I want sleep, so it took me a while to get back on track. So, I made the side note, like, how can I start my brain to be working as during the day? And then I got back to the incident resolution process.
So, it was really hard for me to start because I didn't know what [unintelligible 00:24:27] you knew about the channel, you knew about your engineers working on the resolution, but there were plenty of different communication funnels. Like, some of the engineers were deep-focused on their own investigation, and some of them were on call. And we needed to provide regular updates to the customers and internally as well. I had that inner feeling of let's share something, but I realized I just can't drop a random message because the message with all the information should have certain format and should have certain information. But I didn't know what kind of information should be there.
So, I tried to ping someone, so, “Hey, can you share something?” And in the meantime, actually, more other people send me direct message. And I saw there are a lot of different tracks of people who tried to solve the incident, who tries to provide the status, but we were not aligned. So, this all showed me how important is to have proper communication funnel set. And we got the lucky to actually end up in one channel, we got lucky to resolve incident pretty quickly.
And what else I learned that I would recommend to make sure you know where to work.
I know it's pretty obvious sentence, but once your company has plenty of dashboards and you need to find one specific metric, sometime it looks like mission impossible.
Jason: That's definitely a good lesson learned and feeds back to that disaster roleplays, practicing how you do those communications, understanding where things need to be communicated. You mentioned that it can be difficult to find a metric within a particular dashboard when you have so many. Do you have any advice for people on how to structure their dashboards, or name their dashboards, or organize them in a certain way to make that easier to find the metric or the information that you're looking for?
Tomas: I will have a different approach, and that do have basic dashboard that provides you SLOs of all the services you have in the company. So, we understand firstly what service actually impacts the overall stability or reliability. So, that's my first advice. And then you should be able to either click on the specific service, and that should redirect you to its dashboard, or you're going to have starred one of your favorite dashboards you have. So, I believe the most important is really have one main dashboard where you have all the services and their stability resourced, then you have option to look.
Jason: Yeah, when you have one main dashboard, you're using that as basically the starting point, and from there, you can branch out and dive deeper, I guess, into each of the services.
Tomas: Exactly, exactly true.
Jason: I like that approach. And I think that a lot of modern dashboarding or monitoring systems now, the nice thing is that they have that ability, right, to go from one particular dashboard or graphic and have links out to the other information, or just click on the graph and it will show you the underlying host dashboard or node dashboard for that metric, which is really, really handy.
Tomas: And I love the connection with other monitoring services, such as application monitoring.
That gives you so much insight and when it's even connected with your work management tool is amazing so you can have all the important information in one place.
Jason: Absolutely. So, oftentimes we talk about—what is it—the three pillars of observability, which I know some of our listeners may hate that, but the idea of having metrics and performance monitoring/APM and logs, and just how they all connect to each other can really help you solve a lot, or uncover a lot of information when you're in the middle of an incident. So Tomas, thanks for being on the show. I wanted to wrap up with one more question, and that's do you have any shoutouts, any plugs, anything that you want to share that our listeners should go take a look at?
Tomas: Yeah, sure. So, as we are talking about management, I would like to promote one book that helped make my career, and that's Scaling Teams. It's written by Alexander Grosse and David Loftesness.
And another one book is from Google, they have, like, three series, one of those is Seeking SRE, and I believe other parts are also useful to be read in case you would like to understand whether your organization needs SRE team and how to implement it within organization, and also, technically.
Jason: Those are two great resources, and we'll have those linked in the show notes on the website. So, for anybody listening, you can find more information about those two books there. Tomas, thanks for joining us today. It's been a pleasure to have you.
Tomas: Thanks. Bye.
Jason: For links to all the information mentioned, visit our website at gremlin.com/podcast. If you liked this episode, subscribe to the Break Things on Purpose podcast on Spotify, Apple Podcasts, or your favorite podcast platform. Our theme song is called “Battle of Pogs” by Komiku, and it's available on loyaltyfreakmusic.com.
Please liberate my friends and keep the Zuckerberg torture to a minimum we can explore the stars and visit other dimensions but i'm not sure if we should continue using Kubernetes i will do my best never to betray the cause of freedom The post Kubernetes seems to represent slavery appeared first on Software Engineering Daily.
In this episode Mike catches up with Robert Hodges, CEO of Altinity, to discuss data warehouses such as ClickHouse, Big Data, and Kubernetes.
Follow Robert on LinkedIn: https://www.linkedin.com/in/berkeleybob2105/
OSA CON 2021: https://altinity.com/osa-con-2021/
Fully Managed ClickHouse in the Public Cloud: https://altinity.com/
This week, Gunna Marripudi and Arindam Banjeree join us to discuss the new Astra Data Store offering and how presenting NFS storage locally to Kubernetes clusters can create new possibilities.
In this episode I talk with Aaron Alderman, Director and Founder at Tanks! For Playing. Tanks! For Playing aims to revolutionise the blockchain gaming space with a novel concept where gameplay is based around social connection and strategy that is fuelled by the ambition to win cash prizes in tournaments. #Tanks ☑️ Technology and Technology Partners Mentioned: #Blockchain, #Web3, Web2.0, NodeJS, Amazon, Kubernetes, React, Ethereum, Binance Smart Chain, Polymatic, Solidity, EVM, Uniswap, Quickswap, Pancakeswap, Coin MarketCap, Coin Gecko, NFTs ☑️ Raw Talking Points: * Blockchain gaming * Bootstrapping a new company through ICO ($441,755) * TANKS Token * Burns (what is burning) * Multi-chain * Web3 vs Web2 Technologies * The game itself, written in Unity (mobile or in browser) * Investors vs players ... target market * What does it mean for the game to be fully on-chain * NFTs in Blockchain gaming * Hosting applications/games for Web3 and the technology behind that ☑️ Web: https://tanksforplaying.io/ ☑️ Whitepaper: https://storage.googleapis.com/tanks-... ☑️ Gameplay Video: https://www.youtube.com/watch?v=iJPHl... ☑️ Interested in being on #GTwGT? Contact via Twitter @GTwGTPodcast ☑️ Music: https://www.bensound.com
This week we discuss the state of serverless, the agility equation and Twitter goes blue. Plus, what exactly happens in an Internet Minute…? Rundown The Unfulfilled Promise of Serverless (https://www.lastweekinaws.com/blog/the-unfulfilled-promise-of-serverless) Twitter will now let you pay to undo tweets and read ad-free news in the US (https://www.theverge.com/2021/11/9/22766286/twitter-blue-subscription-service-scroll-nuzzel-undo-tweets-ad-free-articles-us?scrolla=5eb6d68b7fedc32c19ef33b4) From Amazon to Zoom: What Happens in an Internet Minute In 2021? (https://www.visualcapitalist.com/from-amazon-to-zoom-what-happens-in-an-internet-minute-in-2021/) Clubhouse rolls out Replay to let users record live rooms and share them later (https://techcrunch.com/2021/11/08/clubhouse-record-room-replay/) Relevant to your interests HashiCorp: Benchmarking the S-1 Data (https://cloudedjudgement.substack.com/p/hashicorp-benchmarking-the-s-1-data) Cloudflare Announces Third Quarter 2021 Financial Results (https://finance.yahoo.com/news/cloudflare-announces-third-quarter-2021-201500627.html) Hello, world!
Meet Kyndryl, the services biz spun-out by IBM (https://www.theregister.com/2021/11/04/kyndryl_ibm_spinoff/) Google is working on a more user-friendly way to find files in Drive (https://www.theverge.com/2021/11/4/22763374/google-drive-search-chips-filters-beta-test) Habitica - Gamify Your Life (https://habitica.com/static/home) Red Hat to hire fewer senior engineers after budget frozen (https://www.theregister.com/2021/11/05/red_hat_jobs/) Datadog Acquires Ozcode - Ozcode (https://oz-code.com/blog/general/datadog-acquires-ozcode) What I Talk About When I Talk About Platforms (https://martinfowler.com/articles/talk-about-platforms.html) To reinvent work, we have to destroy the clock (https://thehustle.co/to-reinvent-work-we-have-to-destroy-the-clock/) We've gone backwards since the Bronze Age with calendars (https://www.theregister.com/2021/11/08/calendar_backwards/) AWS Announces Plans to Open Second Region in Canada (https://www.businesswire.com/news/home/20211108005823/en/AWS-Announces-Plans-to-Open-Second-Region-in-Canada) Security software company McAfee acquired for $14 billion (https://www.theverge.com/2021/11/8/22769910/mcafee-private-investor-group-acquisition-software) Unity is buying Peter Jackson's Weta Digital for over $1.6B (https://techcrunch.com/2021/11/09/unity-is-buying-peter-jacksons-weta-digital-for-over-1-6b/) Amazon's Cloud's New Boss Is Girding to Defend Turf in the Field Company Pioneered (https://www.wsj.com/articles/amazons-cloud-boss-is-girding-to-defend-turf-in-the-field-company-pioneered-11636300800?st=5gwylbkkz0bug3s&reflink=article_email_share) GE to break up into 3 companies focusing on aviation, health care and energy (https://www.cnbc.com/2021/11/09/ge-to-break-up-into-3-companies-focusing-on-aviation-healthcare-and-energy.html) You're scheduling too many 1:1 meetings (https://www.protocol.com/workplace/one-on-one-meetings) Nonsense Elon Musk faces a $15 billion tax bill, which is likely the real reason he's selling stock 
(https://www.cnbc.com/2021/11/07/elon-musk-faces-a-15-billion-tax-bill-which-is-likely-the-real-reason-hes-selling-stock.html) Sponsors strongDM — Manage and audit remote access to infrastructure. Start your free 14-day trial today at strongdm.com/SDT (http://strongdm.com/SDT) CBT Nuggets — Training available for IT Pros anytime, anywhere. Start your 7-day Free Trial today at cbtnuggets.com/sdt (https://cbtnuggets.com/sdt) Conferences Coté speaking at DevOops (https://devoops.ru/en/) (Russia), Nov 11th: “Kubernetes is not for developers…?” (https://devoops.ru/en/talks/kubernetes-is-not-for-developers/) THAT Conference comes to Texas January 17-20, 2022 (https://that.us/events/tx/2022/) — Now with the right link Brandon going to AWS Re:invent join the meetupIRL channel in SDT Slack Promote SDT TikTok, mentioned some stuff gets edited out Listener Feedback Jordan wants you to work as an SRE at Shopify (http://Senior> Site Reliability Engineer - Remote, Americas). Fully remote in Americas, Hawaii, APAC and EMEA SDT news & hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Send your postal address to email@example.com (mailto:firstname.lastname@example.org) and we will send you free laptop stickers! Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), LinkedIn (https://www.linkedin.com/company/software-defined-talk/) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Brandon built the Quick Concall iPhone App (https://itunes.apple.com/us/app/quick-concall/id1399948033?mt=823) and he wants you to buy it for $0.99. Use the code SDT to get $20 off Coté's book, (https://leanpub.com/digitalwtf/c/sdt) Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! 
Recommendations Brandon: macOS disk size utility (http://grandperspectiv.sourceforge.net) Matt: Cloud Native AF #5: (https://www.cloudnativeaf.com/5) Charming Pirates and the Funnel to Contributors Photo Credits Banner (https://unsplash.com/photos/-mwNJswDlXE) Show Art (https://unsplash.com/photos/5mZ_M06Fc9g) Internet Minute Graphic (https://www.visualcapitalist.com/from-amazon-to-zoom-what-happens-in-an-internet-minute-in-2021/)
Hi, Spring fans! In this installment Josh Long (@starbuxman) talks to Apache Artemis and messaging legend Clebert Suconic (@clebertsuconic). Check out ArtemisCloud.io for easy Kubernetes deployments of Apache Artemis.
In this episode, Gerhard is joined by Cyrille Le Clerc, Product Manager Lead on Observability at Elastic, and Oleg Nenashev, Principal Engineer at CloudBees. It all started with Oleg's tweet back in July, in which he was promoting Akihiro Kiuchi's work on Jenkins monitoring with OpenTelemetry. This was done in the context of Google's Summer of Code - a link to Akihiro's demo is in the show notes. As you may remember from episode 20, instrumenting our changelog.com pipeline is on Gerhard's mind, and this conversation helped him clarify a few things. If you are thinking of instrumenting your CI/CD pipeline with OpenTelemetry, this episode is for you.
About Richard
He's also an instructor at Pluralsight, a frequent public speaker, and the author of multiple books on software design and development. Richard maintains a regularly updated blog (seroter.com) on topics of architecture and solution design and can be found on Twitter as @rseroter.
Links:
Twitter: https://twitter.com/rseroter
LinkedIn: https://www.linkedin.com/in/seroter
Seroter.com: https://seroter.com
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want.
Starting with pricing as low as $2.50 a month for Vultr cloud compute they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.
Corey: You know how git works right?
Announcer: Sorta, kinda, not really. Please ask someone else!
Corey: That's all of us. Git is how we build things, and Netlify is one of the best ways I've found to build those things quickly for the web. Netlify's git-based workflows mean you don't have to play slap and tickle with integrating arcane nonsense and web hooks, which are themselves about as well understood as git. Give them a try and see what folks ranging from my fake Twitter for pets startup, to global Fortune 2000 companies are raving about. If you end up talking to them, because you don't have to, they get why self service is important—but if you do, be sure to tell them that I sent you and watch all of the blood drain from their faces instantly. You can find them in the AWS marketplace or at www.netlify.com. N-E-T-L-I-F-Y.com
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Once upon a time, back in the days of VH1, which was like MTV except it played music videos, they would have a show that was, “Where are they now?” Looking at former celebrities. I will not use the term washed up because that's going to be insulting to my guest.
Richard Seroter is a returning guest here on Screaming in the Cloud. We spoke to him a year ago when he was brand new in his role at Google as director of outbound product management. At that point, he basically had stars in his eyes and was aspirational around everything he wanted to achieve. And now it's a year later and he has clearly failed because it's Google.
So, outbound products are clearly the things that they are going to be deprecating, and in the past year, I am unaware of a single Google Cloud product that has been outright deprecated. Richard, thank you for joining me, and what do you have to say for yourself?
Richard: Yeah, “Where are they now?” I feel like I'm the Leif Garrett of cloud here, joining you. So yes, I'm still here, I'm still alive. A little grayer after twelve months in, but happy to be here chatting cloud, chatting whatever else with you.
Corey: I joke a little bit about, “Oh, Google winds up killing things.” And let's be clear, your consumer division which, you know, Google is prone to that. And understanding a company's org chart is a challenge. A year or two ago, I was of the opinion that I didn't need to know anything about Google Cloud because it would probably be deprecated before I really had to know about it. My opinion has evolved considerably based upon a number of things I'm seeing from Google.
Let's be clear here, I'm not saying this to shine you on or anything like that; it's instead that I've seen some interesting things coming out of Google that I consider to be the right moves. One example of that is publicly signing multiple ten-year deals with very large, serious institutions like Deutsche Bank, and others. Okay, you don't generally sign contracts with companies of that scale and intend not to live up to them. You're hiring Forrest Brazeal as your head of content for Google Cloud, which is not something you should do lightly, and not something that is a short-term play in any respect. And the customer experience has continued to improve; Google Cloud products have not gotten worse, and I'm seeing in my own customer conversations that discussions about Google Cloud have become significantly less dismissive than they were over the past year. Please go ahead and claim credit for all of that.
Richard: Yeah. I mean, the changes a year ago when I joined.
So, Thomas Kurian has made a huge impact on some of that. You saw us launch the enterprise APIs thing a while back, which was, “Hey, here's, for the most part, every one of our products that has a fixed API. We're not going to deprecate it without a year's notice, whatever it is. We're not going to make certain types of changes.” Maybe that feels like, “Well, you should have had that before.” All right, all we can do is improve things moving forward. So, I think that was a good change.
Corey: Oh, I agree. I think that was a great thing to do. You had something like 80-some-odd percent coverage of Google Cloud services, and great, that's going to only increase with time, I can imagine. But I got a little pushback from a few Googlers for not being more congratulatory towards them for doing this, and look, it's a great thing. Don't get me wrong, but you don't exactly get a whole lot of bonus points and kudos and positive press coverage—not that I'm press—for doing the thing you should have been doing [laugh] all along.
It's, “This is great. This is necessary.” And it demonstrates a clear awareness that there was—rightly or wrongly—a perception issue around the platform's longevity and that you've gone significantly out of your way to wind up addressing that in ways that go far beyond just yelling at people on Twitter they don't understand the true philosophy of Google Cloud, which is the right thing to do.
Richard: Yeah, I mean, as you mentioned, look, the consumer side is very experimental in a lot of cases. I still mourn Google Reader. Like, those things don't matter—
Corey: As do we all.
Richard: Of course. So, I get that. Google Cloud—and of course we have the same cultural thing, but at the same time, there's a lifecycle management that's different in Google Cloud. We do not deprecate products that much. You know, enterprises make decade-long bets. I can't be swap—changing databases or just turning off messaging things.
Instead, we're building a core set of things and making them better.
So, I like the fact that we have a pretty stable portfolio that keeps getting a little bit bigger. Not crazy bigger; I like that we're not just throwing everything out there saying, “Rock on.” We have some opinions. But I think that's been a positive trend, customers seem to like that we're making these long-term bets. We're not going anywhere for a long time and our earnings quarter after quarter shows it—boy, this will actually be a profitable business pretty soon.
Corey: Oh, yeah. People love to make hay, and by people, I stretch the term slightly and talk about, “Investment analysts say that Google Cloud is terrible because at your last annual report you're losing something like $5 billion a year on Google Cloud.” And everyone looked at me strangely, when I said, “No, this is terrific. What that means is that they're investing in the platform.” Because let's be clear, folks at Google tend to be intelligent, by and large, or at least intelligent enough that they're not going to start selling cloud services for less than it costs to run them.
So yeah, it is clearly an investment in the platform and growth of it. The only way it should be turning a profit at this point is if there's no more room to invest that money back into growing the platform, given your market position. I think that's a terrific thing, and I'm not worried at all about it losing money. I don't think anyone should be.
Richard: Yeah, I mean, strategically, look, this doesn't have to be the same type of moneymaker that even some other clouds have to be to their portfolio.
Look, this is an important part, but you look at those ten-year deals that we've been signing: when you look at Univision, that's a YouTube partnership; you look at Ford that had to do with Android Auto; you look at these others, this is where us being also a consumer and enterprise SaaS company is interesting because this isn't just who's cranking out the best IaaS. I mean, that can be boring stuff over time. It's like, who's actually doing the stuff that maybe makes a traditional company more interesting because they partner on some of those SaaS services. So, those are the sorts of deals and those sorts of arrangements where cloud needs to be awesome, and successful, and make money, doesn't need to be the biggest revenue generator for Google.
Corey: So, when we first started talking, you were newly minted as a director of outbound product management. And now, you are not the only one, there are apparently 60 of you there, and I'm no closer to understanding what the role encompasses. What is your remit? Where do you start? Where do you stop?
Richard: Yeah, that's a good question. So, there's outbound product management teams, mostly associated with the portfolio area. So network, storage, AI, analytics, database, compute, application modernization-y sort of stuff—which is what I cover—containers, dev tools, serverless. Basically, I am helping make sure the market understands the product and the product understands the market. And not to be totally glib, but a lot of that is, we are amplification.
I'm amplifying product out to market, analysts, field people, partners: “Do you understand this thing? Can I help you put this in context?” But then really importantly, I'm trying to help make sure we're also amplifying the market back to our product teams. You're getting real customer feedback: “Do you know what that analyst thinks?
Have you heard what happened in the competitive space?”
And so sometimes companies seem to miss that, and PMs poke their head up when I'm about to plan a product or I'm about to launch a product because I need some feedback. But keeping that constant pulse on the market, on customers, on what's going on, I think that can be a secret weapon. I'm not sure everybody does that.
Corey: Spending as much time as I do on bills, admittedly AWS bills, but this is a pattern that tends to unfold across every provider I've seen. The keynotes are chock-full of awesome managed service announcements, things that are effectively turnkey at further up the stack levels, but the bills invariably look a lot more like, yeah, we spend a bit of money on that and then we run 10,000 virtual instances in a particular environment and we just treat it like it's an extension of our data center. And that's not exciting; that's not fun, quote-unquote, but it's absolutely what customers are doing and I'm not going to sit here and tell them that they're wrong for doing it. That is the hallmark of a terrible consultant of, “I don't understand why you're doing what you're doing, so it must be foolish.” How about you stop and gain some context into why customers do the things that they do?
Richard: No, I send around a goofy newsletter every week to a thousand or two people, just on things I'm learning from the field, from customers, trying to make sure we're just thinking bigger. A couple of weeks ago, I wrote an idea about modernization is awesome, and I love when people upgrade their software. By the way, most people migration is a heck of a lot easier than if I can just get this into your cloud, yeah love that; that's not the most interesting thing, to move VMs around, but most people in their budget, don't have time to rewrite every Java app to Go. Everybody's not changing .NET Framework to .NET Core.
Like, who do I think everybody is? No, I just need to try to get some incremental value first.
Yes, then hopefully I'll swap out my self-managed SQL database for a Spanner or a managed service. Of course, I want all of that, but this idea that I can turn my line of business loan processing app into a thousand functions overnight is goofy. So, how are we instead thinking more pragmatically about migration, and then modernizing some of it? But even that sort of mindset, look, Google thinks about innovation modernization first. So, also just trying to help us take a step back and go, “Gosh, what is the normal path? Well, it's a lot of migration first, some modernization, and then there's some steady-state work there.”
Corey: One of the things that surprised me the most about Google Cloud in the market, across the board, has been the enthusiastic uptake for enterprise workloads. And by enterprise workloads, I'm talking about things like SAP HANA is doing a whole bunch of deployments there; we're talking Big Iron-style enterprise-y things that, let's be honest, contravene most of the philosophy that Google has always held and espoused publicly, at least on conference stages, about how software should be built. And I thought that would cut against them and make it very difficult for you folks to gain headway in that market and I could not have been more wrong. I'm talking to large enterprises who are enthusiastically talking about Google Cloud. I've got to level with you, compared to a year or two ago, I don't recognize the place.
Richard: Mmm. I mean, some of that, honestly, in the conversations I have, and whatever I do a handful of customer calls every week, I think folks still want something familiar, but you're looking for maybe a further step on some of it. And that means, like, yes, is everybody going to offer VMs? Yeah, of course. Is everyone going to have MySQL?
Obviously.
But if I'm an enterprise and I'm doing these generational bets, can I cheat a little bit, and maybe if I partner with more of an innovation partner versus maybe just the easy next step, am I buying some more relevance for the long-term? So, am I getting into an environment that has some really cool native zero-trust stuff? Am I getting into an environment with global backend services and I'm not just stitching together a bunch of regional stuff? How can I cheat by using a more innovation vendor versus just lifting and shifting to what feels like hosted software in another cloud? I'm seeing more of that because these migrations are tough; nobody should be just randomly switching clouds. That's insane.
So, can I make, maybe, one of these big bets with somebody who feels like they might actually even improve my business as a whole because I can work with Google Pay and improve how I do mobile payments, or I could do something here with Android? Or, heck, all my developers are using Angular and Flutter; aren't I going to get some benefit from working with Google? So, we're seeing that, kind of, add-on effect of, “Maybe this is a place not just to host my VMs, but to take a generational leap.”
Corey: And I think that you're positioning yourselves in a way to do it. Again, talk about things that you wouldn't have expected to come out of Google of all places, but your console experience has been first-rate and has been for a while. The developer experience is awesome; I don't need to learn the intricacies of 12 different services for what I'm trying to do just in order to get something basic up and running. I can stop all the random little billing things in my experimental project with a single click, which that admittedly has a confirm, which you kind of want.
But it lets you reason about these things.
It lets you get started building something, and there's a consistency and cohesiveness to the console that, again, I am not a graphic designer, by any stretch of the imagination. My most commonly used user interface is a green-screen shell prompt, and then I'm using Vim to wind up writing something horrifying, ideally in Python, but more often in YAML. And that has been my experience, but just clicking around the console, it's clear that there was significant thought put into the design, the user experience, and the way of approaching folks who are starting to look very different, from a user persona perspective.
Richard: I can—I mean, I love our user research team; they're actually fun to hang out with and watch what they do, but you have to remember, Google as a company, I don't know, cloud is the first thing we had to sell. Didn't have to sell Gmail. I remember 15 years ago, people were waiting for invites. And who buys Maps or who buys YouTube? For the most part, we've had to build things that were naturally interesting and easy-to-use because otherwise, you would just switch to anything else because everything was free.
So, some of that does infuse Google Cloud, “Let's just make this really easy to use. And let's just make sure that, maybe, you don't hate yourself when you're done jumping into a shell from the middle of the console.” It's like, that should be really easy to do—or upgrade a database, or make changes to things. So, I think some of the things we've learned from the consumer good side, have made their way to how we think of UX and design because maybe this stuff shouldn't be terrible.
Corey: There's a trope going around, where I wound up talking about the next million cloud customers.
And I'm going to have to write a sequel to it because it turns out that I've made a fundamental error, in that I've accepted the narrative that all of the large cloud vendors are pushing, to the point where I heard from so many folks I just accepted it unthinkingly and uncritically, and that's not what I should be doing. And we'll get to what I was wrong about in a minute, but the thinking goes that the next big growth area is large enterprises, specifically around corporate IT. And those are folks who are used to managing things in a GUI environment—which is fine—and clicking around in web apps. Now, it's easy to sit here on our high horse and say, “Oh, you should learn to write code,” or YAML, which is basically code. Cool.
As an individual, I agree, someone should because as soon as they do that, they are now able to go out and take that skill to a more lucrative role. The company then has to backfill someone into the role that they just got promoted out of, and the company still has that dependency. And you cannot succeed in that market with a philosophy of, “Oh, you built something in the console. Now, throw it away and do it right.” Because that is maddening to that user persona. Rightfully so.
I'm not that user persona and I find it maddening when I have to keep tripping over that particular thing. How did that come to be, from your perspective? First, do you think that is where the next million cloud customers come from? And have I adequately captured that user persona, or am I completely off in the weeds somewhere?
Richard: I mean, I shared your post internally when that one came out because that resonated with me of how we were thinking about it. Again, it's easy to think about the cloud-native operators, it's Spotify doing something amazing, or this team at Twitter doing something, or whatever. And it's not even to be disparaging.
Like, look, I spent five years in enterprise IT and I was surrounded by operators who had to run a dozen different systems; they weren't dedicated to just this thing or that. So, what are the tools that make my life easy?
A lot of software just comes with UIs for quick install and upgrades, and how does that logic translate to this cloud world? I think that stuff does matter. How are you meeting these people a little better where they are? I think the hard part that we will always have in every cloud provider is—I think you've said this in different forums, but how do I not sometimes rub the data center on my cloud or vice versa? I also don't want to change the experience so much where I degrade it over the long term, I've actually somehow done something worse.
So, can I meet those people where they are? Can we pull some of those experiences in, but not accidentally do something that kind of messes up the cloud experience? I mean, that's a fine line to walk. Does that make sense to you? Do you see where there's a… I don't know, you could accidentally cater to a certain audience too much, and change the experience for the worse?
Corey: Yes, and no. My philosophy on it is that you have to meet customers where they are, but only to a point. At some point, what they're asking for becomes actively harmful or disadvantageous to wind up providing for them. “I want you to run my data center for me,” is on some level what some cloud environments look like, and I'm not going to sit here and tell people they're inherently wrong for that. Their big reason for moving to the cloud was because they keep screwing up replacing failed hard drives in their data center, so we're going to put it in the cloud.
Is it more expensive that way? Well, sure in terms of actual cash outlay, it almost certainly is, but they're also not going down every month when a drive fails, so what's the value of that? It's a capability story.
That becomes interesting to me, and I think that trying to sit here in isolation, and say that, “Oh, this application is not how we would build it at Google.” And it's, “Yeah, you're Google. They are insert an entire universe of different industries that look nothing whatsoever like Google.” The constraints are different, the resources are different, and—
Richard: Sure.
Corey: —their approaches to problem-solving are different. When you built out Google, and even when you're building out Google Cloud, look at some of the oldest, cruftiest stuff you have in your entire all of Google environment, and then remember that there are companies out there that are hundreds of years old. It's a different order of magnitude as far as era, as far as understanding of what's in the environment, and that's okay. It's a very broad and very diverse world.
Richard: Yeah. I mean, that's, again, why I've been thinking more about migration than even some of the modernization piece. Should you bring your network architecture from on-prem to the cloud? I mean, I think most cases, no. But I understand sometimes that edge firewall, internal trust model you had on-prem, okay, trying to replicate that.
So, yeah, like you say, I want to meet people where they are. Can we at least find some strategic leverage points to upgrade aspects of things as you get to a cloud, to save you from yourself in some places because all of a sudden, you have ten regions and you only had one data center before. So, many more rooms for mistakes. Where are the right guardrails? We're probably more opinionated than others at Google Cloud.
I don't really apologize for that completely, but I understand. I mean, I think we've loosened up a lot more than maybe people [laugh] would have thought a few years ago, from being hyper-opinionated on how you run software.
Corey: I will actually push back a bit on the idea that you should not replicate your on-premises data center in your cloud environment.
Sure, are there more optimal ways to do it that are arguably more secure? Absolutely. But a common failure mode in moving from data center to cloud is, “All right, we're going to start embracing this entirely new cloud networking paradigm.” And it is confusing, and your team that knows how the data center network works really well are suddenly in way over their heads, and they're inadvertently exposing things they don't intend to or causing issues.
The hard part is always people, not technology. So, when I glance at an environment and see things like that, perfect example, are there more optimal ways to do it? Oh, from a technology perspective, absolutely. How many engineers are working on that? What's their skill set? What's their position on all this? What else are they working on? Because you're never going to find a team of folks who are world-class experts in every cloud. It doesn't work that way.
Richard: No doubt. No doubt, you're right. There's areas where we have to at least have something that's going to look similar, let you replicate aspects of it. I think it's—it'll just be interesting to watch, and I have enough conversations with customers who do ask, “Hey, where are the places we should make certain changes as we evolve?” And maybe they are tactical, and they're not going to be the big strategic redesign their entire thing. But it is good to see people not just trying to shovel everything from one place to the next.
Corey: This episode is sponsored in part by something new. Cloud Academy is a training platform built on two primary goals. Having the highest quality content in tech and cloud skills, and building a good community that is rich and full of IT and engineering professionals. You wouldn't think those things go together, but sometimes they do. It's both useful for individuals and large enterprises, but here's what makes it new. I don't use that term lightly. Cloud Academy invites you to showcase just how good your AWS skills are.
For the next four weeks you'll have a chance to prove yourself. Compete in four unique lab challenges, where they'll be awarding more than $2000 in cash and prizes. I'm not kidding, first place is a thousand bucks. Pre-register for the first challenge now, one that I picked out myself on Amazon SNS image resizing, by visiting cloudacademy.com/corey. C-O-R-E-Y. That's cloudacademy.com/corey. We're gonna have some fun with this one!
Corey: Now, to follow up on what I was saying earlier, what I think I've gotten wrong by accepting the industry talking points on is that the next million cloud customers are big enterprises moving from data centers into the cloud. There's money there, don't get me wrong, but there is a larger opportunity in empowering the creation of companies in your environment. And this is what certain large competitors of yours get very wrong, where it's we're going to launch a whole bunch of different services that you get to build yourself from popsicle sticks. Great. That is not useful.
But companies that are trying to do interesting things, or people who want to found companies to do interesting things, want something that looks a lot more turnkey. If you are going to be building cloud offerings, that for example, are terrific building blocks for SaaS companies, then it behooves you to do actual investments, rather than just a generic credit offer, into spurring the creation of those types of companies. If you want to build a company that does payroll systems, in a SaaS, cloud way, “Partner with us. Do it here. We will give you a bunch of credits. We will introduce you to your first ten prospective customers.”
And effectively actually invest in a company's success, as opposed to pitch-deck invest, which is, “Yeah, we'll give you some discounting and some credits, and that's our quote-unquote, ‘investment.'” Actually be there with them as a partner.
And that's going to take years for folks to wrap their heads around, but I feel like that is the opportunity that is significantly larger, even than the embedded existing IT space because rather than fighting each other for slices of the pie, I'm much more interested in expanding that pie overall. One of my favorite questions to get asked because I think it is so profoundly missing the point is, “Do you think it's possible for Google to go from number three to number two,” or whatever the number happens to be at some point, and my honest, considered answer is, “Who gives a shit?” Because number three, or number five, or number twelve—it doesn't matter to me—is still how many hundreds of billions of dollars in the fullness of time. Let's be real for a minute here; the total addressable market is expanding faster than any cloud or clouds are going to be able to capture all of.
Richard: Yeah. Hey, look, whoever who'll be more profitable solving user problems, I really don't care about the final revenue number. I can be the number one cloud tomorrow by making Google Cloud free. What's the point? That's not a sustainable business. So, if you're just going for who can deploy the most vCPUs or who can deploy the most whatever, there's ways to game that. I want to make sure we are just uniquely solving problems better than anybody else.
Corey: Sorry, forgive me. I just sort of zoned out for a second there because I'm just so taken aback and shocked by the idea of someone working at a large cloud provider who expresses a philosophy that isn't lying awake at night fretting over the possibility of someone who isn't them as making money somewhere.
Richard: [laugh].
I mean, your idea there, it'll be interesting to watch, kind of, the maker's approach of are you enabling that next round of startups, the next round of people who want to take—I mean, honestly, I like the things we're doing building block-wise, even with our AI: we're not just handing you a vision API, we're giving you a loan processing AI that can process certain types of docs, that more packaged version of AI. Same with healthcare, same with whatever. I can imagine certain startups or a company idea going, “Hey, maybe I could disrupt or serve a new market.”
I always love what Square did. They've disrupted emerging markets, small merchants here in North America, wherever, where I didn't need a big expensive point of sale system. You just gave me the nice, right building blocks to disrupt and run my business. Maybe Google Cloud can continue to provide better building blocks, but I do like your idea of actually investment zones, getting part of this. Maybe the next million users are founders and it's not just getting into some of these companies with, frankly, 10, 20, 30,000 people in IT.
I think there's still plenty of room in these big enterprises to unlock many more of those companies, much more of their business. But to your point, there's a giant market here that we're not all grabbing yet. For crying out loud, there's tons of opportunity out here. This is not zero-sum.
Corey: Take it a step further beyond that, and today, if you have someone who's enterprising, early on in their career, maybe they just got out of school, maybe they have just left their job and are ready to snap, or they have some severance money that they want to throw into something. Great. What do they want to do if they have an idea for a company? Well today, that answer looks a lot like, well, time to go to a boot camp and learn to code for six months so you can build a badly done MVP well enough to get off the ground and get some outside investment, and then go from there.
Well, what if we cut that part out entirely?
What if there were building blocks of I don't need to know or care that there's a database behind it, or what a database looks like. Picture Visual Basic in a web browser for building apps, and just take this bit of information I give you and store it and give it back to me later. Sure, you're going to have some significant challenges in the architecture or something like that as it goes from this thing that I'm talking about as an MVP to something planet-scale—like a Spotify for example—but that's not most businesses, and that's okay. Get out of the way and let people innovate and iterate on what it is they're doing more rapidly, and make it more accessible to teach people. That becomes huge; that gets the infrastructure bits that cloud providers excel at out of the way, and all it really takes is packaging those things into a golden path of what a given company of a particular profile should be doing, if—unless they have reason to deviate from it—and instead of having this giant paradox of choice issue, it's, “Oh, okay, I'll drag-drop, build things accordingly.”
And under the hood, it's doing all the configuration of services and that's great. But suddenly, you've made being a founder of a software company—fundamentally—accessible to people who are not themselves software engineers. And I know that's anathema to some people, and I don't even slightly care because I am done with gatekeeping.
Richard: Yeah. No, it's exciting if that can pull off. I mean, it's not the years ago where, how much capital was required to find the rack and do all sorts of things with tech, and hire some developers. And it's an amazing time to be software creators, now. The more we can enable that—yeah, I'm along for that journey, sign me up.
Corey: I'm looking forward to seeing how it winds up shaking out. So, I want to talk a little bit about the paradox of choice problem that I just mentioned.
If you take a look at the various compute services that every cloud provider offers, there are an awful lot of different choices as far as what you can run. There's the VM model, there's containers—if you're in AWS, you have 17 ways to run those—and you wind up—any of the serverless function story, and other things here and there, and managed services, I mean and honestly, Google has a lot of them, nowhere near as many as you do failed messaging products, but still, an awful lot of compute options. How do customers decide?
What is the decision criteria that you see? Because the worst answer you can give someone who doesn't really know what they're doing is, “It depends,” because people don't know how to make that decision. It's, “What factors should I consider then, while making that decision?” And the answer has to be something somewhat authoritative because otherwise, they're going to go on the internet and get yelled at by everyone because no one is ever going to agree on this, except that everyone else is wrong.
Richard: Mm-hm. Yeah, I mean, on one hand, look, I like that we intentionally have fewer choices than others because I don't think you need 17 ways to run a container. I think that's excessive. I think more than five is probably excessive because as a customer, what is the trade-off? Now, I would argue first off, I don't care if you have a lot of options as a vendor, but boy, the backends of those better be consistent.
Meaning if I have a CI/CD tool in my portfolio and it only writes to two of them, shame on me. Then I should make sure that at least CI/CD, identity management, log management, monitoring, arguably your compute runtime should be a late-binding choice. And maybe that's blasphemous because somebody says, “I want to start up front knowing it's a function,” or, “I want to start it's a VM.” How about, as a developer, I couldn't care less.
How about I just build cool software and maybe even at deploy time, I say, “This better fits in running in Kubernetes.” “This is better in a virtual machine.”And my cost of changing that later is meaningless because, hey, if it is in the container, I can switch it between three or four different runtimes, the identity management the same, it logs the exact same way, I can deploy CI/CD the same way. So, first off, if those things aren't the same, then the vendor is messing up. So, the customer shouldn't have to pay the cost of that. And then there gets to be other actual criteria. Look, I think you are looking at the workload itself, the team who makes it, and the strategy to figure out the runtime.It's easy for us. Google Compute Engine for VMs, containers go in GKE, managed services that need some containers, there are some apps around them, are Cloud Functions and Cloud Run. Like, it's fairly straightforward and it's going to be an OR situation—or an AND situation not an OR, which is great. But we're at least saying the premium way to run containers in Google Cloud for systems is GKE. There you go. If you do have a bunch of managed services in your architecture and you're stitching them together, then you want more serverless things like Cloud Run and Cloud Functions. And if you want to just really move some existing workload, GCE is your best choice. I like that that's fairly straightforward. There's still going to be some it depends, but it feels better than nine ways to run Kubernetes engines.Corey: I'm sure we'll see them in the fullness of time.Richard: [laugh].Corey: So, talk about Anthos a bit. That was a thing that was announced a while back and it was extraordinarily unclear what it was. And then I looked at the pricing and it was $10,000 a month with a one-year minimum commitment, and is like, “Oh, it's not for me. That's why I don't get it.” And I haven't really looked back at it since. But it is something else now. 
It almost feels like a wrapper brand, in some respects. How's it going? [unintelligible 00:29:26]?Richard: Yeah. Consumption, we'll talk more upcoming months on some of the adoption, but we're finally getting the hockey stick, which always comes delayed with platforms because nobody adopts platforms quickly. They buy the platform and a year later they start to actually build new development, migrate the things they have. So, we're starting to see the sort of growth. But back to your first point. And I even think I poorly tried to explain it a year ago with you. Basically, look, Anthos is the ability to manage fleets of GKE clusters, wherever they are. I don't care if they're on-prem, I don't care if they're in Google Cloud, I don't care if they're Amazon. We have one customer who only uses Anthos on AWS. Awesome, rock on.So, how do I put GKE clusters everywhere, but then do fleet management because look, some people are doing an app per cluster. They don't want to jam 50 apps in the cluster from different teams because they don't like the idea that this app requires root access; now you can screw around with mine. Or, you didn't update; that broke the cluster. I don't want any of that. So, you're going to see companies more, doing even app per cluster, app per developer per cluster.So, now I have a fleet problem. How do I keep it in sync? How do I make sure policy is consistent? Those sorts of things. So, Anthos is kind of solving the fleet management challenge and replacing people's first-gen app platform.Seeing a lot of those use cases, “Hey, we're retiring our first version of Docker Enterprise, Mesos, Cloud Foundry, even OpenShift,” saying, “All right, now's the time for our next version of our app platform. How about GKE, plus Cloud Run on top of it, plus other stuff?” Sounds good. So, going well is a, sort of—as you mentioned, there's a brand story here, mainly because we've also done two things that probably matter to you. 
A, we changed the price a lot.No minimum commit, remarkably at 20% of the cost it was when we launched, on purpose because we've gotten better at this. So, much cheaper, no minimum commit, pay as you go. Be on-premises, on bare metal with GKE. Pay by the hour, I don't care; sounds great. So, you can do that sort of stuff.But then more importantly, if you're a GKE customer and you just want config management, service mesh, things like that, now you can buy all of those independently as well. And Anthos is really the brand for fleet management of GKE. And if you're on Google Cloud only, it adds value. If you're off Google Cloud, if you're multi-cloud, I don't care. But I want to manage fleets of compute clusters and create them. We're going to keep doubling down on that.Corey: The big problem historically for understanding a lot of the adoption paradigm of Kubernetes has been that it was, to some extent, a reimagining of how Google ran and built software internally. And I thought at the time, the idea was—from a cynical perspective—that, “All right, well, your crappy apps don't run well on Google-style infrastructure so we're going to teach the entire world how to write software the way that we do.” And then you end up with people running their blog on top of Kubernetes, where it's one of those, like, the first blog post is, like, “How I spent the last 18 months building Kubernetes.” And, okay, that is certainly a philosophy and an approach, but it's almost approaching Windows 95 launch level of hype, where people who didn't own computers were buying copies of it, on some level. And I see the term come up in conversations in places where it absolutely has no place being brought up. 
“How do I run a Kubernetes cluster inside of my laptop?” And, “It's what you got going on in there, buddy?”Richard: [laugh].Corey: “What do you think you're trying to do here because you just said something that means something that I think is radically different to me than it is to you.” And again, I'm not here to judge other people's workflows; they're all terrible, except for mine, which is an opinion held by everyone about their own workflow. But understanding where people are, figuring out how to get there, how to meet customers where they are and empower them. And despite how heavily Google has been into the Kubernetes universe since its inception, you're very welcoming to companies—and loud-mouth individuals on Twitter—who have no use for Kubernetes. And working through various products you offer, I don't ever feel like a second-class citizen. There's really something impressive about that, of not letting the hype dictate the product and marketing decisions of it.Richard: Yeah, look, I think I tweeted it recently, I think the future of software is managed services with containers in the gap, for the most part. Whereas—if you can use managed services, please do. Use them wherever you can. And if you have to sling some code, maybe put it in a really portable thing that's really easy to run in lots of places. So, I think that's smart.But for us, look, I think we have the best container workflow from dev tools, and build tools, and artifact registries, and runtimes, but plenty of people are running containers, and you shouldn't be running Kubernetes all over the place. That makes sense for the workload, I think it's better than a VM at the retail edge. Can I run a small cluster, instead of a weird point-of-sale Windows app? Maybe. Maybe it makes sense to have a lightweight Kubernetes cluster there for consistency purposes.So, for me, I think it's a great medium for a subset of software. Google Cloud is going to take whatever you got, which is great. 
I think containers are great, but at the same time, I'm happily going to let you deploy a function that responds to you adding a storage item to a bucket, where at the same time give you a SaaS service that replaces the need for any code. All of those are terrific. So yeah, we love Kubernetes. We think it's great. We're going to be the best version to run it. But that's not going to be your whole universe.Corey: No, and I would argue it absolutely shouldn't be.Richard: [laugh]. Right. Agreed. Now again, for some companies, it's a great replacement for this giant fleet of VMs that all runs at eight percent utilization. Can I stick this into a bunch of high-density clusters? Absolutely you should. You're going to save an absolute fortune doing that and probably pick up some resilience and functionality benefits.But to your point, “Do I want to run a WordPress site in there?” I don't know, probably not. “Do I need to run my own MySQL?” I'd prefer you not do that. So, in a lot of cases, don't use it unless you have to. That should go for all compute nowadays. Use managed services.Corey: I'm a big believer in going down that approach just because it is so much easier than trying to build it yourself from popsicle sticks because you theoretically might have to move it someday in the future, even though you're not.Richard: [laugh]. Right.Corey: And it lets me feel better about a thing that isn't going to be used by anything that I'm doing in the near future. I just don't pretend to get it.Richard: No, I don't install a general purpose electric charger in my garage for any electric car I may get in the future; I charge for the one I have now. I just want it to work for my car; I don't want to plan for some mythical future. 
So yeah, premature optimization and over-architecture are death in IT, especially nowadays where speed matters. Don't waste your time building something that can run in nine clouds.
Corey: Richard, I want to thank you for coming on again a year later to suffer my slings, arrows, and other various implements of misfortune. If people want to learn more about what you're doing, how you're doing it, possibly to pull a Forrest Brazeal and go work with you, where can they find you?
Richard: Yeah, we're a fun place to work. So, you can find me on Twitter at @rseroter (R-S-E-R-O-T-E-R), hang out on LinkedIn, annoy me on my blog seroter.com as I try to at least explore our tech from time to time and mess around with it. But this is a fun place to work. There's a lot of good stuff going on here, and if you work somewhere else, too, we can still be friends.
Corey: Thank you so much for your time today. Richard Seroter, director of outbound product management at Google. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment into which you have somehow managed to shove a running container.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
Welcome to Day Two Cloud, where the topic is visibility. Hybrid cloud visibility with a side of Kubernetes, to be specific. VMware has come alongside as today's sponsor for a discussion about vRealize Operations Cloud to give you that visibility into applications and infrastructure running in complex, multi-cloud environments. The post Day Two Cloud 123: Managing Multi-Cloud Applications And Infrastructure With vRealize Operations Cloud (Sponsored) appeared first on Packet Pushers.
Kelsey Hightower is a Principal Engineer at Google Cloud, open-source advocate, and one of our favorite speakers in tech. We hear about his early experiences with computers, thoughts on CS degree vs self-taught, running a tech support business, managing comedians, open-source, and his journey through tech. Regardless of where you are in your career, Kelsey drops knowledge on how to get yourself to where you want to go.Connect with Kelsey:https://twitter.com/kelseyhightowerhttps://github.com/kelseyhightowerMentioned in today's episode:Open Policy AgentKings of Comedy Search - Ronnie JordanTotal SystemsCoreOSPuppet LabsGo for Sysadmins - GopherCon 2014Want more from Ardan Labs?You can learn Go, Kubernetes, Docker & more through our video training, live events, or through our blog!
Software Engineering Daily invites Owen Frank Davis, Paul Davis, Kyle Davis, and Robbie Davis for a joint interview on the subject of reproduction and teething, as well as Lisch fascitis. Aledade and Kubernetes are both inconsiderate ideas for navigation. They need improvements in infrastructure. I prefer Dominaria to New Phyrexia (though I can get by The post #FREEZUCK | ?masked…?… !(DOCTORS)!! appeared first on Software Engineering Daily.
About Micheal
Micheal Benedict leads Engineering Productivity at Pinterest. He and his team focus on developer experience, building tools and platforms for over a thousand engineers to effectively code, build, deploy and operate workloads on the cloud. Mr. Benedict has also built Infrastructure and Cloud Governance programs at Pinterest and previously at Twitter, focussed on managing cloud vendor relationships, infrastructure budget management, cloud migration, capacity forecasting and planning, and cloud cost attribution (chargeback).
Links:
Pinterest: https://www.pinterest.com
Teletraan: https://github.com/pinterest/teletraan
Twitter: https://twitter.com/micheal
Pinterestcareers.com: https://pinterestcareers.com
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: You know how git works, right?
Announcer: Sorta, kinda, not really. Please ask someone else!
Corey: That's all of us. Git is how we build things, and Netlify is one of the best ways I've found to build those things quickly for the web. Netlify's git-based workflows mean you don't have to play slap and tickle with integrating arcane nonsense and webhooks, which are themselves about as well understood as git. Give them a try and see what folks ranging from my fake Twitter for pets startup to global Fortune 2000 companies are raving about. If you end up talking to them (because you don't have to; they get why self-service is important), be sure to tell them that I sent you and watch all of the blood drain from their faces instantly. You can find them in the AWS marketplace or at www.netlify.com. N-E-T-L-I-F-Y.com
Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high-performance cloud compute at a price that, sure, they claim is better than AWS pricing, and when they say that, they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance, on a monthly basis, what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while, I like to talk to people who work at very large companies that are not in fact themselves a cloud provider. I know it sounds ridiculous. How can you possibly be a big company and not make money by selling managed NAT gateways to an unsuspecting public? But I'm told it can be done. Here to answer that question, and hopefully at least one other, is Pinterest's head of engineering productivity, Micheal Benedict. Micheal, thank you for taking the time to join me today.
Micheal: Hi, Corey, thank you for inviting me today.
I'm really excited to talk to you.Corey: So, exciting times at Pinterest in a bunch of different ways. It was recently reported—which of course, went right to the top of my inbox as 500,000 people on Twitter all said, “Hey, this sounds like a ‘Corey would be interested in it' thing.” It was announced that you folks had signed a $3.2 billion commitment with AWS stretching until 2028. Now, if this is like any other large-scale AWS contract commitment deal that has been made public, you were probably immediately inundated with a whole bunch of people who are very good at arithmetic and not very good at business context saying, “$3.2 billion? You could build massive data centers for that. Why would anyone do this?” And it's tiresome, and that's the world in which we live. But I'm guessing you heard at least a little bit of that from the peanut gallery.Micheal: I did, and I always find it interesting when direct comparisons are made with the total amount that's been committed. And like you said, there's so many nuances that go into how to perceive that amount, and put it in context of, obviously, what Pinterest does. So, I at least want to take this opportunity to share with everyone that Pinterest has been on the cloud since day one. When Ben initially started the company, that product was launched—it was a simple Django app—it was launched on AWS from day one, and since then, it has grown to support 450-plus million MAUs over the course of the decade.And our infrastructure has grown pretty complex. We started with a bunch of EC2 machines and persisting data in S3, and since then we have explored an array of different products, in fact, sometimes working very closely with AWS, as well and helping them put together a product roadmap for some of the items they're working on as well. 
So, we have an amazing partnership with them, and part of the commitment and how we want to see these numbers is how does it unlock value for Pinterest as a business over time in terms of making us much more agile, without thinking about the nuances of the infrastructure itself. And that's, I think, one of the best ways to really put this into context, that it's not a single number we pay at the end [laugh] of the month, but rather, we are on track to spending a certain amount over a period of time, so this just keeps accruing or adding to that number. And we basically come out with an amazing partnership in AWS, where we have that commitment and we're able to leverage their products and full suite of items without any hiccups.Corey: The most interesting part of what you said is the word partner. And I think that's the piece that gets lost an awful lot when we talk about large-scale cloud negotiations. It's not like buying a car, where you can basically beat the crap out of the salesperson, you can act as if $400 price difference on a car is the difference between storm out of the dealership and sign the contract. Great, you don't really have to deal with that person ever again.In the context of a cloud provider, they run your production infrastructure, and if they have a bad day, I promise you're going to have a bad day, too. You want to handle those negotiations in a way that is respectful of that because they are your partner, whether you want them to be or not. 
Now, I'm not suggesting that any cloud provider is going to hold an awkward negotiation against the customer, but at the same time, there are going to be scenarios in which you're going to want to have strong relationships, where you're going to need to cash in political capital to some extent, and personally, I've never seen stupendous value in trying to beat the crap out of a company in order to get another tenth of a percent discount on a service you barely use, just because someone decided that well, we didn't do well in the last negotiation so we're going to get them back this time.That's great. What are you actually planning to do as a company? Where are you going? And the fact that you just alluded to, that you're not just a pile of S3 and EC2 instances speaks, in many ways, to that. By moving into the differentiated service world, suddenly you're able to do things that don't look quite as much like building a better database and start looking a lot more like servicing your users more effectively and well.Micheal: And I think, like you said, I feel like there's like a general skepticism in viewing that the cloud providers are usually out there to rip you apart. But in reality, that's not true. To your point, as part of the partnership, especially with AWS and Pinterest, we've got an amazing relationship going on, and behind the scenes, there's a dedicated team at Pinterest, called the Infrastructure Governance Team, a cross-functional team with folks from finance, legal, engineering, product, all sitting together and working with our AWS partners—even the AWS account managers at the times are part of that—to help us make both Pinterest successful, and in turn, AWS gets that amazing customer to work with in helping build some of their newer products as well. 
And that's one of the most important things we have learned over time is that there's two parts to it; when you want to help improve your business agility, you want to focus not just on the bottom line numbers as they are. It's okay to pay a premium because it offsets the people capital you would have to invest in getting there.And that's a very tricky way to look at math, but that's what these teams do; they sit down and work through those specifics. And for what it's worth, in our conversations, the AWS teams always come back with giving us very insightful data on how we're using their systems to help us better think about how we should be pricing or looking things ahead. And I'm not the expert on this; like I said, there's a dedicated team sitting behind this and looking through and working through these deals, but that's one of the important takeaways I hope the users—or the listeners of this podcast then take away that you want to treat your cloud provider as your partner as much as possible. They're not always there to screw you. That's not their goal. And I apologize for using that term. It is important that you set that expectations that it's in their best interest to actually make you successful because that's how they make money as well.Corey: It's a long-term play. I mean, they could gouge you this quarter, and then you're trying to evacuate as fast as possible. Well, they had a great quarter, but what's their long-term prospect? There are two competing philosophies in the world of business; you can either make a lot of money quickly, or you can make a little bit of money and build it over time in a sustained way. And it's clear the cloud providers are playing the long game on this because they basically have to.Micheal: I mean, it's inevitable at this point. I mean, look at Pinterest. It is one of those success stories. 
Starting as a Django app on a bunch of EC2 machines to wherever we are right now with having a three-plus billion dollar commitment over a span of couple of years, and we do spend a pretty significant chunk of that on a yearly basis. So, in this case, I'm sure it was a great successful partnership.And I'm hoping some of the newer companies who are building the cloud from the get-go are thinking about it from that perspective. And one of the things I do want to call out, Corey, is that we did initially start with using the primitive services in AWS, but it became clear over time—and I'm sure you heard of the term multi-cloud and many of that—you know, when companies start evaluating how to make the most out of the deals they're negotiating or signing, it is important to acknowledge that the cost of any of those evaluations or even thinking about migrations never tends to get factored in. And we always tend to treat that as being extremely simple or not, but those are engineering resources you want to be spending more building on the product rather than these crazy costly migrations. So, it's in your best interest probably to start using the most from your cloud provider, and also look for opportunities to use other cloud providers—if they provide more value in certain product offerings—rather than thinking about a complete lift-and-shift, and I'm going to make DR as being the primary case on why I want to be moving to multi-cloud.Corey: Yeah. There's a question, too, of the numbers on paper look radically different than the reality of this. You mentioned, Pinterest has been on AWS since the beginning, which means that even if an edict had been passed at the beginning, that, “Thou shalt never build on anything except EC2 and S3. The end. Full stop.”And let's say you went down that rabbit hole of, “Oh, we don't trust their load balancers. We're going to build our own at home. We have load balancers at home. 
We'll use those.” It's terrible, but even had you done that and restricted yourselves just to those baseline building blocks, and then decide to do a cloud migration, you're still looking back at over a decade of experience where the app has been built unconsciously reflecting the various failure modes that AWS has, the way that it responds to API calls, the latency in how long it takes to request something versus it being available, et cetera, et cetera.So, even moving that baseline thing to another cloud provider is not a trivial undertaking by any stretch of the imagination. But that said—because the topic does always come up, and I don't shy away from it; I think it's something people should go into with an open mind—how has the multi-cloud conversation progressed at Pinterest? Because there's always a multi-cloud conversation.Micheal: We have always approached it with some form of… openness. It's not like we don't want to be open to the ideas, but you really want to be thinking hard on the business case and the business value something provides on why you want to be doing x. In this case, when we think about multi-cloud—and again, Pinterest did start with EC2 and S3, and we did keep it that way for a long time. We built a lot of primitives around it, used it—for example, my team actually runs our bread and butter deployment system on EC2. We help facilitate deployments across a 100,000-plus machines today.And like you said, we have built that system keeping in mind how AWS works, and understanding the nuances of region and AZ failovers and all of that, and help facilitate deployments across 1000-plus microservices in the company. So, thinking about leveraging, say, a Google Cloud instance and how that works, in theory, we can always make a case for engineering to build our deployment system and expand there, but there's really no value. 
And one of the biggest cases, usually, when multi-cloud comes in is usually either negotiation for price or actually a DR strategy. Like, what if AWS goes down in and us-east-1? Well, let's be honest, they're powering half the internet [laugh] from that one single—Corey: Right.Micheal: Yeah. So, if you think your business is okay running when AWS goes down and half the internet is not going to be working, how do you want to be thinking about that? So, DR is probably not the best reason for you to be even exploring multi-cloud. Rather, you should be thinking about what the cloud providers are offering as a very nuanced offering which your current cloud provider is not offering, and really think about just using those specific items.Corey: So, I agree that multi-cloud for DR purposes is generally not necessarily the best approach with the idea of being able to failover seamlessly, but I like the idea for backups. I mean, Pinterest is a publicly-traded company, which means that among other things, you have to file risk disclosures and be responsive to auditors in a variety of different ways. There are some regulations to start applying to you. And the idea of, well, AWS builds things out in a super effective way, region separation, et cetera, whenever I talk to Amazonians, they are always surprised that anyone wouldn't accept that, “Oh, if you want backups use a different region. Problem solved.”Right, but it is often easier for me to have a rehydrate the business level of backup that would take weeks to redeploy living on another cloud provider than it is for me to explain to all of those auditors and regulators and financial analysts, et cetera why I didn't go ahead and do that path. So, there's always some story for okay, what if AWS decides that they hate us and want to kick us off the platform? 
Well, that's why legal is involved in those high-level discussions around things like risk, and indemnity, and termination for convenience and for cause clauses, et cetera, et cetera. The idea of making an all-in commitment to a cloud provider goes well beyond things that engineering thinks about. And it's easy for those of us with engineering backgrounds to be incredibly dismissive of that of, “Oh, indemnity? Like, when does AWS ever lose data?” “Yeah, but let's say one day they do. What is your story going to be when asked some very uncomfortable questions by people who wanted you to pay attention to this during the negotiation process?” It's about dotting the i's and crossing the t's, especially with that many commas in the contractual commitments.Micheal: No, it is true. And we did evaluate that as an option, but one of the interesting things about compliance, and especially auditing as well, we generally work with the best in class consultants to help us work through the controls and how we audit, how we look at these controls, how to make sure there's enough accountability going through. The interesting part was in this case, as well, we were able to work with AWS in crafting a lot of those controls and setting up the right expectations as and when we were putting proposals together as well. Now, again, I'm not an expert on this and I know we have a dedicated team from our technical program management organization focused on this, but early on we realized that, to your point, the cost of any form of backups and then being able to audit what's going in, look at all those pipelines, how quickly we can get the data in and out it was proving pretty costly for us. 
So, we were able to work out some of that within the constructs of what we have with our cloud provider today, and still meet our compliance goals.

Corey: That's, on some level, the higher point, too, where everything comes down to context; everything comes down to what the business demands, what the business requires, what the business will accept. And I'm not suggesting that in any case, they're wrong. I'm known for beating the ‘Multi-cloud is a bad default decision’ drum, and then people get surprised when they'll have one-on-one conversations, and they say, “Well, we're multi-cloud. Do you think we're foolish?” “No. You're probably doing the right thing, just because you have context that is specific to your business that I, speaking in a general sense, certainly don't have.”

People don't generally wake up in the morning and decide they're going to do a terrible job or no job at all at work today, unless they're Facebook's VP of Integrity. So, it's not the sort of thing that lends itself to casual, tweet-size, pithy analysis very often. There's a strong dive into what is the level of risk a business can accept? And my general belief is that most companies are doing this stuff right. The universal constant in all of my consulting clients that I have spoken to about the in-depth management piece of things is, they've always asked the same question of, “So, this is what we've done, but can you introduce us to the people who are doing it really right, who have absolutely nailed this and gotten it all down?” It's, yeah, absolutely no one believes that that is them, even the folks who are, from my perspective, pretty close to having achieved it.

But I want to talk a bit more about what you do beyond just the headline-grabbing large dollar figure commitment to a cloud provider story. What does engineering productivity mean at Pinterest? Where do you start?
Where do you stop?

Micheal: I want to just quickly touch upon that last point about multi-cloud, and like you said, every company works within the context of what they are given and the constraints of their business. It's probably a good time to give a plug to my previous employer, Twitter, who are doing multi-cloud in a reasonably effective way. They are on the data centers, they do have presence on Google Cloud, and AWS, and I know probably things have changed since a couple of years now, but they have embraced that environment pretty effectively to cater to their acquisitions who were on the public cloud, help obviously, with their initial set of investments in the data center, and still continue to scale that out, and explore, in this case, Google Cloud for a variety of other use cases, which sounds like it's been extremely beneficial as well.

So, to your point, there is probably no right way to do this. There's always that context, and what you're working with comes into play as part of making these decisions. And it's important to take a lot of these with a grain of salt because you can never understand the decisions, why they were made the way they were made. And for what it's worth, it sort of works out in the end. [laugh]. I've rarely heard a story where it's never worked out, and people are just upset with the deals they've signed. So, hopefully, that helps close that whole conversation about multi-cloud.

Corey: I hope so. It's one of those areas where everyone has an opinion and a lot of them do not necessarily apply universally, but it's always fun to take—in that case, great, I'll take the lesser trod path of everyone's saying multi-cloud is great, invariably because they're trying to sell you something. Yeah, I have nothing particularly to sell, folks. My argument has always been, in the absence of a compelling reason not to, pick a provider and go all in.
I don't care which provider you pick—which people are sometimes surprised to hear.

It's like, “Well, what if they pick a cloud provider that you don't do consulting work for?” Yeah, it turns out, I don't actually need to win every AWS customer over to have a successful working business. Do what makes sense for you, folks. From my perspective, I want this industry to be better. I don't want to sit here and just drum up business for myself and make self-serving comments to empower that. Which apparently is a rare tactic.

Micheal: No, that's totally true, Corey. One of the things you do is help people with their bills, so this has come up so many times, and I realize we're sort of going off track a bit from that engineering productivity discussion—

Corey: Oh, which is fine. That's this entire show's theme, if it has one.

Micheal: [laugh]. So, I want to briefly just talk about the whole billing and how cost management works because I know you spend a lot of time on that and you help a lot of these companies be effective in how they manage their bills. These questions have come up multiple times, even at Pinterest. We actually in the past, when I was leading the infrastructure governance organization, we were working with other companies of our similar size to better understand how they are looking into getting visibility into their cost, setting sort of the right controls and expectations within the engineering organization to plan, and capacity plan, and effectively meet those plans in a certain criteria, and then obviously, if there is any risk to that, actively manage risk. That was like the biggest thing those teams used to do.

And we used to talk a lot, trade notes, and get a better sense of how a lot of these companies are trying to do—for example, Netflix, or Lyft, or Stripe. I recall Netflix, content was their biggest spender, so cloud spending was like way down in the list of things for them. [laugh].
But regardless, they had an active team looking at this on a day-to-day basis. So, one of the things we learned early on at Pinterest is: start investing in those visibility tools early on.

No one can parse the cloud bills. Let's be honest. You're probably the only person who can reverse… [laugh] engineer an architecture diagram from a cloud bill, and I think that's like—definitely you should take a patent for that or something. But in reality, no one has the time to do that. You want to make sure your business leaders, from your finance teams to engineering teams to head of the executives all have a better understanding of how to parse it.

So, investing engineering resources, take that data, how do you munch it down to the cost, the utilization across the different vectors of offerings, and have a very insightful discussion. Like, what are certain action items we want to be taking? It's very easy to see, “Oh, we overspent on EC2,” and we want to go from there. But in reality, that's not just that thing; you will start finding out that EC2 is being used by your Hadoop infrastructure, which runs hundreds of thousands of jobs. Okay, now who's actually responsible for that cost? You might find that one job which is accruing, sort of, a lot of instance hours over a period of time in a shared multi-tenant environment, how do you attribute that cost to that particular cost center?

Corey: And then someone left the company a while back, and that job just kept running in perpetuity. No one's checked the output for four years, I guess it can't be that necessarily important. And digging into it requires context. It turns out, there's no SaaS tool to do this, which is unfortunate for those of us who set out originally to build such a thing.
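[Editor's sketch] The attribution question Micheal raises above, how to charge a shared multi-tenant cluster back to cost centers, can be illustrated with a simple proportional split by instance-hours. This is only one possible approach and not something the speakers describe using; the function name, the owner labels, and the dollar figures are all hypothetical.

```python
def attribute_shared_cost(total_cost_usd, instance_hours_by_owner):
    """Split a shared cluster's cost across owners in proportion to instance-hours.

    `instance_hours_by_owner` maps a cost-center label to the instance-hours
    its jobs consumed on the shared cluster (hypothetical input shape).
    """
    total_hours = sum(instance_hours_by_owner.values())
    if total_hours <= 0:
        raise ValueError("no instance-hours recorded; nothing to attribute")
    return {
        owner: total_cost_usd * hours / total_hours
        for owner, hours in instance_hours_by_owner.items()
    }


# Illustrative numbers only: a $1,000 cluster bill shared by three owners.
# Jobs whose author has left the company are bucketed as "unowned" so the
# spend stays visible instead of disappearing into the total.
shares = attribute_shared_cost(
    1000.0, {"ads": 300, "search": 100, "unowned": 100}
)
print(shares)
```

Keeping an explicit "unowned" bucket is a design choice in this sketch: it surfaces the long-running orphaned job that the conversation turns to next.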
But we discovered pretty early on the context on this stuff is incredibly important.

I love the thing you're talking about here, where you're discussing with your peer companies about these things because the advice that I would give to companies with the level of spend that you folks do is worlds apart from what I would advise someone who's building something new and spending maybe 500 bucks a month on their cloud bill. Those folks do not need to hire a dedicated team of people to solve for these problems. At your scale, yeah, you probably should have had some people in [laugh] here looking at this for a while now. And at some point, the guidance changes based upon scale. And if there's one thing that we discover from the horrible pages of Hacker News, it's that people love applying bits of wisdom that they hear in wildly inappropriate situations.

How do you think about these things at that scale? Because, a simple example: right now I spend about 1000 bucks a month at The Duckbill Group, on our AWS bill. I know. We have one, too. Imagine that. And if I wind up just committing admin credentials to GitHub, for example, and someone compromises that and starts spinning things up to mine all the Bitcoin, yeah, I'm going to notice that by the impact it has on the bill, which will be noticeable from orbit.

At the level of spend that you folks are at, a company would be hard-pressed to spin up enough Bitcoin miners to materially move the billing needle on a month-to-month basis, just because of the sheer scope and scale. At small bill volumes, yeah, it's pretty easy to discover the thing that's spiking your bill to three times normal. It's usually a managed NAT gateway. At your scale, tripling the bill begins to look suspiciously like the GDP of a small country, so what actually happened here? Invariably, at that scale, with that level of massive multiplier, it's usually the simplest solution, an error somewhere in the AWS billing system. Yes, they exist.
Imagine that.

Micheal: They do exist, and we've encountered that.

Corey: Kind of heartstopping, isn't it?

Micheal: [laugh]. I don't know if you remember when we had the big Spectre and the Meltdown, right, and those were interesting scenarios for us because we had identified a lot of those issues early on, given the scale we operate, and we were able to, sort of, obviously it did have an impact on the builds and everything, but that's it; that's why you have these dedicated teams to fix that. But I think one of the points you made, these are large bills and you're never going to have a 3x jump the next day. We're not going to be seeing that. And if that happens, you know, God save us. [laugh].

But to your point, one of the things we do still want to be doing is look at trends, literally on a week-over-week basis because even a one percentage move is a pretty significant amount, if you think about it, which could be funding some other aspects of the business, which we would prefer to be investing on. So, we do want to have enough rigor and controls in place in our technical stack to identify and alert when something is off track. And it becomes challenging when you start using those higher-order services from your public cloud provider because there's no clear insights on how do you, kind of, parse that information. One of the biggest challenges we had at Pinterest was tying ownership to all these things.

No, using tags is not going to cut it. It was so difficult for us to get to a point where we could put some sense of ownership in all the things and the resources people are using, and then subsequently have the right conversation with our ads infrastructure teams, or our product teams to help drive the cost improvements we want to be seeing. And I wouldn't be surprised if that's not a challenge already, even for the smaller companies who have bills in the tune of tens of thousands, right?

Corey: It is.
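[Editor's sketch] The week-over-week trend check Micheal describes above, where even a one percent move is worth flagging, might look roughly like the following. It is a minimal illustration under stated assumptions, not Pinterest's tooling: the `WeeklySpend` record, the zero-padded ISO week labels, and the 1% default threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklySpend:
    week: str        # zero-padded ISO week label such as "2021-W40" (assumed format)
    service: str     # e.g. "EC2"; any grouping key would work
    cost_usd: float


def week_over_week_alerts(rows, threshold_pct=1.0):
    """Return (service, week, pct_change) for each week whose spend moved
    more than `threshold_pct` percent relative to the previous week."""
    by_service = {}
    for row in rows:
        by_service.setdefault(row.service, []).append(row)

    alerts = []
    for service, series in by_service.items():
        # Zero-padded ISO week labels sort chronologically as plain strings.
        series.sort(key=lambda r: r.week)
        for prev, cur in zip(series, series[1:]):
            if prev.cost_usd == 0:
                continue  # no baseline to compare against
            pct = (cur.cost_usd - prev.cost_usd) / prev.cost_usd * 100
            if abs(pct) > threshold_pct:
                alerts.append((service, cur.week, round(pct, 2)))
    return alerts


# Illustrative numbers only.
rows = [
    WeeklySpend("2021-W39", "EC2", 1000.0),
    WeeklySpend("2021-W40", "EC2", 1020.0),
    WeeklySpend("2021-W39", "S3", 500.0),
    WeeklySpend("2021-W40", "S3", 502.0),
]
print(week_over_week_alerts(rows))
```

In this example only the EC2 series crosses the threshold; the S3 change of 0.4% stays below it. A real system would also need the ownership mapping that the conversation goes on to describe as the hard part.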
It's predicting the spend and trying to categorize it appropriately; that's the root of all AWS bill panic on the corporate level. It's not that the bill is 20% higher, so we're going to go broke. Most companies spend far more on payroll than they do on infrastructure—as you mentioned with Netflix, content is a significantly larger [laugh] expense than any of those things; real estate, it's usually right up there too—but instead it's, when you're trying to do business forecasting of, okay, if we're going to have an additional 1000 monthly active users, what will the cost for us be to service those users and, okay, if we're seeing a sudden 20% variance, if that's the new normal, then well, that does change our cost projections for a number of years, what happens? When you're public, there starts to become the question of okay, do we have to restate earnings or what's the deal here?

And of course, all this sidesteps past the unfortunate reality that, for many companies, the AWS bill is not a function of how many customers you have; it's how many engineers you hired. And that is always the way it winds up playing out for some reason. It's, “Why did we see a 10% increase in the bill?” “Yeah, we hired another data science team. Oops.” It always seems to be the data science folks; I know I beat up on those folks a fair bit, and my apologies. And one day, if they analyze enough of the data, they might figure out why.

Micheal: So, this is where I want to give a shout out to our data science team, especially some of the engineers working in the Infrastructure Governance Team putting these charts together, helping us derive insights. So, definitely props to them.

I think there's a great segue into the point you made. As you add more engineers, what is the impact on the bottom line? And this is one of the things actually as part of engineering productivity, we think about as well on a long-term basis.
Pinterest does have over 1000-plus engineers today, and to a large degree, many of them actually have their own EC2 instances today. And I wouldn't say it's a significant amount of cost, but it is a large enough number, where shutting down a c5.9xl can actually fund a bunch of conference tickets or something else.

And then you can imagine that sort of the scale you start working with at one point. The nuance here is though, you want to make sure there's enough flexibility for these engineers to do their local development in a sustainable way, but when moving to, say production, we really want to tighten the flexibility a bit so they don't end up doing what you just said, spin up a bunch of machines talking to the API directly which no one will be aware of.

I want to share a small anecdote because back in the day, this was probably four years ago, when we were doing some analysis on our bills, we realized that there was a huge jump every—I believe Wednesday—in our EC2 instances by almost a factor of, like, 500 to 600 instances. And we're like, “Why is this happening? What is going on?” And we found out there was an obscure job written by someone who had left the company, calling an EC2 API to spin up a search cluster of 500 machines on-demand, as part of pulling that ETL data together, and then shutting that cluster down. Which at times didn't work as expected because, you know, obviously, your Hadoop jobs are very predictable, right?

So, those are the things we were dealing with back in the day, and you want to make sure—since then—this is where engineering productivity as a team starts coming in that our job is to enable every engineer to be doing their best work across code building and deploying the services. And we have done this.

Corey: Right. You and I can sit here and have an in-depth conversation about the intricacies of AWS billing in a bunch of different ways because in different ways we both specialize in it, in many respects.
But let's say that Pinterest theoretically was foolish enough to hire me before I got into this space as an engineer, for terrifying reasons. And great. I start day one as a typical software developer if such a thing could be said to exist. How do you effectively build guardrails in so that I don't inadvertently wind up spinning up all the EC2 instances available to me within an account, which it turns out are more than one might expect sometimes, but still leave me free to do my job without effectively spending a nine-month safari figuring out how AWS bills work?

Micheal: And this is why teams like ours exist, to help provide those tools to help you get started. So today, we actually don't let anyone directly use AWS APIs, or even use the UI for that matter. And I think you'll soon realize, the moment you hit, like, probably 30 or 40 people in your organization, you definitely want to lock it down. You don't want that access to be given to anyone or everyone. And then subsequently start building some higher-order tools or abstraction so people can start using that to control effectively.

In this case, if you're a new engineer, Corey, which it seems like you were, at some point—

Corey: I still write code like I am, don't worry.

Micheal: [laugh]. So yes, you would get access to our internal tool to actually help spin up what we call a dev app, where you get a chance to, obviously, choose the instance size, not the instance type itself, and we have actually constrained the instance types we have approved within Pinterest as well. We don't give you the entire list you get a chance to choose and deploy to. We have actually constrained, based on the workload types, what are the instance types we want to support because in the future, if we ever want to move from c3 to c5—and I've been there, trust me—it is not an easy thing to do, so you want to make sure that you're not letting people just use random instances, and constrain that by building some of these tools.
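[Editor's sketch] The guardrail Micheal describes above, where an engineer picks a size while the platform team controls the approved instance family per workload, could be modelled along these lines. This is an illustration only and not Pinterest's internal tool: the workload names, the size labels, and the family mapping are hypothetical, although the resulting strings are real EC2 instance type names.

```python
# Hypothetical allowlist: one approved instance family per workload type.
# Moving a workload from one generation to the next means changing a single
# entry here rather than chasing every team's hand-picked instance type.
APPROVED_FAMILY_BY_WORKLOAD = {
    "dev": "c5",
    "batch": "r5",
}

# Engineers choose a coarse size; the tool maps it to a concrete suffix.
SUFFIX_BY_SIZE = {
    "small": "large",
    "medium": "2xlarge",
    "large": "9xlarge",
}


def resolve_instance_type(workload, size):
    """Translate (workload, size) into an approved EC2 instance type name."""
    try:
        family = APPROVED_FAMILY_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"workload {workload!r} has no approved instance family")
    try:
        suffix = SUFFIX_BY_SIZE[size]
    except KeyError:
        raise ValueError(f"size {size!r} is not offered")
    return f"{family}.{suffix}"


print(resolve_instance_type("dev", "large"))
```

The point of the indirection is the one made in the conversation: because nobody requests an instance type directly, a later family migration is a platform-side change instead of an organization-wide one.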
As a new engineer, you would go in, you'd use the tool, and actually have a dev app provisioned for you with our Pinterest image to get you started.

And then subsequently, we'll obviously shut it down if we see you not using it over a certain amount of time, but those are sort of the guardrails we've put in over there so you never get a chance to directly ever use the EC2 APIs, or any of those AWS APIs to do certain things. The similar thing applies for S3 or any of the higher-order tools which AWS will provide, too.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security.

And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build.

With Always Free you can do things like run small scale applications, or do proof of concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.

Corey: How does that interplay with AWS launches yet another way to run containers, for example, and that becomes a valuable potential avenue to get some business value for a developer, but the platform you built doesn't necessarily embrace that capability?
Or they release a feature to an existing tool that you use that could potentially be a just feature capability story, much more so than a cost savings one. How do you keep track of all of that and empower people to use those things so they're not effectively trying to reimplement DynamoDB on top of EC2?

Micheal: That's been a challenge, actually, in the past for us because we've always been very flexible where engineers have had an opportunity to write their own solutions many a time rather than leveraging the AWS services, and of late, that's one of the reasons why we have an infrastructure organization—an extremely lean organization for what it's worth—but then still able to achieve outsized outputs. Where we evaluate a lot of these use cases, as they come in and open up different aspects of what we want to provide say directly from AWS, or build certain abstractions on top of it. Every time we talk about containers, obviously, we always associate that with something like Kubernetes and offerings from there on; we realized that our engineers directly never ask for those capabilities. They don't come in and say, “I need a new container orchestration system. Give that to me, and I'm going to be extremely productive.”

What people actually realize is that if you can provide them effective tools and that can help them get their job done, they would be happy with it. For example, like I said, our deployment system, which is actually an open-source system called Teletraan. That is the bread and butter at Pinterest which my team runs. We operate 100,000-plus machines. We have actually looked into container orchestration where we do have a dedicated Kubernetes team looking at it and helping certain use cases move there, but we realized that the cost of entire migrations need to be evaluated against certain use cases which can benefit from being on Kubernetes from day one. You don't want to force anyone to move there, but give them the right incentives to move there.
Case in point, let's upgrade your OS. Because if you're managing machines, obviously everyone loves to upgrade their OSes.

Corey: Well, it's one of the things I love about savings plans versus RIs; you talk about the c3 to c5 migration and everyone has a story about one of those, but the most foolish or frustrating reason that I ever saw not to do the upgrade was, “We bought a bunch of Reserved Instances on the C3s and those have a year-and-a-half left to run.” And it's foolish not on the part of customers—it's economically sound—but on the part of AWS where great, you're now forcing me to take a contractual commitment to something that serves me less effectively, rather than getting out of the way and letting me do my job. That's why it's so important to me at least, that savings plans cover Fargate and Lambda, I wish they covered SageMaker instead of SageMaker having its own thing because once again, you're now architecturally constrained based upon some ridiculous economic model that they have imposed on us. But that's a separate rant for another time.

Micheal: No, we actually went through that process because we do have a healthy balance of how we do Reserved Instances and how we look at on-demand. We've never been big users of spot in the past because just the spot market itself, we realized that putting that pressure on our customers to figure out how to manage that is way more. When I say customers, in this case, engineers within the organization.

Corey: Oh, yes. “I want to post some pictures on Pinterest, so now I have to understand the spot market. What?” Yeah.

Micheal: [laugh]. So, in this case, even when we were moving from C3 to C5—and this is where the partnership really plays out effectively, right, because it's also in the best interest of AWS to deprecate their aging hardware to support some of these new ones where they could also be making good enough premium margins for what it's worth and give the benefit back to the user.
So, in this case, we were able to work out an extremely flexible way of moving to a C5 as soon as possible, get help from them, actually, in helping us do that, too, allocating capacity and working with them on capacity management. I believe at one point, we were actually one of the largest companies with a C3 footprint and it took quite a while for us to move to C5. But rest assured, once we moved, the savings was just immense. We were able to offset any of those RIs and we were able to work behind the scenes to get that out. But obviously, not a lot of that is considered in a small-scale company just because of, like you said, those constraints which have been placed in a contractual obligation.

Corey: Well, this is an area in which I will give the same guidance to companies of your scale as well as small-scale companies. And by small-scale, I mean, people on the free tier account, give or take, so I do mean the smallest of the small. Whenever you wind up in a scenario where you find yourself architecturally constrained by an economic barrier like this, reach out to your account manager. I promise you have one. Every account, even the tiny free tier accounts, have an account manager.

I have an account manager, who I have to say has probably one of the most surreal jobs at AWS, just based upon the conversations I throw past him. But it's reaching out to your provider rather than trying to solve a lot of this stuff yourself by constraining how you're building things internally is always the right first move because the worst case is you don't get anywhere in those conversations. Okay, but at least you explored that, as opposed to what often happens is, “Oh, yeah. I have a switch over here I can flip and solve your entire problem. Does that help anything?”

Micheal: Yeah.

Corey: You feel foolish finding that out only after nine months of dedicated work, it turns out.

Micheal: Which makes me wonder, Corey.
I mean, do you see a lot of that happening where folks don't tend to reach out to their account managers, or rather treat them as partners in this case, right? Because it sounds like there is this unhealthy tension, I would say, as to what is the best help you could be getting from your account managers in this case.

Corey: Constantly. And the challenge comes from a few things, in my experience. The first is that the quality of account managers and the technical account managers—the folks who are embedded in many cases with your engineering teams in different ways—does vary. AWS is scaling wildly and bursting at the seams, and people are hard to scale.

So, some are fantastic, some are decidedly less so, and most folks fall somewhere in the middle of that bell curve. And it doesn't take too many poor experiences for the default to be, “Oh, those people are useless. They never do anything we want, so why bother asking them?” And that leads to an unhealthy dynamic where a lot of companies will wind up treating their AWS account manager types as a ticket triage system, or the last resort of places that they'll turn when they should be involved in earlier conversations.

I mean, take Pinterest as an example of this. I'm not sure how many technical account managers you have assigned to your account, but I'm going to go out on a limb and guess that the ratio of technical account managers to engineers working on the environment is incredibly lopsided. It's got to be a high ratio just because of the nature of how these things work. So, there are a lot of people who are actively working on things that would almost certainly benefit from a more holistic conversation with your AWS account team, but it doesn't occur to them to do it just because of either perceived biases around levels of competence, or poor experiences in the past, or simply not knowing the capabilities that are there.
If I could tell one story around the AWS account management story, it would be talk to folks sooner about these things.

And to be clear, Pinterest has this less than other folks, but AWS does themselves no favors by having a product strategy of, “Yes,” because very often in service of those conversations with a number of companies, there is the very real concern of are they doing research so that they can launch a service that competes with us? Amazon as a whole launching a social network is admittedly one of the most hilarious ideas I [laugh] can come up with and I hope they take a whack at it just to watch them learn all these lessons themselves, but that is again, neither here nor there.

Micheal: That story is very interesting, and I think you mentioned one thing; it's just that lack of trust, or even knowing what the account managers can actually do for you. There seems to be just a lack of education on that. And we also found it the hard way, right? I wouldn't say that Pinterest figured this out on day one. We evolved sort of a relationship over time. Yes, our time… engagements are, sort of, lopsided, but we were able to negotiate that as part of deals as we learned a bit more on what we can and we cannot do, and how these individuals are beneficial for Pinterest as well. And—

Corey: Well, here's a question for you, without naming names—and this might illustrate part of the challenge customers have—how long has your account manager—not the technical account managers, but your account manager—been assigned to your account?

Micheal: I've been at Pinterest for five years and I've been working with the same person. And he's amazing.

Corey: Which is incredibly atypical. At a lot of smaller companies, it feels like, “Oh, I'm your account manager being introduced to you.” And, “Are you the third one this year?
Great.” What happens is that if the account manager excels, very often they get promoted and work with a smaller number of accounts at larger spend, and whereas if they don't find that AWS is a great place for them for a variety of reasons, they go somewhere else and need to be backfilled.

So, at the smaller account, it's, “Great. I've had more account managers in a year than you've had in five.” And that is often the experience when you start seeing significant levels of rotation, especially on the customer engineering side where you wind up with you have this big kickoff, and everyone's aware of all the capabilities and you look at it three years later, and not a single person who was in that kickoff is still involved with the account on either side, and it's just sort of been evolving evolutionarily from there. One thing that we've done in some of our larger accounts as part of our negotiation process is when we see that the bridges have been so thoroughly burned, we will effectively request a full account team cycle, just because it's time to get new faces in where the customer, in many cases unreasonably, is not going to say, “Yeah but a year-and-a-half ago you did this terrible thing and we're still salty about it.” Fine, whatever. I get it. People relationships are hard. Let's go ahead and swap some folks out so that there are new faces with new perspectives because that helps.

Micheal: Well, first off, if you had so many switches in account manager, I think that's something that speaks about [laugh] how you've been working, too. I'm just kidding. There are a bu—

Corey: Entirely possible. In seriousness, yes. But if you talk to—like, this is not just me because in my case, yeah, I feel like my account manager is whoever drew the short straw that week because frankly, yeah, that does seem like a great punishment to wind up passing out to someone who is underperforming.
But for a lot of folks who are in the mid-tier, like, spending $50 to $100,000 a month, this is a very common story.

Micheal: Yeah. Actually, we've heard a bit about this, too. And like you said, I think maintaining context is the most important thing. You really want your account manager to vouch for you, really be your champion in those meetings because AWS, like you said is so large, getting those exec time, and reviews, and there's so many things that happen, your account manager is the champion for you, or right there. And it's important and in fact in your best interest to have a great relationship with them as well, not treat them as, oh yet another vendor.

And I think that's where things start to get a bit messy because when you start treating them as yet another vendor, there is no incentive for them to do the best for you, too. You know, people relationships are hard. But that said though, I think given the amount of customers like these cloud companies are accruing, I wouldn't be surprised; every account manager seems to be extremely burdened. Even in our case, although I've been having a chance to work with this one person for a long time, we've actually expanded. We have now multiple account managers helping us out as we've started scaling to use certain aspects of AWS which we've never explored before.

We were a bit constrained and reserved about what service we want to use because there have been instances where we have tried using something and we have hit the wall pretty immediately. API rate limits, or it's not ready for primetime, and we're like, “Oh, my God. Now, what do we do?” So, we have been a bit more cautious.
But that said, over time, having an account manager who understands how you work, what scale you have, they're able to advocate with the internal engineering teams within the cloud provider to make the best of supporting you as a customer and tell that success story all the way out.
So yeah, I can totally understand how this may be hard, especially for those small companies. For what it's worth, I think the best way to really think about it is not treat them as your vendor, but really go out on a limb there. Even though you signed a deal with them, you want to make sure that you have the continuing relationship with them to have—represent your voice better within the company. Which is probably hard. [laugh].
Corey: That's always the hard part. Honestly, if this were the sort of thing that were easy to automate, or you could wind up building out something that winds up helping companies figure out how to solve these things programmatically, talk about interesting business problems that are only going to get larger in the fullness of time. This is not going away, even if AWS stopped signing up new customers entirely right now, they would still have years of growth ahead of them just from organic growth. And take a company with the scale of Pinterest and just think of how many years it would take to do a full-on exodus, even if it became priority number one. It's not realistic in many cases, which is why I've never been a big fan of multi-cloud as an approach for negotiation. Yeah, AWS has more data on those points than any of us do; they're not worried about it. It just makes you sound like an unsophisticated negotiator. Pick your poison and lean in.
Micheal: That is the truth you just mentioned, and I probably want to give a call out to our head of infrastructure, [Coburn 00:42:13]. He's also my boss, and he had brought this perspective as well. 
As part of any negotiation discussions, like you just said, AWS has way more data points on this than what we think we can do in terms of talking about, “Oh, we are exploring this other cloud provider.” And it's—they would be like, “Yeah. Do tell me more [laugh] how that's going.”
And it's probably in the best interest to never use that as a negotiation tactic because they clearly know the investments that's going to build on what you've done, so you might as well be talking more—again, this is where that relationship really plays together because you want both of them to be successful. And it's in their best interest to still keep you happy because the good thing about at least companies of our size is that we're probably, like, one phone call away from some of their executive team, where we could always talk about what didn't work for us. And I know not everyone has that opportunity, but I'm really hoping and I know at least with some of the interactions we've had with the AWS teams, they're actively working and building that relationship more and more, giving access to those customer advisory boards, and all of them to have those direct calls with the executives. I don't know whether you've seen that in your experience in helping some of these companies?
Corey: I have a different approach to it. It turns out when you're super loud and public and noisy about AWS and spend too much time in Seattle, you start to spend time with those people on a social basis. Because, again, I'm obnoxious and annoying to a lot of AWS folks, but I also have an obnoxious habit of being right about most of the things I'm pointing out. And that becomes harder and harder to ignore. 
I mean, part of the value that I found in being able to do this as a consultant is that I begin to compare and contrast different customer environments on a consistent ongoing basis.
I mean, the reason that negotiation works well from my perspective is that AWS does a bunch of these every week, and customers do these every few years with AWS. And well, we do an awful lot of them, too, and it's okay, we've seen different ways things can get structured and it doesn't take too long and too many engagements before you start to see the points of commonality in how these things flow together. So, when we wind up seeing things that a customer is planning on architecturally and looking to do in the future, and, “Well, wait a minute. Have you talked to the folks negotiating the contract about this? Because that does potentially have bearing and it provides better data than what AWS is gathering just through looking at overall spend trends. So yeah, bring that up. That is absolutely going to impact the type of offer you get.”
It just comes down to understanding the motivators that drive folks and it comes down to, I think understanding the incentives. I will say that across the board, I have never yet seen a deal from AWS come through where it was, “Okay, at this point you're just trying to hoodwink the customer and get them to sign on something that doesn't help them.” I've seen mistakes that can definitely lead to that impression, and I've seen areas where their data is incomplete and they're making assumptions that are not borne out in reality. But it's not one of those bad faith type—
Micheal: Yeah.
Corey: —of negotiations. If it were, I would be framing a lot of this very differently. It sounds weird to say, “Yeah, your vendor is not trying to screw you over in this sense,” because look at the entire IT industry. How often has that been true about almost any other vendor in the fullness of time? 
This is something a bit different, and I still think we're trying to grapple with the repercussions of that, from a negotiation standpoint and from a long-term business continuity standpoint, when your fate is linked—in a shared fate context—with your vendor.
Micheal: It's in their best interest as well because they're trying to build a diversified portfolio. Like, if they help 100 companies, even if one of them becomes the next Pinterest, that's great, right? And that continued relationship is what they're aiming for. So, assuming any bad faith over there probably is not going to be the best outcome, like you said. And two, it's not a zero-sum game.
I always get a sense that when you're doing these negotiations, it's an all-or-nothing deal. It's not. You have to think they're also running a business and it's important that you as your business, how okay are you with some of those premiums? You cannot get a discount on everything, you cannot get the deal or the numbers you probably want on almost everything. And to your point, architecturally, if you're moving in a certain direction where you think in the next three years, this is what your usage is going to be or it will come down to that, obviously, you should be investing more and negotiating that out front rather than managed NAT [laugh] gateways, I guess. So, I think that's also an important mindset to take in as part of any of these negotiations. Which I'm assuming—I don't know how you folks have been working in the past, but at least that's one of the key items we have taken in as part of any of these discussions.
Corey: I would agree wholeheartedly. I think that it just comes down to understanding where you're going, what's important, and again in some cases knowing around what things AWS will never bend contractually. I've seen companies spend six weeks or more trying to get to negotiate custom SLAs around services. 
Let me save everyone a bunch of time and money; they will not grant them to you.
Micheal: Yeah.
Corey: I promise. So, stop asking for them; you're not going to get them. There are other things they will negotiate on, but they're going to be highly case-dependent. I'm hesitant to mention any of them just because, “Well, wait a minute, we did that once. Why are you talking about that in public?” I don't want to hear it and confidentiality matters. But yeah, not everything is negotiable, but most things are, so figuring out what levers and knobs and dials you have is important.
Micheal: We also found it that way. AWS does cater to their—they are a platform and they are pretty clear in how much engagement—even if we are one of their top customers, there's been many times where I know their product managers have heavily pushed back on some of the requests we have put in. And that makes me wonder, they probably have the same engagement even with the smallest of customers, there's always an implicit assumption that the big fish is trying to get the most out of your public cloud providers. To your point, I don't think that's true. We're rarely able to negotiate anything exclusive in terms of their product offerings just for us, if that makes sense.
Case in point, tell us your capacity [laugh] for x instances or type of instances, so we as a company would know how to plan out our scale-ups or scale-downs. That's not going to happen exclusively for you. But those kinds of things are just, like, examples we have had a chance to work with their product managers and see if, can we get some flexibility on that? 
For what it's worth, though, they are willing to find a middle ground with you to make sure that you get your answers and, obviously, you're being successful in your plans to use certain technologies they offer or [unintelligible 00:48:31] how you use their services.
Corey: So, I know we've gone significantly over time and we are definitely going to do another episode talking about a lot of the other things that you're involved in because I'm going to assume that your full-time job is not worrying about the AWS bill. In fact, you do a fair number of things beyond that; I just get stuck on that one, given that it is what I eat, sleep, breathe, and dream about.
Micheal: Absolutely. I would love to talk more, especially about how we're enabling our engineers to be extremely productive in this new world, and how we want to cater to this whole cloud-native environment which is being created, and make sure people are doing their best work. But regardless, Corey, I mean, this has been an amazing, insightful chat, even for me. And I really appreciate you having me on the show.
Corey: No, thank you for joining me. If people want to learn more about what you're up to, and how you think about things, where can they find you? Because I'm also going to go out on a limb and assume you're also probably hiring, given that everyone seems to be these days.
Micheal: Well, that is true. And I wasn't planning to make a hiring pitch but I'm glad that you leaned into that one. Yes, we are hiring and you can find me on Twitter at twitter dot com slash M-I-C-H-E-A-L. I am spelled a bit differently, so make sure you can hit me up, and my DMs are open. And obviously, we have all our open roles listed on pinterestcareers.com as well.
Corey: And we will, of course, put links to that in the [show notes 00:49:45]. Thank you so much for taking the time to speak with me today. I really appreciate it.
Micheal: Thank you, Corey. 
It has really been great being on your show.
Corey: And I'm sure we'll do it again in the near future. Micheal Benedict, Head of Engineering Productivity at Pinterest. I am Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with a long rambling comment about exactly how many data centers Pinterest could build instead.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
In episode 22 of The Kubelist Podcast, Marc talks with Chandler Hoisington of AWS about EKS Anywhere. They explore the decisions made by the AWS team in building a Kubernetes distribution and how the project is helping make Kubernetes more ubiquitous.
Confluent Platform 7.0 has launched and includes Apache Kafka® 3.0, plus new features introduced by KIP-630: Kafka Raft Snapshot, KIP-745: Connect API to restart connector and task, and KIP-695: Further improve Kafka Streams timestamp synchronization. Reporting from Dubai, Tim Berglund (Senior Director, Developer Advocacy, Confluent) provides a summary of new features, updates, and improvements to the 7.0 release, including the ability to create a real-time bridge from on-premises environments to the cloud with Cluster Linking. Cluster Linking allows you to create a single cluster link between multiple environments from Confluent Platform to Confluent Cloud, which is available on public clouds like AWS, Google Cloud, and Microsoft Azure, removing the need for numerous point-to-point connections. Consumers reading from a topic in one environment can read from the same topic in a different environment without risks of reprocessing or missing critical messages. This provides operators the flexibility to make changes to topic replication smoothly and byte for byte without data loss. Additionally, Cluster Linking eliminates any need to deploy MirrorMaker2 for replication management while ensuring offsets are preserved. Furthermore, the release of Confluent for Kubernetes 2.2 allows you to build your own private cloud in Kafka. It completes the declarative API by adding cloud-native management of connectors, schemas, and cluster links to reduce the operational burden and manual processes so that you can instead focus on high-level declarations. Confluent for Kubernetes 2.2 also enhances elastic scaling through the Shrink API. As part of the ongoing work to remove ZooKeeper from Apache Kafka (ZooKeeper is still supported in Kafka 3.0, where KRaft is not yet production-ready), Confluent Platform 7.0 introduces KRaft in preview to make it easier to monitor and scale Kafka clusters to millions of partitions. 
There are also several ksqlDB enhancements in this release, including foreign-key table joins and the support of new data types (DATE and TIME) to account for time values that aren't TIMESTAMP. This results in consistent data ingestion from the source without having to convert data types.
EPISODE LINKS
- Download Confluent Platform 7.0
- Check out the release notes
- Read the Confluent Platform 7.0 blog post
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
Deploy containerized apps without managing complex infrastructure. Write code using your preferred programming language or framework, and build microservices with full support for Distributed Application Runtime (Dapr). Scale dynamically based on HTTP traffic or events powered by Kubernetes Event-Driven Autoscaling (KEDA). Follow Us Frank: Twitter, Blog, GitHub James: Twitter, Blog, GitHub Merge Conflict: Twitter, Facebook, Website, Chat on Discord Music : Amethyst Seer - Citrine by Adventureface ⭐⭐ Review Us (https://itunes.apple.com/us/podcast/merge-conflict/id1133064277?mt=2&ls=1) ⭐⭐ Machine transcription available on http://mergeconflict.fm
We speak to Julie Koesmarno about Jupyter Notebooks on Azure generally, and specifically about using them to help with Incident Response. We also cover security news about .NET 6.0, Azure Monitor, HDInsight, Azure Static Web Apps, Azure Key Vault, Kubernetes, Firewall, Sentinel, Ransomware, IoT Solutions and more!
[00:00:32] Andrew tells us they shipped a new project at work this week they've been working on for a few months, and although it went pretty smoothly, he explains some bumps they had along the way and dealing with crunch time. Chris shares an issue and why he's been postponing the launch of the new Hatchbox.
[00:04:13] We hear more about propagating the DNS and how long it took.
[00:08:28] Andrew mentions using the Proxyman app and what it does.
[00:09:15] Chris tells us about his new Mac, and he can't believe how fast it is!
[00:13:56] Andrew talks about some issues with installing Ruby 2.6.3 and building things in Docker on a new M1 Mac that a developer on his team just got.
[00:17:24] Chris explains his upgrading issues on an older app he was working on this week and realized it was a Sass change he made. Ironically, Andrew ran into something very similar with Sass as well.
[00:20:57] We hear about the Ember CLI Rails gem and Chris brings up that there is no solution on how to take an abandoned project like this and just keep maintaining it and he wishes there was a better solution.
[00:25:43] Andrew mentions every time you add a gem, you need to be aware of the amount of code debt you will have, and he shares what happened to him when he was a beginning developer. Chris explains why he would rather build it from scratch in the app to tailor it to exactly what they need.
[00:29:48] Chris announces a new GoRails Screencast coming up with Kasper and what they'll be talking about.
[00:35:25] Find out more about the awesome and very thorough tutorial on “Deploying a Rails application to Kubernetes” that you should check out!
[00:39:25] Chris and Andrew chat about the importance of being Rails Developers and not working on DevOps stuff.
Panelists:
- Chris Oliver
- Andrew Mason
Sponsor:
- Honeybadger
Links:
- Ruby Radar Newsletter
- Ruby Radar Twitter
- Proxyman
- GlassWire
- GoRails
- GoRails-YouTube
- Sass
- Deploying a Rails application to Kubernetes-By Marco Colli
- Ember CLI Rails-GitHub
- RubyConf 2021
This week we discuss HashiCorp's S1, AWS Earnings and highlights from Microsoft Ignite. Plus, Coté teaches us a new Dutch phrase. Rundown Cloud software vendor HashiCorp files for IPO as investors pour money into high-growth tech stocks (https://www.cnbc.com/2021/11/04/cloud-software-vendor-hashicorp-files-for-ipo.html) Coté's highlights (https://twitter.com/cote/status/1456344043433177091). Understanding the 2021 State of Open Source Report (https://tanzu.vmware.com/content/blog/state-of-open-source-report-highlights) Amazon Amazon badly misses on earnings and revenue, gives disappointing fourth-quarter guidance (https://www.cnbc.com/2021/10/28/amazon-amzn-earnings-q3-2021.html) Amazon Web Services tops analysts' estimates on profit and revenue (https://www.cnbc.com/2021/10/28/aws-earnings-q3-2021.html) Amazon Is The Flywheel, AWS Is The Cash Register (https://www.nextplatform.com/2021/10/29/amazon-is-the-flywheel-aws-is-the-cash-register/) A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline! 
(https://github.com/localstack/localstack) Your hybrid, multicloud, and edge strategy just got better with Azure (https://azure.microsoft.com/en-us/blog/your-hybrid-multicloud-and-edge-strategy-just-got-better-with-azure/) Compliance in a DevOps Culture (https://martinfowler.com/articles/devops-compliance.html) Relevant to your interests Abacus.ai snags $50M Series C as it expands into computer vision use cases (https://techcrunch.com/2021/10/27/abacus-ai-snags-50m-series-c-as-it-expands-into-computer-vision-use-cases/) NeuVector is excited to announce we are joining SUSE (https://www.suse.com/c/accelerating-security-innovation/) Monitor Your Azure Environment Using Amazon Managed Grafana (https://www.youtube.com/watch?v=e5z4ysfz_gA) Facebook's new name will be Meta (https://www.theverge.com/2021/10/28/22745234/facebook-new-name-meta-metaverse-zuckerberg-rebrand) New product: Raspberry Pi Zero 2 W on sale now at $15 - Raspberry Pi (https://www.raspberrypi.com/news/new-raspberry-pi-zero-2-w-2/) Universal Search & Productivity App | Command E (https://getcommande.com/?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axiosprorata&stream=top) Software services firm Zendesk to buy SurveyMonkey parent for nearly $4 bln (https://www.reuters.com/technology/software-services-firm-zendesk-buy-surveymonkey-parent-nearly-4-bln-2021-10-28/) Kalshi (https://kalshi.com/markets) Popular gaming platform Roblox back online after multi-day crash (https://www.marketwatch.com/story/popular-gaming-platform-roblox-suffers-multi-day-crash-01635713002) Dell spins off $64 billion VMware as it battles debt hangover (https://arstechnica.com/information-technology/2021/11/dell-spins-off-64-billion-vmware-as-it-battles-debt-hangover/) BMC Unveils New Data Management and Analytics Capabilities (https://thenewstack.io/bmc-helix-and-control-m-data-management-and-analytics/) Squid Game Cryptocurrency Scammers Make Off With $2.1 Million 
(https://gizmodo.com/squid-game-cryptocurrency-scammers-make-off-with-2-1-m-1847972824) AI programming tool Copilot helps write up to 30% of code on GitHub (https://www.axios.com/copilot-artificial-intelligence-coding-github-9a202f40-9af7-4786-9dcb-b678683b360f.html) Introducing the Free Java License (https://blogs.oracle.com/java/post/free-java-license) Backblaze's IPO a test for smaller tech concerns (https://techcrunch.com/2021/11/02/backblazes-ipo-a-test-for-smaller-tech-concerns/) Happy 1.0, Knative (https://off-by-one.dev/happy-1-0-knative/) A Return to the General Purpose Database (https://redmonk.com/sogrady/2021/10/26/general-purpose-database/) Microsoft Teams enters the metaverse race with 3D avatars and immersive meetings (https://www.theverge.com/e/22523015) Nat Friedman to step down as head of Microsoft's GitHub (https://www.zdnet.com/article/nat-friedman-to-step-down-as-head-of-microsofts-github/) Microsoft launches Google Wave (https://techcrunch.com/2021/11/02/microsoft-launches-google-wave/) Nonsense Apple's worst shipping delay is for a $19 polishing cloth — Engadget (https://apple.news/A5hFyYAq3RgG35nJT1AX6bA) Aussie++ (https://aussieplusplus.vercel.app/) Microsoft resurrects Clippy again after brutally killing him off in Microsoft Teams (https://www.theverge.com/2021/11/1/22756973/microsoft-clippy-microsoft-teams-stickers-return) Allbirds shares surge 60% in eco-friendly shoe maker's market debut (https://www.cnbc.com/2021/11/03/allbirds-ipo-bird-to-start-trading-on-the-nasdaq.html) Sponsors strongDM — Manage and audit remote access to infrastructure. Start your free 14-day trial today at strongdm.com/SDT (http://strongdm.com/SDT) CBT Nuggets — Training available for IT Pros anytime, anywhere. 
Start your 7-day Free Trial today at cbtnuggets.com/sdt (https://cbtnuggets.com/sdt) Conferences MongoDB.local London 2021 (https://events.mongodb.com/dotlocallondon) - November 9, 2021 Coté speaking at DevOops (https://devoops.ru/en/) (Russia), Nov 11th: “Kubernetes is not for developers…?” (https://devoops.ru/en/talks/kubernetes-is-not-for-developers/) THAT Conference comes to Texas January 17-20, 2022 (https://that.us/activities/call-for-counselors/tx/2022) Listener Feedback Mailed stickers to Stephan in Berlin. Brian wants you to work at Red Hat as a Senior Product Manager (https://us-redhat.icims.com/jobs/88701/senior-product-manager) or Principal Product Manager (https://us-redhat.icims.com/jobs/89053/principal-product-manager) in Security. SDT news & hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Send your postal address to email@example.com (mailto:firstname.lastname@example.org) and we will send you free laptop stickers! Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), LinkedIn (https://www.linkedin.com/company/software-defined-talk/) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Brandon built the Quick Concall iPhone App (https://itunes.apple.com/us/app/quick-concall/id1399948033?mt=823) and he wants you to buy it for $0.99. Use the code SDT to get $20 off Coté's book, (https://leanpub.com/digitalwtf/c/sdt) Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! 
Recommendations Brandon: Success Equation (http://success-equation.com) — The spiritual sequel to “The Halo Effect” Podcast Interview with Author (http://Michael> Mauboussin Master Class — Moats, Skill, Luck, Decision Making and a Whole Lot More | Acquired Podcast) YouTube Talk by Author (https://www.youtube.com/watch?v=1JLfqBsX5Lc) Paradox of Skill (https://research-doc.credit-suisse.com/docView?language=ENG&format=PDF&source_id=em&document_id=805456950&serialid=LsvBuE4wt3XNGE0V%2B3ec251NK9soTQqcMVQ9q2QuF2I%3D) Matt: The Art and Soul of Dune (Companion Book Music) (https://open.spotify.com/album/0FGr97xSOQLD596ZebfU1T?si=9rTrMK_wTiWZOtwiKfvZMA) Dune (the book) (https://amzn.to/3whLKHx) Coté: LaserWriter II (https://www.goodreads.com/en/book/show/56269270-laserwriter-ii). Also, check out my Tiny Tanzu Talk videos playlist (https://www.youtube.com/playlist?list=PLk_5VqpWEtiV6sJUlKx_4dse8U2tLjjn0) - 18 months of video madness. Also, I watch Frozen from three to ten times a day now with Dutch subtitles turned on. So, I'm trying to memorize “als een kip met het gezicht van een aap.” (https://translate.google.com/?sl=auto&tl=nl&text=like%20a%20chicken%20with%20the%20face%20of%20a%20monkey&op=translate&hl=en) Photo Credits Show Art (https://unsplash.com/photos/UMfGoM67w48) Hashicorp S1 Screenshot Show Art (https://twitter.com/cote/status/1456349608775491585re) Banner Header (https://unsplash.com/photos/dwBZLRPhHjc)
Each year a group of user researchers and the Go team get together and create a survey for the Go community. The results of the survey are analyzed and turned into a report made available to everyone in the Go community. In this episode we sit down with Alice Merrick and Todd Kulesza to discuss the survey, how it's made, and some of the interesting results from this year's survey.
In this TCP Talks episode, Justin Brodley and Jonathan Baker talk with Jonathan Heiliger, co-founder and partner at Vertex Ventures: an early-stage venture capital firm backing innovative technology entrepreneurs. Earlier in his career, at just 19, Jonathan co-founded web hosting provider GlobalCenter and served as CTO. He went on to hold engineering roles at Walmart and Danger, Inc., the latter of which was acquired by Microsoft. He was also Vice President of Infrastructure and Operations at Facebook (now Meta), and a general partner at North Bridge Ventures. The latter firm's portfolio included Quora, Periscope, and Lytro (which has been acquired by Google.) At Vertex Ventures, Jonathan has helped cutting-edge companies like LaunchDarkly and OpsLevel revolutionize the tech space with continuous delivery and IT service management solutions. Jonathan shares his insights into the shifting market of IT services and explains why decentralizing infrastructure management can help digitally native companies operate at a faster pace. According to Jonathan, the question of IT service infrastructure isn't being adequately addressed. Without properly defining service ownership, businesses looking to scale run the risk of siloing critical knowledge, and losing track of services networks. Jonathan also discusses his own experiences running infrastructure at Facebook (oops, Meta), the merits of both centralized and decentralized IT services management, and how he and his partners at Vertex Ventures approach new investments.
In this episode, we cover:
00:00:00 - Introduction
00:03:20 - VMware Tanzu
00:07:50 - Gustavo's Career in Security
00:12:00 - Early Days in Chaos Engineering
00:16:30 - Catzilla
00:19:45 - Expanding on SRE
00:26:40 - Learning from Customer Trends
00:29:30 - Chaos Engineering at VMware
00:36:00 - Outro
Links:
- Tanzu VMware: https://tanzu.vmware.com
- GitHub for SREDocs: https://github.com/google/sredocs
- E-book on how to start your incident lifecycle program: https://tanzu.vmware.com/content/ebooks/establishing-an-sre-based-incident-lifecycle-program
- Twitter: https://twitter.com/stratus
Transcript
Jason: Welcome to Break Things on Purpose, a podcast about chaos engineering and building reliable systems. In this episode, Gustavo Franco, a senior engineering manager at VMware joins us to talk about building reliability as a product feature, and the journey of chaos engineering from its place in the early days of Google's disaster recovery practices to the modern SRE movement. Thanks, everyone, for joining us for another episode. Today with us we have Gustavo Franco, who's a senior engineering manager at VMware. Gustavo, why don't you say hi, and tell us about yourself.
Gustavo: Thank you very much for having me. Gustavo Franco; as you were just mentioning, I'm a senior engineering manager now at VMware. So, recently co-founded the VMware Tanzu Reliability Engineering Organization with Megan Bigelow. It's been only a year, actually. And we've been doing quite a bit more than SRE; we can talk about like—we're kind of branching out beyond SRE, as well.
Jason: Yeah, that sounds interesting. For folks who don't know, I feel like I've seen VMware Tanzu around everywhere. It just suddenly went from nothing into this huge thing of, like, every single Kubernetes-related event, I feel like there's someone from VMware Tanzu on it. 
So, maybe as some background, give us some information; what is VMware Tanzu?
Gustavo: Kubernetes is sort of the engine, and we have a Kubernetes distribution called Tanzu Kubernetes Grid. So, one of my teams actually works on Tanzu Kubernetes Grid. So, what is VMware Tanzu? What this really is, is what we call a modern application platform, really an end-to-end solution. So, customers expect to buy not just Kubernetes, but everything around, everything that comes with giving the developers a platform to write code, to write applications, to write workloads.
So, it's basically the developer at a retail company or a finance company, they don't want to run Kubernetes clusters; they would like the ability to, maybe, but they don't necessarily think in terms of Kubernetes clusters. They want to think about workloads, applications. So, VMware Tanzu is an end-to-end solution where the engine in there is Kubernetes.
Jason: That definitely describes at least my perspective on Kubernetes is, I love running Kubernetes clusters, but at the end of the day, I don't want to have to evaluate every single CNCF project and all of the other tools that are required in order to actually maintain and operate a Kubernetes cluster.
Gustavo: I was just going to say, and we acquired Pivotal a couple of years ago, so that brought a ton of open-source projects, such as the Spring Framework. So, for Java developers, I think it's really cool, too, just being able to worry about development and the Java layer and a little bit of reliability, chaos engineering perspective. So, it kind of really gives me full tooling, the ability to use common libraries. It's so important for reliability engineering and chaos engineering as well, to give people this common surface that we can actually use to inject faults, potentially, or even just define standards.
Jason: Excellent point of having that common framework in order to do these reliability practices. So, you've explained what VMware Tanzu is. 
Tell me a bit more about how that fits in with VMware Tanzu?
Gustavo: Yeah, so one thing that happened the past few years, the SRE organization grew beyond SRE. We're doing quite a bit of horizontal work, so SRE being one of them. So, just an example, I got to charter a compliance engineering team and one team that we call ‘Customer Zero.' I would call them partially the representatives of growth, and then quote-unquote, “Customer problems, customer pain”, and things that we have to resolve across multiple teams. So, SRE is one function that clearly you can think of.
You cannot just think of SRE on a product basis, but you think of SRE across multiple products because we're building a platform with multiple pieces. So, it's kind of like putting the building blocks together for this platform. So then, of course, we're going to have to have a team of specialists, but we need an organization of generalists, so that's where SRE and this broader organization comes in.
Jason: Interesting. So, it's not just we're running a platform, we need our own SREs, but it sounds like it's more of a group that starts to think more about the product itself and maybe even works with customers to help their reliability needs?
Gustavo: Yeah, a hundred percent. We do have SRE teams that invest the majority of their time running SaaS, so running Software as a Service. So, one of them is the Tanzu Mission Control. It's purely SaaS, and what Tanzu Mission Control (TMC) does is allow the customers to run Kubernetes anywhere. So, if people have Kubernetes on-prem or they have Kubernetes on multiple public clouds, they can use TMC to be that common management surface, both API and web UI, across Kubernetes, really anywhere they have Kubernetes. So, that's SaaS.
But for TKG SRE, that's a different problem. We don't have currently a TKG SaaS offering, so customers are running TKG on-prem or on public cloud themselves. So, what does the TKG SRE team do? 
So, that's one team that actually [unintelligible 00:05:15] to me, and they are working directly on improving the reliability of the product. So, we build reliability as a feature of the product.
So, we build a reliability scanner, which is a [unintelligible 00:05:28] plugin. It's open-source. I can give you more examples, but that's the gist of it, of the idea that you would hire security engineers to improve the security of a product that you sell to customers to run themselves. Why wouldn't you hire SREs to do the same to improve the reliability of the product that customers are running themselves? So, kind of, SRE beyond SaaS, basically.
Jason: I love that idea because I feel like a lot of times in organizations that I talk with, SRE really has just been a renamed ops team. And so it's purely internal; it's purely thinking about we get software shipped to us from developers and it's our responsibility to just make that run reliably. And this sounds like it is that complete embrace of the DevOps model of breaking down silos and starting to move reliability, thinking of it from a developer perspective, a product perspective.
Gustavo: Yeah. A lot of my work is spent on making analogies with security, basically. One example, several of the SREs in my org, yeah, they do spend time doing PRs with product developers, but also they do spend a fair amount of time doing what we call in a separate project right now—we're just about to launch something new—a reliability risk assessment. And then you can see the parallels there. Where, like, security engineers would probably be doing a security risk assessment, looking into, like, what could go wrong from a security standpoint?
So, I do have a couple engineers working on reliability risk assessment, which is, what could go wrong from a reliability standpoint? What are the… known pitfalls of the architecture, the system design that we have? What does the architecture of the service look like?
And yeah, what are the outages that we know already that we could have? So, if you have a dependency on, say, a file on a CDN, yeah, what if the CDN fails?
It's obvious and I know most of the audience will be like, “Oh, this is obvious,” but, like, are you writing this down on a spreadsheet and trying to stack-rank those risks? And after you stack-rank them, are you then mitigating, going top-down, look for—there was an SREcon talk by [Matt Brown 00:07:32], a former colleague of mine at Google, it's basically, know your enemy tech talk in SREcon. He talks about this, like, how SRE needs to have a more conscious approach to reliability risk assessment. So, I really embraced that, and we embraced that at VMware. The SRE work that I do comes from a little bit of my beginnings or my initial background of working in security.
Jason: I didn't actually realize that you worked in security, but I was looking at your LinkedIn profile and you've got a long career doing some really amazing work. So, you said you were in security. I'm curious, tell us more about how your career has progressed. How did you get to where you are today?
Gustavo: Very first job, I was 16. There was this group of sysadmins on the first internet service provider in Brazil. One of them knew me from BBS, Bulletin Board Systems, and they, you know, were getting hacked, left and right. So, this guy referred me, and he referred me saying, “Look, it's this kid. He's 16, but he knows his way around this security stuff.”
So, I show up, they interview me. I remember one of the interview questions; it's pretty funny. They asked me, “Oh, what would you do if we asked you to go and actually physically grab the routing table from AT&T?” It's just, like, a silly question and I told them, “Uh, that's impossible.” So, I kind of told them the gist of what I knew about routing, and it was impossible to physically get a routing table.
For some reason, they loved that. I was the only candidate that could be telling them, “No.
I'm not going to do it because it makes no sense.” So, they hired me. And the security work was basically teaching the older sysadmins about SSH because they were all on telnet, nothing was encrypted.
There was no IDS—this was a long time ago, right, so the explosion of cybersecurity firms did not exist then, so it was new. To be, like, a security company was a new thing. So, that was the beginning. I did dabble in open-source development for a while. I had a couple other jobs on ISPs.
Google found me because of my dev and open-source work in '06, '07. I interviewed, joined Google, and then at Google, all of it is IC, basically, individual contributor. And at Google, I started doing SRE-type of work, but for the corporate systems. And there was this failed attempt to migrate from one Linux distribution to another—all the corporate systems—and I tech-led the effort making that successful. I don't think I should take the credit; it was really just a fact of, like, you know, trying the second time and kind of, learned—the organization learned the lessons that it had to learn from the first time. So, we did it a second time and it worked.
And then yeah, I kept going. I did more SRE work in corp, I did some stuff in production, like all the products. So, I did a ton of stuff. I did—let's see—technical infrastructure, disaster recovery testing, I started a chaos-engineering-focused team. I worked on Google Cloud before we had a name for it. [laugh].
So, I was the first SRE on Google Compute Engine and Google Cloud Storage. I managed Google Plus SRE team, and G Suite for a while. And finally, after doing all these runs on different teams, and developing new SRE teams and organizations, and different styles, different programs in SRE, Dave Rensin, who created the CRE team at Google, recruited me with Matt Brown, who was then the tech lead, to join the CRE team, which was the team at Google focused on teaching Google Cloud customers how to adopt SRE practices.
So, because I had this very broad experience within Google, they thought, yeah, it will be cool if you can share that experience with customers.
And then I acquired even more experience working with random customers trying to adopt SRE practices. So, I think I've seen a little bit of it all. VMware wanted me to start, basically, a CRE team following the same model that we had at Google, which culminated in all this TKG SRE work that I'm describing, like, we work to improve the reliability of the product and not just teaching the customer how to adopt SRE practices. And my pitch to the team was, you know, we can and should teach the customers, but we should also make sure that they have reasonable defaults, that they are providing a reasonable config. That's the gist of my experience, at a high level.
Jason: That's an amazing breadth of experience. And there's so many aspects that I feel like I want to dive into [laugh] that I'm not quite sure exactly where to start. But I think I'll start with the first one, and that's that you mentioned that you were on that initial team at Google that started doing chaos engineering. And so I'm wondering if you could share maybe one of your experiences from that. What sort of chaos engineering did you do? What did you learn? What were the experiments like?
Gustavo: So, a little bit of the backstory. This is probably because Kripa mentioned this several times before—and Kripa Krishnan, she actually initiated disaster recovery testing, way, way before there was such a thing as chaos engineering—that was 2006, 2007. That was around the time I was joining Google. So, Kripa was the first one to lead disaster recovery testing. It was very manual; it was basically a room full of project managers with Post-its, and asking teams to, like, “Hey, can you test your stuff? Can you test your processes? What if something goes wrong?
What if there's an earthquake in the Bay Area type of scenario?” So, that was the predecessor.
Many, many years later, I worked with her through my SRE teams participating in disaster recovery testing, but I was never a part of the team responsible for it. And then seven years later, I was. So, she recruited me with the following pitch, she was like, “Well, the program is big. We have disaster recovery tests, we have a lot of people testing, but we are struggling to convince people to test year-round. So, people tend to test once a year, and they don't test again. Which is bad. And also,” she was like, “I wish we had software; there's something missing.”
We had the spreadsheets, we track people, we track their tasks. So, it was still very manual. The team didn't have a tool for people to test. It was more like, “Tell me what you're going to test, and I will help you with scheduling, I'll help you to not conflict with the business and really disrupt something major, disrupt production, disrupt the customers, potentially.” A command center, like a center of operations.
That's what they did. I was like, “I know exactly what we need.” But then I surveyed what was out there in open source, and of course, like, Netflix, gets a lot of—deserves a lot of credit for it; there was nothing that could be applied to the way we're running infrastructure internally. And I also felt that if we built this centrally and we build a catalog of tasks ourselves, and that's it, people are not going to use it. We have a bunch of developers, software engineers.
They've got to feel like—they want to, they want to feel—and rightfully so—that they want control and they are in control, and they want to customize the system.
So, in two weeks, I hacked a prototype where it was almost like a workflow engine for chaos engineering tests, and I wrote two or three tests, but there was an API for people to bring their own test to the system, so they could register a new test and basically send me a patch to add their own tests. And, yeah, to my surprise, like, a year later—and the absolute number comparison is not really fair, but we had an order of magnitude more testing being done through the software than manual tests. So, on a per-unit basis, the quality of the automated tests was lower, but the cool thing was that people were testing a lot more often. And it was also very surprising to see the teams that were testing.
Because there were teams that refused to do the manual disaster recovery testing exercise, they were using the software now to test, and that was part of the regular integration test infrastructure. So, they're not quite starting with okay, we're going to test in production, but they were testing staging, they were testing a developer environment. And in staging, they had real data; they were finding regressions. I can mention the most popular testing, too, because I spoke about this publicly before, which was this fuzz testing. So, a lot of things are RPC, or RPC services, RPC servers.
Fuzz testing is really useful in the sense that, you know, if you send random data in an RPC call, will the server crash? Will the server handle this gracefully? So, we found a lot of people—not us—a lot of people used our shared service, bringing their own tests, and fuzz testing was very popular to run continuously. And they would find a ton of crashes.
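As a rough illustration of the fuzz testing described here, the sketch below sends random bytes to a request handler and counts unhandled exceptions. It is not the internal tool mentioned in the conversation; `handle_request`, `safe_handle_request`, the `name:count` payload format, and `fuzz` are all hypothetical names chosen for the example.

```python
import random


def handle_request(payload: bytes) -> str:
    """Hypothetical RPC handler that expects a payload shaped like b'name:count'."""
    name, count = payload.split(b":")  # raises ValueError unless there is exactly one ':'
    return f"{name.decode('utf-8')} x{int(count)}"


def safe_handle_request(payload: bytes) -> str:
    """Same handler, but malformed input becomes an error response instead of a crash."""
    try:
        return handle_request(payload)
    except ValueError:  # also covers UnicodeDecodeError, which subclasses ValueError
        return "ERROR: bad request"


def fuzz(handler, iterations: int = 1000, seed: int = 0) -> int:
    """Call handler with random byte strings and return how many calls raised."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    crashes = 0
    for _ in range(iterations):
        length = rng.randrange(0, 32)
        payload = bytes(rng.randrange(256) for _ in range(length))
        try:
            handler(payload)
        except Exception:
            crashes += 1
    return crashes
```

Running `fuzz(handle_request)` reports crashes for the unguarded handler, while `fuzz(safe_handle_request)` reports none; running a check like this continuously against staging is the kind of usage the conversation describes.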
We had a lot of success with that program.
This team that I ran that was dedicated to building this shared service as a chaos engineering tool—which was ironically named Catzilla—and I'm not a cat person, so there's a story there, too—was also doing more than just Catzilla, which we can also talk about because there's a little bit more of the incident management space that's out there.
Jason: Yeah. Happy to dive into that. Tell me more about Catzilla?
Gustavo: Yeah. So, Catzilla was sort of the first project from scratch from the team that ended up being responsible to share a coherent vision around the incident prevention. And then we would put Catzilla there, right, so the chaos engineering shared service and prevention, detection, analysis and response. Because once I started working on this, I realized, well, you know what? People are still being paged, they have good training, we had a good incident management process, so we have good training for people to coordinate incidents, but if you don't have SREs working directly with you—and most teams didn't—you also have a struggle to communicate with executives.
It was a struggle to figure out what to do with prevention, and then Catzilla sort of resolved that a little bit. So, if you think of a team, like an SRE team in charge of not running a SaaS necessarily, but a team that works in function of a company to help the company to think holistically about incident prevention, detection, analysis, and response. So, we ended up building more software for those. So, part of the software was, well, instead of having people writing postmortems—a pet peeve of mine is people write postmortems and then they would give them to the new employees to read. So, people never really learned from the postmortems, and there was, like, not a lot of information recovery from those retrospectives.
Some teams were very good at following up on action items and having discussions.
But that's kind of how you see the community now, people talking about how we should approach retrospectives. It happened but it wasn't consistent. So then, well, one thing that we could do consistently is extract all the information that people spend so much time writing on the retrospectives. So, my pitch was, instead of having these unstructured texts, can we have it both unstructured and structured?
So, then we launched a postmortem template that was also machine-readable so we could extract information and then generate reports for business leaders to say, “Okay, here's what we see on a recurring basis, what people are talking about in the retrospectives, what they're telling each other as they go about writing the retrospectives.” So, we found some interesting issues that were resolved that were not obvious on a per retrospective basis. So, that was all the way down to the analysis of the incidents. On the management part, we built tooling. It's basically—you can think of it as a SaaS, but just for the internal employees to use that is similar to externally what would be an incident dashboard, you know, like a status page of sorts.
Of course, a lot more information internally for people participating in incidents than they have externally. For me, it's thinking of the SRE—and I managed many SRE teams that were responsible for running production services, such as Compute Engine, Google Plus, Hangouts, but also, you know, not just thinking of SRE as the folks managing production systems and going on call, but thinking of them as reliability specialists. And there's so many—when you think of SREs as reliability specialists that can do more than respond to pages, then you can slot SREs and SRE teams in many other areas of an organization.
Jason: That's an excellent point. Just that idea of an SRE as being more than just the operations on-call unit. I want to jump back to what you mentioned about taking and analyzing those retrospectives and analyzing your incidents.
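To make the idea of a retrospective that is both structured and unstructured concrete, here is a minimal sketch. The template format is an assumption for illustration only (a few `Key: value` lines, a `---` separator, then free-form narrative); it is not the template used at Google, and `parse_postmortem` and `trigger_report` are hypothetical helpers.

```python
from collections import Counter


def parse_postmortem(text: str) -> dict:
    """Split a postmortem into structured 'Key: value' fields plus its narrative."""
    header, _, narrative = text.partition("\n---\n")
    fields = {}
    for line in header.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    fields["narrative"] = narrative.strip()  # the free-form text is kept, not discarded
    return fields


def trigger_report(postmortems: list[str]) -> Counter:
    """Count how often each trigger appears across many postmortems."""
    return Counter(
        parse_postmortem(doc).get("trigger", "unknown") for doc in postmortems
    )


EXAMPLE_DOCS = [
    "Service: checkout\nTrigger: config change\n---\nA flag flip caused errors.",
    "Service: search\nTrigger: binary deploy\n---\nThe new build crash-looped.",
    "Service: cart\nTrigger: config change\n---\nA quota value was set too low.",
]
```

Calling `trigger_report(EXAMPLE_DOCS)` gives a per-trigger count across documents, which is the kind of recurring-pattern view that is hard to see one retrospective at a time.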
That's something that we did when I was at Datadog. Alexis Lê-Quôc, who's the CTO, has a fantastic talk about that at Monitorama that I'll link to in the [show notes 00:19:49].
It was very clear from taking the time to look at all of your incidents, to catalog them, to really try to derive what's the data out of those and get that information to help you improve. We never did it in an automated way, but it sounds like with an automated tool, you were able to gather so much more information.
Gustavo: Yeah, exactly. And to be clear, we did this manually before, and so we understood the cost of it. And our bar, company-wide, for people writing retrospectives was pretty low, so I can't give you hard numbers, but we had a surprising amount of retrospectives, let's say on a monthly basis, because a lot of things are not necessarily things that many customers would experience. So, near misses or things that impact very few customers—potentially very few customers within a country could end up in a retrospective, so we had this throughput. So, it wasn't just, like, say, the highest severity outages.
Like where oh, it happens—the stuff that you see on the press that happens once, maybe, a year, twice a year. So, we had quite a bit of data to discuss. So, then when we did it manually, we're like, “Okay, yeah, there's definitely something here because there's a ton of information; we're learning so much about what happens,” but then at the same time, we were like, “Oh, it's painful to copy and paste the useful stuff from a document to a spreadsheet and then crunch the spreadsheet.” And kudos—I really need to mention her name, too, Sue [Lueder 00:21:17] and also [Yelena Ortel 00:21:19]. Both of them were amazing project and program managers who did the brunt of this work back in the days when we were doing it manually.
We had a rotation with SREs participating, too, but our project managers were awesome.
And also...
Jason: As you started to analyze some of those incidents, every infrastructure is different, every setup is different, so I'm sure that maybe the trends that you saw are perhaps unique to those Google teams. I'm curious if you could share the, say, top three themes that might be interesting and applicable to our listeners, and things that they should look into or invest in?
Gustavo: Yeah, one thing that I tell people about adopting the—in the books, the SRE books, is the—and people joke about it, so I'll explain the numbers a little better. 70, 75% of the incidents are triggered by config changes. And people are like, “Oh, of course. If you don't change anything, there are no incidents, blah, blah, blah.” Well, that's not true, that number really speaks to a change in the service that is impacted by the incident.
So, that is not a change in the underlying dependency. Because people were very quick to blame their dependencies, right? So meaning, if you think of a microservice mesh, the service app is going to say, “Oh, sure. I was throwing errors, my service was throwing errors, but it was something with G or H underneath, in a layer below.” 75% of cases—and this is public information, it goes into the books, right—where a retrospective was written, the service that was throwing the errors, it was something that changed in that service, not above or below; 75% of the time, a config change.
And it was interesting when we would go and look into some teams where there was a huge deviation from that. So, for some teams, it was like, I don't know, 85% binary deploys. So, they're not really changing config that much, or the configuration changes are not triggering incidents. For those teams, actually, a common phenomenon was that because they couldn't.
So, they did—the binary deploys were spiking as contributing factors and main triggers for incidents because they couldn't do config changes that well, roll them out in production, so they're like, yeah, of course, like, [laugh] my binary deploys will break more on my own service.
But that showed to a lot of people that a lot of things were quote-unquote, “Under their control.” And it also was used to justify a project and a technique that I think is undervalued by SREs in the wild, or folks running production in the wild, which is canary evaluation systems. So, all these numbers and a lot of this analysis were used to justify, like, extra funding for the team that was basically doing this systematically across the entire company: if you tried to deploy a binary to production, if you tried to deploy a config change to production, it will evaluate a canary; if the binary is in a crash loop, if the binary is throwing many errors, if something is changing in a clearly unpredictable way, it will pause, it will abort the deploy. Which back to—much easier said than done. It sounds obvious, right, “Oh, we should do canaries,” but, “Oh, can you automate your canaries in such a way that they're looking to monitoring time series and that it'll stop a release and roll back a release so a human operator can jump in and be like, ‘oh, okay. Was it a false positive or not?'”
Jason: I think that moving to canary deployments, I've long been a proponent of that, and I think we're starting to see a lot more of that with tools such as—things like LaunchDarkly and other tools that have made it a whole lot easier for your average organization that maybe doesn't have quite the infrastructure build-out. As you started to work on all of this within Google, you then went to the CRE team and started to help Google Cloud customers.
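The automated canary evaluation discussed in this exchange can be sketched as a simple comparison of error rates. This is only an illustration under stated assumptions: the function name, the three verdicts, and the thresholds (`max_ratio`, `min_requests`, `rate_floor`) are hypothetical, and a real system would read these counts from monitoring time series rather than take them as arguments.

```python
def evaluate_canary(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 2.0,    # canary may be at most this many times worse
    min_requests: int = 100,   # below this, there is not enough traffic to judge
    rate_floor: float = 0.001, # avoids a zero baseline making any error fatal
) -> str:
    """Return 'wait', 'abort', or 'promote' for a canary release."""
    if canary_requests < min_requests:
        return "wait"
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    allowed_rate = max(baseline_rate, rate_floor) * max_ratio
    return "abort" if canary_rate > allowed_rate else "promote"
```

An `abort` verdict is where a deploy pipeline would pause or roll back and hand over to a human operator to decide whether the signal was a false positive.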
Did any of these tools start to apply to them as well, analyzing their incidents and finding particular trends for those customers?
Gustavo: More than one customer, when I described, say, our incident lifecycle management program, and the chaos engineering program, especially this lifecycle stuff, the reaction in the beginning was, “Oh, okay. How do I do that?” And I open-sourced a very crufty prototype which some customers picked up on and implemented internally in their companies. And it's still on GitHub, so /google/sredocs.
There's an ugly parser, an example of, like, a template for the machine-readable stuff, and how to basically get your retrospectives, dump the data onto Google BigQuery to be able to query more structurally. So yes, customers would ask us about, “Yeah. I heard about chaos engineering. How do you do chaos engineering? How can we start?”
So, like, I remember a retail one where we had a long conversation about it, and some folks in tech want to know, “Yeah, incident response; how do I go about it?” Or, “What do I do with my retrospectives?” Like, people started to realize that, “Yeah, I write all this stuff and then we work on the action items, but then I have all these insights written down and no one goes back to read it. How can I get actionable insights, actionable information out of it?”
Jason: Without naming any names because I know that's probably not allowed, are there any trends from customers that you'd be willing to share? Things that maybe—insights that you learned from how they were doing things and the incidents they were seeing that were different from what you saw at Google?
Gustavo: Gaming is very unique because a lot of gaming companies, when we would go into incident management, [unintelligible 00:26:59] they were like, “If I launch a game, it's ride or die.” There may be a game that in the first 24, or 48 hours if the customers don't show up, they will never show up. So, that was a little surprising and unusual.
Another trend is, in finance, you would expect them to be a little behind or too strict on process, et cetera, but they are still very sophisticated customers, I would say. The new teams of folks are really interested in learning how to modernize the finance infrastructure.
Let's see… well, tech, we basically talk the same language, with the gaming being a little different. In retail, the uniqueness of having a ton of things at the edge was a little bit of a challenge. So, having these hubs, where they have, say, a public cloud or on-prem data center, and then having things running at the stores, so then having this conversation with them about different tiers and how to manage different incidents. Because if a flagship store is offline, it is a big deal. And from a, again, SaaS mindset, if you're thinking of, like, SRE, and you always manage through a public cloud, you're like, “Oh, I just call my cloud provider; they'll figure it out.”
But then for a retail company with things at the edge, at a store, they cannot just sit around and wait for the public cloud to restore their service. So again, a lot of more nuanced conversations there that you have to have of like, yeah, okay, yeah. Here, say a VMware or a Google. Yeah, we don't deal with this problem internally, so yeah, how would I address this? The answers are very long, and they always depend.
They need to consider, oh, do you have an operational team that you can drive around? [laugh]. Do you have people, do you have staffing that can go to the stores? How long will it take? So, the SLO conversation there is tricky.
A secret weapon of SRE that definitely has other value is the project managers, program managers that work with SREs. And I need to shout out to—if you're a project manager, program manager working with SREs, shout out to you.
Do you want to have people on call 24/7? Do you have people near that store that can go physically and do anything about it?
And more often than not, they rely on third-party vendors, so then it's not staffed in-house and they're not super technical, so then remote management conversations come into play. And then you talk about, “Oh, what's your network infrastructure for that remote management?” Right? [laugh].
Jason: Things get really interesting when you start to essentially outsource to other companies and have them provide the technology, and you try to get that interface. So, you mentioned doing chaos engineering within Google, and now you've moved to VMware with the Tanzu team. Tell me a bit more about how do you do chaos engineering at VMware, and what does that look like?
Gustavo: I've seen varying degrees of adoption. So, right now, within my team, what we are doing is we're actually going, as we speak right now, through a big reliability assessment for a launch. Unfortunately, we cannot talk about it yet. We're probably going to announce this in October at VMworld. As a side effect of this big launch, we started by doing a reliability risk assessment.
And the way we do this is we interview the developers—so this hasn't launched yet, so we're still designing this thing together. [unintelligible 00:30:05] the developers of the architecture that they basically sketch out, like, what is it that you're going to? What are the user journeys, the user stories? Who is responsible for what? And let's put an architecture diagram, a sketch together.
And then we try to poke holes in it: “Okay. What could go wrong here?” We write this stuff down. More often than not, from this list—and I can already see, like, that's where that output, that result fits into any sort of chaos engineering plan.
So, that's where, like—so I can get—one thing that I can tell you for that risk assessment because I participated in the beginning was, there is a level of risk involving a CDN, so then one thing that we're likely going to test before we get to general availability is yeah, let's simulate that the CDN is cut off from the clients.
But even before we do the test, we're already asking, but we don't trust. Like, trust and verify, actually; we do trust but trust and verify. So, the client is actually another team. So, we do trust the client team that they cache, but we are asking them, “Okay. Can you confirm that you cache? And if you do cache, can you give us access to flush the cache?”
We trust them, we trust the answers; we're going to verify. And how do we verify? It's through a chaos engineering test which is, let's cut the client off from the CDN and then see what happens. Which could be, for us, as simple as let's move the file away; we should expect them to not tell us anything because the client will fail to read but it's going to pick from cache, it's not reading from us anyways. So, there is, like, that level of we tell people, “Hey, we're going to test a few things.”
We'll not necessarily tell them what. So, we are also not just testing the system, but testing how people react, and if anything happens. If nothing happens, it's fine. They're not going to react to it. So, that's the level of chaos engineering that our team has been performing.
Of course, as we always talk about improving reliability for the product, we talked about, “Oh, how is it that chaos engineering as a tool for our customers will play out in the platform?” That conversation now is a little bit with product.
So, product has to decide how and when they want to integrate, and then, of course, we're going to be part of that conversation once they're like, “Okay, we're ready to talk about it.” Other teams at VMware, not necessarily Tanzu, then they do all sorts of chaos engineering testing. So, some of them using tools, open-source or not, and a lot of them do tabletop, basically, theoretical testing as well.
Jason: That's an excellent point about getting started. You don't have a product out yet, and I'm sure everybody's anticipating hearing what it is and seeing the release at VMworld, but testing before you have a product; I feel like for so many organizations, it's an afterthought, it's the, “I've built the product. It's in production. Now, we need to keep it reliable.” And I think by shifting that forward to thinking about, we've just started diagramming the architecture, let's think about where this can break. And how we can build those tests so that we can begin to do that chaos engineering testing, begin to do that reliability testing during the development of the product so that it ships reliably, rather than shipping and then figuring out how to keep it reliable.
Gustavo: Yeah. The way I talked to—and I actually had a conversation with one of our VPs about this—is that you have technical support that is—for the most part, not all the teams from support—but at least one of the tiers of support, you want it to be reactive by design. You can staff quite a few people to react to issues and they can be very good about learning the basics because the customers—if you're acquiring more customers, they are going to be—you're going to have a huge set of customers early in the journey with your product. And you can never make the documentation perfect and the product onboarding perfect; they're going to run into issues.
So, that very shallow set of issues, you can have a tier of support that is reactive by design.
You don't want that tier of support to really go deep into issues forever because they can get caught up into a problem for weeks or months. You're kind of going to have—and that's when you add another tier and that's when we get to more of, like, support specialists, and then they split into silos. And eventually, you do get an IC SRE being tier three or tier four, where SRE is a good in-between for support organizations and product developers, in the sense that product developers also tend to specialize in certain aspects of a product. SRE wants to be generalists for reliability of a product. And nothing is better for uncovering reliability issues in a product than understanding the customer pain, the customer issues.
And actually, one thing, one of the projects I can tell you about that we're doing right now is we're improving the reliability of our installation. And we're going for, like, can we accelerate the speed of installs and reduce the issues by better automation, better error handling, and also good—that's where I say day zero. So, day zero is, can we make this install faster, better, and more reliable? And after the installs in day one, can we get better defaults? Because I say the ergonomics for SRE should be pretty good because we're TKG SREs, so there's [unintelligible 00:35:24] and SRE should feel at home after installing TKG.
Otherwise, you can just go install vanilla Kubernetes. And if vanilla Kubernetes does feel at home because it's open-source, it's what most people use and what most people know, but it's missing—because it's just Kubernetes—missing a lot of things around the ecosystem that TKG can install by default, but then when you add a lot of other things, I need to make sure that it feels at home for SREs and operators at large.
Jason: It's been fantastic chatting with you.
I feel like we can go [laugh] on and on.
Gustavo: [laugh].
Jason: I've gone longer than I had intended. Before we go, Gustavo, I wanted to ask you if you had anything that you wanted to share, anything you wanted to plug, where can people find you on the internet?
Gustavo: Yeah, so I wrote an ebook on how to start your incident lifecycle program. It's not completely out yet, but I'll post on my Twitter account, so twitter.com/stratus. So @stratus, S-T-R-A-T-U-S. We'll put the link on the [notes 00:36:21], too. And so yeah, you can follow me there. I will publish the book once it's out. It kind of explains all about how to establish an incident lifecycle. And if you want to talk about SRE stuff, or VMware Tanzu or TKG, you can also message me on Twitter.
Jason: Thanks for all the information.
Gustavo: Thank you, again. Thank you so much for having me. This was really fun. I really appreciate it.
Jason: For links to all the information mentioned, visit our website at gremlin.com/podcast. If you liked this episode, subscribe to the Break Things on Purpose podcast on Spotify, Apple Podcasts, or your favorite podcast platform. Our theme song is called “Battle of Pogs” by Komiku, and it's available on loyaltyfreakmusic.com.
About Betty
Betty Junod is the Senior Director of Multi-Cloud Solutions at VMware helping organizations along their journey to cloud. This is her second time at VMware, having previously led product marketing for end user computing products. Prior to VMware she held marketing leadership roles at Docker and solo.io in following the evolution of technology abstractions from virtualization, containers, to service mesh. She likes to hang out at the intersection of open source, distributed systems, and enterprise infrastructure software.
@bettyjunod
Links:
Twitter: https://twitter.com/BettyJunod
Vmware.com/cloud: https://vmware.com/cloud
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: You know how git works right?
Announcer: Sorta, kinda, not really. Please ask someone else!
Corey: That's all of us. Git is how we build things, and Netlify is one of the best ways I've found to build those things quickly for the web. Netlify's git-based workflows mean you don't have to play slap and tickle with integrating arcane nonsense and web hooks, which are themselves about as well understood as git. Give them a try and see what folks ranging from my fake Twitter for pets startup, to global Fortune 2000 companies are raving about. If you end up talking to them, because you don't have to, they get why self service is important—but if you do, be sure to tell them that I sent you and watch all of the blood drain from their faces instantly. You can find them in the AWS marketplace or at www.netlify.com. N-E-T-L-I-F-Y.com
Corey: This episode is sponsored in part by our friends at Vultr. 
Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Periodically, I like to poke fun at a variety of different things, and that can range from technologies or approaches like multi-cloud, and that includes business functions like marketing, and sometimes it extends even to companies like VMware. My guest today is the Senior Director of Multi-Cloud Solutions at VMware, so I'm basically spoilt for choice. Betty Junod, thank you so much for taking the time to speak with me today and tolerate what is no doubt going to be an interesting episode, one way or the other.
Betty: Hey, Corey, thanks for having me. I've been a longtime follower, and I'm so happy to be here. 
And good to know that I'm kind of like the ultimate cross-section of all the things [laugh] that you can get snarky about.
Corey: The only thing that's going to make that even better is if you tell me, “Oh, yeah, and I moonlight on a contract gig by naming AWS services.” And then I just won't even know where to go. But I'll assume they have to generate those custom names in-house.
Betty: Yes. Yes, I think they do those there. I may comment on it after the fact.
Corey: So, periodically I am, let's call it miscategorized, in my position on multi-cloud, which is that it's a worst practice that when you're designing something from scratch, you should almost certainly not be embracing unless you're targeting a very specific corner case. And I stand by that, but what that has been interpreted as by the industry, in many cases because people lack nuance when you express your opinions in tweet-sized format—who knew—as me saying, “Multi-cloud bad.” Maybe, maybe not. I'm not interested in assigning value judgment to it, but the reality is that there are an awful lot of multi-cloud deployments out there. And yes, some of them started off as, “We're going to migrate from one to the other,” and then people gave up and called it multi-cloud, but it is nuanced. VMware is a company that's been around for a long time. It has reinvented itself in a few different ways at different periods of its evolution, and it's still highly relevant. What is the Multi-Cloud Solutions group over at VMware? What do you folks do exactly?
Betty: Yeah. And so I will start by multi-cloud; we're really taking it from a position of meeting the customer where they are. So, we know that if anything, the only thing that's a given in our industry is that there will be something new in the next six months, next year, and the whole idea of multi-cloud, from our perspective, is giving customers the optionality, so don't make it so that it's a closed thing for them. 
But if they decide—it's not that they're going to start, “Hey, I'm going to go to cloud, so day one, I'm going to go all-in on every cloud out there.” That doesn't make sense, right, as—
Corey: But they all gave me such generous free credit offers when I founded my startup; I feel obligated to at this point.
Betty: I mean, you can definitely create your account, log in, play around, get familiar with the console, but going from zero to being fully operationalized team to run production workloads with the same kind of SLAs you had before, across all three clouds—what—within a week is not feasible for people getting trained up and actually doing that. Our position is that meeting customers where they are and knowing that they may change their mind, or something new will come up—a new service—and they really want to use a new service from let's say GCP or AWS, they want to bring that with an application they already have or build a new app somewhere, we want to help enable that choice. And whether that choice applies to taking an existing app that's been running in their data center—probably on vSphere—to a new place, or building new stuff with containers, Kubernetes, serverless, whatever. So, it's all just about helping them actually take advantage of those technologies.
Corey: So, it's interesting to me about your multi-cloud group, for lack of a better term, is there a bunch of things fall under its umbrella? I believe Bitnami does—or as I insist on calling it, ‘bitten-A-M-I'—I believe that SaltStack—which I wrote a little bit of once upon a time, which tells me you folks did no due diligence whatsoever because everything I've ever written is molten garbage—
Betty: Not [unintelligible 00:04:33].
Corey: And—so to be clear, SaltStack is good; just the parts that I wrote are almost certainly terrible because have you met me?
Betty: I'll make a note. 
[laugh].
Corey: You have Wavefront, you have CloudHealth, you have a bunch of other things in the portfolio, and yeah, all those things do work across multiple clouds, but there's nothing that makes using any of those things a particularly bad idea even if you're all-in on one cloud provider, too. So, it's a portfolio that applies to a whole bunch of different places from your perspective, but it can be used regardless of where folks stand ideologically.
Betty: Yes. So, this goes back to the whole idea that we meet the customers where they are and help them do what they want to do. So, with that, making sure these technologies that we have work on all the clouds, whether that be in the data center or the different vendors, so that if a customer wants to just use one, or two, or three, it's fine. That part's up to them.
Corey: The challenge I've run into is that—and maybe this is a ‘Twitter Bubble' problem, but unfortunately, having talked to a whole bunch of folks in different contexts, I know it isn't—there's almost this idea that you have to be incredibly dogmatic about a particular technology that you're into. I joke periodically about the Rust Evangelism Strikeforce where their entire job is talking about using Rust; their primary IDE is PowerPoint because they're giving talks all the time about it rather than writing code. And great, that's a bit of an exaggeration, but there is the idea of a technology purist who is taking, “Things must be this way,” well past a point of being reasonable, and disregarding the reality that, yeah, the world is messy in a way that architectural diagrams never are.
Betty: Yeah. The architectural diagrams are always 2D, right? Back to that PowerPoint slide: how can I make pretty boxes? And then I just redraw a line because something new came out. 
But you and I have been in this industry for a long time, there's always something new.
And I think that's where the dogmatism gets problematic because if you say we're only going to do containers this way—you know, I could see Swarm and Kubernetes, or all-in on AWS and we're going to use all the things from AWS and there's only this way. Things are generational and so the idea that you want to face the reality and say that there is a little bit of everything. And then it's kind of like, how do you help them with a part of that? As a vendor, it could be like, “I'm going to help us with a part of it, or I'm going to help address certain eras of it.” That's where I think it gets really bad to be super dogmatic because it closes you off to possibly something new and amazing, new thinking, different ways to solve the same problem.
Corey: That's the problem: left to our own devices, most of us who are building things, especially for random ideas, yeah, there's a whole modern paradigm of how I can build these things, but I'm going to shortcut to the thing I know best, which may very well be the architectures that I was using 15 years ago, maybe tools that I was using 15 years ago. There's a reason that Vim is still as popular as it is. Would I recommend it to someone who's a new user? Absolutely not; it's user-hostile, but back in my days of being a grumpy sysadmin, you learned vi because it was on everything you could get into, and you never knew in what environment you were going to be encountering stuff. These days, you aren't logging in to remote systems to manage them, in most cases, and when it happens, it's a rarity and a bug.
The world changes; different approaches change, but you have to almost reinvent your entire philosophy on how things work and what your career trajectory looks like. And you have to give up aspects of what you've considered to be part of your identity and embrace something new. 
It was hard for me to accept that, for example, Docker and the wave of containerization that was rolling out was effectively displacing the world that I was deep in of configuration management with Puppet and with Salt. And the world changes; I said, “Okay, now I'll work on cloud.” And if something else happens, and mainframes are coming back again, instead, well, I'm probably not going to sit here railing against the tide. It would be ridiculous to do that from my perspective. But I definitely understand the temptation to fight against it.
Betty: Mm-hm. You know, we spend so much time learning parts of our craft, so it's hard to say, “I'm now not going to be an expert in my thing,” and I have to admit that something else might be better and I have to be a newbie again. That can be scary for someone who's spent a lot of time to be really well-versed in a specific technology. It's funny that you bring up the whole Docker and Puppet config management; I just had a healthy discussion over Slack with some friends. Some people that we know and comment about some of the newer areas of config management, and the whole idea is like, is it a new category or an evolution of? And I went back to the point that I made earlier is like, it's generations. We continually find new ways to solve a problem, and one thing now is it [sigh] it just all goes so much faster, now. There's a new thing every week, [laugh] it seems sometimes.
Corey: It is, and this is the joy of having been in this industry for a while—toxic and broken in many ways though it is—is that you go through enough cycles of seeing today's shiny, new, amazing thing become tomorrow's legacy garbage that we're stuck supporting, which means that—at least from my perspective—I tend to be fairly conservative with adopting new technologies with respect to things that matter. 
That means that I'm unlikely to wind up looking at the front page of Hacker News to pick a framework to build a banking system in, and I'm unlikely to be the first kid on my block to update to a new file system or database, just because, yeah, if I break a web server, we all laugh, we make fun of the fact that it throws an error for ten minutes, and then things are back up and running. If I break the database, there's a terrific chance that we don't have a company anymore. So, it's the ‘mistakes will show' area and understanding when to be aggressive and when to hold back as far as jumping into new technologies is always a nuanced decision. And let's be clear as well, an awful lot of VMware's customers are large companies that were founded, somehow—this is possible—before 2010. Imagine that. Did people—
Betty: [laugh]. I know, right?
Corey: —even have businesses or lives back then? I thought we all used horse-driven carriages and whatnot. And they did not build on cloud—not because of any perception of distrust; because it functionally did not exist at the time that they were building these things. And, “Oh, come out into the cloud. It's fine now.” It… yeah, that application is generating hundreds of millions in revenue every quarter. Maybe we treat that with a little bit of respect, rather than YOLO-ing it into some Lambda-driven monster that's constructed—
Betty: One hundred—
Corey: —out of popsicle sticks and glue.
Betty: —percent. Yes. I think people forget that. And it's not that these companies don't want to go to cloud. It's like, “I can't break this thing. That could be, like, millions of dollars lost, a second.”
Corey: I write my weekly newsletters in a custom monstrosity of a system that has something like 30-some-odd Lambda functions, a bunch of API gateways that are tied together with things, and periodically there are challenges with it that break as the system continues to evolve. And that's fine. 
And I'm okay with using something like that as a part of my workflow because absolute worst case, I can go back to the way that my newsletter was originally written: in Google Docs, and it doesn't look anywhere near the same way, and it goes back to just a text email that starts off with, “I have messed up.” And that would be a better story than most of the stuff I put out as a common basis. Similarly, yeah, durability is important.
If this were a serious life-critical app, it would not just be hanging out in a single region of a single provider; it would probably be on one provider, as I've talked about, but going multi-region and having backups to a different cloud provider. But if AWS takes a significant enough outage to us-west-2 in Oregon, to the point where my ridiculous system cannot function to write the newsletter, that too, is a different handwritten email that goes out that week because there's no announcement they've made that anyone's going to give the slightest toss about, given the fact that it's basically Cloud Armageddon. So, we'll see. It's about understanding the blast radius and understanding your use case.
Betty: Yep. A hundred percent.
Corey: So, you've spent a fair bit of time doing interesting things in your career. This is your second outing at VMware, and in the interim, you were at solo.io for a bit, and before that you were in a marketing leadership role at Docker. Let's dive in, if you will. Given that you are no longer working at Docker, they recently made an announcement about a pricing model change, whereas it is free to use Docker Desktop for anyone's personal projects, and for small companies.
But if you're a large company, which they define as ten million in revenue a year or 250 employees—those two things don't go alike, but okay—then you have to wind up having a paid plan. 
And I will say it's a novel approach, but I'm curious to hear what you have to say about it.
Betty: Well, I'd say that I saw that there was a lot of flutter about that news, and it's kind of a, it doesn't matter where you draw the line in the sand for the tier, there's always going to be some pushback on it. So, you have to draw a line somewhere. I haven't kept up with the details around the pricing models that they've implemented since I left Docker a few years ago, but monetization is a really important part for a startup. You do have to make money because there are people that you have to pay, and eventually, you want to get off of raising money from VCs all the time. Docker Desktop has been something that has been a real gem from a local developer experience, right, giving the—so that has been well-received by the community.
I think there was an enterprise application for it, but when I saw that, I was like, yeah, okay, cool. They need to do something with that. And then it's always hard to see the blowback. I think sometimes with the years that we've had with Docker, it's kind of like no matter what they do, the Twitterverse and Hacker News is going to just give them a hard time. I mean, that is my honest opinion on that. If they didn't do it, and then, say, they didn't make the kind of revenue they needed, people would—that would become another Twitter thread and Hacker News blow up, and if they do it, you'll still have that same reaction.
Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security.
And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. 
This means you can provision a virtual machine instance or spin up an autonomous database that manages itself all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build.
With Always Free you can do things like run small scale applications, or do proof of concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.
Corey: It seems to be that Docker has been trying to figure out how to monetize for a very long time because let's be clear here; I think it is difficult to overstate just how impactful and transformative Docker was to the industry. I gave a talk “Heresy in the Church of Docker” that listed a bunch of things that didn't get solved with Docker, and I expected to be torn to pieces for it, and instead I was invited to give it at ContainerCon one year. And in time, a lot of those things stopped being issues because the industry found answers to it. Now, unfortunately, some of those answers look like Kubernetes, but that's neither here nor there. But now it's, okay, so giving everything that you do that is core and central away for free is absolutely part of what drove the adoption that it saw, but goodwill from developers is not the sort of thing that generally tends to lead to interesting revenue streams.
So, they had to do something. And they've tried a few different things that haven't seemed to really pan out. Then they spun off that pesky part of their business that made money selling support contracts, over to Mirantis, which was apparently looking for something now that OpenStack was no longer going to be a thing, and Kubernetes is okay, “Well, we'll take Docker enterprise stuff.” Great. 
What do they do, as far as turning this into a revenue model?
There's a lot of the, I guess, noise that I tend to ignore when it comes to things like this because angry people on Twitter, or on Hacker News, or other terrible cesspools on the internet, are not where this is going to be decided. What I'm interested in is what the actual large companies are going to say about it. My problem with looking at it from the outside is that it feels as if there's significant ambiguity across the board. And if there's one thing that I know about large company procurement departments, it's that they do not like ambiguity. This change takes effect in three or four months, which is underwear-outside-the-pants-superhero-style speed for a lot of those companies, and suddenly, for a lot of developers, they're so far removed from the procurement side of the house that they are never going to have a hope of getting that approved on a career-wide timespan.
And suddenly, for a lot of those companies, installing and running Docker Desktop just became a fireable offense because from the company's perspective, the sheer liability side of it, if they were getting subject to audit, is going to be a problem. I don't believe that Docker is going to start pulling Oracle-like audit tactics, but no procurement or risk management group in the world is going to take that on faith. So, the problem is not that it's expensive because that can be worked around; it's not that there's anything inherently wrong with their costing model. 
The problem is the ambiguity of people who just don't know, “Does this apply to me or doesn't this apply to me?” And that is the thing that is the difficult, painful part.
And now, as a result, the [unintelligible 00:17:28] groups and their champions of Docker Desktop are having to spend a lot more time, energy, and thought on this than it would simply be for cutting a check because now it's a risk org-wide, and how do we audit to figure out who's installed this previously free open-source thing? Now what?
Betty: Yeah, I'll agree with you on that because once you start making it into corporate-issued software that you have to install on the desktop, that gets a lot harder. And how do you know who's downloaded it? Like my own experience, right? I have a locked-down laptop; I can't just install whatever I want. We have a software portal, which lets me download the approved things.
So, it's that same kind of model. I'd be curious because once you start looking at it from a large enterprise perspective, your developers are working on IP, so you don't want that on something that they've downloaded using their personal account because now it sits—that code is sitting with their personal account that's using this tool that's super productive for them, and that transition to then go to an enterprise, large enterprise and going through a procurement cycle, getting a master services agreement, that's no small feat. That's a whole motion that is different than someone swiping a credit card or just downloading something and logging in. 
It's similar to what you see sometimes with the—how many people have signed up for and paid 99 bucks for Dropbox, and then now all of a sudden, it's like, “Wow, we have all of megacorp [laugh] signed up, and then now someone has to sell them a plan to actually manage it and make sure it's not just sitting on all these personal drives.”
Corey: Well, that's what AWS's original sales motion looked a lot like: they would come in and talk to the CTO or whatnot at giant companies. And the CTO would say, “Great, why should we pick AWS for our cloud needs?” And the answer is, “Oh, I'm sorry. You have 87 distinct accounts within your organization that we've [unintelligible 00:19:12] up for you. We're just trying to offer you some management answers and unify the billing and this, and probably give you a discount as well because there are price breaks available at certain sizing.” It was a different conversation. It's like, “I'm not here to sell you anything. We're already there. We're just trying to formalize the relationship.” And that is a challenge.
Again, I'm not trying to cast aspersions on procurement groups. I mean, I do sell enterprise consulting here at The Duckbill Group; we deal with an awful lot of procurement groups who have processes and procedures that don't often align to the way that we do things as a ten-person, fully remote company. We do not have commercial vehicle insurance, for example, because we do not have a commercial vehicle and that is a prerequisite to getting the insurance, for one. We're unlikely to buy one to wind up satisfying some contractual requirements, so we have to go back and forth and get things like that removed. And that is the nature of the beast.
And we can say yes, we can say no on a lot of those questionnaires, but, “It depends,” or, “I don't know,” is the sort of thing that's going to cause giant red flags and derail everything. But that is exactly what Docker is doing. 
Now, it's the well, we have a sort of sloppy, weird set of habits with some of our engineers around the bring your own device to work thing. So, that's the enterprise thing. Let me be very clear, here at The Duckbill Group, we have a policy of issuing people company machines, we manage them very lightly just to make sure the drives are encrypted, so they—and that the screensaver comes out with a password, so if someone loses a laptop, it's just, “Replace the hardware,” not, “We have a data breach.”
Let's be clear here; we are responsible about these things. But beyond that, it's oh, you want to have some personal thing installed on your machine or do some work on that stuff? Fine. By all means. It's a situation of we have no policy against it; we understand this is how work happens, and we trust people to effectively be grownups.
There are some things I would strongly suggest that any employee—ours or anyone else—not cross the streams on for obvious IP ownership rights and the rest, we have those conversations with our team for a reason. It's, understand the nuances of what you're doing, and we're always willing to throw hardware at people to solve these problems. Not every company is like that. And ten million in revenue is not necessarily a very large company. I was doing the math out for ten million in revenue or 250 employees; assuming that there's no outside investment—which with VC is always a weird thing—it's possible—barely—to have a $10 million in revenue company that has 250 employees, but if they're full time they are damn close to a $15 an hour minimum wage. So, who does it apply to? More people than you might believe.
Betty: Yeah, I'm really curious to how they're going to like—like you say, if it takes place in three or four months, roll that out, and how would you actually track it and true that up for people? So.
Corey: Yeah. 
And there are tools and processes to do this, but it's also not in anyone's roadmap because people are not sitting here on their annual planning periods—which is always aspirational—but no one's planning for, “Oh, yeah, Q3, one of our software suppliers is going to throw a real procurement wrench at us that we have to devote time, energy, resources, and budget to figure out.” And then you have a problem. And by resources, I do mean resources of basically assigning work and tooling and whatnot and energy, not people. People are humans, they are not resources; I will die on that hill.
Betty: Well, you know, actually resource-wise, the thing that's interesting is when you say supplier, if it's something that people have been able to download for free so far, it's not considered a supplier. So, it's—now they're going to go from just a thing I can use and maybe you've let your developers use to now it has to be something that goes through the official internal vetting as being a supplier. So, that's just—it's a whole different ball game entirely.
Corey: My last job before I started this place, was a highly regulated financial institution, and even grabbing things that were available for free, “Well, hang on a minute because what license is it using and how is it going to potentially be incorporated?” And this stuff makes sense, and it's important. Now, admittedly, I have the advantage of a number of my engineering peers in that I've been married to a corporate attorney for 11 years and have insight into that side of the world, which to be clear, is all about risk mitigation which is helpful. It is a nuanced and difficult field to—as are most things once you get into them—and it's just the uncertainty that befuddles me a bit. I wish them well with it, truly I do. I think the world is better with an independent Docker in it, but I question whether this is going to find success. 
That said, it doesn't matter what I think; what matters is what customers say and do, and I'm really looking forward to seeing how it plays out.
Betty: A hundred percent; same here. As someone who spent a good chunk of my life there, their mark on the industry is not to be ignored, like you said, with what happened with containers. But I do wish them well. There's a lot of good people over there, it's some really cool tech, and I want to see a future for them.
Corey: One last topic I want to get into before we wind up wrapping this episode is that you are someone who was nominated to come on the show by a couple of folks, which is always great. I'm always looking for recommendations on this. But what's odd is that you are—if we look at it and dig a little bit beneath the titles and whatnot, you even self-describe as your history is marketing leadership positions. It is uncommon for engineering-types to recommend that I talk to marketing folks. Personally I think that is a mistake; I consider myself more of a marketer than not in some respects, but it is uncommon, which means I have to ask you, what is your philosophy of marketing because it very clearly is differentiated in the public eye.
Betty: I'm flattered. I will say that—and this goes to how I hire people and how I coach teams—it's you have to be super curious because there's a ton of bad marketing out there, where it's just kind of like, “Hey, we do these five things and we always do these five things: blah, blah, blah, blah, blah.” But I think it's really being curious about what is the thing that you're marketing? There are people who are just focused on the function of marketing and not the thing. Because you're doing your marketing job in the service of a thing, this new widget, this new whatever, and you got to be super curious about it.
And I'll tell you that, for me, it's really hard for me to market something if I'm not excited about it. 
I have to personally be super excited about the tech or something happening in the industry, and it's, kind of like, an all-in thing for me. And so in that sense, I do spend a ton of time with engineers and end-users, and I really try to understand what's going on. I want to understand how the thing works, and I always ask them, “Well”—so I'll ask the engineers, like, “So… okay, this sounds really cool. You just described this new feature and you're super excited about it because you wrote it, but how is your end-user, the person you're building this for, how did they do this before? Help me understand. How did they do this before and why is this better?”
Just really dig into it because for me, I want to understand it deeply before I talk about it. I think the thing is, it shows a tremendous amount of respect for the builder, and then to try to really be empathetic, to understand what they're doing and then partner with them—I mean, this sounds so business-y the way I'm talking about this—but really be a partner with them and just help them make their thing really successful. I'm like the other end; you're going to build this great thing and now I'm going to make it sound like it's the best thing that's ever happened. But to do that, I really need to deeply understand what it is, and I have to care about it, too. I have to care about it in the way that you care about it.
Corey: I cannot effectively market or sell something that I don't believe in, personally. I also, to be clear because you are a marketing professional—or at least far more of one than I ever was—I do not view what I do as marketing; I view it as spectacle. And it's about telling stories to people, it's about learning what the market thinks about it, and that informs product design in many respects. It's about understanding the product itself. It's about being able to use the product.
And if people are listening to this and think, “Wait a minute, that sounds more like DevRel.” I have news for you. 
DevRel is marketing, they're just scared to tell you that. And I know people are going to disagree with me on that. You're wrong. But that's okay; reasonable people can disagree.
And the way I see it is that, okay, I'll talk to people building the service, I'll talk to people using the service, but then I'm going to build something with the service myself because until then, it's all a game of who sounds the most convincing in the stories that they tell. But okay, you can tell an amazing story about something, but if it falls over when I try to use it, well, I'm sorry, you're not being accurate in your descriptions of it.
Betty: A hundred percent. I hate to say, like, you're storytellers, but that's a big part of it, but it's kind of like you want to tell the story, so you do something so that people believe a certain thing. But that's part of a curated experience because you want them to try this thing in a certain way. Because you've designed it for something. “I built a spoon. I want you to use that to eat your soup because you can't eat soup with a fork.”
So, then you'll have this amazing soup-eating experience, but if I build you a spoon and then not give you any directions and you start throwing it at cars, you're going to be like, “This thing sucks.” So, I kind of think of it in that way. To your point of it has to actually work, it's like, but they also need to know, “What am I supposed to use it for?”
Corey: The problem I've always had on some visceral level with formal marketing departments for companies is that they can say that a product that they sell is good, they can say that the product is great, or they can choose to say nothing at all about that product, but when there's a product in the market that is clearly a turd, a marketing department is never going to be able to say that, which I think erodes its authenticity in many respects.
I understand the constraints behind that, truly I do, but it's the one superpower I think that I bring to the table where even when I do sponsorship stuff it's, you can buy my attention but not my opinion. Because the authenticity of me being trusted to call them like I see them, for lack of a better term, to my mind at least outweighs any short-term benefit from saying good things about a product that doesn't deserve them. Now, I've been wrong about things, sure. I have also been misinformed in both directions, thinking something is great when it's not, or terrible when it isn't, or not understanding the use case, and I am thrilled to engage in those debates. “But this is really expensive when you run for this use case,” and the answer can be, “Well, it's not designed for that use case.” But the answer should not be, “No it's not.” I promise you, expensive is in the eye of the customer, not the person building the thing.
Betty: Yes. This goes back to I have to believe in the thing. And I do agree it's, like, not [sigh], it's not a panacea. You're not going to make Product A and it's going to solve everything. But being super clear and focused on what it is good for, and then please just try it in this way because that's what we built it for.
Corey: I want to thank you for taking the time to have what for some people is no doubt going to be perceived as a surprisingly civil conversation about things that I have loud, heated opinions about. If people want to learn more, where can they find you?
Betty: Well, they can follow me on Twitter. But um, I'd say go to vmware.com/cloud for our work thing.
Corey: Exactly. VM where? That's right. VM there. And we will, of course, put links to that in the show notes.
Betty: [laugh].
Corey: Thank you so much for taking the time to speak with me. I appreciate it.
Betty: Thanks, Corey.
Corey: Betty Junod, Senior Director of Multi-Cloud Solutions at VMware. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with a loud, ranting comment at the end. Then, if you work for a company that is larger than 250 people or $10 million in revenue, please also Venmo me $5.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
Nick Davey (Sr. Product Manager, Cloud & SDN @JuniperNetworks) talks about the evolution of SDN networking for Kubernetes, cross-cluster and cross-cloud networking, and Tungsten Fabric.
SHOW: 563
CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw
CHECK OUT OUR NEW PODCAST - "CLOUDCAST BASICS"
SHOW SPONSORS:
  Megaport - Network as a Service Platform
  Try Megaport - Cloud Connectivity Simplified
  CBT Nuggets: Expert IT Training for individuals and teams
  Sign up for a CBT Nuggets Free Learner account
SHOW NOTES:
  Juniper (homepage)
  Juniper Cloud Services
  Juniper SDN and Orchestration
  Juniper Networks Day One: Building Containers Using Kubernetes and Contrail
  Tungsten Fabric from The Linux Foundation
Topic 1 - Welcome to the show. Tell us a little bit about your background, and where you focus at Juniper?
Topic 2 - There's lots of talk about Hybrid Cloud and Multi-Cloud, but it's often in the context of Kubernetes or an application running in multiple places. Let's talk about how we interconnect clouds.
Topic 3 - How should we think about the networking interconnections between end-users, the service providers, and the public cloud providers?
Topic 4 - Help us understand the similarities and differences between the orchestration of workloads (e.g. containers and Kubernetes) and orchestration of networks (e.g. SDN, Contrail, etc.). Who needs to know about each system?
Topic 5 - How does this all evolve as things like Kubernetes, KubeVirt (VMs) extend out to the edge?
Topic 6 - What are some of the newer use-cases or application-types that are starting to push the need for new cloud networking technologies?
FEEDBACK?
  Email: show at the cloudcast dot net
  Twitter: @thecloudcastnet
We celebrate the launch of Knative 1.0 with Ville Aikas, who has been with the project since the beginning. He was also with the Kubernetes team at the beginning, and thus we cannot resist a Pete Best comparison. We also celebrate Jimmy’s last show as our guest host with a rapid-fire Kubernetes quiz. Do you have something cool to share? Some questions? Let us know: web: kubernetespodcast.com mail: email@example.com twitter: @kubernetespod Chatter of the week Jimmy graduates! CNCF Landscape The menu at the Cheesecake Factory In-n-Out Secret Menu Links from the interview Important programmers from Finland Paddington Bear University of Washington Google Voice Google Cloud Storage Read-after-write consistency The Fifth Beatle Knative Serving Eventing Build, which became Tekton Pipelines Did we market Knative wrong? by Ahmet Alp Balkan Duck typing Rubber duck debugging Extending Knative for Fun and Profit, by Matt Moore & Ville Aikas Subresources Proposal for custom subresources for CRDs Google Cloud Run IBM Cloud Code Engine Knative steering committee and technical oversight committee Great artists steal Chainguard Episode 152, guest hosted by Dan Lorenc Episode 47, with Kim Lewandowski SLSA Sigstore Ville to present at Knative community meetup on November 17 Craig presented Knative at the Kubernetes Colorado meetup in July 2018 Seattle Kraken Ville Aikas on Twitter
Watch the live stream: Watch on YouTube
About the show
Sponsored by Shortcut - Get started at shortcut.com/pythonbytes
Special guest: The Anthony Shaw
Michael #0: It's episode 2^8 (nearly 5 years of podcasting)
Brian #1: Where does all the effort go?: Looking at Python core developer activity
  Łukasz Langa
  A look into CPython repository history and PR data
  Also, nice example of datasette in action and lots of SQL queries.
  The data, as well as the process, is open for anyone to look at.
  Cool that the process was listed in the article, including helper scripts used.
  Timeframe for data is since Feb 10, 2017, when source moved to GitHub, through Oct 9, 2021. However, some queries in the article are tighter than that.
  Queries
    Files involved in PRs since 1/1/20: top is ceval.c with 259 merged PRs
    Contributors by number of merged PRs: lots of familiar names in the top 50, along with some bots
      it'd be fun to talk with someone about the bots used to help the Python project
      nice note: “Clearly, it pays to be a bot … or a release manager since this naturally causes you to make a lot of commits. But Victor Stinner and Serhiy Storchaka are neither of these things and still generate amazing amounts of activity. Kudos! In any case, this is no competition but it was still interesting to see who makes all these recent changes.”
    Who contributed where? Neat.
      There's a self reported Experts Index in the very nice Python Developer's Guide. But some libraries don't have anyone listed. The data does though.
      Łukasz generated a top-5 list for each file.
      Contributing to some file and have a question? These folks may be able to help.
    Averages for PR activity
      core developer authoring and merging their own PR takes on average ~7 days (std dev ±41.96 days);
      core developer authoring a PR which was merged by somebody else takes on average 20.12 days (std dev ±77.36 days);
      community member-authored PRs get merged on average after 19.51 days (std dev ±81.74 days).
  Interesting note on those std deviations: “Well, if we were a company selling code review services, this standard deviation value would be an alarmingly large result. But in our situation which is almost entirely volunteer-driven, the goal of my analysis is to just observe and record data. The large standard deviation reflects the large amount of variation but isn't necessarily something to worry about. We could do better with more funding but fundamentally our biggest priority is keeping CPython stable. Certain care with integrating changes is required. Erring on the side of caution seems like a wise thing to do.”
  More questions to be asked, especially from the issue tracker
    Which libraries require most maintenance?
Michael #2: Why you shouldn't invoke setup.py directly
  By Paul Ganssle (from Talk Python #271: Unlock the mysteries of time, Python's datetime that is!)
  In response to conversation in Talk Python's cibuildwheel episode?
  For a long time, setuptools and distutils were the only game in town when it came to creating Python packages
  You write a setup.py file that invokes the setup() function, and you get a Makefile-like interface exposed by invoking python setup.py with a command
  In the last few years, all direct invocations of setup.py have been effectively deprecated in favor of invocations via purpose-built and/or standards-based CLI tools like pip, build and tox.
  In Python 2.0, the distutils module was introduced as a standard way to convert Python source code into *nix distro packages
  One major problem with this approach, though, is that every Python package must use distutils and only distutils; there was no standard way for a package author to make it clear that you need other packages in order to build or test your package.
  => Setuptools
  Works, but sometimes you need requirements before the install (see cython example)
  A build backend is something like setuptools or flit, which is a library that knows how to take a source tree and turn it into a distributable artifact: a source distribution or a wheel.
  A build frontend is something like pip or build, which is a program (usually a CLI tool) that orchestrates the build environment and invokes the build backend
  In this taxonomy, setuptools has historically been both a backend and a frontend - that said, setuptools is a terrible frontend. It does not implement PEP 517 or PEP 518's requirements for build frontends
  Why am I not seeing deprecation warnings?
  Use build package. Also can be replaced by tox, nox or even a Makefile
  Probably should just check out the summary table.
Anthony #3: OpenTelemetry is going stable soon
  Cloud Native Computing Foundation project for cross-language event tracing, performance tracing, logging and sampling for distributed applications.
  Engineers from Microsoft, Amazon, Splunk, Google, Elastic, New Relic and others working on standards and specification.
  Formed through a merger of the OpenTracing and OpenCensus projects.
  Python SDK supports instrumentation of lots of frameworks, like Flask, Django, FastAPI (ASGI), and ORMs like SQLAlchemy, or templating engines.
  All data can then be exported onto various platforms: New Relic, Prometheus, Jaeger, DataDog, Azure Monitor, Google Cloud Monitoring.
  If you want to get started and play around, check out the rich console exporter I submitted recently.
Brian #4: Understanding all of Python, through its builtins
  Tushar Sadhwani
  I really enjoyed the discussion before he actually got to the builtins.
  LEGB rule defines the order of scopes in which variables are looked up in Python.
    Local, Enclosing (nonlocal), Global, Builtin
  Understanding LEGB is a good thing to do for Python beginners or advanced beginners. Takes a lot of the mystery away.
  Also that all the builtins are in one
  The rest is a quick scan through the entire list. It's not detailed everywhere, but pulls over scenic viewpoints at regular intervals to discuss interesting parts of builtins.
  Grouped reasonably. Not alphabetical
    Constants: There are exactly 5 constants: True, False, None, Ellipsis, and NotImplemented.
    globals and locals: Where everything is stored
    bytearray and memoryview: Better byte interfaces
    bin, hex, oct, ord, chr and ascii: Basic conversions
    …
  Well, it's a really long article, so I suggest jumping around and reading a section or two, or three. Luckily there's a nice TOC at the top.
Michael #5: FastAPI, Dask, and more Python goodies win best open source titles
  Things that stood out to me
    FastAPI
    Dask
    Windows Terminal
    minikube - Kubernetes cluster on your PC
    OBS Studio
Anthony #6: Notes From the Meeting On Python GIL Removal Between Python Core and Sam Gross
  Following on from last week's share on the “nogil” branch by Sam Gross, the Core Dev sprint included an interview.
  Targeted to 3.9 (alpha 3!), needs to at least be updated to 3.9.7.
  Nogil:
    Replaces pymalloc with mimalloc for thread safety
    Ties objects to the thread that created them with a non-atomic local reference count within the owner thread
    Allows for (slower) reference counting from other threads.
    Immortalized some objects so that references never get inc/dec'ed, like True, False, None, etc.
    Deferred reference counting
    Adjusts the GC to wait for all threads to pause at a safe point, doesn't wait for I/O blocked threads and constructs a list of objects to deallocate using mimalloc
    Relocates the MRO to a thread local (instead of process-local) to avoid contention on ref counting
    Modifies the builtin collections to be thread-safe (lists, dictionaries, etc.) since they could be shared across threads.
  IMHO, biggest thing to happen to Python in 5 years.
  Encouragingly, Sam was invited to be a Core Dev and Łukasz will mentor him!
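As a small illustration of the LEGB lookup order mentioned under Brian #4, here is a minimal sketch. The function and variable names (outer, inner, inner_no_local, reads_global, reads_builtin, legb_results) are made up for this example and do not come from the article; only the standard builtins module and the built-in len are real.

```python
import builtins

name = "global"  # Global scope: bound at module level


def outer():
    name = "enclosing"  # Enclosing scope, as seen from the nested functions

    def inner():
        name = "local"  # Local scope: wins over every outer binding
        return name

    def inner_no_local():
        # No local binding here, so the lookup falls back to outer()'s name
        return name

    return inner(), inner_no_local()


def reads_global():
    # No local or enclosing binding, so the module-level name is used
    return name


def reads_builtin():
    # len is not bound locally, in an enclosing function, or at module level,
    # so the lookup ends in the builtins namespace
    return len


local_value, enclosing_value = outer()
legb_results = {
    "local": local_value,
    "enclosing": enclosing_value,
    "global": reads_global(),
    "builtin": reads_builtin() is builtins.len,
}
print(legb_results)
```

Each helper resolves the same identifier one scope further out than the previous one, which is the whole rule: Python stops at the first scope that has a binding.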
Extras
  Michael
    Python Developers Survey 2021 is open
    More PyPI CLI updates
    bump2version via Bahram Aghaei (YouTube comment)
    Was there a bee stuck in Brian's mic last time?
  Brian
    PyCon US 2022 CFP is open until Dec 20
    Python Testing with pytest, 2nd edition, Beta 7.0
      All chapters now there. (Final chapter was “Advanced Parametrization”)
      It's in technical review phase now.
      If reading, please skip ahead to the chapter you really care about and submit errata if you find anything confusing.
Joke:
Today on the Full Stack Podcast we dive into TriggerMesh, an open-source platform for putting together event-driven applications. It's built on Kubernetes. Scott Lowe speaks with co-founder and CEO Mark Hinkle. The post Full Stack Journey 059: Composing Event-Driven Applications With TriggerMesh appeared first on Packet Pushers.