How can modern telemetry solutions help you? While at NDC in Melbourne, Richard chatted with Liz Fong-Jones of Honeycomb about her approach to educating leadership on creating great telemetry solutions for organizations. Liz discusses the importance of being able to answer questions about reliability issues without causing problems, as opposed to relying on a dashboard for every measurement taken of the system. The conversation also delves into the culture of building reliability, where people are encouraged to fail and learn, rather than being punished when things go wrong. Links: Honeycomb. Recorded April 30, 2025
On the first Primo episode of the year, Jesse and Katie discuss the technologist Liz Fong-Jones and her battle against Kiwi Farms. Plus, a consent accident, a very online defamation suit, and an unsolicited email from an old friend. Trans Lifeline tax filing. Liz Fong-Jones (方禮真) on Twitter: "Sigh, I was hopi… To hear more, visit www.blockedandreported.org
This interview was recorded for GOTO Unscripted. https://gotopia.tech Read the full transcription of this interview here: https://gotopia.tech/articles/336 Charity Majors - CTO at honeycomb.io & Co-Author of "Observability Engineering". James Lewis - Software Architect & Director at Thoughtworks. RESOURCES: Charity: https://twitter.com/mipsytipsy https://github.com/charity https://linkedin.com/in/charity-majors https://charity.wtf James: https://twitter.com/boicy https://linkedin.com/in/james-lewis-microservices https://github.com/boicy https://www.bovon.org DESCRIPTION: What's next in the observability space? Join Charity Majors and James Lewis as they discuss canonical logs, Observability 2.0 and how to simplify complexity in software engineering! RECOMMENDED BOOKS: Charity Majors, Liz Fong-Jones & George Miranda • Observability Engineering • https://amzn.to/38scbma; Charity Majors & Laine Campbell • Database Reliability Engineering • https://amzn.to/3ujybdS; Kelly Shortridge & Aaron Rinehart • Security Chaos Engineering • https://www.verica.io/sce-book; Nora Jones & Casey Rosenthal • Chaos Engineering • https://amzn.to/3hUmuAH; Russ Miles • Learning Chaos Engineering • https://amzn.to/3hCiUe8; Nicole Forsgren, Jez Humble & Gene Kim • Accelerate • https://amzn.to/442Rep0 CHANNEL MEMBERSHIP BONUS: Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
Liz Fong-Jones (former Google SRE and current Field CTO at honeycomb.io) joins hosts Steve McGhee and Jordan Greenberg for a lively discussion centered around observability, its evolution from monitoring, and its role in modern software development. Tune in for more on the importance of observability as a spectrum, the evolving role of SREs, and advice to aspiring software engineers.
This interview was recorded at GOTO Copenhagen for GOTO Unscripted. http://gotopia.tech Read the full transcription of this interview here. Liz Fong-Jones - Field CTO at Honeycomb.io. Marit van Dijk - Developer Advocate at JetBrains & Open Source Contributor. RESOURCES: Liz: https://twitter.com/lizthegrey https://linkedin.com/in/efong https://www.lizthegrey.com Marit: https://twitter.com/MaritvanDijk77 https://linkedin.com/in/maritvandijk https://mastodon.social/@maritvandijk https://github.com/mlvandijk https://medium.com/@mlvandijk https://maritvandijk.com DESCRIPTION: Explore the intricacies of efficient development collaboration and gain valuable insights into Site Reliability Engineering (SRE) strategies in this engaging conversation. Liz Fong-Jones and Marit van Dijk delve into the challenges developers face, emphasizing streamlined communication and workflow optimization. From managing software dependencies to the evolving role of SRE teams, they share practical experiences and thoughts on building internal platforms, shedding light on the collaborative dynamics that shape successful development endeavors. Discover how embracing effective communication and proven SRE practices can pave the way for improved team efficiency and impactful software development outcomes. RECOMMENDED BOOKS: Charity Majors, Liz Fong-Jones & George Miranda • Observability Engineering; Beyer, Murphy, Rensin, Kawahara & Thorne • The Site Reliability Workbook; Kelly Shortridge & Aaron Rinehart • Security Chaos Engineering; Nora Jones & Casey Rosenthal • Chaos Engineering; Russ Miles • Learning Chaos Engineering; Mark Seemann & Steven van Deursen • Dependency Injection Principles, Practices & Patterns
Haitians helping Missourians reach God, Dutch Healthcare, Liz Fong-Jones's Fetlife, and the fallout of Nick Rekieta's Polysubstance Abuse.
On this episode of “B The Way Forward,” host Brenda Darden Wilkerson explores what it means to be more than your job description. Our guest, Liz Fong-Jones is the perfect example of a talented tech employee who engages in workplace activism for employee rights while also excelling in her day job as an engineer. Liz began a lot of her activism while she was a site reliability engineer at Google. As a trans woman, Liz discusses the importance of workplace inclusion and why she fights so hard for marginalized groups in the workplace. Liz offers listeners all sorts of helpful tips from finding success despite an untraditional education – Liz was a college dropout herself – to how to best garner support for your own petitions and activism at work. Liz is now Field CTO at the female-led software company Honeycomb.io. “The fight is going to take various shapes over the next several years. But I think intersectionality is the most important thing that we can and should be doing. I think that means that we should reach out and build intersectional allyship, that we should think about people who sit at the intersections, who sit at those margins, right? If someone is a transgender person and they are a person of color, especially if they're black, that dramatically increases the likelihood that they are going to become the victim of violence that is transphobic. I think the number one power that we have right now is we need to think about protecting people's privacy.” For more on Liz and her work, check out... Linkedin - /efong Twitter - @honeycombio --- At AnitaB.org, we envision a future where the people who imagine and build technology mirror the people and societies for whom they build it. Find out more about how we support women, non-binary individuals, and other underrepresented groups in computing, as well as the organizations that employ them and the academic institutions training the next generations. 
--- Connect with AnitaB.org Instagram - @anitab_org Facebook - /anitab.0rg LinkedIn - /anitab-org On the web - anitab.org --- Our guests contribute to this podcast in their personal capacity. The views expressed in this interview are their own and do not necessarily represent the views of Anita Borg Institute for Women and Technology or its employees (“AnitaB.org”). AnitaB.org is not responsible for and does not verify the accuracy of the information provided in the podcast series. The primary purpose of this podcast is to educate and inform. This podcast series does not constitute legal or other professional advice or services. --- B The Way Forward Is… Produced by Dominique Ferrari and Paige Hymson Sound design and editing by Neil Innes and Ryan Hammond Mixing and mastering by Julian Kwasneski Associate Producer is Faith Krogulecki Executive Produced by Dominique Ferrari, Stacey Book, and Avi Glijansky for Riveter Studios and Frequency Machine Executive Produced by Brenda Darden Wilkerson for AnitaB.org Podcast Marketing from Lauren Passell and Arielle Nissenblatt with Riveter Studios and Tink Media in partnership with Carolyn Schneller and Coley Bouschet at AnitaB.org Photo of Brenda Darden Wilkerson by Mandisa Media Productions For more ways to be the way forward, visit AnitaB.org
Tuck and Ozzy chat with Liz Fong-Jones (she/her). Topics include: Creating a Coworker Solidarity Fund to support worker organizing and potential strikes; Organizing your Google colleagues so hard that you crash Google Docs; Being realistic about your security threats (e.g., is it safe to organize in Google Docs?); Ways that (trans & cis) tech workers can support trans people. Plus: How the Joy Luck Club (not that one) saved her life. This Week in Gender: Listen to Hang Up. Find Liz at lizthegrey.com and on cohost. Submit a piece of Theymail: a small message or ad that we'll read on the show. Today's message was from Wren Dove Lark. We've got limited-edition Trans Day of Snack 2024 merch on sale this month, with proceeds benefiting our mutual aid fund. ~~ Join our Patreon (patreon.com/gender) to get access to our bonus podcasts, weekly newsletter, and other perks. Find our starter packs at genderpodcast.com. We're also on Instagram @gendereveal. Senior Producer: Ozzy Llinas Goodman. Logo: Ira M. Leigh. Music: Breakmaster Cylinder. Additional Music: “Hrse Hrse” & “Gamboler” by Blue Dot Sessions. Sponsors: ShopEnby.com (code: GenderReveal) and DeleteMe (code: TUCK20)
Join host Kaivalya Apte in this episode of The Geek Narrator Podcast as he discusses observability engineering with the Field CTO at Honeycomb, Liz Fong-Jones. They delve into the importance of observability for software engineers, the role of Honeycomb in popularizing this concept, and how observability has evolved over the years. Liz shares her experiences transitioning from being an SRE at Google to advocating for observability at Honeycomb, and her journey from developer advocate to Field CTO. They discuss the definitions of and misconceptions about observability, and explain Service-Level Objectives (SLOs) and Service-Level Indicators (SLIs) and the challenges they solve. Tune in for an informative and in-depth conversation on observability engineering. Chapters: 00:00 Introduction 00:08 Understanding Observability Engineering 00:37 Guest Introduction: Liz Fong Jones 00:53 Liz's Journey to Field CTO at Honeycomb 27:38 Understanding Site Reliability Workbook Materials 27:57 Identifying Critical User Journeys 29:49 Different Types of Services and Their SLOs 33:05 Setting Up SLOs: Granularity and Number 42:42 Understanding Service Level Indicators (SLIs) 50:26 Common Mistakes in Setting Up SLOs 52:09 Cultivating an Observability-Driven Development Culture References: Observability Engineering: https://www.oreilly.com/library/view/observability-engineering/9781492076438/ @Google SRE book: https://sre.google/books/ =============================================================================== For a discount on the courses below: Appsync: https://appsyncmasterclass.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003 Testing serverless: https://testserverlessapps.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003 Production-Ready Serverless: https://productionreadyserverless.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003 Use the "Add Discount" button and enter the "geeknarrator" discount code to get a 20% discount.
=============================================================================== Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator If you like this episode, please hit the like button and share it with your network. Also, please subscribe if you haven't yet. Database internals series: https://youtu.be/yV_Zp0Mi3xs Popular playlists: Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA- Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17 Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN Stay Curious! Keep Learning!
In episode 2 of Generationship, Rachel Chalmers speaks with Liz Fong-Jones and Phillip Carter of Honeycomb. Together they explore use cases for LLMs, insights on navigating AI hallucinations, the tradeoffs between prompt engineering and fine-tuning, and the privacy and security implications inherent to commercial LLMs.
Links from the show: Symfony 7.0.0 released (Symfony Blog); https://twitter.com/danjharrin/status/1729895786064888188; Command Line Picasso | php[architect]; PHP[TEK] 2024; PHP | endoflife.date; PHP: Never - Manual; https://twitter.com/official_php/status/1729168870827532504?t=DL-o14jdWEFxVWgsT8hEGQ&s=19; PHP: Supported Versions; PHP: rfc:release_cycle_update; PHP Support for PHP 7.2 - 8.0 | PHP LTS | Zend by Perforce; Using Amazon RDS Blue/Green Deployments for database updates - Amazon Relational Database Service; Eduards Sizovs is lying about conference diversity. | Liz Fong-Jones posted on the topic | LinkedIn. This episode of PHPUgly was sponsored by: Honeybadger.io: Built for Developers. Monitoring doesn't have to be so complicated. That's why we built the monitoring tool we always wanted: a tool that's there when you need it, and gets out of your way. Everything you need to keep production happy so that you can keep shipping. Deploy with confidence and be your team's DevOps hero. https://www.honeybadger.io/ JetBrains PhpStorm: The Lightning-Smart PHP IDE. Join over 600,000 happy PhpStorm users worldwide! https://www.jetbrains.com/phpstorm/ php[architect]: php[architect] magazine is the only technical journal dedicated exclusively to the world of PHP. We are committed to spreading knowledge of best practices in PHP. With that purpose, the brand has expanded into producing a full line of books, hosting online and in-person web training, as well as organizing multiple conferences per year. https://www.phparch.com PHPUgly streams the recording of this podcast live, typically every Thursday night around 9 PM PT. Come and join us, and subscribe to our Youtube Channel, Twitch, or Twitter. 
Also, be sure to check out our Patreon Page. Twitter Account: https://twitter.com/phpugly Mastodon Account: https://phparch.social/@phpugly Host: Eric Van Johnson | Mastodon: @eric@phpartch.social; John Congdon | Mastodon: @john@phpartch.social; Tom Rideout. Streams: Youtube Channel; Twitch; Powered by Restream. Patreon Page. PHPUgly Anthem by Harry Mack / Harry Mack Youtube Channel. Thanks to all of our Patreon Sponsors: ***** PATREON SUPPORTS SPONSOR LEVEL ** Honeybadger (https://honeybader.io) ** Patreon Supports ** ButteryCrumpetFrank WDavid QBoštjan OMarcusShelby CS FRodrigo CBillyDarryl HKnut BDmitri GElgimboMikePageDevKenrick BKalen JR. C. S.Peter AHoneybadgerHolly SClayton SRonny NBen RAlex BKevin YWayneJeroen FahinkleChris CSteve MRobert SThorstenEmily JAndrew WulrikJohn CJames HEric MEd GRirielilHermitChampJeffrey DChris BTore BBek JDonald GPaul KDustin UMel S
This interview was recorded at YOW! London for GOTO Unscripted. gotopia.tech Read the full transcription of this interview here. Jessica Kerr - Principal Developer Evangelist at Honeycomb. Jessica Cregg - Information Technology Operations Engineer at CybSafe. RESOURCES: Jessica Kerr: jessitron.com, @jessitron, github.com/jessitron, linkedin.com/in/jessicakerr, twitch.tv/jessitronica, instagram.com/jessitronica. Jessica Cregg: @JessicaCregg, github.com/jessicarfactory, linkedin.com/in/jessicacregg. DESCRIPTION: Software's evolving landscape goes beyond mere programming, creating far-reaching consequences. Dive into software development's intricate world in this GOTO Unscripted talk where Jessica Kerr explores observability, gamification and systems thinking. Speaking with Jessica Cregg, Kerr highlights the distinct satisfaction derived from collaborative tasks and the pros and cons of integrating gamification as well as competition-driven incentives in software engineering. The talk underscores the interplay of observability and feature flags for agile deployment and issue resolution. Gain deeper insights into systems thinking, a framework that has evolved to embrace software's constant evolution and uncover how observability extends beyond monitoring to provide actionable insights. RECOMMENDED BOOKS: Debbie Levitt • Customers Know You Suck; Yu-kai Chou • Actionable Gamification; Chris Collins • Gamification: Playing for Profits; Charity Majors, Liz Fong-Jones & George Miranda • Observability Engineering; Kelly Shortridge & Aaron Rinehart • Security Chaos Engineering; Matthew Skelton & Manuel Pais • Team Topologies
Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 18+ years of experience. She is currently the Field CTO at Honeycomb, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights. She lives in Vancouver, BC with her wife Elly, partners, and a Samoyed/Golden Retriever mix, and in Sydney, NSW. She plays classical piano, leads an EVE Online alliance, and advocates for transgender rights. You can follow Liz on social media: https://twitter.com/lizthegrey https://www.lizthegrey.com/ https://cohost.org/lizthegrey Also take a look at some links provided by Liz: https://info.honeycomb.io/observability-engineering-oreilly-book-2022 https://opentelemetry.io/ PLEASE SUBSCRIBE TO THE PODCAST - Spotify: http://isaacl.dev/podcast-spotify - Apple Podcasts: http://isaacl.dev/podcast-apple - Google Podcasts: http://isaacl.dev/podcast-google - RSS: http://isaacl.dev/podcast-rss You can check out more episodes of Coffee and Open Source on https://www.coffeeandopensource.com/ Coffee and Open Source is hosted by Isaac Levin (https://twitter.com/isaacrlevin) --- Support this podcast: https://podcasters.spotify.com/pod/show/coffeandopensource/support
In the early 2000s, Charity Majors was a homeschooled kid who'd gotten a scholarship to study classical piano performance at the University of Idaho. “I realized, over the course of that first year, that music majors tended to still be hanging around the music department in their 30s and 40s,” she said. “And nobody really had very much money, and they were all doing it for the love of the game. And I was just like, I don't want to be poor for the rest of my life.” Fortunately, she said, it was pretty easy at that time to jump into the much more lucrative tech world. “It was buzzing, they were willing to take anyone who knew what Unix was,” she said of her first tech job, running computer systems for the university. Eventually, she dropped out of college, she said, “made my way to Silicon Valley, and I've been here ever since.” Majors, co-founder and chief technology officer of the six-year-old Honeycomb.io, an observability platform company, told her story for The New Stack's podcast series, The Tech Founder Odyssey, which spotlights the personal journeys of some of the most interesting technical startup creators in the cloud native industry. It's been a busy year for her and the company she co-founded with Christine Yen, a colleague from Parse, a mobile application development company that was bought by Facebook. In May, O'Reilly published “Observability Engineering,” which Majors co-wrote with George Miranda and Liz Fong-Jones. In June, Gartner named Honeycomb.io as a Leader in the Magic Quadrant for Application Performance Monitoring and Observability. Thus far Honeycomb.io, now employing about 200 people, has raised just under $97 million, including a $50 million Series C funding round it closed in October, led by Insight Partners (which owns The New Stack). This Tech Founder Odyssey conversation was co-hosted by Colleen Coll and Heather Joslyn of TNS. 
‘Rage-Driven Development’
Honeycomb.io grew from efforts at Parse to solve a stubborn observability problem: systems crashed frequently, and rarely for the same reasons each time. “We invested a lot in the last generation of monitoring technology, we had all these dashboards, we have all these graphs,” Majors said. “But in order to figure out what's going on, you kind of had to know in advance what was going to break.” Once Parse was acquired by Facebook, Majors, Yen and their teams began piping data into a Facebook tool called Scuba, which “was aggressively hostile to users,” she recalled. But, “it did one thing really well, which is let you slice and dice in real time on dimensions that have very high cardinality,” meaning those that contain many unique values. This set it apart from the then-current monitoring technologies, which were built around assessing low-cardinality dimensions. Scuba allowed Majors' organization to gain more control over its reliability problem. And it got her and Yen thinking about a platform tool that could analyze high-cardinality data about system health in real time. “Everything is a high cardinality dimension now,” Majors said. “And [with] the old generation of tools, you hit a wall really fast and really hard.” And so, Honeycomb.io was created to build that platform. “My entire career has been rage-driven development,” she said. “Like: sounds cool, I'm gonna go play with that. This isn't working: I'm gonna go fix it from anger.”
A Reluctant CEO
Yen now holds the CEO role at Honeycomb.io, but Majors wound up with the job for roughly the first half of the company's life. Did Majors like being the boss? “Hated it,” she said. “Constitutionally what you want in a CEO is someone who is reliable, predictable, dependable, someone who doesn't mind showing up every Tuesday at 10:30 to talk to the same people. “I am not structured. 
I really chafe against that stuff.” However, she acknowledged, she may have been the right leader in the startup's beginning: “It was a state of chaos, like we didn't think we were going to survive. And that's where I thrive.” Fortunately, in Honeycomb.io's early days, raising money wasn't a huge challenge, due to its founders' background at Facebook. “There were people who were coming to us, like, do you want $2 million for a seed thing? Which is good, because I've seen the slides that we put together, and they are laughable. If I had seen those slides as an investor, I would have run the other way.” The “pedigree” conferred on her by investors due to her association with Facebook didn't sit comfortably with her. “I really hated it,” she said. “Because I did not learn to be a better engineer at Facebook. And part of me kind of wanted to just reject it. But I also felt this like responsibility on behalf of all dropouts, and queer women everywhere, to take the money and do something with it. So that worked out.” Majors, a frequent speaker at tech conferences, has established herself as a thought leader in not only observability but also engineering management. For other women, people of color, or people in the tech field with an unconventional story, she advised “investing a little bit in your public speaking skills, and making yourself a bit of a profile. Being externally known for what you do is really helpful because it counterbalances the default assumptions that you're not technical or that you're not as good.” She added, “if someone can Google your name plus a technology, and something comes up, you're assumed to be an expert. And I think that that really works to people's advantage.” Majors had a lot more to say about how her outsider perspective has shaped the way she approaches hiring, leadership and scaling up her organization. Check out this latest episode of the Tech Founder Odyssey.
This interview was recorded for the GOTO Book Club. gotopia.tech/bookclub Read the full transcription of the interview here. Charity Majors - Co-Author of Observability Engineering and CTO at Honeycomb. Liz Fong-Jones - Co-Author of Observability Engineering and Principal Developer Advocate for SRE & Observability at Honeycomb. George Miranda - Co-Author of Observability Engineering and Head of Ecosystems & Partnerships at Honeycomb. DESCRIPTION: Observability is crucial for developing and understanding the software that powers today's complex systems. Charity, Liz and George, authors of “Observability Engineering”, show you how to manage software at scale, deliver complex cloud native applications and systems, and the benefit observability has across the entire software development lifecycle. You'll also learn the impact observability has on organizational culture (and vice versa). The interview is based on Charity's, Liz's and George's book "Observability Engineering". RECOMMENDED BOOKS: Charity Majors, Liz Fong-Jones & George Miranda • Observability Engineering; Kelly Shortridge & Aaron Rinehart • Security Chaos Engineering; Nora Jones & Casey Rosenthal • Chaos Engineering; Mikolaj Pawlikowski • Chaos Engineering; Russ Miles • Learning Chaos Engineering; Murphy, Beyer, Jones & Petoff • Site Reliability Engineering; Beyer, Murphy, Rensin, Kawahara & Thorne • The Site Reliability Workbook; David N. Blank-Edelman • Seeking SRE
In episode 54 of o11ycast, Liz Fong-Jones and Jessica Kerr speak with Alex Boten of Lightstep. They discuss Alex's book Cloud Native Observability with OpenTelemetry, OTel documentation and community, vendor lock-in, and the pain of instrumentation.
“Observability is a technique for ensuring that you can understand novel problems in your system. Can you understand what's happening in your system and why, without having to push a new code by slicing and dicing existing telemetry signals that are coming out of your system?" Liz Fong-Jones is the co-author of the “Observability Engineering” book and a Principal Developer Advocate for SRE and Observability at Honeycomb. In this episode, Liz shared in-depth about observability and why it is becoming an important practice in the industry nowadays. Liz started by explaining the fundamentals of observability and how it differs from traditional monitoring. She explained some important concepts, such as the core analysis loop, cardinality and dimensionality, and doing debugging from a first principle. Later, Liz shared the current state of observability and how we can improve our observability by doing observability driven development and improving our practices based on the proposed observability maturity model found in the book. Listen out for: Career Journey - [00:05:44] Observability - [00:06:30] Pillars of Observability - [00:09:57] Monitoring and SLO - [00:12:28] Core Analysis Loop - [00:15:06] Cardinality and Dimensionality - [00:18:41] Debugging from First Principle - [00:21:20] Current State of Observability - [00:26:49] Implementing Observability - [00:30:20] Observability Driven Development - [00:36:53] Having Developers On-Call - [00:39:06] Observability Maturity Model - [00:41:59] 3 Tech Lead Wisdom - [00:44:10] _____ Liz Fong-Jones's Bio Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 17+ years of experience. She is an advocate at Honeycomb for the SRE and Observability communities, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights. She lives in Vancouver, BC with her wife Elly, partners, and a Samoyed/Golden Retriever mix, and in Sydney, NSW. 
She plays classical piano, leads an EVE Online alliance, and advocates for transgender rights. Follow Liz: Twitter – @lizthegrey LinkedIn – https://www.linkedin.com/in/efong/ Website – https://www.lizthegrey.com/ GitHub – https://github.com/lizthegrey Our Sponsor Today's episode is proudly sponsored by Skills Matter, the global community and events platform for software professionals. Skills Matter is an easier way for technologists to grow their careers by connecting you and your peers with the best-in-class tech industry experts and communities. You get on-demand access to their latest content, thought leadership insights as well as the exciting schedule of tech events running across all time zones. Head on over to skillsmatter.com to become part of the tech community that matters most to you - it's free to join and easy to keep up with the latest tech trends. Like this episode? Subscribe on your favorite podcast app and submit your feedback. Follow @techleadjournal on LinkedIn, Twitter, and Instagram. Pledge your support by becoming a patron. For more info about the episode (including quotes and transcript), visit techleadjournal.dev/episodes/88.
About Casey
Casey spends his days leveraging AWS to help organizations improve the speed at which they deliver software. With a background in software development, he has spent the past 20 years architecting, building, and supporting software systems for organizations ranging from startups to Fortune 500 enterprises.
Links Referenced: “17 Ways to Run Containers in AWS”: https://www.lastweekinaws.com/blog/the-17-ways-to-run-containers-on-aws/; “17 More Ways to Run Containers on AWS”: https://www.lastweekinaws.com/blog/17-more-ways-to-run-containers-on-aws/; kubernetestheeasyway.com: https://kubernetestheeasyway.com; snark.cloud/quinntainers: https://snark.cloud/quinntainers; ECS Chargeback: https://github.com/gaggle-net/ecs-chargeback; twitter.com/nektos: https://twitter.com/nektos
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes, but isn't limited to, talent in Mexico, Costa Rica, Brazil, and Argentina. 
Now, not only do they wind up spreading all of their talent on English ability, as well as you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team. It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.
Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built-in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I had the pleasure of meeting at re:Invent last year, but we'll get to that story in a minute. Casey Lee is the CTO with a company called Gaggle, which is, as they frame it, saving lives. Now, that seems to be a relatively common position that an awful lot of different tech companies take. “We're saving lives here.” It's, “You show banner ads and some of them are attack platforms for JavaScript malware. 
Let's be serious here.” Casey, thank you for joining me, and what makes the statement that Gaggle saves lives not patently ridiculous?

Casey: Sure. Thanks, Corey. Thanks for having me on the show. So Gaggle, we're an ed-tech company. We sell software to school districts, and school districts use our software to help protect their students while the students use the school-issued Google or Microsoft accounts.

So, we're looking for signs of bullying, harassment, self-harm, and potentially suicide from K-12 students while they're using these platforms. They will take the thoughts, concerns, emotions they're struggling with and write them in their school-issued accounts. We detect that and then we notify the school districts, and they get the students the help they need before they can do any permanent damage to themselves. We protect about 6 million students throughout the US. We ingest a lot of content.

Last school year, over 6 billion files, about an equal number of emails ingested. We're looking for concerning content and then we have humans review the stuff that our machine learning algorithms detect and flag. About 40 million items had to go in front of humans last year, which resulted in about 20,000 of what we call PSSes. These are Possible Student Situations where students are talking about harming themselves or harming others. And that resulted in what we like to track as lives saved. 1,400 incidents last school year where a student was dealing with suicide ideation, they were planning to take their own lives. We detect that and get them help within minutes before they can act on that. That's what Gaggle has been doing. We're using tech, solving tech problems, and also saving lives as we do it.

Corey: It's easy to lob a criticism at some of the things you're alluding to, the idea of oh, you're using machine learning on student data for young kids, yadda, yadda, yadda.
Look at the outcome, look at the privacy controls you have in place, and look at the outcomes you're driving to. Now, I don't necessarily trust the number of school administrations not to become heavy-handed and overbearing with it, but let's be clear, that's not the intent. That is not what the success stories you have alluded to. I've got to say I'm a fan, so thanks for doing what you're doing. I don't say that very often to people who work in tech companies.Casey: Cool. Thanks, Corey.Corey: But let's rewind a bit because you and I had passed like ships in the night on Twitter for a while, but last year at re:Invent something odd happened. First, my business partner procrastinated at getting his ticket—that's not the odd part; he does that a lot—but then suddenly ticket sales slammed shut and none were to be had anywhere. You reached out with a, “Hey, I have a spare ticket because someone can't go. Let me get it to you.” And I said, “Terrific. Let me pay you for the ticket and take you to dinner.”You said, “Yes on the dinner, but I'd rather you just look at my AWS bill and don't worry about the cost of the ticket.” “All right,” said I. I know a deal when I see one. We grabbed dinner at the Venetian. I said, “Bust out your laptop.” And you said, “Oh, I was kidding.” And I said, “Great. I wasn't. Bust it out.”And you went from laughing to taking notes in about the usual time that happens when I start looking at these things. But how was your recollection of that? I always tend to romanticize some of these things. Like, “And then everyone's restaurant just turned, stopped, and clapped the entire time.” Maybe that part didn't happen.Casey: Everything was right up until the clapping part. That was a really cool experience. I appreciate you walking through that with me. 
Yeah, we've got lots of opportunity to save on our AWS bill here at Gaggle, and in that little bit of time that we had together, I think I walked away with no more than a dozen ideas for where to shave some costs. The most obvious one, the first thing that you keyed in on, is we had RIs coming due that weren't really well-optimized and you steered me towards savings plans. We put that in place and we're able to apply those savings plans not just to our EC2 instances but also to our serverless spend as well.So, that was a very worthwhile and cost-effective dinner for us. The thing that was most surprising though, Corey, was your approach. Your approach to how to review our bill was not what I thought at all.Corey: Well, what did you expect my approach was going to be? Because this always is of interest to me. Like, do you expect me to, like, whip a portable machine learning rig out of my backpack full of GPUs or something?Casey: I didn't know if you had, like, some secret tool you were going to hit, or if nothing else, I thought you were going to go for the Cost Explorer. I spend a lot of time in Cost Explorer, that's my go-to tool, and you wanted nothing to do with Cost Exp—I think I was actually pulling up Cost Explorer for you and you said, “I'm not interested. Take me to the bills.” So, we went right to the billing dashboard, you started opening up the invoices, and I thought to myself, “I don't remember the last time I looked at an AWS invoice.” I just, it's noise; it's not something that I pay attention to.And I learned something, that you get a real quick view of both the cost and the usage. And that's what you were keyed in on, right? And you were looking at things relative to each other. “Okay, I have no idea about Gaggle or what they do, but normally, for a company that's spending x amount of dollars in EC2, why is your data transfer cost the way it is? 
Is that high or low?” So, you're looking for kind of relative numbers, but it was really cool watching you slice and dice that bill through the dashboard there.Corey: There are a few things I tie together there. Part of it is that this is sort of a surprising thing that people don't think about but start with big numbers first, rather than going alphabetically because I don't really care about your $6 Alexa for Business spend. I care a bit more about the $6 million, or whatever it happens to be at EC2—I'm pulling numbers completely out of the ether, let's be clear; I don't recall what the exact magnitude of your bill is and it's not relevant to the conversation.And then you see that and it's like, “Huh. Okay, you're spending $6 million on EC2. Why are you spending 400 bucks on S3? Seems to me that those two should be a little closer aligned. What's the deal here? Oh, God, you're using eight petabytes of EBS volumes. Oh, dear.”And just, it tends to lead to interesting stuff. Break it down by region, service, and use case—or usage type, rather—is what shows up on those exploded bills, and that's where I tend to start. It also is one of the easiest things to wind up having someone throw into a PDF and email my way if I'm not doing it in a restaurant with, you know, people clapping standing around.Casey: [laugh]. Right.Corey: I also want to highlight that you've been using AWS for a long time. You're a Container Hero; you are not bad at understanding the nuances and depths of AWS, so I take praise from you around this stuff as valuing it very highly. This stuff is not intuitive, it is deeply nuanced, and you have a business outcome you are working towards that invariably is not oriented day in day out around, “How do I get these services for less money than I'm currently paying?” But that is how I see the world and I tend to live in a very different space just based on the nature of what I do. It's sort of a case study and the advantage of specialization. 
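The "big numbers first" approach Corey describes can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical list of bill line items; the regions, usage-type labels, and dollar amounts below are invented for the example and do not come from any real invoice.

```python
from collections import defaultdict

# Hypothetical line items: (region, service, usage_type, cost_usd).
# Every value here is made up for illustration.
LINE_ITEMS = [
    ("us-east-1", "EC2", "BoxUsage", 4200.00),
    ("us-east-1", "EC2", "DataTransfer-Out-Bytes", 1900.00),
    ("us-west-2", "EC2", "EBS:VolumeUsage", 2600.00),
    ("us-east-1", "S3", "TimedStorage-ByteHrs", 400.00),
    ("us-east-1", "Alexa for Business", "Devices", 6.00),
]


def rank_services(items):
    """Total the cost per service and return the largest spend first."""
    totals = defaultdict(float)
    for _region, service, _usage_type, cost in items:
        totals[service] += cost
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def breakdown(items, service):
    """List one service's (region, usage_type, cost) rows, largest first."""
    rows = [(r, u, c) for r, s, u, c in items if s == service]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for service, total in rank_services(LINE_ITEMS):
        print(f"{service:20s} ${total:>10,.2f}")
    print(breakdown(LINE_ITEMS, "EC2"))
```

Sorting by spend rather than alphabetically puts the services worth asking about at the top, and the per-service breakdown by region and usage type is where the relative comparisons start.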
But I know remarkably little about containers, which is how we wound up reconnecting about a week or so before we did this recording.

Casey: Yeah. I saw your tweet; you were trying to run some workload—container workload—and I could hear the frustration on the other end of Twitter when you were shaking your fist at—

Corey: I should not tweet angrily, and I did in this case. And, eh, every time I do I regret it. But it played well with the people, so that does help. I believe my exact comment was, “‘me: I've got this container. Run it, please.’ ‘Google Cloud Run: You got it, boss.’ AWS has 17 ways to run containers and they all suck.”

And that's painting with an overly broad brush, let's be clear, but that was at the tail end of two or three days of work trying to solve a very specific, very common, business problem, that I was just beating my head off of a wall again and again and again. And it took less than half an hour from start to finish with Google Cloud Run and I didn't have to think about it anymore. And it's one of those moments where you look at this and realize that the future is here, we just don't see it in certain ways. And you took exception to this. So please, let's dive in because 280 characters of text after half a bottle of wine is not the best context to have a nuanced discussion that leaves friendships intact the following morning.

Casey: Nice. Well, I just want to make sure I understand the use case first because I was trying to read between the lines on what you needed, but let me take a guess. My guess is you got your source code in GitHub, you have a Dockerfile, and you want to be able to take that repo from GitHub and just have it continuously deployed somewhere in Run. And you don't want to have headaches with it; you just want to push more changes up to GitHub, docker build runs and updates some service somewhere. Am I right so far?

Corey: Ish, but think a little further up the stack. It was in service of this show.
So, this show, as people who are listening to this are probably aware by this point, periodically has sponsors, which we love: We thank them for participating in the ongoing support of this show, which empowers conversations like this. Sometimes a sponsor will come to us with, “Oh, and here's the URL we want to give people.” And it's, “First, you misspelled your company name from the common English word; there are three sublevels within the domain, and then you have a complex UTM tagging tracking co—yeah, you realize people are driving to work when they're listening to this?”

So, a while back I built a link shortener, snark.cloud. Is it the shortest thing in the world? Not really, but it's easily understandable when I say that, and people hear it for what it is. And that's been running for a long time as an S3 bucket full of redirects, behind CloudFront. So, I wind up adding a zero-byte object with a redirect parameter on it, and it just works.

Now, the challenge that I have here as a business is that I am increasingly prolific these days. So, anything that I am not directly required to be doing, I probably shouldn't necessarily be the one to do it. And care and feeding of those redirect links is a prime example of this. So, I went hunting, and the things that I was looking for were, obviously, do the redirect. Now, if you pull up GitHub, there are hundreds of solutions here.

There are AWS blog posts. One that I really liked and almost got working was Eric Johnson's three-part blog post on how to do it serverlessly, with API Gateway, and DynamoDB, no Lambdas required. I really liked aspects of what that was, but it was complex, I kept smacking into weird challenges as I went, and front end is just baffling to me.
Because I needed a front end app for people to be able to use here; I need to be able to secure that because it turns out that if anyone who stumbles across the URL can redirect things to other places, well, you've just empowered a whole bunch of spam email, and you're going to find that service abused, and everyone starts blocking it, and then you have trouble. Nothing lasts the first encounter with jerks.

And I was getting more and more frustrated, and then, with a few creative search terms, I found something on GitHub by a Twitter engineer who used to work at Google Cloud. And what it uses as a client is it doesn't build any kind of custom web app. Instead, as a database, it uses not S3 objects, not Route 53—the ideal database—but a Google Sheet, which sounds ridiculous, but every business user here knows how to use that.

Casey: Sure.

Corey: And it looks for the two columns. The first one is the slug after the snark.cloud, and the second is the long URL. And it has a TTL of five seconds on cache, so make a change to that spreadsheet, five seconds later, it's live. Everyone gets it, I don't have to build anything new, I just put it somewhere the relevant people can access it, I gave them a tutorial and a giant warning on it, and everyone gets that. And it just works well. It was, “Click here to deploy. Follow the steps.”

And the documentation was a little, eh, okay, I had to undo it once and redo it again. Getting the domain ported over took a bit of time, and there were some weird SSL errors as the certificates were set up, but once all of that was done, it just worked. And I tested the heck out of it, and cold starts are relatively low, and the entire thing fits within the free tier. And it is reminiscent of the magic that I first saw when I started working with some of the cloud providers' services, years ago.
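The two-column sheet with a five-second cache can be sketched as follows. This is a minimal illustration, not the code of the GitHub project Corey found: `fetch_rows` is a hypothetical stand-in for whatever reads the (slug, long URL) rows from the spreadsheet, and no real Google Sheets API call is shown.

```python
import time


class SheetBackedLinks:
    """Slug -> long URL lookup, reloaded from a two-column table at most once per TTL."""

    def __init__(self, fetch_rows, ttl_seconds=5.0, clock=time.monotonic):
        # fetch_rows: hypothetical callable returning an iterable of (slug, url) pairs.
        self._fetch_rows = fetch_rows
        self._ttl = ttl_seconds
        self._clock = clock
        self._links = {}
        self._loaded_at = None

    def resolve(self, slug):
        """Return the long URL for a slug, or None if the sheet has no such row."""
        now = self._clock()
        # Reload on first use, or once the cached copy is at least ttl_seconds old.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._links = {s: url for s, url in self._fetch_rows()}
            self._loaded_at = now
        return self._links.get(slug)


if __name__ == "__main__":
    # In-memory rows standing in for the spreadsheet; the URL is a placeholder.
    rows = [("quinntainers", "https://example.com/some/long/url")]
    links = SheetBackedLinks(lambda: list(rows))
    print(links.resolve("quinntainers"))
```

With this shape, an edit to the sheet becomes visible to lookups no later than one TTL after the previous reload, which matches the "five seconds later, it's live" behavior described above.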
It's been a long time since I had that level of delight with something, especially after three days of frustration. It's one of those, “This is a great service. Why are people not shouting about this from the rooftops?” That was my perspective. And I put it out on Twitter and oh, Lord, did I get comments. What was your take on it?

Casey: Well, so my take was, when you're evaluating a platform to use for running your applications, how fast it can get you to Hello World is not necessarily the best way to go. I just assumed you're wrong. I assumed of the 17 ways AWS has to run containers, Corey just doesn't understand. And so I went after it. And I said, “Okay, let me see if I can find a way that solves his use case, as I understand it, through a quick tweet.”

And so I tried App Runner; I saw that App Runner does not meet your needs because you have to somehow get your Docker image pushed up to a repo. App Runner can take an image that's already been pushed up and deploy it for you or it can build from source, but neither of those were the way I understood your use case.

Corey: Having used App Runner before via the Copilot CLI, it is the closest as best I can tell to achieving what I want. But also let's be clear that I don't believe there's a free tier; there needs to be a load balancer in front of it, so you're starting with 15 bucks a month for this thing. Which is not the end of the world. Had I known at the beginning that all of this was going to be there, I would have just signed up for a bit.ly account and called it good. But here we are.

Casey: Yeah. I tried Copilot. Copilot is a great developer experience, but it also is just pulling together tons of—I mean just trying to do a Copilot service deploy, VPCs are being created and tons of IAM roles are being created, code pipelines, there's just so much going on.
I was like 20 minutes into it, and I said, “Yeah, this is not fitting the bill for what Corey was looking for.” Plus, it doesn't solve your use case the way I understood it, which is you don't want to worry about builds, you just want to push code and have new Docker images get built for you.

Corey: Well, honestly, let's be clear here, once it's up and running, I don't want to ever have to touch the silly thing again.

Casey: Right.

Corey: And that so far has been the case, after I forked the repo and made a couple of changes to it that I wanted to see. One of them was to render the entire thing case insensitive because I get that one wrong a lot, and the other is I wanted to change the permanent 301 redirect to a temporary 302 redirect because occasionally, sponsors will want to change where it goes in the fullness of time. And that is just fine, but I want to be able to support that and not have to deal with old cached data. So, getting that up and running was a bit of a challenge. But the way that it worked was following the instructions in the GitHub repo.

The developer environment it spun up in Google's Cloud Shell was just spectacular. It prompted me for a few things and it told me step by step what to do. This is the sort of thing I could have given a basically non-technical user, and they would have had success with it.

Casey: So, I tried it as well. I said, “Well, okay, if I'm going to respond to Corey here and challenge him on this, I need to try Cloud Run.” I had no experience with Cloud Run. I had a small example repo that loosely mapped what I understood you were trying to do. Within five minutes, I had Cloud Run working.

And I was surprised: anytime I pushed a new change, within 45 seconds the change was built and deployed. So, here's my conclusion, Corey. Google Cloud Run is great for your use case, and AWS doesn't have the perfect answer. But here's my challenge to you.
I think that you just proved why there's 17 different ways to run containers on AWS, is because there's that many different types of users that have different needs and you just happen to be number 18 that hasn't gotten the right attention yet from AWS.Corey: Well, let's be clear, like, my gag about 17 ways to run containers on AWS was largely a joke, and it went around the internet three times. So, I wrote a list of them on the blog post of “17 Ways to Run Containers in AWS” and people liked it. And then a few months later, I wrote “17 More Ways to Run Containers on AWS” listing 17 additional services that all run containers.And my favorite email that I think I've ever received in feedback was from a salty AWS employee, saying that one of them didn't really count because of some esoteric reason. And it turns out that when I'm trying to make a point of you have a sarcastic number of ways to run containers, pointing out that well, one of them isn't quite valid, doesn't really shatter the argument, let's be very clear here. So, I appreciate the feedback, I always do. And it's partially snark, but there is an element of truth to it in that customers don't want to run containers, by and large. That is what they do in service of a business goal.And they want their application to run which is in turn to serve as the business goal that continues to abstract out into, “Remain a going concern via the current position the company stakes out.” In your case, it is saving lives; in my case, it is fixing horrifying AWS bills and making fun of Amazon at the same time, and in most other places, there are somewhat more prosaic answers to that. But containers are simply an implementation detail, to some extent—to my way of thinking—of getting to that point. An important one [unintelligible 00:18:20], let's be clear, I was very anti-container for a long time. I wrote a talk, “Heresy in the Church of Docker” that then was accepted at ContainerCon. 
It's like, “Oh, boy, I'm not going to leave here alive.”

And the honest answer is many years later, that Kubernetes solves almost all the criticisms that I had with the downside of well, first, you have to learn Kubernetes, and that continues to be mind-bogglingly complex from where I sit. There's a reason that I've registered kubernetestheeasyway.com and repointed it to ECS, Amazon's container service that is not requiring you to cosplay as a cloud provider yourself. But even ECS has a number of challenges to it, I want to be very clear here. There are no silver bullets in this.

And you're completely correct in that I have a large, complex environment, and the application is nuanced, and I'm willing to invest a few weeks in setting up the baseline underlying infrastructure on AWS with some of these services, ideally not all of them at once because that's something a lunatic would do, but getting them up and running. The other side of it, though, is that if I am trying to evaluate a cloud provider's handling of containers and how this stuff works, the reason that everyone starts with a Hello World-style example is that it delivers, ideally, the mean time to dopamine. There's a reason that Hello World doesn't have 18 different dependencies across a bunch of different databases and message queues and all the other complicated parts of running a modern application. Because you just want to see how it works out of the gate. And if getting that baseline empty container that just returns the string ‘Hello World' is that complicated and requires that much work, my takeaway is not that this user experience is going to get better once I make the application itself more complicated.

So, I find that off-putting. My approach has always been find something that I can get the easy, minimum viable thing up and running on, and then as I expand know that you'll be there to catch me as my needs intensify and become ever more complex.
But if I can't get the baseline thing up and running, I'm unlikely to be super enthused about continuing to beat my head against the wall like, “Well, I'll just make it more complex. That'll solve the problem.” Because it often does not. That's my position.Casey: Yeah, I agree that dopamine hit is valuable in getting attached to want to invest into whatever tech stack you're using. The challenge is your second part of that. Your second part is will it grow with me and scale with me and support the complex edge cases that I have? And the problem I've seen is a lot of organizations will start with something that's very easy to get started with and then quickly outgrow it, and then come up with all sorts of weird Rube Goldberg-type solutions. Because they jumped all in before seeing—I've got kind of an example of that.I'm happy to announce that there's now 18 ways to run containers on AWS. Because in your use case, in the spirit of AWS customer obsession, I hear your use case, I've created an open-source project that I want to share called Quinntainers—Corey: Oh, no.Casey: —and it solves—yes. Quinntainers is live and is ready for the world. So, now we've got 18 ways to run containers. And if you have Corey's use case of, “Hey, here's my container. Run it for me,” now we've got a one command that you can run to get things going for you. I can share a link for you and you could check it out. This is a [unintelligible 00:21:38]—Corey: Oh, we're putting that in the [show notes 00:21:37], for sure. In fact, if you go to snark.cloud/quinntainers, you'll find it.Casey: You'll find it. There you go. The idea here was this: There is a real use case that you had, and I looked at AWS does not have an out-of-the-box simple solution for you. I agree with that. And Google Cloud Run does.Well, the answer would have been from AWS, “Well, then here, we need to make that solution.” And so that's what this was, was a way to demonstrate that it is a solvable problem. 
AWS has all the right primitives, just that use case hadn't been covered. So, how does Quinntainers work? Real straightforward: It's a command-line—it's an npm tool.

You just run npx Quinntainer, it sets up a GitHub Actions role in your AWS account, it then creates a GitHub Actions workflow in your repo, and then uses the Quinntainer GitHub action—reusable action—that creates the image for you; every time you push to the branch, pushes it up to ECR, and then automatically pushes up that new version of the image to App Runner for you. So, now it's using App Runner under the covers, but it's providing that nice developer experience that you are getting out of Cloud Run. Look, is Quinntainers really the right way to go with running containers? No, I'm not making that point at all. But the point is it is a—

Corey: It might very well be.

Casey: Well, if you want to show a good Hello World experience, Quinntainer's the best because within 30 seconds, your app is now set up to continuously deliver containers into AWS for your very specific use case. The problem is, it's not going to grow for you. I mean, it was something I did over the weekend just for fun; it's not something that would ever be worthy of hitching up a real production workload to. So, the point there is, you can build frameworks and tools that are very good at getting that initial dopamine hit, but then are not going to be there for you necessarily as you mature and get more complex.

Corey: And yet, I've tilted a couple of times at the windmill of integrating GitHub Actions in anything remotely resembling a programmatic way with AWS services, as far as instance roles go. Are you using permanent credentials for this as stored secrets or are you doing the OIDC handoff?

Casey: OIDC.
So, what happens is the tool creates the IAM role for you with the trust policy on GitHub's OIDC provider, sets all that up for you in your account, locks it down so that just your repo and your main branch is able to assume the role, and the role is set up just to allow deployments to App Runner and the ECR repository. And then that's it. At that point, it's out of your way. And you just git push, and a couple of minutes later, your updates are now running in App Runner for you.

Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning fast processing power, courtesy of third gen AMD EPYC processors without the IO, or hardware limitations, of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general purpose, CPU, memory, or storage optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices, and say goodbye to noisy neighbors and egregious egress forever.

Vultr delivers the power of the cloud with none of the bloat. "Screaming in the Cloud" listeners can try Vultr for free today with $150 in credit when they visit getvultr.com/screaming. That's G E T V U L T R.com/screaming. My thanks to them for sponsoring this ridiculous podcast.

Corey: Don't undersell what you've just built. This is something that—is this what I would use for a large-scale production deployment? Obviously not, but it has streamlined and made incredibly accessible things that previously have been very complex for folks to get up and running.
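The OIDC handoff Casey describes, an IAM role whose trust policy names GitHub's OIDC provider and is locked to one repository and branch, generally takes the shape below. This is a sketch of the commonly documented pattern, not the policy Quinntainers itself generates; the account ID, owner, and repository name in the usage line are placeholders.

```python
import json

# Hostname of GitHub's OIDC issuer as registered with IAM.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_trust_policy(account_id, owner, repo, branch="main"):
    """Build an IAM trust policy that only workflows on one repo branch can assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token must have been issued for AWS STS...
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # ...and must come from this exact repo and branch.
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{owner}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(github_trust_policy("123456789012", "example-owner", "example-repo"), indent=2))
```

Because the workflow exchanges a short-lived GitHub token for temporary AWS credentials, nothing long-lived has to be stored as a repository secret, which is the distinction Corey is asking about.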
One of the most disturbing themes behind some of the feedback I got was, at one point I said, “Well, have you tried running a Docker container on Lambda?” Because now it supports containers as a packaging format. And I said no because I spent a few weeks getting Lambda up and running back when it first came out and I've basically been copying and pasting what I got working ever since the way most of us do.And response is, “Oh, that explains a lot.” With the implication being that I'm just a fool. Maybe, but let's be clear, I am never the only person in the room who doesn't know how to do something; I'm just loud about what I don't know. And the failure mode of a bad user experience is that a customer feels dumb. And that's not okay because this stuff is complicated, and when a user has a bad time, it's a bug.I learned that in 2012. From Jordan Sissel the creator of LogStash. He has been an inspiration to me for the last ten years. And that's something I try to live by that if a user has a bad time, something needs to get fixed. Maybe it's the tool itself, maybe it's the documentation, maybe it's the way that GitHub repo's readme is structured in a way that just makes it accessible.Because I am not a trailblazer in most things, nor do I intend to be. I'm not the world's best engineer by a landslide. Just look at my code and you'd argue the fact that I'm an engineer at all. But if it's bad and it works, how bad is it? Is sort of the other side of it.So, my problem is that there needs to be a couple of things. Ignore for a second the aspect of making it the right answer to get something out of the door. The fact that I want to take this container and just run it, and you and I both reach for App Runner as the default AWS service that does this because I've been swimming in the AWS waters a while and you're a frickin AWS Container Hero, where it is expected that you know what most of these things do. 
For someone who shows up on the containers webpage—which by the way lists, I believe, 15 ways to run containers on mobile and 19 ways to run containers on non-mobile, which is just fascinating in its own right—it's overwhelming, it's confusing, and it's not something that makes it abundantly clear what the golden path is. First, get it up and working, get it running, then you can add nuance and flavor and the rest, and I think that's something that's gotten overlooked in our mad rush to pretend that we're all Google engineers, circa 2012.

Casey: Mmm. I think people get stressed out when they try to run containers in AWS because they think, “What is that golden path?” You said golden path. And my advice to people is there is no golden path. And the great thing about AWS is they do continue to invest in the solutions they come up with. I'm still bitter about Google Reader.

Corey: As am I.

Casey: Yeah. I spent so much time getting my perfect set of RSS feeds and then I had to find somewhere else to—with AWS, the different offerings that are available for running containers, those are there intentionally, it's not by accident. They're there to solve specific problems, so the trick is finding what works best for you and don't feel like one is better than the others or is going to get more attention than the others. And they each have different use cases.

And I approach it this way. I've seen a couple of different people do some great flowcharts—I think Forrest did one, Vlad did one—on ways to make the decision on how to run your containers. And I break it down to three questions. I ask people first of all, where are you going to run these workloads?
If someone says, “It has to be in the data center,” okay, cool, then ECS Anywhere or EKS Anywhere and we'll figure out if Kubernetes is needed.

If they have specific requirements, so if they say, “No, we can run in the cloud, but we need privileged mode for containers,” or, “We need EBS volumes,” or, “We want really small container sizes,” like, less than a quarter-vCPU or less than half a gig of RAM—or if you have custom log requirements, Fargate is not going to work for you, so you're going to run on EC2. Otherwise, run it on Fargate. But that's the first question. Figure out where are you going to run your containers. That leads to the second question: What's your control plane?

But those are different, sort of related but different questions. And I only see six options there. That's App Runner for your control plane, Lightsail for your control plane, ROSA if you're invested in OpenShift already, EKS either if you have momentum in Kubernetes or you have a bunch of engineers that have a bunch of experience with Kubernetes—if you don't have either, don't choose it—or ECS. The last option is Elastic Beanstalk, but let's leave that as a—if you're not currently invested in Elastic Beanstalk don't start today. But I look at those as okay, so I—first question, where am I going to run my containers? Second question, what do I want to use for my control plane? And there's different pros and cons of each of those.

And then the third question, how do I want to manage them? What tools do I want to use for managing deployment? All those other tools like Copilot or App2Container or Proton, those aren't my control plane; those aren't where I run my containers; that's how I manage, deploy, and orchestrate all the different containers. So, I look at it as those three questions. But I don't know, what do you think of that, Corey?

Corey: I think you're onto something. I think that is a terrific way of exploring that question.
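Casey's first question, where the containers run, reads naturally as a small decision function. The sketch below encodes only the rules as he states them in this conversation; the field names are hypothetical, and actual service limits may have changed since the recording.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a container workload's requirements."""
    must_run_in_data_center: bool = False
    needs_privileged_mode: bool = False
    needs_ebs_volumes: bool = False
    custom_log_requirements: bool = False
    vcpu: float = 0.25
    memory_gb: float = 0.5


def where_to_run(w: Workload) -> str:
    """Question one from the conversation: where will the containers run?"""
    if w.must_run_in_data_center:
        return "ECS Anywhere or EKS Anywhere"
    # Per the discussion, any of these requirements rules out Fargate.
    rules_out_fargate = (
        w.needs_privileged_mode
        or w.needs_ebs_volumes
        or w.custom_log_requirements
        or w.vcpu < 0.25
        or w.memory_gb < 0.5
    )
    return "EC2" if rules_out_fargate else "Fargate"


# Question two: the six control-plane options listed in the conversation.
CONTROL_PLANES = ["App Runner", "Lightsail", "ROSA", "EKS", "ECS", "Elastic Beanstalk"]

if __name__ == "__main__":
    print(where_to_run(Workload()))
    print(where_to_run(Workload(needs_privileged_mode=True)))
```

The second and third questions (control plane, then management tooling) depend on team experience and existing investment rather than hard requirements, so they are left as a plain list here instead of being forced into code.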
I would argue that setting up a framework like that, or one very similar, is what the AWS containers page should be, just coming from the perspective of what is the neophyte customer experience. On some level, you almost need a slider to choose your level of experience, ranging from, “What's a container?” to, “I named my kid Kubernetes because I make terrible life decisions,” and anywhere in between.
Casey: Sure. Yeah, well, and I think that really dictates the control plane level. So, for example, Lightsail, where does Lightsail fit? To me, the value of Lightsail is the simplicity. I'm looking at a monthly pricing: Seven bucks a month for a container.
I don't know how [unintelligible 00:30:23] works, but I can think in terms of monthly pricing. And it's tailored towards a console user, someone who just wants to click in, point to an image. That's a very specific user; there's thousands of customers that are very happy with that experience, and they use it. App Runner presents that scale to zero. That's one of the big selling points I see with App Runner. Likewise, with Google Cloud Run. I've got that scale to zero. I can't do that with ECS, or EKS, or any of the other platforms. So, if you've got something that has a ton of idle time, I'd really be looking at those. I would argue that, I think I did the math, Google Cloud Run is about 30% more expensive than App Runner.
Corey: Yeah, if you disregard the free tier, I think that to have it running persistently at all times throughout the month, to drop out cold starts, would cost something like 40-some-odd bucks a month or something like that. Don't quote me on it. Again, and to be clear, I wound up doing this very congratulatory and complimentary tweet about them on, I think it was Thursday, and then they immediately apparently took one look at this and said, “Holy shit. Corey's saying nice things about us. What do we do? What do we do?” Panic.
And the next morning, they raised prices on a bunch of cloud offerings. 
Whew, that'll fix it. Like...
Casey: [laugh].
Corey: Di-, did you miss the direction you're going on here? No, that's the exact opposite of what you should be doing. But here we are. Interestingly enough, to tie our two conversation threads together, when I look at an AWS bill, unless you're using Fargate, I can't tell whether you're using Kubernetes or not, because EKS is a small charge in almost every case for the control plane, or Fargate under it.
Everything else just manifests as EC2 spend. From the perspective of the cloud provider, if you're running a Kubernetes cluster, it is a single-tenant application that can have some very funky behaviors, like cross-AZ chatter back and forth, because there's no internal mechanism to say talk to the free thing rather than the two-cents-a-gigabyte thing. It winds up spinning up and down in a bunch of different ways, and the behavior patterns, because of how placement works, are not necessarily deterministic, depending upon workload. And that becomes something that people find odd when, “Okay, we look at our bill for a week, what can you say?”
“Well, first question. Are you running Kubernetes at all?” And they're like, “Who invited these clowns?” Understand, we're not prying into your workloads, for a variety of excellent legal and contractual reasons, here. We are looking at how they behave, and for specific workloads, once we have a conversation with the engineering team, yeah, we're going to dive in, but it is not at all intuitive from the outside to make any determination whether you're running containers, or whether you're running VMs that you just haven't done anything with in 20 years, or what exactly is going on. And that's just an artifact of the billing system.
Casey: We ran into this challenge at Gaggle. We don't use EKS, we use ECS, but we have some shared clusters, lots of EC2 spend, and it's hard to figure out which team is creating the services that are running that up. 
We actually ended up creating a tool (we open-sourced it), ECS Chargeback, and what it does is it looks at the CPU and memory reservations for each task definition, and then prorates the overall charge of the ECS cluster, and then creates metrics in Datadog to give us a breakdown of cost per ECS service. And it also measures what we like to refer to as waste, right? Because if you're reserving four gigs of memory, but your utilization never goes over two gigs, we're paying for that reservation, but you're underutilizing.
So, we're able to also show which services have the highest degree of waste, not just utilization, so it helps us go after it. But this is a hard problem. I'd be curious, how do you approach these shared ECS resources and slicing and dicing those bills?
Corey: Everyone has a different approach, too. There is no unified, correct answer. A previous show guest, Peter Hamilton, over at Remind had done something very similar, open-sourced a bunch of these things. Understanding what your spend is, is important on this, and it comes down to getting at the actual business concern, because in some cases, effectively dead reckoning is enough. You take a look at the cluster that is really hard to attribute because it's a shared service. Great. It is 5% of your bill.
First pass, why don't we just agree that it is a third for Service A, two-thirds for Service B, and we'll call it mostly good at that point? That can be enough in a lot of cases. With scale [laugh] you're just sort of hand-waving over many millions of dollars a year there. How about we get into some more depth? And then you start instrumenting and reporting to something, be it CloudWatch, be it Datadog, be it something else, and understanding what the use case is.
In some cases, customers have broken apart shared clusters for that specific reason. I don't think that's necessarily the best approach from an engineering perspective, but again, this is not purely an engineering decision. 
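The proration Casey describes (split a shared cluster's cost by each service's CPU and memory reservations, then flag the reserved-but-unused portion as waste) can be sketched roughly as below. This is not the ECS Chargeback tool's actual code: the `ServiceUsage` type, the `chargeback` function, the 50/50 CPU-to-memory weighting, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceUsage:
    """Hypothetical per-service figures; units only need to be consistent."""
    name: str
    cpu_reserved: float
    cpu_used: float
    mem_reserved: float
    mem_used: float


def chargeback(cluster_cost, services, cpu_weight=0.5):
    """Prorate cluster_cost by reservations and estimate waste per service.

    cpu_weight is an assumed split of the cluster cost between CPU and memory.
    """
    total_cpu = sum(s.cpu_reserved for s in services)
    total_mem = sum(s.mem_reserved for s in services)
    if total_cpu <= 0 or total_mem <= 0:
        raise ValueError("no CPU or memory reservations to prorate against")

    report = {}
    for s in services:
        cpu_cost = cluster_cost * cpu_weight * s.cpu_reserved / total_cpu
        mem_cost = cluster_cost * (1 - cpu_weight) * s.mem_reserved / total_mem
        # Fraction of each reservation that is never used (capped at 100% use).
        cpu_idle = 1 - min(s.cpu_used / s.cpu_reserved, 1.0) if s.cpu_reserved else 0.0
        mem_idle = 1 - min(s.mem_used / s.mem_reserved, 1.0) if s.mem_reserved else 0.0
        report[s.name] = {
            "cost": cpu_cost + mem_cost,
            "waste": cpu_cost * cpu_idle + mem_cost * mem_idle,
        }
    return report


if __name__ == "__main__":
    services = [
        # Reserves 4 GiB of memory but peaks at 2 GiB, as in Casey's example.
        ServiceUsage("api", cpu_reserved=1024, cpu_used=512, mem_reserved=4096, mem_used=2048),
        ServiceUsage("worker", cpu_reserved=1024, cpu_used=1024, mem_reserved=4096, mem_used=4096),
    ]
    for name, row in chargeback(1000.0, services).items():
        print(name, row)
```

Corey's "dead reckoning" alternative is the degenerate case of the same idea: skip the measurements and agree on fixed shares (a third and two-thirds, say) until the spend is large enough to justify instrumenting.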
It comes down to serving the business need. And if you're taking up partial credits on that cluster, for a tax credit for R&D for example, you want that position to be extraordinarily defensible, and spending a few extra dollars to ensure that it is the right business decision. I mean, again, we're pure advisory; we advise customers on what we would do in their position, but people often mistake that to be we're going to go for the lowest possible price—bad idea, or that we're going to wind up doing this from a purely engineering-centric point of view.It's, be aware of that in almost every case, with some very notable weird exceptions, the AWS Bill costs significantly less than the payroll expense that you have of people working on the AWS environment in various ways. People are more expensive, so the idea of, well, you can save a whole bunch of engineering effort by spending a bit more on your cloud, yeah, let's go ahead and do that.Casey: Yeah, good point.Corey: The real mark of someone who's senior enough is their answer to almost any question is, “It depends.” And I feel I've fallen into that trap as well. Much as I'd love to sit here and say, “Oh, it's really simple. You do X, Y, and Z.” Yeah… honestly, my answer, the simple answer, is I think that we orchestrate a cyber-bullying campaign against AWS through the AWS wishlist hashtag, we get people to harass their account managers with repeated requests for, “Hey, could you go ahead and [dip 00:36:19] that thing in—they give that a plus-one for me, whatever internal system you're using?”Just because this is a problem we're seeing more and more. Given that it's an unbounded growth problem, we're going to see it more and more for the foreseeable future. So, I wish I had a better answer for you, but yeah, that's stuff's super hard is honest, but it's also not the most useful answer for most of us.Casey: I'd love feedback from anyone from you or your team on that tool that we created. 
I can share link after the fact. ECS Chargeback is what we call it.Corey: Excellent. I will follow up with you separately on that. That is always worth diving into. I'm curious to see new and exciting approaches to this. Just be aware that we have an obnoxious talent sometimes for seeing these things and, “Well, what about”—and asking about some weird corner edge case that either invalidates the entire thing, or you're like, “Who on earth would ever have a problem like that?” And the answer is always, “The next customer.”Casey: Yeah.Corey: For a bounded problem space of the AWS bill. Every time I think I've seen it all, I just have to talk to one more customer.Casey: Mmm. Cool.Corey: In fact, the way that we approached your teardown in the restaurant is how we launched our first pass approach. Because there's value in something like that is different than the value of a six to eight-week-long, deep-dive engagement to every nook and cranny. And—Casey: Yeah, for sure. It was valuable to us.Corey: Yeah, having someone come in to just spend a day with your team, diving into it up one side and down the other, it seems like a weird thing, like, “How much good could you possibly do in a day?” And the answer in some cases is—we had a Honeycomb saying that in a couple of days of something like this, we wound up blowing 10% off their entire operating budget for the company, it led to an increased valuation, Liz Fong-Jones says that—on multiple occasions—that the company would not be what it was without our efforts on their bill, which is just incredibly gratifying to hear. It's easy to get lost in the idea of well, it's the AWS bill. It's just making big companies spend a little bit less to another big company. And that's not exactly, you know, saving the lives of K through 12 students here.Casey: It's opening up opportunities.Corey: Yeah. It's about optimizing for the win for everyone. 
Because now AWS gets a lot more money from Honeycomb than they would if Honeycomb had not continued on their trajectory. It's, you can charge customers a lot right now, or you can charge them a little bit over time and grow with them in a partnership context. I've always opted for the second model rather than the first.Casey: Right on.Corey: But here we are. I want to thank you for taking so much time out of well, several days now to argue with me on Twitter, which is always appreciated, particularly when it's, you know, constructive—thanks for that—Casey: Yeah.Corey: For helping me get my business partner to re:Invent, although then he got me that horrible puzzle of 1000 pieces for the Cloud-Native Computing Foundation landscape and now I don't ever want to see him again—so you know, that happens—and of course, spending the time to write Quinntainers, which is going to be at snark.cloud/quinntainers as soon as we're done with this recording. Then I'm going to kick the tires and send some pull requests.Casey: Right on. Yeah, thanks for having me. I appreciate you starting the conversation. I would just conclude with I think that yes, there are a lot of ways to run containers in AWS; don't let it stress you out. They're there for intention, they're there by design. Understand them.I would also encourage people to go a little deeper, especially if you got a significantly large workload. You got to get your hands dirty. As a matter of fact, there's a hands-on lab that a company called Liatrio does. They call it their Night Lab; it's a one-day free, hands-on, you run legacy monolithic job applications on Kubernetes, gives you first-hand experience on how to—gets all the way up into observability and doing things like Canary deployments. It's a great, great lab.But you got to do something like that to really get your hands dirty and understand how these things work. So, don't sweat it; there's not one right way. 
There's a way that will probably work best for each user, and just take the time and understand the ways to make sure you're applying the one that's going to give you the most runway for your workload.Corey: I will definitely dig into that myself. But I think you're right, I think you have nailed a point that is, again, a nuanced one and challenging to put in a rage tweet. But the services don't exist in a vacuum. They're not there because, despite the joke, someone wants to get promoted. It's because there are customer needs that are going on that, and this is another way of meeting those needs.I think there could be better guidance, but I also understand that there are a lot of nuanced perspectives here and that… hell is someone else's workflow—Casey: [laugh].Corey: —and there's always value in broadening your perspective a bit on those things. If people want to learn more about you and how you see the world, where's the best place to find you?Casey: Probably on Twitter: twitter.com/nektos, N-E-K-T-O-S.Corey: That might be the first time Twitter has been described as a best place for anything. But—Casey: [laugh].Corey: Thank you once again, for your time. It is always appreciated.Casey: Thanks, Corey.Corey: Casey Lee, CTO at Gaggle and AWS Container Hero. And apparently writing code in anger to invalidate my points, which is always appreciated. Please do more of that, folks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, or the YouTube comments, which is always a great place to go reading, whereas if you've hated this podcast, please leave a five-star review in the usual places and an angry comment telling me that I'm completely wrong, and then launching your own open-source tool to point out exactly what I've gotten wrong this time.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
Unedited live recording of this show on YouTube (Ep 142). Includes demos.
Topics: Honeycomb.io, Free eBook Observability Engineering, OpenTelemetry, Pixie, 12Factor, Sqlcommenter
Guest Liz Fong-Jones: Liz on Twitter, Liz on Twitch
Join my Community: Best coupons for my Docker and Kubernetes courses; Follow me on Patreon and support this show!; Chat with us on our Discord Server, Vital DevOps; Homepage bretfisher.com
★ Support this podcast on Patreon ★
Dave, Mark and Mike discuss their best bits from the DevOps Enterprise Summit 2021 with IT Revolution. They also review their talk on the Serverless Value Flywheel. The DOES 2021 talks are practical with participants describing their digital transformation journeys in an open way, revealing the lessons they have learned. Here are quick links to themes, tools and techniques that the guys covered in their talk on the Value Flywheel and picked up in other talks and discussions: https://learnwardleymapping.com https://doctrine.wardleymaps.com https://medium.com/wardleymaps https://list.wardleymaps.com https://github.com/ddd-crew/ddd-starter-modelling-process… https://amplitude.com/north-star Our Talk: 'The Serverless Edge, Using Wardley Mapping with the Value Flywheel from combined Business and Technology Evolution' Transcript: https://www.theserverlessedge.com/wardley-mapping-with-the-value-flywheel/ Slides: (including ours under Dave Anderson): https://github.com/devopsenterprise/2021-virtual-us Slack: https://devopsenterprise.slack.com/ New Book announcement https://itrevolution.com/announcing-new-book-from-david-anderson/ The conference was virtual but there were still many opportunities to interact through the Slack Channel and Lean Coffees with leaders and architects from leading edge organisations. Dave and Mark also give a behind the scenes view of their talk: The Serverless Edge, Using Wardley Mapping with the value Flywheel from combined Business and Technology Evolution which previewed their book due to publish next year with IT Revolution: https://itrevolution.com/announcing-new-book-from-david-anderson/ In their talk they explained what a 'value flywheel' is and they did a live demo of a Wardley Map to show the technique in action. There are 4 stages of the 'value flywheel': 1. Clarity of Purpose 2. Challenge 3. Next Best Action 4. 
Long Term Value The transcript for the talk is available on: https://www.theserverlessedge.com/wardley-mapping-with-the-value-flywheel/ Dave and Mark hosted a Lean Coffee on 'Building internal capability, not consultancy dependency', which prompted a good debate. Mike also picked up on themes around value streams and flow efficiency with Mik Kersten from Tasktop - tasktop.com, and similarly with Bank of New Zealand - bnz.co.nz. The guys felt that Wardley Mapping is very complementary to those themes and is rising in popularity. Mark picked up on an increasing evolution towards product centricity, meaningful work and socio-technical practice. Mike attended a session looking at ubiquitous language, tying in with the need for visibility/observability to make decisions in DevOps organisations, which Mark also picked up on in the talk on 'shifting left with product excellence' with Liz Fong-Jones, where using metrics in the right way was key. Mike also picked up a lot of SRE-themed talks, including one by Michael Winslow from Comcast. Courtney Nash from Verica and Troy Koss from Capital One did a 'Chaos and Reliability: A Surprising Friendship in the Enterprise' talk that Mark enjoyed. And Dave picked up on a talk by Dr. Ron Westrum. Mike felt that Team Topologies concepts were permeating the talks and thinking at the conference. Dave enjoyed the talk with Admiral John Richardson and the ultimate distributed organisation, leadership and autonomous themes. Security was covered by John Wallace, looking at how that area has changed. The guys picked up on the lack of serverless talks, although that could be about the organisational journey. Also, the conference wasn't really about specific technologies. Tune in next time for more conversation on all things Serverless Craic. Serverless Craic from The Serverless Edge theserverlessedge.com @ServerlessEdge
In episode 47 of o11ycast, Liz Fong-Jones speaks with Gibbs Cullen of Chronosphere. Together they compare the roles of product managers and developer advocates, and discuss the phases of outcome-based observability.
About Tim
Tim's tech career spans over 20 years through various sectors. Tim's initial journey into tech started as a US Marine. Later, he left government contracting for the private sector, working both in large corporate environments and in small startups. While working in the private sector, he honed his skills in systems administration and operations for large Unix-based datastores.
Today, Tim leverages his years in operations, DevOps, and Site Reliability Engineering to advise and consult with clients in his current role. Tim is also a father of five children, as well as a competitive Brazilian Jiu-Jitsu practitioner. Currently, he is the reigning American National and 3-time Pan American Brazilian Jiu-Jitsu champion in his division.
Links: Twitter: https://twitter.com/elchefe The Duckbill Group: https://duckbillgroup.com
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate: is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards, while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. 
Try it for free at Honeycomb.io/screaminginthecloud. Observability, it's more than just hipster monitoring.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Periodically, I have a whole bunch of guests come on a second time. Now, it's easy to take the naive approach of assuming that it's because it's easier for me to find a guest if I know them and don't have to reach out to brand new people all the time. This is absolutely correct; I'm exceedingly lazy. But I don't have too many folks on a third time, and that changes today.
My guest is Tim Banks. I've had him on the show twice before; both times it led to really interesting conversations around a wide variety of things. Since those episodes, Tim has taken the job as a principal cloud economist here at The Duckbill Group. Yes, that is probably the strangest interview process you can imagine, but here we are. Tim, thank you so much for joining me both on the show and in the business.
Tim: My pleasure, Corey. It was definitely an interesting interview process, you know, but I was glad to be here. So, I'm happy to be here a third time. I don't know if you get a jacket like you do in Saturday Night Live, if you host, like, a fifth time, but we'll see. Maybe it's a vest. A cool vest would be nice.
Corey: We can come up with something. Effectively, it can be like reverse hangman where you wind up getting a vest and every time you come on after that you get a sleeve, then you get a second sleeve, and then you get a collar, and we can do all kinds of neat stuff.
Tim: I actually like that idea a lot.
Corey: So, I'm super excited to be able to have this conversation with you because I don't normally talk a lot on this show about what cloud economics is because my guest usually is not as deep into the space as I am, and that's fine; people should never be as deep into this space as I am, in the general sense, unless they work here. Awesome. 
But I do guest on other shows, and people ask me all kinds of questions about AWS billing and cloud economics, and that's fine, it's great, but they don't ask the questions about the space in the same way that I would and the way that I think about it. So, it's hard for me to interview myself. Now, I'm not saying I won't try it someday, but it's challenging. But today, I get to take the easy path out and talk to you about it. So Tim, what the hell is a principal cloud economist?Tim: So, a principal cloud economist, is a cloud computing expert, both in architecture and practice, who looks at cloud cost in the same way that a lot of folks look at cloud security, or cloud resilience, or cloud performance. So, the same engineering concerns you have about making sure that your API stays up all the time, or to make sure that you don't have people that are able to escape containers or to make sure that you can have super, super low response times, is the same engineering fundamentals that I look at when I'm trying to find a way to reduce your AWS bill.Corey: Okay. When we say cloud cost and cloud economics, the natural picture that leads to mind is, “Oh, I get it. You're an Excel jockey.” And sometimes, yeah, we all kind of play those roles, but what you're talking about is something else entirely. You're talking about engineering expertise.And sure enough, if you look at the job postings we have for roles on the team from time to time, we have not yet hired anyone who does not have an engineering and architecture background. That seems odd to folks who do not spend a lot of time thinking about the AWS bill. I'm told those people are what is known as ‘happy.' But here we are. Why do we care about the engineering aspect of any of this?Tim: Well, I think first and foremost because what we're doing in essence, is still engineering. 
People aren't putting construction paper up on [laugh] AWS; sometimes they do put recipes up on there, but it still involves working on a computer, and writing code, and deploying it somewhere. So, to have that basic understanding of what it is that folks are doing on the platform, you have to have some engineering experience, first and foremost. Secondly, the fact of the matter is that most cost optimization, in my opinion, can be done on the whiteboard, before anything else, and really I think should be done on the whiteboard before anything else. And so the Excel aspect of it is always reactive. “We have now spent this much. How much was it? Where did it go?” And now we have to figure out where it went.I like to figure out and get a ballpark on how much something is going to cost before I write the first line of code. I want to know, hey, we have a tier here, we're using this kind of storage, it's going to take this kind of instance types. Okay, well, I've got an idea of how much it's going to cost. And I was like, “You know, that's going to be expensive. Before we do anything, is there a way that we can reduce costs there?”And so I'm reverse engineering that on already deployed workloads. Or when customers want to say, “Hey, we were thinking about doing this, and this is our proposed architecture,” I'm going to look at it and say, “Well, if you do this and this and this and this, you can save money.”Corey: So, it sounds like you and I have a bit of a philosophical disagreement in some ways. One of my recurring talking points has always been that, “Oh, by and large, application developers don't need to think overly much about cloud cost. What they need to know generally fits on an index card.” It's, okay, big things cost more than small things; if you turn something on, it will never get turned off and will bill you in perpetuity; data transfer has some weird stuff; and if you store data, you pay for data, like, that level of baseline understanding. 
When I'm trying to build something out my immediate thought is, great, is this thing possible?Because A, I don't always know that it is, and B, I'm super bad at computers so for me, it may absolutely not be, whereas you're talking about baking cost assessments into the architecture as a day one type of approach, even when sketching ideas out on the whiteboard. I'm curious as to how we diverge there. Can you talk more about your philosophy?Tim: Sure. And the reason I do that is because, as most folks that have an engineering background in cloud infrastructure will tell you, you want to build resilience in, on the whiteboard. You certainly want to build performance in, on the whiteboard, right? And security folks will tell you you want to do security on the whiteboard. Because those things are hard to fix after they're deployed.As soon as they're deployed, without that, you now have technical debt. If you don't consider cost optimization and cost efficiency on the whiteboard, and then you try and do it after it's deployed, you not only have technical debt, you may have actual real debt.Corey: One of the comments I tend to give a lot is that architecture and cost are the same thing in the world of cloud. And I think that we might be in violent agreement, as Liz Fong-Jones is fond of framing it, where I am acutely aware of aspects of cost and that does factor into how I build things on the whiteboard—let's also be very clear, most of the things that I build are very small scale; the largest cost by a landslide is the time I spend building it—in practice, that's an awful lot of environments; people are always more expensive than the AWS environment they're working on. But instead, it's about baking in the assumptions and making sure you're not coming up with something that is going to just be wasteful and horrible out of the gate, and I guess part of that also is the fact that I am at a level of billing understanding that I sort of absorbed these concepts intrinsically. 
Because to me, there is no difference between cost and architecture in an environment like this. You're right, there's always an inherent trade-off between cost and durability. On the one hand, I don't like that. On the other, it feels like it's been true forever and I don't see a way out of it.
Tim: It is inescapable. And it's interesting because you talk about the level of an application developer or something like that, like what is your level of concern, but retroactively, we'll go in for cost optimization houses (and I've done this as far back as when I was working at AWS as a TAM) and I'll ask the question to an application developer or database administrator, and I'm like, “Why do you do this? Why do you have a string value for something that could be a Boolean?” And you'll ask, “Well, what difference does that make?” Well, it makes a big difference when you're talking about cycles for CPU.
If you can reduce your CPU consumption on a database instance by changing a string to a Boolean, you need fewer instances, or you need a less powerful instance, or you need less memory. And now you can run a less expensive instance for your database architecture. Well, maybe for one node it's not that big a difference, but if you're talking about something that's multi-AZ and multi-node, I mean, that can be a significant amount of savings just by making one simple change.
Corey: And that might be the difference right there. I didn't realize that, offhand. It makes sense if you think about it, but I'm just realizing that I've made that mistake on one of my DynamoDB tables. It costs something like seven cents a month right now, so it's not something I'm rushing to optimize, but you're right, expand that out by a factor of a million or so, and we're talking serious money, and then that sort of optimization makes an awful lot of sense. 
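To make the string-versus-Boolean point concrete for a table like the DynamoDB one Corey mentions, here is a rough size comparison. It covers storage only, not the CPU cycles Tim is talking about, and it assumes a simplified sizing rule (attribute name bytes plus value bytes, with a string costing its UTF-8 length and a Boolean costing one byte); the attribute name `is_active` and the item counts are hypothetical.

```python
def attribute_size(name, value):
    """Approximate stored size in bytes of one attribute (simplified rule)."""
    name_bytes = len(name.encode("utf-8"))
    # bool must be checked explicitly; only str and bool are handled here.
    if isinstance(value, bool):
        return name_bytes + 1
    if isinstance(value, str):
        return name_bytes + len(value.encode("utf-8"))
    raise TypeError("this sketch only handles str and bool values")


def bytes_saved(name, string_value, item_count):
    """Bytes saved across item_count items by storing a Boolean instead of a string."""
    per_item = attribute_size(name, string_value) - attribute_size(name, True)
    return per_item * item_count


if __name__ == "__main__":
    for count in (1_000, 1_000_000_000):
        saved = bytes_saved("is_active", "false", count)
        print(f"{count:>13,} items: about {saved:,} bytes saved")
```

At a thousand items the difference is noise, which matches Corey's seven-cents-a-month table; at a billion items the same one-field change is measured in gigabytes, before counting any effect on instance size or throughput.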
I think that my position on it is that when you're building out something small scale as a demo or a proof of concept, spending time on optimizations like this is not the best use of anyone's time or brain sweat, for lack of a better term. How do you wind up deciding when it's time to focus on stuff like that?Tim: Well, first, I will say that—I daresay that somewhere in the 80% of production workloads are just—were the POC, [laugh] right? Because, like, “It worked for this to get funding, let's run it,” right?Corey: Let they who does not have a DynamoDB table in production with the word ‘test' or ‘dev' in it cast the first stone.Tim: It's certainly not me. So, I understand how some of those decisions get made. And that's why I think it's better to think about it early. Because as I mentioned before, when you start something and say, “Hey, this works for now,” and you don't give consideration to that in the future, or consideration for what it's going to be like in the future, and when you start doing it, you'll paint yourself into corners. That's how you get something like static values put in somewhere, or that's how you get something like, well, “We have to run this instance type because we didn't build in the ability to be more microservice-based or stateless or anything like that.”You've seen people that say, “Hey, we could save you a lot of money if you can move this thing off to a different tier.” And it's like, “Well, that would be an extensive rewrite of code; that'd be very expensive.” I daresay that's the main reason why most AS/400s are still being used right now is because it's too expensive to rewrite the code.Corey: Yeah, and there's no AWS/400 that they can migrate to. Yet. Re:Invent is nigh.Tim: So, I think that's why, even at the very beginning, even if you were saying, “Well, this is something we will do later.” Don't make it impossible for you to do later in your code. Don't make it impossible for you to do later in your architecture. 
Make things as modular as possible, so that way you can say, “Hey”—later on down the road—“Oh, we can switch this instance type.” Or, “Here's a new managed service that we can maybe save money on doing this.”
And you allow yourself to switch things out, or turn different knobs, or change the way you do things, and give yourself more options in the future, whether those options are for resilience, or those options are for security, or those options are for performance, or they're for cost optimizations. If you make binding decisions earlier on, you're going to have debt that's going to build up at some point in the future, and then you're going to have to pay the piper. Sometimes that piper is going to be AWS.
Corey: One thing that I think gets lost in a lot of conversations about cloud economics—because I know that it happened to me when I first started this place—where I am planning to basically go out and be the world's leading expert in AWS cost analysis and understanding and optimization. Great. Then I went out into the world and started doing some of my first engagements, and they looked a lot less like far-future cost attribution projections and a lot more like, “What's a reserved instance?” And, “We haven't bought any of those in 18 months.” And, “Oh, yeah, we shut down an entire project six months ago. We should probably delete all the resources, huh?”
The stuff that I was preparing for at the high end of the maturity curve is great and useful and terrific to have conversations about in some very nuanced depth, but very often there's a walk-before-you-can-run style of conversation where, okay, let's do the easy stuff first before we start writing a whole bunch of bespoke internal stuff that maps your business needs to the AWS bill.
How do you, I guess, reconcile those things where on the one hand, you see the easy stuff and on the other, you see some of the just absolutely challenging, very hard, five-years-of-engineering-effort-style problems?
Tim: Well, it's interesting because I've seen one customer very recently who has brilliant analyses as to their cost; just well-charted, well-tagged, well-documented, well—you know, everything is diagrammed quite nicely and everything like that, and they're very, very aware of their costs, but they leave test instances running all weekend, you know, and their associated volumes and things like that. And that's a very easy thing to fix. That is a very, very low-hanging fruit. And so sometimes, you just have to look at where they're spending their efforts where sometimes they do spend so much time chasing those hard to do things because they are hard to do and they're exciting in an engineering aspect, and then something as simple as, “Hey, how about we delete these old volumes?” It just isn't there.
Or, “How about we switch your S3 bucket storage type?” Those are easy, low-hanging fruits, and you would be surprised how sometimes they just don't get that. But at the same time, sometimes customers have, like, “Hey, we could knock this thing out, we knock this thing out,” because it's Trusted Advisor. Every AI cost optimization recommendation you can get will tell you these five things to do, no matter who you are or where you are, but they don't do the conceptual things like understanding some of the principles behind cost optimization and cost optimization architecture, and proactive cost optimization versus reactive cost optimization. So, you're doing very conceptual education and conversations with folks rather than the, “Do these five things.” And I've not often found a customer that you have to do both on; it's usually one or the other.
Corey: It's funny that you made that specific reference to that example.
One of my very first projects—not naming names. Generally, when it comes to things like this, you can tell stories or you can name names; I bias for stories—I was talking to a company who was convinced that their developer environments were incredibly overwrought, expensive, et cetera, and burning money. Okay, great. So, I talked about the idea of turning those things off at night or between test runs, deleting volumes to snapshot, and restore them on a schedule when people come in in the morning because all your developers sit in the same building in the same time zones. Great. They were super on board with the idea, and it was going to be a little bit of work, but all right, this was in the days before the EC2 Instance Scheduler, for example.
But first, let's go ahead and do some analysis. This is one of those early engagements that really reinforced my idea of, yeah, before we start going too far down the rabbit hole, let's double-check what's going on in the account. Because periodically you encounter things that surprise people. Like, “What's up with those Australia instances?” “Oh, we don't have anything in that region.” “I believe you're being sincere when you say this, however, the API generally doesn't tell lies.”
So, that becomes a, oh, security incident time. But looking at this, they were right; they had some fairly sizable developer instances that were running all the time, but doing some analysis, their developer environment was 3% of their bill at the time and they hadn't bought RIs in a year-and-a-half. And looking at what they were doing, there was so much easier stuff that they could do to generate significant savings without running the potential of turning a developer environment off at night in the middle of an incident or something like that. Given the risk factor and effort, it was easier to just do the easy stuff, then do another pass and look at the deep stuff.
And to be clear, they weren't lying to me; they weren't wrong.
Back when they started building this stuff out, their developer environments were significantly large and were a significant portion of their spend. And then they hit product-market fit, and suddenly their production environment had to scale significantly in a short period of time. Which, yay, cloud. It's good at that. Then it just became such a small portion that developer environments weren't really a thing. But the narrative internally doesn't get updated very often because once people learn something, they don't go back to relearn whether or not it's still true. It's a constant mistake; I make it myself frequently.
Tim: I think it's interesting, there are things that we really need to put into buckets as far as what's an engineering effort and what's an administrative effort. And when I say ‘administrative effort,' I mean if I can save money with a stroke of a pen, well, that's going to be pretty easy, and that's usually going to be RIs; that's going to be EDPs, or PPAs or something like that, that don't require engineering effort. It just requires administrative effort, I think RIs being the simplest ones. Like, “Oh, all I have to do is go in here and click these things four times and I'm going to save money?” “Well, let's do that.”
And it's surprising how often people don't do that. But you still have to understand that, and whether it's RIs or whether it's a savings plan, it's still a commitment of some kind, but if you are willing to make that commitment, you can save money with no engineering effort whatsoever. That's almost free money.
Corey: So, much of what we do here comes down to psychology, in many ways, more than it does math. And a lot of times you're right, everything you say is right, but in a large-scale environment, go ahead and click that button to buy the savings plan or the reserved instance, and that's a $20 million purchase.
And companies will stall for months trying to run a different series of analyses on this and what if this happens, what if that happens, and I get it because, “Yeah, I'm going to click this button that's going to cost more money than I'll make in my lifetime,” that's a scary thing to do; I get it. But you're going to spend the money, one way or the other, with the provider, and if you believe that number is too high, I get it; I am right there with you. Buy half of them right now and then you can talk about the rest until you get to a point of being comfortable with it.
Do it incrementally; it's not all or nothing, you have one shot to make the buy. Take pieces out of it that make sense. You know you're probably not going to turn off your database cluster that handles all of production in the next year, so go ahead and go for it; it saves some money. Do the thing that makes sense. And that doesn't require deep-dive analytics; that requires, on some level, someone who's seen a lot of these before who gets what customers are going through. And honestly, it's empathy in many respects, becomes one of those powerful things that we can apply to our customer accounts.
Tim: Absolutely. I mean, people don't understand that decision paralysis about making those commitments costs you money. You can spend months doing analysis, but those months doing analysis, you're going to spend 30, 40, 50, 60, 70% more on your EC2 instances or other compute than you would otherwise, and that can be quite significant. But it's one of those cases where we talk about psychology around perfect being the enemy of good. You don't have to make the perfect purchase of RIs or savings plans and have that so tuned perfectly that you're going to get one hundred percent utilization and zero—like, you don't have to do that.
Just do something. Do a little bit. Like you said, buy half; buy anything; just something, and you're going to save money.
And then you can run analysis later on, while you're saving money [laugh] and get a little better and tune it up a little more and get more analysis on and maybe fine-tune it, but you don't actually ever need to have it down to the penny. Like, it never has to be that good.
Corey: At some point, one of the value propositions we have for our customers has always been that we tell you when to stop focusing on saving money because there's a theoretical cap of a hundred percent of the cloud bill that you can save, but you can make so much more than that by launching the right feature to the right market a little sooner; focus on that. Be responsible stewards of the money that's invested with you, but by and large, as a general piece of guidance, at some point, stop cutting and go back to doing the thing that makes your company work. It's not all about saving money at all costs for almost all of us. It is for us, but we're sort of a special case.
Tim: Well, it's a conversation I often have. It's like, all right, are you trying to save money on AWS or are you trying to save money overall? So, if you're going to spend $400,000 worth of engineering effort to save $10,000 on your AWS bill, that doesn't make no sense. So—[laugh]—
Corey: Right. There has to be a strategic reason to do things like that—
Tim: Exactly.
Corey: —and make sure you understand the value of what you're getting for this. One reason that we wind up charging the way that we do—and we've gotten questions on this for a while—has been that we charge a fixed fee for what we do on engagements. And similarly—people have asked this, but haven't tied the two things together—you talk about cost optimization, but never cost-cutting. Why is that? Is that just a negative term?
And the answer has been no, they're aligned. What we do focuses on what is best for the customer. Once that fixed fee is decided upon, every single thing that we say is what we would do if we were in the customer's position.
There are times we'll look at what they have going on and say, “Ah, you really should spend more money here for resiliency, or durability,” or, “Okay, that is critical data that's not being backed up. You should consider doing that.”
It's why we don't take percentages of things because, at that point, we're not just going with the useful stuff, it's, well we're going to basically throw the entire kitchen sink at you. We had an early customer and I was talking to their AWS account manager about what we were going to be doing and their comment was, “Oh, saving money on AWS bills is great, make sure you check the EBS snapshots.” Yeah, I did that. They were spending 150 bucks a month on EBS snapshots, which is basically nothing. It's one of those stories where if, in the course of an hour-long meeting, I can pay for that entire service, by putting a quarter on the table, I'm probably not going to talk about it barring [laugh] some extenuating circumstances.
Focus on the big things, not the things that worked in a different environment with a different account and different constraints. It's hard to context switch like that, but it gets a lot easier when it is basically the entirety of what we do all day.
Tim: The difference I draw between cost optimization and cost-cutting is that cost optimization is ensuring that you're not spending money unnecessarily, or that you're maximizing your dollar. And so sometimes we get called in there, and we're just validation for the measures they've already done. Like, “Your team is doing this exactly right. You're doing the things you should be doing. We can nitpick if you want to; we're going to save you $7 a year, but who cares about that? But y'all are doing what you should be doing. This is great. Going forward, you want to look for these things and look for these things and look for these things.
We're going to give you some more concepts so that you are cost-optimized in the future.” But it doesn't necessarily mean that we have to cut your bill. Because if you're already spending efficiently, you don't need your bill cut; you're already cost-optimized.
Corey: Oh, we're not going to nitpick on that, you're mostly optimized there. It's like, “Yeah, that workload's $140 million a year and rising; please, pick nits.” At which point? “Okay, great.” That's the strategic reason to focus on something. But by and large, it comes down to understanding what the goals of clients are. I think that is widely misunderstood about what we do and how we do it.
The first question I always ask when someone does outreach of, “Hey, we'd like to talk about coming in here and doing a consulting engagement with us.” “Great.” I always like to ask the quote-unquote, “Foolish question” of, “Why do you care about the AWS bill?” And occasionally I'll get people who look at me like I have two heads of, “Why wouldn't I care about the AWS bill?” Because there are more important things to care about for the business, almost certainly.
Tim: One of the things I try and do, especially when we're talking about cost optimization, especially trying to do something for the right now so they can do things going forward, it's like, you know, all right, so if we cut this much from your bill—if you just do nothing else, but do reserved instances or buy a savings plan, right, you're going to save enough money to hire four engineers. Think about what four engineers would do for your overall business? And that's how I want you to frame it; I want you to look at what cost optimization is going to allow you to do in the future without costing you any more money.
Or maybe you save a little more money and you can shift it; instead of paying for your AWS bill, maybe you can train your developers, maybe you can get more developers, maybe you can get some ProServ, maybe you can do whatever, buy newer computers for your people so they can do—whatever it is, right? We're not saying that you no longer have to spend this money, but saying, “You can use this money to do something other than give it to Jeff Bezos.”
Corey: This episode is sponsored in part by Liquibase. If you're anything like me, you've screwed up the database part of a deployment so severely that you've been banned from touching anything that remotely sounds like SQL, at at least three different companies. We've mostly got code deployments solved for, but when it comes to databases we basically rely on desperate hope, with a rollback plan of keeping our resumes up to date. It doesn't have to be that way. Meet Liquibase. It is both an open source project and a commercial offering. Liquibase lets you track, modify, and automate database schema changes across almost any database, with guardrails to ensure you'll still have a company left after you deploy the change. No matter where your database lives, Liquibase can help you solve your database deployment issues. Check them out today at liquibase.com. Offer does not apply to Route 53.
Corey: There was an article recently, as of the time of this recording, where Pinterest discussed what they had disclosed in one of their regulatory filings which was, over the next eight years, they have committed to pay AWS $3.2 billion. And in this article, they have the head of engineering talking to the reporter about how they're thinking about these things, how they're looking at things that are relevant to their business, and they're talking about having a dedicated team that winds up doing a whole bunch of data analysis and running some analytics on all of these things, from piece to piece to piece. And that's great.
And I worry, on some level, that other companies are saying, “Oh, Pinterest is doing that. We should, too.” Yeah, for the course of this commitment, a 1% improvement is $32 million, so yeah, at that scale I'm going to hire a team of data scientists, too, to look at these things. Your bill is $50,000 a month. Perhaps that's not worth the effort you're going to put into it, barring other things that contribute to it.
Tim: It's interesting because we will get folks that will approach us that have small accounts—very small, small spend—and like, “Hey, can you come in and talk to us about this whatever.” And we can say very honestly, “Look, we could, but the amount of money we're going to charge you is going to—it's not going to be worth your while right now. You could probably get by on the automated recommendations, on the things that are already out there on the internet that everybody can do to optimize their bill, and then when you grow to a point where now saving 10% is somebody's salary, that's when it, kind of, becomes more critical.” And it's hard to say what point that is in anyone's business, but I can say sometimes, “Hey, you know what? That's not really what you need to focus on.” If you need to save $100 a month on your AWS bill, and that's critical, you've got other concerns that are not your AWS bill.
Corey: So, back when you were interviewing to work here, one of the areas of focus that you kept bringing up was the concept of observability, and my response to this was, “Ah, hell. Another one.” Because let's be clear, Mike Julian—my business partner and our CEO—has written a book called Practical Monitoring, and apparently what we learned from this is as soon as you finish writing a book on the topic, you never want to talk about that topic ever again, which yeah, in hindsight makes sense.
Why do you care about observability when you're here to look at cloud costs?
Tim: Because cloud costs is another metric, just like you would use for performance, or resilience, or security. You do real-time monitoring to see if somebody has compromised the system, you do real-time monitoring to see if you have bad performance, if response times are too slow. You do real-time monitoring to know if something has gone down and then you need to make adjustments, or that the automated responses you have in response to that downtime are working. But cloud costs, you send somebody a report at the end of the month. Can you imagine, if you will—just for a second—if you got a downtime report at the end of the month, and then you can react to something that has gone down?
Or if you get a security report at the end of the month, and then you can react to the fact that somebody has your root keys? Or if you get [laugh] a report at the end of the month that said, “Hey, the CPU on this one was pegged. You should probably scale up.” That's outrageous to anybody in this industry right now. But why do we accept that for cloud cost?
Corey: It's worse than that. There are a number of startups that talk about, “Oh, real-time cloud cost monitoring.” Okay, the only way you're going to achieve such a thing is if you build an API shim that interprets everything that you're telling your cloud control plane to do, taking cost metrics out of it, and then passing it on to the actual cloud control plane. Otherwise, you're talking about it showing up in the billing record in—ideally, eight hours; in practice, several days, or you're talking about the CloudTrail events, which is not holistic but gives you some rough idea, but it's also in some cases, 5 to 20 minutes delayed.
There's no real-time way to do this without significant disruption to what's going on in your environment.
So, when I hear about, “Oh, we do real-time bill analysis.” Yeah, it feels—to be very direct—you don't know enough about the problem space you're working within to speak intelligently about it because anyone who's played in this space for a while knows exactly how hard it is to get there. Now, I've talked to companies that have built real-time-ish systems that take that shim approach and act sort of as a metadata sidecar ersatz billing system that tracks all of this so they can wind up intercepting potentially very expensive configuration mistakes. And that's great. That's also a bit beyond for a lot of folks today, but it's where the industry is going. But there is no way to get there today, short of effectively intercepting all of those calls, in a way that is cohesive and makes sense. How do you square that circle given the complete lack of effective tooling?
Tim: Honestly, I'm going to point that right back at the cloud provider because they know how much you're spending, real-time. They know exactly how much you spend in real-time. They've figured it out. They have the buckets, they have APIs for it internally. I'm sure they do; it would make no sense for them not to. Without giving anything away, I know that when I was at AWS, I knew how much they were spending, almost real-time.
Corey: That's impressive. I wish that existed. My never-having-worked-at-AWS perspective on it is that they, of course, have the raw data effective immediately, or damn close to it, but the challenge for the billing system is distilling and summarizing and attributing all of that in a reasonable timeframe; it is an exabyte-scale problem. I've talked to folks there who have indicated it is comfortably north of a petabyte in raw data per day.
And that was a couple of years ago, so one can only imagine as the footprint has increased, so has all of this.
I mean, the billing system is fundamentally magic from the outside. I'm not saying it's good magic, but it is magic, and it's something that is unappreciated, that every customer uses, and is one of those areas that doesn't get the attention it deserves. Because, let's be clear, here, we talk about observability; the bill is still the only thing that AWS offers that gives you a holistic overview of everything running in your account, in one place.
Tim: What I think is interesting is that you talk about this, the scale of the problem and that it makes it difficult to solve. At the same time, I can have a conversation with my partner about kitty litter, and then all of a sudden, I'm going to start getting ads about kitty litter within minutes. So, I feel like it's possible to emit cost as a metric like you would CPU or disk. And if I'm going to look at who's going to do that, I'm going to look right back at AWS. The fun part about that, though, is I know from AWS's business model, that if that's something they were to emit, it would also cost you, like, 25 cents per call, and then you would actually, like, triple your cloud costs just trying to figure out how much it costs you.
Corey: Only with 16 other billing dimensions because of course it would. And again, I'm talking about stuff, because of how I operate and how I think about this stuff, that is inherently corner case, or [vertex 00:31:39] case in many cases. But for the vast majority of folks, it's not the, “Oh, you have this really weird data transfer paradigm between these two resources,” which yeah, that's a problem that needs to be addressed in an awful lot of cases because data transfer pricing is bonkers, but instead it's the, “Huh.
You just spun up a big cluster that's going to cost $20,000 a month.” You probably don't need to wait a full day to flag that.
And you also can't put this on the customer in the sense of, “Oh, just set some budget alarms, that's great. That's the first thing you should do in a new AWS account.” “Well, jackhole, I've done an awful lot of first things I'm supposed to do in an AWS account, in my dedicated test account for these sorts of things. It's been four months, I'm not done yet with all of those first things I'm supposed to do.” It's incredibly secure, increasingly expensive, and so far all it runs is a single EC2 instance that is mostly there just so that everything else doesn't error out trying to divide by zero.
Tim: There are some things that are built-in. If I stand up an EC2 instance and it goes down, I'm going to get an alert that this instance terminated for some reason. It's just going to show up informationally.
Corey: In the console. You're not going to get called about it or paged about it, unless—
Tim: Right.
Corey: —you have something else in the business that will, like a boss that screams at you two o'clock in the morning. This is why we have very little that's production-facing here.
Tim: But if I know that alert exists somewhere in the console, that's easy for me to write a trap for. That's easy for me to write, say hey, I'm going to respond to that because this call is going to come out somewhere; it's going to get emitted somewhere. I can now, as an engineer, write a very easy trap that says, “Hey, pop this in the Slack. Send an alert. Send a page.”
So, if I could emit a cost metric, and I could say, “Wow. Somebody has spun up this thing that's going to cost X amount of money. Someone should get paged about this.” Because if they don't page about this and we wait eight hours, that's my month's salary.
And you would do that if your database server went down; you would do that if someone rooted that database server; you would do that if the database server was [bogging 00:33:48] you to scale up another one. So, why can't you do that if that database server was all of a sudden costing you way more than you had calculated?
Corey: And there's a lot of nuance here because what you're talking about makes perfect sense for smaller-scale accounts, but even some of the very large accounts where we're talking hundreds of millions a year in spend, you can set compromised keys up on GitHub, put them in Pastebin, whatever, and then people start spinning up Bitcoin miners everywhere. Great. It takes a long time to materially move the needle on that level of spend; it gets lost in the background noise. I lose my mind when I wind up leaving a managed NAT gateway running and it cost me 70 bucks a month in my $5 a month test account. Yeah, but you realize you could basically buy an island and it gets lost in the AWS bill at some of the high watermarks for some of these larger accounts.
“Oh, someone spun up a cluster that's going to cost $400,000 a year?” Yeah, do I need to re-explain to you what a data science team does? They light money on fire in return for questionable returns, as a general rule. You knew that when you hired them; leave them alone. Whereas someone in their developer account does this, yeah, you kind of want to flag that immediately.
It always comes down to rules and context. But I'd love to have some templates ready to go of, “I'm a starving student, please alert me anytime it looks like I might possibly exceed the free tier,” or better yet, “Don't let me, and if I do, it's on you and you eat the cost.” Conversely, it's, “Yeah, this is a Netflix sub-account or whatnot.
Maybe don't bother me for anything whatsoever because freedom and responsibility is how we roll.” I imagine that's what they do internally on a lot of their cloud costing stuff because freedom and responsibility is ingrained in their culture. It's great. It's the freedom from having to think about cloud bills and the responsibility for paying it, of the cloud bill.
Tim: Yeah, we will get internally alerted if things are [laugh] up too long, and then we will actually get paged, and then our manager would get paged, [laugh] and it would go up the line. If you leave something that's running too expensive, too long. So, there is a system there for it.
Corey: Oh, yeah. The internal AWS systems for employees are probably my least favorite AWS service, full stop. And I've seen things posted about it; I believe it's called Isengard, for spinning up internal accounts and the rest—there's a separate one, I think, called Conduit, but I digress—that you spin something up, and apparently if it doesn't wind up—I don't need you to comment on this because you worked there and confidentiality is super important, but to my understanding it's, great, it has a whole bunch of formalized stuff like that and it solves for a whole lot of nifty features that bias for the way that AWS focuses on accounts and how they view security and the rest. And, “Oh, well, we couldn't possibly ship this to customers because it's not how they operate.” And that's great.
My problem with this internal provisioning system is it isolates and insulates AWS employees from the real pain of working with multiple accounts as a customer. You don't have to deal with the provisioning process of Control Tower or whatnot; you have your own internal thing. Eat your own dog food, gargle your own champagne, whatever it takes to wind up getting exposure to the pain that hits customers and suddenly you'll see those things improve.
I find that the best way to improve a product is to make the people building it live with the painful parts.
Tim: I think it's interesting that the stance is, “Well, it's not how the customers operate, and we wouldn't want the customers to have to deal with this.” But at the same time, you have to open up, like, 100 accounts if you need more than a certain number of S3 buckets. So, they are very comfortable with burdening the customer with a lot of constraints, and they say, “Well, constraints drive innovation.” Certainly, this is a constraint that you could at least offer and let the customers innovate around that.
Corey: And at least define who the customer is. Because yeah, “I'm a Netflix sub-account,” is one story, “I'm a regulated bank,” is another story, and, “I'm a student in my dorm room, trying to learn how this whole cloud thing works,” is another story. From risk tolerance, from a data protection story, from a billing surprise story, from a, “I'm trying to learn what the hell this is, and all these other service offerings you keep talking to me about confuse the hell out of me; please streamline the experience.” There's a whole universe of options and opportunity that isn't being addressed here.
Tim: Well, I will say it very simply like this: we're talking about a multi-trillion dollar company versus someone who, if their AWS bill is too high, they don't pay rent; maybe they don't eat; maybe they have other issues, they don't—medical bill doesn't get paid; child care doesn't get paid. And if you're going to tell me that this multi-trillion dollar company can't solve for that so that doesn't happen to that person and tells them, “Well, if you come in afterwards, after your bill gets there, maybe we can do something about it, but in the meantime, suffer through this.” That's not ethical.
Full stop.
Corey: There are a lot of things that AWS gets right, and I want to be clear that I'm not sitting here trying to cast blame and say that everything they're doing is terrible. I feel like every time I talk about billing in any depth, I have to throw this disclaimer in. Ninety to ninety-five percent of what they do is awesome. It's just the missing piece that is incredibly painful for customers, and that's what I spend most of my time focusing on. It should not be interpreted to think that I hate the company.
I just want them to do better than they are, and what they're doing now is pretty decent in most respects. I just want to fix the painful parts. Tim, thank you for joining me for a third time here. I'm certain I'll have you back in the somewhat near future to talk about more aspects of this, but until then, where can people find you slash retain your services?
Tim: Well, you can find me on Twitter at @elchefe. If you want to retain my services, which you would be very, very happy to have, you can go to duckbillgroup.com and fill out a little questionnaire, and I will magically appear after an exchange of goods and services.
Corey: Make sure to reference Tim by name just so that we can make our sales team facepalm because they know what's coming next. Tim, thank you so much for your time; it's appreciated.
Tim: Thank you so much, Corey. I loved it.
Corey: Principal cloud economist here at The Duckbill Group, Tim Banks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, wait at least eight hours—possibly as many as 48 to 72—and then leave a comment explaining what you didn't like.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
All-star trio Charity Majors, Liz Fong-Jones, and George Miranda join us this week to talk all things observability, observability engineering, and their new book on the subject.
Our old pal Mark Mirchandani is back this week, joining Stephanie Wong and our guests Steve McGhee and Yuri Grinshteyn to talk about Site Reliability Engineering. SRE is Google’s way of helping companies of all sizes create consistent, predictable, and functional projects. It helps clients approach operations from a software engineering standpoint so that growing systems can be managed efficiently. We talk about the challenges of implementing best SRE practices and how companies can overcome these. Though the benefits of SRE are many, it can be difficult for clients to grasp. Steve and Yuri tell us the process they go through with customers to help them set realistic goals and work to make reliable, scalable projects with little downtime. By starting small and taking wins early, Steve says clients reap the rewards of SRE and are encouraged to push forward. Yuri’s customer-centric approach encourages companies to prioritize alerts that affect the user experience, thus limiting inbox mayhem and keeping customers happy. Alerts based on symptoms, Steve says, help accomplish this goal. Later, Yuri and Steve describe the best ways for companies to get started with SRE. Realistic goals and specific detailed plans can make the journey less bumpy for clients, and Google’s SRE team can help. Steve McGhee Steve was an SRE at Google for about 10 years, then left to help a company build reliable systems on the Cloud. Now he’s back at Google, helping more companies do that. Yuri Grinshteyn Yuri works with Google Cloud Platform customers to help them design, architect, build, and operate reliable applications and services. He also advocates for SRE principles and practices on YouTube and elsewhere.
Cool things of the week Fresh updates: Google Cloud 2021 Summits blog Why you need to explain machine learning models blog GCP Podcast Episode 260: Responsible AI with Craig Wiley and Tracy Frey podcast GCP Podcast Episode 249: ML Lifecycle with Dale Markowitz and Craig Wiley podcast GCP Podcast Episode 214: AI in Healthcare with Dale Markowitz podcast Interview Site Reliability Engineering site Reliability Architecture Framework site Site Reliability Engineering: Measuring and Managing Reliability on Coursera site Developing a Google SRE Culture on Coursera site How Lowe's meets customer demand with Google SRE practices blog GCP Podcast Episode 68: The Home Depot with William Bonnell podcast GCP Podcast Episode 213: The Art of SLOs with Alex Bramley podcast GCP Podcast Episode 127: SRE vs Devops with Liz Fong-Jones and Seth Vargo podcast GCP Podcast Episode 72: Customer Reliability Engineering with Luke Stone podcast GCP Podcast Episode 38: Site Reliability Engineering with Paul Newson podcast GCP Podcast Episode 59: SRE II with Paul Newson podcast What’s something cool you’re working on? Yuri has been working on Engineering for Reliability. Stephanie has been working on her new series What's New in Networking.
Corey Quinn is guest hosting on Software Engineering Daily this week, presenting a Tour of the Cloud. Corey Quinn is the Chief Cloud Economist at The Duckbill Group, where he helps companies fix their AWS bill by making it smaller and less horrifying. If you're looking to lower your AWS bill or negotiate a new The post GCP with Liz Fong-Jones appeared first on Software Engineering Daily.
In episode 4 of Unintended Consequences, Yoz and Kim speak with Charity Majors and Liz Fong-Jones of Honeycomb about scaling teams and company values. They share how their team makes space for curiosity and learning as a group, and why this has been so important for the Honeycomb platform. The post Ep. #4, Everything is an Experiment with Charity Majors and Liz Fong-Jones of Honeycomb appeared first on Heavybit.
02:21 - Melissa’s Superpower: Being Extremely Online 03:06 - Unionizing Glitch (https://glitch.com/) * Glitch workers sign tech’s first collective bargaining agreement (https://www.theverge.com/2021/3/2/22307671/glitch-workers-sign-historic-collective-bargaining-agreement-cwa) * Misconceptions re: Unions * Engineer Salary Discrepancies * Middle Management, Product Management Unionization * Minority Unions (e.g., Google) * What is a Minority Union? (https://workercenters.com/labors-loophole/what-is-a-minority-union/) * The Rise of Minority Unions: How Social Movements and Tech Giants Could Be Showing Signs of Things To Come 14:58 - Melissa’s Previous Experience with Working w/ Unions * Communications Workers of America (CWA) (https://cwa-union.org/) * Civic Technology (What Is Civic Technology? (https://www.forbes.com/sites/quora/2017/09/19/what-is-civic-technology/)) * Chi Hack Night (https://chihacknight.org/) 17:13 - Positive Skills Union Organizers Should Have 18:32 - Thoughts on Leading with Petitions * We are Frank — a platform for worker voice (https://getfrank.medium.com/we-are-frank-189111ceb54a) * 2018 Google Walkouts (https://en.wikipedia.org/wiki/2018_Google_walkouts) 26:58 - Writing Online; Dismantling Publications and the Fracturing of the Media World * The Rise Of Substack—And What’s Behind It (https://www.forbes.com/sites/falonfatemi/2021/01/20/the-rise-of-substack-and-whats-behind-it/) * Melissa McEwen: The best JavaScript date libraries in 2021 (https://www.skypack.dev/blog/2021/02/the-best-javascript-date-libraries/) 29:41 - Evaluating Human Performance * PSA: DevRel isn’t fake !!
* How to Hire A-Players: Finding the Top People for Your Team- Even If You Don't Have a Recruiting Department (https://www.amazon.com/How-Hire-Players-Recruiting-Department/dp/0470562242) * People Skills 43:21 - Getting Started with Organizing a Union * Use Signal (https://signal.org/en/), Not Slack * Be Harder to Fire Reflections: Casey: Hearing success stories re: unionizing. Jacob: How people skills can be a function of your individual team. Melissa: Studying more about unions in other countries. Rein: Looking more into co-ops and collectivisations. An injury to one is an injury to all. (https://en.wikipedia.org/wiki/An_injury_to_one_is_an_injury_to_all) This episode was brought to you by @therubyrep (https://twitter.com/therubyrep) of DevReps, LLC (http://www.devreps.com/). To pledge your support and to join our awesome Slack community, visit patreon.com/greaterthancode (https://www.patreon.com/greaterthancode) To make a one-time donation so that we can continue to bring you more content and transcripts like this, please do so at paypal.me/devreps (https://www.paypal.me/devreps). You will also get an invitation to our Slack community this way as well. Transcript: JACOB: Hello, and welcome to Greater Than Code, Episode 229. My name is Jacob Stoebel and I’m here with my co-panelist, Casey Watts. CASEY: Hi, I'm Casey. I'm here today with Melissa McEwen. Melissa is a web developer, working in content now. She often writes about the JavaScript ecosystem. She helped unionize Glitch, which recently signed their first Collective Bargaining Agreement in late February. Welcome, Melissa. So glad to have you. MELISSA: Hi, everyone. CASEY: We like to start each show by asking you a certain question. Melissa, what is your superpower and how did you acquire it? MELISSA: My superpower is being extremely online and I acquired it by being given computers way too young and having nothing to do, but play with computers. 
CASEY: I like that phrase “extremely online.” What does that look like today for you? MELISSA: It means, I know way too much about what's going on in Twitter and the internet in general and sometimes, I'll make references that you only know if you're extremely online and it's kind of embarrassing. I don't even know what it's like to not be extremely online, but I'm trying to stop being extremely online because it's overwhelming trying not to check Twitter every 5 seconds. CASEY: Oh, yeah. I did that a lot, too. I don't know if I would describe myself as extremely online, but I might have seen some of the same memes as you and I think that would [chuckles] give me a little bit of that. MELISSA: Yeah. I mean, memes, what's the latest drama on Twitter today, that kind of stuff. JACOB: Is there a way to turn that superpower and help people around you or, how do you leverage that? MELISSA: Yeah, the only thing that's good about it, I would say is that you know a lot. I try to write about things and provide my knowledge to other people. I mean, you know a lot, but on a surface level, that's the problem so, you have to always be aware of that. I'm not an expert on unions and for the Glitch union, I was one of the original organizing committee folks, but I was laid off last year in March and there were 18 people, I think laid off. So the union has been going on without me and that's just great. Me and some other extremely online people, when we started the union, we leveraged our extremely onlineness because we were connected to a lot of people who helped us like the CWA, which is the Communications Workers of America. We found them online, for example and they were critical in getting the union actually started because we'd been talking about it, but they were the people that pushed us and they're one of the bigger unions. They've been around for a long time. They have organized telecommunications workers, primarily and now they're doing some tech stuff.
So very interesting. JACOB: Well, as someone who is moderately online at best, I have been reading a little bit about recent union news with Glitch, but I would love to hear your story about how it started and how it brought us to today. MELISSA: Yeah, I mean there's only so much I can say, but the stuff that really was – building a union is about connecting with your coworkers and a lot of people have said, “How are we going to build a union to the remote workplace?” Well, I was remote and half the company was remote. That's one good thing about being extremely online is you’re probably used to talking to people online. I connected to people in my workplace and people on my team. At first, it was mainly people on my own team and then what CWA teaches you to do is to build connections in your workplace. It's almost like you map it out and you talk to other people in your workplace and you try to leverage those connections. I wasn't connected to everybody in the workplace, but I was connected to some other people who were connected to people I wasn't connected to. So it was challenging in that this was not an office where I could go see these people every day. I had to kind of – you can't just sneakily invite someone to a call unionizing. You have to actually build social capital and build relationships and then turn those into those connections you need to build a union. A lot of us had been following union stuff in tech. I was a member of Tech Workers Co, I think others were and we thought since Glitch is a very diverse workplace, we want to make sure that workers have a seat at the table and can actually help each other and to help the company do right by the workers. We had some bumps along the road. It is hard to organize people remotely and a lot of people have misconceptions about unions. They think unions are only for certain workers like people working in a mine, or they have bad impressions of unions. Like, I don't know. 
I grew up and my parents were like, they told me that unions were bad. We watched On the Waterfront and they were like, “Oh, look, unions, they’re so corrupt.” But a union is just like an organization. It's a big organization and they have a history and they have a context and a union is just like anything. Like a company. It can be bad; it can be good. It's based on the people and once you join a union, you can help guide that union by being part of it. JACOB: I would think an extremely online person would be very good at that. MELISSA: Yeah, it did help to be constantly on Slack and on Twitter. JACOB: And good at really just making those connections. That would not come naturally to make all those personal connections, what you just said. MELISSA: Yeah, but also, it was. I do think people who had those real life – who were at the office did have an advantage in forming those connections because not everybody at Glitch was extremely online, for example. Also, meeting each other in real life, occasionally like, we'd go to the conferences and stuff, that really helped. It's complicated about how much organizing you can do in the workplace and at what times. You don't ever want to do it at times when you're supposed to be working, for example. CASEY: What were some of the things that made this unionization effort successful and possible and what were some of the things that got in the way? I think we've covered some already. MELISSA: Yeah. I think having a pretty social workplace, that was social online, but that doesn't include everybody. There’s some people who were more online than others, for example and the fact that we relied so much on online organizing, it was harder to reach those people. So it was very crucial that we have people in the New York City office who were able to do some on the ground in-person organizing and getting those people on board was like, once we got those people on board, that was a very important thing that we did.
Because originally, it was all remote people and then we added in the New York City office people. Yeah, the bumps along the road are just misconceptions about unions, what they mean. People can union bust themselves just by having these misconceptions like, “Oh, union is a third-party. It'll affect my relationship with my manager. I can't be friends with my manager anymore.” It's not true at all. So some of the organizing committee had been in unions before. Like, there was one woman, who was a social worker, who had been in a social workers' union and I had been in a Civil Workers Union before. So I knew that I was friends with my managers in these unions and I mean, not that being friends with the manager is the priority, but the idea that if you're friends with a manager, you can't do a union. That's just not true. But some people thought that. CASEY: The biggest misconception I can think of is why do you well-compensated professionals need a union and I'm sure you've heard this all the time. MELISSA: Oh yeah, that’s a big one. CASEY: Yeah. Fill us in for that. Like what do you say to that? MELISSA: I think so. Online, someone was like, “Oh, it's cultural appropriation of blue-collar workers.” I do not agree with that. I think all workers benefit from a union and it is just an organization that allows workers to negotiate with their bosses and on a fair playing field. It's not a culture. You don't have to be in the movie, The Irishman, or On the Waterfront, or even know people like that. It's just a way of organizing a workplace and having a seat at the table, so. JACOB: You mentioned earlier that I think, or maybe you implied that this union joins multiple disciplines, too. Is that true? MELISSA: Yeah, like we had engineers and then we also had a media department. That's where things would be hard because a lot of workplaces are quite siloed and I've always been against that.
Like, I hate the term non-technical for example, like video production people, those are the most technical people I know. They're literally working with like technical equipment every day and they know so much about it. Those people are technical. And then another big obstacle is who is eligible for a union? Who can join? It's not clear because tech has roles that aren't very traditional, like product manager. Is that a manager, or is that an individual contributor and often, that’s hashed out on the negotiating table. It's based on all these laws and I've read some of the laws, I'm not an expert, but it's good to read a little bit of the labor law just to understand. But even if you know it, it's interpreted differently by different courts and stuff. There's a National Labor Relations Board that reviews labor disputes and stuff and that was Trump's appointed board. So we wanted to make sure we got a voluntary recognition because we didn't want anything to do with that board at that time because they were very hostile towards workers. JACOB: The reason I was curious about joining together all kinds of different people from different roles, I was just curious if that diverse workforce came with a diversity of priorities and goals for a union and if those presented any challenges. MELISSA: Yeah. There's a big class difference between engineers and people outside of engineering. Engineers are overwhelmingly paid higher than people outside of engineering, for example and I totally understand the resentment towards engineers. We need to acknowledge that if you're organizing multiple people and outside of engineering. I mean, the fact that engineering is so well-compensated. I don't understand why, for example, a video producer isn't compensated the same as an engineer. It's just an accident of history, how culturally valued, supply and demand, all these things mixed up together. So you have to realize that and when it comes down to money, paying dues.
For an engineer, it might be like, “Oh, you're taking 1%, or 2% for the union,” and that's like, “Oh, that’d take you away from being able to go on vacation.” Whereas, for someone who is making a lot less, that's taking away from their ability to pay rent. So that is really, really hard and I don't have a good solution for that. I wish unions would offer things like maybe peg it to your income, maybe not, maybe at a lower percentage, but it tends to not be that high of a percentage; it's 1 to 3%. But acknowledging that that can be the difference between someone being able to afford or not. Especially the salary ranges were quite extreme in our case so, that was really hard. CASEY: I'm listening to this conversation based on my background as a product manager who happened to have managed engineers, designers, and product managers, I don't know how that structure came into play. But even that tier, I wanted to be part of a union, but I think it's US law maybe that gets in the way that says managers at any level can't be unionizing in any form, not even like—let's use a synonym for a union—collective people who tell each other, “Yes, you deserve more money,” or something like that. It's not we're not incentivized to work together in any way and we pretend that the HR department of the company does that for us, which they do the opposite often. What do you think about that middle management kind of thing and how it plays into product management? Your thought process? MELISSA: Yeah. That really sucks because then it becomes like some people feel left out who wanted to be part of the union and at that point, they feel like, “Oh, am I part of –?” Like, they're obviously not C-suite so that's really hard. Other countries have other types of unions like sectoral bargaining that get around that. I don't know that much about that, but we weren't sure if a product manager fit under the definition of qualify, or not. 
It just depends on if you make decisions on employment, if you tell people what to do, there's a lot of criteria. We did find that product managers were not going to be part of the union. So what does the product manager do? Well, they can't organize themselves, but they're just not legally protected under this bargaining thing under a labor law. So that really sucks and I don't know what the solution is. I guess, getting involved in bigger organizations that work for unionization. The Google union is very interesting and that is a different form of union. It's called a minority union and I don't know that much about those, but I know that people who are managers can join that one, but it has fewer legal protections. So I assume when CWA decided to organize Google under a minority union, it was because they felt they were not capable of doing the traditional union because there are so many obstacles to doing so in Google—Google’s size and multiple locations. It's very difficult. You can organize however you want, it's just what is legally protected and that kind of goes in, in that article. I talk about petitions, for example. Petitions are an example of organizing. That's not unionization, it's not protected by US labor law, but it is a form of organizing. The Google walkout, that's a form of organizing. That's not unionization. You just have fewer legal protections and you don't have the structure that you get from a union when you do those things. CASEY: Well, that's awesome. I'm not up-to-date on this. I'm going to be Googling minority union and sectoral bargaining after this call. MELISSA: Yeah. I didn't even know what a minority union was until that came out. I was like, “Wow, I guess, someone should write a book.” There probably is a book. I'm going to find that book. JACOB: Melissa, what brought you to doing this in the first place? Did you have experience with organizing before, or was it something new to you?
MELISSA: I didn't have any experience organizing, I suppose, but I was in a union before. I worked at University of Illinois in Chicago and their IT departments are in a union, an older established union. As soon as you join as an employee there, you're a part of that union. Actually unions, some of them aren't that great. Our union was kind of mediocre, to be honest. They barely involved people, for example, and were very top-down. That's one thing when you're organizing, you have to choose which union you're going to organize under, or even to start your own union. We thought about starting our own union. I don't feel that qualified to hire union lawyers. You need to advance money, whereas CWA provided all of that, the lawyers and stuff like that and all the structure. CWA has gotten a lot of flack on Twitter recently with the Google union stuff. People have dug up the fact that they've represented security guards in the past, but it's a big organization; it's like working with the government. You can't expect perfection, we've got to get involved. If you want to change things, you've got to be involved yourself. I'm very skeptical of the idea that we should just throw that away and start our own thing as tech workers. Because I think people of different ages and classes and stuff have so much to teach us and that's what you get when you join a big union like CWA and you can't demand they fit your extremely online standards. So if you want them to follow the standard, you've got to join and get involved. JACOB: So definitely a politics of compromise from the get-go. MELISSA: Yeah, and I've been involved with the civic technology a little bit. So I was a little bit familiar with that. I've worked in government contracting and I've gone to Chi Hack Night, which is a Chicago meetup, for quite a while. It's a Chicago meetup focused on civic technology and government. I was familiar with some of that, but if you're a startup person, maybe that's harder.
You expect unions are going to cater to you, treat you like a freaking princess or whatever, but no, they're not. They are an organization. They've got members, they have a history, and you've got to take that for what it is. CASEY: All right, Melissa, you brought to the table to the union organizing effort your superpower of being extremely online. What other skills did some of the union organizers have that really helped? MELISSA: Yeah. Actually being consistent and organized, that's really important. Organizing meetings. I'm not into that kind of thing and thankfully, there were other people who did that and I thank them quite a lot. Taking notes, following up, once you make me angry, I'm very effective at arguing with people. So that's a good thing about extremely online, but it's bad about being extremely online, but it did come in handy a few times when unionizing. But otherwise, doing in-person on the ground work, I couldn't do because I was remote and organizing the meetings, taking notes, following up with CWA, coordinating between different people, that stuff. The other people helped with that. The other members of the organizing committee and then after the union was recognized, we had an election and some people did that election where you elect the reps and other people did that and I was really happy because I was tired at that point. [chuckles] CASEY: So I'm going into a little bit of a different topic. Melissa, I think you mentioned something about companies and nonprofits who want to lead with petitions and you have some thoughts on that I'm curious to hear. MELISSA: I am super anti petitions. I think these organizations push them and I think they're just antithetical to unionization. Coworker, for example, they really push you to do these petitions and, a, you're alerting your boss that you're organizing, and you're doing it in a way that's not legally protected. Why don't you just unionize?
I understand that some people can't and if you genuinely can't, that's great, but I wouldn't trust Coworker to tell you if it's okay, or not. I have noticed that some of the conflict on Twitter regarding the Google union, some people involved with that are also involved in Coworker. So I'm really against that and another company that's sprouted up. It's a startup, they're called Get Frank, but they're also doing petitions. They're very antithetical to unionization and people don't want to say that because the people who were involved with that are nice people and some of them are even involved with Tech Workers Co and stuff and they're nice online, or they're well-respected, but at some point, you’ve got to say, “This is just anti-union.” REIN: Yeah. I mean, taking a collective bargaining opportunity that can stretch across multiple issues and organize the workforce to push for all of them and turning into a petition about a specific thing that has marginal support. I don't see how that helps. I mean, I don't think that those startups are disrupting business organization. I think they're disrupting union organization. MELISSA: Yeah, and I think more people should call them out and the fact that a lot of people who the media goes to for comments about tech organizing are like – so, Liz Fong-Jones, I really respect her. She's on Twitter and she's a member of the board on Coworker and I find that not good. REIN: I mean, I guess the argument is that any place where you can voice concerns and generate support within the workers, the employees is better than none, but that's not how the world works. We can have unions, too, or instead actually putting effort into that means that you're not spending that time putting effort into organizing. MELISSA: Yeah. So when we were first thinking about unionizing, I was on Tech Workers Co and they connected me to people at Coworker and they were really pushing out to do a petition.
I'm really lucky that my coworker, Steph, had connected with CWA because she was like, “No, let's talk to CWA.” CWA took it from there and they actually got us the motivation and the resources we needed to unionize. Whereas, Coworker was like, “Oh, we love unions, but why don't you do this petition first, it's building organization?” and CWA is like, “No.” Unfortunately, some people are taking the CWA being against that as an insult on them personally, which is really weird, that it's an insult for people who did past organization efforts that weren’t unionizing. I don't see why that is relevant. I understand sometimes you can't unionize and I respect other organization efforts, but you're taking an example of a company that can unionize and you're pushing them to do a petition. You're wasting their time. You're endangering their jobs. It's just bad. REIN: Well, I think if there was evidence that starting with petitions led towards more formal union organizing, I would be more in favor of it, but I don't know of any. MELISSA: Yeah. People use the Google walkout, for example and I guess, the Google unions and the controversy on Twitter was about how the union wasn't involving the past organizers who did all this work for the Google walkout. I recognize Google walkout was an amazing thing and the people who organized that were really great, but that doesn't mean that you have to use their expertise to unionize. A union should be for the current employees. When I'm talking about our union at Glitch, I'm not speaking for the union. I was laid off. I'm not a member anymore. That's very sad. It's very unfair, but I'm not a member and the employees who are working there have insight into the company that I don't. So I don't expect them to recognize me, or to ask me for advice, or anything. I don't even talk to them that much anymore because that's their sphere. CASEY: I'm not an expert on Coworker, but this reminds me of another metaphor a little bit.
Let me know if this is close, or not, or similarities and differences. So you know how when you look on the bottom of a solo cup, you see a triangle, or a cycle symbol with a number? Some of those aren't really recyclable and the lobbyists who made that happen, and you’re required to put them on, knew that ahead of time. So they are doing this small change, “Look, you can do the thing,” and then that stops people from pushing back against the production of it. It's helping, but not really and I'm hearing your view of Coworker seems to be helping, but not really. MELISSA: I mean, the Frank one is even worse. They're a for-profit startup. I'm like, “If anyone is giving them positive coverage, they are not asking the right questions here.” Actually, when I saw them written about, I attempted to join just to see what they were about and they rejected me because they were like, “Oh, you're already in a union. You don't need us.” So very interesting. They occasionally email me asking for my feedback, but I'm like, “I don't think you're worth my time.” REIN: If someone wanted to make a platform for unionizing, but I don't think you're going to get much traction in Silicon Valley on that one. MELISSA: There is one person who's doing that. It's called Unit, but I don't know that much about it. I'm just very skeptical of the idea that tech can disrupt unions and it's the easy way out to say, “Oh, the old unions, they're not radical enough. They don't cater to tech workers.” To throw that all away for those reasons is bad in my opinion, because they're not perfect, existing unions, but you're unionizing with a diverse workforce that has a history and has power and I don't know. I think it's also classist, too, like, “Oh, we don't want to organize with these people that aren't tech workers. We don't want to organize with these blue-collar workers.” They're not thinking that maybe explicitly, but that's what they're saying in a way. 
They don't want to say that, but that's what they're saying. REIN: Yeah. I personally have a problem with trade unions, which is that they fracture the workforce and they prevent people with different trades from organizing together and historically, that's been on purpose. Like there's a reason the AFL is still around, but the Knights of Labor aren't. MELISSA: Yeah. I mean, unions are organizations, they’re just like companies and stuff. There's some that even have dark histories of racism and stuff like that. Although, trade unions are a little different than like CWA. This is where I wish I was more up on the terminology, but it's very complicated. REIN: I would just like to unionize whole companies and not worry about what job titles people have because I think that's the systems thinking way to do it. MELISSA: Yeah, and we unionized everyone in our company that qualified under the labor, the national labor law, and not just engineers so, that was good. Luckily, the people who are into engineers being craftsmen, or whatever, are typically anti-union, but otherwise, you'd think that they'd be like, “Oh, we need an engineers' trade union because we're like electricians, or something.” But I think that would not be a good direction. CASEY: Yeah, I think it makes a lot of sense that there are unions for people who work at a company, separate from groups of people working on a technology like, Ruby user groups and all the other meetup groups for every technology everywhere and the conferences. It's like the skills are separate from the union, from the company and it's funny, I guess maybe historical that a lot of them are conflated together. All the engineers in the company are doing both a little bit. I like that we're cleanly splitting it now sometimes. That sounds great. Melissa, I noticed that you have a Substack newsletter, which is a popular thing lately. Not that you're working on it a lot lately. 
I know we talked about that, but there's a trend for individual people to be writing more and more online lately and it seems like you're aware of that and in that sphere. What's your experience lately writing online, trying to get an audience and all that? The process. MELISSA: I say no to Substack because I'm like, “This is just more work and I don't need any more work.” I started a Substack because I was like, “Oh, a lot of people have Substacks.” But then I was like, “Oh, this requires me to do, this is another job.” You have to have a consistent thing and at least, we are starting to – Substack encourages paying creators. That's good. But at some point, it's like, “Oh, I'm paying like ten different creators. I wish there was this thing where I could just pay them all at the same time and they could have jobs and benefits. Oh, that's called a publication. Too bad, we've systematically disabled these by predatory capitalists, hedge funds and stuff, buying them and disposing of them.” Like what's happened to the Chicago Tribune. I had friends who worked there and that thing it's basically just been totally dismantled by predatory companies. So I think Substack is going to be here and other similar models are going to be here for the foreseeable future. But I don't think they are – I think it's sad. CASEY: Have you worked with any of the traditional publications to try to get things published? I know you do JavaScript content work. MELISSA: Yeah. So I originally was a food writer and I've worked for Chicagoist. I left Chicagoist because I didn't have time due to my tech job, but they unionized and they were shut down because they unionized and that's really sad. A lot of my friends lost their jobs. So I have a little bit of experience in the media world and I've watched the media world become so fractured and precarious and I think the tech industry has been unfortunately, a negative actor in that. 
But now, I primarily write about JavaScript and I do so professionally. It'd be nice to write about food instead, but I like JavaScript. I like coding a lot so that's cool. There's no jobs in food writing, though. CASEY: Tell us about something you wrote recently. MELISSA: I wrote about JavaScript date libraries and like the different ones that are out there and when you should use the library and when you shouldn't use the library and that's for the blog I work for, which is called Skypack blog and I do DevRel for them; they're a CDN for JavaScript modules. Oh, here's the thing we can talk about: how people attack DevRel as being non-technical and I hate that. JACOB: Yeah, please. MELISSA: There was a tweet this week, or maybe it was on Friday, it was like, “Offend a developer relations person in one tweet,” and I'm like – so it was a variation on the original one, which was, “Offend a software engineer, offend a DBA in one tweet,” and those were often software engineers making fun of software engineers, or DBAs making jokes about data structures, or bad data. The DevRel one was like, “Oh, your job is fake.” That's what all the jokes were and most of them were not from DevRel people and I'm like, “I hate this.” I used to be a frontend developer and people used to joke like that about frontend developers, like, “Oh, you just play with CSS all day and you just push little boxes around the page and give them different colors.” We need to recognize that there’s sexism involved in this and also, racism because frontend development and DevRel tend to be more diverse subsections of tech. I'm just tired of men saying a job is fake and that I'm not technical. I left frontend dev because of that, partially. I shouldn’t have done that because at the end of the day, there's no way to convince these people that you're a real engineer. They're just not going to be convinced because they're sexist and they're jerks and they should be deleted. REIN: Yeah. 
It was kind of funny when it was software engineers laughing at themselves, but it turned into punching down pretty quickly and then it just got mean and I did not like it. I would say to those people that they should try a day in the life of a DevRel and see if they think they're good at it. MELISSA: Yeah. It's thinking that, okay, if you have these skills, you don't have the technical skills and also, that your other skills aren't valuable at all. This is a constant struggle, working with engineers, especially working cross-departmentally: engineers not recognizing other skills. I was talking about video editing before. I'm like, “That is the worst thing I can totally think of is calling a video editor non-technical; they're literally the most technical people I could think of.” They're working with software technology and also, a lot of engineers are like, “Oh, anyone can write things,” and I'm like, “I've edited y’all’s writing. I know you can't write.” [chuckles] Even me, I feel like sometimes the more engineering I do, the worse I become as a writer. That's scary, but I try to balance it. I try to be a mediocre engineer and a mediocre writer. REIN: I want people to stop doing that because it’s just a shitty thing to do, but I will also say that as you get more experienced as a software engineer – so I'm a principal now, which means I'm a huge deal, but as you get more experienced, you need to get good at a lot of the stuff that DevRels are good at. You need to be able to convince people that your ideas are good. You need to be able to communicate both verbally and in writing. You need to give a shit about product and marketing and customer support and people who aren't engineers. You have to start doing all that stuff if you want to grow as an engineer. So to some extent, I think these people are limiting themselves more than they are limiting DevRel. They should still stop being shitty people, though. MELISSA: Yeah. 
The whole principal engineer thing is funny because I was just thinking about how every company has a different definition for principal, senior, junior. That's one of the things that a union can help with and otherwise, it can be very arbitrary and you can feel like they're used to discriminate against people. So if the union can negotiate what a ladder is and what it means, that's way better than having just a random manager do it. That's my rant with all of tech. We're always constantly reinventing the same thing over and over again. With ladders, we're like, “Oh, we’ve got to build this from scratch for ourselves. Even though we have no training on building ladders, we're just going to invent this because we know everything because we're engineers.” Same with interviewing process. I'm like, “Oh, there's decades of research on interview process, but you want to invent your own new interviewing process.” I'm like, “At some point, you're just like experimenting on people and that's unethical.” I'm like, “Take your weird games elsewhere. If you want to design weird games, play Dungeons & Dragons, or something.” REIN: Yeah. I mean, if you want to take human performance seriously, you can do that. People have been doing that for decades. You just need to go take a course and read some books and start taking it seriously. It's not hard. I mean, it's hard to evaluate human performance because human performance is very complex, but it's impossible if you don't know what you're doing. MELISSA: Yeah, and I tried to get – any interview process I'm involved with designing. I'm like, “First of all, why am I involved with designing this? I'm not qualified. Second of all, at least I did read some research and I do know that the research shows that you want to do a structured interview.” If I can just get people to agree to that one thing, it's so much better than if they're just asking random questions. 
So structured interview means you agree on a structure beforehand for the interview, you agree on questions and what you're going to talk to the person about, or what exercise you're going to do, if you insist on doing a programming exercise. You ask the same ones to every candidate. There's other things you could do to make it more fair, but if you just have that one baseline. Otherwise, it's so arbitrary. REIN: There's a book called Hiring A-Players, or something like that and I like some of the advice that it has, but I think the idea that you can distinguish between “A and B players” in an interview is pretty marginal. But I do like the parts about trying to make things more evidence-based when you're trying to assess capability. I think that a lot of the hiring practices we have today mostly are about providing motivated reasoning to hire people who look like you and that's about 90% of what they do. MELISSA: Yeah, and there's also this thing, I will die on this hill, but I have people who insist if we don't do a specific code exercise, or do some kind of screener that we're going to hire someone who can't code, who literally can't code and some people have insisted that they've worked with such people. I'm really skeptical that like can't code. What does that mean? I don't know. Does it mean they just didn't integrate with the team correctly? No one tried to help them? I'm not sure. I'm just really skeptical of that. It just sounds like more hoops to jump through, but I have not convinced anybody of that besides myself. [chuckles] At least in workplaces. REIN: I think in my career, I've maybe worked with one person who I genuinely thought couldn't code, but that's when I was pretty new. What I think now is that they were really not put in an environment where they could be successful. 
They were dropped in immediately into a high-pressure scenario, with little experience, with a team that was small, under-resourced, over pressurized, and didn't have time to support them. So what I thought then was, “Wow, this guy sure can't code. He sucks.” What I think now is, “Wow, we sure screwed up putting him in that position.” MELISSA: Yeah. I've taught people to code who are 12. I'm really skeptical that someone was hired that managed, I don't know, it just sounds like they're not managed well, or not onboarded well, but that'd be a cool, like, I don't know. Maybe I'm becoming too interested in HR, I will become an HR researcher and study the phenomenon of people saying that their coworkers can't code and what does that mean? REIN: Yeah. MELISSA: I mean actually find those people, ask them, and then find the people who supposedly can’t code and find out they actually can. They were in a very difficult environment, for example, or I don't know. I've been in environments where getting the dev environment started took you five days. No wonder they had trouble; you thought they couldn't code because you didn't set them up to be able to code. They had to install 40 different things and do a proxy, or whatever. So yeah. JACOB: I’m someone who’s very – well, there's that phenomenon, stereotype threat, where you perceive that other people are making preconceived judgements about you. Like, “Oh, I'm the only person of color in my team and I can tell that I'm not expected to do well.” It affects your performance and as a white male, that actually does make some sense to me. If I can feel that I'm going to be judged for the output that I put out, instantly whether it's I didn't follow the great style, or it looks like my work is going to be picked apart immediately. That's just going to be debilitating and I'm just going to be constantly focused on looking good rather than trying to solve the problem. That is not what – Rein’s story does not surprise me at all. 
MELISSA: Yeah. If I actually hired someone who couldn't code, that would be actually exciting to me, it would be like My Fair Lady, or something because I could definitely teach them how to code and I'd be really impressed because I was like, “Oh, they were able to talk about all these projects and stuff and not actually be able to code?” I don't believe this person exists, by the way. REIN: The other thing I really wish people would understand is that human performance is ecological. The context matters. If you take one person and drop them into five different hypothetical companies, you'd get five different outcomes. They'd perform in different ways. You wouldn't get the same performance from them in those different companies because it's not just about the person. MELISSA: Yeah, and it's also about the demands of the job. I worked with one guy and people told me he couldn't code and what they actually meant was that they just didn't think he was technical, or something, but he was coding every day. He was doing Dribble templates, which is not considered the highest level of work by some snobby engineers. But that guy could definitely code and he did his job and it was very unfair to say he couldn't code. CASEY: I have a story I can share about some evidence-based interviewing I did back at the IT department. We evaluated hundreds of student employees to fix laptops every year—we hired a whole bunch—and we evaluated them on their people skills and their technical skills on a scale; we put that into data, with points, evidence, structured questions, and all that. Some people had a 5 out of 5 on people skills and a 1 on technical skills, or vice versa, or something close to that. And then we looked back a year, or 2, or 3 later, after they had time to learn and grow in the position, and we loved all the people with the 5 on people skills. They were the best employees. They learned the most over time. We're proud of them. They were great to work with. 
They taught other people a lot, too. But the ones with the technical 5s weren’t people ones. A lot of them resigned, or didn't like the job, or people avoided working with them, they were solo employees. Maybe they got some work done, but that was the lesson: you can learn the technical part, but you can't necessarily learn the people part. Some of it's learnable if you're motivated, but the disposition is what really drove success in that role. I think that applies everywhere. It's not surprising. MELISSA: I wish there were more approaches teaching people skills because, I don't know, it feels like there's a lot of trainings for engineering skills, but not for people skills. I've definitely like, I was raised by parents who were weird and homeschooled me. So I definitely use a lot of stuff like books to learn people skills and stuff like that. I don't know. It's super basic, but How to Win Friends and Influence People, that one. You could just read that. I mean, it gets you some of the way there. So I wish there were more resources like that. REIN: Yeah. I would say that you can learn people skills, but companies don't teach them. That's not what companies think is part of their responsibility. They think that they're hiring the person as they are and can teach them technical things. That's another problem, which is that companies aren't providing the opportunities to grow that people need. JACOB: There's probably different people skills for different companies, that would be successful. MELISSA: Yeah, and it's the same thing. It's the saying that I've heard at workplaces, like, “Oh, he doesn't know how to code.” I've also heard the same thing like, “He has no social skills.” It's like something you're born with and can't be changed and that's just your lot in life and I don't believe that. I was homeschooled and when I first went to school, you would have said, “I had poor social skills,” but now I have serviceable social skills. 
JACOB: I think Casey pointed out an important distinction between a disposition to be personable and learn and apply people skills versus the skills you have at a particular moment. I, as a neurodiverse person, I think that's a really important thing because I'm sure people have said behind my back many times in my life that I don't have people skills without commenting on the disposition of my ability to do well and interface with people. I think they’re two different things. MELISSA: I think neurodiverse people—I'm also in that category—also sometimes are even better at certain people skills because we've been told we have these issues and we really want to think about them. I've read a lot of books; I don't think most neurotypical people have read as many books as I have on human psychology. I wasn't a psychology major—I just want to know why are these normal people trying to get me to do these things? What does it mean? That's a level I’m asking? Yeah, but that's a skill and it's a learned skill that is valuable to me. REIN: Can we talk about unions again because I have a question? If you already talked about this before I got here, just let me know. But my question is: what would you say to someone who really has no idea how to get started with this, but thinks that there's an opportunity to organize their company, is worried about retaliation and things like that, and wants to get started? MELISSA: Yeah. Get in contact with, they could DM me and I could connect them to people at the current Glitch union, or two, you can approach a union directly. CWA is happy to help. The union that Kickstarter organizers worked under, OPEIU, I think is also another option. It can be hard to pick a union because some only do local organizing, but there are some that are national like CWA. CWA has a lot of resources. I would just go with them at first. But you can always do your research and stuff. 
I'd just be careful with people who direct you to those petition sites, or whatever and that did happen to me. REIN: And don’t do your organizing on the company Slack. MELISSA: Oh yeah, for sure. Use Signal, don't do it on company time when you're supposed to be working, build social relationships with people at work. Although, it could be, I don’t know if – I was a member of a company where they specifically seemed to discourage social relationships. I was a contractor. I wonder if that was a way that they were discouraging organization and unionizing. You see that with Uber and stuff like that. Uber drivers, they're not given a company Slack, or whatever, or even like, they don't have a way to chat with other drivers. They've had to do this on their own time on Facebook; they've used Facebook to organize. So definitely don't use any company resources, or company time. You're not legally protected if you do that. If you do contact CWA and stuff, they'll tell you what's legal and illegal. It is for example, legal to organize during lunch, I believe, but you should definitely check that beforehand. And then you get into issues if you're remote, time zones, everyone has lunch at a different time. You have to be creative. REIN: Yeah. It turns out it's legal, except for all of these loopholes that make it not legal and companies are incentivized to make the case that what you did was illegal so that they can fire you. So just be extra careful. MELISSA: Yeah. I don't know. I've known of union organizers that they're going to find a way to fire some of them, but if you can stand up and up in your job, you're harder to fire. Make sure to attend all your meetings. Don't be late to work. I am not a fan of that and I think it's very unfair that you have to be expected to live by this perfect standard that non-organizing employees don't have to follow, but I'm willing to do it for the union. REIN: Yeah. 
I mean, just be aware that once it becomes apparent that this is what you're doing, they're going to try to fire you—any company will—and so you need to be on your best behavior even more so than you were before. MELISSA: And it is scary organizing unions. I've often wondered would I have been laid off if there was a union, or not? I don't know. But the thing is they negotiated severance for me and I didn't have to do that individually. So it gave me a good cushion when I was laid off and I know people who were laid off who didn't have those things. A company can hurt you even if you don't unionize and at least, unions give you some protection and I'm very grateful to CWA for negotiating my severance. REIN: So are we getting close to reflections? CASEY: I think it is time for reflections. I can go first. As a product manager and engineering manager before, I've always been interested in being part of a union and it's awesome hearing a success story about what happened at a company, even though it was the formal type that I'm not eligible for as a manager. But now I'm very interested in looking up some of these alternative forms like sectoral bargaining and minority unions. I think there's a whole lot happening recently that could help middle managers like me and a lot of my roles have the benefits. Often, I hear, “No, you can't possibly ever be part of a union. Why would you even ask that question?” And it's just great to hear someone who has actually worked with a union say, “No, that's possible. It's just a different form. Not covered by laws.” That's what I want to hear. That's what I wanted to believe. MELISSA: Yeah. It's so unfair. Unions are just what's the law now doesn't have to be the law tomorrow, for example and different countries have different forms of unions and stuff, so. 
JACOB: I'm thinking more about the thread we got on about personal skills, people skills and I'm thinking more about how those can be really just a function of the culture of your team and who's on it and what everyone's individual needs are and how their brains are wired and so many other factors. I'm just thinking about, “Well, what are the right skills that I need for my team rather than just an arbitrary, or a universal list of what those skills might be?” MELISSA: Yeah. I'm thinking I need to like – I'm here talking about unions and there's so much I don't know about unions. I'd like to study unions in other countries, especially. I really want to learn about different forms of unionization and really delve into the history of unionization. I've done it a little. I was never taught that much about unionization in school and stuff like that, especially from homeschool because my parents were anti-union. But even when I went to public school, after being homeschooled, we really didn't talk about it that much. I know about the Triangle Shirtwaist Fire, but I think for most people, we don't know that much about it and I definitely want to beef up my history and international knowledge on that. REIN: Yeah. I think also looking into collectivization, work around collectives, things like that. There's a tech consultancy that does the websites for Verso and Haymarket and some other lefty publications and they're a workers' collective and there are actually a surprising number of them. MELISSA: Yeah. That's super interesting to me. I've done a little bit of co-ops and stuff. I've been a member of co-ops. There is an interesting article, I forget where I saw it, but it was about how co-ops can be good, but they're not the answer to worker organizing because often they replace worker unionization. 
For example, they were talking about this coffee shop that they were trying to unionize and they all got fired and then they formed a co-op and that was seen as success, but it's not necessarily. For example, I'm a member of a co-op, a food co-op, and the workers there were trying to unionize and the co-op was union busting them and that was like, wow, that is really special and as a member of the co-op, I was writing to the board. I was like, “How dare you, I'm going to quit.” [chuckles] We should recognize the union. They really fought that union and I was like, “This is supposed to be – co-op is supposed to be empowering to workers,” but just like unions, there are many different forms of co-ops. There's a very interesting history, especially internationally and I don't even know the tip of the iceberg on that. But I'm very fascinated, having been in co-ops and been involved with co-ops. Another issue with co-ops is often the membership; they can be almost like trade unions in that they can require an onerous process to join one. REIN: I think the thing I'd like to leave our listeners with, you might've heard the saying, “An injury to one is an injury to all,” and you might know that that comes from the IWW, I believe. But you might not know that it comes from the preamble to their constitution, which says in part, “Trade unions foster a state of things which allows one set of workers to be pitted against another set of workers in the same industry, thereby helping defeat one another in wage wars. Trade unions aid the employing class to mislead workers into the belief that the working class have interests in common with their employers. 
These sad conditions can only be changed and the interest of the working class upheld only by an organization formed in such a way that all of its members in any one industry, or in all industries, if necessary, cease work whenever a strike or lockout is on.” So the IWW obviously believes very strongly, you have to organize whole companies and not just the techies maybe get their union because they're special. I mean, can you imagine if Uber, if the tech workers and the drivers unionized together? They share the same interests, folks; they could do that. MELISSA: Yeah. That's an interesting question. Like, could they? That's another thing that contracting, or permalancing, I don't know, maybe there'll be a major court challenge, especially with the Biden administration where the National Labor Board might be more sympathetic. Can contractors unionize with regular workers? Contracting is a way to bust unions and to keep people in a position of precarity, but what if they ruled that you can unionize? Once you realize that’s arbitrary, you're like, “Oh, if you've got good enough lawyers, if you have politicians that can get involved, maybe unionization 10 years from now will look really different because maybe they –” REIN: Yeah, the main difference is that the drivers don't have a multi-million dollar lobbying organization that they're backed by. That's the main difference and the reason they're not getting the respect they deserve. Special Guest: Melissa McEwen.
Principal Developer Advocate for SRE & Observability at Honeycomb, Liz Fong-Jones, talks about common questions she regularly gets from companies and clients looking to implement observability into their workflow and defines observability as a mechanism in order to improve the operability of systems. Liz says we shouldn’t talk about observability in a vacuum and that instead, it's a technique for analyzing production data that goes hand in hand with other production techniques, team philosophies, and development methodologies. Should you find a burning need to share your thoughts or rants about the show, please spray them at devrel@newrelic.com. While you’re going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you’d like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.
Liz Fong-Jones speaks about her upcoming talk at PagerDuty Summit, on 23 September, as well as other topics including how Honeycomb is handling the COVID pandemic and how to navigate work-life balance when home is work.
Liz Fong-Jones is the Principal Developer Advocate at Honeycomb, a company that helps developers visualize, understand, and debug software. Prior to joining Honeycomb, Liz worked at Google for over 11 years, wearing many different hats over that period, including Staff Developer Advocate, Staff Site Reliability Engineer, and Site Reliability Engineering Manager. Join Corey and Liz as they discuss why people either love or hate Honeycomb, how Honeycomb has been pretty awesome to Corey over the years, why Liz left Google after an 11-year run, what Liz’s opinions on AWS and GCP are, how nobody really has a good understanding of AWS’ offering, why Liz doesn’t think anyone has to worry about GCP being deprecated, what boards and VCs tend to do once they hear the word “union,” how there isn’t an ethical leader among cloud providers, and more.
Today on the podcast, Jon Foust is back with Mark Mirchandani as we talk about SLOs and the importance of measuring service reliability with Alex Bramley. As a member of the Google SRE team, Alex and his coworkers help customers optimally run their services on Google Cloud. They collaborate with the client, weighing client needs and user needs to develop a plan that is affordable, efficient, and has the highest reliability for the user. Recently, they’ve been working to automate functions such as detection of outages, so that Google and the customer can work together quickly to get everything working smoothly again. Later, Alex, describes the steps developers go through at his workshop, The Art of SLOs, which was designed to help companies measure and improve reliability. At this workshop, attendees are encouraged to set SLO targets and error budgets. They are given theoretical reliability problems to solve, allowing them to practice without the added pressure of messy, real-world problems. The Art of SLOs helps developers understand what measurements are beneficial and why and the best way to implement projects that can take those measurements accurately. Alex was able to make the materials for the workshop free online! Alex Bramley Alex Bramley joined Google in January 2010 as the first Mobile SRE in London, after IBM bought the startup he enjoyed working for and made it much less fun. He spent around 7½ years in various reincarnations of Mobile/Android/Play SRE, looking after the infrastructure that makes phones smart, keeps them up to date, and provides them with countless distracting apps. CRE offered an interesting opportunity to do something different and learn from a bunch of very smart senior people, and Alex has not regretted taking the leap into the unknown. Much of his time recently has been spent rethinking how people teach customers, partners and the general public about SLOs. 
He helped create the Coursera course on measuring and managing reliability and developed what became the Art of SLOs for Liz Fong-Jones to deliver with other Google SREs at SREcon EMEA’18. Alex works four days a week so he can (suffer) enjoy looking after his children on Wednesdays, listen to cheerful music, and waste a lot of time playing computer games and occasionally writing code. Cool things of the week Postponing Google Cloud Next ’20: Digital Connect blog New research: How effective is basic account hygiene at preventing hijacking blog Simplified global game management: Introducing Game Servers blog Interview The Art of SLOs site CRE Life Lessons blog Putting customers first with SLIs and SLOs blog Putting customers first with SLIs and SLOs (Part 2) blog Measuring and Managing Reliability course Site Reliability Engineering books Question of the week How do I get started with GCGS? docs Google for Games Developer Summit Keynote video Google for Games Developer Summit Playlists videos Where can you find us next? Jon will be working on an Open Match sample project for the developer community. Mark will be making more videos like Error Reporting and error logging - Stack Doctor.
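For listeners new to the SLO targets and error budgets mentioned in the episode above, the arithmetic can be sketched in a few lines. This is a minimal illustration and not material from the workshop: the function names, the 30-day window, and the request counts are all assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of total unavailability an availability SLO permits in the window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Illustrative numbers only.
    print(f"{error_budget_minutes(0.999):.1f} minutes of budget in 30 days")
    print(f"{budget_remaining(0.999, 1_000_000, 250):.0%} of the budget remains")
```

Under these assumptions, a 99.9% availability target over 30 days leaves roughly 43.2 minutes of budget, and 250 failures out of a million requests would spend about a quarter of it.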
What is Honeycomb.io? From the site: “Honeycomb is a tool for introspecting and interrogating your production systems. We can gather data from any source—from your clients (mobile, IoT, browsers), vendored software, or your own code. Single-node debugging tools miss crucial details in a world where infrastructure is dynamic and ephemeral. Honeycomb is a new type of tool, designed and evolved to meet the real needs of platforms, microservices, serverless apps, and complex systems.” SSH 2FA gist https://gist.github.com/lizthegrey/9c21673f33186a9cc775464afbdce820 Honeycomb.io for digging into access logs & retracing what pentesters do. Check out our Store on Teepub! https://brakesec.com/store Join us on our #Slack Channel! Send a request to @brakesec on Twitter or email bds.podcast@gmail.com #Brakesec Store!:https://www.teepublic.com/user/bdspodcast #Spotify: https://brakesec.com/spotifyBDS #RSS: https://brakesec.com/BrakesecRSS #Youtube Channel: http://www.youtube.com/c/BDSPodcast #iTunes Store Link: https://brakesec.com/BDSiTunes #Google Play Store: https://brakesec.com/BDS-GooglePlay Our main site: https://brakesec.com/bdswebsite #iHeartRadio App: https://brakesec.com/iHeartBrakesec #SoundCloud: https://brakesec.com/SoundcloudBrakesec Comments, Questions, Feedback: bds.podcast@gmail.com Support Brakeing Down Security Podcast by using our #Paypal: https://brakesec.com/PaypalBDS OR our #Patreon https://brakesec.com/BDSPatreon #Twitter: @brakesec @boettcherpwned @bryanbrake @infosystir #Player.FM : https://brakesec.com/BDS-PlayerFM #Stitcher Network: https://brakesec.com/BrakeSecStitcher #TuneIn Radio App: https://brakesec.com/TuneInBrakesec
Ms. Berlin's appearance on #misec podcast - https://www.youtube.com/watch?v=Cj2IF0zn_BE with @kentgruber and @quantissIA Blog post: https://www.honeycomb.io/blog/incident-report-running-dry-on-memory-without-noticing/ What is Honeycomb.io? From the site: “Honeycomb is a tool for introspecting and interrogating your production systems. We can gather data from any source—from your clients (mobile, IoT, browsers), vendored software, or your own code. Single-node debugging tools miss crucial details in a world where infrastructure is dynamic and ephemeral. Honeycomb is a new type of tool, designed and evolved to meet the real needs of platforms, microservices, serverless apps, and complex systems.” What are SLOs and how do you establish them? Are they anything like SLAs (service level agreements)? Can you give us an idea of timeline? Length of time from issue to IR to resolution? Are the dashboards mentioned in the blog post your operations dashboard? [nope! hashtag no-dashboards] Leading and lagging indicators (IT and infosec call them detection and mitigation indicators) https://kpilibrary.com/topics/lagging-and-leading-indicators How important is telemetry (or meta-telemetry, since it’s telemetry on telemetry, if I’m reading it right --brbr) in making sure you can understand issues? Do you have levels of escalation? How do you define those? When you declared an emergency, how did brainstorming help with addressing the issues? Did that help your org see the way to a proper fix? Did you follow any specific methodology? Did you have a warroom or web conference? Communications: https://twitter.com/lizthegrey/status/1192036833812717568 Can being overly transparent be detrimental? Communication methods in an IR: Slack Phone Tree Ticket system Emails What does escalation look like for Ms. Berlin? Mr. Boettcher? (stories or examples?)
Confirmation bias (or “it’s never in our house”) fallacy “I’ve seen and been a part of that, very prevalent in IT” --brbr Especially when the bias is based on previous outages/issues From the blog: “We quickly found ourselves locked in a state of confirmation bias…” Root Cause Analysis: Once you diagnosed the issue, how quickly was a fix pushed out? What kind of documentation or monitoring was generated/added to ensure this won’t happen again?
In episode 13 of O11ycast, Charity Majors and Liz Fong-Jones talk with Natalie Bennett, Software Engineering Manager at Pivotal. They discuss the difference between collaborative projects and teams, continuous verification, and diagnosing failed deployments.
In episode 13 of O11ycast, Charity Majors and Liz Fong-Jones talk with Natalie Bennett, Software Engineering Manager at Pivotal. They discuss the difference between projects and teams, continuous verification, and diagnosing failed deployments. The post Ep. #13, Cloud Wrangling with Natalie Bennett of Pivotal appeared first on Heavybit.
In episode 12 of O11ycast, Charity Majors and Liz Fong-Jones speak with Rich Archbold of Intercom. They discuss the crucial importance of timely shipping, high-cardinality metrics, and the engineering value of running less software. The post Ep. #12, Speed of Deployment with Rich Archbold of Intercom appeared first on Heavybit.
Bridget and Matty chat with guests Liz Fong-Jones, Alice Goldfuss, and RJ Williams, in front of a live studio audience at devopsdays Minneapolis 2019.
In episode 11 of O11ycast, Charity Majors and Liz Fong-Jones speak with Gremlin chaos engineer Ana Medina. They discuss the relevance of breaking things in order to engineer them more efficiently, monitoring vs observability, and chaos engineering at scale. The post Ep. #11, Chaos Engineering with Ana Medina of Gremlin appeared first on Heavybit.
In episode 10 of O11ycast, Charity Majors and Liz Fong-Jones speak with Bugsnag CEO James Smith. They discuss the seemingly impossible ways an organization can measure technical debt and how they can attempt to reduce it. The post Ep. #10, Measuring Technical Debt with Bugsnag CEO James Smith appeared first on Heavybit.
BTSW has shared quitting as a viable tactic on previous episodes. If you can’t change your workplace, find a new one. Eula believes in this tactic: she's a prolific quitter. Jeannie doesn’t; she has literally never quit a job. So when is quitting a wise tactical choice? And what happens when you *DO* quit just to discover your new work reality is just a different kind of toxic? And honestly, if “quitting” is a viable tactic, why isn’t “just suck it up and deal with it” a viable tactic too? Liz Fong-Jones thought a lot about this when she quit her job at Google, and she shares her insights on how to quit strategically. Love this podcast? Support KUOW Public Radio and BTSW by donating here: https://kuow.org/donate/btsw?utm_source=shownotes&utm_campaign=btswbonus
This week, former Alphabet engineer and activist Liz Fong-Jones joined to discuss the challenges of working as a trans person at Alphabet and how the tech industry can better protect the rights of vulnerable employees. Larry Jasinski, the CEO of ReWalk, came on to talk about his company's work using robotics to help paraplegics and stroke victims walk again. Then Reddit and Initialized Capital co-founder Alexis Ohanian joined to talk about his new mission: fighting for paid paternity leave.
Description Liz Fong-Jones is a Site Reliability Engineer and former high-ranking Google employee who used her talents as a coder and some strategic planning to take her out of a tumultuous early home life and into the life of a multi-millionaire. Arlan and Liz met up in New York City to speak candidly about life, liberty, and the pursuit of happiness. Links: Twitter Website Sponsor This episode is sponsored by DigitalOcean. Apply for DigitalOcean Hatch. (To inquire about sponsoring the YFM podcast, please get in touch.) Mentions Effective Altruism Credits Producer and Editor: Anna Eichenauer Senior Producer: Bryan Landers Additional audio mixing and mastering: Alfred ‘Rook‘ Hamilton Additional production: Chacho Valadez Executive producer: Arlan Hamilton Music by Jeff Kaale (1, 2, 3, 4)
Liz Fong-Jones has been a Site Reliability Engineer for 15 years. She’s a former Googler and is now an SRE and observability advocate at Honeycomb.io. Liz evangelizes “Building Excellent Systems” that look out both for the people who use the system and for the people who work on it. In this fascinating episode, she talks with Ledge about wide-ranging ethical issues in engineering, and about how it’s a business advantage to say you’re an ethical engineer.
Podcast Description “We have to stand up for what’s right and we have to do that in an intersectional way. We can’t just stand up for ourselves, we have to build that solidarity. We don’t have the numbers alone. If we work together we can actually accomplish change.”
Liz Fong-Jones is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 15+ years of experience. She is an advocate at Honeycomb.io for the SRE and Observability communities, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.
She lives in Brooklyn with her wife, metamours, and a Samoyed/Golden Retriever mix, and in San Francisco and Seattle with her other partners. She plays classical piano, leads an EVE Online alliance, and advocates for transgender rights as a board member of the National Center for Transgender Equality.
Additional Resources
Liz's Website
National Center for Transgender Equality
Transgender Law Center
Meet Demma Rosa Rodriguez, Head of Equity Engineering at Google
A Trans Woman Was Charged With 'False Personation' for Giving the NYPD Her Real Name
The NAACP Statement On the North Carolina Legislature's HB2 Repeal Proposal
Posture Magazine
Twitter Liz Fong-Jones
Become a #causeascene Podcast sponsor because disruption and innovation are products of individuals who take bold steps in order to shift the collective and challenge the status quo.
Transcript
Kim Crayton: 00:00 Today's episode is supported by Tito. Tito is a design-led event software that makes it super easy to manage tickets for events. The product aims to be clean, simple and intuitive to use while the business aims to be sustainable at the go, driven by people and principles rather than growth at all costs. To learn more, visit their website at ti.to
n/a: 00:24 [music]
Kim Crayton: 00:42 Welcome to the #causeascene podcast, the show focused on the strategic disruption of the status quo in technical organizations, communities and events.
n/a: 00:53 [music]
Kim Crayton: 00:55 Hello everyone and welcome to today's episode of the #causeascene podcast. My guest today is someone I've been watching and following and have been admiring for a while, so I'm really excited about having her on the show. So I'll let her introduce herself.
Liz Fong-Jones: 01:09 Hi, I'm Liz Fong-Jones and I am a developer advocate, Site Reliability Engineer, and an advocate for employee rights and for ethical product design.
Kim Crayton: 01:21 So we're going to start this right off. And so could you tell me why is it important to cause a scene and how are you causing a scene, Liz?
Liz Fong-Jones: 01:28 I think it's important to cause a scene because the status quo is just not working for some people. And I think that we have to make sure that the world is more fair and that means that we have to disrupt the status quo. And the way that I'm disrupting the status quo, in part, is if you all saw the Google Walkout from last November when 20,000 Google employees and contractors walked out of the Google offices worldwide to protest sexual harassment. What I want to see is I want to see more of that. And in order to see more of that, it means that people that don't necessarily feel safe or supported to go on strike can go on strike, right? The people that uh, are on H-1B visas, people that are contractors who are afraid that their bosses are going to retaliate against them, right? Like all of these groups of people can't necessarily easily engage in industrial act...
Happy Holidays, everyone! Melanie and Mark wrap up a great year by reminiscing about some of their favorite episodes! We also talk about the big news of the year, our favorite articles, and what’s coming up for the GCP Podcast in 2019. Cool things of the week Kubernetes and GKE for developers: a year of Cloud Console blog Reducing gender bias in Google Translate blog Cloud Security Command Center is now in beta and ready to use blog Main content Podcast accomplishments! We have awesome new intro and outro music, new website, new YouTube videos! We hit 1 million and then 2 million downloads! Mark and the podcast are celebrating their three year anniversary! Top 10 most downloaded episodes of all time! GCP Podcast Episode 111: Google Cloud Platform with Sam Ramji podcast GCP Podcast Episode 112: Percy.io with Mike Fotinakis podcast GCP Podcast Episode 146: Google AI with Jeff Dean podcast GCP Podcast Episode 127: SRE vs Devops with Liz Fong-Jones and Seth Vargo podcast GCP Podcast Episode 128: Decision Intelligence with Cassie Kozyrkov podcast GCP Podcast Episode 113: Open Source TensorFlow with Yifei Feng podcast GCP Podcast Episode 88: Kubernetes 1.7 with Tim Hockin podcast GCP Podcast Episode 108: Launchpad Studio with Malika Cantor and Peter Norvig podcast GCP Podcast Episode 130: Data Science with Juliet Hougland and Michelle Casbon podcast GCP Podcast Episode 125: Open Source at Google Cloud Platform with Sarah Novotny podcast Top 10 most downloaded episodes for 2018! Exact same list except Tim Hockin is not #7. Following episodes go up a number and we added to #10 spot. 
GCP Podcast Episode 122: Project Jupyter with Jessica Forde, Yuvi Panda and Chris Holdgraf podcast Mark’s favorite episodes GCP Podcast Episode 129: Developer Relations with Mandy Waite podcast GCP Podcast Episode 121: Kontributing to Kubernetes with Paris Pittman and Garrett Rodrigues podcast GCP Podcast Episode 131: Actions on Google with Mandy Chan podcast GCP Podcast Episode 148: Wellio with Sivan Aldor-Noiman and Erik Andrejko podcast GCP Podcast Episode 110: CPU Vulnerability with Matt Linton and Paul Turner podcast GCP Podcast Episode 125: Open Source at Google Cloud Platform with Sarah Novotny podcast GCP Podcast Episode 140: Container Security with Maya Kaczorowski podcast Melanie’s favorite episodes GCP Podcast Episode 117: Cloud AI Fei-Fei Li was the Chief Scientist of AI/ML at Google podcast GCP Podcast Episode 114: ML Bias & Fairness with Timnit Gebru and Margaret Mitchell podcast GCP Podcast Episode 141: Accessibility in Tech podcast GCP Podcast Episode 136: Robotics with Raia podcast GCP Podcast Episode 150: Strange Loop, Remote Working, and Distributed Systems with KF podcast DL Indaba GCP Podcast Episode 147: DL Indaba: AI Investments in Africa podcast GCP Podcast Episode 149: Deep Learning Research in Africa with Yabebal Fantaye & Jessica Phalafala podcast GCP Podcast Episode 152: AI Corporations and Communities in Africa with Karim Beguir & Muthoni Wanyoike podcast GCP Podcast Episode 157: NeurIPS and AI Research with Anima Anandkumar podcast Favorite announcements, products, and more at Google Cloud Unity and Google Cloud Strategic Alliance blog Open Match blog Cloud TPU site Google Dataset Search is in beta site No tricks, just treats: Globally scaling the Halloween multiplayer Doodle with Open Match on Google Cloud blog GKE On-Prem site Open Source - Knative release, Skaffold, Istio updates, gVisor, etc. 
Google in Ghana blog Cloud NEXT blog GCP Podcast Episode 137: Next Day 1 podcast GCP Podcast Episode 138: Next Day 2 podcast GCP Podcast Episode 139: Next Day 3 podcast Unity and DeepMind partner to advance AI research blog Introducing PyTorch across Google Cloud blog Question of the week What were your personal highlights for 2018? Mark Agones Introducing Agones: Open-source, multiplayer, dedicated game-server hosting built on Kubernetes blog github The new website Having Melanie join me on the podcast Melanie Bringing Francesc back Meeting Grace GCP Podcast Episode 142: Agones With Mark Mandel and Cyril Tovena podcast Where can you find us next? It’s the holidays! Special thanks! Thank you guests Thank you Jennifer Thank you HD Interactive: James, Trae, Sabrina, and Sean Thank you Greg Thank you Neil, Chuck, and Shana Thank you MBooth for the website overhaul and social media support Thank you Francesc Thank you listeners!
In episode 6 of O11ycast, Charity and Rachel talk to Google developer advocate Liz Fong-Jones about ways to build systems to be more transparent and explainable. The post Ep. #6, Customer Reliability Engineering with Google’s Liz Fong-Jones appeared first on Heavybit.
Mark and Melanie speak with Patrick Flynn and Mike Eltsufin about their exciting new Java products for Google Cloud. Mike tells us all about the new Spring Cloud GCP, a helpful tool that integrates Google Cloud Platform APIs and the Spring Framework. Patrick elaborates on his team’s new tool, Jib, a Java container image builder, and how it helps Java developers. Patrick Flynn Patrick Flynn is a long time Java developer who spent many years in Google Ads, and is now four years into being the tech lead of the Google Cloud Java Tools team. Mike Eltsufin Mike Eltsufin was an enterprise Java application developer in the banking sector for over a decade before joining Google. Currently, he’s the tech lead of the Cloud Java Frameworks team, focusing on bringing the goodness of Spring Boot to Google Cloud Java developers. Cool things of the week Introducing container-native load balancing on Google Kubernetes Engine blog Simplifying cloud networking for enterprises: announcing Cloud NAT and more blog Store it, analyze it, back it up: Cloud Storage updates bring new replication options blog Postmortems and Retrospectives with Liz and Seth video GCP Podcast Episode 127: SRE vs Devops with Liz Fong-Jones and Seth Vargo podcast Interview App Engine site Kubernetes Engine site Spring Framework site Spring Boot site Spring Cloud GCP site Spring Cloud GCP on GitHub site Cloud Pub/Sub site Spanner site Cloud SQL site Cloud Datastore site Docker site Jib on GitHub site Cloud Tools for IntelliJ Documentation site Introducing Jib — build Java Docker images better blog Bazel site Skaffold on GitHub site Netty site SpringOne site Knative and riff for Spring Developers video Jib Gitter site Sig Apps site Kubernetes Slack site Codelabs site Question of the week What if we have an object in Google Cloud Storage, and I want to automatically change an aspect of it – such as: Downgrade the storage class of objects older than 365 days to Coldline Storage.
Delete objects created before January 1, 2013. Keep only the 3 most recent versions of each object in a bucket with versioning enabled. Managing Object Lifecycles docs and guide Where can you find us next? Patrick’s team will be at KubeCon Shanghai and Oracle Code One and he will be at KubeCon Seattle Mark will be at KubeCon in December. Melanie will be at Twilio Signal $BASH event on Thursday and SOCML in November.
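The three lifecycle rules named in the question of the week above can be sketched as a single configuration document. This is an illustrative sketch, not taken from the episode: the field names (`SetStorageClass`, `age`, `createdBefore`, `numNewerVersions`) follow the Cloud Storage lifecycle JSON format as best recalled, and the helper name `build_lifecycle_config` is hypothetical, so check both against the Managing Object Lifecycles docs before applying anything to a real bucket.

```python
import json


def build_lifecycle_config() -> dict:
    """Build a lifecycle configuration covering the three example rules.

    Field names are assumptions based on the Cloud Storage lifecycle JSON
    format; verify them against the official documentation before use.
    """
    return {
        "rule": [
            {
                # Downgrade objects older than 365 days to Coldline Storage.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": 365},
            },
            {
                # Delete objects created before January 1, 2013.
                "action": {"type": "Delete"},
                "condition": {"createdBefore": "2013-01-01"},
            },
            {
                # In a versioned bucket, delete any version that already has
                # three newer versions, leaving the three most recent.
                "action": {"type": "Delete"},
                "condition": {"numNewerVersions": 3},
            },
        ]
    }


if __name__ == "__main__":
    # Print the configuration as JSON, e.g. to save it to a file.
    print(json.dumps(build_lifecycle_config(), indent=2))
```

Building the configuration in code and serializing it keeps the rules reviewable in version control; the resulting JSON would then be applied with whichever Cloud Storage tool or client library you already use.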
It’s the third and final day for us at NEXT, and Mark and Melanie are wrapping up with some great interviews! First, we spoke with Stephanie Cueto and Vivian San of Techtonica, a San Francisco non-profit. Next, Liz Fong-Jones and Nikhita Raghunath joined us for a quick discussion about open source and Stackdriver. Last but not least, Robert Kubis helped us close things out by sharing what it means to do DevRel at this event. Stephanie Cueto and Vivian San Stephanie Cueto is a Software Engineer and advocate for the Latinx & women community. She has been involved in the Tech community since 2016. Playing with code at an early age and working in education led to her interest in becoming a Software Engineer. Currently she is a Software Engineer Apprentice at Techtonica, where she has gained the skills to build projects in MongoDB, MySQL, Express.js, React, and Node.js. During the program, she created Salient Alert, a platform for reporting ICE Raids and Checkpoints. Vivian San is a highly analytical full-stack software engineer with an educational background in the hard sciences. She is strongly motivated by writing clean, efficient code, and passionate about teaching and giving back to underrepresented individuals and communities. Liz Fong-Jones and Nikhita Raghunath Liz Fong-Jones is a Staff Site Reliability Engineer at Google and works on the Google Cloud Customer Reliability Engineering team in New York. In her 10+ years at Google she has worked across eight different teams spanning the stack from Google Flights to Cloud Bigtable. She lives with her wife, metamour, and a Samoyed/Golden Retriever mix in Brooklyn. In her spare time she plays classical piano, leads an EVE Online alliance, and advocates for transgender rights. Nikhita Raghunath is an intern at Red Hat and works on the extensibility of Kubernetes. Previously, she was a Google Summer of Code (2017) student for the Cloud Native Computing Foundation (CNCF) and also worked on Kubernetes.
She is interested in backend applications, distributed systems and Linux. Nikhita likes programming in Go, C++, C, and Python. She also likes to give talks at conferences and speak about her work. Robert Kubis Robert Kubis is a developer advocate for the Google Cloud Platform based in London, UK, specializing in container, storage, and scalable technologies. Before joining Google, Robert collected over 10 years of experience in software development and architecture. He has driven multiple full-stack application developments at SAP with a passion for distributed systems, containers, and databases. In his spare time he enjoys following tech trends, trying new restaurants, traveling, and improving his photography skills. Interviews Made Here Together: NEXT Developer Keynote video Techtonica site I am Remarkable Workshop site Haben Girma’s accessibility presentation at NEXT video GCPPodcast Episode 127: SRE vs Devops with Liz Fong-Jones and Seth Vargo podcast Red Hat site Kubernetes site Introducing Agones blog Stackdriver site OpenCensus site GCPPodcast Episode 118: OpenCensus with Morgan McLean and JBD podcast Edge TPU site GCPPodcast Episode 135: VirusTotal with Emi Martínez podcast Cloud Spanner site
Some companies that offer services expect you to do things their way or take the highway. However, Google expects people to simply adapt the tech company’s suggestions and best practices for their specific context. This is how things are done at Google, but this may not work in your environment. Today, we’re talking to Liz Fong-Jones, a Senior Staff Site Reliability Engineer (SRE) at Google. Liz works on the Google Cloud Customer Reliability Engineering (CRE) team and enjoys helping people adapt reliability practices in a way that makes sense for their companies.
Some of the highlights of the show include:
Liz figures out an appropriate level of reliability for a service and how a service is engineered to meet that target
Staff SRE involves implementation, and then identifying and solving problems
Google’s CRE team makes sure Google Cloud customers can build seamless services on the Google Cloud Platform (GCP)
Service Level Objectives (SLOs) include error budgets, service level indicators, and key metrics to resolve issues when technology fails
Learn from failures through instant reports and shared post-mortems; be transparent with customers and yourself
GCP: Is it part of Google or not? It’s not a division between old and new.
Perceptions and misunderstandings of how Google does things and how it’s a different environment
Google’s efforts toward customer service and responsiveness to needs
Migrating between different Cloud providers vs. higher level services
How to use Cloud machine learning-based products
GCP needs to focus on usability to maintain a phase of growth
Offer sensible APIs; tear up, turn down, and update in a programmatic fashion
Promotion vs. Different Job: When you’ve learned as much as you can, look for another team to teach something new
What is Cloud and what isn’t? Cloud deployments require SRE to be successful but SREs can work on systems that do not necessarily run in the Cloud.
Links: Cloud Spanner Kubernetes Cloud Bigtable Google Cloud Platform blog - CRE Life Lessons Google SRE on YouTube
This week is a clash of titans! Liz Fong-Jones and Seth Vargo join Mark and Melanie, to battle out on which is better: SRE or Devops (hint - everyone wins!). Liz Fong-Jones Liz is a Staff Site Reliability Engineer at Google and works on the Google Cloud Customer Reliability Engineering team in New York. She has worked on services ranging from Google Flights to Cloud Bigtable in her 10+ years at Google. She lives with her wife, metamour, and a Samoyed/Golden Retriever mix in Brooklyn. In her spare time, she plays classical piano, leads an EVE Online alliance, and advocates for transgender rights. Seth Vargo Seth Vargo is a Developer Advocate at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and a few Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. Seth is an active member of the DevOps community and has written thought-leader-y pieces such as the 10 Myths of DevOps. Cool things of the week Google I/O session youtube What’s new in Firebase at I/O 2018 blog Introducing ML Kit for Firebase blog Jeff Dean is new Head of AI wired Introducing Cloud Memorystore: A fully managed in-memory data store service for Redis blog Google Group Issue tracker Interview class SRE implements DevOps youtube series DevOps wikipedia Site Reliability Engineering (SRE) site Terraform site Chef site Puppet site Ansible site SaltStack site Prometheus site Datadog site Stackdriver site The Site Reliability Workbook: Practical Ways to Implement SRE amazon Seeking SRE o’reilly Customer Reliability Engineering Blog Series blogs Question of the week I’m a researcher at a regionally accredited academic institution and I need compute resources. Does Google Cloud have any programs that can help me out? Google Cloud Platform announces new credits program for researchers blog faq Where can you find us next? 
Mark will be speaking at the Monthly SF Game Development Community, presenting on You Can’t Just Add More Servers on May the 30th in San Francisco. Melanie is speaking at the Understand Risk Forum on May 17th, in Mexico City.
