Podcast appearances and mentions of emily omier

18PODCASTS
36EPISODES
34mAVG DURATION
1MONTHLY NEW EPISODE
Jul 31, 2024LATEST

POPULARITY

20172018201920202021202220232024

Best podcasts about emily omier

The Business of Open Source

Hosted by emily omier

The New Stack Podcast

3 episodes with emily omier

Fireside with Voxgig

2 episodes with emily omier

Latest podcast episodes about emily omier

Open source as a privilege of successful businesses with Tom Wilkie

The Business of Open Source

Play Episode Listen Later Jul 31, 2024 44:55

This week on The Business of Open Source, I talked with Tom Wilkie, CTO at Grafana Labs. We talked about how he had a 10-month run building a startup before ultimately joining Grafana in an acquisition — why he thought that was the right move at the time and how it's developed since then. But Tom has also had a long career in open source businesses, and we had plenty to talk about. My favorite quote: “I've always seen open source as a privilege of successful businesses, so I want to be a successful business.” At Kausal, Tom's first startup, the focus was on financial sustainability from the beginning, and they had $100k in revenue in 10 months before the acquisition by Grafana. At Grafana Labs, everything is done with an eye on revenue — yes, there are tons of open source projects and tons of investment in those projects, but it has to be tied to revenue. Some other things we talked about: Starting an open source company with the explicit goal of being a successful business, which is not what Tom sees all open source companies doingWhy you should probably start with open source code at the beginning if you intend to open source at all, because otherwise your code will get messy and you'll be too embarrassed to open itHow integrations are the secret sauce that Grafana Labs monetizes — why that it, and how it allows so much code to stay open source without threatening Grafana's financial successChoosing a SaaS strategy versus choosing an enterprise on-prem strategy — and how you need to be aware of what your competitors are doing when choosing which is right for you. Thanks for listening! I'm Emily Omier, a consultant who works with company on open source strategy related to positioning and product management. If you're struggling with your strategy around open source — whether you're unsure how to differentiate in the ecosystem or not sure what to open source — I can help. Learn more here.

business starting businesses privilege saas cto real world open source ithow wilkie founder stories grafana grafana labs emily omier

L'Open Source, une stratégie de croissance - Emily OMIER (Emily Omier Consulting) et Guillaume GRILLAT (Adevinta) - #S06EP20

Paroles de Tech Leaders

Play Episode Listen Later Jul 28, 2024 21:12

interview france tech summit consulting rocks open source visitez guillaume strat nouvel ausha venez rejoignez croissance adevinta emily omier grillat

How to Find Undiscovered (Yet Profitable) Niches

Everyone Hates Marketers | No-Fluff, Actionable Marketing Podcast

Play Episode Listen Later Mar 5, 2024 53:10

Emily is a former journalist and content writer. Through a tragic event, she understood and used her professional and personal experiences and skills to reposition herself, identifying a niche in the tech industry, leading to significant success in her newly chosen field.Emily's story is nothing short of inspiring. This episode is a blend of heartfelt storytelling and hardcore business strategy. In this episode, you'll learn:The essence of positioning in businessHow to transition from offering everything to everyone to finding an undiscovered, yet profitable, nicheHow to identify your skills and leverage them to make moneyResources and Recommendations:Suggestion: Taking an improv class to build confidence and adaptability in business and public speaking."Improvisation and the Theatre" by Keith Johnstone"Thinking in Bets" by Annie DukeTopics Covered:(00:00) - Intro (01:33) - The importance of positioning in business (06:29) - Who can ruin your positioning? (11:06) - Why 99% of business is b*lls (13:10) - Leveraging experience and skills for career evolution (17:46) - Finding and mastering your niche (19:47) - Knowing when to specialize (23:17) - Embracing career changes (31:12) - Navigating niche market risks (45:49) - Emily's top 3 resources ***→ Join 14,000+ weirdos who learn to stand the f*ck out with my daily (Mon-Fri) emails: everyonehatesmarketers.com→ See my pretty face on LinkedIn: https://www.linkedin.com/in/louisgrenier/→ Leave a review on Apple Podcasts: https://apple.co/3p4wL4r→ Leave a review on Spotify: https://open.spotify.com/show/7iEF1qovZZiaP1iRtxGARoFinally...If you're curious about putting your brand in front of my 14,000+ daily newsletter subscribers and/or podcast listeners, email me: louis@everyonehatesmarketers.com

spotify marketing navigating embracing theater branding advertising leveraging profitable bets positioning digital media improvisation niches businesshow undiscovered nichehow louis grenier emily omier

Episode 66: Open Source Founders Summit 05f524, with Emily Omier, Founder

Open Source Underdogs

Play Episode Listen Later Feb 26, 2024 15:04

OSFS Website: https://05f5.com/ Intro Michael: Hello and welcome to Open Source Underdogs! I’m your host Mike Schwartz, and this is a special episode to promote the inaugural Open Source Founder Summit, which is happening in Paris, May 27th and 28th. Talking about this event is Emily Omier, host of The Business of Open Source Podcast,... The post Episode 66: Open Source Founders Summit 05f524, with Emily Omier, Founder first appeared on Open Source Underdogs.

founders business summit open source mike schwartz open source podcast emily omier

Opportunités et défis des produits basés sur l' open source - Emily OMIER (Emily Omier Consulting) et Philippe ENSARGUET (Orange) - #S06EP03

Paroles de Tech Leaders

Play Episode Listen Later Feb 25, 2024 23:00

business interview leadership france tech startups dans orange consulting rocks cto open source femme k2 opportunit rejoignez produits emily omier

Emily Omier and Remy Bertot Talk About Open Source Founders Summit

The Business of Open Source

Play Episode Listen Later Jan 29, 2024 12:22

How can we get founders of open source companies together to share ideas, share strategies and tactics and build a community not just of open source practitioners, but of open source business owners? We create a conference/summit/retreat to bring them together to learn and to work on their businesses together. At least that is the bet that Remy Bertot and I are makingIn this episode, I talked with Remy about Open Source Founders Summit, a summit they're organizing on May 27th and 28th, 2024 in Paris, France — we each shared our motivations for organizing the event, and talked about why we think it's important for people to come together in person. You should listen to the episode, but if you don't want to, the bottom line is that we think there needs to be a space for all open source founders (not just the DevTools, not just the VC-backed) can come together to share business ideas — a place where business, not tech, is the focus. Listen to the episode, and join us in May!

founders business france summit vc real world open source founder stories devtools emily omier

Why We Need To Take More Risks with Emily Omier

Soloist Women

Play Episode Listen Later Dec 7, 2023 41:01

You've made bold moves to get where you are, including having the guts to buck the norm and go solo. But are you still taking new (calculated) risks to build the business and life you've imagined? Consultant to open source start-ups—and confirmed risk taker—Emily Omier believes we need to take more risks.Emily shares her gutsy story and what it's taught her about risk:What happens when you hit rock bottom in your life (while running your business) and must pull yourself out of it.The difference between something that is risky vs. something that feels risky.Why we—women in particular—don't take nearly enough thoughtful risks.The road from “mercenary” (where it's mostly about the money) to collaboration (where it's all about relationships and outcomes).How paying attention to these two emotions will teach you what risks are worth your time and investment.LINKSEmily Omier Website | Podcast | LinkedInRochelle Moulton Email List | LinkedIn | Twitter | InstagramBIOEmily helps open source startups accelerate revenue growth with killer positioning. She writes about entrepreneurship for engineers, and hosts The Business of Open Source, a podcast about building open source companies.BOOK A STRATEGY CALL WITH ROCHELLERESOURCES FOR SOLOISTSThe Soloist Women Mastermind (Apply January 2024) A structured eight-month mastermind with an intentionally small group of hand-picked women soloists grappling with—and solving—the same kinds of challenges. 10 Ways To Grow Revenue As A Soloist (Without Working More Hours): most of us have been conditioned to work more when we want to grow revenue—but what if we just worked differently?The Soloist Women community: a place to connect with like-minded women (and join a channel dedicated to your revenue level).The Authority Code: How to Position, Monetize and Sell Your Expertise: equal parts bible, blueprint and bushido. How to think like, become—and remain—an authority.

business position consultants risks open source monetize emily omier

Episode 208: FOSSY 2023 with Emily Omier

Sustain

Play Episode Listen Later Nov 17, 2023 15:58

Guest Emily Omier Panelist Richard Littauer Show Notes Hello and welcome to Sustain! Richard is in Portland at FOSSY, the Free and Open Source Software Yearly conference that is held by the Software Freedom Conservancy. Today, we chat with Emily Omier, a revenue strategy and positioning consultant who helps open source startups accelerate revenue and community growth. Based in Paris, Emily lends her expertise to primarily European startups, helping them navigate their unique challenges and carve out a profitable strategy. We discuss her approach, which connects perfectly with her marketing background with company and product alignment in the open source space. We also touch on the critical role open source workers play in business profitability. Press download now to hear more! [00:00:47] Emily explains her role as a consultant who works with open source businesses to help them clarify their commercial strategy and positioning. [00:01:24] Emily reveals that she's originally from Portland but currently resides in Paris. She serves both the European and American market and shares why she finds the European ecosystem more interesting. [00:03:00] Richard inquires about Emily's approach to improving profit margins for European startups through open source strategy. Emily explains that her clients are typically companies that have already decided to be open. [00:05:56] Emily tells us that her ideal clients are relatively small startups that have some revenue and a commercial offering. [00:07:21] The topic of marketing comes up and Emily explains that although her background is in marketing, her current role involves various parts of a company, not just marketing. She discusses the importance of knowing the company's identity, understanding the target user for the opens source project, and aligning product development with the company's story. [00:10:06] We find out that Emily works mainly with founders and has never worked directly with a community or an Open Source Program Office (OSPO). She emphasizes the importance of open source workers in big businesses being able to articulate how their work in open source contributes to the company's bottom line. [00:11:45] How did Emily get into this field if she hasn't worked with open source communities? She goes in depth how she was working in marketing with Kubernetes companies in the cloud native sphere, where she found a significant overlap with open source communities. [00:13:43] Find out where you can learn more about Emily online. Links SustainOSS (https://sustainoss.org/) SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) SustainOSS Discourse (https://discourse.sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) SustainOSS Mastodon (https://mastodon.social/tags/sustainoss) Richard Littauer Twitter (https://twitter.com/richlitt?lang=en) Software Freedom Conservancy (https://sfconservancy.org/) Open OSS (https://openoss.sourceforge.net/) Emily Omier Website (https://www.emilyomier.com/) The Business of Open Source Podcast (https://www.emilyomier.com/podcast) Emily Omier Twitter (https://twitter.com/EmilyOmier?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Emily Omier LinkedIn (https://fr.linkedin.com/in/emilyomier?challengeId=AQHJRrJupUzrmAAAAYmn4QujXz32UfFYYdOJu6Cwe8np9YMXmh2KyyQTSopAxSC0DOoo2UQR1RVR5_3WnrQ5dUEB_ACLVd1nlQ&submissionId=8fe02694-e1b5-7617-5a1c-ad714a549b7f&challengeSource=AgGQwK0uUAAOQQAAAYmn4UKDUfqqgKKeE4IgYfXz2zitfD0NjjWc4ZlBEVCHufA&challegeType=AgEU8E9XcIwLhQAAAYmn4UKGTco4NnuMpQ_7KshxTbqlpYZsv3Mqpe0&memberId=AgE1fNY20cuH2AAAAYmn4UKJNK_nVe-y3Y6S8Exd8XMDdOg&recognizeDevice=AgHGFy9fem4K2AAAAYmn4UKMAGkvgKlal2Kdg8CJ9dDoydDseQXZ) The New Stack-Entrepreneurship for Engineers (https://thenewstack.io/author/emily-omier/) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guest: Emily Omier.

american business european portland press engineers edited sustain kubernetes software freedom conservancy submissionid open source podcast emily omier fossy

Emily Omier, Positioning Strategy Consultant for Open Source Startups

The Craft Of Open Source

Play Episode Listen Later Nov 14, 2023 47:06

Positioning for open source startups is directly connected not just to their free resource but also their commercial strategy. How should business leaders approach this to yield the best results and maintain a well-performing team? In this episode, Positioning Strategy Consultant for open source startups Emily Omier talks about keeping teams in alignment and in the same feedback loop. She explains how this approach can help them navigate through the complex mind games involved in these projects and why she considers the government as the best open source clients out there. Emily also shares how she sees the industry progress in the next half decade in terms of dominance and refutes three fallacies about open source projects.

startups consultants positioning open source strategy consultant emily omier

Episode 107 Jonathan Reimer, co-founder, crowd.dev

Fireside with Voxgig

Play Episode Listen Later Aug 10, 2023 35:29

This time we welcome Jonathan Reimer to the podcast. Jonathan and I dive straight into his company 'crowd.dev', where they are working hard at building an open source developer data platform that they hope will help customers get a clear picture of how developers engage with their brand, their tools, and the developer community in general. Continuing on theme from Emily Omier's episode last week, we discuss the magic word "profitability". With over five thousand open source products on the market, not everyone can bank on being the next Red Hat - so how can the rest of us do it? Few people would know better than Jonathan and the team at crowd.dev, as not only do they specialise in finding revenue opportunities for open source companies, they're an open source company themselves. Like many of us, Jonathan stumbled into the world of developer relations, meetups and open source products somewhat by accident. With a background in economics that became a career in sales and marketing, Jonathan realised that developers had a huge influence over the products their companies ended up purchasing. If he wanted those developers onside, the product had to be open source - from there, he co-founded crowd.dev and the rest is history. He also informs me about a system called ‘accounts-based marketing' that crowd.dev has had a lot of success with. How do you prove that your services are either making or saving money for a business? Accounts-based marketing is a new way for companies like crowd.dev to get proper attribution from customers, and display the value they add to the companies they work for. For anyone interested in the nitty-gritty of life as an open source company, this one's for you.

co founders crowd accounts red hat reimer emily omier

Episode 105 Emily Omier, Open Source Consultant, Podcast Host and Advisor

Fireside with Voxgig

Play Episode Listen Later Jul 27, 2023 36:22

Tune in for an in-depth chat about how companies can improve their revenue through fully understanding their open source project and software. Emily speaks truthful sense. This is a tough message for the overly optimistic among you. As Richard so rightly points out – knowing the right thing to do vs doing it are two different things! Emily outlines the many, many ways a company can leverage open source in their offerings and is beautifully frank about source available vs open source… So where does DevRel come in to this? Well – unless a company has clearly identified its positioning, then how can DevRel communicate the benefits your project, service or product deliver? You need the template – visit Emily's website (link below) – to clearly define your company's positioning. Now with clear positioning, the first thing you should do is raise your prices…what? Crazy! But no. It's a real and possible strategy. Emily shares her background with us, and issues a warning about choosing journalism as a career…a good way to get comfortable with not having any money. Do not ask Emily to write a Kubernetes 101. She's had quite enough of that, thank you very much. But the repeated requests did wake her up to the opportunity to help companies to take their messaging and positioning seriously. Reach out to Emily Omier: https://www.linkedin.com/in/emilyomier/ Enter your name and email to download the template here: https://www.emilyomier.com/ Find out more and listen to previous podcasts here: https://www.voxgig.com/podcast Subscribe to our newsletter for weekly updates and information about upcoming meetups: https://voxgig.substack.com/ Join the Dublin DevRel Meetup group here: www.devrelmeetup.com

reach consultants advisor podcast hosts open source kubernetes devrel emily omier

Monetizing Open Source Software

Startup Hustle

Play Episode Listen Later Jun 12, 2023 44:13

Join Matt Watson and Emily Omier, Positioning Consultant at Emily Omier Consulting, for a conversation around monetizing open-source software. Listen to their insightful discussion about what an open source is, the pros and cons of an open-source business, and the business models of an open source. Find Startup Hustle Everywhere: https://gigb.co/l/YEh5 This episode is sponsored by Full Scale: https://fullscale.io Learn more about Emily Omier Consulting: https://www.emilyomier.com See omnystudio.com/listener for privacy information.

monetizing open source software full scale emily omier

HOSB - Messaging, Positioning, and Figuring out if your Open Source Project Sucks.

The Hacking Open Source Business Podcast

Play Episode Listen Later Nov 8, 2022 30:24 Transcription Available

Live from all things open, Matt & Avi welcome Emily Omier to the Hacking Open Source Business Podcast. Emily and Avi discuss their talk on how to determine if your open source project sucks or not, as well as dig into how to interview users, figure out the position and messaging for your open source project, and more! We hope you will join us!Checkout our other interviews, clips, and videos: https://l.hosbp.com/YoutubeDon't forget to visit the open-source business community at: https://opensourcebusiness.community/Visit our primary sponsor, Scarf, for tools to help analyze your #opensource growth and adoption: https://about.scarf.sh/Subscribe to the podcast on your favorite app:Spotify: https://l.hosbp.com/SpotifyApple: https://l.hosbp.com/AppleGoogle: https://l.hosbp.com/GoogleBuzzsprout: https://l.hosbp.com/Buzzsprout

spotify live sucks checkout messaging positioning buzzsprout scarf apple google open source projects emily omier

Emily Omier — How to "sell" open-source

Tiny DevOps

Play Episode Listen Later Feb 22, 2022 41:20

Does your company produce open-source software? Are you considering doing so? Emily Omier helps open-source startups with product positioning, and today she joins me to discuss how you can position your open-source project, if you have one, and help you decide if you should have one.In this episode: What are the reasons to contribute open-source, as a company? What are the differences and siilarities between open-source and non-open-source software products. How to market your product to both technical and non-technical people. Why to focus on outcomes before features Who are the buyers/stakeholders for your product? Use language that resonates with your target audience Should you seek contributors for an open-source project? And if so, how? Tips for accepting financial sponsorships GuestEmily Omieremilyomier.comCloud Native Startup podcastPositioning Open Source blogTwitter: @EmilyOmierWatch this episode on YouTube.

business marketing tips open source emily omier

Laying The Groundwork: How to Position an Open-Source Project

The New Stack Podcast

Play Episode Listen Later Jan 4, 2022 31:16

The most attractive characteristic of open-source projects is the potential to tap into the total addressable market of collaborators. But when looking for users to your project and building a community around it requires the project to stand out from the millions of others, how do you build a plan to monetize it?In this podcast, Emily Omier, a positioning consultant who works with startups to stake out the right position in the cloud native / Kubernetes ecosystem, discusses how to grow your project by finding the right market category for your open-source startup.Alex Williams, founder and publisher of The New Stack hosted this podcast.

position laying kubernetes groundwork alex williams open source projects new stack emily omier

Accelerating Revenue Growth with Killer Positioning with Emily Omier

The Recognized Authority

Play Episode Listen Later Dec 27, 2021 49:53

How do you give yourself the maximum chance of winning your ideal clients? What are the characteristics of clients that you are uniquely able to help? How do you know if your messaging is right? In this episode, Emily Omier and Alastair McDermott discuss the differences between strategy, positioning and messaging, how to get more referrals, and how to make it easier for your clients to make a purchasing decision. They also discuss being authentic, contrarian and developing a point of view. “If you can't say what your product is in under eight words, you have a positioning problem.” -- Emily Omier

killers record positioning accelerating alastair revenue growth emily omier

Percona Podcast 46 - Cloud Native Open Source Project : Positioning vs. messaging vs. marketing

Percona's HOSS Talks FOSS: The Open Source Database Podcast

Play Episode Listen Later Nov 16, 2021 41:24

In this HOSS Talks FOSS 46, Matt Yonkovit, Head of Open Source Strategy at Percona, sits down with Emily Omier, Founder of Emily Omier Consulting. Emily is specialist in the positioning of cloud native open source projects. She sits down to talk to us about how to position your project and gain additional traction in the market. We discuss the often subtle differences between positioning vs. messaging vs. marketing in a context of growing Open Source adoption and in software as service (SaaS) business models.

founders head marketing saas messaging positioning open source cloud native open source projects percona emily omier

How GitOps Benefits from Security-as-Code

The New Stack Podcast

Play Episode Listen Later Oct 26, 2021 33:37

Security-as-code is the practice of “building security into DevOps tools and workflows by mapping out how changes to code and infrastructure are made and finding places to add security checks, tests, and gates without introducing unnecessary costs or delays,” according to tech publisher O'Reilly. In this latest “pancakes and podcast” special episode —recorded during a pancake breakfast during KubeCon + CloudNativeCon in October — we discuss how security-as-code can benefit emerging GitOps practices.The guests were Sean O'Dell, director of developer advocacy, Accurics, Sara Joshi, who was an associate software engineer for Accurics when this recording was made; Parminder Singh, chief information security officer (CISO), for hybrid-cloud digital-transformation services provider DigitalOnUs; Brendan O'Leary, staff developer evangelist, GitLab; Cindy Blake, senior security evangelist, GitLab; and Emily Omier, contributor, The New Stack and owner of marketing consulting provider Emily Omier Consulting.Alex Williams, founder and publisher of TNS, hosted the podcast.

benefits security code makers devops ciso gitlab alex williams tns gitops new stack kubecon cloudnativecon emily omier

E6: Product Positioning for OSS Startups

Open Source Startup Podcast

Play Episode Listen Later Jul 16, 2021 42:10

Emily Omier, Positioning Consultant for Commercial Open-Source Companies 2:53: Emily discusses the positioning of the 2 products OSS companies have: the OSS product & the paid product 5:30: Emily talks about her process & how she interacts with ‘super users' to understand what they use the project for (hint: it's often not what the project owner had in mind) 8:02: Emily digs into competition and how most OSS startups are competing with a manual process vs. other companies 10:09: The hard parts about positioning an OSS startup are discussed: resource challenges, focus, and engineers as challenging buyers 14:36: Emily talks about common mistakes OSS companies make 18:54: We dig into the right team set-up for OSS companies 21:10: Emily talks about the key areas OSS companies should think about when coming up with their positioning: expectations on what the product does and does not contain, expectations on related tools, competitors, and target users (type of app, workload, product, use case, etc.) 35:45: We talk about monetization and how to think about targeting the right customers 37:50: Emily ends on advice for early-stage OSS founders: don't try to create a completely new category, bring well-understood concepts together, and position as the best option for a small market early-on

startups product positioning oss product positioning emily omier

Positioning your product with Emily Omier

Under the Hood of Developer Marketing

Play Episode Listen Later Jul 8, 2021 37:47

Theme of the week: How important is the positioning of your product or OSS project? Ask any marketer and they will tell you positioning is K-E-Y. Sure, that is true for consumer or B2B products and services. Is it that important for an audience that is very technical and cares about solving problems? Spoiler alert: it is important, especially when your consumers can actually look under the hood and see what it is you are offering to them (pun intended). Emily Omier joined us to talk positioning and we also talked about: What do we mean when we say “positioning” How is positioning developer tools different from consumer product positioning Why does it matter to developers how you will position your product Should OSS or commercial projects try creating a new category? Does creating new categories create confusion? How does it affect developers? Which is the best positioning strategy? How important is segmentation in this strategy? What's in the “how to position your product/project” effectively checklist? And more! https://www.devrelx.com/podcast (Listen to this episode) to better understand developer advocates. Let's talk Data! This is the graph we discuss with Emily: https://www.devrelx.com/trends?lightbox=comp-kisqhm6d3__8898bfb6-ccf4-420b-9621-a1d78fa8fc8f_runtime_dataItem-kisqhm6e (Kubernetes users influence decisions) Emily Omier is a former journalist and marketing writer who helps entrepreneurial engineers articulate their product's value and get more ideal customers. She hosts The Business of Cloud Native, a podcast about the business side of cloud native and open source technology, contributes to The New Stack and writes the Positioning Open Source blog. https://www.emilyomier.com/press-info?utm_source=UnderTheHoodofDeveloperMarketing&utm_medium=Podcast_Episode&utm_campaign=EO (More info.) https://devecon.typeform.com/to/Q9WqxzJN (We want to make this podcast better for you! Take a few minutes to take our short survey.)

business spoilers data product b2b positioning oss kubernetes cloud native new stack dataitem emily omier

Discussing Bloomberg's Cloud Native Journey with Andrey Rybka

The Business of Open Source

Play Episode Listen Later Nov 18, 2020 30:54

This conversation covers: How Bloomberg is demystifying bond trading and pricing, and bringing transparency to financial markets through their various digital offerings. Andrey's role as CTO of compute architecture at Bloomberg, where he oversees research implementation of new compute related technologies to support kind of our business and engineering objectives. Why factors like speed and reliability are integral to Bloomberg's operations, and how they impact Bloomberg's operations . Andrey also talks about how they impact his approach to technology, and why they use cloud-native technology. How Andrey and his team use containers to scale and ensure reliability. Why portability is important to Bloomberg's applications. Bloomberg's journey to cloud-native. Some of the open-source services that Andrey and his team are using at Bloomberg. Unexpected challenges that Andrey has encountered at Bloomberg. Primary business value that Bloomberg has experienced from their cloud-native transition. Links Bloomberg Bloomberg GitHub Follow Andrey on Twitter Connect with Andrey on LinkedIn TranscriptEmily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Emily: Welcome to The Business of Cloud Native, I'm your host Emily Omier. And today I'm chatting with Andrey Rybka from Bloomberg, thank you so much for joining us, Andrey.Andrey: Thank you for your invitation.Emily: Course. So, first of all, can you tell us a little bit about yourself and about Bloomberg?Andrey: Sure. So, I lead the secure computer architecture team, as the name suggests, in the CTO office. And our mission is to help with research implementation of new compute-related technologies to support our business and engineering objectives. But more specifically, we work on ways to faster provision, manage, and elastically scale compute infrastructure, as well as support rapid application development and delivery. And we also work on developing and articulating company's compute strategic direction, which includes the compute storage middleware, and application technologists, and we also help us product owners for the specific offerings that we have in-house. And as far as Bloomberg, so Bloomberg was founded in 1981 and it's got very large presence: about 325,000 Bloomberg subscribers in about 170 countries, about 20,000 employees, and more news reporters than The New York Times, Washington Post, and Chicago Tribune combined. And we have about 6000 plus software engineers, so pretty large team of very talented people, and we have quite a lot of data scientists and some specialized technologists. And some impressive, I guess, points is we run one of the largest private networks in the world, and we move about a hundred and twenty billion pieces of data from financial markets each day, with a peak of more than 10 million messages a second. We generate about 2 million news stories—and they're published every day—and then news content, we consuming from about 125,000 sources. And the platform allows and supports about 1 million messages, chats handled every day. So, it's very large and high-performance kind of deployment.Emily: And can you tell me just a little bit more about the types of applications that Bloomberg is working on or that Bloomberg offers? Maybe not everybody is familiar with why people subscribe to Bloomberg, what the main value is. And I'm also curious how the different applications fit into that.Andrey: The core product is Bloomberg Terminal, which is Software as a Service offering that is delivering diverse array of information of news and analytics to facilitate financial decision-making. And Bloomberg has been doing a lot of things that make financial markets quite a bit more transparent. The original platform helped to demystify a lot of bond trading and pricing. So, the Bloomberg Terminal is the core product, but there's a lot of products that are focused on the trading solutions, there is enterprise data distribution for market data and such, and there is a lot of verticals such as Bloomberg Media: that's bloomberg.com, TV, and radio, and news articles that are consumer-facing. But also there is Bloomberg Law, which is offering for the attorneys, and there is other verticals like New Energy Finance, which helps with all the green energy and information that helps a lot to do with helping with climate change. And then there's Bloomberg Government, which is focused on, specifically, research around government-specific data feeds. And so in general, you've got finance, government, law, and new energy as the key solutions.Emily: And how important is speed?Andrey: It is extremely important because, well, first of all, obviously, for traders, although we're not in high-frequency game, we definitely want to deliver the news as fast as possible. We want to deliver actionable financial information as fast as possible, so definitely it is a major factor, but also not the only factor because there's other considerations like reliability and quality of service as well.Emily: And then how does this translate to your approach to new technology in general? And then also, why did you think cloud-native might be a good technology to look into and to adopt?Andrey: So, I guess if we define cloud-native, a little because I think there's different definitions; many people think of containers immediately. But I think that we need to think of outside of not just, I guess, containers, but I guess the container orchestration and scaling elastically, up and down. And those, I guess, primitives. So, when we originally started on our cloud-native journey, we had this problem of we were treating our machines as pets if you know the paradigm of pets versus cattle where pet is something that you care for, and there's, like, literally the name for it, you take it to the vet if it gets sick. And when you use think of herd of cattle, there's many of them, and you can replace, and you have quite a lot of understanding of scalability with the herd versus pets. So, we started moving towards that direction because we wanted to have more uniform infrastructure, more heterogeneous. And we started with VMs. So, we didn't necessarily jump to containers. And then we started thinking like, “Is VMs the right abstraction?” And for some workloads it is, but then in some cases, we started thinking, “Well, maybe we need something more lightweight.” So, that's how we started looking at containers because you could provision them faster, and they could start off faster, and developers seem to be gravitating towards containers quite a bit because it's very easy to bootstrap your local dev environment with containers. And when you ship a container to the higher environment, it actually works. Used to be a problem where you developed on your local machine and you'd ship your code to production or higher environment, and it doesn't work because some dependency get missed. And that's where containers came about, to help with that problem.Emily: And then how does that fit in with your core business needs?Andrey: So, one of the big things is obviously, we need to ship products faster—and that's probably common to a lot of businesses—but we also want to ensure that we have highest availability possible, and that's where the containers help us to scale out our workloads and ensure that there's some resurrection happens with things like Kubernetes when something dies. And we also wanted to maximize our machine utilization. So, we have very large data centers and edge deployments—which I guess could be referenced as a private cloud—so we want to maximize utilization in our data centers. So, that's where virtualization and containers help quite a bit. But also, we wanted to make sure our workloads are portable across all the environments, from private cloud to the public cloud, or the edge. And that's where containerized technologies could help quite a bit. Because not only you can have, let's say Kubernetes clusters on-prem on the edge, but also, now all the three major cloud providers support a managed Kubernetes offering. And in this case, you have basically highly portable deployments across all the clouds, private and public.Emily: And why was that important?Andrey: Basically, we wanted to have, more or less, very generic way to deploy something, an application, right? And if you think of containers, that's pretty much, like I say, Docker is pretty standard these days. And developers, we were challenged with different package formats. So, if you do any application Ruby and Rails, or Java and Python, there is a native packages that you can use to package your application, distribute it, but it's not as uniformly support it outside of Bloomberg or even across various deployment platforms. But containers do get you that abstraction layer that helps you to basically build once and deploy many different targets in very uniform way. So, whether we do it on-premises, or to the edge, or to the public cloud, we can effectively use the same packaging mechanism. But not only for deployment, which is the one problem, but also for post-deployment. So, if we need to self-heal the workload. So, all those primitives are built there in the, I guess, Kubernetes fabric.Emily: But why is being portable important? What does it give you? What advantage? I mean, I understand that's one of the advantages of containers. But why specifically for Bloomberg, why do you care? I mean, are you moving applications around between public cloud providers, and—Andrey: So, we're definitely adopting public cloud quite a bit, but I guess what I was trying to hint is we have to support the private cloud deployments as our primary, I guess, delivery mechanism. But the edge deployments, when we actually deploy something closer to the customers, to your point about being faster, to deliver things faster to our customers, we have to deliver things to the edge, which is what I'm describing as something that is close to the customer. And then as far as the public clouds, we started moving a lot of workloads to public cloud, and that definitely required some rethinking of how do we want to adopt public cloud. But whether it's private or public, our main goal, I think, here is to make it easier for developers to package and deploy things and effectively, run faster, or deploy things faster, but also do it in more reliable way, right? Because it used to be that we could deploy things to a particular target of machines, and we could do it relatively reliably, but there was no auto-healing, necessarily, in place there. So, resilience and reliability wasn't quite as good as what we get with Kubernetes. And what I mentioned before, machine utilization, or actually ability to elastically scale workloads, and—within vertical and horizontal—vertical, we generally knew how to do that. Although I think with containers and VMs, you can do it much better to higher degree, but also horizontally, obviously, this was pretty challenging to do before Kubernetes came about. You had to bootstrap, even your bunch of VMs and different availability zones, figure out how you're going to deploy to them, and it just wasn't quite there as far as automation and ease of use.Emily: Let's change gears just a little bit to talk about a little bit of your journey to cloud-native. You mentioned that you started with VMs, and then you moved to containers. What time frame are we talking about? In addition to containers, what technologies do you use?Andrey: So, yeah, I guess we started about eight years ago or so, with OpenStack as a primary virtualization platform, and if you look at github.com/bloomberg, you will see that we actually open-sourced our OpenStack distribution, so anyone can look and see if, potentially, they can benefit from that. And so OpenStack provided the VMs, basic storage, and some basic Infrastructure as a Service concepts. But then we also started getting into object storage, so there was a lot of investment made into S3 compatible storage, similar to how it AWS's S3 object storage, or it's based on the [00:14:21 Sapth] open-source framework. So, that was our foundational blocks. And then, very shortly thereafter, we started looking at Kubernetes to build a general-purpose Platform as a Service. Because effectively developers generally don't really want to manage virtual machines, they want to just write applications and deploy them to the—somewhere, right, but they don't really care that much about the where I use Red Hat, or Ubuntu or the don't really care to configure proxies or anything like that. So, we started rolling out general-purpose Platform as a Service based on Kubernetes. So, that was with initial alpha release of Kubernetes; we already started adopting it. And then thereafter, we also started looking into how we can leverage Kubernetes for data science platform. Well, now we have a world-class data science platform that allows data scientists to train and run inference on the various large clusters of compute with GPUs. Then we quickly realized that on-prem, if we're building this on-prem, we need to have similar constructs to what you normally find in public cloud providers. So, as AWS add Identity Access Management, we started introducing that on-prem as well. But more importantly, we needed something that would be a discovery layer as a service, or if I'm looking for service, I need to go somewhere to look it up. And DNS was not necessarily the right construct, although it's certainly very important. So, we started looking into leveraging Consul as a primary discovery as a service. And that actually paid quite a lot of dividends and it's helped us quite a bit. We also looked into Databases as a Service because everything that I described so far was really good for stateless workloads, to a greater degree with Kubernetes, I think you can get really good at running stateless workloads, but for something's stateful, I think you needed something, basically, that will not run necessarily with Kubernetes. So, that's where we started looking at offering more Database as a Service, which Bloomberg has been doing this quite a bit before that. We open-sourced our core relational database called Comdb2. But we also wanted to offer that for MySQL, for Postgres, and some other database flavors. So, I think we have a pretty decent offering right now, which offers variety of Databases as a Service, and I would argue that you can provision some of those databases faster than you can do it on AWS.Emily: It sounds like—and correct me if I'm wrong, but it sounds like you've ended up building a lot of things in-house. I mean, you used Kubernetes, but you've also done a lot of in-house custom work.Andrey: Right. So, custom, but with the principle of using open-source. Everything that I described actually has an open-source framework behind it. And this open first principle is something that now this is becoming more normal. Before we build something in-house, we looked at open-source frameworks, and we look at which open-source community we can leverage that has a lot of contributors, but also, can we contribute back, right? So, we contribute back to a lot of open-source projects, like for example, Solr. So, we offer search as a service based on the Solr open-source framework. But also, we have Redis caching as a service, queuing as a service based on RabbitMQ. Kafka as a service, so distributed event streaming as a service. So, quite a few open-source frameworks. We're always thought of, “Can we start with something that's open-source, participate in the community, and contribute back?”Emily: And tell me a little bit about what has gone really well? And also what has been possibly unexpectedly challenging, or even expectedly, but it's always even more interesting to hear what was surprising.Andrey: I think, generally open-source first, as a strategy worked out pretty well. I think we have, I've listed only some of the services that we have in-house, but we certainly have quite a bit more. And the benefits, I think, I don't even know how to quantify it, but it certainly enabled us to go fast and deliver business value as soon as possible versus waiting for years before we build our alternative technology. And I think developer happiness also improved quite a bit because we started investing heavily into our developer experience and as a major effort. And this everything as a service makes it extremely easy for developers to deliver new products. So, all of the investment we've made so far paid huge dividends. Challenges: I think that as with anything, starting with open-source projects, you certainly have bugs and things like that. So, in this case, we preferred to partner with a company that effectively has inside knowledge into the open-source project so we can have at least for a couple of years, somebody who can help us guide us and potentially, we—by actually invest in actual money into the project, we get it to the point where it's mature enough and actually meets a certain quality criteria. And some of the projects we invested heavily in which many people don't know, probably, but we—like Chromium Project. So, many people use Chrome, but Bloomberg has been sponsoring Chromium and WebKit open-source development quite a bit. JavaScript, V8 Engine, even the newer technologies like WebAssembly we're heavily invested in sponsoring that. But again, one thing that it's very clear, it's not just we're going to be the consumers of the open-source, but we're going to be contributors back with either our developers helping on the projects, or we need to invest to help this actual open-source project we're leveraging to be successful, and not just by saying, like, Bloomberg consumes it, but actually investing back. So, that's one of the things that was a big lesson learned. But currently, I think we have a really good enough system in place where we always adopt open-source projects in a very conscious and serious way with investment going back into the open-source community.Emily: You mentioned being able to deliver business value sooner. What do you think are the primary two or three business values that you get from this cloud-native transition?Andrey: So, ability to go faster. That's one thing that's very clear. Ability to elastically scale workloads, and ability to achieve uniformity of deployments across various environments: private, edge, public, so we are able to deliver now products to our customers as they transition to the public cloud, for example, much faster because we're have a lot of standardized and a lot of technologists that helped us with adoption. And including Kubernetes is one of them, but not only Kubernetes. We also use Terraform, extensively, some other multi-cloud frameworks.And then also delivering things more reliably. That's I think, one of the things that is not always recognized, but I think reliability is a huge differentiator, and some of it has to do with how we deliver things to the customer with some resiliency and redundancy. So, we run very large private content delivery network as a service, and it's also based on open-source technologies. And the reliability is one of the main things that I would say we get from a lot of this technologies because if we do it on our own, yes, it would be generally Bloomberg working on this problem and solving it, but you get a, actually, a worldwide number of experts from different companies who're contributing back to this technologies, and I see this as a, obviously, a huge benefit because it's not just Bloomberg working on solving some distributed system framework, but it's actually people worldwide working on this.Emily: And would you say there's anything in moving to cloud-native that you would do differently?Andrey: I think what I see as the big challenge, especially with Kubernetes, is adoption of stateful workloads because I still think it's not quite there yet. Generally, the way we're thinking right now is we leverage Kubernetes for our stateless workloads, but some stateful workloads require some cloud-native storage primitives to be there, and this is where I think it's still not quite mature. You can certainly leverage various vendors for that, but I really would like to see better support for stateful workloads in the open-source world. And definitely still looking for a project to partner with to deliver better stateful workloads on Kubernetes. And I think, to a various degree, the public cloud providers, so hyperscalers are getting pretty good at this, but that is still private to them. So, whether it's Google, Amazon, or Azure, they deliver the statefulness to varying degree of reliability. But I would like this to be something that you can leverage on private clouds or anywhere else, and having it somewhere, well-supported through an open-source community would be, I think, hugely beneficial to quite a few people. So, Kubernetes, I think, is the right compute fabric for the future, but it still doesn't support some of the workload types that I would like to be there.Emily: Are any other continuing challenges that you're working on, or problems that you haven't quite solved yet, either that you feel like you haven't solved, maybe internally that might be specific to you, or that you just feel like the community hasn't quite figured out yet?Andrey: So, this whole idea of multi-cloud deployments, we leverage quite a few technologies from Terraform, to Vault, to Consul, to some other frameworks that help with some of it, but the day two alerting, monitoring, and troubleshooting with multi-cloud deployments is still not quite there. So, yes, you can solve it for one particular cloud provider, but as soon as you go to two, I think there's quite a few challenges that left unaddressed from just, like, single pane of glass—the view of all of your workloads, right. And that's definitely something that I would like to address: reliability, alerting across all the cloud providers, security across all the cloud providers. So, that's one of the challenges that I'm still working on—or, actually, quite a few people are working on at Bloomberg. As I said, we have 6000 plus talented engineers who are working on this.Emily: Excellent. Anything else that you'd like to add?Andrey: You know, I'm very excited about the future. I think this is almost like a compute renaissance. And it's really exciting to see all of these things that are happening, and I'm really excited about the future, I guess.Emily: Fabulous. Just a couple more questions. First of all, is there a tool that you feel like you couldn't do your job without?Andrey: Right. Yes, VI editor or [laugh] [00:27:42 unintelligible]? No. So, I think obviously, Docker has done quite a bit for the containerization. And I know, we're looking at alternatives to Docker at this point, but I do give Docker quite a bit of a credit because right now, local development environment, we bootstrap with Docker, we ship it as a deployment mechanism all over the place. So, I would say Docker, Kubernetes, and the two primary ones, but I don't necessarily want to pick favorites. [laugh]. I really like a lot of HashiCorp tools, you know Terraform, Consul, Vault, fantastic tools; a really good community. I really like Jenkins. We run Jenkins, the service; really good. Kafka has been extremely reliable and scalability-wise, Kafka is just amazing. Cache in Redis is really one of my favorite cache tools.There's probably a lot to mention. I've mentioned. [00:28:43 unintelligible] databases, Postgres is one of my favorite databases, or in so many varieties and different types of workload. But we also gain quite a lot from Hadoop and HBase. But one of my favorite NoSQL databases is Cassandra, an extremely reliable, and the replication across, I guess, low-quality bandwidth and of environment has been really awesome. So, I guess I'm not answering with just one but many tools, but I really like all of those tools.Emily: Excellent. Okay, well, just the last question is, where could listeners connect with you or follow you?Andrey: I am on Twitter as @andrey_rybka. I'm happy to get any direct messages. We are hiring. We're always hiring A lot of great opportunities. As I said, we're open-source first company these days, and we definitely have a lot of exciting new projects. I haven't mentioned even probably 90 probably of other exciting projects that we have. We also have github.com/bloomberg, so you're welcome to browse and look at some of the cool open-source projects that we have as well.Emily: Excellent. Cool. Well, thank you so much for joining me.Andrey: Thank you very much.Emily: Thanks for listening. I hope you've learned just a little bit more about The Business of Cloud Native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier—that's O-M-I-E-R—or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

How Systematic Approaches Cloud-Native with Thomas Vitale

The Business of Open Source

Play Episode Listen Later Nov 11, 2020 23:07

This conversation covers: An average workday for Thomas as senior systems engineer at Systematic. How Systematic uses cross-functional collaboration to solve problems and produce high quality software. How security and data privacy relate to cloud-native technologies, and the challenges they present. Systematic's journey to cloud native, and why the company decided it was a good idea. Why it's important to consider the hidden costs and complexities of cloud-native before migrating. What makes an application appropriate for the cloud, and some tips to help with making that decision. The biggest surprises that Thomas has encountered when moving applications to cloud-native technology. Thomas's new book, Cloud Native Spring in Action, which is about designing and developing cloud-native applications using Spring Boot, Kubernetes, and other cloud-native technologies. Thomas also talks about who would benefit from his book. Thomas's background and experience using cloud-native technology. The biggest misconceptions about cloud-native, according to Thomas. Links Systematic Cloud Native Spring in Action book Thomas Vitale personal website Follow Thomas on Twitter Connect with Thomas on LinkedIn TranscriptEmily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Emily: Welcome to The Business of Cloud Native. I'm your host, Emily Omier, and today I'm chatting with Thomas Vitale. Thomas, thanks so much for joining us.Thomas: Hi, Emily. And thanks for having me on this podcast.Emily: Of course. I just like to start by asking everyone to introduce themselves. So, Thomas, can you tell us a little bit about what you do and where you work, and how you actually spend your day?Thomas: Yes, I work as a senior systems engineer at Systematic. That is a Danish company, where I design and develop software solutions in the healthcare sector. And I really like working with cloud-native technologies and, in particular, with Java frameworks, and with Kubernetes, and Docker. I'm particularly passionate about application security and data privacy. These are the two main things that I've been doing, also, in Systematic.Emily: And can you tell me a little bit about what a normal workday looks like for you?Thomas: That's a very interesting question. So, in my daily work, I work on features for our set of applications that are used in the healthcare sector. And I participate in requirements elicitation and goal clarification for all new features and new set of functionality that we'd like to introduce in our application. And I'm also involved in the deployment part, so I work on the full value stream, we could say. So, from the early design and development, and then deploying the result in production.Emily: And to what extent, at Systematic, do you have a division between application developers and platform engineers, or however else you want to call them—DevOps teams?Thomas: In my project, currently, we are going through what we can call as maybe a DevOps transformation, or cloud transformation because we started combining different responsibilities in the same team, so in a DevOps culture, where we have a full collaboration between people with different expertise, so not only developers but also operators, testers. And this is a very powerful collaboration because it means putting together different people in a team that can bring an idea to production in a very high-quality way because you have all the skills to actually address all the problems in advance, or to foresee, maybe, some difficulties, or how to better make a decision when there's different options because you have not only the point of view of a developer—so how is better the code—but also the effects that each option has in production because that is where the software will live. And that is the part that provides value to the customers. And I think it's a very important part. When I first started being responsible, also, for the next part, after developing features, I feel like I really started growing in my professional career because suddenly, you approach problems in a totally different way. You have full awareness of how each piece of a system will behave in production. And I just think it's, it's awesome. It's really powerful. And quality-wise, it's a win-win situation.Emily: And I wanted to ask also about security and data privacy that you mentioned being one of your interests. How do those two concepts relate to cloud-native technologies? And what are some of the challenges in being secure and managing data privacy specifically for cloud-native?Thomas: I think in general, security has always been a critical concern that sometimes is not considered at the very beginning of the development process, and that's a mistake. So, the same thing should happen in a cloud-native project. Security should be a concern from day one. And the specific case of the Cloud: if we are moving from a more traditional system and more traditional infrastructure, we have a set of new challenges that have to be solved because especially if we are going with a public cloud, starting from an on-premise solution, we start having challenges about how to manage data. So, from the data privacy point of view, we have—depending also on the country—different laws about how to manage data, and that is one of the critical concerns, I think, especially for organizations working in the healthcare domain, or finance—like banks. The data ownership and management can really differ depending on the domain. And in the Cloud, there's a risk if you're not managing your own infrastructure in specific cases. So, I think this is one of the aspects to consider when approaching a cloud-native migration: how your data should be managed, and if there is any law or particular regulation on how they should be managed.Emily: Excellent. And can you actually tell me a little bit about Systematic's journey to cloud-native and why the company decided that this was a good idea? What were some of the business goals in adopting things like Docker and Kubernetes?Thomas: Going to the Cloud, I think is a successful decision when an organization has those problems that the cloud-native technologies attempt to solve. And some goals that are commonly addressed by cloud-native technologies are, for example, scalability. We gain a lot of possibilities to scale our applications, not only in terms of computational resources, and leveraging the elasticity of the Cloud, so that we can have computational units enabled only when needed. So, if there is, for example, an high workload on the application, and then scale down if it's not needed anymore, and that also results in cost optimization, but also scaling geographically. So, with the Cloud, it's more approachable to start a business that has a target in different countries and different continents because the Cloud lets you use different technologies and features to reach the users in the best way possible, ensuring performance and high availability. Something else related to that is resilience. Using something like Kubernetes and proper design practices in the applications, we can achieve resiliency in our infrastructure at a level that is not possible with traditional technologies. And then we have speed. Usually, a cloud-native transformation is accompanied by starting using practices like continuous delivery and DevOps, that really focus on automating and putting together different skills so that we can go faster in a more agile way and reduce the time to market. That's also a very important point for organizations.Overall, I think there's a part of cost optimization, but at the same time, we should be careful because there are some hidden costs that sometimes are not considered fully. And that is about educating people to use new cloud-native technologies. We have some paradigm shift because some practices that were well-consolidated and used with traditional applications are now not used anymore, and it takes time to switch to a different point of view and acquire the skills required to operate cloud-native infrastructures and to design cloud-native applications. So, to make a decision about whether cloud-native migration is a good idea, I recommend to consider these hidden costs as well, not only the advantages but the hidden costs and the overall complexity, if you think about something like Kubernetes. For some applications, the Cloud is just not the right solution.Emily: What would you say is a type of application that is not appropriate for the Cloud?Thomas: As an application is not actively developed with new features, but it's in a pure maintenance phase and that is fully reaching the goals for its users. If it doesn't need to scale more than what it does today, if it doesn't need to be more resilient because maybe high availability is not that important or critical, then maybe going to the Cloud is not the right solution because you would add up complexity and make things actually harder to maintain. That is one scenario that I could think of.Emily: So, basically, if it's not broken, don't try to fix it.Thomas: Yes. So, cloud-native is the answer to a specific set of problems. So, if you don't have those problems, so maybe cloud-native is not for you because it's solving different problems.Emily: So, would you say, the problems are if you need to really iterate quickly, develop new features, ship them out to customers?Thomas: Yeah, if you don't need this agility because you're not actively developing features, or if it's small things that don't require so much speed or scalability power, then might not be a good idea. Or even if you don't have the resources, to acquire the skills required to design cloud-native applications and to manage cloud-native infrastructure.Emily: What have been the biggest surprises as you've been using cloud-native technology and moving applications to cloud-native technology?Thomas: So, I always think about how logging works. I think it's a fun anecdote that when you move to a cloud-native application, usually logging is managed through files in a traditional application where we set up rules to store those files. But usually in a cloud-native setting, like in a Kubernetes environment, we have the platform taking care of aggregating logs from different applications and systems. And these applications are providing these logs as events in the standard output. So, there's no files. So, one of the first questions for developers when moving to the Cloud is like, “Okay, now where's the log files?” Because when something goes wrong, the first thing is, “Let me check the log file.” But there's no log file anymore, and this is just a fun aspect. But in general, I think there's a whole new way of thinking about applications, especially if we're talking about containerized applications. So, considering Java applications, for example, they're traditionally packaged in a way that needs to be deployed on an application server like Tomcat. Now, we don't have that anymore, but we have a self-contained Java application packaged as JAR in a container. So, it's self-contained, it contains all the dependencies that are needed, and it can run on any environment where we have a container engine working, like Docker. So, developers take on more responsibilities than before because it's not only about the application itself, but it's also about the environment where the application needs to be deployed; that is actually part of the container now. So, we can see a flow of responsibilities that is different than before.Emily: And do you think that that surprises a lot of organizations, that having developers need to take on more responsibilities is something that's not anticipated?Thomas: Sometimes, maybe it's not anticipated. And usually, it's because when considering this migration, we don't consider those aspects of acquiring new skills, or bringing in, maybe, some consultants to help during the migration to help with all those practicalities, that from a high-level point of view, are not very visible.Emily: Excellent. Tell me a little bit—I know you just wrote a book, so I was wondering if you could talk a little bit about your book and what inspired you to write it.Thomas: The book is about designing and developing cloud-native applications using mainly Spring Boot, and Kubernetes, plus all the great cloud-native technologies that are available. The goal is to teach techniques that can be immediately applied to real projects, so to enterprise-grade applications, as much as possible. The cloud-native landscape is so complex, it's so huge that it's impossible to cover everything, but I made a selection of all the aspects that I consider important and my goal is to try as much as possible to do things like I would do in a real application. So, what I would do daily in my job, so considering all those aspects that sometimes are not considered when teaching new technologies or new features, like for example, security. By the end of the book, the reader will have deployed a cloud-native system composed of different applications and services on a real Kubernetes cluster in a public cloud service. And besides this, I also aim at navigating these cloud-native landscape because when we look at this famous landscape picture on the website from the Cloud Native Computing Foundation, it can be really overwhelming, both for new developers, but also for experienced developers that are experiencing different techniques—maybe, you come from a more traditional practice and want to switch to cloud-native, it can be really overwhelming. So, I hope that with this book, I can also help the reader navigating this landscape. The idea for the book, I got it back in January, February have been conducted by the Manning publisher, and about spring, we started considering different ideas. And I was really researching into the cloud-native landscape, in particular, how to use Spring and all related technologies to build strong native applications, so I decided to propose a book about that, that have these two main characteristics of teaching, as much as possible, real-world examples and techniques, and also help the reader navigating the landscape. So, it will not contain everything about cloud-native because that would require several books, but I think it's a good primer for the field.Emily: Excellent. And just, actually, a couple, sort of, basic questions I forgot to ask at the beginning, like how long have you been working with cloud-native technology? When did Systematic start using it and moving applications to cloud-native?Thomas: In Systematic, we have different projects and products. So, depending on that, some teams have started even several years ago, and in my project right now, it's quite recent. But I've been working with Spring for, I think, five years; also with Docker. And it's my favorite set of technologies because I think provides a lot of features to solve many different problems. I also, in my spare time when I have time, I like to contribute to the Spring projects on GitHub. It's a great community, I think.Emily: Fabulous, and who do you think would benefit most from reading your book?Thomas: I think it would benefit experienced developers, back end developers with experience in developing web applications in a more traditional way that would like to move to cloud-native, either because their organizations are doing the migration, or because maybe they would like to understand more how it works, or they would like to find a job in that field. But also for junior developers that have some experience with application development, so some basic experience with Spring and Spring Boot, but would like to take the next step towards the cloud-native world also understanding what is that about, and what, actually, cloud-native means.Emily: What do you think is the biggest misconception about cloud-native?Thomas: I think there's two misconceptions that usually, I find that is… thinking that cloud-native is about containers and that cloud-native is about microservices. So, for the first case, containers are used a lot for cloud-native applications, of course. They're used directly when working at the Kubernetes level, for example, but are used also when leveraging platforms like Heroku, or CloudFoundry. In that case, it's not the developer building the container, but it's the platform itself, but still, they are used. But cloud-native can also be applied to other different technologies, like functions, for example, in the serverless world. So, functions are not containers. So, in that way, I think it's a bit misleading to define cloud-native as containers. I think that containers is one of the technologies and implementations used for developing large native applications. But it's not the definition of cloud-native.And microservices, also. I think it's wrong to imply that cloud-native means microservices because cloud-native applications are distributed systems. Of course, they can be microservices; it's a very used architectural style in the Cloud world, but I don't think that it's a definition. Also, the CNCF previously had a definition for cloud-native technologies that was actually based on containers and microservices, and then they changed that. So, now they are listed as examples exactly because that is not the definition.Emily: Fabulous. What about open-source, what do you think open-source and cloud-native's relationship is?Thomas: One example above all, like Kubernetes, that is open-source. It, I think, is the most popular project on GitHub with the highest number of contributions, if I'm not wrong. And I think that just explains everything about cloud-native because given that this is such a core project, and it's also the project that started all for the Cloud Native Computing Foundation, and it's a, now, very wide landscape of technologies. Open-source, I think it's a very important part of it because it allows contributions from people around the world with different set of skills. And we're not talking just about coding, but also other aspects, like, for example, technical documentation. I know that lately there have been several contributions, for example, to the technical documentation for Kubernetes to improve the documentations and help people approaching these technology and understand the more complex topics. And from a security point of view as well, from a testing point of view, I think that open-source technologies open many possibilities.Emily: Great. Well, I just have a couple last questions for you. The first one I like to ask all of my guests, what is a engineering tool that you can't do your job without?Thomas: That's a very difficult question because I use so many tools. But I will say if I had to choose one, I would probably say my terminal window.Emily: Excellent. And then, how can listeners connect with you or follow you? And in fact, where should they go to buy your book?Thomas: So, they can find me on my website, thomasvitale.com, where I also write blog post about Spring and security. I'm also active on Twitter. My handle is @vitalethomas and on LinkedIn. But on my website, they can find all my contacts there. I repeat, thomasvitale.com. And the book is available on manning.com. That is the name of the publisher, so they can find it there. The title of the book is Cloud Native Spring in Action: With Spring Boot and Kubernetes.Emily: Excellent. Well, thank you so much, Thomas, for coming on the show and chatting.Thomas: Thank you.Emily: Thanks for listening. I hope you've learned just a little bit more about The Business of Cloud Native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier—that's O-M-I-E-R—or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

business action spring open security cloud danish real world approaches manning java github devops systematic docker kubernetes jar vitale cloud native tomcat heroku cncf spring boot cloud native computing foundation emily omier thomasso thomas thank humblepod

The Importance of OSPO with Nithya Ruff

The Business of Open Source

Play Episode Listen Later Oct 14, 2020 35:47

The conversation covers: The main function of an OSPO, and why Comcast has one. How Nithya approaches non-technical stakeholders about open-source. Where the OSPO typically sits in the organizational hierarchy. The risk of ignoring open-source, or ignoring the way that open-source is consumed in an organization. Why every enterprise today is using open-source in some way or another. The relationship between cloud-native and open-source. Some of the major misconceptions about the role of open-source in major companies. Common mistakes that companies make when setting up OSPOs. Why Nithya and her team rely heavily on the TODO Group in the Linux Foundation. Links: Comcast: https://www.xfinity.com/ Linux Foundation: https://www.linuxfoundation.org/ TODO Group and The New Stack survey: https://thenewstack.io/survey-open-source-programs-are-a-best-practice-among-large-companies/ Trixter GitHub: https://github.com/tricksterproxy/trickster Kuberhealthy GitHub: https://github.com/Comcast/kuberhealthy Comcast GitHub: https://comcast.github.io/ Nithya Ruff Twitter: https://twitter.com/nithyaruff TranscriptEmily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Emily: Welcome to The Business of Cloud Native, my name is Emily Omier, and today I'm chatting with Nithya Ruff, and she's joining us from the open source program office at Comcast. Nethya, thank you so much for joining us.Nithya: Oh, it's such a pleasure to be here, Emily. Thank you for inviting me.Emily: I want to start with having you introduce yourself, you run an open source program office. And if you could talk a little bit about what that is, and what you do every day.Nithya: So, just to introduce myself, I started working in open-source back in 1998, when open-source was still kind of new to companies and organizations. And from that point on, I've been working to build bridges between companies using open-source and communities where open-source is created. At Comcast, I have the pleasure of running our open source program office for the company, and I also sit on the board of the Linux Foundation and recently was elected chair. So, it gives me a chance to both look at the community side through the LF and through corporate use of open-source at Comcast.So, you also ask what does an OSPO do? What is an OSPO, and why does Comcast have one? So, an open source program office is a fairly new construct, and it started about 10, 11 years ago, when companies were doing so much open-source that they really couldn't keep track of all of the different areas of open-source usage, contribution, collaboration across their companies. And they felt that they wanted to have a little more coordination, if you will, across all of their developers in terms of policy for use, the process for contribution, and some guidelines around how to comply with open-source licenses and, on a more strategic note, to educate both executives as well as the company in terms of open-source and opportunities from a business engagement and a strategy perspective. So, you find that a lot of large companies typically have open source program offices. And we, frankly, have been using open-source for a very long time as a company, almost since the turn of the century, around 2005. And we started contributing and our number of developers started growing, and we didn't realize that we needed a center of excellence, which is what an open source program office is, where people can come to ask for help on legal matters—meaning compliance and license matters—ask for help in engaging with open-source communities, and generally come for all things open-source; be kind of a concierge service for all things open-source.Emily: And how long has Comcast had an OSPO?Nithya: I came on board in 2017 to start the OSPO, but as I mentioned before, we've done open-source organically throughout the company for many, many more years before I came on board. My coming on board just, kind of, formalized, if you will, the face of open-source work for the company to the outside world.Emily: You know, when we think about open-source in the enterprise, what sort of business opportunities and risks do you have to balance?Nithya: That's a great question. There are lots and lots of great business value and opportunity that companies get from open-source. And the more engaged you are with open-source, the more business value you'll get. So, if you're just consuming open-source, then clearly it reduces the cost of your development, it helps you get to market faster, you're using tried and tested projects that other companies have used and hundreds of developers around the world have used. So, you get a chance to really cut cost and go to market faster. But as you become more sophisticated in collaborating with other companies and contributing open-source back, you start realizing the benefit of, say leveraging a lot of other developers in maintaining code that you've contributed. You may start off at contributing a project, and you are often the only one bearing the burden of that project, and very soon, as it becomes useful to more and more people, you're sharing the burden with others, and you benefit from hundreds of new use cases coming into the code, hundreds of new features and functions coming in which you could never have thought of as a small team yourself. I believe that the quality of code improves when you're going to open-source something, it helps with recruitment and thought leadership because now candidates can actually see the kind of work that you do and the quality of work that you produce, and before that, they would just know that you were in this space, or telecom, or other areas, but they could not see the type of work that you did. And so, to me, from a business value, there's a tremendous amount of business value that companies get. On the risk side is the fact that you need to use it correctly, meaning you need to understand the license; you need to understand how you're combining your code with the proprietary code in your company; you need to understand if the code is coming from a good community, meaning a healthy community that is here to stay, and that has a good cadence of releases and is vibrant from an activity perspective; you need to also understand that you need to be engaged with the open-source community and understand where that particular project is going and to be able to sit at the table to influence or contribute to the positive direction of that project, and sustainability of that project. So, if you just consume and don't engage, or don't understand the license implications or contribute, I think you're not getting all of the value and you risk being considered a poor citizen in the community. And frankly, if people don't collaborate with you or cooperate with you, sending a patch upstream may take months to be accepted, as opposed to someone who's part of the community, who's accepted, who's seen as a good citizen. So, I think you've got to invest correctly through either an open source program office or a really intentional and thoughtful program to engage with the community in order to really mitigate risk, but also get the full benefit of working with open-source.Emily: And what do you find you have to educate the non-engineering stakeholders about, so the business leadership when you're talking about open-source?Nithya: That's also a very important function of an OSPO in my mind, is really making sure you have executive sponsorship and business buy-in for why open-source is a key part of the innovation process in the company. Because as you correctly said, there is a level of investment one needs to make, whether it is in an OSPO or in the compliance function, or for engineers to take the time to upstream their patches or to engage with communities. It all takes investment of time and money. And business needs to buy into why this is a benefit for the company, why this is a benefit for the business. And very often, I find that leadership gets it. In fact, some of my best sponsors and champions are executives. Our CTO, Matt Zelesko, completely gets why open-source is important for the business innovation, competitive advantage. And so, also my boss Jon Moore gets it. And I found that in a previous company where I was starting an open source program office, I had to work a little harder because it was a hardware company and they did not understand how working in open-source would fit into the engineering priorities. And so we had to, kind of, share more about how it allowed us to optimize software for our hardware, how it allowed us to influence certain key dependencies that we had in our product process, and that customers were asking for more open-source based software on our product. So, yes, building the business case is extremely important, and having sponsors at the business is extremely important. The other key constituent is legal, and working with your legal team hand-in-hand, and understanding their role in assessing risk and sharing risk with you, and your role as a business saying, “Is this an acceptable risk that I want to take on? And how do I work with this risk, but still get the benefits?” is as important. We have a great legal team here, and they work very closely with us, so we act as the first line of questions for our developers. And should they have any questions about, “Should I use this license? Or should I combine this license with this?” we then try to give them as many answers as we can, and then we escalate it to legal to bring them into the discussion as well. So, we act as a liaison between us and legal. So, to your point, it is important for the business to understand. And the OSPO does a great job in many of my companies that I've worked with to educate and keep business informed of what's happening on the open-source side.Emily: You mentioned working with developers; what is the OSPO's relationship with the actual developers on the team?Nithya: So, we don't have many developers on our team. In my OSPO, I have one developer who helps us with the automation and functioning of the open source program office tools and processes. Most of us are program managers, and community managers, and developer relations managers. The developers are our customer. So, I think of the developers in Comcast as our customer, and that we are advocates for them. And their need to use open-source in a frictionless way in their development process as our objective. So, we've worked very, very hard to make sure that the information they need, the processes they need are well oiled, and that they can focus on their core priority, which is getting products to market to really help our customers. And they don't need to become experts at compliance, they don't need to become experts at any of the functions that we do. We see them as our customer, so we act as advocates.Emily: Where exactly in the organizational hierarchy, or structure is the OSPO? Is it part of the engineering team?Nithya: Yes. I think that is the best place for an OSPO reside because you really are living with the engineering organization, and you're understanding their pain points, and you're understanding the struggles that they have, and what they need to accomplish, and their deadlines, et cetera. So, we live in the product and technology organization, under our CTO, who's also part of the engineering organization. So, I find that the best OSPOs typically reside in engineering or the CTO office, there are some that reside in legal or marketing. And whenever you decide, it tends to flavor the focus of your work. For us, the focus of our work is how can we help our developers be the best developers and use the best open-source components and techniques to get their work done?Emily: And what do you see is the risk that organizations take if they ignore open-source, they don't have this, sort of, conscientious investment in either an OSPO or some other way to manage the way open-source is consumed?Nithya: This is how I would put it. Everything that you see in technology development today, a lot of the software that we consume, whether it's from vendors, or through the Cloud—public cloud, private cloud—is made up of open-source software. There's a ton—I would say, almost 50, 60 percent of infrastructure software, especially data center, cloud, et cetera, is often open-source software. So, if you don't know the dependencies you have, if you don't know the stack that you're using and what components you have, you're working blindly. And you don't know if one of those stack's components is going to go away or going to change direction. So, you really need to be cognizant of knowing what you're using, and what your dependencies are and making sure that you're working with those open-source communities to stay on top of your dependencies. You're also missing out on really collaborating with other companies to solve common problems, solve them more effectively, more collaboratively. It's a competitive advantage, frankly, and if you don't intentionally implement some sort of an OSPO, or at least someone is tagged with directing OSPO type of work in the company, you're missing out on getting the best benefits of open-source.Emily: Do you think there are any enterprises that don't use open-source?Nithya: No. I believe that every single enterprise, knowingly or unknowingly, have some amount of open-source in their product, or in their tools, or in their infrastructure somewhere.Emily: And what percentage of enterprises have—this is obviously just going to be your best guess, but what percentage of enterprises have an OSPO?Nithya: I think it's a small percentage. New Stack and the TODO Group do a very, very good survey. I would refer us to that survey. And that gives you a sense of how many companies have an OSPO. I believe it's something like 45, 50 percent have OSPOs, and then another 10, 15 percent, say we intend to start one in the next two to three years. And then there's another, I don't know 30 percent that say, I have no intention of starting one. And the reason may be because they have a group of volunteers or part-time people across their organization who are fulfilling those functions between their legal team, and a couple of expert developers, and their communications team, they may think that they have solved the problem, so they don't need to have a specialized function to do this.Emily: I wanted to ask a little bit about the relationship between cloud-native and open-source. What do you see as that relationship?Nithya: If you ask anyone—and this is my opinion as well—that cloud-native technologies are very open-source-based. Look at Kubernetes, or Prometheus, or any of the technologies under the CNCF umbrella, or under any of the cloud-native areas, you find that most of them have their roots, or are created in the open-source way of development. So, it is an integral part of participation in cloud-native is knowing how to collaborate in an open-source way. So, it makes a lot of sense that CNCF is under the Linux Foundation, and it operates like an open-source project with governance, and technical body, and contributors. So, for us as well, being a cloud company—or a company that uses Cloud to host our infrastructure, and also a user of public cloud, we think that knowledge of open-source and how to work with open-source helps us work more effectively with the cloud ecosystem. And we have contributed components like Trickster, which is a Prometheus dashboard acceleration component. We've also contributed something called Kuberhealthy, which allows you to really orchestrate across Kubernetes clusters to open-source because we know that that's the way to function, and influence, and if you will, kind of take advantage of the ecosystems in the cloud-native technology stack. So, cloud-native is all built on open-source. So, that's the relationship in my mind.Emily: Yeah. I mean, I think actually, the Linux Foundation defines cloud-native as built on open-source software. I forget the exact words.Nithya: Yep. I think so, too.Emily: What do you think are some misconceptions out there, particularly among the enterprise users, about open-source and about the role of open-source in a major company?Nithya: There are a number of misconceptions. And we talked and touched upon a few before, but I think it's worth repeating it because you need to confront these misconceptions and start engaging with open-source if you want to compete with the other companies in your industry, who all are becoming digital companies and are digitally transformed. And they need to work with open-source as part of their digital framework. So, one of the misconceptions is that vendor-supplied software or products don't have any open-source in them. In fact, a lot of vendor-supplied software, maybe even from Microsoft has some open-source in them. Even from Apple, for example. If you look at the disclosure notices, you'll see that all of them consume open-source. So, whether you like it or not, there is open-source and you need to understand and manage it. The second is not knowing what your engineers are downloading and using, and hence what you're dependent upon as a company, and whether those components are healthy, and whether those communities are doing the right thing. You need to understand what you're using. It's like a chef: you need to know your components, and the quality of the food that you create will depend upon the components you use. You'll also need to understand licenses and watch needed to comply with those licenses, and need to put process in place to comply with those licenses. You also need to give back; it's not enough to just consume and not contribute back things like bug fixes, patches, and changes you make because you end up carrying all of that load with you as technical debt if you don't upstream it. And, frankly, you also consume, so you should give back as well. It's not sufficient to just take but not give. The last one is that open-source is free. And so, many people are attracted to open-source because they think, “Ah. I don't have to pay any license fees. I can just get it, I can run it anywhere I want, and I can change it,” et cetera. But the fact of the matter is if you want to use it correctly, you do need to invest in a team that knows how to support itself, knows how to work with the community to get patches or make change happen, you need to build that knowledge in the house, and you do need to have some cost of ownership associated with using open-source. So, these are some of the major misconceptions that I see in companies that are not engaging with open-source.Emily: And what do you see as, in your experience, some of the mistakes that companies can make, even when they're in the process of setting up an OSPO? What have you learned—maybe what mistakes have you made that you wish you could go back in time and undo—and what advice would you give to somebody who was thinking about setting up an OSPO?Nithya: Couple of mistakes that come to mind is releasing a piece of code that's not been well thought through, or properly documented, or with the correct license. And you find that you get a lot of criticism for poor quality code, or poorly released projects. You end up not having anyone wanting to work with the project or contributing to the project, so the very intent of getting it out there so that others could use and collaborate with you is lost. And then sometimes companies have also made announcements saying that they want to release a particular piece of software, and they backtrack and they change their mind and they say, “No, we're not going to release it anymore.” And that looks really poor in the community because there are people who are depending upon it or wanting it, and it can affect the reputation of a company. There was one more thing which I was going to say is, is really not being a good player. For instance, keeping a lot of the conversations inside the company, in terms of governing a project or roadmap for a project, and not being transparent and sharing the direction of the project or where it's going with the community. For an open-source project, is really bad. It can affect how you're perceived and how you're trusted or not trusted in the community. So, it's important to understand the norms of open-source, which is transparency, collaboration, contribute small pieces often, versus dumping a big piece of code or surprising the community. So, all of these things are important to consider. And, frankly, an OSPO, helps you really understand how community behaves: we often do a lot of education on how to work with community inside the company, and we also represent the company's interests in communities and foundations and say, “This is where we are going. This is where we need your help.” And the more transparent you are, the better you can work with community. So, those are some areas where I've seen companies go wrong.Emily: And when a developer who works at Comcast contributes to a project is he or she contributing as an individual or as part of the company? And how is that, sort of, almost, tension navigated?Nithya: Most companies have a policy that any work that you do during your workday or on work equipment, is company property, right? And so it's copyrighted as Comcast, and most of our developers will contribute things under their Comcast email id. And that's fairly normal in the industry. And there are times when developers want to do work on their own time for their own pet projects, and they can do it under their personal emails and their personal equipment. So, that's where the industry draws the line. Of course, there are some companies that are very loose about this type of demarcation, and some companies are incredibly tight depending upon the industry they're in, regulated versus high tech. But we are very encouraging of our developers to contribute code, whether on their own time or during company time, and we make the process extremely easy. We have a very lightweight process where they submit a request to contribute, and the OSPO shepherds that contribution through legal, through security, through other technical reviewers, and all in the interest of making sure we provide guardrails for the developer so that he or she does it successfully and looks good when they make the contribution. So, 95 percent of the time, we approve requests for contributions. So, very, very rarely do we say, “This is not approved,” because we think it's the right thing to do to give back and to share some of the work that we do with others, just like we get the benefit of using others' work.Emily: Is there anything else that you want to add about what OSPOs do, what they bring to the business, the relationship between cloud-native and open-source, anything that I haven't thought to ask that I should have?Nithya: The OSPO, if you will, is a horizontal function that cuts across the entire enterprise development and helps coordinate and direct the intelligent and judicious use of open-source. So, that's why it touches all of the software development tools, apps, vendor-supplied software, public clouds, internal clouds, et cetera. Wherever open-source is used, which is everywhere, we touch it. And we also serve as the external face of the company to the open-source community so that the open-source community has one place that they can come to for questions, or to give feedback on something they're doing, or to ask a license question, or to ask for sponsorship or support for a conference or a foundation. So, it really makes open-source navigation very, very effective for the community, as well as for inside the company. So, I'm a huge, huge fan of OSPO. I also love running an OSPO, and the kind of people that are typically in an OSPO. They tend to be very versatile, very general, they can pivot from legal to development matters to marketing and communications to really assisting a developer navigate something challenging. So, they're very versatile and terrific type of people. They also tend to have very high EQ and tend to make sure that they have a service mentality when they take care of questions that come in. So, I would say an OSPO is a great role for someone who wants to help and wants to know the breadth of software development.Emily: It sounds like you're making a recruitment pitch.Nithya: Uh, yeah. I don't have any openings right now, but I'm always encouraging and mentoring other OSPOs. I do at least one or two consultations with other OSPOs because we enjoy what we do as an OSPO and we want to help other OSPOs be successful.Emily: I mean, is it hard to find people to work in OSPOs?Nithya: It's kind of hard, in the sense that there are not too many people who do this work. So, I know, practically, I know all of the OSPO leadership and people who do this line of work in the industry. And it takes—some who come from a developer background. They have grown up as a developer using open-source and know the pains that they faced inside their company using open-source and not having certain processes or certain support or tools, and they go out to change the world in that way. I came from a different direction. I came from strategy and product management, and I came with the notion of, “How do I connect the dots better across the organization? How do I make sure that people know what to do and how to build relationships?” So, I came from that perspective. Frankly, I think it's something that innately people have, which is the ability to absorb a lot of different types of knowledge and connect the dots and work to change things. You don't have to be born in open-source to be a good OSPO person. You just need to have a desire to help developers.Emily: Was there any tools—it doesn't, obviously, have to be a software development tool, but any tools that you could not do your work without?Nithya: More than tools, I would say the organization that we rely on very heavily is the TODO Group in the Linux Foundation because it is a group of other OSPO people. And so it's been a great exchange of ideas and support, and tips, and best practices. The couple of tools that we use very, very heavily, and the love using, clearly, is something like GitHub or GitLab which helps you coordinate and collaborate on software development and documentation, et cetera. The other tool we use a lot to build community inside the company is things like Slack, or Slack equivalents because it helps you create communities of interest. So, when we are doing something around CNCF, we have a CNCF channel. We have a very, very large open-source channel that people come in and ask questions, and the whole community gets involved in helping them. So, I would say those are two really good tools that I like, and we use a lot in our function. And the TODO Group I think is a fabulous organization.Emily: And where should listeners go to learn more and/or to follow or connect with you?Nithya: There are two places I would say. comcast.github.io is where we publish all of our open-source projects, and you can see the statement we make about open-source. We also feature job openings at Comcast as well as our Innovation Fund, which is a grant-based fund request, so people can make a request for us to contribute money towards their project, or to research. And I'm on Twitter at @nithyaruff.Emily: Well, thank you so much, Nithya. This has been really fabulous.Nithya: Thanks, Emily. And thank you for helping me share my enthusiasm for what an open-source office is, and why everybody needs one.Emily: Thanks for listening. I hope you've learned just a little bit more about The Business of Cloud Native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier—that's O-M-I-E-R—or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

business apple microsoft cloud cto slack real world eq github comcast prometheus ruff tricksters kubernetes gitlab cloud native linux foundation cncf innovation fund nithya jon moore new stack emily it emily omier ospo emily well humblepod

Disrupting the Cloud Storage Market with Ben Golub

The Business of Open Source

Play Episode Listen Later Oct 7, 2020 24:52

This conversation covers: The advantages of using a distributed data storage model. How Storj is creating new revenue models for open-source projects, and how the open-source community is responding. The business and engineering reasons why users decide to opt for cloud-native, according to Ben. Viewing cloud-native as a journey, instead of a destination — and some of the top mistakes that people tend to make on the journey. Ben also talks about the top pitfalls people make with storage and management. Why businesses are often caught off guard with high storage costs, and how Storj is working to make it easier for customers. Avoiding vendor lock-in with storage. Advice for people who are just getting started on their cloud journey. The person who should be responsible for making a cloud journey successful. Links: Storj Labs: https://storj.io/ Twitter: https://twitter.com/golubbe GitHub: https://github.com/golubbe TranscriptEmily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Emily: Welcome to The Business of Cloud Native, my name is Emily Omier. I'm your host, and today I'm chatting with Ben Golub. Ben, thank you so much for joining us.Ben: Oh, Thank you for having me.Emily: And I always like to just start off with having you introduce yourself. So, not only where you work and what your job title is, but what you actually spend your day doing.Ben: [laughs]. Okay. I'm Ben Golub. I'm currently the executive chair and CEO of Storj Labs, which is a decentralized storage service. We kind of like to think of it as the Airbnb of disk drives, But probably most of the people on your podcast who, if they're familiar with the, sort of, cloud-native space would have known me as the former CEO of Docker from when it was released up until a few years ago. But yeah, I tend to spend my days doing a lot of stuff, in addition to family and dealing with COVID, running startups. This is now my seventh startup, fourth is a CEO.Emily: Tell me a little bit, like, you know, when you stumble into your home office—just kidding—nobody is going to the office, I know. But when you start your day, what sort of tasks are on your todo list? So, what do you actually spend your time doing?Ben: Sure. We've got a great team of people who are running a decentralized storage company. But of course, we are decentralized in more ways than one. We are 45 people spread across 15 different countries, trying to build a network that provides enterprise-grade storage on disk drives that we don't own, that are spread across 85 different countries. So, there's a lot of coordination, a lot of making sure that everybody has the context to do the right thing, and that we stay focused on doing the right thing for our users, doing the right thing for our suppliers, doing the right thing for each other, as well.Emily: One of the reasons I thought it'd be really interesting to talk with you is that I know your goal is to, sort of, revolutionize some of the business models related to managing storage. Can you talk about that a little bit more?Ben: Sure. Sure. I mean, obviously, there's been a big trend over the past several years towards the Cloud in general, and a big part of the [laughs] Cloud is storage. Actually, AWS started with S3, and it's a $90 billion market that's growing. The world's going to create enough data this year to fill a stack of CD-ROMs, to the orbit of Mars and back. And yet prices haven't come down, really, in about five years, and the whole market is controlled by essentially three players, Microsoft, Google, in the largest, Amazon, who also happen to be three of the five largest companies on the planet. And we think that data is so critical to everything that we do that we want to make sure that it doesn't stay centralized in the hands of a few, but that we, sort of, create a more, sort of, democratic—if you will—way of handling data that also addresses some of the serious privacy, data mining, and security concerns that happen when all the data is held by only a few people.Emily: With this, I'm sure you've heard about digital vegans. So, people who try to avoid all of the big tech giants—Ben: Right, right.Emily: Does this make it possible to do that?Ben: Well, so we're more of a back end. So, we're a service that people who produce-consumer-facing services use. But absolutely, if somebody—and we actually have people who want to create a more secure way of providing data backup, more secure way of enabling data communications, video sharing, all these sorts of things, and they can use us and service those [laughs] digital vegans, if you will.Emily: So, if I'm creating a SaaS product for digital vegans, I would go with you?Ben: I would hope you'd consider us, yeah. And by the way, I mean, also people who have mainstream applications use us as well. I mean, so we have people who are working with us who may have sensitive medical data on people, or people who are doing advanced research into areas like COVID, and they're using us partially because we're more secure and more private, but also because we are less likely to be hacked. And also because frankly faster, cheaper, more resilient.Emily: I was just going to ask, what are the advantages of distributed storage?Ben: Yeah. We benefit from all the same things that the move towards cloud-native in general benefits from, right? When you take workloads, and you take data, and you spread them across large numbers of devices that are operated independently, you get more resilience, you get more security, you can get better performance because things are closer to the edge. And all of these are benefits that are, sort of, inherent to doing things in a decentralized way as opposed to a centralized way. And then, quite frankly we're cheaper. I mean, because of the economics and doing this this way, we can price anywhere from a half to a third of what the large cloud providers offer, and do so profitably for ourselves.Emily: You also offer some new revenue models for open-source projects. Can you talk about that a little bit more?Ben: Sure, I mean, obviously I come from an open-source background, and one of the big stories of open-source for the past several years is the challenges for open-source companies in monetizing, and in particular, in a cloud world, a large number of open-source companies are now facing the situation where their products, completely legally but nonetheless, not in a fiscally sustainable way, are run by the large cloud companies and essentially given away as a loss leader. So, that a large cloud company might take a great product from Mongo or Redis, or Elastic, and run it essentially for free, give it away for free, not pay Mongo, Elastic, or Redis. And the cloud companies monetize that by charging customers for compute, and storage, and bandwidth. But unfortunately, the people who've done all the work to build this great product don't have the opportunity to share in the monetization. And it makes it really very hard to adopt a SaaS model for the cloud companies, which for many of them is really the best way that they would normally have for monetizing their efforts. So, what we have done is we've launched a program that basically turns it on its head and says, “Hey, if you are an open-source project, and you integrate with us in a way that your users send data to us, we'll share the revenue back with you. And as more of your users share more data with us, we'll send more money back to you.” And we think that that's the way it should be. If people are building great open-source projects that generate usage and revolutionize computing, they should be rewarded as well.Emily: How important is this to the open-source community? How challenging is it to find a way to support an open-source project?Ben: It's critical. I mean, if you look at the most—I'd start by saying two-thirds of all cloud workloads are open-source, and yet in the $180 billion cloud market, less than $5 billion [unintelligible] going back to the open-source projects that have built these things. And it's not easy to build an open-source project, and it takes resources. And even if you have a large community, you have developers who have families, or [laughs] need to eat, right? And so, as an open-source company, what you really want to be able to do is become self-sustaining. And while having contributions is great, ultimately, if open-source projects don't become self-sustaining, they die. Emily: A question just, sort of, about the open-source ethos: I mean, how does the community about open-source feel about this? It is obvious developers have to eat just like everybody else, and it seems like it should be obvious that they should also be rewarded when they have a project that's successful. But sometimes you hear that not everybody is comfortable with open-source being monetized in any way. It's like a dirty word.Ben: Yeah. I mean, I think [unintelligible] some people who object to open-source being monetized, and that tends to be a fringe, but I think there's a larger percentage that don't like the notion that you have to come up with a more restrictive license in order to monetize. And I think unfortunately a lot of open-source companies have felt the need to adopt more restrictive licenses in order to prevent their product being taken and used as a loss leader by the large cloud companies. And I guess our view is, “Hey, what the world doesn't need is a different kind of license. It needs a different kind of cloud.” And that's, and that's what we've been doing. And I think our approach has, frankly, gotten a lot of enthusiasm and support because it feels fair. It's not, it's not trying to block people from doing what they want to do with open-source and saying, “This usage is good, this is bad.” It's just saying, “Hey, here's a new viable model for monetizing open-source that is fair to the open-source companies.”Emily: So, does Storj just manage storage? Or, where's the compute coming from?Ben: It's a good question. And so, generally speaking, the compute can either be done on-premise, it can be done at the end. And we're, sort of, working with both kinds. We ourselves don't offer a compute service, but because the world is getting more decentralized, and because, frankly, the rise of cloud-native approaches, people are able to have the compute and the storage happening in different places.Emily: How challenging is it to work with storage, and how similar of an experience is it to working with something like AWS for an end-user? I just want to get my app up.Ben: Sure, sure. If you have an S3 compatible application, we're also S3 compatible. So, if you've written your application to run on AWS S3, or frankly, these days most people use the S3 API for Google and Microsoft as well, it's really not a big effort to transition. You change a few lines of code, and suddenly, the data is being stored in one place versus the other. We also have native libraries in a lot of different languages and bindings, so for people who want to take full advantage of everything that we have to offer, it's a little bit more work, but for the most part, our aim is to say, “You don't have to change the way that you do storage in order to get a much better way of doing storage.”Emily: So, let me ask a couple questions just related to the topic of our podcast, the business of cloud-native. What do you think are the reasons that end users decide to go for cloud-native?Ben: Oh, I think there are huge advantages across the board. There are certainly a lot of infrastructural advantages: the fact that you can scale much more quickly, the fact that you can operate much more efficiently, the fact that you are able to be far more resilient, these are all benefits that seemed to come with adopting more cloud-native approaches on the infrastructure side if you will. But for many users, the bigger advantages come from running your applications in a more cloud-native way. Rather than having a big monolithic application that's tied tightly to a big monolithic piece of hardware, and both are hard to change, and both are at risk, if you write applications composed of smaller pieces that can be modified quickly and independently by small teams and scale independently, that's just a much more scalable, faster way to build, frankly, better applications. You couldn't have a Zoom, or a Facebook, or Google search, or any of these massive-scale, rapidly changing applications being written in the traditional way.Emily: Those sound kind of like engineering reasons for cloud-native. What about business reasons?Ben: Right. So, the business reasons [unintelligible], sort of, come alongside. I mean, so when you're able to write applications faster, modify them faster, adapt to a changing environment faster, do it with fewer people, all of those end up having real big business benefits. Being able to scale flexibly, these give huge economic benefits, but I think the economic benefits on the infrastructure side are probably outweighed by the business flexibility: the fact that you can build things quickly and modify them quickly, and react quickly to changing environment, that's [unintelligible]. Obviously, again, you use Zoom as an example. There's this two-week period, back in March, where suddenly almost every classroom and every business started using Zoom, and Zoom was able to scale rapidly, adapt rapidly, and suddenly support that. And that's because it was done in a more—in a cloud-native way.Emily: I mean, it's interesting, one of the tensions that I've seen in this space is that some people like to talk a lot about cost benefits. So, we're going to move to cloud-native because it's cheap, we're going to reduce costs. And then there's other people that say, well, this isn't really a cost story. It's a flexibility and agility, a speed story.Ben: Yeah, yeah. And I think the answer is it can be both. What I always say, though, is cloud-native is not really a destination, it's a journey. And how far we go along with that path, and whether you emphasize the operational side versus—or the infrastructural side versus the development side, it sort of depends on who you are, and what your application is, and how much it needs to scale. And it's absolutely the case that for many companies and applications if they try to look like Google from day one, they're going to fail. And they don't need to because it's—the way you build an application that's going to be servicing hundreds of million people is different than the way you build an application, there's going to be servicing 50,000 people.Emily: What do you see is that some of the biggest misconceptions or mistakes that people make on this journey?Ben: So, I think one is clearly that they knew it as an all or nothing proposition, and they don't think about why they're going on the journey. I think a second mistake that they often make is that they underestimate the organizational change that it takes to build things in the cloud-native way. And obviously, the people, and how they work together, and how you organize, is as big transition for many people as the tech stack that you'd use. And I think the third is that they don't take full advantage of what it takes to move a traditional application to run it in a cloud-native infrastructure. And you can get a lot of benefits, frankly, just by containerizing or Docker-izing a traditional app and moving it online.Emily: What about specifically related to storage and data management? What do you think are some misconceptions or pitfalls?Ben: Right. So, I think that the challenge that many people have when they deal with storage is that they don't think about the data at rest. They don't think about the security issues that are inherent in having data that can be attacked in a single place, or needs to be retrieved from a single place. And part of why we built Storj, frankly, is a belief that if you take data and you encrypt it, and you break it up into pieces, and you distribute those pieces, you actually are doing things in a much better way that's inherent, that you're not dependent on any one data center being up, or any one administrator doing their job correctly, or any password being strong. By reducing the susceptibility to single points of failure, you can create an environment that's more secure, much faster, much more reliable. And that's math. And it gets kind of shocking to see that people who make the journey to cloud-native, while they're changing lots of other aspects of their infrastructure and their applications, repeating the same mistakes that people have been making for 30 years in terms of data access, security, and distribution.Emily: Do you think that that is partially a skills gap?Ben: It may be a skills gap, but it's also, frankly, there's been a dearth of viable other options. And I think that—we frequently when I'm talking with customers, they all say, “Hey, we've been thinking about being decentralized for a while, but it just has been too difficult to do.” Or there have been decentralized options, but they're, sort of, toys. And so, what we've aimed to do is create a decentralized storage solution that is enterprise-grade, is S3 compatible, so it's easy to adopt, but that brings all the benefits of decentralization.Emily: I'm also just curious because of the sort of organizational changes that need to happen. I mean, everybody, particularly in a large organization, is going to have these super-specific areas of expertise, and to a certain extent, you have to bring them all together.Ben: You do. Right. You do have to. And so I'm a big believer in you pick pilot projects that you do with a small team, and you get some wins, and nothing helps evangelize change better than wins. And it's hard to get people to change if they don't see success, and a better world at the end of the tunnel. And so, what we've tried to do, and what I think people doing in the cloud-native journey often do, is you say, “Let's take a small low-risk application or small, low-risk dataset, handle it in a different way, and show the world that it can be done better,” right? Or, “Show our organization that it can be better.” And then build up not only muscle memory around how you do this, but you build up natural advocates in the organization.Emily: Going back to this idea of costs, you mentioned that Storj can reduce costs substantially. Do you think a lot of organizations are surprised at how much cloud storage costs?Ben: Yes. And unfortunately, it's a surprise that comes over time. I mean, you… I think the typical story if you get started with Cloud. And there's not a lot of large upfront costs when your usage is low. So, yeah, so you start with somebody pulling out their credit card and building their pilot project, and just charging themselves directly to charging themselves directly to—you know, charging their Amazon, or their Google, or their Microsoft directly to their credit card, then they move to paying through a centralized organization. But then as they grow, suddenly, this thing that seemed really low price becomes very, very expensive, and they feel trapped. And data, in particular, has this—in some ways, it grows a lot faster than compute. Because, generally speaking, you're keeping around the data that you've created. So, you have this base of data that grows so slowly that you're creating more data every day, but you're also storing all the data that you've had in the past. So, it grows a lot more exponentially than compute, often. And because data at rest is somewhat expensive to move around, people often find themselves regretting their decisions a few months into the project, if they're stuck with one centralized provider. And the providers make it very difficult and expensive to move data out.Emily: What advice would you have to somebody who's at that stage, at the just getting started, whipping out my credit card stage? What do you do to avoid that sinking feeling in your stomach five months from now?Ben: Right. I mean, I guess what I would say is that don't make yourself dependent on any one provider or any one person. And that's because things have gotten so much more compatible, and that's on the storage side by the things that we do, on the compute side by the use of containers and Docker. You don't need to lock yourself in, as long as you're thoughtful at the outset.Emily: And who's the right person to be thinking about these things?Ben: That's a good question. So, you know, I'd like to say the individual developer, except developers for the most part, they have something that they want to build, [laughs] they want to get it built as fast as possible and they don't want to worry about infrastructure. But I really think it's probably that set of people that we call DevOps people that really should be thinking about this, to be thinking not only how can we enable people to build and deploy and secure faster, but how can we build and secure and deploy in a way that doesn't make us dependent on centralized services?Emily: Do you have other pieces of advice for somebody setting out on the “Cloud journey,” in quotes, too basically avoid the feeling, midway through, that they messed up.Ben: So, I think that part of it is being thoughtful about how you set off on this cloud journey. I mean, know where you want to end up, I think this [unintelligible]. You want to set off on a journey across the country, it's good to know that you want to end up in Oregon versus you want to end up in Utah, or Arizona. [unintelligible] from east to west, and making sure your whole organization has a view of where you want to get. And then along the way, you can say, “You know what? Let's course-correct.” But if you are going down on the cloud journey because you want to save money, you want to have flexibility, you don't want to be locked in, you want to be able to move stuff to the edge, then thinking really seriously about whether your approach towards the Cloud is helping you achieve those ends. And, again, my view is that if you are going off on a journey to the Cloud, and you are locking yourself into a large provider that is highly centralized, you're probably not going to achieve those aims in the long run.Emily: And then again, who is the persona who needs to be thinking this? And ultimately, whose responsibility is it to make a cloud journey successful?Ben: So, I think that generally speaking, a cloud journey past these initial pilots where I think pilots are often, it's a small team that are proving that things can be done in a cloud-native way, they should do whatever it takes to prove that something can be done, and get some successes. But then I think that the head of engineering, the Vice President of Operations, the person who's heading up DevOps should be thoughtful, and should be thinking about where the organization is going, from that initial pilot into developing the long-term strategy.Emily: Anything else that you'd like to add?Ben: Well, these are a lot of really good questions, so I appreciate all your questions and the topic in general. I guess I would just add, maybe my own personal bias, that data is important. The cloud is important, but data is really important. And as, you know, look at the world creating enough data this year to fill a stack of CD-ROMs, to the orbit of Mars and back, some of that is cat videos, but also buried in there is probably the cure to COVID, and the cure for cancer, and a new form of energy. And so, making it possible for people to create, and store, and retrieve, and use data in a way that's cost-effective, where they don't have to throw out data, that is secure and private, that's a really noble goal. And that's a really important thing, I think, for all of us to embrace.Emily: Just a couple of final questions. The first one, I just like to ask everybody, what is your favorite can't-live-without software engineering tool?Ben: Honestly, I think that collaboration tools, writ large, are important. And whether that's things like GitHub, or things like video conferencing, or things like shared meeting spaces, it's really the tools enable groups of people to work together that I think are the most important.Emily: Where can people connect with you or follow you?Ben: Oh, so I'm on Twitter, @golubbe, G-O-L-U-B-B-E. And that's probably the best place to initially reach out to me, but then I [blog], and I'm on GitHub as well. I'm not that great [unintelligible].Emily: Well, thank you so much for joining us. This was a great conversation.Ben: Oh, thank you, Emily. I had a great conversation as well.Emily: Thanks for listening. I hope you've learned just a little bit more about The Business of Cloud Native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier—that's O-M-I-E-R—or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

covid-19 ceo amazon google business advice zoom arizona vice president market microsoft oregon mars utah airbnb operations cloud saas real world disrupting aws github devops s3 docker kubernetes elastic cd roms mongo cloud native redis cloud storage golub aws s3 storj ben it ben you ben oh ben so ben well emily omier ben right emily well humblepod

Enabling Cloud Native Environments with Gou Rao

The Business of Open Source

Play Episode Listen Later Sep 16, 2020 29:38

The conversation covers: Gou's role as CTO of Portworx, and how he works with customers on a day to day basis. Common pain points that Gou talks about with customers. Gou explains how he helps customers create agile and cost-effective application development and deployment environments. The types of people that Gou talks to when approaching customers about cloud native discussions. Why customers often struggle with infrastructure related problems during their cloud native journeys, and how Gou and his team help. Common misconceptions that exist among customers when exploring cloud native solutions. For example, Gou mentions moving to Kubernetes for the sake of moving to Kubernetes. Gou's thoughts on state — including why there is no such thing as an end-to-end stateless architecture. Some cloud native vertical trends that Gou is noticing taking place in the market. The issue of vendor lock-in, and how data and state fit into lock-in discussions. Gou's opinion on where he sees the cloud native ecosystem heading. Links Portworx: https://portworx.com/ Portworx Blog: https://portworx.com/blog/ Gou Rao Email: mailto:gou@portworx.com TranscriptEmily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Emily: Welcome to The Business of Cloud Native, I'm your host Emily Omier, and today I am chatting with Gou Rao. Gou, I want to go ahead and have you introduce yourself. Where do you work? What do you do?Gou: Sure. Hi, Emily, and hi to everybody that's listening in. Thanks for having me on this podcast. My name is Gou Rao. I'm the CTO at Portworx. Portworx is a leader in the cloud-native storage space. We help companies run mission-critical stateful applications in production in hybrid, multi-cloud, and cloud-native environments.Emily: So, when you say you're CTO, obviously that's a job title everyone, sort of, understands. But what does that mean you spend your day doing?Gou: Yeah, it is an overloaded term. As a CTO, I think CTOs in different companies wear multiple hats doing different things. Here at Portworx, technically I'm in charge of this company strategy and technical direction. What does that mean in terms of my day to day activities? And it's spending a lot of time with customers understanding the problems that they're trying to solve, and then trying to build a pattern around what different people in different industries and companies are doing, and then identifying common problems and trying to bring solutions to market, by working with our engineering teams, that sort of address, holistically, the underlying areas that I see people try and craft solutions around, whether it's enabling an agile development environment for their internal developers, or cost optimization, there's usually some underlying theme, and my job is to identify what that is, and come up with a meaningful solution that addresses a wide segment of the market.Emily: What are the most common pain points that you end up talking to customers about?Gou: Over the past, I think, eight-plus years or so—I think the enterprise software space goes through iterations in the types of problems that are being solved. Over the past eight-plus years or so, it really has been around this—we use this term cloud-native—enabling cloud-native environments. And what does that really mean? In talking to customers, what this is really meant recently is enabling an agile application development and deployment environment. And let's even define what that is. Me as an application developer, I have to rely on traditional IT techniques where there's a separate storage department, compute department, networking department, security department, and I have to interact with all of them just to develop and try out an application. But that really is impeding me as a developer from how fast I can iterate and build product and get it out there, so by and large, the common underlying theme has been, “Make that process better for me.” So, if I'm head of infrastructure how can I enable my developers to build and push product faster? So, getting that agility up in a sense where it makes—cost-wise, too, so it has to make cost sense—how do I enable an efficient, cost-efficient development platform? That has been the underlying theme. That sort of defines a set of technologies that we call cloud-native, and so orchestration tools like Kubernetes, and storage technologies like, hopefully, what we're doing at Portworx, these are all aimed at facilitating that. That's been sort of what we've been focused on over the past couple of years.Emily: And when you talk to customers, do they tend to say, “Hey, we need to figure out a way to increase our development velocity?” Or do they tend to say, “We need a better solution for stateful applications?” What's the type of vocabulary that they're attempting to use to describe their problems, and how high-level do they usually go?Gou: That's a good question. Both. So, the backdrop really is, “Increase my development velocity. Make it easier for me to put product out there faster.” Now, what does it take to get there? So, the second-order problems then become do I run in the public cloud, private cloud? Do I need help running stateful applications? So, these are all pillars that support the main theme here, which is increasing development velocity. So, the primary umbrella under which our customers are operating under is really around increasing the development velocity in a way that makes cost sense. And if you double-click on that and look at the type of problems that they're solving, they would include, “How do I efficiently run my applications in a public cloud? Or a hybrid cloud? How do I enable workflows that need to span multiple clouds?” Again because maybe they're using cloud provider technologies, like either compute resources, or even services that a cloud provider may be offering, so that, again, all of this so that they can increase their development velocity.Emily: And in the past, and to a certain extent now, storage was somewhat of a siloed area of expertise. When you're talking to customers, who are you talking to in an organization? I mean, is it somebody who's a storage specialist or is it someone who's not?Gou: No, they're not. So, that's been one of the things that have really changed in this ecosystem, which is the shift away from this kind of like, hey, there's a storage admin and a storage architect, and then there's a compute admin or BM admin or a security admin, that's really not who are driving this because if you look at that—that world really thinks in terms of infrastructure first.Actually, let me just take a step back for a second. One of the things that has actually changed here in the industry is this: a move from a machine-centric organization to an application-centric organization. So, let me explain what that means. Historically, enterprises have been run by data centers that have been run by a machine-centric control plane. This is your typical VMware type of control plane where the most important concept in a data center is a machine. So, if you need storage, it's for a machine. If you need an IP address, it's for a machine. If you want to secure something, you're securing a machine. And if you look at it, that really is not what an enterprise or business is trying to solve. What enterprises and businesses are trying to do is put their product out there faster. And so for them, what is more important? It's an application. And so what it's actually changed here? And one of the things that defines this cloud-native movement, at least in my mind, is this move from a machine-centric control plane to an application-centric control plane where the first-class citizen or the most important thing in an enterprise data center is not a machine, it's an application. And really, that's where technologies like Kubernetes and things like that come in.So, now your question is, who do we talk to? We don't talk to a storage administrator or machine-centric administrator; we talk to the people that are more focused on building and harnessing that application-centric control plane. So, these are people like a cloud architect, or it's a CIO level—or a CTO level driven decision to enable their application developers to move faster. These are application owners, application architects, it's that kind of people. You had an additional question there which is, are they storage experts? And so by definition, these people are not. So, they know they need storage, they know they need to secure their applications, they know they need networking, but they're not experts in any one of those domains. There are more application-level architects and application experts.Emily: Do you find that there tends to be some knowledge gaps? If so, is there anything that you keep sort of repeating over and over again when you have these conversations?Gou: So, it's not about having a knowledge gap, it's more about solving an infrastructure problem—which is what storage is—is not necessarily their primary task. So, in a sense that they're not experienced with that, they know that they need infrastructure support, they know they need storage support, and networking support, but they expect that to be part of the core platform. So, one of the things that we've had to do—and I suspect others in this cloud-native ecosystem—is to take all of that the heavy lifting that storage software would do and then package it in such a way that it's easy to consume by application architects; easy to consume by people that are putting together this cloud-native platform. So, dealing with things like drive failures, or how do I properly [unintelligible] my data or RAID protect my data, or how do I deal with backups? These are things that they kind of know they need me to do, but they're not really experienced with it, and they expect the cloud-native software platforms to handle that functionality for them. So, in other words, doing the job of a storage admin, but in a form of software has been really important.Emily: Do you find that there's any common misconceptions out there?Gou: Yeah. With any new technology or evolving space, I think there are bound to be some misconceptions in how you approach things. So, just moving to Kubernetes for the sake of moving to Kubernetes, for instance, is a common—or improperly embracing is as a common mistake that we see. You really need to think about why you're moving to this platform, and if you're really doing it to embrace developer agility. For instance, one mistake we see, and this is especially true with the ecosystem we work in because Portworx is enabling storage technologies in Kubernetes. We see people try and take Kubernetes and leverage legacy infrastructure tools like either NFS or connecting their Kubernetes systems to storage arrays—which are really machine-centric storage arrays—over protocols like iSCSI, and you kind of look at that, and what is the problem with that? And one way I like to describe it is if you take an agile platform like Kubernetes, and then you make that platform rely on legacy infrastructure, well, you're kind of bringing down the power of Kubernetes to the lowest common denominator here, which is your legacy platform. And your Kubernetes platform is only going to be as agile as the least nimble element in the stack. And so a mis-pattern that we see here is where people try and take them, and then they say, “Well, geez, I'm not getting the agility out of Kubernetes, I don't really see what all the fuss about this is. I'm still moving IT tickets around, and make developers still rely on storage admins.” And then we have to tell them, “Well, that's because you've kind of tied your Kubernetes platform down to legacy infrastructure, and let's now think about modernizing that infrastructure.” And that's sort of—I do find myself in a spot where we have to educate the partners about that. And they eventually see that and hopefully, that's why they work with us.Emily: Why do you think they tend to make that mistake?Gou: It's what's lying around, right? It's the ease of just leveraging the tools and the equipment you already have, not really fully understanding the problem all the way through. It takes time for people to try out a pattern—take Kubernetes, try and see what you have lying around, connect it to learn from mistakes. And so that really takes some time. They look at others to see what others are doing, and it's not immediately evident. I think, with any new technology, there's always some learning that goes along with it.Emily: Well, let's talk a little bit about stateful apps in general. Why do people need stateful apps? What business problem do stateful apps deal with, or make it possible to solve that you just can't do with stateless?Gou: Sure. Yeah very rarely is there really something that is a stateless app, so unless you're doing some sort of ephemeral experiment—and there are certain use cases for that—especially in enterprises, you always rely on data. And data is essentially your state in some shape or form, whether it's short-lived state, long-lived state, there's always some state and data that's involved in an application. And people try to think of things in terms of stateless, and what that really means is, they're punting the state problem to some other part of their overall solution; it doesn't really go away. What do I mean by that? Well, if you put all your stateless apps on your cloud-native platform, where did the state go? You're probably dealing with it in some VM or some bare-metal machine. It didn't really go away; it's there, and you're still left with trying to connect to it, and access it, and manage it. And then you start wondering, what kind of northbound impact is that having on your, what you think is your stateless side of things. And it has an impact. Anytime you need to make changes to your data structure, it's not like your stateless side of things is not impacted, so it kind of halts the entire pipeline. So, what we see happening as a pattern is people finally understanding that there's really no such thing as an end-to-end stateless architecture—especially in enterprises—and that they need to embrace the state and manage it in a cloud-native way. And so, that's really where a lot of this talk you see around these days, around how do you manage stateful applications in Kubernetes, that's where that is coming from. How do you govern it? How do you enforce things like RBAC on it? How do you manage its accessibility? How do you deal with its portability, because state has gravity? These are the main topics these days that people have to think about.Emily: Can you think of any misconceptions other than the ones that you've just mentioned that are specifically related to state?Gou: Sure. I think costs—well, less a misconception than I think not fully appreciating or understanding the impact that this would have, and it really is around cost and especially when running in the public cloud. People underestimate the cost associated with state in the public cloud, and especially when you're enabling an agile platform, with agility comes—really determines automation, and people tend to do a lot of things and so, when you have a lot of state laying around, the cost can add up. Simple example is if you have a lot of volumes that are sitting around in, let's say, Amazon, well, you're paying for those bills. And density is another related concept, which is, if you're running thousands of applications, and thousands of databases, or thousands of volumes, how do you manage that with not spending that much money on the resources? How do you virtually manage that? So, cost planning in the public cloud is something that I see is so commonly overlooked or not really fully thought through. And it does come to bite people down the line because those bills stack up quickly. So, coming up with an architecture where that is well thought through becomes really important.Emily: When do people tend to think about state when they're thinking about a digital transformation towards cloud-native?Gou: I think, Emily, state is something that is front and center of any enterprise architecture because it really—look at it this way, any application you're going to build, generally speaking, revolves around the data structures and the data it operates on. You really typically start with—and there's a few different ways in which you start with cracking your application stack, right? One is you start with the user experience, like what end-user experience you want to fulfill, what is the problem you're solving? Another core element is then what are the data structures needed to solve that problem? So, state becomes an important thing right from day one. Now, in terms of managing it, people can choose to defer that problem saying, “I'm not going to deal with managing the state in my cloud-native platform.” This goes to your comment about stateless architectures. Again, if they do that, then they're punting the problem until the point where they actually need to implement it and manage it in production. Either way, that problem comes up. You're thinking about the data structures and development time for sure, you can avoid that. In terms of embracing how you're going to run it, it definitely comes into place as you're rolling up your application as you're getting close to production because somebody has to manage that.Emily: And there are probably people out there who would argue, hey, cloud-native means stateless. What would you say to those people?Gou: Yeah, I think we've had this debate many years ago, too. This notion of ‘twelve-factor applications' is kind of where it started, and I think people realized that, unless you're dealing with an architecture where state is sort of a small subset of the overall product offering, where really a lot of value is in maybe your UI, or it's a web app where people are interacting with each other and there's just a small amount of data sitting in a database that's recording a few messages, there really is no such thing as a safe plus architecture. So, in that case, what you could do is you architect your database, maybe just once and you're really not touching it—any changes you're making to your application are more cosmetic—then you can put your state somewhere else and an external database, have an endpoint that your applications interact with. Keep in mind, though, that still—that's not stateless; you've just put your state outside of your cloud-native platform. But what I would say to your question, what would you say to somebody that says that's the way to do things, my point is that's not how enterprise applications and architectures work. They're more complex than that, and the state is more central, the data is more central to the application, where changes intimately involve changing how your data structure is organized, changing tables around. In that case, it doesn't make sense to treat that outside of your cloud-native stack: because you're making so many changes to it, you're then going to lose the agility that the cloud-native platform can offer compared to bringing those microservice databases into your Kubernetes platform, letting the developers make the data structure changes they need to do as frequently as they need to, all within the same platform, within the same control plane. That makes a better enterprise architecture.Emily: Sort of a different type of question. But I'm just curious if there's anything that continues to surprise you about the cloud-native ecosystem, even about your conversations with customers, what continues to surprise you, but also what continues to surprise them?Gou: I was a little bit more surprised a couple of years ago, but does continue to surprise me is the mixing of the cloud-native and the legacy tools that still happens, and it's not just with storage, I see the same thing happening around security, or even networking. And some of that, not so much as surprise as and it makes less sense to me—and eventually I think people realize it—that they make this change, and then it sort of looks like a lateral move to them because they didn't get the agility they wanted. If somebody has to roll out an application and they have to sit through a machine type of security audit, for instance, if they're doing security, and they're not leveraging the new tools that do container-native security, those kinds of things do raise a flag and trying to work with our partners and customers and trying to point these things out, I still do see some of that happening. And I think it has been getting better over time because people learn from their peers within their industries, whether it's banking—you'll see banks kind of look at each other, and they develop similar architectures. It's a close-knit ecosystem, for instance.Emily: Actually, do you see any specific trends that are vertical-specific?Gou: So, industry-specific, right? So, for instance, in the financial sector, they're certainly trends, whether it's embracing hybrid-cloud technologies, and they kind of do things similarly. An example is, some industries are okay with completely relying on one single cloud provider. Generally speaking—and I'm just giving an example in the financial space—we've seen that not be the case, where they kind of want more of a multi-cloud or cloud-neutral platform, they probably will run on-prem and in the public cloud, but they don't want to be locked into a particular cloud provider. And so that's, for example, a trend in that industry. So, yeah, different industries have different kinds of specific features that they're looking for and architectures that they're circling around.Emily: You brought up lock-in. Lock-in is a big topic with a lot of people, but often the idea of data gravity doesn't make its way into the primary conversations about lock-in. How does data and state fit into these lock-in discussions?Gou: That's a really good point, and I'll break down into two things. So, data has gravity, and ultimately if you have petabytes of data sitting around somewhere, it becomes harder and harder to move that, but even before that, one of the most important things that we try and teach people, and I think people, kind of, realize this themselves is lock-in starts even before the data gravity has become an issue. It starts with, is your enterprise architecture or platform itself, just completely relying on a particular provider's APIs and offerings? And that's where we really caution our customers and partners to say, that's the layer at which you want to build your—that's your demarcation point, which is run your services in a cloud provider but don't lock your architecture around relying on that cloud provider providing those services. So, instance as database, if you're running a database, it's better to run your database on your platform in the public cloud, as opposed to putting all of your data in the cloud providers database because then you're really locked in, not just by way of data, but even by way of your enterprise architecture. So, that's one of the first things that I think people are looking for which is, how do I build this cloud-agnostic platform? Forget cloud-native, but cloud-agnostic platform where I can run all of my databases at ease, but I've not locked in my database to a cloud provider. Once you do that, then you can start breaking up your applica—especially in enterprises, it's not like you just have one very large database; you typically have many, many, many small databases, so it becomes easy to start with portability, and you can have sets of your—if you need to—applications run in different cloud providers and your ways to connect these and this is the notion of hybrid-cloud, which is becoming real.Emily: What do you see as the future? Where do you see this ecosystem going?Gou: You know, there's a lot happening in different segments, right. And 5G, for example, is a huge enabler of Kubernetes, or consumer of cloud-native technologies, and there's just a lot happening over there. Just around edge to core compute, and patterns that are emerging with how you move data from the edge to the core or vice versa, and how you distribute data at the edge. So, there's a lot of innovations happening there in the 5G space. Every segment has something interesting going on. From Portworx's standpoint, what we're doing over the next couple of years and helping people in two main areas.I mentioned 5G, so enabling workflows that are truly cloud-native. What do I mean by that? Driven by Kubernetes, that allow the developers to create higher-level workflows where they can either move data between Kubernetes clusters—again, it doesn't have to be between cloud providers; just to take a step back for a second. Whenever somebody is running a cloud-native platform, what we find is that it's not like an enterprise has one very large Kubernetes clusters. Typically people are managing fleets of Kubernetes clusters, so whether they're doing blue/green deployments, or they just have different clusters for different types of applications, or they're compartmentalized for different departments, we find that people having to move and facilitate movement of applications between clusters is very important. So, that's an area where we're focusing on; certainly makes sense in the 5G space.The other area is around putting in more AI into the platform. So, here's an example. Does every enterprise out there, does every developer need to be—if they're running stateful applications, do they need to become a database expert to run it? And our answer is no. We want to bring in this notion of self-driving technologies into stateful applications where—hopefully with the Portworx software—if you're running stateful applications, the Portworx software is managing the performance of the database, maybe even the sharding of the database, the vertical and horizontal scaling of it, monitoring the database for performance problems, hotspots, reacting to it. We've introduced some technologies like that already over the past couple of years. An example is capacity management. What we've found is—and I think you asked this question early on—people that are running stateful applications on their Kubernetes platform, they're not storage experts. People routinely make mistakes when it comes to capacity planning. One of the things that we found is people had underestimated how much space they're going to use or how much storage they're going to use, and so they ran out of capacity in their cloud-native platform. Why should it be their problem to deal with that? So, we added software that's intelligent enough to detect these kind of cases and react to it. Similarly, we'll do the same thing around vertical scaling, reducing hotspots. And bringing in that kind of intelligence and little platform is something not just Portworx but we expect other people in this ecosystem to solve.Emily: Do you think there's any business problems that customers talk about that you don't have a good answer to? Or that you don't think the ecosystem has a good answer to, yet?Gou: Yeah, no, it's a good question. So, making the platform simple. So, the simplicity is one aspect. I think we do see enterprises struggling with the cloud-native space being slightly complex enough with so many technologies out there. Kubernetes itself, certainly, it has its own learning curve. So, making some of those things more simple and cookie-cutter so that enterprises can simply focus on just their application development, that is an area that needs to still be worked on. And we see enterprises, I wouldn't say really struggling, but trying to wrap their head around how to make the platform more simple for their developers. There are projects in the cloud-native space that are focused on this. I think a lot of this is going to come down to establishing a set of patterns for different use-cases. The more material and examples that are out there and use-cases that are out there will certainly help.Emily: Do you have an engineering tool that you can't live without? If so, what is it?Gou: [laughs]. My go-to is obviously GDB. It's our debugger environment. I certainly wouldn't be able to develop without that. When I am looking into insights into how a certain customer's environment is doing, Prometheus and Grafana are, sort of, my go-to tools in terms of looking at metrics, and application performance health, and things like that.Emily: How could listeners connect with you or follow you?Gou: On our site portworx.com, P-O-R-T-W-O-R-X-dot-com. We frequently blog on that site; that would be a good way to follow what we're up to and what we're thinking. I certainly respond to people via email. So, reach out to me if you have any questions. I'm pretty good replying to that: Gou—that's G-O-U—at portworx.com. I don't nearly tweet as much as our PR team would like me to tweet so there's not too much information there, unfortunately.Emily: Thank you so much for joining us on The Business of Cloud Native.Gou: Thank you, Emily. Thanks for having me.Emily: Thanks for listening. I hope you've learned just a little bit more about The Business of Cloud Native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier, that's O-M-I-E-R, or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

amazon business ai pr simple driven 5g ip lock cto real world raid historically generally cio ui environments enabling prometheus apis vm vmware kubernetes bm cloud native ctos nfs grafana gdb rbac gou iscsi emily omier emily well humblepod

Navigating the Cloud Native Ecosystem with Harness Evangelist Ravi Lachhman

The Business of Open Source

Play Episode Listen Later Sep 2, 2020 32:17

The conversation covers: An overview of Ravi's role as an evangelist — an often misunderstood, but important technology enabler. Balancing organizational versus individual needs when making decisions. Some of the core motivations that are driving cloud native migrations today. Why Ravi believes it in empowering engineers to make business decisions. Some of the top misconceptions about cloud native. Ravi also provides his own definition of cloud native. How cloud native architectures are forcing developers to “shift left.” Links https://harness.io/ Twitter: https://twitter.com/ravilach Harness community: https://community.harness.io/ Harness Slack: https://harnesscommunity.slack.com/ TranscriptEmily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Welcome to The Business of Cloud Native, I am your host Emily Omier. And today I'm chatting with Ravi Lachhman. Ravi, I want to always start out with, first of all, saying thank you—Ravi: Sure, excited to be here.Emily: —and second of all, I like to have you introduce yourself, in your own words. What do you do? Where do you work?Ravi: Yes, sure. I'm an evangelist for Harness. So, what an evangelist does, I focus on the ecosystem, and I always like the joke, I marry people with software because when people think of evangelists, they think of a televangelist. Or at least that's what I told my mother and she believes me still. I focus on the ecosystem Harness plays in. And so, Harness is a continuous delivery as a service company. So, what that means, all of the confidence-building steps that you need to get software into production, such as approvals, test orchestration, Harness, how to do that with lots of convention, and as a service.Emily: So, when you start your day, walk me through what you're actually doing on a typical day?Ravi: a typical day—dude, I wish there was a typical day because we wear so many hats as a start-up here, but kind of a typical day for me and a typical day for my team, I ended up reading a lot. I probably read about two hours a day, at least during the business day. Now, for some people that might not be a lot, but for me, that's a lot. So, I'll usually catch up with a lot of technology news and news in general. They kind of see how certain things are playing out. So, a big fan of The New Stack big fan of InfoQ. I also like reading Hacker News for more emotional reading. The big orange angry site, I call Hacker News. And then really just interacting with the community and teams at large. So, I'm the person I used to make fun of, you know, quote-unquote, “thought leader.” I used to not understand what they do, then I became one that was like, “Oh, boy.” [laughs]. And so just providing guidance for some of our field teams, some of the marketing teams around the cloud-native ecosystem, what I'm seeing, what I'm hearing, my opinion on it. And that's pretty much it. And I get to do fun stuff like this, talking on podcasts, always excited to talk to folks and talk to the public. And then kind of just a mix of, say, making some sort of demos, or writing scaffolding code, just exploring new technologies. I'm pretty fortunate in my day to day activities.Emily: And tell me a little bit more about marrying people with software. Are you the matchmaker? Are you the priest, what role?Ravi: I can play all parts of the marrying lifecycle. Sometimes I'm the groom, sometimes I'm the priest. But I'm really helping folks make technical decisions. So, it's go a joke because I get the opportunity to take a look at a wide swath of technology. And so just helping folks make technical decisions. Oh, is this new technology hot? Does this technology make sense? Does this project fatality? What do you think? I just play, kind of, masters of ceremony on folks who are making technology decisions.Emily: What are some common decisions that you help people with, and common questions that they have?Ravi: Lot of times it comes around common questions about technology. It's always finding rationale. Why are you leveraging a certain piece of technology? The ‘why' question is always important. Let's say that you're a forward-thinking engineer or a forward-thinking technology leader. They also read a lot, and so if they come across, let's say a new hot technology, or if they're on Twitter, seeing, yeah, this particular project's getting a lot of retweets, or they go in GitHub and see oh, this project has little stars, or forks. What does that mean? So, part of my role when talking to people is actually to kind of help slow that roll down, saying, “Hey, what's the business rationale behind you making a change? Why do you actually want to go about leveraging a certain, let's say, technology?” I'm just taking more of a generic approach, saying, “Hey, what's the shiny penny today might not be the shiny penny tomorrow.” And also just providing some sort of guidance like, “Hey, let's take a look at project vitality. Let's take a look at some other metrics that projects have, like defect close ratio—you know, how often it's updates happening, what's your security posture?” And so just walking through a more, I would say the non-fun tasks or non-functional tasks, and also looking about how to operationalize something like, “Hey, given you want to make sure you're maintaining innovation, and making sure that you're maintaining business controls, what are some best operational practices?” You know, want to go for gold, or don't boil the ocean, it's helping people make decisive decisions.Emily: What do you see as sort of the common threads that connect to the conversations that you have?Ravi: Yeah, so I think a lot of the common threads are usually like people say, “Oh, we have to have it. We're going to fall behind if you don't use XYZ technology.” And when you really start getting to talking to them, it's like, let's try to line up some sort of technical debt or business problem that you have, and how about are you going to solve these particular technical challenges? It's something that, of the space I play into, which is ironic, it's the double-edged sword, I call it ‘chasing conference tech.' So, sometimes people see a really hot project, if my team implements this, I can go speak at a conference about a certain piece of technology. And it's like, eh, is that a really rational reason? Maybe. It kind of goes into taking the conversation slightly somewhere else. One of the biggest challenges I think, let's say if you're kind of climbing the engineering ranks—and this is something that I had to do as I went from a junior to a staff to a principal engineer in my roles—with that it's always having some sort of portfolio. So, if you speak at a conference, you have a portfolio, people can Google your name, funny pictures of you are not the only things that come up, but some sort of technical knowledge, and sometimes that's what people are chasing. So, it's really trying to have to balance that emotional decision with what's best for the firm, what's best for you, and just what's best for the team.Emily: That's actually a really interesting question is sometimes what's best for the individual engineer is not what's best for the organization. And when I say individual engineer, maybe it's not one individual, but five, or the team. How do you sort of help piece together and help people understand here's the business reason, that's organization-wide, but here's my personal motivation, and how do I reconcile these, and is there a way even to get both?Ravi: There actually is a way to get both. I call it the 75/25 percent rule. And let's take all the experience away from the engineers, to start with a blank slate. It has to do with the organization. An organization needs to set up engineers to be successful in being innovative. And so if we take the timeline or the scale all the way back to hiring, so when I like to hire folks, I always like to look at—my ratio is a little bit different than 75/25. I'm more of a 50/50. You bring 50 percent of the skills, and you'll learn 50 percent of the skills, versus more conservative organizations would say, “You know what? You have 75 percent of the skills, if you can learn 25 percent of the skills, this job would be interesting to you.” Versus if you have to learn 80 percent, it's going to be frustrating for the individual. And so having that kind of leeway to make decisions, and also knowing that technical change can take a lot of time, I think, as an engineer, as an engineer—as talking software engineering professions as a whole, how do you build your value? So, your value is usually calculated in two parts. It's calculated in your business domain experience and your technical skills. And so when you go project to project—and this is what might be more of, hey, if you're facing too big of a climb, you'll usually change roles. Nobody is in their position for a decade. Gone are the days that you're a lifetime engineer on one project or one product. It's kind of a given that you'll change around that because you're building your repertoire in two places: you're building domain experience, and you're building technical experience. And so knowing when to pick your battles, as cliche as that sounds, oh, you know what, this particular technology, this shiny penny came out. I seen a lot of it when Kubernetes came out, like, “Oh, we have to have it.” But—or even a lot of the cloud-native and container-based and all the ‘et cetera accessories' as I call it, as those projects get steam surrounding it. It's, “We have to have it.” It's like, eh. It's good for resume building, but there's your things to do on your own also to learn it. I think we live in a day of open source. And so as an engineer, if I want to learn a new skill, I don't necessarily have to wait for my organization to implement it. I could go and play, something like Katacoda, I can go do things on my own, I can learn and then say, “You know what, this is a good fit. I can make a bigger play to help implement it in the organization than just me wanting to learn it.” Because a lot of the learning is free these days, which I think it's amazing. I know that was a long-winded answer. But I think you can kind of quench the thirst of knowledge with playing it on your own, and that if it makes sense, you can make a much better case to the business or to technology leadership to make change.Emily: And what do you think the core business motivations are for most of the organizations that you end up talking to?Ravi: Yeah, [unintelligible] core motivation to leveraging cloud-native technology, it really depends on organization to organization. I'm pretty fortunate that I get to span, I think, a wide swath of organization—so from startups to pretty established enterprises—I kind of talk about the pretty established enterprises. A lot of the business justification, it might not be a technical justification, but there's a pseudo technical business reason, a lot of times, though, I when I talk to folks, they're big concern is portability. And so, like, hey, if you take a look at the dollar and cents rationale behind certain things, the big play there is portability. So, if you're leveraging—we can get into the definition of what cloud-native resources are, but a big draw to that is being portable—and so, hopefully, you're not tied down to a single provider, or single purveyor, and you have the ability to move. Now, that also ties into agility. Supposedly, if you're able to use ubiquitous hardware or semi-ubiquitous software, you were able to move a little bit faster. But again, what I usually see is folk's main concern is portability. And then also with that is [unintelligible] up against scale. And so as—looking at ways of reducing resources, if you could use generics, you're able to shop around a little bit better, either internally or externally, and help provide scale for a softer or lesser cost.Emily: And how frequently do you think the engineers that you talked to are aware of those core business motivations?Ravi: Hmm, it really depends on—I'm always giving you the ‘depends' answer because talking to a wide swath of folks—where I see there's more emotion involved in a good way if there's closer alignment to the business—which is something hard to do. I think it is slowly eroding and chipping away. I've definitely seen this during my career. It's the old stodgy business first technology argument, right. Like, modern teams, they're very well [unintelligible] together. So, it's not a us versus them or cat versus dog argument, “Oh, why do these engineers want to take their sweet time?” versus, “Why does the business want us to act so fast?” So, having the engineers empowered to make decisions, and have them looked at instead of being a cost center, as the center of innovation is fairly key. And so having that type of rationale, like, hey, allowing the engineers to give input into feature development, even requirement development is something I've seen changed throughout my career. It used to be a very special thing to do requirements building, versus most of the projects that I've worked on now—as an engineer, we're very, very well attuned to the requirements with the business.Emily: Do you think there's anything that gets lost in translation?Ravi: Oh, absolutely. As people, we're emotional. And so if we're all sum total of our experiences—so let's say if someone asked, Emily, you and I a question, we would probably have four different answers for that person, just because maybe we have differences in opinions, differences of sum totals of experience. And I might say, “Hey, try this or this,” and then you might say, “Try that or that.” So, it really depends. Being lost in translation is always—it's been a fundamental problem in requirements gathering and it's continued to be a fundamental problem. I think just taking that question a step further, is how you go about combating that? I think having very shortened feedback cycles are very important. So, if you have to make any sort of adjustments, gone are the days I think when I started my career, waterfall was becoming unpopular, but the first project or two I was on was very waterfall-ish just because of the size of the project we worked on, we had to agree on lots of things; we were building something for six months. Versus, if you look at today, modern development methodologies like Agile, or Scaled Agile, a lot of the feedback happens pretty regularly, which can be exhausting, but decisions are made all the time.Emily: Do you think in addition to mistranslations, do you think there are any misconceptions? And I'm talking about sort of on both sides of this equation, you know, business leaders or business motivations, and then also technologists, and let's refocus back to talk about cloud-native in particular. What sort of misconceptions do you think are sort of floating out there about cloud-native and what it means?Ravi: Yeah, so what cloud-native means—it means something different to everybody. So, you listen to your podcasts for a couple episodes, if you asked any one of the guests the question, we all would give you a different answer. So, in my definition of cloud-native—and then I'll get back to what some of the misconceptions are—I have a very basic definition: cloud-native means two pillars. It means your architecture, or your platform needs to be ephemeral, and it needs to be [indibited]. So, it needs to be able to be short-lived, and be consistent, which are two things that are at odds with each other. But if you kind of talk to folks that, hey, they might be a little more slighted towards the business, they have this idea that cloud-native will solve all your problems. So, it reminds me a lot of big data back in the day. “Oh, if you have a Hadoop cluster, it will solve all of our logistics and shipping problems.” No. That's the technology. If you have Kubernetes, it will solve all of our problems. No. That's the technology. It's just a conduit of helping you make changes. And so just making sure that understand that hey, cloud-native doesn't mean that you get the checkmark that, “Oh, you know what? We're stable. We're robust. We can scale by using all cloud-native technologies,” because cloud-native technologies are actually quite complicated. If you're introducing a lot of complexity to your architecture, does it make sense? Does that make sense? Does it give you the value you're looking for? Because at the end of the day, and this is kind of something, the older I get, the more I believe it, is that your customers don't care how you did something; they care what the result is. So, if your web application's up, they don't care if you're running a simple LAMP stack, they just care that the application is up, versus using the latest Kubernetes stack, but using some sort of cloud-native NoSQL database, and we're using [Istio], and we're using, pick your flavor du jour of cloud-native technology, your end customer actually doesn't care how you did it. They care what happened.Emily: We can talk about misconceptions that other people have, but is there anything that continues to surprise you?Ravi: Yeah, I think the biggest misconception is that there's very limited choice. And so I'll play devil's advocate, I think the CNCF, the Cloud Native Computing Foundation, there's lots of projects, I've seen the CNCF, they have something called the CNCF Landscape, and I seen it grow from 200 cards, it was 1200 cards at KubeCon, I guess, end of last year in San Diego, and it's hovering around 1500 cards. So, these cards means there's projects or vendors that play in this space. Having that much choice—this is usually surprising to people because they—if you're thinking of cloud-native, it's like saying Kleenex today, and you think of Kubernetes or other auxiliary product or project that surrounds that. And a lot of misconception would be it's helping solve for complexity. It's the quintessential computer science argument. All you do in computer science is move complexity around like an abacus. We move it left to right. We're just shifting it around, and so by leveraging certain technologies there's a lot of complication, a lot of burden that's brought in. For example, if you want to leverage, let's say, a service Istio, Istio will not solve all your networking problems. In fact, it's going to introduce a whole set of problems. And I could talk about my biggest outage, and one of the things I see with cloud-native is a lot of skills are getting shifted left because you're codifying areas that were not codified before. But that's something I would love to talk about.Emily: Tell me about your biggest outage that sounds interesting.Ravi: Yeah, I didn't know how it would manifest itself. It's ways, I think, until, like, years later that I didn't have the aha moment. I used to think it was me, it probably still is me, but—so the year was 2013, and I was working for a client, and we were—it's actually a large news site—and so we were in the midst of modernizing their application, or their streaming application. And so I was one of the first applications to actually go to AWS. And so my background is in Java, so I have a Java software engineer or J2ED or JEE engineer, and having to start working more in infrastructure was kind of a new thing, so I was very fortunate up until 2013-ish up until this point that I didn't really touch the infrastructure. I was immune to that. And now being more, kind of becoming a more senior engineer was in charge of the infrastructure for the application—which is kind of odd—but what ended up happening that—this is going to be kind of funny—since I was one of the first teams to go to AWS, the networking team wouldn't touch the configurations. So, when we were testing things, and [unintelligible] environments, we had our VPC CIDR rules—so the traffic rules—wide open. And then as we were going into production, there were rules that we had to limit traffic due to a CIDR so up until 2013, I thought a C-I-D-R like a CIDR was something you drink. I was like, “What? Like apple cider?” So, this shows you how much I know. So, basically, I had to configure the VPC or Virtual Private Cloud networking rules. Finally, when we deployed the application, unknowing to myself, CIDR calculation is a significant digit calculation. So, the larger the number you divide by, the more IPs you let in. And so instead of dividing by 16, I divided by 8. I was like, “Oh, you'll have a bigger number if you divide by a smaller number.” I end up cutting off half the traffic of the internet when we deployed to production. So, that was a very not smooth way of doing something. But how did this manifest itself? So, the experts, who would have been the networking team, refused to look at my configuration because it was a public cloud. “Nope, you don't have a slot in our data center, we look at it.” And poor me, as a JEE or J2EE engineer, I had very little experience networking. Now, if you fast forward to what this means today, a lot of the cloud-native stack, are again, slicing and dicing these CNCF cards, a lot of this, you're exposing different, let's say verticals or dimensions to engineers that they haven't really seen before. A lot of its networking related a lot of it can be storage related. And so, as a software engineer, these are verticals that I'd never had to deal with before. Now, it's kind of ironic that in 2020, hey, yes, you will be dealing with certain configurations because, hey, it's code. So, it's shifting the burden left towards the developer that, “Oh, you know what, you know networking—” or, “You do need to know your app, so here's some Istio rules that you need to include in your packaging of your application.” Which folks might scratch your head. So, yeah, again, it's like shifting complexity away from folks that have traditional expertise towards the developer. Now, times are changing. I seen a lot of this in years gone by, “Oh, no. These are pieces of code. We don't want to touch it.” Being more traditional or legacy operations team, versus today, everybody—it's kind of the merging of the two worlds. The going joke is all developers are becoming infrastructure engineers, and infrastructure engineers are becoming software engineers. So, it's the perfect blend of two worlds coming together.Emily: That's interesting. And I now think I understand what you mean by skills shifting left. Developers have to know more, and more, and more. But I'm also curious, there's also people who talk about how Kubernetes, one of its failures is that it forces this shift left of skills and that the ideal world is that developers don't need to interact with it at all. That's just a platform team. What do you think about that?Ravi: These are awesome questions. These are things I'm very passionate about. I definitely seen the evolution. So, I've been pretty fortunate that I was jumping on the application infrastructure shift around 2014, 2015, so right when Kubernetes was coming of age. So, most of my background was in distributed systems. So, I'm making very large distributed Java applications. And so when Kubernetes came out, the teams that I worked on, the applications that were deployed to Kubernetes were actually owned by the app dev team. The infrastructure team wouldn't even touch the Kubernetes cluster. It was like, “Oh, this is a development tool. This is not a platform tool.” The platform teams that I were interacting with 2015, 2016, as Kubernetes became more popular than ever, they were the legacy—well, hate to say legacy because it's kind of my background too—they were the remaining middleware engineers. We maintained a web server cluster, we maintained the message broker cluster, we maintained XYZ distributed Java infrastructure cluster. And so when looking at a tool like Kubernetes, or even there were different platforming services, so the paths I've leveraged early, or mid-2010s was Red Hat OpenShift, before and after the Kubernetes migration inside of OpenShift. And so looking at a different—how teams are set up, it used to be, “Oh, this is an app dev item. This is what houses your application.” Versus today, because the workloads are so critical that are going on to say platforms such as Kubernetes, it was that you really need that system engineering bubble of expertise. You really need those platform engineers to understand how to best scale, how to best purvey, and maintain a platform like Kubernetes. Also, one of the odd things are—going back to your point, Emily, like, hey, why things were tossed over either to the development team or going back to a developing software engineer myself, do we care what the end system is?So, it used to be, I'll talk about Java-land here for a minute, give you kind of long-winded answer of back in Java land, we really used to care about the target system, not necessarily for an application that have one node, but if we had to develop a clustered application. So, we have more than one node talking to each other, or a stateful application, we really had start developing to a specific target system. Okay, I know how JBoss WildFly clusters or I know how IBM WebSphere or WebLogic clusters. And so when we're designing our applications, we had to make sure that we play well into those clustering mechanisms. With Kubernetes, since it's generic, you don't necessarily have to play into those clustering mechanisms because there's a basic understanding. But that's been the biggest Achilles heel in Kubernetes. It wasn't designed for those type of workloads, stateful workloads that don't like dying very often. That's kind of been the push or pull. It's just a tool, there's a lot of generic, so you can assume that the target platform will handle a certain way. And you're slowly start backing off the case that you're building to a specific target platform. But as Kubernetes has evolved, especially with the operator framework, you actually are starting to build to Kubernetes in 2018, 2019, 2020.Emily: It actually brought up a question for me that, at risk of sounding naive myself, I feel like I never meet anybody who introduces themselves as a platform engineer. I meet all these developers, everyone's a developer evangelist, for example, or their background is as a developer, I feel like maybe once or twice, someone has introduced themselves as, “I'm a platform engineer,” or, “I'm an operations specialist.” I mean, is that just me? Is that a real thing?Ravi: They're very real jobs. I think… it's like saying DevOps engineer, it means something else to who you talk to you. So, I'll harp on, like ‘platform engineer.' so kind of like, the evolution of the platform engineer, if you would have talked to me in 2013, 2014, “Hey, I'm a platform engineer,” I would think that you're a software engineer focused on platform tools. Like, “Hey, I focus on authentication, authorization.” You're building—let's say we had a dozen people on this call and we're working for Acme Incorporated, there's modules that transcend every one of our teams. Let's say logging, or let's say login, or let's say, some sort of look and feel. So, the platform engineer or the platform engineering development focused platform engineering team would make common reusable modules throughout. Now, with the great rise of platforms as a service, like PCF, and OpenShift, and DCOS, they became kind of like a shift. The middleware engineers that were maintaining the message broker clusters, maintaining your web application server clusters, they're kind of shifting towards one of those platforms. Even today, Kubernetes, pick your provider du jour of Kubernetes. And so those are where the platform engineers are today. “Hey, I'm a platform engineer. I focus on OpenShift and Kubernetes.” Usually, they're very vertically focused on one or more specific platforms. And operations folks can ride very big gamut. Usually, if you put, “operations” in quotes, usually they're systems or infrastructure engineers that are very focused on the infrastructure where the platform's run.Emily: I'm obviously a words person, and it just seems like there's this vocabulary issue where everybody knows what a developer is, and so it's easy to say, “Oh, I'm a developer.” But then everything else that's related to engineering, there's not quite as much specificity, precisely because you said everybody has a slightly different understanding. It's kind of interesting.Ravi: Yeah, it's like, I think as a engineer, we're not one for titles. So, I think a engineer is a engineer. I think if you asked most engineers, it's like, “Yeah, I'm a engineer.” It's so funny, a good example of that is Tim Berners-Lee, the person who created WWW, the World Wide Web. If you looked at his LinkedIn, he just says he's a web developer. And he invented WWW. So, usually engineering-level folks, you're not—at least for myself—is not one for title.Emily: The example that you gave regarding the biggest outage of your career was basically a skills problem. Do you think that there's still a skills or knowledge issue in the cloud-native world?Ravi: Oh, absolutely. We work for incentivization. You know, my mortgage is with PNC, and they require a payment every month, unfortunately. So, I do work for an employer. Incentivization is key. So, kind of resume chasing, conference chasing there's been some of that in the cloud-native world, but what ends up happening more often than not is that we're continuously shifting left. A talk I like to give is called, “The Engineering Burden is on the Rise.” And taking a look at what, let's say, a software engineer was required to do in 2010 versus what a software engineer is required to do today in 2020. And there's a lot more burden in infrastructure that, as a software engineer you didn't have to deal with. Now, this has to do with two things, or actually one particular movement. There's a movie company, or a video company in Los Gatos, California, and there's a book company in South Lake Union in Seattle. And so these two particular companies given the rise of what's called a full lifecycle developer. Basically, if you run it, or if you operate—you operate what you run, or if you write it, you run it. So, that means that if you write a piece of code, you're in charge of the operations. You have support, you're in charge of the SLAs, SLOs, SLIs. You're ultimately responsible if a customer has a problem. And can you imagine the number of people, the amount of skill set that requires? There's this concept of a T-shaped skill that you have to have experience in so many different platforms, that it becomes a very big burden. As an engineer, I don't envy anybody entering a team that's leveraging a lot of cloud-native technology because most likely a lot of that onus will fall on the software engineer to create the deployable, to create how you build it, to fly [unintelligible] in your CI stack, write the configuration that builds it, write the configuration deploys it, write the networking rules, write how you test it, write the login interceptors. So, there's a lot going on.Emily: Is there anything else that you want to add about your experience with cloud-native that I haven't really thought to ask, yet?Ravi: It's not all doom and gloom. I'm very positive on cloud-native technologies. I think it's a great equalizer. You're kind of going back—this might be a more intrinsic, like a 30-second answer here. If you taking back that I wanted to learn certain skills in 2010, I basically had to be working for a firm. So, 2010, I was working for IBM. So, there's certain distributed Java problems I wanted to solve. I basically had to be working for a firm because the software licensing costs were so expensive, and that technology wasn't very democratized. Looking at cloud-native technology today, there's a big, big push for open source, which open source is R&D methodology. That's what open source is, it helps alleviate some sort of acquisition—but not necessarily adoption—problems. And you can learn a lot. Hey, you could pick up any project and just try to learn, try to run it. Pick up these particular distributed system skills that were very guarded, I would say, a decade ago, it's being opened up to the masses. And so there's a lot to drink from, but you can drink as much as you want from the CNCF or the cloud-native garden hose.Emily: Do you have a software engineering tool that you cannot live without?Ravi: Recently, because I deal in a lot of YAML, I need a YAML linter. So, YAML is a space-separated language. As a human, I can't tell you what spaces are. Like, you know, if you have three spaces, and the next line you have four spaces. So, I use a YAML linter. It puts periods for me, so I can count them because it's been multiple times that my demo is not syntactically correct because I missed a space and I can't see it on my screen.Emily: And how can listeners connect with you?Ravi: Oh, yeah. You can hit me up on Twitter @ravilach, R-A-V-I-L-A-C-H. Or come visit us at Harness at www.harness.io. I run the Harness community, so community.harness.io. We have a Slack channel and a Discourse, and always excited to interact with people.Emily: Thanks for listening. I hope you've learned just a little bit more about the business of cloud-native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier, that's O-M-I-E-R, or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

Simplifying Cloud Native Testing with Jón Eðvald

The Business of Open Source

Play Episode Listen Later Aug 26, 2020 27:45

The conversation covers: Some of the pain points and driving factors that led Jón and his partners to launch Garden. Jon also talks about his early engineering experiences prior to Garden. How the developer experience can impact the overall productivity of a company, and why companies should try and optimize it. Kubernetes shortcomings, and the challenges that developers often face when working with it. Jón also talks about the Kubernetes skills gap, and how Garden helps to close that gap. Business stakeholder perception regarding Kuberentes challenges. The challenge of deploying a single service on Kubernetes in a secure manner — and why Jón was surprised by this process. How the Kubernetes ecosystem has grown, and the benefits of working with a large community of people who are committed to improving it. Jón's multi-faceted role as CEO of Garden, and what his day typically entails as a developer, producer, and liaison. Garden's main mission, which involves streamlining end-to-end application testing. Links: Company site: https://garden.io/ Twitter: https://twitter.com/jonedvald Kubernetes Slack: https://slack.k8s.io/ Transcript:Emily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Emily: Welcome to The Business of Cloud Native. I'm your host Emily Omier. And today I'm chatting with Jón Eðvald. And, Jón, thank you so much for joining me.Jón: Thank you so much for having me. You got the name pretty spot on. Kudos.Emily: Woohoo, I try. So, if you could actually just start by introducing yourself and where you work in Garden, that would be great.Jón: Sure. So, yeah, my name is Jón, one of the founders, and I'm the CEO of Garden. I've been doing software engineering for more years than I'd like to count, but Garden is my second startup. Previous company was some years ago; dropped out of Uni to start what became a natural language processing company. So, different sort of thing than what I'm doing now. But it's actually interesting just to scan through the history of how we used to do things compared to today. We ran servers out of basically a cupboard with a fan in it, back in the day, and now, things are done somewhat differently. So, yeah, I moved to Berlin, it's about four years ago now, met my current co-founders. We all shared a passion and, I guess to some degree, frustrations about the general developer experience around, I guess, distributed systems in general. And now it's become a lot about Kubernetes these days in the cloud-native world, but we are interested in addressing common developer headaches regarding all things microservices. Testing, in particular, has become a big part of our focus. Garden itself is an open-source product that aims to ease the developer experience around Kubernetes, again, with an emphasis on testing. When we started it, there wasn't a lot of these types of tools around, or they were pretty early on. Now there's a whole bunch of them, so we're trying to fit into this broad ecosystem. Happy to expand on that journey. But yeah, that's roughly—that's what Garden is, and that's… yeah, a few hop-skips of my history as well.Emily: So, tell me a little bit more about the frustration that led you to start Garden. What were you doing, and what were you having trouble doing, basically?Jón: So, when I first moved to Berlin, it was to work for a company called Clue. They make a popular period tracking app. So, initially, I was meant to focus on the data science and data engineering side of things, but it became apparent that there was a lot of need for people on the engineering side as well. So, I gravitated into that and ended up managing the engineering team there. And it was a small operation. We had more than a million daily active users yet just a single back end developer, so it was bursting at the seams. And at the time running a simple Node.js backend on Heroku, single Postgres database, pretty simple. And I took that through—first, we adopted containers and moved into Docker Cloud. Then Docker Cloud disappeared, or was terminated without—we had to discover that by ourselves. And then Kubernetes was manifesting as the de facto way to do these things. So, we went through that transition, and I was kind of surprised. It was easy enough to get going and get to a functional level with Kubernetes and get everything running and working. The frustration came more from just the general developer experience and developer productivity side. Specifically, we found it very difficult to test the whole application because we had, by the end of that journey, a few different services doing different things. And for just the time you make a simple change to your code to it actually having been built, deployed, and ultimately tested was a rather tedious experience. And I found myself building tools, bespoke tools to be able to deal with that, and that ended up being sort of a janky prototype of what Garden is today. And I realized that my passion was getting the better of me, and we wanted to start a company to try and do better.Emily: Why do you think developer experience matters?Jón: Beyond just the, kind of, psychological effect of having to have these long and tedious feedback loops—just as a developer myself, it kind of grinds and reduces the overall joy of working on something. But in more concrete material terms, it really limits your productivity. You basically, you take—if your feedback loop is 10 times longer than it should be, that exponentially reduces the overall output of you as an individual or your team. So, it has a pretty significant impact on just the overall productivity of a company.Emily: And, in fact, it seems like a lot of companies move to Kubernetes or adopt distributed systems, cloud-native in general, precisely to get the speed.Jón: And, yeah, that makes sense. I think it's easy to underestimate all the, what are often called these day-two problems, when—so, it's easy enough to grok how you might adopt Kubernetes. You might get the application working, and you even get to production fairly quickly, and then you find that you've left a lot of problems unsolved, that Kubernetes by itself doesn't really address for you. And it's often conflated by the fact that you may be actually adopting multiple things at the same time. You may be not only transitioning to Kubernetes from something analogous, you may be going from simpler, bespoke processes, or you might have just a monolith that didn't really have any complicated requirements when it comes to dev tooling and dev setups. So, yeah, you might be adopting microservices, containers, and Kubernetes all at the same time, and I can say a lot of good things about Kubernetes, but it leaves, definitely, a lot of gaps in terms of the experience, especially for the application developer. So, you have these different kinds of developers, and we're kind of gradually, and somewhat clumsy sometimes, moving from a world where would have the application developer, and then you would have the operators, and you would kind of throw things over a fence. And now we're trying to fuse these a little bit, and you end up with this combination of different people with different strengths, and different skill sets and the [00:07:03 infra], and often termed DevOps engineers—I know that makes some people cringe. It's just what we see out there, that's a job title that we're seeing more and more—they are at home with this sort of a thing, and they maybe have time and patience to tinker with something like Kubernetes, whereas your application developer, they just want to get that feature out. So, you end up with these two different disciplines where one is basically trying to unblock the other. And we're definitely made progress. And I like to think what we are building helps, and all the other tools that are kind of overlapping in our space. But I still see a lot of this frustration with both having to adopt this more DevOps mindset, and also just the frustrations with Kubernetes specifically, and all the actual technologies that are in play these days.Emily: And to what extent do you think this sub-optimal developer experience is about a skills gap?Jón: I think that's definitely a part of it. I mean, we have both current and prospective customers in all kinds of different companies coming from different backgrounds, and some more old school and rigid than others. And I would say, for companies where the divide was really crisp—like, you had the application developer, and then you had the operator—then basically the mentality, understandably, for the application developer, they just want to work on their code, and not have to worry about networking ports, and ingresses, and all the low-level primitives of Kubernetes. It's a whole new thing to learn, and the perception is that it's more getting in the way of them getting stuff done. They don't really need the power for their immediate needs. The power is maybe something that the operator really enjoys, and has a strong need for.Emily: Do you think tools like Garden help close the skills gap?Jón: I think so. It definitely makes it more palatable at the start; it's easier to get going, you get some guide rails, and you get some higher-level abstractions that are maybe familiar to you, so you're not completely overwhelmed by all the different options, and flags and whatnot that you need for all the Kubernetes, YAML, et cetera, and you can ease into it. So, I think that's really helpful. Another thing that I think is really helpful is, so instead of this separation—[00:09:33 crowbar] separation between developer and operator, people coming from the operations side, part of their role now is to empower the developer, so the developer becomes their customer. I think it, in some sense, always was that way, but now they are more focused on the developer experience, and unblocking, and creating, often, these internal tools and internal abstractions that buffer this skills gap, I guess, a little bit. And we see all kinds of variations on how people actually go about it. With Garden, what we hope to achieve is for—there's usually a team that is responsible for the developer experience of an organization, and they can just hand tools like that to the team. And maybe there's a little bit of a dotted line between what the application developer is responsible for and what they should know, but they should still be familiar enough with the concept so that they can start to perceive the infrastructure and the general systems architecture of the application as part of the application and not this separate concern. Yeah, I mean, I definitely would hope that were helpful in this regard.Emily: Yeah, and I was going to ask, actually, what your opinion about this is, you know about how familiar an application developer actually needs to be with Kubernetes.Jón: I think unless there is substantial investment at their company in abstracting these things away from the developer, they need to have at least a cursory understanding of how all this set up. Maybe they're not responsible for actually making the Kubernetes manifest, maybe they're working in some kind of a templated fashion, or they have a little bit of a buffer between the low-level configuration of everything and what they're working on in a day-to-day, but they'll get stuck pretty quickly and it will create a lot of friction if they just don't know at all how the thing works. I think it's also just important for any, kind of, distributed systems developers to understand what the primitives are, even if you don't know which flag to use for this and that to actually achieve what you want to achieve, understanding the dynamics and understanding—you know, as a developer, you're meant to understand basically how a computer works, you know, the general layout of a computer, CPU or memory, and you have some notion of how the network works. This is maybe a layer on top of that similar to how you are familiar with how an operating system works, you should have basic familiarity over how Kubernetes works and just the general layout of a cloud-native ecosystem. Which I think is reasonable.Emily: And then when we think about business stakeholders in a company, so people who are outside of the IT department, how do they perceive this challenge? Do they understand how developers might struggle with moving to Kubernetes? Do they understand the stakes of getting developer experience right?Jón: I think a lot of the time they find out in the process. I think, for better or for worse, this cloud-native ecosystem was kind of jumpstarted, and people were sold on the idea a little bit before it was actually palatable for day-to-day development and actually running in production. So, you have an ecosystem running around trying to make this into a pleasant experience, but part of that meant that there have been—especially the early adopters, I can only imagine, but even companies adopting Kubernetes now, yeah, they maybe have the wrong picture of what it actually takes to do this successfully, and we've definitely seen—I mean, I'm kind of in that group myself. I adopted Kubernetes fairly early on. Well,-ish; it was about three years ago. And once you get past day one, as it were, there is a lot of surprises, and I think that could have done with a little bit more education, and maybe the motivation to get people on board kind of clouded that a little bit. No pun intended. So, yeah, I think it's more common than not that people have had to find that out along the way. Maybe that's probably getting better now. I think people have a—both the tooling is actually better so that gap isn't as wide as it was, but also to general awareness, I think there's enough stories out there where people have really struggled with the adoption, and struggled with productivity somewhere along the way, or after the adoption process. Yeah. Does that answer the question? [laughs].Emily: Absolutely. And actually, I'm curious, what were some of the surprises for you?Jón: I'm still surprised every now and then. I've been working, I've been developing against Kubernetes, full-time pretty much—well, to the extent that I can develop full time over the last couple of years—and worked with it as an end-user before that, and I'm still finding new and interesting ways in which something can fail. And just a simple example, just the act of deploying a single service onto Kubernetes, you need to write up a few different YAML files. Doing that in a secure manner is not exactly obvious, and is nowhere near secure by default, so you have to, kind of, discover that, or pay some security providers to really help you with that. The number of ways in which a service deployment can fail is just—it's still—it keeps rising. [laughs]. I found a new one just yesterday. Like, “Oh, actually, here's an interesting condition. I hadn't come across that before.” So, the abstractions are fairly low-level, and you have to be able to navigate that. And I would much prefer higher-level abstractions that encapsulate all the—even if it's just encapsulating all the different failure modes of starting a container, that would be super helpful. But I think that's one thing that people keep stumbling on. And just to deploy a single service, there's a lot of different bells and whistles that you need to be kind of faintly aware of, and they—you will keep discovering them as you keep working with it. It's been both an interesting experience, and it's definitely put a lot of life into this developer ecosystem, but it's also been quite frustrating since myself and a lot of other developers out there are stuck having to deal with this and discover all these different things gradually. Does that make sense?Emily: Absolutely, but I'm going to ask another question, which is, what about pleasant surprises?Jón: I think I mentioned one earlier. One of the pleasant surprises is how quickly this ecosystem has formed, and a huge community of people, and really putting a lot of effort into making the whole experience better, and adding any functionality that doesn't come out of the box, as it were. I think it's been really interesting to observe that ecosystem manifest. Yeah, I think that's probably the most interesting positive thing that comes to mind. I think it's been interesting to see all kinds of different, ostensibly competing companies work together in many ways. Just the general power of the open-source movement has definitely come to light which has been great. I feel like the attitudes just in our user community, and the user community generally—just go to the Kubernetes Slack, or you have discussions on GitHub issues or all kinds of, you know, one or more of these open-source projects and ecosystem. Like, just general morale, and helpfulness, and collaborative nature has been really interesting to watch, and I've really enjoyed that.Emily: I'm going to ask, just sort of a very different question, which is, I always like to ask my guests what their job actually entails. Everyone has a job title, and it doesn't always provide lots of visibility into what you actually do during the day. So, what does somebody like you, who's the CEO of a startup, what do you actually do when you get into the office? What does your day actually look like?Jón: So, it has evolved a lot. Let's put it that way. And I wear many hats for sure. So, my background is I'm an engineer, that's kind of my comfort zone. I can write code and I can think about systems and all of that jazz, and that comes naturally to me. So, in my CEO role, I've been learning a lot on the go. So, these days, a lot of my time, I've spent maybe 30, 40 percent of my time coding these days, which is just about average. Sometimes it's a little bit more, sometimes it's almost not at all. And that's probably going to fluctuate and, over time, decrease. I think the most important part of my role is to generally engage with the ecosystem, and that involves reading, talking to people, talking to investors, talking to other people in my industry, talking to customers, kind of trying to build a mental map of this world we live in, and try and figure out how that translates into our product roadmap for us, and try and communicate what's important, what's urgent for the team to think about. So, I'm kind of this liaison between the external world and the internal world that is our team in many different ways. I think that consumes a lot of my time. And then working on content, working with our marketing manager, joining sales calls, supporting customers, working with our designer trying to figure out what's our brand identity, and how should that evolve. Yeah, I'm kind of a nexus of a lot of different things, which I really enjoy. Sometimes exhausting, don't get me wrong. But yeah, I like to think that I'm connecting dots.Emily: So, I wanted to also ask a little bit more about what Garden does, and particularly how is the testing that you do, how is it different from what you would get in just a normal CI tool?Jón: Right. So, testing is definitely—it was always part of the picture when we started. I think when we started working on Garden, we maybe tried to solve too many problems. That's pretty common startup mistake, I guess. And we're focusing more and more on testing. And so, I'm happy to elaborate on that. So, the way people do testing today—and this I derive from both personal experience and talking to a lot of our current and prospective customers—it's very common to, well A) not to it in any meaningful way. That's surprisingly common still, or to unit test the individual components of an application. But that falls short in a number of different ways when working with just microservices in general. And in these cloud-native scenarios, you tend to have a lot of different moving parts. And what we're trying to address is how easily you can test your application end-to-end; how you can spin up your own ad hoc instance of an application, the full application with all the different services, even the ones you do not working on; and how you can write tests such that they actually test the interaction between services and not through mocking and stubbing them, or bending over backwards to do these contract testing and things like that. That certainly helps mitigate these things, but the cloud-native ecosystem actually bring—and Kubernetes, and just the declarative nature of how we define and deploy applications today—actually creates a lot of opportunities. You can now feasibly spin up a whole instance of an application with all different services that comprise your back end, and you can run tests from within there. And Garden really helps to make that efficient and easy to use for developers. So, the comparison to CI naturally comes up, and Garden in a certain sense blurs the line between a developer tool that you use in the inner loop or locally from your laptop, and the process that happens in your CI. We don't aim to replace CI; that's a pretty saturated market. We just want to focus on the testing and previewing side of it. And what Garden does, which is unique, is that it allows you to describe every part of your stack, how it's built, how is deployed, and how it's tested, and we compose this into what we call the Stack Graph. So, we have this directed graph of all the different things that need to happen between just a bunch of Git repositories and a fully running, tested system. And where that comes in really handy is when you make a small change—you push a pull request or you're just testing locally by yourself—we know which parts of your stack actually are affected by that change, so you don't need to run a full battery of all of your different tests, you end-to-end tests. You can actually just automatically trigger the few parts that need to be rebuilt, redeployed, and retested. So, it's making it viable to actually run integration tests and end-to-end tests as part of your development workflow, and it also makes it much easier for the developer to cross between the inner loop—working on the code themselves—and getting it to pass through CI because that's often an arduous process, just getting things to pass testing in CI, because it may work fine on your laptop, or you may only have a small subset of the system that you're working on, and then you have to make another commit and push to get, and then the CI pipeline runs again. That can take an arbitrarily long amount of time. And instead, you can just run Garden test, or run a specific test suite from your laptop and have it work much the same way in CI as when you're developing locally.Emily: Do you have anything that you can think of that you'd like to add about this general topic of the surprises and misconceptions that come up around cloud-native and Kubernetes?Jón: It's generally been a pleasant experience. And it's been really motivating to see all this activity in this space. There's definitely lots of activity, lots of small startups and big companies trying to figure out better ways to do things. And I think it's fun to be a part of it. I think, who's to say how it's all going to shake out, but I don't remember it being quite like this pre-cloud-native, and I'm very much looking forward to, at the end of this journey, that we as an ecosystem kind of are going through that we will have something that really materially changes how we develop distributed systems, not just in the context of Kubernetes—because that's a specific technology that works for a specific type of customer—but rather taking it further and keeping this collaborative note, even when we go past what's the technology of the day.Emily: All right, so one bonus question, what is your favorite engineering tool that you could not live without? Or I should say that you could not do your work without? Maybe you could live without it.Jón: Well, yeah, exactly. Yeah. I think probably the most central tool that I have in terms of my development is Google. The thing is, as a developer, there's just no way you can know all the things. You don't remember every API. Like, I keep looking at the same thing over and over again, and being able to have all this information just at my fingertips, I think that's something that I would—well actually both have a hard time living without, and engineering without. And notable mentions, I've really enjoyed working with [00:25:33 VS Code] as of late. We code in TypeScript, which is slightly unusual, but I've also become a fan of that. There's probably a number of smaller tools that I'm forgetting to mention that I use all the time, but just being able to find whatever I need at any time, that's probably the most revolutionary thing to come up over the last decade.Emily: Where could listeners connect with you or follow you?Jón: So, I'm on Twitter, not the most active on Twitter, but I'm always—if you follow me, I probably just follow you back and try and get into the conversation that way. You can reach me there directly. Also, if you just want to ping me—so on Twitter, @jonedvald, just my name as it's printed. You can find the community for Garden itself through our website, Garden.io. We have a Slack channel on the Kubernetes Slack, just #Garden. And yeah, please reach out. I'd love to chat. If you're interested in using Garden or not, I'm always trying to be more engaged with the community, and that doesn't have to be Garden-related necessarily.Emily: Well, thank you so much for joining us on The Business of Cloud Native.Jón: Thank you so much for having me.Emily: Thanks for listening. I hope you've learned just a little bit more about the business of cloud native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier, that's O-M-I-E-R, or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

ceo google business gardens berlin testing previous slack real world api clue simplifying github devops kudos cpu node kubernetes git vald typescript cloud native vs code heroku postgres yaml emily omier emily well humblepod

CERN's Transition to Containerization and Kubernetes with Ricardo Rocha

The Business of Open Source

Play Episode Listen Later Aug 19, 2020 34:26

Some of the highlights of the show include: The challenges that CERN was facing when storing, processing, and analyzing data, and why it pushed them to think about containerization. CERN's evolution from using mainframes, to physical commodity hardware, to virtualization and private clouds, and eventually to containers. Ricardo also explains how the migration to containerization and Kubernetes was started. Why there was a big push from groups that focus on reproducibility to explore containerization. How end users have responded to Kubernetes and containers. Ricardo talks about the steep Kubernetes learning curve, and how they dealt with frustration and resistance. Some of top benefits of migrating to Kubernetes, and the impact that the move has had on their end users. Current challenges that CERN is working through, regarding hybrid infrastructure and rising data loads. Ricardo also talks about how CERN optimizes system resources for their scientists, and what it's like operating as a public sector organization. How CERN handles large data transfers. Links: Email:ricardo.rocha@cern.ch Twitter: https://twitter.com/ahcorporto CERN TranscriptEmily: Hi everyone. I'm Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product's value is obvious to end-users. I started this podcast because organizations embark on the cloud naive journey for business reasons, but in general, the industry doesn't talk about them. Instead, we talk a lot about technical reasons. I'm hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you'll join me.Emily: Welcome to the Business of Cloud Native. I'm your host, Emily Omier, and today I'm here with Ricardo Rocha. Ricardo, thank you so much for joining us.Ricardo: It's a pleasure.Emily: Ricardo, can you actually go ahead and introduce yourself: where you work, and what you do?Ricardo: Yeah, yes, sure. I work at CERN, the European Organization for Nuclear Research. I'm a software engineer and I work in the CERN IT department. I've done quite a few different things in the past in the organization, including software development in the areas of storage and monitoring, and also distributed computing. But right now, I'm part of the CERN Cloud Team, and we manage the CERN private cloud and all the resources we have. And I focus mostly on networking and containerization, so Kubernetes and all these new technologies.Emily: And on a day to day basis, what do you usually do? What sort of activities are you actually doing?Ricardo: Yeah. So, it's mostly making sure we provide the infrastructure that our physics users and experiments require, and also the people on campus. So, CERN is a pretty large organization. We have around 10,000 people on-site, and many more around the world that depend on our resources. So, we operate private clouds, we basically do DevOps-style work. And we have a team dedicated for the Cloud, but also for other areas of the data center. And it's mostly making sure everything operates correctly; try to automate more and more, so we do some improvements gradually; and then giving support to our users.Emily: Just so everyone knows, can you tell a little bit more about what kind of work is done at CERN? What kind of experiments people are running?Ricardo: Our main goal is fundamental research. So, we try to answer some questions about the universe. So, what's dark matter? What's dark energy? Why don't we see antimatter? And similar questions. And for that, we build very large experiments. So, the biggest experiment we have, which is actually the biggest scientific experiment ever built, is the Large Hadron Collider, and this is a particle accelerator that accelerates two beams of protons in opposite directions, and we make them collide at very specific points where we build this very large physics experiments that try to understand what happens in these collisions and try to look for new physics. And in reality, what happens with these collisions is that we generate large amounts of data that need to be stored, and processed, and analyzed, so the IT infrastructure that we support, it's larger fraction dedicated to this physics analysis.Emily: Tell me a little bit more about some of the challenges related to processing and storing the huge amount of data that you have. And also, how this has evolved, and how it pushed you to think about containerization.Ricardo: The big challenge we have is the amount of data that we have to support. So, these experiments, each of the experiments, at the moment of the collisions, it can generate data in the order of one petabyte a second. This is, of course, not something we can handle, so the first thing we do, we use these hardware triggers to filter this data quite significantly, but we still generate, per experiment, something like a few gigabytes a second, so up to 10 gigabytes a second. And this we have to store, and then we have large farms that will handle the processing and the reconstruction of all of this. So, we've had these sort of experiments since quite a while, and to analyze all of this, we need a large amount of resources, and with time. If you come and visit CERN, you can see a bit of the history of computing, kind of evolving with what we used to have in the past in our data center. But it's mostly—we used to have large mainframes, that now it's more in the movies that we see them, but we used to have quite a few of those. And then we transitioned to physical commodity hardware with Linux servers. Eventually introduced virtualization and private clouds to improve the efficiency and the provisioning of these resources to our users, and then eventually, we moved to containers and the main motivation is always to try to be as efficient as possible, and to speed up this process of provisioning resources, and be more flexible in the way we assign compute and also storage. What we've seen is that in the move from physical to virtualization, we saw that the provisioning and maintenance got significantly improved. What we see with containerization is the extra speed in also deployment and update of the applications that run on those resources. And we also see an improving resource utilization. We already had the possibility to improve quite a bit with virtualization by doing things like overcommit, but with containers, we can go one step further by doing more efficient resource sharing for the different applications we have to run.Emily: Is the amount of data that you're processing stable? Is it steadily increasing, have spikes, a combination?Ricardo: So, the way it works is, we have what we call ‘beam' which is when we actually have protons circulating in the accelerator. And during these periods, we try to get as much collisions as possible, which means as much data as possible because this is the main ingredient we need to do physics analysis. And then with the current experiment, actually, we are quite efficient. So, we have a steady stream of data we generate from raw data is something like 70 petabytes a year, and then out of that data, we generate a lot more data be it like pre-processing, or physics analysis. So, in terms of data, we can pretty much calculate how much storage we need. What we see spikes is in the computing part.Emily: And is the amount of data, are you constantly having to increase your data storage capacity?Ricardo: Yeah, so, yeah, that's one aspect is we never—from the raw data, we keep it forever. So, actually, we need to increase what we do, which is kind of particular to the way CERN has been doing things is we rely a lot on tape for archival. And this has allowed us to—because states still evolve, and the capacity of the tape still evolves, this actually allows us to increase the capacity on our data center without having to get more space or more robots. We just increase the capacity of the tape storage. And then for the on-disk storage, though, we do have to increase capacity regularly.Emily: Who spearheaded the push to containerization and Kubernetes? Was it within the IT? Who in the organization was really pushing for it?Ricardo: Yeah. That's an interesting question because it happens gradually in different places. I wouldn't say it's a decision taken from top; bottom. It's more like a movement that starts slowly. And there's two main triggers, I would say. One of them is from the IT department, the people run the services because some of our engineers are exposed to these new technologies frequently, and there's always someone that sees some benefit, does a prototype, and then presents to their colleagues and this kind of trigger. So, it's more on the sense of simplifying operations, deployment, all the monitoring that is required, and also improve resource usage. So, this was one of the triggers. The other trigger is actually from our end users. One big aspect of science, and specifically here—physics—is that there's a big need to share a setup. When someone performs some analysis, it's quite common that they want to share the exact same setup with their colleagues, and make sure that this analysis is reproducible. So, there was a huge push from groups that focus on reproducibility to explore containerization and the possibility to just wrap all the dependencies in one piece of software and make sure that even if things change dramatically in the infrastructure, they still have one blob that can be run at any moment, be it in five years, ten years, and that they also can share easily with their colleagues.Emily: Do you think that that's been accomplished? Has it made it easier for the physicists who collaborate?Ricardo: I think, yeah, from the experience that I have working with some groups in this area, I think that that has been accomplished. And actually last year, we did some exercise that had two goals: one was to make sure that we could scale out using these technologies to a large amount of resources, and the second was that we could prove that this technology can be used for reusability and reproducibility. And we took, as an example, the analysis that originated the Nobel Prize in 2013, which was the analysis that found the Higgs boson at the time. So, this was code that was pretty old, and the run for this analysis was done in 2012, with old hardware, old code, and what we did is we wrapped the code as it was at the time in a container, and we could run it in infrastructure that is modern today using Kubernetes, and containers, and we could reproduce the results in a much faster way because we have better resources and all this new technologies, but using exactly the same code that was used at the time.Emily: Wow. And how would you say that moving to Kubernetes and to containers has changed the experience for end-users for the scientists?Ricardo: Yeah. So, there, I think it depends on which—how to put it—which step of the transition you're at, I think. There's clearly—like, the people that are familiar with Kubernetes, and at ease with these kind of technologies, they will tell about the benefits quite easily. But one thing that we saw is that unlike the transition from physical to virtualization where the resources still felt the same—you would still SSH to your machines, and the tools inside the virtual machines were still very similar—an, like, that—the transition to containerization and to Kubernetes has a steep learning curve. So, we have both some resistance from some people and also some frustration at the start. So, we promote all these trainings, and we do webinars internally to try to help with this, this jump. But then the measure of benefits once you start using this are quite clear. Like if you're managing a service, you see that deploying updates becomes much much easier. And then if you have to monitor, restart, collect metrics, all of this is well integrated because of this way of declaring how applications should look like, and how they should work, and just delegating to the orchestrator on how to handle this. I think yeah, people have seen this benefit quite a lot from the infrastructure part, from the service management as well.Emily: What do you think has been surprising or unexpected in the transition to Kubernetes and containers? I think the first thing that I'm interested in is unpleasant surprises: things that were harder than expected.Ricardo: Yep. I think this steep learning curve is one. Yeah, when you start using containers, it's not that complicated. Like if you—I don't know—build a Docker file, create your container is not that complicated, but when you jump in the full system, we do see some people having some kind of resistance, I would say, at the start. And it's quite important to have this kind of dissemination activities internally. And then the other thing that we realized—but it's kind of specific to our infrastructure because I mentioned that we have all these needs for computing capacity, and because we have a large data center, but it's not enough, we actually have a network of institutes around the world and data centers that collaborate with us, and that we developed this technology over the last almost 20 years now. And we had our own ways to, for example, distribute software to these sites. And by moving to containers, this poses a challenge because then the software now is wrapped in this well—defined image; it's distributed by registries, and this doesn't really match all the systems we had before. So, we have to rethink all these aspects of a very large, distributed infrastructure that has limitations, but we know about them and we know how to deal with them, and really learn how to do this when we start using containers, how to distribute the software, how to distribute the images and all the tracking that is required around this. So, for us in this area, it's not something we thought at the start, but it became pretty obvious after a while.Emily: Makes sense. What specific resistance did you meet, and how did you overcome it?Ricardo: Yeah, so I think we really need to show the benefit in some cases. We have people that are very proactive, of course, into exploring these technologies, but in some cases—because for many people, this is not their core work. So, if you're a physicist, you want to execute your analysis or get your workloads done. You don't necessarily want to know all the details about infrastructure, so you need to hide this. But if you are involved in this, you kind of have to relearn a bit, how to do system operations and service, how you do your tasks of service manager. So, you really need a learning process to understand how containers work, how you deploy things in Kubernetes. You have to redefine a lot of the structure of how your application is deployed. If you had a configuration management system that you were using—Puppet, Ansible, whatever—you have to transition to this new model of deploying things. So, I think if you don't see this as part of your core work, you are less motivated to do the transition. But yeah, I think this happens. It happened in the past already, like when we transition to automated configuration management, when we change configuration management systems, there's always a change that is required. Yeah, we are also lucky that in this research area, people are open-minded. So, yeah, we do organize a lot of different activities in-house. We do what we call ‘container office hours' weekly, where people can come and ask questions. We will organize regular webinars in different topics, and we have a lot of internal communication systems where we try to be responsive.Emily: What do you think are the benefits that, when you encounter somebody who's not super excited about this—or, when you encountered—what are the benefits that you usually highlight?Ricardo: Yeah. So, actually, we've been collecting from some users that have finished this transition what do they think, and the main benefits they say is in the speed of provisioning. In our previous setup, sometimes you can think that provisioning a cluster can take up to an hour, even in some cases more, to provision a full cluster, while now provisioning a full Kubernetes cluster, even a large one can take less than 10 minutes. And then even more when you do the deployment of the applications or when you do an update to an application, because of the way things are structured, you can really trust that this will happen within seconds and that you can apply these changes very quickly. If you have a large application that is distributed in many nodes, even if you have an automation system, this can take several minutes, or up to an hour or more, for that change to be applied everywhere, while with this containerization and orchestration offered by Kubernetes we can reduce that to just a couple of seconds and be sure that the state is correct; we don't have to do some extra modifications; we just trust the system in most cases. I think those have been the two main aspects. And then for more advanced use cases, I would say that one benefit is that in some cases, we have tools that are deployed at CERN, but they are also deployed in other sites—as I mentioned, we have this large network—and one thing, because of the how popular Kubernetes became and these technologies, is that they can develop the application at CERN, rapid, and make sure that it runs well on our Kubernetes system, and in most cases, they can just take that, deploy in a different site or in a public cloud, and with no or very little modifications, they will get a similar setup running there. So, that's also a key aspect, is that we reduced the amount of work that is going into custom deployments here and there.Emily: And would that make it easier for a scientist to, say, collaborate with somebody at a different location?Ricardo: Yeah, I think that's also one aspect, is that the systems start feeling quite familiar because they're pretty similar in all these places. This is something that is happening in our own infrastructure, but if we look at public clouds, all of them offer some kind of managed Kubernetes service that in some cases have slight differences but deal—in its overall functionality, they're pretty similar. They also have the right abstractions on computing, and also on storage and network so that you don't have to know, necessarily, the details underneath.Emily: And does this also translate to physicists being able to complete their research faster?Ricardo: Yeah. So, that's another aspect that we've been investigating is that on-premise, we have this capability to improve our resource usage. So, hopefully, we can do more with the same amount of hardware, but what we've seen is that we can also explore more on-demand resources. So, I mentioned earlier this trial that we did with this Higgs analysis, and the original analysis takes almost a day. We containerized and tried to wrap it all, and we managed to run a subset of the analysis on our infrastructure—but of course, it's a production infrastructure, so we can just ask for all the 300,000 cores to run a test; we get a very small fraction of that if we want to do something like this—but what we showed is that by going, for example, to the public cloud, because the systems are offered in the same way, we took the exact same application, we deployed it in a public cloud, in this case into GCP, using GKE, and we can trust, or we can imagine that they're offering us infinite capacity as long as we pay, but we only pay for what we use. And basically, we could scale-out, and in this experiment, we scaled out to 25,000 cores. And we could reduce what took initially one day, and then eight hours, we reduced it to something less than six minutes. And that's all also all we pay. Like when we finished the analysis, we just get rid of the resources and stop paying. So, this ability to scale out in this way, explore more resources in different places—we do the same with the HPC computers as well—kind of opens up a lot of possibilities for our physicists to still rely on our large capacity, but explore this possibility of getting extra capacity and do their analysis much faster if they have access to this extra resources.Emily: Do you ever feel like there's miscommunications or mistranslations that happen between your team—the cloud team, and the end-users—the physicists?Ricardo: Well, I think their main goal is to get access to resources, so miscommunications usually happen, when the expectations are not met somehow. And this is usually because we have a large amount of users, and we have to prioritize resources. So, sometimes, yeah, sometimes they will want their resources a bit faster than they can get. So, hopefully, this ability to explore more resources from other places will also help in this area.Emily: Since starting the Cloud transition, what do you think has been easier than expected, either for your team or for the scientists that are using them?Ricardo: Well, we've done the transition to Cloud itself already back in 2013, offering this API based provisioning of resources. I think one benefit that we had was that in the initial transition to the Cloud, we rely on a software stack of OpenStack for the private cloud, and also, we now rely on Kubernetes and other products that are part of CNCF. For the containerization part, I think one big thing here has been that we always got very involved with the upstream communities, and this has paid off quite a lot. We also contribute a lot to the upstream projects. I think that's the key aspect is that we suddenly realized that things that we used to have to build in-house because we always had big data, and we had to have systems that process all this amount of data, but as big data becomes a more generic problem for other industries and companies, these communities have been built to handle this, and this has been one key aspect that we realize now that we are not alone. We can just join these communities, participate, and contribute to them, learn with them, and we give some but we get a lot more back from these communities.Emily: What are you looking for from these communities now? And what I'm trying to ask is what's the next step, the problems that you haven't solved yet that you're actively working on?Ricardo: Yeah, so our main problem right now—well, it's not a problem. It's a challenge I would say—it's that even if we have a lot of data, we are actually having a big upgrade of our experiment in just a few years, where the amount of data could well be increased by something like 100 times, so this gets us to amounts of data that are again pushing the boundaries, and we need to figure out how to handle that. But the big challenge right now is to really understand how to have hybrid infrastructure where our workloads—what we try to do is hide this fact that the infrastructure is maybe not all in our data center, but some bits are in public clouds or other centers, we try to hide this to our user. So, this hybrid approach with the clusters spanning multiple providers is something that still has open issues in terms of networking, but also because we depend so much on this amount of data, we need to understand how to distribute this data in an efficient way and make sure that the workloads run in this places correctly. If you've used public clouds before, you realize quickly that it's very easy to get things up and running, it's very easy to push data, but then when you want to have a more distributed infrastructure and move data around, there's a lot of challenges, both technically and financially. So, these are challenges that we'll have to tackle in the next couple of months or years.Emily: Yeah, in fact, I was going to ask about how you manage data transfers. I know that could be expensive, and just difficult to manage. Is that something that you feel like you've solved or…?Ricardo: Yeah, so, technically we know how to do it because we have this past experience with large distributed infrastructures, like this large collection of centers around the world that I mentioned earlier. We have dedicated links between all the sites, and we know how to manage this, and how to efficiently move the data around. And we also are quite well connected to big hubs, to the scientific network hubs in Europe like [JANET], but also direct links to some cloud providers. So, technically we've experimented this, we can have efficient movement of the data we have built over the years to systems that can handle this, and we have the physical infrastructure as well. I think the challenge is understanding the cost models of this: when things are worth moving around, or just move the workloads to where the data is, and how to tackle this. So, we need to understand better how we do this, and also how we account our users when this is happening. This is not a problem that we've solved yet.Emily: Do you also have to have any way to put guardrails on what scientists can and cannot do? To what extent do you need to manage centrally and then let the scientists do whatever they want, but within a certain parameter?Ricardo: Yeah, so our scientists are grouped in what we call experiments, which are linked to physical experiments underground. These groups are well defined, and the amount of resources that they get are constantly renegotiated, but they are well defined. So, we know which share go to each group of users, but one very important point for us is that we need to try to optimize resource usage. So, if for some reason, a group that has a certain amount of resources allocated is not using them, we allow other groups to just increase a bit—temporarily—their usage, but then we need this mechanism of fair share to balance the usage over a large amount of time. So, using this model on-premises, we know how to do the accounting and management of this. Integrating this to a more flexible infrastructure where we can explore resources from different cloud providers and centers, yeah, it's something we are working on. But we do have this requirement that all of it has to be accounted, and over time, we have to make sure that everyone gets their share.Emily: Do you think everybody feels like the goals that you set out in moving to containers and Kubernetes, that you've met those goals? I mean, do you think everybody's, sort of, satisfied that it was a success?Ricardo: Yeah, I'm not sure we are at that point, yet. I think when we present these exercises, like, what we did that very large scale, that there is a clear improvement. I think people get excited about this. Now, we need to make sure that all of this is offered in a very easy way so that people are happy. But what it also shows—this kind of experiments, and I think it's where physicists get more interested, also, in this kind of technology, is that it makes it much easier to explore not only generic computing resources, but exploring dedicated resources such as accelerators, GPUs, or more custom accelerators like TPUs and IPUs, that we wouldn't have access, necessarily, in our data center, when we offer the potential to have a much easier way to explore these resources and have some sort of central accounting, then, yeah, they get excited because they know that this on-demand offering of GPUs has a lot of potential to speed up their analysis, and without having to overprovision our internal resources, which is something that we wouldn't be allowed to do.Emily: I'm imagining you don't really have much of a comparison, but I'm thinking about what's unique about being a public sector organization using this technology or being part of the open-source community and what's different from being in a private company.Ricardo: Yeah, I think one aspect is that we can't necessarily easily choose our providers because it's public sector, so maybe the tendering process is a bit different from a private company. I don't know exactly, but I could imagine that there are differences in this area. But also the type of workloads we do are quite different. One area that is kind of appearing from industry that is similar to what we do is machine learning, because there's this large batch workloads for machine learning model training. This is a lot of what we do, the majority of what we do. So, a traditional IT company in the industry would be more interested in providing services. In our case, we do that as well, but I would say 80 percent of our computing is not that. It's large batch submissions that need to execute as fast as possible, so the model is very different. This is especially important when we start talking about multi-cluster because our requirements are much simpler for multi-cluster—or multi-cloud then the requirements of a company that needs to expose services to their end-users.Emily: Is there anything else that you'd like to add that I haven't thought to ask?Ricardo: No, I think we covered most of the motivations behind our transition. I would say that I mentioned that one of the challenges is the amount of data that we will have in this upgrade. But I think we did—it could now happen, but it looks like it might happen—we did is that we need to find new computing models to do our analysis than the traditional way we used to do, and this seems to involve, these days, a lot of machine learning. So, I think one thing that we will see that we will benefit a lot, also, from what's happening in the industry is all these frameworks for improving machine learning. And for us, it's also nice to see that a lot of these frameworks are actually relying on this cloud-native Kubernetes type of technologies because it will mean that we are doing something right on the infrastructure, and we might benefit also to upper layers very soon.Emily: All right, one last question. What is an engineering tool that you can't live without, your favorite tool?Ricardo: Ah, okay. That's… well, I would say it's my desktop setup, I would say. I've been a longtime Linux user, and the layout I have, and the tools I use daily are the ones that make me productive, be it to manage my email, calendar, whatever. I think that's one. Yeah, I've learned to rely on Docker for quick prototyping of different environments as well, testing different setups. Yeah, I haven't thought about that. But I would say something in this area.Emily: And where could listeners connect with you or follow you?Ricardo: They can contact me by my CERN email which is ricardo.rocha@cern.ch. Or I also have a Twitter account, which is at @ahcorporto. I can pass the name after. But yeah, those are probably the two main points of contact.Emily: Okay, fabulous. Well, thank you so much for joining me.Ricardo: Thank you. This was great. Thanks.Emily: Thanks for listening. I hope you've learned just a little bit more about the business of cloud native. If you'd like to connect with me or learn more about my positioning services, look me up on LinkedIn: I'm Emily Omier, that's O-M-I-E-R, or visit my website which is emilyomier.com. Thank you, and until next time.Announcer: This has been a HumblePod production. Stay humble.

RVU's Cloud Native Transformation with Paul Ingles

The Business of Open Source

Play Episode Listen Later Aug 5, 2020 37:32

Some highlights of the show include: The company's cloud native journey, which accelerated with the acquisition of Uswitch. How the company assessed risk prior to their migration, and why they ultimately decided the task was worth the gamble. Uswitch's transformation into a profitable company resulting from their cloud native migration. The role that multidisciplinary, collaborative teams played in solving problems and moving projects forward. Paul also offers commentary on some of the tensions that resulted between different teams. Key influencing factors that caused the company to adopt containerization and Kubernetes. Paul goes into detail about their migration to Kubernetes, and the problems that it addressed. Paul's thoughts on management and prioritization as CTO. He also explains his favorite engineering tool, which may come as a surprise. Links: RVU Website: https://www.rvu.co.uk/ Uswitch Website: https://www.uswitch.com/ Twitter: https://twitter.com/pingles GitHub: https://github.com/pingles TranscriptAnnouncer: Welcome to The Business of Cloud Native podcast, where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures.Emily: Welcome to The Business of Cloud Native. I'm your host, Emily Omier, and today I am chatting with Paul Ingles. Paul, thank you so much for joining me.Paul: Thank you for having me.Emily: Could you just introduce yourself: where do you work? What do you do? And include, sort of, some specifics. We all have a job title, but it doesn't always reflect what our actual day-to-day is.Paul: I am the CTO at a company called RVU in London. We run a couple of reasonably big-ish price comparison, aggregator type sites. So, we help consumers figure out and compare prices on broadband products, mobile phones, energy—so in the UK, energy is something which is provided through a bunch of different private companies, so you've got a fair amount of choice on kind of that thing. So, we tried to make it easier and simpler for people to make better decisions on the household choices that they have. I've been there for about 10 years, so I've had a few different roles. So, as CTO now, I sit on the exec team and try to help inform the business and technology strategy. But I've come through a bunch of teams. So, I've worked on some of the early energy price comparison stuff, some data infrastructure work a while ago, and then some underlying DevOps type automation and Kubernetes work a couple of years ago.Emily: So, when you get in to work in the morning, what types of things are usually on your plate?Paul: So, I keep a journal. I use bullet journalling quite extensively. So, I try to track everything that I've got to keep on top of. Generally, what I would try to do each day is catch up with anybody that I specifically need to follow up with. So, at the start of the week, I make a list of every day, and then I also keep a separate column for just general priorities. So, things that are particularly important for the week, themes of work going on, like, technology changes, or things that we're trying to launch, et cetera. And then I will prioritize speaking to people based on those things. So, I'll try and make sure that I'm focusing on the most important thing. I do a weekly meeting with the team. So, we have a few directors that look after different aspects of the business, and so we do a weekly meeting to just run through everything that's going on and sharing the problems. We use the three P's model: so, sharing progress problems and plans. And we use that to try and steer on what we do. And we also look at some other team health metrics. Yeah, it's interesting actually. I think when I switched from working in one of the teams to being in the CTO role, things change quite substantially. That list of things that I had to care about increase hugely, to the point where it far exceeded how much time I had to spend on anything. So, nowadays, I find that I'm much more likely for some things to drop off. And so it's unfortunate, and you can't please everybody, so you just have to say, “I'm really sorry, but this thing is not high on the list of priorities, so I can't spend any time on it this week, but if it's still a problem in a couple of weeks time, then we'll come back to it.” But yeah, it can vary quite a lot.Emily: Hmm, interesting. I might ask you more questions about that later. For now, let's sort of dive into the cloud-native journey. What made RVU decide that containerization was a good idea and that Kubernetes was a good idea? What were the motivations and who was pushing for it?Paul: That's a really good question. So, I got involved about 10 years ago. So, I worked for a search marketing startup that was in London called Forward Internet Group, and they acquired USwitch in 2010. And prior to working at Forward, I'd worked as a consultant at ThoughtWorks in London, so I spent a lot of time working in banks on continuous delivery and things like that. And so when Uswitch came along, there were a few issues around the software release process. Although there was a ton of automation, it was still quite slow to actually get releases out. We were only doing a release every fortnight. And we also had a few issues with the scalability of data. So, it was a monolithic Windows Microsoft stack. So, there was SQL Server databases, and .NET app servers, and things like that. And our traffic can be quite spiky, so when companies are in the news, or there's policy changes and things like that, we would suddenly get an increase in traffic, and the Microsoft solution would just generally kind of fall apart as soon as we hit some kind of threshold. So, I got involved, partly to try and improve some of the automation and release practices because at the search start-up, we were releasing experiments every couple of hours, even. And so we wanted to try and take a bit of that ethos over to Uswitch, and also to try and solve some of the data scalability and system scalability problems. And when we got started doing that, a lot of it was—so that was in the early heyday of AWS, so this was about 2008, that I was at the search startup. And we were used to using EC2 to try and spin up Hadoop clusters and a few other bits and pieces that we were playing around with. And when we acquired Uswitch, we felt like it was quickest for us to just create a different environment, stick it under the load balancer so end users wouldn't realize that some requests was being served off of the AWS infrastructure instead, and then just gradually go from there. We found that that was just the fastest way to move. So, I think it was interesting, and it was both a deliberate move, but it was also I think the degree to which we followed through on it, I don't think we'd really anticipated quite how quickly we would shift everything. And so when Forward made the acquisition, I joined summer of 2010, and myself and a colleague wrote a little two-pager on, here are the problems we see, here are the things that we think we can help with and the ways that technology approach that we'd applied at Forward would carry across, and what benefits we thought it would bring. Unfortunately because Forward was a privately held business—we were relatively small but profitable—and the owner of that business was quite risk-affine. He was quite keen on playing blackjack and other stuff. So, he was pretty happy with talking about probabilities of success.And so we just said, we think there's a future in it if we can get the wheels turning a bit better. And he was up for it. He backed us and we just took it from there. And so we replaced everything from self-hosted physical infrastructure running on top of .NET to all AWS hosted, running a mix of Ruby, and Closure, and other bits and pieces in about two years. And that's just continued from there. So, the move to Kubernetes was a relatively recent one; that was only within the last—I say ‘recent.' it was about two years ago, we started moving things in earnest. And then you asked what was the rationale for switching to Kubernetes—Emily: Let me first ask you, when you were talking with the owner, what were the odds that you gave him for success?Paul: [laughs]. That's a good question. I actually don't know. I think we always knew that there was a big impact to be had. I don't think we knew the scale of the upside. So, I don't think we—I mean, at the time, Uswitch was just about breaking even, so we didn't realize that there was an opportunity to radically change that. I think we underestimated how long it would take to do. So, I think we'd originally thought that we could replace, I think maybe most of the stuff that we needed replaced within six months. We had an early prototype out within two weeks, two or three weeks because we'd always placed a big emphasis on releasing early, experimenting, iterative delivery, A/B testing, that kind of thing. So, I think it was almost like that middle term that was the harder piece. And there was definitely a point where… I don't know, I think it was this classic situation of pulling on a ball of string where it was like, what wanted to do was to focus on improving the end-user experience because our original belief was that, aside from the scalability issues, that the existing site just didn't solve the problem sufficiently well, that it needed an overhaul to simplify the journeys, and simplify the process, and improve the experience for people. We were focusing on that and we didn't want to get drawn into replacing a lot of the back office and integration type systems partly because there was a lot of complexity there. But also because you then have to engage with QA environments, and test environments, and sign-offs with the various people that we integrate with. But it was, as I said, it was this kind of tugging on a ball of string where every improvement that we made in the end-user experience—so we would increase conversion rate by 10 percent but through doing that, we would introduce downstream error in the ways that those systems would integrate, and so we gradually just ended up having to pull in slightly more and more pieces to make it work. I don't think we ever gave odds of success. I think we underestimated how long that middle piece would take. I don't think we really anticipated the degree of upside that we would get as a consequence, through nothing other than just making releases quicker, being able to test and move faster, and focusing on end-user experience was definitely the right thing to focus on.Emily: Do you think though, that everybody perceived it as a risk? I'm just asking because you mentioned the blackjack, was this a risk that could fail?Paul: Well, I think the interesting thing about it was that we knew it was the right thing to do. So, again, I think our experience as consultants at ThoughtWorks was on applying continuous delivery, what we would today call DevOps, applying those practices to software delivery. And so we'd worked on systems where there weren't continuous integration servers and where people weren't releasing every day, and then we'd worked in environments where we were releasing every couple of hours, and we were very quickly able to hone in on what worked and discard things that didn't. And so I think because we've been able to demonstrate that success within the search business, I think that carried a great deal of trust. And so when it came to talking about things we could potentially do, we were totally convinced that there were things that we could improve. I think it was a combination of, there was a ton of potential, we knew that there was a new confluence of technologies and approaches that could be successful if we were able to just start over. And then I think also probably a healthy degree of, like, naive, probably overconfidence in what we could do that we would just throw ourselves into it. So, it's hard work, but yeah, it was ultimately highly successful. So, it's something I'm exceedingly proud of today.Emily: You said something really interesting, which is that Uswitch was barely profitable. And if I understand correctly, that changed for the better. Can you talk about how this is related?Paul: Yeah, sure. I think the interesting thing about it was that we knew that there was something we could do better, but we weren't sure what it was. And so the focus was always on being able to release as frequently as we possibly could to try and understand what that was, as well as trying to just simplify and pay back some of the technical debt. Well, so, trying to overcome some of the artificial constraints that existed because of the technology choices that people have made—perfectly decent decisions on, back in the day, but platforms like AWS offered better alternatives, now. So, we just focused on being able to deliver iteratively, and just keep focusing on continual improvement, releasing, understanding what the problems were, and then getting rid of those little niggly things. The manager I had at Forward was this super—I don't know, he just had the perfect ethos, and he was driven—so we were a team that were focused on doing daily experiments. And so we would rely on data on our spend and data on our revenue. And that would come in on a daily cycle. So, a lot of the rhythm of the team was driven off of that cycle. And so as we could run experiments and measure their profitability, we could then inform what we would do on the day. And so, we have a handful of long-running technology things that we were doing, and then we would also have other tactical things that he would have ideas on, he would have some hypothesis of, well, “Maybe this is the reason that this is happening, let's come up with a test that we can use to try and figure out whether that's true.” We would build something quickly to throw it together to help us either disprove it or support it, and we would put it live, see what happened, and then move on to the next thing. And so I think a lot of the—what we wanted to do is to instill a bit of that environment in Uswitch. And so a lot of it was being able to release quickly, making sure that people had good data in front of them. I mean, even tools like Google Analytics were something which we were quite au fait with using but didn't have broad adoption at the time. And so we were using that to look at site behavior and what was going on and reason about what was happening. So, we just tried to make sure that people were directly using that, rather than just making changes on a longer cycle without data at all.Emily: And can you describe how you were working with the business side, and how you were communicating, what the sort of working relationship was like? If there was any misunderstandings on either side.Paul: Yeah, it's a good question. So, when I started at Uswitch, the organizational structure was, I guess, relatively classical. So, you had a pooled engineering team. So, it was a monolithic system, deployed onto physical infrastructure. So, there was an engineering team, there was an operations team, and then there were a handful of people that were business specific in the different markets that we operated in. So, there was a couple of people that focused on, like, the credit card market; a couple of people that focused on energy, for example. And, I used to call it the stand-up swarm: so, in the morning, we would sit on our desks and you would see almost the entire office moved from the different card walls that were based around the office. Although there was a high degree of interaction between the business stakeholders, the engineers, designers, and other people, it always felt slightly weird that you would have almost all of the company interested in almost everything that was going on, and so I think the intuition we had was that a lot of the ways that we would think about structuring software around loosely-coupled but highly cohesive, those same principles should or could apply to the organization itself. And so what we tried to do is to make sure that we had multidisciplinary teams that had the people in them to do the work. So, for the early days of the energy work, there was only a couple of us that were in it. So, we had a couple of engineers, and we had a lady called Emma, who was the product owner. She used to work in the production operations team, so she used to be focused on data entry from the products that different energy providers would send us, but she had the strongest insight into the domain problem, what problem consumers were trying to overcome, and what ways that we could react to it. And so, when we got involved, she had a couple of ideas that she'd been trying to get traction on, that she'd been unable to. And so what we—we had a, I don't know, probably a, I think a half-day session in an office. So, we took over the boardroom at the office and just said, “Look, we could really do with a separate space away from everybody to be able to focus on it. And we just want to prove something out for a couple of weeks. And we want to make sure that we've got space for people to focus.” And so we had a half-day in there, we had a conversation about, “Okay, well, what's the problem? What's the technical complexity of going after any of these things?” And there's a few nuances, too. Like, if you choose option A, then we have to get all of the historical information around it, as well as the current products and market. Whereas if we choose option B, then we can simplify it down, and we don't need to do all of that work, and we can try and experiment with something sooner. So, we wanted it to be as collaborative as possible because we knew that the way that we would be successful is by trying to execute on ideas faster than we'd been able to before. And at the same time, we also wanted to make sure that there was a feeling of momentum and that we would—I think there was probably a healthy degree of slight overconfidence, but we were also very keen to be able to show off what we could do. And so we genuinely wanted to try and improve the environment for people so that we could focus on solving problems quicker, trying out more experiments, being less hung up on whether it was absolutely the right thing to do, and instead just focus on testing it. So, were there tensions? I think there were definitely tensions; I don't think there weren't tensions so much on the technical side; we were very lucky that most of the engineers that already worked there were quite keen on doing something different, and so we would have conversations with them and just say, “Look, we'll try everything we can to try and remove as many of the constraints that exist today.” I think a lot of the disagreement or tension was whether or not it was the right problem to be going after. So, again, the search business that we worked in was doing a decent amount of money for the number of people that were there, and we knew there was a problem we could fix, but we didn't know how much runway it would have. And so there was a lot of tension on whether we should be pulling people into focusing on extending the search business, or whether we needed to focus on fixing Uswitch. So, there was a fair amount of back and forth about whether or not we needed to move people from one part of the business to another and that kind of thing.Emily: Let's talk a little bit about Kubernetes, and how Uswitch decided to use Kubernetes, what problem it solved, and who was behind the decision, who was really making the push.Paul: Yeah, interesting. So, I think containers was something that we'd been experimenting with for a little while. So, as I think a lot of the culture was, we were quite risk-affine. So, we were quite keen to be trying out new technologies, and we'd been using modern languages and platforms like Closure since the early days of them being available. We'd been playing around with containers for a while, and I think we knew there was something in it, but we weren't quite sure what it was. So, I think, although we were playing around with it quite early, I think we were quite slow to choose one platform or another. I think in the end, we—in the intervening period, I guess, between when we went from the more classical way of running Puppet across a bunch of EC2 instances that run a version of your application, the next step after that was switching over to using ECS. So, Amazon's container service. And I guess the thing that prompted a bit more curiosity into Kubernetes was that—I forgot the projects I was working on, but I was working on a team for a little while, and then I switched to go do something else. And I needed to put a new service up, and rather than just doing the thing that I knew, I thought, “Well, I'll go talk to the other teams.” I'll talk to some other people around the company, and find out what's the way that I ought to be doing this today, and there was a lot of work around standardizing the way that you would stand up an ECS cluster. But I think even then, it always felt like we were sharing things in the wrong way. So, if you were working on a team, you had to understand a great deal of Amazon to be able to make progress. And so, back when I got started at Uswitch, when I talk about doing the work about the energy migration, AWS at the time really only offered EC2, load balancers, firewalling, and then eventually relational databases, and so back then the amounts of complexity to stand up something was relatively small. And then come to a couple of years ago. You have to appreciate and understand routing tables, VPCs, the security rules that would permit traffic to flow between those, it was one of those—it was just relatively non-trivial to do something that was so core to what we needed to be able to do. And I think the thing that prompted Kubernetes was that, on the Kubernetes project side, we'd seen a gradual growth and evolution of the concepts, and abstractions, and APIs that it offered. And so there was a differentiation between ECS or—I actually forget what CoreOS's equivalent was. I think maybe it was just called CoreOS. But there are a few alternative offerings for running containerized, clustered services, and Kubernetes seems to take a slightly different approach that it was more focused on end-user abstractions. So, you had a notion of making a deployment: that would contain replicas of a container, and you would run multiple instances of your application, and then that would become a service, and you could then expose that via Ingress. So, there was a language that you could use to talk about your application and your system that was available to you in the environment that you're actually using. Whereas AWS, I think, would take the view that, “Well, we've already got these building blocks, so what we want our users to do is assemble the building blocks that already exist.” So, you still have to understand load balancers, you still have to understand security groups, you have to understand a great deal more at a slightly lower level of abstraction. And I think the thing that seemed exciting, or that seems—the potential about Kubernetes was that if we chose something that offered better concepts, then you could reasonably have a team that would run some kind of underlying platform, and then have teams build upon that platform without having to understand a great deal about what was going on inside. They could focus more on the applications and the systems that they were hoping to build. And that would be slightly harder on the alternative. So, I think at the time, again, it was one of those fortunate things where I was just coming to the end of another project and was in the fortunate position where I was just looking around at the various different things that we were doing as a business, and what opportunity there was to do something that would help push things on. And Kubernetes was one of those things which a couple of us had been talking about, and thinking, “Oh, maybe now is the time to give it a go. There's enough stability and maturity in it; we're starting to hit the problems that it's designed to address. Maybe there's a bit more appetite to do something different.” So, I think we just gave it a go. Built a proof of concept, showed that could run the most complex system that we had, and I think also did a couple of early experiments on the ways in which Kubernetes had support for horizontal scaling and other things which were slightly harder to put into practice in AWS. And so we did all that, I think gradually it just kind of growed out from there, just took the proof of concepts to other teams that were building products and services. We found a team that were struggling to keep their systems running because they were a tiny team. They only had, like, two or three engineers in. They had some stability problems over a weekend because the server ran out of hard disk space, and we just said, “Right. Well, look, if you use this, we'll take on that problem. You can just focus on the application.” It kind of just grew and grew from there.Emily: Was there anything that was a lot harder than you expected? So, I'm looking for surprises as you're adopting Kubernetes.Paul: Oh, surprises. I think there was a non-trivial amount that we had to learn about running it. And again, I think at the point at which we'd picked it up, it was, kind of, early days for automation, so there was—I think maybe Google had just launched Google Kubernetes engine on Google Cloud. Amazon certainly hadn't even announced that hosted Kubernetes would be an option. There was an early project within Kubernetes, called kops that you could use to create a cluster, but even then it didn't fit our network topology because it wouldn't work with the VPC networking that we needed and expected within our production infrastructure. So, there was a lot of that kind of work in the early days, to try and make something work, you had to understand in quite a level of detail what each component of Kubernetes was doing. As we were gradually rolling it out, I think the things that were most surprising were that, for a lot of people, it solved a lot of problems that meant they could move on, and I think people were actually slightly surprised by that. Which, [laughs], it sounds like quite a weird turn of phrase, but I think people were positively surprised at the amount of stuff that they didn't have to do for solving a fair few number of problems that they had. There was a couple of teams that were doing things that are slightly larger scale that we had to spend a bit more time on improving the performance of our setup. So, in particular, there was a team that had a reasonably strong requirement on the latency overheads of Ingress. So, they wanted their application to respond within, I don't know, I think it was maybe 200 milliseconds or something. And we, through setting up the monitoring and other bits and pieces that we had, we realized that Ingress currently was doing all right, but there was a fair amount of additional latency that was added at the tail that was a consequence of a couple of bugs or other things that existed in the infrastructure. So, there was definitely a lot of little niggly things that came up as we were going, but we were always confident that we could overcome it. And, as I said, I think that a lot of teams saw benefits very early on. And I think the other teams that were perhaps a little bit more skeptical because they got their own infrastructure already, they knew how to operate it, it was highly tested, they'd already run capacity and load tests on it, they were convinced that it was the most efficient thing that they could possibly run. I think even over the long run, I think they realized that there was more work that they needed to do than they should be focusing on, and so they were quite happy to ultimately switch over to the shared platform and infrastructure that the cloud infrastructure team run.Emily: As we wrap up, there's actually a question I want to go back to, which is how you were talking about the shifting priorities now that you've become CTO. Do you have any sort of examples of, like, what are the top three things that you will always care about, that you will always have the energy to think about? And then I'm curious to have some examples of things that you can't deal with, you can't think about. The things that tend to drop off.Paul: The top three things that I always think about. So, I think, actually, what's interesting about being CTO, that I perhaps wasn't expecting is that you're ever so slightly removed from the work, that you can't rely on the same signals or information to be able to make a decision on things, and so when I give the Kubernetes story, it's one of those, like, because I'd moved from one system to another, and I was starting a new project, I experienced some pain. It's like, “Right. Okay, I've got to go do something to fix this. I've had enough.” And I think the thing that I'm always paying attention to now, is trying to understand where that pain is next, and trying to make sure that I've got a mechanism for being able to appreciate that. So, I think a lot of the things I try to spend time on are things to help me keep track of what's going on, and then help me make decisions off the back of it. So, I think the things that I always spend time on are generally things trying to optimize some process or invest in automation. So, a good example at the moment is, we're talking about starting to do canary deployments. So, starting to automate the actual rollout of some new release, and being able to automate a comparison against the existing service, looking at latency, or some kind of transactional metrics to understand whether it's performing as well or different than something historical. So, I think the things that I tend to spend time on are process-oriented or are things to try and help us go quicker. One of the books that I read that changed my opinion of management was Andy Grove's, High Output Management. And I forget who recommended it to me, but somebody recommended it to me, and it completely altered my opinion of what value a manager can add. So, one of the lenses I try to apply to anything is of everything that's going on, what's the handful of things that are going to have the most impact or leverage across the organization, and try and spend my time on those. I think where it gets tricky is that you have to go broad and deep. So, as much as there are broad things that have a high consequence on the organization as a whole, you also need an appreciation of what's going on in the detail, and I think that's always tricky to manage. I'm sorry, I forgot what was the second part of your question.Emily: The second part was, do you have any examples of the things that you tend to not care about? That presumably someone is asking you to care about, and you don't?Paul: [laughs]. Yeah, it's a good question. I don't think it's that I don't care about it. I think it's that there are some questions that come my way that I know that I can defer, or they're things which are easy to hand off. So, I think the… that is a good question. I think one of the things that I think are always tricky to prioritize, are things which feel high consequence but are potentially also very close to bikeshedding. And I think that is something which is fair—I'd be interested to hear what other people said. So, a good example is, like, choice of tooling. And so when I was working on a team, or on a problem, we would focus on choosing the right tool for the job, and we would bias towards experimenting with tools early, and figuring out what worked, and I think now you have to view the same thing through a different lens. So, there's a degree to which you also incur an organizational cost as a consequence of having high variability in the programming languages that you choose to use. And so I don't think it's something I don't care about, but I think it's something which is interesting that I think it's something which, over the time I've been doing this role, I've gradually learned to let go of things that I would otherwise have previously thoroughly enjoyed getting involved in. And so you have to step back and say, “Well, actually I'm not the right person to be making a decision about which technology this team should be using. I should be trusting the team to make that decision.” And you have to kind of—I think that over the time I've been doing the role, you kind of learn which are the decisions that are high consequence that you should be involved in and which are the ones that you have to step back from. And you just have to say, look, I've got two hours of unblocked time this week where I can focus on something, so of the things on my priority list—the things that I've written in my journal that I want to get done this month—which of those things am I going to focus on, and which of the other things can I leave other people to get on with, and trust that things will work out all right?Emily: That's actually a very good segue into my final question, which is the same for everyone. And that is, what is an engineering tool that you can't live without—your favorite?Paul: Oh, that's a good question. So, I don't know if this is a cop-out by not mentioning something engineering-related, but I think the tool and technique which has helped me the most as I had more and more management responsibility and trying to keep track of things, is bullet journaling. So, I think, up until, I don't know, maybe five years ago, probably, I'd focus on using either iOS apps or note tools in both my laptop, and phone, and so on, and it never really stuck. And bullet journaling, through using a pen and a notepad, it forced me to go a bit slower. So, it forced me to write things down, to think through what was going on, and there is something about it being physical which makes me treat it slightly differently. So, I think bullet journaling is one of the things which has had the—yeah, it's really helped me deal with keeping track of what's going on, and then giving me the ability to then look back over the week, figure out what were the things that frustrated me, what can I change going into next week, one of the suggestions that the person that came up with bullet journaling recommended, is this idea of an end of week reflection. And so, one of the things I try to do—it's been harder doing it now that I'm working at home—is to spend just 15 minutes at the end of the week thinking of, what are the things that I'm really proud of? What are some good achievements that I should feel really good about going into next week? And so I think a lot of the activities that stem from bullet journaling have been really helpful. Yeah, it feels like a bit of a cop-out because it's not specifically technology related, but bullet journaling is something which has made a big difference to me.Emily: Not at all. That's totally fair. I think you are the first person who's had a completely non-technological answer, but I think I've had someone answer Slack, something along those lines.Paul: Yeah, I think what's interesting is there there are loads of those tools that we use all the time. Like Google Docs is something I can't live without. So, I think there's a ton of things that I use day-to-day that are hard to let go off, but I think the I think that the things that have made the most impact on my ability to deal with a stressful job, and give you the ability to manage yourself a little bit, I think yeah, it's been one of the most interesting things I've done.Emily: And where could listeners connect with you or follow you?Paul: Cool. So, I am @pingles on Twitter. My DMs are open, so if anybody wants to talk on that, I'm happy to. I'm also on GitHub under pingles, as well. So, @pingles, generally in most places will get you to me.Emily: Well, thank you so much for joining me.Paul: Thank you for talking. It's been good fun.Announcer: Thank you for listening to The Business of Cloud Native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.

amazon spotify google business uk transformation microsoft built forward ios google podcasts cto slack real world generally puppets closure aws github apis google analytics qa devops google cloud kubernetes cloud native ecs sql server ingress thoughtworks hadoop ec2 andy grove vpc my dms coreos high output management rvu paul yeah paul oh windows microsoft paul so vpcs uswitch paul well emily omier emily well announcer thank paul ingles

#33 Emily Omier, the importance of entrepreneurship and how to start being your own boss

Uncharted Podcast

Play Episode Listen Later Jul 23, 2020 30:13

Emily Omier is a journalist turned business owner that helps companies in the Kubernetes ecosystem with marketing strategy, positioning and content. Tune in to a fantastic episode and make sure to connect with Emily and us if you haven't already. Connect with Emily Omier: https://www.linkedin.com/in/b2b-tech-content-strategist-writer/ https://www.fullstackcontent.io/ Connect with Poya Osgouei: LinkedIn: https://www.linkedin.com/in/poyaosgouei/ Twitter: https://twitter.com/IamPoya Connect with Robby Allen: Linkedin: https://www.linkedin.com/in/robbyallen/ Twitter: https://twitter.com/_RobbyAllen --- Support this podcast: https://anchor.fm/uncharted1/support

entrepreneurship kubernetes own boss emily omier

Cloud Costs: A Conversation with Travis Rehl

The Business of Open Source

Play Episode Listen Later Jul 22, 2020 35:20

This conversation covers: Why many businesses are shifting away from analyzing total cloud spend (CapEX vs. OpEX) and are now forecasting spend based around usage patterns. The difference between cloud-native, cloud computing, and operating in the cloud. The delta that often exists between engineering teams and business stakeholders regarding costs. Travis also offers tips for aligning both parties earlier in the project lifecycle. Common misconceptions that exist around cost management, for both engineers and business stakeholders. For example, Travis talks about how engineers often assume that business teams manage purely to dollars and cents, when they are often very open to extending budgets when it's necessary. Tips for predicting cloud spend, and why teams usually fall short in their projections. Why conducting cloud cost management too early in a project can be detrimental. Comparing the cost of the cloud to a private data center. The growing reliance on multi-cloud among large enterprises. Travis also explains why it's important to have the right processes in place, to identify cross-cloud saving opportunities. How IT has transitioned from a business enabler to a business driver in recent years, and is now arguably the most important component for the average company. Links: Twitter: https://twitter.com/TravisWRehl LinkedIn: https://www.linkedin.com/in/travis-rehl-tech/ Main Company Site: https://cloudcheckr.com CloudCheckr All Stars: https://cloudchecker.com/allstars TranscriptAnnouncer: Welcome to The Business of cloud-native podcast, where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures.Emily: Welcome to the Business of cloud-native. I'm your host, Emily Omier, and I'm here today with Travis Rehl, who is the director of product at CloudCheckr. Travis, I just wanted to start out, first of all, by saying thank you for joining me on the show. And second of all, if you could just start off by introducing yourself. What you do, and by that I mean, what does an actual day look like? And some of your background?Travis: Yeah. Well, thanks for having me. So yeah, I'm Travis Rehl, director of product here at CloudCheckr. What that really means is, I have the fun job of figuring out what should the business do next in relation to our product offering here at the business. That means roadmap, looking at the market, what are customers doing differently now, or planning to do differently over the next year, two years or so, on cloud? What their cost strategies are, what their invoicing and chargeback strategies are, all that type of fun stuff, and how we can help accommodate those particular strategies using our product offering. Sort of, day to day, though, I would say that a bunch of my time during the day is spent talking to customers, figuring out where they are in their cloud journey, if you will, what programs or projects they may have in flight that are interesting, or complicated, or they need help on. Especially making any sort of analysis help in particular, and then lastly, taking all that information and packaging it up neatly, so that the business can make a decision to add functionality to our product in some way that can assist them move forward.Emily: The first question I wanted to ask is actually if you could talk just a little bit about the distinction between cloud-native, and cloud computing, and operating in the cloud. What do all of those things actually mean, and where's the delta between them?Travis: Sure. Yeah so, it's actually kind of interesting, and you'll hear it a little bit differently from different people. In my background, in particular—I used to run an engineering department for a managed service provider. And so we used to do a lot of project planning of companies as to what's their strategy for their software deployment of some kind on cloud. And typically the two you see for, say, cloud-native versus operating in the cloud, operating on the cloud is very atypical. You'd associate that to something like lift and shift—probably hear about a lot—the concept of taking your on-prem workload and simply cloning it, or taking it in some way and copying in some way, on to the cloud-native vendor in particular. So, literally just standing up servers of clones of hard drives and so forth, and emulating what you had on-prem, but on the cloud. That's a great technique for moving quickly to cloud. That's not a great technique if you want to be cloud-native. So, that's really the big segue for cloud-native, in particular, is you want to build a software solution that takes advantage of cloud-only technology, meaning serverless compute resources, meaning auto-scaling different types of services themselves, stuff you probably didn't have when you're on-prem originally, that you now have, you can take advantage of on the cloud. That's almost like a redesign, or reimplementation around those models that cloud itself provides to you. That, to me, is the big difference. And I see oftentimes that gap-wise, many companies who are starting on-prem, they will do the migration to cloud first, the lift and shift model, and then they will decide, “Hey, I want to redesign pieces of it to make it more cloud-native.” And then you'll see startups who don't have on-prem at all, they will just go into cloud-native from the get-go.Emily: Of course, CloudCheckr specializes in helping with costs among some other things, but how do costs fit into this journey, and what sort of cost-related concerns do companies have as they're on this cloud journey?Travis: Yeah, so there's a few. I would actually say that years ago—just to clarify, the discussion has changed over the last few years—but years ago, it started with CapEx versus OpEx costs, specifically for purchasing of your IT services. On-prem, you'd probably purchase up-front a bulk number of VMs or servers or otherwise, for a number of years, and so be a CapEx cost. When you moved over to cloud and more of this, usage-based, model kind of threw a lot of people for a loop when it came to more OpEx usage space models. AWS, Azure, GCP have helped in that regard with things like reserved instances for companies who are more CapEx oriented as well, but in terms of the initial years ago, a big hurdle was communicating that difference and how the business may pay for these services. And a lot of people were very interested in moving to OpEx back then, in particular. When it came to how do you take into account all these cost-related changes the business may go through, one of the big ones that I see most recently is around the transference and storage of data. In the past, it would have been about how much money total am I going to spend on the cloud itself. Now, it's about what am I forecasting to spend based off of those usage patterns. It's a bit easier to forecast those things when you have servers that run for a period of time, but when you have usage patterns for data ingestion, for data transfer, for servers spinning up and spinning down and scaling out horizontally, that pattern becomes a bit more fluid. So, that's typically a conversation that comes up quite often. That's the type of thing that CloudCheckr and products like us can help with.Emily: Do you think there's any delta between how engineering teams think about and understand cost, and business stakeholders think about and understand costs?Travis: I would say yeah, there's a fairly significant difference there. I would say that engineers initially care a little bit less about cost. They have an objective. They're trying to solve for a project or goal they're trying to achieve. And so as a result, they're saying, “How can I achieve that quickly?” And that changes over time as the product becomes closer to production-ready. They may say, “Okay, now I want to optimize a bit more what I built.” Whereas the business is more thinking about, “I have this project plan in front of me, how do I go as quickly as possible, without incurring so much costs that it goes over budget, or I don't foresee a particular budgetary circumstance occurring?” It's slightly different mentalities, to be quite honest with you. What I see the most of is that they do align towards the end of the project, or the steady-state of the project, when the team has delivered the thing, whatever that may be, but need to make a decision quite quickly as to how do they want to either cap those costs moving forward or come up with an appropriate sort of budgetary model that scales linearly or otherwise, that both sides can agree upon?Emily: And are there any best practices that you can think of for how that gap can be bridged earlier in the process?Travis: Yeah. What's interesting, you're seeing, probably in the last couple years now—as you've probably heard, “Cloud Centers Of Excellence.” I'm assuming you've heard that term?Emily: Mm-hm. Yep.Emily: Yeah. So, one of the big amplifiers of that is bringing in your financial team to that planning model for how you want to deploy and manage cloud resources, so they have a say early on in the process, but also, it's more about communicating what the plan needs to be. Typically, I see the engineering team—the product manager, right—going off and saying, “We need a build and go fast.” And then it's the financials—accounting or otherwise—that says, “Hey, what happened to this project once you delivered it?” To bring them in earlier to that conversation via a CCOE sort of methodology—Cloud Center Of Excellence methodology—I see helps the most. It's really about communicating first. The second, though, is about setting corporate guidelines for what people are allowed or not allowed to do. Shadow IT has been a problem for a very long time for businesses. Cloud actually, in my opinion, can amplify that problem because of how easy it can be to start spinning up resources, and doing projects, or otherwise. So, it's pretty important for central IT, or for that Center Of Excellence group to set the standards of how the business needs to operate from that point forward, but then allowing different organizations, business units, or groups to deploy and manage at the cadence they need.Emily: What do you think are some misconceptions on both sides about cost management? So, misconceptions from engineers and misconceptions from business stakeholders?Travis: I think, as a product person, I have the fun job of being someone in the middle of it all. I think engineers have a misconception that the business side is managing to dollars and cents. That can often be the case. So, if you're an engineer, and you are a creative type, and you want to build something, and you have someone out there saying, “No, no, no, you can't do it because of XYZ thing,” cost or [unintelligible], or otherwise it can create this barrier between the two of them. But what I have learned, though, is that from an engineer communicating to a business person, especially on this subject, documenting and communicating the impact, the positive impact, and sometimes negative impact, but positive impact I'd recommend, on the business for why the project will enable your teams, or get you into a market faster, or solve a corporate problem that's been around for a while. The business side is very open, I found over the years, to hearing out those reasons, and agreeing to extending costs, or being more lenient on the cost model of a project so it can go faster. Likewise, though, on the business side, when they're working with engineers, it can be a little bit fraught when they think of, say if you've got a project moving and a business team says, “You have a budget for this and a line item,” and they say, “What's the project plan? When are you going to deliver? What's the velocity for delivering?” Cloud lets engineers do a bunch of new things they probably haven't had in the past, like faster deployment times to the product or a project in question; being able to react, or be proactive, however way you want to manage to it, for a customer requests, or otherwise. And as a result, it can be easy for that business person to expect a very linear approach to product development or delivery of services. And so for them, it's more about seeing and working with engineers to see how, “Hey, maybe the business can be improved by doing things a little bit differently than they had in the past.”Emily: Yeah. So, basically, what you're saying is, as long as the business is understanding this expense as an investment in something, usually it's not a big problem.Travis: Yeah. But it really comes down to communicating that. It can be so hard for an engineer, to say, “Hey, if I deliver this new pipeline, then I can deliver code, converge code, deploy code, test code faster.” That may not translate really well to a business person, and that's kind of the, I would hope, the role of product within your organization, or their organization, to help do that translation because if you're able to deploy quicker, that means you can solve customer problems faster, your time to market's faster, you can make changes to your cost model faster, with very little risk. But that's the communication gap that typically resides.Emily: And what do you think are some surprises that come up on the cloud journey regarding—we've already talked about how this is a shift from CapEx to OpEx, but what do people not fully understand about what those OpEx costs are going to be for example, or how it's going to work at the beginning of the journey versus at the end?Travis: Yeah, so at the beginning, I think there's an interpretation that—let's say you're spinning up a project for a very first time, and you're typically used to that CapEx model, and the team delivers their budget to you, the business person, and they're like, “It's going to cost us, round numbers, like$1,000 this month to get started.” And then by month two, they've scaled up, and they're two grand or three grand or something, whatever that number needs to be. The business is typically not prepared or is ready to be prepared to understand that it will change; the costs will change. It never is exactly what you ever thought it was going to be at to start. You can estimate all you want, there's tons of different calculators out there that helps you get close enough, but it's never going to be the number you always envisioned it to be. And that's just something you have to live with and get used to. It's more about the later on in the project that matters more. So, I think up front, it's very useful for project leaders to also communicate to the business that this is an estimate. We think it's going to start here, but we expect it to grow. And then, have good caps for yourself, almost like milestones to say, if you're getting closer to a particular cap, to a particular budget, maybe you should sit down for a moment and say, “Hey, are we doing this the wrong way, or can we do it a little bit differently?” Later on in the project, though, it gets kind of interesting. So, in my past life, I worked at a company where we did a lot of eCommerce implementations for other vendors, for other businesses, and one of the things always surprised people the most was data transfer costs. So, when we will be scoping a project, we would say, “Okay, here's the traffic we would expect to hit your new eCommerce solution, here's the type of things we expect them to do, here is the type of content, they need to load from the system, and thus, here's how much you're going to spend in transfer rates from the servers or otherwise—or CDN or otherwise—to the end-user.” And then what always hit us every single time is, like, Black Friday, or some kind of promotional event a marketing team would do, where we see a ton more traffic coming onto the system. And suddenly, all those atypical usage statistics would spike, you would be sending more images out to users, you'd be ingesting more traffic than you're anticipating, and you'd have more storage of their user information or otherwise, on your back end. And so, that sort of scaling model of usage patterns becomes more common over time because you see it more often. I like to put sort of, like, buffers on those particular areas of the budget, that say, “Hey, here's how much I think I'm going to spend on EC2, or computes, or VMs, or what have you, but I know that we ingest a lot of images from end-users, and we have to take in those images all the time, and I know we're going to run four promotions in a given year, and I should expect a 40 percent increase on those time periods. So, maybe my OpEx costs should be increased by a certain budget to accommodate that.” That type of planning has to start, has to become normal to happen between the project team and the accounting group, or financial group of some kind. So, there's some more of a clear agreement as to what realistically the budget needs to be long term.Emily: Do actual cloud costs ever come in less than estimated?Travis: I would say it's more often over what's estimated. The only times I see it coming in less is when the team delivering the solution implements a really sophisticated sort of data transfer strategy, or storage and archival strategy than expected. Maybe usage is less than you expect as well. Or, everybody should always be planning for some buffer, so if you've got a project that you think it's going to cost you $800 a month, maybe you should tell the business it's $1000. [laughs]. Sort of overachieve where you think it's appropriate.In that case, you're always going to come in lower than you expect. I actually think that cloud costs will always increase. That's just natural to do so. As you do more on cloud, you store more, you transfer more, you need more compute; it will always increase. I think it's more important to analyze the rates in which you are increasing. If you are spiking and it continues to spike, then there is a problem. If you're steadily growing, but your user account is growing, your revenue is increasing as well, and it's linear in some fashion, in that regard, that's a good thing. That doesn't mean something's wrong. But if you're spiking and things are consistently spiking, then you have a problem.Emily: Just like any other part of your business. If you're hiring more people, hopefully, it's because it means your business is growing. But you're also spending more—Travis: Exactly.Emily:—obviously. You were also saying sometimes the costs end up being less than expected because the company has invested in cost management strategies. Why—Travis: Yeah—Emily: Oh, go ahead.Travis: I was just going to say, I would say that companies don't typically start there. At least in the past, they haven't started first food forward, where they're saying, “I want to keep costs in check first, prior to doing the project.” Typically, they do a project, then they find that oh, they did too much, and now we need help. That's the normal I see.Emily: Yeah, in fact, I was going to ask, are there disadvantages to doing cloud cost management?Travis: In general?Emily: Yes, or scenarios where you would say, “Maybe this isn't what you're supposed to be focused on?”Travis: Yeah, I think it depends on the stage of either the entire company or the individual business unit or project. And what I mean by that is, if you're Google, you're probably managing a very small team, like they're a startup, or your startup yourself. And so, in the very early onset of any product delivery, cloud or not, it's typically more about volume and usage patterns of your end-users on the system. And you will spend money to get there, you will put in long hours to make it happen. Don't create barriers yourself, to achieve the goal that you have in mind for your team, your project, or otherwise. But then as the project matures, you typically see people sort of step back and say, “Okay, we made a great lot of great strides. We've hit a lot of our milestones, we have the user base we need, how do we tailor back a bit so that things become a bit more normalized and consistent on the delivery?” And so early on, I don't see a lot of people putting a lot of effort into cost savings, cost optimizations, and recommendations, just because things are fluid, dynamic, chaotic, whatever word choice you want to use for yourself at the time. But at some point in any business, you will come to the conclusion of, “Okay, we've done a lot of things; a lot of stuff. How do you then efficiently optimize what you have already delivered?” And that's where tools like CloudCheckr and others come into play to help you figure out that cost optimization and that strategy.Emily: Yeah. And can you think of any examples where you think, maybe this company would have paid less if they had done cost management, but maybe they wouldn't have accomplished x?Travis: Yeah. Without using a customer name, but to be honest, I forget their name offhand; it's about a year ago, we had a particular scenario where they're moving from on-prem. They were a video provider, and they were moving from on-prem to cloud. And in the move they had, it was something like $400,000 per month in wasted resources. Meaning they had IP addresses not attached to particular servers, or load balancers, they had storage volumes that are no longer in use, not even being backed up or anything, just kind of live off to the side, that people kind of forgot about. It was like $400,000 a month in that stuff. It was a lot of money. And they actually made a decision in the middle of the project, to back up and slow down for a moment and clean up all those things, and it probably cost them a couple months or so—because of the size and volume they're driving at—it probably cost them a couple months of their internal resource time, or otherwise to remediate those loss cost savings, and then continue the project forward with better guardrails. If I remember correctly, they actually missed a significant event day. Like, it wasn't Black Friday, it was some media day they had at the company they wanted to leverage a new system for so that it had a more performant experience with their customers to see videos and content they were delivering. And they had to leverage their older system at the time, which actually cost them more money at the moment because they had to spin up additional resources on their old system to handle that load, and it just wasn't optimal experience for themselves to do that. I think they broke even at the end of the day. I would have just gone faster on the new thing in that scenario, but that's how it played out for them.Emily: So, basically, what you're saying is that it's not always a really simple decision. There's pluses and minuses, and everybody has a limited amount of time, resources, and focus, so you have to decide what's most important at this moment for your organization.Travis: That particular daily experience is near and dear to my heart because that's the majority of my job. There are a lot of opportunities in front of you, a lot of bad things you could do, too, sometimes you're not going to get everybody happy, either. You need someone who's going to make a decision and move quickly, especially if you're a business that's on cloud in a competitive market that's trying to grow. You don't have months or years—sometimes even weeks—to make that type of decision. You kind of have to have all the data in front of you and say, “Here's the best one.” Cloud makes that harder to be quite honest with you. The speed at which you can deliver functionality, and your competitors themselves, the speed at which they can change means that you need to be very confident in what you can do and what your team can do, as well as understand what the risks are in front of you so you can make a decision quickly. If you're not ready to do that, then there may be problems ahead for you because as the market continues to go in that direction, that's just going to be a soft skill required moving forward for a lot of people.Emily: Do you think that the cloud is less expensive than data center? Do you think that cloud-native is less expensive than say someone doing a lift and shift?Travis: Yeah, so it depends a little bit. So, I can start with cloud-native and then work my way back to private cloud. It will be more expensive upfront, to go cloud-native typically, because you're building it from scratch, or redesigning a significant portion most likely of an existing application stack. So, the upfront overhead to go cloud-native is always going to be higher, especially when you factor in things like labor costs. However, though, once you have made that transition, costs can be tremendously lower. So, it's really more so about, is the business willing to take on the effort to make that leap upfront, or is there only so much that's palatable [laughs] at the time to move forward.But typically, it's a bigger spike upfront with much cheaper experience late game. When it comes to things like on-prem, or private cloud, or even hybrid cloud to some extent, I actually think if you have workloads that are very consistent, if you know exactly how much storage you're going to need, how much compute power you need, it's very stable, very consistent, it's probably cheaper to go to your private data center because you can purchase in bulk at the best rates for the time period you need for that specific thing, because you know exactly what it's going to be. That is only the case for a small subset of businesses in the world, who even have the analytics to come up with that data. But if you do, it's completely viable. Was it Dropbox, I think, recently—not recently. A couple years ago now, actually—they went off of their cloud provider. I think they're on AWS at a time, offhand. And they went to their own private data center. That could be butchered in this, slightly, but you get the point, is they knew exactly how much storage they need. And they found it to be cheaper to do it themselves on their own private data center. So, they did. Not a lot of companies can do that. But it's completely an option out there.Emily: What do you think the stakes are? And by which I mean, how big a piece of the pie is IT budgets?Travis: So, what funny is five, six years ago, if you were to ask me that question—this is back when even I worked more true to form IT—IT was not a business driver. It was overhead. It was, make sure my email is still on, and I have computers, and I can do these things, and connect the networks together. And it wasn't driving business decisions, type of a thing. That has radically changed over the last few years. I would actually say that IT is probably the most important aspect of your business now. Without a really strong IT department, you can't move fast on product development. Without a strong IT department, your engineering teams are—the velocity of the deliverables is slower. You can't leverage all these great new stuff that's coming out. So, IT now, to me, is probably the most important thing a business can truly manage appropriately. When it comes to the total budget, it has greatly increased because more responsibilities have also been placed upon IT.They now have to manage to different usage patterns from forecast, different budgets. They have to manage to new skill sets requirements for cloud in particular, and other different types of niche technologies that your business may need. That means that IT themselves has to either staff up or purchase technology solutions like CloudCheckr to accommodate that growth in the business requirement. So, the IT budget is greatly increasing, but for good reasons. It's because the business demands nimbleness, and only a strong IT department can deliver that.Emily: Does that mean though, that it's more important than ever to figure out where you're wasting resources, or at least be strategic about where you're spending those IT resources?Travis: I one hundred percent believe in the strategy of identifying and managing to cloud waste should be in the forefront of pretty much every conversation, among other items. When it comes to focusing entirely on cloud waste, I don't think that's the right decision. I think there's a way that manage to that, that the business will accept while allowing for the right velocity of improvements to products or services, but when it comes to defining—really this comes down to me is the Center Of Excellence, at the end of the day. The Center Of Excellence should be that IT is either the owner of, or is a prominent member of. And they should be defining what's the atypical cost-saving strategy? What is the security profiling? What is the particular way you want to tag your resources, or otherwise, so that you can allocate them to the right business unit, or project code, or etcetera, for that cost analysis? That is the objective of the Center Of Excellence is defining the strategy. Cost savings is a piece of—a significant piece of—but a piece of the overall strategy that you should have an answer to, but it should not be the primary driver.Emily: Anything else that you'd like to add?Travis: In relation to Centers Of Excellence?Emily: Just in relation, actually, to this whole discussion about cloud-native, costs, etcetera?Travis: Yeah. Really, I want to speak to more on the large enterprise side, and/or smaller companies who are maybe being acquired, and having to manage themselves inside of a larger organization all of a sudden. It's becoming fairly normal to see two cloud providers inside of a larger organization. And the normalcy that I've seen is acquisitions: a larger business purchasing a smaller one, maybe the larger ones on AWS, but a smaller one has chosen GCP. And no company is going to tell the new acquisition to refactor themselves and shift over to Amazon. That's a—millions of dollars will be spent [laughs], and lots of time spent to accommodate that request, so no one's really going to answer it. Instead, what they will, though, is central IT will be asked to manage both. So, multi-cloud is becoming a thing. It's becoming more and more important to have the same business strategy applicable to multiple cloud vendors at once. It's important to have the right tooling in place so that as you onboard different cloud vendors than you're normally used to, at least the terminology stays the same, the functionality stays the same because you have the right tools that can help make that translation for you. Without that right process in place, it will mean that your organization will have to incur overhead to help manage always to different costs, different ways to optimize your costs, or otherwise. And so multi-cloud tooling is becoming more and more important for larger orgs, especially as your atypical business processes pan out.Emily: How exactly, actually, does multi-cloud impact to this sort of cost equation?Travis: Yeah, so there's two trains of thought when it comes to how multi-cloud impacts a project and then how the project [eventual] cost savings. So, there's the one that I personally believe does not happen that often, but it's worth mentioning, where you have a project that's deployed—you know, software deployed, application stack deployed—and that's leveraging a small piece of it from a different cloud vendor. So, maybe 90 percent of the stack is deployed on AWS but uses Google Cloud for some analysis engine they may have, or some particular tool. So, now you've got to combine these two things together. That happens, but rarely. The alternative to that is you have discrete projects who have their own cloud vendors that they use 100 percent of but you now need to manage the business unit, owns both those projects. And how do you then combine those two things, but there is no project dependency between the two? In both cases, there needs to be a strategy employed for identifying the individual resources. So, in AWS or otherwise, you have a tagging strategy based off of project code, or business unit, or reporting lines, or however the business needs to manage to it. The second is, you need to come up with and identify consistent ways you want to save money. Every cloud vendor may have a slightly different functionality because they need to differentiate among themselves; that's normal. But everybody should be turning off servers from 7 p.m. at night until 6 a.m. the next day, when the workforce is typically not working—the internal systems, pre-production type systems—if you can. And that's true to form, regardless if you're Amazon, or Azure, or GCP. And so you need tooling and the right process in place to identify those cross-cloud cost-saving strategy options available to you, and need to normalize the way you want to implement on it. If you don't do that, what typically will end up happening is you're going to be working out of multiple consoles with different terminology, and different settings you're not used to, and different ways to implement it, and somewhere something will fall through the cracks, and you will not have consistency in your cost-saving strategy.Emily: All right, one more question. What's your favorite can't live without engineering tool?Travis: Oh, I can't say that. Because I'm going to get yelled at. [laughs]. If I had to take off my CloudCheckr hat and put my personal hat on for a brief moment, I would say log analysis tools like Sumo Logic, like Splunk, is my favorite thing, period. Back when I was more in engineering side of the house, before I was in product, I would spend hours looking through log analytics. There's just so much you can do with those tools nowadays to identify, come up with really cool analysis products. That's actually one of the guiding principles I see for CloudCheckr is people really want to have fun tools they can play with, and see, and get their hands on, and come up with new conclusions to answers they may not have had in the past. That's what I see those log analytics products doing. CloudCheckr is following a very similar route, and that's the ethos that I'm instilling on our product team is our users, our customers, they should have a fun product to be able to get into and play with their data, and use in different ways to come to new conclusions they've never seen before, especially on cost analysis. So, those are my favorite products and how we integrate them into our product decisions.Emily: Where can listeners connect with you?Travis: Yeah, so I'm on LinkedIn. Just Google my name Travis Rehl, CloudCheckr, feel free to contact me; happy to chat. CloudCheckr, if you're a CloudChecker customer, we have an All-Stars program at cloudchecker.com/allstars, which enables you to have a Slack workspace with us, which myself and my team run so you're able to chat, have conversations like this, or otherwise. I also post on Twitter, but be honest with you not that often. So, if you're looking to reach out, I'd say LinkedIn is probably your best bet.Emily: Thank you again. This is Travis Rehl. And thank you for joining us on the Business of cloud-native.Travis: Sure. Thanks for having me.Announcer: Thank you for listening to The Business of cloud-native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.This has been HumblePod production. Stay humble.

amazon spotify google business conversations tips cost meaning black friday cloud ecommerce google podcasts costs ip sort slack real world aws dropbox all stars azure xyz google cloud kubernetes cdn splunk capex gcp vms cloud native opex center of excellence ec2 just google shadow it sumo logic emily omier announcer thank humblepod

The Power of Aligning Engineering and Operations with Dave Mangot

The Business of Open Source

Play Episode Listen Later Jul 15, 2020 38:39

Some of the highlights of the show include: The difference between cloud computing and cloud native. Why operations teams often struggle to keep up with development teams, and the problems that this creates for businesses. How Dave works with operations teams and trains them how to approach cloud native so they can keep up with developers, instead of being a drag on the organization. Dave's philosophy on introducing processes, and why he prefers to use as few as possible for as long as possible and implement them only when problems arise. Why executives should strive to keep developers happy, productive, and empowered. Why operations teams need to stop thinking about themselves as people who merely complete ticket requests, and start viewing themselves as key enablers who help the organization move faster. Viewing wait time as waste. The importance of aligning operations and development teams, and having them work towards the same goal. This also requires using the same reporting structure. Links: Company site: https://www.mangoteque.com/ LinkedIn: https://www.linkedin.com/in/dmangot/ Twitter: https://twitter.com/DaveMangot CIO Author page: https://www.cio.com/author/Dave-Mangot/ TranscriptAnnouncer: Welcome to The Business of Cloud Native podcast, where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures.Emily: Welcome to The Business of Cloud Native. I'm your host, Emily Omier, and today I am chatting with Dave Mangot. And Dave is a consultant who works with companies on improving their web operations. He has experience working with a variety of companies making the transition to cloud-native and in various stages of their cloud computing journey. So, Dave, my first question is, can you go into detail about, sort of, the nitty-gritty of what you do?Dave: Sure. I've spent my whole technical professional career mostly in Silicon Valley, after moving out to California from Maryland. And really, I got early into web operations working in Unix systems administration as a sysadmin, and then we all changed the names of all those things over the years from sysadmin to Technical Infrastructure Engineer, and then Site Reliability Engineer, and all the other fun stuff. But I've been involved in the DevOps movement, kind of, since the beginning, and I've been involved in cloud computing, kind of, since the beginning. And so I'm lucky enough in my day job to be able to work with companies on their, like you said, transitions into Cloud, but really I'm helping companies, at least for their cloud stuff, think about what does cloud computing even mean? What does it mean to operate in a cloud computing manner? It's one thing to say, “We're going to move all of our stuff from the data center into Cloud,” but most people you'll hear talk about lift and shift; does that really the best way? And obviously, it's not. I think most of the studies will prove that and things like the State of DevOps report, and those other things, but really love working with companies on, like, what is so unique about the Cloud, and what advantages does that give, and how do we think about these problems in order to be able to take the best advantage that we can?Emily: Dive into a little bit more. What is the difference between cloud computing and cloud-native? And where does some confusion sometimes seep in there?Dave: I think cloud-native is just really talking about the fact that something was designed specifically for running in a cloud computing environment. To me, I don't really get hung up on those differences because, ultimately, I don't think they matter all that much. You can take memcached, which was designed to run in the data center, and you can buy that as a service on AWS. So, does that mean because it wasn't designed for the Cloud from the beginning, that it's not going to work? No, you're buying that as a service from AWS. I think cloud-native is really referring to these tools that were designed with that as a first-class citizen. And there's times where that really matters. I remember, we did an analysis of the configuration management tools years back, and what would work best on AWS and things like that, and it was pretty obvious that some of those tools were not designed for the Cloud. They were not cloud-native. They really had this distinct feel that their cloud capabilities were bolted on much later, and it was clunky, and it was hard to work with, whereas some of the other tools, really felt like that was a very natural fit, like that was the way that they had been created. But ultimately, I think the differences aren't all that great, it just, really, matters how you're going to take advantage of those tools.Emily: And with the companies that you work with, what is the problem or problems that they are usually facing that lead them to hire you?Dave: Generally the question, or the statement, I guess, that I get from the CIOs and CTOs, and CEOs is, “My production web operations team can't keep up with my development teams.” And there's a lot of reasons why those kinds of things can happen, but with the dawn of all these cloud-native type things, which is pretty cool, like containers, and all this other stuff, and CI/CD is a big popular thing now, and all kinds of other stuff. What happens, tends to be is the developers are really able to take advantage of these things, and consume them, and use them because look at AWS. AWS is API, API, API. Make an API call for this, make an API call for that. And for developers, they're really comfortable in that environment. Making an API call is kind of a no brainer. And then a lot of the operations teams are struggling because that's not normal for them. Maybe they were used to clicking around in a VMware console, and now that's not a thing because everything's API, API, API. And so what happens is the development teams start to rocket ahead of the operations teams, and the operations teams are running around struggling to keep up because they're kind of in a brand new world that the developers are dragging them into, and they have to figure out how they're going to swim in that world. And so I tend to work with operations teams to help them get to a point where they're way more comfortable, and they're thinking about the problems differently, and they're really enabling development to go as quickly as development wants to go. Which, you know, that's going to be pretty fast, especially when you're working with cloud-native stuff. But I mean, kind of to the point earlier, we built—at one of the companies I worked at years ago—what I would say, like, a cloud environment in a data center, where everything was API first, and you didn't have to run around, and click in consoles, and try to find information, and manually specify things, and stuff like that; it just worked. Just like if you make a call for VM in AWS, an EC2 instance. And so, really, it's much more about the way that we look at the problems, then it is about where this thing happens to be located because obviously cloud-native is going to be Azure, it's going to be GCP, it's going to be all those things. There's not one way to do it specifically.Emily: What's the business pain that happens if the operations team can't keep up with the developers? What happens? Why is that bad?Dave: That's a great question. It really comes down to this idea of an impedance mismatch. If the operations teams can't keep up with the development teams, then the operations teams become a drag on the business. There's so much—if you read the state of DevOps reports that are put out by DORA research and—I guess, Google now, now that they bought it—but they show that these organizations that are able to go quickly: the organizations that are able to do deploys-on-demand, the organizations that are able to remediate outages faster, all those things play into your business's success. So, the businesses that can do that have higher market capitalization, they have happier employees, they have all kinds of fantastic business outcomes that come from those abilities, and so you don't want your operations team to be a drag on your organization because that speed of business, that ability to do things a lot more easily, let's even call it like a lot more cloud-native if you want, that has real market effects. That has real business performance impacts. And so, if you look at the DevOps way of looking at this—like I said, I've been pretty involved in the DevOps movement—really the DevOps is about all the different parts of the organization working together in concert to be able to make the organization a success. And the first way of DevOps, you're talking about systems thinking, you're looking at the overall flow of work through the system, and you want to optimize that because the faster we can get work flowing through the system, the faster we can deliver new features to our customers, bug fixes to our customers, all the things that our customers want, all the things that our customers love. And so if you're going to optimize flow of work through the system, you definitely don't want work slowing down inside the operations part of the system. That's bad for the business, and that's bad for your business outcomes.Emily: And how do you think companies realize that this is a problem? I mean, is it obvious or not?Dave: I think it's one of those things like I always talk to people about process right? When do we want to introduce process? a lot of startups are like, “We need more process here, we need more process there.” And my advice to everybody is always, use as little process as possible for as long as possible, and when you need that process, it will make itself known. The pain will be so obvious that you'll be like, “Okay, we can't do this anymore this way. We've run this all the way to the end, and now we have to change things, and now we have to introduce process here.” And I think that it becomes pretty obvious to certainly the companies that I work with. At one point where they're like, “This isn't working, I'm getting my development leadership coming to be saying ‘I'm waiting for this, I'm waiting for that. I'm waiting for this. I'm waiting for that. I don't have permission to do this. We're being blocked here.'” all the things that you don't want to be hearing from your development leaders because what they're expressing is their pain of being inhibited; their pain of being slowed down. And I think it's just, like, with the process thing, I think at some point, the pain becomes obvious enough that people say, “We have to do something.” I remember talking to one company, and I was like, “Well, what do you want out of this engagement? What's your end goal?” And they said, “We'd like it for our developers to show up every day and be really happy with the environment that they're working in.” And so, you can hear it there, right? Their developers are not happy. People are coming from other companies, and they're going to this company—and I certainly won't name who they are—but they're going to this company, and they're saying, “Hey, when I worked at this other place, I didn't have all this. I didn't have all these things stopping me. I didn't have all these things inhibiting me.” And so that's why, what I said in the beginning, it's things that I'm hearing from CEOs, and CTOs, and people in those positions because at some point that stuff is just bubbling up and bubbling up, and the amount of frustration just really makes itself known.Emily: Why do you think COOs care how happy their developers are?Dave: Well, I mean, there's tons of studies that show the happier your developers are, the more productive they are. I mean, look at the Google rework stuff about psychological safety. Google discovered after hiring a professional psychology researcher to determine who were their highest performing teams, their highest performing teams weren't the teams that had the most talented engineers; it wasn't the people who went to MIT; their highest performing teams weren't the ones who had the best boss, or the coolest scrum master, or anything like that. Their highest performing teams were the teams that had the most psychological safety. People who were able to operate in an environment where they felt free to talk about the things that maybe weren't going well, things that could be improved, crazy ideas they had to make improvements, stuff like that. And I don't think that you can be on a team that's unhappy and feel like there's a lot of psychological safety there. And so, I think those things are highly correlated to one another. So, I mean, obviously, the environment that's necessary for psychological safety goes far beyond whether or not my Kubernetes cluster is automatically deploying my Docker containers; that's certainly not the case. But I think it's important to recognize that if developers are in an environment where they feel empowered, and they're not being inhibited, and they can really focus on their work, and improving things, and making things better, that they're going to produce better work, and that's going to be better for companies and certainly their business outcomes.Emily: And bringing it back to cloud-native a little bit. Can you connect for me how a cloud-native type architecture helps bring operation teams up to speed or helps remove these roadblocks?Dave: Yeah. Well, I think it's a little bit of the reverse, right? I think that the successful operations teams are the ones who are enabling these cloud-native ways of looking at the world. I think there used to be this notion of, if you want something from operations, you open up a ticket, and then operations goes and they do the ticket, and then they come back to you and say, “It's done.” And then, this never-ending cycle of sending off something and then waiting, and then sending off something, and waiting. And in cloud-native environments, we don't have that. In cloud-native environments, people are empowered and enabled, to go off and deploy things, and test things, and remediate things, and do dark launching, and have feature flags, and all these other things that, even though we're moving quickly, we can do that safely. And I think that's part of the mind shift that has to happen for these operations teams, is they need to stop thinking about themselves as people who get things done, and they need to start thinking about themselves as people who are enabling the whole organization to go faster, easier, better. I always talked to my SREs—Site Reliability Engineers—who used to work for me, and I'd say, “You have two responsibilities and that's it. And this is, in order, your first responsibility is to keep the site up.” That sounds pretty normal, right? That's what most operations teams feel like they've been tasked with. And I'm like, “Your second responsibility is to keep the developers moving as fast as possible.” And so really, when you start taking that to heart, keeping developers moving as fast as possible, that's not closing tickets as fast as you can. That's not keeping the developer moving as fast as possible, that's enabling developers to have self-service tools, and have things where they want to get something done and it's very painless for them to do that. We used to launch EC2 instances at one of the companies I was working with, where we had gotten it down to a point where you just said what kind of machine you wanted, and then that was it; and you were done. And everything else got taken care of for you: all the DNS, all the security groups, routing, networking, DNS, like, everything was all taken care of, all the software was loaded. There wasn't anything to do but say what you wanted. And we actually were able to turn that tool over to the developers so they could launch all their own stuff. They didn't need us anymore. And I think that's really creating these cloud-native ideas. Certainly, a lot of that stuff is part of the cloud-native tooling, now. This was a few years ago, but really it's enabling the developers to go as fast as possible. We could have said, “Hey, you want a machine? Open up a ticket, and it's so easy for us to spin up a machine.” But we didn't do that. We took it to the next level, and we empowered them, and we allowed them to go quickly. And that's really the sort of mental shift that the operations teams have to make. How do we do that?Emily: I have to say, I have never been a developer, but whenever anyone talks about this process of submitting a ticket and waiting for it to get addressed, it just sounds like hell.Dave: Yeah. Well, if you look at it from a lean manufacturing, Toyota kind of thing, all that wait time is waste. In lean, they call that waste. It's a handoff: there's no work being accomplished during that time, and so it's waste in the system. And so Toyota is always trying to move towards—I can't remember they call. It, I think it was, like, one piece flow, or something like that where basically you want work to be happening at all times in the system, and you certainly don't want things sitting around. And so, developers don't want that either. Developers want to put things out there. They want to see, does this work? Does this not work? And when you enable developers to have that kind of power and have that ability to go really fast, there's all kinds of like things that we can enable for the business that help cost savings, better security, all kinds of stuff far beyond just simple, “Hey, here's more features. Here's more features.”Emily: How easy do you think it is for operations teams to sort of shift to think, like, “Our job is to make things as easy as possible for developers?”Dave: I don't think it's that hard, actually, mostly because if we're starting to look at things from a DevOps mindset, we're understanding that the whole goal is to optimize the entire system; it's not to optimize a single point in the system. And I always advocate that operations teams report up through the same reporting structure as the engineering teams do. The worst thing you can do is silo it off so all the operations teams report to the COO, and all the engineering teams report to the CTO. Like, that's awful because what you want to do is you want to align the outcome so that everybody's working towards the same goal, and now we can start to partner up together in order to be able to achieve those goals. And so, one of my favorite examples of this from, like, enabling the developers to go fast, and doing that in partnership with operations was, I worked at a company, and we had a storage system, and we were storing all this stuff in a database, and we were paying a lot of money to store all this data for our customers. And that's what the customers were paying us for, was to store their data. And so the developers had this idea that they wanted to try this other way of storing the data. And so, we worked with them—the operations teams work with them, “How do you want to do this? What kinds of things do you need? What's going to work best? Is this going to work best? Is that going to work best?” And we had a lot of collaboration, and, “Here's where we're going to launch these new things, and we're going to try them out. And this is how we're going to try them out.”And it wasn't a process that happened overnight: from beginning to end of this project, it probably took, I don't know, a year and a half or something like that, of iterating, and trying, and testing, and making sure it's safe, and all these other stuff. But in the end, we wound up shutting off the old database system and talking to the engineers about what that meant for the business. They said a conservative estimate would be that we saved the company 75 percent on storage costs. That's the conservative estimate. I mean, that's insane, right? 75 percent for their biggest cost. That was the biggest cost of the company, and we knocked it down by 75 percent, at minimum.And so this idea of enabling this cloud approach of going quickly, and taking advantage of all these resources, and moving fast without impediments, that can have some major impact. And it's not operations teams doing that; it's not development teams doing that; it's operations and development teams doing that together in partnership to achieve those pretty awesome business outcomes.Emily: In that particular case, who had the initial push? Who had this initial idea that let's figure out a better way to approach storage?Dave: Well, I mean, we got a challenge from the business. The business said, look at our costs. Look at what we're doing. Are there ways that we can improve this so that we can improve our profitability? And so it was a challenge. And I think the best thing about that is, it wasn't the business telling us how to do it; it wasn't people saying, here's what you should do. The business is saying, “If this is a problem, how do you solve it?” And then they, kind of, got out of our way and said, “Let the engineers do their engineering.” And I think that was kind of fantastic because the results were exactly what they wanted. But the business is going to look at the problems from a business perspective, and I think it's important that as engineers, we look at the problems from a business perspective as well. We're not showing up for work to have fun and play with computers. We're showing up at work to achieve an objective. That's why we get paid. If you want to hobby around with your computers, you can hobby around at home, but we're getting paid at work to achieve the goals of the business. And so, that was the way that they were looking at the problem, and that's the way that we wound up looking at the problem. Which is the correct way?Emily: Do you have any other notable examples that come to mind?Dave: Yeah, I mean, this idea of cloud and being able to go quickly, we had this one problem with that—actually, with that same database engine, which is hilarious, before we wound up replacing it, where we were upgrading the software from one version to another, and we're making a pretty big jump. And so, we spun up the new version of the software; we loaded the data on, and we started seeing how their performance was. And the performance was terrible. I mean, not just, we would have trouble with it; it was unusable. There was no way we could run the business with that level of performance. And we're like, “What happened? [laughs]. What did we do here?” And so, we went and looked in GitHub at the differences between the old version of the software, and the new version of software. And there was, like, 5000 commits that had happened between the old version and the new version. And so all we had to do was find out which of those 5000 commits was the problem. [laughs]. Which, that's a daunting task or whatever. But the operations team got down to it, and we built a bunch of tooling, and we started changing some things and making improvements so that we were able to spin up clusters of this software and run a full test suite to determine whether this problem still existed. And that was something we could do in 20 minutes. And so then we started doing what's called git bisecting, but we started searching in a certain kind of pattern, which I won't get into, for which of these—where was the problem? So, we would look, say in the middle, and then if the problem wasn't there in the middle, then we would look between the middle and the right. If it was there, then we would look between the middle and the left. And we kept doing this bisect, and within two days, we had found the exact commit that had caused the problem. And it was them subtracting, like, a nano from a milli, or something like that. But going back and talking to the CTO afterwards, I said, “You know, if we hadn't built these tools, and we hadn't had this ability to really iterate super quickly in the Cloud, what would you have done?” And he was like, “I have no idea.” He's like, “Maybe we would have spent a couple of days and then given up, maybe we would have just gone in a completely different direction.” But that ability to be able to work so effectively with these cloud tools, and so easily with these cloud tools, enabled us to do something that the business just would just not have had the opportunity to take advantage of at all. And so that was a major win for being able to have operations teams that think about these problems in a completely different way.Emily: It sounds like, in this particular company, the engineering teams and the business leaders were fairly well-aligned, and able to communicate pretty well about what the end goals are. How common do you find that is with your clients?Dave: I don't know. I think it's pretty variable. It depends on the organization. I think that is one of the things that I emphasize when I'm working with my clients is how important that alignment is. I sort of talked about it a little bit earlier, when I said you shouldn't have one group reporting to the CTO and another group reporting to the COO. But also, it's really important for leadership to be communicating this stuff in the proper way. One of the things I loved most about my experience working at Salesforce was, Marc Benioff was the CEO and he would publish what they call V2MOMs, which is like—oh boy, vision, values, metrics, obstacles, and measures, or something. I don't remember what the last thing was. But he would publish his V2MOM, which was basically his objectives for the next time period, whether it was quarterly, or yearly, I don't really remember. But then what would happen was the people that worked for him would look at his V2MOM, and they would write theirs about what they wanted to get accomplished, but showing how what they were doing was in support of what he wanted. And then the people below them would do the same thing. And the people below them would do the same thing. And what you were able to create was this incredible amount of alignment at a 16,000 person company, which is crazy, up and down the ladder so that everybody understands what they're doing, and how it fits into the larger picture, and what they're doing in support of the goals of the business, and the objectives of the business, and that goes all the way down to the most junior engineer. And I think having that kind of alignment is, I mean, it's obviously incredibly powerful. I mean, Salesforce is a rocket ship and has been for a long, long time. And Google does this for their OKRs, and that there was that thing that was popularized by Intel as well; there's a whole bunch of these things. But that alignment is phenomenal if you want to have a lasting, high performing organization.Emily: When you see companies that don't have that alignment, or even just, it seems like the engineering team maybe doesn't entirely understand where the business is going, or even the business doesn't understand what the engineering team is doing, what happens and where is the communication going wrong?Dave: I mean, you see the frustration. You see the fracturing. You see the silos. You see a lot of finger-pointing. I've definitely worked with some clients where the ops team hates the dev team; the dev team hates the ops team. I remember the dev team saying, “Ops doesn't actually want to do any work. They just want to invent stuff for themselves to work on. And that's how they want to spend their day.” And the ops team is saying, “The developers don't even understand anything about what we're doing, and they just want to go o—” you know, there's all these crazy, awful made up stories. And if you've ever read the book Crucial Conversations—they also have a course, or whatever—one of the things they talk about is you need to establish mutual purpose in order to have a difficult conversation. And I think that's really important for what we're talking about in the business: we need to establish mutual purpose, just we talked about with DevOps, there's only one goal. And the other thing that they say in that class, or in that book, is that when we are going to have a conversation with somebody that we are not getting along with, we invent a story that explains why they behave the way that they do, and every time we see something that validates that story, then it's even more evidence that that story is actually correct. The problem is, is it's a story. Like it's not real. It may seem real to us; it may feel real to us. But it's a story. It's something that we made up. And so, that's the kind of outcomes that you get when you have this fracturing, where you don't have this alignment up and down. You have people telling these stories like, “Operations doesn't want to do any real work. They just want to make stuff up for themselves to work on.” Which is, if you're not in that environment, if you're someone like you and me looking from the outside, that's absurd. But I can certainly see how you can make up a story that gets continually validated by what you see because you're looking for evidence that supports your story. That's part of what makes you think that you're right, is that you're always searching for this evidence. And so obviously, those are not going to be high performing organizations. That's why it's so important to get that kind of alignment.Emily: Going back to this idea of sort of moving to cloud-native, what do you think are some of the surprises or misconceptions that come up when teams are moving to more cloud-native approaches?Dave: I feel like my clients generally are not terribly surprised. I think by the time that someone's reaching out to me, they are feeling a lot of pain, and they know that things have to change, and they are looking for what are the ways that things have to change? I don't ever have to go into a client and convince them that they need to do it better. The clients that are coming to me, recognizing that they're having a problem, and so it's really just getting them to stop focusing on what we call an SRE toil, which is popularized by Google, which is—I don't remember the exact definition, but it's basically manual work that's devoid of enduring value, that's repetitive, it's automatable, it's just repeat, repeat, repeat, repeat: we're not making improvements anymore. And so once we start to have this Kaizen mindset, this idea of continual improvement at all times, instead of just trying to keep the business running, that starts to enable all this kinds of stuff. And that's why we talked about building things in sort of a cloud-native manner. We're talking about that ability to go fast. We're talking about that ability to enable things, and part of that is this idea of continual improvement; this idea of always making things better. A lot of this comes out of Agile as well. I always talk to people about their sprint retrospectives, and I say, “It's your opportunity to make your team better. It's your opportunity to make your environment better. It's your opportunity to make your company better.” And I was like, “The worst thing that you could do in Agile is if in January of last year, and in January of this year, your team is just as good as it was.” That's terrible. Your team needs to be much better than it was. And so enabling developers to go quickly and all that other stuff. And putting all those things in place is a big part of that.Emily: Anything else that you want to add about this topic that I didn't think to ask?Dave: I mean, I think embracing these principles is really important. I think that if you look at the companies who are trying to go fast, and don't embrace these principles, these cloud-native ideas, or just even these cloud computing ideas, it basically becomes technical debt that keeps building up, and building up, and building up. And everybody knows accumulating tons of technical debt is not going to help your organization to move faster; it's not going to help you achieve all those great business outcomes that you want to get out of the State of DevOps report. And so I've seen situations where they have not been able to make that transition into this way of looking at the world, and the environment becomes really fragile; it becomes really brittle; it becomes really hard to make changes, and the only way is to make changes is to double down on the technical debt, and accumulate more of it, to the point where eventually they wind up spinning up an entire team whose sole purpose is to try to undo the mess that's been created. And you don't want that. You don't want to allocate a team to start unpacking your technical debt. You'd rather just not accumulate that technical debt as you're going along. And so I think it's really crucial for businesses that want to be successful in the long term that they start to embrace these ideas early. And obviously, if I'm a startup and I want to embrace a lot of these cloud-native things, that's a lot easier than if I'm a well-established company. I, in my consulting practice, I don't really work with startups because they don't tend to have these problems. They don't tend to accumulate a lot of technical debt because they are founded with this idea of going quickly and being able to empower developers and enable people to go quick. To your point earlier, the companies that I'm working with are the ones who are making this transition, where they've been running in the data center, or maybe they built an environment in the Cloud, but it's just not operating the way that they expected, and they're paying ridiculous amounts of money [laughs] to run stuff in AWS, where we thought, “Hey, what's going on? This isn't supposed to be this way.” But startups have the ability to do this much easier because they're unencumbered. And then as they grow, and they start to introduce more process because that stuff is inevitable that we're going to need to do that, that's when these things become even more important that we make sure that we're keeping them in mind and we're doubling down on them, and we're not introducing lean waste into the system and stuff like that, that will ultimately catch up with us.Emily: It's so true. All right, just a couple more questions. What is your favorite engineering tool?Dave: Ah. I mean, I'm supposed to give some kind of DevOps-y, it's not about the tools answer, but this week I think it was, on Twitter, I saw somebody else put up a SmokePing graph. And most people are not going to have heard of SmokePing. I worked at multiple ISPs in my career already, so the networking stuff is important to me. But wow, I love a SmokePing graph. And it's basically just a bunch of pings that are sent to some target, and then they're graphed when they come back, but instead of saying, “I sent one ping, and I came back with 20 milliseconds,” it sends 20, and then it graphs them all at that time point, so you can actually see density. It's basically before everybody came up with the idea of heat maps, this was one of the original heat map tools, and I still run SmokePing in my house just to see the performance of my home network going out to different parts of the internet, and that's definitely my favorite tool.Emily: Where can listeners connect with you? Website?Dave: Yeah, yeah, that's a great question. So, if people are interested in my business, I'm at mangoteque.com, M-A-N-G-O-T-E-Q-U-E. That was a fun name invented by Corey Quinn of The Duckbill Group and I loved it, and so I wound up using it. And they could also find me on LinkedIn obviously, or Twitter at @DaveMangot. M-A-N-G-O-T, and I post a lot on there about things that I've observed. I post a lot on there about DevOps. I post a lot on there about taking a scientific approach to a lot of the things we're doing, not just in terms of the scientific method, but like in terms of cognitive neuroscience, and things like that. And I also write a monthly column for CIO.com.Emily: Well, Dave, thank you so much for joining me.Dave: Thank you for having me, Emily, this was really fun.Announcer: Thank you for listening to The Business of Cloud Native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.This has been HumblePod production. Stay humble.

ceo spotify california google business state mit open maryland silicon valley ceos engineering operations cloud coo google podcasts developers cto intel real world aligning salesforce toyota api agile cio cios aws github vm devops azure ops vmware kaizen dns docker kubernetes okrs crucial conversations sre ci cd unix isps gcp cloud native ctos marc benioff coos ec2 corey quinn site reliability engineer duckbill group dave yeah dave well emily it emily omier emily well dave thank announcer thank humblepod

Scaling in the Cloud: A Conversation with Jon Tirsen

The Business of Open Source

Play Episode Listen Later Jul 1, 2020 26:38

In this episode of the Business Cloud Native, host Emily Omier talks with Jon Tirsen, who is engineering lead for storage at Cash App. This conversation focuses on Cash App's cloud native journey, and how they are working to build an application that is more scalable, flexible, and easier to manage.The conversation covers: How the need for hybrid cloud services and uniform program models led Cash App to Kubernetes. Some of the major scaling issues that Cash App was facing. For example, the company needed to increase user capacity, and add new product lines. The process of trying to scale Cash App's MySQL database, and the decision to split up their dataset into smaller parts that could run on different databases. Cash App's monolithic application, which contains hundreds of thousands of lines of code — and why it's becoming increasingly difficult to manage and grow. How Jon's team is trying to balance product/ business and technical needs, and deliver value while rearchitecting their system to scale their operations. Why Cash App is working to build small, product-oriented teams, and a system where products can be executed and deployed at their own pace through the cloud. Jon also discusses some of the challenges that are preventing this from happening. How Cash App was able to help during the pandemic, by facilitating easy stimulus transfers through their service — and why it wouldn't have been possible without a cloud native architecture. Links: Cash App: https://cash.app/ Square: https://squareup.com/us/en Jon on Twitter: https://twitter.com/tirsen?lang=en Connect with Jon on LinkedIn: https://www.linkedin.com/in/tirsen/?originalSubdomain=au The Business of Cloud Native: http://thebusinessofcloudnative.com TranscriptAnnouncer: Welcome to The Business of Cloud Native podcast where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures.Emily: Welcome to The Business of Cloud Native. My name is Emily Omier, I'm here chatting with Jon Tirsen.Jon: Happy to be here. My name is, as you said, Jon Tirsen, and I work as the engineering lead of storage here at Cash App. I've been at Cash for maybe four or five years now. So, I've been with it from the very early days. And before Cash, I was doing a startup, that failed, for five years. So, it's a travel guide in the mobile phone startup. And before that, I was at Google working on another failed product called the Google Wave, which you might remember, and before that, it was a company called ThoughtWorks, which some of you probably know about as well.Emily: And in case people don't know, the Cash App is part of Square, right?Jon: Yes. Cash App is where we're separating all the different products quite a lot these days. So, it used to be called just Square Cash, but now it has its own branding and its own identity, and its own leadership, and everything. So, we're trying to call it an ecosystem of startups. So, each product line can run its business the way it wants to, to a large degree.Emily: And so, what do you actually spend your day doing?Jon: Most of my days, I'm still code, and doing various operational tasks, and setting up systems, and testing, and that sort of thing. I also, maybe about half my day, I spend on more management tasks, which is reviewing documents, writing documents, and talking to people trying to figure out our strategy and so on. So, maybe about half my time, I do real technical things, and then the other half I do more management stuff.Emily: Where would you say the cloud-native journey started for you?Jon: Well, so a lot of Square used to run on-premises. So, we had our own data centers and things. But especially for Cash App, since we've grown so quickly, it started getting slightly out of control. We were basically outgrowing—we could not physically put more machines into our data centers. So, we've started moving a lot of our services over to Amazon in this case, and we want to have a shared way of building services that would work both in the Cloud and also in our data centers. So, something like Kubernetes and all the tools around that would give us a more uniform programming model that we could use to deploy apps in both of these environments. We started that, two, three years ago. We started looking at moving our workload out of our data centers.Emily: What were the issues that you were encountering? Give me a little bit more details about the scaling issues that we were talking about.Jon: There two dimensions that we needed to scale out the Cash App, sort of, system slash [unintelligible] architecture. So, one thing was that we just grew so quickly that we needed to be able to increase capacity. So, that was across the board. So, from databases to application servers, and bandwidth, everywhere. We need to just be able to increase our capacity of handling more users, but also we were trying to grow our product as well. So, at the same time, we also want to build and be able to add new features at an increased pace. So, we want to be able to add new product lines in the Cash App. So, for example, we built the Cash Card, which is a way you can keep your money in the Cash App bank accounts, and then you can spend that money using a separate card, and then we add a new functionality around that card, and so on. So, we also needed to be able to scale out the team to be able to have more people working on the team to build new products for our users, for our customers. Those are the two dimensions: we needed to scale out the system, but we also needed to have more people be able to work productively. So, that's why we started trying to chop up—we have this big monolith as most companies probably do, which that's I don't know how many hundreds of thousands of lines of code in there. But we also wanted to move things out of that, to be able to have more people contribute productively.Emily: And where are you in that process?Jon: Well, [laughs], we're probably adding still adding code at an exponential rate to the monolith. We're also adding code at an exponential rate outside of the monolith, but it just feels so much easier to just build some code in the monolith than it is outside of it, unfortunately, which something we're trying to fix, but it's very hard. And it is getting a little bit out of hand, this monolith now. So, we have, sort of, a moratorium on adding new code to the monolith now, and I'm not sure how much of an effect that has made. But the monolith is still growing, as well as our non-monolith services as well, of course. Emily: When you were faced with this scaling issue, what were the conversations happening between the technical side and the business owners? And how is this decision made about the best way to solve this problem is x, is the Cloud, is cloud-native architecture?Jon: I think the business side—the product owners, product managers—they trust us to make the right decision. So, it was largely a decision made on the technical side. They do still want us to build functionality, and to add new features, and fix bugs, and so on. So, they want us to do that, but they don't really have strong influence on the technical choices we've made. I think that's something we have to balance out. So, how can we keep on giving the product side and the business side what they need? So, to keep on delivering value to them while we try to rearchitect our system so that we can scale out our operations on our side. So, it's a very tricky balance to find there. And I think so far, maybe we've erred on the side of keep on delivering functionality, and maybe we need to do more on the rearchitecting things. But yeah, that's always a constant rebalancing act we're always dealing with. Emily: Do you think that you have gotten the increased scalability? How far along are you on reaching the goals that you originally had?Jon: I think we have a pretty scalable system now, in terms of the amount of customers we can service. So, we can add capacity. If we can keep on adding hardware to it, we can grow very far. We've actually noticed that the last few weeks, we've had an almost unprecedented growth, especially with the Coronavirus crisis. Every single day, it's almost a record. I mean, there's still issues, of course, and we're constantly trying to stay on top of that growth, but we have a reasonably good architecture there. What I think is probably our larger problem is the other side, so the human side. As I said, we are still adding code to this monolith, which is getting completely out of hand to work with. And we're not growing our smaller services fast enough. It's probably time to spend more effort on rearchitecting that side of things as well.Emily: What are some of the organizational, or people challenges that you've run into?Jon: Yeah. So, we want to build smaller teams oriented around products. We see ourselves more of a platform on products these days: we're not just a single product. And we want to build smaller teams. That is, maybe we have one team that is around our card, and one team around our [unintelligible] trading and so on. And we want to have the smaller teams, and we want them to be able to execute independently. So, we want to be able to put together a cross-functional team of some engineers, and some UX people, and some product people, and some business people, and then they should be able to execute independently and have their own services running in our cloud infrastructure, and not have to coordinate too much with all of the other teams that are also trying to execute independently. So, each product can do its own thing, and own their own services, and deploy at their own pace, and so on. That's what we're trying to achieve, but as long as they still have to do a lot of work inside of our big monolith, then they can't really execute independently. So, one team might build something that actually causes issues with another team's products, and so on, and that becomes very complicated to deal with. So, we tried to move away from that, and move towards a model where a team has a couple of services that they own, and they can do most of their work inside of those services.Emily: What do you think is preventing you from being farther along than you are? Farther along towards this idea of teams being totally self-sufficient?Jon: Yeah, I think it's the million-dollar question, really. Why are we still seeing exponential growth in code size in our monolith, and not in our services? And I think it's a combination of many, many things. One thing I think, we don't have all of the infrastructure available to us in our cloud, in our smaller services. So, say you want to build a little feature, you want to add a little button that does something, and if you want to do that inside our monolith, that might take you two, three days. Whereas if you want to pull up a completely new service—I think we've solved it at an infrastructural layer, it's very quick and easy to just pull up a new service, and have it run, and be able to take traffic, and so on—but it's more of the domain-specific infrastructures of being able to access all the different data sets that you need to be able to access, and be able to shift information back to the mobile device. And all these things, it's very easy to do inside a monolith, but it's much harder to do outside of the monolith. So, we have to replicate a big set of what we call product platforms. So, instead of infrastructural platform is more product specific platform features like customer information, and be able to send information back to the client, and so on. And all those things have to be rebuilt for cloud services. We haven't really gotten all the way there yet.Emily: If I understood correctly from the case study with the CNCF, you sort of started the cloud-native journey with your databases.Jon: Yes, that was the thing that was on fire. Cash App was initially built as a hack week project, and it was never really designed to scale. So, it was just running on a single MySQL database for a really long time. And we actually literally put a piece of hardware on fire with that database. We managed to roll it, roll it off, of course, didn't take down our service, but it was actually smoking in our [laughs] data centers. It melted the service around it in its chassis. So, that was a big problem, and we needed to solve that very quickly. So, that's where we started.Emily: Could you actually go into that just a little bit more? I read the case study, but probably most listeners haven't. Why was the database such a big problem? And how did you solve it?Jon: Yeah, as I said, so we only had a single MySQL database. And as most people know, it's very hard to keep on scaling that, so we bought more and more expensive hardware. And since we were a mobile app, we don't get all the benefits from caching and replica reads, so most of the time, the user is actually accessing data that is already on the device, so they don't actually make any calls out to our back end to read the data. Usually, you scale out a database by adding replicas, and caching, and that sort of stuff, but that wasn't our bottleneck. Our bottleneck was that we simply could not write to the database, we couldn't update the database fast enough, or with enough capacity. So, we needed to shard it, and split up the data set into smaller parts that we could run on separate databases. And we used the thing called Vitess for that, which is a Cloud Native Foundation member, a product and [unintelligible] CNCF. And with Vitess, we were able to split up the database into smaller parts. It was quite a large project, and especially back then, Vitess was—it was quite early days. So, the Vitess was used to scale out YouTube and then it was open-sourced. And then, we started using it. I think, not long after that, it was also used by Slack. So now, currently Slack uses it for most of its data. And we started using it very early, so it was still kind of early days, and we had to build a lot of new functionality in there, and we had to port [00:15:20 unintelligible] make sure all of our queries worked with the Vitess. But then we were able to do shard splitting. So, without having to restart or have downtime in our app, we could split up the database into smaller parts, and then the Vitess would handle the routing of queries, and so on.Emily: If at all, how did that serve as the gateway to then starting to think about changing more of the application, or moving more into services as opposed to a monolith?Jon: Yeah, I think that was kind of orthogonal in some ways. So, while we scaled out the database layer, we also realized that we needed to scale out the human side of it. So, we have multiple teams being able to work independently. And that is something we haven't I think we haven't really gotten to completely, yet. So, while we've scaled out the database layer, we're not quite there from the human side of things.Emily: Why is it important to scale any of this out? I understand the database, but why is it important to get the scaling for the teams?Jon: Yeah, I mean, it's a very competitive space, what we're trying to do. We have a very formidable competitors, both from other apps and also from the big banks, and for us to be able to keep on delivering new features for our customers at a high pace, and be able to change those features to react to changing customer demands or, like during this crisis we are in now, and being able to respond to what our competitors are doing. I mean, that just makes us a more effective business. And we don't always know when we start a new product line where it's exactly going to lead us, we sort of look at what our customers are using it and where that takes us, and being able to respond to that quickly, that's something that is very hard if you have a big monolith that has a million lines of code and takes you several hours to compile, then it's going to be very hard for you to deliver functionality and make changes to functionality in a good time.Emily: Can you think of any examples where you're able to respond really quickly to something like this current crisis in a way that wouldn't have been possible with the old models?Jon: I don't actually know the details here. I live currently in Australia, so I don't know. But the US government is handing out these checks, right? So, you get some kind of a subsidy. And apparently, they were going to mail those out to a lot of people, but we actually stepped up and said, look, you can just Cash App them out to people. So, people sign up for a Cash App account, and then they can receive their subsidies directly into the Cash App accounts, or into their bank accounts via our payment rails. And we were able to execute on that very quickly, and I think we are now an official way to get that subsidy from the US government. So, that's something that we probably wouldn't have been able to do unless we've invested more to be able to respond to that so quickly, within just weeks, I think.Emily: And as Cash App has moved to increasingly service-oriented architectures and increasingly cloud-native, what has been surprisingly easy?Jon: Surprisingly easy. I don't think I've been surprised by anything being easy, to my recollection. I think most things have been surprisingly hard. [laughs]. I think we are still somewhat in the early days of this infrastructure, and there are so many issues; there's so many bugs; there's so many unknowns. And when you start digging into things, it just surprises you how hard. So, I work in the infrastructure team, and we try to provide a curated experience for our product teams, the product engineering teams, so we deal with that pain directly where we have to figure out how all these products work together, and how to build functionality on top of them. I think we deal with that pain for our product engineers. But of course, they are also running into things all the time. So, no, it is surprisingly hard sometimes, but it's all right.Emily: What do you think has been surprisingly challenging, unexpectedly challenging?Jon: Maybe I shouldn't be, but I am somewhat surprised how immature things still are. Just as an example, how hard it is, if you run a pod, in a EKS—Amazon Kubernetes cluster, and you just want to authenticate to be able to use other Amazon products like Dynamo, or S3, or something, this is still something that is incredibly hard to do. So, you would think that just having two products from the same vendor inside of the same ecosystem, you would think that that would be a no-brainer: that they would just work together, but no. I think we'll figure it out eventually, but currently, it's still a lot of work to get things to play well together.Emily: If you had a top-three wish list of things for the community to address, what do you think they would be?Jon: Yeah, I guess the out-of-the-box experience with all of these tools, so that they just work together really well, without having to manually set up a lot of different things, that'd be nice. I think I also, maybe this all exists, we haven't integrated all these tools, but something that struck me the other day, I was debugging some production issue—it wasn't a major issue, but it was an issue that had been an ongoing thing for two weeks—and I just wanted to see what change happened those two weeks ago. What was the delta? What made that change happen? And being able to get that information out of Kubernetes and Amazon—and maybe there's some audit logging tools and all this stuff, but it's not entirely clear how to use them, or how to turn them on, and so on. So, that's a really nice, user friendly, and easy to use kind of auditing, and audit trail tools would be really nice. So, that's one wish, I guess, in general: having a curated experience. So, if you start from scratch, and you want to get all of the best practice tools, and you want to get all the functionality out of a cloud infrastructure, there's still a lot of choices to make, and there's a lot of different tools that you need to set up to make them work together, Prometheus, and Grafana, and Kubernetes, and so on. And having a curated out-of-the-box experience that just makes everything work, and you don't have to think about everything, that would be quite nice. So, Kubernetes operators are great, and these CRDs, this metadata you can store and work with inside of Kubernetes is great, but unfortunately they don't play well with the rest of the cloud infrastructure at Amazon, at AWS. Amazon was working on this Amazon operator, which you would be able to configure other AWS resources from inside of the Kubernetes cluster. So, you could have a CRD for an S3 bucket, so you wouldn't need a Terraform. So right now, you can have Helm Charts and similar to manage the Kubernetes side of things, but then you also need Terraform stuff to manage the AWS side of things, but just something thing that unifies this, so you can have a single place for all your infrastructural metadata. That would be nice. And Amazon is working on this, and they open-sourced something like an AWS operator, but I think they actually withdrew it and they are doing something closed-source. I don't know where that project is going. But that would be really nice.Emily: Go back again to this idea of the business of cloud-native. To what extent do you have to talk about this with business stakeholders? What are those conversations look like?Jon: A Cash App, we usually do not pull in product and business people in these conversations, I think, except when it comes to cost [laughs] and budgeting. But they think more in terms of features and being able to deliver and have teams be able to execute independently, and so on. And our hope is that we can construct an infrastructure that provides these capabilities to our business side. So, it's almost like a black box. They don't know what's inside. We are responsible for figuring out how to give it to them, but they don't always know exactly what's inside of the box.Emily: Excellent. The last question is if there's an engineering tool you can't live without?Jon: I would say all of the JetBrains IDEs for development. I've been using those for maybe 20 years, and they keep on delivering new tools, and I just love them all.Emily: Well, thank you so much for joining.Jon: Thanks for inviting me to speak on the podcast.Announcer: Thank you for listening to The Business of Cloud Native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.This has been HumblePod production. Stay humble.

amazon spotify australia google business conversations coronavirus cloud google podcasts scaling square slack real world ux aws prometheus s3 dynamo kubernetes farther mysql terraform cloud native thoughtworks grafana cncf google wave crd square cash crds cash card emily omier emily well announcer thank jon thanks humblepod

Why Companies Go Cloud-Native with Austin Adams and Zach Arnold

The Business of Open Source

Play Episode Listen Later Jun 17, 2020 40:36

Some of the highlights of the show include The diplomacy that's required between software engineers and management, and why influence is needed to move projects forward to completion. Driving factors behind Ygrene's Kubernetes migration, which included an infrastructure bottleneck, a need to streamline deployment, and a desire to leverage their internal team of cloud experts. Management's request to ship code faster, and why it was important to the organization. How the company's engineers responded to the request to ship code faster, and overcame disconnects with management. How the team obtained executive buy-in for a Kubernetes migration. Key cultural changes that were required to make the migration to Kubernetes successful. How unexpected challenges forced the team to learn the “depths of Kubernetes,” and how it helped with root cause analysis. Why the transition to Kubernetes was a success, enabling the team to ship code faster, deliver more value, secure more customers, and drive more revenue. Links: HerdX: https://www.herdx.com/ Ygrene: https://ygrene.com/ Austin Twitter: https://twitter.com/_austbot Austin LinkedIn: https://www.linkedin.com/in/austbot/ Arnold's book on publisher site: https://www.packtpub.com/cloud-networking/the-kubernetes-workshop Arnold's book on Amazon: https://www.amazon.com/Kubernetes-Workshop-Interactive-Approach-Learning/dp/1838820752/ TranscriptAnnouncer: Welcome to The Business of Cloud Native podcast where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures.Emily: Welcome to The Business of Cloud Native. My name is Emily Omier, and I am here with Austin Adams and Zack Arnold, and we are here to talk about why companies go cloud-native.Austin: So, I'm currently the CTO of a small Agrotech startup called HerdX. And that means I spend my days designing software, designing architecture for how distributed systems talk, and also leading teams of engineers to build proof-of-concepts and then production systems as they take over the projects that I've designed. Emily: And then, what did you do at Ygrene? Austin: I did the exact same thing, except for without the CTO title. And I also had other higher-level engineers working with me at Ygrene. So, we made a lot of technical decisions together. We all migrated to Kubernetes together, and Zack was a chief proponent of that, especially with the culture change. So, I focused on the designing software that teams of implementation engineers could take over and actually build out for the long run. And I think Zack really focused on—oh, I'll let Zack say what he focused on. [laughs].Emily: Go for it, Zach.Zach: Hello. I'm Zack. I also no longer work for Ygrene, although I have a lot of admiration and respect for the people who do. It was a fantastic company. So, Austin called me up a while back and asked me to think about participating in a DevOps engineering role at Ygrene. And he sort of said at the outset, we don't really know what it looks like, and we're pretty sure that we just created a position out of a culture, but would you be willing to embody it? And up until this point, I'd had cloud experience, and I had had software engineering experience, but I didn't really spend a ton of time focused on the actual movement of software from developer's laptops to production with as few hiccups, and as many tests, and as much safety as possible in between. So, I always told people the role felt like it was three parts. It was part IT automation expert, part software engineer, and then part diplomat. And the diplomacy was mostly in between people who are more operations focused. So, support engineers, project managers, and people who were on-call day in and day out, and being a go-between higher levels of management and software engineers themselves because there's this awkward, coordinated motion that has to really happen at a fine-grained level in order to get DevOps to really work at a company. What I mean by that is, essentially, Dev and Ops seem to on the surface have opposing goals, the operation staff, it's job is to maintain stability, and the development side's job is to introduce change, which invariably introduces instability. So, that dichotomy means that being able to simultaneously satisfy both desires is really a goal of DevOps, but it's difficult to achieve at an organizational level without dealing with some pretty critical cultural components. So, what do I spend my day on? The answer to that question is, yes. It really depends on the day. Sometimes it's cloud engineers. Sometimes it's QA folks, sometimes it's management. Sometimes I'm heads-down writing software for integrations in between tools. And every now and again, I get to contribute to open-source. So, a lot of different actual daily tasks take place in my position.Emily: Tell me a little bit more about this diplomacy between software engineers and management.Zach: [laughs]. Well, I'm not sure who's going to be listening in this amazing audience of ours, but I assume, because people are human, that they have capital O-pinions about how things should work, especially as it pertains to either software development lifecycle, the ITIL process of introducing change into a datacenter, into a cloud environment, compliance, security. There's lots of, I'll call them thought frameworks that have a very narrow focus on how we should be doing something with respect to software. So, diplomacy is the—well, I guess in true statecraft, it's being able to work in between countries. But in this particular case, diplomacy is using relational equity or influence, to be able to have every group achieve a common and shared purpose. At the end of the day, in most companies the goal is actually to be able to produce a product that people would want to pay for, and we can do so as quickly and as efficiently as possible. To do that, though, it again requires a lot of people with differing goals to work together towards that shared purpose. So, the diplomacy looks like, aside from just having way too many meetings, it actually looks like being able to communicate other thought frameworks to different stakeholders and being able to synthesize all of the different narrow-focused frameworks into a common shared, overarching process. So, I'll give you a concrete example because it feels like I just spewed a bunch of buzzwords. A concrete example would be, let's say in the common feature that's being delivered for ABC Company, for this feature it requires X number of hours of software development; X number of hours of testing; X number of hours of preparing, either capacity planning, or fleet size recommendations, or some form of operational pre-work; and then the actual deployment, and running, and monitoring. So, in the company that I currently work for, we just described roughly 20 different teams that would have to work together in order to achieve the delivery of this feature as rapidly as possible. So, the process of DevOps and the diplomacy of DevOps, for me looks like—aside from trying to automate as much as humanly possible and to provide what I call interface guarantees, which are basically shared agreements of functionality between two teams. So, the way that the developers will speak to the QA engineers is through Git. They develop new software, and they push it into shared code repositories, the way that the QA engineers will speak to people who are going to be handling the deployments—or at management in this particular case—is going to be through a well-formatted XML test file. So, providing automation around those particular interfaces and then ensuring that everyone's shared goals are met at the particular period of time where they're going to be invoked over the course of the delivery of that feature, is the “subtle art,”—air quotes, you can't see but—to me of DevOps diplomacy. That kind of help?Emily: Yeah, absolutely. Let's take, actually, just a little bit of a step back. Can you talk about what some of the business goals were behind moving to Kubernetes for Ygrene? Who was the champion of this move? Was it business stakeholders saying, “Hey, we really need this to change,” or engineering going to business stakeholders? Who needed a change. I believe that the desire for Kubernetes came from a bottleneck of infrastructure. Not so much around performance, such as the applications weren't performing due to scale. We had projected scale that we were coming to where it would cause a problem potentially, but it was also in the ease of deployment. It had a very operations mindset as Zack was saying, our infrastructure was almost entirely managed—of the core applications set—by outsourcing. And so, we depended on them to innovate, we depended on them to spin up new environments and services. But we also have this internal competing team that always had this cloud background. And so, what we were trying to do was lessen the time between idea to deployment by utilizing platforms that were more scalable, more flexible, and all the things that Docker gives with the Dev/Prod Parity, the ease of packaging your environment together so that small team can ship an entire application. And so, I think our main goal with that was to take that team that already had a lot of cloud experience, and give them more power to drive the innovation and not be bottlenecked just by what the outsourcing team could do. Which, by the way, just for the record, the outsourcing team was an amazing team, but they didn't have the Kubernetes or cloud experience, either. So, in terms of a hero or champion of it, it just started as an idea between me and the new CTO, or CIO that came in, talking about how can we ship code faster? So, one of the things that happened in my career was the desire for a rapid response team which, that sounds like a buzzword or something, but it was this idea that Ygrene was shipping software fairly slow, and we wanted to get faster. So, really the CIO, and one of the development managers, they were the really big champions of, “Hey, let's deliver value to the business faster.” And they had the experience to ask their engineers how to make that happen, and then trust Zack and I through this process of delivering Kubernetes, and Istio, and container security, and all these different things that eventually got implemented.Emily: Why do you think shipping code faster matters?Austin: I think, for this company, why it mattered was the PACE financing industry is relatively new. And while financing has some old established patterns, I feel like there's still always room for innovation. If you hear the early days of the Bridgewater Financial Hedge Fund, they were a source of innovation and they used technology to deliver new types of assets and things like that. And so, our team at Ygrene was excellent because they wanted to try new things. They wanted to try new patterns of PACE financing, or ways of getting in front of the customer, or connections with different analytics so they could understand their customer better. So, it was important to be able to try things, experiment to see what was going to be successful. To get things out into the real world to know, okay, this is actually going to work, or no, this isn't going to work. And then, also, one of the things within financing is—especially newer financing—is there's a lot of speed bumps along the way. Compliance laws can come into effect, as well as working with cities and governments that have specialized rules and specialized things that they need—because everyone's an expert when it comes to legislation, apparently—they decide that they need X, and they give us a time when we have to get it done. And so, we actually have another customer out there, which is the legislative bodies. So, they have to get the software—their features that are needed within the financing system out by certain dates, or we're no longer eligible to operate in those counties. So, one of it was a core business risk, so we needed to be able to deliver faster. The other was how can we grow the business?Emily: Zach, this might be a question for you. Was there anything that was lost in translation as you were explaining what engineering was going to do in order to meet this goal of shipping code faster, of being more agile, when you were talking to C level management? How did they understand, and did anything get lost in translation?Zach: One of the largest disconnects, both on a technical and from a high level speaking to management issue I had was explaining how we were no longer going to be managing application servers as though they were pets. When you come from an on-premise setup, and you've got your VMware ESXi, and you're managing virtual machines, the most important thing that you have is backups because you want to keep those machines exactly as they are, and you install new software on those machines. When Kubernetes says, I'm going to put your pods wherever they fit on the cluster, assuming it conforms with the scheduling pattern, and if a node dies, it's totally fine, I'm going to spin a new one up for you, and move pods around and ensure that the application is exactly as you had stated—as in, it's in its desired state—that kind of thinking from switching from infrastructure as pets to infrastructure as cattle, is difficult to explain to people who have spent their careers in building and maintaining datacenters. And I think a lot—well, it's not guaranteed that this is across the board, but if you want to talk about a generational divide, people that usually occupy the C level office chairs are familiar with—in their heyday of their career—a datacenter-based setup. In a cloud-based consumption model where it really doesn't matter—I can just spin up anything anywhere—when you talk about moving from reasoning about your application as the servers it comprises and instead talking about your application as the workload it comprises, it becomes a place where you have to really, really concretely explain to people exactly how it's going to work that the entire earth will not come crashing down if you lose a server, or if you lose a pod, or if a container hiccups and gets restarted by Kubernetes on that node. I think that was the real key one. And the reason why that actually became incredibly beneficial for us is because once we actually had that executive buy-off when it came to, while I still may not understand, I trust that you know what you're doing and that this infrastructure really is replaceable, it allowed us to get a little bit more aggressive with how we managed our resources. So, now using Horizontal Pod Autoscaling, using the Kubernetes Cluster Autoscaler, and leveraging Amazon EC2 Spot Fleets, we were only ever paying for the exact amount of infrastructure that was required to run our business. And I think that is usually the thing that translates the best to management and non-technical leadership. Because when it comes down to if I'm aware that using this tool, and using a cloud-native approach to running my application, I am only ever going to be paying for the computational resource that I need in that exact minute to run my business, then the budget discussions become a lot easier, because everyone is aware that this is your exact run-rate when it comes to technology. Does that make sense? Emily: Absolutely. How important was having that executive buy-in? My understanding is that a lot of companies, they think that they're going to get all these savings from Kubernetes, and it doesn't always materialize. So, I'm just curious, it sounds like it really did for Ygrene.Zach: There was two things that really worked well for us when this transformation was taking place. The first was, Ygrene was still growing, so if the budget grew alongside of the growth of the company, nobody noticed. So, that was one really incredible thing that happened that, I think, now having had different positions in the industry, I don't know if I appreciated that enough because if you're attempting to make a cost-neutral migration to the Cloud, or to adopt cloud-native management principles, you're going to probably move too little, too late. And when that happens, you run the risk of really doing a poor job of adopting cloud-native, and then scrapping that project, because it never materialized the benefit, as you just described, that some people didn't experience. And the other benefit that we had, I think was the fact that because there were enough incredibly senior technical people—and again, I learned everything from these people—working with us, and because we were all, for the most part, on the same page when it came to this migration, it was easy to have a unified front with our management because every engineer saw the value of this new way of running our infrastructure and running our application. In one non—and this obviously helps with our engineers—one non-monetary benefit that helped really get the buy-in was the fact that, with Kubernetes, our on-call SEV-1 pages went down, I want to say, by over 40 percent which was insane because Kubernetes was automatically intervening in the case where servers went down. JVMs run out of memory, exceptions cause strange things, but a simple restart usually fixes the vast majority of them. Well, now Kubernetes was doing this and we didn't need to wake somebody up in order to keep the machine running.Emily: From when you started this transition to when you, I should say, when you probably left the company, but what were some of the surprises, either surprises for you, or surprises for other people in the organization?Austin: The initial surprise was the yes that we got. So, initially I pitched it and started talking about it, and then the culture started changing to where we realized we really needed to change, and bringing Zack on and then getting the yes from management was the initial surprise. And—Emily: Why was that a surprise?Austin: It was just surprising because, when you work as an engineer—I mean, none of us were C suite, or Dev managers, or anything. We were just highly respected engineers working in the HQ. So, it was just a surprise that what we felt was a semi-crazy idea at the time—because Kubernetes was a little bit earlier. I mean, EKS wasn't even a thing from Amazon. We ran our Kubernetes clusters from the hip, which is using kops, which is—kops is a great tool, but obviously it wasn't managed. It was managed by us, mainly by Zach and his team, to be honest. So, that was a surprise that they would trust a billion-dollar financing engine to run on the proposal of two engineers. And then, the next ones were just how much the single-server, vertical scaling, and depending on running on the same server was into our applications. So, as we started to look at the core applications and moving them into a containerized environment, but also into an environment that can be spun up and spun down, looking at the assumptions the application was making around being on the same server; having specific IP addresses, or hostnames; and things like that, where we had to take those assumptions out and make things more flexible. So, we had to remove some stateful assumptions in the applications, that was a surprise. We also had to enforce more of the idea of idempotency, especially when introducing Istio, and [00:21:44 retryable] connections and retryable logic around circuit breaking and service-to-service communication. So, some of those were the bigger surprises, is the paradigm shift between, “Okay, we've got this service that's always going to run on the same machine, and it's always going to have local access to its files,” to, “Now we're on a pod that's got a volume mounted, and there's 50 of them.” And it's just different. So, that was a big—[laughs], that was a big surprise for us.Emily: Was there anything that you'd call a pleasant surprise? Things that went well that you anticipated to be really difficult?Zach: Oh, my gosh, yes. When you read through Kubernetes for the first time, you tend to have this—especially if somebody else told you, “Hey, we're going to do this,” this sinking feeling of, “Oh my god, I don't even know nothing,” because it's so immense in its complexity. It requires a retooling of how you think, but there have been lots of open-source community efforts to improve the cluster lifecycle management of Kubernetes, and one such project that really helped us get going—do you remember this Austin?—was kops.Austin: Yep. Yep, kops is great.Zach: I want to say Justin Santa Barbara was the original creator of that project, and it's still open source, and I think he still maintains it. But to have a production-ready, and we really mean production-ready: it was private, everything was isolated, the CNI was provisioned correctly, everything was in the right place, to have a fully production-ready Kubernetes cluster ready to go within a few hours of us being able to learn about this tool in AWS was huge because then we could start to focus on what we didn't even understand inside of the cluster. Because there were lots of—Kubernetes is—there's two sides of it, and both of them are confusing. There's the infrastructure that participates in the cluster, and there's the actual components inside of the cluster which get orchestrated to make your application possible. So, not having to initially focus on the infrastructure that made up the cluster, so we could just figure out the difference between our butt and the hole in the ground, when it came to our application inside of Kubernetes was immensely helpful to us. I mean, there are a lot of tools these days that do that now: GKE, EKS, AKS, but we got into Kubernetes right after it went GA, and this was huge to help with that.Emily: Can you tell me also a little bit about the cultural changes that had to happen? And what were these cultural changes, and then how did it go?Zach: As Austin said, the notion of—I think a lot—and I don't want to offer this as a sweeping statement—but I think the vast majority of the engineers that we had in Seattle, in San Jose, and in Petaluma where the company was headquartered, I think, even if they didn't understand what the word idempotent meant, they understood more or less how that was going to work. The larger challenge for us was actually in helping our contractors, who actually made up the vast majority of our labor force towards the end of my tenure there, how a lot of these principles worked in software. So, take a perfect example: part of the application is written in Ruby on Rails, and in Ruby on Rails, there's a concept of one-off tasks called rake tasks. When you are running a single server, and you're sending lots of emails that have attachments, those attachments have to be on the file system. And this is the phrase I always said to people, as we refactor the code together, I repeated the statement, “You have to pretend this request is going to start on one server and finish on a different one, and you don't know what either of them are, ahead of time.” And I think using just that simple nugget really helped, culturally, start to reshape this skill of people because when you can't use or depend on something like the file system, or you can't depend on that I'm still on the same server, you begin to break your task into components, and you begin to store those components in either a central database or a central file system like Amazon S3. And adopting those parts of, I would call, cloud-native engineering were critical to the cultural adoption of this tool. I think the other thing was, obviously, lots of training had to take place. And I think a lot of operational handoff had to take place. I remember for, basically, a fairly long stretch of time, I was on-call along with whoever was also on-call because I had the vast majority of the operational knowledge of Kubernetes for that particular team. So, I think there was a good bit of rescaling and mindset shift from the technical side of being able to adopt a cloud-native approach to software building. Does that make sense?Emily: Absolutely. What do you think actually were some of the biggest challenges or the biggest pain points? Zach: So, challenges of cultural shift, or challenges of specifically Kubernetes adoption?Emily: I was thinking challenges of Kubernetes adoption, but I'm also curious about the cultural shift if that's one of the biggest pain points.Zach: It really was for us. I think—because now it wouldn't—if you wanted to take out Kubernetes and replace it with Nomad there? All of the engineers would know what you're talking about. It wouldn't take but whatever the amount of time it would to migrate your Kubernetes manifests to Nomad HCL files. So, I do think the rescaling and the mindset shift, culturally speaking, was probably the thing that helped solidify it from an engineering level. But Kubernetes adoption—or at least problems in Kubernetes adoption, there was a lot of migration horror stories that we encountered. A lot of cluster instability in earlier versions of Kubernetes prevented any form of smooth upgrades. I had to leave—it was with my brother's—it was his wedding, what was it—oh, rehearsal dinner, that's what it was. I had to leave his rehearsal dinner because the production cluster for Ygrene went down, and we needed to get it back up. So, lots of funny stories like that. Or Nordstrom did a really fantastic talk on this in KubeCon in Austin in 2017. But the [00:28:57 unintelligible] split-brain problem where suddenly the consensus in between all of the Kubernetes master nodes began to fail for one reason or another. And because they were serving incorrect information to the controller managers, then the controller managers were acting on incorrect information and causing the schedulers to do really crazy things, like delete entire deployments, or move pods, or kill nodes, or lots of interesting things. I think we unnecessarily bit off a little bit too much when it came to trying to do tricky stuff when it came to infrastructure. We introduced a good bit of instability when it came to Amazon EC2 Spot that I think, all things considered, I would have revised the decision on that. Because we faced a lot of node instability, which translated into application instability, which would cause really, really interesting edge cases to show up basically only in production.Austin: One of the more notable ones—and I think this is the symptom of one of the larger challenges was during testing, one of our project managers that also helped out in the testing side—technical project managers—which we nicknamed the Edge Case Factory, because she was just, anointed, or somehow had this superpower to find the most interesting edge cases, and things that never went wrong for anyone else always went wrong for her, and it really helped us build more robust software for sure, but there's some people out there with mutant powers to catch bugs, and she was one of them. We had two clusters, we had lower environment clusters, and then we had production cluster. The production cluster hosted two namespaces: the staging namespace, which is supposed to be an exact copy of production; and then the production namespace, so that you can smoke-test legitimate production resources, and blah blah blah. So, one time, we started to get some calls that, all of a sudden, people were getting the staging environment underneath the production URL. Zach: Yeah.Austin: And we were like, “Uh… excuse me?” It comes down to—we eventually figured it out. It was something within the networking layer. But it was this thing, as we rolled along, the deeper understanding of, okay, how does this—to use a term that Zack Arnold coined—this benevolent botnet, how does this thing even work, at the most fundamental and most detailed levels? And so, as problems and issues would occur, pre-production or even in production, we had to really learn the depths of Kubernetes. And I think the reason we had to learn it at that stage was because of how new Kubernetes was, all things considered. But I think now with a lot more of the managed systems, I would say it's not necessary, but it's definitely helpful to really know how Kubernetes works down in the depths. So, that was one of the big challenges was, to put it succinctly, when an issue comes up, knowing really what's going on under the hood, really, really helped us as we discovered and learned things about Kubernetes.Zach: And what you're saying, Austin, was really illuminated by the fact that the telemetry that we had in production was not sufficient, in our minds, at least until very recently, to be able to adequately capture all the data necessary to accurately do root cause analyses on particular issues. In early days, there was far too much root cause analysis by, “It was probably this,” and then we moved on. Now having actually taken the time to instrument tracing, to instrument metrics, to instrument logs with correlation, we used, eventually, Datadog, but working our way through the various telemetry tools to achieve this, we really struggled being able to give accurate information to stakeholders about what was really going wrong in production. And I think Austin was probably the first person in the headquarters side of the company—I'm not entirely certain about some of our satellite dev offices—but to really champion a data-driven way of actually running software. Which, it seems trivial now because obviously that's how a lot of these tools work out of the box. But for us, it was really like, “Oh, I guess we really do need to think about the HTTP error rate.” [laughs].Emily: So, taking another step back here, do you think that Ygrene got everything that it expected, or that it wanted out of moving to Kubernetes?Austin: I think we're obviously playing up some of the challenges that we had because it was our day-to-day, but I do believe that trust in the dev team grew, we were able to deploy code during the day, which we could have done that in the beginning, even with vertically scaled infrastructure, we would have done it with downtime, but it really was that as we started to show that Kubernetes and these cloud-native tools like Fluentd, Prometheus, Istio, and other things like that when you set them up properly, they do take a lot of the risk out. It added trust in the development team. It gave more responsibility to the developers to manage their own code in production, which is the DevOps culture, the DevOps mindset. And I think in the end, we were able to ship code faster, we were able to deliver more value, we were able to go into new jurisdictions and markets quicker, to get more customers, and to ultimately increase the amount of revenue that Ygrene had. So, it built a bridge between the data science side of things, the development side of things, the project management side of things, and the compliance side of things. So, I definitely think they got a lot out of trusting us with this migration. I think that were we to continue, probably Zack and I even to this day, we would have been able to implement more, and more, and more. Obviously, I left the company, Zach left the company to pursue other opportunities, but I do believe we left them in a good spot to take this ecosystem that was put in place and run with it. To continue to innovate and do experiments to get more business.Zach: Emily, I'd characterize it with an anecdote. After our Chief Information Officer left the company, our Chief Operating Officer actually took over the management of the Technology Group, and aside from basically giving dev management carte blanche authority to do as they needed to, I think there was so much trust there that we didn't have at the beginning of our journey with technology and Ygrene. And it was characterized in, we had monthly calls with all of the regional account managers, which are basically our out-of-office sales staff. And generally, the project managers from our group would have to sit in those meetings and hear just about how terrible our technology was relative to the competition, either lacking in features, lacking in stability, lacking in design quality, lacking in user interface design, or way overdoing the amount of compliance we had to have. And towards the end of my tenure, those complaints dropped to zero, which I think was really a testament to the fact that we were running things stably, the amount of on-call pages went down tremendously, the amount of user-impacting production outages was dramatically reduced, and I think the overall quality of software increased with every release. And to be able to say that, as a finance company, we were able to deploy 10 times during the day if we needed to, and not because it was an emergency, but because it was genuinely a value-added feature for customers. I think that that really demonstrated that we reached a level of success adopting Kubernetes and cloud-native, that really helped our business win. And we positioned them, basically, now to make experiments that they thought would work from a business sense we implement the technology behind it, and then we find out whether or not we were right.Emily: Let's go ahead and wrap up. We're nearing the top of the hour, but just two questions for both of you. One is, where could listeners find you or connect with you? And the second one is, do you have a can't-live-without engineering tool?Austin: Yeah, so I'll go first. Listeners can find me on Twitter @_austbot, or on LinkedIn. Those are really the only tools I use. And I can't really live without Prometheus and Grafana. I really love being able to see everything that's happening in my applications. I love instrumentation. I'm very data-driven on what's happening inside. So, obviously Kubernetes is there, but it's almost become that Kubernetes is the Cloud. I don't even think about it anymore. It's these other tools that help us monitor and create active monitoring paradigms in our application so we can deploy fast, and know if we broke something. Zach: And if you want to stay in contact with me, I would recommend not using Twitter, I lost my password and I'm not entirely certain how to get it back. I don't have a blue checkmark, so I can't talk to Twitter about that. I probably am on LinkedIn… you know what, you can find me in my house. I'm currently working. The engineering tool that I really can't live without, I think my IDE. I use IntelliJ by JetBrains, and—Austin: Yeah, it's good stuff.Zach: —I think I wouldn't be able to program without it. I fear for my next coding interview because I'll be pretending that there's type ahead completion in a Google Doc, and it just won't work. So, yeah, I think that would be the tool I'd keep forever.Austin: And if any of Zach's managers are listening, he's not planning on doing any coding interviews anytime soon.Zach: [laughs]. Yes, obviously.Emily: Well, thank you so much. Zach: Emily Omier, thank you so much for your time.Austin: Right, thanks.Austin: And don't forget Zack is an author. He and his team worked very hard on that book.Emily: Zack, do you want to give a plug to your book?Zach: Oh, yeah. Some really intelligent people that, for some reason, dragged me along, worked on a book. Basically it started as an introduction to Kubernetes, and it turned into a Master's Course on Kubernetes. It's from Packt Publishing and yeah, you can find it there, amazon.com or steal it on the internet. If you're looking to get started with Kubernetes I cannot recommend the team that worked on this book enough. It was a real honor to be able to work with people I consider to be heavyweights in the industry. It was really fun.Emily: Thank you so much.Announcer: Thank you for listening to The Business of Cloud Native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.This has been HumblePod production. Stay humble.

amazon spotify business master management seattle driving companies ga cloud adams google podcasts ip pace cto compliance real world chief operating officer san jose cio hq nomad aws lost in translation prometheus dev qa devops google docs ops ide docker kubernetes git chief information officers ruby on rails petaluma xml cloud native datadog cni itil sev eks jetbrains aks technology group amazon s3 grafana kubecon istio intellij agrotech gke vmware esxi zack arnold packt publishing jvms emily omier emily well announcer thank ygrene austin it zach it humblepod zach there

Improving Developer Happiness on Kubernetes, But First: Who Does Configuration?

The New Stack Podcast

Play Episode Listen Later Feb 13, 2020 41:01

In this episode of The New Stack Analysts podcast, we explore how two Kubernetes Special Interest Groups (SIG) are taking into account all of the various user personas in order to help shape the user experience on Kubernetes. Guests on this episode include: Tasha Drew, co-chair of the Kubernetes Usability SIG and a product line manager for Kubernetes and Project Pacific at VMware. Lei Zhang, co-chair of the Kubernetes Application Delivery SIG and a staff engineer at Alibaba. Emily Omier, The New Stack correspondent and a content strategist for cloud native companies.

happiness improving developers alibaba vmware kubernetes configuration new stack emily omier new stack analysts

Improving Developer Happiness on Kubernetes, But First: Who Does Configuration?

The New Stack Analysts

Play Episode Listen Later Feb 13, 2020 41:00

happiness improving developers alibaba vmware kubernetes configuration new stack emily omier new stack analysts