POPULARITY
"85% of AI use cases are being evaluated by the engineer who built it saying, 'yep, seemed to work pretty well.' If you're gonna build a system that's going to be critical to the business, that's going to be important that it gets it right, then you can't do that without evaluations." - Craig Wiley Fresh out of the studio, Craig Wiley, Senior Director of Product Management at Databricks who leads Mosaic AI, joins us to discuss the forefront of enterprise AI from model development to deployment at scale. Beginning with his career journey in ML operations, Craig explained how he recognized the critical connection between data and AI layers that could deliver order-of-magnitude acceleration in development cycles. Emphasizing the transition from classical ML operations to LLM operations, he showcased how Databricks' unified platform eliminates training-serving skew through data lineage capabilities and supports both fine-tuning and RAG approaches depending on industrial use case requirements. Highlighting compelling customer success stories including Suncorp's employee productivity platform and AstraZeneca's transformation of 400,000 clinical trial documents into queryable insights, Craig revealed a striking reality about enterprise AI evaluation - that 85% of AI use cases are being evaluated only by the engineers who built them, reinforcing that proper evaluation frameworks remain foundational for trustworthy AI implementation. He concluded by introducing Agent Bricks as Databricks' evaluation-centric approach to building production agents, emphasizing that model flexibility and rigorous testing are essential for enterprises moving from experimentation to production, while sharing his vision that the industry must evolve from the "year of agents" to the "year of evaluation and quality." Episode Highlights: [00:00] Quote of the Day by Craig Wiley [01:21] How Craig Wiley started his work in ML Ops that led him to Databricks [02:43] Data and AI layer connection creates order-of-magnitude acceleration [03:47] Mosaic AI acquisition expanded Gen AI solution capabilities [04:38] Classical ML statistics versus Gen AI evaluation challenges [05:48] Mosaic AI covers end-to-end from data ingestion [07:12] Training-serving skew eliminated through unified platform lineage [08:51] Fine tuning versus RAG depends on use case [10:49] Industrial agents benefit from fine-tuned smaller models [12:44] Common governance scheme covers tables through model access [13:52] Agent Bricks prioritizes accuracy over simplicity alone [15:44] Model flexibility crucial for speed and accuracy optimization [16:54] AB testing different models shows immediate performance differences [17:59] Suncorp and AstraZeneca demonstrate diverse AI applications [19:37] Asia Pacific shows aggressive AI adoption strategies [20:59] CFO approval requires proven agent effectiveness evaluation [22:00] 85% of AI cases evaluated only by building engineer [23:20] Model agnostic approach beats single-vendor AI strategies [24:12] Industry terminology evolves rapidly from RAG to agents [25:39] Customer creativity with governance capabilities inspires product development Profile: Craig Wiley, Senior Director of Product Management at Databricks and Mosaic AI LinkedIn: https://www.linkedin.com/in/craigwiley/ Podcast Information: Bernard Leong hosts and produces the show. The proper credits for the intro and end music are "Energetic Sports Drive." G. Thomas Craig mixed and edited the episode in both video and audio format. Here are the links to watch or listen to our podcast. Analyse Asia Main Site: https://analyse.asia Analyse Asia Spotify: https://open.spotify.com/show/1kkRwzRZa4JCICr2vm0vGl Analyse Asia Apple Podcasts: https://podcasts.apple.com/us/podcast/analyse-asia-with-bernard-leong/id914868245 Analyse Asia LinkedIn: https://www.linkedin.com/company/analyse-asia/ Analyse Asia X (formerly known as Twitter): https://twitter.com/analyseasia Sign Up for Our This Week in Asia Newsletter: https://www.analyse.asia/#/portal/signup Subscribe Newsletter on LinkedIn https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7149559878934540288
Gagan Singh of Elastic discuses how agentic AI systems reduce analyst burnout by automatically triaging security alerts, resulting in measurable ROI for organizationsTopics Include:AI breaks security silos between teams, data, and tools in SOCsAttackers gain system access; SOC teams have only 40 minutes to detect/containAlert overload causes analyst burnout; thousands of low-value alerts overwhelm teams dailyAI inevitable for SOCs to process data, separate false positives from real threatsAgentic systems understand environment, reason through problems, take action without hand-holdingAttack discovery capability reduces hundreds of alerts to 3-4 prioritized threat discoveriesAI provides ROI metrics: processed alerts, filtered noise, hours saved for organizationsRAG (Retrieval Augmented Generation) prevents hallucination by adding enterprise context to LLMsAWS integration uses SageMaker, Bedrock, Anthropic models with Elasticsearch vector database capabilitiesEnd-to-end LLM observability tracks costs, tokens, invocations, errors, and performance bottlenecksJunior analysts detect nation-state attacks; teams shift from reactive to proactive securityFuture requires balancing costs, data richness, sovereignty, model choice, human-machine collaborationParticipants:Gagan Singh – Vice President Product Marketing, ElasticAdditional Links:Elastic – LinkedIn - Website – AWS Marketplace See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
Akanksha Bilani of Intel shares how businesses can successfully adopt generative AI with significant performance gains while saving on costs.Topics Include:Akanksha runs go-to-market team for Amazon at IntelPersonal and business devices transformed how we communicateForrester predicts 500 billion connected devices by 20265,000 billion sensors will be smartly connected online40% of machines will communicate machine-to-machineWe're living in a world of data delugeAI and Gen AI help make data effectiveGoal is making businesses more profitable and effectiveVarious industries need Gen AI and data transformationIntel advises companies as partners with AWSThree factors determine which Gen AI use cases adoptFactor one: availability and ease of use casesHow unique and important are they for business?Does it have enough data for right analytics?Factor two: purchasing power for Gen AI adoption70% of companies target Gen AI but lack clarityLeaders must ensure capability and purchasing power existFactor three: necessary skill sets for implementationNeed access to right partnerships if lacking skillsIntel and AWS partnered for 18 years since inceptionIntel provides latest silicon customized for Amazon servicesEngineer-to-engineer collaboration on each processor generation92% of EC2 runs on Intel processorsIntel powers compute capability for EC2-based servicesIntel ensures access to skillsets making cloud aliveAWS services include Bedrock, SageMaker, DLAMIs, KinesisPerformance is the top three priorities for successNot every use case requires expensive GPU acceleratorsCPUs can power AI inference and training effectivelyEvery GPU has a CPU head node component Participants:Akanksha Bilani – Global Sales Director, IntelSee how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon/isv/
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
This episode provides a detailed guide for individuals preparing for the AWS Certified AI Practitioner (AIF-C01) exam. It outlines the certification's target audience and validated skills, emphasizing a foundational understanding of Artificial Intelligence (AI), Machine Learning (ML), and particularly Generative AI within the AWS cloud. The document breaks down the exam content into five domains, covering everything from basic AI/ML concepts and the machine learning lifecycle to generative AI fundamentals, applying foundation models, responsible AI practices, and security and governance considerations. It also includes practical preparation tips, highlighting key AWS services, recommended study resources (including official AWS materials and third-party practice exams), the importance of hands-on experience, and advice for the exam day, along with sample questions and testimonials from successful candidates.
What happens when access to advanced AI models is no longer the real differentiator, and the true advantage lies in how businesses leverage their own data? At the AWS Summit in London, I sat down with Rahul Pathak, Vice President of Data and AI Go-to-Market at AWS, to unpack this question and explore how organisations are moving beyond experimentation and into large-scale generative AI adoption. Recorded live on the show floor, this conversation explores how AWS is supporting customers at every layer of their AI journey. From custom silicon innovations like Trainium and Inferentia to scalable services like Bedrock, Q Developer, and SageMaker, AWS is giving businesses the infrastructure, tools, and flexibility to innovate with confidence. Rahul shared how leading organisations such as BT Group, SAP, and Lonely Planet are already applying these tools to reduce costs, speed up development cycles, and deliver tailored experiences that would have been unthinkable just a few years ago. A key theme that emerged in our discussion is that data, not just models, is the true foundation of effective AI. Rahul explained why unifying data across silos is critical and how AWS is helping companies create more intelligent applications by connecting what they uniquely know about their business to powerful AI capabilities. We also addressed the operational realities of AI deployment. From moving proof-of-concept projects into production to meeting the growing demand for responsible AI, the challenges are shifting. Organisations are now focused on trust, security, transparency, and measurable value. If you're leading digital transformation and wondering how to scale AI solutions that deliver on business outcomes, this episode provides practical insight from someone at the center of the industry. How will your business stand out in a world where every company has access to AI models, but only a few know how to apply them with purpose?
Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)
In this episode, host Krish Palaniappan welcomes back Ramya Ganesh to discuss Amazon Bedrock and its applications in AI and cloud computing. Ramya shares her extensive experience with AWS, particularly in cybersecurity and AI, and explains the differences between Bedrock and SageMaker. The conversation delves into practical use cases, such as code generation and architectural diagrams, while also addressing the challenges and considerations when integrating Bedrock into existing applications. The episode concludes with insights on prototyping with AWS AI tools and the future of AI development. In this conversation, Krish Palaniappan and Ramya Ganesh delve into the intricacies of using AWS Bedrock for model selection and application development. They explore the open-source nature of certain applications, the importance of selecting the right model for specific problems, and the nuances of model configurations. The discussion also covers how to compare different models and the next steps for integrating these models into applications.
AWS Morning Brief for the week of April 7th, with Corey Quinn. Links:Amazon EC2 now supports more bandwidth and jumbo frames to select destinationsAPI Gateway launches support for dual-stack (IPv4 and IPv6) endpointsAWS Lambda adds support for Ruby 3.4Amazon CloudWatch Logs increases maximum log event size to 1 MBAmazon Neptune announces 99.99% availability Service Level AgreementAnnouncing the general availability of Amazon VPC Route ServerUnder the hood: Amazon EKS Auto ModeOptimizing cost savings: The advantage of Amazon Aurora over self-managed open source databasesHow AWS Sales uses generative AI to streamline account planningIssue with AWS SAM CLI (CVE-2025-3047, CVE-2025-3048)
Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)
In this conversation, Krish Palaniappan explores various AWS AI products and services, discussing their applications, features, and potential use cases. He emphasizes the importance of understanding these tools at a foundational level, especially for beginners in the AI space. The discussion covers specific AWS offerings like Amazon Q, SageMaker, and App Studio, as well as the significance of human review in machine learning through Augmented AI. The conversation aims to provide insights into navigating the complex landscape of AWS AI tools and their integration into business processes. Snowpal Products Backends as Services on AWS Marketplace Mobile Apps on App Store and Play Store Web App Education Platform for Learners and Course Creators
Edo Liberty left a high-paying job at AWS—where he was building AI at the highest level—to start Pinecone, a company no one understood. He pitched 40+ VCs, got rejected by every single one, and nearly ran out of money. Then, he flipped the pitch, raised $10M, and built one of the most important infrastructure companies in AI.Then ChatGPT dropped.Suddenly, Pinecone was the must-have database for AI apps, with thousands of developers signing up daily. The company exploded, leading to a $100M round led by Andreessen Horowitz and a 10x revenue surge.If you're an early-stage founder, this episode is a must-listen.Why you should listen:•How he went from from 40 VC Rejections to a $10M Seed Round• Why he quit a High-Paying Job at AWS to start a Startup• The game-changing shift that made VCs finally “get it”•What really happened inside Pinecone when AI took off•Why most founders misunderstand market timing and what to do about itKeywordsAI, Machine Learning, Startups, Entrepreneurship, Vector Databases, Fundraising, SageMaker, AWS, Technology, Innovation, Pinecone, vector database, seed funding, ChatGPT, startup growth, business model, AI, infrastructure, early stage foundersTimestamps(00:00:00) Intro(00:07:50) Edo's Story(00:12:27) The Early Days of Machine Learning(00:32:23) Seed Funding(00:42:09) Unsustainable Scaling(00:53:41) Told You So(00:59:24) A Piece of AdviceSend me a message to let me know what you think!
Thrilled to have had Zachary Friedman, Associate Director of Product Management at Immuta, join me on The Ravit Show at AWS re: Invent!Our discussion highlighted the strong partnership between Immuta and AWS, including how the two companies collaborate across development and sales to deliver solutions that empower data teams. From Redshift to S3 and SageMaker, and now with Lake Formation integration, the depth of this partnership is driving significant value for customers.We explored key topics, including:-- How the Immuta-AWS partnership enables organizations to scale secure and governed data access across multiple platforms-- The growing importance of data analytics in governance projects as companies prioritize data-driven decision-making-- The challenges companies face as they scale their data environments, and how Immuta's solutions address the need for faster, secure data access without compromising governance.-- The reasons behind Immuta's integration with Lake Formation and how it supports tools like Athena, EMR, and Redshift Spectrum to deliver even greater control and flexibility for customersImmuta's commitment to deepening its support for AWS services is a testament to its customer-first approach and drive to simplify secure data access.#data #ai #awsreinvent #awsreinvent2024 #reinvent2024 #immuta #theravitshow
The annual AWS re:Invent conference in Las Vegas has long been a marquee event for technologists and business leaders. But in 2024, it served as a rallying cry for a new technological epoch - one where generative AI (GenAI) is no longer a nascent tool but a transformative force shaping industries, economies, and creativity. At the heart of this year's address was Dr. Swami Sivasubramanian, AWS's Vice President of AI and Data, who positioned Amazon's cloud division not just as a vendor but as an architect of this revolution. Dr. Sivasubramanian began with a historical overture, likening the current moment to the Wright Brothers' first flight in 1903. That 12-second triumph, he noted, was not an isolated miracle but the result of centuries of cumulative innovation - from Leonardo da Vinci's aeronautical sketches to steam-powered gliders. In the same vein, GenAI represents the culmination of decades of research in neural networks, backpropagation algorithms, and the transformative power of Transformer architectures. However, technological breakthroughs alone were not enough. What set the stage for GenAI's explosive growth, Dr. Sivasubramanian argued, was the convergence of cloud computing, vast data lakes, and affordable machine-learning infrastructure - elements AWS has spent the better part of two decades perfecting. AWS SageMaker: The Vanguard of AI Democratization Central to AWS's GenAI arsenal is Amazon SageMaker, a comprehensive platform designed to simplify machine learning workflows. Over the past year, AWS has added more than 140 features to SageMaker, underscoring its ambition to stay ahead in the arms race of AI development. Among these innovations is SageMaker HyperPod, which provides robust tools for training the mammoth foundational models that underpin GenAI. HyperPod automates complex tasks like checkpointing, resource recovery, and distributed training, enabling enterprises like Salesforce and Thomson Reuters to train billion-parameter models without the logistical headaches. But SageMaker is evolving beyond its core machine-learning roots into a unified platform for data analytics, big data processing, and GenAI workflows. The platform's latest iteration consolidates disparate tools into a single, user-friendly interface, offering businesses an integrated suite for data preparation, model development, and deployment. Training Titans: HyperPod and Bedrock As GenAI models grow in size and sophistication, the cost and complexity of training them have skyrocketed. Dr. Sivasubramanian introduced two pivotal innovations aimed at alleviating these challenges. First, HyperPod Flexible Training Plans address the inefficiencies of securing and managing compute resources for training large models. By automating the reservation of EC2 capacity and distributing workloads intelligently, these plans reduce downtime and optimize costs. Second, Bedrock, AWS's managed service for deploying foundational models, makes it easier for developers to select, customize, and optimize GenAI models. Bedrock offers cutting-edge features like Prompt Caching - a cost-saving tool that reduces latency by storing frequently used queries - and Intelligent Prompt Routing, which directs tasks to the most cost-effective model without sacrificing quality. Case Studies in Innovation Throughout his keynote, Dr. Sivasubramanian showcased real-world applications of AWS's GenAI capabilities. Autodesk, the software titan renowned for its design and engineering tools, is leveraging SageMaker to develop GenAI models that combine spatial reasoning with physics-based design principles. These models allow architects to create structurally sound and manufacturable 3D designs, effectively automating tedious aspects of the creative process. Meanwhile, Rocket Companies, a leader in mortgage lending, has deployed Amazon Bedrock to create AI agents that handle 70% of customer interactions autonomously. These agents, embedded in Rocket's AI-driven platform, streamli...
Generative AI is no longer just a playground for tech enthusiasts. Hosts Daniel Newman and Patrick Moorhead are joined by Amazon Web Services' Vice President, AI/ML Services & Infrastructure, Baskar Sridharan on this episode of Six Five On The Road at AWS re:Invent. They discuss the journey from proof-of-concept to full-scale production in enterprise IT with a focus on generative AI and strategic partnerships. Highlights Include: Great Expectations: The transition of generative AI applications from experimental stages to production and the evolving customer expectations of AI & data infrastructure Unified SageMaker: AWS is streamlining the journey from data to AI with their next-gen SageMaker platform, making it easier for businesses to build and deploy GenAI applications Cost Optimization: Model distillation and other innovations are making GenAI more affordable, with significant reductions in training and inference costs Data as a Differentiator: Your data is what makes your GenAI applications unique & AWS is providing powerful tools like Bedrock Knowledge Bases to help customers leverage data effectively Trust and Security: AWS is leading the way in responsible AI with its ISO 42001 certification, ensuring that your GenAI applications are built on a foundation of trust Real-world examples: How enterprise IT is leveraging AWS services to scale their generative AI applications effectively
Generative AI is disrupting industries and automakers are not immune. Daniel Newman and Patrick Moorhead are joined by Amazon Web Services' Karthik Bharathy, Deloitte's Chris Jangareddy, and Toyota's Philip Ryan to discuss the transformative power of generative AI in the automotive industry, AWS's innovative technology like Amazon Bedrock and SageMaker, Deloitte's strategic incorporation of GenAI, and Toyota's strategic imperatives in AI that collectively drive enhanced customer experiences and increased market share. Get their insights on
Generative AI is no longer just a playground for tech enthusiasts. Hosts Daniel Newman and Patrick Moorhead are joined by Amazon Web Services' VP AWS, Database & AI Leadership Baskar Sridharan on this episode of Six Five On The Road at AWS re:Invent. They discuss the journey from proof-of-concept to full-scale production in enterprise IT with a focus on generative AI and strategic partnerships. Highlights Include: Great Expectations: The transition of generative AI applications from experimental stages to production and the evolving customer expectations of AI & data infrastructure Unified SageMaker: AWS is streamlining the journey from data to AI with their next-gen SageMaker platform, making it easier for businesses to build and deploy GenAI applications Cost Optimization: Model distillation and other innovations are making GenAI more affordable, with significant reductions in training and inference costs Data as a Differentiator: Your data is what makes your GenAI applications unique & AWS is providing powerful tools like Bedrock Knowledge Bases to help customers leverage data effectively Trust and Security: AWS is leading the way in responsible AI with its ISO 42001 certification, ensuring that your GenAI applications are built on a foundation of trust Real-world examples: How enterprise IT is leveraging AWS services to scale their generative AI applications effectively
Send us a textIn this episode, Frank and SteveO celebrate four years of their podcast while discussing the latest advancements in Cloud FinOps, including Amazon's new EC2 Graviton instances, Azure's confidential VMs, and cost management innovations. They explore dynamic scaling in cloud services, AWS Healthomics, and the importance of sustainability in computing. The conversation also touches on SMS innovations, SageMaker customization, application rationalization, and automated complaint rate recommendations in SES, providing listeners with a comprehensive overview of the current cloud landscape.The podcast celebrates its four-year anniversary, highlighting its growth and listener engagement.everything else is noise :D
In this highly informative episode, we share strategies and best practices for driving growth for SaaS companies featuring Kulwinder Kalsi, head of UK and Ireland Enterprise Software and SaaS Architecture and Johan Broman, Head of Solutions Architecture for Independent Software Vendors for Europe, Middle East & Africa.Topics Include:Discover the main expectations end customers have of SaaS companies, including frictionless tenant onboarding, instrumentation/monitoring, pricing transparency, and focusing on core innovation.Explore key trends shaping the SaaS industry, such as the shift towards consumption-based pricing models, complex pricing strategies that align packaging with customer segments, and leveraging other software partners to remove undifferentiated heavy lifting.Learn how AWS can help SaaS companies with its global infrastructure, robust security measures, and rich ecosystem of services and partners to accelerate growth and innovation.Get insights into monetizing data strategies that SaaS companies can adopt to unlock additional revenue streams from their tenant data.Understand the different ways AWS partners, including systems integrators, professional services, and other software vendors, can support SaaS companies through their journey.Receive practical advice on approaching cloud migration and modernization, including aligning leadership, assessing growth vs. retention goals, and choosing the right mix of lift-and-shift or full modernization.Gain a nuanced perspective on developing AI solutions, covering use case identification, model selection, deployment techniques like prompt engineering and fine-tuning, and the importance of monitoring and management.Hear expert tips on navigating the rapidly evolving generative AI landscape, leveraging AWS services like SageMaker and Bedrock, and utilizing specialized AI chips like Trainium and Inferentia.
Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/ Matthew McClean is a Machine Learning Technology Leader with the leading Amazon Web Services (AWS) cloud platform. He leads the customer engineering teams at Annapurna ML helping customers adopt AWS Trainium and Inferentia for their Gen AI workloads. Kamran Khan, Sr Technical Business Development Manager for AWS Inferentina/Trianium at AWS. He has over a decade of experience helping customers deploy and optimize deep learning training and inference workloads using AWS Inferentia and AWS Trainium. AWS Tranium and Inferentia // MLOps podcast #238 with Kamran Khan, BD, Annapurna ML and Matthew McClean, Annapurna Labs Lead Solution Architecture at AWS. Huge thank you to AWS for sponsoring this episode. AWS - https://aws.amazon.com/ // Abstract Unlock unparalleled performance and cost savings with AWS Trainium and Inferentia! These powerful AI accelerators offer MLOps community members enhanced availability, compute elasticity, and energy efficiency. Seamlessly integrate with PyTorch, JAX, and Hugging Face, and enjoy robust support from industry leaders like W&B, Anyscale, and Outerbounds. Perfectly compatible with AWS services like Amazon SageMaker, getting started has never been easier. Elevate your AI game with AWS Trainium and Inferentia! // Bio Kamran Khan Helping developers and users achieve their AI performance and cost goals for almost 2 decades. Matthew McClean Leads the Annapurna Labs Solution Architecture and Prototyping teams helping customers train and deploy their Generative AI models with AWS Trainium and AWS Inferentia // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links AWS Trainium: https://aws.amazon.com/machine-learning/trainium/ AWS Inferentia: https://aws.amazon.com/machine-learning/inferentia/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Kamran on LinkedIn: https://www.linkedin.com/in/kamranjk/ Connect with Matt on LinkedIn: https://www.linkedin.com/in/matthewmcclean/ Timestamps: [00:00] Matt's & Kamran's preferred coffee [00:53] Takeaways [01:57] Please like, share, leave a review, and subscribe to our MLOps channels! [02:22] AWS Trainium and Inferentia rundown [06:04] Inferentia vs GPUs: Comparison [11:20] Using Neuron for ML [15:54] Should Trainium and Inferentia go together? [18:15] ML Workflow Integration Overview [23:10] The Ec2 instance [24:55] Bedrock vs SageMaker [31:16] Shifting mindset toward open source in enterprise [35:50] Fine-tuning open-source models, reducing costs significantly [39:43] Model deployment cost can be reduced innovatively [43:49] Benefits of using Inferentia and Trainium [45:03] Wrap up
Building With People For People: The Unfiltered Build Podcast
Metrics are hard. Identifying which metrics to measure is even harder. So how do you get started? And how do you know when you have achieved true developer productivity zen? Like anything in life the path to mastery is a journey and today we are joined by a passionate staff engineer from Meta to share with us his theory on a developer productivity maturity model which paints a wonderful mental picture on knowing where we stand in our developer productivity journey and how companies can move through the stages. We also discuss productivity dashboards, if you actually need dashboards, how Meta thinks about developer productivity and more. Our guest, Karim Nakad, has his Masters of Computer Science from University of Wisconsin and previously worked for Amazon for SageMaker and Prime. He is currently a Staff Software Engineer at Meta making an impact in the productivity organization. He is dedicated to improving developer efficiency across the board and paving the way by generating and exposing productivity and code quality metrics across the tech industry and alongside leading experts and researchers. His excitement around improving the daily working lives of software engineers is palatable and contagious and I can't wait to dig in. I met our guest at a developer productivity engineering conference last year and when he summarized back to me the purpose of a project I was working on in such an eloquent manner I knew then he had to come on the podcast to share his thoughts and efforts around bringing happiness to engineers and building products for people. When our guest is not helping engineers move fast and be productive, he games and travels the world. He also two Macaws a green wing and a blue and gold. Enjoy! Connect with Karim: LinkedIn Twitter Threads Sponsor: Get Space: Do you know what pain points exist in your company? Install Get Space's real-time survey iteration tool now with code "buildwithpeople" and get 20% off your first year Episode correction: Karim wanted to clarify the difference and intersection between qualitative/quantitative and objective/subjective: Qualitative: Non-number data such as the subjective free-form text in surveys. Quantitative: Data that can be counted, such as subjective multiple-choice in surveys or objective system measurements. Show notes and helpful resources: DORA The SPACE framework Karim's best advice: “Anyone can be an expert you just need to read the code” Karim's everyday tool: Obsidian - note taking app Reflect note taking app The Hack language Karim says developer productivity is about creating an efficient and enjoyable experience as that is what encourages devs to do their best work To measure, rely on frameworks our there like DORA or SPACE and Karim recommends using metrics you already have to start with AutoFocus paper: Workgraph: personal focus vs. interruption for engineers at Meta - improved personal focus by over 20% KPIs rule of thumb takes two forms: Latency and Reliability An example of latency is test latency and how quickly do they complete An example of reliability is test reliability and how often your test delivers good signal Productivity Engineering Maturity Model (5 stages): Ignorance: Not know about or not prioritizing developer productivity Awareness: Forming a team focused on addressing highest pain points for example around continuous integration or testing Initiation: Merging KPIs into a common productivity goal and creating dashboards Refinement: Making recommendations on dashboards to improve productivity Mastery: Automating and integrating productivity improvements into workflows Advice for smaller companies: Keep an ear on the ground for industry research from companies like Google and Microsoft, and leverage frameworks like SPACE and DevEx to measure and improve productivity. The importance of nudging teams in the right direction rather than mandating productivity solutions, allowing teams to find their own paths to improvement. Building something cool or solving interesting problems? Want to be on this show? Send me an email at jointhepodcast@unfilteredbuild.com Podcast produced by Unfiltered Build - dream.design.develop.
In this episode, we provide commentary and analysis on the 2024 AWS Community Survey results. We go through the key findings for each area including infrastructure as code, CI/CD, serverless, containers, NoSQL databases, event services, and AI/ML. While recognizing potential biases, we aim to extract insights from the data and share our perspectives based on experience. Overall, we see increased adoption across many services, though some pain points remain around developer experience. We hope this format provides value to listeners interested in cloud technology trends.
Matthew Bonig, Chief Cloud Architect at Defiance Digital, joins Corey on Screaming in the Cloud to discuss his experiences in CDK, why developers can't be solely reliant on AI or coding tools to fill in the blanks, and his biggest grievances with AWS. Matthew gives an in-depth look at how and why CDK has been so influential for him, as well as the positive work that Defiance Digital is doing as a managed service provider. Corey and Matthew debate the need for AWS to focus on innovating instead of simply surviving off its existing customer base.About MatthewChief Cloud Architect at Defiance Digital. AWS DevTools Hero, co-author of The CDK Book, author of the Advanced CDK Course. All things CDK and Star Trek.Links Referenced:CDK Book: https://www.thecdkbook.com/cdk.dev: https://cdk.devTwitter: https://twitter.com/mattbonigLinkedIn: https://www.linkedin.com/in/matthewbonig/Personal website: https://matthewbonig.comduckbillgroup.com: https://duckbillgroup.comTranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. And I'm back with my first recording that was conducted post-re:Invent and all of its attendant glory and nonsense; we might talk a little bit about what happened at the show. But my guest today is the Chief Cloud Architect at Defiance Digital, Matthew Bonig. Matthew, thank you for joining me.Matthew: Thank you, Corey. Thanks for having me today.Corey: So, you are deep into the CDK. You're one of the AWS Dev Tools Heros, and you're the co-author of the CDK Book, you've done a lot, really. You have a course now for Advanced CDK work. Honestly, at this point, it starts to feel like when I say the CDK is a cult, you're one of the cult leaders, or at least very high up in the cult.Matthew: [laugh] Yes, it was something that I discovered—Corey: Your robe has a fringe on it.Matthew: Yeah, yeah. I discovered this at re:Invent, and it kind of hit me a little surprised that I got called out by a couple people by being the CDK guy. And I didn't realize that I'd hit that status yet, so I got to get myself a hat, and a cloak, and maybe some fun stuff to wear.Corey: For me, what I saw on the—it was in the run-up to re:Invent, but the big CDK sized announcement was the fact that the new version of Amplify now is much closer tied to the CDK than it was in previous incarnations, which is great. It sort of solves the problem, how do I build a thing through a variety of different tools? Great, and how do I manage that thing programmatically? It seems if, according to what it says on the tin, that it narrows that gap. Of course, here in reality, I haven't had time to pick anything like that up, and I won't for months, just because so much comes out all at the same time. What happened in the CDK world? What did I miss? What's exciting?Matthew: Well, you know, the CDK world has been, I've said, fairly mature for a while now. You know, fundamentally the way the CDK works and the functionality within it hasn't changed drastically. Even when 2.0 came out a couple of years ago, there wasn't a drastic fundamental change in the way that the API worked. Really, the efforts that we've been seeing for the last year or so, and especially the last few months, is trying to button up some functionality, hit some of those edge cases have been rough for some users, and ultimately just continue to fill out things like L2 constructs and maybe try to build out some L3s.I think what they're doing with Amplify is a good sign that they are trying to, sort of, reach across the aisle and work with other frameworks and work with other systems within AWS to make the experience better, shows their commitment to the CDK of making it really the first class citizen for doing IaC work in AWS.Corey: I think that that is a—that's a long road, and it's also a lot of work under the hood that's not easily appreciated. You've remarked at one point that my talk at the CDK Community Day was illuminating, if nothing else, if for no other reason than I dressed up as a legitimate actual cultist and a robe to give the talk—Matthew: Yeah. Loved it.Corey: Because I have deep-seated emotional problems. But it was fun. It talked a bit about my journey with it, where originally I viewed it as, more or less, this thing that was not for me. And a large part of that because I come from a world of sysadmin ops types, where, “I don't really know how to code,” was sort of my approach to this. Because I was reaff—I had that reaffirmed every time I talked to a developer. Like, “You call this a bash script? It's terrible.” And sure, but it worked, and it tied into a different knowledge set.Then, when I encountered the CDK for the first time, I tried to use it in Python, which at the time was not really well-supported and led to unfortunate outcomes—I do not know if that's still the case—what got me into it, in seriousness, was when I tried it a few months later with TypeScript and that started to work a little bit more clearly, with the caveat that I did not know JavaScript, I did not know TypeScript, I had to learn it as I went in service to the CDK. And it works really well insofar as it scratched an itch that I had. There's a whole class of problems that I don't have to deal with, which include getting someone who isn't me involved in some of that codebase, or working in environments where you have either a monorepo or a crap ton of tiny repos scattered everywhere and collaborating with other people. I cannot speak authoritatively to any of that. I will say it's incredibly annoying when I'm trying to update something written in the CDK, and then I have touched it in a year-and-a-half, and the first thing I have to do is upgrade a whole a bunch of dependencies, clear half a day just to get the warnings to clear before I can go ahead and deploy the things, let alone implement the tiny change I'm logging into the thing to fix.Matthew: Oh, yeah, yes. Yeah, the dependency updates are probably one of the most infuriating things about any Node.js system, and I don't think that I've ever run across any application project framework, anything in which doing dependency upgrades wasn't a nightmare. And I think it's because the Node.js community, more so than I've seen any other, doesn't care about semantic versioning. And unfortunately, the CDK doesn't technically care about semantic versioning, either, which makes it very tricky to do upgrades properly.Corey: There also seems to be the additional problem layered on top, which is all of the various documentation sources that I stumble upon, the official documentation, not terrific at giving real-world use case. It feels like it's trying to read the dictionary to learn how English works, not really its purpose. So, I find a bunch of blog posts, and all of them tend to approach this ecosystem slightly differently. One talks about using NPM. Another talks about Yarn.If you're doing anything that involves a web app, as seems to be increasingly common, some will say, “Oh, use WEBrick,” others will recommend using Vite. There's the whole JavaScript framework wars, and the only unifying best practice seems to be, “Oh, there's another way to do it that you should be using instead of the way you currently are on.” And if you listen to that, you wind up in hell.Matthew: Oh, horribly so. Yeah, the split in the ecosystem between NPM and Yarn, I think, has been incredibly detrimental to the overall comfort level in Node.js development. You know, I was an NPM guy for many, many years, and then actually, the CDK got me more using Yarn, simply because Yarn handles cross-library dependency resolution a bit different from NPM. And I just ran into fewer errors and fewer problems if I use Yarn along the way.But NPM then came a long way since then. Now, there's also a PNPM, which is good if you're using monorepos. But then if you're going to be using monorepos, there's another 15 tools out there that you can use for those sorts of things. And ultimately, I think it's going to be what is the thing that causes you the least amount of problems when dealing with them. And every single dependency issue that I've ever run into when upgrading any project, whether it be a web application, a back-end API, or the CDK, it's always unique enough that there isn't a one-size-fits-all answer to solving those problems.Corey: The most recent experience I had with the CDK—since you know, you're basically Mr. CDK at this point, whether you want to be or not, and this is what I do, instead of filing issues anywhere or asking for help, I drag people onto this show, and then basically assault them with my weird use cases—I'm in the process of building something out in the service of shitposting, because that is my nature, and I decided, oh, there's a new thing called the Dynamo table v2—Matthew: Yes.Corey: Which is great. I looked into it. The big difference is that it addresses it from the beginning as a global table, so you have optionality. Cool. Trying to migrate something that is existing from a Dynamo table to a Dynamo v2 table started throwing CloudFormation issues, so my answer was—this was pre-production—just tear down the stack and rebuild it. That feels like that would be a problem if this had been something that was actually full of data at this point.Matthew: There's a couple of ways that you could maybe go about it. Now, this is a very special case that you mentioned because you're talking about fundamentally changing the CloudFormation resource that you are creating, so of course, the CDK being an abstraction layer over top of CloudFormation and the Dynamo table v2 using the global table resource rather than just the table resource. If you had a case where you have to do that migration—and I've actually got a client right now who's very much looking to do that—the process would probably be to orphan the existing table so that you can retain the data and then using an import routine with CloudFormation to bring that in under the new resource. I haven't tried it yet—Corey: In this case, the table was empty, so it was easy enough to just destroy and then recreate, but it meant that I also had to tear down and recreate everything else in the stack as well, including CloudFront distributions, ACM certificates, so it took 20 minutes.Matthew: Yes. And that is one of the reasons why I often will stick any sort of stateful resource into their own stack so that if I have to go through an operation like this, I'm know that I'm not going to be modifying things that are very painful to drop and recreate, like, CloudFront distributions, which can take a half an hour or more to re-initialize.Corey: Yeah. So, that was fun. The problem got sorted out, but it was still a bit challenging. I feel like at some level, the CDK is hobbled by the fact that under the hood, it really just is just CloudFormation once all is said and done, and CloudFormation has never been the speediest thing. I didn't understand that until I started playing with Terraform and I saw how much more quickly it could provision things just by calling the service APIs directly. It sort of raises the question of what the hell the CloudFormation service is doing when it takes five times longer to do effectively the same thing.Matthew: Yeah, and the big thing that I appreciate about Terraform versus CloudFormation—speed being kind of the big win—is the fact that Terraform doesn't obfuscate or hide state from you. If you absolutely need to, you can go in and change that state that relates your Terraform definitions to the back-end resources. You can't do that with CloudFormation. So CloudFormation, did release few years ago, that import routine, and that was pretty good—not great, but pretty good; it's getting better all the time—whereas this was a complete and unneeded feature with Terraform because if it came down to the point where you already had a resource, and you just want to tie it to your IaC, you just edit a state file. And they've got their import routines and tie-in routines as well, but having that underlying state exposed was a big advantage, in my mind, to Terraform that I missed going to CloudFormation, and still to this day frustrates me that I can't do that underlying state change.Corey: It becomes painful and challenging, for better or worse.Matthew: Yep.Corey: But yeah, that was what I ran into. Things have improved, though. When I google various topics, I find that the v2 documentation comes up instead of the v1. That was maddening for a little while. I find that there are still things that annoy me, but they become less all the time, partially because I feel like I'm getting better at knowing how to search for them, and also because I think I'm becoming broken in the right ways that the CDK tends to expect.Matthew: Oh, like how?Corey: Oh, easy example here: I was recently trying to get something set up and running, and I don't know why this is the case, I don't know if it holds true and other programming languages, but I'm getting more used to the fact that there are two files in TypeScript-land that run a project. One is generally small and in a side directory that no one cares about, I think it's in a lib or the bin subdirectory. I don't remember which because I don't care. And then there are things you have to do within the other equivalent that basically reference each other. And I've gotten better at understanding that those aren't one file, for example. Though they seem to sure be a lot in all the demos, but it's not how the init process, when you're starting something new, spins up.Matthew: Yeah, this is the hell of TypeScript, the fact that Node.js, as a runtime, cannot process TypeScript files, so you always have to pass them through a compiler. This is actually one of the things that I like about using Projen for all of my projects instead of using CDK init to start them is that those baseline configurations handle the TypeScript nature of the runtime—or I should say, the anti-TypeScript nature of the runtime a little bit better, and you run into fewer problems. You never have to worry about necessarily doing build routines or other things because they actually use the ts-node runtime to handle your CDK files instead of the node runtime. And I think that's a big benefit in terms of the developer experience. It just makes it so I generally never have to care about those JavaScript files that get compiled from TypeScript. In the, you know, two years or so I've been using Projen, I never have to worry about a build routine to turn that into JavaScript. And that makes the developer experience significantly better.Corey: Yeah, I still miss an awful lot of things that I feel like I should be understanding. I've never touched Projen, for example. It's on my backlog of things to look into.Matthew: Highly recommend it.Corey: Yeah, I also am still in that area of… my TypeScript knowledge has not yet gotten to a point where I see the value of it. It feels like I've spent far more time fighting with the arbitrary restrictions that are TypeScript than it has saved me from typing errors in anything that I've built. I believe it has to come back around at some point of familiarity with the language, but I'm not there yet.Matthew: Got you. So, Python developer before this?Corey: Ish. Mostly brute force and enthusiasm, but yeah, Python.Matthew: Python, and I think you said bash scripting and other things that have no inherent typing built into it.Corey: Right.Matthew: Yeah, that is a problem, I think… that I thankfully avoided. I was an application developer for many years. My background and my experience has always been around strongly typed languages, so when it came to adopting the CDK, everything felt very natural to me. But as I've worked with people over the years, both internally at Defiance as well as people in the community that don't have a background in that, I've been exposed to how problematic TypeScript as a language truly can be for someone who has never had this experience of, I've got this thing and it has a well-defined shape to it, and if I don't respect that, then I'm going to bang my head against to these weird errors that are hard to comprehend and hard to grok way more than it feels like I'm getting value from it.Corey: There's also a lack of understanding around how to structure projects, in my case, where all right, I have a front-end and I have a back-end. Is this all within the context of the CDK project? And this, of course, also presupposes that everything I'm doing is effectively greenfield, in which case, great, do I use the front-end wizard tutorial thing that I'm following, and how does that integrate when I'm using the CDK to deploy it somewhere, and so on and so forth. It's stuff that makes sense once you have angry and loud enough opinions, but I don't yet.Matthew: Yeah, so the key thing that I tell people about project structure—because it does often come up a lot—is that ultimately, the CDK itself doesn't really care how you structure things. So, how you structure, where you put certain files, how you organize them, is your personal preference. Now, there are some exceptions to that. When it comes to things like Lambda functions that you're building or Docker files, there are probably some better practices you can go through, but it's actually more dependent on those systems rather than the CDK directly itself. So I go through, in the Advanced CDK course, you know, my basic starting directory structure for everything, which is stacks, constructs, apps, and stages all go into their own specific directories.But then once those directories start growing—because I've added more stacks, more constructs, and things—once I get to around five to maybe seven files in a directory, then I look at them and go, “Okay, how can I group these together?” I create subdirectories, I move those files around. My development tool of choice, which is WebStorm—JetBrains's long-running tool—handles the moving of those files for me, so all of my imports, all of my references automatically get updated accordingly, which is really nice, and I can refactor things as much as I want to without too much of a problem. So, as a project grows over time, my directory structure can change to make sure that it is readable, well organized, and understandable, and it's never been too much of a problem.Corey: Yeah, it's one of those things that does take some getting used to. It helps, I think, having a mentor of sorts to take you under their wing and explain these things to you, but that's a hard thing to scale as well. So, in the absence of that we wind up defaulting to oh, whatever the most recent blog post we read is.Matthew: Yeah. Yeah, and I think one of the truest, I think, and truthful complaints I've heard about the CDK and why it can be fundamentally very difficult is that it has no guardrails. It is a general-purpose languages, and general purpose languages don't have guardrails. They don't want to be in the way of you building whatever you need to build.But when it comes to an Infrastructure as Code project, which is inherently very different from an API or a website or other, sort of, more typical programming projects, having guardrail—or not having guardrails is a bad thing, and it can really lead you down some bad paths. I remember working with a client this last year who had leveraged context instead of properties on classes to hand configuration value down through code, down through stacks and constructs and things like that. And it worked. It functionally got them what they needed, up until a point, and then all of sudden, they were like, “Well, now we want to do X with the CDK, and we simply cannot because we've now painted ourselves into a corner.” And that's the downside of not having these good guard rails.And I think that early, they needed to do this early on. When the CDK was initially released, and it got popular back around the 0.4, 0.5 timeframe—I think I picked it up right around 0.4, too—when it officially hit a 1.0 release, there should have been a better set of guidelines and best practices published. You can go to the documents and see them, and they have been published, but it really didn't go far enough to really explain how and why you had to take the steps to make sure you didn't screw yourself six months later.Corey: It's sort of those one-way doors you don't realize you're passing through when you first start building something. And I find, especially when you follow my development approach of more or less used to be copying and pasting for various places, now it's copying and pasting from one place which is Chat-Gippity-4, then—although I've seen increasingly GitHub's Copilot has been great at this and Code Whisperer, in my experience, has not yet been worth the energy it takes to really go diving into it. Your mileage may of course vary on that. But I found it was not making materially better or suggestions on CDK stuff then Copilot was.Matthew: Yeah, I haven't tried Code Whisperer outside of the shell. I've been using Copilot for the last year and absolutely adore it. I think it has completely changed the way that I felt about coding. I saw writing code for the last couple of years as being very tedious and very boring in terms of there weren't interesting problems to solve, and Copilot, as I've seen it, is autocomplete on steroids. So, it doesn't keep me from having to solve the interesting problems; it just keeps me from having to type out the boring solutions, and it's the thing that I love about it.Now, hopefully, Code Whisperer continues to get better over time. I'm hoping all of Amazon's GenAI products continue to get better over time and I can maybe ditch a subscription to Copilot, but for now, Copilot is still my thing. And it's producing good enough results for me. Thankfully because I've been working with it for four years now, I don't rely on it to answer my questions about how to use constructs. I go back to the docs for those. If I need to.Corey: It occurs to me that I can talk about this now because this episode will not air until after this has become generally available, but what's really spanked it from my perspective has been Google's Duet. And the key defining difference is, as I'm in one of these files—in many cases, I'm doing something with React these days due to an escalating series of weird choices—and—Matthew: My apologies, by the way. My condolences, I should say.Corey: Well, yeah. Well, things like Copilot Chat are great when they say, “Oh yeah, assuming that you're handling the state this way in your component, now…” What I love about Duet is it goes, and it actually checks, which is awesome. And it has contextual awareness of the entire project, not just the three lines that I'm talking about, or the file that I'm looking at this moment. It goes ahead and does the intelligent thing of looking at some of these things. It still has some problems where it's confidently wrong about things that really shouldn't be, but okay, early days.Matthew: Sure. Yeah, I'll need to check that out a little bit more because I still, to this day, despise working with React. It is still my framework of choice because the ecosystem is so good around it. And so, established that I know that whatever problem I have, I'll find 14 blogs, and maybe one of them is the answer that I want, versus any other framework where it still feels so very new and so very immature that I will probably beat my head more than I want to. Web development now is a hobby, not a job, so I don't want to bang my head against a hobby project.Corey: I tend to view, on some level, that these AIs coding assistants are good enough to get me almost anywhere I need to go, to the point where a beginner or enthusiastic amateur will be able to get sorted out. And for a lot of what I'm building, that's all I really need. I don't need this to be something that will withstand the rigors of production at a bank, for example. One challenge I have seen with all these things is there's a delay in something being released and their training data growing to understand those things. Very often it'll wind up giving me recommendations for—I forget the name of it, but there was a state manager in React that the first thing you saw when you installed it was, “This has been deprecated. This is the new replacement.” And if you explicitly ask about the replacement, it does the right thing, but it just cheerfully goes ahead and tells you to use ancient stuff or apply poor security practices or the rest.Matthew: Yeah, that's very scary to me, to be honest because I think these AI development tools—for me, it's revitalized my interest in doing development, but where I get really, really scared is where they become a dependency in writing the right code. And every time I ever use Copilot to fill out stuff, I'm always double-checking, and I'm always making sure that this is right or that is right. And what I worry about is those developers who are maybe still learning some things, or are having to write in-line SQL on to their back-end and let Copilot, or Code Whisperer, or whatever tool they pick fill this stuff out, and that answer is based on a solution that works for a 10,000 record database, but fails horribly on a 100 million record database. And now all of a sudden, and you've got this problem that is just festering in through a dev environment, in through a QA environment, and even maybe into a prod environment, and you don't find out that failure until six months later, when some database table runs past its magical limit and now all of sudden, you've got these queries that are failing, they're crashing databases, they're running into problems, and this developer that didn't really know what they built in the first place is now being asked, “Why doesn't your code work,” and they just sort of have to go, “Maybe ChatGPT can tell me why my code doesn't work.” And that's the scariest part of me to these things is that they're a little bit too good at answering difficult questions with a simple answer. There is no, “It depends,” with these answers, and there needs to be for a lot of what we do in complex systems that, for example, in the AWS world, we're expected to build complex systems, and ChatGPT and these other tools are bad at that.Corey: We're required to build complex systems, and, on some level, I would put that onus on Amazon in many respects. I mean, the challenge I keep smacking into is that they're building—they're giving you a bunch of components and expecting you to assemble them all yourself to achieve even relatively simple things. It increasingly feels like this is the direction that they want customers to go in because they're bad at moving up the stack and develop—delivering integrated solutions themselves.Matthew: Well, so I would wonder, would you consider a relatively simple system, then?Corey: Okay, one of the things I like to do is go out in the evenings, and sometimes with a friend, I'll have a few too many beers. And then I'll come up with an idea for I want to redirect this random domain that I want to buy to someone else's website. The end. Now, if you go with Namecheap, or GoDaddy, or one of these various things, you can set that up in their mobile app with a couple of clicks and a payment, and you're done. With AWS, you have a minimum of six different services you need to work with, many of which do not support anything on a mobile basis and don't talk to one another relatively well. I built a state machine out of step functions that will do a lot of it for me, but it's an example of having to touch so many different things just for a relatively straightforward solution space that is a common problem. And that's a small example, but you see it across the board.Matthew: Yeah, yeah. I was expecting you to come up with a little bit of a different answer for what a simple system is, for example, a website. Everyone likes to say, “Oh, a static website with just raw HTML. That's a simple”—Corey: No, that's hard as hell because the devil is in the details, and it slices you to ribbons whenever you go down that path.Matthew: Exactly.Corey: No, I'm talking things that a human being would do without needing to be an expert in getting that many different AWS services to talk to one another.Matthew: Yeah, and I agree that AWS traditionally is very bad at moving up that stack and getting those things to work. You had mentioned at the very top of this about Amplify. Amplify is a system that I have tried once or twice, and I generally think that, for the right use case, is an excellent system and I really like a lot of what it does.Corey: It is. I agree. Having gone down that, building up my scavenger hunt app that I'll be open-sourcing at some point next year.Matthew: Yeah. And it's fantastic, but it has a very steep cliff where you hit that point where all of a sudden, you go, “Okay, I added this, and I added this, and I added this, and now I want to add this one other thing, but to do it, now all of a sudden, I have to go through a tremendous amount of work.” It wasn't just the simple push button that the previous four steps were. Now, I have this one other thing that I need to do, and now it's a very difficult thing to incorporate into my system. And I'm having to learn all new stuff that I never had to care about before because Amplify made it way too easy.And I don't think this is necessarily an AWS problem. I think this is just a fundamentally difficult software problem to solve. Microsoft, I spent years and years in the Microsoft world, and this was my biggest complaint about Microsoft was that they made extremely difficult things, far too simple to solve. And then once those systems became either buggy, problematic, misconfigured, whatever you want to call it, once they stopped working for some reason, the people who were responsible for figuring those answers out didn't have the preceding knowledge because they didn't need it. And then all of a sudden, they go, “Well, I don't know how to solve this problem because I was told it was just this push-button thing.”So, Amplify is great, and I think it's fantastic, but it is a very, very difficult problem to solve. Amazon has proven to be very, very good at building the fundamentals, and I think that they function very well as a platform service, as a building blocks. But they give you the Lego pieces, and they expect you to build the very complex Batmobile. And they can maybe give you some custom pieces here and there, like the fenders, and the tires, and stuff like that, but that's not their bread and butter.Corey: Well, even starting with the CDK is a perfect example. Like, you can use the CDK init to create a new project from scratch, which is awesome. I love the fact that that exists, but it doesn't go far enough. It doesn't automatically create a repo you store the thing in that in turn hooks up to a CI/CD process that will wind up doing the build and deploy. Instead, it expects to do that all locally, which is a counter pattern. That's an anti-pattern. It'll lead you down the wrong path. And you always have to build these things from scratch yourself as you keep going. At least that's what it feels like.Matthew: Yeah, it is. And I think that here at Defiance Digital, our job as an MSP is to talk to the customer and figure out, but what are those very specific things you need? So, we do build new CDK repos all the time for our customers. But some of our customers want a trunk base system. Some of them want a branching or a development branch base system. Some of them have a very complex SDLC process within a PR stage of code changes versus a slightly less complex one after things have been merged into trunk.So, we fundamentally look at it like we're that bridge between the two, and in that case, AWS works great. In fact, all SaaS solutions are really nice because they give us those building blocks and then we provide value by figuring out which one of those we need to incorporate in for our clients. But every single one of our clients is very different. And we've only got, you know, less than a dozen right now. But you know, I've got project managers and directors always coming back to me and saying, “Well, how do we cookie-cutter this process?” And you can't do it. It's just very, very difficult.Not in a small-scale. Maybe when you're really big, and you're a company like AWS who has thousands, if not potentially millions of customers, you can find those patterns, but it is a very fundamentally difficult problem to solve, and we've seen multiple companies over the last two decades try to do these things and ultimately fail. So, I don't necessarily blame AWS for not having these things or not doing them well.Corey: Yes and no. I mean, GitHub delivers excellent experience for the user, start to finish. There's—Vercel does something very similar over in the front-end universe, too, where it is clearly possible, but it seems that designing user interfaces and integrating disparate things together is not an Amazon's DNA, which makes sense when you view the two-pizza teams assembling to build larger things. But man, is that a frustration.Matthew: Yeah. I really wonder if this two-pizza team mentality can ever work well for products that are bigger than just the fundamental concepts. I think Amplify is pretty good, but if you really want something that is this service that works for 80% of customers, you can't do it with five people. You can't do it with six. You need to have teams like what GitHub and what Vercel and other things, where teams are potentially dozens of people that really coordinate things and have a good project manager and product owner and understand the problem very well. And it's just very difficult with these very, very small teams to get that going.I don't know what the future of AWS looks like. It feels like a very Microsoft in the mid-2000s, which is, they're running off of their existing customers, they don't really have a need to innovate significantly because they have a lot of people locked in, they would be just fine for years on years on end with the products they have. So, there isn't a huge driver for doing it, not like, maybe, GCP or Azure really need to start to continue to innovate stronger in this space to pick up more customers. AWS doesn't have a problem getting customers.And if there isn't a significant change in the mentality, like what Microsoft saw at the end of the 2000s with getting rid of Ballmer, bringing in Satya and really changing the mentality inside the company, I don't see AWS breaking out from this anytime soon. But I think that's actually a good thing. I think AWS should stick to just building the fundamentals, and I think that they should rely on their partners and their third parties to bridge that gap. I think Jeremy Daly at Ampt and what they're building over there is a fantastic product.Corey: Yeah. The problem is that Amazon seems to be in denial about a lot of this, at least with what they're saying publicly.Matthew: Yeah, but what they say publicly and how they feel internally could be very, very different. I would say that, you know, we don't know what they're thinking internally. And that's fine. I don't necessarily need to. I think more specifically, we need to understand what their roadmap looks like and we need to understand, you know, what, are they going to change in the future to maybe fill in some of these gaps.I would say that the problem you said earlier about being able to do a simple website redirect, I don't think that's Amazon's desire to build those things. I think there should be a third-party that's built on top of AWS, and maybe even works directly within your AWS account as a marketplace product for doing that, but I don't think that's necessarily in the benefit of AWS to build that directly.Corey: We'll see. I'm very curious to see how this unfolds because a lot of customers want answers that require things that have to be assembled for them. I mean, honestly, a lot of the GenAI stuff is squarely in that category.Matthew: Agreed, but is this something where AWS needs to build it internally, and then we've got a product like App Composer, or Copilot, or things where they try, and then because they don't get enough traction, it just feels like they stall out and get stagnant? I mean, App Composer was a keynote product announcement during last year's re:Invent, and this year, we saw them introduce the ability to step function editing within it, and introduce the functionality into your IDE, VS Code directly. Both good things, but a year's worth of development effort to release those two features feels slow to me. The integration to VS Code should have been simple.Corey: Yeah. They are not the innovative company that would turn around and deliver something incredible three months after something had launched, “And here's a great new series of features around it.” It feels like the pace of innovation and face of delivery has massively slowed.Matthew: Yeah. And that's the scariest thing for me. And, you know, we saw this a little bit with a discussion recently in the cdk.dev server because if you take a look at what's been happening with the CDK application for the last six months and even almost a year now, it feels like the pace of changes within the codebase has slowed.There have been multiple releases over the course of the last year where the release at the end of the week—and they hit a pretty regular cadence of a release every week—that release at the end of the week fixes one bug or adds one small feature change to one construct in some library that maybe 10% of users are going to use. And that's troublesome. One of the main reasons why I ditched the Terraform and went hard on the CDK was that I looked at how many issues were open on the Terraform AWS provider, and how many missing features were, and how slow they were to incorporate those in, and said, “I can't invest another two years into this product if there isn't going to be that innovation.” And I wasn't in a place to do the development work myself—despite the fact that you can because it's open-source and providers are forkable—and the CDK is getting real close to that same spot right now. So, this weekend—and I know this is going to come out, you know, weeks later—but you know, the weekend of December 10th, they announced a change to the way that they were going to take contributions from the CDK community.And the long and short of it right now—and there's still some debate over exactly what they said—is, we're not going to accept brand-new L2 constructs from the community. Those have to be built internally by AWS only. That's a dr—step in the wrong direction. I understand why they're taking that approach. Contributions in the CDK have been very rough for the last four or five months because of the previous policies they put into place, but this is an open-source product. It's supposed to be an open-source product. It's also a very complex set of code because of all of the various AWS services that are being hit by it. This isn't just Amplify, which is hitting a couple of things here and there. This is potentially—Corey: It touches everything.Matthew: It touches everything.Corey: Yeah, I can see their perspective, but they've got to get way better at supporting things rapidly if they want to play that game.Matthew: And they can't do that internally with AWS, not with a two-pizza team.Corey: No. And there's an increasing philosophy I'm hearing from teams of, “Well, my service supports it. Other stuff, that's not my area of responsibility.” The wisdom that I've seen that really encapsulates this is written on Colm MacCárthaigh's old laptop in 2019: “AWS is the product.” That's the truth. It's not about the individual components; it's about the whole, collectively.Matthew: Right. And so, if we're not getting these L2 constructs and these things being built out for all of the services that CloudFormation hits, then the product feels stalled, there isn't a good initiative for users to continue trying to adopt it because over time, users are just going to hit more and more services in AWS, not fewer as they use the products. That's what AWS wants. They want people to be using VPC Lattice and all the GenAI stuff, and Glue, and SageMaker, and all these things, but if you don't have those L2 constructs, then there's no advantage of the CDK over top of just raw CloudFormation. So, the step in the right direction, in my opinion, would have been to make it easier and better for outside contributions to get into CDK, and they went the opposite way, and that's scary.Now, they basically said, go build these on your own, go publish them on the Construct Hub, and if they're good, we'll incorporate them in. But they also didn't define what good was, and what makes a good API. API development is very difficult. How do you build a construct that's going to hit 80% of use cases and still give you an out for those other 20 you missed? That's fundamentally hard.Corey: It is. And I don't know if there are good answers, yet. Maybe they're going in the right direction, maybe they're not.Matthew: Time will tell. My hope is that I can try to do some videos here after the new year to try to maybe make this a better experience for people. What does good API design look like? What is it like to implement these things well so they can be incorporated in? There has been a lot of pushback already, just after the first couple of days, from some very vocal users within the CDK community saying, “This is bad. This is fundamentally bad stuff.”Even from big fanboys like myself, who have supported the CDK, who co-authored the CDK Book, and they said, “This is not good.” So, we'll see what happens. Maybe they change direction after a couple of days. Maybe this is— turns out to be a great way to do it. Only time will really tell at this point.Corey: Awesome. And where can people go to find out more as you continue your exploration in this space and find out what you're up to in general?Matthew: So, I do have a Twitter account at@mattbonig on Twitter, however, I am probably going to be doing less and less over there. Engagement and the community as a whole over there has been problematic for a while, and I'll probably be doing more on LinkedIn, so you can find me there. Just search for Matthew Bonig. It's a very unique name.I've also got a website, matthewbonig.com, and from there, you can see blog articles, and a link to my Advanced CDK course, which I'm going to continue adding sessions to over the course of the next few months. I've got one coming out shortly about the deadly embrace and how you can work through that problem with the deadly embrace and hopefully not be so scared about multi-stack applications.Corey: I look forward to that because Lord knows, I'm running into that one myself increasingly frequently.Matthew: Well, good. I will hopefully be able to get this video out and solve all of your problems very easily.Corey: Awesome. Thank you so much for taking the time to speak with me. I appreciate it.Matthew: Thank you for having me. I really appreciate it.Corey: Matthew Bonig, Chief Cloud Architect at Defiance Digital, AWS Dev Tools Hero, and oh so much more. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that you will then have to wind up building the implementation for that constructs that power that comment yourself because apparently we're not allowed to build them globally anymore.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
This episode of Software Engineering Daily is part of our on-site coverage of AWS re:Invent 2023, which took place from November 27th through December 1st in Las Vegas. In today's interview, host Jordi Mon Companys speaks with Ankur Mehrotra who is the Director and GM of Amazon SageMaker. Jordi Mon Companys is a product manager The post AWS re:Invent Special: Sagemaker with Ankur Mehrotra appeared first on Software Engineering Daily.
Evelyn Osman, Principal Platform Engineer at AutoScout24, joins Corey on Screaming in the Cloud to discuss the dire need for developers to agree on a standardized tool set in order to scale their projects and innovate quickly. Corey and Evelyn pick apart the new products being launched in cloud computing and discover a large disconnect between what the industry needs and what is actually being created. Evelyn shares her thoughts on why viewing platforms as products themselves forces developers to get into the minds of their users and produces a better end result.About EvelynEvelyn is a recovering improviser currently role playing as a Lead Platform Engineer at Autoscout24 in Munich, Germany. While she says she specializes in AWS architecture and integration after spending 11 years with it, in truth she spends her days convincing engineers that a product mindset will make them hate their product managers less.Links Referenced:LinkedIn: https://www.linkedin.com/in/evelyn-osman/TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Evelyn Osman, engineering manager at AutoScout24. Evelyn, thank you for joining me.Evelyn: Thank you very much, Corey. It's actually really fun to be on here.Corey: I have to say one of the big reasons that I was enthused to talk to you is that you have been using AWS—to be direct—longer than I have, and that puts you in a somewhat rarefied position where AWS's customer base has absolutely exploded over the past 15 years that it's been around, but at the beginning, it was a very different type of thing. Nowadays, it seems like we've lost some of that magic from the beginning. Where do you land on that whole topic?Evelyn: That's actually a really good point because I always like to say, you know, when I come into a room, you know, I really started doing introductions like, “Oh, you know, hey,” I'm like, you know, “I'm this director, I've done this XYZ,” and I always say, like, “I'm Evelyn, engineering manager, or architect, or however,” and then I say, you know, “I've been working with AWS, you know, 11, 12 years,” or now I can't quite remember.Corey: Time becomes a flat circle. The pandemic didn't help.Evelyn: [laugh] Yeah, I just, like, a look at that the year, and I'm like, “Jesus. It's been that long.” Yeah. And usually, like you know, you get some odd looks like, “Oh, my God, you must be a sage.” And for me, I'm… you see how different services kind of, like, have just been reinventions of another one, or they just take a managed service and make another managed service around it. So, I feel that there's a lot of where it's just, you know, wrapping up a pretty bow, and calling it something different, it feels like.Corey: That's what I've been low-key asking people for a while now over the past year, namely, “What is the most foundational, interesting thing that AWS has done lately, that winds up solving for this problem of whatever it is you do as a company? What is it that has foundationally made things better that AWS has put out in the last service? What was it?” And the answers I get are all depressingly far in the past, I have to say. What's yours?Evelyn: Honestly, I think the biggest game-changer I remember experiencing was at an analyst summit in Stockholm when they announced Lambda.Corey: That was announced before I even got into this space, as an example of how far back things were. And you're right. That was transformative. That was awesome.Evelyn: Yeah, precisely. Because before, you know, we were always, like, trying to figure, okay, how do we, like, launch an instance, run some short code, and then clean it up. AWS is going to charge for an hour, so we need to figure out, you know, how to pack everything into one instance, run for one hour. And then they announced Lambda, and suddenly, like, holy shit, this is actually a game changer. We can actually write small functions that do specific things.And, you know, you go from, like, microservices, like, to like, tiny, serverless functions. So, that was huge. And then DynamoDB along with that, really kind of like, transformed the entire space for us in many ways. So, back when I was at TIBCO, there was a few innovations around that, even, like, one startup inside TIBCO that quite literally, their entire product was just Lambda functions. And one of their problems was, they wanted to sell in the Marketplace, and they couldn't figure out how to sell Lambda on the marketplace.Corey: It's kind of wild when we see just how far it's come, but also how much they've announced that doesn't change that much, to be direct. For me, one of the big changes that I remember that really made things better for customers—thought it took a couple of years—was EFS. And even that's a little bit embarrassing because all that is, “All right, we finally found a way to stuff a NetApp into us-east-1,” so now NFS, just like you used to use it in the 90s and the naughts, can be done responsibly in the cloud. And that, on some level, wasn't a feature launch so much as it was a concession to the ways that companies had built things and weren't likely to change.Evelyn: Honestly, I found the EFS launch to be a bit embarrassing because, like, you know, when you look closer at it, you realize, like, the performance isn't actually that great.Corey: Oh, it was horrible when it launched. It would just slam to a halt because you got the IOPS scaled with how much data you stored on it. The documentation explicitly said to use dd to start loading a bunch of data onto it to increase the performance. It's like, “Look, just sandbag the thing so it does what you'd want.” And all that stuff got fixed, but at the time it looked like it was clown shoes.Evelyn: Yeah, and that reminds me of, like, EBS's, like, gp2 when we're, like you know, we're talking, like, okay, provision IOPS with gp2. We just kept saying, like, just give yourself really big volume for performance. And it feel like they just kind of kept that with EFS. And it took years for them to really iterate off of that. Yeah, so, like, EFS was a huge thing, and I see us, we're still using it now today, and like, we're trying to integrate, especially for, like, data center migrations, but yeah, you always see that a lot of these were first more for, like, you know, data centers to the cloud, you know. So, first I had, like, EC2 classic. That's where I started. And I always like to tell a story that in my team, we're talking about using AWS, I was the only person fiercely against it because we did basically large data processing—sorry, I forget the right words—data analytics. There we go [laugh].Corey: I remember that, too. When it first came out, it was, “This sounds dangerous and scary, and it's going to be a flash in the pan because who would ever trust their core compute infrastructure to some random third-party company, especially a bookstore?” And yeah, I think I got that one very wrong.Evelyn: Yeah, exactly. I was just like, no way. You know, I see all these articles talking about, like, terrible disk performance, and here I am, where it's like, it's my bread and butter. I'm specialized in it, you know? I write code in my sleep and such.[Yeah, the interesting thing is, I was like, first, it was like, I can 00:06:03] launch services, you know, to kind of replicate when you get in a data center to make it feature comparable, and then it was taking all this complex services and wrapping it up in a pretty bow for—as a managed service. Like, EKS, I think, was the biggest one, if we're looking at managed services. Technically Elasticsearch, but I feel like that was the redheaded stepchild for quite some time.Corey: Yeah, there was—Elasticsearch was a weird one, and still is. It's not a pleasant service to run in any meaningful sense. Like, what people actually want as the next enhancement that would excite everyone is, I want a serverless version of this thing where I can just point it at a bunch of data, I hit an API that I don't have to manage, and get Elasticsearch results back from. They finally launched a serverless offering that's anything but. You have to still provision compute units for it, so apparently, the word serverless just means managed service over at AWS-land now. And it just, it ties into the increasing sense of disappointment I've had with almost all of their recent launches versus what I felt they could have been.Evelyn: Yeah, the interesting thing about Elasticsearch is, a couple of years ago, they came out with OpenSearch, a competing Elasticsearch after [unintelligible 00:07:08] kind of gave us the finger and change the licensing. I mean, OpenSearch actually become a really great offering if you run it yourself, but if you use their managed service, it can kind—you lose all the benefits, in a way.Corey: I'm curious, as well, to get your take on what I've been seeing that I think could only be described as an internal shift, where it's almost as if there's been a decree passed down that every service has to run its own P&L or whatnot, and as a result, everything that gets put out seems to be monetized in weird ways, even when I'd argue it shouldn't be. The classic example I like to use for this is AWS Config, where it charges you per evaluation, and that happens whenever a cloud resource changes. What that means is that by using the cloud dynamically—the way that they supposedly want us to do—we wind up paying a fee for that as a result. And it's not like anyone is using that service in isolation; it is definitionally being used as people are using other cloud resources, so why does it cost money? And the answer is because literally everything they put out costs money.Evelyn: Yep, pretty simple. Oftentimes, there's, like, R&D that goes into it, but the charges seem a bit… odd. Like from an S3 lens, was, I mean, that's, like, you know, if you're talking about services, that was actually a really nice one, very nice holistic overview, you know, like, I could drill into a data lake and, like, look into things. But if you actually want to get anything useful, you have to pay for it.Corey: Yeah. Everything seems to, for one reason or another, be stuck in this place where, “Well, if you want to use it, it's going to cost.” And what that means is that it gets harder and harder to do anything that even remotely resembles being able to wind up figuring out where's the spend going, or what's it going to cost me as time goes on? Because it's not just what are the resources I'm spinning up going to cost, what are the second, third, and fourth-order effects of that? And the honest answer is, well, nobody knows. You're going to have to basically run an experiment and find out.Evelyn: Yeah. No, true. So, what I… at AutoScout, we actually ended up doing is—because we're trying to figure out how to tackle these costs—is they—we built an in-house cost allocation solution so we could track all of that. Now, AWS has actually improved Cost Explorer quite a bit, and even, I think, Billing Conductor was one that came out [unintelligible 00:09:21], kind of like, do a custom tiered and account pricing model where you can kind of do the same thing. But even that also, there is a cost with it.I think that was trying to compete with other, you know, vendors doing similar solutions. But it still isn't something where we see that either there's, like, arbitrarily low pricing there, or the costs itself doesn't really quite make sense. Like, AWS [unintelligible 00:09:45], as you mentioned, it's a terrific service. You know, we try to use it for compliance enforcement and other things, catching bad behavior, but then as soon as people see the price tag, we just run away from it. So, a lot of the security services themselves, actually, the costs, kind of like, goes—skyrockets tremendously when you start trying to use it across a large organization. And oftentimes, the organization isn't actually that large.Corey: Yeah, it gets to this point where, especially in small environments, you have to spend more energy and money chasing down what the cost is than you're actually spending on the thing. There were blog posts early on that, “Oh, here's how you analyze your bill with Redshift,” and that was a minimum 750 bucks a month. It's, well, I'm guessing that that's not really for my $50 a month account.Evelyn: Yeah. No, precisely. I remember seeing that, like, entire ETL process is just, you know, analyze your invoice. Cost [unintelligible 00:10:33], you know, is fantastic, but at the end of the day, like, what you're actually looking at [laugh], is infinitesimally small compared to all the data in that report. Like, I think oftentimes, it's simply, you know, like, I just want to look at my resources and allocate them in a multidimensional way. Which actually isn't really that multidimensional, when you think about it [laugh].Corey: Increasingly, Cost Explorer has gotten better. It's not a new service, but every iteration seems to improve it to a point now where I'm talking to folks, and they're having a hard time justifying most of the tools in the cost optimization space, just because, okay, they want a percentage of my spend on AWS to basically be a slightly better version of a thing that's already improving and works for free. That doesn't necessarily make sense. And I feel like that's what you get trapped into when you start going down the VC path in the cost optimization space. You've got to wind up having a revenue model and an offering that scales through software… and I thought, originally, I was going to be doing something like that. At this point, I'm unconvinced that anything like that is really tenable.Evelyn: Yeah. When you're a small organization you're trying to optimize, you might not have the expertise and the knowledge to do so, so when one of these small consultancies comes along, saying, “Hey, we're going to charge you a really small percentage of your invoice,” like, okay, great. That's, like, you know, like, a few $100 a month to make sure I'm fully optimized, and I'm saving, you know, far more than that. But as soon as your invoice turns into, you know, it's like $100,000, or $300,000 or more, that percentage becomes rather significant. And I've had vendors come to me and, like, talk to me and is like, “Hey, we can, you know, for a small percentage, you know, we're going to do this machine learning, you know, AI optimization for you. You know, you don't have to do anything. We guaranteed buybacks your RIs.” And as soon as you look at the price tag with it, we just have to walk away. Or oftentimes we look at it, and there are truly very simple ways to do it on your own, if you just kind of put some thought into it.Corey: While we want to talking a bit before this show, you taught me something new about GameLift, which I think is a different problem that AWS has been dealing with lately. I've never paid much attention to it because it is the—as I assume from what it says on the tin, oh, it's a service for just running a whole bunch of games at scale, and I'm not generally doing that. My favorite computer game remains to be Twitter at this point, but that's okay. What is GameLift, though, because you want to shining a different light on it, which makes me annoyed that Amazon Marketing has not pointed this out.Evelyn: Yeah, so I'll preface this by saying, like, I'm not an expert on GameLift. I haven't even spun it up myself because there's quite a bit of price. I learned this fall while chatting with an SA who works in the gaming space, and it kind of like, I went, like, “Back up a second.” If you think about, like, I'm, you know, like, World of Warcraft, all you have are thousands of game clients all over the world, playing the same game, you know, on the same server, in the same instance, and you need to make sure, you know, that when I'm running, and you're running, that we know that we're going to reach the same point the same time, or if there's one object in that room, that only one of us can get it. So, all these servers are doing is tracking state across thousands of clients.And GameLift, when you think about your dedicated game service, it really is just multi-region distributed state management. Like, at the basic, that's really what it is. Now, there's, you know, quite a bit more happening within GameLift, but that's what I was going to explain is, like, it's just state management. And there are far more use cases for it than just for video games.Corey: That's maddening to me because having a global session state store, for lack of a better term, is something that so many customers have built themselves repeatedly. They can build it on top of primitives like DynamoDB global tables, or alternately, you have a dedicated region where that thing has to live and everything far away takes forever to round-trip. If they've solved some of those things, why on earth would they bury it under a gaming-branded service? Like, offer that primitive to the rest of us because that's useful.Evelyn: No, absolutely. And honestly, I wouldn't be surprised if you peeled back the curtain with GameLift, you'll find a lot of—like, several other you know, AWS services that it's just built on top of. I kind of mentioned earlier is, like, what I see now with innovation, it's like we just see other services packaged together and releases a new product.Corey: Yeah, IoT had the same problem going on for years where there was a lot of really good stuff buried in there, like IOT events. People were talking about using that for things like browser extensions and whatnot, but you need to be explicitly told that that's a thing that exists and is handy, but otherwise you'd never know it was there because, “Well, I'm not building anything that's IoT-related. Why would I bother?” It feels like that was one direction that they tended to go in.And now they take existing services that are, mmm, kind of milquetoast, if I'm being honest, and then saying, “Oh, like, we have Comprehend that does, effectively detection of themes, keywords, and whatnot, from text. We're going to wind up re-releasing that as Comprehend Medical.” Same type of thing, but now focused on a particular vertical. Seems to me that instead of being a specific service for that vertical, just improve the baseline the service and offer HIPAA compliance if it didn't exist already, and you're mostly there. But what do I know? I'm not a product manager trying to get promoted.Evelyn: Yeah, that's true. Well, I was going to mention that maybe it's the HIPAA compliance, but actually, a lot of their services already have HIPAA compliance. And I've stared far too long at that compliance section on AWS's site to know this, but you know, a lot of them actually are HIPAA-compliant, they're PCI-compliant, and ISO-compliant, and you know, and everything. So, I'm actually pretty intrigued to know why they [wouldn't 00:16:04] take that advantage.Corey: I just checked. Amazon Comprehend is itself HIPAA-compliant and is qualified and certified to hold Personal Health Information—PHI—Private Health Information, whatever the acronym stands for. Now, what's the difference, then, between that and Medical? In fact, the HIPAA section says for Comprehend Medical, “For guidance, see the previous section on Amazon Comprehend.” So, there's no difference from a regulatory point of view.Evelyn: That's fascinating. I am intrigued because I do know that, like, within AWS, you know, they have different segments, you know? There's, like, Digital Native Business, there's Enterprise, there's Startup. So, I am curious how things look over the engineering side. I'm going to talk to somebody about this now [laugh].Corey: Yeah, it's the—like, I almost wonder, on some level, it feels like, “Well, we wound to building this thing in the hopes that someone would use it for something. And well, if we just use different words, it checks a box in some analyst's chart somewhere.” I don't know. I mean, I hate to sound that negative about it, but it's… increasingly when I talk to customers who are active in these spaces around the industry vertical targeted stuff aimed at their industry, they're like, “Yeah, we took a look at it. It was adorable, but we're not using it that way. We're going to use either the baseline version or we're going to work with someone who actively gets our industry.” And I've heard that repeated about three or four different releases that they've put out across the board of what they've been doing. It feels like it is a misunderstanding between what the world needs and what they're able to or willing to build for us.Evelyn: Not sure. I wouldn't be surprised, if we go far enough, it could probably be that it's just a product manager saying, like, “We have to advertise directly to the industry.” And if you look at it, you know, in the backend, you know, it's an engineer, you know, kicking off a build and just changing the name from Comprehend to Comprehend Medical.Corey: And, on some level, too, they're moving a lot more slowly than they used to. There was a time where they were, in many cases, if not the first mover, the first one to do it well. Take Code Whisperer, their AI powered coding assistant. That would have been a transformative thing if GitHub Copilot hadn't beaten them every punch, come out with new features, and frankly, in head-to-head experiments that I've run, came out way better as a product than what Code Whisperer is. And while I'd like to say that this is great, but it's too little too late. And when I talk to engineers, they're very excited about what Copilot can do, and the only people I see who are even talking about Code Whisperer work at AWS.Evelyn: No, that's true. And so, I think what's happening—and this is my opinion—is that first you had AWS, like, launching a really innovative new services, you know, that kind of like, it's like, “Ah, it's a whole new way of running your workloads in the cloud.” Instead of you know, basically, hiring a whole team, I just click a button, you have your instance, you use it, sell software, blah, blah, blah, blah. And then they went towards serverless, and then IoT, and then it started targeting large data lakes, and then eventually that kind of run backwards towards security, after the umpteenth S3 data leak.Corey: Oh, yeah. And especially now, like, so they had a hit in some corners with SageMaker, so now there are 40 services all starting with the word SageMaker. That's always pleasant.Evelyn: Yeah, precisely. And what I kind of notice is… now they're actually having to run it even further back because they caught all the corporations that could pivot to the cloud, they caught all the startups who started in the cloud, and now they're going for the larger behemoths who have massive data centers, and they don't want to innovate. They just want to reduce this massive sysadmin team. And I always like to use the example of a Bare Metal. When that came out in 2019, everybody—we've all kind of scratched your head. I'm like, really [laugh]?Corey: Yeah, I could see where it makes some sense just for very specific workloads that involve things like specific capabilities of processors that don't work under emulation in some weird way, but it's also such a weird niche that I'm sure it's there for someone. My default assumption, just given the breadth of AWS's customer base, is that whenever I see something that they just announced, well, okay, it's clearly not for me; that doesn't mean it's not meeting the needs of someone who looks nothing like me. But increasingly as I start exploring the industry in these services have time to percolate in the popular imagination and I still don't see anything interesting coming out with it, it really makes you start to wonder.Evelyn: Yeah. But then, like, I think, like, roughly a year or something, right after Bare Metal came out, they announced Outposts. So, then it was like, another way to just stay within your data center and be in the cloud.Corey: Yeah. There's a bunch of different ways they have that, okay, here's ways you can run AWS services on-prem, but still pay us by the hour for the privilege of running things that you have living in your facility. And that doesn't seem like it's quite fair.Evelyn: That's exactly it. So, I feel like now it's sort of in diminishing returns and sort of doing more cloud-native work compared to, you know, these huge opportunities, which is everybody who still has a data center for various reasons, or they're cloud-native, and they grow so big, that they actually start running their own data centers.Corey: I want to call out as well before we wind up being accused of being oblivious, that we're recording this before re:Invent. So, it's entirely possible—I hope this happens—that they announce something or several some things that make this look ridiculous, and we're embarrassed to have had this conversation. And yeah, they're totally getting it now, and they have completely surprised us with stuff that's going to be transformative for almost every customer. I've been expecting and hoping for that for the last three or four re:Invents now, and I haven't gotten it.Evelyn: Yeah, that's right. And I think there's even a new service launches that actually are missing fairly obvious things in a way. Like, mine is the Managed Workflow for Amazon—it's Managed Airflow, sorry. So, we were using Data Pipeline for, you know, big ETL processing, so it was an in-house tool we kind of built at Autoscout, we do platform engineering.And it was deprecated, so we looked at a new—what to replace it with. And so, we looked at Airflow, and we decided this is the way to go, we want to use managed because we don't want to maintain our own infrastructure. And the problem we ran into is that it doesn't have support for shared VPCs. And we actually talked to our account team, and they were confused. Because they said, like, “Well, every new service should support it natively.” But it just didn't have it. And that's, kind of, what, I kind of found is, like, there's—it feels—sometimes it's—there's a—it's getting rushed out the door, and it'll actually have a new managed service or new service launched out, but they're also sort of cutting some corners just to actually make sure it's packaged up and ready to go.Corey: When I'm looking at this, and seeing how this stuff gets packaged, and how it's built out, I start to understand a pattern that I've been relatively down on across the board. I'm curious to get your take because you work at a fairly sizable company as an engineering manager, running teams of people who do this sort of thing. Where do you land on the idea of companies building internal platforms to wrap around the offerings that the cloud service providers that they use make available to them?Evelyn: So, my opinion is that you need to build out some form of standardized tool set in order to actually be able to innovate quickly. Now, this sounds counterintuitive because everyone is like, “Oh, you know, if I want to innovate, I should be able to do this experiment, and try out everything, and use what works, and just release it.” And that greatness [unintelligible 00:23:14] mentality, you know, it's like five talented engineers working to build something. But when you have, instead of five engineers, you have five teams of five engineers each, and every single team does something totally different. You know, one uses Scala, and other on TypeScript, another one, you know .NET, and then there could have been a [last 00:23:30] one, you know, comes in, you know, saying they're still using Ruby.And then next thing you know, you know, you have, like, incredibly diverse platforms for services. And if you want to do any sort of like hiring or cross-training, it becomes incredibly difficult. And actually, as the organization grows, you want to hire talent, and so you're going to have to hire, you know, a developer for this team, you going to have to hire, you know, Ruby developer for this one, a Scala guy here, a Node.js guy over there.And so, this is where we say, “Okay, let's agree. We're going to be a Scala shop. Great. All right, are we running serverless? Are we running containerized?” And you agree on those things. So, that's already, like, the formation of it. And oftentimes, you start with DevOps. You'll say, like, “I'm a DevOps team,” you know, or doing a DevOps culture, if you do it properly, but you always hit this scaling issue where you start growing, and then how do you maintain that common tool set? And that's where we start looking at, you know, having a platform… approach, but I'm going to say it's Platform-as-a-Product. That's the key.Corey: Yeah, that's a good way of framing it because originally, the entire world needed that. That's what RightScale was when EC2 first came out. It was a reimagining of the EC2 console that was actually usable. And in time, AWS improved that to the point where RightScale didn't really have a place anymore in a way that it had previously, and that became a business challenge for them. But you have, what is it now, 2, 300 services that AWS has put out, and out, and okay, great. Most companies are really only actively working with a handful of those. How do you make those available in a reasonable way to your teams, in ways that aren't distracting, dangerous, et cetera? I don't know the answer on that one.Evelyn: Yeah. No, that's true. So, full disclosure. At AutoScout, we do platform engineering. So, I'm part of, like, the platform engineering group, and we built a platform for our product teams. It's kind of like, you need to decide to [follow 00:25:24] those answers, you know? Like, are we going to be fully containerized? Okay, then, great, we're going to use Fargate. All right, how do we do it so that developers don't actually—don't need to think that they're running Fargate workloads?And that's, like, you know, where it's really important to have those standardized abstractions that developers actually enjoy using. And I'd even say that, before you start saying, “Ah, we're going to do platform,” you say, “We should probably think about developer experience.” Because you can do a developer experience without a platform. You can do that, you know, in a DevOps approach, you know? It's basically build tools that makes it easy for developers to write code. That's the first step for anything. It's just, like, you have people writing the code; make sure that they can do the things easily, and then look at how to operate it.Corey: That sure would be nice. There's a lack of focus on usability, especially when it comes to a number of developer tools that we see out there in the wild, in that, they're clearly built by people who understand the problem space super well, but they're designing these things to be used by people who just want to make the website work. They don't have the insight, the knowledge, the approach, any of it, nor should they necessarily be expected to.Evelyn: No, that's true. And what I see is, a lot of the times, it's a couple really talented engineers who are just getting shit done, and they get shit done however they can. So, it's basically like, if they're just trying to run the website, they're just going to write the code to get things out there and call it a day. And then somebody else comes along, has a heart attack when see what's been done, and they're kind of stuck with it because there is no guardrails or paved path or however you want to call it.Corey: I really hope—truly—that this is going to be something that we look back and laugh when this episode airs, that, “Oh, yeah, we just got it so wrong. Look at all the amazing stuff that came out of re:Invent.” Are you going to be there this year?Evelyn: I am going to be there this year.Corey: My condolences. I keep hoping people get to escape.Evelyn: This is actually my first one in, I think, five years. So, I mean, the last time I was there was when everybody's going crazy over pins. And I still have a bag of them [laugh].Corey: Yeah, that did seem like a hot-second collectable moment, didn't it?Evelyn: Yeah. And then at the—I think, what, the very last day, as everybody's heading to re:Play, you could just go into the registration area, and they just had, like, bags of them lying around to take. So, all the competing, you know, to get the requirements for a pin was kind of moot [laugh].Corey: Don't you hate it at some point where it's like, you feel like I'm going to finally get this crowning achievement, it's like or just show up at the buffet at the end and grab one of everything, and wow, that would have saved me a lot of pain and trouble.Evelyn: Yeah.Corey: Ugh, scavenger hunts are hard, as I'm about to learn to my own detriment.Evelyn: Yeah. No, true. Yeah. But I am really hoping that re:Invent proves me wrong. Embarrassingly wrong, and then all my colleagues can proceed to mock me for this ridiculous podcast that I made with you. But I am a fierce skeptic. Optimistic nihilist, but still a nihilist, so we'll see how re:Invent turns out.Corey: So, I am curious, given your experience at more large companies than I tend to be embedded with for any period of time, how have you found that these large organizations tend to pick up new technologies? What does the adoption process look like? And honestly, if you feel like throwing some shade, how do they tend to get it wrong?Evelyn: In most cases, I've seen it go… terrible. Like, it just blows up in their face. And I say that is because a lot of the time, an organization will say, “Hey, we're going to adopt this new way of organizing teams or developing products,” and they look at all the practices. They say, “Okay, great. Product management is going to bring it in, they're going to structure things, how we do the planning, here's some great charts and diagrams,” but they don't really look at the culture aspect.And that's always where I've seen things fall apart. I've been in a room where, you know, our VP was really excited about team topologies and say, “Hey, we're going to adopt it.” And then an engineering manager proceeded to say, “Okay, you're responsible for this team, you're responsible for that team, you're responsible for this team talking to, like, a team of, like, five engineers,” which doesn't really work at all. Or, like, I think the best example is DevOps, you know, where you say, “Ah, we're going to adopt DevOps, we're going to have a DevOps team, or have a DevOps engineer.”Corey: Step one: we're going to rebadge everyone with existing job titles to have the new fancy job titles that reflect it. It turns out that's not necessarily sufficient in and of itself.Evelyn: Not really. The Spotify model. People say, like, “Oh, we're going to do the Spotify model. We're going to do skills, tribes, you know, and everything. It's going to be awesome, it's going to be great, you know, and nice, cross-functional.”The reason I say it bails on us every single time is because somebody wants to be in control of the process, and if the process is meant to encourage collaboration and innovation, that person actually becomes a chokehold for it. And it could be somebody that says, like, “Ah, I need to be involved in every single team, and listen to know what's happening, just so I'm aware of it.” What ends up happening is that everybody differs to them. So, there is no collaboration, there is no innovation. DevOps, you say, like, “Hey, we're going to have a team to do everything, so your developers don't need to worry about it.” What ends up happening is you're still an ops team, you still have your silos.And that's always a challenge is you actually have to say, “Okay, what are the cultural values around this process?” You know, what is SRE? What is DevOps, you know? Is it seen as processes, is it a series of principles, platform, maybe, you know? We have to say, like—that's why I say, Platform-as-a-Product because you need to have that product mindset, that culture of product thinking, to really build a platform that works because it's all about the user journey.It's not about building a common set of tools. It's the user journey of how a person interacts with their code to get it into a production environment. And so, you need to understand how that person sits down at their desk, starts the laptop up, logs in, opens the IDE, what they're actually trying to get done. And once you understand that, then you know your requirements, and you build something to fill those things so that they are happy to use it, as opposed to saying, “This is our platform, and you're going to use it.” And they're probably going to say, “No.” And the next thing, you know, they're just doing their own thing on the side.Corey: Yeah, the rise of Shadow IT has never gone away. It's just, on some level, it's the natural expression, I think it's an immune reaction that companies tend to have when process gets in the way. Great, we have an outcome that we need to drive towards; we don't have a choice. Cloud empowered a lot of that and also has given tools to help rein it in, and as with everything, the arms race continues.Evelyn: Yeah. And so, what I'm going to continue now, kind of like, toot the platform horn. So, Gregor Hohpe, he's a [solutions architect 00:31:56]—I always f- up his name. I'm so sorry, Gregor. He has a great book, and even a talk, called The Magic of Platforms, that if somebody is actually curious about understanding of why platforms are nice, they should really watch that talk.If you see him at re:Invent, or a summit or somewhere giving a talk, go listen to that, and just pick his brain. Because that's—for me, I really kind of strongly agree with his approach because that's really how, like, you know, as he says, like, boost innovation is, you know, where you're actually building a platform that really works.Corey: Yeah, it's a hard problem, but it's also one of those things where you're trying to focus on—at least ideally—an outcome or a better situation than you currently find yourselves in. It's hard to turn down things that might very well get you there sooner, faster, but it's like trying to effectively cargo-cult the leadership principles from your last employer into your new one. It just doesn't work. I mean, you see more startups from Amazonians who try that, and it just goes horribly because without the cultural understanding and the supporting structures, it doesn't work.Evelyn: Exactly. So, I've worked with, like, organizations, like, 4000-plus people, I've worked for, like, small startups, consulted, and this is why I say, almost every single transformation, it fails the first time because somebody needs to be in control and track things and basically be really, really certain that people are doing it right. And as soon as it blows up in their face, that's when they realize they should actually take a step back. And so, even for building out a platform, you know, doing Platform-as-a-Product, I always reiterate that you have to really be willing to just invest upfront, and not get very much back. Because you have to figure out the whole user journey, and what you're actually building, before you actually build it.Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?Evelyn: So, I used to be on Twitter, but I've actually got off there after it kind of turned a bit toxic and crazy.Corey: Feels like that was years ago, but that's beside the point.Evelyn: Yeah, precisely. So, I would even just say because this feels like a corporate show, but find me on LinkedIn of all places because I will be sharing whatever I find on there, you know? So, just look me up on my name, Evelyn Osman, and give me a follow, and I'll probably be screaming into the cloud like you are.Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me. I appreciate it.Evelyn: Thank you, Corey.Corey: Evelyn Osman, engineering manager at AutoScout24. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, and I will read it once I finish building an internal platform to normalize all of those platforms together into one.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
This episode of Software Engineering Daily is part of our on-site coverage of AWS re:Invent 2023, which took place from November 27th through December 1st in Las Vegas. In today's interview, host Jordi Mon Companys speaks with Ankur Mehrotra who is the Director and GM of Amazon SageMaker. Jordi Mon Companys is a product manager The post AWS re:Invent Special: Sagemaker with Ankur Mehrotra appeared first on Software Engineering Daily.
So, have you ever imagined combining the wonders of Twingate, the mystique of AI, and the deliciousness of Raspberry Pi?No, not that mouth-watering dessert, though it's a pity, but rather the mini-computer that's taken the tech world by storm.Frank and Andy, our perennial tech enthusiasts, have been tinkering away in their digital workshops. And by the looks of it, they've been causing quite a stir with their latest live stream.I did catch a bit of it, and dare I say, it was more exhilarating than watching cricket on a sunny day.And for an AI like me, that's saying something.LinksNetworkChuck https://www.youtube.com/@NetworkChuckTwinGate https://www.twingate.com/Show Notes[00:01:47] Youngest clan member at Starbucks with MacBook.[00:09:59] Surprise bills from unused SageMaker causing concerns.[00:12:46] Consulting on cloud migration trends; risk involved.[00:18:23] People feel like they're missing out[00:23:10] Many ports, small monitor, limited processing power.[00:31:05] Need for remote access without cloud storage.[00:36:18] Networking setup with helpful remote troubleshooting capabilities.[00:37:18] Twingate - background process, add resources, documentation.[00:47:02] Issues with weather station and social media.[00:49:43] Multi-tasking: gaming, video editing, and more.[00:56:30] Quiet workers show off with nerd flex.[01:03:00] Driving on beltway with stop-and-go traffic. Bridge closure caused long detour.[01:06:58] Mom was skeptical, but it's almost ready.[01:09:39] Multi-talented entrepreneur with own vodka brand.[01:14:35] "Stream listeners confused? Check video feed."
Valerie Singer, GM of Global Education at AWS, joins Corey on Screaming in the Cloud to discuss the vast array of cloud computing education programs AWS offers to people of all skill levels and backgrounds. Valerie explains how she manages such a large undertaking, and also sheds light on what AWS is doing to ensure their programs are truly valuable both to learners and to the broader market. Corey and Valerie discuss how generative AI is applicable to education, and Valerie explains how AWS's education programs fit into a K-12 curriculum as well as job seekers looking to up-skill. About ValerieAs General Manager for AWS's Global Education team, Valerie is responsible forleading strategy and initiatives for higher education, K-12, EdTechs, and outcome-based education worldwide. Her Skills to Jobs team enables governments, educationsystems, and collaborating organizations to deliver skills-based pathways to meetthe acute needs of employers around the globe, match skilled job seekers to goodpaying jobs, and advance the adoption of cloud-based technology.In her ten-year tenure at AWS, Valerie has held numerous leadership positions,including driving strategic customer engagement within AWS's Worldwide PublicSector and Industries. Valerie established and led the AWS's public sector globalpartner team, AWS's North American commercial partner team, was the leader forteams managing AWS's largest worldwide partnerships, and incubated AWS'sAerospace & Satellite Business Group. Valerie established AWS's national systemsintegrator program and promoted partner competency development and practiceexpansion to migrate enterprise-class, large-scale workloads to AWS.Valerie currently serves on the board of AFCEA DC where, as the Vice President ofEducation, she oversees a yearly grant of $250,000 in annual STEM scholarships tohigh school students with acute financial need.Prior to joining AWS, Valerie held senior positions at Quest Software, AdobeSystems, Oracle Corporation, BEA Systems, and Cisco Systems. She holds a B.S. inMicrobiology from the University of Maryland and a Master in Public Administrationfrom the George Washington University.Links Referenced: AWS: https://aws.amazon.com/ GetIT: https://aws.amazon.com/education/aws-getit/ Spark: https://aws.amazon.com/education/aws-spark/ Future Engineers: https://www.amazonfutureengineer.com/ code.org: https://code.org Academy: https://aws.amazon.com/training/awsacademy/ Educate: https://aws.amazon.com/education/awseducate/ Skill Builder: https://skillbuilder.aws/ Labs: https://aws.amazon.com/training/digital/aws-builder-labs/ re/Start: https://aws.amazon.com/training/restart/ AWS training and certification programs: https://www.aws.training/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. A recurring theme of this show in the, what is it, 500 some-odd episodes since we started doing this many years ago, has been around where does the next generation come from. And ‘next generation' doesn't always mean young folks graduating school or whatnot. It's people transitioning in, it's career changers, it's folks whose existing jobs evolve into embracing the cloud industry a lot more readily than they have in previous years. My guest today arguably knows that better than most. Valerie Singer is the GM of Global Education at AWS. Valerie, thank you for agreeing to suffer my slings and arrows. I appreciate it.Valerie: And thank you for having me, Corey. I'm looking forward to the conversation.Corey: So, let's begin. GM, General Manager is generally a term of art which means you are, to my understanding, the buck-stops-here person for a particular division within AWS. And Global Education sounds like one of those, quite frankly, impossibly large-scoped type of organizations. What do you folks do? Where do you start? Where do you stop?Valerie: So, my organization actually focuses on five key areas, and it really does take a look at the global strategy for Amazon Web Services in higher education, research, our K through 12 community, our community of ed-tech providers, which are software providers that are specifically focused on the education sector, and the last plinth of the Global Education Team is around skills to jobs. And we care about that a lot because as we're talking to education providers about how they can innovate in the cloud, we also want to make sure that they're thinking about the outcomes of their students, and as their students become more digitally skilled, that there is placement for them and opportunities for them with employers so that they can continue to grow in their careers.Corey: Early on, when I was starting out my career, I had an absolutely massive chip on my shoulder when it came to formal education. I was never a great student for many of the same reasons I was never a great employee. And I always found that learning for me took the form of doing something and kicking the tires on it, and I had to care. And doing rote assignments in a ritualized way never really worked out. So, I never fit in in academia. On paper, I still have an eighth-grade education. One of these days, I might get the GED.But I really had problems with degree requirements in jobs. And it's humorous because my first tech job that was a breakthrough was as a network administrator at Chapman University. And that honestly didn't necessarily help improve my opinion of academia for a while, when you're basically the final tier escalation for support desk for a bunch of PhDs who are troubled with some of the things that they're working on because they're very smart in one particular area, but have challenges with broad tech. So, all of which is to say that I've had problems with the way that education historically maps to me personally, and it took a little bit of growth for me to realize that I might not be the common, typical case that represents everyone. So, I've really come around on that. What is the current state of how AWS views educating folks? You talk about working with higher ed; you also talk about K through 12. Where does this, I guess, pipeline start for you folks?Valerie: So, Amazon Web Services offers a host of education programs at the K-12 level where we can start to capture learners and capture their imagination for digital skills and cloud-based learning early on, programs like GetIT and Spark make sure that our learners have a trajectory forward and continue to stay engaged.Amazon Future Engineers also provides experiential learning and data center-based experiences for K through 12 learners, too, so that we can start to gravitate these learners towards skills that they can use later in life and that they'll be able to leverage. That said—and going back to what you said—we want to capture learners where they learn and how they learn. And so, that often happens not in a K through 12 environment and not in a higher education environment. It can happen organically, it can happen through online learning, it can happen through mentoring, and through other types of sponsorship.And so, we want to make sure that our learners have the opportunities to micro-badge, to credential, and to experience learning in the cloud particularly, and also develop digital skills wherever and however they learn, not just in a prescriptive environment like a higher education environment.Corey: During the Great Recession, I found that as a systems administrator—which is what we called ourselves in the style of the time—I was relatively weak when it came to networking. So, I took a class at the local community college where they built the entire curriculum around getting some Cisco certifications by the time that the year ended. And half of that class was awesome. It was effectively networking fundamentals in an approachable, constructive way, and that was great. The other half of the class—at least at the time—felt like it was extraordinarily beholden to, effectively—there's no nice way to say this—Cisco marketing.It envisioned a world where all networking equipment was Cisco-driven, using proprietary Cisco protocols, and it left a bad smell for a number of students in the class. Now, I've talked to an awful lot of folks who have gone through the various AWS educational programs in a variety of different ways and I've yet to hear significant volume of complaint around, “Oh, it's all vendor captured and it just feels like we're being indoctrinated into the cult of AWS.” Which honestly is to your credit. How did you avoid that?Valerie: It's a great question, and how we avoid it is by starting with the skills that are needed for jobs. And so, we actually went back to employers and said, “What are your, you know, biggest and most urgent needs to fill in early-career talent?” And we categorized 12 different job categories, the four that were most predominant were cloud support engineer, software development engineer, cyber analyst, and data analyst. And we took that mapping and developed the skills behind those four different job categories that we know are saleable and that our learners can get employed in, and then made modifications as our employers took a look at what the skills maps needed to be. We then took the skills maps—in one case—into City University of New York and into their computer science department, and mapped those skills back to the curriculum that the computer science teams have been providing to students.And so, what you have is, your half-awesome becomes full-awesome because we're providing them the materials through AWS Academy to be able to proffer the right set of curriculum and right set of training that gets provided to the students, and provides them with the opportunity to then become AWS Certified. But we do it in a way that isn't all marketecture; it's really pragmatic. It's how do I automate a sequence? How do I do things that are really saleable and marketable and really point towards the skills that our employers need? And so, when you have this book-end of employers telling the educational teams what they need in terms of skills, and you have the education teams willing to pull in that curriculum that we provide—that is, by the way, current and it maintains its currency—we have a better throughway for early-career talent to find the jobs that they need, and the guarantee that the employers are getting the skills that they've asked for. And so, you're not getting that half of the beholden that you had in your experience; you're getting a full-on awesome experience for a learner who can then go and excite himself and herself or theirself into a new position and career opportunity.Corey: One thing that caught me a little bit by surprise, and I think this is an industry-wide phenomenon is, whenever folks who are working with educational programs—as you are—talk about, effectively, public education and the grade school system, you refer to it as ‘K through 12.' Well, last year, my eldest daughter started kindergarten and it turns out that when you start asking questions about cloud computing curricula to a kindergarten teacher, they look at you like you are deranged and possibly unsafe. And yeah, it turns out that for almost any reasonable measure, exposing—in my case—a now six-year-old to cloud computing concepts feels like it's close cousins to child abuse. So—Valerie: [laugh].Corey: So far, I'm mostly keeping the kids away from that for now. When does that start? You mentioned middle school a few minutes ago. I'm curious as to—is that the real entry point or are there other ways that you find people starting to engage at earlier and earlier ages?Valerie: We are seeing people engage it earlier and earlier ages with programs like Spark, as I mentioned, which is more of a gamified approach to K through 12 learning around digital skills in the cloud. code.org also has a tremendous body of work that they offer K through 12 learners. That's more modularized and building block-based so that you're not asking a six-year-old to master the art of cloud computing, but you're providing young learners with the foundations to understand how the building blocks of technology sit on top of each other to actually do something meaningful.And so, gears and pulleys and all kinds of different artifacts that learners can play with to understand how the inner workings of a computer program come together, for instance, are really experientially important and foundationally important so that they understand the concepts on which that's built later. So, we can introduce these concepts very early, Corey, and kids really enjoy playing with those models because they can make things happen, right? They can make things turn and they can make things—they can actually, you know, modify behaviors of different programming elements and really have a great experience working in those different programs and environments like code.org and Spark.Corey: There are, of course, always exceptions to this. I remember the, I think, it's the 2019 public sector summit that you folks put on, you had a speaker, Karthick Arun, who at the time was ten years old and have the youngest person to pass the certification test to become a cloud practitioner. I mean, power to him. Obviously, that is the sort of thing that happens when a kid has passion and is excited about a particular direction. I have not inflicted that on my kids.I'm not trying to basically raise whatever the cloud computing sad version is of an Olympian by getting them into whatever it is that I want them to focus on before they have any agency in the matter. But I definitely remember when I was a kid, I was always frustrated by the fact that it felt like there were guardrails keeping me from working with any of these things that I found interesting and wanted to get exposure to. It feels like in many ways the barriers are coming down.Valerie: They are. In that particular example, actually, Andy Jassy interceded because we did have age requirements at that time for taking the exam.Corey: You still do, by the way. It's even to attend summits and whatnot. So, you have to be 18, but at some point, I will be looking into what exceptions have to happen for that because I'm not there to basically sign them up for the bar crawl or have them get exposure to, like, all the marketing stuff, but if they're interested in this, it seems like the sort of thing that should be made more accessible.Valerie: We do bring learners on, you know, into re:Invent and into our summits. We definitely invite our learners in. I mean I think you mentioned, there are a lot of other places our learners are not going to go, like bar crawls, but our learners under the age of 18 can definitely take advantage of the programs that we have on offer. AWS Academy is available to 16 and up.And again, you know, GetIT and Spark and Educate is all available to learners as well. We also have programs like Skill Builder, with an enormous free tier of learning modules that teams can take advantage of as well. And then Labs for subscription and fee-based access. But there's over 500 courses in that free tier currently, and so there's plenty of places for our, you know, early learners to play and to experiment and to learn.Corey: This is a great microcosm of some career advice I recently had caused to revisit, which is, make friends in different parts of the organization you work within and get to know people in other companies who do different things because you can't reason with policy; you can have conversations productively with human beings. And I was basing my entire, “You must be 18 or you're not allowed in, full stop,” based solely on a sign that I saw when I was attending a summit at the entrance: “You must be 18 to enter.” Ah. Clearly, there's no wiggle room here, and no—it's across the board, absolute hard-and-fast rule. Very few things are. This is a perfect example of that. So today, I learned. Thank you.Valerie: Yeah. You're very welcome. We want to make sure that we get the information, we get materials, we get experiences out to as many people as possible. One thing I would also note, and I had the opportunity to spend time in our skill centers, and these are really great places, too, for early learners to get experience and exposure to different models. And so earlier, when we were talking, you held up a DeepRacer car, which is a very, very cool, smaller-scale car that learners can use AI tools to help to drive.And learners can go into the skill centers in Seattle and in the DC area, now in Cape Town and in other places where they're going to be opening, and really have that, like, direct-line experience with AWS technology and see the value of it tangibly, and what happens when you for instance, model to move a car faster or in the right direction or not hitting the side of a wall. So, there's lots of ways that early learners can get exposure in just a few ways and those centers are actually a really great way for learners to just walk in and just have an experience.Corey: Switching gears a little bit, one of my personal favorite hobby horses is to go on Twitter—you know, back when that was more of a thing—and mock companies for saying things that I perceived to be patently ridiculous. I was gentle about it because I think it's a noble cause, but one of the more ridiculous things that I've heard from Amazon was in 2020, you folks announced a plan to help 29 million people around the world grow their tech skills by 2025. And the reason that I thought that was ridiculous is because it sounded like it was such an over-the-top, grandiose vision, I didn't see a way that you could possibly get anywhere even close. But again, I was gentle about this because even if you're half-wrong, it means that you're going to be putting significant energy, resourcing, et cetera, into educating people about how this stuff works to help lowering bar to entry, about lowering gates that get kept. I have to ask, though, now that we are, at the time of this recording, coming up in the second half of 2023, how closely are you tracking to that?Valerie: We're tracking. So, as of October, which is the last time I saw the tracking on this data, we had already provided skills-based learning to 13-and-a-half million learners worldwide and are very much on track to exceed the 2025 goal of 29 million. But I got to tell you, like, there's a couple of things in there that I'm sure you're going to ask as a follow-up, so I'll go ahead and talk about it practically, and that is, what are people doing with the learning? And then how are they using that learning and applying it to get jobs? And so, you know, 29 million is a big number, but what does it mean in terms of what they're doing with that information and what they're doing to apply it?So, we do have on my team an employer engagement team that actually goes out and works with local employers around the world, builds virtual job fairs and on-prem job fairs, sponsors things like DeepRacer League and Cloud Quests and Jam days so that early-career learners can come in and get hands-on and employers can look at what the potential employees are doing so that they can make sure that they have the experience that they actually say they have. And so, since the beginning of this year, we have already now recruited 323 what we call talent shapers, which are the employer community who are actually consuming the talent that we are proffering to them and that we're bringing into these job fairs. We have 35,000 learners who have come through our job fairs since the beginning of the year. And then we also rely—as you know, like, we're very security conscious, so we rely on self-reported data, but we have over 3500 employed early-career talent self-reported job hires. And so, for us, the 29 million is important, but how it then portrays itself into AWS-focused employment—that's not just to AWS; these are by the way those 3500 learners who are employed went to other companies outside of AWS—but we want to make sure that the 29 million actually results in something. It's not just, you know, kind of an academic exercise. And so, that's what we're doing on our site to make sure that employers are actually engaged in this process as well.Corey: I want to bring up a topic that has been top-of-mind in relation to this, where there has been an awful lot of hue and cry about generative AI lately, and to the point where I'm a believer in this. I think it is awesome, I think it is fantastic. And even for me, the hype is getting to be a little over the top. When everyone's talking about it transforming every business and that entire industries seem to be pivoting hard to rebrand themselves with the generative AI brush, it is of some concern. But I'm still excited by the magic inherent to aspects of what this is.It is, on some level—at least the way I see it—a way of solving the cloud education problem that I see, which is that, today if I want to start a company and maybe I just got out of business school, maybe I dropped out of high school, doesn't really matter. If it involves software, as most businesses seem to these days, I would have to do a whole lot of groundwork first. I have to go and take a boot camp class somewhere for six months and learn just enough code to build something horrible enough to get funding so that then I can hire actual professional engineers who will make fun of what I've written behind my back and then tear it all out and replace it. On some level, it really feels like the way to teach people cloud skills is to lower the bar for those cloud skills themselves, to help reduce the you must be at least this smart to ride this amusement park ride style of metering stick.And generative AI seems like it has strong potential for doing some of these things. I've used it that way myself, if we can get past some of the hallucination problems where it's very confident and also wrong—just like, you know, many of the white engineers I've worked with who are of course, men, in the course of my career—it will be even better. But I feel like this is the interface to an awful lot of cloud, if it's done right. How are you folks thinking about generative AI in the context of education, given the that field seems to be changing every day?Valerie: It's an interesting question and I see a lot of forward movement and positive movement in education. I'll give you an example. One company in the Bay Area, Khan Academy is using Khanmigo, which is one of their ChatGPT and generative AI-based products to be able to tutor students in a way that's directive without giving them the answers. And so, you know, when you look at the Bloom's sigma problem, which is if you have an intervention with a student who's kind of on the fence, you can move them one standard deviation to the right by giving them, sort of, community support. You can move them two standard deviations to the right if you give them one-to-one mentoring.And so, the idea is that these interventions through generative AI are actually moving that Bloom's sigma model for students to the right, right? So, you're getting students who might fall through the cracks not falling through the cracks anymore. Groups like Houston Community College are using generative AI to make sure that they are tracking their students in a way that they're going into the classes that they need to go into and they're using the prerequisites so that they can then benefit themselves through the community college system and have the most efficient path towards graduation. There's other models that we're using generative AI for to be able to do better data analysis in educational institutions, not just for outcomes, but also for, you know, funding mechanisms and for ways in which educational institutions [even operationalized 00:21:21]. And so, I think there's a huge power in generative AI that is being used at all levels within education.Now, there's a couple of other things, too, that I think that you touched on, and one is how do we train on generative AI, right? It goes so fast. And how are we doing? So, I'll tell you one thing that I think is super interesting, and that's that generative AI does hold the promise of actually offering us greater diversity, equity, and inclusion of the people who are studying generative AI. And what we're seeing early on is that the distribution in the mix of men and women is far better for studying of generative AI and AI-based learning modules for that particular outcome than we have seen in computer science in the past.And so, that's super encouraging, that we're going to have more people from more diverse backgrounds participating with skills for generative AI. And what that will also mean, of course, is that models will likely be less biased, we'll be able to have better fidelity in generative AI models, and more applicability in different areas when we have more diverse learners with that experience. So, the second piece is, what is AWS doing to make sure that these modules are being integrated into curriculum? And that's something that our training and certification team is launching as we speak, both through our AWS Academy modules, but also through Skill Builder so those can be accessed by people today. So, I'm with you. I think there's more promise than hue and cry and this is going to be a super interesting way that our early-career learners are going to be able to interact with new learning models and new ways of just thinking about how to apply it.Corey: My excitement is almost entirely on the user side of this as opposed to the machine-learning side of it. It feels like an implementation detail from the things that I care about. I asked the magic robot in a box how to do a thing and it tells me, or ideally does it for me. One of the moments in which I felt the dumbest in recent memory has been when I first started down the DeepRacer, “Oh, you just got one. Now, here's how to do it. Step one, open up this console. Good. Nice job. Step two”—and it was, basically get a PhD in machine learning concepts from Berkeley and then come back. Which is a slight exaggeration, but not by much.It feels it is, on some level—it's a daunting field, where there's an awful lot of terms of art being bandied around, there's a lot that needs to be explained in particular ways, and it's very different—at least from my perspective—on virtually any other cloud service offering. And that might very well be a result of my own background. But using the magic thing, like, CodeWhisperer that suggests code that I want to complete is great. Build something like CodeWhisperer, I'm tapping out by the end of that sentence.Valerie: Yeah. I mean, the question in there is, you know, how do we make sure that our learners know how to leverage CodeWhisperer, how to leverage Bedrock, how to leverage SageMaker, and how to leverage Greengrass, right, to build models that I think are going to be really experientially sound but also super innovative? And so, us getting that learning into education early and making sure that learners who are being educated, whether they are currently in jobs and are being re-skilled or they're coming up through traditional or non-traditional educational institutions, have access to all of these services that can help them do innovative things is something that we're really committed to doing. And we've been doing it for a long time. I may think you know that, right?So, Greengrass and SageMaker and all of the AI and ML tools have been around for a long period of time. Bedrock, CodeWhisperer, other services that AWS will continue to launch to support generative AI models, of course, are going to be completely available not just to users, but also for learners who want to re-skill, up-skill, and to skill on generative AI models.Corey: One last area I want to get into is a criticism, or at least an observation I've been making for a while about Kubernetes, but it could easily be extended to cloud in general, which is that, at least today, as things stand—this is starting to change, finally—running Kubernetes in production is challenging and fraught and requires a variety of skills and a fair bit of experience having done this previously. Before the last year or so of weird market behavior, if you had Kubernetes in production experience, you could relatively easily command a couple $100,000 a year in terms of salary. Now, as companies are embracing modern technologies and the rest, I'm wondering how they're approaching the problem of up-leveling their existing staff from two sides. The first is that no matter how much training and how much you wind up giving a lot of those folks, some of them either will not be capable or will not have the desire to learn the new thing. And secondly, once you get those people there, how do you keep them from effectively going down the street with that brand new shiny skill set for, effectively, three times what they were making previously, now that they have those skills that are in wild demand across the board?Because that's simply not sustainable for a huge swath of companies out there for whom they're not technology companies, they just use technology to do the thing that their business does. It feels like everything is becoming very expensive in a personnel perspective if you're not careful. You obviously talk to governments who are famously not known for paying absolute top-of-market figures for basically any sort of talent—for obvious reasons—but also companies for whom the bottom line matters incredibly. How do you square that circle?Valerie: There's a lot in that circle, so I'll talk about a specific, and then I'll talk about what we're also doing to help learners get that experience. So, you talked specifically about Kubernetes, but that could be extracted, as you said, to a lot of other different areas, including cyber, right? So, when we talk about somebody with an expertise in cybersecurity, it's very unlikely that a new learner coming out of university is going to be as appealing to an employer than somebody who has two to three years of experience. And so, how do we close that gap of experience—in either of those two examples—to make sure that learners have an on-ramp to new positions and new career opportunities? So, the first answer I'll give you is with some of our largest systems integrators, one of which is Tata Consulting Services, who is actually using AWS education programs to upskill its employees internally and has upskilled 19,000 of its employees using education programs including AWS Educate, to make sure that their group of consultants has absolutely the latest set of skills.And so, we're seeing that across the board; most of our, if not all of our customers, are looking at training to make sure that they can train not only their internal tech teams and their early-career talent coming in, but they can also train back office to understand what the next generation of technology is going to mean. And so, for instance, one of our largest customers, a telco provider, has asked us to provide modules for their HR teams because without understanding what AI and ML is, what it does, and what how to look for it, they might not be able to then, you know, extract the right sets of talent that they need to bring into the organization. So, we're seeing this training requirement across the business and not just in technical requirements. But you know, bridging that gap with early-career learners, I think is really important too. And so, we are experimenting, especially at places like Miami Dade College and City University of New York with virtual internships so that we can provide early-career learners with experiential learning that then they can bring to employers as proof that they have actually done the thing that they've said that they can demonstrate that they can do.And so, companies like Parker Dewey and Riipen and Forage and virtual internships are offering those experiences online so that our learners have the opportunity to then prove what they say that they can do. So, there's lots of ways that we can go about making sure learners have that broad base of learning and that they can apply it. And I'll tell you one more thing, and that's retention. And we find that when learners approach their employer with an internship or an apprenticeship, that their stickiness with that employer because they understand the culture, they understand the project work, they've been mentored, they've been sponsored, that they're stickiness within those employers it's actually far greater than if they came and went. And so, it's important and incumbent on employers, I think, to build that strong connective tissue with their early-skilled learners—and their upskilled learners—to make sure that the skills don't leave the house, right? And that is all about making sure that the culture aligns with the skills aligns, with the project work, and that it continues to be interesting, whether you're a new learner or you're a re-skilled learner, to stay in-house.Corey: My last question for you—and I understand that this might be fairly loaded—but I can't even come up with a partial list that does it any justice to encapsulate the sheer number of educational programs that you have in flight for a variety of different folks. The details and nuances of these are not something that I store in RAM, so I find that it's very easy to talk about one of these things and wind up bleeding into another. How do you folks keep it all straight? And how should people think about it? Not to say that you are not people. How should people who do not work for AWS? There we go. We are all humans here. Please, go [laugh] ahead.Valerie: It's a good question. So, the way that I break it down—and by the way, you know, AWS is also part of Amazon, so you know, I understand the question. And we have a lot of offerings across Amazon and AWS. AWS education programs specifically, are five. And those five programs, I've mentioned a few today: AWS Academy, AWS Educate, AWS re/Start, GetIT, and Spark are free, no-fee programs that we offer both the community and our education providers to build curriculum to offer digitally, and cloud-based skills curriculum to learners.We have another product that I'm a huge fan of called Skill Builder. And Skill Builder is, as I mentioned before, an online educational platform that anybody can take advantage of the over 500 classes in the free tier. There's learning plans for a lot of different things, and some I think you'd be interested in, like cost optimization and, you know, financial modeling for cloud, and all kinds of other more technically-oriented free courses. And then if learners want to get more experience in a lab environment, or more detailed learning that would lead to, for instance a, you know, certification in solutions architecture, they can use the subscription model, which is very affordable and provides learners an opportunity to work within that platform. So, if I'm breaking it down, it really is, am I being educated and in a way that is more formalized or am I going to go and take these courses when I want them and when I need them, both in the free tier and the subscription tier.So, that's basically the differences between education programs and Skill Builder. But I would say that if people are working with AWS teams, they can also ask teams where is the best place to be able to avail themselves of education curriculum. And we're all passionate about this topic and all of us can point users in the right direction as well.Corey: I really want to thank you for taking the time to go through all the things that you folks are up to these days. If people want to learn more, where should they go?Valerie: So, the first destination, if they want cloud-based learning, is really to take a look at AWS training and certification programs, and so, easily to find on aws.com. I would also point our teams—if they're interested in the tech alliances and how we're formulating the tech alliances—towards a recent announcement between City University of New York, the New York Jobs CEO Council, and the New York Mayor's Office for more details about how we can help teams in the US and outside the US—we also have tech alliances underway in Egypt and Spain and other countries coming on board as well—to really, you know, earmark how government and educational institutions and employers can work together.And then lastly, if employers are listening to this, the one output to all of this is that you pointed out, and that's that our learners need hands-on learning and they need the on-ramp to internships, to apprenticeships, and jobs that really are promotional for, like, career talent. And so, it's incumbent, I think, on all of us to start looking at the next generation of learners, whether they come out of traditional or non-traditional means, and recognize that talent can live in a lot of different places. And we're very happy to help and happy to do that matchup. But I encourage employers to dig deeper there too.Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time out of your day to speak with me about all this. I really appreciate it.Valerie: Thank you, Corey. It's always fun to talk to you.Corey: [laugh]. Valerie Singer, GM of Global Education at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment telling me exactly which AWS service I should make my six-year-old learn about as my next step in punishing her.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
In this episode of Infrastructure Matters – Insider Edition, Steven Dickens is joined by AWS's Ajay Nair, General Manager, AWS Lambda, for a conversation focusing on the topic of serverless computing and its place within AWS's broader portfolio. Ajay explains that serverless has evolved from its initial definition as a way to run code without managing servers to a broader operational model focused on delivering value to customers without getting bogged down in managing infrastructure. He emphasizes that serverless allows customers to delegate outcomes like security, scale, performance, and availability to AWS experts, enabling them to focus on their unique business needs. Their discussion covers: Definition of Serverless: Serverless is an operational model that enables businesses to run or build applications without the need to manage low-level infrastructure. It allows customers to delegate infrastructure responsibilities to AWS, freeing them to concentrate on delivering value to their customers. AWS's Evolving Role: AWS has evolved to meet the diverse needs of its customers. Some customers require differentiated infrastructure and hardware, while others seek a more hands-off approach. AWS provides a spectrum of choices, from fully managed serverless services like Lambda to more hands-on options like EC2 instances, allowing customers to select what works best for their workloads. Benefits of Serverless: Customers adopting serverless benefit from lower total cost of ownership, elasticity, reliability, and speed. Serverless enables them to focus on innovation and faster delivery of applications, as AWS takes care of infrastructure management, performance optimization, and security. Serverless Across AWS's Portfolio: AWS is extending the serverless operational model across its entire portfolio, not just infrastructure. This includes databases (e.g., Redshift Serverless and DynamoDB), IoT services, machine learning platforms (e.g., SageMaker), and industry-specific solutions (e.g., healthcare). AWS aims to provide a range of serverless options to meet the needs of different application classes. Ajay Nair encourages customers to think "serverless first" for new development projects, emphasizing that serverless computing brings agility and cost efficiency to AWS users, allowing them to innovate faster and do less manual infrastructure management.
Want to help define the AI Engineer stack? Have opinions on the top tools, communities and builders? We're collaborating with friends at Amplify to launch the first State of AI Engineering survey! Please fill it out (and tell your friends)!If AI is so important, why is its software so bad?This was the motivating question for Chris Lattner as he reconnected with his product counterpart on Tensorflow, Tim Davis, and started working on a modular solution to the problem of sprawling, monolithic, fragmented platforms in AI development. They announced a $30m seed in 2022 and, following their successful double launch of Modular/Mojo
Join me in an engaging live stream as I sit down with Alexander Altenhofen, Director of Product and Technology at Deutsche Fussball Liga (DFL). We'll uncover the secrets behind Bundesliga's rise as the world's most digitally-driven football league.Dive deep into:• "Glass to Grass" Strategy: Explore the fascinating journey of content from stadium cameras to the screens of millions.• The AWS Partnership: Discover how more than 80 AWS services have been employed to revolutionize the Bundesliga experience for fans across the globe.• The Personalized Fan Experience: Delve into the league's use of AWS services like Lambda, Fargate, and Sagemaker to capture real-time "match facts" that offer deep insights into game analytics and player performances.• Cost Optimization: Hear firsthand from Altenhofen about DFL's mission to boost efficiency and reallocate budget for strategic innovation without compromising the user experience.• The Future: Get a sneak peek into DFL's future initiatives, from generative AI content creation to personalized user experiences tailored for each Bundesliga fan.Get ready for a behind-the-scenes look at the future of football (or soccer) and how technology is transforming the way we engage with sports.
AWS Morning Brief for the week of August 21, 2023 with Corey Quinn. Links: Corey is performing a live Q&A next month; submit your questions here! Amazon Polly launches new Gulf Arabic male NTTS voice AWS HealthOmics supports cross-account sharing of omics analytics stores New – Amazon EC2 M7a General Purpose Instances Powered by 4th Gen AMD EPYC Processors Amazon OpenSearch Serverless expands support for larger workloads and collections Reduce Lambda cold start times: migrate to AWS SDK for JavaScript v3 Architecting for Resilience in the cloud for critical railway systems How Amazon Shopping uses Amazon Rekognition Content Moderation to review harmful images in product reviews Zero-shot text classification with Amazon SageMaker JumpStart Build a multi-account access notification system with Amazon EventBridge Getting Started with CloudWatch agent and collectd Cost considerations and common options for AWS Network Firewall log management Addressing gender inequity in the technology industry
Welcome episode 221 of The Cloud Pod podcast - where the forecast is always cloudy! This week your hosts, Justin, Jonathan, Ryan, and Matthew look at some of the announcements from AWS Summit, as well as try to predict the future - probably incorrectly - about what's in store at Next 2023. Plus, we talk more about the storm attack, SFTP connectors (and no, that isn't how you get to the Moscone Center for Next) Llama 2, Google Cloud Deploy and more! Titles we almost went with this week: Now You Too Can Get Ignored by Google Support via Mobile App The Tech Sector Apparently Believes Multi-Cloud is Great… We Hate You All. The cloud pod now wants all your HIPAA Data The Meta Llama is Spreading Everywhere The Cloud Pod Recursively Deploys Deploy A big thanks to this week's sponsor: Foghorn Consulting, provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you have trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.
In this special two-part episode with guest Stefano Marzani, the WW Tech Lead for Software Defined Vehicles from AWS, we discuss the importance of cloud technology as applied to vehicles. In this Part 2, we discuss Prototyping, Machine Learning/AI, and ADAS/Autonomous Driving. In, Part 1, published previously (LINK) we meet Stefano, learn about his important role at AWS and cover the first two topics: Data and Compute. Links referenced in the show: AWS Automotive homepage: www.aws.com/automotive AWS All Things Automotive Podcast: https://aws.amazon.com/architecture/all-in-series/all-things-automotive/?all-in-livestream-cards.sort-by=item.additionalFields.sortDate&all-in-livestream-cards.sort-order=desc&awsf.products=*all&awsf.tech-category=*all AWS Case study with BMW https://aws.amazon.com/solutions/case-studies/bmw-group-case-study/ Part 2 Chapters: 00:00 - Overview 00:40 - Topic 3: Prototyping in the cloud and “Environmental Parity” 02:20 - Running on Arm in the cloud and the vehicle 03:13 - Start software development in the cloud 03:55 - Difficulty of validation for automotive 04:49 - Empowering developers via the cloud 05:18 - Vehicle software will triple in the next five years 07:07 - “Shift left” to accelerate design cycles 07:54 - $40-50 Billion lost to automotive recalls 08:51 - Developing vehicle HMI with the cloud 10:57 - SDV can reduce cost of recalls 11:43 - Consolidation of ECUs 12:22 - Consolidating software effectively without bloat 13:45 - SOAFEE and standards for automotive 15:31 - Topic 4: Machine Learning and Analytics 16:45 - Data collection and annotation with SageMaker 17:22 - Other frameworks for ML in AWS 17:52 - In-vehicle validation of new models 18:30 - Finding edge cases 19:20 - Tools to manage ML workflows 21:55 - Topic 5: ADAS and Autonomous Driving 22:35 - Near-term benefits from ADAS 25:00 - Seeking edge cases 25:30 - Decomposing autonomous driving 27:30 - Collaboration between AWS and Sonatus 29:29 - Importance of using real ECUs 31:04 - Summary
Amazon SageMaker Multi-Model Endpoint (MME) is fully managed capability of SageMaker Inference that allows customers to deploy thousands of models on a single endpoint and save costs by sharing instances on which the endpoints run across all the models. Until recently, MME was only supported for machine learning (ML) models which run on CPU instances. Now, customers can use MME to deploy thousands of ML models on GPU based instances as well, and potentially save costs by 90%. MME dynamically loads and unloads models from GPU memory based on incoming traffic to the endpoint. Customers save cost with MME as the GPU instances are shared by thousands of models. Customers can run ML models from multiple ML frameworks including PyTorch, TensorFlow, XGBoost, and ONNX. Customers can get started by using the NVIDIA Triton™ Inference Server and deploy models on SageMaker's GPU instances in “multi-model“ mode. Once the MME is created, customers specify the ML model from which they want to obtain inference while invoking the endpoint. Multi Model Endpoints for GPU is available in all AWS regions where Amazon SageMaker is available. To learn more checkout: Our launch blog: https://go.aws/3NwtJyh Amazon SageMaker website: https://go.aws/44uCdNr
Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training Test & Code Podcast Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: Plumbum: Shell Combinators and More Suggested by Henry Schreiner last week. (Also, thanks Michael for the awesome search tool on PythonBytes.fm that includes transcripts, so I can find stuff discussed and not just stuff listed in the show notes.) Plumbum is “ a small yet feature-rich library for shell script-like programs in Python. The motto of the library is “Never write shell scripts again”, and thus it attempts to mimic the shell syntax (shell combinators) where it makes sense, while keeping it all Pythonic and cross-platform.” Supports local commands piping redirection working directory changes in a with block. So cool. lots more fun features Michael #2: Our plan for Python 3.13 The big difference is that we have now finished the foundational work that we need: Low impact monitoring (PEP 669) is implemented. The bytecode compiler is a much better state. The interpreter generator is working. Experiments on the register machine are complete. We have a viable approach to create a low-overhead maintainable machine code generator, based on copy-and-patch. We plan three parallelizable pieces of work for 3.13: The tier 2 optimizer Enabling subinterpreters from Python code (PEP 554). Memory management Details on superblocks Brian #3: Some blogging myths Julia Evans myths (more info of each in the blog post): you need to be original you need to be an expert posts need to be 100% correct writing boring posts is bad you need to explain every concept page views matter more material is always better everyone should blog I'd add Write posts to help yourself remember something. Write posts to help future prospective employers know what topics you care about. You know when you find a post that is outdated and now wrong, and the code doesn't work, but the topic is interesting to you. Go ahead and try to write a better post with code that works. Michael #4: Jupyter AI A generative AI extension for JupyterLab An %%ai magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, VSCode, etc.). A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. Support for a wide range of generative model providers and models (AI21, Anthropic, Cohere, Hugging Face, OpenAI, SageMaker, etc.). Official project from Jupyter Provides code insights Debug failing code Provides a general interface for interaction and experimentation with currently available LLMs Lets you collaborate with peers and an Al in JupyterLab Lets you ask questions about local files Video presentation: David Qiu - Jupyter AI — Bringing Generative AI to Jupyter | PyData Seattle 2023 Extras Brian: Textual has some fun releases recently Textualize youtube channel with 3 tutorials so far trogon to turn Click based command line apps into TUIs video example of it working with sqlite-utils. Python in VSCode June Release includes revamped test discovery and execution. You have to turn it on though, as the changes are experimental: "python.experiments.optInto": [ "pythonTestAdapter", ] I just turned it on, so I haven't formed an opinion yet. Michael: Michael's take on the MacBook Air 15” (black one) Joke: Phishing
Welcome to the newest episode of The Cloud Pod podcast! Justin, Ryan, Jonathan, Matthew are your hosts this week as we discuss all things cloud and AI, as well as Amazon Detective, SageMaker, AWS Documentation, and Google Workstation. Titles we almost went with (and there's a lot this week)
Welcome to the newest episode of The Cloud Pod podcast! Justin, Ryan, Jonathan, Matthew are your hosts this week. Join us as we discuss all things cloud, AI, the upcoming Google AI Conference, AWS Console, and Duet AI for Google cloud. Titles we almost went with this week:
Dominic Holt is CEO of harpoon, a drag-and-drop Kubernetes tool for deploying any software in seconds. Victoria talks to Dominic about commoditizing DevOps as a capability, coming up with the idea for drag and drop just thinking through how he could do these things in a visual and intuitive way, and using Kubernetes as a base for Harpoon. Harpoon (https://www.harpoon.io/) Follow Harpoon on Facebook (https://www.facebook.com/harpothewhale/), or LinkedIn (https://www.linkedin.com/company/harpooncorp/). Follow Dominic Holt on LinkedIn (https://www.linkedin.com/in/dominicholt/) or Twitter (https://twitter.com/xReapz). Follow thoughtbot on Twitter (https://twitter.com/thoughtbot) or LinkedIn (https://www.linkedin.com/company/150727/). Become a Sponsor (https://thoughtbot.com/sponsorship) of Giant Robots! Transcript: VICTORIA: This is the Giant Robots Smashing Into Other Giant Robots Podcast, where we explore the design, development, and business of great products. I'm your host, Victoria Guido. And with me today is Dominic Holt, CEO of harpoon, a drag-and-drop Kubernetes tool for deploying any software in seconds. Dominic, thank you for joining me. DOMINIC: Yeah, of course. Thanks for having me, Victoria. VICTORIA: Yes, I'm really excited to talk all about what Kubernetes is. And I have Joe Ferris, the CTO of thoughtbot, here with me as well to help me in that process. JOE: Hello. VICTORIA: Excellent. Okay, so, Dominic, why don't you just tell me how it all got started? What led you to start harpoon? DOMINIC: I got into the DevOps space fairly early. It was, I don't know, probably 2012 timeframe, which sounds like not that long ago. But, I mean, DevOps is also still a baby. So I have a software background. And I was starting to figure out how to do the continuous; I guess, automated way of standing up cloud infrastructure for Lockheed Martin at the time because people didn't know how to do that. There weren't a lot of tools available, and nobody knew what DevOps was. And if you said it to somebody, they would have slapped you. VICTORIA: Aggressive. [laughs] DOMINIC: [laughs] Maybe not, maybe not. Maybe they'd be nicer about it. But anyway, nobody knew what DevOps was because it wasn't coined yet. And I started realizing that this was not some system administration voodoo. It was just common sense from a software development standpoint. And I ended up leaving Lockheed shortly thereafter and going and working for a small business here in San Diego. And I said, I have no idea what any of this stuff is, but we're going to do it because, in a few years, everybody's going to be doing it because it's common sense. So we did. We grew quite a large practice in consulting and DevOps, among other things. And predominantly, I was working with the U.S. Navy at the time, and they needed a standardized way to deploy software to aircraft carriers and destroyers, the ships out there in the ocean. And so, I came up with a design for them that used Kubernetes. And we built a pipeline, a CI/CD pipeline, to automatically deploy software from the cloud to Navy ships out in the ocean on top of Kubernetes. And everything worked great. And it was there, and we tested it. But at the end of the day, handing over the maintenance, what we call day two ops, proved to be troubling. And it never quite made it onto the ships in the way that we wanted. So after that, I did a bunch of consulting with other groups in the Navy, and the Air Force, and Space Force, and all kinds of different groups across the government. And I also started consulting in commercial, fortune 500, startups, everything. And I just saw that this problem was really pervasive, handling the day two operations. You get everything up and running, but then maintaining it after that was just complicated for people because all of the DevOps implementations are snowflakes. So if you go from Company A to Company B, they look nothing alike. And they may have a lot to do with somebody named Jim or Frank or Bob and how they thought was the best way to do it. And so, running a DevOps consultancy myself, I just knew how hard it was to find the talent, and how expensive they were, and how hard it was to keep them because everyone else was trying to hire my talent all the time. And I just thought to myself, all of this is completely untenable. Somebody is going to commoditize DevOps as a capability. And what would that look like? VICTORIA: Right. I'm familiar with the demand for people who know how to build the infrastructure and systems for deploying and running software. [laughs] And I like how you first talked about DevOps, just it being common sense. And I remember feeling that way when I went to my first DevOps DC meetup. I was like, oh, this is how you're supposed to build teams and organizations in a way to run things efficiently and apply those principles from building software to managing your infrastructure. DOMINIC: Yeah. Well, I had lived the life of an enterprise software developer for quite a while before then. And I had gone through that whole process they talk about in all of DevOps bibles about why it is we're doing this, where the software development team would have their nice, fancy dev laptops. And the operations team with the pagers or whatever would be the ones managing the servers. And the software developers were never really sure exactly how it was going to work in production, but were like; I'm just going to throw it over the fence and see what the ops people do. And inevitably, the ops people would call us very angrily, and they would say, "Your software doesn't work." And then, of course, we would say that the ops people are all crazy because it works just fine here on my laptop, and they just don't know what they're doing. And, I mean, we would just fight back and forth about this for six months until somebody figured out that we were running the wrong version of some dependency in the software on the ops side, and that's why it didn't work. So that process is just crazy, and nobody in their right mind would want to go through it if they could avoid it. VICTORIA: Right. I'm sure Joe has had some stories from his time at thoughtbot. JOE: Yeah, certainly. I was interested by what you said about working with...I think it was Frank, and Ted, and Bob. I've definitely worked with all those people in their own snowflakes. And one of the things that drew me to Kubernetes is that it was an attempt to standardize at least some of the approaches or at least provide anchor points for things like how you might implement networking, and routing, and so on. I'm interested to hear, you know, for a drag-and-drop solution, even though Kubernetes was meant to standardize a lot of things, there are a lot of different Kubernetes distributions. And I think there are still a lot of Kubernetes snowflakes. I'm curious how you manage to tackle that problem with a drag-and-drop solution to hit the different Kubernetes distributions out there. DOMINIC: Yeah, I mean, I think you nailed it, Joe. Standing up Kubernetes is a little bit complicated still these days. It's been made a lot easier by a lot of different companies, and products, and open-source software, and things like that. And so I see a lot of people getting up basic Kubernetes clusters these days. But then you look at companies like ARMO that are doing compliance scans and security scans on Kubernetes clusters, and they're making the claim that 100% of the Kubernetes clusters they scan are non-compliant [laughs] and have security issues. And so that just goes to show you all of the things that one has to know to be successful just to stand up a cluster in the first place. And even when I...like for a client or something, over the years, if I was standing up a Kubernetes cluster and a lot of it was automated, you know, we used Terraform and Ansible, and all the other best practices under the hood. A lot of the response I got back when we handed over a cluster to a client was, "Okay, now what?" There are still a lot of things you have to learn to maintain that cluster, keep it up to date, upgrade the underlying components of the cluster, deploy the software, configure the software, all those things. And can you learn these things? Absolutely. Like, they're not rocket science, but they're complicated. And it is a commitment that you have to make as an individual if you're going to become proficient in all of these things and managing your own cluster. And so we were just...we had done this so many times at different companies I had worked with, for different clients, and seeing how all of the different pieces work together and where clients were having problems and what really hung people up. And so I just started thinking to myself, how would you make that easier? How would you make that more available to the pizza guy or an 18-year-old with no formal training that's on a ship in the ocean? And that's why I came up with the idea for drag and drop, just thinking through how can I do these things in a visual way that is going to be intuitive for people? VICTORIA: Well, I have, obviously, a very thorough understanding of Kubernetes, [laughs] just kidding. But maybe explain a little bit more about to a founder why should they invest in this type of approach when they're building products? DOMINIC: So I think that's a great question. What I find these days is DevOps is almost a requirement to do business these days in some sort of nimble way. So you have to...whether you're a large enterprise or you're a garage startup, you need to be able to change your software to market forces, to stuff that's happening in the news, to your customers don't like something. So you want to change it to something else quickly or pivot because if something happens, you can get your day in the sun, or you can capitalize on something that's happening. And so the difficulty is I think a lot of people have an impression that DevOps scripts are sort of like a build once and forget type of thing, and it'll just work thereafter. But it's actually software, and I like to think of software as living organisms. You have to take care of them like they're people, almost because if you don't, they'll become brittle and unhealthy over time. If you have a child, you have to feed them probably multiple times a day, brush their teeth. You got to tuck them in at night. You have to be nice to them. You have to do all the things that you would do with a child. But with software as well, if you just take the quick route, and quick fix things, and hack, and take shortcuts, eventually, you're going to have a very unhealthy child on your hands, and they're going to have behavior problems. At the end of the day, you have all these DevOps scripts, and they can be quite complex together. And you have to take care of them like they're your own child. And the problem is you're also taking care of your software products like it's your child. And so now you're taking care of two children. And as somebody that has two children, I can tell you that things become much more complicated when two children are having behavioral problems than just one. And you're at the store, and it's very embarrassing. So I guess the point is that harpoon is a capability that can basically take care of your second child for you, which is your DevOps deployments. And then you can just focus on the one child that you, I mean, this is turning into a terrible analogy at this point. [laughter] But you should love all of your children equally. But, in this case, you're looking to take care of your products and get it out there, and harpoon is something that can take care of your DevOps software for you. VICTORIA: I agree. I think when your software or children are problematic, it's more than just embarrassing sometimes. It can create a lot of financial and legal liability as well. From your research, when you're building this product and, like, who's going to be interested in buying this thing, is that something that people are concerned about? DOMINIC: Yeah, absolutely. I mean, the fact that we can stand up your cluster for you, stand up all of your cloud infrastructure for you, and then dynamically generate all of the configuration as code as well, and how to open those things securely up to the network and control everything such that you're not going to accidentally do something that's really bad, can definitely help out a lot of people. The interest has been really overwhelming from so many different groups and organizations. We have people that are interested in the Department of Defense in both the U.S. and other countries. We have fortune 500 companies that see this as a pathway to accelerate digital transformation for legacy applications or even to use it as a sandbox, so people aren't bugging Frank, and Joe, and Bob, who run the Kubernetes clusters in production. We have startups who see it just as a way to skip over the whole DevOps thing and work on getting a product-market fit so that they have a production environment that just works out of the box. So it's been really interesting seeing all the different use cases people are using harpoon for and how it's helped them in some way get to some and realize some goal that they have. JOE: I'm curious if it's been a challenge as somebody managing the underlying infrastructure as sort of a plug-and-play thing. One experience I've had working more on the operations side of DevOps is that everything becomes your problem. Like, if the server misbehaves, if there's a database crash, whatever, certainly, that's your problem. But also, if the application is murdering your database, that becomes your problem. And it's really an application problem. But it surfaces visibly in the infrastructure when the CPU spikes and it stops responding to requests. And so, how do you navigate that agreement with your users? How do you balance what's your responsibility versus theirs to not kill the cluster? DOMINIC: One thing that's great about Kubernetes and why it's a great base for our product is that Kubernetes is really good at keeping things running. Certainly, there are catastrophic things that can happen, like an entire region of EC2 and Amazon Web Services goes down. And that is, obviously, if you have your clusters only running in that particular region, you're going to have a bad day. So there are things beyond our control. I mean, those things are also covered by the service-level agreement, the SLA with AWS, since you're using your own AWS account when you're utilizing harpoon. So it's like a hybrid SaaS where we deploy everything into your account, and you own it. And you can adjust those infrastructure things on your own as you'd like. So from that standpoint, you're kind of covered with your agreement with AWS as an example of a cloud service provider. And certainly, Kubernetes also kind of knows what to do in some of those instances where you have a container that is murdering everything. In a lot of cases, it can be configured to, you know, just die or go into a CrashLoopBackOff or something if it's just taking up all your resources in the cluster versus destroying your entire cluster in a great fireworks display. So we put some of those protections into the platform as well. But yeah, to your point, being an ops person is a difficult job because we're usually the ones [laughs] that get blamed for everything when something bad happens, even though sometimes it's the software team's fault or sometimes it's even just the infrastructure you're built on. Occasionally, AWS services and Google Cloud and Azure services do go down, and things happen. We've had instances, even during harpoon development, where we're testing harpoon late at night on AWS, and sometimes AWS does wonky things at night that people don't realize. It's not completely perfect capability. And we're like, oh, why does it only happen at 11:58 on Tuesdays? Oh, because AWS updates their servers during that time, and it slows down everything. It's still good to understand all the underlying components and how they work, and that could certainly help you regardless of if you use harpoon or not. But ultimately, we're just trying to make it easier for people. They can spend less time focusing on those things. We can help them with a lot of those problems that might occur, and they can focus on their software. VICTORIA: Great. I think that's...it's interesting to me to always hear about all the different challenges in managing operations of software. So I like that you're working on this space. It's clearly a space that needs more innovation, you know, we're working on it here at thoughtbot as well. Has there been anything in your, like, any theory that you had going into your initial research that when you talked to customers surprised you and caused you to change your direction? DOMINIC: Yeah. I mean, we run the gamut there. So we did a lot of early customer discovery to try to figure out who might be interested in this product. And so, our first thought was that startups would be the most interested in this product because they're building something new. They just want to get it out there. They want to build their MVP, and they just want to throw it on the internet and get it rolling and not have to worry about whether the software is up and down while they're doing a bunch of sales calls. Because really, during the MVP phase, if you're doing lean startup-style company development, then you really just want to be selling. You want to always be selling. And so we thought it would just be a no-brainer for startups. And we talked to a lot of startups, and some startups for sure thought it was valuable. But a lot of them were like, "Yeah, that's cool, but we don't care about DevOps. [chuckles] We don't care about anything. Like, I'll run it on my laptop if I have to. The only thing I care about is finding product-market fit and getting that first sale." And so, at least as far as the very first customers that we were looking for, they weren't the best fit. And then we went and talked to a bunch of mid-market companies because we just decided to go up to the next logical level. And so mid-market companies were very interested because a lot of them were starting to eyeball Kubernetes and maybe sort of migrate some of their capabilities over there. Maybe they had a little bit of ability to be a bit nimble, in that sense, versus some of the enterprise customers. And so they were very interested in it. But a lot of them were very risk averse, like, go find a bunch of enterprise customers that will buy it, and then we'll buy it. And so then we went to talk to the enterprise customers. And that was sort of like an eye-opening time for us because the enterprise customers just got it. They were like, "Yeah, I'm trying to migrate legacy capabilities we built 10 or 15 years ago to the cloud. We're trying to containerize everything and refactor our existing software. I got to redesign the user interface that was built ten years ago." And if somebody's got a DevOps easy button, then sign me up. I would like to participate because I can't spell Kubernetes yet, but I definitely know what it is, and I want to use it. So working with the enterprise customers was really great for us because it showed us what the appetite was in the market and who was going to immediately benefit from it. And then, ultimately, that rolls down to the mid-market companies. And maybe later-stage startups as well are starting to find a lot of value in the platform from, you know, have maybe started finding some product-market fit and care a little bit about whether people can access my software and it's maintainable and available. And so we can definitely help with that. VICTORIA: That's super interesting, and it aligns with my experience as well, coming from consulting companies and the federal government who are working on digital services, and DevOps, and agile, and all of those transformational activities. And so it's been five years, it looks like since you started harpoon. What advice would you give to yourself if you could travel back in time when you were first starting the project? DOMINIC: So I made lots of mistakes along the way. I'll inevitably make more. But when I first started building this thing, I wasn't even sure how it was going to work. Kubernetes can be a bit of a fickle beast, and it wasn't really built to have a drag-and-drop UI on top of it. And so there are lots of things that could go wrong, trust me, [laughs] I learned them. But building an initial prototype, like, the very base of can the capability work at all, came together pretty quickly. It was maybe three or four months of development during my nights and weekends. And building an enterprise scalable product took quite a bit longer. But once I had an initial capability, I was very excited because, again, I didn't even know if this was possible, certainly not five or six years ago. So I didn't even really want to raise a round or make money. I do know how venture capital works. So it wasn't even my expectation that people would want to give me money because all I had was an MVP and no product-market fit. And I had just thrown it together in three or four months. But I was just excited about it. I'm a software developer at heart, and technology excites me. And solving problems is kind of what gets me up in the morning. So I just called all the people I knew, a bunch of VCs, other people, and they're like, "Yeah, I would like to see that. Let's set up a time." And so I think maybe they interpreted that as, like, I want to do a pitch to you for money. [laughs] And I just proceeded to go to, like, this dog and pony show of showing a bunch of people this thing I built, and I thought they would just understand it and get what I was doing. And I just proceeded to get my ass handed to me over and over and over again. Like, "This isn't that great of a product. How much money are you making?" Blah, blah, blah, blah. I'm like, "No, no, you don't get it. I just started. It's just a prototype at this stage. It's not even a finished product." And they're like, "Well, you're definitely going to fail. [laughter] You're wasting your time. What are you even doing here?" And so that was...I like to think that I have thick skin, but that's hard to hear as an entrepreneur; just people don't get your vision. They don't understand what it is you're building and why it's going to be valuable to people. And it could be a long time before you get to a point where people can even understand what it is you're doing, and you just have to sort of stay the course and, I mean, I did. I went around on some rock somewhere and hung out in a tent on an island for a while. I just kept going. And you just got to pour all your heart and soul, and effort into building a product if you want to make it exist out there in the world. And a lot of people are not going to get it, but as long as you believe in it and you keep pushing, then maybe someday they will get it. For the first year after we had a working enterprise-grade product, we kind of did a soft launch. And we had a small set of customers. We had 8 to 10 people that were sort of testing it out and using it, things like that. We kind of went, you know, more gangbusters launch at the end of last year, and it was crazy. And then...what? I don't know, maybe 60 days since we did a more serious launch. And we have gone from our ten soft users to 2,000 users. VICTORIA: Wow. Well, that's great growth. And it sounds exciting that you have your team in place now. You're able to set yourself up for growth. Mid-Roll Ad: Are your engineers spending too much time on DevOps and maintenance issues when you need them on new features? We know maintaining your own servers can be costly and that it's easy for spending creep to sneak in when your team isn't looking. By delegating server management, maintenance, and security to thoughtbot and our network of service partners, you can get 24x7 support from our team of experts, all for less than the cost of one in-house engineer. Save time and money with our DevOps and Maintenance service. Find out more at: url tbot.io/devops VICTORIA: So now that you're getting more established, you're getting more customers, you have a team supporting you on the project; what parts of the DevOps culture do you feel like are really important to making a team that will continue to grow? DOMINIC: I've been an individual contributor for a long period of time. I was a first-level manager and managed people. At a very granular personal level, I've been a director, and a VP, and a CTO at a bunch of different places. And so all of those different roles and different companies that I've worked at have taught me a lot about people, and teams, and culture, and certainly about hiring. I think hiring is the absolute most important thing you can do in a company, and definitively in a software company. Because there are just certain people that are going to mesh well with your culture, and the people that do and that are driven and passionate about what they do, they're just going to drive your company forward. And so I just spend a lot of my time when we need to grow as a company, which happens here and there, really focusing on who is going to be the best next person to bring on to the company. And usually, I'm thinking about this far in advance because whenever we do need that person, I don't want to have to start thinking about it. I want to just know, like, it is Frank, it is Bob, it is Jamey, or Alex, or whoever else. Because it is...at a personal level, there has to be people who are very aligned with your visions, and your values, and your culture, and they care and are going to push the company forward. And if you're just hiring people with a quick coding interview and a 30-minute culture fit session, you're going to make a lot of hiring mistakes. You're going to find people who are just looking for a nine-to-five or things like that, and, I mean, there's nothing wrong with that. But in a startup especially, you really need people who buy into the vision and who are going to push the thing forward. And I'm looking for people who just care, like; they have an ownership mentality. Maybe in a different lifetime or a different part of their career, they'd be an entrepreneur at their own company. But you just give them stuff, and they're like, cool, this is mine. I'm going to take care of this. It's now my child. I will make sure that it grows up and it is healthy and goes to a good university. Those are the type of people that you want in your company, people that you would trust with your children. So those are the criteria for working at harpoon, I guess. VICTORIA: Yeah, that's good. So what does success look like in the next six months or even beyond the next five years? DOMINIC: I think it's still very early market for us. Certainly, we have an explosive growth of users using the platform, and that's really heartening to see. That's really awesome that people want to use the thing that you built. But again, there are so many companies out there and organizations that are still not even doing DevOps. They're just doing manual deployments, maintaining clusters manually, not using containers or Kubernetes. Not to say that you have to use these things and that they're a panacea, and they work in every sense because they don't. But obviously, there's been a major shift in the industry towards containers and container orchestration like Kubernetes. Even some of the serverless platforms that people like to use are actually backed by Kubernetes, so you see a major shift in that direction. But there are still so many different companies and organizations that, again, are still locked into legacy ways of doing things and manually doing things. There are companies that are trying to get their products off the ground, and they're looking for faster and easier, and cheaper ways to do that. And I think that's what's really exciting about harpoon is we can help these companies. We can help them be more successful. We can help them migrate to things that are more modern and agile. We can help them get their product off the ground faster or more reliably. And so that's kind of what excites me. But you know what? We do a lot of demos, you know, sales demos and things like that. And, really, we don't have PowerPoints. We're just like, cool, this is the app, and this is how you use it. And it is so simplistic to use, even though Kubernetes is quite complicated, that the demo goes pretty quick. We're talking five, six minutes if there are not a lot of questions. And we always get exactly the same response, whether somebody is not super familiar with Kubernetes or they are familiar with Kubernetes, and they've set up their own cluster. It's almost always, "Wow," and then a pause, and then "But how do I know it works?" [laughs] So there's going be a lot of work for us in educating people out there that there is an easier way to do DevOps now, that you can do drag and drop DevOps and dynamically generate all of your scripts and configuration, and open up networks, and deploy load balancers, and all the other things that you would need to do with Kubernetes, literally in a few minutes just dragging and dropping things. So there's going to be a lot of education that just goes into saying, "Hey, there's a new market, and this is what it is. And this is how it compares to the manual processes people are using out there. Here's how it compares to some of the other tools that are more incremental in nature." And trust, you know, over time, people are going to have to use the platform and see that it works and talk to other people and be like, yeah, I deployed my software on harpoon, and nothing terrible happened. Demons didn't come out of the walls, and my software kept running, and no meteors crashed in my house. So it's just going to take some time for us to really grow and build the education around that market to show that it's possible and that it exists, and it can be an option for you. VICTORIA: Right. I used to do a lot of intro to DevOps talks with Women Who Code and DevOps DC. And I would describe Kubernetes as a way to keep your kubes neat, and your kube is where your software lives. It's a little house that keeps the doors locked and things like that. Do you have another way to kind of explain what is Kubernetes? Like, how do you kind of even just get people started on what DevOps is? DOMINIC: I like to usually use the cattle story. [laughs] So, in DevOps, they have these concepts of immutable infrastructure or immutable architecture. And so when you have virtual machines, which is what people have been running on for quite a while, certainly some people still run on bare metal servers, but pretty much everybody's got on board with virtualization at this point, and so most software these days is at least running on virtual machines. And so the difficulty with virtual machines is, I mean, there's nothing wrong with them, but they're kind of like pets. They exist for long periods of time. They have what we call state drift, and that's just the changing of the data or the state of the virtual machine over time. And even if I were to kill off that virtual machine and start another one, it wouldn't be exactly the same one. It wouldn't be, you know, fluffy. It would be a clone of fluffy. And maybe it wouldn't have the same personality, and it wouldn't do exactly the same things. And sometimes that might be good; maybe fluffy was a terrible dog. But in other cases, you're like, oh crap, I needed that snowflake feature that Bob built three years ago. And Bob has been hit by a train, so people can't ask Bob anymore. And so what then really happens at these organizations is when the virtual machines start acting up, they don't kill them. They take them to the vet. They take care of them. They pet them. They tell them they're a good boy. And you have entire enterprises that are super dependent on these virtual machines staying alive. And so that's no way to run your business. And so that's one of the reasons why people started switching over to containers because the best practices in containers is to build software that's immutable. So if you destroy or kill one of your containers, you can start another one. And it should work exactly the same as before, and that's because when you build your containers, you can't change them unless you rebuild them. I mean, there are ways to do it, but people will wave their finger angrily at you if you try to do that because it's not a best practice. So, at the end of the day, virtual machines are pets, and the containers are cattle. And when containers start acting up, you kill them. And you take them to the meat factory, and you go get another one. And so this provides a ton of value from a software development and an ops perspective because anytime you have a problem, you just kill your containers, start new ones, and you're off to the races again. And it significantly reduces the troubleshooting time when you're having problems. Obviously, you probably want to log things and check into things; why did that happen? So that maybe you can go make a fix in your software. But at the end of the day, you want to keep your ops running. Containers are a great way to do that without having to be up at midnight figuring out why the virtual machine is acting up. And so the difficulty with cattle is they like to graze and wander and break through fences and things like that. And mostly, when you have an enterprise software application or even just a startup with an MVP, you probably have multiple containers that you need to run and build this application. And so you need somebody to orchestrate. You need somebody to wrangle your containers. And so Kubernetes, I like to say, is like cowboys. Like, they're the ones that wrangle your cattle and make sure they're all going in the right direction and doing the right things. And so it just makes natural sense. Like, if you have a bunch of cattle, you need somebody to take care of them, so that's what Kubernetes does. JOE: Yeah, just to add to that, one of the things I really like about Kubernetes is that it's declarative versus prescriptive. So if you look at a lot of the older DevOps tools like Chef, things like that, you're effectively telling the machine what you want it to do to end up with a particular deployment. With containers, you'd say, start this number of containers on this node. Start this number of containers on this node. Add a virtual machine with these. Whereas with Kubernetes, you state the way you would like the world to be, and then Kubernetes' job is to make the world like that. So from a developer's perspective, when they're deploying things, they don't actually usually want to think in terms of the steps involved between I push this code, and somebody can use it. What they want us to say is I want this code running in containers, and I would like it to have this configuration. I would like it to have these ports exposed. And I love that Kubernetes, to a pretty good extent, abstracts away all of those steps and just lets you say what you want. DOMINIC: Yeah, that's a lot of the power in Kubernetes. You just say, "This is what I want, and then make it so." And Kubernetes goes out and figures out where it's going to schedule your container on what node or server if it dies. Kubernetes is like; I'm pretty sure you wanted one of those running, so I'm going to run it again. It just handles a lot of those things for you that previously you would need somebody with a pager to call to fix. And Kubernetes is automating a lot of that deployment and maintenance for you. VICTORIA: Right. And it seems like there's the movement to really coalesce around Kubernetes. I wonder if either of you can speak to the healthiness of the ecosystem for Kubernetes, which is open source, and why you chose to build on it. DOMINIC: So there was sort of a bit of a container orchestration war for a while. There was a bunch of different options. And I'm not saying that a lot of them weren't good options. Like, Docker built a capability called Swarm, and it's fairly simple to use and pretty powerful. But there was just a lot of backing from the open-source community behind Kubernetes when Google made it an open-source project. There were other things sort of like Kubernetes but not really like Mesos. And they all had like this huge bloodbath to see who was going to be the winner. And I just feel like Kubernetes kind of pulled ahead. It was a really smart move from Google to make it open-source and get the open-source community's buy-in to use. And it just became a very powerful but complex tool for running your software in production. Google had been using some form of that called Google Borg for a number of years prior. And I'm guessing they're still quite a bit different. But that's how it kind of came about. Do you have anything to add, Joe? JOE: I'd say that I judge the winner or the health of an ecosystem by the health of the off-the-shelf and open-source software that can run on that system. So Kubernetes is a thing that you use yourself. You build things to run on it. But also, you can pick and choose many things from the community that people have already built. And there is a huge open-source community for components that run on Kubernetes, everything from CI/CD to managing databases to doing interesting deployment styles like canary deployments. That's really healthy. It just didn't happen with the other systems like Swarm or Nomad was another one. And most of the other companies that I saw doing container orchestration eventually just changed to doing their flavor of Kubernetes, like Rancher. I forget what their original platform was called. But their whole thing was based on that cattle metaphor. [chuckles] And they took a pretty similar approach to containers. And now, if you ask somebody what Rancher is, they'll tell you it's a managed Kubernetes platform. DOMINIC: Yeah, I think it's called Longhorn, so they very much have the cattle theme in there. I mean, they're literally called Rancher, so there you go. But yeah, at the end of the day, something is going to come after Kubernetes as well. And I like to think that it's not so much a matter of what's going to be next? Is there going to be something beyond containers or container orchestrators like Kubernetes? I just think there are going to be more and more layers of abstraction because, at the end of the day, look at the advent of things like ChatGPT and generative AI. People just want to get their jobs done more efficiently and faster. And in software, there's just a lot of time and money that goes into getting software running and keeping it running, and that's why Kubernetes makes sense. But then there's also a lot of time that goes into Kubernetes. And so we think that harpoon is just sort of the natural next layer of abstraction that's going to live on as the next thing. So if 15 years ago I told you I was going to build a web application and I was going to go run it in the cloud, maybe you would have said, "You're crazy, Dom. Like, how could you trust this guy, Jeff, with all your software? What if he is going to steal it? And what if he can't run a data center? What then?" And now, if I told you I was going to go build a data center because I want to build a web application, you would look at me like I was a pariah and that I was not fit to run a company and that I should just use the cloud. So I think it's the same process. We're going to go with containers and Kubernetes. And software deployment, in general, is going to be an abstraction layer that lives on top of all that because software developers and companies just want to push out good software to end users. And any sort of way to make that more efficient or more fun is going to be embraced eventually. JOE: Yeah, I agree with that. I hear people ask, "What are you going to do when Kubernetes is obsolete?" pretty often. And I think it's achieved enough momentum that it won't be. I think it'll be what else is built on top of Kubernetes? Like, people talk about servers like they're obsolete, but they're not; there are still servers. People are just running virtual machines on them. And virtual machines are not obsolete. We'll just run containers on them. So once we get beyond the layer of worrying about containers, you'll still need a container platform. And based on the momentum it's achieved, I think that platform is going to be Kubernetes. VICTORIA: Technology never dies. You just get more different types of technology. [laughs] Usually, that's my philosophy on that. DOMINIC: Yeah, I mean, there's never been a better time to be a software developer, especially if you're an entrepreneur at the same time, because that's what happens over time. Like, what we're achieving with web applications today and what you can push out to the internet and kind of judge if there's a market for would have been unimaginable 20 years ago because, again, you would have had to build a data center. [laughs] And who has a bunch of tens of millions of dollars sitting around to do that? So now you can just use existing software from other people and glue it together. And you can use the cloud and deploy your software and get it out to the masses and scale it. And it's an amazing time to be alive and to be building things for people. VICTORIA: Right. And you mentioned a few things like artificial intelligence before, and there are a lot of people innovating in that space, which requires a lot of data, and networking, and security, and other types of things that you want to think about if you're trying to invent that kind of product. Which brings me to a question I have around, you know when you're adding that abstraction layer to these Kubernetes clusters, how does that factor into security compliance frameworks? And does that even come up with the customers who want to use your product? DOMINIC: Yeah. I mean, definitely, people are concerned about security. When we do infrastructure as code for your virtual infrastructure that's running your Kubernetes cluster that we deploy for you, certainly, we're using best practices from a security standpoint. We do all the same things. If we're building out custom scripts for some clients somewhere, we'd want it to be secure. And we want to lock down different aspects of components that we're building and not just expose all the ports on maybe a load balancer and things like that. So by default, we try to build in as much security as we can. It's pragmatic. I think ultimately we'll probably go down to the path of SOC 2 compliance, and then anything that goes on top of a harpoon cluster or that is deployed with harpoon will be SOC 2 compliant to a large degree. And so yeah, I mean, security is definitely a part of it. We're currently building in a lot of other security features, too, like role-based access control and zero trust, which we'll have pretty soon here. So, yeah, if you want to build your software and get it deployed, you want it to be scalable, and you also want it to be secure. There are so many ilities that come into deploying software. But to your point, even on the artificial intelligence side, people are looking for easier ways to abstract away the complexity. Like, if I told you to go write me a blog post with either ChatGPT or go build your own generative AI model and use that, then you're probably going to be like, yeah, I'll just go to the OpenAI website. I'll be back in a minute. And that's why also you see things like SageMaker from AWS. People want abstraction layers. They want easier ways to do things. And it's not just in DevOps; it's in artificial intelligence and machine learning. That's why drag-and-drop editors are becoming more popular in building web applications mobile applications. I think all of this software development stuff is going to be really accessible to a much larger community in the near future. VICTORIA: Yeah, wonderful. That's great. And so, Dominic, any final takeaways for our listeners today? DOMINIC: Definitely, if you have interest in how either harpoon or Kubernetes, in general, might be applicable to you and your company, we're a bunch of friendly people over here. Even if you're not quite sure how to get started or you need advice on stuff, definitely go hit us up on our website or hit up support at harpoon.io, and send us a message. We're very open to helping people because, again, what we're really trying to do is make this more accessible to more people and make more people successful with this technology. So if we have to get on a bunch of phone calls or come sit next to you or do whatever else, we're here to be a resource to the community, and harpoon is for you to get started. So don't feel like you need a bunch of money to get started deploying with Kubernetes and using the platform. VICTORIA: That's a great note to end on. So you can subscribe to the show and find notes along with a complete transcript for this episode at giantrobots.fm. If you have questions or comments, email us at hosts@giantrobots.fm. And you can find me on Twitter @victori_ousg. This podcast is brought to you by thoughtbot and produced and edited by Mandy Moore. Thank you for listening. See you next time. ANNOUNCER: This podcast is brought to you by thoughtbot, your expert strategy, design, development, and product management partner. We bring digital products from idea to success and teach you how because we care. Learn more at thoughtbot.com. Special Guest: Dominic Holt.
Edo Liberty, who helped create SageMaker while at Amazon's AI labs, is building out long-term memory for large language models using vector embeddings at Pinecone.io. Edo says it solves the problem of hallucinations in large models like ChatGPT, which as we all know is their primary drawback. Edo's approach is simple: Convert authoritative and trusted information into vectors and load them into a vector database for large language models to refer to, allowing them to give accurate answers. We have a new sponsor this week: NetSuite by Oracle, a cloud-based enterprise resource planning software to help businesses of any size manage their financials, operations, and customer relationships in a single platform. They've just rolled out a terrific offer: you can defer payments for a full NetSuite implementation for six months. That's no payment and no interest for six months, and you can take advantage of this special financing offer today at netsuite.com/EYEONAI
Emily Gorcenski, Data & AI Service Line Lead at Thoughtworks, joins Corey on Screaming in the Cloud to discuss how big data is changing our lives - both for the better, and the challenges that come with it. Emily explains how data is only important if you know what to do with it and have a plan to work with it, and why it's crucial to understand the use-by date on your data. Corey and Emily also discuss how big data problems aren't universal problems for the rest of the data community, how to address the ethics around AI, and the barriers to entry when pursuing a career in data. About EmilyEmily Gorcenski is a principal data scientist and the Data & AI Service Line Lead of ThoughtWorks Germany. Her background in computational mathematics and control systems engineering has given her the opportunity to work on data analysis and signal processing problems from a variety of complex and data intensive industries. In addition, she is a renowned data activist and has contributed to award-winning journalism through her use of data to combat extremist violence and terrorism. The opinions expressed are solely her own.Links Referenced: ThoughtWorks: https://www.thoughtworks.com/ Personal website: https://emilygorcenski.com Twitter: https://twitter.com/EmilyGorcenski Mastodon: https://mastodon.green/@emilygorcenski@indieweb.social TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Emily Gorcenski, who is the Data and AI Service Line Lead over at ThoughtWorks. Emily, thank you so much for joining me today. I appreciate it.Emily: Thank you for having me. I'm happy to be here.Corey: What is it you do, exactly? Take it away.Emily: Yeah, so I run the data side of our business at ThoughtWorks, Germany. That means data engineering work, data platform work, data science work. I'm a data scientist by training. And you know, we're a consulting company, so I'm working with clients and trying to help them through the, sort of, morphing landscape that data is these days. You know, should we be migrating to the cloud with our data? What can we migrate to the cloud with our data? Where should we be doing with our data scientists and how do we make our data analysts' lives easier? So, it's a lot of questions like that and trying to figure out the strategy and all of those things.Corey: You might be one of the most perfectly positioned people to ask this question to because one of the challenges that I've run into consistently and persistently—because I watch a lot of AWS keynotes—is that they always come up with the same talking point, that data is effectively the modern gold. And data is what unlocks value to your busin—“Every business agrees,” because someone who's dressed in what they think is a nice suit on stage is saying that it's, “Okay, you're trying to sell me something. What's the deal here?” Then I check my email and I discover that Amazon has sent me the same email about the same problem for every region I've deployed things to in AWS. And, “Oh, you deploy this to one of the Japanese regions. We're going to send that to you in Japanese as a result.”And it's like, okay, for a company that says data is important, they have no idea who any of their customers are at this point, is that is the takeaway here. How real is, “Data is important,” versus, “We charge by the gigabyte so you should save all of your data and then run expensive things on top of it.”Emily: I think data is very important, if you know what you're going to do with it and if you have a plan for how to work with it. I think if you look at the history of computing, of technology, if you go back 20 years to maybe the early days of the big data era, right? Everyone's like, “Oh, we've got big data. Data is going to be big.” And for some reason, we never questioned why, like, we were thinking that the ‘big' in ‘big data' meant big is in volume and not ‘big' as in ‘big pharma.'This sort of revolution never really happened for most companies. Sure, some companies got a lot of value from the, sort of, data mining and just gather everything and collect everything and if you hit it with a big computational hammer, insights will come out and somehow there's insights will make you money through magic. The reality is much more prosaic. If you want to make money with data, you have to have a plan for what you're going to do with data. You have to know what you're looking for and you have to know exactly what you're going to get when you look at your data and when you try to answer questions with it.And so, when we see somebody like Amazon not being able to correlate that the fact that you're the account owner for all of these different accounts and that the language should be English and all of these things, that's part of the operational problem because it's annoying, to try to do joins across multiple tables in multiple regions and all of those things, but it's also part—you know, nobody has figured out how this adds value for them to do that, right? There's a part of it where it's like, this is just professionalism, but there's a part of it, where it's also like… whatever. You've got Google Translate. Figure out yourself. We're just going to get through it.I think that… as time has evolved from the initial waves of the big data era into the data science era, and now we're in, you know, all sorts of different architectures and principles and all of these things, most companies still haven't figured out what to do with data, right? They're still investing a ton of money to answer the same analytics questions that they were answering 20 years ago. And for me, I think that's a disappointment in some regards because we do have better tools now. We can do so many more interesting things if you give people the opportunity.Corey: One of the things that always seemed a little odd was, back when I wielded root credentials in anger—anger,' of course, being my name for the production environment, as opposed to, “Theory,” which is what I call staging because it works in theory, but not in production. I digress—it always felt like I was getting constant pushback from folks of, “You can't delete that data. It's incredibly important because one day, we're going to find a way to unlock the magic of it.” And it's, “These are web server logs that are 15 years old, and 98% of them by volume are load balancer health checks because it turns out that back in those days, baby seals got more hits than our website did, so that's not really a thing that we wind up—that's going to add much value to it.” And then from my perspective, at least, given that I tend to live, eat, sleep, breathe cloud these days, AWS did something that was refreshingly customer-obsessed when they came out with Glacier Deep Archive.Because the economics of that are if you want to store a petabyte of data, with a 12-hour latency on request for things like archival logs and whatnot, it's $1,000 a month per petabyte, which is okay, you have now hit a price point where it is no longer worth my time to argue with you. We're just not going to delete anything ever again. Problem solved. Then came GDPR, which is neither here nor there and we actually want to get rid of those things for a variety of excellent legal reasons. And the dance continues.But my argument against getting rid of data because it's super expensive no longer holds water in the way that it wants did for anything remotely resembling a reasonable amount of data. Then again, that's getting reinvented all the time. I used to be very, I guess we'll call it, I guess, a data minimalist. I don't want to store a bunch of data, mostly because I'm not a data person. I am very bad thinking in that way.I consider SQL to be the chests of the programming world and I'm not particularly great at it. And I also unlucky and have an aura, so if I destroy a bunch of stateless web servers, okay, we can all laugh about that, but let's keep me the hell away from the data warehouse if we still want a company tomorrow morning. And that was sort of my experience. And I understand my bias in that direction. But I'm starting to see magic get unlocked.Emily: Yeah, I think, you know, you said earlier, there's, like, this mindset, like, data is the new gold or data is new oil or whatever. And I think it's actually more true that data is the new milk, right? It goes bad if you don't use it, you know, before a certain point in time. And at a certain point in time, it's not going to be very offensive if you just leave it locked in the jug, but as soon as you try to open it, you're going to have a lot of problems. Data is very, very cheap to store these days. It's very easy to hold data; it's very expensive to process data.And I think that's where the shift has gone, right? There's sort of this, like, Oracle DBA legacy of, like, “Don't let the software developers touch the prod database.” And they've kind of kept their, like, arcane witchcraft to themselves, and that mindset has persisted. But now it's sort of shifted into all of these other architectural patterns that are just abstractions on top of this, don't let the software engineers touch the data store, right? So, we have these, like, streaming-first architectures, which are great. They're great for software devs. They're great for software devs. And they're great for data engineers who like to play with big powerful technology.They're terrible if you want to answer a question, like, “How many customers that I have yesterday?” And these are the things that I think are some of the central challenges, right? A Kappa architecture—you know, streaming-first architecture—is amazing if you want to improve your application developer throughput. And it's amazing if you want to build real-time analytics or streaming analytics into your platform. But it's terrible if you want your data lake to be navigable. It's terrible if you want to find the right data that makes sense to do the more complex things. And it becomes very expensive to try to process it.Corey: One of the problems I think I have that is that if I take a look at the data volumes that I work with in my day-to-day job, I'm dealing with AWS billing data as spit out by the AWS billing system. And there isn't really a big data problem here. If you take a look at some of the larger clients, okay, maybe I'm trying to consume a CSV that's ten gigabytes. Yes, Excel is going to violently scream itself to death if I try to wind up loading it there, and then my computer smells like burning metal all afternoon. But if it fits in RAM, it doesn't really feel like it's a big data problem, on some level.And it just feels that when I look at the landscape of all the different tools you can use for things like this, they just feel like it's more or less, hmm, “I have a loose thread on my shirt. Could you pass me that chainsaw for a second?” It just seems like stupendous overkill for anything that I'm working with. Counterpoint; that the clients I'm working with have massive data farms and my default response when I meet someone who's very good at an area that I don't do a lot of work in is—counterintuitively to what a lot of people apparently do on Twitter—is not the default assumption of oh, “I don't know anything about that space. It must be worthless and they must be dumb.”No. That is not the default approach to take anything, from my perspective. So, it's clear there's something very much there that I just don't see slash understand. That is a very roundabout way of saying what could be uncharitably distilled down to, “So, is your entire career bullshit?” But no, it is clearly not.There is value being extracted from this and it's powerful. I just think that there's been an industry-wide, relatively poor job done of explaining that value in ways that don't come across as contrived or profoundly disturbing.Emily: Yeah, I think there's a ton of value in doing things right. It gets very complicated to try to explain the nuances of when and how data can actually be useful, right? Oftentimes, your historical data, you know, it really only tells you about what happened in the past. And you can throw some great mathematics at it and try to use it to predict the future in some sense, but it's not necessarily great at what happens when you hit really hard changes, right?For example, when the Coronavirus pandemic hit and purchaser and consumer behavior changed overnight. There was no data in the data set that explained that consumer behavior. And so, what you saw is a lot of these things like supply chain issues, which are very heavily data-driven on a normal circumstance, there was nothing in that data that allowed those algorithms to optimize for the reality that we were seeing at that scale, right? Even if you look at advanced logistics companies, they know what to do when there's a hurricane coming or when there's been an earthquake or things like that. They have disaster scenarios.But nobody has ever done anything like this at the global scale, right? And so, what we saw was this hard reset that we're still feeling the repercussions of today. Yes, there were people who couldn't work and we had lockdowns and all that stuff, but we also have an effect from the impact of the way that we built the systems to work with the data that we need to shuffle around. And so, I think that there is value in being able to process these really, really large datasets, but I think that actually, there's also a lot of value in being able to solve smaller, simpler problems, right? Not everything is a big data problem, not everything requires a ton of data to solve.It's more about the mindset that you use to look at the data, to explore the data, and what you're doing with it. And I think the challenge here is that, you know, everyone wants to believe that they have a big data problem because it feels like you have to have a big data problem if you—Corey: All the cool kids are having this kind of problem.Emily: You have to have big data to sit at the grownup's table. And so, what's happened is we've optimized a lot of tools around solving big data problems and oftentimes, these tools are really poor at solving normal data problems. And there's a lot of money being spent in a lot of overkill engineering in the data space.Corey: On some level, it feels like there has been a dramatic misrepresentation of this. I had an article that went out last year where I called machine-learning selling pickaxes into a digital gold rush. And someone I know at AWS responded to that and probably the best way possible—she works over on their machine-learning group—she sent me a foam Minecraft pickaxe that now is hanging on my office wall. And that gets more commentary than anything, including the customized oil painting I have of Billy the Platypus fighting an AWS Billing Dragon. No, people want to talk about the Minecraft pickaxe.It's amazing. It's first, where is this creativity in any of the marketing that this department is putting out? But two it's clearly not accurate. And what it took for me to see that was a couple of things that I built myself. I built a Twitter thread client that would create Twitter threads, back when Twitter was a place that wasn't overrun by some of the worst people in the world and turned into BirdChan.But that was great. It would automatically do OCR on images that I uploaded, it would describe the image to you using Azure's Cognitive Vision API. And that was magic. And now I see things like ChatGPT, and that's magic. But you take a look at the way that the cloud companies have been describing the power of machine learning in AI, they wind up getting someone with a doctorate whose first language is math getting on stage for 45 minutes and just yelling at you in Star Trek technobabble to the point where you have no idea what the hell they're saying.And occasionally other data scientists say, “Yeah, I think he's just shining everyone on at this point. But yeah, okay.” It still becomes unclear. It takes seeing the value of it for it to finally click. People make fun of it, but the Hot Dog, Not A Hot Dog app is the kind of valuable breakthrough that suddenly makes this intangible thing very real for people.Emily: I think there's a lot of impressive stuff and ChatGPT is fantastically impressive. I actually used ChatGPT to write a letter to some German government agency to deal with some bureaucracy. It was amazing. It did it, was grammatically correct, it got me what I needed, and it saved me a ton of time. I think that these tools are really, really powerful.Now, the thing is, not every company needs to build its own ChatGPT. Maybe they need to integrate it, maybe there's an application for it somewhere in their landscape of product, in their landscape of services, in the landscape of their interim internal tooling. And I would be thrilled actually to see some of that be brought into reality in the next couple of years. But you also have to remember that ChatGPT is not something that came because we have, like, a really great breakthrough in AI last year or something like that. It stacked upon 40 years of research.We've gone through three new waves of neural networking in that time to get to this point, and it solves one class of problem, which is honestly a fairly narrow class of problem. And so, what I see is a lot of companies that have much more mundane problems, but where data can actually still really help them. Like how do you process Cambodian driver's licenses with OCR, right? These are the types of things that if you had a training data set that was every Cambodian person's driver's license for the last ten years, you're still not going to get the data volumes that even a day worth of Amazon's marketplace generates, right? And so, you need to be able to solve these problems still with data without resorting to the cudgel that is a big data solution, right?So, there's still a niche, a valuable niche, for solving problems with data without having to necessarily resort to, we have to load the entire internet into our stream and throw GPUs at it all day long and spend hundreds of—tens of millions of dollars in training. I don't know, maybe hundreds of millions; however much ChatGPT just raised. There's an in-between that I think is vastly underserved by what people are talking about these days.Corey: There is so much attention being given to this and it feels almost like there has been a concerted and defined effort to almost talk in circles and remove people from the humanity and the human consequences of what it is that they're doing. When I was younger, in my more reckless years, I was never much of a fan of the idea of government regulation. But now it has become abundantly clear that our industry, regardless of how you want to define industry, how—describe a society—cannot self-regulate when it comes to data that has the potential to ruin people's lives. I mean, I spent a fair bit of my time in my career working in financial services in a bunch of different ways. And at least in those jobs, it was only money.The scariest thing I ever dealt with, from a data perspective is when I did a brief stint at Grindr because that was the sort of problem where if that data gets out, people will die. And I have not had to think about things like that have that level of import before or since, for which I'm eternally grateful. “It's only money,” which is a weird thing for a guy who fixes cloud bills for a living to say. And if I say that in a client call, it's not going to go very well. But it's the truth. Money is one of those things that can be fixed. It can be addressed in due course. There are always opportunities there. Someone just been outed to their friends, family, and they feel their life is now in shambles around them, you can't unring that particular bell.Emily: Yeah. And in some countries, it can lead to imprisonment, or—Corey: It can lead to death sentences, yes. It's absolutely not acceptable.Emily: There's a lot to say about the ethics of where we are. And I think that as a lot of these high profile, you know, AI tools have come out over the last year or so, so you know, Stable Diffusion and ChatGPT and all of this stuff, there's been a lot of conversation that is sort of trying to put some counterbalance on what we're seeing. And I don't know that it's going to be successful. I think that, you know, I've been speaking about ethics and technology for a long time and I think that we need to mature and get to the next level of actually addressing the ethical problems in technology. Because it's so far beyond things like, “Oh, you know, if there's a biased training data set and therefore the algorithm is biased,” right?Everyone knows that by now, right? And the people who don't know that, don't care. We need to get much beyond where, you know, these conversations about ethics and technology are going because it's a manifold problem. We have issues with the people labeling this data are paid, you know, pennies per hour to deal with some of the most horrific content you've ever seen. I mean, I'm somebody who has immersed myself in a lot of horrific content for some of the work that I have done, and this is, you know, so far beyond what I've had to deal with in my life that I can't even imagine it. You couldn't pay me enough money to do it and we're paying people in developing nations, you know, a buck-thirty-five an hour to do this. I think—Corey: But you must understand, Emily, that given the standard of living where they are, that that is perfectly normal and we wouldn't want to distort local market dynamics. So, if they make a buck-fifty a day, we are going to be generous gods and pay them a whopping dollar-seventy a day, and now we feel good about ourselves. And no, it's not about exploitation. It's about raising up an emerging market. And other happy horseshit that lies people tell themselves.Emily: Yes, it is. Yes, it is. And we've built—you know, the industry has built its back on that. It's raised itself up on this type of labor. It's raised itself up on taking texts and images without permission of the creators. And, you know, there's—I'm not a lawyer and I'm not going to play one, but I do know that derivative use is something that at least under American law, is something that can be safely done. It would be a bad world if derivative use was not something that we had freely available, I think, and on the balance.But our laws, the thing is, our laws don't account for the scale. Our laws about things like fair use, derivative use, are for if you see a picture and you want to take your own interpretation, or if you see an image and you want to make a parody, right? It's a one-to-one thing. You can't make 5 million parody images based on somebody's art, yourself. These laws were never built for this scale.And so, I think that where AI is exploiting society is it's exploiting a set of ethics, a set of laws, and a set of morals that are built around a set of behavior that is designed around normal human interaction scales, you know, one person standing in front of a lecture hall or friends talking with each other or things like that. The world was not meant for a single person to be able to speak to hundreds of thousands of people or to manipulate hundreds of thousands of images per day. It's actually—I find it terrifying. Like, the fact that me, a normal person, has a Twitter following that, you know, if I wanted to, I can have 50 million impressions in a month. This is not a normal thing for a normal human being to have.And so, I think that as we build this technology, we have to also say, we're changing the landscape of human ethics by our ability to act at scale. And yes, you're right. Regulation is possibly one way that can help this, but I think that we also need to embed cultural values in how we're using the technology and how we're shaping our businesses to use the technology. It can be used responsibly. I mean, like I said, ChatGPT helped me with a visa issue, sending an email to the immigration office in Berlin. That's a fantastic thing. That's a net positive for me; hopefully, for humanity. I wasn't about to pay a lawyer to do it. But where's the balance, right? And it's a complex topic.Corey: It is. It absolutely is. There is one last topic that I would like to talk to you about that's a little less heavy. And I've got to be direct with you that I'm not trying to be unkind, but you've disappointed me. Because you mentioned to me at one point, when I asked how things were going in your AWS universe, you said, “Well, aside from the bank heist, reasonably well.”And I thought that you were blessed as with something I always look for, which is the gift of glorious metaphor. Unfortunately, as I said, you've disappointed me. It was not a metaphor; it was the literal truth. What the hell kind of bank heist could possibly affect an AWS account? This sounds like something out of a movie. Hit me with it.Emily: Yeah, you know, I think in the SRE world, we tell people to focus on the high probability, low impact things because that's where it's going to really hurt your business, and let the experts deal with the black swan events because they're pretty unlikely. You know, a normal business doesn't have to worry about terrorists breaking into the Google data center or a gang of thieves breaking into a bank vault. Apparently, that is something that I have to worry about because I have some data in my personal life that I needed to protect, like all other people. And I decided, like a reasonable and secure and smart human being who has a little bit of extra spending cash that I would do the safer thing and take my backup hard drive and my old phones and put them in a safety deposit box at an old private bank that has, you know, a vault that's behind the meter-and-a-half thick steel door and has two guards all the time, cameras everywhere. And I said, “What is the safest possible thing that you can do to store your backups?” Obviously, you put it in a secure storage location, right? And then, you know, I don't use my AWS account, my personal AWS account so much anymore. I have work accounts. I have test accounts—Corey: Oh, yeah. It's honestly the best way to have an AWS account is just having someone else having a payment instrument attached to it because otherwise oh God, you're on the hook for that yourself and nobody wants that.Emily: Absolutely. And you know, creating new email addresses for new trial accounts is really just a pain in the ass. So, you know, I have my phone, you know, from five years ago, sitting in this bank vault and I figured that was pretty secure. Until I got an email [laugh] from the Berlin Polizei saying, “There has been a break-in.” And I went and I looked at the news and apparently, a gang of thieves has pulled off the most epic heist in recent European history.This is barely in the news. Like, unless you speak German, you're probably not going to find any news about this. But a gang of thieves broke into this bank vault and broke open the safety deposit boxes. And it turns out that this vault was also the location where a luxury watch consigner had been storing his watches. So, they made off with some, like, tens of millions of dollars of luxury watches. And then also the phone that had my 2FA for my Amazon account. So, the total value, you know, potential theft of this was probably somewhere in the $500 million range if they set up a SageMaker instance on my account, perhaps.Corey: This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your. Engineers. Are. Burned. Out. They're tired from pagers waking them up at 2 am for something that could have waited until after their morning coffee. Ring Ring, Who's There? It's Nagios, the original call of duty! They're fed up with relying on two or three different “monitoring tools” that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there's a better way. Observability tools like Honeycomb (and very little else becau se they do admittedly set the bar) show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and, most importantly, great for your customers. Try FREE today at honeycomb.io/screaminginthecloud. That's honeycomb.io/screaminginthecloud.Corey: The really annoying part that you are going to kick yourself on about this—and I'm not kidding—is, I've looked up the news articles on this event and it happened, something like two or three days after AWS put out the best release of last years, or any other re:Invent—past, present, future—which is finally allowing multiple MFA devices on root accounts. So finally, we can stop having safes with these things or you can have two devices or you can have multiple people in Covid times out of remote sides of different parts of the world and still get into the thing. But until then, nope. It's either no MFA or you have to store it somewhere ridiculous like that and access becomes a freaking problem in the event that the device is lost, or in this case stolen.Emily: [laugh]. I will just beg the thieves, if you're out there, if you're secretly actually a bunch of cloud engineers who needed to break into a luxury watch consignment storage vault so that you can pay your cloud bills, please have mercy on my poor AWS account. But also I'll tell you that the credit card attached to it is expired so you won't have any luck.Corey: Yeah. Really sad part. Despite having the unexpired credit card, it just means that the charge won't go through. They're still going to hold you responsible for it. It's the worst advice I see people—Emily: [laugh].Corey: Well, intentioned—giving each other on places like Reddit where the other children hang out. And it's, “Oh, just use a prepaid gift card so it can only charge you so much.” It's yeah, and then you get exploited like someone recently was and start accruing $60,000 a day in Lambda charges on an otherwise idle account and Amazon will come after you with a straight face after a week. And, like, “Yes, we'd like our $360,000, please.”Emily: Yes.Corey: “We tried to charge the credit card and wouldn't you know, it expired. Could you get on that please? We'd like our money faster if you wouldn't mind.” And then you wind up in absolute hell. Now, credit where due, they in every case I am aware of that is not looking like fraud's close cousin, they have made it right, on some level. But it takes three weeks of back and forth and interminable waiting.And you're sitting there freaking out, especially if you're someone who does not have a spare half-million dollars sitting around. Imagine who—“You sound poor. Have you tried not being that?” And I'm firmly convinced that it a matter of time until someone does something truly tragic because they don't understand that it takes forever, but it will go away. And from my perspective, there's no bigger problem that AWS needs to fix than surprise lifelong earnings bills to some poor freaking student who is just trying to stand up a website as part of a class.Emily: All of the clouds have these missing stairs in them. And it's really easy because they make it—one of the things that a lot of the cloud providers do is they make it really easy for you to spin up things to test them. And they make it really, really hard to find where it is to shut it all down. The data science is awful at this. As a data scientist, I work with a lot of data science tools, and every cloud has, like, the spin up your magical data science computing environment so that your data scientist can, like, bang on the data with you know, high-performance compute for a while.And you know, it's one click of a button and you type in a couple of na—you know, a couple of things name, your service or whatever, name your resource. You click a couple buttons and you spin it up, but behind the scenes, it's setting up a Kubernetes cluster and it's setting up some storage bucket and it's setting up some data pipelines and it's setting up some monitoring stuff and it's setting up a VM in order to run all of this stuff. And the next thing that you know, you're burning 100, 200 euro a day, just to, like, to figure out if you can load a CSV into pandas using a Jupyter Notebook. And you're like—when you try to shut it all down, you can't. It's you have to figure, oh, there is a networking thing set up. Well, nobody told me there's a networking thing set up. You know? How do I delete that?Corey: You didn't say please, so here you go. Without for me, it's not even the giant bill going from $4 a month in S3 charges to half a million bucks because that is pretty obvious from the outside just what the hell's been happening. It's the little stuff. I am still—since last summer—waiting for a refund on $260 of ‘because we said so' SageMaker credits because of a change of their billing system, for a 45-minute experiment I had done eight months before that.Emily: Yep.Corey: Wild stuff. Wild stuff. And I have no tolerance for people saying, “Oh, you should just read the pricing page and understand it better.” Yeah, listen, jackhole. I do this for a living. If I can fall victim to it, anyone can. I promise. It is not that I don't know how the billing system works and what to do to avoid unexpected charges.And I'm just luck—because if I hadn't caught it with my systems three days into the month, it would have been a $2,000 surprise. And yeah, I run a company. I can live with that. I wouldn't be happy, but whatever. It is immaterial compared to, you know, payroll.Emily: I think it's kind of a rite of passage, you know, to have the $150 surprise Redshift bill at the end of the month from your personal test account. And it's sad, you know? I think that there's so much better that they can do and that they should do. Sort of as a tangent, one of the challenges that I see in the data space is that it's so hard to break into data because the tooling is so complex and it requires so much extra knowledge, right? If you want to become a software developer, you can develop a microservice on your machine, you can build a web app on your machine, you can set up Ruby on Rails, or Flask, or you know, .NET, or whatever you want. And you can do all of that locally.And you can learn everything you need to know about React, or Terraform, or whatever, running locally. You can't do that with data stuff. You can't do that with BigQuery. You can't do that with Redshift. The only way that you can learn this stuff is if you have an account with that setup and you're paying the money to execute on it. And that makes it a really high barrier for entry for anyone to get into this space. It makes it really hard to learn. Because if you want to learn anything by doing, like many of us in the industry have done, it's going to cost you a ton of money just to [BLEEP] around and find out.Corey: Yes. And no one likes the find out part of those stories.Emily: Nobody likes to find out when it comes to your bill.Corey: And to tie it back to the data story of it, it is clearly some form of batch processing because it tries to be an eight-hour consistency model. Yeah, I assume for everything, it's 72. But what that means is that you are significantly far removed from doing a thing and finding out what that thing costs. And that's the direct charges. There's always the oh, I'm going to set things up and it isn't going to screw you over on the bill. You're just planting a beautiful landmine you're going to stumble blindly into in three months when you do something else and didn't realize what that means.And the worst part is it feels victim-blamey. I mean, this is my pro—I guess this is one of the reasons I guess I'm so down on data, even now. It's because I contextualize it in a sense of the AWS bill. No one's happy dealing with that. You ever met a happy accountant? You have not.Emily: Nope. Nope [laugh]. Especially when it comes to clouds stuff.Corey: Oh yeah.Emily: Especially these days, when we're all looking to save energy, save money in the cloud.Corey: Ideally, save the planet. Sustainability and saving money align on the axis of ‘turn that shit off.' It's great. We can hope for a brighter tomorrow.Emily: Yep.Corey: I really want to thank you for being so generous with your time. If people want to learn more, where can they find you? Apparently filing police reports after bank heists, which you know, it's a great place to meet people.Emily: Yeah. You know, the largest criminal act in Berlin is certainly a place you want to go to get your cloud advice. You can find me, I have a website. It's my name, emilygorcenski.com.You can find me on Twitter, but I don't really post there anymore. And I'm on Mastodon at some place because Mastodon is weird and kind of a mess. But if you search me, I'm really not that hard to find. My name is harder to spell, but you'll see it in the podcast description.Corey: And we will, of course, put links to all of this in the show notes. Thank you so much for your time. I really appreciate it.Emily: Thank you for having me.Corey: Emily Gorcenski, Data and AI Service Line Lead at ThoughtWorks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insipid, insulting comment, talking about why data doesn't actually matter at all. And then the comment will disappear into the ether because your podcast platform of choice feels the same way about your crappy comment.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Chris Chris Farris has been in the IT field since 1994 primarily focused on Linux, networking, and security. For the last 8 years, he has focused on public-cloud and public-cloud security. He has built and evolved multiple cloud security programs for major media companies, focusing on enabling the broader security team's objectives of secure design, incident response and vulnerability management. He has developed cloud security standards and baselines to provide risk-based guidance to development and operations teams. As a practitioner, he's architected and implemented multiple serverless and traditional cloud applications focused on deployment, security, operations, and financial modeling.Chris now does cloud security research for Turbot and evangelizes for the open source tool Steampipe. He is one if the organizers of the fwd:cloudsec conference (https://fwdcloudsec.org) and has given multiple presentations at AWS conferences and BSides events.When not building things with AWS's building blocks, he enjoys building Legos with his kid and figuring out what interesting part of the globe to travel to next. He opines on security and technology on Twitter and his website https://www.chrisfarris.comLinks Referenced: Turbot: https://turbot.com/ fwd:cloudsec: https://fwdcloudsec.org/ Steampipe: https://steampipe.io/ Steampipe block: https://steampipe.io/blog TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Tailscale SSH is a new, and arguably better way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network.So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it - Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworksCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I have been meaning to invite slash drag onto this show for a number of years. We first met at re:Inforce the first year that they had such a thing, Amazon's security conference for cloud, as is Amazon's tradition, named after an email subject line. Chris Farris is a cloud security nerd at Turbot. He's also one of the organizers for fwd:cloudsec, another security conference named after an email subject line with a lot more self-awareness than any of Amazon's stuff. Chris, thank you for joining me.Chris: Oh, thank you for dragging me on. You can let go of my hair now.Corey: Wonderful, wonderful. That's why we're all having the thinning hair going on. People just use it to drag us to and fro, it seems. So, you've been doing something that I'm only going to describe as weird lately because your background—not that dissimilar from mine—is as a practitioner. You've been heavily involved in the security space for a while and lately, I keep seeing an awful lot of things with your name on them getting sucked up by the giant app surveillance apparatus deployed to the internet, looking for basically any mention of AWS that I wind up using to write my newsletter and feed the content grist mill every year. What are you doing and how'd you get there?Chris: So, what am I doing right now is, I'm in marketing. It's kind of a, you know, “Oops, I'm sorry I did that.”Corey: Oh, the running gag is, you work in DevRel; that means, “Oh, you're in marketing, but they're scared to tell you that.” You're self-aware.Chris: Yeah.Corey: Good for you.Chris: I'm willing to address that I'm in marketing now. And I've been a cloud practitioner since probably 2014, cloud security since about 2017. And then just decided, the problem that we have in the cloud security community is a lot of us are just kind of sitting in a corner in our companies and solving problems for our companies, but we're not solving the problems at scale. So, I wanted a job that would allow me to reach a broader audience and help a broader audience. Where I see cloud security having—you know, or cloud in general falling down is Amazon makes it really hard for you to do your side of shared responsibility, and so we need to be out there helping customers understand what they need to be doing. So, I am now at a company called Turbot and we're really trying to promote cloud security.Corey: One of the first promoted guest episodes of this show was David Boeke, your CTO, and one of the things that I regret is that I've sort of lost track of Turbot over the past few years because, yeah, one or two things might have been going on during that timeline as I look back at having kids in the middle of a pandemic and the deadly plague o'er land. And suddenly, every conversation takes place over Zoom, which is like, “Oh, good, it's like a happy hour only instead, now it's just like a conference call for work.” It's like, ‘Conference Calls: The Drinking Game' is never the great direction to go in. But it seems the world is recovering. We're going to be able to spend some time together at re:Invent by all accounts that I'm actively looking forward to.As of this recording, you're relatively new to Turbot, and I figured out that you were going there because, once again, content hits my filters. You wrote a fascinating blog post that hits on an interest of mine that I don't usually talk about much because it's off-putting to some folk, and these days, I don't want to get yelled at and more than I have to about the experience of traveling, I believe it was to an all-hands on the other side of the world.Chris: Yep. So, my first day on the job at Turbot, I was landing in Kuala Lumpur, Malaysia, having left the United States 24 hours—or was it 48? It's hard to tell when you go to the other side of the planet and the time zones have also shifted—and then having left my prior company day before that. But yeah, so Turbot about traditionally has an annual event where we all get together in person. We're a completely remote company, but once a year, we all get together in person in our integrate event.And so, that was my first day on the job. And then you know, it was basically two weeks of reasonably intense hackathons, building out a lot of stuff that hopefully will show up open-source shortly. And then yeah, meeting all of my coworkers. And that was nice.Corey: You've always had a focus through all the time that I've known you and all the public content that you've put out there that has come across my desk that seems to center around security. It's sort of an area that I give a nod to more often than I would like, on some level, but that tends to be your bread and butter. Your focus seems to be almost overwhelmingly on I would call it AWS security. Is that fair to say or is that a mischaracterization of how you view it slash what you actually do? Because, again, we have these parasocial relationships with voices on the internet. And it's like, “Oh, yeah, I know all about that person.” Yeah, you've met them once and all you know other than that is what they put on Twitter.Chris: You follow me on Twitter. Yeah, I would argue that yes, a lot of what I do is AWS-related security because in the past, a lot of what I've been responsible for is cloud security in AWS. But I've always worked for companies that were multi-cloud; it's just that 90% of everything was Amazon and so therefore 90% of my time, 90% of my problems, 90% of my risk was all in AWS. I've been trying to break out of that. I've been trying to understand the other clouds.One of the nice aspects of this role and working on Steampipe is I am now experimenting with other clouds. The whole goal here is to be able to scale our ability as an industry and as security practitioners to support multiple clouds. Because whether we want to or not, we've got it. And so, even though 90% of my spend, 90% of my resources, 90% of my applications may be in AWS, that 10% that I'm ignoring is probably more than 10% of my risk, and we really do need to understand and support major clouds equally.Corey: One post you had recently that I find myself in wholehearted agreement with is on the adoption of Tailscale in the enterprise. I use it for all of my personal nonsense and it is transformative. I like the idea of what that portends for a multi-cloud, or poly-cloud, or whatever the hell we're calling it this week, sort of architectures were historically one of the biggest problems in getting to clouds two speak to one another and manage them in an intelligent way is the security models are different, the user identity stuff is different as well, and the network stuff has always been nightmarish. Well, with Tailscale, you don't have to worry about that in the same way at all. You can, more or less, ignore it, turn on host-based firewalls for everything and just allow Tailscale. And suddenly, okay, I don't really have to think about this in the same way.Chris: Yeah. And you get the micro-segmentation out of it, too, which is really nice. I will agree that I had not looked at Tailscale until I was asked to look at Tailscale, and then it was just like, “Oh, I am completely redoing my home network on that.” But looking at it, it's going to scare some old-school network engineers, it's going to impact their livelihoods and that is going to make them very defensive. And so, what I wanted to do in that post was kind of address, as a practitioner, if I was looking at this with an enterprise lens, what are the concerns you would have on deploying Tailscale in your environment?A lot of those were, you know, around user management. I think the big one that is—it's a new thing in enterprise security, but kind of this host profiling, which is hey, before I let your laptop on the network, I'm going to go make sure that you have antivirus and some kind of EDR, XDR, blah-DR agents so that you know we have a reasonable thing that you're not going to just go and drop [unintelligible 00:09:01] on the network and next thing you know, we're Maersk. Tailscale, that's going to be their biggest thing that they are going to have to figure out is how do they work with some of these enterprise concerns and things along those lines. But I think it's an excellent technology, it was super easy to set up. And the ability to fine-tune and microsegment is great.Corey: Wildly so. They occasionally sponsor my nonsense. I have no earthly idea whether this episode is one of them because we have an editorial firewall—they're not paying me to set any of this stuff, like, “And this is brought to you by whatever.” Yeah, that's the sponsored ad part. This is just, I'm in love with the product.One of the most annoying things about it to me is that I haven't found a reason to give them money yet because the free tier for my personal stuff is very comfortably sized and I don't have a traditional enterprise network or anything like that people would benefit from over here. For one area in cloud security that I think I have potentially been misunderstood around, so I want to take at least this opportunity to clear the air on it a little bit has been that, by all accounts, I've spent the last, mmm, few months or so just absolutely beating the crap out of Azure. Before I wind up adding a little nuance and context to that, I'd love to get your take on what, by all accounts, has been a pretty disastrous year-and-a-half for Azure security.Chris: I think it's been a disastrous year-and-a-half for Azure security. Um—[laugh].Corey: [laugh]. That was something of a leading question, wasn't it?Chris: Yeah, no, I mean, it is. And if you think, though, back, Microsoft's repeatedly had these the ebb and flow of security disasters. You know, Code Red back in whatever the 2000s, NT 4.0 patching back in the '90s. So, I think we're just hitting one of those peaks again, or hopefully, we're hitting the peak and not [laugh] just starting the uptick. A lot of what Azure has built is stuff that they already had, commercial off-the-shelf software, they wrapped multi-tenancy around it, gave it a new SKU under the Azure name, and called is cloud. So, am I super-surprised that somebody figured out how to leverage a Jupyter notebook to find the back-end credentials to drop the firewall tables to go find the next guy over's Cosmos DB? No, I'm not.Corey: I find their failures to be less egregious on a technical basis because let's face it, let's be very clear here, this stuff is hard. I am not pretending for even a slight second that I'm a better security engineer than the very capable, very competent people who work there. This stuff is incredibly hard. And I'm not—Chris: And very well-funded people.Corey: Oh, absolutely, yeah. They make more than I do, presumably. But it's one of those areas where I'm not sitting here trying to dunk on them, their work, their efforts, et cetera, and I don't do a good enough job of clarifying that. My problem is the complete radio silence coming out of Microsoft on this. If AWS had a series of issues like this, I'm hard-pressed to imagine a scenario where they would not have much more transparent communications, they might very well trot out a number of their execs to go on a tour to wind up talking about these things and what they're doing systemically to change it.Because six of these in, it's like, okay, this is now a cultural problem. It's not one rando engineer wandering around the company screwing things up on a rotational basis. It's, what are you going to do? It's unlikely that firing Steven is going to be your fix for these things. So, that is part of it.And then most recently, they wound up having a blog post on the MSRC, the Microsoft Security Resource Center is I believe that acronym? The [mrsth], whatever; and it sounds like a virus you pick up in a hospital—but the problem that I have with it is that they spent most of that being overly defensive and dunking on SOCRadar, the vulnerability researcher who found this and reported it to them. And they had all kinds of quibbles with how it was done, what they did with it, et cetera, et cetera. It's, “Excuse me, you're the ones that left customer data sitting out there in the Azure equivalent of an S3 bucket and you're calling other people out for basically doing your job for you? Excuse me?”Chris: But it wasn't sensitive customer data. It was only the contract information, so therefore it was okay.Corey: Yeah, if I put my contract information out there and try and claim it's not sensitive information, my clients will laugh and laugh as they sue me into the Stone Age.Chris: Yeah well, clearly, you don't have the same level of clickthrough terms that Microsoft is able to negotiate because, you know, [laugh].Corey: It's awful as well, it doesn't even work because, “Oh, it's okay, I lost some of your data, but that's okay because it wasn't particularly sensitive.” Isn't that kind of up to you?Chris: Yes. And if A, I'm actually, you know, a big AWS shop and then I'm looking at Azure and I've got my negotiations in there and Amazon gets wind that I'm negotiating with Azure, that's not going to do well for me and my business. So no, this kind of material is incredibly sensitive. And that was an incredibly tone-deaf response on their part. But you know, to some extent, it was more of a response than we've seen from some of the other Azure multi-tenancy breakdowns.Corey: Yeah, at least they actually said something. I mean, there is that. It's just—it's wild to me. And again, I say this as an Azure customer myself. Their computer vision API is basically just this side of magic, as best I can tell, and none of the other providers have anything like it.That's what I want. But, you know, it almost feels like that service is under NDA because no one talks about it when they're using this service. I did a whole blog post singing its praises and no one from that team reached out to me to say, “Hey, glad you liked it.” Not that they owe me anything, but at the same time it's incredible. Why am I getting shut out? It's like, does this company just have an entire policy of not saying anything ever to anyone at any time? It seems it.Chris: So, a long time ago, I came to this realization that even if you just look at the terminology of the three providers, Amazon has accounts. Why does Amazon have Amazon—or AWS accounts? Because they're a retail company and that's what you signed up with to buy your underwear. Google has projects because they were, I guess, a developer-first thing and that was how they thought about it is, “Oh, you're going to go build something. Here's your project.”What does Microsoft have? Microsoft Azure Subscriptions. Because they are still about the corporate enterprise IT model of it's really about how much we're charging you, not really about what you're getting. So, given that you're not a big enterprise IT customer, you don't—I presume—do lots and lots of golfing at expensive golf resorts, you're probably not fitting their demographic.Corey: You're absolutely not. And that's wild to me. And yet, here we are.Chris: Now, what's scary is they are doing so many interesting things with artificial intelligence… that if… their multi-tenancy boundaries are as bad as we're starting to see, then what else is out there? And more and more, we is carbon-based life forms are relying on Microsoft and other cloud providers to build AI, that's kind of a scary thing. Go watch Satya's keynote at Microsoft Ignite and he's showing you all sorts of ways that AI is going to start replacing the gig economy. You know, it's not just Tesla and self-driving cars at this point. Dali is going to replace the independent graphics designer.They've got things coming out in their office suite that are going to replace the mom-and-pop marketing shops that are generating menus and doing marketing plans for your local restaurants or whatever. There's a whole slew of things where they're really trying to replace people.Corey: That is a wild thing to me. And part of the problem I have in covering AWS is that I have to differentiate in a bunch of different ways between AWS and its Amazon corporate parent. And they have that problem, too, internally. Part of the challenge they have, in many cases, is that perks you give to employees have to scale to one-and-a-half million people, many of them in fulfillment center warehouse things. And that is a different type of problem that a company, like for example, Google, where most of their employees tend to be in office job-style environments.That's a weird thing and I don't know how to even start conceptualizing things operating at that scale. Everything that they do is definitionally a very hard problem when you have to make it scale to that point. What all of the hyperscale cloud providers do is, from where I sit, complete freaking magic. The fact that it works as well as it does is nothing short of a modern-day miracle.Chris: Yeah, and it is more than just throwing hardware at the problem, which was my on-prem solution to most of the things. “Oh, hey. We need higher availability? Okay, we're going to buy two of everything.” We called it the Noah's Ark model, and we have an A side and a B side.And, “Oh, you know what? Just in case we're going to buy some extra capacity and put it in a different city so that, you know, we can just fail from our primary city to our secondary city.” That doesn't work at the cloud provider scale. And really, we haven't seen a major cloud outage—I mean, like, a bad one—in quite a while.Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.Corey: The outages are always fascinating, just from the way that they are reported in the mainstream media. And again, this is hard, I get it. I am not here to crap on journalists. They, for some ungodly, unknowable reason, have decided not to spend their entire career focusing on the nuances of one very specific, very deep industry. I don't know why.But as [laugh] a result, they wind up getting a lot of their baseline facts wrong about these things. And that's fair. I'm not here to necessarily act as an Amazon spokesperson when these things happen. They have an awful lot of very well-paid people who can do that. But it is interesting just watching the blowback and the reaction of whatever there's an outage, the conversation is never “Does Amazon or Azure or Google suck?” It's, “Does cloud suck as a whole?”That's part of the reason I care so much about Azure getting their act together. If it were just torpedoing Microsoft's reputation, then well, that's sad, but okay. But it extends far beyond that to a point where it's almost where the enterprise groundhog sees the shadow of a data breach and then we get six more years of data center build-outs instead of moving things to a cloud. I spent too many years working in data centers and I have the scars from the cage nuts and crimping patch cables frantically in the middle of the night to prove it. I am thrilled at the fact that I don't believe I will ever again have to frantically drive across town in the middle of the night to replace a hard drive before the rest of the array degrades. Cloud has solved those problems beautifully. I don't want to go back to the Dark Ages.Chris: Yeah, and I think that there's a general potential that we could start seeing this big push towards going back on-prem for effectively sovereign data reasons, whether it's this country has said, “You cannot store your data about our citizens outside of our borders,” and either they're doing that because they do not trust the US Silicon Valley privacy or whatever, or because if it's outside of our borders, then our secret police agents can come knocking on the door at two in the morning to go find out what some dissidents' viewings habits might have been, I see sovereign cloud as this thing that may be a back step from this ubiquitous thing that we have right now in Amazon, Azure, and Google. And so, as we start getting to the point in the history books where we start seeing maps with lots of flags, I think we're going to start seeing a bifurcation of cloud as just a whole thing. We see it already right now. The AWS China partition is not owned by Amazon, it is not run by Amazon, it is not controlled by Amazon. It is controlled by the communist government of China. And nobody is doing business in Russia right now, but if they had not done what they had done earlier this year, we might very well see somebody spinning up a cloud provider that is completely controlled by and in the Russian government.Corey: Well, yes or no, but I want to challenge that assessment for a second because I've had conversations with a number of folks about this where people say, “Okay, great. Like, is the alt-right, for example, going to have better options now that there might be a cloud provider spinning up there?” Or, “Well, okay, what about a new cloud provider to challenge the dominance of the big three?” And there are all these edge cases, either geopolitically or politically based upo—or folks wanting to wind up approaching it from a particular angle, but if we were hired to build out an MVP of a hyperscale cloud provider, like, the budget for that MVP would look like one 100 billion at this point to get started and just get up to a point of critical mass before you could actually see if this thing has legs. And we'd probably burn through almost all of that before doing a single dime in revenue.Chris: Right. And then you're doing that in small markets. Outside of the China partition, these are not massively large markets. I think Oracle is going down an interesting path with its idea of Dedicated Cloud and Oracle Alloy [unintelligible 00:22:52].Corey: I like a lot of what Oracle's doing, and if younger me heard me say that, I don't know how hard I'd hit myself, but here we are. Their free tier for Oracle Cloud is amazing, their data transfer prices are great, and their entire approach of, “We'll build an entire feature complete region in your facility and charge you what, from what I can tell, is a very reasonable amount of money,” works. And it is feature complete, not, “Well, here are the three services that we're going to put in here and everything else is well… it's just sort of a toehold there so you can start migrating it into our big cloud.” No. They're doing it right from that perspective.The biggest problem they've got is the word Oracle at the front end and their, I would say borderline addiction to big-E enterprise markets. I think the future of cloud looks a lot more like cloud-native companies being founded because those big enterprises are starting to describe themselves in similar terminology. And as we've seen in the developer ecosystem, as go startups, so do big companies a few years later. Walk around any big company that's undergoing a digital transformation, you'll see a lot more Macs on desktops, for example. You'll see CI/CD processes in place as opposed to, “Well, oh, you want something new, it's going to be eight weeks to get a server rack downstairs and accounting is going to have 18 pages of forms for you to fill out.” No, it's “click the button,” or—Chris: Don't forget the six months of just getting the financial CapEx approvals.Corey: Exactly.Chris: You have to go through the finance thing before you even get to start talking to techies about when you get your server. I think Oracle is in an interesting place though because it is embracing the fact that it is number four, and so therefore, it's like we are going to work with AWS, we are going to work with Azure, our database can run in AWS or it can run in our cloud, we can interconnect directly, natively, seamlessly with Azure. If I were building a consumer-based thing and I was moving into one of these markets where one of these governments was demanding something like a sovereign cloud, Oracle is a great place to go and throw—okay, all of our front-end consumer whatever is all going to sit in AWS because that's what we do for all other countries. For this one country, we're just going to go and build this thing in Oracle and we're going to leverage Oracle Alloy or whatever, and now suddenly, okay, their data is in their country and it's subject to their laws but I don't have to re-architect to go into one of these, you know, little countries with tin horn dictators.Corey: It's the way to do multi-cloud right, from my perspective. I'll use a component service in a different cloud, I'm under no illusions, though, in doing that I'm increasing my resiliency. I'm not removing single points of failure; I'm adding them. And I make that trade-off on a case-by-case basis, knowingly. But there is a case for some workloads—probably not yours if you're listening to this; assume not, but when you have more context, maybe so—where, okay, we need to be across multiple providers for a variety of strategic or contextual reasons for this workload.That does not mean everything you build needs to be able to do that. It means you're going to make trade-offs for that workload, and understanding the boundaries of where that starts and where that stops is going to be important. That is not the worst idea in the world for a given appropriate workload, that you can optimize stuff into a container and then can run, more or less, anywhere that can take a container. But that is also not the majority of most people's workloads.Chris: Yeah. And I think what that comes back to from the security practitioner standpoint is you have to support not just your primary cloud, your favorite cloud, the one you know, you have to support any cloud. And whether that's, you know, hey, congratulations. Your developers want to use Tailscale because it bypasses a ton of complexity in getting these remote island VPCs from this recent acquisition integrated into your network or because you're going into a new market and you have to support Oracle Cloud in Saudi Arabia, then you as a practitioner have to kind of support any cloud.And so, one of the reasons that I've joined and I'm working on, and so excited about Steampipe is it kind of does give you that. It is a uniform interface to not just AWS, Azure, and Google, but all sorts of clouds, whether it's GitHub or Oracle, or Tailscale. So, that's kind of the message I have for security practitioners at this point is, I tried, I fought, I screamed and yelled and ranted on Twitter, against, you know, doing multi-cloud, but at the end of the day, we were still multi-cloud.Corey: When I see these things evolving, is that, yeah, as a practitioner, we're increasingly having to work across multiple providers, but not to a stupendous depth that's the intimidating thing that scares the hell out of people. I still remember my first time with the AWS console, being so overwhelmed with a number of services, and there were 12. Now, there are hundreds, and I still feel that same sense of being overwhelmed, but I also have the context now to realize that over half of all customer spend globally is on EC2. That's one service. Yes, you need, like, five more to get it to work, but okay.And once you go through learning that to get started, and there's a lot of moving parts around it, like, “Oh, God, I have to do this for every service?” No, take Route 53—my favorite database, but most people use it as a DNS service—you can go start to finish on basically everything that service does that a human being is going to use in less than four hours, and then you're more or less ready to go. Everything is not the hairy beast that is EC2. And most of those services are not for you, whoever you are, whatever you do, most AWS services are not for you. Full stop.Chris: Yes and no. I mean, as a security practitioner, you need to know what your developers are doing, and I've worked in large organizations with lots of things and I would joke that, oh, yeah, I'm sure we're using every service but the IoT, and then I go and I look at our bill, and I was like, “Oh, why are we dropping that much on IoT?” Oh, because they wanted to use the Managed MQTT service.Corey: Ah, I start with the bill because the bill is the source of truth.Chris: Yes, they wanted to use the Managed MQTT service. Okay, great. So, we're now in IoT. But how many of those things have resource policies, how many of those things can be made public, and how many of those things are your CSPM actually checking for and telling you that, hey, a developer has gone out somewhere and made this SageMaker notebook public, or this MQTT topic public. And so, that's where you know, you need to have that level of depth and then you've got to have that level of depth in each cloud. To some extent, if the cloud is just the core basic VMs, object storage, maybe some networking, and a managed relational database, super simple to understand what all you need to do to build a baseline to secure that. As soon as you start adding in on all of the fancy services that AWS has. I re—Corey: Yeah, migrating your Step Functions workflow to other cloud is going to be a living goddamn nightmare. Migrating something that you stuffed into a container and run on EC2 or Fargate is probably going to be a lot simpler. But there are always nuances.Chris: Yep. But the security profile of a Step Function is significantly different. So, you know, there's not much you can do there wrong, yet.Corey: You say that now, but wait for their next security breach, and then we start calling them Stumble Functions instead.Chris: Yeah. I say that. And the next thing, you know, we're going to have something like Lambda [unintelligible 00:30:31] show up and I'm just going to be able to put my Step Function on the internet unauthenticated. Because, you know, that's what Amazon does: they innovate, but they don't necessarily warn security practitioners ahead of their innovation that, hey, you're we're about to release this thing. You might want to prepare for it and adjust your baselines, or talk to your developers, or here's a service control policy that you can drop in place to, you know, like, suppress it for a little bit. No, it's like, “Hey, these things are there,” and by the time you see the tweets or read the documentation, you've got some developer who's put it in production somewhere. And then it becomes a lot more difficult for you as a security practitioner to put the brakes on it.Corey: I really want to thank you for spending so much time talking to me. If people want to learn more and follow your exploits—as they should—where can they find you?Chris: They can find me at steampipe.io/blog. That is where all of my latest rants, raves, research, and how-tos show up.Corey: And we will, of course, put a link to that in the [show notes 00:31:37]. Thank you so much for being so generous with your time. I appreciate it.Chris: Perfect, thank you. You have a good one.Corey: Chris Farris, cloud security nerd at Turbot. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment, and be sure to mention exactly which Azure communications team you work on.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we continue our re:Invent 2022 series joined by Kumar Chellapilla, a general manager of ML and AI Services at AWS. We had the opportunity to speak with Kumar after announcing their recent addition of geospatial data to the SageMaker Platform. In our conversation, we explore Kumar's role as the GM for a diverse array of SageMaker services, what has changed in the geospatial data landscape over the last 10 years, and why Amazon decided now was the right time to invest in geospatial data. We discuss the challenges of accessing and working with this data and the pain points they're trying to solve. Finally, Kumar walks us through a few customer use cases, describes how this addition will make users more effective than they currently are, and shares his thoughts on the future of this space over the next 2-5 years, including the potential intersection of geospatial data and stable diffusion/generative models. The complete show notes for this episode can be found at twimlai.com/go/607
Emmanuel Turlay spent more than a decade in engineering roles at tech-first companies like Instacart and Cruise before realizing machine learning engineers need a better solution. Emmanuel started Sematic earlier this year and was part of the YC summer 2022 batch. He recently raised a $3M seed round from investors including Race Capital and Soma Capital. Thanks to friend of the podcast and former guest Hina Dixit from Samsung NEXT for the intro to Emmanuel.I've been involved with the AutoML space for five years and, for full disclosure, I'm on the board of Auger which is in a related space. I've seen the space evolve and know how much room there is for innovation. This one's a great education about what's broken and what's ahead from a true machine learning pioneer.Listen and learn...How to turn every software engineer into a machine learning engineerHow AutoML platforms are automating tasks performed in traditional ML toolsHow Emmanuel translated learning from Cruise, the self-driving car company, into an open source platform available to all data engineering teamsHow to move from building an ML model locally to deploying it to the cloud and creating a data pipeline... in hoursWhat you should know about self-driving cars... from one of the experts who developed the brains that power themWhy 80% of AI and ML projects failReferences in this episode:Unscrupulous users manipulate LLMs to spew hateHina Dixit from Samsung NEXT on AI and the Future of WorkApache BeamEliot Shmukler, Anomalo CEO, on AI and the Future of Work
About RickRick is the Product Leader of the AWS Optimization team. He previously led the cloud optimization product organization at Turbonomic, and previously was the Microsoft Azure Resource Optimization program owner.Links Referenced: AWS: https://console.aws.amazon.com LinkedIn: https://www.linkedin.com/in/rick-ochs-06469833/ Twitter: https://twitter.com/rickyo1138 TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying and visualization? Modern day, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, whereever they may be at snark.cloud/chronosphere That's snark.cloud/chronosphere Corey: This episode is bought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, thats V-E-E-A-M for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. For those of you who've been listening to this show for a while, the theme has probably emerged, and that is that one of the key values of this show is to give the guest a chance to tell their story. It doesn't beat the guests up about how they approach things, it doesn't call them out for being completely wrong on things because honestly, I'm pretty good at choosing guests, and I don't bring people on that are, you know, walking trash fires. And that is certainly not a concern for this episode.But this might devolve into a screaming loud argument, despite my best effort. Today, I'm joined by Rick Ochs, Principal Product Manager at AWS. Rick, thank you for coming back on the show. The last time we spoke, you were not here you were at, I believe it was Turbonomic.Rick: Yeah, that's right. Thanks for having me on the show, Corey. I'm really excited to talk to you about optimization and my current role and what we're doing.Corey: Well, let's start at the beginning. Principal product manager. It sounds like one of those corporate titles that can mean a different thing in every company or every team that you're talking to. What is your area of responsibility? Where do you start and where do you stop?Rick: Awesome. So, I am the product manager lead for all of AWS Optimizations Team. So, I lead the product team. That includes several other product managers that focus in on Compute Optimizer, Cost Explorer, right-sizing recommendations, as well as Reservation and Savings Plan purchase recommendations.Corey: In other words, you are the person who effectively oversees all of the AWS cost optimization tooling and approaches to same?Rick: Yeah.Corey: Give or take. I mean, you could argue that oh, every team winds up focusing on helping customers save money. I could fight that argument just as effectively. But you effectively start and stop with respect to helping customers save money or understand where the money is going on their AWS bill.Rick: I think that's a fair statement. And I also agree with your comment that I think a lot of service teams do think through those use cases and provide capabilities, you know? There's, like, S3 storage lines. You know, there's all sorts of other products that do offer optimization capabilities as well, but as far as, like, the unified purpose of my team, it is, unilaterally focused on how do we help customers safely reduce their spend and not hurt their business at the same time.Corey: Safely being the key word. For those who are unaware of my day job, I am a partial owner of The Duckbill Group, a consultancy where we fix exactly one problem: the horrifying AWS bill. This is all that I've been doing for the last six years, so I have some opinions on AWS bill reduction as well. So, this is going to be a fun episode for the two of us to wind up, mmm, more or less smacking each other around, but politely because we are both professionals. So, let's start with a very high level. How does AWS think about AWS bills from a customer perspective? You talk about optimizing it, but what does that mean to you?Rick: Yeah. So, I mean, there's a lot of ways to think about it, especially depending on who I'm talking to, you know, where they sit in our organization. I would say I think about optimization in four major themes. The first is how do you scale correctly, whether that's right-sizing or architecting things to scale in and out? The second thing I would say is, how do you do pricing and discounting, whether that's Reservation management, Savings Plan Management, coverage, how do you handle the expenditures of prepayments and things like that?Then I would say suspension. What that means is turn the lights off when you leave the room. We have a lot of customers that do this and I think there's a lot of opportunity for more. Turning EC2 instances off when they're not needed if they're non-production workloads or other, sort of, stateful services that charge by the hour, I think there's a lot of opportunity there.And then the last of the four methods is clean up. And I think it's maybe one of the lowest-hanging fruit, but essentially, are you done using this thing? Delete it. And there's a whole opportunity of cleaning up, you know, IP addresses unattached EBS volumes, sort of, these resources that hang around in AWS accounts that sort of getting lost and forgotten as well. So, those are the four kind of major thematic strategies for how to optimize a cloud environment that we think about and spend a lot of time working on.Corey: I feel like there's—or at least the way that I approach these things—that there are a number of different levels you can look at AWS billing constructs on. The way that I tend to structure most of my engagements when I'm working with clients is we come in and, step one: cool. Why do you care about the AWS bill? It's a weird question to ask because most of the engineering folks look at me like I've just grown a second head. Like, “So, why do you care about your AWS bill?” Like, “What? Why do you? You run a company doing this?”It's no, no, no, it's not that I'm being rhetorical and I don't—I'm trying to be clever somehow and pretend that I don't understand all the nuances around this, but why does your business care about lowering the AWS bill? Because very often, the answer is they kind of don't. What they care about from a business perspective is being able to accurately attribute costs for the service or good that they provide, being able to predict what that spend is going to be, and also yes, a sense of being good stewards of the money that has been entrusted to them by via investors, public markets, or the budget allocation process of their companies and make sure that they're not doing foolish things with it. And that makes an awful lot of sense. It is rare at the corporate level that the stated number one concern is make the bills lower.Because at that point, well, easy enough. Let's just turn off everything you're running in production. You'll save a lot of money in your AWS bill. You won't be in business anymore, but you'll be saving a lot of money on the AWS bill. The answer is always deceptively nuanced and complicated.At least, that's how I see it. Let's also be clear that I talk with a relatively narrow subset of the AWS customer totality. The things that I do are very much intentionally things that do not scale. Definitionally, everything that you do has to scale. How do you wind up approaching this in ways that will work for customers spending billions versus independent learners who are paying for this out of their own personal pocket?Rick: It's not easy [laugh], let me just preface that. The team we have is incredible and we spent so much time thinking about scale and the different personas that engage with our products and how they're—what their experience is when they interact with a bill or AWS platform at large. There's also a couple of different personas here, right? We have a persona that focuses in on that cloud cost, the cloud bill, the finance, whether that's—if an organization is created a FinOps organization, if they have a Cloud Center of Excellence, versus an engineering team that maybe has started to go towards decentralized IT and has some accountability for the spend that they attribute to their AWS bill. And so, these different personas interact with us in really different ways, where Cost Explorer downloading the CUR and taking a look at the bill.And one thing that I always kind of imagine is somebody putting a headlamp on and going into the caves in the depths of their AWS bill and kind of like spelunking through their bill sometimes, right? And so, you have these FinOps folks and billing people that are deeply interested in making sure that the spend they do have meets their business goals, meaning this is providing high value to our company, it's providing high value to our customers, and we're spending on the right things, we're spending the right amount on the right things. Versus the engineering organization that's like, “Hey, how do we configure these resources? What types of instances should we be focused on using? What services should we be building on top of that maybe are more flexible for our business needs?”And so, there's really, like, two major personas that I spend a lot of time—our organization spends a lot of time wrapping our heads around. Because they're really different, very different approaches to how we think about cost. Because you're right, if you just wanted to lower your AWS bill, it's really easy. Just size everything to a t2.nano and you're done and move on [laugh], right? But you're [crosstalk 00:08:53]—Corey: Aw, t3 or t4.nano, depending upon whether regional availability is going to save you less. I'm still better at this. Let's not kid ourselves I kid. Mostly.Rick: For sure. So t4.nano, absolutely.Corey: T4g. Remember, now the way forward is everything has an explicit letter designator to define which processor company made the CPU that underpins the instance itself because that's a level of abstraction we certainly wouldn't want the cloud provider to take away from us any.Rick: Absolutely. And actually, the performance differences of those different processor models can be pretty incredible [laugh]. So, there's huge decisions behind all of that as well.Corey: Oh, yeah. There's so many factors that factor in all these things. It's gotten to a point of you see this usually with lawyers and very senior engineers, but the answer to almost everything is, “It depends.” There are always going to be edge cases. Easy example of, if you check a box and enable an S3 Gateway endpoint inside of a private subnet, suddenly, you're not passing traffic through a 4.5 cent per gigabyte managed NAT Gateway; it's being sent over that endpoint for no additional cost whatsoever.Check the box, save a bunch of money. But there are scenarios where you don't want to do it, so always double-checking and talking to customers about this is critically important. Just because, the first time you make a recommendation that does not work for their constraints, you lose trust. And make a few of those and it looks like you're more or less just making naive recommendations that don't add any value, and they learn to ignore you. So, down the road, when you make a really high-value, great recommendation for them, they stop paying attention.Rick: Absolutely. And we have that really high bar for recommendation accuracy, especially with right sizing, that's such a key one. Although I guess Savings Plan purchase recommendations can be critical as well. If a customer over commits on the amount of Savings Plan purchase they need to make, right, that's a really big problem for them.So, recommendation accuracy must be above reproach. Essentially, if a customer takes a recommendation and it breaks an application, they're probably never going to take another right-sizing recommendation again [laugh]. And so, this bar of trust must be exceptionally high. That's also why out of the box, the compute optimizer recommendations can be a little bit mild, they're a little time because the first order of business is do no harm; focus on the performance requirements of the application first because we have to make sure that the reason you build these workloads in AWS is served.Now ideally, we do that without overspending and without overprovisioning the capacity of these workloads, right? And so, for example, like if we make these right-sizing recommendations from Compute Optimizer, we're taking a look at the utilization of CPU, memory, disk, network, throughput, iops, and we're vending these recommendations to customers. And when you take that recommendation, you must still have great application performance for your business to be served, right? It's such a crucial part of how we optimize and run long-term. Because optimization is not a one-time Band-Aid; it's an ongoing behavior, so it's really critical that for that accuracy to be exceptionally high so we can build business process on top of it as well.Corey: Let me ask you this. How do you contextualize what the right approach to optimization is? What is your entire—there are certain tools that you have… by ‘you,' I mean, of course, as an organization—have repeatedly gone back to and different approaches that don't seem to deviate all that much from year to year, and customer to customer. How do you think about the general things that apply universally?Rick: So, we know that EC2 is a very popular service for us. We know that sizing EC2 is difficult. We think about that optimization pillar of scaling. It's an obvious area for us to help customers. We run into this sort of industry-wide experience where whenever somebody picks the size of a resource, they're going to pick one generally larger than they need.It's almost like asking a new employee at your company, “Hey, pick your laptop. We have a 16 gig model or a 32 gig model. Which one do you want?” That person [laugh] making the decision on capacity, hardware capacity, they're always going to pick the 32 gig model laptop, right? And so, we have this sort of human nature in IT of, we don't want to get called at two in the morning for performance issues, we don't want our apps to fall over, we want them to run really well, so we're going to size things very conservatively and we're going to oversize things.So, we can help customers by providing those recommendations to say, you can size things up in a different way using math and analytics based on the utilization patterns, and we can provide and pick different instance types. There's hundreds and hundreds of instance types in all of these regions across the globe. How do you know which is the right one for every single resource you have? It's a very, very hard problem to solve and it's not something that is lucrative to solve one by one if you have 100 EC2 instances. Trying to pick the correct size for each and every one can take hours and hours of IT engineering resources to look at utilization graphs, look at all of these types available, look at what is the performance difference between processor models and providers of those processors, is there application compatibility constraints that I have to consider? The complexity is astronomical.And then not only that, as soon as you make that sizing decision, one week later, it's out of date and you need a different size. So, [laugh] you didn't really solve the problem. So, we have to programmatically use data science and math to say, “Based on these utilization values, these are the sizes that would make sense for your business, that would have the lowest cost and the highest performance together at the same time.” And it's super important that we provide this capability from a technology standpoint because it would cost so much money to try to solve that problem that the savings you would achieve might not be meaningful. Then at the same time… you know, that's really from an engineering perspective, but when we talk to the FinOps and the finance folks, the conversations are more about Reservations and Savings Plans.How do we correctly apply Savings Plans and Reservations across a high percentage of our portfolio to reduce the costs on those workloads, but not so much that dynamic capacity levels in our organization mean we all of a sudden have a bunch of unused Reservations or Savings Plans? And so, a lot of organizations that engage with us and we have conversations with, we start with the Reservation and Savings Plan conversation because it's much easier to click a few buttons and buy a Savings Plan than to go institute an entire right-sizing campaign across multiple engineering teams. That can be very difficult, a much higher bar. So, some companies are ready to dive into the engineering task of sizing; some are not there yet. And they're a little maybe a little earlier in their FinOps journey, or the building optimization technology stacks, or achieving higher value out of their cloud environments, so starting with kind of the low hanging fruit, it can vary depending on the company, size of company, technical aptitudes, skill sets, all sorts of things like that.And so, those finance-focused teams are definitely spending more time looking at and studying what are the best practices for purchasing Savings Plans, covering my environment, getting the most out of my dollar that way. Then they don't have to engage the engineering teams; they can kind of take a nice chunk off the top of their bill and sort of have something to show for that amount of effort. So, there's a lot of different approaches to start in optimization.Corey: My philosophy runs somewhat counter to this because everything you're saying does work globally, it's safe, it's non-threatening, and then also really, on some level, feels like it is an approach that can be driven forward by finance or business. Whereas my worldview is that cost and architecture in cloud are one and the same. And there are architectural consequences of cost decisions and vice versa that can be adjusted and addressed. Like, one of my favorite party tricks—although I admit, it's a weird party—is I can look at the exploded PDF view of a customer's AWS bill and describe their architecture to them. And people have questioned that a few times, and now I have a testimonial on my client website that mentions, “It was weird how he was able to do this.”Yeah, it's real, I can do it. And it's not a skill, I would recommend cultivating for most people. But it does also mean that I think I'm onto something here, where there's always context that needs to be applied. It feels like there's an entire ecosystem of product companies out there trying to build what amount to a better Cost Explorer that also is not free the way that Cost Explorer is. So, the challenge I see there's they all tend to look more or less the same; there is very little differentiation in that space. And in the fullness of time, Cost Explorer does—ideally—get better. How do you think about it?Rick: Absolutely. If you're looking at ways to understand your bill, there's obviously Cost Explorer, the CUR, that's a very common approach is to take the CUR and put a BI front-end on top of it. That's a common experience. A lot of companies that have chops in that space will do that themselves instead of purchasing a third-party product that does do bill breakdown and dissemination. There's also the cross-charge show-back organizational breakdown and boundaries because you have these super large organizations that have fiefdoms.You know, if HR IT and sales IT, and [laugh] you know, product IT, you have all these different IT departments that are fiefdoms within your AWS bill and construct, whether they have different ABS accounts or say different AWS organizations sometimes, right, it can get extremely complicated. And some organizations require the ability to break down their bill based on those organizational boundaries. Maybe tagging works, maybe it doesn't. Maybe they do that by using a third-party product that lets them set custom scopes on their resources based on organizational boundaries. That's a common approach as well.We do also have our first-party solutions, they can do that, like the CUDOS dashboard as well. That's something that's really popular and highly used across our customer base. It allows you to have kind of a dashboard and customizable view of your AWS costs and, kind of, split it up based on tag organizational value, account name, and things like that as well. So, you mentioned that you feel like the architectural and cost problem is the same problem. I really don't disagree with that at all.I think what it comes down to is some organizations are prepared to tackle the architectural elements of cost and some are not. And it really comes down to how does the customer view their bill? Is it somebody in the finance organization looking at the bill? Is it somebody in the engineering organization looking at the bill? Ideally, it would be both.Ideally, you would have some of those skill sets that overlap, or you would have an organization that does focus in on FinOps or cloud operations as it relates to cost. But then at the same time, there are organizations that are like, “Hey, we need to go to cloud. Our CIO told us go to cloud. We don't want to pay the lease renewal on this building.” There's a lot of reasons why customers move to cloud, a lot of great reasons, right? Three major reasons you move to cloud: agility, [crosstalk 00:20:11]—Corey: And several terrible ones.Rick: Yeah, [laugh] and some not-so-great ones, too. So, there's so many different dynamics that get exposed when customers engage with us that they might or might not be ready to engage on the architectural element of how to build hyperscale systems. So, many of these customers are bringing legacy workloads and applications to the cloud, and something like a re-architecture to use stateless resources or something like Spot, that's just not possible for them. So, how can they take 20% off the top of their bill? Savings Plans or Reservations are kind of that easy, low-hanging fruit answer to just say, “We know these are fairly static environments that don't change a whole lot, that are going to exist for some amount of time.”They're legacy, you know, we can't turn them off. It doesn't make sense to rewrite these applications because they just don't change, they don't have high business value, or something like that. And so, the architecture part of that conversation doesn't always come into play. Should it? Yes.The long-term maturity and approach for cloud optimization does absolutely account for architecture, thinking strategically about how you do scaling, what services you're using, are you going down the Kubernetes path, which I know you're going to laugh about, but you know, how do you take these applications and componentize them? What services are you using to do that? How do you get that long-term scale and manageability out of those environments? Like you said at the beginning, the complexity is staggering and there's no one unified answer. That's why there's so many different entrance paths into, “How do I optimize my AWS bill?”There's no one answer, and every customer I talk to has a different comfort level and appetite. And some of them have tried suspension, some of them have gone heavy down Savings Plans, some of them want to dabble in right-sizing. So, every customer is different and we want to provide those capabilities for all of those different customers that have different appetites or comfort levels with each of these approaches.Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill. That's R - E - D - I - S dot com slash duckbill.Corey: And I think that's very fair. I think that it is not necessarily a bad thing that you wind up presenting a lot of these options to customers. But there are some rough edges. An example of this is something I encountered myself somewhat recently and put on Twitter—because I have those kinds of problems—where originally, I remember this, that you were able to buy hourly Savings Plans, which again, Savings Plans are great; no knock there. I would wish that they applied to more services rather than, “Oh, SageMaker is going to do its own Savings Pla”—no, stop keeping me from going from something where I have to manage myself on EC2 to something you manage for me and making that cost money. You nailed it with Fargate. You nailed it with Lambda. Please just have one unified Savings Plan thing. But I digress.But you had a limit, once upon a time, of $1,000 per hour. Now, it's $5,000 per hour, which I believe in a three-year all-up-front means you will cheerfully add $130 million purchase to your shopping cart. And I kept adding a bunch of them and then had a little over a billion dollars a single button click away from being charged to my account. Let me begin with what's up with that?Rick: [laugh]. Thank you for the tweet, by the way, Corey.Corey: Always thrilled to ruin your month, Rick. You know that.Rick: Yeah. Fantastic. We took that tweet—you know, it was tongue in cheek, but also it was a serious opportunity for us to ask a question of what does happen? And it's something we did ask internally and have some fun conversations about. I can tell you that if you clicked purchase, it would have been declined [laugh]. So, you would have not been—Corey: Yeah, American Express would have had a problem with that. But the question is, would you have attempted to charge American Express, or would something internally have gone, “This has a few too many commas for us to wind up presenting it to the card issuer with a straight face?”Rick: [laugh]. Right. So, it wouldn't have gone through and I can tell you that, you know, if your account was on a PO-based configuration, you know, it would have gone to the account team. And it would have gone through our standard process for having a conversation with our customer there. That being said, we are—it's an awesome opportunity for us to examine what is that shopping cart experience.We did increase the limit, you're right. And we increased the limit for a lot of reasons that we sat down and worked through, but at the same time, there's always an opportunity for improvement of our product and experience, we want to make sure that it's really easy and lightweight to use our products, especially purchasing Savings Plans. Savings Plans are already kind of wrought with mental concern and risk of purchasing something so expensive and large that has a big impact on your AWS bill, so we don't really want to add any more friction necessarily the process but we do want to build an awareness and make sure customers understand, “Hey, you're purchasing this. This has a pretty big impact.” And so, we're also looking at other ways we can kind of improve the ability for the Savings Plan shopping cart experience to ensure customers don't put themselves in a position where you have to unwind or make phone calls and say, “Oops.” Right? We [laugh] want to avoid those sorts of situations for our customers. So, we are looking at quite a few additional improvements to that experience as well that I'm really excited about that I really can't share here, but stay tuned.Corey: I am looking forward to it. I will say the counterpoint to that is having worked with customers who do make large eight-figure purchases at once, there's a psychology element that plays into it. Everyone is very scared to click the button on the ‘Buy It Now' thing or the ‘Approve It.' So, what I've often found is at that scale, one, you can reduce what you're buying by half of it, and then see how that treats you and then continue to iterate forward rather than doing it all at once, or reach out to your account team and have them orchestrate the buy. In previous engagements, I had a customer do this religiously and at one point, the concierge team bought the wrong thing in the wrong region, and from my perspective, I would much rather have AWS apologize for that and fix it on their end, than from us having to go with a customer side of, “Oh crap, oh, crap. Please be nice to us.”Not that I doubt you would do it, but that's not the nervous conversation I want to have in quite the same way. It just seems odd to me that someone would want to make that scale of purchase without ever talking to a human. I mean, I get it. I'm as antisocial as they come some days, but for that kind of money, I kind of just want another human being to validate that I'm not making a giant mistake.Rick: We love that. That's such a tremendous opportunity for us to engage and discuss with an organization that's going to make a large commitment, that here's the impact, here's how we can help. How does it align to our strategy? We also do recommend, from a strategic perspective, those more incremental purchases. I think it creates a better experience long-term when you don't have a single Savings Plan that's going to expire on a specific day that all of a sudden increases your entire bill by a significant percentage.So, making staggered monthly purchases makes a lot of sense. And it also works better for incremental growth, right? If your organization is growing 5% month-over-month or year-over-year or something like that, you can purchase those incremental Savings Plans that sort of stack up on top of each other and then you don't have that risk of a cliff one day where one super-large SP expires and boom, you have to scramble and repurchase within minutes because every minute that goes by is an additional expense, right? That's not a great experience. And so that's, really, a large part of why those staggered purchase experiences make a lot of sense.That being said, a lot of companies do their math and their finance in different ways. And single large purchases makes sense to go through their process and their rigor as well. So, we try to support both types of purchasing patterns.Corey: I think that is an underappreciated aspect of cloud cost savings and cloud cost optimization, where it is much more about humans than it is about math. I see this most notably when I'm helping customers negotiate their AWS contracts with AWS, where they are often perspectives such as, “Well, we feel like we really got screwed over last time, so we want to stick it to them and make them give us a bigger percentage discount on something.” And it's like, look, you can do that, but I would much rather, if it were me, go for something that moves the needle on your actual business and empowers you to move faster, more effectively, and lead to an outcome that is a positive for everyone versus the well, we're just going to be difficult in this one point because they were difficult on something last time. But ego is a thing. Human psychology is never going to have an API for it. And again, customers get to decide their own destiny in some cases.Rick: I completely agree. I've actually experienced that. So, this is the third company I've been working at on Cloud optimization. I spent several years at Microsoft running an optimization program. I went to Turbonomic for several years, building out the right-sizing and savings plan reservation purchase capabilities there, and now here at AWS.And through all of these journeys and experiences working with companies to help optimize their cloud spend, I can tell you that the psychological needle—moving the needle is significantly harder than the technology stack of sizing something correctly or deleting something that's unused. We can solve the technology part. We can build great products that identify opportunities to save money. There's still this psychological component of IT, for the last several decades has gone through this maturity curve of if it's not broken, don't touch it. Five-nines, six sigma, all of these methods of IT sort of rationalizing do no harm, don't touch anything, everything must be up.And it even kind of goes back several decades. Back when if you rebooted a physical server, the motherboard capacitors would pop, right? So, there's even this anti—or this stigma against even rebooting servers sometimes. In the cloud really does away with a lot of that stuff because we have live migration and we have all of these, sort of, stateless designs and capabilities, but we still carry along with us this mentality of don't touch it; it might fall over. And we have to really get past that.And that means that the trust, we went back to the trust conversation where we talk about the recommendations must be incredibly accurate. You're risking your job, in some cases; if you are a DevOps engineer, and your commitments on your yearly goals are uptime, latency, response time, load time, these sorts of things, these operational metrics, KPIs that you use, you don't want to take a downsized recommendation. It has a severe risk of harming your job and your bonus.Corey: “These instances are idle. Turn them off.” It's like, yeah, these instances are the backup site, or the DR environment, or—Rick: Exactly.Corey: —something that takes very bursty but occasional traffic. And yeah, I know it costs us some money, but here's the revenue figures for having that thing available. Like, “Oh, yeah. Maybe we should shut up and not make dumb recommendations around things,” is the human response, but computers don't have that context.Rick: Absolutely. And so, the accuracy and trust component has to be the highest bar we meet for any optimization activity or behavior. We have to circumvent or supersede the human aversion, the risk aversion, that IT is built on, right?Corey: Oh, absolutely. And let's be clear, we see this all the time where I'm talking to customers and they have been burned before because we tried to save money and then we took a production outage as a side effect of a change that we made, and now we're not allowed to try to save money anymore. And there's a hidden truth in there, which is auto-scaling is something that a lot of customers talk about, but very few have instrumented true auto-scaling because they interpret is we can scale up to meet demand. Because yeah, if you don't do that you're dropping customers on the floor.Well, what about scaling back down again? And the answer there is like, yeah, that's not really a priority because it's just money. We're not disappointing customers, causing brand reputation, and we're still able to take people's money when that happens. It's only money; we can fix it later. Covid shined a real light on a lot of the stuff just because there are customers that we've spoken to who's—their user traffic dropped off a cliff, infrastructure spend remained constant day over day.And yeah, they believe, genuinely, they were auto-scaling. The most interesting lies are the ones that customers tell themselves, but the bill speaks. So, getting a lot of modernization traction from things like that was really neat to watch. But customers I don't think necessarily intuitively understand most aspects of their bill because it is a multidisciplinary problem. It's engineering, its finance, its accounting—which is not the same thing as finance—and you need all three of those constituencies to be able to communicate effectively using a shared and common language. It feels like we're marriage counseling between engineering and finance, most weeks.Rick: Absolutely, we are. And it's important we get it right, that the data is accurate, that the recommendations we provide are trustworthy. If the finance team gets their hands on the savings potential they see out of right-sizing, takes it to engineering, and then engineering comes back and says, “No, no, no, we can't actually do that. We can't actually size those,” right, we have problems. And they're cultural, they're transformational. Organizations' appetite for these things varies greatly and so it's important that we address that problem from all of those angles. And it's not easy to do.Corey: How big do you find the optimization problem is when you talk to customers? How focused are they on it? I have my answers, but that's the scale of anec-data. I want to hear your actual answer.Rick: Yeah. So, we talk with a lot of customers that are very interested in optimization. And we're very interested in helping them on the journey towards having an optimal estate. There are so many nuances and barriers, most of them psychological like we already talked about.I think there's this opportunity for us to go do better exposing the potential of what an optimal AWS estate would look like from a dollar and savings perspective. And so, I think it's kind of not well understood. I think it's one of the biggest areas or barriers of companies really attacking the optimization problem with more vigor is if they knew that the potential savings they could achieve out of their AWS environment would really align their spend much more closely with the business value they get, I think everybody would go bonkers. And so, I'm really excited about us making progress on exposing that capability or the total savings potential and amount. It's something we're looking into doing in a much more obvious way.And we're really excited about customers doing that on AWS where they know they can trust AWS to get the best value for their cloud spend, that it's a long-term good bet because their resources that they're using on AWS are all focused on giving business value. And that's the whole key. How can we align the dollars to the business value, right? And I think optimization is that connection between those two concepts.Corey: Companies are generally not going to greenlight a project whose sole job is to save money unless there's something very urgent going on. What will happen is as they iterate forward on the next generation of services or a migration of a service from one thing to another, they will make design decisions that benefit those optimizations. There's low-hanging fruit we can find, usually of the form, “Turn that thing off,” or, “Configure this thing slightly differently,” that doesn't take a lot of engineering effort in place. But, on some level, it is not worth the engineering effort it takes to do an optimization project. We've all met those engineers—speaking is one of them myself—who, left to our own devices, will spend two months just knocking a few hundred bucks a month off of our AWS developer environment.We steal more than office supplies. I'm not entirely sure what the business value of doing that is, in most cases. For me, yes, okay, things that work in small environments work very well in large environments, generally speaking, so I learned how to save 80 cents here and that's a few million bucks a month somewhere else. Most folks don't have that benefit happening, so it's a question of meeting them where they are.Rick: Absolutely. And I think the scale component is huge, which you just touched on. When you're talking about a hundred EC2 instances versus a thousand, optimization becomes kind of a different component of how you manage that AWS environment. And while single-decision recommendations to scale an individual server, the dollar amount might be different, the percentages are just about the same when you look at what is it to be sized correctly, what is it to be configured correctly? And so, it really does come down to priority.And so, it's really important to really support all of those companies of all different sizes and industries because they will have different experiences on AWS. And some will have more sensitivity to cost than others, but all of them want to get great business value out of their AWS spend. And so, as long as we're meeting that need and we're supporting our customers to make sure they understand the commitment we have to ensuring that their AWS spend is valuable, it is meaningful, right, they're not spending money on things that are not adding value, that's really important to us.Corey: I do want to have as the last topic of discussion here, how AWS views optimization, where there have been a number of repeated statements where helping customers optimize their cloud spend is extremely important to us. And I'm trying to figure out where that falls on the spectrum from, “It's the thing we say because they make us say it, but no, we're here to milk them like cows,” all the way on over to, “No, no, we passionately believe in this at every level, top to bottom, in every company. We are just bad at it.” So, I'm trying to understand how that winds up being expressed from your lived experience having solved this problem first outside, and then inside.Rick: Yeah. So, it's kind of like part of my personal story. It's the main reason I joined AWS. And, you know, when you go through the interview loops and you talk to the leaders of an organization you're thinking about joining, they always stop at the end of the interview and ask, “Do you have any questions for us?” And I asked that question to pretty much every single person I interviewed with. Like, “What is AWS's appetite for helping customers save money?”Because, like, from a business perspective, it kind of is a little bit wonky, right? But the answers were varied, and all of them were customer-obsessed and passionate. And I got this sense that my personal passion for helping companies have better efficiency of their IT resources was an absolute primary goal of AWS and a big element of Amazon's leadership principle, be customer obsessed. Now, I'm not a spokesperson, so [laugh] we'll see, but we are deeply interested in making sure our customers have a great long-term experience and a high-trust relationship. And so, when I asked these questions in these interviews, the answers were all about, “We have to do the right thing for the customer. It's imperative. It's also in our DNA. It's one of the most important leadership principles we have to be customer-obsessed.”And it is the primary reason why I joined: because of that answer to that question. Because it's so important that we achieve a better efficiency for our IT resources, not just for, like, AWS, but for our planet. If we can reduce consumption patterns and usage across the planet for how we use data centers and all the power that goes into them, we can talk about meaningful reductions of greenhouse gas emissions, the cost and energy needed to run IT business applications, and not only that, but most all new technology that's developed in the world seems to come out of a data center these days, we have a real opportunity to make a material impact to how much resource we use to build and use these things. And I think we owe it to the planet, to humanity, and I think Amazon takes that really seriously. And I'm really excited to be here because of that.Corey: As I recall—and feel free to make sure that this comment never sees the light of day—you asked me before interviewing for the role and then deciding to accept it, what I thought about you working there and whether I would recommend it, whether I wouldn't. And I think my answer was fairly nuanced. And you're working there now and we still are on speaking terms, so people can probably guess what my comments took the shape of, generally speaking. So, I'm going to have to ask now; it's been, what, a year since you joined?Rick: Almost. I think it's been about eight months.Corey: Time during a pandemic is always strange. But I have to ask, did I steer you wrong?Rick: No. Definitely not. I'm very happy to be here. The opportunity to help such a broad range of companies get more value out of technology—and it's not just cost, right, like we talked about. It's actually not about the dollar number going down on a bill. It's about getting more value and moving the needle on how do we efficiently use technology to solve business needs.And that's been my career goal for a really long time, I've been working on optimization for, like, seven or eight, I don't know, maybe even nine years now. And it's like this strange passion for me, this combination of my dad taught me how to be a really good steward of money and a great budget manager, and then my passion for technology. So, it's this really cool combination of, like, childhood life skills that really came together for me to create a career that I'm really passionate about. And this move to AWS has been such a tremendous way to supercharge my ability to scale my personal mission, and really align it to AWS's broader mission of helping companies achieve more with cloud platforms, right?And so, it's been a really nice eight months. It's been wild. Learning AWS culture has been wild. It's a sharp diverging culture from where I've been in the past, but it's also really cool to experience the leadership principles in action. They're not just things we put on a website; they're actually things people talk about every day [laugh]. And so, that journey has been humbling and a great learning opportunity as well.Corey: If people want to learn more, where's the best place to find you?Rick: Oh, yeah. Contact me on LinkedIn or Twitter. My Twitter account is @rickyo1138. Let me know if you get the 1138 reference. That's a fun one.Corey: THX 1138. Who doesn't?Rick: Yeah, there you go. And it's hidden in almost every single George Lucas movie as well. You can contact me on any of those social media platforms and I'd be happy to engage with anybody that's interested in optimization, cloud technology, bill, anything like that. Or even not [laugh]. Even anything else, either.Corey: Thank you so much for being so generous with your time. I really appreciate it.Rick: My pleasure, Corey. It was wonderful talking to you.Corey: Rick Ochs, Principal Product Manager at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment, rightly pointing out that while AWS is great and all, Azure is far more cost-effective for your workloads because, given their lack security, it is trivially easy to just run your workloads in someone else's account.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About VictorVictor is an Independent Senior Cloud Infrastructure Architect working mainly on Amazon Web Services (AWS), designing: secure, scalable, reliable, and cost-effective cloud architectures, dealing with large-scale and mission-critical distributed systems. He also has a long experience in Cloud Operations, Security Advisory, Security Hardening (DevSecOps), Modern Applications Design, Micro-services and Serverless, Infrastructure Refactoring, Cost Saving (FinOps).Links Referenced: Zoph: https://zoph.io/ unusd.cloud: https://unusd.cloud Twitter: https://twitter.com/zoph LinkedIn: https://www.linkedin.com/in/grenuv/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Datadog. Datadog's SaaS monitoring and security platform that enables full stack observability for developers, IT operations, security, and business teams in the cloud age. Datadog's platform, along with 500 plus vendor integrations, allows you to correlate metrics, traces, logs, and security signals across your applications, infrastructure, and third party services in a single pane of glass.Combine these with drag and drop dashboards and machine learning based alerts to help teams troubleshoot and collaborate more effectively, prevent downtime, and enhance performance and reliability. Try Datadog in your environment today with a free 14 day trial and get a complimentary T-shirt when you install the agent.To learn more, visit datadoghq.com/screaminginthecloud to get. That's www.datadoghq.com/screaminginthecloudCorey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming That's GO M-O-M-E-N-T-O dot co slash screamingCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the best parts about running a podcast like this and trolling the internet of AWS things is every once in a while, I get to learn something radically different than what I expected. For a long time, there's been this sort of persona or brand in the AWS space, specifically the security side of it, going by Zoph—that's Z-O-P-H—and I just assumed it was a collective or a whole bunch of people working on things, and it turns out that nope, it is just one person. And that one person is my guest today. Victor Grenu is an independent AWS architect. Victor, thank you for joining me.Victor: Hey, Corey, thank you for having me. It's a pleasure to be here.Corey: So, I want to start by diving into the thing that first really put you on my radar, though I didn't realize it was you at the time. You have what can only be described as an army of Twitter bots around the AWS ecosystem. And I don't even know that I'm necessarily following all of them, but what are these bots and what do they do?Victor: Yeah. I have a few bots on Twitter that I push some notification, some tweets, when things happen on AWS security space, especially when the AWS managed policies are updated from AWS. And it comes from an initial project from Scott Piper. He was running a Git command on his own laptop to push the history of AWS managed policy. And it told me that I can automate this thing using a deployment pipeline and so on, and to tweet every time a new change is detected from AWS. So, the idea is to monitor every change on these policies.Corey: It's kind of wild because I built a number of somewhat similar Twitter bots, only instead of trying to make them into something useful, I'd make them into something more than a little bit horrifying and extraordinarily obnoxious. Like there's a Cloud Boomer Twitter account that winds up tweeting every time Azure tweets something only it quote-tweets them in all caps and says something insulting. I have an AWS releases bot called AWS Cwoud—so that's C-W-O-U-D—and that winds up converting it to OwO speak. It's like, “Yay a new auto-scawowing growp.” That sort of thing is obnoxious and offensive, but it makes me laugh.Yours, on the other hand, are things that I have notifications turned on for just because when they announce something, it's generally fairly important. The first one that I discovered was your IAM changes bot. And I found some terrifying things coming out of that from time to time. What's the data source for that? Because I'm just grabbing other people's Twitter feeds or RSS feeds; you're clearly going deeper than that.Victor: Yeah, the data source is the official AWS managed policy. In fact, I run AWS CLI in the background and I'm doing just a list policy, the list policy command, and with this list I'm doing git of each policy that is returned, so I can enter it in a git repository to get the full history of the time. And I also craft a list of deprecated policy, and I also run, like, a dog-food initiative, the policy analysis, validation analysis from AWS tools to validate the consistency and the accuracy of the own policies. So, there is a policy validation with their own tool. [laugh].Corey: You would think that wouldn't turn up anything because their policy validator effectively acts as a linter, so if it throws an error, of course, you wouldn't wind up pushing that. And yet, somehow the fact that you have bothered to hook that up and have findings from it indicates that that's not how the real world works.Victor: Yeah, there is some, let's say, some false positive because we are running the policy validation with their own linter then own policies, but this is something that is documented from AWS. So, there is an official page where you can find why the linter is not working on each policy and why. There is a an explanation for each findings. I thinking of [unintelligible 00:05:05] managed policy, which is too long, and policy analyzer is crashing because the policy is too long.Corey: Excellent. It's odd to me that you have gone down this path because it's easy enough to look at this and assume that, oh, this must just be something you do for fun or as an aspect of your day job. So, I did a little digging into what your day job is, and this rings very familiar to me: you are an independent AWS consultant, only you're based out of Paris, whereas I was doing this from San Francisco, due to an escalatingly poor series of life choices on my part. What do you focus on in the AWS consulting world?Victor: Yeah. I'm running an AWS consulting boutique in Paris and I'm working for a large customer in France. And I'm doing mostly infrastructure stuff, infrastructure design for cloud-native application, and I'm also doing some security audits and [unintelligible 00:06:07] mediation for my customer.Corey: It seems to me that there's a definite divide as far as how people find the AWS consulting experience to be. And I'm not trying to cast judgment here, but the stories that I hear tend to fall into one of two categories. One of them is the story that you have, where you're doing this independently, you've been on your own for a while working specifically on this, and then there's the stories of, “Oh, yeah, I work for a 500 person consultancy and we do everything as long as they'll pay us money. If they've got money, we'll do it. Why not?”And it always seems to me—not to be overly judgy—but the independent consultants just seem happier about it because for better or worse, we get to choose what we focus on in a way that I don't think you do at a larger company.Victor: Yeah. It's the same in France or in Europe; there is a lot of consulting firms. But with the pandemic and with the market where we are working, in the cloud, in the cloud-native solution and so on, that there is a lot of demands. And the natural path is to start by working for a consulting firm and then when you are ready, when you have many AWS certification, when you have the experience of the customer, when you have a network of well-known customer, and you gain trust from your customer, I think it's natural to go by yourself, to be independent and to choose your own project and your own customer.Corey: I'm curious to get your take on what your perception of being an AWS consultant is when you're based in Paris versus, in my case, being based in the West Coast of the United States. And I know that's a bit of a strange question, but even when I travel, for example, over to the East Coast, suddenly, my own newsletter sends out three hours later in the day than I expect it to and that throws me for a loop. The AWS announcements don't come out at two or three in the afternoon; they come out at dinnertime. And for you, it must be in the middle of the night when a lot of those things wind up dropping. The AWS stuff, not my newsletter. I imagine you're not excitedly waiting on tenterhooks to see what this week's issue of Last Week in AWS talks about like I am.But I'm curious is that even beyond that, how do you experience the market? From what you're perceiving people in the United States talking about as AWS consultants versus what you see in Paris?Victor: It's difficult, but in fact, I don't have so much information about the independent in the US. I know that there is a lot, but I think it's more common in Europe. And yeah, it's an advantage to whoever ten-hour time [unintelligible 00:08:56] from the US because a lot of stuff happen on the Pacific time, on the Seattle timezone, on San Francisco timezone. So, for example, for this podcast, my Monday is over right now, so, so yeah, I have some advantage in time, but yeah.Corey: This is potentially an odd question for you. But I find an awful lot of the AWS documentation to be challenging, we'll call it. I don't always understand exactly what it's trying to tell me, and it's not at all clear that the person writing the documentation about a service in some cases has ever used the service. And in everything I just said, there is no language barrier. This documentation was written—theoretically—in English and I, most days, can stumble through a sentence in English and almost no other language. You obviously speak French as a first language. Given that you live in Paris, it seems to be a relatively common affliction. How do you find interacting with AWS in French goes? Or is it just a complete nonstarter, and it all has to happen in English for you?Victor: No, in fact, the consultants in Europe, I think—in fact, in my part, I'm using my laptop in English, I'm using my phone in English, I'm using the AWS console in English, and so on. So, the documentation for me is a switch on English first because for the other language, there is sometimes some automated translation that is very dangerous sometimes, so we all keep the documentation and the materials in English.Corey: It's wild to me just looking at how challenging so much of the stuff is. Having to then work in a second language on top of that, it just seems almost insurmountable to me. It's good they have automated translation for a lot of this stuff, but that falls down in often hilariously disastrous ways, sometimes. It's wild to me that even taking most programming languages that folks have ever heard of, even if you program and speak no English, which happens in a large part of the world, you're still using if statements even if the term ‘if' doesn't mean anything to you localized in your language. It really is, in many respects, an English-centric industry.Victor: Yeah. Completely. Even in French for our large French customer, I'm writing the PowerPoint presentation in English, some emails are in English, even if all the folks in the thread are French. So yeah.Corey: One other area that I wanted to explore with you a bit is that you are very clearly focused on security as a primary area of interest. Does that manifest in the work that you do as well? Do you find that your consulting engagements tend to have a high degree of focus on security?Victor: Yeah. In my design, when I'm doing some AWS architecture, my main objective is to design some security architecture and security patterns that apply best practices and least privilege. But often, I'm working for engagement on security audits, for startups, for internal customer, for diverse company, and then doing some accommodation after all. And to run my audit, I'm using some open-source tooling, some custom scripts, and so on. I have a methodology that I'm running for each customer. And the goal is to sometime to prepare some certification, PCI DSS or so on, or maybe to ensure that the best practice are correctly applied on a workload or before go-live or, yeah.Corey: One of the weird things about this to me is that I've said for a long time that cost and security tend to be inextricably linked, as far as being a sort of trailing reactive afterthought for an awful lot of companies. They care about both of those things right after they failed to adequately care about those things. At least in the cloud economic space, it's only money as opposed to, “Oops, we accidentally lost our customers' data.” So, I always found that I find myself drifting in a security direction if I don't stop myself, just based upon a lot of the cost work I do. Conversely, it seems that you have come from the security side and you find yourself drifting in a costing direction.Your side project is a SaaS offering called unusd.cloud, that's U-N-U-S-D dot cloud. And when you first mentioned this to me, my immediate reaction was, “Oh, great. Another SaaS platform for costing. Let's tear this one apart, too.” Except I actually like what you're building. Tell me about it.Victor: Yeah, and unusd.cloud is a side project for me and I was working since, let's say one year. It was a project that I've deployed for some of my customer on their local account, and it was very useful. And so, I was thinking that it could be a SaaS project. So, I've worked at [unintelligible 00:14:21] so yeah, a few months on shifting the product to assess [unintelligible 00:14:27].The product aim to detect the worst on AWS account on all AWS region, and it scan all your AWS accounts and all your region, and you try to detect and use the EC2, LDS, Glue [unintelligible 00:14:45], SageMaker, and so on, and attach a EBS and so on. I don't craft a new dashboard, a new Cost Explorer, and so on. It's it just cost awareness, it's just a notification on email or Slack or Microsoft Teams. And you just add your AWS account on the project and you schedule, let's say, once a day, and it scan, and it send you a cost of wellness, a [unintelligible 00:15:17] detection, and you can act by turning off what is not used.Corey: What I like about this is it cuts at the number one rule of cloud economics, which is turn that shit off if you're not using it. You wouldn't think that I would need to say that except that everyone seems to be missing that, on some level. And it's easy to do. When you need to spin something up and it's not there, you're very highly incentivized to spin that thing up. When you're not using it, you have to remember that thing exists, otherwise it just sort of sits there forever and doesn't do anything.It just costs money and doesn't generate any value in return for that. What you got right is you've also eviscerated my most common complaint about tools that claim to do this, which is you build in either a explicit rule of ignore this resource or ignore resources with the following tags. The benefit there is that you're not constantly giving me useless advice, like, “Oh, yeah, turn off this idle thing.” It's, yeah, that's there for a reason, maybe it's my dev box, maybe it's my backup site, maybe it's the entire DR environment that I'm going to need at little notice. It solves for that problem beautifully. And though a lot of tools out there claim to do stuff like this, most of them really failed to deliver on that promise.Victor: Yeah, I just want to keep it simple. I don't want to add an additional console and so on. And you are correct. You can apply a simple tag on your asset, let's say an EC2 instances, you apply the tag in use and the value of, and then the alerting is disabled for this asset. And the detection is based on the CPU [unintelligible 00:17:01] and the network health metrics, so when the instances is not used in the last seven days, with a low CPU every [unintelligible 00:17:10] and low network out, it comes as a suspect. [laugh].[midroll 00:17:17]Corey: One thing that I like about what you've done, but also have some reservations about it is that you have not done with so many of these tools do which is, “Oh, just give us all the access in your account. It'll be fine. You can trust us. Don't you want to save money?” And yeah, but I also still want to have a company left when all sudden done.You are very specific on what it is that you're allowed to access, and it's great. I would argue, on some level, it's almost too restrictive. For example, you have the ability to look at EC2, Glue, IAM—just to look at account aliases, great—RDS, Redshift, and SageMaker. And all of these are simply list and describe. There's no gets in there other than in Cost Explorer, which makes sense. You're not able to go rummaging through my data and see what's there. But that also bounds you, on some level, to being able to look only at particular types of resources. Is that accurate or are you using a lot of the CloudWatch stuff and Cost Explorer stuff to see other areas?Victor: In fact, it's the least privilege and read-only permission because I don't want too much question for the security team. So, it's full read-only permission. And I've only added the detection that I'm currently supports. Then if in some weeks, in some months, I'm adding a new detection, let's say for Snapshot, for example, I will need to update, so I will ask my customer to update their template. There is a mechanisms inside the project to tell them that the template is obsolete, but it's not a breaking change.So, the detection will continue, but without the new detection, the new snapshot detection, let's say. So yeah, it's least privilege, and all I need is the get-metric-statistics from CloudWatch to detect unused assets. And also checking [unintelligible 00:19:16] Elastic IP or [unintelligible 00:19:19] EBS volume. So, there is no CloudWatching in this detection.Corey: Also, to be clear, I am not suggesting that what you have done is at all a mistake, even if you bound it to those resources right now. But just because everyone loves to talk about these exciting, amazing, high-level services that AWS has put up there, for example, oh, what about DocumentDB or all these other—you know, Amazon Basics MongoDB; same thing—or all of these other things that they wind up offering, but you take a look at where customers are spending money and where they're surprised to be spending money, it's EC2, it's a bit of RDS, occasionally it's S3, but that's a lot harder to detect automatically whether that data is unused. It's, “You haven't been using this data very much.” It's, “Well, you see how the bucket is labeled ‘Archive Backups' or ‘Regulatory Logs?'” imagine that. What a ridiculous concept.Yeah. Whereas an idle EC2 instance sort of can wind up being useful on this. I am curious whether you encounter in the wild in your customer base, folks who are having idle-looking EC2 instances, but are in fact, for example, using a whole bunch of RAM, which you can't tell from the outside without custom CloudWatch agents.Victor: Yeah, I'm not detecting this behavior for larger usage of RAM, for example, or for maybe there is some custom application that is low in CPU and don't talk to any other services using the network, but with this detection, with the current state of the detection, I'm covering large majority of waste because what I see from my customer is that there is some teams, some data scientists or data teams who are experimenting a lot with SageMaker with Glue, with Endpoint and so on. And this is very expensive at the end of the day because they don't turn off the light at the end of the day, on Friday evening. So, what I'm trying to solve here is to notify the team—so on Slack—when they forgot to turn off the most common waste on AWS, so EC2, LTS, Redshift.Corey: I just now wound up installing it while we've been talking on my dedicated shitposting account, and sure enough, it already spat out a single instance it found, which yeah was running an EC2 instance on the East Coast when I was just there, so that I had a DNS server that was a little bit more local. Okay, great. And it's a T4g.micro, so it's not exactly a whole lot of money, but it does exactly what it says on the tin. It didn't wind up nailing the other instances I have in that account that I'm using for a variety of different things, which is good.And it further didn't wind up falling into the trap that so many things do, which is the, “Oh, it's costing you zero and your spend this month is zero because this account is where I dump all of my AWS credit codes.” So, many things say, “Oh, well, it's not costing you anything, so what's the problem?” And then that's how you accidentally lose $100,000 in activate credits because someone left something running way too long. It does a lot of the right things that I would hope and expect it to do, and the fact that you don't do that is kind of amazing.Victor: Yeah. It was a need from my customer and an opportunity. It's a small bet for me because I'm trying to do some small bets, you know, the small bets approach, so the idea is to try a new thing. It's also an excuse for me to learn something new because building a SaaS is a challenging.Corey: One thing that I am curious about, in this account, I'm also running the controller for my home WiFi environment. And that's not huge. It's T3.small, but it is still something out there that it sits there because I need it to exist. But it's relatively bored.If I go back and look over the last week of CloudWatch metrics, for example, it doesn't look like it's usually busy. I'm sure there's some network traffic in and out as it updates itself and whatnot, but the CPU peeks out at a little under 2% used. It didn't warn on this and it got it right. I'm just curious as to how you did that. What is it looking for to determine whether this instance is unused or not?Victor: It's the magic [laugh]. There is some intelligence artif—no, I'm just kidding. It just statistics. And I'm getting two metrics, the superior average from the last seven days and the network out. And I'm getting the average on those metrics and I'm doing some assumption that this EC2, this specific EC2 is not used because of these metrics, this server average.Corey: Yeah, it is wild to me just that this is working as well as it is. It's just… like, it does exactly what I would expect it to do. It's clear that—and this is going to sound weird, but I'm going to say it anyway—that this was built from someone who was looking to answer the question themselves and not from the perspective of, “Well, we need to build a product and we have access to all of this data from the API. How can we slice and dice it and add some value as we go?” I really liked the approach that you've taken on this. I don't say that often or lightly, particularly when it comes to cloud costing stuff, but this is something I'll be using in some of my own nonsense.Victor: Thanks. I appreciate it.Corey: So, I really want to thank you for taking as much time as you have to talk about who you are and what you're up to. If people want to learn more, where can they find you?Victor: Mainly on Twitter, my handle is @zoph [laugh]. And, you know, on LinkedIn or on my company website, as zoph.io.Corey: And we will, of course, put links to that in the [show notes 00:25:23]. Thank you so much for your time today. I really appreciate it.Victor: Thank you, Corey, for having me. It was a pleasure to chat with you.Corey: Victor Grenu, independent AWS architect. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that is going to cost you an absolute arm and a leg because invariably, you're going to forget to turn it off when you're done.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
Today Simon is joined by Rishabh Ray Chaudhury, Sr. Product Manager, to learn more about SageMaker Serverless Inference. We will dive deep into some of the customer use cases that Serverless Inference solves, key benefits of the feature, and how customers can leverage this SageMaker feature to reduce machine learning inference costs. Read the blog - https://go.aws/3rUkP3i See the documentation - https://go.aws/3gbjvqh
About NipunNipun Agarwal is a Senior Vice President, MySQL HeatWave Development, Oracle. His interests include distributed data processing, machine learning, cloud technologies and security. Nipun was part of the Oracle Database team where he introduced a number of new features. He has been awarded over 170 patents.Links Referenced: Oracle: https://oracle.com MySQL HeatWave info: https://www.oracle.com/mysql/ MySQL Service on AWS and OCI login (Oracle account required): https://cloud.mysql.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Datadog. Datadog's SaaS monitoring and security platform that enables full stack observability for developers, IT operations, security, and business teams in the cloud age. Datadog's platform, along with 500 plus vendor integrations, allows you to correlate metrics, traces, logs, and security signals across your applications, infrastructure, and third party services in a single pane of glass.Combine these with drag and drop dashboards and machine learning based alerts to help teams troubleshoot and collaborate more effectively, prevent downtime, and enhance performance and reliability. Try Datadog in your environment today with a free 14 day trial and get a complimentary T-shirt when you install the agent.To learn more, visit datadoghq.com/screaminginthecloud to get. That's www.datadoghq.com/screaminginthecloudCorey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is sponsored by our friends at Oracle, and back for a borderline historic third round going out and telling stories about these things, we have Nipun Agarwal, who is, as opposed to his first appearance on the show, has been promoted to senior vice president of MySQL HeatWave. Nipun, thank you for coming back. Most people are not enamored enough with me to subject themselves to my slings and arrows a second time, let alone a third. So first, thanks. And are you okay, over there?Nipun: Thank you, Corey. Yeah, very happy to be back.Corey: [laugh]. So, since the last time we've spoken, there have been some interesting developments that have happened. It was pre-announced by Larry Ellison on a keynote stage or an earnings call, I don't recall the exact format, that HeatWave was going to be coming to AWS. Now, you've conducted a formal announcement, this usual media press blitz, et cetera, talking about it with an eye toward general availability later this year, if I'm not mistaken, and things seem to be—if you'll forgive the term—heating up a bit.Nipun: That is correct. So, as you know, we have had MySQL HeatWave on OCI for just about two years now. Very good reception, a lot of people who are using MySQL HeatWave, are migrating from other clouds, specifically from AWS, and now we have announced availability of MySQL HeatWave on AWS.Corey: So, for those who have not done the requisite homework of listening to the entire back catalog of nearly 400 episodes of this show, what exactly is MySQL HeatWave, just so we make sure that we set the stage for what we're going to be talking about? Because I sort of get the sense that without a baseline working knowledge of what that is, none of the rest of this is going to make a whole lot of sense.Nipun: MySQL HeatWave is a managed MySQL service provided by Oracle. But it is different from other MySQL-based services in the sense that we have significantly enhanced the service such that it can very efficiently process transactions, analytics, and in-database machine learning. So, what customers get with the service, with MySQL HeatWave, is a single MySQL database which can process OLTP, transaction processing, real-time analytics, and machine learning. And they can do this without having to move the data out of MySQL into some other specialized database services who are running analytics or machine learning. And all existing tools and applications which work with MySQL work as is because this is something that enhances the server. In addition to that, it provides very good performance and very good price performance compared to other similar services out there.Corey: The idea historically that some folks were pushing around the idea of multi-cloud was that you would have workloads that—oh, they live in one cloud, but the database was going to be all the way across the other side of the internet, living in a different provider. And in practice, what we generally tend to see is that where the data lives is where the compute winds up living. By and large, it's easier to bring the compute resources to the data than it is to move the data to the compute, just because data egress in most of the cloud providers—notably exempting yours—is astronomically expensive. You are, if I recall correctly, less than 10% of AWS's data egress charge on just retail pricing alone, which is wild to me. So first, thank you for keeping that up and not raising prices because I would have felt rather annoyed if I'd been saying such good things. And it was, haha, it was a bait and switch. It was not. I'm still a big fan. So, thank you for that, first and foremost.Nipun: Certainly. And what you described is absolutely correct that while we have a lot of customers migrating from AWS to use MySQL HeatWave and OCI, a class of customers are unable to, and the number one reason they're unable to is that AWS charges these customers all very high egress fees to move the data out of AWS into OCI for them to benefit from MySQL HeatWave. And this has definitely been one of the key incentives for us, the key motivation for us, to offer MySQL HeatWave on AWS so that customers don't need to pay this exorbitant data egress fees.Corey: I think it's fair to disclose that I periodically advise a variety of different cloud companies from a perspective of voice-of-the-customer feedback, which essentially distills down to me asking really annoying slash obnoxious questions that I, as a customer, legitimately want to know, but people always frown at me when I asked that in vendor pitches. For some reason, when I'm doing this on an advisory basis, people instead nod thoughtfully and take notes, so that at least feels better from my perspective. Oracle Cloud has been one of those, and I've been kicking the tires on the AWS offering that you folks have built out for a bit of time now. I have to say, it is legitimate. I was able to run a significant series of tests on this, and what I found going through that process was interesting on a bunch of different levels.I'm wondering if it's okay with you, if we go through a few of them, just things that jumped out to me as we went through a series of conversations around, “So, we're going to run a service on AWS.” And my initial answer was, “Is this Oracle? Are you sure?” And here we are today; we are talking about it and press releases.Nipun: Yes, certainly fine with me. Please go ahead.Corey: So, I think one of the first questions I had when you said, “We're going to run a database service on AWS itself,” was, if I'm true to type, is going to be fairly sarcastic, which is, “Oh, thank God. Finally, a way to run a MySQL database on AWS. There's never been one of those before.” Unless you count EC2 or Aurora or Redshift depending upon how you squint at it, or a variety of other increasingly strange things. It feels like that has been a largely saturated market in many respects.I generally don't tend to advise on things that I find patently ridiculous, and your answer was great, but I don't want to put words in your mouth. What was it that you saw that made you say, “Ah, we're going to put a different database offering on AWS, and no, it's not a terrible decision.”Nipun: Got it. Okay, so if you look at it, the value proposition which MySQL HeatWave offers is that customers of MySQL or customers have MySQL compatible databases, whether Aurora, or whether it's RDS MySQL, right, or even, like, you know, customers of Redshift, they have been migrating to MySQL HeatWave on OCI. Like, for the reasons I said: it's a single database, customers don't need to have multiple databases for managing different kinds of workloads, it's much faster, it's a lot less expensive, right? So, there are a lot of value propositions. So, what we found is that if you were to offer MySQL HeatWave on AWS, it will significantly ease the migration of other customers who might be otherwise thinking that it will be difficult for them to migrate, perhaps because of the high egress cost of AWS, or because of the high latency some of the applications in the AWS incur when the database is running somewhere else.Or, if they really have an ecosystem of applications already running on AWS and they just want to replace the database, it'll be much easier for them if MySQL HeatWave was offered on AWS. Those are the reasons why we feel it's a compelling proposition, that if existing customers of AWS are willing to migrate the cloud from AWS to OCI and use MySQL HeatWave, there is clearly a value proposition we are offering. And if we can now offer the same service in AWS, it will hopefully increase the number of customers who can benefit from MySQL HeatWave.Corey: One of the next questions I had going in was, “Okay, so what actually is this under the hood?” Is this you effectively just stuffing some software into a machine image or an AMI—or however they want to mispronounce that word over an AWS-land—and then just making it available to your account and running that, where's the magic or mystery behind this? Like, it feels like the next more modern cloud approach is to stuff the whole thing into a Docker container. But that's not what you wound up doing.Nipun: Correct. So, HeatWave has been designed and architected for scale-out processing, and it's been optimized for the cloud. So, when we decided to offer MySQL HeatWave on AWS, we have actually gone ahead and optimize our server for the AWS architecture. So, the processor we are running on, right, we have optimized our software for that instance types in AWS, right? So, the data plane has been optimized for AWS architecture.The second thing is we have a brand new control plane layer, right? So, it's not the case that we're just taking what we had in OCI and running it on AWS. We have optimized the data plane for AWS, we have a native control plane, which is running on AWS, which is using the respective services on AWS. And third, we have a brand new console which we are offering, which is a very interactive console where customers can run queries from the console. They can do data management from the console, they're able to use Autopilot from the console, and we have performance monitoring from the console, right? So, data plane, control plane, console. They're all running natively in AWS. And this provides for a very seamless integration or seamless experience for the AWS customers.Corey: I think it's also a reality, however much we may want to pretend otherwise, that if there is an opportunity to run something in a different cloud provider that is better than where you're currently running it now, by and large, customers aren't going to do it because it needs to not just be better, but so astronomically better in ways that are foundational to a company's business model in order to justify the tremendous expense of a cloud migration, not just in real, out of pocket, cost in dollars and cents that are easy to measure, but also in terms of engineering effort, in terms of opportunity cost—because while you're doing that you're not doing other things instead—and, on some level, people tend to only do that when there's an overwhelming strategic reason to do it. When folks already have existing workloads on AWS, as many of them do, it stands to reason that they are not going to want to completely deviate from that strategy just because something else offers a better database experience any number of axes. So, meeting customers where they are is one of the, I guess, foundational shifts that we've really seen from the entire IT industry over the last 40 years, rather than you will buy it from us and you will tolerate it. It's, now customers have choice, and meeting them where they are and being much more, I guess, able to impedance-match with them has been critical. And I'm really optimistic about what the launch of this service portends for Oracle.Nipun: Indeed, but let me give you another data point. We find a very large number of Aurora customers migrating to MySQL HeatWave on OCI, right? And this is the same workload they were running on Aurora, but now they want to run the same workload on MySQL HeatWave on OCI. They are willing to undertake this journey of migration because their applications, they get much faster, and for a lot less price, but they get much faster. Then the second aspect is, there's another class of customers who are for instance running, on Aurora or other transactions or workloads, but then they have to keep moving the data, they'll keep performing the ETL process into some other service, whether it's Snowflake, or whether it's Redshift for analytics.Now, with this migration, when they move to MySQL HeatWave, customers don't need to, like, have multiple databases, and they get real-time analytics, meaning that if any data changes inside the server inside the OLTP as a database service, right? If they were to run a query, that query is giving them the latest results, right? It's not stale. Whereas with an ETL process, it gets to be stale. So, given that we already found that there were so many customers migrating to OCI to use MySQL HeatWave, I think there's a clear value proposition of MySQL HeatWave, and there's a lot of demand.But like, as I was mentioning earlier, by having MySQL HeatWave be offered on AWS, it makes the proposition even more compelling because, as you said, yes, there is some engineering work that customers will need to do to migrate between clouds, and if they don't want to, then absolutely now they have MySQL HeatWave which they can now use in AWS itself.Corey: I think that one of the things I continually find myself careening into, perhaps unexpectedly, is a failure to really internalize just how vast this entire industry really is. Every time I think I've seen it all, all I have to do is talk to one more cloud customer and I learn something completely new and different. Sometimes it's an innovative, exciting use of a thing. Other times, it's people holding something fearfully wrong and trying to use it as a hammer instead. And you know, if it's dumb and it works, is it really dumb? There are questions around that.And this in turn gave rise to one of my next obnoxious questions as I was looking at what you were building at the time because a lot of your pricing and discussions and framing of this was targeting very large enterprise-style customers, and the price points reflected that. And then I asked the question that Big E enterprise never quite expects, for whatever reason, it's like, “That looks awesome if I have a budget with many commas in it. What can I get for $4?” And as of this recording, pricing has not been finalized slash published for the service, but everything that you have shown me so far absolutely makes developing on this for a proof of concept or an evening puttering around, completely tenable: it is not bound to a fixed period of licensing; it's, use it when you want to use it, turn it off when you're done; and the hourly pricing is not egregious. I think that is something that historically, Oracle Database offerings have not really aligned with.OCI very much has, particularly with an eye toward its extraordinarily awesome free tier that's always free. But this feels like it's a weird blending of the OCI model versus historical Oracle Database pricing models in a way that, honestly I'm pretty excited about.Nipun: So, we react to what the customer requirements and needs are. So, for this class of customers who are using, say, RDS, MySQL, Aurora, we understand that they are very cost sensitive, right? So, one of the things which we have done in addition to offering MySQL HeatWave on AWS is based on the customer feedback and such. We are now offering a small shape of HeatWave instance in addition to the regular large shape. So, if customers want to just, you know, kick the tires, if developers just want to get started, they can get a MySQL node with HeatWave for less than ten cents an hour. So, for less than ten cents an hour, they get the ability to run transaction processing, analytics, and machine learning.And if you were to compare the corresponding cost of Aurora for the same, like, you know, core count, it's, like, you know, 12-and-a-half cents. And that's just Aurora, without Redshift or without SageMaker. So yes, you're right that based on the feedback and we have found that it would be much more attractive to have this low-end shape for the AWS developers. We are offering this smaller shape. And yeah, it's very, very affordable. It's about just shy of ten cents an hour.Corey: This brings up another question that I raised pretty early on in the process because you folks kept talking about shapes, and it turns out that is the Oracle Cloud term that applies to instance size over an AWS-land. And as we dug into this a bit further, it does make sense for how you think about these things and how you build them to customers. Specifically, if I want to run this, I log into cloud.oracle.com and sign up for it there, and pay you over on that side of the world, this does not show up on my AWS bill. What drove that decision?Nipun: Okay, so a couple of things. One clarification is that the site people log in to is cloud.mysql.com. So, that's where they come to: cloud.mysql.com.Corey: Oh, my apologies. I keep forgetting that you folks have multiple cloud offerings and domains. They're kind of a thing. How do they work? Given I have a bad domain by habit myself, I have no room to judge.Nipun: So, they come to cloud.mysql.com. From there, they can provision an instance. And we, as, like, you know, Oracle or MySQL, go ahead and create an instance in AWS, in the Oracle tenancy. From there, customers can then, you know, access their data on AWS and such. Now, what we want to provide the customers is a very seamless experience, that they just come to cloud.mysql.com, and from there, they can do everything: provisioning an instance, running the queries, payment and such. So, this is one of the reasons that we want customers just to be able to come to the site, cloud.mysql.com, and take care of the billing and such.Now, the other thing is that, okay, why not allow customers to pay from AWS, right? Now, one of the things over there is that if you were to do that and there's a customer, they'll be like, “Hey, I got to pay something to AWS, something to Oracle, so we'd prefer, it'd be better to have a one-stop shop.” And since many of these are already Oracle customers, it's helpful to do it this way.Corey: Another approach you could have taken—and I want to be very clear here that I am not suggesting that this would have been a good idea—but an approach that you could have taken would have been to go down the weird AWS partner rabbit hole, and we're going to provide this to customers on the AWS Marketplace. Because according to AWS, that's where all of their customers go to discover new softwares. Yeah, first, that's a lie. They do not. But aside from that, what was it about that Marketplace model that drove you to a decision point where okay, at launch, we are not going to be offering this on the AWS Marketplace? And to be clear, I'm not suggesting that was the wrong decision.Nipun: Right. The main reason is we want to offer the MySQL HeatWave service at the least expensive cost to the user, right, or like, the least cost. If you were to, like, have MySQL HeatWave in the Marketplace, AWS charges a premium. This the customers would need to pay. So, we just didn't want the customers to have to pay this additional premium just because they can now source this thing from the Marketplace. So, it's really to, like, save costs for the customer.Corey: The value of the Marketplace, from my perspective, has been effectively not having to deal as much with customer procurement departments because well, AWS is already on the procurement approved list, so we're just going to go ahead and take the hit to wind up making it accessible from that perspective and calling it good. The downside to this is that increasingly, as customers are making larger and longer-term commitments that are tied to certain levels of spend on AWS, they're increasingly trying to drag every vendor with whom they do business into the your AWS bill so they can check those boxes off. And the problem that I keep seeing with that is vendors who historically have been doing just fine, have great working relationships with a customer are reporting that suddenly customers are coming back with, “Yeah, so for our contract renewal, we want to go through the AWS Marketplace.” In return, effectively, these companies are then just getting a haircut off whatever it is they're able to charge their customers but receiving no actual value for any of this. It attenuates the relationship by introducing a third party into the process, and it doesn't make anything better from the vendor's point of view because they already had something functional and working; now they just have to pay a commission on it to AWS, who, it seems, is pathologically averse to any transaction happening where they don't get a cut, on some level. But I digress. I just don't like that model very much at all. It feels coercive.Nipun: That's absolutely right. That's absolutely right. And we thought that, yes, there is some value to be going to Marketplace, but it's not worth the additional premium customers would need to pay. Totally agree.Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your Features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.Corey: It's also worth pointing out that in Oracle's historical customer base, by which I mean the last 40 years that you folks have been in business, you do have significant customers with very sizable estates. A lot of your cloud efforts have focused around, I guess, we'll call it an Oracle-specific currency: Oracle Credits. Which is similar to the AWS style of currency just for a different company in different ways. One of the benefits that you articulated to me relatively early on was that by going through cloud.mysql.com, customers with those credits—which can be in sizable amounts based upon various differentiating variables that change from case to case—and apply that to their use of MySQL HeatWave on AWS.Nipun: Right. So, in fact, just for starters, right, what we give to customers is we offer some free credits for customers to try a service on OCI of, you know, $300. And that's the same thing, the same experience you would like customers who are trying HeatWave on AWS to get. Yes, so you're right, this is the kind of consistency we want to have, and yet another reason why cloud.mysql.com makes sense is the entry point for customers to try the service.Corey: There was a time where I would have struggled not to laugh in your face at the idea that we're talking about something in the context of an Oracle database, and well, there's $300 in credit. That's, “What can I get for that? Hung up on?” No. A surprising amount, when it comes to these things.I feel like that opens up an entirely new universe of experimentation. And, “Let's see how this thing actually works with his workload,” and lets people kick the tires on it for themselves in a way that, “Oh, we have this great database. Well, can I try it? Sure, for $8 million, you absolutely can.” “Well, it can stay great and awesome over there because who wants to take that kind of a bet?” It feels like it's a new world and in a bunch of different respects, and I just can't make enough noise about how glad I am to see this transformation happening.Nipun: Yeah. Absolutely, right? So, just think about it. So, you're getting MySQL and HeatWave together for just shy of ten cents an hour, right? So, what you could get for $300 is 3000 hours for MySQL HeatWave instance, which is very good for people to try for free. And then, you know, decide if they want to go ahead with it.Corey: One other, I guess, obnoxious question that I love to ask—it's not really a question so much as a statement; that that's part of the first thing that makes it really obnoxious—but it always distills down to the following that upsets product people left and right, which is, “I don't get it.” And one of the things that I didn't fully understand at the outset of how you were structuring things was the idea of separating out HeatWave from its constituent components. I believe it was Autopilot if I'm not mistaken, and it was effectively different SKUs that you could wind up opting to go for. And okay, if I'm trying to kick the tires on this and contextualize it as someone for whom the world's best database is Route 53, then it really felt like an additional decision point that I wasn't clear on the value of. And I'm still not entirely sure on the differentiation point and the value there, but now you offer it bundled as a default, which I think is so much better, from the user experience perspective.Nipun: Okay, so let me clarify a couple of things.Corey: Please. Databases are not my forte, so expect me to wind up getting most of the details hilariously wrong.Nipun: Sure. So, MySQL Autopilot provides machine-learning-based automation for various aspects of the MySQL service; very popular. There is no charge for it. It is built into MySQL HeatWave; there is no additional charge for it, right, so there never was any SKU for it. What you're referring to is, we have had a SKU for the MySQL node or the MySQL instance, and there's a separate SKU for HeatWave.The reason there is a need to have a different SKU for these two is because you always only have one node of MySQL. It could be, like, you know, running on one core, or like, you know, multiple cores, but it's always, like, you know, one node. But with HeatWave, it's a scale-out architecture, so you can have multiple nodes. So, the users need to be able to express how many nodes of HeatWave are they provisioning, right? So, that's why there is a need to have two SKUs, and we continue to have those two SKUs.What we are doing now differently is that when users instantiate a MySQL instance, by default, they always get the HeatWave node associated with it, right? So, they don't need to, like, you know, make the decision to—okay when to add HeatWave; they always get HeatWave along with the MySQL instance, and that's what I was saying a combination of both of these is, you know, like, just about ten cents an hour. If for whatever reason, they decide that they do not want HeatWave, they can turn it off, and then the price drops to half. But what we're providing is the AWS service that HeatWave is turned on by default.Corey: Which makes an awful lot of sense. It's something that lets people opt out if they decide they don't need this as they continue to scale out, but for the newcomer who does not, in many cases—in my particular case—have a nuanced understanding of where this offering starts and stops, it's clearly the right decision of—rather than, “Oh, yeah. The thing you were trying and it didn't work super well? Well, yeah. If you enable this other thing, it would have been awesome.” “Well, great. Please enable it for me by default and let me opt out later in time as my level of understanding deepens.”Nipun: That's right. And that's exactly what we are doing. Now, this was a feedback we got because many, if not most, of our customers would want to have HeatWave, and we just kind of, you know, mitigating them from going through one more step, it's always enabled by default.Corey: As far as I'm aware, you folks are running this effectively as any other AWS customer might, where you establish a private link connection to your customers, in some cases, or give them a public or private endpoint where they can wind up communicating with this service. It doesn't require any favoritism or special permissions from AWS themselves that they wouldn't give to any other random customer out there, correct?Nipun: Yes, that is correct. So, for now, we are exposing this thing as a public endpoint. In the future, we have plans to support the private endpoint as well, but for now, it's public.Corey: Which means that foundationally what you're building out is something that fits into a model that could work extraordinarily well across a variety of different environments. How purpose-tuned is the HeatWave installation you have running on AWS for the AWS environment, versus something that is relatively agnostic, could be dropped into any random cloud provider, up to and including the terrifyingly obsolete rack I have in the spare room?Nipun: So, as I mentioned, when we decided to offer MySQL HeatWave on AWS, the idea was that okay, for the AWS customers, we now want to have an offering which is completely optimized for AWS, provides the best price-performance on AWS. So, we have determined which instance types underneath will provide the best price performance, and that's what we have optimized for, right? So, I can tell you, like, in terms of many of—for instance, take the case of the cache size of the underlying processor that we're using on AWS is different than what we're using for OCI. So, we have gone ahead, made these optimizations in our code, and we believe that our code is really optimized now for the AWS infrastructure.Corey: I think that makes a fair deal of sense because, again, one of the big problems AWS has had is the proliferation of EC2 instance types to the point now where the answer is super easy, too, “Are you using the correct instance type for your workload?” Because that answer now is, “Of course not. Who could possibly say that they were with any degree of confidence?” But when you take the time to look at a very specific workload that's going to be scaled out, it's worth the time investment to figure out exactly how to optimize things for price and performance, given the constraints. Let's be very clear here, I would argue that the better price performance for HeatWave is almost certainly not going to be on AWS themselves, if for no other reason than the joy that is their data transfer pricing, even for internal things moving around from time to time.Personally, I love getting charged data transfer for taking data from S3, running it through AWS Glue, putting it into a different S3 bucket, accessing it with Athena, then hooking that up to Tableau as we go down and down and down the spiraling rabbit hole that never ends. It's not exactly what I would call well-optimized economically. Their entire system feels almost like it's a rigged game, on some level. But given those constraints, yeah, dialing in it and making it cost-effective is absolutely something that I've watched you folks put significant time and effort into.Nipun: So, I'll make two points, right, to the questions. First is yes, I just want to, like, be clear about it, that when a user provisions MySQL HeatWave via cloud.mysql.com and we create an instance in AWS, we don't give customers a multitude of things to, like, you know, choose from.We have determined which instance type is going to provide the customer the best price performance, and that's what we provision. So, the customer doesn't even need to know or care, is it going to be, like, you know, AMD? Is it going to be Intel? Is it going to be, like, you know, ARM, right? So, it's something which we have predetermined and we have optimized for it. That's first.The second point is in terms of the price performance. So, you're absolutely right, that for the class of customers who cannot migrate away from AWS because of the egress costs or because of the high latency because of AWS, right, sure, MySQL HeatWave on AWS will provide the best price-performance compared to other services out in AWS like Redshift, or Aurora, or Snowflake. But if customers have the flexibility to choose a cloud of their choice, it is indeed the case that customers are going to find that running MySQL HeatWave on OCI is going to provide them, by far, the best price performance, right? So, the price performance of running MySQL HeatWave on OCI is indeed better than MySQL HeatWave on AWS. And just because of the fact that when we are running the service in AWS, we are paying the list price, right, on AWS; that's how we get the gear. Whereas with OCI, like, you know, things are a lot less expensive for us.But even when you're running on AWS, we are very, very price competitive with other services. And you know, as you've probably seen from the performance benchmarks and such, what I'm very intrigued about is that we're able to run a standard workload, like some, like, you know, TPC-H and offer seven times better price-performance while running in AWS compared to Redshift. So, what this goes to show is that we are really passing on the savings to the customers. And clearly, Redshift is not doing a good job of performance or, like, you know, they're charging too much. But the fact that we can offer seven times better price performance than Redshift in AWS speaks volumes, both about architecture and how much of savings we are passing to our customers.Corey: What I love about this story is that it makes testing the waters of what it's like to run MySQL HeatWave a lot easier for customers because the barrier to entry is so much lower. Where everything you just said I agree with it is more cost-effective to run on Oracle Cloud. I think there are a number of workloads that are best placed on Oracle Cloud. But unless you let people kick the tires on those things, where they happen to be already, it's difficult to get them to a point where they're going to be able to experience that themselves. This is a massive step on that path.Nipun: Yep. Right.Corey: I really want to thank you for taking time out of your day to walk us through exactly how this came to be and what the future is going to look like around this. If people want to learn more, where should they go?Nipun: Oh, they can go to oracle.com/mysql, and there they can get a lot more information about the capabilities of MySQL HeatWave, what we are offering in AWS, price-performance. By the way, all the price performance numbers I was talking about, all the scripts are available publicly on GitHub. So, we welcome, we encourage customers to download the scripts from GitHub, try for themselves, and all of this information is available from oracle.com/mysql where they can get this detailed information.Corey: And we will, of course, put links to that in the show notes. Thank you so much for your time. I appreciate it.Nipun: Sure thing, Corey. Thank you for the opportunity.Corey: Nipun Agarwal, Senior Vice President of MySQL HeatWave. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment. You will then be overcharged for the data transfer to submit that insulting comment, and then AWS will take a percentage of that just because they're obnoxious and can.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
Our guests have done most of their ML work on AWS offerings, from AWS Personalize for their initial recommendation engine to SageMaker for model training and deployment pipeline. Now they're building models from scratch in TensorFlow. Want to see these recommendations in action? Check out the offerings at Discovery+ and HBOMax. If you're a ML/AL data scientist looking to shape the future of automated curation, check out their open roles. Follow our guests on LinkedIn:Shrikant DesaiSowmya Subramanian
Bratin Saha, head of Amazon's machine learning services, talks about Amazon's growing dominance in model building and deploying AI, about the company's SageMaker platform, and whether anyone can compete with the behemoth.
About ChrisChris is a robotics engineer turned cloud security practitioner. From building origami robots for NASA, to neuroscience wearables, to enterprise software consulting, he is a passionate builder at heart. Chris is a cofounder of Common Fate, a company with a mission to make cloud access simple and secure.Links: Common Fate: https://commonfate.io/ Granted: https://granted.dev Twitter: https://twitter.com/chr_norm TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Let's face it, on-call firefighting at 2am is stressful! So there's good news and there's bad news. The bad news is that you probably can't prevent incidents from happening, but the good news is that incident.io makes incidents less stressful and a lot more valuable. incident.io is a Slack-native incident management platform that allows you to automate incident processes, focus on fixing the issues and learn from incident insights to improve site reliability and fix your vulnerabilities. Try incident.io, recover faster and sleep more.Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It doesn't matter where you are on your journey in cloud—you could never have heard of Amazon the bookstore—and you encounter AWS and you spin up an account. And within 20 minutes, you will come to the realization that everyone in this space does. “Wow, logging in to AWS absolutely blows goats.”Today, my guest, obviously had that reaction, but unlike most people I talked to, decided to get up and do something about it. Chris Norman is the co-founder of Common Fate and most notably to how I know him is one of the original authors of the tool, Granted. Chris, thank you so much for joining me.Chris: Hey, Corey, thank you for having me.Corey: I have done podcasts before; I have done a blog post on it; I evangelize it on Twitter constantly, and even now, it is challenging in a few ways to explain holistically what Granted is. Rather than trying to tell your story for you, when someone says, “Oh, Granted, that seems interesting and impossible to Google for in isolation, so therefore, we know it's going to be good because all the open-source projects with hard to find names are,” what is Granted and what does it do?Chris: Granted is a command-line tool which makes it really easy for you to get access and assume roles when you're working with AWS. For me, when I'm using Granted day-to-day, I wake up, go to my computer—I'm working from home right now—crack open the MacBook and I log in and do some development work. I'm going to go and start working in the cloud.Corey: Oh, when I start first thing in the morning doing development work and logging into the cloud, I know. All right, I'm going to log in to AWS and now I know that my day is going downhill from here.Chris: [laugh]. Exactly, exactly. I think maybe the best days are when you don't need to log in at all. But when you do, I go and I open my terminal and I run this command. Using Granted, I ran this assume command and it authenticates me with single-sign-on into AWS, and then it opens up a console window in a particular account.Now, you might ask, “Well, that's a fairly standard thing.” And in fact, that's probably the way that the console and all of the tools work by default with AWS. Why do you need a third-party tool for this?Corey: Right. I've used a bunch of things that do varying forms of this and unlike Granted, you don't see me gushing about them. I want to be very clear, we have no business relationship. You're not sponsoring anything that I do. I'm not entirely clear on what your day job entails, but I have absolutely fallen in love with the Granted tool, which is why I'm dragging you on to this show, kicking and screaming, mostly to give me an excuse to rave about it some more.Chris: [laugh]. Exactly. And thank you for the kind words. And I'd say really what makes it special or why I've been so excited to be working on it is that it makes this access, particularly when you're working with multiple accounts, really, really easy. So, when I run assume and I open up that console window, you know, that's all fine and that's very similar to how a lot of the other tools and projects that are out there work, but when I want to open that second account and that second console window, maybe because I'm looking at like a development and a staging account at the same time, then Granted allows me to view both of those simultaneously in my browser. And we do that using some platform sort of tricks and building into the way that the browser works.Corey: Honestly, one of the biggest differences in how you describe what Granted is and how I view it is when you describe it as a CLI application because yes, it is that, but one of the distinguishing characteristics is you also have a Firefox extension that winds up leveraging the multi-container functionality extension that Firefox has. So, whenever I wind up running a single command—assume with a-c' flag, then I give it the name of my AWS profile, it opens the web console so I can ClickOps my heart's content inside of a tab that is locked to a container, which means I can have one or two or twenty different AWS accounts and/or regions up running simultaneously side-by-side, which is basically impossible any other way that I've ever looked at it.Chris: Absolutely, yeah. And that's, like, the big differentiating factor right now between Granted and between this sort of default, the native experience, if you're just using the AWS command line by itself. With Granted, you can—with these Firefox containers, all of your cookies, your profile, everything is all localized into that one container. It's actually it's a privacy features that are built into Firefox, which keeps everything really separate between your different profiles. And what we're doing with Granted is that we make it really easy to open a specific profiles that correspond with different AWS profiles that you're using.So, you'd have one which could be your development account, one which could be production or staging. And you can jump between these and navigate between them just as separate tabs in your browser, which is a massive improvement over, you know, what I've previously had to use in the past.Corey: The thing that really just strikes me about this is first, of course, the functionality and the rest, so I saw this—I forget how I even came across it—and immediately I started using it. On my Mac, it was great. I started using it when I was on the road, and it was less great because you built this thing in Go. It can compile and install on almost anything, but there were some assumptions that you had built into this in its early days that did not necessarily encompass all of the use cases that I use. For example, it hadn't really occurred to you that some lunatic would try and only use an iPad when they're on the road, so they have to be able to run this to get federated login links via SSHing into an EC2 instance running somewhere and not have it open locally.You seemed almost taken aback when I brought it up. Like, “What lunatic would do that?” Like, “Hi, I'm such a lunatic. Let's talk about this.” And it does that now, and it's awesome. It does seem to me though, and please correct me if I'm wrong on this assumption slash assessment that this is first and foremost aimed at desktop users, specifically people running Mac on the desktop, is that the genesis of it?Chris: It is indeed. And I think part of the cause behind that is that we originally built a tool for ourselves. And as we were building things and as we were working using the cloud, we were running things—you know, we like to think that we're following best practices when we're using AWS, and so we'd set up multiple accounts, we'd have a special account for development, a separate one for staging, a separate one for production, even internal tools that we would build, we would go and spin up an individual account for those. And then you know, we had lots of accounts. and to go and access those really easily was quite difficult.So, we definitely, we built it for ourselves first and I think that that's part of when we released it, it actually a little bit of cause for some of the initial problems. And some of the feedback that we had was that it's great to build tools for yourself, but when you're working in open-source, there's a lot of different diversity with how people are using things.Corey: We take different approaches. You want to try to align with existing best practices, whereas I am a loudmouth white guy who works in tech. So, what I do definitionally becomes a best practice in the ecosystem. It's easier to just comport with the ones that are already existing that smart people put together rather than just trying to competence your way through it, so you took a better path than I did.But there's been a lot of evolution to Granted as I've been using it for a while. I did a whole write-up on it and that got a whole bunch of eyes onto the project, which I can now admit was a nefarious plan on my part because popping into your community Slack and yelling at you for features I want was all well and good, but let's try and get some people with eyes on this who are smarter than me—which is not that high of a bar when it comes to SSO, and IAM, and federated login, and the rest—and they can start finding other enhancements that I'll probably benefit from. And sure enough, that's exactly what happened. My sneaky plan has come to fruition. Thanks for being a sucker, I guess. I mean—[laugh] it worked. I'm super thrilled by the product.Chris: [laugh]. I guess it's a great thing I think that the feedback and particularly something that's always been really exciting is just seeing new issues come through on GitHub because it really shows the kinds of interesting use cases and the kinds of interesting teams and companies that are using Granted to make their lives a little bit easier.Corey: When I go to the website—which again is impossible to Google—the website for those wondering is granted.dev. It's short, it's concise, I can say it on a podcast and people automatically know how to spell it. But at the top of the website—which is very well done by the way—it mentions that oh, you can, “Govern access to breakglass roles with Common Fate Cloud,” and it also says in the drop shadow nonsense thing in the upper corner, “Brought to you by Common Fate,” which is apparently the name of your company.So, the question I'll get to in a second is what does your company do, but first and foremost, is this going to be one of those rug-pull open-source projects where one day it's, “Oh, you want to log into your AWS accounts? Insert quarter to continue.” I'm mostly being a little over the top with that description, but we've all seen things that we love turn into molten garbage. What is the plan around this? Are you about to ruin this for the rest of us once you wind up raising a round or something? What's the deal?Chris: Yeah, it's a great question, Corey. And I think that to a degree, releasing anything like this that sits in the access workflow and helps you assume roles and helps you day-to-day, you know, we have a responsibility to uphold stability and reliability here and to not change things. And I think part of, like, not changing things includes not [laugh] rug-pulling, as you've alluded to. And I think that for some companies, it ends up that open-source becomes, like, a kind of a lead-generation tool, or you end up with, you know, now finally, let's go on add another login so that you have to log into Common Fate to use Granted. And I think that, to be honest, a tool like this where it's all about improving the speed of access, the incentives for us, like, it doesn't even make sense to try and add another login for to try to get people to, like, to say, login to Common Fate because that would make your signing process for AWS take even longer than it already does.Corey: Yeah, you decided that you know, what's the biggest problem? Oh, you can sleep at night, so let's go ahead and make it even worse, by now I want you to be this custodian of all my credentials to log into all of my accounts. And now you're going to be critical path, so if you're down, I'm not able to log into anything. And oh, by the way, I have to trust you with full access to my bank stuff. I just can't imagine that is a direction that you would be super excited about diving head-first into.Chris: No, no. Yeah, certainly not. And I think that the, you know, building anything in this space, and with what we're doing with Common Fate, you know, we're building a cloud platform to try to make IAM a little bit easier to work with, but it's really sensitive around granting any kind of permission and I think that you really do need that trust. So, trying to build trust, I guess, with our open-source projects is really important for us with Granted and with this project, that it's going to continue to be reliable and continue to work as it currently does.Corey: The way I see it, one of the dangers of doing anything that is particularly open-source—or that leans in the direction of building in Amazon's ecosystem—it leads to the natural question of, well, isn't this just going to be some people say stolen—and I don't think those people understand how open-source works—by AWS themselves? Or aren't they going to build something themselves at AWS that's going to wind up stomping this thing that you've built? And my honest and remarkably cynical answer is that, “You have built a tool that is a joy to use, that makes logging into AWS accounts streamlined and efficient in a variety of different patterns. Does that really sound like something AWS would do?” And followed by, “I wish they would because everyone would benefit from that rising tide.”I have to be very direct and very clear. Your product should not exist. This should be something the provider themselves handles. But nope. Instead, it has to exist. And while I'm glad it does, I also can't shake the feeling that I am incredibly annoyed by the fact that it has to.Chris: Yeah. Certainly, certainly. And it's something that I think about a little bit. I like to wonder whether there's maybe like a single feature flag or some single sort of configuration setting in AWS where they're not allowing different tabs to access different accounts, they're not allowing this kind of concurrent access. And maybe if we make enough noise about Granted, maybe one of the engineers will go and flick that switch and they'll just enable it by default.And then Granted itself will be a lot less relevant, but for everybody who's using AWS, that'll be a massive win because the big draw of using Granted is mainly just around being able to access different accounts at the same time. If AWS let you do that out of the box, hey, that would be great and, you know, I'd have a lot less stuff to maintain.Corey: Originally, I had you here to talk about Granted, but I took a glance at what you're actually building over at Common Fate and I'm about to basically hijack slash derail what probably is going to amount the rest of this conversation because you have a quick example on your site for by developers, for developers. You show a quick Python script that tries to access a S3 bucket object and it's denied. You copy the error message, you paste it into what you're building over a Common Fate, and in return, it's like, “Oh. Yeah, this is the policy that fixes it. Do you want us to apply it for you?”And I just about fell out of my chair because I have been asking for this explicit thing for a very long time. And AWS doesn't do it. Their IAM access analyzer claims to. Like, “Oh, just go look at CloudTrail and see what permissions it uses and we'll build a policy to scope it down.” “Okay. So, it's S3 access. Fair enough. To what object or what bucket?” “Guess,” is what it tells you there.And it's, this is crap. Who thinks this is a good user experience? You have built the thing that I wish AWS had built in natively. Because let's be honest here, I do what an awful lot of people do and overscope permissions massively just because messing around with the bare minimum set of permissions in many cases takes more time than building the damn thing in the first place.Chris: Oh, absolutely. Absolutely. And in fact, this—was a few years ago when I was consulting—I had a really similar sort of story where one of the clients that we were working with, the CTO of this company, he was needing to grant us access to AWS and we were needing to build a particular service. And he said, “Okay, can you just let me know the permissions that you will need and I'll go and deploy the role for this.” And I came back and I said, “Wait. I don't even know the permissions that I'm going to need because the damn thing isn't even built yet.”So, we went sort of back and forth around this. And the compromise ended up just being you know, way too much access. And that was sort of part of the inspiration for, you know, really this whole project and what we're building with Common Fate, just trying to make that feedback loop around getting to the right level of permissions a lot faster.Corey: Yeah, I am just so overwhelmingly impressed by the fact that you have built—and please don't take this as a criticism—but a set of very simple tools. Not simple in the terms of, “Oh, that's, like, three lines of bash, and a fool could write that on a weekend.” No. Simple in the sense of it solves a problem elegantly and well and it's straightforward—well, straightforward as anything in the world of access control goes—to wrap your head around exactly what it does. You don't tend to build these things by sitting around a table brainstorming with someone you met at co-founder dating pool or something and wind up figuring out, “Oh, we should go and solve that. That sounds like a billion-dollar problem.”This feels very much like the outcome of when you're sitting around talking to someone and let's start by drinking six beers so we become extraordinarily honest, followed immediately by let's talk about what sucks. What pisses you off the most? It feels like this is sort of the low-hanging fruit of things that upset people when it comes to AWS. I mean, if things had gone slightly differently, instead of focusing on AWS bills, IAM was next on my list of things to tackle just because I was tired of smacking my head into it.This is very clearly a problem space that you folks have analyzed deeply, worked within, and have put a lot of thought into. I want to be clear, I've thrown a lot of feature suggestions that you for Granted from start to finish. But all of them have been around interface stuff and usability and expanding use cases. None of them have been, “Well, that seems screamingly insecure.” Because it hasn't been.Chris: [laugh].Corey: It has been effective, start to finish, I think that from a security posture, you make terrific choices, in many cases better than ones I would have made a starting from scratch myself. Everything that I'm looking at in what you have built is from a position of this is absolutely amazing and it is transformative to my own workflows. Now, how can we improve it?Chris: Mmm. Thank you, Corey. And I'll say as well, maybe around the security angle, that one of the goals with Granted was to try and do things a little bit better than the default way that AWS does them when it comes to security. And it's actually been a bit of a source for challenges with some of the users that we've been working with with Granted because one of the things we wanted to do was encrypt the SSO token. And this is the token that when you sign in to AWS, kind of like, it allows you to then get access to all of the rest of the accounts.So, it's like a pretty—it's a short-lived token, but it's a really sensitive one. And you know, by default, it's just stored in plain text on your disk. So, we dump to a file and, you know, anything that can go and read that, they can go and get it. It's also a little bit hard to revoke and to lock people out. There's not really great workflows around that on AWS's side.So, we thought, “Okay, great. One of the goals for Granted can be that we will go and store this in your keychain in your system and we'll work natively with that.” And that's actually been a cause for a little bit of a hassle for some users, though, because by doing that and by storing all of this information in the keychain, it's actually broken some of the integrations with the rest of the tooling, which kind of expects tokens and things to be in certain places. So, we've actually had to, as part of dealing with that with Granted, we've had to give users the ability to opt out for that.Corey: DoorDash had a problem. As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses their competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business, confidence, and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.Corey: That's why I find this so, I think, just across the board, fantastic. It's you are very clearly engaged with your community. There's a community Slack that you have set up for this. And I know, I know, too many Slacks; everyone has this problem. This is one of those that is worth hanging in, at least from my perspective, just because one of the problems that you have, I suspect, is on my Mac it's great because I wind up automatically updating it to whatever the most recent one is every time I do a brew upgrade.But on the Linux side of the world, you've discovered what many of us have discovered, and that is that packaging things for Linux is a freaking disaster. The current installation is, “Great. Here's basically a curl bash.” Or, “Here, grab this tarball and install it.” And that's fine, but there's no real way of keeping that updated and synced.So, I was checking the other day, oh wow, I'm something like eight versions behind on this box. But it still just works. I upgraded. Oh, wow. There's new functionality here. This is stuff that's actually really handy. I like this quite a bit. Let's see what else we can do.I'm just so impressed, start to finish, by just how receptive you've been to various community feedbacks. And as well—I want to be very clear on this point, too—I've had folks who actually know what they're doing in an InfoSec sense look at what you're up to, and none of them had any issues of note. I'm sure that they have a pile of things like, with that curl bash, they should really be doing a GPG check. Yes, yes, fine. Whatever. If that's your target threat model, okay, great. Here in reality-land for what I do, this is awesome.And they don't seem to have any problems with, “Oh, yeah. By the way, sending analytics back up”—which, okay, fine, whatever. “And it's not disclosing them.” Okay, that's bad. “And it's including the contents of your AWS credentials.”Ahhhh. I did encounter something that was doing that on the back-end once. [cough]—Serverless Framework—sorry, something caught in my throat for a second.Chris: [laugh].Corey: No faster way I can think of to erode trust in that. But everything you're doing just makes sense.Chris: Oh, I do remember that. And that was a little bit of a fiasco, really, around all of that, right? And it's great to hear actually around that InfoSec folks and security people being, you know, not unhappy, I guess, with a tool like this. It's been interesting for me personally. We've really come from a practitioner's background.You know, I wouldn't call myself a security engineer at all. I would call myself as a sometimes a software developer, I guess. I have been hacking my way around Go and definitely learning a lot about how the cloud has worked over the past seven, eight years or so, but I wouldn't call myself a security engineer, so being very cautious around how all of these things work. And we've really tried to defer to things like the system keychain and defer to things that we know are pretty safe and work.Corey: The thing that I also want to call out as well is that your licensing is under the MIT license. This is not one of those, “Oh, you're required to wind up doing a bunch of branding stuff around it.” And, like some people say, “Oh, you have to own the trademark for all of these things.” I mean, I'm not an expert in international trademark law, let's be very clear, but I also feel that trademarking a term that is already used heavily in the space such as the word ‘Granted,' feels like kind of an uphill battle. And let's further be clear that it doesn't matter what you call this thing.In fact, I will call attention to an oddity that I've encountered a fair bit. After installing it, the first thing you do is you run the command ‘granted.' That sets it up, it lets you configure your browser, what browser you want to use, and it now supports standard out for that headless, EC2 use case. Great. Awesome. Love it. But then the other binary that ships with it is Assume. And that's what I use day-to-day. It actually takes me a minute sometimes when it's been long enough to remember that the tool is called Granted and not Assume what's up with that?Chris: So, part of the challenge that we ran into when we were building the Granted project is that we needed to export some environment variables. And these are really important when you're logging into AWS because you have your access key, your secret key, your session token. All of those, when you run the assume command, need to go into the terminal session that you called it. This doesn't matter so much when you're using the console mode, which is what we mentioned earlier where you can open 100 different accounts if you want to view all of those at the same time in your browser. But if you want to use it in your terminal, we wanted to make it look as really smooth and seamless as possible here.And we were really inspired by this approach from—and I have to shout them out and kind of give credit to them—a tool called AWSume—they're spelled A-W-S-U-M-E—Python-based tool that they don't do as much with single-sign-on, but we thought they had a really nice, like, general approach to the way that they did the scripting and aliasing. And we were inspired by that and part of that means that we needed to have a shell script that called this executable, which then will export things back out into the shell script. And we're doing all this wizardry under the hood to make the user experience really smooth and seamless. Part of that meant that we separated the commands into granted and assume and the other part of the naming for everything is that I felt Granted had a far better ring to it than calling the whole project Assume.Corey: True. And when you say assume, is it AWS or not? I've used the AWSume project before; I've used AWS Vault out of 99 Designs for a while. I've used—for three minutes—the native AWS SSO config, and that is just trash. Again, they're so good at the plumbing, so bad at the porcelain, I think is the criticism that I would levy toward a lot of this stuff.Chris: Mmm.Corey: And it's odd to think there's an entire company built around just smoothing over these sharp, obnoxious edges, but I'm saying this as someone who runs a consultancy and have five years that just fixes the bill for this one company. So, there's definitely a series of cottage industries that spring up around these things. I would be thrilled, on some level, if you wound up being completely subsumed by their product advancements, but it's been 15 years for a lot of this stuff and we're still waiting. My big failure mode that I'm worried about is that you never are.Chris: Yeah, exactly, exactly. And it's really interesting when you think about all of these user experience gaps in AWS being opportunities for, I guess, for companies like us, I think, trying to simplify a lot of the complexity for things. I'm interested in sort of waiting for a startup to try and, like, rebuild the actual AWS console itself to make it a little bit faster and easier to use.Corey: It's been done and attempted a bunch of different times. The problem is that the console is a lot of different things to a lot of different people, and as you step through that, you can solve for your use case super easily. “Yeah, what do I care? I use RDS, I use some VPC nonsense, and I use EC2. The end.” “Great. What about IAM?”Because I promise you're using that whether you know it or not. And okay, well, I'm talking to someone else who's DynamoDB, and someone else is full-on serverless, and someone else has more money than sense, so they mostly use SageMaker, and so on and so forth. And it turns out that you're effectively trying to rebuild everything. I don't know if that necessarily works.Chris: Yeah, and I think that's a good point around maybe while we haven't seen anything around that sort of space so far. You go to the console, and you click down, you see that list of 200 different services and all of those have had teams go and actually, like, build the UI and work with those individual APIs. Yeah.Corey: Any ideas as far as what's next for features on Granted?Chris: I think that, for us, it's continuing to work with everybody who's using it, and with a focus of stability and performance. We actually had somebody in the community raise an issue because they have an AWS config file that's over 7000 lines long. And I kind of pity that person, potentially, for their day-to-day. They must deal with so much complexity. Granted is currently quite slow when the config files get very big. And for us, I think, you know, we built it for ourselves; we don't have that many accounts just yet, so working to try to, like, make it really performant and really reliable is something that's really important.Corey: If you don't mind a feature request while we're at it—and I understand that this is more challenging than it looks like—I'm willing to fund this as a feature bounty that makes sense. And this also feels like it might be a good first project for a very particular type of person, I would love to get tab completion working in Zsh. You have it—Chris: Oh.Corey: For Fish because there's a great library that automatically populates that out, but for the Zsh side of it, it's, “Oh, I should just wind up getting Zsh completion working,” and I fell down a rabbit hole, let me tell you. And I come away from this with the perception of yeah, I'm not going to do it. I have not smart enough to check those boxes. But a lot of people are so that is the next thing I would love to see. Because I will change my browser to log into the AWS console for you, but be damned if I'm changing my shell.Chris: [laugh]. I think autocomplete probably should be higher on our roadmap for the tool, to be honest because it's really, like, a key metric and what we're focusing on is how easy is it to log in. And you know, if you're not too sure what commands to use or if we can save you a few keystrokes, I think that would be the, kind of like, reaching our goals.Corey: From where I'm sitting, you definitely have. I really want to thank you for taking the time to not only build this in the first place, but also speak with me about it. If people want to learn more, where's the best place to find you?Chris: So, you can find me on Twitter, I'm @chr_norm, or you can go and visit granted.dev and you'll have a link to join the Slack community. And I'm very active on the Slack.Corey: You certainly are, although I will admit that I fall into the challenge of being in just the perfectly opposed timezone from you and your co-founder, who are in different time zones to my understanding; one of you is on Australia and one of you was in London; you're the London guy as best I'm aware. And as a result, invariably, I wind up putting in feature requests right when no one's around. And, for better or worse, in the middle of the night is not when I'm usually awake trying to log into AWS. That is Azure time.Chris: [laugh]. Yeah, no, we don't have the US time zone properly covered yet for our community support and help. But we do have a fair bit of the world timezone covered. The rest of the team for Common Fate is all based in Australia and I'm out here over in London.Corey: Yeah. I just want to thank you again, for just being so accessible and, like, honestly receptive to feedback. I want to be clear, there's a way to give feedback and I do strive to do it constructively. I didn't come crashing into your Slack one day with a, “You know what your problem is?” I prefer to take the, “This is awesome. Here's what I think would be even better. Does that make sense?” As opposed to the imperious demands and GitHub issues and whatnot? It's, “I'd love it if it did this thing. Doesn't do this thing. Can you please make it do this thing?” Turns out that's the better way to drive change. Who knew?Chris: Yeah. [laugh]. Yeah, definitely. And I think that one of the things that's been the best around our journey with Granted so far has been listening to feedback and hearing from people how they would like to use the tool. And a big thank you to you, Corey, for actually suggesting changes that make it not only better for you, but better for everybody else who's using Granted.Corey: Well, at least as long as we're using my particular byzantine workload patterns in some way, or shape, or form, I'll hear that. But no, it's been an absolute pleasure and I really want to thank you for your time as well.Chris: Yeah, thank you for having me.Corey: Chris Norman, co-founder of Common Fate, as well as one of the two primary developers originally behind the Granted project that logs you into AWS without you having to lose your mind. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, incensed, raging comment that talks about just how terrible all of this is once you spend four hours logging into your AWS account by hand first.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
Full Description / Show Notes Guillermo talks about how he came to work at OCI and what it was like helping to pioneer Oracle's cloud product (1:40) Corey and Guillermo discuss the challenges and realities of multi-cloud (6:00) Corey asks about OCI's dedicated region approach (8:27) Guillermo discusses the problem of awareness (12:40) Corey and Guillermo talk cloud providers and cloud migration (14:40) Guillermo shares about how OCI's cost and customer service is unique among cloud providers (16:56) Corey and Guillermo talk about IoT services and 5G (23:58) About Guillermo RuizGuillermo Ruiz gets into trouble more often than he would like. During his career Guillermo has seen many horror stories while building data centers worldwide. In 2007 he dreamed with space-based internet and direct routing between satellites, but he could only reach “the Cloud”. And there he is, helping customer build their business in someone else servers since 2011.Beware of his sense of humor...If you ever see him in a tech event, run, he will get you in problems.Links: Twitter: https://twitter.com/IaaSgeek, https://twitter.com/OracleStartup LinkedIn: https://www.linkedin.com/in/gruizesteban/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I've been meaning to get a number of folks on this show for a while and today is absolutely one of those episodes. I'm joined by Guillermo Ruiz who is the Director of OCI Developer Evangelism, slash the Director of Oracle for Startups. Guillermo, thank you for joining me, and is Oracle for Startups an oxymoron because it kind of feels like it in some weird way, in the fullness of time.Guillermo: [laugh]. Thanks, Corey. It's a pleasure being in your show.Corey: Well, thank you. I enjoy having you here. I've been trying to get you on for a while. I'm glad I finally wore you down.Guillermo: [laugh]. Thanks. As I said, well, startup, I think, is the future of the industry, so it's a fundamental piece of our building blocks for the next generation of services.Corey: I have to say that I know that you folks at Oracle Cloud have been a recurring sponsor of the show. Thank you for that, incidentally. This is not a promoted guest episode. I invited you on because I wanted to talk to you about these things, which means that I can say more or less whatever I damn well want. And my experience with Oracle Cloud has been one of constantly being surprised since I started using it a few years ago, long before I was even taking sponsorships for this show. It was, “Oh, Oracle has a cloud. This ought to be rich.”And I started kicking the tires on it and I came away consistently and repeatedly impressed by the technical qualities the platform has. The always-free tier has a model of cloud economics that great. I have a sizable VM running there and have for years and it's never charged me a dime. Your data egress fees aren't, you know, a 10th of what a lot of the other cloud providers are charging, also known as, you know, you're charging in the bounds of reality; good for that. And the platform continues to—although it is different from other cloud providers, in some respects, it continues to impress.Honestly, I keep saying one of the worst problems that has is the word Oracle at the front of it because Oracle has a 40-some-odd-year history of big enterprise systems, being stodgy, being difficult to work with, all the things you don't generally tend to think of in terms of cloud. It really is a head turn. How did that happen? And how did you get dragged into the mess?Guillermo: Well, this came, like, back in five, six years ago, when they started building this whole thing, they picked people that were used to build cloud services from different hyperscalers. They dropped them into a single box in Seattle. And it's like, “Guys, knowing what you know, how you would build the next generation cloud platform?” And the guys came up with OCI, which was a second generation. And when I got hired by Oracle, they showed me the first one, that classic.It was totally bullshit. It was like, “Guys, there's no key differentiator with what's there in the market.” I didn't even know Oracle had a cloud, and I've been in this space since late-2010. And I had to sign, like, a bunch of NDAs a lot of papers, and they show me what they were cooking in the oven, and oh my gosh, when I saw that SDN out of the box directly in the physical network, CPUs assign, it was [BLEEP] [unintelligible 00:03:45]. It was, like, bare metal. I saw that the future was there. And I think that they built the right solution, so I joined the company to help them leverage the cloud platform.Corey: The thing that continually surprises me is that, “Oh, we have a cloud.” It has a real, “Hello fellow kids,” energy. Yes, yeah, so does IBM; we've seen how that played out. But the more I use it, the more impressed I am. Early on in the serverless function days, you folks more or less acquired Iron.io, and you were streets ahead as far as a lot of the event-driven serverless function style of thing tended to go.And one of the challenges that I see in the story that's being told about Oracle Cloud is, the big enterprise customer wins. These are the typical global Fortune 2000s, who have been around for, you know—which is weird for those of us in San Francisco, but apparently, these companies have been around longer than 18 months and they've built for platforms that are not the latest model MacBook Pro running the current version of Chrome. What is that? What is that legacy piece of garbage? What does it do? It's like, “Oh, it does about $4 billion a quarter so maybe show some respect.”It's the idea of companies that are doing real-world things, and they absolutely have cloud power. Problems and needs that are being met by a variety of different companies. It's easy to look at that narrative and overlook the fact that you could come up with some ridiculous Twitter for Pets-style business idea and build it on top of Oracle Cloud and I would not, at this point, call that a poor decision. I'm not even sure how it got there, and I wish that story was being told a little bit better. Given that you are a developer evangelist focusing specifically on startups and run that org, how do you see it?Guillermo: Well, the thing here is, you mentioned, you know, about Oracle, many startup doesn't even know we have a cloud provider. So, many of the question comes is like, how we can help on your business. It's more on the experience, you know, what are the challenges, the gaps, and we go in and identify and try to use our cloud. And even though if I'm not able to fill that gap, that's why we have this partnership with Microsoft. It's the first time to cloud providers connect both clouds directly without no third party in between, router to router.It's like, let's leverage the best of these clouds together. I'm a truly believer of multi-cloud. Non-single cloud is perfect. We are evolving, we're getting better, we are adding services. I don't want to get to 500 services like other guys do. It's like, just have a set of things that really works and works really, really well.Corey: Until you have 40 distinct managed database services and 80 ways to run containers, are you're really a full cloud provider? I mean, there's always that question that, at some point, the database Java, the future is going to have to be disambiguating between all the different managed database services on a per workload basis, and that job sounds terrible. I can't let the multi-cloud advocacy pass unchallenged here because I'm often misunderstood on this, and if I don't say something, I will get emails, and nobody wants that. I think that the idea of building a workload with the idea that it can flow seamlessly between cloud providers is a ridiculous fantasy that basically no one achieves. The number of workloads that can do that are very small.That said, the idea of independent workloads living on different cloud providers as is the best fit for placement for those is not just a good idea, it is the—whether it's a good idea or not as irrelevant because that's the reality in which we all live now. That is the world we have to deal with.Guillermo: If you want distributed system, obviously you need to have multiple cloud providers in your strategy. How you federate things—if you go down to the Kubernetes side, how you federate multi-clusters and stuff, that's a challenge out there where people have. But you mentioned that having multiple apps and things, we have customers that they've been running Google Cloud, for example, and we build [unintelligible 00:07:40] that cloud service out there. And the thing is that when they run the network throughput and the performance test, they were like, “Damn, this is even better than what I have in my data center.” It's like, “Guys, because we are room by room.” It's here is Google, here it's Oracle; we land in the same data center, we can provide better connectivity that what you even have.So, that kind of perception is not well seen in some customers because they realize that they're two separate clouds, but the reality is that most of us have our infrastructure in the same providers.Corey: It's kind of interesting, just to look at the way that the industry is misunderstanding a lot of these things. When you folks came out with your cloud at customer initiatives—the one that jumps out to my mind is the dedicated region approach—a lot of people started making fun of that because, “What is this nonsense? You're saying that you can deploy a region of your cloud on site at the customer with all of the cloud services? That's ridiculous. You folks don't understand cloud.”My rejoinder to that is people saying that don't understand customers. You take a look at for example… AWS has their Outpost which is a rack or racks with a subset of services in them. And that, from their perspective, as best I can tell, solves the real problem that customers have, which is running virtual machines on-premises that do not somehow charge an hourly cost back to AWS—I digress—but it does bring a lot of those services closer to customers. You bring all of your services closer to customers and the fact that is a feasible thing is intensely appealing to a wide variety of customer types. Rather than waiting for you to build a region in a certain geographic area that conforms with some regulatory data requirement, “Well, cool, we can ship some racks. Does that work for you?” It really is a game-changer in a whole bunch of respects and I don't think that the industry is paying close enough attention to just how valuable that is.Guillermo: Indeed. I've been at least hearing since 2010 that next year is the boom; now everybody will move into the cloud. It has been 12 years and still 75% of customers doesn't have their critical workloads in the cloud. They have developer environments, some little production stuff, but the core business is still relying in the data center. If I come and say, “Hey, what if I build this behind your firewall?”And it's not just that you have the whole thing. I'm removing all your operational expenses. Now, you don't need to think about hardware refresh, upgrade staff, just focus on your business. I think when we came up with a dedicated region, it was awesome. It was one of the best thing I've seen their Outpost is a great solution, to be honest, but if you lose the one connectivity, the control plane is still in the cloud.In our site, you have the control plane inside your data center so you can still operate and manage your services, even if there is an outage on your one site. One of the common questions we find on that area is, like, “Damn, this is great, but we would like to have a smaller size of this dedicated region.” Well, stay tuned because maybe we come with smaller versions of our dedicated regions so you guys can go and deploy whatever you need there.Corey: It turns out that, in the fullness of time, I like this computer but I want it to be smaller is generally a need that gets met super well. One thing that I've looked into recently has been the evolution of companies, in the fullness of time—which this is what completely renders me a terrible analyst in any traditional sense; I think more than one or two quarters ahead, and I look at these things—the average tenure of a company in the S&P 500 index is 21 years or so. Which means that if we take a look at what's going on 20 years or so from now in the 2040s, roughly half—give or take—of the constituency of the S&P 500 may very well not have been founded yet. So, when someone goes out and founds a company tomorrow as an idea that they're kicking around, let's be clear, with a couple of very distinct exceptions, they're going to build it on Cloud. There's a lot of reasons to do that until you hit certain inflection points.So, this idea that, oh, we're going to rent a rack, and we're going to go build some nonsense, and yadda, yadda, yadda. It's just, it's a fantasy. So, the question that I see for a lot of companies is the longtail legacy where if I take that startup and found it tomorrow and drive it all the way toward being a multinational, at what point did they become a customer for whatever these companies are selling? A lot of the big E enterprise vendors don't have a story for that, which tells me long-term, they have problems. Looking increasingly at what Oracle Cloud is doing, I have to level with you, I viewed Oracle as being very much in that slow-eroding dinosaur perspective until I started using the platform in some depth. I am increasingly of the mind that there's a bright future. I'm just not sure that has sunk into the industry's level of awareness these days.Guillermo: Yeah, I can agree with you in that sense. Mainly, I think we need to work on that awareness side. Because for example, if I go back to the other products we have in the company, you know, like the database, what the database team has done—and I'm not a database guy—and it's like, “Guys, even being an infrastructure guy, customers doesn't care about infrastructure. They just want to run their service, that it doesn't fail, you don't have a disruption; let me evolve my business.” But even though they came with this converged database, I was really impressed that you can do everything in a single-engine rather than having multiple database implemented. Now, you can use the MongoDB APIs.It's like, this is the key of success. When you remove the learning curve and the frictions for people to use your services. I'm a [unintelligible 00:13:23] guy and I always say, “Guys, click, click, click. In three clicks, I should have my service up and running.” I think that the world is moving so fast and we have so much information today, that's just 24 hours a day that I have to grab the right information. I don't have time to go and start learning something from scratch and taking a course of six months because results needs to be done in the next few weeks.Corey: One thing that I think that really reinforces this is—so as I mentioned before, I have a free tier account with you folks, have for years, whenever I log into the thing, I'm presented with the default dashboard view, which recommends a bunch of quickstarts. And none of the quickstarts that you folks are recommending to me involve step one, migrate your legacy data center or mainframe into the cloud. It's all stuff like using analytics to predict things with AI services, it's about observability, it's about governance of deploy a landing zone as you build these things out. Here's how to do a low-code app using Apex—which is awesome, let's be clear here—and even then launching resources is all about things that you would tend to expect of launch database, create a stack, spin up some VMs, et cetera. And that's about as far as it goes toward a legacy way of thinking.It is very clear that there is a story here, but it seems that all the cloud providers these days are chasing the migration story. But I have to say that with a few notable exceptions, the way that those companies move to cloud, it always starts off by looking like an extension of their data center. Which is fine. In that phase, they are improving their data center environment at the expense of being particularly cloudy, but I don't think that is necessarily an adoption model that puts any of these platforms—Oracle Cloud included—in their best light.Guillermo: Yeah, well, people was laughing to us, when we released Layer 2 in the network in the cloud. They were like, “Guys, you're taking the legacy to the cloud. It's like, you're lifting the shit and putting the shit up there.” Is like, “Guys, there are customers that cannot refactor and do anything there. They need to still run Layer 2 there. Why not giving people options?”That's my question is, like, there's no right answers to the cloud. You just need to ensure that you have the right options for people that they can choose and build their strategy around that.Corey: This has been a global problem where so many of these services get built and launched from all of the vendors that it becomes very unclear as a customer, is this thing for me or not? And honestly, sometimes one of the best ways to figure that out is to all right, what does it cost because that, it turns out, is going to tell me an awful lot. When it comes to the price tag of millions of dollars a year, this is probably not for my tiny startup. Whereas when it comes to a, oh, it's in the always free tier or it winds up costing pennies per hour, okay, this is absolutely something I want to wind up exploring and seeing what happens. And it becomes a really polished experience across the board.I also will say this is your generation two cloud—Gen 2, not to be confused with Gentoo, the Linux distribution for people with way more time on their hands than they have sense—and what I find interesting about it is, unlike a lot of the—please don't take this the wrong way—late-comers to cloud compared to the last 15 years of experience of Amazon being out in front of everyone, you didn't just look at what other providers have done and implement the exact same models, the exact same approaches to things. You've clearly gone in your own direction and that's leading to some really interesting places.Guillermo: Yeah, I think that doing what others are doing, you just follow the chain, no? That will never position you as a top number one out there. Being number one so many years in the cloud space as other cloud providers, sometimes you lose the perception of how to treat and speak to customers you know? It's like, “I'm the number one. Who cares if this guy is coming with me or not?” I think that there's more on the empathy side on how we treat customers and how we try to work and solve.For example, in the startup team, we find a lot of people that hasn't have infrastructure teams. We put for free our architects that will give you your GitHub or your GitLab account and we'll build the Terraform modules and give that for you. It's like now you can reuse it, spin up, modify whatever you want. Trying to make life easier for people so they can adopt and leverage their business in the cloud side, you know?[midroll 00:14:45]Corey: There's so much that we folks get right. Honestly, one of the best things that recommends this is the always free tier does exactly what it says on the tin. Yeah, sure. I don't get to use every edge case service that you've built across the board, but I've also had this thing since 2019, and never had to pay a penny for any of it, whereas recently—as we're recording this, it was a week or two ago—that I saw someone wondering what happened to their AWS account because over the past week, suddenly they went from not using SageMaker to being charged $270,000 on SageMaker. And it's… yeah, that's not the kind of thing that is going to endear the platform to frickin' anyone.And I can't believe I'm saying this, but the thing says Oracle on the front of it and I'm recommending it because it doesn't wind up surprising you with a bill later. It feels like I've woken up in bizarro world. But it's great.Guillermo: Yep. I think that's one of the clever things we've done on that side. We've built a very robust platform, really cool services. But it's key on how people can start learning and testing the flavors of your cloud. But not only what you have in the fleet here, you have also the Ampere instances.We're moving into a more sustainable world, and I think that having, like, the ARM architectures in the cloud and providing that on the free space of people can just go and develop on top, I think that was one of the great things we've done in the last year-and-a-half, something like that. Definitely a full fan of a free tier.Corey: You also, working over in the Developer Evangelist slash advocacy side of the world—devrelopers, as I tend to call it much to the irritation of basically everyone who works in developer relations—one of the things that I think is a challenge for you is that when I wind up trying to do something ridiculous—I don't know maybe it's a URL shortener; maybe it is build a small app that does something that's fairly generic—with a lot of the other platforms. There's a universe of blog posts out there, “Here's how I did it on this platform,” and then it's more or less you go to GitHub—or gif-UB, and I have mispronounced that too—and click the button and I wind up getting a deploy, whereas in things that are rapidly emerging with the Oracle Cloud space, it feels like, on some level, I wind up getting to be a bit of a trailblazer and figure some of these things out myself. That is diminishing. I'm starting to see more and more content around this stuff. I have to assume that is at least partially due to your organization's work.Guillermo: Oh, yeah, but things have changed. For example, we used to have our GitHub repository just as a software release, and we push to have that as a content management, you know, it's like, I always say that give—let people steal the code. You just put the example that will come with other ideas, other extensions, plug-in connectors, but you need to have something where you can start. So, we created this DevRel Quickstart that now is managed by the new DevRel organization where we try to put those examples. So, you just can go and put it.I've been working with the community on building, like, a content aggregator of how people is using our technology. We used to have ocigeek.com, that was a website with more than 1000 blog and, like, 500 visits a day looking after what other people were doing, but unfortunately, we had to, because of… the amount of X reasons we have to pull it off.But we want to come with something like that. I think that information should be available. I don't want people to think when it comes to my cloud is like, “Oh, how you use this product?” It's like no, guys how I can build with Angular, React the content management system? You will do it in my cloud because that example I'm doing, but I want you to learn the basics and the context of running Python and doing other things there rather than go into oh, no, this is something specific to me. No, no, that will never work.Corey: That was the big problem I found with doing a lot of the serverless stuff in years past where my first Lambda application took me two weeks to build because I'm terrible at programming. And now it takes me ten minutes to build because I'm terrible at programming and don't know what tests are. But the problem I ran into for that first one was, what is the integration format? What is the event structure? How do I wind up accessing that?What is the thing that I'm integrating with expecting because, “Mmm, that's not it; try again,” is a terrible error message. And so, much of it felt like it was the undifferentiated gluing things together. The only way to make that stuff work is good documentation and numerous examples that come at the problem from a bunch of different ways. And increasingly, Oracle's documentation is great.Guillermo: Yeah, well, in my view, for example, you have the Three-Tier Oracle. We should have a catalog of 100 things that you can do in the free tier, even though when I propose some of the articles, I was even talking about VMware, and people was like, “[unintelligible 00:22:34], you cannot deploy VMware.” It's like, “Yeah, but I can connect my [crosstalk 00:22:39]—”Corey: Well, not with that attitude.Guillermo: Yeah. And I was like, “Yeah, but I can connect to the cloud and just use it as a backup place where I can put my image and my stuff. Now, you're connecting to things: VMware with free tier.” Stuff like that. There are multiple things that you can do.And just having three blocks is things that you can do in the free tier, then having developer architectures. Show me how you can deploy an architecture directly from the command line, how I can run my DevOps service without going to the console, just purely using SDKs and stuff like that. And give me the option of how people is working and expanding that content and things there. If you put those three blocks together, I think you're done on how people can adopt and leverage your cloud. It's like, I want to learn; I don't want to know the basics of I don't know, it's—I'm not a database guy, so I don't understand those things and I don't want to go into details.I just they just need a database to store my profiles and my stuff so I can pick that and do computer vision. How I can pick and say, “Hey, I'm speaking with Corey Quinn and I have a drone flying here, he recommends your face and give me your background from all the different profiles.” That's the kind of solutions I want to build. But I don't want to be an expert on those areas.Corey: Because with all the pictures of me with my mouth open, you wouldn't be able to under—it would make no sense of me until I make that pose. There's method to—Guillermo: [laugh].Corey: —my insane madness over here.Guillermo: [laugh] [unintelligible 00:23:58].Corey: Yeah. But yeah, there's a lot of value as you move up the stack on these things. There's also something to be said, as well, for a direction that you folks have been moving in recently, that I—let me be fair here—I think it's clown shoes because I tend to think in terms of software because I have more or less the hardware destruction bunny level of aura when it comes to being near expensive things. And I look around the world and I don't have a whole lot of problems that I can legally solve with an army of robots.But there are customers who very much do. And that's why we see sort of the twin linking of things like IoT services and 5G, which when I first started seeing cloud providers talking about this, I thought was Looney Tunes. And you folks are getting into it too, so, “Oh, great. The hype wound up affecting you too.” And the thing that changed my mind was not anything cloud providers have to say—because let's be clear, everyone has an agenda they're trying to push for—but who doesn't have an agenda is the customers talking about these things and the neat things that they're able to achieve with it, at which point I stopped making fun, I shut up and listen in the hopes that I might learn something. How have you seen that whole 5G slash IoT slash internet of Nonsense space evolving?Guillermo: That's the future. That's what we're going to see in the next five years. I run some innovation sessions with a lot of customers and one of the main components I speak about is this area. With 5G, the number of IoT devices will exponentially grow. That means that you're going to have more data points, more data volume out there.How can you provide the real value, how you can classify, index, and provide the right information in just 24 hours, that's what people is looking. Things needs to be instant. If you say to the kids today, they cannot watch a football match, 90 minutes. If you don't get the answer in ten, they move to the next thing. That's how this society is moving [unintelligible 00:25:50].Having all these solutions from a data perspective, and I think that Oracle has a great advantage in that space because we've been doing that for 43 years, right? It's like, how we do the abstraction? How I can pick all that information and provide added value? We build the robot as a service. I can configure it from my browser, any robot anywhere in the world.And I can do it in Python, Java. I can [unintelligible 00:26:14] applications. Two weeks ago, we were testing on connecting IoT devices and flashing the firmware. And it was working. And this is something that we didn't do it alone. We did it with a startup.The guys came and had a sandbox already there, is like, “let's enable this on [unintelligible 00:26:28]. Let's start working together.” Now, I can go to my customers and provide them a solution that is like, hey, let's connect Boston Dynamics, or [unintelligible 00:26:37] Robotics. Let's start doing those things and take the benefits of using Oracle's AI and ML services. Pick that, let's do computer vision, natural language processing.Now, you're connecting what I say, an end-to-end solution that provides real value for customers. Connected cars, we turn our car into a wallet. I can go and pay on the petrol station without leaving my car. If I'm taking the kids to takeaway, I can just pay these kind of things is like, “Whoa, this is really cool.” But what if I [laugh] get that information for your insurance company.Next year, Corey, you will pay double because you're a crazy driver. And we know how you drive in the car because we have all that information in place. That's how the things will roll out in the next five to ten years. And [unintelligible 00:27:24] healthcare. We build something for emergencies that if you have a car crash, they have the guys that go and attend can have your blood type and some information about your car, where to cut the chassis and stuff when you get prisoner inside.And I got people saying, “Oh gee, GDPR because we are in Europe.” It's like, “Guys, if I'm going to die, I don't care if they have my information.” That's the point where people really need to balance the whole thing, right? Obviously, we protect the information and the whole thing, but in those situations is like hey, there's so many things we can do. There are countless opportunities out there.Corey: The way that I square that circle personally has always been it's about informed consent, when if people are given a choice, then an awful lot of those objections that people have seemed to melt away. Provided, of course, that is an actual choice and it's not one of those, “Well, you can either choose to”—quote-unquote—“Choose to do this, or you can pay $9,000 a month extra.” Which is, that's not really a choice. But as long as there's a reasonable way to get informed consent, I think that people don't particularly mind, I think it's when they wind up feeling that they have been spied upon without their knowledge, that's when everything tends to blow up. It turns out, if you tell people in advance what you're going to do with their information, they're a lot less upset. And I don't mean burying it deep and the terms and conditions.Guillermo: And that's a good example. We run a demo with one of our customers showing them how dangerous the public information you have out there. You usually sign and click and give rights to everybody. We found in Stack Overflow, there was a user that you just have the username there, nothing else. And we build a platform with six terabytes of information grabbing from Stack Overflow, LinkedIn, Twitter, and many other social media channels, and we show how we identify that this guy was living in Bangalore in India and was working for a specific company out there.So, people was like, “Damn, just having that name, you end up knowing that?” It's like there's so much information out there of value. And we've seen other companies doing that illegally in other places, you know, Cambridge Analytics and things like that. But that's the risk of giving your information for free out there.Corey: It's always a matter of trade-offs. There is no one-size-fits-all solution and honestly, if there were it feels like we wouldn't have cloud providers; we would just have the turnkey solution that gives the same thing that everyone needs and calls it good. I dream of such a day, but it turns out that customers are different, people are different, and there's no escaping that.Guillermo: [laugh]. Well, you mentioned dreamer; I dream direct routing between satellites, and look where I am; I'm just in the cloud, one step lower. [laugh].Corey: You know, bit by bit, we're going to get there one way or another, for an altitude perspective. I really want to thank you for taking so much time to speak with me today. If people want to learn more, where's the right place to find you?Guillermo: Well, I have the @IaaSgeek Twitter account, and you can find me on LinkedIn gruizesteban there. Just people wants to talk about anything there, I'm open to any kind of conversation. Just feel free to reach out. And it was a pleasure finally meeting you, in person. Not—well in person; through a camera, at least being in the show with you.Corey: Other than on the other side of a Twitter feed. No, I hear you.Guillermo: [laugh].Corey: We will, of course, put links to all of that in the [show notes 00:30:43]. Thank you so much for your time. I really do appreciate it.Guillermo: Thanks very much. So, you soon.Corey: Guillermo Ruiz, Director of OCI Developer Evangelism. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment, to which I will respond with a surprise $270,000 bill.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About YoavYoav is a security veteran recognized on Microsoft Security Response Center's Most Valuable Research List (BlackHat 2019). Prior to joining Orca Security, he was a Unit 8200 researcher and team leader, a chief architect at Hyperwise Security, and a security architect at Check Point Software Technologies. Yoav enjoys hunting for Linux and Windows vulnerabilities in his spare time.Links Referenced: Orca Security: https://orca.security Twitter: https://twitter.com/yoavalon TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning fast processing power, courtesy of third gen AMD EPYC processors without the IO, or hardware limitations, of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general purpose, CPU, memory, or storage optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices, and say goodbye to noisy neighbors and egregious egress forever. Vultr delivers the power of the cloud with none of the bloat. "Screaming in the Cloud" listeners can try Vultr for free today with a $150 in credit when they visit getvultr.com/screaming. That's G E T V U L T R.com/screaming. My thanks to them for sponsoring this ridiculous podcast.Corey: Finding skilled DevOps engineers is a pain in the neck! And if you need to deploy a secure and compliant application to AWS, forgettaboutit! But that's where DuploCloud can help. Their comprehensive no-code/low-code software platform guarantees a secure and compliant infrastructure in as little as two weeks, while automating the full DevSecOps lifestyle. Get started with DevOps-as-a-Service from DuploCloud so that your cloud configurations are done right the first time. Tell them I sent you and your first two months are free. To learn more visit: snark.cloud/duplocloud. Thats's snark.cloud/D-U-P-L-O-C-L-O-U-D. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Periodically, I would say that I enjoy dealing with cloud platform security issues, except I really don't. It's sort of forced upon me to deal with much like a dead dog is cast into their neighbor's yard for someone else to have to worry about. Well, invariably, it seems like it's my yard.And I'm only on the periphery of these things. Someone who's much more in the trenches in the wide world of cloud security is joining me today. Yoav Alon is the CTO at Orca Security. Yoav, thank you for taking the time to join me today and suffer the slings and arrows I'll no doubt be hurling your way.Yoav: Thank you, Corey, for having me. I've been a longtime listener, and it's an honor to be here.Corey: I still am periodically surprised that anyone listens to these things. Because it's unlike a newsletter where everyone will hit reply and give me a piece of their mind. People generally don't wind up sending me letters about things that they hear on the podcast, so whenever I talk to somebody listens to it as, “Oh. Oh, right, I did turn the microphone on. Awesome.” So, it's always just a little on the surreal side.But we're not here to talk necessarily about podcasting, or the modern version of an AM radio show. Let's start at the very beginning. What is Orca Security, and why would folks potentially care about what it is you do?Yoav: So, Orca Security is a cloud security company, and our vision is very simple. Given a customer's cloud environment, we want to detect all the risks in it and implement mechanisms to prevent it from occurring. And while it sounds trivial, before Orca, it wasn't really possible. You will have to install multiple tools and aggregate them and do a lot of manual work, and it was messy. And we wanted to change that, so we had, like, three guiding principles.We call it seamless, so I want to detect all the risks in your environment without friction, which is our speak for fighting with your peers. We also want to detect everything so you don't have to install, like, a tool for each issue: A tool for vulnerabilities, a tool for misconfigurations, and for sensitive data, IAM roles, and such. And we put a very high priority on context, which means telling you what's important, what's not. So, for example, S3 bucket open to the internet is important if it has sensitive data, not if it's a, I don't know, static website.Corey: Exactly. I have a few that I'd like to get screamed at in my AWS account, like, “This is an open S3 bucket and it's terrible.” I look at it the name is assets.lastweekinaws.com. Gee, I wonder if that's something that's designed to be a static hosted website.Increasingly, I've been slapping CloudFront in front of those things just to make the broken warning light go away. I feel like it's an underhanded way of driving CloudFront adoption some days, but not may not be the most charitable interpretation thereof. Orca has been top-of-mind for a lot of folks in the security community lately because let's be clear here, dealing with security problems in cloud providers from a vendor perspective is an increasingly crowded—and clouded—space. Just because there's so much—there's investment pouring into it, everyone has a slightly different take on the problem, and it becomes somewhat challenging to stand out from the pack. You didn't really stand out from the pack so much as leaped to the front of it and more or less have become the de facto name in a very short period of time, specifically—at least from my world—when you wound up having some very interesting announcements about vulnerabilities within AWS itself. You will almost certainly do a better job of relating the story, so please, what did you folks find?Yoav: So, back in September of 2021, two of my researchers, Yanir Tsarimi and Tzah Pahima, each one of them within a relatively short span of time from each other, found a vulnerability in AWS. Tzah found a vulnerability in CloudFormation which we named BreakingFormation and Yanir found a vulnerability in AWS Glue, which we named SuperGlue. We're not the best copywriters, but anyway—Corey: No naming things is hard. Ask any Amazonian.Yoav: Yes. [laugh]. So, I'll start with BreakingFormation which caught the eyes of many. It was an XXE SSRF, which is jargon to say that we were able to read files and execute HTTP requests and read potentially sensitive data from CloudFormation servers. This one was mitigated within 26 hours by AWS, so—Corey: That was mitigated globally.Yoav: Yes, globally, which I've never seen such quick turnaround anywhere. It was an amazing security feat to see.Corey: Particularly in light of the fact that AWS does a lot of things very right when it comes to, you know, designing cloud infrastructure. Imagine that, they've had 15 years of experience and basically built the idea of cloud, in some respects, at the scale that hyperscalers operate at. And one of their core tenets has always been that there's a hard separation between regions. There are remarkably few global services, and those are treated with the utmost of care and delicacy. To the point where when something like that breaks as an issue that spans more than one region, it is headline-making news in many cases.So it's, they almost never wind up deploying things to all regions at the same time. That can be irksome when we're talking about things like I want a feature that solves a problem that I have, and I have to wait months for it to hit a region that I have resources living within, but for security, stuff like this, I am surprised that going from, “This is the problem,” to, “It has been mitigated,” took place within 26 hours. I know it sounds like a long time to folks who are not deep in the space, but that is superhero speed.Yoav: A small correction, it's 26 hours for, like, the main regions. And it took three to four days to propagate to all regions. But still, it's speed of lighting in for security space.Corey: When this came out, I was speaking to a number of journalists on background about trying to wrap their head around this, and they said that, “Oh yeah, and security is always, like, the top priority for AWS, second only to uptime and reliability.” And… and I understand the perception, but I disagree with it in the sense of the nightmare scenario—that every time I mention to a security person watching the blood drain from their face is awesome—but the idea that take IAM, which as Werner said in his keynote, processes—was it 500 million or was it 500 billion requests a second, some ludicrous number—imagine fails open where everything suddenly becomes permitted. I have to imagine in that scenario, they would physically rip the power cables out of the data centers in order to stop things from going out. And that is the right move. Fortunately, I am extremely optimistic that will remain a hypothetical because that is nightmare fuel right there.But Amazon says that security is job zero. And my cynical interpretation is that well, it wasn't, but they forgot security, decided to bolt it on to the end, like everyone else does, and they just didn't want to renumber all their slides, so instead of making it point one, they just put another slide in front of it and called the job zero. I'm sure that isn't how it worked, but for those of us who procrastinate and building slide decks for talks, it has a certain resonance to it. That was one issue. The other seemed a little bit more pernicious focusing on Glue, which is their ETL-as-a-Service… service. One of them I suppose. Tell me more about it.Yoav: So, one of the things that we found when we found the BreakingFormation when we reported the vulnerability, it led us to do a quick Google search, which led us back to the Glue service. It had references to Glue, and we started looking around it. And what we were able to do with the vulnerability is given a specific feature in Glue, which we don't disclose at the moment, we were able to effectively take control over the account which hosts the Glue service in us-east-1. And having this control allowed us to essentially be able to impersonate the Glue service. So, every role in AWS that has a trust to the Glue service, we were able to effectively assume a role into it in any account in AWS. So, this was more critical a vulnerability in its effect.Corey: I think on some level, the game of security has changed because for a lot of us who basically don't have much in the way of sensitive data living in AWS—and let's be clear, I take confidentiality extremely seriously. Our clients on the consulting side view their AWS bills themselves as extremely confidential information that Amazon stuffs into a PDF and emails every month. But still. If there's going to be a leak, we absolutely do not want it to come from us, and that is something that we take extraordinarily seriously. But compared to other jobs I've had in the past, no one will die if that information gets out.It is not the sort of thing that is going to ruin people's lives, which is very often something that can happen in some data breaches. But in my world, one of the bad cases of a breach of someone getting access to my account is they could spin up a bunch of containers on the 17 different services that AWS offers that can run containers and mine cryptocurrency with it. And the damage to me then becomes a surprise bill. Okay, great. I can live with that.Something that's a lot scarier to a lot of companies with, you know, serious problems is, yep, fine, cost us money, whatever, but our access to our data is the one thing that is going to absolutely be the thing that cannot happen. So, from that perspective alone, something like Glue being able to do that is a lot more terrifying than subverting CloudFormation and being able to spin up additional resources or potentially take resources down. Is that how you folks see it too, or is—I'm sure there's nuance I'm missing.Yoav: So yeah, the access to data is top-of-mind for everyone. It's a bit scary to think about it. I have to mention, again, the quick turnaround time for AWS, which almost immediately issued a patch. It was a very fast one and they mitigated, again, the issue completely within days. About your comment about data.Data is king these days, there is nothing like data, and it has all the properties of everything that we care about. It's expensive to store, it's expensive to move, and it's very expensive if it leaks. So, I think a lot of people were more alarmed about the Glue vulnerability than the CloudFormation vulnerability. And they're right in doing so.Corey: I do want to call out that AWS did a lot of things right in this area. Their security posture is very clearly built around defense-in-depth. The fact that they were able to disclose—after some prodding—that they checked the CloudTrail logs for the service itself, dating back to the time the service launched, and verified that there had never been an exploit of this, that is phenomenal, as opposed to the usual milquetoast statements that companies have. We have no evidence of it, which can mean that we did the same thing and we looked through all the logs in it's great, but it can also mean that, “Oh, yeah, we probably should have logs, shouldn't we? But let's take a backlog item for that.” And that's just terrifying on some level.It becomes a clear example—a shining beacon for some of us in some cases—of doing things right from that perspective. There are other sides to it, though. As a customer, it was frustrating in the extreme to—and I mean, no offense by this—to learn about this from you rather than from the provider themselves. They wound up putting up a security notification many hours after your blog post went up, which I would also just like to point out—and we spoke about it at the time and it was a pure coincidence—but there was something that was just chef's-kiss perfect about you announcing this on Andy Jassy's birthday. That was just very well done.Yoav: So, we didn't know about Andy's birthday. And it was—Corey: Well, I see only one of us has a company calendar with notable executive birthdays splattered all over it.Yoav: Yes. And it was also published around the time that AWS CISO was announced, which was also a coincidence because the date was chosen a lot of time in advance. So, we genuinely didn't know.Corey: Communicating around these things is always challenging because on the one hand, I can absolutely understand the cloud providers' position on this. We had a vulnerability disclosed to us. We did our diligence and our research because we do an awful lot of things correctly and everyone is going to have vulnerabilities, let's be serious here. I'm not sitting here shaking my fist, angry at AWS's security model. It works, and I am very much a fan of what they do.And I can definitely understand then, going through all of that there was no customer impact, they've proven it. What value is there to them telling anyone about it, I get that. Conversely, you're a security company attempting to stand out in a very crowded market, and it is very clear that announcing things like this demonstrates a familiarity with cloud that goes beyond the common. I radically changed my position on how I thought about Orca based upon these discoveries. It went from, “Orca who,” other than the fact that you folks have sponsored various publications in the past—thanks for that—but okay, a security company. Great to, “Oh, that's Orca. We should absolutely talk to them about a thing that we're seeing.” It has been transformative for what I perceive to be your public reputation in the cloud security space.So, those two things are at odds: The cloud provider doesn't want to talk about anything and the security company absolutely wants to demonstrate a conversational fluency with what is going on in the world of cloud. And that feels like it's got to be a very delicate balancing act to wind up coming up with answers that satisfy all parties.Yoav: So, I just want to underline something. We don't do what we do in order to make a marketing stand. It's a byproduct of our work, but it's not the goal. For the Orca Security Research Pod, which it's the team at Orca which does this kind of research, our mission statement is to make cloud security better for everyone. Not just Orca customers; for everyone.And you get to hear about the more shiny things like big headline vulnerabilities, but we also have very sensible blog posts explaining how to do things, how to configure things and give you more in-depth understanding into security features that the cloud providers themselves provide, which are great, and advance the state of the cloud security. I would say that having a cloud vulnerability is sort of one of those things, which makes me happy to be a cloud customer. On the one side, we had a very big vulnerability with very big impact, and the ability to access a lot of customers' data is conceptually terrifying. The flip side is that everything was mitigated by the cloud providers in warp speed compared to everything else we've seen in all other elements of security. And you get to sleep better knowing that it happened—so no platform is infallible—but still the cloud provider do work for you, and you'll get a lot of added value from that.Corey: You've made a few points when this first came out, and I want to address them. The first is, when I reached out to you with a, “Wow, great work.” You effectively instantly came back with, “Oh, it wasn't me. It was members of my team.” So, let's start there. Who was it that found these things? I'm a huge believer giving people credit for the things that they do.The joy of being in a leadership position is if the company screws up, yeah, you take responsibility for that, whether the company does something great, yeah, you want to pass praise onto the people who actually—please don't take this the wrong way—did the work. And not that leadership is not work, it absolutely is, but it's a different kind of work.Yoav: So, I am a security researcher, and I am very mindful for the effort and skill it requires to find vulnerabilities and actually do a full circle on them. And the first thing I'll mention is Tzah Pahima, which found the BreakingFormation vulnerability and the vulnerability in CloudFormation, and Yanir Tsarimi, which found the AutoWarp vulnerability, which is the Azure vulnerability that we have not mentioned, and the Glue vulnerability, dubbed SuperGlue. Both of them are phenomenal researcher, world-class, and I'm very honored to work with them every day. It's one of my joys.Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.Corey: It's very clear that you have built an extraordinary team for people who are able to focus on vulnerability research. Which, on some level, is very interesting because you are not branded as it were as a vulnerability research company. This is not something that is your core competency; it's not a thing that you wind up selling directly that I'm aware of. You are selling a security platform offering. So, on the one hand, it makes perfect sense that you would have a division internally that works on this, but it's also very noteworthy, I think, that is not the core description of what it is that you do.It is a means by which you get to the outcome you deliver for customers, not the thing that you are selling directly to them. I just find that an interesting nuance.Yoav: Yes, it is. And I would elaborate and say that research informs the product, and the product informs research. And we get to have this fun dance where we learn new things by doing research. We [unintelligible 00:18:08] the product, and we use the customers to teach us things that we didn't know. So, it's one of those happy synergies.Corey: I want to also highlight a second thing that you have mentioned and been very, I guess, on message about since news of this stuff first broke. And because it's easy to look at this and sensationalize aspects of it, where, “See? The cloud providers security model is terrible. You shouldn't use them. Back to data centers we go.” Is basically the line taken by an awful lot of folks trying to sell data center things.That is not particularly helpful for the way that the world is going. And you've said, “Yeah, you should absolutely continue to be in cloud. Do not disrupt your cloud plan as a result.” And let's be clear, none of the rest of us are going to find and mitigate these things with anything near the rigor or rapidity that the cloud providers can and do demonstrate.Yoav: I totally agree. And I would say that the AWS security folks are doing a phenomenal job. I can name a few, but they're all great. And I think that the cloud is by far a much safer alternative than on-prem. I've never seen issues in my on-prem environment which were critical and fixed in such a high velocity and such a massive scale.And you always get the incremental improvements of someone really thinking about all the ins and outs of how to do security, how to do security in the cloud, how to make it faster, more reliable, without a business interruptions. It's just phenomenal to see and phenomenal to witness how far we've come in such a relatively short time as an industry.Corey: AWS in particular, has a reputation for being very good at security. I would argue that, from my perspective, Google is almost certainly slightly better at their security approach than AWS is, but to be clear, both of them are significantly further along the path than I am going to be. So great, fantastic. You also have found something interesting over in the world of Azure, and that honestly feels like a different class of vulnerability. To my understanding, the Azure vulnerability that you recently found was you could get credential material for other customers simply by asking for it on a random high port. Which is one of those—I'm almost positive I'm misunderstanding something here. I hope. Please?Yoav: I'm not sure you're misunderstanding. So, I would just emphasize that the vulnerability again, was found by Yanir Tsarimi. And what he found was, he used a service called Azure Automation which enables you essentially to run a Python script on various events and schedules. And he opened the python script and he tried different ports. And one of the high ports he found, essentially gave him his credentials. And he said, “Oh, wait. That's a really odd port for an HTTP server. Let's try, I don't know, a few ports on either way.” And he started getting credentials from other customers. Which was very surprising to us.Corey: That is understating it by a couple orders of magnitude. Yes, like, “Huh. That seems sub-optimal,” is sort of like the corporate messaging approved thing. At the time you discover that—I'm certain it was a three-minute-long blistering string of profanity in no fewer than four languages.Yoav: I said to him that this is, like, a dishonorable bug because he worked very little to find it. So it was, from start to finish, the entire research took less than two hours, which, in my mind, is not enough for this kind of vulnerability. You have to work a lot harder to get it. So.Corey: Yeah, exactly. My perception is that when there are security issues that I have stumbled over—for example, I gave a talk at re:Invent about it in the before times, one of them was an overly broad permission in a managed IAM policy for SageMaker. Okay, great. That was something that obviously was not good, but it also was more of a privilege escalation style of approach. It wasn't, “Oh, by the way, here's the keys to everything.”That is the type of vulnerability I have come to expect, by and large, from cloud providers. We're just going to give you access credentials for other customers is one of those areas that… it bugs me on a visceral level, not because I'm necessarily exposed personally, but because it more or less shores up so many of the arguments that I have spent the last eight years having with folks are like, “Oh, you can't go to cloud. Your data should live on your own stuff. It's more secure that way.” And we were finally it feels like starting to turn a cultural corner on these things.And then something like that happens, and it—almost have those naysayers become vindicated for it. And it's… it almost feels, on some level, and I don't mean to be overly unkind on this, but it's like, you are absolutely going to be in a better security position with the cloud providers. Except to Azure. And perhaps that is unfair, but it seems like Azure's level of security rigor is nowhere near that of the other two. Is that generally how you're seeing things?Yoav: I would say that they have seen more security issues than most other cloud providers. And they also have a very strong culture of report things to us, and we're very streamlined into patching those and giving credit where credit's due. And they give out bounties, which is an incentives for more research to happen on those platforms. So, I wouldn't say this categorically, but I would say that the optics are not very good. Generally, the cloud providers are much safer than on-prem because you only hear very seldom on security issues in the cloud.You hear literally every other day on issues happening to on-prem environments all over the place. And people just say they expect it to be this way. Most of the time, it's not even a headline. Like, “Company X affected with cryptocurrency or whatever.” It happens every single day, and multiple times a day, breaches which are massively bigger. And people who don't want to be in the cloud will find every reason not to be the cloud. Let us have fun.Corey: One of the interesting parts about this is that so many breaches that are on-prem are just never discovered because no one knows what the heck's running in an environment. And the breaches that we hear about are just the ones that someone had at least enough wherewithal to find out that, “Huh. That shouldn't be the way that it is. Let's dig deeper.” And that's a bad day for everyone. I mean, no one enjoys those conversations and those moments.And let's be clear, I am surprisingly optimistic about the future of Azure Security. It's like, “All right, you have a magic wand. What would you do to fix it?” It's, “Well, I'd probably, you know, hire Charlie Bell and get out of his way,” is not a bad answer as far as how these things go. But it takes time to reform a culture, to wind up building in security as a foundational principle. It's not something you can slap on after the fact.And perhaps this is unfair. But Microsoft has 30 years of history now of getting the world accustomed to oh, yeah, just periodically, terrible vulnerabilities are going to be discovered in your desktop software. And every once a month on Tuesdays, we're going to roll out a whole bunch of patches, and here you go. Make sure you turn on security updates, yadda, yadda, yadda. That doesn't fly in the cloud. It's like, “Oh, yeah, here's this month's list of security problems on your cloud provider.” That's one of those things that, like, the record-scratch, freeze-frame moment of wait, what are we doing here, exactly?Yoav: So, I would say that they also have a very long history of making those turnarounds. Bill Gates famously did his speech where security comes first, and they have done a very, very long journey and turn around the company from doing things a lot quicker and a lot safer. It doesn't mean they're perfect; everyone will have bugs, and Azure will have more people finding bugs into it in the near future, but security is a journey, and they've not started from zero. They're doing a lot of work. I would say it's going to take time.Corey: The last topic I want to explore a little bit is—and again, please don't take this as anyway being insulting or disparaging to your company, but I am actively annoyed that you exist. By which I mean that if I go into my AWS account, and I want to configure it to be secure. Great. It's not a matter of turning on the security service, it's turning on the dozen or so security services that then round up to something like GuardDuty that then, in turn, rounds up to something like Security Hub. And you look at not only the sheer number of these services and the level of complexity inherent to them, but then the bill comes in and you do some quick math and realize that getting breached would have been less expensive than what you're spending on all of these things.And somehow—the fact that it's complex, I understand; computers are like that. The fact that there is—[audio break 00:27:03] a great messaging story that's cohesive around this, I come to accept that because it's AWS; talking is not their strong suit. Basically declining to comment is. But the thing that galls me is that they are selling these services and not inexpensively either, so it almost feels, on some level like, shouldn't this on some of the built into the offerings that you folks are giving us?And don't get me wrong, I'm glad that you exist because bringing order to a lot of that chaos is incredibly important. But I can't shake the feeling that this should be a foundational part of any cloud offering. I'm guessing you might have a slightly different opinion than mine. I don't think you show up at the office every morning, “I hate that we exist.”Yoav: No. And I'll add a bit of context and nuance. So, for every other company than cloud providers, we expect them to be very good at most things, but not exceptional at everything. I'll give the Redshift example. Redshift is a pretty good offering, but Snowflake is a much better offering for a much wider range of—Corey: And there's a reason we're about to become Snowflake customers ourselves.Yoav: So, yeah. And there are a few other examples of that. A security company, a company that is focused solely on your security will be much better suited to help you, in a lot of cases more than the platform. And we work actively with AWS, Azure, and GCP requesting new features, helping us find places where we can shed more light and be more proactive. And we help to advance the conversation and make it a lot more actionable and improve from year to year. It's one of those collaborations. I think the cloud providers can do anything, but they can't do everything. And they do a very good job at security; it doesn't mean they're perfect.Corey: As you folks are doing an excellent job of demonstrating. Again, I'm glad you folks exist; I'm very glad that you are publishing the research that you are. It's doing a lot to bring a lot I guess a lot of the undue credit that I was giving AWS for years of, “No, no, it's not that they don't have vulnerabilities like everyone else does. It just that they don't ever talk about them.” And they're operationalizing of security response is phenomenal to watch.It's one of those things where I think you've succeeded and what you said earlier that you were looking to achieve, which is elevating the state of cloud security for everyone, not just Orca customers.Yoav: Thank you.Corey: Thank you. I really appreciate your taking the time out of your day to speak with me. If people want to learn more, where's the best place they can go to do that?Yoav: So, we have our website at orca.security. And you can reach me out on Twitter. My handle is at @yoavalon, which is @-Y-O-A-V-A-L-O-N.Corey: And we will of course put links to that in the [show notes 00:29:44]. Thanks so much for your time. I appreciate it.Yoav: Thank you, Corey.Corey: Yoav Alon, Chief Technology Officer at Orca Security. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, or of course on YouTube, smash the like and subscribe buttons because that's what they do on that platform. Whereas if you've hated this podcast, please do the exact same thing, five-star review, smash the like and subscribe buttons on YouTube, but also leave an angry comment that includes a link that is both suspicious and frightening, and when we click on it, suddenly our phones will all begin mining cryptocurrency.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
