Wondering how to build your own on-premises PaaS architecture without depending on the cloud? In this episode the Patoarchitekci analyze open-source alternatives to cloud services. Łukasz and Szymon discuss Kubernetes, Rancher, and the other key components of a self-hosted platform. The hosts break the construction of a PaaS platform down to its individual parts: from database operators and Redis caching, through MinIO object storage, to monitoring with Grafana. You'll learn when it makes sense to move workloads from the cloud to on-prem and how to avoid the typical pitfalls of building your own infrastructure. Want to take back control of your infrastructure and your costs? Listen to this episode and decide whether building your own PaaS is a good idea for your organization. Just remember that the MVP of the platform is only the beginning: the real challenges start once you have to maintain it! Now, no slacking off!
How to make backups on #android using #restic, #termux and #minio in a simple, secure, and encrypted way. A few days ago I told you I was considering replacing BorgBackup, the tool I use by default for backups and that I've talked about on several occasions. It's a tool I'm really satisfied with and that has saved me from more than one disaster, like the one I told you about in episode 173, titled "I ran an rm -rf, saved by Borg." However, I recently discovered Restic, which I covered in episode 677, titled "Don't lose your data. Foolproof backups with Restic and Minio," and I've spent a few weeks comparing one against the other. I'm so satisfied with the latter, with Restic, that I've decided to deploy it on other devices where until now I wasn't making backups at all: my Android devices. So in this episode I'll talk about backups on Android. More information and links in the episode notes.
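The workflow described in this episode (restic running inside Termux, pushing encrypted snapshots to a MinIO bucket over the S3 protocol) can be sketched as below. This is a minimal illustration, not the host's actual setup: the paths, bucket URL, and credentials are hypothetical placeholders, and the helper functions only assemble a real restic invocation rather than run it.

```python
import os

def restic_backup_command(source_dir, repo_url, excludes=()):
    """Build the restic command line for backing up a directory to an
    S3-compatible repository, such as a MinIO bucket."""
    cmd = ["restic", "-r", f"s3:{repo_url}", "backup", source_dir]
    for pattern in excludes:
        # restic skips anything matching an --exclude pattern
        cmd += ["--exclude", pattern]
    return cmd

def restic_env(access_key, secret_key, repo_password):
    """Environment variables restic reads: S3 credentials plus the
    password that encrypts the repository."""
    env = dict(os.environ)
    env.update({
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "RESTIC_PASSWORD": repo_password,
    })
    return env

# Hypothetical Android/Termux example: back up the camera roll,
# skipping thumbnail caches.
cmd = restic_backup_command(
    "/sdcard/DCIM",
    "https://minio.example.com/android-backups",
    excludes=(".thumbnails",),
)
print(" ".join(cmd))
```

In practice you would pass `restic_env(...)` to `subprocess.run(cmd, env=...)` (or simply export the same variables in a Termux shell script) so that neither credentials nor the repository password appear on the command line.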
Recently VMware Data Services Manager 2.2 was released, so I had to invite my good friend Cormac Hogan to discuss all the enhancements we introduced to an already great product. Of course, we also discussed the Tech Preview for the Object Storage Service, which enables you to deploy MinIO at scale! Disclaimer: The thoughts and opinions shared in this podcast are our own/guest(s), and not necessarily those of Broadcom or VMware by Broadcom.
Looking for a secure, reliable system for your backups? Build yours with #restic, #resticprofile and #minio for foolproof #backups. I've been talking to you about backups for years. Specifically, in episode 173 I told you how I ran an rm -rf and was saved by Borg. I know it's not as attractive a topic as multimedia or flashy services, but it's essential. The problem is that you only remember backups when you really need them, and at that moment you may remember them for the wrong reasons: because at some point you decided not to make them, or you put it off for later (damned procrastination), or you simply never verified they were actually being made correctly. So having an effective, efficient backup system is truly essential and fundamental. In this episode I'll talk about Restic, an alternative to Borg that I'm testing and that will very likely become my default system in the coming weeks. More information and links in the episode notes.
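Part of "verifying backups are actually being made correctly" is pruning old snapshots on a schedule, which restic handles with `restic forget` and its retention flags (and which resticprofile can wrap in a profile). The sketch below only assembles that command line with illustrative retention values; it is an assumption about a typical policy, not the host's configuration.

```python
def restic_forget_command(keep_daily=7, keep_weekly=4, keep_monthly=12, prune=True):
    """Build a restic retention command: keep N daily, weekly, and
    monthly snapshots, optionally pruning unreferenced data afterwards."""
    cmd = ["restic", "forget",
           "--keep-daily", str(keep_daily),
           "--keep-weekly", str(keep_weekly),
           "--keep-monthly", str(keep_monthly)]
    if prune:
        # --prune actually deletes the data the forgotten snapshots held
        cmd.append("--prune")
    return cmd

cmd = restic_forget_command()
print(" ".join(cmd))
```

Running this regularly (a cron job, or a resticprofile `retention` section) keeps the MinIO bucket from growing without bound while still preserving a useful history.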
AGRIGENTO (ITALPRESS) - "I hope the climate will be constructive and positive, because we will be under the international spotlight." So said Giacomo Minio, president of the Agrigento Italian Capital of Culture 2025 Foundation, on the sidelines of the press conference presenting the new logo and communication campaign.
In this episode, Daniel Valdivia, an engineer from MinIO, discusses his participation at KubeCon and his work in Kubernetes integrations and AI initiatives. We discussed the significance of object storage standardization via the Open Platform for Enterprise AI (OPEA), emphasizing the flexibility and scalability of MinIO's offerings. Daniel highlights MinIO's contributions to open source projects like PyTorch and Spark and shares insights on new hardware technologies like PCIe Gen 5. Daniel also announces the launch of MinIO's new AI store, designed to empower enterprises to efficiently manage exascale infrastructure and AI pipelines. 00:00 Introduction 00:13 Meet Daniel Valdivia: Engineer at MinIO 00:24 The Importance of Kubernetes Integrations 00:43 Intel's Open Platform for Enterprise AI 00:58 MinIO's Unique Object Storage Solutions 01:56 Community Participation and Contributions 02:18 Ensuring Compatibility with AI Hardware 03:20 The Role of OPEA in Enterprise AI 05:56 Open Source Contributions and Challenges 09:12 Future of AI and Hardware Innovations 13:23 Big Announcement 14:40 Conclusion and Final Thoughts Guest: Daniel Valdivia is an engineer with MinIO where he focuses on Kubernetes, ML/AI and VMware. Prior to joining MinIO, Daniel was the Head of Machine Learning for Espressive. Daniel has held senior application development roles with ServiceNow, Oracle and Freescale. Daniel holds a Bachelor of Engineering from Tecnológico de Monterrey, Campus Guadalajara and Bachelor of Science in Computer Engineering from Instituto Tecnológico y de Estudios Superiores de Monterrey.
Today's guest is Ahmed Azam, Head of Infrastructure and Cloud Services at Northwestern Mutual. Ahmed joins Emerj Senior Editor Matthew DeMello to explore the organization's transformative journey in adopting cloud technology. With roots tracing back to 1857, Northwestern Mutual has continuously evolved, leveraging technological advancements to maintain a competitive edge. Ahmed shares insightful stories about the company's pioneering history, including its early adoption of mainframe computing and the more recent integration of cloud-based solutions. This episode is sponsored by MinIO. Find out more about sponsored content and how to engage with the Emerj audience at emerj.com/ad1.
Today on the Tech Bytes podcast we welcome back sponsor MinIO to talk about how AI is altering the data infrastructure landscape, and why organizations are looking to build AI infrastructure on-prem. We also dig into MinIO's AIStor, a software-only, distributed object store that offers simplicity, scalability, and performance for AI infrastructure and other high-performance...
Pay rates for IT security professionals are rising faster than inflation, but burnout and stress are growing faster. A survey of UK security professionals revealed the fast pace of modern security and the risk of unknown failure is causing skilled practitioners to leave the field. Would yet more pay fix the problem, or is there another way to address IT security staff retention? This and more on the Rundown. Time Stamps: 0:00 - Welcome to the Rundown 1:19 - Can AMD Top NVIDIA? 3:50 - Quantum AI Isn't a Thing 7:13 - MinIO Introduces AIStor 12:23 - Amazon Employee Details Exposed in MoveIt Breach 15:20 - Marslink is Further Away than Starlink 18:19 - AI is writing Google's Code 22:05 - Amazon Won't Go Nuclear 26:41 - Is IT Security Too Stressful for the Money? 35:45 - The Weeks Ahead 37:23 - Thanks for Watching Hosts: Tom Hollingsworth: https://www.linkedin.com/in/networkingnerd/ Alastair Cooke: https://www.linkedin.com/in/alastaircooke/ Follow Gestalt IT Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/Gestalt-IT #Rundown, #AI, #AIStor, #CyberSecurity, #AWS, @NetworkingNerd, @DemitaasseNZ, @GestaltIT, @TechstrongTV, @TheGuturumGroup, @TechFieldDay, @AMD, @NVIDIA, @MinIO, @GoogleCloud, @Google, @AWSCloud,
Today's guest is Yonas Yohannes, CTO of FinTech and FIS at Oracle. An accomplished executive and author, Yonas joins us on today's podcast to explain the evolving role of endpoint storage for driving new AI capabilities at the edge. He breaks down AI's true value beyond the marketing hype, and its broader impact on infrastructure across industries, with a special focus on financial services. Throughout the episode, Yonas addresses the real challenges businesses face in adopting AI while ensuring transparency and avoiding regulatory risk. This episode is sponsored by MinIO. Learn how brands work with Emerj and other Emerj Media options at emerj.com/ad1.
Today's guest is Ylan Kazi, Chief Data and AI Officer at Blue Cross Blue Shield North Dakota. Ylan joins us on today's program to discuss the complexities faced by leaders in legacy industries, such as healthcare, as they navigate the balance between infrastructure investments in cloud technologies and end-point storage to meet business goals. Throughout the episode, Ylan shares insights on developing a robust business strategy for cloud migration, highlighting common pitfalls like cost overruns and outdated mindsets. This episode is sponsored by MinIO. Find out more about sponsored content and how to engage with the Emerj audience at emerj.com/ad1.
Mark Rostick is a Vice President & Senior Managing Director located in Raleigh, NC. He is a voting member of Intel Capital's investment committee. He joined Intel Capital in 1999. Mark also co-manages our Cloud domain investment activities and portfolio. He has deep investment experience in cloud applications, infrastructure hardware and software, as well as AI/ML. As a member of Intel Capital's Investment Committee, he is responsible for approving investments proposed by Intel Capital investors, as well as managing the group's personnel and operations. Mark currently serves as a director or observer on the boards of Beep, RunPod, Hypersonic, Immuta, Lilt, MinIO, Opaque Systems, Tetrate, and Verta. Prior to Intel, Mark worked as a practicing attorney and in banking. You can learn more about: How to invest in the top AI/ML companies How to build a successful career in corporate venture The evolving landscape of enterprise software investments #IntelCapital #VentureCapital #TechInvestment #CloudComputing #AI #ML ===================== YouTube: @GraceGongCEO Newsletter: @SmartVenture LinkedIn: @GraceGong TikTok: @GraceGongCEO IG: @GraceGongCEO Twitter: @GraceGongGG ===================== Join the SVP fam with your host Grace Gong. In each episode, we are going to have conversations with some of the top investors, superstar founders, as well as well-known tech executives in silicon valley. We will have a coffee chat with them to learn their ways of thinking and actionable tips on how to build or invest in a successful company.
Today's guest is Robert Wenier, Global Head of Cloud and Infrastructure at AstraZeneca. Robert joins us on the program to explore the complex decisions faced by leaders in legacy industries as they balance infrastructure investments between cloud technologies and end-point storage. How can they align these investments with their business goals while managing the competing forces of performance, risk, and cost? We break down the strategic considerations: ensuring technology delivers the required performance, carefully monitoring risks like security and capacity, and managing costs to create value and maintain margins. This episode is sponsored by MinIO. Learn how brands work with Emerj and other Emerj Media options at emerj.com/ad1.
Today's guest is Shardul Vikram, CTO and Head of Data & AI for SAP Industries and Customer Experience. Shardul joins Emerj Senior Editor Matthew DeMello on today's program to explore the evolving landscape of cloud adoption and storage solutions within the life sciences and financial services industries. A decade has passed since cloud technology burst onto the scene with great promises, yet today, not everything resides “on the cloud.” As the hype around new technologies like AI starts to cool, Shardul offers legacy and regulated industry leaders actionable insights on driving a balanced approach—leveraging both cloud and endpoint storage to achieve their unique goals. Today's episode is part of a special series sponsored by MinIO for a deep dive into the challenges and opportunities at the intersection of infrastructure investment, technology strategy, and competitive advantage in today's evolving landscape. Learn how brands work with Emerj and other Emerj Media options at emerj.com/ad1.
In this episode of Tech.Strong.Women., Jodi Ashley and Tracy Ragan are joined by Garima Kapoor, a remarkable founder with a PhD in finance and economics who transitioned from academia to entrepreneurship. She co-founded MinIO, an open source object storage company, alongside her engineer husband. Garima highlighted the challenges she faced as a woman founder, including gender discrimination in fundraising and business meetings. She underscored the need for greater female representation in leadership and the support of women-led companies. MinIO's success is attributed to its open source model and robust community engagement, with Garima emphasizing the importance of strong licenses and ecosystem contributions. She also discussed building an inclusive company culture at MinIO, prioritizing work-life balance, and making meaningful investments for long-term growth.
Mark Khavkin tells us that from the very beginning of his career journey—a 2008 role as an investment professional with a European private equity firm—he was able to gain experience in board strategy, investor relations, and entrepreneurial exploration. This foundation allowed him to read boardroom dynamics from very early on and prepared him to anticipate a variety of operational perspectives that would set the stage for his path forward. Transitioning to Silicon Valley, Khavkin joined eBay's corporate development team, where he learned to align acquisition opportunities with the strategic goals of business units and technology leaders—experience that deepened his understanding of operational management and strategic planning. A pivotal moment came when a former eBay divisional CFO who had served as a mentor invited Khavkin to join oDesk (later Upwork) as FP&A lead. This role allowed him to influence company culture and drive change from within the finance function. At Upwork, Khavkin tells us he sharpened his ability to integrate investor narratives with internal strategies, from marketing to product development. His ability to present a cohesive story from market opportunities to long-term strategy proved instrumental during the early milestones of Upwork's IPO journey. Throughout his career, Khavkin has come to pursue experiences that would require a unique blend of investment acumen, strategic insight, and leadership impact. His journey highlights the importance of understanding both investor perspectives and operational realities, while crafting a narrative that demonstrates insight into both.
Today on the Tech Bytes podcast we talk with Jonathan Symonds, Chief Marketing Officer at MinIO about MinIO’s object storage offering; a software-defined, Amazon S3-compatible object storage that offers high performance and scale for modern workloads and AI/ML. We discuss how MinIO helps customers across industries drive AI innovation and AI architectures, how object storage...
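"S3-compatible" in practice means that existing S3 tooling talks to MinIO simply by overriding the endpoint URL. The sketch below shows the shape of that configuration as you would hand it to an S3 client library (for example, boto3's `client("s3", **cfg)`); the endpoint and credentials are hypothetical placeholders, not a real deployment.

```python
def s3_client_config(endpoint, access_key, secret_key, secure=True):
    """Build the keyword arguments that point a generic S3 client at a
    self-hosted, S3-compatible endpoint such as a MinIO server."""
    scheme = "https" if secure else "http"
    return {
        # The only MinIO-specific part: aim the client away from AWS.
        "endpoint_url": f"{scheme}://{endpoint}",
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }

# Hypothetical on-prem endpoint on MinIO's default port.
cfg = s3_client_config("minio.internal:9000", "demo-key", "demo-secret", secure=False)
print(cfg["endpoint_url"])
```

Because the rest of the S3 API surface (buckets, objects, multipart uploads) is unchanged, application code written against Amazon S3 typically needs no modification beyond this endpoint swap.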
It's return guest season here at Latent Space! We last talked to Kanjun in October and Jonathan in May (and December post Databricks acquisition): Imbue and Databricks are back for a rare treat: a double-header interview talking about DBRX from Databricks and Imbue 70B, a new internal LLM that “outperforms GPT-4o” zero-shot on a range of reasoning and coding-related benchmarks and datasets, while using 7x less data than Llama 3 70B.

While Imbue, being an agents company rather than a model provider, are not releasing their models today, they are releasing almost everything else:
* Cleaned-up and extended versions of 11 of the most popular NLP reasoning benchmarks
* An entirely new code-focused reasoning benchmark
* A fine-tuned 70B model, built with Meta Llama 3, to identify ambiguity
* A new dataset of 450,000 human judgments about ambiguity
* Infrastructure scripts for bringing a cluster from bare metal to robust, high performance training
* Our cost-aware hyperparameter optimizer, CARBS, which automatically and systematically fine-tunes all hyperparameters to derive optimum performance for models of any size

As well as EXTREMELY detailed posts on the infrastructure needs, hyperparameter search, and clean versions of the sorry state of industry standard benchmarks. This means for the FIRST TIME (perhaps since Meta's OPT-175B in 2022?) you have this level of educational detail into the hardware and ML nitty gritty of training extremely large LLMs, and if you are in fact training LLMs of this scale you now have evals, optimizers, scripts, and human data/benchmarks you can use to move the industry forward together with Imbue.

We are busy running the sold-out AI Engineer World's Fair today, and so are unable to do our usual quality writeup; however, please enjoy our show notes and the excellent conversation!
Thanks also to Kanjun, Ashley, Tom and the rest of team Imbue for setting up this interview behind the scenes.

Video pod

Timestamps
* [00:00:00] Introduction and catch up with guests
* [00:01:55] Databricks' text to image model release
* [00:03:46] Details about the DBRX model
* [00:05:26] Imbue's infrastructure, evaluation, and hyperparameter optimizer releases
* [00:09:18] Challenges of training foundation models and getting infrastructure to work
* [00:12:03] Details of Imbue's cluster setup
* [00:18:53] Process of bringing machines online and common failures
* [00:22:52] Health checks and monitoring for the cluster
* [00:25:06] Typical timelines and team composition for setting up a cluster
* [00:27:24] Monitoring GPU utilization and performance
* [00:29:39] Open source tools and libraries used
* [00:32:33] Reproducibility and portability of cluster setup
* [00:35:57] Infrastructure changes needed for different model architectures
* [00:40:49] Imbue's focus on text-only models for coding and reasoning
* [00:42:26] CARBS hyperparameter tuner and cost-aware optimization
* [00:51:01] Emergence and CARBS
* [00:53:18] Evaluation datasets and reproducing them with high quality
* [00:58:40] Challenges of evaluating on more realistic tasks
* [01:06:01] Abstract reasoning benchmarks like ARC
* [01:10:13] Long context evaluation and needle-in-a-haystack tasks
* [01:13:50] Function calling and tool use evaluation
* [01:19:19] Imbue's future plans for coding and reasoning applications
* [01:20:14] Databricks' future plans for useful applications and upcoming blog posts

Transcript

SWYX [00:00:00]: Welcome to the Latent Space Podcast, another super special edition. Today, we have sort of like a two-header. Jonathan Frankle from Mosaic Databricks, or Databricks Mosaic, and Josh Albrecht from Imbue. Welcome.JOSH [00:00:12]: Hey, glad to be here.SWYX [00:00:14]: Thank you for having us. Hey, so both of you are kind of past guests.
Jonathan, you were actually one of the most popular episodes from last year talking about MPT7B. Remember the days when we trained large models and there was 7B?JONATHAN [00:00:30]: Yeah, back when reproducing LLAMA1-7B was considered a huge accomplishment for the field. Those are the good old days. I miss that.SWYX [00:00:38]: As the things have accelerated a lot. Actually, let's do a quick catch up and Josh, you can chime on in as well. So Databricks got acquired. I talked to you at New York.JONATHAN [00:00:45]: Mosaic got acquired, although sometimes it feels like Mosaic acquired Databricks because, you know, we're having a lot of fun being here. But, you know, yeah.SWYX [00:00:52]: Yeah. I mean, you are chief scientist now of Databricks.JONATHAN [00:00:55]: Chief AI scientist. Careful with the title. As much as I would love to understand how Spark works, I'm going to have to defer that to much smarter people than me.SWYX [00:01:03]: Got it. And I don't know about like what you would highlight so far as a post-acquisition, but the most recent news is that you guys released DBRX. Is that the thing that most people should be aware of?JONATHAN [00:01:13]: Actually, that's no longer the most recent news. Honestly, the most recent news, we announced this, but it was at our Data and AI Summit last week. So it was announced among like 100,000 other things, is that we finally released our text to image model, which has been a year in the making through a collaboration directly with Shutterstock. There was a lot of work put into finding a dataset that we were comfortable with working on and trying to build a model that honestly, I felt like I could trust and that others might be able to trust to put out in the world. So that model was released last week. It's unfortunately just available via API due to the fact that the data is quite sensitive and quite valuable. 
It's Shutterstock's entire business in a lot of ways, but I'm still really excited that there's now a model that is trained on a dataset where the provenance of every single image is known, and it's a damn good model. So I'm really proud of the team on that.SWYX [00:01:55]: Yeah, amazing. Josh, do you have any thoughts on image model questions?JOSH [00:01:59]: That is not my area of expertise, but I was excited to see the release of it last week as well, and very happy that you guys did a nice job on the data side of everything there. So that was cool to see.SWYX [00:02:09]: I think what's unusual is like, I think Shutterstock's doing multiple deals in multiple labs. So what is the Shutterstock model? Like, I guess, is this the house model for Shutterstock? Is this Databricks' version of the Shutterstock model? Like, what is this?JONATHAN [00:02:22]: The way that I would think about it is that Shutterstock is doing an amazing business in AI across the board. Their dataset is kind of widely known to be the best stock photos dataset in the world, the most comprehensive, the biggest. When you think about like, what dataset am I going to train a multimodal model on? You call Shutterstock. And I, at least I've heard in the news, like OpenAI, Google, Meta, Apple have all called Shutterstock and made those deals. So a lot of models have had Shutterstock data incorporated into them. But this is the only model I know of so far where it was, you know, exclusively and specifically trained just on the vanilla Shutterstock data. There was nothing else mixed in. We didn't go and scrape the web and find other data or combined datasets or anything like that. And so this is, in some sense, the house blend. But the other piece is that it's just a dataset where the provenance of every image is known in public. Where did the data come from? It is the Shutterstock collection. That's it. You know, nothing less, nothing more. 
And certainly being at Databricks, if I've learned one thing, I've learned about enterprise customers and what they want out of AI. And one of the things they ask for most is just, what can you tell me about the data the model was trained on? And here, especially for text to image models, where images are just tricky subject matter, there's been a lot of kind of legal conversation about images, especially. It's nice to just have something where I can point to it and say, you know, if you want to know where the images came from, these are what they are and this is how they got there.SWYX [00:03:36]: I will talk a little bit about Databricks because it's relevant to the rest of today's episode. So Databricks, sorry, I keep misspeaking. It's DBRX.JONATHAN [00:03:46]: DBRX, actually, there's been a pronunciation update. It is now D-B-Rex. So we have decided to add a dinosaur mascot because what model doesn't like a mascot? So literally, I wish I could pull it up. There is a little plush dinosaur that we had made. It's like the world's cutest dinosaur, but it is the official mascot of D-B-Rex. And there's a little dinosaur logo that, you know, you'll probably see around a little bit more because DBRX is a mouthful, but D-B-Rex, like, you know, it's just kind of...SWYX [00:04:13]: Rolls off the tongue. I love mascots. Like every company should have a mascot. And I think Hugging Face got it right. You need an emoji mascot because that's the minimal viable image.JONATHAN [00:04:21]: I probably shouldn't talk at all about, you know, Velociraptor, but, you know, that's a, maybe that's something we can talk about later in the summer. I'll just leave it at that.SWYX [00:04:28]: Okay. That's a hint to names. I feel like your names leak a lot of alpha. 
So just to quickly cover the headline details: DBRX is a Mixture of Experts model, that's fairly big, 132 billion total parameters, so 36 billion active on any input, pre-trained on 12 trillion tokens of text and code, and did really well on evals to the point where you had to dye your hair blue. That's my high level conclusion.JONATHAN [00:04:53]: Never make a bet with your team two weeks out from model launch, even when, you know, human eval is looking quite bad. Because if you set some bar, even if it's arbitrary and you think there's no way in hell they're going to hit it, apparently money doesn't motivate people anymore. Humiliating their boss motivates people. So Josh, you should really take a hint from this. You know, you cannot pay someone enough money to make up for you dyeing your hair blue.JOSH [00:05:15]: I'll keep that in mind for our next model.SWYX [00:05:17]: It works. So speaking of Imbue's next model, perhaps Josh, you want to actually just say hi to the general sort of latent space audience and talk about what we're releasing today. Yeah.JOSH [00:05:26]: I'm Josh, CTO of Imbue, and we're not releasing the model. We're not releasing the weights, but we are releasing a bunch of different things that should make it easier for other people to make their own models. So I think right now, training foundation models from scratch is like a very difficult, time-consuming, expensive, kind of risky endeavor, especially for smaller companies. And the things that we're releasing hopefully make that at least a little bit easier. So the things that we're releasing fall into kind of three different buckets. One is infrastructure and scripts for dealing with the kind of hardware and hardware failures and understanding how well is the actually lowest level of thing actually working so that you can actually do your training at all and at a reasonable speed without having to constantly restart, etc. So infrastructure and training scripts.
A second set of things is around the evaluation. So after you've trained it, like how well is this actually working and how do you know how well it's working? We're releasing a whole bunch of different data there, a new benchmark about code, reasoning, understanding, as well as our own private versions of 11 different open source benchmarks. So things like BoolQ or ANLI, where we've gone through and kind of cleaned up the data as much as possible by looking at all the ones that models get wrong or that are flagged for ambiguity and also our own kind of private reproductions of those where we've done like a kind of clean room black box, like, okay, this is what the data set is supposed to be. Here are some examples. Let's make our own version of this to make sure that there is no data contamination, etc. To make sure that we're actually, you know, not testing on train. And then I think a final thing that we're releasing there is around 450,000 human judgments about ambiguity and question quality, which we used in the process of cleaning these evaluations and we also hope will be helpful for other people training kind of similar models. And then the third thing is CARBS, our hyperparameter, our cost-aware hyperparameter optimizer, which was especially helpful for being able to experiment at much smaller scales and then scale those experiments up to the much larger scale kind of on the first try without having to retry it. You don't want to be training, you know, 10, 20 different 70B models. You really want to get these larger modelsSWYX [00:07:30]: right on the first try.JOSH [00:07:30]: And so the ability to kind of tune things very precisely and learn scaling laws, not just for, you know, the like data and flops, but also for learning rate and all the other hyperparameters and see like how should you scale these things up was extremely valuable to us as we were training the larger models. Yeah, that's a lot of stuff.SWYX [00:07:49]: Yeah, exactly.
So there's a bunch of stuffJOSH [00:07:50]: we'll have to go through all of it.JONATHAN [00:07:52]: Yeah, I just want to throw in how excited I am about this. This is the stuff that nobody ever talks about. That is the difference between success and failure in this stuff. Like, can you get your cluster to run? Can you get software on your cluster? Can you figure out what broke? Because fault tolerance is still not really built into any of the fundamental primitives of training models. And so if something breaks, you have to go figure out what broke, your job stops, you have to restart your job. It is a nightmare just to get to the point where anything can train on the cluster. A basic MPI hello world that has the GPUs talk to each other is hard enough, let alone actually training a model, let alone getting good performance out of the GPUs, let alone actually getting a model that converges to anything interesting. There's so many levels of things you have to accomplish. This is the kind of stuff that matters. I think to a point that Josh made earlier, before we got on here, there are plenty of weights out there. Nobody's released this.JOSH [00:08:46]: Yeah, that was part of the motivation actually is that there are lots of other things that are complimentary, but I have not seen nearly as much discussion about some of these other things that we think are pretty important. I mean, in some sense,SWYX [00:08:56]: I'm very excited to have Jonathan on because this is a little bit, you're a bread and butter with Mosaic. And I think you've released some part with Composer. And I think it's just really interesting to see like a different take, basically a full stack take that's kind of open source today.JONATHAN [00:09:18]: Yeah, it's really kind of, it's been an ordeal to figure this out. And every time something changes, whether it's a new GPU or even a new driver update, you get new creative errors and new things go wrong. 
And, you know, we've dealt with the weirdest things from, you know, our InfiniBand cables getting stolen from the data center twice, like in boxes before they arrived at the data center. Like, you know, Porch Pirate basically had stolen our InfiniBand cables back when those were hard to come by. To like, you know, weird recalls of switches to like the strangest stuff has happened. I have my favorite GPU failures I've seen, like ones where the GPU doesn't fail, it has a correctable memory issue and the memory correction causes the GPU to become a straggler and hold up the whole job. Like weird stuff happens and figuring out how to not just identify all of that, but then eventually productize it, is in some sense, the entire story of Mosaic and now Databricks in terms of our ML offering. Really, the thing we offer is we have gone through this suffering and figured out how to even productize that. It has been a pain in the butt.SWYX [00:10:20]: Yeah, it's a lot of work.JOSH [00:10:20]: I think my favorite failure was GPU is just giving wrong math. Like if they give errors, great, because you can see the errors, but if they just give you the wrong math back, not so fun.SWYX [00:10:30]: When did they give you wrong math?JOSH [00:10:32]: Like literally you could just, you know, add two things. For example, the numbers come back. They're not the numbers that they're supposed to be.JONATHAN [00:10:40]: I think it's important to say at this stage, just because like it, I think it goes without saying for Josh and I, but it's worth saying here, this isn't to say that like anything is wrong with us. It's not like NVIDIA did a bad job or, you know, Mellanox did a bad job or the like the server builder, the data center operator, the cloud provider, like the million other parties that are involved in building this. 
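Josh's favorite failure, GPUs silently returning wrong math, is only detectable by comparing against a known-good answer, since nothing raises an error. A toy golden-value check (pure-Python stand-in; on a real node the workload would be a large deterministic matmul on the GPU, and the golden digest would be recorded from a trusted machine):

```python
import hashlib

def golden_check(compute, golden_digest):
    """Run a fixed deterministic computation and compare against a known-good digest.

    A silently corrupting accelerator returns plausible-looking but wrong
    numbers, so the only defense is comparison with a golden result.
    """
    result = compute()
    digest = hashlib.sha256(repr(result).encode()).hexdigest()
    return digest == golden_digest

# Stand-in workload: sum of squares 1..10000, which equals 333383335000
def workload():
    acc = 0
    for i in range(1, 10_001):
        acc += i * i
    return acc

GOLDEN = hashlib.sha256(repr(333383335000).encode()).hexdigest()
ok = golden_check(workload, GOLDEN)  # a False here would mean the hardware lied
```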
We are running these insane chips that are huge and complicated and built on tiny transistors at insane frequencies with insane heat in data centers that for the most part, were not built remotely for this kind of power or heat and have been retrofitted for this. Like failures happen on a good day with normal CPUs. And this is not a good day and not a normal CPU for the most part. It's fun to joke about all the weird things we see. This is not to say anybody's done anything wrong. This is just kind of part and parcel of working on a massive cluster running at multiple megawatts of power at a time.SWYX [00:11:32]: It's crazy. Yeah.JONATHAN [00:11:33]: So optical cables, like all sorts, like everything.SWYX [00:11:37]: I'll take the opportunity to start going to the sort of infra piece. There's just like a description of the infra just to give people a sense of what we talk about when we talk about massive clusters. So I'm just going to read off the blog post here. This post is about one cluster that has 4,092 H100 GPUs spread across 511 computers. They use unified fabric manager nodes, which manage the InfiniBand network. And you talk a little bit about your networking. Is there anything unusual about this setup that you'll call out to people?JOSH [00:12:03]: Yeah, actually this particular cluster is a little bit non-standard. The normal, like vanilla setup for these large clusters as vanilla as it can be is what's normally like a 127 node cluster. So closer to like 1024 GPUs instead of 4,000. Here we have a larger cluster. As you start to get into the larger clusters, the networking becomes a little bit more custom. It's a little bit more, it's a little bit trickier. It's a little bit more difficult to get these things to all be able to talk to each other at the same speed. And so this has, in this particular case, this is a three tier network architecture instead of two tiers, kind of the normal one. So most of the clusters are a little bit smaller. 
As you get to even larger scales, then this becomes even much more complicated,SWYX [00:12:43]: much more expensive.JOSH [00:12:43]: So we chose this particular scale, kind of knowing our own workloads and kind of what we wanted to do. This was kind of the right size for us. But yeah, I think it's not exactly vanilla already. It's already getting into kind of the custom territory.SWYX [00:12:54]: So my understanding is that there, and is there any part of this that comes with the Voltage Park deal that you guys had? Is that part of the hardware that you got from the deal with them?JOSH [00:13:04]: Yeah, so we worked really closely with Voltage Park to set up all their clusters and infrastructure and everything and kind of decide even like what to order, how should the networking work? Like we were very involved in kind of the construction and bring up of this. And that's what this post is about, is about that process of like bringing up all these, there's like different clusters in different places of different scales. So in this particular post, we're talking about this one 4096 GPU, but there are other clusters that they have as well. And we were very closely involved with figuring out the exact architecture and kind of the trade-offs that go along with picking, you know, those exact components. You really don't want to like place the wrong order because it takes months to get it and it's very expensive. So yeah, we were happy to help out with that.JONATHAN [00:13:43]: And then your InfiniBand cables get stolen.SWYX [00:13:44]: Yeah, yeah, exactly.JOSH [00:13:47]: We wanted to make sure that we ended up with compute that would work for us and that would also work for their other customers. And so we kind of helped design something so that we would get exactly what we were looking for. 
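On why roughly 4,000 GPUs pushes you from two network tiers to three: a non-blocking fat-tree built from radix-k switches tops out at k²/2 end hosts with two tiers and k³/4 with three. Assuming 64-port switches (an assumption for illustration; the conversation doesn't name the switch radix), the arithmetic works out like this:

```python
def max_hosts(radix, tiers):
    """Maximum end hosts in a non-blocking fat-tree of switches with `radix` ports."""
    if tiers == 2:
        return radix * radix // 2   # each leaf gives half its ports to hosts
    if tiers == 3:
        return radix ** 3 // 4      # classic k-ary fat-tree host count
    raise ValueError("sketch only covers 2 and 3 tiers")

RADIX = 64                      # assumed switch port count
two_tier = max_hosts(RADIX, 2)  # 2048 ports -- not enough for ~4,000 GPUs
three_tier = max_hosts(RADIX, 3)  # 65536 ports -- plenty of headroom
```

Under this assumption a two-tier fabric caps out below the 4,092 GPUs in the post, which is exactly why the cluster needed the third tier.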
We knew that these kinds of details would be super important and that getting down to the level of the hardware and like having these good scripts and everything was going to be a core part of like actually getting this to work. I'm very glad that we did that. I don't think that most companies kind of take that full stack approach, but for us, it certainly paid off.SWYX [00:14:12]: Yeah, it's basically sort of built to spec. It's interesting that relationship because you usually, for the rest of us who don't operate at your scale, we take whatever we can get from cloud providers, but you are basically co-designing from the single machine up. And you described that a little bit. Do you want to take us through the process that you described here?JOSH [00:14:27]: Yeah, so for the actual, like the blog post and kind of bringing these machines online.SWYX [00:14:32]: Yeah.JOSH [00:14:32]: So yeah, I think the process, as we have it broken down in the blog post, there's kind of a few different layers. First is like getting the individual machines to work at all and then getting the machines to actually be able to talk to each other. So getting the InfiniBand networking to work and then getting to a point where, you know, not just the machines are working and they can talk to each other, but everything is actually working correctly. There's a big gap between like it's working at all to it's working perfectly correctly. And then after you have all this stuff working perfectly correctly, nice and healthy, then now you get into kind of the software data, like training issues. And then after that, you're still not done. Like now, even once you're training at full speed, things are going to fail over time. Things are going to change. There's going to be new, you know, firmware updates. 
Like how do you kind of deal with this change and flux over time without going crazySWYX [00:15:16]: and pulling your hair out,JOSH [00:15:16]: trying to like reproduce things or understand why there were regressions. And so there's a lot of work to kind of automate the infrastructure tooling as well. And kind of the first step, like bringing these things online in the first place, you know, you have hundreds of machines at this point. So you don't necessarily want to be like walking around with like a CD-ROM or a USB drive, like plugging it in with your keyboard, like hitting next, next, next on the OS install. That's not how this works. You do that for one machine. And then you use, we use this thing called Metal as a Service to bring up all the other machines. So it's a kind of server that can kind of install the operating system on these other machines. So most like when you're talking about these machines, like each machine is, you know, on the order of hundreds of thousands of dollars. So they usually come with a kind of out-of-band management interface as well. So they don't, they have their InfiniBand networking. They have their normal 100 gigabit per second Ethernet networking. These are like dual, redundant, et cetera. And then you also have this extra out-of-band management network. So you can log in and you can see like the boot screen or you can see the blue screen of death. You can like get in there and actually see what was wrong, which is pretty fun. And it makes it like possible to automate a lot of this work. So the beginning of that, and the blog post goes into much more detail about like exactly how we set these up and kind of the other errors that we ran into. When you're bringing these online, you'll definitely have failures. Even if they all worked in the factory, they get shipped, some parts come loose, something fails, something goes wrong. So when you're bringing them online, there'll be some that don't quite work for all sorts of reasons. 
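With hundreds of machines, even the "did it come up at all?" question has to be automated. A hypothetical fan-out poller along those lines; in practice `check` would SSH in or hit the out-of-band management interface, and here it is just a stub:

```python
from concurrent.futures import ThreadPoolExecutor

def poll_hosts(hosts, check, max_workers=64):
    """Fan a health check out across many machines and collect the failures.

    `check(host)` returns True if the node looks healthy.  ThreadPoolExecutor
    keeps the wall-clock time close to the slowest probe, not the sum.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(zip(hosts, pool.map(check, hosts)))
    return [h for h, healthy in results.items() if not healthy]

# Stub check standing in for a real SSH/IPMI probe
hosts = [f"node{i:03d}" for i in range(500)]
bad = poll_hosts(hosts, lambda h: h != "node013")
```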
As you start to be working with machines at this scale, like if something happens one in a thousand times, you're like pretty likely to see it. And so you can get pretty rare, weird things, especially since we had fairly early builds and fairly early versions of this hardware. Like these are some of the like first machines that were ever produced, some of the first GPUs. So you've got some extra special things there. We definitely worked with Dell, for example, on making fixes in the firmware level to be like, okay, like this thing is wrong. Like we need to update this at the firmware to like actually fix this particular thing. So we worked pretty closely with Dell and Nvidia. Yeah, that's what I'm saying. Like this stuff gets complicated. And the thing is like, you know, taking a step back, the whole reason we're doing this, right, is that we knew that this was going to be complicated. There would be these kinds of failures. And if we're just using, you know, AWS or some other cloud provider, these errors are still gonna be there and you're gonna have no way to know and no way to debug this and no way to diagnose what's going wrong. And so we would much rather be able to like call up Dell and say, hey, this isn't working. And they're like, yep, okay, cool. Let's debug it together. Oh, I see. Yeah, cool. We'll ship a firmware update and actually fix this for you. That was a much better experience than like, great, just magically fails. I guess we restart and hope that that machine goes away. Like that's not a very good place to be. So yeah, that's kind of the first place is getting to a place where like GPU training is working on your single node machines. You can observe stuff. 
We have tons of tooling around like, you know, Prometheus and all sorts of other tools for understanding what's going on in these machines because you don't want to be like logging into each one and looking at the temperature or something; you really need to have tooling to collect all these metrics, et cetera. Unfortunately, all of the scripts that we have for this are like for this entire cluster and for all this infrastructure are a little bit like special purpose for our particular thing. So it's not that every script that we have, it's not that you can just like take this and plug this in. Even if we did open source all the tooling that we have, you'd still have to do like a lot of work to adapt it. What we are releasing is as many of the things that we can that are going to be useful for other people. You're still going to have to have some way of kind of managing these things, making your own like logging aggregators, et cetera, et cetera. So that's kind of bringing them up to the like, you know, the single nodes that are working. From there, it goes into, I'm happy to keep going if you want. Well, I just want to leave the opportunity for JohnSWYX [00:18:53]: to comment if there's anything that's different from how he runs things.JONATHAN [00:18:57]: Oh, I mean, all I'll say is I'll endorse this and say this s**t is hard. Like this is really, really hard. And, you know, special props to, you know, the folks at Imbue because they were building this from the ground up. You know, at Databricks and at Mosaic, we typically work with cloud providers because some of this stuff is just, there's too much to handle. It's complicated. There's a lot to deal with. And this doesn't even get into things like physical security, you know, securing power if you're the data center operator. Like this gets infinitely complicated and you have to abstract somewhere. 
Like, you know, and then you get to the folks who are literally building their own custom chips and like, good God.SWYX [00:19:36]: Like, oh my God, that's, you know,JONATHAN [00:19:38]: if you're one of those folks, you're having, you know, pour one out for the infra people at some of the AI chip startups who are having a really, really interesting time right now. But this stuff is really hard. And I don't think we talk about it much because there's so many other things that are hard. But the other hard things, I think everybody's becoming pretty familiar with at this point. This is something that I don't think there's ever really been a comprehensive discussion of, at least not that I've seen.SWYX [00:20:00]: Yeah, so my impression is that you guys, Mosaic, have your own software for sort of spinning up and down machines, just like Imbue had to build. But Imbue probably, it sounds like Imbue, you guys went fuller stack. I don't know how to describe it. Like Mosaic is not working with Dell on like their firmware.JONATHAN [00:20:21]: No, no, we're typically working with like, you know, pick your cloud provider on their Dell firmware or what have you. Like, it's kind of, I think one of the things, I don't know, Josh, you can correct me on this. It's kind of impossible if you're doing training to not go all the way through the entire stack, regardless of what happens. Like somehow I'm still chatting with cloud providers about power contracts, even though the whole point of dealing with the cloud provider is not to have to think about power contracts. Somehow I'm still asking them about which InfiniBand provider they used this time to see if this is part of the bad batch of cables I encountered on that cloud provider or what have you. Or like, we're still talking about a firmware update from pick your provider. You can't not do this. 
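On the metrics side Josh mentioned, Prometheus scrapes a plain-text format that is simple enough to emit yourself, which is part of why it shows up in stacks like these. A stdlib-only sketch of rendering it (illustrative only; a real deployment runs a node exporter and DCGM-style GPU exporters, and the metric names here are made up):

```python
def render_metrics(samples):
    """Render samples into the Prometheus text exposition format.

    `samples` is an iterable of (metric_name, labels_dict, value) tuples;
    output lines look like: name{label="value",...} <number>
    """
    lines = []
    for name, labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

body = render_metrics([
    ("gpu_temperature_celsius", {"gpu": "0", "host": "node001"}, 64),
    ("gpu_temperature_celsius", {"gpu": "1", "host": "node001"}, 71),
])
# Serve `body` at /metrics over HTTP and Prometheus can scrape it.
```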
It's convenient that they have data center staff who are worrying about what to send back to which provider when, and they have people who can go and wait for the InfiniBand cables so they don't get stolen outside. But, you know, it's kind of, it's impossible not to really go full stack if you're thinking about the infrastructure at all. I don't know, Josh, correct me. No, I think that's right.JOSH [00:21:17]: That's what we expected from the beginning as well, is that we would inevitably have to get into the details here. And I'm glad that we kind of just planned for it. I think it made it a lot easier from our perspective to have direct control over this. Instead of having to go to the cloud provider that goes to the data center, that goes to the supplier, we could just go direct to NVIDIA or DellSWYX [00:21:37]: or the data center,JOSH [00:21:37]: whoever was responsible and be like, hey, this thing needs to change. And they're like, oh, okay. Yeah, that is our responsibility. Great, we can fix that. So it was just a lot easier for us to fix these bugs than if we had to go through an extra layer of email.SWYX [00:21:48]: Something we discussed in the pre-show was that you had a rule of thumb for your cluster of reliability. You say here in the post, by and large, you expect around 3% of your machines to break every week. So you're basically going to turn through all your machines in a year.JOSH [00:22:04]: As it says in the post. So that would be true if it was a uniform failure like that. But as it says in the post, it's usually these kind of problematic nodes. And to be clear, that is the number that we've heard from other people is like they're having about 3%. I don't think we're experiencing failure rates that are that high. 
I think ours is actually quite a bit lower than that, probably because we've taken the time to like dig into a large, maybe larger number than we should have of these failures and get to the root cause of it and be like, oh, okay, like that's exactly what's going wrong.SWYX [00:22:33]: How do we fix this?JOSH [00:22:33]: How do we prevent this from happening? How do we make automated checks for this so that if it does happen, it just goes back to whoever owns that particular part of the process and they can fix it immediately.SWYX [00:22:43]: And that's part of what you're also open sourcing, which is the health checks, right? You got the NIC health checks, GPU health check, disk space health check, Docker, dmesg. I don't know what that is.JOSH [00:22:52]: That one is just a lot of stuff.SWYX [00:22:54]: Yeah.JOSH [00:22:55]: That one is one where we realized that actually like when these machines boot, sometimes they wouldn't actually boot cleanly all the way. Or when they rebooted, they had problems that they didn't have when they were working before, which was kind of frustrating. Like usually if you restart your computer,SWYX [00:23:08]: it gets better.JOSH [00:23:08]: Here you restart. It did not get better.SWYX [00:23:10]: It got worse.JOSH [00:23:10]: That was very frustrating. So this health check looks at every particular line we've ever seen from the boot, like in dmesg, like every single log line that your computer emitsSWYX [00:23:21]: and says like,JOSH [00:23:21]: have we ever seen this before?SWYX [00:23:23]: Is this expected?JOSH [00:23:23]: Is this in the right order? Or is there something out of place? If there's anything out of place, let me say, okay, great. Like now it goes into this, like longer, more triage list of like, all right, great. Like, is this acceptable?SWYX [00:23:33]: Should we flag this?JOSH [00:23:33]: Like, should someone take a look at this? 
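The dmesg health check Josh describes, comparing every boot log line against lines seen on healthy boots and surfacing anything novel, can be sketched like this. The normalization rules and the sample log lines are my own assumptions for illustration, not Imbue's actual patterns:

```python
import re

# Things that legitimately vary run-to-run get normalized away before comparison
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
HEXADDR = re.compile(r"0x[0-9a-f]+")

def normalize(line):
    """Strip boot timestamps and addresses so logically identical lines compare equal."""
    return HEXADDR.sub("0xX", TIMESTAMP.sub("", line.strip()))

def novel_lines(dmesg_output, known_good):
    """Return lines never seen on a healthy boot, for a human to triage."""
    known = {normalize(l) for l in known_good}
    return [l for l in dmesg_output if normalize(l) not in known]

healthy = [
    "[    0.000000] Linux version 5.15.0",
    "[    1.234567] pci 0000:00:01.0: enabled at 0x1000",
]
current = [
    "[    0.000000] Linux version 5.15.0",
    "[    1.240000] pci 0000:00:01.0: enabled at 0x2000",
    "[    2.000000] NVRM: Xid (PCI:0000:3b:00): 79, GPU has fallen off the bus.",
]
suspicious = novel_lines(current, healthy)  # only the Xid line survives normalization
```

The real check also cares about ordering and maintains a triage list of "seen but acceptable" lines; this sketch only covers the novelty part.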
So we're looking down at a very, very granular detail level, what's happening on these computers to make sure that nothing is out of place. And that's critical because without that, if you're running your training, as Jonathan said, and this thing is slow, like what are you supposed to do? Right?SWYX [00:23:49]: Like you really,JOSH [00:23:49]: you really want to be very certain that like all 4,000 of these GPUs are working like they're supposed to.SWYX [00:23:54]: We know that.JOSH [00:23:54]: And so if it's slow, it's because like we messed up the config or something else and not because of this earlier thing that's like really hard to detect in software later.JONATHAN [00:24:01]: Yeah. I think the, I'm just curious to ask,SWYX [00:24:03]: like, you know,JONATHAN [00:24:03]: suppose you were to set up another, let's say another H100 cluster and it were at a different data center. And instead of the vendor being Dell, it was super micro or what have you. How much of this would be repeatable? And how much of this would you have to redo? I, you know, I genuinely don't know.SWYX [00:24:18]: A decent amount.JOSH [00:24:19]: I think it would go a lot faster the second time. I think there's lots of learnings that we had. And also the blog post,SWYX [00:24:24]: you know, yes,JOSH [00:24:24]: we are releasing the health checks, releasing some scripts, but a lot of the valuable stuff is also in the blog post itself, in the details and kind of the, you know, the learnings that we've had and the sort of errors that we run into. We tried to as much as possible surface those to other peopleSWYX [00:24:36]: could learn from thoseJOSH [00:24:36]: and avoid the same mistakes or failures as well. But I think it would go a lot faster.SWYX [00:24:41]: Although, yes,JOSH [00:24:41]: there would certainly be some things that'd be a little bit different. 
I mean, there'd probably be different CPUsSWYX [00:24:46]: or whatever,JOSH [00:24:46]: but I think a lot of that stuff is less,SWYX [00:24:49]: it's less,JOSH [00:24:49]: that's the like, that's less variable. I think most of it would apply the second time around. Although I'm sure next timeSWYX [00:24:56]: we're building one,JOSH [00:24:56]: it'll probably be, you know, at a scale that's 10x as big with a different chip or something like this.SWYX [00:25:00]: And then who knows?JOSH [00:25:01]: Yeah, with ConnectX-8,JONATHAN [00:25:02]: that will have its own fun behavior and all that good stuff. Yeah.SWYX [00:25:06]: Perhaps there's something that people don't discuss about, and you don't even talk about this in the blog, but I always wonder is what is the timeline that's like kind of reasonable for this amount of work, at least the initial stages? And also what does the team composition look like for setting up a cluster, right? Like what are the mix of skills that you typically would require to get all this going?JOSH [00:25:27]: I'm, I can't really speak to typical. One thing I am very proud of is how much we accomplished with such a ridiculously small team. Like our infrastructure team is like, you know, fluctuates from week to week, depending on like how many things are on fire and how much we need to build. But it's like between like three and six people, like it's small. It's not like some huge team of like tons and tons of engineers. But those people are very, very good at what they do. And so that has allowed us to get a lot of mileage out of out of these things. I think it's not that we're building everything, right? It's not that three to six people build this whole thing. I definitely want to like, you know, say thanks very much to Dell and H5 and NVIDIA and the other people that have done a lot of the work, like to bring up this cluster, you know, with 4000 GPUs and three-tier networking architecture, you have 12,000 cables. 
So that's 24,000 things that need to be plugged in. Like that's just a lot of stuff to plug in, right? And you don't want to mess it up. Like each one needs to be done correctly. Like if it's a little bit loose, it doesn't really work.SWYX [00:26:23]: If you break it,JOSH [00:26:23]: you need to replace it. Like there's a lot of workSWYX [00:26:26]: that goes into this.JOSH [00:26:27]: Yeah.SWYX [00:26:28]: And then, you know,JOSH [00:26:28]: that's just like that's it. That's if you were to do everything right the first time.SWYX [00:26:32]: And if you didn'tJOSH [00:26:32]: have to fix anything. But inevitably, you know, you will have to replace something, which means like taking all the wires out, pulling the thing out, taking all the GPUs out, going and fixing some cable, putting it all back correctly, putting it back in, doing this every time. So there were a lot of people at Dell, NVIDIA and at H5 that all helped a ton with this stuff. I don't know the exact size of the Dell team. It also fluctuated over time.SWYX [00:26:55]: Yeah, excellent. And then, you know, you so you have all the hardware set up and now you're firing it up for a single node. There's a long description that you guys have about just like monitoring the MFU, right? And what each situation might be indicative of. One of the most interesting things to me that I saw from here is like, you know, if training immediately starts off at 60 to 80% MFU, something's wrong.SWYX [00:27:24]: But like, you know, like what what are like, you know, some anecdotes or, you know, notable scenarios here that you might you might call out as maybe counterintuitive or super interesting.JOSH [00:27:36]: There's just so many of them. I mean, one of them, which I think is probably pretty common, like common knowledge by this point. But like we did have a sort of likeSWYX [00:27:46]: which one was this exactly?JOSH [00:27:47]: I think for the MFU, like gradually getting worse over time. 
I think that one, when we saw that the first time we were like, what the heck is going on? Like, why does it get just like a little bit worse? This is so strange. Like, what is it getting lazy or tired or something? Like, is it heat? Like what's going on? And in this particular case, it was memory fragmentation. Because you have hundreds of machines, they're doing garbage collection slightly different times. And then they get slightly further apart and slightly more and more jittered until eventually they're all happening kind of at random times. And just like really messing up each one of your steps. So you just turn off garbage collection and call it a day, basically,SWYX [00:28:20]: to be honest.JOSH [00:28:20]: There's other things you can do if you want to be a little bit more sophisticated about it. But you can also just manuallyJONATHAN [00:28:25]: have it all garbage collect on some interval. Like that's what we've done. We just have a garbage collection callback that just runs. But I've seen the exact same thing.JOSH [00:28:33]: Yeah, yeah, exactly. So I thought that one was kind of funny. And we did trace that one down and look and we did find the actual call. Like, again, this goes to like having good tools. So we had really good tools where we could look at a bunch of like actual traces in C and be like, OK, cool. This is the thing that's taking a lot of time. Or like, you know, this is the thing that doesn't quite line up here. Like, oh, I guess it's garbage collection. OK, cool.SWYX [00:28:52]: Interesting.JOSH [00:28:52]: Yeah, let's just try taking it off.SWYX [00:28:54]: OK, great.JOSH [00:28:54]: That's what it was. Now we can fix it. So for each of them, like basically bugs are not hard if you have good tools. But if you don't have good tools, bugs can be very, very hard. So similarly for like heat, another thing that we saw was like, oh, you know, the CPU is getting throttled. 
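The garbage-collection fix both of them describe, disabling automatic collection and running it on a fixed step interval so that every rank pauses at the same time instead of jittering apart, is only a few lines in Python (a minimal sketch, not their actual callback):

```python
import gc

def gc_every_n_steps(step, interval=100):
    """Collect at a fixed step interval so all ranks pause together.

    With automatic GC left on, hundreds of workers collect at slightly
    different times, and the jitter compounds into stragglers on every step.
    """
    if step % interval == 0:
        gc.collect()

gc.disable()              # stop per-process automatic collection
for step in range(1, 1001):
    # train_step() would go here on a real run
    gc_every_n_steps(step)
gc.enable()
```

The synchronized pause costs the same on every rank, so it no longer turns one slow worker into a straggler for the whole cluster.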
OK, well, it's easy to see if you're monitoring the CPU throttling or monitoring the heat. If you're not monitoring that, it's really hard to know why it's just suddenly one of them is going slower. I noticed also in the pieceSWYX [00:29:17]: that you mentioned FSDP with ZeRO-3. Actually, we met, I went to ICLR and Guanhua from the DeepSpeed team was there presenting ZeRO++. I was wondering if you want to make any call outs to, you know, particular open source or open library or open whatever implementation teams that were super helpful in your process. I think we ended up actuallyJOSH [00:29:39]: pulling from a whole bunch of different ones to pull things in into our own particular pipeline. So we use things from NVIDIA's, you know, Megatron stuff. We use stuff from probably DeepSpeed. I think we pulled in a bunch of different pieces from a bunch of different places. So it was really nice to see all these working open source like examples. I think I really appreciate all the effort that has gone into actually tuning these things because you can tune them, but it's a lot of work to like tune this stuff and do all this stuff from scratch. It's really nice to have like a working example. I think those are probably the two biggest ones, DeepSpeed and Megatron alone, but there are probably other ones as well.SWYX [00:30:13]: Is there a particular thing in the ecosystem where you would call out as like, you know, there should be something here that is open source, but like it's not really, it's like everyone kind of builds it on their own. I want to say something with the file system because everyone talks about the file system eventually.JOSH [00:30:28]: The file system actually was,SWYX [00:30:30]: I mean, we did somethingJOSH [00:30:31]: kind of dumb there. 
Like we have our own sort of local mirror so that we can, you know, like a crappy version of S3SWYX [00:30:38]: that's local,JOSH [00:30:38]: but it's just a pretty simple script, right?SWYX [00:30:41]: Like I think we run likeJOSH [00:30:41]: a little web server that just like serves files and then, you know, it can upload themSWYX [00:30:45]: and download them.JOSH [00:30:45]: Okay, great. And part of the reason we did that is that our internet connectionSWYX [00:30:50]: in the beginningJOSH [00:30:50]: was not the like full speedSWYX [00:30:52]: one that we wouldJOSH [00:30:52]: eventually have. And so we are a little bit more kind of bottlenecked in terms of internet bandwidth. And so we had this. I think we looked at a bunch of services out there like Minio and some other ones, but a lot of these like come with a lot of extra overhead and maintenance. And since we already have so much infrastructureSWYX [00:31:09]: to deal with,JOSH [00:31:09]: we kind of didn't want to, you know, bring in a whole other like cloud provider, virtualize something, something.SWYX [00:31:14]: We just wanted something simple.JOSH [00:31:14]: So we went with that, which has been quite helpful. Like our toolsSWYX [00:31:19]: are usually quite simple.JOSH [00:31:19]: It's like Bash and Python and SSH and Docker. Like we'd like to keep things simple so that's easier to debug, like less layers of infrastructure, less layers of abstraction, make it a lot easier to work with. Like we don't use Kubernetes,SWYX [00:31:30]: for example,JOSH [00:31:30]: and we just directly launch these things. And it's just been much easier to debug this way. One tool actually that does come into mind that I will call out is Kraken from Uber. That was great. We love that tool. We were a little bit skeptical. What is it?SWYX [00:31:44]: I'm sorry. 
Yeah.JOSH [00:31:45]: So Kraken is this, yeah, it's a distributed like Docker registry, basically, that uses BitTorrent to like transfer things between the machines in a sort of nice optimal way. Like in the very beginning, the naive way is like you have this one Docker registry, which was outside of the cluster. So every time we change an image, you know, there's many gigabytes that each of the 500 machines needs to download.SWYX [00:32:07]: So that just takesJOSH [00:32:07]: a really long time. So what this thing does is like just one of them downloads it and then like they all sort of broadcast all the pieces to each other. And it was just like a really nice, fast way of getting these images down. And it was very robust.SWYX [00:32:19]: Like there's a lotJOSH [00:32:19]: going on under the hood, but I think it's a pretty cool tool that we haven't really had any bugs with it at all. Amazing.SWYX [00:32:26]: Yeah. I mean, that's all my questions, I guess, for the info piece. I don't know if, John, you had something that you were sort of burning to ask or.JONATHAN [00:32:33]: No, all I can say is just sameSWYX [00:32:36]: in a lot of places, like, you know, and they're done thatJONATHAN [00:32:38]: seeing this plus one. I think the one big difference, you know, perhaps in philosophies is we've tried to basically standardize on as much commodity stuff as possible, just because, you know, I think the reason I asked about trying to do thisSWYX [00:32:50]: on multiple differentJONATHAN [00:32:50]: pieces of infrastructure is like, I think we're running on like six or seven different clouds right now. And everybody has done something slightly different. And my gosh, the little differences add up as you know, you've seen. And so, you know,SWYX [00:33:04]: our philosophy has been like, whatever the hellJONATHAN [00:33:05]: we can standardize, please let's standardize it. 
Like vanilla off the shelf FSDP.SWYX [00:33:10]: And like, you know,JONATHAN [00:33:10]: we wrote our own data loader, but we've tried to make that as much of a standard as we can across our infrastructure and in Databricks, because things just start getting really complicatedSWYX [00:33:18]: or like we useJONATHAN [00:33:18]: Kubernetes extensively because it at least gives us a uniform set of APIs. Like that's our hardware abstraction layer to a certain extent for everything else. So it's just, you know, a difference in philosophy there. But otherwise, like, yeah, this stuff is really, really hard. And I feel like we take for granted how much of this, you know, is done for us when you go and you just query ChatGPT, for example. Like, oh my God, everything going on underneath that, you know, it's kind of a miracle that the machines boot up, let alone that you can like query a giant language model that's probably doing inference across multiple machines and was trained across thousands of machines. Like, you know, minor miracle.SWYX [00:33:54]: Yeah, it is an awesome amount of power that we invoke with a single API call that we take for granted these days. It's absurd. Yeah, I mean, like Kubernetes, like that point about Kubernetes, I will say as a former AWS employee, like it seems like it would be ideal for Imbue to at some point make it more abstracted or agnostic because you're going to want to, you know, replicate your setup. We do have our ownJOSH [00:34:19]: sort of replacement. It's just a much simpler version of Kubernetes. Kubernetes is really designed for running services, not for running experiments. Like that's not its like main architecture. And so for us, like we have everything that's like, cool, you're going to run an experiment. So you want it to run to completion, right?SWYX [00:34:34]: OK, great.JOSH [00:34:34]: Like the primitives are sort of built around a slightly different style.
And that makes it a lot easier, like just a lot simpler to fit that the nature of like these machines are going to disappear. They will need to be rebooted for infrastructure upgrades. They will like something will happen to the GPUs. Failure is like baked into this as like a core part of our infrastructure. So it's not that we don't have an abstraction. It's that it's a sort of simpler, more tailored abstraction for the particular work that we're doing.JONATHAN [00:34:58]: Yeah, I think it all depends on what your goals are. And like, I think the challenge in a lot of the deep learning stuff right now is that people are trying to like, people often build things that are more complicated than necessary to get the job done. And the complication is the enemy of everything. You know, don't use a fancier parallelism strategy than you have to. Don't use a fancier set of libraries than you have to.SWYX [00:35:18]: Don't do anythingJONATHAN [00:35:18]: that you don't have to do because it's hard enough as it is. Like, don't overcomplicateSWYX [00:35:23]: your own life.JONATHAN [00:35:23]: Don't try to bring in more tools or more fancy architecture tweaks if you absolutely don't have to.SWYX [00:35:29]: Like getting to the minimumJONATHAN [00:35:30]: necessary to get the job done. And it's really tempting to want to try to use everything. So like, I totally understand that one.SWYX [00:35:37]: I think the last piece I'll maybe call out is that I'm just going to weave this in just because I see the opportunity to do it. Are there any infrastructure shifts that need to be, that need to rise because of changing architecture? So I think, for example,SWYX [00:35:57]: you're announcing a dense model, a 70B dense model, whereas John just worked on DBRX and the image-to-text model, which presumably has different bottlenecks.JONATHAN [00:36:10]: That's correct for us. You know, we train both dense and mixture of expert models. 
The one we happened to, you know, kind of get permission to open source was a mixture of expert model. And those models are very demanding when it comes to network bandwidth, at least if you're training them in kind of FSDP ZeRO-3 style, where there's just a lot of parameters getting shuffled back and forth. And your ratio of kind of compute to amount of data that you have to shuffle back and forth becomes a lot worse because you're now, you know, you're only using a fraction of the parameters for every token instead of all the parameters. And so we had to really push the envelope on getting all the stuff to the right places on time. And so actually the networking part of DBRX was the single hardest thing, I think, of the entire process. Just getting MoE training working at scale across a big cluster. We still managed to, I think, do it all with commodity parts, which was very exciting. You know, we were using FSDP and we eventually used HSDP so that we could have HSDP as a version of FSDP where you have multiple smaller replicas and you're doing data parallel within those replicas. And that helped a lot with network latency issues that we were running into just because we were transmitting so much data, you know, for every single part of the process. I think it actually, like, it was instructive for how Google designs their hardware and software together personally. They're training, as far as I understand, using kind of a ZeRO-3 style of training and have been for a while. They also train mixture of expert models. TPUs have a very different network bandwidth to compute ratio. They have a lot more bandwidth just objectively. And TPUs per chip tend to be a little bit less compute intensive and have a little bit less memory. You know, it's just a different design choice. So the ratio of flops to bandwidth is very different.
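Jonathan's point about the compute-to-communication ratio getting worse for MoE can be put into rough numbers. Below is a minimal sketch, assuming a simple model of ~2 FLOPs per active parameter per token and DBRX-like parameter counts (132B total, 36B active); this is illustrative arithmetic only, not an actual cost model of FSDP traffic:

```python
def flops_per_byte_moved(total_params, active_params, bytes_per_param=2):
    """Rough forward-pass FLOPs per byte of parameters gathered, per token.

    In ZeRO-3 / FSDP-style sharded training, every parameter shard gets
    gathered over the network regardless of whether it is used, but an MoE
    layer only spends compute on its active parameters. (Illustrative model
    only; real traffic also involves gradients, precision, topology, etc.)
    """
    flops = 2 * active_params                      # ~2 FLOPs per active param per token
    bytes_moved = total_params * bytes_per_param   # all params gathered (e.g. bf16)
    return flops / bytes_moved

# Dense model: every parameter is active for every token
dense_ratio = flops_per_byte_moved(70e9, 70e9)
# MoE with DBRX-like counts: same gather traffic per parameter, far less compute
moe_ratio = flops_per_byte_moved(132e9, 36e9)
# moe_ratio < dense_ratio: the network works much harder per unit of compute
```

Under this toy model the MoE run gets roughly a quarter of the dense model's compute per byte shuffled, which is the "ratio becomes a lot worse" effect described above.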
And that means that it's much easier for Google to be able to pull offSWYX [00:37:54]: some of this stuff.JONATHAN [00:37:54]: They also have interesting, you know, Torus style network architecture or Torus style, like, literal network architectureSWYX [00:38:00]: is not like the model,JONATHAN [00:38:00]: but the network.SWYX [00:38:02]: Is this the sort of block attention? I forgot what you call it. So this is just more or the,JONATHAN [00:38:07]: yeah, this is more, not the ring attention, but these are the ring all reduces. Like you have three different dimensions of rings because they kind of put you in these three dimensional Toruses from what I understand. And so like, you know, Google's infrastructure in some sense is kind of, I wouldn't say built for this, but maybe the way that Google trains models is built for a slightly different bit of infrastructure they have. And it's kind of neat to think about that. You know, as one thing that I think NVIDIA announced for, you know, for, for both the GH200 and the GB200 is this hybrid networking where you'll have blocks of NVLink network chips. I think for the GB200, I think it's like groups of 72 GPUs will all have NVLink to each other. So higher bandwidth, then you'll have normal networking of some kind, InfiniBand or RoCE or what have you between these blocks. And that's kind of a, you know, it's a change due to the fact that, you know, it's hard to build really high bandwidth networks over very large groups, but it is now a blocked networking. And you have to think about how you architect your model and your parallelism differently. You also have to think about fault tolerance differently because it now matters where you lose a GPU, whereas it didn't before.
So, you know, it's, it's, it's just all really interesting and really fun speaking personally, but it's going to mean new nightmares when we all move to that generation and have to think about, you know, new versions of these problems.JOSH [00:39:20]: As you go up to larger scales, it gets quite different. Like right now, you know, if you're experiencing, let's say, for example, you experience a GPU failure every day, that's fine.SWYX [00:39:31]: Just restart.JOSH [00:39:31]: If you make your thing 24 times as big, now it's once an hour. Now it stops being quite as easy to just restart, right? So now you have to kind of break, like bake in this sort of redundancy that you didn't have before. So I think as you go up in scale, you end up running into like a lot of really interesting problems that also inform the, the actual like design. Yeah, I mean, as an orchestration guy,SWYX [00:39:52]: this is why I always emphasize like very cheap storage or very fast storage. So you can checkpoint more, but that's probably not the best solution for fast, you know, training.JONATHAN [00:40:05]: Which works fine when you're doing language and then you move to vision or video. And then, you know, you have multi petabyte datasetsSWYX [00:40:12]: and getting, you know,JONATHAN [00:40:13]: cheap, fast multi petabyte storage starts to bite. Like I've certainly encountered issues where the literal data center where my GPUs were did not have enough, you know, object store to fit the datasets that people wanted to bring into that data center from whichever users were, were trying to bring them in. And then you get to a wholeSWYX [00:40:31]: different world of hurtJONATHAN [00:40:31]: where you have to keep your data in a different region because the region is just out of storage. So things get fun really fast.SWYX [00:40:39]: Speaking of vision, Josh, actually, you know, Imbue is an agents company, but you're only, you're announcing a text-only model.
What, where does, where does the vision side come in?JOSH [00:40:49]: I think we've actually done a lot of work in the past and people can see kind of our blog posts about sort of self-supervised learning and some other kind of vision-related stuff in the past as well. So we're very familiar with, with that stuff. But I think our main focus right now is on kind of, as we say, coding and reasoning. And there, there's certainly a visual component to some problems. But, you know, it's not necessarily required for all problems. And actually we found that for most of the kind of like code writing and, and reasoning problems that we care about, the visual part isn't really a huge important part of it. Sometimes if you really need to, you can maybe describeSWYX [00:41:24]: the thing.JOSH [00:41:24]: There are other like, you know, multimodal models that you can use off the shelf to sort of plug in for those particular piecesSWYX [00:41:30]: that you need, right?JOSH [00:41:30]: Like if something is driving a browser or whatever, like you can sometimes get away with not having to have that baked into the original model. So our focus, you know, in a sense, we kind of do a lot across the stack. We're working on our own infrastructure and pre-training and RL and fine tuning and products and everything. But in another sense, we're very narrowly focused on the application side. So all of the stuff across the stack is kind of going toward a very particular purpose. And so that particular purpose right now doesn't really need vision. So we think that people are going to make all sorts of really cool image modelsSWYX [00:42:00]: like Jonathan, right?JOSH [00:42:00]: And all sorts of interesting multimodal models into the future. We'll let them go do that. That's great. We'll take advantage of that, partner with those people in the future.
And right now we're really focused on kind of the core reasoning and coding capabilities and aspects of the model.SWYX [00:42:14]: I wanted to go into carbs since that's kind of the next layer of the stack. We talked about carbs in the first episode with Kanjun because you've actually had a blog post about it like a couple of years ago. Maybe let's introduce it.JONATHAN [00:42:26]: Has that been a couple of years now?JOSH [00:42:28]: No, it must have been at least one year. Hopefully it's not multiple years.SWYX [00:42:32]: Sorry, I'm counting AI time. Yeah, yeah. Yeah, I was going to sayJONATHAN [00:42:35]: you're making me feel really old right now.SWYX [00:42:39]: I count everything before the generally intelligent rename as like, you know, prehistory. Yeah. And now sort of modernity, right? So I actually thought carbs was more about hyperparameter optimization in a sense of like sort of parameters, hyperparameter search. Whereas, you know, when you introduced it, especially in this blog post, it's more about scaling laws and predictability of like, are we sort of in the right ballpark before we scale things up? Maybe sort of recount the history of carbs.JOSH [00:43:10]: Yeah, so it really is a little bit of both. So carbs is, it's maybe a backronym, but it's for cost aware Pareto region Bayesian search. So this is about technically how it works, but carbs is like, you know, we like pastries and stuff.SWYX [00:43:26]: So great, why not? But the point is thatJOSH [00:43:29]: it's a cost aware hyperparameter tuner. So most hyperparameter tuners, you kind of say, OK, here's this objective function. I want you to make this number as big as possible or as small as possible, whichever direction you want to go. So yeah, just go make this number, you know, as small as possible.
OK, so it'll try a bunch of differentSWYX [00:43:46]: hyperparameters,JOSH [00:43:46]: a bunch of different configurationsSWYX [00:43:48]: to figure out, like,JOSH [00:43:48]: how do I tweak your network and architecture, et cetera, to get the kind of best performance I possibly can. That's usually saying, like, you know, almost all of these hyperparameter configurations are, let's say they're all going to use the same number of GPUs or the same number of nodes.SWYX [00:44:01]: So it's going to runJOSH [00:44:01]: for the same amount of time.SWYX [00:44:03]: So you can do that.JOSH [00:44:03]: You can get a number out and that's great. But what carbs does is it says,SWYX [00:44:07]: OK, actually,JOSH [00:44:07]: what if we relax that constraint? What if we say each of these different points, we're going to model how expensive it will be to sample this configuration. So if what if we train with just one one hundredth of the data? Like, how well can we do?SWYX [00:44:19]: What if we trainJOSH [00:44:19]: with one tenth of the data? What if we train with all the data? That way you can understand, like, as we get more and more data, as we spend more and more compute,SWYX [00:44:26]: as we make a biggerJOSH [00:44:26]: and bigger network, how does performance change with these things that change? Like how expensive it is to even explore this data point. So by doing that, we can see the scaling laws for not just, you know,SWYX [00:44:36]: the scaling lawsJOSH [00:44:36]: from like the, you know, Chinchilla paper, the scaling laws for all parameters. We can see how does how does the number of layers change with this? How does the, you know, the learning rate change? How do the like, you know, various types of regularization change? So you can see these nice scaling laws. And as you're going across costs, like how should this be changing as you're scaling up your model?
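The scaling-law view Josh just described — fitting how performance changes as you spend more compute, then checking whether the big run lands where the cheap runs predicted — can be illustrated with a much simpler tool than carbs: a least-squares power-law fit in log-log space. This is a minimal sketch with made-up numbers, not Imbue's actual data or method:

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = log(a) - b*log(C), i.e. loss = a * C**-b."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope  # (a, b)

def predict_loss(a, b, compute):
    """Extrapolate the fitted curve to a bigger (more expensive) run."""
    return a * compute ** (-b)

# Hypothetical cheap runs that happen to sit exactly on loss = 10 * C^-0.1
compute = [1e18, 1e19, 1e20]
loss = [10 * c ** (-0.1) for c in compute]
a, b = fit_power_law(compute, loss)
# Before paying for a 100x bigger run, compare its measured loss against
# predict_loss(a, b, 1e22); if they agree, the scaling trend is holding.
```

Carbs goes well beyond this (cost-aware Bayesian search over many hyperparameters at once), but the "is the big run on the curve?" check is the same idea.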
So that, coupled with the kind of metric that we chose, which is a very precise way of measuring performance, allowed us to really like hone in on parameters that worked really wellSWYX [00:45:05]: and understand, like,JOSH [00:45:05]: how do we want to scale those up, especially as we're changingSWYX [00:45:08]: things about the network?JOSH [00:45:08]: Like one of the things that we did is we used a custom tokenizer. As we change this tokenizer, changes a bunch of other things about the model. So how should we scale up this entirely new tokenizer? Like no one has ever made a model this large with this tokenizer before. And so how do we want toSWYX [00:45:22]: change all these things?JOSH [00:45:22]: Carbs kind of shows you, like, look, as you change these parameters, like these other ones are kind of dependent on this.SWYX [00:45:28]: Like this is the, these areJOSH [00:45:28]: the relationships between them. So you can better understand, like, OK, if I'm going to scale this up 10x or 100x, like, where do I want to be? I can only go so far. And so, you know, we did run, like, I think maybe it was like a 14b one or somethingSWYX [00:45:40]: like that to check.JOSH [00:45:41]: But and so we had a bunch of like 1b or 14b and then at 70b. I don't think we had a, I think we just did like one at 14b. So you can, we get to check that like, oh, is this on the curve? Like, is this where we expect? It was like right there. So then great, go on to the next one. Yeah, I mean, that makes a lot of sense.SWYX [00:45:56]: I wonder if, so one of the key questions, and correct me if I'm wrong, but like usually people do search or do their evals just based on loss. But you actually evaluate based on, you know, the sort of end state evals that people might expect, like HellaSwag and LAMBADA, whatever. What is the norm here? Is there a norm?JOSH [00:46:20]: Yeah, I don't know if there's a hundred percent.SWYX [00:46:21]: I don't know.
I only see loss on most people's reports.JOSH [00:46:25]: I think it's easy to, like, loss is very nice because it's very precise. It will tell you, like, very fine grained differences between like really small changes in your hyperparameters or network architecture. Whereas, especially at the smaller scales, if you're looking at like accuracy, it's very noisy. Like it might be zero or a hundred or like, you know, fluctuating by like 10 or 20 percentage points, which makes it really hard to tell, like, did that change actually mean anything? So our loss is sort of a combination of these two. Instead of saying, like, let's just look at perplexity, we say, let's look at perplexity on the tasks that we care about for multiple choice questions effectively.SWYX [00:47:00]: So we're saying like, yes,JOSH [00:47:00]: this is formulated as a multiple choice question, and we're going to look at the, like, you know, the loss of perplexity for this particular answer token. And that ends up being something that's like both targeted to what you actually care about and also very precise. The nice thing about this though is that it's independent of the data that you train on. One thing that's annoying about perplexity or about loss is that as you change your data set, this is really obnoxious because now it fundamentally changes your loss, right? And so you can't tell, like, how do I tweak my data set? But because we have this held out evaluation dataset
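The metric Josh describes — loss restricted to the answer tokens of a multiple-choice question, rather than raw accuracy or corpus-wide perplexity — can be sketched roughly as follows. This is a simplified, hypothetical version that scores one logit per choice; a real eval harness would sum log-probabilities over each answer's full token sequence:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multiple_choice_loss(choice_logits, correct_index):
    """Cross-entropy over the candidate answers only: precise like a loss,
    targeted at the task like accuracy, and independent of the training set."""
    probs = softmax(choice_logits)
    return -math.log(probs[correct_index])

# Four answer choices; the model strongly prefers the correct one
loss_confident = multiple_choice_loss([4.0, 0.5, 0.2, 0.1], correct_index=0)
# A totally uninformative model scores -ln(1/4) = ln(4), about 1.386 nats
loss_uniform = multiple_choice_loss([0.0, 0.0, 0.0, 0.0], correct_index=0)
```

Unlike accuracy, this number moves smoothly as the model improves, which is what makes small-scale comparisons meaningful; unlike training loss, it is computed on a fixed held-out task, so it does not shift when the training data mix changes.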
Today's guest is Sheri Crawford, Director of Data Governance at Scotiabank. Sheri joins us on the program today to discuss the biggest challenges for data management teams to drive the systems and the infrastructure necessary to capitalize on new data-heavy emerging use cases in generative AI. Throughout the episode, Sheri gives business leaders in financial services and beyond actionable insights into balancing consumer needs with infrastructure changes in the digital transformation process. Today's episode is sponsored by MinIO. Learn how brands work with Emerj and other Emerj Media options at emerj.com/ad1.
Today's guest is Anand Babu Periasamy, Co-founder & Co-CEO of MinIO, Inc. MinIO is a software company that develops High-Performance Object Storage systems that are API compatible with the Amazon S3 cloud storage service. Anand joins us on today's podcast to discuss opportunities for IT and infrastructure leaders to scale AI across the enterprise. Throughout the episode, Anand explains at length what he sees as the critical ingredients for ensuring sustainable growth in infrastructure systems and the advantages of object storage regardless of industrial sector. This episode is sponsored by MinIO. Learn how brands work with Emerj and other Emerj Media options at emerj.com/ad1.
Join Corey Quinn and MinIO's co-founder and CEO, AB Periasamy, for a look into MinIO's strategic approach to integrating open-source contributions with its business objectives amidst the AI evolution. They discuss the effect of AI on data management, highlight the critical role of data replication, and advocate for the adoption of cloud-native architecture. Their conversation examines the insights of data replication, mentioning its pivotal role in ensuring efficient data management and storage. Overall, a recurring theme throughout the episode is the importance of simplifying technology to catalyze a broader understanding and utilization that can remain accessible and beneficial to all.Show Highlights: (00:00) - Intro(03:40) - MinIO's evolution and commitment to simplicity and scalability.(07:25) - The significance of data replication and object storage's versatility.(12:12) - Challenges and innovations in data backup and disaster recovery.(15:21) - Launch of MinIO's Enterprise Object Store and its comprehensive features.(20:50) - Balancing open-source contributions and commercial objectives.(30:32) - AI's growing influence on data storage strategies and MinIO's role.(34:33) - The shift towards software-defined data infrastructure driven by AI and cloud technologies.(39:40) - Resources and the future of tech (43:31) - Closing thoughts About A.B Periasamy:AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement, AB was a co-founder and CTO of GlusterFS which was acquired by RedHat in 2011. Following the acquisition, he served in the office of the CTO at RedHat prior to founding MinIO in late 2015. AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India. 
He earned his BE in Computer Science and Engineering from Annamalai University.
Links:
MinIO: https://min.io/
Kubernetes: https://kubernetes.io/
AWS (Amazon Web Services): https://aws.amazon.com/
Twitter: @abperiasamy
Garima Kapoor, COO and co-founder of MinIO, joins me to share her journey from investor and advisor to co-founder of MinIO and the wealth of knowledge she's amassed along the way. In this episode, Garima explains how her experience in finance and belief in the power of open source helped MinIO to break into the data storage market. She also reviews the challenges she faced as a first-time founder and what others can learn from her mistakes and take away from some of their own. Since Garima started her journey with MinIO as CFO, she outlines that role for me and explains how she thinks a CFO should operate in an open source company. In reviewing mistakes she's seen from other founders, Garima states some principles that create the “foundation for any open source business.” - “You should always be very honest to your community. You should always be very transparent to the community”Highlights:Garima introduces herself and explains why she and her co-founders started MinIO (1:31)Garima describes how the MinIO founders honed in on a problem they wanted to solve (3:55)How the MinIO founders used open source to crack the market (6:37)What triggers a user to purchase a commercial license for the product (10:33)Garima explains why she and her cofounders were set on their open source strategy from day one (11:35)Garima explores the differences between being an investor and advisor for other companies and starting her own.
(13:25)Garima shares go-to-market advice for other founders (15:21)Garima outlines her strategy for building on small successes (18:38)Garima explains why she started as CFO for MinIO and breaks down the role a CFO can play in a new company (21:46)Why Garima thinks a CFO's role remains the same in an open source company as compared to a proprietary company (27:17)How to avoid competing with your open source product when you also have a commercial offering (34:06)Links:GarimaLinkedIn: https://www.linkedin.com/in/garimakap/Twitter: https://twitter.com/garimakapCompany: min.io
In this episode, Amir interviews Ugur Tigli, the CTO of MinIO, a high-performance object storage company. They discuss the infrastructure components of cloud storage, data protection, operating models, and costs and how they tie into AI workloads. Ugur explains that MinIO is an open-source, S3-compatible distributed object storage solution popular for its simplicity and ease of deployment. They also delve into why MinIO chose the open-source path and its benefits. Listen to the episode to learn more about cloud and AI workloads and the impact on cloud costs. Highlights [00:02:40] Dual licensing model. [00:04:15] Open source and security. [00:07:36] AI and data growth. [00:14:15] Complex data infrastructure evolution. [00:16:39] Object storage simplification. [00:20:19] AI and storage cost. [00:24:07] Integrating with external systems. Ugur Tigli is CTO at MinIO. In this current role, he oversees enterprise strategy and interfaces with MinIO's enterprise client base. He helps clients architect and deploy API-driven, cloud-native, and scalable enterprise-grade data infrastructure using MinIO. Ugur has almost two decades of experience building high-performance data infrastructure for global financial institutions. Before MinIO, he was a technology leader at Bank of America, serving as the Senior Vice President and Global Head of Hardware Engineering. He joined Bank of America through the acquisition of Merrill Lynch, where he was the Vice President for Storage Engineering. Ugur has a Bachelor of Science in Electrical Engineering from Lafayette College. https://www.linkedin.com/in/ugur-tigli-9a9323/ Thank you so much for checking out this episode of The Tech Trek, and we would appreciate it if you would take a minute to rate and review us on your favorite podcast player. Want to learn more about us? Head over at https://www.elevano.com Have questions or want to cover specific topics with our future guests? 
Please message me at https://www.linkedin.com/in/amirbormand (Amir Bormand)
IN THIS EPISODE...In this digital age, where the volume of data is growing exponentially, object storage has emerged as a fundamental technology, particularly well-suited for cloud computing and big data applications. It offers the advantages of easy scalability, durability, and accessibility, making it an integral part of modern data management solutions. Unlike traditional file systems, which organize data into hierarchical folders and directories, object storage takes a different approach.My guest today, Garima Kapoor, Ph.D., is the Co-Founder and Chief Operating Officer (COO) of MinIO, Inc., an industry-leading company that has pioneered a high-performance, S3-compatible object store. With a solid educational background and extensive experience, Garima has been instrumental in MinIO's remarkable journey. Under her strategic leadership, MinIO has emerged as a powerhouse in data storage, specializing in large-scale AI/ML, data lake, and database workloads. The innovative object store solution MinIO offers is designed to meet the demanding requirements of modern data-driven applications. It is characterized by its software-defined architecture, enabling seamless deployment on a wide range of environments, including cloud and on-premises infrastructure.------------Full show notes, links to resources mentioned, and other compelling episodes can be found at http://LeadYourGamePodcast.com. (Click the magnifying icon at the top right and type “Garima”)Love the show? Subscribe, rate, review, and share! ------------JUST FOR YOU: Increase your leadership acumen by identifying your personal Leadership Trigger. Take my free quiz and instantly receive your 5-page report. Need to up-level your workforce or execute strategic People initiatives?
https://shockinglydifferent.com/contact or tweet @KaranRhodes.-------------ABOUT GARIMA KAPOOR:Garima Kapoor is a prominent figure in the tech industry, known for her role as the Chief Operating Officer (COO) and co-founder of MinIO, a cutting-edge technology company. With a solid financial background, she initially served as the company's Chief Financial Officer (CFO) before taking on her current leadership position. Garima is not only a successful entrepreneur but also an active investor and advisor to emerging technology companies in the dynamic landscape of Silicon Valley.Her academic journey is equally impressive, holding a Doctor of Philosophy (Ph.D.) in Accounting and Finance from Nirma University, a Masters in Economics from Banasthali Vidyapith, and a Bachelor of Science (BS) degree in Economics from Delhi University. Garima's multifaceted expertise and leadership have played a pivotal role in shaping the success of MinIO and contributing to the advancement of technology in the digital era.------------WHAT TO LISTEN FOR:1. What does MinIO do, and how does it help organizations?2. What is object storage?3. What are the tips for building a successful startup?4. What is the role of fundraising and product development in the growth of a storage company?5. What is courageous agility, and how does it help to navigate unpredictable paths in leadership and...
A New variant of Chae$ malware is described. A "Smishing Triad" impersonates postal services. A MinIO storage exploit reported. Okta warns of attackers seeking senior admin privileges. LockBit compromises a UK security contractor. DDoS takes down a German financial regulator's site. Infamous Chisel as GRU combat support. Joe Carrigan on Meta uncovering a Chinese influence effort. Our guest is Connie Stack, CEO of Next DLP, discussing data breach notification procedure. And please -PLEASE- remember to change your default passwords. For links to all of today's stories check out our CyberWire daily news briefing: https://thecyberwire.com/newsletters/daily-briefing/12/169 Selected reading. Threat Profile: Chae$ 4 Malware (Morphisec) "Smishing Triad" Targeted USPS and US Citizens for Data Theft (Resecurity) 'Smishing Triad' Targeted USPS and US Citizens for Data Theft (Security Affairs) New Attack Vector In The Cloud: Attackers caught exploiting Object Storage Services (Security Joes) Hackers exploit MinIO storage system to breach corporate networks (BleepingComputer) Okta Warns of Social Engineering Attacks Targeting Super Administrator Privileges (The Hacker News) More Okta customers trapped in Scattered Spider's web (Register) Cross-Tenant Impersonation: Prevention and Detection (Okta Security) Breaking: UK MoD attacked by LockBit (Computing) German financial agency site disrupted by DDoS attack since Friday (BleepingComputer) LogicMonitor customers hacked in reported ransomware attacks (BleepingComputer) LogicMonitor customers hit by hackers, because of default passwords (TechCrunch) Learn more about your ad choices. Visit megaphone.fm/adchoices
New PDF MalDoc allows evasion of antivirus MinIO Storage system being used to compromise servers Okta warns of IT help desk attacks Thanks to today's episode sponsor, Comcast Data rules everything around us – but why are the people who need data the most unable to access it? What if you could boost the productivity of your security teams and their ability to collaborate by providing them access to the same shared and enriched data? You can. With DataBee™, from Comcast Technology Solutions. Learn how DataBee can help your organization make better informed decisions, quickly and cost-effectively. Visit https://comca.st/DataBee For the stories behind the headlines, head to CISOseries.com.
Anand Babu "AB" Periasamy is the cofounder and CEO of MinIO, a high performance object storage for AI that's built for large scale workloads. They have raised $126M in funding from the likes of General Catalyst, Softbank, Intel Capital, and Nexus Venture Partners. It's the world's fastest growing object storage company with more than 1 billion Docker pulls and more than 35K stars on GitHub. He's also an angel investor with investments in companies like H2O.ai, Isovalent, Starburst, Postman, and many more. He was previously the cofounder and CTO of Gluster, which got acquired by Red Hat. In this episode, we cover a range of topics including: - Why is storage important for AI workflows - What are the characteristics of a good data storage product - Repatriation of data from public cloud to on-prem - Running ML experiments in parallel - AI compute offerings from data infrastructure providers - Making data infrastructure faster and cheaper AB's favorite book: An Awesome Book! (Author: Dallas Clayton)--------Where to find Prateek Joshi: Newsletter: https://prateekjoshi.substack.com Website: https://prateekj.com LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 Twitter: https://twitter.com/prateekvjoshi
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses what is Object Storage and the history of file systems. Join the community at datanation.click
China is reportedly blocking multiple mergers and is now investigating Micron for an unspecified cybersecurity concern. Delays include Intel's acquisition of Tower Semiconductor, MaxLinear's purchase of Silicon Motion, Broadcom's acquisition of VMware, and Microsoft's bid for Activision Blizzard, as well as the long delay in Cisco's acquisition of Acacia and the called-off purchase of NXP by Qualcomm, which we reported on previously. The latest move against Micron seems designed to antagonize rather than investigate any real cybersecurity flaws at the memory provider. What's going on here? Time Stamps: 0:00 - Welcome to the Rundown 1:00 - Cisco Pulls Out of Russia 2:38 - Sony Invests in Raspberry Pi 4:10 - Always Greener on the Google AI Side 6:28 - Jacob Ziv Dies at 91 8:03 - MinIO Fires Back at Weka 13:48 - China Plays Hardball with Western Companies 21:44 - The Weeks Ahead 22:25 - Thanks for Watching Follow our hosts on Social MediaTom Hollingsworth: https://www.twitter.com/NetworkingNerdStephen Foskett: https://www.twitter.com/SFoskett Max Mortillaro: https://www.twitter.com/MaxMortillaro Follow Gestalt ITWebsite: https://www.GestaltIT.com/Twitter: https://www.twitter.com/GestaltITLinkedIn: https://www.linkedin.com/company/1789 Tags: #Rundown, #RaspberryPi, #A100, #China, #Russia, #NFD31, #ArubaAtmosphere23, #NFDx, #MFD9, @Cisco @RaspberryPi_Org, @Google, @GoogleCloud, #AI, @NVIDIA, @MinIO, @WekaIO, #Storage
Hey, it's 5:05 on Friday, March 31st, 2023. From The Sourced Podcast Network in New York City, this is your host Pokie Huang. Stories in today's episode come from Kadi Grigg in Alexandria, Virginia, Edwin Kwan in Sydney, Australia, Olimpiu Pop in Transylvania, Romania, Katy Craig in San Diego, California, Marcel Brown in St. Louis, Missouri. Let's get to it.Latest Mass Ransomware Attack
X as Code expert Ned Bellavance rejoins the podcast to discuss the latest battle in open-source licensing between MinIO and WekaIO and how customers should think about open-source licensing. Show Notes: Ned's Pluralsight Courses: https://www.pluralsight.com/authors/edward-bellavance Blocks and Files article on MinIO/WekaIO: https://blocksandfiles.com/2023/03/26/we-object-minio-says-no-more-open-license-for-you-weka/
Computing pioneer and Intel co-founder Gordon Moore has died. His name is commonly used in reference to Moore's Law, which predicted that processors would grow exponentially more complex, but he was much more than this. Moore was a visionary, who guided Intel through the DRAM market in the early years and then led the transition of the company to lay the foundation for modern microcomputers. He was quiet and polite, unlike Robert Noyce and Andy Grove, but everyone trusted Moore's thoughtful and considered decisions. Moore learned from his mistakes, notably a foray into the digital watch market, and was able to lead while allowing others to have their own autonomy. In a way, Moore created Silicon Valley but was entirely unlike what it has become. We could surely use more leaders in the mold of Gordon Moore! Time Stamps: 0:00 - Welcome to the Rundown 0:34 - Toshiba Takeover Talks 3:10 - Biden Outlaws Feds Commercial Spyware 6:03 - MinIO and Weka Divided on Licensing Changes 14:09 - OVHcloud Owes for Data Damages 18:42 - Arm Wants an Arm and a Leg for Chip Licenses 24:52 - Gordon Moore Dies at 94 37:35 - The Weeks Ahead 38:34 - Thanks for Watching Follow our hosts on Social MediaTom Hollingsworth: https://www.twitter.com/NetworkingNerdStephen Foskett: https://www.twitter.com/SFoskett Follow Gestalt ITWebsite: https://www.GestaltIT.com/Twitter: https://www.twitter.com/GestaltITLinkedIn: https://www.linkedin.com/company/1789 Tags: #Rundown, #GordonMoore, #Spyware, #Pegasus, #Cloud, #ChipLicense, #NFD31, #NFDx, #ArubaAtmosphere, #Toshiba, @MinIO, @WekaIO, @OVHcloud, @OVHcloud_US, @Arm, @Intel
AB Periasamy, Co-Founder and CEO of MinIO, joins Corey on Screaming in the Cloud to discuss what it means to be truly open source and the current and future state of multi-cloud. AB explains how MinIO was born from the idea that the world was going to produce a massive amount of data, and what it's been like to see that come true and continue to be the future outlook. AB and Corey explore why some companies are hesitant to move to cloud, and AB describes why he feels the move is inevitable regardless of cost. AB also reveals how he has helped create a truly free open-source software, and how his partnership with Amazon has been beneficial. About ABAB Periasamy is the co-founder and CEO of MinIO, an open source provider of high performance, object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu where he serves on the board to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat's Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to scaling of the commodity cluster computing to supercomputing class performance. His work there resulted in the development of Lawrence Livermore Laboratory's “Thunder” code, which, at the time was the second fastest in the world. 
AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.AB is one of the leading proponents and thinkers on the subject of open source software - articulating the difference between the philosophy and business model. An active contributor to a number of open source projects, he is a board member of India's Free Software Foundation.Links Referenced: MinIO: https://min.io/ Twitter: https://twitter.com/abperiasamy LinkedIn: https://www.linkedin.com/in/abperiasamy/ Email: mailto:ab@min.io TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Chronosphere. When it costs more money and time to observe your environment than it does to build it, there's a problem. With Chronosphere, you can shape and transform observability data based on need, context and utility. Learn how to only store the useful data you need to see in order to reduce costs and improve performance at chronosphere.io/corey-quinn. That's chronosphere.io/corey-quinn. And my thanks to them for sponsoring my ridiculous nonsense. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn, and I have taken a somewhat strong stance over the years on the relative merits of multi-cloud, and when it makes sense and when it doesn't. And it's time for me to start modifying some of those. To have that conversation and several others as well, with me today on this promoted guest episode is AB Periasamy, CEO and co-founder of MinIO. AB, it's great to have you back.AB: Yes, it's wonderful to be here again, Corey.Corey: So, one thing that I want to start with is defining terms. 
Because when we talk about multi-cloud, there are—to my mind at least—smart ways to do it and ways that are frankly ignorant. The thing that I've never quite seen is, it's greenfield, day one. Time to build something. Let's make sure we can build and deploy it to every cloud provider we might ever want to use.And that is usually not the right path. Whereas different workloads in different providers, that starts to make a lot more sense. When you do mergers and acquisitions, as big companies tend to do in lieu of doing anything interesting, it seems like they find it oh, we're suddenly in multiple cloud providers, should we move this acquisition to a new cloud? No. No, you should not.One of the challenges, of course, is that there's a lot of differentiation between the baseline offerings that cloud providers have. MinIO is interesting in that it starts and stops with an object store that is mostly S3 API compatible. Have I nailed the basic premise of what it is you folks do?AB: Yeah, it's basically an object store. Amazon S3 versus us, it's actually—that's the comparable, right? Amazon S3 is a hosted cloud storage as a service, but underneath the underlying technology is called object-store. MinIO is a software and it's also open-source and it's the software that you can deploy on the cloud, deploy on the edge, deploy anywhere, and both Amazon S3 and MinIO are exactly S3 API compatible. It's a drop-in replacement. You can write applications on MinIO and take it to AWS S3, and do the reverse. Amazon made S3 API a standard inside AWS, we made S3 API standard across the whole cloud, all the cloud edge, everywhere, rest of the world.Corey: I want to clarify two points because otherwise I know I'm going to get nibbled to death by ducks on the internet. 
When you say open-source, it is actually open-source; you're AGPL, not source available, or, “We've decided now we're going to change our model for licensing because oh, some people are using this without paying us money,” as so many companies seem to fall into that trap. You are actually open-source and no one reasonable is going to be able to disagree with that definition.The other pedantic part of it is when something says that it's S3 compatible on an API basis, like, the question is always does that include the weird bugs that we wish it wouldn't have, or some of the more esoteric stuff that seems to be a constant source of innovation? To be clear, I don't think that you need to be particularly compatible with those very corner and vertex cases. For me, it's always been the basic CRUD operations: can you store an object? Can you give it back to me? Can you delete the thing? And maybe an update, although generally object stores tend to be atomic. How far do you go down that path of being, I guess, a faithful implementation of what the S3 API does, and at which point you decide that something is just, honestly, lunacy and you feel no need to wind up supporting that?AB: Yeah, the unfortunate part of it is we have to be very, very deep. It only takes one API to break. And it's not even, like, one API we did not implement; one API under a particular circumstance, right? Like even if you see, like, AWS SDK is, right, Java SDK, different versions of Java SDK will interpret the same API differently. And AWS S3 is an API, it's not a standard.And Amazon has published the REST specifications, API specs, but they are more like religious text. You can interpret it in many ways. Amazon's own SDK has interpreted, like, this in several ways, right? The only way to get it right is, like, you have to have a massive ecosystem around your application. 
And if one thing breaks—today, if I commit a code and it introduced a regression, I will immediately hear from a whole bunch of community what I broke. There's no certification process here. There is no industry consortium to control the standard, but then there is an accepted standard. Like, if the application works, then it works. And one way to get it right is, like, Amazon SDKs, all of those language SDKs, to be cleaner, simpler, but applications can even use MinIO SDK to talk to Amazon and Amazon SDK to talk to MinIO. Now, there is a clear, cooperative model.And I actually have tremendous respect for Amazon engineers. They have only been kind and meaningful, like, reasonable partnership. Like, if our community reports a bug that Amazon rolled out a new update in one of the region and the S3 API broke, they will actually go fix it. They will never argue, “Why are you using MinIO SDK?” Their engineers, they do everything by reason. That's the reason why they gained credibility.Corey: I think, on some level, that we can trust that the API is not going to meaningfully shift, just because so much has been built on top of it over the last 15, almost 16 years now that even slight changes require massive coordination. I remember there was a little bit of a kerfuffle when they announced that they were going to be disabling the BitTorrent endpoint in S3 and it was no longer going to be supported in new regions, and eventually they were turning it off. There were still people pushing back on that. I'm still annoyed by some of the documentation around the API that says that it may not return a legitimate error code when it errors with certain XML interpretations. It's… it's kind of become very much its own thing.AB: [unintelligible 00:06:22] a problem, like, we have seen, like, even stupid errors similar to that, right? 
Like, HTTP headers are supposed to be case insensitive, but then there are some language SDKs will send us in certain type of casing and they expect the case to be—the response to be same way. And that's not HTTP standard. If we have to accept that bug and respond in the same way, then we are asking a whole bunch of community to go fix that application. And Amazon's problem are our problems too. We have to carry that baggage.But some places where we actually take a hard stance is, like, Amazon introduced that initially, the bucket policies, like access control list, then finally came IAM, then we actually, for us, like, the best way to teach the community is make best practices the standard. The only way to do it. We have been, like, educating them that we actually implemented ACLs, but we removed it. So, the customers will no longer use it. The scale at which we are growing, if I keep it, then I can never force them to remove.So, we have been pedantic about, like, how, like, certain things that if it's a good advice, force them to do it. That approach has paid off, but the problem is still quite real. Amazon also admits that S3 API is no longer simple, but at least it's not like POSIX, right? POSIX is a rich set of API, but doesn't do useful things that we need to do. So, Amazon's APIs are built on top of simple primitive foundations that got the storage architecture correct, and then doing sophisticated functionalities on top of the simple primitives, these atomic RESTful APIs, you can finally do it right and you can take it to great lengths and still not break the storage system.So, I'm not so concerned. I think it's time for both of us to slow down and then make sure that the ease of operation and adoption is the goal, then trying to create an API Bible.Corey: Well, one differentiation that you have that frankly I wish S3 would wind up implementing is this idea of bucket quotas. 
I would give a lot in certain circumstances to be able to say that this S3 bucket should be able to hold five gigabytes of storage and no more. Like, you could fix a lot of free tier problems, for example, by doing something like that. But there's also the problem that you'll see in data centers where, okay, we've now filled up whatever storage system we're using. We need to either expand it at significant cost and it's going to take a while or it's time to go and maybe delete some of the stuff we don't necessarily need to keep in perpetuity.There is no moment of reckoning in traditional S3 in that sense because, oh, you can just always add one more gigabyte at 2.3 or however many cents it happens to be, and you wind up with an unbounded growth problem that you're never really forced to wrestle with. Because it's infinite storage. They can add drives faster than you can fill them in most cases. So, it's it just feels like there's an economic story, if nothing else, just from a governance control and make sure this doesn't run away from me, and alert me before we get into the multi-petabyte style of storage for my Hello World WordPress website.AB: Mm-hm. Yeah, so I always thought that Amazon did not do this—it's not just Amazon, the cloud players, right—they did not do this because they want—is good for their business; they want all the customers' data, like unrestricted growth of data. Certainly it is beneficial for their business, but there is an operational challenge. When you set quota—this is why we grudgingly introduced this feature. We did not have quotas and we didn't want to because Amazon S3 API doesn't talk about quota, but the enterprise community wanted this so badly.And eventually we [unintelligible 00:09:54] it and we gave. But there is one issue to be aware of, right? 
The problem with quota is that you as an object storage administrator, you set a quota, let's say this bucket, this application, I don't see more than 20TB; I'm going to set 100TB quota. And then you forget it. And then you think in six months, they will reach 20TB. The reality is, in six months they reach 100TB.And then when nobody expected—everybody has forgotten that there was a quota in a certain place—suddenly applications start failing. And when it fails, it doesn't—even though the S3 API responds back saying that insufficient space, but then the application doesn't really pass that error all the way up. When applications fail, they fail in unpredictable ways. By the time the application developer realizes that it's actually the object storage that ran out of space, it's lost time and it's a downtime. So, as long as they have proper observability—because, I mean, I would also ask of observability that it can alert you that you are going to run out of space soon. If you have those systems in place, then go for quota. If not, I would agree with the S3 API standard that is not about cost. It's about operational, unexpected accidents.Corey: Yeah, on some level, we wound up having to deal with the exact same problem with disk volumes, where my default for most things was, at 70%, I want to start getting pings on it and at 90%, I want to be woken up for it. So, for small volumes, you wind up with a runaway log or whatnot, you have a chance to catch it and whatnot, and for the giant multi-petabyte things, okay, well, why would you alert at 70% on that? Well, because procurement takes a while when we're talking about buying that much disk for that much money. It was a roughly good baseline for these things. The problem, of course, is when you have none of that, and well it got full so oops-a-doozy.On some level, I wonder if there's a story around soft quotas that just scream at you, but let you keep adding to it. 
But that turns into implementation details, and you can build something like that on top of any existing object store if you don't need the hard limit aspect.AB: Actually, that is the right way to do. That's what I would recommend customers to do. Even though there is hard quota, I will tell, don't use it, but use soft quota. And the soft quota, instead of even soft quota, you monitor them. On the cloud, at least you have some kind of restriction that the more you use, the more you pay; eventually the month end bills, it shows up.On MinIO, when it's deployed on these large data centers, that it's unrestricted access, quickly you can use a lot of space, no one knows what data to delete, and no one will tell you what data to delete. The way to do this is there has to be some kind of accountability. The way to do it is—actually [unintelligible 00:12:27] have some chargeback mechanism based on the bucket growth. And the business units have to pay for it, right? That IT doesn't run for free, right? IT has to have a budget and it has to be sponsored by the applications team.And you measure, instead of setting a hard limit, you actually charge them that based on the usage of your bucket, you're going to pay for it. And this is an observability problem. And you can call it soft quotas, but what it has to do is trigger an alert in observability. It's an observability problem. But it actually is interesting to hear that as soft quotas, which makes a lot of sense.Corey: It's one of those problems that I think people only figure out after they've experienced it once. And then they look like wizards from the future who, “Oh, yeah, you're going to run into a quota storage problem.” Yeah, we all find that out because the first time we smack into something and live to regret it. Now, we can talk a lot about the nuances and implementation and low level detail of this stuff, but let's zoom out of it. What are you folks up to these days? 
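AB's framing of soft quotas as an observability concern, combined with Corey's 70%/90% disk-volume thresholds from earlier in the conversation, reduces to a simple threshold check. A pure-Python sketch (the function name and default thresholds are illustrative, not any MinIO API):

```python
def quota_status(used_bytes: int, soft_quota_bytes: int,
                 warn_pct: float = 70.0, page_pct: float = 90.0) -> str:
    """Classify bucket usage against a soft quota.

    A soft quota never blocks writes; it only drives alerts:
    warn early (start the cleanup or chargeback conversation),
    page late (someone gets woken up).
    """
    pct = 100.0 * used_bytes / soft_quota_bytes
    if pct >= page_pct:
        return "page"
    if pct >= warn_pct:
        return "warn"
    return "ok"

TB = 10**12
print(quota_status(50 * TB, 100 * TB))  # ok
print(quota_status(75 * TB, 100 * TB))  # warn
print(quota_status(95 * TB, 100 * TB))  # page
```

Feeding the "warn" and "page" states into whatever alerting pipeline is already in place is the whole mechanism; the chargeback idea AB mentions is just this same usage number multiplied by a price.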
What is the bigger picture that you're seeing of object storage and the ecosystem?AB: Yeah. So, when we started, right, our idea was that the world is going to produce incredible amount of data. In ten years from now, we are going to drown in data. We've been saying that today and it will be true. Every year, you say ten years from now and it will still be valid, right?That was the reason for us to play this game. And we saw that every one of these cloud players were incompatible with each other. It's like early Unix days, right? Like a bunch of operating systems, everything was incompatible and applications were beginning to adopt this new standard, but they were stuck. And then the cloud storage players, whatever they had, like, GCS can only run inside Google Cloud, S3 can only run inside AWS, and the cloud player's game was bring all the world's data into the cloud.And that actually requires enormous amount of bandwidth. And moving data into the cloud at that scale, if you look at the amount of data the world is producing, if the data is produced inside the cloud, it's a different game, but the data is produced everywhere else. MinIO's idea was that instead of introducing yet another API standard, Amazon got the architecture right and that's the right way to build large-scale infrastructure. If we stick to Amazon S3 API instead of introducing yet another standard, [unintelligible 00:14:40] API, and then go after the world's data. When we started in 2014 November—it's really 2015, we started, it was laughable. People thought that there won't be a need for MinIO because the whole world will basically go to AWS S3 and they will be the world's data store. Amazon is capable of doing that; the race is not over, right?Corey: And it still couldn't be done now. The thing is that they would need to fundamentally rethink their, frankly, usurious data egress charges. 
The problem is not that it's expensive to store data in AWS; it's that it's expensive to store data and then move it anywhere else for analysis or use on something else. So, there are entire classes of workload that people should not consider the big three cloud providers as the place where that data should live because you're never getting it back.AB: Spot on, right? Even if network is free, right, Amazon makes, like, okay, zero egress-ingress charge, the data we're talking about, like, most of MinIO deployments, they start at petabytes. Like, one to ten petabytes, a few at, like, 100 terabytes. For even if network is free, try moving a ten-petabyte infrastructure into the cloud. How are you going to move it?Even with FedEx and UPS giving you a lot of bandwidth in their trucks, it is not possible, right? I think the data will continue to be produced everywhere else. So, our bet was that we will be [unintelligible 00:15:56]—instead of you moving the data, you can run MinIO where there is data, and then the whole world will look like AWS's S3 compatible object store. We took a very different path. But now, when I say the same story that we started with day one, it is no longer laughable, right?People believe that yes, MinIO is there because our market footprint is now larger than Amazon S3. And as it goes to production, customers are now realizing it's basically growing inside a shadow IT and eventually businesses realize the bulk of their business-critical data is sitting on MinIO and that's how it's surfacing up. So now, what we are seeing, this year particularly, all of these customers are hugely concerned about cost optimization. And as part of the journey, there is also multi-cloud and hybrid-cloud initiatives. They want to make sure that their application can run on any cloud or the same software can run on their colos like Equinix, or, like, a bunch of, like, Digital Realty, anywhere.And MinIO's software, this is what we set out to do. 
MinIO can run anywhere inside the cloud, all the way to the edge, even on Raspberry Pi. It's now—whatever we started with has now become reality; the timing is perfect for us.Corey: One of the challenges I've always had with the idea of building an application with the idea to run it anywhere is you can make explicit technology choices around that, and for example, object store is a great example because most places you go now will or can have an object store available for your use. But there seem to be implementation details that get lost. And for example, even load balancers wind up being implemented in different ways with different scaling times and whatnot in various environments. And past a certain point, it's okay, we're just going to have to run it ourselves on top of HAProxy or Nginx, or something like it, running in containers themselves; you're reinventing the wheel. Where is that boundary between, we're going to build this in a way that we can run anywhere and the reality that I keep running into, which is we tried to do that but we implicitly without realizing it built in a lot of assumptions that everything would look just like this environment that we started off in.AB: The good part is that if you look at the S3 API, every request has the site name, the endpoint, bucket name, the path, and the object name. Every request is completely self-contained. It's literally an HTTP call away. And this means that whether your application is running on Android, iOS, inside a browser, JavaScript engine, anywhere across the world, they don't really care whether the bucket is served from EU or us-east or us-west. It doesn't matter at all, so it actually allows you by API, you can build a globally unified data infrastructure, some buckets here, some buckets there.That's actually not the problem. The problem comes when you have multiple clouds. 
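AB's point that every S3 request is self-contained can be made concrete: the request itself names the endpoint, the bucket, and the key, so nothing in the call shape is tied to one provider. A pure-stdlib sketch (the hostnames are made up):

```python
from urllib.parse import quote

def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style object URL: endpoint + bucket + key.

    Because the request carries all three, the same shape addresses
    AWS S3, MinIO, or any other S3-compatible store.
    """
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key)}"

# The same request shape, aimed at two different backends:
print(object_url("https://s3.us-east-1.amazonaws.com", "demo", "logs/2023/app.log"))
print(object_url("https://minio.example.internal:9000", "demo", "logs/2023/app.log"))
```

Authentication (SigV4 signing) is layered on top of this shape by the SDKs, which is why the unified-identity and policy questions AB raises next are the harder part.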
Different teams, like, part of M&A—like, even if you don't do M&A, different teams—no two data engineers would agree on the same software stack. Then they will all end up with different cloud players and some still running on old legacy environments.When you combine them, the problem is, like, let's take just the cloud, right? How do I even apply a policy, that access control policy, how do I establish unified identity? Because I want to know this application is the only one who is allowed to access this bucket. Can I have that same policy on Google Cloud or Azure, even though they are different teams? Like if that employee, that project, or that admin, if he or she leaves the job, how do I make sure that that's all protected?You want unified identity, you want unified access control policies. Where is the encryption key store? And then the load balancer itself, the load, its—load balancer is not the problem. But then unless you adopt S3 API as your standard, the definition of what a bucket is differs from Microsoft to Google to Amazon.Corey: Yeah, the idea of the PUTs and retrieving of actual data is one thing, but then you have: how do you manage the control plane layer of the object store and how do you rationalize that? What are the naming conventions? How do you address it? I even ran into something similar somewhat recently when I was doing an experiment with one of the Amazon Snowball edge devices to move some data into S3 on a lark. And the thing shows up and presents itself on the local network as an S3 endpoint, but none of their tooling can accept a different endpoint built into the configuration files; you have to explicitly use it as an environment variable or as a parameter on every invocation of something that talks to it, which is incredibly annoying.I would give a lot just to be able to say, oh, when you're talking in this profile, that's always going to be your S3 endpoint. Go. But no, of course not. 
Because that would make it easier to use something that wasn't them, so why would they ever be incentivized to bake that in?AB: Yeah. Snowball is an important element to move data, right? That's the UPS and FedEx way of moving data, but what I find customers doing is they actually use the tools that we built for MinIO because the Snowball appliance also looks like S3 API-compatible object store. And in fact, like, I've been told that, like, when you want to ship multiple Snowball appliances, they actually put MinIO to make it look like one unit because MinIO can erasure-code objects across multiple Snowball appliances. And the MC tool, unlike AWS CLI, which is really meant for developers, like low-level calls, MC gives you unique [scoring 00:21:08] tools, like ls, cp, rsync-like tools, and it's easy to move and copy and migrate data. Actually, that's how people deal with it.Corey: Oh, God. I hadn't even considered the problem of having a fleet of Snowball edges here that you're trying to do a mass data migration on, which is basically how you move petabyte-scale data, is a whole bunch of parallelism. But having to figure that out on a case-by-case basis would be nightmarish. That's right, there is no good way to wind up doing that natively.AB: Yeah. In fact, Western Digital and a few other players, too, now the Western Digital created a Snowball-like appliance and they put MinIO on it. And they are actually working with some system integrators to help customers move lots of data. But Snowball-like functionality is important and more and more customers need it.Corey: This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your. Engineers. Are. Burned. Out. They're tired from pagers waking them up at 2 am for something that could have waited until after their morning coffee. Ring Ring, Who's There? It's Nagios, the original call of duty! 
They're fed up with relying on two or three different “monitoring tools” that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there's a better way. Observability tools like Honeycomb (and very little else because they do admittedly set the bar) show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and, most importantly, great for your customers. Try FREE today at honeycomb.io/screaminginthecloud. That's honeycomb.io/screaminginthecloud.Corey: Increasingly, it felt like, back in the on-prem days, that you'd have a file server somewhere that was either a SAN or it was going to be a NAS. The question was only whether it presented it to various things as a volume or as a file share. And then in cloud, the default storage mechanism, unquestionably, was object store. And now we're starting to see it come back again. So, it started to increasingly feel, in a lot of ways, like Cloud is no longer so much a place that is somewhere else, but instead much more of an operating model for how you wind up addressing things.I'm wondering when the generation of prosumer networking equipment, for example, is going to say, “Oh, and send these logs over to what object store?” Because right now, it's still write a file and SFTP it somewhere else, at least the good ones; some of the crap ones still want old unencrypted FTP, which is neither here nor there. But I feel like it's coming back around again. Like, when do even home users wind up instead of where do you save this file to having the cloud abstraction, which hopefully, you'll never have to deal with an S3-style endpoint, but that can underpin an awful lot of things. It feels like it's coming back and that's cloud is the de facto way of thinking about things. Is that what you're seeing? 
Does that align with your belief on this?

AB: I actually fundamentally believe, in the long run, that applications will go SaaS, right? Like, if you remember the days that you used to install QuickBooks and ACT and stuff in your data center—you used to run your own Exchange servers—those days are gone. I think these applications will become SaaS. But then the infrastructure building blocks for these SaaS products, whether they are on cloud or their own colo, I think in the long run it will be multi-cloud and colo all combined, and all of them will look alike.

But what I find from the customer's journey is that the Old World and the New World are incompatible. When they shifted from bare metal to virtualization, they didn't have to rewrite their application. But this time, it's a tectonic shift. Every single application, you have to rewrite. If you retrofit your application into the cloud, bad idea, right? It's going to cost you more, and I would rather not do it.

Even though cloud players are trying to make, like, the file and block services [unintelligible 00:24:01] and stuff, they make them available ten times more expensive than object, but it's just to [integrate 00:24:07] some legacy applications. It's still a bad idea to just move legacy applications there. But what I'm finding is that on cost, if you still run your infrastructure with an enterprise IT mindset, you're out of luck. It's going to be super expensive and you're going to be left out of modern infrastructure, because at that scale, it has to be treated as code. You have to run infrastructure with software engineers. And this cultural shift has to happen.

And that's why, in the long run, everyone will look like AWS—we always said that, and it's now becoming true. Like, Kubernetes and MinIO are basically leveling the ground everywhere. They're giving ECS- and S3-like infrastructure inside AWS or outside AWS, everywhere.
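[Editor's note: the practical upshot of "everything looks like S3, inside or outside AWS" is that application code stays the same and only the endpoint and credentials change. A minimal sketch, with entirely hypothetical endpoints and keys, of the kind of client configuration an S3-compatible SDK such as boto3 accepts:]

```python
def s3_client_config(endpoint_url, access_key, secret_key, region="us-east-1"):
    """Everything speaks the same S3 API; only the endpoint changes."""
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }

# Three targets, one application: AWS itself, a MinIO cluster in a colo,
# and a MinIO instance on a laptop. All values below are made up.
aws   = s3_client_config("https://s3.amazonaws.com", "AKIA-EXAMPLE", "secret")
colo  = s3_client_config("https://minio.colo.example.net:9000", "minio-user", "secret")
local = s3_client_config("http://localhost:9000", "minioadmin", "minioadmin")

# The code that reads and writes objects is identical in all three cases.
assert aws.keys() == colo.keys() == local.keys()
```

That interchangeability is the "leveling the ground" AB describes: the storage API is no longer what ties a workload to one provider.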
But what I find the challenging part is the cultural mindset. If they still have the old cultural mindset and they want to adopt cloud, it's not going to work.

You have to change the DNA, the culture, the mindset, everything. The best way to do it is go cloud-first. Adopt it, modernize your application, learn how to run and manage infrastructure, then ask the economics question—the unit economics. Then you will find the answers yourself.

Corey: On some level, that is the path forward. I feel like there's just a very long tail of systems that have been working and have been meeting the business objective. And, well, “we should go and refactor this because, I don't know, a couple of folks on a podcast said we should” isn't the most compelling business case for doing a lot of it. It feels like these things sort of sit there until there is more upside than just cost-cutting to changing the way these things are built and run. That's the reason that people have been talking about getting off of mainframes since the '90s in some companies, and the mainframe is very much still there. It is so ingrained in the way that they do business, they'd have to rethink a lot of the architectural things that have sprung up around it.

I'm not trying to shame anyone for the [laugh] state that their environment is in. I've never yet met a company that was super proud of its internal infrastructure. Everyone's always apologizing because it's a fire. But they think someone else has figured this out somewhere and it all runs perfectly. I don't think it exists.

AB: What I am finding is that if you are running it the enterprise IT style—you are the one telling the application developers, “Here you go, you have this many VMs, and then you have, like, a VMware license and, like, JBoss, like WebLogic, and a SQL Server license; now you go build your application”—you won't be able to do it.
Because application developers talk about Kafka and Redis and, like, Kubernetes; they don't speak the same language. And that's when these developers go to the cloud and finish their application—take it live from zero lines of code—before IT can even procure infrastructure and provision it for them. The change that has to happen is: how can you give what the developers want? Now that reverse journey is also starting. In the long run, everything will look alike, but what I'm finding is that if you're running enterprise IT infrastructure, traditional infrastructure, they are ashamed of talking about it.

But then you go to the cloud, and then at scale, some parts of it you want to move—now you really know why you want to move. For economic reasons—particularly, the data-intensive workloads become very expensive. And at that point, they go to a colo but leave the applications on the cloud. So, the multi-cloud model, I think, is inevitable. The expensive pieces—if you are looking at yourself as a hyperscaler and your data is growing, if your business focus is a data-centric business, parts of the data and data analytics, ML workloads will actually move out, if you're looking at unit economics. If all you are focused on is productivity, stick to the cloud and you're still better off.

Corey: I think that's a divide that gets lost sometimes. When people say, “Oh, we're going to move to the cloud to save money,” it's, “No, you're not.” At a five-year time horizon, I would be astonished if that juice were worth the squeeze in almost any scenario. The reason you go, therefore, is for a capability story, when it's right for you.

That also means that steady-state workloads that are well understood can often be run more economically in a place that is not the cloud. Everyone thinks for some reason that I tend to be “it's cloud or it's trash.”
No, I'm a big fan of doing things that are sensible, and cloud is not the right answer for every workload under the sun. Conversely, when someone says, “Oh, I'm building a new e-commerce store,” or whatnot, “and I've decided cloud is not for me,” it's, “Ehh, you sure about that?”

That sounds like you are smack-dab in the middle of the cloud use case. But all these things wind up acting as constraints and strategic objectives. And technology and single-vendor answers are rarely going to be a panacea the way that their sales teams say they will.

AB: Yeah. And I find, like, organizations that have SREs, DevOps, and software engineers running the infrastructure, they actually are ready to go multi-cloud or go to colo, because they exactly know—they have the container and Kubernetes microservices expertise. If you are still on a traditional SAN, NAS, and VM architecture: go to cloud, rewrite your application.

Corey: I think there's a misunderstanding in the ecosystem around what cloud repatriation actually looks like. Everyone claims it doesn't exist because there are basically no companies out there worth mentioning that are, “Yep, we've decided the cloud is terrible, we're taking everything out, and we are going to data centers. The end.” In practice, it's individual workloads that do not make sense in the cloud. Sometimes just the back-of-the-envelope analysis means it's not going to work out, other times during proofs of concept, and other times, as things have hit a certain point of scale, where an individual workload being pulled back makes an awful lot of sense. But everything else is probably going to stay in the cloud, and these companies don't want to wind up antagonizing the cloud providers by talking about it in public. But that model is very real.

AB: Absolutely.
Actually, what we are finding on the application side is that parts of their overall ecosystem, right, within the company, run on the cloud, but on the data side, some of the examples are in the range of 100 to 500 petabytes. The 500-petabyte customer actually started at 500 petabytes, and their plan is to go to exascale. And they are actually doing repatriation because, for them, their customers—it's consumer-facing and extremely price sensitive, and when you're consumer-facing, every dollar you spend counts. And if you don't do it at scale, it matters a lot, right? It will kill the business.

Particularly in the last two years, the cost part became an important element in their infrastructure; they knew exactly what they wanted. They are thinking of themselves as hyperscalers. They get commodity—the same hardware, right, just a server with a bunch of [unintelligible 00:30:35] and network—and put it in a colo, or even lease these boxes; they know what their demand is. Even at ten petabytes, the economics starts impacting. If you're processing it, on the data side, we have several customers now moving to colo from cloud, and this is the range we are talking about.

They don't talk about it publicly because sometimes, like, you don't want to be anti-cloud, but I think for them, they're also not anti-cloud. They don't want to leave the cloud. Completely leaving the cloud is a different story. That's not the case. Applications stay there. Data lakes, data infrastructure, object store in particular—those go to a colo.

Now, your applications from all the clouds can access this centralized—centralized meaning that one object store you run in a colo, and the colos themselves have worldwide data centers. So, you can keep the data infrastructure in a colo, but applications can run on any cloud; some of them, surprisingly, have a global customer base. And not all of them are cloud.
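[Editor's note: the "even at ten petabytes, the economics starts impacting" argument is a fixed-versus-variable cost comparison. A back-of-the-envelope sketch follows; every price below is a made-up placeholder, not real cloud or colo pricing, so the point is the shape of the curve, not the numbers.]

```python
def monthly_cost(petabytes, price_per_tb_month, fixed_per_month=0.0):
    """Monthly storage bill: variable $/TB-month plus any fixed overhead."""
    return petabytes * 1000 * price_per_tb_month + fixed_per_month

# Illustrative assumptions only: cloud is pure variable cost; running your
# own storage trades a lower per-TB rate for fixed costs (space, staff, network).
CLOUD_PER_TB = 21.0       # assumed $/TB-month, cloud object storage
COLO_PER_TB = 6.0         # assumed $/TB-month, self-run (hardware amortized)
COLO_FIXED = 40_000.0     # assumed fixed $/month for the colo footprint

for pb in (1, 10, 100):
    cloud = monthly_cost(pb, CLOUD_PER_TB)
    colo = monthly_cost(pb, COLO_PER_TB, COLO_FIXED)
    print(f"{pb:>4} PB: cloud ${cloud:>12,.0f}/mo  colo ${colo:>12,.0f}/mo")
```

Under these assumed numbers, the fixed overhead makes colo lose at one petabyte and win comfortably by ten, which is the crossover dynamic AB describes; with your own quotes, the crossover moves but the shape stays.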
Sometimes, for some applications, if you ask what type of edge devices or edge data centers they are running, they said it's a mix of everything. What really matters is not the infrastructure. Infrastructure, in the end, is CPU, network, and drives. It's a commodity. It's really the software stack: you want to make sure that it's containerized and easy to deploy and roll out updates; you have to learn the Facebook-Google style of running a SaaS business. That change is coming.

Corey: It's a matter of time and it's a matter of inevitability. Now, nothing ever stays the same. Everything always inherently changes in the full sweep of things, but I'm pretty happy with where I see the industry going these days. I want to start seeing a little bit less centralization around one or two big companies, but I am confident that we're starting to see an awareness of doing these things for the right reasons more broadly permeating.

AB: Right. Like, competition is always great for customers. They get to benefit from it. So, decentralization is a path to commoditizing the infrastructure. I think the bigger picture for me—what I'm particularly happy about—is that for a long time we carried industry baggage in the infrastructure space.

No one wants to change; no one wants to rewrite applications. As part of the equation, we carried the, like, POSIX baggage, like SAN and NAS. You can't even do [unintelligible 00:32:48] as a Service, NFS as a Service. It's too much baggage. All of that is getting thrown out. Like, the cloud players have helped the customers start with a clean slate. To me, that's the biggest advantage. And now that we have a clean slate, we can go on a whole new evolution of the stack, keeping it simpler, and everyone can benefit from this change.

Corey: Before we wind up calling this an episode, I do have one last question for you.
As I mentioned at the start, you're very much open-source, as in legitimate open-source, which means that anyone who wants to can grab an implementation and start running it. How do you, I guess, make peace with the fact that the majority of your user base is not paying you? And I guess, how do you get people to decide, “You know what? We like the cut of his jib. Let's give him some money.”

AB: Mm-hm. Yeah, if I looked at it that way, right—I have both the [unintelligible 00:33:38], right, on the open-source side as well as the business. But I don't see them to be conflicting. If I run as a charity, right—like, I take donations; if you love the product, here is the donation box—then that doesn't work at all, right?

I shouldn't take investor money, and I shouldn't have a team, because I have a job to pay their bills, too. But I actually find open-source to be incredibly beneficial. For me, it's about delivering value to the customer. If you pay me $5, I ought to make you feel $50 worth of value. The same software you would buy from a proprietary vendor—if I'm a customer and the software is equal in functionality, one proprietary and one open-source, I would actually prefer the open-source one and pay even more.

But why are customers really paying me now, and what's our view on open-source? I'm actually the free software guy. Free software and open-source are actually not exactly equal, right? We are the purists of the open-source community and we have strong views on what open-source means, right? That's why we call it free software. And free here means freedom, right? Free does not mean gratis, free of cost. It's actually about freedom, and I deeply care about it.

For me it's a philosophy and it's a way of life. That's why I don't believe in open core and other models—holding back, giving crippleware, is not open-source, right? I give you some freedom but not all—it breaks the spirit.
So, MinIO is a hundred percent open-source, but it's open-source for the open-source community. We did not take some community-developed code and then add commercial support on top.

We built the product, we believed in open-source, we still believe, and we will always believe. Because of that, we open-sourced our work. And it's open-source for the open-source community. And as you build applications—like, the AGPL license on the derivative works—they have to be compatible with the AGPL, because we are the creator. If you cannot open-source your application, your derivative works, you can buy a commercial license from us. We are the creator; we can give you a dual license. That's how the business model works.

That way, the open-source community completely benefits. And it's about software freedom. There are customers for whom open-source is a good thing, and they want to pay because it's open-source. There are some customers who want to pay because they can't open-source their application and derivative works, so they pay. It's a happy medium; that way I actually find open-source to be incredibly beneficial.

Open-source gave us trust, more than adoption rate. It's not just free to download and use. More than that, the customers that matter, the community that matters, can see the code and they can see everything we did. It's not “because I said so”—marketing and sales, you believe them, whatever they say. You download the product, experience it, and fall in love with it, and then when it becomes an important part of your business, that's when they engage with us, because they talk about license compatibility and data loss or a data breach; all that becomes important. I don't see open-source to be conflicting for business. It actually is incredibly helpful. And customers see that value in the end.

Corey: I really want to thank you for being so generous with your time.
If people want to learn more, where should they go?

AB: I was on Twitter, and now I think I'm spending more time on, maybe, LinkedIn. I think they can send me a request and then we can chat. And I'm always, like, spending time with other entrepreneurs, architects, and engineers, sharing what I learned, what I know, and learning from them. There is also a [community open channel 00:37:04]. And just send me a mail at ab@min.io; I'm always interested in talking to our user base.

Corey: And we will, of course, put links to that in the [show notes 00:37:12]. Thank you so much for your time. I appreciate it.

AB: It's wonderful to be here.

Corey: AB Periasamy, CEO and co-founder of MinIO. I'm Cloud Economist Corey Quinn and this has been a promoted guest episode of Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice that presumably will also include an angry, loud comment that we can access from anywhere because of shared APIs.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Matty Stratton, Director of Developer Relations at Aiven, joins Corey on Screaming in the Cloud for a friendly debate on whether or not company employees can still be considered community members. Corey says no, but opens up his position to the slings and arrows of Matty in an entertaining change of pace. Matty explains why he feels company employees can still be considered community members, and also explores how that should be done in a way that is transparent and helpful to everyone in the community. Matty and Corey also explore the benefits and drawbacks of talented community members becoming employees.

About Matty
Matty Stratton is the Director of Developer Relations at Aiven, a well-known member of the DevOps community, founder and co-host of the popular Arrested DevOps podcast, and a global organizer of the DevOpsDays set of conferences.

Matty has over 20 years of experience in IT operations and is a sought-after speaker internationally, presenting at Agile, DevOps, and cloud engineering focused events worldwide. Demonstrating his keen insight into the changing landscape of technology, he recently changed his license plate from DEVOPS to KUBECTL.

He lives in Chicago and has three awesome kids, whom he loves just a little bit more than he loves Diet Coke.

Links Referenced:
Aiven: https://aiven.io/
Twitter: https://twitter.com/mattstratton
Mastodon: hackyderm.io/@mattstratton
LinkedIn: https://www.linkedin.com/in/mattstratton/

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize.
This is Screaming in the Cloud.

Corey: This episode is brought to us in part by our friends at Min.io.

With more than 1.1 billion Docker pulls—most of which were not due to an unfortunate loop mistake, like the kind I like to make—and more than 37 thousand GitHub stars (which are admittedly harder to get wrong), MinIO has become the industry standard alternative to S3. It runs everywhere: public clouds, private clouds, Kubernetes distributions, bare metal, Raspberry Pis, colocations—even in AWS Local Zones. The reason people like it comes down to its simplicity, scalability, enterprise features, and best-in-class throughput. Software-defined and capable of running on almost any hardware you can imagine—and some you probably can't—MinIO can handle everything you can throw at it, and AWS has imagined a lot of things, from data lakes to databases.

Don't take their word for it, though: check it out at www.min.io and see for yourself. That's www.min.io.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I am joined today by returning guest, my friend and yours, Matty Stratton, Director of Developer Relations at Aiven. Matty, it's been a hot second. How are you?

Matty: It has been a while, but it's been pretty good. We have to come back to something that just occurred to me when we think about the different things we've talked about. There was a point of contention about prior art of the Corey Quinn face in photos. I don't know if you saw that discourse; we may have to have a conversation. There may be some absent—

Corey: I did not see—

Matty: Okay.

Corey: —discourse, but I also would accept freely that I am not the first person to ever come up with the idea of opening my mouth and looking ridiculous for a photograph either.

Matty: That's fair, but the thing that I think was funny—and if you don't mind, I'll just go ahead and throw this out here—is that I didn't put two and two together.
So, I posted a picture on Twitter a week or so ago that was primarily to show off the fact—it was a picture of me in 1993, and the point was that my jeans were French-rolled and were pegged. But in the photo, I am doing kind of the Corey Quinn face, and so people said, “Oh, is this prior art?” And I said, you know what? I actually just remembered, and I've never thought about this before, but one of my friends in high school, for his senior year ID, took a picture—his picture looks like, you know, that kind of three-quarters turn with the mouth opening, going, “Ah,” you know?

And he loved that picture—number one, he loved that picture so much that this guy carried his senior year high school ID in his wallet until we were like 25, because it was his favorite picture of himself. But in every photo—and I saw this from looking through my yearbook—of my friend Jay when we were seniors, he's doing the Corey Quinn face. And he is anecdotally part of the DevOps community now a little bit, too, and I haven't pointed this out to him. But people were saying that, you know, mine was prior art on yours, and I said, “Actually, I was emulating yet someone else.”

Corey: I will tell you the actual story of how it started. It was at re:Invent, I want to say 2018 or so, and what happened was someone—they were a big fan of the newsletter—at sort of the start of re:Invent, they said, “Hey, can I get a selfie with you?” And I figured, sure, why not. And the problem I had is I've always looked bad in photographs. And okay, great: if I'm going to have a photo taken of me that's going to be ridiculous, why not, as a lark, go ahead and do this for fun during the course of re:Invent this year?

So, whenever someone asked for a selfie, I'd slap the big happy open-mouth smile on my face. And people thought, “Oh, my God, this is amazing.” And I don't know that it was necessarily worth that level of enthusiasm, but okay. I'll take it.
I'm not here to tell people they're wrong when they enjoy a joke that I'm putting out there.

And it just sort of stuck. And I think the peak of it, which I don't think I'm ever going to be able to beat, is that I actually managed to pull that expression on my driver's license.

Matty: Wow.

Corey: Yeah.

Matty: That's—

Corey: They don't have a sense of humor that they are aware of at the DMV.

Matty: No, they really don't. And having been to the San Francisco DMV and knowing how long it takes to get in there, like, that was a bit of a risk on your part, because if they decided to change their mind, you wouldn't be able to come back for another four months [laugh].

Corey: It amused me to do it, so why not? What else was I going to do? I brought my iPad with me, it has cellular on it, so I can just work remotely from there. It was either that or working in my home office again, and frankly, at the height of the pandemic, I could use the break.

Matty: Yes [laugh]. That's saying something, when the break you can use is going to the DMV.

Corey: Right.

Matty: That's a little bit of where we were at. I think, just real quick, thinking about that, there's a lot to be said for that kind of idea of making a face—whether it's silly or not—and having something common. Especially if you do a lot of photos, you don't have to think about, like, how do I look? You just know what you do. Because if you think about it, it's about cultivating your smile, cultivating your look for your photos, and just sort of having a way so you know what to do every time. I guess that's a, you know, maybe a model tip or something. I don't know. But you might be onto something.

Corey: I joke that my entire family motto is “never be the most uncomfortable person in the room.” And there's something to be said for it, where if you're going to present a certain way, make it your own. Find a way to at least stand out.
If nothing else, it's a bit different. Most people don't do that.

Remember, we all got made fun of—generally women, for some reason—back about 15 years ago or so for duck face, where in all the pictures you're making duck face. And well, there are reasons why that is a flattering way to present your face. But if there's one thing we love as a society, it's telling women they're doing something wrong.

Matty: Yeah.

Corey: So yeah, there's a whole bunch of ways you're supposed to take selfies or whatnot. Honestly, I'm in no way, shape, or form pretty enough or young enough to care about any of them. At this point, it's what I do when someone busts out a camera, and that's the end of it. Now, am I the only person to do this? Absolutely not. Do I take ownership of it? No. Someone else wants to do it, they need give no credit. The idea probably didn't come from me.

Matty: And to be fair, if I'm a little bit taking the mickey there or whatever about prior art, it was more that I thought it was funny, because I had not even—it was this thing where it was like, this is a good friend of mine, probably somebody I've been friends with longer than anyone in my whole life, and it was a core part [laugh] of his personality when we were 18 and 19, and I just never directly, like, made that connection. And then it happened to me and I went, “Oh, my God. Jason and Corey did the same thing.” [laugh]. It was—

Corey: No, it feels like parallel evolution.

Matty: Yeah, yeah. It was more me never having connected those dots. And again, you making that face for your DMV photo amused you; me talking about this for the last three minutes on a podcast amused me. So.

Corey: And let's also be realistic here. How many ways are there to hold your face during a selfie that are distinguishable and worthy of comment? Usually, it's like, okay, well, he has this weird sardonic half-smile with an eyebrow ar—no. His mouth was wide open.
We're gonna go with that.

Matty: You know, there's a little—I want to kind of—because I think there's actually quite a bit to the lesson from any of this. I think about—follow me here; maybe I'll get to the right place—like me and karaoke. No one would ever accuse me of being a talented singer, right? I'm not going to sing well in a way where people are going to be moved by my talent. So instead, I have to go a different direction. I have to go funny.

But what it boils down to is I only do karaoke well when it's a song where I can feel like I'm doing an impression of the singer. So, for example, the B-52s: I do a very good impression of Fred Schneider, so I can sing a B-52s song all day long. I actually do better with Pearl Jam than I should be able to with my terrible voice, because I'm doing an Eddie Vedder impression.

So, what I'm getting at is you're sort of taking this thing where you're saying, okay, to your point—and your words, not mine—[where 00:07:09] somebody would say, “The picture is not going to be of me looking like a Blue Steel runway model, so I might as well look goofy.” You know? Take it that way and be funny with it. And also, every time, it's the same way. So I think it's a matter of kind of owning the conversation, you know, and saying, how do you accentuate the thing that you can do? I don't know. There's something about DevOps somehow in there.

Corey: So, I am in that uncomfortable place right now between having finalized a blog post slash podcast that's going out two days from this recording—so it will go out before you and I have this discussion publicly, but it's also too late for me to change any of it—so I figured I will open myself up to the slings and arrows of you, more or less. And you haven't read this thing yet, which is even better, so you're now going to be angry about an imperfect representation of what I said in writing.
But the short version is this: if you work for a company as their employee, then you are no longer a part of that company's community, as it were. And yes, that's nuanced, and it's an overbroad statement, and there are a bunch of ways that you could poke holes in it, but I'm curious to get your take on the overall positioning of it.

Matty: So, at face value, I would vehemently disagree with that statement. And by that I mean that I have spent years of my life tilting at the opposite windmill, which is: just because you work at this company doesn't mean you do not participate in the community and should not consider yourself a part of the community, first and foremost. But, again, like everything else, it depends. It depends on a lot of things, and I hope we can kind of explore that a little bit, because just as much as I would take umbrage, if you will, with the statement that if you work at the company, you stop being part of the community, I would also have an issue with “you're just automatically part of the community,” right? Because these things take effort.

And I feel like I've been, as a devreloper, or whatever—Corey, how do you say it?

Corey: Yep. No, you're right on. Devreloper.

Matty: As a—or I would say, as a DevRel. Although people on Twitter are angry about using the word DevRel to describe a person—“I'm a DevRel.” “DevRel is a department.” It's the “DevOps engineer” thing again, except it's, like, actually wrong. But anyway, you kind of run into this—for example, and I'm going to not name names here—but, like, say, Twitter for Pets. The—what do you—by the way, Corey, what are you going to do now for your made-up company, now that Twitter is not fun for this anymore? You can't have Twitter for Pets anymore.

Corey: I know I'm going to have to come up with a new joke. I don't quite know what to do with myself.

Matty: This is really hard.
While we will pretend Twitter for Pets is still around a little bit, even though its API is getting shut down.

Corey: Exactly.

Matty: So okay, so we're over here at Twitter for Pets, Inc. And we've got our—

Corey: Twitter for Bees, because you know it'll at least have an APIary.

Matty: Yeah. Ha. We have our team of devrelopers and community managers and community engineers that work at Twitter for Pets, and we have all of our software engineers and different people. And now we're going to have a Twitter for Pets community something, right? We have our community, we have our area, our place that we interact, whether it's in person or virtual, whether it's an event, whether it's our Discord or Discourse or Slack or whatever [doodlee 00:10:33] thing we're doing these days. And a lot of times, all those engineers and people whose title does not have the word “community” in it are like, “Oh, good. Well, we have people that do that.”

So, number one, no. Now we have people whose priority it is; like, we have more intentionality. So, if I work on the community team, if I'm a dev advocate or something like that, my priority is communicating and advocating to and for that community. But it's a little bit of the, you know, Office Space thing: I take the requirements from the [unintelligible 00:11:07] to the people, I give them to the engineers. I've got people—so, like, you shouldn't have to have a go-between, right?
And there's actually quite a bit to this.

So, I think this sort of assumption that you're not part of it and you have no responsibility towards that community—first of all, you're missing a lot as a person, because that's just how you end up with people building a thing they don't understand.

Corey: Oh, I think you have tremendous responsibility to the community, but whether you're a part of it and having responsibility to it are not aligned in my mind.

Matty: So… maybe let's take a second. What do you mean by being a part of it?

Corey: Right. Where very often I'll see a certain, I don't know, very large cloud provider will have an open-source project. Great, so you go and look at the open-source project, and the only people with commit access are people who work at that company. That is an easy-to-make-fun-of example of this. Another is when the people who are in a community, talking about how they perceive things and putting out content about how they've interacted with various aspects of it, start to work there; you see areas where it starts to call its authenticity into question.

AWS is another great example of this. As someone in the community, I can talk about how I would build something on top of AWS, but then move this thing onto Fastly instead of CloudFront because CloudFront is terrible. If you work there, you're not going to be able to say the same thing. So, even if you're not being effusive with praise, there are certain guardrails and constraints that keep you from saying what you might otherwise, just based upon the sheer self-interest that comes from the fact that the company whose product or service you're talking about is also signing your paycheck and choosing to continue to do so.

Matty: And I think it's even less about that being where your paycheck is coming from. There's also just a gravitational pull towards those solutions, because that's just what you're spending your day with, right? You know—

Corey: Yeah.
And you also don't want to start and admit even to yourself, in some cases, that okay, this aspect of what our company does is terrible, so companies—people shouldn't use it. You want to sort of ignore that, on some level, psychologically because that dissonance becomes harmful.Matty: Yeah. And I think there's—so again, this is where things get nuanced and get to levels. Because if you have the right amount of psychological safety in your organization, the organization understands what it's about to that. Because even people whose job is to be a community person should be able to say, “Hey, this is my actual opinion on this. And it might be contrary to the go-to-market where that comes in.”But it's hard, especially when it gets filtered through multiple layers and now you've got a CEO who doesn't understand that nuance who goes, “Wait, why was Corey on some podcast saying that the Twitter for Pets API is not everything it could possibly be?” So, I do think—I will say this—I do think that organizations and leadership are understanding this more than they might have in the past, so we are maybe putting on ourselves this belief that we can't be as fully honest, but even if it's not about hiding the warts, even if it's just a matter of also, you're just like, hey, chances are—plus also to be quite frank, if I work at the company, I probably have access to way more shit than I would have to pay for or do whatever and I know the right way. But here's the trick, and I won't even say it's a dogfooding thing, but if you are not learning and thinking about things the way that your users do—and I will even say that that's where—it is the users, which are the community, that community or the people that use your product or are connected to it, they don't use it; they may be anecdotal—or not anecdotally, maybe tangentially connected. I will give an example. 
And there was a place I was working where it was very clear, like, we had a way to, you know, do open-source contributions back of a type of a provider plug-in, whatever you want to call it, and I worked at the company and I could barely figure out how to follow the instructions.Because it made a lot of sense to someone who built that software all day long and knew the build patterns, knew all that stuff. So, if you were an engineer at this company, “Well, yeah, of course. You just do this.” And anybody who puts the—connects the dots, this has gotten better—and this was understood relatively quickly as, “Oh, this is the problem. Let's fix it.” So, the thing is, the reason why I bring this up is because it's not something anybody does intentionally because you don't know what you don't know. And—Corey: Oh, I'm not accusing anyone of being a nefarious actor in any of this. I also wonder if part of this comes from your background as being heavily involved in the Chef community as a Chef employee and as part of the community around that, which is inherently focused on an open-source product that a company has been built around, whereas my primary interaction with community these days is the AWS community, where it doesn't matter whether you're large or small, you are not getting much, if anything, for free from AWS; you're all their customers and you don't really have input into how something gets built, beyond begging nicely.Matty: That's definitely true. And I think we saw that and there were things, when we look at, like, how community, kind of, evolved or just sort of happened at Chef and why we can't recreate it the same way is there was a certain inflection point of the industry and the burgeoning DevOps movement, and there wasn't—you know, so a lot of that was there. 
But one of the big problems, too, is, as Corey said, everybody—I shouldn't say every, but I've from the A—all the way up to AWS to your smaller startups will have this problem of where you end up hiring in—whether you want to or not—all of your champions and advocates and your really strong community members, and then that ends up happening. So, number one, that's going to happen. So frankly, if you don't push towards this idea, you're actually going to have people not want to come work because you should be able to be still the member that you were before.And the other thing is that at certain size, like, at the size of a hyperscaler, or, you know, a Microsoft—well, anybody—well, Microsoft's not a hyperscaler, but you know what I'm saying. Like, very, very large organization, your community folks are not necessarily the ones doing that hiring away. And as much as they might—you know, and again, I may be running the community champion program at Microsoft and see that you want—you know, but that Joe Schmo is getting hired over into engineering. Like, I'm not going to hire Joe because it hurts me, but I can't say you can't, you know? It's so this is a problem at the large size.And at the smaller size, when you're growing that community, it happens, too, because it's really exciting. When there's a place that you're part of that community, especially when there's a strong feel, like going to work for the mothership, so to speak is, like, awesome. So again, to give an example, I was a member of the Chef community, I was a user, a community person well before, you know, I went and, you know, had a paycheck coming out of that Seattle office. And it was, like, the coolest thing in the world to get a job offer from Ch—like, I was like, “Oh, my God. 
I get to actually go work there now.” Right?And when I was at Pulumi, there were quite a few people I could think of who I knew through the community who then got jobs at Pulumi and were so excited, and I imagine still excited, you know? I mean, that was awesome to do. So, it's hard because when you get really excited about a technology, then being able to say, “Wait, I can work on this all the time?” That sounds awesome, right? So like, you're going to have that happen.So, I think what you have to do is rather than prevent it from happening because number one, like, you don't want to actually prevent that from happening because those people will actually be really great additions to your organization in lots of ways. Also, you're not going to stop it from happening, right? I mean, it's also just a silly way to do it. All you're going to do is piss people off, and say, like, “Hey, you're not allowed to work here because we need you in the community.” Then they're going to be like, “Great. Well, guess what I'm not a part of anymore now, jerk?” Right? You know [laugh] I mean so—Corey: Exactly.Matty: Your [unintelligible 00:18:50] stops me. So, that doesn't work. But I think to your point, you talked about, like, okay, if you have what's ostensibly a community project, but all the maintainers are from one—are from your company, you know? Or so I'm going to point to an example of, we had—you know, this was at Pulumi, we had a Champions program called Puluminaries, and then there's something similar to like Vox Populi, but it was kind of the community that was not run by Pulumi Inc. in that case.Now, we helped fund it and helped get it started, but there were rules about the, you know, the membership of the leadership, steering committee or board or whatever it was called, there was a hard limit on the number of people that could be Pulumi employees who were on that board. 
And it actually, as I recall when I was leaving—I imagine this is not—[unintelligible 00:19:41] does sometimes have to adjust a couple of things because maybe those board members become employees and now you have to say, you can't do that anymore or we have to take someone down. But the goal was to actually, you know, basically have—you know, Pulumi Corp wanted to have a voice on that board because if for no other reason, they were funding it, but it was just one voice. It wasn't even a majority voice. And that's a hard sell in a lot of places too because you lose control over that.There's things I know with, uh—when I think about, like, running meetup communities, like, we might be—well I mean, this is not a big secret, I mean because it's been announced, but we're—you know, Aiven is helping bootstrap a bunch of data infrastructure meetups around the world. But they're not Aiven meetups. Now, we're starting them because they have to start, but pretty much our approach is, as soon as this is running and there's people, whether they work here, work with us or not, they can take it, right? Like, if that's go—you know? And being able to do that can be really hard because you have to relinquish the control of your community.And I think you don't have to relinquish a hundred percent of that control because you're helping facilitate it because if it doesn't already have its own thing—to make sure that things like code of conduct and funding of it, and there's things that come along with the okay, we as an organization, as a company that has dollars and euros is going to do stuff for this, but it's not ours. And that's the thing to remember is that your community does not belong to you, the company. You are there to facilitate it, you are there to empower it, you're there to force-multiply it, to help protect it. 
And yeah, you will probably slurp a whole bunch of value out of it, so this is not magnanimous, but if you want it to actually be a place it's going to work, it kind of has to be what it wants to be. But by the same token, you can't just sort of sit there and be like, “I'm going to wait for this community to grow up around me without anything”—you know.So, that's why you do have to start one if there is quote-unquote—maybe if there's no shape to one. But yeah, I think that's… it is different when it's something that feels a little—I don't even want to say that it's about being open-source. It's a little bit less about it being a SaaS or a service, or if it's something that you—I don't know.Corey: This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your. Engineers. Are. Burned. Out. They're tired from pagers waking them up at 2 am for something that could have waited until after their morning coffee. Ring Ring, Who's There? It's Nagios, the original call of duty! They're fed up with relying on two or three different “monitoring tools” that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there's a better way. Observability tools like Honeycomb (and very little else because they do admittedly set the bar) show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and, most importantly, great for your customers. Try FREE today at honeycomb.io/screaminginthecloud. That's honeycomb.io/screaminginthecloud.Corey: Yeah, I think you're onto something here. I think another aspect where I found it to be annoying is when companies view their community as, let's hire them all. And I don't think it ever starts that way. 
I think that it starts as, well these are people who are super-passionate about this, and they have great ideas and they were great to work with. Could we hire them?And the answer is, “Oh, wait. You can give me money for this thing I've been doing basically for free? Yeah, sure, why not?” And that's great in the individual cases. The problem is, at some point, you start to see scenarios where it feels like, if not everyone, then a significant vocal majority of the community starts to work there.Matty: I think less often than you might think is it done strategically or on purpose. There have been exceptions to that. There's one really clear one where it feels like a certain company a few years ago, hired up all the usual suspects of the DevOps community. All of a sudden, you're like, oh, a dozen people all went to go work at this place all at once. And the fun thing is, I remember feeling a little bit—got my nose a little out of joint because I was not the hiring mana—like, I knew the people.I was like, “Well, why didn't you ask me?” And they said, “Actually, you are more important to us not working here.” Now, that might have just been a way to salve my dude-in-tech ego or not, but whether or not that was actually true for me or not, that is a thing where you say you know, your folks—but I do think that particular example of, like, okay, I'm this, that company, and I'm going to go hire up all the usual suspects, I think that's less common. I think a lot of times when you see communities hire up those people, it's not done on purpose and in fact, it's probably not something they actually wanted to do en masse that way. 
But it happens because people who are passionate about your product, it's like I said before, it actually seems pretty cool to go work on it as your main thing.But I can think of places I've been where we had, you know—again, same thing, we had a Pulumi—we had someone who was probably our strongest, loudest, most vocal community member, and you know, I really wanted to get this person to come join us and that was sort of one of the conversations. Nobody ever said, “We won't offer this person a job if they're great.” Like, that's the thing. I think that actually kind of would be shitty to be like, “You're a very qualified individual, but you're more important to me out in the community so I'm not going to make you a job offer.” But it was like, Ooh, that's the, you know—it'd be super cool to have this person but also, not that that should be part of our calculus of decision, but then you just say, what do you do to mitigate that?Because what I'm concerned about is people hearing this the wrong way and saying, “There's this very qualified individual who wants to come work on my team at my company, but they're also really important to our community and it will hurt our community if they come work here, so sorry, person, we're not going to give you an opportunity to have an awesome job.” Like, that's also thinking about the people involved, too. But I know having talked to folks at lots of these different large organizations that have this problem, generally, those community folks, especially at those places, they don't want this [laugh] happening. They get frustrated by it. So, I mean, I'll tell you, it's you know, the—AWS is one of them, right?They're very excited about a lot of the programs and cool people coming from community builders and stuff and Heroes, you know. 
On one hand, it's incredibly awesome to have a Hero come work at AWS, but it hurts, right, because now they're not external anymore.Corey: And you stop being a Hero in that case, as well.Matty: Yeah. You do, yeah.Corey: Of course, they also lose the status if they go to one of their major competitors. So like, let me get this straight. You can't be a Hero if you work for AWS or one of its competitors. And okay, how are there any Heroes left at all at some point? And the answer is, they bound it via size and a relatively small list of companies. But okay.Matty: So, thinking back to your point about saying, okay, so if you work at the company, you lose some authenticity, some impartiality, some, you know… I think, rather than just saying, “Well, you're not part”—because that also, honestly, my concern is that your blog post is now going to be ammunition for all the people who don't want to act as members of the community for the company they work for now. They're going to say, well, Corey told me I don't have to. So, like I said, I've been spending the last few years tilting at the opposite windmill, which is getting people that are not on the community team to take part in community summits and discourse and things like that, like, you know, for that's—so I think the thing is, rather than saying, “Well, you can't,” or, “You aren't,” it's like, “Well, what do you do to mitigate those things?”Corey: Yeah, it's a weird thing because taking AWS as the example that I've been beating up on a lot, the vast majority of their employees don't know the community exists in any meaningful sense. Which, no fault to them. The company has so many different things, no one keeps up with at all. But it's kind of nuts to realize that there are huge communities of people out there using a thing you have built and you do not know that those users exist and talk to each other in a particular watering hole. And you of course, as a result, have no presence there. 
I think that's the wrong direction, too. But—Matty: Mm-hm.Corey: Observing the community and being part of the community, I think there's a difference. Are you a biologist or are you a gorilla?Matty: Okay, but [sigh] I guess that's sort of the difference, too, which—and it's hard, it's very hard to not just observe. Because I think that actually even taking the mentality of, “I am here to be Jane Goodall, Dr. Jane Goodall, and observe you while I live amongst you, but I'm not going to actually”—although maybe I'm probably doing her a disservice—I'm remembering my Goodall is… she was actually more involved. May be a bad example.Corey: Yeah. So, that analogy does fall apart a little bit.Matty: It does fall apart a little bit—Corey: Yeah.Matty: But it's, you know, kind of: am I sitting there taking field notes or am I actually engaging with you? Because there is a difference. Even if your main reason for being there is just purely to—I mean, this is not the Prime Directive. It's not Star Trek, right? You're not going to like, hold—you don't need to hold—I mean, do you have to hold yourself aloof and say, “I don't participate in this conversation; I'm just here to take notes?”I think that's very non-genuine at that point. That's over-rotating the other way. But I think it's a matter of in those spaces—I think there's two things. I think you have to have a way to be identified as you are an employee because that's just disclosure.Corey: Oh, I'm not suggesting by any stretch of the imagination, people work somewhere but not admit that they work somewhere when talking about the company. That's called fraud.Matty: Right. No, no, and I don't think it's even—but I'm saying beyond just, if it's not, if you're a cop, you have to tell me, right?Corey: [laugh].Matty: It's like, it's not—if asked, I will tell you I work at AWS. It's like in that place, it should say, “I am an AWS em—” like, I should be badged that way, just so it's clear. I think that's actually helpful in two ways. 
It's also helpful because it says like, okay, maybe you have a connection you can get for me somehow. Like, you might actually have some different insight or a way to chase something that, you know, it's not necessarily just about disclosure; it's also helpful to know.But I think within those spaces, that disclosure—or not disclosure, but being an employee does not offer you any more authority. And part of that is just having to be very clear about how you're constructing that community, right? And that's sort of the way that I think about it is, like, when we did the Pulumi Community Summit about a year ago, right? It was an online, you know, thing we did, and the timing was such that we didn't have a whole lot of Pulumi engineers who were able to join, but when we—and it's hard to say we're going to sit in an open space together and everybody is the same here because people also—here's the difference. You say you want this authority? People will want that authority from the people that work at the company and they will always go to them and say, like, “Well, you should have this answer. Can you tell me about this? Can you do this?”So, it's actually hard in both cases to have that two-way conversation unless you set the rules of that space such as, “Okay, I work at Aiven, but when I'm in this space, short of code of conduct or whatever, if I have to be doing that thing, I have no more authority on this than anyone else.” I'm in this space the same way everyone else is. You can't let that be assumed.Corey: Oh, and big companies do. It's always someone else's… there's someone else's department. Like, at some level, it feels like when you work in one of those enormous orgs, your remit is six inches wide.Matty: Well, right. Right. So, I think it's like your authority exists only so far as it's helpful to somebody. If I'm in a space as an Aivener, I'm there just as Matty the person. 
But I will say I work at Aiven, so if you're like, “God, I wish that I knew who was the person to ask about this replication issue,” and then I can be like, “Aha, I actually have backchannel. Let me help you with that.” But if I can say, “You know what? This is what I think about Kafka and I think why this is whatever,” like, you can—my opinion carries just as much weight as anybody else's, so to speak. Or—Corey: Yeah. You know, it's also weird. Again, community is such a broad and diverse term, I find myself in scenarios where I will observe and talk to people inside AWS about things, but I never want to come across as gloating somehow, that oh, I know, internal people that talk to you about this and you don't. Like, that's never how I want to come across. And I also, I never see the full picture; it's impossible for me to, so I never make commitments on behalf of other people. That's a good way to get in trouble.Matty: It is. And I think in the case of, like, someone like you who's, you know, got the connections you have or whatever, it's less likely for that to be something that you would advertise for a couple of reasons. Like, nobody should be advertising to gloat, but also, part of my remit as a member of a community team is to actually help people. Like, you're doing it because you want to or because it serves you in a different way. Like, that is literally my job.So like, it shouldn't be, like—like, because same thing, if you offer up your connections, now you are taking on some work to do that. Someone who works at the company, like, yes, you should be taking on that work because this is what we do. We're already getting paid for it, you know, so to speak, so I think that's the—Corey: Yeah.Matty: —maybe a nuance, but—Corey: Every once in a while, I'll check my Twitter spam graveyard, [unintelligible 00:32:01] people asking me technical questions months ago about various things regarding AWS and whatnot. 
And that's all well and good; the problem I have with it is that I'm not a support vector. I don't represent the company or work for them. Now, if I worked there, I'd feel obligated to make sure this gets handed to the right person. And that's important.The other part of it, though, is okay, now that that's been done and handed off, like do I shepherd it through the process? Eh. I don't want people to get used to asking people in DMs because again, I consider myself to be a nice guy, but if I'm some nefarious jerk, then I could lead them down a very dark path where I suddenly have access to their accounts. And oh, yeah, go ahead and sign up for this thing and I'll take over their computer or convince them to pay me in iTunes gift cards or something like that. No, no, no. Have those conversations in public or through official channels, just because I don't, I don't think you want to wind up in that scenario.Matty: So, my concern as well, with sort of taking the tack of you are just an observer of the community, not a part of it is, that actually can reinforce some pretty bad behavior from an organization towards how they treat the community. One of the things that bothers me—if we're going to go on a different rant about devrelopers like myself—is I like to say that, you know, we pride ourselves as DevRels as being very empathetic and all this stuff, but very happy to shit all over people that work in sales or marketing, based on their job title, right? And I'm like, “Wow, that's great,” right? We're painting with this broad brush. Whereas in reality, we're not separate from them.And so, the thing is, when you treat your community as something separate from you, you are treating it as something separate from you. And then it becomes a lot easier also, to not treat them like people and treat them as just a bunch of numbers and treat them as something to have value extracted from rather than it—this is actually a bunch of humans, right? 
And if I'm part of that, then I'm in the same Dunbar number a little bit, right? I'm in the same monkey sphere as those people because me, I'm—whoever; I'm the CTO or whatever, but I'm part of this community, just like Joe Smith over there in Paducah, you know, who's just building things for the first time. We're all humans together, and it helps to not treat it as the sort of amorphous blob of value to be extracted.So, I think that's… I think all of the examples you've been giving and those are all valid concerns and things to watch out for, the broad brush if you're not part of the community if you work there, my concern is that that leads towards exacerbating already existing bad behavior. You don't have to convince most of the people that the community is separate from them. That's what I'm sort of getting at. I feel like in this work, we've been spending so much time to try to get people to realize they should be acting like part of their larger community—and also, Corey, I know you well enough to know that, you know, sensationalism to make a point [laugh] works to get somebody to join—Corey: I have my moments.Matty: Yeah, yeah, yeah. I mean, there's I think… I'll put it this way. I'm very interested to see the reaction, the response that comes out in, well now, for us a couple of days, for you the listener, a while ago [laugh] when that hits because I think it is a, I don't want to say it's controversial, but I think it's something that has a lot of, um… put it this way, anything that's simple and black and white is not good for discussion.Corey: It's nuanced. And I know that whenever I wrote in 1200 words is not going to be as nuanced of the conversation we just had, either, so I'm sure people will have opinions on it. That'd be fun. It'd be a good excuse for me to listen.Matty: Exactly [laugh]. 
And then we'll have to remember to go back and find—I'll have to do a little Twitter search for the dates.Corey: We'll have to do another discussion on this, if anything interesting comes out of it.Matty: Actually, that would be funny. That would be—we could do a little recap.Corey: It would. I want to thank you so much for being so generous with your time. Where can people find you if they want to learn more?Matty: Well, [sigh] for the moment, [sigh] who knows what will be the case when this comes out, but you can still find me on Twitter at @mattstratton. I'm also at hackie-derm dot io—sorry, hackyderm.io. I keep wanting to say hackie-derm, but hackyderm actually works better anyway and it's funnier. But [hackyderm.io/@mattstratton](https://hackyderm.io/@mattstratton) is my Mastodon. LinkedIn; I'm around there. I need to play more at that. You will—also again, I don't know when this is coming out, so I won't tell you—you won't find me out traveling as much as you might have before, but DevOpsDays Chicago is coming up August 9th and 10th in Chicago, so at the time of listening to this, I'm sure our program will have been posted. But please come and join us. It will be our ninth time of hosting a DevOpsDay Chicago. And I have decided I'm sticking around for ten, so next year will be my last DevOpsDay that I'm running. So, this is the penultimate. And we always know that the penultimate is the best.Corey: Absolutely. Thanks again for your time. It's appreciated. Matty Stratton, Director of Developer Relations at Aiven. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment talking about how I completely missed the whole point of this community and failing to disclose that you are in fact one of the producers of the show.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Looks like it's time to update your iPhones, Macs, and iPads again. Apple has released security fixes for all affected devices and they've acknowledged that it may have been actively exploited. Jon, is this a huge concern and should we be updating our devices immediately? Why is this happening more and more? This and more on the Rundown. Time Stamps: 0:00 | Welcome to the Rundown 0:36 | AWS Modular Data Center for U.S. Department of Defense 3:19 | EDB Adds Encryption for PostgreSQL 7:21 | Breaking Up with Internet Explorer on Valentine's Day 10:30 | SuSE Launches ATIP for Modernizing Comms Networks 15:04 | MinIO and Veeam Team Up for Backup 18:53 | Apple Says Zero Day Vulnerability for iPhones and Macs May Have Been Exploited 28:00 | The Weeks Ahead 30:07 | Thanks for Watching Follow our hosts on Social Media Tom Hollingsworth: https://www.twitter.com/NetworkingNerd Stephen Foskett: https://www.twitter.com/SFoskett Jon Myer: https://www.twitter.com/_JonMyer Follow Gestalt IT Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789 Tags: @Apple @AWSCloud @Oracle @Microsoft @MinIO @Veeam #Storage #SuSE #CyberSecurity #AWS #InternetExplorer #EFD1 #TFD27
Write Admin tools from Day One, Differentiating between Data Security and Data Integrity, 45 year-old Unix tool is finally getting an upgrade, OpenBSD 7.2 on an ODROID-HC4, Dotfiles Management, and more NOTES This episode of BSDNow is brought to you by Tarsnap (https://www.tarsnap.com/bsdnow) and the BSDNow Patreon (https://www.patreon.com/bsdnow) Headlines Write Admin tools from Day One (https://milwaukeemaven.blogspot.com/2022/08/write-admin-tools-from-day-one.html) Differentiating between Data Security and Data Integrity (https://klarasystems.com/articles/openzfs-data-security-vs-integrity/) News Roundup This 45 year-old Unix tool is finally getting an upgrade (https://www.techradar.com/news/45-year-old-unix-tool-finally-gets-an-upgrade) Installing OpenBSD 7.2 on an ODROID-HC4 (https://www.tumfatig.net/2022/install-openbsd-odroid-hc4/) Dotfiles Management (https://mitxela.com/projects/dotfiles_management) Beastie Bits FreeBSD Journal - November/December 2022 - Observability and Metrics (https://freebsdfoundation.org/past-issues/observability-and-metrics/) HAMMER2 file system for NetBSD (https://github.com/kusumi/netbsd_hammer2) Running OpenBSD 7.2 on your laptop is really hard (not) (https://sohcahtoa.org.uk/openbsd.html) MinIO on OpenBSD 7.2: Install (https://dev.to/nabbisen/minio-on-openbsd-72-install-3b3h) WireGuard VPN on OpenBSD (https://www.adrianobarbosa.xyz/blog/openbsd-wireguard.html) A tool for glamorous shell scripts (https://github.com/charmbracelet/gum) Visualize your git commits with a heat map in the terminal (https://github.com/james-stoup/heatwave) Tarsnap This week's episode of BSDNow was sponsored by our friends at Tarsnap, the only secure online backup you can trust your data to. Even paranoids need backups. Send questions, comments, show ideas/topics, or stories you want mentioned on the show to feedback@bsdnow.tv (mailto:feedback@bsdnow.tv)
FreeBSD Q3 2022 status report, Leveraging MinIO and OpenZFS to avoid vendor lock in, FreeBSD on Firecracker platform, How Much Faster Is Making A Tar Archive Without Gzip, Postgres from packages on OpenBSD, Upgrading an NVMe zpool from 222G to 1TB drives, Don't use Reddit for Linux or BSD related questions, and more. NOTES This episode of BSDNow is brought to you by Tarsnap (https://www.tarsnap.com/bsdnow) and the BSDNow Patreon (https://www.patreon.com/bsdnow) Headlines FreeBSD Quarterly Status Report Third Quarter 2022 (https://www.freebsd.org/status/report-2022-07-2022-09/) Avoid Infrastructure Vendor Lock-in by leveraging MinIO and OpenZFS (https://klarasystems.com/articles/avoid-vendor-lock-in-with-minio-and-openzfs/) Announcing the FreeBSD/Firecracker platform (https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html) News Roundup How Much Faster Is Making A Tar Archive Without Gzip? (https://lowendbox.com/blog/how-much-faster-is-making-a-tar-archive-without-gzip/) PostgreSQL from packages on OpenBSD (https://www.dbi-services.com/blog/postgresql-from-packages-on-openbsd/) Upgrading an NVMe zpool from 222G to 1TB drives (https://dan.langille.org/2022/10/18/upgrading-an-nvme-zpool-from-222g-to-1tb-drives/) PSA: Don't use Reddit for Linux or BSD related questions (https://unixsheikh.com/articles/dont-use-reddit-for-linux-or-bsd-related-questions.html) Tarsnap This week's episode of BSDNow was sponsored by our friends at Tarsnap, the only secure online backup you can trust your data to. Even paranoids need backups. 
Feedback/Questions
Hinnerk - vnet jails (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/481/feedback/Hinnerk%20-%20vnet%20jails.md) Tom's response example: https://adventurist.me/posts/00304
Hugo - Apple M2 (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/481/feedback/Hugo%20-%20Apple%20M2.md)
kevin - emacs backspace (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/481/feedback/kevin%20-%20emacs%20backspace.md)

Send questions, comments, show ideas/topics, or stories you want mentioned on the show to feedback@bsdnow.tv (mailto:feedback@bsdnow.tv)
About Ashleigh
Ashleigh Early is a passionate advocate for salespeople and, through her consulting, coaching, and The Other Side of Sales, she is devoted to making B2B sales culture more inclusive so anyone can thrive. Over the past ten years Ashleigh has led, built, re-built, and consulted for 2 unicorns, 3 acquisitions, 1 abject failure, and every step in between. She is also the Head of Sales at the Duckbill Group! You can find Ashleigh on Twitter @AshleighatWork and more about the Other Side of Sales at Othersideofsales.com

Links:
Twitter: https://twitter.com/ashleighatwork
LinkedIn: https://www.linkedin.com/in/ashleighearly

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured, and fully managed, with built-in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required.
Couchbase Capella: make your data sing.

Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which probably depends on where you work. Getting all of that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100-megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today does something that I, sort of, dabbled around the fringes of once upon a time, but then realized I wasn't particularly good at it and got the hell out of it and went screaming into clouds instead. Ashleigh Early is the Head of Sales here at The Duckbill Group. Ashleigh, thank you for joining me.

Ashleigh: Thanks for coming on and running, screaming from my chosen profession [laugh]. You're definitely not the only one.

Corey: Well, let's be clear here; there are two ways that can go because sure, I used to dabble around in sales when I was, basically, trying to figure out how to not starve to death. But I also used to run things; it was basically a small team. I was managing people and realized I was bad at that, too. So, really, that's, sort of, an open-ended direction. We can go either side and... But, let's go with sales. That seems like a more interesting way for this to play out.
So, you've been here for—what is it now—it feels like ages, but my awareness of the passing of time in the middle of a global panini is relatively not great.

Ashleigh: Yeah. I think we're at day—what is it—1,053 of March 2020? So, time is irrelevant; it's a construct; I don't know. But, technically, by the Gregorian Calendar, I think I'm at six months.

Corey: It's very odd to me, at least the way that I contextualized doing this. Back when I started what became The Duckbill Group, I was an independent consultant. It was, more or less, working with people I knew through my network who had a very specific, very expensive problem: The AWS bill is too high. And I figured, this is genius. It is the easiest possible sale in the world and one of the only scenarios where I can provably demonstrate ROI to a point where, "Bring me in; you will inherently save money."

And all of that is true, but one of the things I learned very quickly was that, even with the easiest sale of, "Hi. I'd like to sell you this bag of money," there is no such thing as an easy enterprise sale. There is nuance to it. There is a lot of difficulty to it. And I was left with the, I guess, driving question—after my first few months of playing this game—of, "How on earth does anyone make money in this space?"

The reason I persisted was, basically, a bunch of people did favors for me that they didn't owe me at all. It was, "Oh, great. I'll give them the price quote." And they're, like, "Oh, yeah." So, cool, they turned around and quoted that to their boss at triple the rate because, "Don't slit your own throat on this." They were right. And not for nothing, it turns out when you're selling advice, charging more for it makes it likelier to succeed as a project.

But, I had no idea what I was doing. And, like most engineers on Twitter, I look at something I don't understand deeply myself, and figure, "Oh.
Well, it's not engineering, therefore, it's easy." Yeah, it turns out that running a business is humbling across a whole bunch of different axes.

Ashleigh: I wouldn't even say it's running a business; it's working with humans. Working with humans is humbling. If you're working with a machine, or even something as simple as making a product, it's: follow a recipe. Follow the instructions. I do A, then B, then C, then D—unless you don't enjoy using the instructions because you don't enjoy using instructions—but you still follow a set general process, and you build a thing that comes out correctly.

The moment that process is: talk to this person, then Person A, then Person B, then Person C, then Person D, then back to Person A, then Person D, and then finally to Person E, everything goes to heck in a handbasket. That's what really makes it interesting. And for those of us who are of a certain disposition, we find that fascinating and enthralling. If you're of another disposition, that's hell on earth [laugh]. So, it's a very—yeah, it's a very interesting thing.

Corey: Back when I was independent and people tried to sell me things—and yeah, sometimes it worked—it was always interesting going through various intake funnels and the rest. And, like, "Well, what role do you hold in the organization? Do you influence the decision? Do you make the decision? How many people need to be involved in the rest?"

And I was looking around going, "How many people do you think fit in my home office here? Let's be serious." I mean, there are times I escalated to the Chihuahua because she's unpleasant and annoying and, basically, sometimes so are people. But that's a separate topic for later. But it became a very different story as the organizational distance between the people that needed to sign off on a sale increased.

Ashleigh: Mm-hm. Absolutely.
And you might have felt me squirm when you described those questions because one of my biggest pet peeves is when people take sales terminology and use it directly with clients. Just like if you're an engineer describing what you do, you're not going to go home and explain it to your dad in technical jargon; you're going to tell him broad strokes, and if he's interested, go deeper and deeper—technical, more technical.

I hate when salespeople use sales jargon, like, "What's your role in the organization? Are you the decision-maker?" Don't—mmm. There are better ways to deal with that. So, that's just a sign of poor training. It's not the sales rep's fault; it's his company's fault—their company's fault. But that's a different thing.

It's fascinating to me, kind of, watching this—what you said spoke of two things there. One is poor training, and two, a lack of awareness of the situation and a lack of just doing a little bit of pre-work. Like, you do five seconds of research on Corey Quinn, you can realize that the company is ten to 15 people, tops. So, it makes sense to ask a question around, "Hey, do you need anyone else to sign off before we can move forward with this project?"

That tells me if I need to get someone for technical, for budget, for whatever. But asking if you're a decision-maker, or if you're influencing, or if you're doing initial research—like, that's using sales terminology, not actually getting to the root of the problem, and it immediately makes it very clear you didn't do any actual research in advance, which is—in modern selling—not okay.

Corey: My business partner, Mike, has a CEO job title, and he'll get a whole bunch of cold outreach constantly, all day, every day. I conducted a two-week experiment where in front of my Chief Cloud Economist job title, I put 'CTO/' just to see what would happen, and sure enough, I started getting outreach left, right, up, down, and sideways.
Not just for things that a CTO figure might theoretically wind up needing to buy, but also job opportunities for a skill set that I haven't dusted off in a decade.

So, okay. Once you have something that hits the filters when people are searching for very specific titles, you wind up getting a lot more outreach. But if you create a job title that no one sensible would ever pick for themselves, suddenly a lot of that tends to go by the wayside. It shined a light on how frustratingly dreary a lot of the sales prospecting work really can be from—

Ashleigh: Oh, yeah.

Corey: —just from the side of someone who gets it. Now, I'm not exaggerating when I say that I did work in sales once upon a time. Not great at it, but one of the first white-collar-style jobs that I had was telemarketing, of all things. And I was spectacular at it because I was fortunate enough to be working on a co-branded affinity credit card that was great, and I had the opportunity to position it as a benefit of an existing membership or something else people already had. I was consistently top-ten out of 400 people on a shift, and it was great.

But it was also something that was very time-limited, and if you're having an off day, everything winds up crumbling. And, eventually, I drifted off and started doing different things. But I've never forgotten those days. And that's why it just grinds my gears, one, to see crappy sales stuff happening, and two, watching people on Twitter—particularly—taking various sales-prospect outreach for a drag. And it's—

Ashleigh: Oh, God. Yeah.

Corey: —you know, not everyone is swimming in the ocean of privilege that some of the rest of us are. And understand that you're just making yourself look like a jerk when you're talking to someone who is relatively early-career and didn't happen to google you deeply enough before sending you an email that you find insulting.
That bugs me a fair bit.

Ashleigh: And I think part of that is just a lack of humanity and understanding. Like, there's—I mean, I get it; I'm the first person to be jumping on Twitter and [unintelligible 00:08:41] when something goes down, or something's not working—I'm the first one to get angry and start complaining. Don't get me wrong. However, it's really easy to dehumanize something you don't see very often, or that you're not involved in directly. And I find it real interesting you mentioned you worked in telemarketing.

I lasted literally two weeks in telemarketing. I full-on rage-quit. It was a college job. I worked in my college donations center. I lasted two weeks, and I fully walked out on a shift. I was, like, "Screw this; I'm never doing anything like that ever again. I hate this."

But what I hated about it was the lack of connection. I was, like, I'm not just going to read some scripts and get yelled at for having too much banter. Like, I'm getting money; what do you care? I'm getting more money than other people. Maybe I'm not making as many calls, but I'm getting just as much, so why do you care how I do this?

But what really gets me is you have to remember—and I think a lot of people don't understand—how most large, modern sales organizations work. Really quickly giving you a very, very generic explanation: the way a lot of organizations work is they employ something called SDRs, or Sales Development Reps. That title can be permuted in a million different ways—there's ADRs, MDRs, BDRs, whatever—but basically, it's their job to do nothing but scour the internet, sometimes using actual scripts.

Sometimes they use LinkedIn; sometimes they purchase databases. So, for example, you might change your title on LinkedIn, but it's not changing in the database. Just trust me, Corey, they have you flagged as a CTO. Sorry.
What [crosstalk 00:10:16].

Corey: My personal favorite is when I get cold outreach asking me on the phone call about whether we have any needs for whatever it is they happen to be selling at—and then they name a company that I left in 2012. I don't know how often that database has been sold and resold and sold onwards yet again. And it's just, I work in tech. What do you think the odds are that I'm still in the same job I was in ten years ago? And I get that it happens, but at some point, it just becomes almost laughable.

Ashleigh: Yeah. When in doubt—I tell every sales team, kind of, every company team that I work with—do not use those vendors. Ninety percent of them are not very good; they're using old databases; they don't update. You're better off paying for a database that is subscription-based because then, literally, you've got an SLA on data quality, and you can flag and get things fixed. With the number one sales-data provider, I happen to know for a fact, I actually earned, I think, almost $10,000 in donations to a charity—what was this—this was 2015, because I went through and did a scrub of our CRM versus, I think, LinkedIn or something else, and I flagged everything that wasn't accurate and sent it back to them.

And they happened to have a promotion where, for every record you flagged as inaccurate because the person was no longer at the company, they would donate a buck to charity, and I think I sent them, like, 10,000 or something. [unintelligible 00:11:36] I was like, "None of these are accurate." And they're, like, you know? And they sent me this great email, like, "Thank you for telling us; we really appreciate it."

I didn't even know they were doing this promotion. They thought I'd been saving up for it.
And I was, like, "No, I just happened to run this analysis and thought you'd want to know." So, subscriptions—

Corey: You know, it turns out computers are really fast at things.

Ashleigh: Yeah, and I was very proud I figured out how to run a script. I was, like, "Yay. Look at me; I wrote a macro." This was very exciting. For the first—God, the first five or so years of my sales career, I consistently called myself a dumb salesperson because I was working in really super-technical products. I worked for Arista Networks, FireEye, Bromium, you know, PernixData. I was working in some pretty reasonably hard tech, and when I introduced myself, I'd always, kind of, talked down about my technical aptitude because I have a degree in political science and opera. These are not technical fields, and yet here I am every day, talking about, you know, tech [crosstalk 00:12:25].

Corey: Well, if the election doesn't pan out the way you want, why don't you sing about it? Why not? You can tie all these things together.

Ashleigh: You can. And, honestly, there have been several points—I've done whole other shows on how those two seemingly completely disparate things have actually been some of the greatest gifts to my career. And most notable, I think, is the fact that I have my degree in political science as a Bachelor of Science, which means I have a BS in BS, which is incredibly relevant to my career in a lot of different ways.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account.
This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers, needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free; that's snark.cloud/oci-free.

Ashleigh: Yeah, so wrapping up, kind of, how modern sales organizations work: most companies employ people called SDRs, and they're typically people who have less than five years of sales experience. They, rightly or wrongly, tend to be people in their early-20s who have very little training. Most companies get SDRs on phones within a week, which means—

Corey: These are the people that are doing the cold outreach?

Ashleigh: —they've gotten maybe five or six hours of product training. Hmm? Sorry.

Corey: These are the people who are doing the cold outreach?

Ashleigh: These are the people who are doing the cold outreach. So, their whole job is just to get appointments for account execs. Account execs—again, tons of different names, but these are the closers. They'll run you through the sales cycle. They typically have between five and thirty years of experience.

But they're the ones—depending on how big your company is [unintelligible 00:13:35]—the bigger your company, typically, the more experience your sales rep's going to have in terms of managing more separate deal cycles. But what ends up happening is you end up with this SDR organization—and this is where I've spent most of my career: helping people build healthy sales-development organizations.
In terms of this churn-and-burn culture, you've got people coming in and basically flaming out because they go on Twitter or—heaven forbid—Reddit and get sales advice from these loud-mouthed, terrible people who are telling them to do things that didn't work ten years ago. They then go try it; they send it out, and then their prospects are suddenly blasting them on Twitter.

It's not that rep's fault that they got no training in the first place, they got no support, they just had to figure it out because that's the culture. It's the company's fault. And a lot of times, people don't—there was a big push against this last year, I think, within the sales community against other sales leaders doing it, but now it's starting to spread out. Like, I have no problem dragging someone for a really terrible email. Anonymize the company; anonymize the email. And, if you want to give feedback, give it to them directly. And you can also say, "I'm going to post this, but it's not coming back to you." And tell them, like—

Corey: Whenever I get outreach from—

Ashleigh: "Get out of that terrible company."

Corey: Yeah. Whenever I get outreach from AWS for a sales motion or for recruiting or whatnot, I always anonymize the heck out of the rep. It's funny to me because it's, "Don't you know who I am?" It is humorous, on some level. And it's clear that it is a numbers game, and they're trying to do a bunch of different things, but a cursory google of my name would show it. It's just amusing.

I want to be clear that whenever I do that, I don't think the rep has done anything wrong. They're doing exactly what they should. I just find it very funny that, "Wait, me? Work at an AWS? The bookstore?" It seems like it would be a—yeah. Yeah, the juxtaposition is just hilarious to me. They've done nothing wrong, and that's okay.
It's a hard racket.I remember—at least they have the benefit over my first enterprise sales job where I was selling tape drives into the AS/400 market, competing against IBM on price. That was in the days of “No one ever gets fired for buying IBMs.” So, yeah. The place you want to save money on is definitely the backup system that's going to save all of your systems. I made one sale in my time there—and apparently set a company record because it wasn't specifically aimed at the AS/400—and I did the math on that and realized, “Huh, I'd have to do two of these a month in order to beat the draw against commission structure that they had.”So, I said, “To hell with this,” and I quit. The CEO was very much a sales pro, and, “Well, you need to figure out whether you're a salesperson or not.” Even back then, I had an attitude problem, but it was, “Yeah, I think that—oh, I know that I am. It's just a question is am I going to be a salesperson here?” And the answer is, “No.” It [laugh]—Ashleigh: Yeah.Corey: It's a two-way street.Ashleigh: It is. And I say this all the time to people who—I work with a lot of salespeople now who are, like, “I don't think sales is for me. I don't know, I need [unintelligible 00:16:24]. The past three companies didn't work.” The answer isn't, “Is sales for you?”The answer is, “Are you selling the right thing at the right place?” And one of the things we've learned from the ‘Great Recession' and the ‘Great Reshuffling' in everything is there's no reason to stay at a terrible company, and there's no reason to stay at a company where you're not really passionate and understand what you're selling. I joked about, you know, I talked down about myself for the first bit of my career. Doesn't mean I didn't—like, I might not understand exactly how heuristics work, but I understand what heuristics are. Just don't ask me to design any of them.You know, like, you have to understand and you have to be really excited about it. And that's what modern sales is. 
And so, yes, you're going to get a ton of the outreach because that's how people—it still works. That's why we all still get Nigerian prince emails. Somebody, somewhere, still clicks those things, sadly. And that gets me really angry.

Corey: It's a pure numbers game.

Ashleigh: Exactly. Ninety percent of enterprise B2B sales is not that anymore. Even the companies that are using BDRs—which is most of them—are now moving to what's called 'account-based selling.' We're using hyper-personalized messaging. You're probably noticing videos are popping up more.

I'm a huge fan of video. I think it's a great way to force personalization. It's, like, "Hi, Corey. I see you. I'm talking to you. I've done my research. I know what you're doing at The Duckbill Group, and here's how I think we can help. If that's not the case, no worries. Let me know; I'll leave you alone." That's what selling should be.

Corey: I have yet to receive one of those, but I'm sure it'll happen now that I've mentioned that and put that out into the universe.

Ashleigh: Probably.

Corey: What always drove me nuts—and maybe this is unfair—is when I'm trying to use a product, probably something SaaS-based—and I see this a lot—where, first, if you aren't letting me self-serve and get off with the free tier and just start testing something, well, that's already a ding against you because usually I'm figuring this out at 2 o'clock in the morning when I can't sleep, and I want to work on something. I don't want to wait for a sales cycle and have to slow things down. Cool. But at some point, for sophisticated customers, you absolutely need to have a sales conversation. But, okay, great. Usually, I encounter this more with lead magnets or other things designed to get my contact info.

But what drives me up a wall is when they start demanding information that is very clearly trying to classify me in their sales funnel, on some level.
I'll give you my name, my company, and my work email address—although I would think that from my work email address, you could probably figure out where I work and the rest—but then there are other questions. How big is your company? What is your functional role within the company? And where are you geographically?Well, that's an interesting question. Why does that matter in 2022? Well, very often leads get circulated out to people based upon geography. And I get it, but it also frustrates me, just because I don't want to have to deal with classifying and sorting myself out for what is going to be a very brief conversation [laugh] with a salesperson. Because if the product works, great, I'm going to buy. If it doesn't work, I'm going to get frustrated and not want to hear from you forever.Which gets to my big question for you—and please don't take the question as anything other than the joking spirit in which it's intended—but why are so many salespeople profoundly annoying?Ashleigh: I would—uh, hmm.Corey: Sales processes is probably the better way to frame it because—Ashleigh: I was going to say, “Yeah, it's not the people; it's the process.” So—Corey: —it's not the individual's fault, as we've talked about it.Ashleigh: —yeah, I was going to say, I was, like, “Okay, I think it's less the people; more of the processes.” And processes that will make [crosstalk 00:19:37]—Corey: Yeah. It expresses itself as the same person showing up again and again. But that is not—Ashleigh: Totally.Corey: —their fault. That is the process by which they are being measured at as a part of their job. And it's unfair to blame them for that. But the expression is, “This person's annoying the hell out of me, what gives?”Ashleigh: “Oh, my gosh. Why does she keep [unintelligible 00:19:51] my inbox? Leave me alone. Just let me freaking test it.” I said, “I needed two weeks. Just let me have the two weeks to freaking test the thing. 
I will get back to you." [unintelligible 00:19:58] yeah, no, I know.

And even since moving into leadership several years ago, same thing. I'm like, "Okay, no." I've gotten to the point where I've had several conversations with salespeople. I'm like, "I know the game. I know what you're trying to do. I respect it. Leave me alone. I promise I will get back to you, just lea"—I have literally said this to people. And the weird thing is most salespeople respect that. We really respect the transparency on that.

Now, the trick is, what you're talking about with lead capture and stuff like this, again, comes down to company design; it comes down to companies who value the buyer experience and customer journey, and companies who don't. And this, I think, is actually driven more by—in my humble opinion—our slight over-reliance on venture capital, which is all about gathering as much data as possible, figuring out how to monetize it, and moving from there. In their mind, personal experience and emotion don't really factor into that equation very much, so you end up with these buyer journeys that are less about the buyer and more about getting them from click to purchase as efficiently as possible in terms of company resources, which includes salespeople's time. So, as to why you have to fill out all those things: that just, to me, reeks of a company that maybe doesn't really understand the client experience and probably is going to have a pretty, mmm, support program as well, which means the product had better be really freaking good for me to buy it.

Corey: To be clear, at The Duckbill Group, we do not have a two-in-the-morning, click-here-and-get-onboarded flow. Turns out that we have yet to really see the value in building a shopping cart system where you can buy, "One consulting, please," and call it good. We're not quite at the level of productizing our offering yet, and having conversations is a necessary part of what we do.
But that also aligns with our customer expectation, where there is not a general expectation in this industry that you can buy a full-on bespoke consulting engagement without talking to a human being. Honestly, if someone were trying to sell such a thing, I would be terrified.

Ashleigh: Yeah, run screaming. Good Lord. No, exactly. And that's one of the reasons I love working with this team and I love this problem: this isn't a quick, you know, download, install, and save—you know, save ten percent on your AWS bill by installing Duckbill Group. It ain't that simple. If it were that simple, like, AWS wouldn't have the market cap it does.

So, that's one of the things I love. I love really meaty problems that don't have clean answers, and specifically have answers that look slightly different for everybody. I love those sorts of problems. I've done the highly productized stuff: Click here, buy, get it on the free tier, and then it's all about up-sell, cross-sell as needed. Been there, done that; that's fun, and that's a whole different bucket of challenges. But what we're dealing with every single day on the consulting side of The Duckbill Group is far more nuanced and far more exciting because we're also seeing some truly incredible architecture designs. Like, companies who are really on the bleeding edge of what they're doing. And it's just really fun—

Corey: Cost and architecture are the same thing in the Cloud.

Ashleigh: —[crosstalk 00:22:59] that little—

Corey: It's a blast to see it.

Ashleigh: It's so much fun. It's, it's, it's... the world's best jigsaw puzzle because it covers, like, every single continent and all these different nuances, and you've got to think about 'ephemerality,' which is my new favorite word. So...

Corey: It's fun because you are building a sales team here, which opens up a few interesting avenues for me. For one, I don't have to manage and yell at individual salespeople in the same way.
For example, we talk about it being a process and not a person thing. We're launching some outbound sales work, and basically, having the person to talk to about that process—namely you—means that I don't need to be hovering over people's shoulders the way I felt that I once did, as far as what are we sending people? These passive-aggressive drip campaigns of, "Clearly, you don't mind lighting money on fire. If that changes, please let me know."

It's email eight in a sequence. It's... no. This stuff has an implicit 'Love, Mike and Corey' at the bottom of everything that comes out of this company, and it represents us in some respect. And let's be clear: we have a savvy, sophisticated, and more-attractive-than-the-average audience listening to all of these shows. And they'll eat me alive if we start doing stuff like that—

Ashleigh: Oh, yeah.

Corey: —not to mention that I find it not particularly respectful of their time and who they are. It doesn't work, so we have to be very conscious of that. The fact that I never had to explain that concept in any depth to you made bringing you in one of the easiest decisions we've ever made.

Ashleigh: Well, I think it helped—I think in one of my interviews I went off on the 'alligator email,' which is this infamous email we've all gotten, which is basically, like, you know, "Hi. I haven't heard from you yet, so I want to know which one of these three scenarios has happened to you. One, you're not interested in my product but didn't have the balls to email me and say that you're not interested. Two, you're no longer in this position, in which case, you're not going to read this email anyway. Or three, you're being chased by an alligator, and I should call animal control because you need help." This email was—

Corey: He, he, he, hilarious.

Ashleigh: Ugh. And there's variations of it. And I've seen variations of it that are very well done and are on brand and work with the company.
I've seen variations that could be legitimately, I think, great humor. And that's great.Humor in emails and humor in sales is fantastic. I have to shout out my friend, Jon Selig up in Canada, who actually, literally, does workshops on how sales teams can integrate humor into their prospecting. It's freaking brilliant. But—Corey: Near and dear to my heart.Ashleigh: —if you're not actually trained in that stuff, don't do it. Don't do the alligator email. But I think I went off on that during one of our interviews just because I was just sick of seeing these things. And what kills me, again, it comes back to the beginning, is people who have no training, no experience coming in—I mean, it really kills me, too, because there's a real concerted effort in the sales community to get more diverse people into sales to, kind of, kill the sales bro just by washing them out, basically. And so, we're recruiting hard with veterans, with black and other racial minority groups, LGBTQ communities, all sorts of things, and indigenous peoples.And so, we're bringing people that also are maybe a little bit more mature, a little bit older, have families they're supporting, and we're throwing them in a role with no support and very little training. And then they wash out, and we wonder why. It's, like, well, maybe because you didn't—it's, like, when I explain this to other people who aren't in sales, like, “Really, imagine coming in to being hired for a coding job, being told you're going to be trained on, you know, Ruby on Rails or C# or whatever it is we're currently using”—my reference is probably super outdated—but then, being given a book, and that's it. And told, “Learn it. And by the way, your first project is due in a month.” That's what we're doing in sales—Corey: For a lot of folks, that's how we learned in the engineering spaces, but let's be clear, the people who do well in that, generally have tailwinds of privilege at their back. 
They don't have headwinds of, “You suck at this.” It was the you-were-born-on-third-but-didn't-hit-a-triple school of thought. It's—Ashleigh: Yeah.Corey: —the idea of building an onboarding pipeline, of making this stuff more accessible to people earlier on is incredibly important. One of my, I guess, awakening moments as we were building this company was it turns out that if you manage salespeople as if they were engineers, it doesn't go super well. Whereas, if you manage engineers like they're salespeople, they quit—rage quit—cry, and call you out as being an abusive manager.One of the best descriptions I ever heard from an advisor was that salespeople are sharks. But that's not intended to be unkind. It is simply a facet of their nature. They enjoy the hunt; they enjoy chasing things down, and they like playing games. Whereas, as soon as you start playing games with your engineers on how much money they're going to make this week, that turns out to be a very negative thing. It's a different mindset. It's about motivating people with whatever befits what it is that they want to be doing.Ashleigh: It is. And the other thing is it's a cultural conditioning. So, it's really interesting to say, you know, “People,” you know, “Playing games.” We do enjoy—there's definitely some enjoyment of the competition; there's the thrill of the hunt, absolutely, but at the same time, you want your salespeople to quit? Screw with their money.You screw with their money; we will bail so fast it'll make your head spin. So, it's like, people think, “Oh, we love this.” No, it's really more—think of it as we are gamblers.Corey: Yeah. To be clear when I say, “Playing games with money,” I'm talking about the idea of, “Sell to a company in this profile this quarter, and we'll throw a $5,000 bonus your way,” or something like that. 
It is: if the business wants to see something, great, make it worth the sales team's while to pursue it, or don't be surprised when no one really cares that much about those things—Ashleigh: Exactly.Corey: It's all upside. It is not about, “He, he. And if you don't sell to this weird thing that I can't really describe effectively to you, we're going to cut your bet—” Yeah, that goes over like a lead balloon. As it should. My belief is that compensation should always go up, not down.Ashleigh: Yeah. No, it should. Aside from that, here's a fun stat—I believe this came out of Forrester, it might've been out of [Topel 00:28:54]; I apologize, I don't remember exactly who said this, but a recent study found that less than 68 percent of sales reps make their quota every month. So, imagine that: if you're—we have this thing called OTE, which is On Target Earnings. So, if you have this number you're supposed to take home every month, only 68 percent of sales reps actually do that every month.So, that means we live with this number as our target, but we're living and budgeting anywhere from 30 to 50 percent below that. And then hoping and doing the work that goes in there. That's what we've been conditioned to accept, and that's why you end up with sales reps that use terms like ‘shark' and are aggressive and are in your face and can get—[unintelligible 00:29:30]—Corey: I didn't realize it was pejorative.Ashleigh: I know. No. But here's the thing too, somebody called it ‘commission breath,' which I love. It's, like, you can smell commission breath coming off us when we're desperate. You totally can. It's because of this antiquated way of building commissions.And this is something that I—this was really obvious to me, and apparently, I was a little bit ahead of the curve. When I started designing comp plans, everyone told me, “You want to design a comp plan? 
Tie it to what you want them to do very specifically.” So, if you want them to move a pen, design a comp plan that they get a buck when they put the pen from the heel of your hand to the tips of your fingers. Then they get a buck. And then they can do that repeatedly. That's literally how I was taught to design comp plans.In my head, that meant that I need to design it in such a way that it's doable for my team because I don't want my team worrying about how they're going to put food on the table while they're talking to a client because they're going to get commission breath and it'll piss off the client. That's not a good client experience; that's not going to lead to good performance. Apparently—Corey: Yeah. My concern as a business owner has nothing to do with salespeople making too much money. In fact, I am never happier than when I'm paying out commissions. The concern, then, therefore has to become the, “Okay, great. How do I keep the salespeople from being inadvertently incentivized to sell something for $10 that costs me $12 to fulfill?”It's a question of what behaviors do you incentivize that align what they're motivated by with what the company needs. And very often getting that wrong—which happens from time to time—is not viewed as the learning experience that it should be. But instead, “They're just out to screw us.” And I've seen so many company owners get so annoyed whenever their salespeople outperform. But what did you expect? That is the positive outcome. As opposed to what? The underperforming sales rep that can't close a deal? Please.Ashleigh: Well, no. And let's think about this too, especially if it's tied to commission and you're paying out commission. It's, like, okay, commission is always some sort of percentage—depending on a lot of things—but some sort of percent of what they're bringing in. If you design a comp plan that has you paying out more in commission than the sales that were earned to bring it in, that's on you; you screwed up. 
And you need to either be honest and say, “I screwed up; I can't pay this,” and know that you're going to lose some sales reps, but you won't lose as many as if you just refuse to pay it.But, honestly, and I'm not even kidding, I know people. I've worked at a company that I happen to know did this. That literally fired people because they didn't have the money to pay out the commission. And because they fired them before the commission was due to be paid out, then that person no longer had a legal claim to it. That's common. So, the commission goes both ways.Corey: To be clear, we've never done that, but I also would say that if we had, that's a screaming red flag for our consultancy, given the nature of what it is that we do here. It turns out that when we're building out comp plans, we model out various scenarios. Like, what is the worst way that this could wind up unfolding? And, okay, in some of our early drafts, yeah, it turns out that we would not be able to pay salaries because we wound up giving all of that in commission to people with uncapped upside. Okay, great.But we're also not going to cap people's commissions because that winds up being a freaking problem, so how do we wind up motivating in a way that continues to grow and continues to incentivize the behaviors we want? And it turns out it's super complicated, which is why we brought you in. It's easier.Ashleigh: Yeah, it's a pain. But the other side of this too, I think, is there is another force at play here, which is finance. A lot of traditional finance modeling is built around the assumption that 50 to 70 percent of people hit commission. So, if all of a sudden you design a comp plan in such a way that a hundred percent of the team is hitting commission, finance loses their shit. 
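The comp-plan failure mode described here, paying out more in commission than the revenue a deal actually brings in, boils down to a solvency check you can do in a few lines of arithmetic. A minimal sketch; the rates, accelerators, and margins below are made-up illustrative numbers, not anything cited in the episode:

```python
def commission_payout(deal_value, rate, accelerator=1.0):
    """Commission owed on a deal: base rate times any accelerator
    (e.g. a 1.5x multiplier above quota). Illustrative only."""
    return deal_value * rate * accelerator

def plan_is_solvent(deal_value, rate, accelerator, gross_margin):
    """The 'you screwed up' condition, inverted: the payout on a deal
    must stay below the gross margin that deal generates."""
    return commission_payout(deal_value, rate, accelerator) < deal_value * gross_margin

# A 10% rate with a 1.5x accelerator on a 30%-margin deal pays $15k
# against $30k of margin: fine.
print(plan_is_solvent(100_000, 0.10, 1.5, 0.30))  # True
# Stack a 4x accelerator on top and you now owe $40k against $30k.
print(plan_is_solvent(100_000, 0.10, 4.0, 0.30))  # False
```

Modeling the worst-case scenario, as Corey describes doing for Duckbill's plans, is running exactly this check at the most generous corner of the plan.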
So, you have to make sure that when you're designing these things, one of the things I learned, I learned the hard way—this is how I learned that not everyone does it this way—I built my first comp plan; my team's hitting it.My team's overperforming, not a ton, but we're doing really well. All of the sudden, I'm getting called to Finance and getting raked over the coals. And they're like, “What did you do?” I'm like, “What do you mean what did I do? I designed a comp plan; we're hitting goal. Why are you mad?” “Well, we only had this much budgeted for commission.”And I was, like, “That's not my fault.” “Well, that's what historic performance was.” “Okay, well that's not what we're going to do going forward. We're going to do this.” And they're like, “Oh, well, you need to notify us if you're going to change it like that.” And I was, like, “Wait a minute. You modeled so that my team would not hit OTE?” “Yes.” “That's how you've always done this?” “Yes.” “Okay. Well, that's not what we're going to do going forward, and if that's a problem, I'll go find a door.” Because, no.Especially when we're talking about people who are living in extremely expensive areas. I spent most of my career living and working in San Francisco, managing teams of people who made less than six figures. And that's rough when you're paying two grand in rent every month. And 60 percent of your pay is commission. Like, no. You need to know that money's coming.So, I talk about modern sales a lot because that's what I'm trying to use because there's Glengarry Glen Ross, kind of, Wolf of Wall Street school, which is not how anyone behaves anymore, and if you're in an environment that's like that or treats your salespeople like that? Please leave. 
And then you've got modern sales, which is all about, “Okay, let's figure out how we can set up our salespeople to be the best people they can be to give our clients the best experience they can.” That's where you get top performance out of, and that's where you never run into the terrible emails with the alligators, and the, “Clearly you like lighting piles of money on fire.” That's where you don't get emails to Corey Quinn asking him if he's interested in coming to work for AWS, the book company.It's by incentivizing the people and creating good humans where they can really thrive as salespeople and as people in general. The rest comes with time. But, it's this whole, new way of looking at things. And it's big, and it's scary, and it costs more upfront, but you get more on the back end every single time.Corey: Not that you care about this an awful lot, but you have your own podcast that talks about this, The Other Side of Sales. What inspired you to decide, not just to build sales teams through a different lens, but also to, “You know what? I'm going to go out and talk into microphones through the internet from time to time.” Which, let's be clear, it takes a little bit of a certain warped perspective. I say this myself, having done this far too often.Ashleigh: Yeah. No, it's a fun little origin story. So, I'm a huge Star Trek geek; obsessive. And I was listening to a Star Trek podcast run by a couple of guys who are a little bit embarrassed to run a Star Trek podcast, called The Greatest Generation. Definitely not safe for work, but a really good podcast if you're into Star Trek at all.And they always do, kind of, letters at the end of the shows. And one of the letters at the end of the show one day was, “Hey, I was really inspired by you guys and I started my own podcast on this random thing that I am super excited about.” And I'm literally driving in the car with my husband, and I'm, like, “Huh. I don't know why I'm not listening to sales podcasts. 
I listen to enough of these other random ones.” Jumped online, pulled up a list of sales podcasts, and I think I went through three or four articles of, like, every sales podcast that was big. And this was, like, January of 2019.Corey: “By Broseph McBrowerson, but Everyone Calls Him ‘Browie.'” Yeah.Ashleigh: Literally, there was Conversations with Women in Sales with the amazing Lori Richardson, who's now hosting it; she took over for a mentor of mine who passed in 2020, sadly. But there was that, and then there was one other that was hosted by a husband-and-wife team. And that was it out of, like, 30 podcasts. And [laugh] so it was this moment of, like, epiphany of, like, “I can start my own podcast,” and, “Oh, I probably need to,” because, literally, no one looks or sounds like someone who I would actually want to hang out with ever, or do business with, in a lot of cases. And that's really changed. I'm so grateful.But really, what it came down to was I didn't feel there was a podcast for me. There wasn't a podcast I could listen to about sales that could help me, that I felt like I identified with. So, I was, like, “All right, fine. I'll start my own.” I called up a friend, and she was, literally, going through the same thing at the same time, so we said, “Screw it. We'll do our own.”We went full Bender from Futurama. We're like, “Just screw it; we'll have our own podcast… with liquor… and heels… and honest conversations about what happens to us every day,” and random stuff. It's a lot of fun. And we've gone through a few iterations and it's been a long journey. 
We're about to hit our hundredth episode, which is really exciting.But yeah, we're—The Other Side of Sales is on a mission to make B2B sales culture truly inclusive so everyone can thrive, so, our conversations are all interviews with amazing sales pros who are trying to do amazing things and who are—I think over 90 percent—from a minority background, which is really exciting to, kind of, try and shift that conversation from Broseph McBrowerson. Our original tagline was the ‘anti-sales bro' podcast, but we thought that was a little too antagonistic. So…Corey: Yeah, being a little too antagonistic is, generally, my failure mode, so I hear you on that. I really want to thank you for taking so much time out of your day to speak with me. Because—well, not that I should thank you. It's one of those, I should really turn around and say, “Wait a minute. Why aren't you selling things? Why are you still talking to me?” But no—Ashleigh: No, I'm waiting for you to say, “Back to work.”Corey: Do appreciate your—exactly. I think that's a different podcast. Thank you so much for your time. If people want to learn more, where's the best place to find you?Ashleigh: Well, definitely please go check out duckbillgroup.com. We would love to talk with you about anything to do with your AWS bill. Got a ton of resources on there around how to get that managed and sorted.If you're interested in connecting with me you can always hit me up at—I'm on Twitter @ashleighatwork, which is another deep-cut Star Trek reference, or you can hit me up on LinkedIn. Just search Ashleigh Early. My name is spelled a little weird because I'm a little weird. It's A-S-H-L-E-I-G-H, and then Early, like ‘early in the morning.'Corey: And links to all of that will wind up in the [show notes 00:39:11]. Thanks so much for your time. 
It's appreciated.Ashleigh: This has been fun; we'll do it again soon.If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Peter
Peter's spent more than a decade building scalable and robust systems at startups across adtech and edtech. At Remind, where he's VP of Technology, Peter pushes for building a sustainable tech company with mature software engineering. He lives in Southern California and enjoys spending time at the beach with his family.

Links:
Redis: https://redis.com/
Remind: https://www.remind.com/
Remind Engineering Blog: https://engineering.remind.com
LinkedIn: https://www.linkedin.com/in/hamiltop
Email: peterh@remind101.com

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigabytes per second and a 100 megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.Corey: This episode is sponsored in part by our friends at Vultr. 
Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing, and when they say that, they mean it is less money; sure, I don't dispute that—but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you'll receive $100 in credit. That's V-U-L-T-R dot com slash screaming.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn and this is a fun episode. It is a promoted episode, which means that our friends at Redis have gone ahead and sponsored this entire episode. I asked them, “Great, who are you going to send me from, generally, your executive suite?” And they said, “Nah. You already know what we're going to say. We want you to talk to one of our customers.” And so here we are. My guest today is Peter Hamilton, VP of Technology at Remind. Peter, thank you for joining me.Peter: Thanks, Corey. 
Excited to be here.Corey: It's always interesting when I get to talk to people on promoted guest episodes when they're a customer of the sponsor because to be clear, you do not work for Redis. This is one of those stories you enjoy telling, but you don't personally have a stake in whether people love Redis, hate Redis, adopt it or not, which is exactly what I try and do on these shows. There's an authenticity to people who have in-the-trenches experience who aren't themselves trying to sell the thing because that is their entire job in this world.Peter: Yeah. You just presented three or four different opinions and I guarantee we've felt all of them at different times.Corey: [laugh]. So, let's start at the very beginning. What does Remind do?Peter: So, Remind is a messaging tool for education, largely K through 12. We support about 30 million active users across the country, over 2 million teachers, making sure that every student has, you know, equal opportunities to succeed and that we can facilitate as much learning as possible.Corey: When you say messaging that could mean a bunch of different things to a bunch of different people. Once on a lark, I wound up sitting down—this was years ago, so I'm sure the number is a woeful underestimate now—to count how many AWS services I could use to send a message from me to you. And this is without going into the lunacy territory of, “Well, I can tag a thing and then mail it to you like a Snowball Edge or something.” No, this is using them as intended, I think I got 15 or 16 of them. When you say messaging, what does that mean to you?Peter: So, for us, it's about communication to the end-user. We will do everything we can to deliver whatever message a teacher or district administrator has to the user. 
We go through SMS, text messaging, we go through Apple and Google's push services, we go through email, we go through voice call, really pulling out all the stops we can to make sure that these important messages get out.Corey: And I can only imagine some of the regulatory pressure you almost certainly experience. It feels like it's not quite to HIPAA levels, where ohh, there's a private cause of action if any of this stuff gets out, but people are inherently sensitive about communications involving their children. I always sort of knew this in a general sense, and then I had kids myself, and oh, yeah, suddenly I really care about those sorts of things.Peter: Yeah. One of the big challenges, you can build great systems that do the correct thing, but at the end of the day, we're relying on a teacher choosing the right recipient when they send a message. And so we've had to build a lot of processes and controls in place, so that we can, kind of, satisfy two conflicting needs: One is to provide a clear audit log because that's an important thing for districts to know if something does happen, that we have clear communication; and the other is to also be able to jump in and intervene when something inappropriate or mistaken is sent out to the wrong people.Corey: Remind has always been one of those companies that has a somewhat exalted reputation in the AWS space. You folks have been early adopters of a bunch of different services—which let's be clear, in the responsible way, not the, “Well, they said it on stage; time to go ahead and put everything they just listed into production because we for some Godforsaken reason, view it as a todo list.”—but you've been thoughtful about how you approach things, and you have been around as a company for a while. But you've also been making a significant push toward being cloud-native by certain definitions of that term. 
So, I know this sounds like a college entrance essay, but what does cloud-native mean to you?Peter: So, one of the big gaps—if you take an application that was written to be deployed in a traditional data center environment and just drop it in the cloud, what you're going to get is a flaky data center.Corey: Well, that's unfair. It's also going to be extremely expensive.Peter: [laugh]. Sorry, an expensive, flaky data center.Corey: There we go. There we go.Peter: What we've really looked at—and a lot of this goes back to our history in the earlier days; we ran on top of Heroku, and it was kind of the early days of what they call the Twelve-Factor Application—but making aggressive decisions about how you structure your architecture and application so that you fit in with some of the cloud tools that are available and that you fit in, you know, with the operating models that are out there.Corey: When you say an aggressive decision, what sort of thing are you talking about? Because when I think of being aggressive with an approach to things like AWS, it usually involves Twitter, and I'm guessing that is not the direction you intend that to go.Peter: No, I think if you look at Twitter or Netflix or some of these players that, quite frankly, have defined what AWS is to us today through their usage patterns, not quite that.Corey: Oh, I mean using Twitter to yell at them explicitly about things—Peter: Oh.Corey: —because I don't do passive-aggressive; I just do aggressive.Peter: Got it. No, I think in our case, it's been plotting a very narrow path that allows us to avoid some of the bigger pitfalls. We have our sponsor here, Redis. I'll talk a little bit about our usage of Redis and how that's helped us in some of these cases. 
One of the pitfalls you'll find with pulling a non-cloud-native application and putting it in the cloud is that state is hard to manage.If you put state on all your machines and machines go down, networks fail, all those things, you now no longer have access to that state and we start to see a lot of problems. One of the decisions we've made is to try to put as much data as we can into data stores like Redis or Postgres or something, in order to decouple our hardware from the state we're trying to manage and provide for users so that we're more resilient to those sorts of failures.Corey: I get the sense from the way that we're having this conversation, when you talk about Redis, you mean actual Redis itself, not ElastiCache for Redis, or, as I'm increasingly tending to think about AWS's services, Amazon Basics for Redis.Peter: Yeah. I mean, Amazon has launched a number of products. They have their ElastiCache, they have their new MemoryDB, there's a lot of different ways to use this. We've relied pretty heavily on Redis, previously known as Redis Labs, and their enterprise product in their cloud, in order to take care of our most important data—which we just don't want to manage ourselves—trying to manage that on our own using something like ElastiCache, there's so many pitfalls, so many ways that we can lose that data. This data is important to us. By having it in a trusted place and managed by a great ops team, like they have at Redis, we're able to then lean in on the other aspects of cloud data to really get as much value as we can out of AWS.Corey: I am curious. As I said, you've had a reputation as a company for a while in the AWS space of doing an awful lot of really interesting things. I mean, you have a robust GitHub presence, you have a whole bunch of tools that have come out of Remind that are great, I've linked to a number of them over the years in the newsletter. 
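Peter's point about decoupling state from hardware comes down to one rule: anything a request touches lives behind a store interface, never in process memory, so a crashed or replaced instance loses nothing. A runnable sketch of the pattern; in production the store would be Redis (redis-py's SET with an expiry, and GET), but here a dict stands in so the example has no server dependency. The key names and TTL are illustrative, not Remind's:

```python
import json
import time

class StateStore:
    """Minimal externalized-state interface. A dict stands in for Redis
    here; swapping in redis-py would keep the same set/get shape."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds=None):
        # Serialize so the store holds bytes-like data, as Redis would.
        expires = time.time() + ttl_seconds if ttl_seconds else None
        self._data[key] = (json.dumps(value), expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.time() > expires:
            del self._data[key]  # lazily expire, like a TTL'd Redis key
            return None
        return json.loads(value)

# Any app server can die and restart; delivery state survives because it
# lives in the store, not on the machine that handled the request.
store = StateStore()
store.set("message:42:status", {"delivered": True, "channel": "sms"}, ttl_seconds=3600)
print(store.get("message:42:status"))  # {'delivered': True, 'channel': 'sms'}
```

The payoff is exactly the resilience Peter describes: machines become interchangeable because none of them is the system of record.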
You are clearly not afraid, culturally, to get your hands dirty and build things yourself, but you are using Redis Enterprise as opposed to open-source Redis. What drove that decision? I have to assume it's not, “Wait. You mean, I can get it for free as an open-source project? Why didn't someone tell me?” What brought you to that decision?Peter: Yeah, a big part of this is what we could call operating leverage. Building a great set of tools that allow you to get more value out of AWS is a little different story than babysitting servers all day and making sure they stay up. So, if you look through, most of our contributions in the open-source space have really been around here's how to expand upon these foundational pieces from AWS; here's how to more efficiently launch a suite of servers into an auto-scaling group; here's, you know, our troposphere and other pieces there. This was all before Amazon's CDK product, but really, it was, here's how we can more effectively use CloudFormation to capture our Infrastructure as Code. And so we are not afraid in any way to invest in our tooling and invest in some of those things, but when we look at the trade-off of directly managing stateful services and dealing with all the uncertainty that comes, we feel our time is better spent working on our product and delivering value to our users and relying on partners like Redis in order to provide that stability we need.Corey: You raise a good point. An awful lot of the tools that you've put out there are the best, from my perspective, approach to working with AWS services. And that is a relatively thin layer built on top of them with an eye toward making the user experience more polished, but not being so heavily opinionated that as soon as the service goes in a different direction, the tool becomes completely useless. 
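The "thin layer over CloudFormation" idea Peter describes (which troposphere does with typed Python classes) can be shown with nothing but stdlib JSON: a small function that emits a template, rather than a framework that hides it. This hypothetical helper is a sketch of that style, not Remind's actual tooling; resource names and sizes are made up:

```python
import json

def make_autoscaling_stack(app_name, min_size=2, max_size=10):
    """Emit a minimal CloudFormation template as a plain dict.
    The 'thin layer' is just this: a function per pattern, producing
    standard CloudFormation the service evolves independently of."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            f"{app_name}Asg": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    # CloudFormation expects these as strings
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                },
            }
        },
    }

# The output is ordinary CloudFormation JSON you could diff, review in a
# PR, and feed to the service; nothing in the layer needs to change when
# AWS adds new AutoScalingGroup properties.
print(json.dumps(make_autoscaling_stack("Web"), indent=2))
```

The contrast with a heavily opinionated abstraction is the point: when AWS moves, a generator like this still emits valid templates, which is why the thin layer ages well.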
You just decide to make it a bit easier to wind up working with specific environment variables or profiles, rather than what appears to be the AWS UX approach of, “Oh, now type in your access key, your secret key and your session token, and we've disabled copy and paste. Go, have fun.” You've really done a lot of quality-of-life improvements, more so than a this-is-the-entire-system-of-how-we-do-deploys, start to finish. It's opinionated, sort of, like, a take on what Netflix did once upon a time with Asgard. It really feels like it's just the right level of abstraction.Peter: We did a pretty good job. I will say, you know, years later, we felt that we got it wrong a couple times. It's been really interesting to see that there are times when we say, “Oh, we could take these three or four services and wrap it up into this new concept of an application.” And over time, we just have to start poking holes in that new layer and we start to see we would have been better served by sticking with as thin a layer as possible that enables us, rather than trying to get these higher-level pieces.Corey: It's remarkably refreshing to hear you say that just because so many people love to tell the story on podcasts, or on conference stages, or whatever format they have of, “This is what we built.” And it is an aspirationally superficial story about this. They don't talk about that, “Well, firstly, without these three wrong paths first.” It's always a, “Oh, yes, obviously, we are smart people and we only make the correct decision.”And I remember in the before times sitting in conference talks, watching people talk about great things they'd done, and I'll turn next to the person next to me and say, “Wow, I wish I could be involved in a project like that.” And they'll say, “Yeah, so do I.” And it turns out they work at the company the speaker is from. Because all of these things tend to be the most positive story. 
Do you have an example of something that you have done in your production environment where, going back, you'd say, “Yeah, in hindsight, I would have done that completely differently”?Peter: Yeah. So, coming from Heroku moving into AWS, we had a great open-source project called Empire, which kind of bridged that gap between them, but used Amazon's ECS in order to launch applications. It was actually command-line compatible with the Heroku command when it first launched. So, a very big commitment there. And at the time—I mean, this comes back to the point I think you and I were talking about earlier, where architecture, costs, infrastructure, they're all interlinked.And I'm a big fan of Conway's Law, which says that an organization's structure needs to match its architecture. And so six, seven years ago, we were a heavy growth-based company and we had interns running around, doing all the things, and we wanted to have really strict guardrails and a narrow set of things that our development team could do. And so we built a pretty constrained model: You will launch, you will have one Docker image per ECS service, it can only do these specific things. And this allowed our development team to focus on pretty buttons on the screen and user engagement and experiments and whatnot, but as we've evolved as a company, as we built out a more robust business, we've started to track revenue and costs of goods sold more aggressively, we've seen there's a lot of inefficient things that come out of it.One particular example was we used PgBouncer for our connection pooling to our Postgres application. In the traditional model, we had an auto-scaling group for PgBouncer, and then our auto-scaling groups for the other applications would connect to it. And we saw additional latency, we saw additional cost, and we eventually kind of wound that down and packaged that PgBouncer alongside the applications that needed it. 
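Moving PgBouncer from its own auto-scaling group to a sidecar alongside each application mostly changes where the pooler listens and how the app reaches it: over localhost instead of across the network through another fleet. A sketch of what a minimal pgbouncer.ini for the sidecar layout might look like; the hostname, credentials path, and pool sizes are illustrative placeholders, not Remind's actual configuration:

```ini
; pgbouncer.ini: pooler co-located with the app container, so the app
; connects over localhost rather than to a separate PgBouncer fleet.
[databases]
; apps connect to localhost:6432/appdb; PgBouncer fans into Postgres
appdb = host=postgres.internal port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling keeps the Postgres connection count low even
; with many app processes sharing the sidecar
pool_mode = transaction
default_pool_size = 20
```

The latency win Peter mentions falls out of the topology: one localhost hop replaces a hop through a load balancer to a separate pooling tier.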
And this was a configuration that wasn't available on our first pass; it was something we intentionally did not provide to our development team, and we had to unwind that. And when we did, we saw better performance, we saw better cost efficiency, all sorts of benefits that we care a lot about now that we didn't care about as much, many years ago.Corey: It sounds like you're describing some semblance of an internal platform, where instead of letting all your engineers effectively, “Well, here's the console. Ideally, you use some form of Infrastructure as Code. Good luck. Have fun.” You effectively gate access to that. Is that something that you're still doing or have you taken a different approach?Peter: So, our primary gate is our Infrastructure as Code repository. If you want to make a meaningful change, you open up a PR, got to go through code review, you need people to sign off on it. Anything that's not there may not exist tomorrow. There's no guarantees. And we've gone around, occasionally just shut random servers down that people spun up in our account.And sometimes people will be grumpy about it, but you really need to enforce that culture that we have to go through the correct channels and we have to have this cohesive platform, as you said, to support our development efforts.Corey: So, you're a messaging service in education. So, whenever I do a little bit of digging into backstories of companies and what has made, I guess, an impression, you look for certain things and explicit dates are one of them, where on March 13th of 2020, your business changed just a smidgen. What happened other than the obvious, we never went outside for two years?Peter: [laugh]. So, if we roll back a week—you know, that's March 13th, so if we roll back a week, we're looking at March 6th. On that day, we sent out about 60 million messages over all of our different mediums: Text, email, push notifications. 
On March 13th that was 100 million, and then, a few weeks later on March 30th, that was 177 million. And so our traffic effectively tripled over the course of those three weeks. And yeah, that's quite a ride, let me tell you.Corey: The opinion that a lot of folks have who have not gotten to play in sophisticated distributed systems is, “Well, what's the hard part? You have an auto-scaling group. Just spin up three times the number of servers in that fleet and problem solved. What's challenging?” A lot, but what did you find that the pressure points were?Peter: So, I love that example, that your auto-scaling group will just work. By default, Amazon's auto-scaling groups only support 1000 backends. So, when your auto-scaling group goes from 400 backends to 1200, things break, [laugh] and not in ways that you would have expected. You start to learn things about how database systems provided by Amazon have limits other than CPU and memory. And they're clearly laid out: there are network bandwidth limits and things you have to worry about.We had a pretty small team at that time and we'd gotten into this cadence where every Monday morning, we would wake up at 4 a.m. Pacific because, as part of the pandemic, our traffic shifted, so our East Coast users would be most active in the morning rather than the afternoon. And so about 7 a.m. on the East Coast is when everyone came online. And we had our Monday morning crew there, just looking to see where the next pain point was going to be.And we'd have Monday to walk through it all; Monday afternoon, we'd meet together and come up with our three or four hypotheses on what would break if our traffic doubled again, and we'd spend the rest of that week addressing those the best we could and repeat for the next Monday. And we did this for three, four, five weeks in a row, and finally, it stabilized.
But yeah, it's all the small little things, the things you don't know about, the limits in places you don't recognize that just catch up to you. And you need to have a team that can move fast and adapt quickly.Corey: You've been using Redis for six, seven years, something along those lines, as an enterprise offering. You've been working with the same vendor who provides this managed service for a while now. What are the fruits of that relationship? What is the value that you see in continuing to have a long-term relationship with vendors? Because let's be serious, most of us don't stay in jobs that long, let alone work with the same vendor.Peter: Yeah. So, coming back to the March 2020 story, many of our vendors started to see some issues here; various services weren't scaled properly. We made a lot of phone calls to a lot of vendors, and I was very impressed with how Redis Labs, at the time, was able to respond. We hopped on a call, and they said, “Here's what we think we need to do; we'll go ahead and do this. We'll sort this out in a few weeks and figure out what this means for your contract. We're here to help and support in this pandemic because we recognize how this is affecting everyone around the world.”And so I think when you get into those deeper relationships, those long-term relationships, it is so helpful to have that trust, to have a little bit of that give when you need it in times of crisis, and to know that they're there and willing to jump in right away.Corey: There's a lot to be said for having those working relationships before you need them. So often, I think that a lot of engineering teams just don't talk to their vendors, to the point where they may as well be strangers. And you'll see this most notably—at least I feel it most acutely—with AWS service teams.
They'll do a whole kickoff when the enterprise support deal is signed, three years go by, and both the AWS team and the customer's team have completely rotated since then, and they may as well be strangers. Being able to have that relationship to fall back on in those really weird, honestly high-stress moments has been one of those things where I didn't see the value myself until the first time I went through a hairy situation and found that it was useful.And now I bias the other way: instead of, “Oh, I can fit into the free tier of this service,” it's, “No, no, I'm going to pay and become a paying customer.” I'd rather be a customer that can have that relationship and pick up the phone than someone whining at people in a forum somewhere of, “Hey, I'm a free user, and I'm having some problems with production.” That just never felt right to me.Peter: Yeah, there's nothing worse than calling your account rep and being told, “Oh, I'm not your account rep anymore.” Somehow you missed the email, you missed who it was. Prior to Covid—and we saw this many, many years ago—one of the things about Remind is that every back-to-school season, our traffic 10Xes in about three weeks. And so we're used to emergencies happening and unforeseen things happening. We plan through our year and try to do capacity planning and everything, but we've been around the block a couple of times.And so we have a pretty strong culture now of leaning in hard with our support reps. We have them in our Slack channels. Our AWS team, we meet with often. Redis Labs, we have them on Slack as well. We're constantly talking about databases that may or may not be performing as we expect them to. They're an extension of our team. We have an incident; we get paged. If it's related to one of the services, we hit them in Slack immediately and have them start checking on the back end while we're checking on our side.
So.Corey: One of the biggest takeaways I wish more companies would have is that when you are dependent upon another company to effectively run your production infrastructure, they are no longer your vendor, they're your partner, whether you want them to be or not. And approaching it with that perspective really pays dividends down the road.Peter: Yeah. One of the things you get when you've been at a company for a long time, and in a relationship for a long time, is that you grow together. Sometimes there are painful points; sometimes you're on an old legacy version of their product that you're literally the last customer on, and you have to work with them to move off of it. But you were there six years ago when they were just starting out, and they've seen how you've grown, and you've seen how they've grown, and you've kind of been able to marry that experience together in a meaningful way.Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now.
Visit snark.cloud/oci-free that's snark.cloud/oci-free.Corey: Redis is, these days, a data platform. Back once upon a time, I viewed it as more of a caching layer. And I admit that the capabilities of the platform have significantly advanced since those days when I viewed it purely through the lens of cache. But one of the interesting parts is that neither one of those use cases, in my mind, blends particularly well with heavy use of Spot Fleets, but you're doing exactly that. What are you folks doing over there?Peter: [laugh]. Yeah, so as I mentioned earlier, coming back to some of the Twelve-Factor App design, we heavily rely on Redis as sort of a distributed heap. One of our challenges of delivering all these messages is that every single message has its in-flight state: Here's the content, here's who we sent it to, we wait for them to respond. On a traditional application, you might have one big server that stores it all in-memory, and you get the incoming requests, and you match things up. By moving all that state to Redis, all of our workers, all of our application servers, we know they can disappear at any point in time.We use Amazon's Spot Instances and their Spot Fleet for all of our production traffic. Every single web service, every single worker that we have runs on this infrastructure, and we would not be able to do that if we didn't have a reliable and robust place to store this data that is in-flight and currently being accessed. So, we'll have a couple hundred gigs of data at any point in time in a Redis database, just representing in-flight work that's happening on various machines.Corey: It's really neat seeing Spot Fleets being used as something more than a theoretical possibility. It's something I've always been very interested in, obviously, given the potential cost savings; they approach “cheap as free” in some cases.
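As an aside, the “distributed heap” pattern Peter describes—keeping every message's in-flight state in Redis so any stateless worker, on any instance, can pick it up—can be sketched roughly like this. The key names and fields are illustrative guesses, not Remind's actual schema, and a small dict-backed stand-in replaces a real redis-py client so the snippet is self-contained; the `hset`/`hgetall` calls mirror the real client's methods.

```python
import time


class FakeRedis:
    """Dict-backed stand-in mirroring the redis-py calls used below."""

    def __init__(self):
        self._data = {}

    def hset(self, key, mapping):
        self._data.setdefault(key, {}).update(mapping)

    def hgetall(self, key):
        return dict(self._data.get(key, {}))

    def delete(self, key):
        self._data.pop(key, None)


def enqueue_message(r, msg_id, recipient, body):
    # Record the in-flight state centrally before any worker touches it, so
    # the worker holding this message can disappear without losing the send.
    r.hset(f"msg:{msg_id}", mapping={
        "recipient": recipient,
        "body": body,
        "status": "in_flight",
        "sent_at": str(time.time()),
    })


def record_delivery(r, msg_id):
    # Any worker, on any instance, can mark completion from the shared state.
    if r.hgetall(f"msg:{msg_id}").get("status") != "in_flight":
        return False
    r.hset(f"msg:{msg_id}", mapping={"status": "delivered"})
    return True


r = FakeRedis()
enqueue_message(r, "42", "parent@example.com", "School closed tomorrow")
assert record_delivery(r, "42")
assert r.hgetall("msg:42")["status"] == "delivered"
```

Because the state lives in Redis rather than in any worker's memory, losing a Spot instance mid-delivery loses nothing but the few seconds of work that instance was doing.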
But it turns out—we talked earlier about the idea of being cloud-native versus the rickety, expensive data center in the cloud—an awful lot of applications are simply not built in a way that tolerates, yeah, we're just going to randomly turn off a subset of your systems, ideally with two minutes of notice, but all right, have fun with that. And a lot of times, it just becomes a complete non-starter, even for stateless workloads, just based upon how all of these things are configured. It is really interesting to watch a company that's been entrusted with an awful lot of responsibility embrace that mindset. It's a lot more rare than you'd think.Peter: Yeah. And again, you know, sometimes we overbuild things, and sometimes we go down paths that may have been a little excessive, but it really comes down to your architecture. You know, it's not just having everything running on Spot. It's making effective use of SQS and other queueing products at Amazon to provide checkpointing abilities, so you know that should you lose an instance, you're only going to lose a few seconds of productive work on that particular workload and be able to pick up where you left off.It's properly using auto-scaling groups. From the financial side, there's all sorts of weird quirks you'll see. You know, the Spot market has a wonderful set of dynamics where the big instances are much, much cheaper per CPU than the small ones are. And so it's structuring things in a way that you can colocate different workloads onto the same hosts and hedge against a host going down by spreading across multiple availability zones.
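The checkpointing idea Peter mentions—only acknowledging a queue message after the work is done, so a reclaimed Spot instance costs at most one redone unit of work—can be sketched like so. A toy in-memory queue stands in for SQS here (whose `receive_message`/`delete_message` and visibility-timeout semantics the toy mimics), and the instance loss is simulated explicitly; none of this is Remind's actual code.

```python
class ToyQueue:
    """In-memory stand-in for an SQS queue with at-least-once delivery."""

    def __init__(self, items):
        self._visible = list(items)
        self._in_flight = {}
        self._next_handle = 0

    def receive(self):
        # Like SQS receive_message: the item becomes invisible, not deleted.
        if not self._visible:
            return None
        item = self._visible.pop(0)
        handle = self._next_handle
        self._next_handle += 1
        self._in_flight[handle] = item
        return handle, item

    def delete(self, handle):
        # The checkpoint: only now is the work permanently acknowledged.
        self._in_flight.pop(handle)

    def reclaim(self):
        # Simulates the visibility timeout expiring after an instance dies:
        # unacknowledged work reappears for other workers.
        self._visible.extend(self._in_flight.values())
        self._in_flight.clear()


def drain(queue, worker, crash_after=None):
    """Process items, deleting each only after it is done."""
    done = []
    processed = 0
    while (msg := queue.receive()) is not None:
        handle, item = msg
        if crash_after is not None and processed >= crash_after:
            # Instance reclaimed before deleting: the item survives in-flight.
            return done
        done.append(worker(item))
        queue.delete(handle)
        processed += 1
    return done
```

A worker that "crashes" holding a message loses nothing durable: after `reclaim()`, another `drain` pass finishes the remaining items.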
I think there's definitely a point where having enough workload, having enough scale, allows you to take advantage of these things, but it all comes down to the architecture and design that really enables it.Corey: So, you've been using Redis for longer than I think many of our listeners have been in tech.Peter: [laugh].Corey: And the key distinguishing point for me between someone who is an advocate for a technology and someone who's a zealot—or a pure critic—is that they can identify use cases for which it is great and use cases for which it is not likely to be a great experience. In your time with Redis, what have you found that it's been great at and what are some areas that you would encourage people to consider more carefully before diving into it?Peter: So, we like to joke that five, six years ago, most of our development process was, “I've hit a problem. Can I use Redis to solve that problem?” And so we've tried every solution possible with Redis. We've done all the things. We have a number of very complicated Lua scripts that are managing different keys in an atomic way.Some of these have been more successful than others, for sure. Right now, our biggest philosophy is, if it is data we need quickly and it is data that is important to us, we put it in Enterprise Redis, the cloud product from Redis. For other use cases, there's a dozen things that you can use for a cache; Redis is great for a cache, memcache does a decent job as well; you're not going to see a meaningful difference between those sorts of products. Where we've struggled a little bit has been when we have essentially relational data that we need fast access to.
And we're still trying to find a clear path forward here because you can do it, and you can have atomic updates, and you can kind of simulate some of the ACID characteristics you would have in a relational database, but it adds a lot of complexity.And that's a lot of overhead for our team as we're continuing to develop these products, to extend them, to fix any bugs we might have in there. And so we're recalibrating a bit, and some of those workloads are moving to other data stores where they're more appropriate. But at the end of the day, if it's data that we need fast and data that's important, we're sticking with what we've got here because it's been working pretty well.Corey: It sounds almost like you started off with the mindset of one database for a bunch of different use cases and you're starting to differentiate into purpose-built databases for certain things. Or is that not entirely accurate?Peter: There's a little bit of that. And I think coming back to some of our tooling, as we kind of jumped on a bit of the microservice bandwagon, we would see, here's a small service that only has a small amount of data that needs to be stored. It wouldn't make sense to bring up an RDS instance, or an Aurora instance, for that, you know, with Postgres. Let's just store it in an easy store like Redis. And some of those cases have been great; some of them have been a little problematic.And so as we've invested in our tooling to make all our databases accessible, and to make it less of a weird trade-off between what the product needs, what we can do right now, and what we want to do long-term, and to reduce that friction, we've been able to be much more deliberate about the data store that we choose in each case.Corey: It's very clear that you're speaking with a voice of experience on this; this is not something that you just woke up and figured out.
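For the curious, the “Lua scripts managing different keys in an atomic way” Peter mentions usually look something like the following. The script is illustrative, not Remind's: Redis executes a Lua script atomically, which is what lets one script read and update multiple keys without races. A pure-Python emulation of the same logic is included so the idea is runnable here; with redis-py, the script itself would be registered via `register_script` and invoked with keys and args.

```python
# Illustrative Lua: move 'amount' between two counters only if the source
# has enough. Redis runs the whole script atomically, so no other client
# can observe or interleave with the intermediate state.
TRANSFER_LUA = """
local from = tonumber(redis.call('GET', KEYS[1]) or '0')
local amount = tonumber(ARGV[1])
if from < amount then return 0 end
redis.call('DECRBY', KEYS[1], amount)
redis.call('INCRBY', KEYS[2], amount)
return 1
"""


def transfer(store, src, dst, amount):
    """Pure-Python emulation of TRANSFER_LUA against a plain dict.

    In real Redis, the atomicity comes from the server executing the
    script without interleaving other commands; this is only a model
    of the logic for illustration.
    """
    if store.get(src, 0) < amount:
        return 0
    store[src] = store.get(src, 0) - amount
    store[dst] = store.get(dst, 0) + amount
    return 1


balances = {"account:a": 10}
assert transfer(balances, "account:a", "account:b", 4) == 1
assert balances == {"account:a": 6, "account:b": 4}
assert transfer(balances, "account:a", "account:b", 100) == 0  # insufficient
```

This is exactly the sort of simulated-transaction complexity Peter is describing: it works, but every such script is logic the team must maintain, debug, and extend by hand.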
One last area I want to go into with you is, when I asked you what you care about primarily as an engineering leader looking to serve your customers well, you effectively had a dual answer, almost off the cuff, of stability and security. I find the two of those things are deeply intertwined in most of the conversations I have, but they're rarely called out explicitly in quite the way that you do. Talk to me about that.Peter: Yeah, so in our wild journey, stability has always been a challenge. And we've always, you know, been in early startup mode, where you're constantly pushing: What can we ship? How quickly can we ship it? And in our particular space, we feel that this communication that we foster between teachers and students and their parents is incredibly important, and is a thing that we take very, very seriously. And so, a couple years ago, we were trying to create this balance and create not just a language that we could talk about on a podcast like this, but really a framing of these concepts for our company internally: To our engineers, to help them think, as they're building a feature, about what the concerns are beyond the product spec; to our marketing and sales team, to help them understand why we're making these investments that may not get a particular feature out by X date but are still worthwhile.So, from the security side, we've really focused on building out robust practices and robust controls that don't necessarily lock us into a particular standard, like PCI compliance or things like that, but really focus on the maturity of our company and, you know, our culture as we go forward. And so we're in a place now where we are ISO 27001; we're heading into our third year.
We leaned in hard on our disaster recovery processes, we've leaned in hard on our bug bounties and pen tests, and kind of found this incremental approach. You know, day one, I remember we turned on our bug bounty and it was a scary day as the reports kept coming in. But we take on one thing at a time and continue to build on it and make it an essential part of how we build systems.Corey: It really has to be built in. It feels like security is not something that can be slapped on as an afterthought, however much companies try to do that. Especially, again, as we started this episode with: you're dealing with communication with people's kids. That is something that people have remarkably little sense of humor around. And rightfully so.Seeing that there is as much care, if not more, taken around security as around stability is generally the sign of a well-run organization. If there's a security lapse, I expect certain vendors to rip the power out of their data centers rather than run in an insecure fashion. And your job done correctly—which clearly you have done—means that you never have to make that decision because you've approached this the right way from the beginning. Nothing's perfect, but actually caring about it is always the first step.Peter: Yeah. And the other side of that, talking about stability: again, it's avoiding the either/or situation. Alongside those two—stability and security—we work in our cost of goods sold and our operating leverage in other aspects of our business. And in every single one of them, our co-number one priorities are stability and security. And if it costs us a bit more money, if it takes our dev team a little longer, there's not a choice at that point. We're doing the correct thing.Corey: Saving money is almost never the primary objective of any company that you really want to be dealing with unless something bizarre is going on.Peter: Yeah.
Our philosophy on, you know, any cost reduction has been: this should have zero negative impact on our stability. If we do not feel we can safely do this, we won't. And coming back to the Spot Instance piece, that was a journey for us. We tested the waters a bit, we worked very closely with Amazon's team, and we came to the conclusion that we could safely do this. And we've been doing it for over a year and seen no adverse effects.Corey: Yeah. And at a lot of shops I've talked to, when we go and do a consulting project, it's, “Okay. There's a lot of things that could have been done before we got here. Why hasn't any of that been addressed?” And the answer is, “Well. We tried to save money once and it caused an outage and then we weren't allowed to save money anymore. And here we are.” And I absolutely get that perspective. It's a hard balance to strike. It always is.Peter: Yeah. The other aspect where stability and security intertwine is that you can think about security as InfoSec, as locking our systems down, but at the end of the day, why are we doing all that? It's for the benefit of our users. And for Remind, as a communication platform, the safety and security of our users depends on us being up and available so that teachers can reach out to parents with important communication: things like attendance, natural disasters, lockdowns, or any number of difficult situations schools find themselves in. This is part of why we take the stewardship that we have so seriously; being up and protecting our users' data just has such a huge impact on education in this country.Corey: It's always interesting to talk to folks who insist they're making the world a better place.
And it's, “What do you do?” “We're improving ad relevance.” I mean, “Okay, great, good for you.” You're serving a need that I would not shy away from classifying, fundamentally, as critical infrastructure, and that is always a good conversation to have. It's nice being able to talk to folks who are doing things that you can unequivocally look at and say, “This is a good thing.”Peter: Yeah. And around 80% of public schools in the US are using Remind in some capacity. So we're not a product that's used in just a few regions; it's all across the board. One of my favorite things about working at Remind is meeting people and telling them where I work, and they recognize it.They say, “Oh, I have that app, I use that app. I love it.” And I spent years in ads before this, and you know, I've been there, and no one ever told me they were glad to see an ad. That's never the case. And it's been quite a rewarding experience coming in every day and, as you said, being part of this critical infrastructure. That's a special thing.Corey: I look forward to installing the app myself as my eldest prepares to enter public school in the fall. So, now at least I'll have a hotline of exactly where to complain when I don't get the attendance message because, you know, there's no customer quite like a whiny customer.Peter: They're still customers. [laugh]. Happy to have them.Corey: True. We tend to be. I want to thank you for taking so much time out of your day to speak with me. If people want to learn more about what you're up to, where's the best place to find you?Peter: So, from an engineering perspective at Remind, we have our blog, engineering.remind.com. If you want to reach out to me directly, I'm on LinkedIn—a good place to find me—or you can just reach out over email directly: peterh@remind101.com.Corey: And we will put all of that into the show notes. Thank you so much for your time.
I appreciate it.Peter: Thanks, Corey.Corey: Peter Hamilton, VP of Technology at Remind. This has been a promoted episode brought to us by our friends at Redis, and I'm Cloud Economist Corey Quinn. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment that you will then hope that Remind sends out to 20 million students all at once.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Philip
Philip Griffiths is VP Global Business Development and regularly speaks at events from DevOps to IoT to Cyber Security. Prior to this, he worked for Atos IT Services in various roles working with C-suite executives to realise their digital transformation. He lives in Cambridge with his wife and two daughters.Links: NetFoundry: https://netfoundry.io/ Blog article: https://netfoundry.io/demystifying-the-magic-of-zero-trust-with-my-daughter-and-opensource/ netfoundry.io/screaminginthecloud: https://netfoundry.io/screaminginthecloud TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance, Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's promoted episode is about a topic that is near and dear to my heart. In the AWS universe, we have seen over time that the networking has gotten more and more capable, going from EC2 Classic to the world of VPCs to a whole bunch of other things. But with that capability comes a stupendous amount of complexity, to the point where the easy answer to, “Do you understand how networking works within AWS?” is, of course, “No, I don't.”I'm joined today by Philip Griffiths, who's the Head of Business Development at NetFoundry. Philip, thank you for joining me.Philip: Pleasure to be here, Corey.Corey: So, NetFoundry has what I would argue to be one of the most intriguing-slash-differentiated approaches to handling that ever-increasing complexity around the networking story, not just in AWS, but in a number of different cloud providers, and between them, and that approach is to ignore it completely.
Have I nailed the salient approach here with that—I guess we'll call it a flippant statement?Philip: Yeah, I'd probably say so. It's the interesting thing where a lot of people say cloud networking is hard, and from our perspective, it should just be super easy: you should be able to provision it in a few minutes with only outbound ports, and set up your policy so that malicious actors can't get inside it. It should be that easy, and programmable, and it's a shame that the current world is not.Corey: One of the hard problems has always been, I guess, security, which is the thing that everyone pretends to care about right up front, but in practice often winds up bolting on after the fact. “We care about security,” is sort of the trademark phrase of the things that we see, usually in an email announcing a data breach when it was very clear that the company did not care about security. It's not just me complaining about how complex the network stack is, but about what directly flows from that. If you aren't able to fit all of that into your head as far as what's going on from a security perspective, the odds of misconfiguration creep in and you don't really become aware of what your risk exposure is. I'm really partial to the idea of just avoiding it entirely. Is NetFoundry, effectively, a network overlay? Is it something that goes a bit beyond that? Effectively, where do you folks start and where do you stop?Philip: Yes, that is precisely correct. We are a network overlay that's been built on the principles of zero trust. What is very unique is the ability to start it wherever you want. So yes, you can deploy it from the AWS Marketplace in a few minutes into your VPC or into your operating system, but we also have the ability to put it directly into the application stack itself, which has some very interesting implications. What I find to be the most interesting starting point is the oxymoron of secure networking.There are no secure networks.
It's not possible. Networks are designed to share information, and taking it to first principles, you can only isolate networks. And this is why we had the thought process that if we're going to put our overlay network into stuff and make it secure, we have to start at the application level, because then we can isolate it to an application communicating with an application, which has profound implications.Corey: The network part is relatively straightforward. I imagine it just becomes, more or less, what resembles a fairly flat network where everything internal is allowed to talk to each other, and then, in turn, this winds up effectively elevating what should be allowed to talk to what, and on what ports and whatnot, into something that's a lot closer to the application logic, and transcends whatever provider it happens to be traversing.Philip: Yeah, correct. Following the principles of zero trust, we utilize strong embedded identity as a function of what the endpoints are, what the source and destination is. And therefore you build up your policies and services to say what should communicate to what, on the basis that the default is least privilege: absolutely nothing. For your underlay, then, the only thing you need is commodity internet with outbound ports. The whole concept of north-south, east-west—if you're app-embedded, you don't even need public DNS; you don't even need DNS at all. Naming conventions go out the window; you don't need to conform to the standards. You know, you could say, “I want to hit Jenkins.” You go to Jenkins because that can be done.Corey: I would approach this entire endeavor with a fair bit of suspicion and no small amount of alarm if it were something that you had developed internally, as far as, “Well, we're just going to replace what amounts to your entire network stack and just go ahead and trust us. It's fine.” But you didn't do that. You're riding on top of the OpenZiti open-source project.
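The default-deny model Philip describes—strong identity on both ends, named services instead of DNS, and nothing reachable unless a policy explicitly allows it—reduces, conceptually, to something like this toy sketch. It illustrates the principle only; it is not NetFoundry's or OpenZiti's actual API, though in OpenZiti the rough equivalents are identities, services, and service policies.

```python
class Overlay:
    """Toy model: a dial succeeds only if (identity, service) was granted."""

    def __init__(self):
        # Least privilege: the allow set starts empty, so by default
        # absolutely nothing can talk to anything.
        self._allowed = set()

    def grant(self, identity, service):
        # A policy pairs a strong endpoint identity with a named service;
        # no IPs, no DNS, just "this identity may reach 'jenkins'".
        self._allowed.add((identity, service))

    def dial(self, identity, service):
        # Default deny: anything not explicitly granted is rejected.
        return (identity, service) in self._allowed


net = Overlay()
net.grant("ci-runner", "jenkins")
assert net.dial("ci-runner", "jenkins") is True
assert net.dial("random-laptop", "jenkins") is False  # invisible by default
```

Note what is absent: there is no notion of a reachable subnet or an open inbound port, which is why the underlay can be plain commodity internet with outbound-only connections.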
And that basically assuages a whole raft of concerns I would have if something like this were proprietary, and people who know what they're doing—who, let's be clear, aren't me—were not able to inspect it and say, "Okay, this passes muster"—as they have done—or alternately, "No, this is terrifyingly dangerous for a variety of excellent reasons." And it really feels like a lot of the zero-trust stories that we see these days that are taking advantage of either a network overlay approach or shifting authentication into a different layer have all taken a somewhat similar tack. I used to think it was a good idea; now I'm starting to suspect it might very well be the only viable model. Do you find that that's accurate, or was this a subject of some contention when you were starting out?

Philip: So, there's two very interesting [sigh] thoughts that came to me as you were saying that. The number one is yes, we drove forward with OpenZiti because we've seen open-source just completely dominate the industry and everything new that's been built. If you want to deploy an application, you're building on Linux. And in fact, you're probably [laugh] also running on Kubernetes if you're building new. And our objective was to be able to turn OpenZiti into, you know, the open-source zero-trust private network equivalent where it's just standard: You'll bake your application with Ziti, by design. It will become a check function that people say you have to comply to. When I look at other vendors and how they look at zero trust, I broadly see a few things that dishearten me. And again, it's a big market, a lot of people—everyone says they're zero-trust nowadays—but I broadly categorize it into a few ways. You have people who are effectively acting as a proxy and they're adding authentication as a way to check what people should have access to. And they may give access to the whole network, they may do granular; it varies between them. In fact, I've just written a blog on this where I effectively call that no-magic zero trust. It's a blog conceptualized within Harry Potter and [unintelligible 00:07:36] a conversation with my daughter.

Corey: Yeah, any way to tell a story that beats the traditional enterprise voice is very much appreciated over in this corner of the world.

Philip: [laugh]. Yeah, exactly. You have a second tier, which is what I like to think of as semi-magical. And that's where you start saying, I am going to use a software-defined perimeter. So, it's first-packet authenticate, or outbound-only based upon embedded identity. And in my eyes, this is basically an invisibility cloak. You then have app-embedded or magical zero trust. And this is where you're putting the invisibility cloak inside your application, but you're also giving it a port key so that when it needs to connect to something else on the other side of the world, it just happens; it's transparent. And broadly speaking, I think it's very good that the whole world, including the US government, is taking zero trust incredibly seriously, but the distribution of how people tackle a problem is wildly different. There are some zero-trust solutions which are going in the right direction, but fundamentally, if you're putting it in front of your—I won't name a vendor, but there was a vendor who in December released a report that said in 90 seconds, common vulnerabilities are exploited something like 96% of the time. 24 hours, 100%. A few days later, they had a 9.8 CVE on their zero-trust VPN concentrator with a public IP, to which I thought, "If you're not patching that immediately, you've got problems if someone is coming into your network."

Corey: Absolutely. We just completed our annual security awareness training here, and so much of it just… it really made my skin crawl. There was an entire module on how to effectively detect phishing emails, and I got to tell you, if they ever start running spellcheck on some of their [spear-phishing 00:09:23] campaigns, then we're all doomed, because that was what the entire training was here. My position is that, okay, if someone in your company clicks a bad link and it destroys the company's infrastructure, maybe it's the person who's clicking the link that is not necessarily the critical failure point here. Great, if someone compromises an employee workstation, there should be a way to contain the blast radius; they should not now be inside the walls and able to traverse into whatever it is that they want. There should be additional barriers, and zero trust—though it has become, as you say, a catch-all term—seems to be a serious way of looking at this type of compromise and this sort of mitigation against that sort of behavior.

Philip: Definitely. And I think that leads itself to, if you're using the correct zero-trust solution, you're able to close [unintelligible 00:10:12] ports, great, you've now massively reduced your attack surface. But what if someone does get a phishing injection of ransomware or something to their endpoint or into their servers? The two things that I like to think about are these: if you're creating your overlay network so that the only communication from your server is outbound into the public IPs of your private overlay, then effectively even if the ransomware gets in there, it can't then connect to its command and control module to then go through the kill cycle to other activities. The other is that if you then look at it [instead 00:10:46] of on the server-side, but actually on the client-side, if someone infects my Mac laptop with ransomware—we use this internal application called Mattermost. And it's basically Slack, but open-source. If my Mattermost is Ziti-fied, even if I've got ransomware on my device, it can't side-channel attack into Mattermost, because you would actually have to break into the Mattermost application and somehow get that Mattermost application to make a compromised query or whatever to get past the system. So really, when I look at zero trust, it's not about saying, "We're secure. Job done. You know, fire the security department because we don't need them anymore." It's all about saying—

Corey: Box check. Hand it off to the auditor.

Philip: [laugh]. Exactly. It's more about saying the cost of attack, the cost of compromise, is increased, ideally, to the point where the malicious actors don't have a return on investment. Because if they don't have a return on investment, they will find something else that's not your applications and your systems to try and compromise.

Corey: I want to make sure that I'm contextualizing this properly, because we're talking—I think—about what almost looks like two different worlds here. There's the, this is how things wind up working in the ecosystem as far as your server environment goes in a cloud provider, but then we're also talking about what goes on in your corporate network of people who are using laptops, which is increasingly being done from home these days. Where do you folks start? Where do you stop? Do you transcend into the corporate network as well, or is this primarily viewed as a production utility?

Philip: We do. One of our original design principles with OpenZiti was for it to be a platform rather than a point solution. So, we designed it from the ground up to be able to support any IP packets, TCP, UDP, et cetera, whether you're doing client-server, server-server, machine-server, server-initiated, client-initiated, yadda, yadda, yadda. So effectively, the same technology can be applied to many different use cases, depending on where you want to use it. We've been doing work recently to handle, let's call them, the hard use cases. Probably one of the hardest ones out there is VoIP. There is a playbook that is currently taking place where the VoIP-managed service provider gets DDoSed by malicious actors; the playbook is to move it onto a CDN so that you move the attack surface and you get respite for a few hours. And there's not really any way to solve it, because blocking DDoS attacks at layer 3, layer 4 is incredibly difficult unless you can make your PBX dark. And I've seen a couple of our OpenZiti engineers making calls from one device to another without going through the PBX by doing that over OpenZiti, and being able to solve some of the challenges that are normally associated with VoIP. Again, it was really one of our design principles: How can we make the platform so flexible that we can do X, Y, Zed today, and we're able to build it, again, to become a standard, because it can handle anything.

Corey: One of the big questions that people are going to have going into this is, and this may sound surprising, a little bit less about technical risk of things like encryption and the rest, and a lot more around the idea of okay, does this mean that what you are building becomes a central point of business risk? In other words, if the NetFoundry SaaS installation—or wherever they happen to be running as their primary—winds up going down, does that mean suddenly nothing can talk to one another? Because it turns out that, you know, computers are not particularly useful in 2022 if they aren't able to talk to other computers, by and large. "The network is the computer," as was famously stated. What is the failure mode in the event that you experience technical interruption?

Philip: We have these internal sessions, which we call Ziti Kitchens, where the engineering team that's creating Ziti educates us on stuff that they're building. And one of them in the Ziti Kitchen was around HA, HS, et cetera, and all of the functions that we've built in so that you have redundancy and availability within the different components. Because effectively it's an overlay network, so we've designed it to be a mesh overlay network. You can set it up with one point of failure, but then simultaneously, you can very easily set it up to have no points of failure, because it can have that redundancy, and the overlay has its own mechanisms to do things like smart routing and calculation of underlying costs. That cost in that instance would be, well, AWS has gone down, so the latency to send a packet or flow over it is incredibly high, therefore I'm going to avoid that route and send the traffic to another location. I always remember this Ziti Kitchen episode because the underlying technology that does it is called Terminators—Ziti has these things called Terminators—and on one of the slides there were these little Terminator heads with the red eyes, you know, the silver exoskeleton, which always made me laugh.

Corey: It's helpful to have things that fail out of band as opposed to—think of the traditional history in security before everything was branded with zero-trust as a prerequisite for exhibiting at RSA; before that, firewalls were the story, and the question always was, if a firewall fails, do you want it to fail open or fail closed? And believe it or not, there are legitimate answers in both directions; it depends on context and what you're doing. There are some things, for example, IAM in a cloud world, where you absolutely never want to fail open, full stop. You would rather someone bodily rip the power cable out of the back of the data center rather than let that happen.
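Philip's description of the mesh—every link carries a cost, and when a provider goes down its cost spikes and traffic reroutes around it—can be sketched as a toy shortest-path calculation. This is purely illustrative: the node names are invented and this is not NetFoundry's actual routing code.

```python
# Toy model of the overlay's "smart routing": each link carries a cost (here,
# latency), traffic takes the cheapest path, and when a provider degrades its
# link cost spikes, so the route recalculates around it.

import heapq

def cheapest_path(links, src, dst):
    """Dijkstra over a dict of {(a, b): cost}; links are treated as bidirectional."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + c, nxt, path + [nxt]))
    return float("inf"), []

links = {("client", "aws-edge"): 5, ("aws-edge", "app"): 5,
         ("client", "gcp-edge"): 20, ("gcp-edge", "app"): 20}
print(cheapest_path(links, "client", "app"))  # routes via the cheap aws-edge hop

links[("client", "aws-edge")] = 10_000        # "AWS has gone down": latency spikes
print(cheapest_path(links, "client", "app"))  # reroutes via gcp-edge instead
```

The overlay's per-link cost plays the same role as a routing protocol metric, except it is recomputed continuously from observed conditions rather than configured once.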
With something like this, where nothing is able to talk to one another if the entire system goes down, yeah, you want the control system that you folks run to be out of band; that is almost always the right answer. As I look at the various case studies that you have on your website and the serious companies that are using what you have built, do you find that they are primarily centralizing around individual cloud providers? Are you seeing that they're using this as an expression of multi-cloud? Because I can definitely see a story where, oh, it helps bring two cloud providers from a networking and security perspective onto the same page, but I can also see, even within one cloud provider, the idea that, hey, I don't have to play around with your ridiculous nonsense. What use cases are you seeing emerge among your customers?

Philip: Definitely, the multi-cloud challenge is one that we're seeing as an emerging trend. We do a lot of work with Oracle and, you know, their stated position is multi-cloud is a fact. In fact for them, if we make the secure networking easier, we can bring workloads into our cloud quicker [unintelligible 00:17:21] the main driver behind our partnership. We recently did a blog talking about Superclouds and the advent of organizations like Snowflake and HashiCorp and Confluent and Databricks basically building value and business applications which abstract away the underlying complexity. But you get into the problem of the standard shared security model, where the customer has to deal with DNS and VPNs and MPLS and AWS Private Endpoint or Azure Private Link or whatever they call it, and you have to assemble this Frankenstein of stuff just to enable a VM to communicate with another VM. And the posit of our blog—in fact, we use that exact quote—John Gage—"The network is the computer." If you can put a network inside the application, you've now given your supercloud superpowers because [unintelligible 00:18:13] natively—I mean, this is a very marketing term, but, "Develop once; deploy anywhere," and be multi-cloud-native.

Corey: The idea of being able to adapt to emerging usage patterns without a full-on redeploy is handy. What I also would like to highlight, too, is that you are, of course, a network overlay, and that is something that is fairly well understood and people have seen it, but your preferred adoption model goes up a couple of steps beyond that into altering the way that the application thinks about these things. And you offer an SDK that ranges from a single line of code implementation to, I think, up to 20, so it's not a massive rewrite of the application, but it does require modification of the stack. What does that buy you, for lack of a better term? Because once the application becomes aware of what is effectively its own "special network," quote-unquote—it's work to wind up modifying existing applications around something like this. What's the payoff?

Philip: So, there's three broad ones that immediately come to my mind. Number one is the highest security that effectively—your private network is inside the app, so you have to somehow break into the app, and that can be incredibly complicated, particularly if you run the app in something like a confidential compute enclave; you can now have a distributed confidential system. The second is what you're getting in programmability. You're able to effectively operate in a fully—even, you know, you get to a GitOps environment. We're currently working on documentation which says, "Hey, you can do all this stuff in GitOps and then it'll go into your CI/CD and that'll talk to the APIs." And it'll effectively do everything in a completely programmable manner so that you can treat your private networks as cattle rather than as pets. The third is transparency. You used the words earlier of bolt-on networking, because that's how we always think about networking security: We bolt it on. As a user, we have to jump through the VPN hoop, we have to go through the bastion, we have to interact with the network. If your private network's inside the application, then you interact with the application. I can have a mobile application on my device and have no idea that it's part of a private network and that the API is private and the malicious actors can't get to it. I just interact with the application. That is it. That is what no one else has the ability to do, and where OpenZiti has its most power, because then you get rid of the constant tug of war between the security team that wants to lock everything down and the users and the developers who want to move fast and give a great experience. You can effectively have your cake and eat it.

Corey: The challenge, of course, with rolling a lot of these things out in a way that becomes highly programmable is that it unlocks a bunch of capability, but the double-edged sword there is always one of complexity. I mean, we take a look at the way that AWS networking has progressed, and they finally rolled out the VPC Reachability Analyzer, so when two things can't talk to each other, well, you run this thing and it tells you exactly why, which is super handy.
And then, just as a way of twisting the knife a little bit, every time you run it, they charge you ten cents for the privilege, which doesn't actually matter in the context of what anyone is being compensated for, until and unless you build this into something programmatic, but it stings a little bit. And the idea of being able to program these things to abstract away a lot of that complexity is incredibly compelling, except for the part where now it feels like it really increases developer burden on a lot of these things. Have you found that to be true? Do you find that it is sort of like a sliding scale? What has the customer experience been around this?

Philip: I would say a sliding scale. You know, we had one organization who started with the OpenZiti Tunnelers, and then we convinced them to use the SDK and [unintelligible 00:21:51], "Oh, this was super easy." And now they just run OpenZiti themselves. But then they've also said at some point, we'll use the NetFoundry platform, which effectively gives us a SaaS experience in consuming that. One of the huge focuses—well, we've got a few big focuses for product development, but one of the really big areas is giving more visibility and monitoring, so that rather than people having to react to configuration problems or things which they need to fix in order to ensure your perfect network overlay, instead, those things are being seen and automatically dealt with—human-in-the-loop if you want it—in order to remove that burden. Because ultimately, if you can get the network to a point where, as long as you've got underlay and you've set your policy, the overlay is going to work, it's going to be secure, and it's going to give you the uptime you need, that is the Nirvana that we all have to strive for.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high-performance cloud compute at a price that—while sure, they claim it's better than AWS pricing—and when they say that, they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's V-U-L-T-R.com slash screaming.

Corey: A common criticism of things that, shall we say, abstract away the network is a fairly common, predictable failure mode. I've been making fun of Kubernetes on this particular point for years, and I'm annoyed that at the time that we're recording this, that is still accurate. But from the cloud providers' perspective, when you run Kubernetes, it looks like one big, really strangely behaved single-tenant application. And Kubernetes itself is generally not aware of zone affinity, so it could just as easily wind up tossing traffic to the node next to it at zero cost, or across an availability zone at two cents per gigabyte, or, God forbid, across the internet at nine cents a gigabyte and counting, depending upon how it works. And the application side has absolutely no conception of this. How does OpenZiti address this in the real world? Because it's one of those things where it almost doesn't matter what you folks charge on top of it, but instead, oh wow, this winds up being so hellaciously expensive that we can't use it regardless of whatever benefit it provides, just because it becomes a non-starter.

Philip: So, when we built the overlay and the mesh, we did it from the perspective of making it as programmable and self-driven as possible. So, with the whole Terminator strategies that were mentioned earlier, it gives you the ability to start putting logic into how you want packets to flow. Today, it does it on a calculation of end-to-end latency, and chooses and reroutes traffic accordingly. But there's no reason that you couldn't hook it up into understanding what is the numerical, monetary cost for sending a packet along a certain path. Or even, what is my application performance monitoring tool saying? Because what that says versus what the network believes could be different things. And effectively you can ingest that information to make your smart routing decisions, so all of that logic can exist within the overlay that operates for you.

Corey: I will say that really harkens back, on some level, to what I was experimenting with back when I got my CCNA many years ago, where routing protocols have built into them the idea of the cost of a link. I will freely admit slash confess that at the time, I assumed the low-cost link was about what was congested or what would wind up having, theoretically, some transit versus peering agreement. It never occurred to me that I'd have to think about those things in a local network and have to calculate in the Byzantine pricing models of cloud providers.
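The idea of folding monetary cost into the routing decision can be sketched with the per-gigabyte rates Corey mentions above (free to the node next door, roughly two cents cross-AZ, nine cents over the internet). The replica names and the dollar weighting are invented for illustration; this is not how OpenZiti actually computes link costs.

```python
# Sketch: score each candidate path by latency plus a tunable penalty for
# egress dollars, so the free same-AZ replica wins until it degrades.

PER_GB = {"same-az": 0.00, "cross-az": 0.02, "internet": 0.09}

def path_cost(latency_ms: float, link_type: str, gb: float,
              dollar_weight: float = 100.0) -> float:
    """Composite cost: latency plus a weighted penalty for transfer dollars."""
    return latency_ms + dollar_weight * PER_GB[link_type] * gb

paths = {
    "replica-local":    path_cost(1.0,  "same-az",  gb=10),
    "replica-other-az": path_cost(1.5,  "cross-az", gb=10),
    "replica-remote":   path_cost(30.0, "internet", gb=10),
}
best = min(paths, key=paths.get)
print(best)  # replica-local: free and fast, until it degrades
```

Tuning `dollar_weight` is exactly the business knob discussed next: a high weight means "stay local even if it hurts," a low weight means "pay for the expensive link rather than degrade."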
But I've seen examples of folks who are using OpenZiti and NetFoundry alike to wind up building in these costing models, so that yeah, ideally, it just keeps everything local, but if that path degrades, then yes, we would prefer to go over an expensive link than to basically have TCP terminate on the floor until everything comes back up. It sort of feels like there's an awful lot of logic you can bake in that goes well beyond what routing protocols are capable of, just by virtue of exposing that programmability. Well, for this customer, because they're on the pre—on the extreme tier, then we want to have the expensive fallback; for low-tier customers, we might want to have them just have an outage until things end. And it really comes down to letting business decisions express themselves in terms of application behavior while in a degraded state. I love that idea.

Philip: Yeah, I understand. We don't do it today, but there will be a point in the future—I strongly believe—that we'll be able to say, hey, I'll give you an SLA on the internet. Because we'll have such path diversity and visibility of how the internet operates that we'll be able to say within certain risk parameters what we can deliver. But then you can take it to other logical extremes. You could say, "Hey, I want to build a green overlay. I want to make sure that I'm using Arm instances in data centers with renewable energy so that my network is green." Or you could say, I want a GDPR-compliant overlay so that my data stays within a certain country. You start being able to say—you know, really start dreaming up what are the different policies that I can apply to this, because you're applying a central policy to what is then a distributed system.

Corey: One last topic I want to cover before we call it an episode is that you are, effectively, a SaaS company that is built on top of an open-source project. And that has been an interesting path for a lot of companies that, early on, figured that since they wrote the software and their contributors were doing the lion's share of contribution, they were clearly the best people to run it. And Amazon's approach towards operational excellence—as they called it—wound up causing some challenges when they launched the Amazon Basics version of that service. I feel like there are some natural defenses built into OpenZiti to keep it from suffering that fate, but I'm very curious to get your take on it.

Philip: Fundamentally, our take is that—in fact, our mission is to take what was previously impossible and turn it into a standard. And the only way you can really create standards is to have an open-source project that is adopted by the wider community and that ecosystems get built around and into. And that means giving OpenZiti to absolutely everyone so that they can use it, they can innovate on top of it. We all know that very few people actually want to host their own infrastructure, so we assume a large percentage of people will come and go, "Hey, NetFoundry, you provide us the hosting, you provide us the SaaS capability so we don't have to do that ourselves." But fundamentally, in the knowledge that there's something bigger, because it's not just us maintaining this project; there's a bunch of people who are doing pull requests and finding cool, fun ways to build further value on what we can build ourselves. We believe recent history is littered with examples of the new world built on open-source. And fundamentally, we think that's really the only way to be able to change an industry as profoundly as we intend to.

Corey: I would also argue that, to be very direct—and I can probably get away with saying this in a way that I suspect you might not be able to—but if AWS had it in their character to simplify things and make it a lot easier for people to work with in a networking sense, what's stopping them? They didn't need to wait for an open-source company to wind up coming out of nowhere and demonstrating the value of this. Customers have been asking for it for years. I think that at this point, this is something that is unlikely to ever wind up being integrated into a cloud provider's primary offering. Until and unless the entire industry shifts, at which point we're having a radically different conversation very far down the road.

Philip: Yeah, potentially, because it opens up the interesting question that if you make it so easy for someone to take their data out, do they use your cloud less? There are some cloud providers that will lean into that because they do see more clouds in the future, and others that won't. I see it more myself that as those kinds of things happen, it'll be done on a product-by-product basis. For example, we're talking to an organization, and [unintelligible 00:29:49] like, "Oh, could you Ziti-fy our JDBC driver so that when users access our database, they don't have to use a VPN?" [unintelligible 00:29:55], "Yeah. We've already done that with JDBC. We called it ZDBC." So, instead of using the general industry one—probably the Oracle one or something, because that's kind of standard—we'll take the one that you've created for yourself and be able to solve that problem for you.

Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place to find you?

Philip: The best place to go to is netfoundry.io/screaminginthecloud. From there, anyone can grab some free Ziggy swag. Ziggy's our little open-source mascot, a cute little piece of pasta with many different outfits. A little sass as well. And you can find further information both on OpenZiti and NetFoundry.

Corey: And we will put links to both of those in the [show notes 00:30:40]. Thanks so much for taking the time to speak with me today. I really appreciate it.

Philip: It's a pleasure. Thanks, Corey.

Corey: Philip Griffiths, Head of Business Development at NetFoundry. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment telling me exactly why I'm wrong about AWS's VPC complexity, and that comment will get moderated and I won't get to read it until you pay me ten cents to tell you how it got moderated.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
About ClintClint is the CEO and a co-founder at Cribl, a company focused on making observability viable for any organization, giving customers visibility and control over their data while maximizing value from existing tools.Prior to co-founding Cribl, Clint spent two decades leading product management and IT operations at technology and software companies, including Splunk and Cricket Communications. As a former practitioner, he has deep expertise in network issues, database administration, and security operations.Links: Cribl: https://cribl.io/ Cribl.io: https://cribl.io Docs.cribl.io: https://docs.cribl.io Sandbox.cribl.io: https://sandbox.cribl.io TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Today's episode is brought to you in part by our friends at MinIO the high-performance Kubernetes native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. It's getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and 100 megabyte binary that doesn't eat all the data you've gotten on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. 
That's min.io/download, and be sure to tell them that I sent you.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I have a repeat guest joining me on this promoted episode. Clint Sharp is the CEO and co-founder of Cribl. Clint, thanks for joining me.Clint: Hey, Corey, nice to be back.Corey: I was super excited when you gave me the premise for this recording because you said you had some news to talk about, and I was really excited that oh, great, they're finally going to buy a vowel so that people look at their name and understand how to pronounce it. And no, that's nowhere near forward-looking enough. It's instead it's some, I guess, I don't know, some product announcement or something. But you know, hope springs eternal. What have you got for us today?Clint: Well, one of the reasons I love talking to your audiences because product announcements actually matter to this audience. It's super interesting, as you get into starting a company, you're such, like, a product person, you're like, “Oh, I have this new set of things that's really going to make your life better.” And then you go out to, like, the general media, and you're like, “Hey, I have this product.” And they're like, “I don't care. What product? Do you have a funding announcement? Do you have something big in the market that—you know, do you have a new executive? 
Do you”—it's like, “No, but, like, these features, like these things, that we—the way we make our lives better for our customers. Isn't that interesting?” “No.”

Corey: Real depressing once you—“Do you have a security breach to announce?” It's, “No. God no. Why would I wind up being that excited about it?” “Well, I don't know. I'd be that excited about it.” And yeah, the stuff that mainstream media wants to write about in the context of tech companies is exactly the sort of thing that tech companies absolutely do not want to be written about for. But fortunately, that is neither here nor there.

Clint: Yeah, they want the thing that gets the clicks.

Corey: Exactly. You built a product that absolutely resonates in its target market and outside of that market. It's one of those, what is that thing, again? If you could give us a light refresher on what Cribl is and does, you'll probably do a better job of it than I will. We hope.

Clint: We'd love to. Yeah, so we are an observability company, fundamentally. I think one of the interesting things to talk about when it comes to observability is that observability and security are merging. And so I like to say observability and include security people. If you're a security person, and you don't feel included by the word observability, sorry. We also include you; you're under our tent here.

So, we sell to technology professionals, we help make their lives better. And we do that today through a flagship product called LogStream—which is part of this announcement, we're actually renaming to Stream. In some ways, we're dropping logs—and we are a pipeline company.
So, we help you take all of your existing agents, all of your existing data that's moving, and we help you process that data in the stream to control costs and to send it to multiple places.

And it sounds kind of silly, but one of the biggest problems that we end up solving for a lot of our enterprises is, “Hey, I've got, like, this old Syslog feed coming off of my firewalls”—like, you remember those things, right? Palo Alto firewalls, ASA firewalls—“How do I actually get that thing to multiple places? Because, hey, I want to get that data into another security solution. I want to get that data into a data lake. How do I do that?” Well, in today's world, that, it turns out, is sort of a neglected set of features: for the vendors who provide you logging solutions, being able to reshape that data, filter that data, and control costs wasn't necessarily at the top of their priority list.

It wasn't nefarious. It wasn't like people were like, “Oh, I'm going to make sure that they can't process this data before it comes into my solution.” It's more just, like, “I'll get around to it eventually.” And the eventually never actually comes. And so our streaming product helps people do that today.

And the big announcement that we're making this week is that we're extending that same processing technology down to the endpoint with a new product we're calling Cribl Edge. And so we're taking our existing best-in-class management technology, and we're turning it into an agent. And that seems kind of interesting because… I think everybody sort of assumed that the agent is dead. Okay, well, we've been building agents for a decade or two decades.
Isn't everything exactly the same as it was before? But we really saw kind of a dearth of innovation in that area in terms of being able to manage your agents, being able to understand what data is available to be collected, being able to auto-discover the data that needs to be able to be collected, turning those agents into interactive troubleshooting experiences so that we can, kind of, replicate the ability to zoom into a remote endpoint and replicate that Linux command line experience that we're not supposed to be getting anymore because we're not supposed to SSH into boxes anymore. Well, how do I replicate that? How do I see how much disk is on this given endpoint if I can't SSH into that box? And so Cribl Edge is a rethink about making this rich, interactive experience on top of all of these agents that become this really massive distributed system that we can process data all the way out at where the data is being emitted.

And so that means that now we don't nec—if you want to process that data in the stream, okay, great, but if you want to process that data at its origination point, we can actually provide you cheaper cost because now you're using a lot of that capacity that's sitting out there on your endpoints that isn't really being used today anyway—the average utilization of a Kubernetes cluster is like 30%—

Corey: It's that high. I'm sort of surprised.

Clint: Right? I know. So, Datadog puts out the survey every year, which I think is really interesting, and that's a number that always surprised me is just that people are already paying for this capacity, right? It's sitting there, it's on their AWS bill already, and with that average utilization, a lot of the stuff that we're doing in other clusters, or while we're moving that data can actually just be done right there where the data is being emitted.
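As a rough illustration of that idea—processing events on the host that emits them instead of after shipping—here is a toy filter-and-route step in Python. The event shape, severity field, and destination names are invented for the example; Cribl's actual pipeline configuration looks nothing like this.

```python
# Toy sketch of edge processing: drop low-value events, then fan the
# survivors out to several destinations *before* anything leaves the host.
# Event shape and destination names are made up for illustration.

def process(events, min_severity=3):
    """Filter events locally, then route one copy per destination."""
    kept = [e for e in events if e["severity"] >= min_severity]
    routes = {"siem": list(kept), "data_lake": list(kept)}
    dropped = len(events) - len(kept)  # events that never cross the network
    return routes, dropped

events = [
    {"msg": "deny tcp 10.0.0.5:443", "severity": 5},
    {"msg": "keepalive", "severity": 1},
]
routes, dropped = process(events)
```

Here the noisy keepalive is discarded at the origination point, so both downstream systems receive only the firewall deny and the egress bill reflects one event, not two.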
And also, if we're doing things like filtering, we can lower egress charges, there's lots of really, really good goodness that we can do by pushing that processing closer to its origination point.

Corey: You know, the timing of this episode is somewhat apt because as of the time that we're recording this, I spent most of yesterday troubleshooting and fixing my home wireless network, which is a whole Ubiquiti-managed thing. And the controller was one of their all-in-one box things that kept more or less power cycling for no apparent reason. How do I figure out why it's doing that? Well, I'm used to, these days, doing everything in a cloud environment where you can instrument things pretty easily, where things start and where things stop is well understood. Finally, I just gave up and used a controller that's sitting on an EC2 instance somewhere, and now, great, now I can get useful telemetry out of it because now it's stuff I know how to deal with.

It also turns out that, surprise, my EC2 instance is not magically restarting itself due to heat issues. What a concept. So, I have a newfound appreciation for the fact that oh, yeah, not everything lives in a cloud provider's regions. Who knew? This is a revelation that I think is going to be somewhat surprising for folks who've been building startups and believe that anything that's older than 18 months doesn't exist.

But there's a lot of data centers out there, there are a lot of agents living in all kinds of different places. And workloads continue to surprise me even now, just looking at my own client base. It's a very diverse world when we're talking about whether things are on-prem or whether they're in cloud environments.

Clint: Well, also, there's a lot of agents on every endpoint, period, just due to the fact that the security guys want an agent, the observability guys want an agent, the logging people want an agent.
And then suddenly, I'm, you know, I'm looking at every endpoint—cloud, on-prem, whatever—and there's 8, 10 agents sitting there. And so I think a lot of the opportunity that we saw was, we can unify the data collection for metric-type data. So, we have some really cool defaults. [unintelligible 00:07:30] this is one of the things where I think people don't focus much on, kind of, the end-user experience. Like, let's have reasonable defaults.

Let's have the thing turn on, and actually, most people's needs are met without tweaking any knobs or buttons, and no diving into YAML files and looking at documentation and trying to figure out exactly the way I need to configure this thing. Let's collect metric data, let's collect log data, let's do it all from one central place with one agent that can send that data to multiple places. And I can send it to Grafana Cloud, if I want to; I can send it to Logz.io, I can send it to Splunk, I can send it to Elasticsearch, I can send it to AWS's new Elasticsearch-y thing that we don't know what they're going to call yet after the lawsuit. Any of those can be done right from the endpoint from, like, a rich graphical experience, where I think there's really a desire now for people not to have to jump into these configuration files, because for a lot of these users, this is a part-time job. And so hey, if I need to go set up data collection, do I want to learn about this detailed YAML file configuration that I'm only going to do once or twice, or should I be able to do it in an easy, intuitive way, where I can just sit down in front of the product, get my job done, and move on without having to go learn some sort of new configuration language?

Corey: Once upon a time, I saw an early circa 2012, 2013 talk from Jordan Sissel, who is the creator of Logstash, and he talked a lot about how challenging it was to wind up parsing all of the variety of log files out there.
Even something as relatively straightforward—wink, wink, nudge, nudge—as timestamps was an absolute monstrosity. And a lot of people have been talking in recent years about OpenTelemetry being the lingua franca that everything speaks so that is the wave of the future, but I've got to level with you: looking around, it feels like these people are living in a very different reality than the one that I appear to have stumbled into, because the conversations people are having about how great it is sound amazing, but nothing that I'm looking at—granted, from a very particular point of view—seems to be embracing it or supporting it. Is that just because I'm hanging out in the wrong places, or is it still a great idea whose time has yet to come, or something else?

Clint: So, I think a couple things. One is every conversation I have about OpenTelemetry is always, “Will be.” It's always in the future. And there's certainly a lot of interest. We see this from customer after customer, they're very interested in OpenTelemetry and what the OpenTelemetry strategy is, but as an example, the OpenTelemetry logging specification is not yet finalized; they believe that they're still six months to a year out. It seems to be perpetually six months to a year out.

They are finalized for metrics and they are finalized for tracing. Where we see OpenTelemetry tends to be with companies like Honeycomb, companies like Datadog with their tracing product, or Lightstep. So, for tracing, we see OpenTelemetry adoption. But tracing adoption is also not that high either, relative to just general metrics or logs.

Corey: Yeah, the tracing implementations that I've seen, for example, Epsagon did this super well, where it would take a look at the Lambda functions built into an application, and ah, we're going to go ahead and instrument this automatically using layers or extensions for you.
And life was good because suddenly you got very detailed breakdowns of exactly how data was flowing in the course of a transaction through 15 Lambda functions. Great. With everything else I've seen, it's, “Oh, you have to instrument all these things by hand.” Let me shortcut that for you: That means no one's going to do it. They never are.

Anytime you have to do that undifferentiated heavy lifting of making sure that you put the finicky code just so into your application's logic, it's a shorthand for it's only going to happen when you have no other choice. And I think that trying to surface that burden to the developer, instead of building it into the platform so they don't have to think about it, is inherently the wrong move.

Clint: I think there's a strong belief in Silicon Valley that—similar to, like, Hollywood—the biggest export Silicon Valley is going to have is culture. And so that's going to be this culture of, like, developers supporting their stuff in production. I'm telling you, I sell to banks and governments and telcos and I don't see that culture prevailing. I see an application developed by Accenture that's operated by Tata. That's a lot of inertia to overcome and a lot of regulation to overcome as well, and so, like, we can say that, hey, separation of duties isn't really a thing and developers should be able to support all their own stuff in production.

I don't see that happening. It may happen. It'll certainly happen more than zero. And tracing is predicated on the whole idea that the developer is scratching their own itch. Like, I am in production and troubleshooting this, and so I need this high-fidelity trace-level information to understand what's going on with this one user's experience—but that doesn't tend to be how things are actually troubleshot in the enterprise.

And so I think that more than anything is the headwind that's slowing down distributed tracing adoption.
It's because you're putting the onus of solving the problem on a developer who never ends up using the distributed tracing solution to begin with, because there's another operations department over there that's actually operating the thing on a day-to-day basis.

Corey: Having come from one of those operations departments myself, the way that I would always fix things was—you know, in the era that I was operating, it made sense—you'd SSH into a box and kick the tires, poke around, see what's going on, look at the logs locally, look at the behavior the way you'd expect to. These days, that is considered a screamingly bad anti-pattern, and it's something that companies try their damnedest to avoid doing at all. When did that change? And what is the replacement for that? Because every time I asked people for the sorts of data that I would get from that sort of exploration when they're trying to track something down, I'm more or less met with blank stares.

Clint: Yeah. Well, I think that's a huge hole and one of the things that we're actually trying to do with our new product. And I think the… how do I replicate that Linux command line experience? So, for example, something as simple, like, we'd like to think that these nodes are all ephemeral, but there's still a disk, whether it's virtual or not; that thing sometimes fills up, so how do I even do the simple thing like df -kh and see how much disk is there if I don't already have all the metrics collected that I needed? Or I need to go dive deep into an application and understand what that application is doing or seeing, what files it's opening, or what log files it's writing even?

Let's give some good examples. Like, how do I even know what files an application has open? Actually, all that information is all there; we can go discover that.
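On Linux, much of that information really is just sitting there. A minimal sketch of the kind of interrogation Clint describes—disk headroom and the files a process currently has open—using only the Python standard library (a generic illustration, not how Cribl Edge is implemented):

```python
import os
import shutil

def disk_report(path="/"):
    """Rough stand-in for `df -kh` on a single mount point."""
    usage = shutil.disk_usage(path)

    def to_gb(n):
        return round(n / 1e9, 1)

    return {"total_gb": to_gb(usage.total),
            "used_gb": to_gb(usage.used),
            "free_gb": to_gb(usage.free)}

def open_files(pid):
    """Files a process holds open, read from /proc (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    if not os.path.isdir(fd_dir):
        return []  # not Linux, or no permission to inspect this PID
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            pass  # fd closed between listdir() and readlink()
    return targets
```

Calling `disk_report("/")` and `open_files(some_pid)` on the box itself costs a few system calls; shipping every metric and file-table snapshot to a central store just in case costs money whether or not anyone ever looks at it.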
And so some of the things that we're doing with Edge is trying to make this rich, interactive experience where you can actually teleport into the end node and see all the processes that are running and get a view that looks like top and be able to see how much disk is there and how much disk is being consumed. And really kind of replicating that whole troubleshooting experience that we used to get from the Linux command line, but now instead, it's a tightly controlled experience where you're not actually getting an arbitrary shell, where I could do anything that could give me root-level access, or exploit holes in various pieces of software, but really trying to replicate getting you that high-fidelity information, because you don't need any of that information until you need it.

And I think that's part of the problem that's hard with shipping all this data to some centralized platform and getting every metric and every log and moving all that data: the data is worthless until it isn't worthless anymore. And so why do we even move it? Why don't we provide a better experience for getting at the data at the time that we need to be able to get at the data? Or the other thing that we get to change fundamentally is if we have the edge available to us, we have way more capacity. I can store a lot of information in a few kilobytes of RAM on every node, but if I bring thousands of nodes into one central place, now I need a massive amount of RAM and a massive amount of cardinality, when really what I need is the ability to actually go interrogate what's running out there.

Corey: The thing that frustrates me the most is the way that I go back and find my old debug statements, which is, you know, I print out whatever it is that the current status is and so I can figure out where something's breaking.

Clint: [Got here 00:15:08].

Corey: Yeah. I do it within AWS Lambda functions, and that's great.
And I go back and I remove them later when I notice how expensive CloudWatch Logs are getting, because at 50 cents per gigabyte of ingest on those things, if you have that Lambda function firing off a fair bit, that starts to add up when you've been excessively wordy with your print statements. It sounds ridiculous, but okay, then you're storing it somewhere. If I want to take that log data and have something else consume it, that's nine cents a gigabyte to get it out of AWS, and then you're going to want to move it again from wherever it is over there—potentially to a third system, because why not?—and it seems like the entire purpose of this log data is to sit there and be moved around, because every time it gets moved, it winds up somehow costing me yet more money. Why do we do this?

Clint: I mean, it's a great question, because one of the things that I think we decided 15 years ago was that the reason to move this data was because that data may go poof. So, it was on a, you know, back in my day, it was an HP DL360 1U rackmount server that I threw in there, and it had RAID 0 disks, and so if that thing went dead, well, we didn't care, we'd replace it with another one. But if we wanted to find out why it went dead, we wanted to make sure that the data had moved before the thing went dead. But now that DL360 is a VM.

Corey: Yeah, or a container that is going to be gone in 20 minutes. So yeah, you don't want to store it locally on that container. But disks are also a fair bit more durable than they once were, as well. And S3 talks about its 11 nines of durability. That's great and all, but most of my application logs don't need that. So, I'm still trying to figure out where we went wrong.

Clint: Well, I think it was right for the time. And I think now we have durable storage at the edge, where that blob storage has already been replicated three times and we can reattach—if that box crashes, we can reattach new compute to that same block storage.
Actually, AWS has some cool features now: you can actually attach multiple VMs to the same block store. So, we could actually even have logs being written by one VM, but processed by another VM. And so there are new primitives available to us in the cloud, so we should be going back and re-questioning all of the things that we did 10 to 15 years ago and all the practices that we had, because they may not be relevant anymore, but we just never stopped to ask why.

Corey: Yeah, multi-attach was rolled out with their io2 volumes, which are spendy but great. And they do warn you that you need a file system that actively supports that and applications that are aware of it. But cool, they have specific use cases that they're clearly imagining this for. But ten years ago, we were building things out, and, “Ooh, EBS, how do I wind up attaching that from multiple instances?” The answer was, “Ohh, don't do that.”

And that shaped all of our perspectives on these things. Now suddenly, you can. Is that, “Ohh, don't do that,” visceral gut reaction still valid? People don't tend to go back and re-examine the why behind certain best practices until long after those best practices are now actively harmful.
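To put numbers on the "paying to move data around" point from a few minutes earlier: using the per-gigabyte rates Corey quoted ($0.50 CloudWatch Logs ingest, $0.09 AWS egress—treat both as illustrative, since pricing changes and varies by volume), the cost of shipping logs scales linearly with every extra hop:

```python
def monthly_log_cost_usd(gb_per_day, ingest_per_gb=0.50,
                         egress_per_gb=0.09, extra_hops=1):
    """Ingest the data once, then pay egress for each onward move.

    Rates default to the per-GB figures quoted in the conversation;
    they are illustrative, not current price-list values.
    """
    gb = gb_per_day * 30  # approximate a month as 30 days
    return gb * (ingest_per_gb + egress_per_gb * extra_hops)

# 5 GB/day of chatty Lambda print statements, forwarded to one
# external system: 150 GB * ($0.50 + $0.09) = $88.50 a month.
```

Each additional destination adds another egress term but no new information, which is the arithmetic behind filtering at the origination point instead.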
Maybe we can actually revisit a lot of these architectural assumptions, drive cost down, give more capability than we actually had before for fundamentally cheaper. And that's kind of what Cribl does: we look at software to say, “Man, like, let's question everything and let's go back to first principles.” “Why do we want this information?” “Well, I need to troubleshoot stuff.” “Okay, well, if I need to troubleshoot stuff, well, how do I do that?” “Well, today we move it, but do we have to? Do we have to move that data?” “No, we could probably give you an experience where you can dive right into that endpoint and get really, really high-fidelity data without having to pay to move that and store it forever.” Because also, like, telemetry information, it's basically worthless after 24 hours; like, if I'm moving that and paying to store it, then now I'm paying for something I'm never going to read back.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high-performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want.
Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's V-U-L-T-R dot com slash screaming.

Corey: And worse, you wind up figuring out, okay, I'm going to store all that data going back to 2012, and it's petabytes upon petabytes. And great, how do I actually search for a thing? Well, I have to use some other expensive compute thing that's going to start diving through all of that, because the way I set up my partitioning, it isn't aligned with anything looking at, like, recency or based upon time period, so every time I want to look at what happened 20 minutes ago, I'm looking at what happened 20 years ago. And that just gets incredibly expensive, not just to maintain but to query and the rest. Now, to be clear, yes, this is an anti-pattern. It isn't how things should be set up. But how should they be set up? And is the collective answer to that right now actually what's best, or is it still harkening back to old patterns that no longer apply?

Clint: Well, the future is here, it's just unevenly distributed. So, you know, I think an important point about us, or how we think about building software, is with this customer-first attitude and fundamentally bringing them choice. Because the reality is that doing things the old way may be the right decision for you. You may have compliance requirements to say—there's a lot of financial services institutions, for example, like, they have to keep every byte of data written on any endpoint for seven years. And so we have to accommodate their requirements.

Like, is that the right requirement? Well, I don't know. The regulator wrote it that way, so therefore, I have to do it.
Whether it's the right thing or the wrong thing for the business, I have no choice. And their decisions are just as right as the person who says this data is worthless and should all just be thrown away.

We really want to be able to go and say, like, hey, what decision is right? We're going to give you the option to do it this way, we're going to give you the option to do it this way. Now, the hard part—and this is where it comes down to, like, marketing—is that you want to have this really simple message, like, “This is the one true path.” And a lot of vendors are this way: “There's this new wonderful, right, true path that we are going to take you on, and follow along behind me.” But the reality is, enterprise worlds are gritty and ugly, and they're full of old technology and new technology.

And they need to be able to support getting data off the mainframe the same way as they're doing a brand-new containerized microservices application. In fact, that brand-new containerized microservices application is probably talking to the mainframe through some API. And so all of that has to work at once.

Corey: Oh, yeah. And it's, all of our payment data is in our PCI environment, and PCI needs to have every byte logged. Great. Why is three-quarters of your infrastructure considered the PCI environment? Maybe you can constrain that at some point and suddenly save a whole bunch of effort, time, money, and regulatory drag on this.

But as you go through that journey, you need to not only have a tool that will work when you get there but a tool that will work where you are today. And a lot of companies miss that mark, too. It's, “Oh, once you modernize and become the serverless success story of the decade, then our product is going to be right for you.” “Great.
We'll send you a postcard if we ever get there and then you can follow up with us.”

Alternately, it's, “Well, yeah, this is how we are today, but we have visions of a brighter tomorrow.” You've got to be able to meet people where they are at any point of that journey. One of the things I've always respected about Cribl has been the way that you very fluidly tell both sides of that story.

Clint: And it's not their fault.

Corey: Yeah.

Clint: Most of the people who pick a job, they pick the job because, like—look, I live in Kansas City, Missouri, and there's this data processing company that works primarily on mainframes, it's right down the road. And they gave me a job and it pays me $150,000 a year, and I got a big house and things are great. And I'm a sysadmin sitting there. I don't get to play with the new technology. Like, that customer is just as applicable a customer; we want to help them exactly the same as the new Silicon Valley hip kid who's working at, you know, a venture-backed startup, doing everything natively in the cloud. Those are all right decisions, depending on where you happen to find yourself, and we want to support you with our products, no matter where you find yourself on the technology spectrum.

Corey: Speaking of old and new, and the trends of the industry, when you first set up this recording, you mentioned, “Oh, yeah, we should make it a point to maybe talk about the acquisition,” at which point I sprayed coffee across my iMac. Thanks for that. Turns out it wasn't your acquisition we were talking about so much as it is the—at the time we record this—yet-to-close, rumored acquisition of Splunk by Cisco.

Clint: I think it's both interesting and positive for some people, and sad for others. I think Cisco is obviously a phenomenal company. They run the networking world.
The fact that they've been moving into observability—they bought companies like AppDynamics, and, as we were talking about Epsagon before the show, they bought them too—ServiceNow just bought Lightstep recently. There's a lot of acquisitions in this space.

I think that when it comes to something like Splunk, Splunk is a fast-growing company compared to Cisco. And so for them, this is something that they think that they can put into their distribution channel, and what Cisco knows how to do is to sell things—like, they're very good at putting things through their existing sales force and really amplifying the sales of that particular thing that they have just acquired. That being said, I think for a company that was as innovative as Splunk, I do find it a bit sad with the idea that it's going to become part of this much larger behemoth and not really probably driving the observability and security industry forward anymore, because I don't think anybody really looks at Cisco as a company that's driving things—not to slam them or anything, but I don't really see them as driving the industry forward.

Corey: Somewhere along the way, they got stuck and I don't know how to reconcile that, because they were a phenomenally fast-paced, innovative company, briefly the most valuable company in the world during the dotcom bubble. And then they just sort of stalled out somewhere and, on some level, not to talk smack about it, but it feels like the level of innovation we've seen from Splunk has been curtailed over the past half-decade or so. And selling to Cisco feels almost like a tacit admission that they are effectively out of ideas. And maybe that's unfair.

Clint: I mean, we can look at the track record of what's been shipped over the last five years from Splunk. And again, they're a partner, their customers are great, I think they still have the best log indexing engine on the market. That was their core product and what has made them the majority of their money. But there's not been a lot new.
And I think objectively we can look at that without throwing stones and say, like, “Well, what net-new? You bought SignalFX. Like, good for you guys, that seems to be going well. You've launched your observability suite based off of these acquisitions.” But organic product-wise, there's not a lot coming out of the factory.

Corey: I'll take it a bit further-slash-sadder: we take a look at some great companies that were acquired—OpenDNS, Duo Security, SignalFX, as you mentioned, Epsagon, ThousandEyes—and once they've gotten acquired by Cisco, they all more or less seem to be frozen in time, like they're trapped in amber, which leads us up to the natural dinosaur analogy that I'll probably make in a less formal setting. It just feels like once a company is bought by Cisco, their velocity peters out, a lot of their staff leaves, and what you see is what you get. And I don't know if that's accurate or I'm just not looking in the right places, but every time I talk to folks in the industry about this, I get a lot of knowing nods that are tied to it. So, whether that's true or not, that is very clearly, at least in some corners of the market, the active perception.

Clint: There's a very real fact that if you look even at very large companies, innovation is driven from a core set of a handful of people. And when those people start to leave, the innovation really stops. It's those people who think about things back from first principles—like, why are we doing things? What different can we do?—and they're the type of drivers that drive change.

So, Frank Slootman wrote a book recently called Amp it Up that I've been reading over the last weekend, and he has this article that was on LinkedIn a while back called “Drivers vs. Passengers,” and he's always looking for drivers. And those drivers tend to not find themselves as happy in bigger companies, and they tend to head for the exits.
And so then you end up with the people who are a lot of the passenger type of people, the people who are like—they'll carry it forward, they'll continue to scale it, the business will continue to grow at whatever rate it's going to grow, but you're probably not going to see a lot of the net-new stuff. And I'll put it in comparison to a company like Datadog, who I have a vast amount of respect for—I think they're an incredibly innovative company, and I think they continue to innovate.

Still driven by the founders, the people who created the original product are still there driving the vision, driving forward innovation. And what tends to move the envelope is the people who have the moral authority inside of an even larger organization to say, “Get behind me. We're going in this direction. We're going to go take that hill. We're going to go make things better for our customers.” And when you start to lose those handful of really critical contributors, that's where you start to see the innovation dry up.

Corey: Where do you see the acquisitions coming from? Is it just at some point people shove money at these companies that got acquired that is beyond the wildest dreams of avarice? Is it that they believe that they'll be able to execute better on their mission than they could independently? These are still smart, driven people who have built something, and I don't know that they necessarily see an acquisition as, “Well, time to give up and coast for a while and then I'll leave.” But maybe it is.
I've never found myself in that situation, so I can't speak for sure.

Clint: You kind of, I think, have to look at the business and then whoever's running the business at that time—and I sit in the CEO chair—so you have to look at the business and say, “What do we have inside the house here?” Like, “What more can we do?” If we think that there's the next billion-dollar, multi-billion-dollar product sitting here, even just in our heads, but maybe in the factory and being worked on, then we should absolutely not sell, because the value is still there and we're going to grow the company much faster as an independent entity than we would, you know, inside of a larger organization. But if you're the board of directors and you're looking around and saying, like, hey look, I don't see another billion-dollar line of bus—at this scale, right, if you're Splunk scale, right? I don't see another billion-dollar line of business sitting here; we could probably go acquire it, we could try to add it in, but you know, in the case of something like a Splunk, I think part of—you know, they're looking for a new CEO right now, so now they have to go find a new leader who's going to come in, re-energize, and, kind of, reboot that.

But those are the options that they're considering, right? They're like, “Do I find a new CEO who's going to reinvigorate things and be able to attract the type of talent that's going to lead us to the next billion-dollar line of business that we can either build inside or we can acquire and bring in-house? Or is the right path for me just to say, ‘Okay, well, you know, somebody like Cisco's interested?'” Or the other path you may see them go down is something like Silver Lake—Silver Lake put a billion dollars into the company last year. And so they may be looking at it and say, “Okay, well, we really need to do some restructuring here and we want to do it outside the eyes of the public market.
We want to be able to change the pricing model, we want to be able to really do this without having to worry about the stock price's massive volatility because we're making big changes.”

And so I would say there's probably two big options they're considering. Like, do we sell to Cisco, do we sell to Silver Lake, or do we really take another run at this? And those are difficult decisions for the stewards of the business, and I think it's a different decision if you're the steward of the business that created the business versus the steward of the business for whom this is—the, I've been here for five years and I may be here for five years more. For somebody like me, a company like Cribl is literally the thing I plan to leave on this earth.

Corey: Yeah. Do you have that sense of personal attachment to it? On some level, The Duckbill Group, that's exactly what I'm staring at, where it's great: someone wants to buy the Last Week in AWS media side of the house. Great. Okay. What is that really, beyond me? Because so much of it's been shaped by my personality. There's an audience, sure, but it's a skeptical audience, one that doesn't generally tend to respond well to mass-market, generic advertisements, so monetizing that is not going to go super well.

“All right, we're going to start doing data mining on people.” Well, that's explicitly against the terms of service people signed up for, so good luck with that. So much starts becoming bizarre and strange when you start looking at building something with the idea of, oh, in three years, I'm going to unload this puppy and make it someone else's problem. The argument is that by building something with an eye toward selling it, you build a better-structured business, but it also means you potentially make trade-offs that are best not made.
I'm not sure there's a right answer here.

Clint: In my spare time, I do some investments, angel investments, and that sort of thing, and that's always a red flag for me when I meet a founder who's like, “In three to five years, I plan to sell it to these people.” If you don't have a vision for how you're fundamentally going to alter the marketplace and our perception of everything else, you're not dreaming big enough. And that to me doesn't look like a great investment. It doesn't look like the—how do you attract employees in that way? Like, “Okay, our goal is to work really hard for the next three years so that we will be attractive to this other bigger thing.” They may be thinking it on the inside as an available option, but if you think that's your default option when starting a company, I don't think you're going to end up with the outcome that's truly what you're hoping for.

Corey: Oh, yeah. In my case, the only acquisition story I see is some large company buying us just largely to shut me up. But—

Clint: [laugh].

Corey: —that turns out to be kind of expensive, so all right. I also don't think it would serve any of them nearly as well as they think it would.

Clint: Well, you'll just become somebody else on Twitter. [laugh].

Corey: Yeah, “Time to change my name again. Here we go.” So, if people want to go and learn more about Cribl Edge, where can they do that?

Clint: Yeah, cribl.io. And then if you're more of a technical person and you'd like to understand the specifics, docs.cribl.io. That's where I always go when I'm checking out a vendor; just skip past the main page and go straight to the docs. So, check that out. And then also, if you're wanting to play with the product, we make education available online, called Sandboxes, at sandbox.cribl.io, where you can go spin up your own version of the product, walk through some interactive tutorials, and get a view on how it might work for you.

Corey: Such a great pattern, at least for the way that I think about these things.
You can have flashy videos, you can have great screenshots, you can have documentation that is the finest thing on this earth, but let me play with it; let me kick the tires on it, even with a sample data set. Because until I can do that, I'm not really going to understand where the product starts and where it stops. That is the right answer from where I sit. Again, I understand that everyone's different, not everyone thinks like I do—thankfully—but for me, that's the best way I've ever learned something.

Clint: I love to get my hands on the product, and in fact, I'm always a little bit suspicious of any company when I go to their webpage and I can't either sign up for the product or get to the documentation, and I have to talk to somebody in order to learn. At that point I'm pretty much immediately going to the next person in that market, to go look for somebody who will let me.

Corey: [laugh]. Thank you again for taking so much time to speak with me. I appreciate it. As always, it's a pleasure.

Clint: Thanks, Corey. Always enjoy talking to you.

Corey: Clint Sharp, CEO and co-founder of Cribl. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment. And when you hit submit, be sure to follow it up with exactly how many distinct and disparate logging systems that obnoxious comment had to pass through on your end of things.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
About Alex

Alex holds a Ph.D. in Computer Science and Engineering from UC San Diego, and has spent over a decade building high-performance, robust data management and processing systems. As an early member of a couple of fast-growing startups, he's had the opportunity to wear a lot of different hats, serving at various times as an individual contributor, tech lead, manager, and executive. Prior to joining The Duckbill Group, Alex spent a few years as a freelance data engineering consultant, helping his clients build, manage, and maintain their data infrastructure. He lives in Los Angeles, CA.

Links:
- Twitter: https://twitter.com/alexras/
- Personal page: https://alexras.info
- Old consulting website with blog: https://bitsondisk.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: The company 0x4447 builds products to increase standardization and security in AWS organizations. They do this with automated pipelines that use well-structured projects to create secure, easy-to-maintain, and fail-tolerant solutions, one of which is their VPN product built on top of the popular OpenVPN project, which has no license restrictions; you are only limited by the network card in the instance. To learn more visit: snark.cloud/deployandgo

Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work.
Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn't eat all the data you've gotten on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm the Chief Cloud Economist at The Duckbill Group, which people are generally aware of. Today, I'm joined by our most recent Principal Cloud Economist, Alex Rasmussen. Alex, thank you for joining me today; it is a pleasure to talk to you, as if we aren't talking to each other constantly, now that you work here.

Alex: Thanks, Corey. It's great being here.

Corey: So, I followed a more, I'd say, traditional path for a cloud economist, but given that I basically had to invent the job myself, the more common path—because imagine that you start building a role from scratch, and the people you wind up looking for initially look a lot like you. And that is grumpy sysadmin, historically, turned into something, kind of begrudgingly, that looks like an SRE, which I still maintain are the same thing, but it is imperative people not email me about that. Yes, I know, you work at Google. But instead, what I found during my tenure as a sysadmin is that I was working with certain things an awful lot, like web servers, and other things almost never, like databases and data warehouses. Because if you screw up a web server, we all have a good laugh, the site's down for a couple of minutes, life goes on, and you have a shame trophy on your desk if that's your corporate culture; things continue.

Mess up the data severely enough, and you don't have a company anymore.
So, I was always told to keep my aura away from the expensive, spendy things that power a company. You are sort of the first of a cloud economist subtype that doesn't resemble that. Before you worked here, you were effectively an independent consultant working on data engineering. Before that, you had a couple of jobs, but you had gotten a PhD in computer science, which means, first, you are probably one of the people in this world most qualified to pass some crappy job interview of solving a sorting algorithm on a whiteboard, but how did you get here from where you were?

Alex: Great question. So, I like to joke that I kind of went to school until somebody told me that I had to stop. And I took that and went and started—or didn't start, but I was an early engineer at a startup and then was an executive at another early-stage one, and did a little bit of everything. And went freelance, did that for a couple of years, and worked with all kinds of different companies—the vast majority of those being startups—helping them with data infrastructure problems. I've done a little bit of everything throughout my career.

I've been, you know, IC, manager, manager, manager, IT guy, everything in between. I think on the data side of things, it just sort of happened, to be honest with you. It kind of started with the stuff that I did for my dissertation, and I parlayed that into a job back when the big data wave was starting to truly crest. And I've been working on data infrastructure basically my entire career. So, it wasn't necessarily something that was intentional; I've just been taking the opportunity that makes the most sense for me at kind of every juncture. And my career path has been a little bit strange, both by academic and industrial standards. But I like where I'm at, and I gained something really valuable from each of those experiences.
So.

Corey: It's been an interesting area of—I won't say weakness here, but it's definitely been a bit of a challenge. When we look at an AWS environment, even talking about a typical AWS customer without thinking of any of them in particular, I can already tell you a few things are likely to be true. For example, the number one most expensive line item in their bill is going to be EC2, and compute is the thing that powers it. Now, maybe that is they're running a bunch of instances the old-fashioned way; maybe they're running Kubernetes, but that's how it shows up. There's a lot of things that could be, and we look at what rounds that out.

Now, the next item down should almost certainly not be data transfer—and if so, we should have a conversation—but data in one form or another is very often going to be number two. And that can mean a bunch of different things, historically. It could mean, “Oh, you have a whole bunch of stuff in S3. Let's talk about access patterns. Let's talk about lifecycle policies. Let's talk about making sure the really important stuff is backed up somewhere. Maybe you want to spend more on that particular aspect of it.”

If it's on EBS volumes, that's interesting and definitely worth looking into and trying to understand the context of what's going on. Periodically we'll see a whole bunch of additional charges that speak to some of that EC2 charge in the form of EMR, AWS's Elastic MapReduce, which charges a per-hour instance charge, but also charges you for the instances that are running under the hood under the EC2 line item. So, there's a lot of data lifecycle stuff, there's a lot of data ecosystem stories, that historically we've consulted out with experts in that particular space. And that's great, but we were starting to have to drag those people in on more and more engagements as we saw them.
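The lifecycle-policy conversation Corey describes usually boils down to a handful of transition rules: move data to cheaper storage classes as it ages, then expire it. A minimal sketch of such a rule follows; the bucket name, prefix, and day thresholds are hypothetical, chosen only for illustration.

```python
# Sketch of the kind of S3 lifecycle policy discussed above: transition
# objects under a prefix to cheaper storage classes as they age, then
# expire them. All names and thresholds here are made up for the example.
lifecycle_config = {
    "Rules": [
        {
            "ID": "age-out-old-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 730},
        }
    ]
}

# Applying it would use boto3's put_bucket_lifecycle_configuration
# (left commented out because it needs real credentials and a bucket):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_config
# )
print(lifecycle_config["Rules"][0]["ID"])
```

The point of the rule shape is that cost control happens declaratively, per prefix, rather than by someone remembering to clean up.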
And we realized that was really something we had to build out as a core competency for ourselves. And we started out not intending to hire for someone with that specialty, but the more we talked to you, the more it became clear that this was a very real and very growing need that we and our customers have. How closely does what you're doing now, as far as AWS bill analysis and data-pattern deep-dives, align with what you were doing as a freelance consultant in the space?

Alex: A lot more than you might expect. You know, I think that increasingly, what you're seeing now is that a company's core differentiator is its data, right—how much of it they have, what they do with it. And so, you know, to your point, I think when you look at any company's cloud spend, it's going to be pretty heavy on the data side in terms of, like, where have you put it? What are you doing to process it? Where is it going once it's been processed? And then how is that—

Corey: And data transfer is a very important first word in that two-word sequence.

Alex: Oh, sure is. And so I think that, like, in a lot of ways, the way that a customer's cloud architecture looks—and the way that their bill looks, kind of as a consequence of that—is kind of a reification, in a way, of the way that the data flows from one place to another and what's done with it at each step along the way. I think what complicates this is that companies that have been around for a little while have lived through this kind of very amorphous, kind of, polyglot way that we're approaching data. You know, back when I was first getting started in the big data days, it was MapReduce, MapReduce, MapReduce, right? And we quickly [crosstalk 00:07:29]—

Corey: Oh, yes. The MapReduce white paper out of Google, a beautiful April Fool's Day prank that the folks at Yahoo fell for hook, line, and sinker. They wrote Hadoop, and now we're all stuck with that pattern. Great gag; they really should have clarified they were kidding.
Here we are.

Alex: Exactly. So—

Corey: I mostly kid.

Alex: No, for sure. But I think, especially when it comes to data, we tend to over-index on what the large companies do and then quickly realize that we've made a mistake and correct backwards, right? So, there was this big push toward MapReduce for everything until people realized that it was just a pain in the neck to operate and to build. And so then we moved into Spark, so kind of up-leveled a little bit. And then there was this kind of explosion of NoSQL and NewSQL databases that hit the market.

And MongoDB inexplicably won that war, and now we're kind of in this world where everything is cloud data warehouse, right? And now we're trying to wrestle with, like, is it actually a good idea to put everything in one warehouse and have SQL be the lingua franca on top of it? But it's all changing so rapidly. And when you come into a customer that's been around for 10 or 15 years, and has, you know, been in the cloud for a substantial—

Corey: Yeah, one of those ancient customers. That is—

Alex: I know, right?

Corey: —basically old enough to almost get a driver's license? Oh, yeah.

Alex: Right. It's one of those things where it's like, “Ah, yes, in startup years, you're, like, a hundred years old,” right? But still, you know, I think you see this kind of—I wouldn't call it a graveyard of failed experiments, right, but it's a collection of, like, “Well, we tried this, and it kind of worked, and we're keeping it around because the cost of moving this stuff around—the kind of data gravity, so to speak—is high enough that we're not going to bother transitioning it over.” But then you get into this situation where you have to bend over backwards to integrate anything with anything else. And we're still kind of in the early days of fixing that.

Corey: And the AWS bill pattern that we see all the time across the board: those experiments were not successful and do not need to exist, but there's no context into that.
The person that set them up left five years ago, the jobs are still running on time. What's happening with them? Well, we could stop them and see who screams, but very often, that's not the right answer either.

Alex: And I think there's also something to note there, too, which is, like, getting rid of data is very scary, right? I mean, if you resize a Kubernetes cluster from 15 nodes to 10, nobody's going to look at you sideways. But if you go, “Hey, we're just going to drop these tables,” the immediate reaction that you get—particularly from your data science team, more often than not—is, “Oh, God, what if we need that?” And so the conversation never really happens, and that causes this kind of snowball of data debt that persists, in some cases, for many, many years.

Corey: Yeah, in some cases, what I found has been successful on those big unknown questions is: don't delete the data, but restrict access to it for a few weeks and see what happens. Look into it a bit and make sure that it's not like, “Oh, cool. We did without it for a month, and now we don't need that data. Let's get rid of it.” And then another month goes by and it's like, “So, time to report quarterly earnings. Where's the data?”

Oh, dear, that's not going to go well for anyone. And understanding what's happening—the idea of cloning a petabyte of data so you can run an experiment on it. And okay, turns out the experiment wasn't needed. Do we still need to keep all of that?

Alex: Yeah.

Corey: The underlying platform advancements have been helpful toward this as well; a petabyte of data now in Glacier Deep Archive costs the princely sum of a thousand bucks a month, which is pretty close to the idea of, why would I ever delete data ever again? I can get it back within a day if I need it, so let's just put it there instead.

Alex: Right. You know, funny story.
When I was in graduate school, we were dealing with, you know, 100-terabyte datasets on the regular that we had to generate every time, because we only had 200 terabytes of raw storage. [laugh]. And this was before cloud was yet mature enough that we could get the kind of performance numbers that we wanted off of it.

And we would end up having to delete the input data to make room for the output data. [laugh]. And thankfully, we don't need to do that anymore. But there are a lot of, kind of, anti-patterns that arise from that too, right? If data is easy to keep around forever, it stays around forever.

And if it's easy to, let's say, run a SQL command against your Snowflake instance that scans 20 terabytes of data, you're just going to do it, and the exposure of that to you is so minimal that you can end up causing a whole bunch of problems for yourself by the fact that you don't have to deal with stuff at that low level of abstraction anymore.

Corey: It's always fun watching how this stuff manifests—because I'm dipping a toe into it from time to time—the easy, naive answer that we could give every customer, but don't, is, “Huh. So, you have a whole bunch of EMR stuff? Well, you know, if you migrate that into something else, you'll save a whole bunch of money on that.” With no regard for the 500 jobs that run against that EMR cluster on a consistent basis that form a key part of business process. “Yeah, if you could just redo the entire flow of how data is operated on throughout your entire business, that would be swell, because you can save tens of thousands of dollars a month on that.” Yeah, how about we don't suggest things that are just absolute buffoonery.

Alex: Well, and it's like, you know, you hit on a good point. Like, one of my least favorite words in the English language is the word ‘just.'
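Corey's “restrict access and see who screams” trick maps naturally onto an S3 bucket policy: deny reads on a prefix for a trial period instead of deleting anything. A minimal sketch follows; the bucket name and prefix are hypothetical, and a real policy would usually scope the `Principal` more carefully than a blanket `*`.

```python
import json

# Sketch of a "quarantine before delete" bucket policy: deny GetObject
# on a suspect prefix for a few weeks and see who complains, rather
# than dropping the data outright. Names here are made up.
quarantine_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "QuarantineOldExperimentData",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/old-experiment/*",
        }
    ],
}

# Applying it would use boto3's put_bucket_policy (commented out since
# it needs real credentials and a bucket):
# import boto3
# boto3.client("s3").put_bucket_policy(
#     Bucket="example-bucket", Policy=json.dumps(quarantine_policy)
# )
print(json.dumps(quarantine_policy, indent=2))
```

If nobody screams during the quarantine window, the prefix can be expired or deleted with much more confidence.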
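Corey's Deep Archive figure above is easy to sanity-check with back-of-the-envelope arithmetic. The per-GB rate used here is the commonly cited us-east-1 price of about $0.00099 per GB-month; treat it as an assumption, since pricing varies by region and over time.

```python
# Rough cost check for one pebibyte in S3 Glacier Deep Archive.
# Assumed rate: ~$0.00099 per GB-month (us-east-1 list price at the
# time; verify against current pricing before relying on it).
price_per_gb_month = 0.00099
petabyte_in_gb = 1024 ** 2  # 1 PiB = 1,048,576 GiB

monthly_cost = petabyte_in_gb * price_per_gb_month
print(f"~${monthly_cost:,.0f}/month")  # on the order of $1,000/month
```

Which lands almost exactly on the “thousand bucks a month” quoted in the conversation.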
And you know, I spent a few years as a freelance data consultant, and a lot of what I would hear sometimes from customers is, “Well, why don't we ‘just' deprecate X?”

Corey: “Why don't we just—” “I'm going to stop you there, because there is no ‘just.'”

Alex: Exactly.

Corey: There's always context that we cannot have as outsiders.

Alex: Precisely. Precisely. And digging into that really is—it's the fun part of the job, but it's also the hard part of the job.

Corey: Before we created The Duckbill Group—which was really when I took Mike Julian on as business partner and CEO and formed the entity—I had something in common with you; I was freelancing for a couple of years beforehand. Now, I know why I wound up deciding, all right, we're going to turn this into a company, but what was it that, I guess, made you decide, you know, freelancing is all well and good, but it's time to get something that looks a lot more like a quote-unquote, “traditional job?”

Alex: So, I think, on one level, I went freelance because I wasn't exactly sure what I wanted to do next. And I knew what I was good at. I knew what I had a lot of experience at, and I thought, “Well, I can just go out and kind of find a bunch of people that are willing to hire me to do what I'm good at doing, and then maybe eventually I'll find one of them that I like enough that I'll go and work for them. Or maybe I'll come up with some kind of a business model that I can repeat enough times that I don't have to worry that I wake up tomorrow and all of my clients are gone, and then I have to go live in a van down by the river.”

And I think when I heard about the opening at The Duckbill Group, I had been thinking for a little while about, well, this has been going fine for a long time, but effectively what I've been doing is being, you know, a staff-level data engineer for hire. And do I want to do something more than that, you know?
Do I want to do something more comp—perhaps more sophisticated or more complex than that? And I rapidly came to the conclusion that in order to do that, I would have to have sales and marketing, and I would have to, you know, spend a lot of my time bringing in business. And that's just not something that I really have any experience in or am any good at.

And, you know, I also recognize that I'm a relatively small fish in a relatively large pond, and if I wanted to get the kind of, like, large-scale, big, you know, Fortune 1000 company kind of customers, they may not pay attention to somebody like me. And so I think that ultimately, what I saw with The Duckbill Group was, number one, a group of people that were strongly aligned to the way that I wanted to keep doing this sort of work, right? Cultural alignment was really strong, good people, but also, you know, you folks have a thing that you figured out, and that puts you 10 to 15 steps ahead of where I was. And I was kind of staring down the barrel of, I'm going to have to take six months not doing client work so that I can figure out how to make this business sustain. And, you know, I think that ultimately, I just looked at it and said, this just makes sense to me as a next step. And so here we all are.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account.
This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers, needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free. That's snark.cloud/oci-free.

Corey: It's always fun seeing how people perceive what we've done from the outside. Like, “Oh, yeah, you just stumbled right onto the thing that works, and you've just been going, like, gangbusters ever since.” Then you come aboard, and it's, “Here, look at this pile of things that didn't pan out over here.” And you get to see how the sausage is made in a way that we talk about from time to time externally, but surprisingly, most of our marketing efforts aren't really focused on, “And here's this other time we screwed up as well.” And we're honest about it, but it's not sort of the thing that we promote as the core message of what we do and who we are.

A question I like to ask people during job interviews—and I definitely asked you this, and I'll ask you now, which is going to probably throw some folks for a loop, because who talks to their current employees like this?—but what's next for you? When it comes time for you to leave The Duckbill Group, what do you want to do after this job?

Alex: That's a great question. So, I mean, as we've mentioned before, my career trajectory has been very weird and circuitous. And, you know, I would be lying to you if I said that I had absolute certainty about what the rest of that looks like. I've learned a few things about myself in the course of my career, such as it is. In my kind of warm, gooey center, I build stuff.
Like, that is what gives me joy; it is what makes me excited to wake up in the morning.

I love looking at big, complicated things, breaking them down into pieces, and figuring out how to make the pieces work in a way that makes sense. And, you know, I've spent a long time in the data ecosystem. I don't know, necessarily, if that's something that I'm going to do forever. I'm not necessarily pigeonholing myself into that part of the space just yet, but as long as I get to kind of wake up in the morning and say, “I'm going to go and build things, and it's not going to actively make the world any worse,” I'm happy with that. And so that's really—you know, I might go back to freelancing, might go and join another group, another company, big, small, who knows. I'm kind of leaving that up to the winds of destiny, so to speak.

Corey: One thing that I have found incredi—sorry. Let me just address that first. Like that—

Alex: Sure.

Corey: —is the right way to think about it. My belief has always been that you don't necessarily have, like, the ten-year plan or the five-year plan or whatever it is because that's where you're going to go, so much as it gives you direction and forces you to keep moving, so you don't wind up sitting in the same place for five years with one year of experience repeated five times. It helps you remember the bigger picture. Because I've always despised this fiction that we see in job interviews, where average tenure in our industry is 18 to 36 months, give or take, but somehow during the interviews we all talk like this is now your forever job, and after 25 years, you'll retire. And yeah, let's be a little more realistic than that.

My question is always, what is next, and how can we align in a way that helps you get to what's coming? That's the purpose behind the question, and the only way to make that not just a drippingly insincere question is to mean it and to continue to focus on it from time to time: great, what are you learning? What's next?
Now, at the time of this recording, you've been here, I believe, three weeks, if I'm not mistaken?

Alex: I've—this is week two for me at time of recording.

Corey: Excellent. Yes, my grasp of time is sort of hazy at the best of times. I have a—I do a lot of things.

Alex: For sure.

Corey: But yeah, it has been an eye-opening experience for me, not because, “Oh, wow, we have an employee”—yeah, we've done that a few times before—but rather because of your background, you are asking different questions than we typically get during onboarding. I had a blog post go out recently—or it will have by the time this airs—about a question that you asked: “Wow, onboarding into our internal account structure for AWS is way more polished than I've ever seen it before. Is that something you built in-house? What is that?”

And great. Oh, terrific, I'd forgotten that this is kind of a novel thing. No, what we're using is AWS's SSO offering, which is such a well-built, polished product that I can only assume it's under NDA, because Amazonians don't talk about it ever. But it's great.

It has a couple of annoyances, but beyond that, it's something that I'm a big fan of. But I'd forgotten how transformative that is compared to the usual approach of: all right, here's your username, here's a password you're going to have to change, here are your IAM credentials to store on disk forever. The ability to look at what we're doing through the eyes of someone who is clearly deep into the technical weeds, but not as exposed to all of the minutiae of the 300-some-odd AWS services, is really a refreshing thing for all of us, just because it helps us realize what it's like to see some of this stuff for the first time, as well as gives me content ideas—because if it's new to you, I promise you are not the only person who's seeing it that way.
And if you don't really understand something well enough to explain it, I would argue you don't really understand the thing, so it forces me to get more awareness around exactly how different facets work. It's been an absolutely fantastic experience so far, from my perspective.Alex: Thank you. Right back at you. I mean, spending so many years working with startups, my kind of level of expected sophistication is, “I'm going to write your password on the back of a napkin. I have fifteen other things to do. Go figure it out.” And so you know, it's always nice to see—particularly players like AWS that are such 800-pound gorillas—going in and trying to uplevel that experience in a way that feels like—because I mean, like, look, AWS could keep us with the, “Here's a CSV with your username and password. Good luck, have fun.” And you know, they would still make—Corey: And they're going to have to because so much automation is built around that—Alex: Oh yeah—Corey: In so many places.Alex: —so much.Corey: It's always net-additive, they never turn anything off, which is increasingly an operational burden.Alex: Yeah, absolutely. Absolutely. But yeah, it's nice to see them up-level this in a way that feels like they're paying attention to their customers' pain. And that's always nice to see.Corey: So, we met a few years ago—in the before times—at a mixer that we wound up throwing—slash meetup. It was in Southern California for some AWS event or another. You've been aware of who we are and what we do for a while now, so I'm very curious to know—and the joy of having these conversations is that I don't actually know what the answer is going to be, so this may never see the light of day if it goes to weird—Alex: [laugh].Corey: —in the wrong direction, but—no I'm kidding. 
What has been, I guess, the biggest points of dissonance or surprises based upon your perception of who we are and what we do externally, versus joining and seeing how the sausage is made?Alex: You know, I think the first thing is—um, well, how to put this. I think that a lot of what I was expecting, given how much work you all do and how big—well, ‘you all;' we do—and how big the list of clients is and how it gets bigger every day, I was expecting this to be, like, this very hyper put together, like, every little detail has been figured out kind of engagement where I would have to figure out how you all do this. And coming in and realizing that a lot of it is just having a lot of in-depth knowledge born from experience of a bunch of stuff inside of this ecosystem, and then the rest of it is kind of free jazz, is kind of encouraging. Because as someone that was you know, as a freelancer, right, who do you see, right? You see people who have big public presences or people who are giant firms, right?On the GCP side, SADA Systems is a great example. They're another local company for me here in Los Angeles, and—Corey: Oh, yes. [unintelligible 00:24:48] Miles has been a recurring guest on the show.Alex: Yeah. And he's great. And, like, they have this enormous company that's got, like, all these different specializations and they're basically kind of like the middleman for GCP on a lot of things. 
And, like, you see that, and then you kind of see the individual people that are like, “Yeah, you know, I'm not really going to tell you that I only have two clients and that if both of them go away, I'm screwed, but, like, I only have two clients, and if both of them go away, I'm screwed.” And so, you know, I think honestly seeing that, like, what you've built so far and what I hope to help you continue to build is, you know, you've got just enough structure around the thing so that it makes sense, and the rest of it, you're kind of admitting that no plan ever survives contact with the client, right, and that everybody's going to be different and that everybody's problems are going to be different. And that you can't just go in and say, “Here's a dashboard, here's a calculator, have fun, give me my money,” right? Because that feels like—in optimization spaces of any kind, be that cloud, or data or whatever, there's this, kind of, push toward, how do I automate myself out of a job, and the realization that you can't for something like this, and that ultimately, like, you're just going to have to go with what you know, is something that I kind of had a suspicion was the case, but this really made it clear to me that, like, oh, this is actually a reasonable way of going about this. Corey: We thought otherwise at one point. We thought that this was something that could be easily addressed with software. We launched our DuckTools SaaS platform in beta and two months later, did the—our incredible journey has come to an end—and took it off of a public offering. Because it doesn't lend itself to solving these problems in software in any reasonable way. I am ever more convinced over time that the idea of being able to solve cloud cost optimization with software at VC-scale is a red herring. And yeah, it just isn't going to work because it's one size fits some. 
Our customers are, by definition, exceptional in many respects, and understanding the context behind why things are the way that they are means that we can only go so far with process because then it becomes a let's have a conversation and let's be human. Otherwise, we try to overly codify the process, and congratulations, we just now look like really crappy software, but expensive because it's all people doing it. It doesn't work that way. We have tools internally that help smooth over a lot of those edges, but by and large, people who are capable of performing, especially at the principal level, for a cloud economics role inherently are going to find themselves stifled by too much process because they need to have the freedom to dig into the areas that are relevant to the customer. It's why we handcraft all of our statements of work in ways that tend to shy away from explicitly defined deliverables. Because we deliver an outcome, but it's going to depend entirely, in most cases, upon what we discover along the way. Maybe a full-on report isn't the best way of presenting the data in the way that we see it. Maybe it's a small proof of concept script or something like that. Maybe it's, I don't know, an interpretive dance in front of the company's board. Alex: [laugh]. Right. Corey: I'm open to exploring opportunities. But it comes down to what is right for the customer. There's a reason we only ever charge a fixed fee for these things, and it's because at that point, great, we're giving you the advice that we'd implement ourselves. We have no partnerships with any vendor in the space just to avoid bias or the perception of same. It's important that we are the authoritative source around these things. Honestly, the thing that surprised me the most about all this is how true to that vision we've stayed as we've fleshed out what works, what doesn't. And we consistently fail to go out of business every month. I am ecstatic about that. 
I expected this to wind up cratering into a mountain four months after I went freelance. Not yet. Alex: Well, I mean, I think there's another aspect of this too, right? Because I've spent a lot of my career working inside of venture capital-backed companies. And there's a lot of positive things to be said about having ready access to that kind of cash, but it does something to your business the second you take it. And I've been in a couple of situations where, like, once you actually have that big bucket of money, the incentive is grow, right? Hire more people, get more customers, go, go, go, go, go. And sometimes what you'll find is that you'll spend the time and the money on an initiative and it's clearly not working. And you just kind of have to keep doubling down because now you've got customers that are using this thing and now you have to maintain it, and before you know it, you've got this albatross hanging around your neck. And like one of the things that I really respect about the way that Duckbill Group is handling this, by not taking outside cash, is, like, it frees you up to make these kinds of bets, and then two months later say, “Well, that didn't work,” and try something else. And you know, that's very difficult to do once you have to go and convince someone with, you know, money flowing out of their ears, that that's the right thing to do. Corey: We have to be intentional about what we're doing. One of the benefits of bringing you aboard is that one, it does improve our capacity for handling more engagements at the same time, but it also improves the quality of the engagements that we are delivering. Instead of basically doing a round-robin assignment policy we can—Alex: Right. Corey: —we consult with each other; we talk about specific areas in which we have specific expertise. You get dragged into a lot of data portions of existing engagements, and the rest of us get pulled into other areas in which you might not be as strong. 
For example, “What are all of these ridiculous services? I can't make heads or tails of the ridiculous naming side of it.” Surprise, that's not a you problem. It comes down to being able to work collaboratively and let each other shine in a way that doesn't mean we load people up with work. We're very strict about having a 40-hour or less work week, just because we're not rushing for an exit. We want to enjoy our time working, we want to enjoy what we're doing, and then we want to go home and not think about work until it's time to come back and think about these things. Like, it's a lifestyle company, but that lifestyle doesn't need to be run, run, run, run, run all the time, and it doesn't need to be something that people barely tolerate. Alex: Yeah. And I think that, you know, especially coming from being an army of one in a lot of engagements, it is really refreshing to be able to—see, because, you know, I'm fortunate enough, I have friends in the industry that I can go and say like, “I have no idea how to make heads or tails of X.” And you know, I can get help that way, but ultimately, like, the only other outlet that I have here is the customer and they're not bringing me in if they have those answers readily to hand. And so being able to bounce stuff off of other people inside of an organization like this has been really refreshing. Corey: One of the things I've appreciated about your tenure here so far is the questions that you ask are pitched at the perfect level, by which I mean, it is never something you could answer with a three-second visit to Google, but it's also not something that you've spent three days spinning your wheels on trying to understand. You do a bit of digging; it's a little unclear, especially since there are multiple paths to go down, and then you flag it for clarification. And there's really so much to be said for that. 
Really, when we're looking for markers of seniority in the interview process, it's admitting you don't know something, but then also talking about how you would go about getting the answer. And it's—because no one has all this stuff in their head. I spend a disturbing amount of time looking at search engines and trying to reformulate queries and to get answers that make sense.I don't have the entirety of AWS shoved into my head. Yet. I'm sure there's something at re:Invent that's going to be scary and horrifying that will claim to do it and basically have a poor user interface, but all right. When that comes, we'll reevaluate then because this industry is always changing.Alex: For sure. For sure. And I think it's, it's worth pointing out that, like, one of the things that having done this for a long time gives you is this kind of scaffolding in your head that you can hang things over. We're like, you don't need to have every single AWS service memorized, but if you've got that scaffold in your head going, “Oh, like, this thing sounds like it hangs over this part of the mental scaffold, and I've seen other things that do that, so I wonder if it does this and this and this,” right? And that's a lot of it, honestly.Because especially, like, when I was solely in the data space, there's a new data wareho—or a new, like, data catalog system coming out every other week. You know, there are a thousand different things that claim to do MLOps, right? And whenever, like, someone comes to me and says, “Do you have experience with such and such?” And the answer was usually, “Well if you hum a few bars, I can fake it.” And, you know, that tends to help a great deal.Corey: Yeah. “No, but I'll find out and get back to you,” the right answer. Making it up and being wrong is the best way to get rejected from an environment. That's not just consulting; that's employment, too. 
If 95% of the time, you give the right answer, but that one time in 20 you're going to just make it up, well, I have to validate the other 19 because I never know when someone's faking it or not. There's that level of earned trust that's important. Alex: Well, yeah. And you're being brought in to be the expert in the room. That doesn't necessarily mean that you are the all-seeing, all-knowing oracle of knowledge but, like, if you say a thing, people are just going to believe you. And so, you know, it's beholden on you—Corey: If not, we have a different problem. Alex: Well, yeah, exactly. Hopefully, right? But yeah, I mean, it's beholden on you to be honest with your customer at a certain point, I think. Corey: I really want to thank you for taking the time out of your day to chat with me about this. And I would love to have you back on in a couple of months once you're fully up to speed and spinning at the proper RPMs and see what's happened then. I—Alex: Thank you. I'd—Corey: —really appreciate—Alex: —love to. Corey: —your time. Where's the best place for people to learn more about you if they haven't heard your name before? Alex: Well, let's see. I am @alexras on Twitter, A-L-E-X-R-A-S. My personal website is alexras.info. I've done some writing on data stuff, including a pretty big collection of blog posts on the data side of the AWS ecosystem that are still on my consulting page, bitsondisk.com. Other than that—I mean, yeah, Twitter is probably the best place to find me, so if you want to talk more about any weird, nerd data stuff, then please feel free to reach out there. Corey: And links to that will, of course, be in the [show notes 00:35:57]. Thanks again for your time. I really appreciate it. Alex: Thank you. It's been a pleasure. Corey: Alex Rasmussen, principal cloud economist here at The Duckbill Group. I am Corey Quinn, cloud economist to the stars, and this is Screaming in the Cloud. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that you then submit to three other podcast platforms just to make sure you have a backup copy of that particular piece of data.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Aidan
Aidan is an AWS enthusiast, due in no small part to sharing initials with the cloud. He's been writing software for over 20 years and getting paid to do it for the last 10. He's still not sure what he wants to be when he grows up.
Links: Stedi: https://www.stedi.com/ GitHub: https://github.com/aidansteele Blog posts: https://awsteele.com/ Ipv6-ghost-ship: https://github.com/aidansteele/ipv6-ghost-ship Twitter: https://twitter.com/__steele
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built-in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing. Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. 
It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and 100 megabyte binary that doesn't eat all the data you've gotten on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by someone who is honestly, feels like they're after my own heart. Aidan Steele by day is a serverless engineer at Stedi, but by night, he is an absolute treasure and a delight because not only does he write awesome third-party tooling and blog posts and whatnot around the AWS ecosystem, but he turns them into the most glorious, intricate, and technical shit posts that I think I've ever seen. Aidan, thank you for joining me.Aidan: Hi, Corey, thanks for having me. It's an honor to be here. Hopefully, we get to talk some AWS, and maybe also talk some nonsense as well.Corey: I would argue that in many ways, those things are one in the same. And one of the things I always appreciated about how you approach things is, you definitely seem to share that particular ethos with me. And there's been a lot of interesting content coming out from you in recent days. The thing that really wound up showing up on my radar in a big way was back at the start of January—2022, for those listening to this in the glorious future—about using IPv6 to use multi-factor auth, which it is so… I don't even have the adjectives to throw at this because, first it is ridiculous, two, it is effective, and three, it is just who thinks like that? What is this and what did you—what monstrosity have you built?Aidan: So, what did I end up calling it? I think it was ipv6-ghost-ship. 
And I think I called it that because I'd recently watched, oh, what was that series that was recently on Apple TV? Uh, the Isaac Asimov—Corey: If it's not Paw Patrol, I have no idea what it is because I have a four-year-old who is very insistent about these things. It is not so much a TV show as it is a way of life. My life is terrible. Please put me out of my misery.Aidan: Well, at least it's not Bluey. That's the one I usually hear about. That's Australia's greatest export. But it was one of the plot devices was a ship that would teleport around the place, and you could never predict where it was next. And so no one could access it. And I thought, “Oh, what about if I use the IPv6 address space?”Corey: Oh, Foundation?Aidan: That's the one. Foundation. That's how the name came about. The idea, honestly, it was because I saw—when was it?—sometime last year, AWS added support for those IP address prefixes. IPv4 prefixes were small; very useful and important, but IPv6 with more than 2 trillion IP addresses, per instance, I thought there's got to be fun to be had there.Corey: 281 trillion, I believe is the—Aidan: 281 trillion.Corey: Yeah. It is sarcastically large space. And that also has effectively, I would say in InfoSec sense, killed port scanning, the idea I'm going to scan the IP range and see what's there, just because that takes such a tremendous amount of time. Now here, in reality, you also wind up with people using compromised resources, and yeah, it turns out, I can absolutely scan trillions upon trillions of IP addresses as long as I'm using your AWS account and associated credit card in which to do it. But here in the real world, it is not an easily discoverable problem space.Aidan: Yeah. I made it as a novelty, really. 
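The claim that IPv6 effectively killed port scanning holds up to a quick back-of-envelope check. The figures below are illustrative assumptions, not anything AWS-specific: a single /64 (the common per-subnet IPv6 allocation) scanned at a very generous one million probes per second:

```python
# Rough arithmetic: how long would it take to brute-force scan one IPv6 /64?
addresses = 2 ** 64            # addresses in a single /64 subnet
probes_per_second = 1_000_000  # an extremely generous assumed scan rate
seconds = addresses / probes_per_second
years = seconds / (60 * 60 * 24 * 365)
print(round(years))  # on the order of half a million years
```

Which is why, as Corey notes, the practical threat is someone burning your compute and your credit card on the scan, not the scan ever finishing.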
I was looking for a reason to learn more about IPv6 and subnetting because it's the term I'd heard, a thing I didn't really understand, and the way I learn things is by trying to build them, realizing I have no idea what I'm doing, googling the error messages, reluctantly looking at the documentation, and then repeating until I've built something. And yeah, and then I built it, published it, and seemed to be pretty popular. It struck a chord. People retweeted it. It tickled your fancy. I think it spoke something in all of us who are trying not to take our jobs too seriously, you know, know we can have a little fun with this ludicrous tech that we get to play with.Corey: The idea being, you take the multi-factor auth code that your thing generates, and that is the last series of octets for the IP address you wind up going towards and that is such a large problem space that you're not going to find it in time, so whatever it is automatically connect to that particular IP address because that's the only one that's going to be listening for a 30 to 60-second span for the connection to be established. It is a great idea because SSH doesn't support this stuff natively. There's no good two-factor auth approach for this. And I love it. I'd be scared to death to run this in production for something that actually matters.And we also start caring a lot more about how accurate are the clocks on those instances, all of a sudden. But, oh, I just love the concept so much because it hits on the ethos of—I think—what so much of the cloud does were these really are fundamental building blocks that we can use to build incredible, awe-inspiring things that are globe-spanning, and also ridiculousness. And there's so much value of being able to do the same thing, sometimes at the same time.Aidan: Yeah, it's interesting, you mentioned, like, never using in prod, and I guess when I was building it, I thought, you know, that would be apparent. 
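The mechanism Corey describes can be sketched in a few lines. To be clear, this is a hypothetical illustration rather than the actual ipv6-ghost-ship code: it computes a standard RFC 6238 TOTP code from a shared secret and adds it to the low bits of an IPv6 prefix, so the client and the listener independently derive the same address for each 30-second window. The prefix and secret shown are made up.

```python
import base64
import hashlib
import hmac
import ipaddress
import struct
import time

def totp(secret_b32, t=None, step=30, digits=6):
    """Standard RFC 6238 TOTP: HMAC-SHA1 over the time counter, then dynamic truncation."""
    key = base64.b32decode(secret_b32)
    counter = int((time.time() if t is None else t) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return code % (10 ** digits)

def ghost_address(prefix, secret_b32, t=None):
    """Embed the current TOTP code into the low bits of an address inside the
    given IPv6 prefix; only a party holding the secret knows where to connect."""
    net = ipaddress.IPv6Network(prefix)
    return str(net.network_address + totp(secret_b32, t))

# Example with a made-up documentation prefix and demo secret; client and
# server compute the same address for the same 30-second window.
print(ghost_address("2001:db8:1234:5678::/64", "JBSWY3DPEHPK3PXP"))
```

On the listening side, one would presumably bind to the derived address only for the current window and rotate as the code changes, which is exactly why the clock-skew question that comes up next matters.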
Like, “Yes, this is very neat, but surely no one's going to use it.” And I did see someone raise an issue on the GitHub project which was talking about clock skew. And I mentioned—Corey: Here at the bank where I'm running this in production, we're—Aidan: [laugh]. Corey: —having some trouble with the clock. Yeah, it's—Aidan: You know, I mentioned that the underlying 2FA library did account for clock skew 30 seconds either way, but it made me realize, I might need to put a disclaimer on the project. While the code is probably reasonably sound, I personally wouldn't run it in production, and it was more meant to be a piece of performance art or something to tickle one's fancy and to move on, not to roll it out. But I don't know, different strokes for different folks. Corey: I have gotten a lot better about calling out my ridiculous shitpost things when I do them. And the thing that really drove that home for me was talking about using DNS TXT records to store information about what server a virtual machine lives on—or container or whatnot—thus using Route 53 as a database. And that was a great gag, and then someone did a Reddit post of “This seems like a really good idea, so I'm going to start doing it, and I'm having these questions.” And at that point, it's like, “Okay, I've got to break character at that point.” And it's, yeah, “Hi. That's my joke. Don't do it because X, Y, and Z are your failure modes; there are better tools for it. So yeah, there are ways you can do this with DNS, but it's not generally a great idea, and there are some risk factors to it. And okay, A, B, and C are the things you don't want to do, so let's instead do it in a halfway intelligent way because it's only funny if everyone's laughing. Otherwise, we fall into this trap where people take you seriously and they feel bad as a result when it doesn't work in production. So, calling it out as this is a joke tends to put a lot of that aside. It also keeps people from feeling left out. Aidan: Yeah. 
I realized that because the next novelty project I did a few days later—not sure if you caught it—it was a Rick Roll over ICMPv6 packets, where if you had run ping6 to a certain IP range, it would return the lyrics to music's greatest treasure. So, I think that was hopefully a bit more self-evident that this should never be taken seriously. Who knows, I'm sure someone will find a use for it in prod. Corey: And I was looking through this, this is great. I love some of the stuff that you're doing because it's just fantastic. And I started digging a bit more into things you had done. And at that point, it was whoa, whoa, whoa, wait a minute. Back in 2020, you found an example of an issue with AWS's security model where CloudTrail would just start—if asked nicely—spewing other people's credential sets and CloudTrail events and whatnot into your account. And, A, that's kind of a problem. B, it was something that didn't make that big of a splash when it came out—I don't even think I linked to it at the time—and, C, given the recent revelations around CloudFormation and Glue from the fine folks at Orca Security, it showed that this wasn't a one-off, because you'd done this a year beforehand. We have now an established track record of cross-account data sharing and, potentially, exploits, and I'm looking at this and I've got to level with you: I felt incredibly naive because I had assumed that since we hadn't heard of this stuff in any real big sense that it simply didn't happen. So, when we heard about Azure, obviously, it's because Azure is complete clown shoes and the excellent people at AWS would never make these sorts of mistakes. Except we now have evidence that they absolutely did and didn't talk about it publicly. And I've got to level with you. I feel more than a little bit foolish, betrayed, naive for all this. What's your take on it? Aidan: Yeah, so just to clarify, it wasn't actually in your account. 
It was the new AWS custom resource execution model, where you would upload a Lambda function that would run in an Amazon-managed account. And so that immediately set off my spidey sense because executing code in someone else's account seems fraught with peril. And so—Corey: Yeah, you can do all kinds of horrifying things there, like, use it to run containers. Aidan: Yeah. [laugh]. Thankfully, I didn't do anything that egregious. I stayed inside the Lambda function, but I look—I poked around at what credentials it had, and it would use CloudWatch to reinvoke itself, and CloudTrail kept recording those invocations. And I won't go into all the details, but it ended up being that you could see credentials being recorded in CloudTrail in that account, and I could, sort of, funnel them out of there. When I found this, I was a little scared, and I don't think I'd reported an issue to AWS before, so I didn't want to go too far and do anything that could be considered malicious. So, I didn't actively seek out other people's credentials. Corey: Yeah, as a general rule, it's best once you discover things like that to do the right thing and report it, not proceed to, you know, inadvertently commit felonies. Aidan: Yeah. Especially because it was my first time. I felt better safe than sorry. So, I didn't see other credentials, but I had no reason to believe that I wouldn't see them if I kept looking. I reported it to Amazon. Their security team was incredibly professional, made me feel very comfortable reporting it, and let me know when, you know, they'd remediated it, which was a matter of days later. But afterwards, it left me feeling a little surprised because I was able to publish about it, and a few people responded, you know, the sorts of people who pay close attention to the industry, but Amazon didn't publish anything as far as I was aware. And it changed the way I felt about AWS security, because like you, I sort of felt that AWS, more or less, had a pretty perfect track record. 
They would have advisories about possible Xen exploits, and so on. But they'd never published anything about potential for compromise. And it makes me wonder how many of the things might have been reported in the past where the third-party researcher either didn't end up publishing, or they published and it just disappeared into the blogosphere, and I hadn't seen it. Corey: They have a big earn trust principle over there, and I think that they always focus on the trust portion of it, but I think what got overlooked is the earn. When people are giving you trust that you haven't earned, on some level, the right thing to do is to call it out and be transparent around these things. Yes, I know, Wall Street's going to be annoyed and headlines, et cetera, et cetera, but I had always had the impression that had there been a cross-account vulnerability or a breach of some sort, they would communicate this and they would have their executives go on a speaking tour about it to explain how defense-in-depth mitigated some of it, and/or lessons learned, and/or what else we can learn. But it turns out that wasn't happening at all. And I feel like they have been given trust that was unearned and now I am not happy with it. I suddenly have a lot more of a, I guess, skeptical position toward them as a result, and I have very little tolerance left for what has previously been a staple of the AWS security discussions, which is an executive getting on stage for a while and droning on about the shared responsibility model with the very strong implication that “Oh, yeah, we're fine. It's all on your side of the fence that things are going to break.” Yeah, turns out, that's not so true. You just know about the things on your side of the fence in a way that you don't about the things that are on theirs. Aidan: Yeah, it's an interesting one. 
Like, I think about it and I think, “Well, they never made an explicit promise that they would publish these things,” so, on one hand, I say to myself, “Oh, maybe that's on me for making that assumption.” But, I don't know, I feel like the way we felt was justified. Maybe naive in hindsight, but then, you know, I guess… I'm still not sure how to feel because of, like, I think about recent issues and how a couple of AWS Distinguished Engineers jumped on Twitter, and to their credit were extremely proactive in engaging with the community.But is that enough? It might be enough for say, to set my mind at ease or your mind at ease because we are, [laugh] to put it mildly, highly engaged, perhaps a little too engaged in the AWS space, but Twitter's very ephemeral. Very few of AWS's customers—Corey: Yeah, I can't link to tweets by distinguished engineers to present to an executive leadership team as an official statement from Amazon. I just can't.Aidan: Yeah. Yeah.Corey: And so the lesson we can take from this is okay, so “Well, we never actually said this.” “So, let me get this straight. You're content to basically let people assume whatever they want until they ask you an explicit question around these things. Really? Is that the lesson you want me to take from this? Because I have a whole bunch of very explicit questions that I will be asking you going forward, if that is in fact, your position. And you are not going to like the fact that I'm asking these questions.”Even if the answer is a hard no, people who did not have this context are going to wonder why are people asking those questions? It's a massive footgun here for them if that is the position that they intend to have. I want to be clear as well; this is also a messaging problem. It is not in any way, a condemnation of their excellent folks working on the security implementation themselves. This stuff is hard and those people are all-stars. I want to be very clear on this. 
It is purely around the messaging and positioning of the security posture.

Aidan: Yeah, yeah. That's a good clarification, because like you, my understanding is that the service teams are doing a really stellar, above-average job, industry-wide, and the AWS Security Response Teams, I have absolute faith in them. It is a matter of messaging. And I guess what particularly brings it to front-of-mind is, it was earlier this month, or maybe it was last month, I received an email from a company called Sourcegraph. They do code search.

I'm not even a customer of theirs yet, you know? I'm on a free trial, and I got an email that—I'm paraphrasing here—was something to the effect of: we discovered that it was possible for your code to appear in other customers' code search results. It was discovered by one of our own engineers. We found that the circumstances hadn't cropped up, but we wanted to tell you that it was possible. It didn't happen, and we're working on making sure it won't happen again.

And I think about how radically different that is, where they didn't have a third-party researcher forcing their hand; they could have very easily swept it under the rug, but they were so proactive that, honestly, that's probably what's going to tip me over the edge into becoming a customer. I mean, other than them having a great product. But yeah, it's a big contrast. It's how I like to see other companies work, especially Amazon.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com.
My thanks to them for their continued support of this ridiculous nonsense.

Corey: The two companies that I can think of that have had security problems have been CircleCI and Travis CI. Circle had an incredibly transparent early-on blog post, they engaged with customers on the forums, and they did super well. Travis basically denied and stonewalled for ages, and now the only people who use Travis are there because they haven't found a good way to get off of it yet. It is effectively DOA. And I don't think those two things are unrelated.

Aidan: Yeah. No, that's a great point. Because, you know, I've been in this industry long enough. You have to know that humans write code and humans make mistakes—I know I've made more than my fair share—and I'm not going to write off a company for making a mistake. It's entirely in their response. And yeah, you're right. That's why Circle is still a trustworthy business that should earn people's business, and why Travis is one I recommend everyone move away from.

Corey: Yeah, I like Orca Security as a company and as a product, but at the moment, I am not their customer. I am AWS's customer. So, why the hell am I hearing it from Orca and not AWS when this happens?

Aidan: Yeah, yeah. It's… not great. On one hand, I'm glad I'm not in charge of finding a solution to this because I don't have the skills or the expertise to manage that communication. Because, like I think you've said in the past, there are a lot of different audiences that they have to communicate with. They have to communicate with the stock market, they have to communicate with execs, they have to communicate with developers, and each of those audiences demands a different level of detail, a different focus. And it's tricky. And how do you manage that?
But, I don't know, I feel like you have an obligation to when people place that level of trust in you.

Corey: It's just a matter of doing right by your customers, on some level.

Aidan: Yeah.

Corey: How long have you been working on AWS-side environments? Clearly, this is not like, “Well, it's year two,” because if so, I'm going to feel remarkably behind.

Aidan: [laugh]. So, I've been writing code in some capacity or another for 20 years. It took about five years to get anyone to pay me to do so. But yeah, I guess the start of my professional career—and by ‘professional,' I want to use it in the strictest terms, meaning getting paid money; not that I [laugh] am necessarily a professional—coincided with the launch of AWS. So, I don't have experience with the before times of data centers, never had to think about Direct Connect, but it means I have been using AWS since sometime in 2008.

I was just looking at my bill earlier, and I saw that my first bill was for $70. I was using a c1.xlarge, which was 80 cents an hour, and it had eight-core CPUs. And to put that in context at the time—

Corey: Eight vCPUs, technically, I believe—

Aidan: And it basically is—

Corey: —or were they using the [eCPU 00:20:31] model back then?

Aidan: Yeah, no, that was vCPUs. But to me, that was extraordinary. You know, I was somewhere just after high school. It was—the Netflix Prize was around. If you're not sure what that was, Netflix had this open competition where they said anyone who could improve upon their movie recommendation algorithm could win a million dollars.

And obviously, being a teenager, I had a massive ego and [laugh] no self-doubt, so I thought I could win this, but I just didn't have enough CPUs or RAM on my laptop. And so when EC2 launched and I could pay 80 cents an hour, rather than signing up for a 12-month contract with a colocation company, it was just a dream come true. I was able to run my terrible algorithms, but I could run them eight times faster.
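[As a back-of-the-envelope aside: the rate and the bill total come from the conversation above, but the instance-hours figure is our own arithmetic, not something stated in the episode. A $70 first bill at 80 cents an hour works out to:]

```python
# Editorial aside: rough arithmetic on the numbers mentioned above.
hourly_rate = 0.80   # USD per hour for a c1.xlarge, circa 2008
first_bill = 70.00   # USD, the first month's bill mentioned

hours = first_bill / hourly_rate   # total billed instance-hours
days = hours / 24                  # equivalent continuous runtime

print(f"{hours:.1f} instance-hours (~{days:.1f} days of continuous runtime)")
# → 87.5 instance-hours (~3.6 days of continuous runtime)
```

[So the whole Netflix Prize experiment amounted to a bit under four days of continuous compute.]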
Unfortunately and obviously, I didn't win, because it turns out I'm not a world-class statistician. But—

Corey: Common mistake. I make that mistake myself all the time.

Aidan: [laugh]. Yeah. I mean, you know, I think I was probably 19 at the time, so my ego did make me think I was one, but it turned out not to be so. But I think that was what really blew my mind: that me, a nobody, could create an account with Amazon and get access to these incredibly powerful machines for less than a dollar. And so I was hooked.

Since then, I've worked at companies that are AWS customers. I've worked at places that have zero EC2 servers, worked at places that have had thousands, and places in between. And it's gotten to a point, actually, where, I guess, my career is so entwined with AWS that, one, my initials are actually AWS, but also—and this might sound ridiculous, and it's probably just a sign of my privilege—I wouldn't consider working somewhere that used another cloud. Not—

Corey: No, I think that's absolutely the right approach.

Aidan: Yeah.

Corey: I had a Twitter thread on this somewhat recently, and I'm going to turn it into a blog post because I got some pushback. If I were coming into the industry right now, my first choice would be Google Cloud because its developer experience is excellent. But I'm not coming to this without any experience. I have spent a decade or so learning not just how AWS works, but also how it breaks, understanding the failure modes and what they're going to look like, and what it's good at and what it's not. That's the valuable stuff for running things in a serious way.

Aidan: Yeah. It's an interesting one. And I mean, for better or worse, AWS is big.
I'm sure you know much better than I do the exact numbers, but if a junior developer came to me and said, “Which cloud should I learn, or should I learn all of them?”—I mean, you're right, Google Cloud does have a better developer experience, especially for new developers, but when I think about the sheer number of jobs that are available for developers, I feel like I would be doing them a disservice by not suggesting AWS, at least in Australia. It seems they've got such a huge footprint that you'll always be able to find a job working as an AWS-familiar engineer. It seems like that would be less the case with Google Cloud or Azure.

Corey: Again, I am not sitting here suggesting that anyone should say, “Oh, clouds are insecure. We're going to run our own stuff in our own data centers.” That is ridiculous in this era. They are still going to do a better job of security than any of us will individually, let's be clear here. And it empowers and unlocks an awful lot of stuff.

But with their privileged position as these hyperscale providers that are the default choice for building things, I think, comes a significant level of responsibility that I am displeased to discover they've been abdicating. And I don't love that.

Aidan: Yeah, it's an interesting one, right? Because, like you're saying, they have access and expertise that people doing it themselves will never match. So, you know, I'm never going to hesitate to recommend people use AWS on account of security, because your company's security posture will almost always be better for using AWS and following their guidelines, and so on. But yeah, like you say, with great power comes significant responsibility to earn trust, and to retain that trust by admitting and publicizing when mistakes are made.

Corey: One last topic I want to get into with you is one that you and I have talked about very briefly elsewhere. I feel like you and I are both relatively up-to-date on AWS intricacies.
I think that we are both better than the average bear working with the platform. But I know that I feel this way, and I suspect you do too: VPCs have gotten confusing as hell. Is that just me? Am I a secret moron that no one ever bothered to tell, and I should update my own self-awareness?

Aidan: [laugh]. Yeah, it's… I mean, that's been the story of my career with AWS. When I started, VPCs didn't exist. It was EC2-Classic—well, I guess at the time, it was just EC2—and it was simple. You launched an instance and you had an IP address.

And then along came VPCs, and I think at the time, I thought something to the effect of, “This seems like needless complexity. I'm not going to bother learning this. It will never be relevant.” In the end, that wasn't true. I worked in much larger deployments where VPCs made fantastic sense and made a lot of things possible, but I still didn't go into the weeds.

Since then, AWS has announced that EC2-Classic will be retired; the end of an era. I'm not personally still running anything in EC2-Classic, and I think they've done an incredible job of maintaining support for this long, but VPC complexity has certainly been growing year-on-year since then. I recently was using the AWS console—like we all do and no one ever admits to—to edit a VPC subnet route table. And I clicked the drop-down box for a target, and I was overwhelmed by the number of options. There were NAT gateways, internet gateways, carrier gateways, I think there was a thing called a Wavelength gateway, ENIs, and… [laugh] I think I was surprised because I just scrolled through the list and thought, “Wow, that is a lot of different options. Why is that?”

Especially because it's not so relevant to me. But I realized a big thing of what AWS has been doing lately is trying to make themselves available to people who haven't used the cloud yet. And those people have complicated networking needs, and it seems like AWS is trying to—reasonably successfully—make anything possible.
But with that comes, you know, additional complexity.

Corey: I appreciate that the capacity is there, but there has to be an abstraction model for getting rid of some of this complexity, because otherwise the failure mode is you wind up with this amazingly capable thing that can build marvels, but you also need to basically have a PhD in some of these things to wind up tying it all together. And if you bring someone else in to do it, then you have no idea how to run the thing. You're effectively a golden retriever trying to fly a space shuttle.

Aidan: Yeah. It's interesting. Like, clearly they must be acutely aware of this, because they have default VPCs, and for many use cases, that's all people should need. But as soon as you want, say, a private subnet, then you need to either modify that default VPC or create a new one, and it's sort of going from 0 to 100 complexity extremely quickly, because, you know, you need to create route tables to everyone's favorite NAT gateways, and it feels like the on-ramp needs to be not so steep. I'm not sure what the solution is; I hope they find one.

Corey: As do I. I really want to thank you for taking the time to speak with me about so many of these things. If people want to learn more about what you're up to, where's the best place to find you?

Aidan: Twitter's the best place. On Twitter, my username is @__Steele, which is S-T-E-E-L-E. From there, that's where I'll either speculate on the latest releases or link to some of the silly things I put on GitHub. Sometimes they're not-so-silly things. But yeah, that's where I can be found. And I'd love to chat to anyone about AWS. It's something I can geek out about all day, every day.

Corey: And we will certainly include links to that in the [show notes 00:29:50]. Thank you so much for taking the time to speak with me today. I really appreciate it.

Aidan: Well, thank you so much for having me.
It's been an absolute delight.

Corey: Aidan Steele, serverless engineer at Stedi, and shitposter extraordinaire. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an immediate request to correct the record about what I'm not fully understanding about AWS's piss-weak security communications.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.