Open source software, a time series database platform
Learn Grafana and master real-time data monitoring
InfluxDB just dropped its biggest update ever — InfluxDB 3.0 — and in this episode, we go deep with the team behind the world's most popular open-source time series database. You'll hear the inside story of how InfluxDB grew from 3,000 users in 2015 to over 1.3 million today, and why the company decided to rewrite its entire architecture from scratch in Rust, ditching Go and moving to object storage on S3. We break down the real technical challenges that forced this radical shift: the “cardinality problem” that choked performance, the pain of tightly linked compute and storage, and why their custom query language (Flux) failed to catch on, leading to a humbling embrace of SQL as the industry standard. You'll learn how InfluxDB is positioning itself in a world dominated by Databricks and Snowflake, and the hard lessons learned about monetization when 1.3 million users only yield 2,600 paying customers.

InfluxData
Website - https://www.influxdata.com
X/Twitter - https://twitter.com/InfluxDB

Evan Kaplan
LinkedIn - https://www.linkedin.com/in/kaplanevan
X/Twitter - https://x.com/evankaplan

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

Foursquare
Website - https://foursquare.com
X/Twitter - https://x.com/Foursquare
IG - instagram.com/foursquare

(00:00) Intro
(02:22) The InfluxDB origin story and why time series matters
(06:59) The cardinality crisis and why Influx rebuilt in Rust
(09:26) Why SQL won (and Flux lost)
(16:34) Why InfluxData bets on FDAP
(22:51) IoT, Tesla Powerwalls, and real-time control systems
(27:54) Competing with Databricks, Snowflake, and the “lakehouse” world
(31:50) Open Source lessons, monetization, & what's next
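Since this episode centers on InfluxDB 3.0's embrace of SQL on top of Apache Arrow DataFusion and Parquet, here is a minimal sketch of what querying time series data through DataFusion can look like. The file name, table name, and columns (cpu.parquet, cpu, time, host, usage) are hypothetical, and this uses the generic DataFusion crate API rather than InfluxDB's actual query path.

```rust
// Cargo.toml (assumed): datafusion = "<version>", tokio = { version = "<version>", features = ["full"] }
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Register a Parquet file of time series samples as a SQL table.
    // "cpu.parquet" and its columns (time, host, usage) are made up for illustration.
    let ctx = SessionContext::new();
    ctx.register_parquet("cpu", "cpu.parquet", ParquetReadOptions::default())
        .await?;

    // Ordinary SQL: per-host average usage over one-minute buckets.
    let df = ctx
        .sql(
            "SELECT date_bin(INTERVAL '1 minute', time, TIMESTAMP '1970-01-01 00:00:00') AS minute, \
                    host, avg(usage) AS avg_usage \
             FROM cpu \
             GROUP BY date_bin(INTERVAL '1 minute', time, TIMESTAMP '1970-01-01 00:00:00'), host \
             ORDER BY minute, host",
        )
        .await?;

    df.show().await?;
    Ok(())
}
```

The same SQL-over-Parquet pattern is part of what makes object storage a natural fit: the query engine only needs read access to immutable Parquet files.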
AWS Morning Brief for the week of February 3, with Corey Quinn. Links:
Amazon EFS is now available in the AWS Mexico (Central) Region
Amazon EKS and Amazon EKS Distro now supports Kubernetes version 1.32
Amazon S3 Metadata is now generally available
Amazon Timestream for InfluxDB now supports Storage Scaling
AWS Elastic Beanstalk now supports Python 3.13 on Amazon Linux 2023
AWS Health now supports Internet Protocol Version 6 (IPv6)
Announcing AWS Managed Notifications in the AWS Console Mobile App
Announcing new AWS Wavelength Zone in Casablanca
AWS now supports Zone Groups for Availability Zones
Announcing the AWS CDK Glue L2 Construct
Deploy DeepSeek-R1 Distilled Llama models in Amazon Bedrock
Announcing upcoming changes to the AWS Security Token Service global endpoint
Design patterns for multi-tenant access control on Amazon S3
Amazon Nimble Studio Closed to New Customers
In this episode of the IoT For All Podcast, Evan Kaplan, CEO of InfluxData, joins Ryan Chacon to discuss time series data in IoT. The conversation covers the challenges of managing time series data, architecture and design considerations for handling time series data, optimizing data ingestion, organization, and querying, the integration of time series data with machine learning models, and the growing role of sensors and data in creating intelligent, autonomous systems. Evan Kaplan is a passionate entrepreneur and technology leader with nearly 25 years of experience in the CEO role. Evan's career spans from creating startups in his own garage to leading NASDAQ-listed companies generating nearly $200M in annual revenue. Prior to InfluxData, Evan served as Executive in Residence at Trinity Ventures, President and CEO at iPass Corporation (the leader in global Wi-Fi connectivity), and Founder, Chairman, and CEO at Aventail Corporation (the pioneer of SSL VPNs, now part of the Dell Corporation). InfluxData is the creator of InfluxDB, the leading time series platform used to collect, store, and analyze all time series data at any scale. Developers can query and analyze their time-stamped data in real-time to discover, interpret, and share new insights to gain a competitive edge. InfluxData is a remote-first company with a globally distributed workforce. Discover more about IoT at https://www.iotforall.com Find IoT solutions: https://marketplace.iotforall.com More about InfluxData: https://www.influxdata.com Connect with Evan: https://www.linkedin.com/in/kaplanevan/ (00:00) Intro (00:09) Evan Kaplan and InfluxData (00:47) What is time series data? (03:57) Managing time series data and challenges (09:15) What IoT applications use time series data? (12:04) Trends in data management (13:28) Future outlook on sensors and data (15:28) Learn more and follow up Subscribe to the Channel: https://bit.ly/2NlcEwm Join Our Newsletter: https://newsletter.iotforall.com Follow Us on Social: https://linktr.ee/iot4all
AWS Morning Brief for the week of December 23, with Corey Quinn. Links:
Amazon AppStream 2.0 introduces client for macOS
Amazon EC2 instances support bandwidth configurations for VPC and EBS
Amazon Timestream for InfluxDB now supports Internet Protocol Version 6 (IPv6) connectivity
Amazon WorkSpaces Thin Client now available to purchase in India
AWS Backup launches support for search and item-level recovery
AWS Mainframe Modernization now supports connectivity over Internet Protocol version 6 (IPv6)
AWS Marketplace now supports self-service promotional media on seller product detail pages
AWS re:Post now supports Spanish and Portuguese
AWS Resource Explorer supports 59 new resource types
AWS offers a self-service feature to update business names on AWS Invoices
Announcing CloudFormation support for AWS Parallel Computing Service
Announcing Node Health Monitoring and Auto-Repair for Amazon EKS - AWS
And that's a wrap!
Best practices for creating a VPC for Amazon RDS for Db2
How the Amazon TimeHub team handled disruption in AWS DMS CDC task caused by Oracle RESETLOGS: Part 3
How to detect and monitor Amazon Simple Storage Service (S3) access with AWS CloudTrail and Amazon CloudWatch
Enforce resource configuration to control access to new features with AWS
Maximizing your cloud journey: Engaging an AWS Solutions Architect
Join hosts Phil Seboa and Ed Fuentes as they welcome principal engineer Lachlan Wright from PWD. Solutions. This episode of Unplugged dives deep into the world of industrial IoT, touching on Lachlan's career in automation, IIoT vs. IoT, the power of open frameworks, and the future of industrial automation. Lachlan shares his experiences with Raspberry Pi, Arduino, and emerging PLC technologies, alongside discussing the importance of data accessibility and the role of agile development in today's evolving tech landscape. Tune in for an enlightening conversation filled with valuable industry insights. 00:00 Introduction to Unplugged IIoT Podcast 00:45 Meet Lachlan Wright, Principal Engineer of PWD. Solutions 02:43 Phil's Passion for 3D Object Creation and Gaming through Blender 04:22 Ed's Journey with Python and Databasing 07:08 Versatility of Skills in Industrial and Control Systems 11:35 Lachlan's Home Automation: PLC MQTT for Power Monitoring 14:58 Lachlan's Experience with ChatGPT and New Facebook Tools 17:24 Phil on Llama 3.5 Models and Their Vast Resources 21:46 Lachlan's Industrial Anecdote: PLC, TCP Driver, InfluxDB, and Ignition 27:15 Pitfalls of Agile Methodology in IoT Digital Transformation 32:03 Importance of Community Collaboration in Open Source 35:16 Evolving PLCs: The Role of Software and Programming Languages 38:45 Integration of Docker Containers in Development 41:18 User Experience and Visualization in Industrial Applications 43:09 Shift Towards Web Native Technology 44:38 From Traditional SCADA to the WebDev Mindset 48:49 Interest in Time Series Databases like InfluxDB and Timescale 50:20 Enthusiasm for Continuous Learning and Technology Exploration 52:35 Unique Solutions for Different Industries and Sites 55:10 Raspberry Pi and Beckhoff CX 7000 Series in IIoT Deployments 58:46 Cost-Effectiveness of New Systems like Opto 22 groov and PLC Nexts 01:02:19 Adoption of Raspberry Pi for Initial Automation Testing Connect with Lachlan on LinkedIn: https://www.linkedin.com/in/plcexpert/ Connect with Phil on LinkedIn: https://www.linkedin.com/in/phil-seboa/ Connect with Ed on LinkedIn: https://www.linkedin.com/in/ed-fuentes-2046121a/ ------ About Industry Sage Media: Industry Sage Media is your backstage pass to industry experts and the conversations that are shaping the future of the manufacturing industry. Learn more at: http://www.industrysagemedia.com
Allen Wyma talks with Andrew Lamb about InfluxDB's rewrite. InfluxDB is an open-source time series database. As a Staff Engineer at InfluxData, Andrew works on InfluxDB 3.0, a new time series database written in Rust, focusing on query processing and the Apache Arrow DataFusion and Apache Arrow ecosystems. In that capacity, he is a member and past chair of the Apache Arrow PMC and actively contributes to the Apache Arrow Rust implementation and the Apache Arrow DataFusion query engine. Andrew was a professional C/C++ programmer for 10 years before switching to Rust. His experience ranges from startups to large multinational corporations and distributed open source projects, and he has paid his leadership dues as an architect and manager/VP. He holds an SB and MEng from MIT in Electrical Engineering and Computer Science. Contributing to Rustacean Station Rustacean Station is a community project; get in touch with us if you'd like to suggest an idea for an episode or offer your services as a host or audio editor! Twitter: @rustaceanfm Discord: Rustacean Station Github: @rustacean-station Email: hello@rustacean-station.org Timestamps [@0:52] - Meet Andrew Lamb, Staff Engineer at InfluxData, working on InfluxDB IOx [@2:57] - Transitioning from C++ to Rust: Andrew's story [@11:24] - InfluxDB rewrite and its use cases [@22:13] - Compatibility of InfluxDB [@26:58] - Downsides of using Rust and other languages [@32:40] - Plans for the 3.0 alpha/beta release and different versions [@34:54] - Unique use of the async runtime Tokio [@55:28] - Rust as a tool for recruitment [@58:16] - Closing discussion Other links Andrew's X Account Using Rustlang's Async Tokio Runtime for CPU-Bound Tasks Using the FDAP Architecture to build InfluxDB 3.0 RustASIA Conf 2025 Credits Intro Theme: Aerocity Audio Editing: Plangora Hosting Infrastructure: Jon Gjengset Show Notes: Plangora Hosts: Allen Wyma
In this video I speak with Andrew Lamb, Staff Software Engineer @Influxdb. We discuss the FDAP (Flight, DataFusion, Arrow, Parquet) stack for modern OLAP database system design. Andrew shared some insights into why the FDAP stack is so powerful in designing and implementing a modern OLAP database. Chapters: 00:00 Introduction 01:48 Understanding Analytics: Transactional vs Analytical Databases 04:41 The Genesis and Goals of the FDAP Stack 09:31 Decoding FDAP: Flight, DataFusion, Arrow, and Parquet 12:40 Apache Parquet: Revolutionizing Columnar Storage 17:18 Apache Arrow: The In-Memory Game Changer 23:51 Interoperability and Migration with Apache Arrow 27:10 Comparing Apache Parquet and Arrow 28:26 Exploring Data Mutability in Analytic Systems 29:19 Handling Data Updates and Deletions 29:24 The Role of Immutable Storage in Analytics 30:42 Optimizing Data Storage and Mutation Strategies 34:20 Introducing Flight: Simplifying Data Transfer 35:02 Deep Dive into Flight's Benefits and SQL Support 39:20 Unpacking DataFusion's SQL Support and Extensibility 46:12 The Interplay of FDAP Components in Analytics 51:49 Future Directions and Innovations in Data Analytics 56:04 Concluding Thoughts on FDAP and Its Impact FDAP Stack: https://www.influxdata.com/glossary/fdap-stack/ FDAP Blog: https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/ InfluxDB: https://www.influxdata.com/ Follow me on Linkedin and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet. Database internals series: https://youtu.be/yV_Zp0Mi3xs Popular playlists: Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA- Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17 Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN Stay Curious! Keep Learning! #datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign
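As a small companion to the FDAP discussion above, here is a sketch of the Parquet and Arrow half of the stack: reading a Parquet file into Arrow RecordBatches with the Rust parquet crate. The file name metrics.parquet is a placeholder, and this shows only the public crate API, not InfluxDB's internal code.

```rust
// Cargo.toml (assumed): parquet = "<version>" (its "arrow" feature is enabled by default)
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open a Parquet file (columnar, compressed on disk)...
    let file = File::open("metrics.parquet")?;

    // ...and stream it out as Arrow RecordBatches (columnar, in memory).
    let reader = ParquetRecordBatchReaderBuilder::try_new(file)?
        .with_batch_size(8192)
        .build()?;

    for batch in reader {
        let batch = batch?;
        // Each batch shares the file's schema; its columns are Arrow arrays
        // that engines like DataFusion can operate on directly.
        println!("{} rows, {} columns", batch.num_rows(), batch.num_columns());
    }
    Ok(())
}
```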
In the push to integrate data into development, time series databases have gained significant importance. These databases capture time-stamped data from servers and sensors, enabling the collection and storage of valuable information. InfluxDB, a leading open-source time series database technology by InfluxData, has partnered with Amazon Web Services (AWS) to offer a managed open-source service for time series databases. Brad Bebee, General Manager of Amazon Neptune and Amazon Timestream, highlighted the challenges faced by customers managing open-source InfluxDB instances, despite appreciating its API and performance. To address this, AWS initiated a private beta offering a managed service tailored to customer needs. Paul Dix, Co-founder and CTO of InfluxData, joined Bebee and highlighted Influx's prized utility in tracking measurements, metrics, and sensor data in real-time. AWS's Timestream complements this by providing managed time series database services, including Timestream for LiveAnalytics and Timestream for InfluxDB. Bebee emphasized the growing relevance of time series data and customers' preference for managed open-source databases, aligning with AWS's strategy of offering such services. This partnership aims to simplify database management and enhance performance for customers utilizing time series databases. Learn more from The New Stack about time series databases:
What Are Time Series Databases, and Why Do You Need Them?
Amazon Timestream: Managed InfluxDB for Time Series Data
Install the InfluxDB Time-Series Database on Ubuntu Server 22.04
Join our community of newsletter subscribers to stay on top of the news and at the top of your game.
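For readers who have not used InfluxDB before, a rough sketch of the write path may help: points are sent as line protocol over HTTP. Everything below (the host, org, bucket, token, and sample points) is a placeholder, and the endpoint shown is the InfluxDB 2.x-style /api/v2/write API; a managed offering such as Amazon Timestream for InfluxDB exposes a compatible API, but check the service documentation for the exact URL and authentication scheme.

```rust
// Cargo.toml (assumed): reqwest = { version = "<version>", features = ["blocking"] }
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Line protocol: measurement,tag_set field_set timestamp(ns)
    // Measurement, tags, fields, and timestamps here are made-up sample data.
    let body = "\
cpu,host=server01,region=us-west usage_user=42.5 1735689600000000000\n\
cpu,host=server02,region=us-west usage_user=17.2 1735689600000000000\n";

    // Placeholder endpoint and credentials; substitute your instance's values.
    let url = "https://influx.example.com/api/v2/write?org=my-org&bucket=metrics&precision=ns";
    let token = "MY_TOKEN";

    let resp = reqwest::blocking::Client::new()
        .post(url)
        .header("Authorization", format!("Token {token}"))
        .body(body)
        .send()?;

    println!("write status: {}", resp.status());
    Ok(())
}
```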
Highlights from this week's conversation include:
The Evolution of Data Systems (0:47)
The Role of Open Source Software (2:39)
Challenges of Time Series Data (6:38)
Architecting InfluxDB (9:34)
High Cardinality Concepts (11:36)
Trade-Offs in Time Series Databases (15:35)
High Cardinality Data (18:24)
Evolution to InfluxDB 3.0 (21:06)
Modern Data Stack (23:04)
Evolution of Database Systems (29:48)
InfluxDB Re-Architecture (33:14)
Building an Analytic System with Data Fusion (37:33)
Challenges of Mapping Time Series Data into Relational Model (44:55)
Adoption and Future of Data Fusion (46:51)
Externalized Joins and Technical Challenges (51:11)
Exciting Opportunities in Data Tooling (55:20)
Emergence of New Architectures (56:35)
Final thoughts and takeaways (57:47)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Welcome to episode 252 of The Cloud Pod podcast, where the forecast is always cloudy! This week Justin, Jonathan, Ryan, and Matthew are talking about InfluxDB, collabs between AWS and NVIDIA, some personnel changes over at Microsoft, Amazon Timestream, and so much more! Sit back and enjoy – and make sure to hang around for the aftershow, where Linux and DBOS are on the docket. You won't want to miss it.

Titles we almost went with this week:
Light a fire under your Big Queries with Spark procedures
All your NVIDIA GPU belong to AWS
Thanks, EU for Free Data Transfer for all*
Microsoft, Inflection, Mufasa, Scar… this is not the Lion King Sequel I expected
The Cloud Pod sees Inflections in the Timestream
The Cloud Pod is a palindrome
The Cloud Pod loves SQL so much we made an OS out of it
Let's run SQL on Kubernetes on top of DBOS. What could go wrong?
The Cloud Pod is 5 7 5 long

A big thanks to this week's sponsor: We're sponsorless this week! Interested in sponsoring us and having access to a specialized and targeted market? We'd love to talk to you. Send us an email or hit us up on our Slack Channel. Please. We're not above begging. Ok. Maybe Ryan is. But the rest of us? Absolutely not.

AI Is Going Great (Or, How ML Makes All Its Money)
1:00 PSYCH! We're giving this segment a break this week. YOU'RE WELCOME.

AWS
01:08 Anthropic's Claude 3 Haiku model is now available on Amazon Bedrock
Last week Claude 3 Sonnet was available on Bedrock, this week Claude 3 Haiku is available on Bedrock. The Haiku model is the fastest and most compact model of the Claude 3 family, designed for near-instant responsiveness and seamless generative AI experiences that mimic human interaction. We assume, thanks to how much Amazon is stretching this out, that next week we'll get Opus. Want to check it out for yourself? Head over to the Bedrock console.
02:02 Jonathan – “I haven’t tried Haiku, but I’ve played with Sonnet a lot for pre over the past week. It’s very good. It’s much better conversationally. I mean, I’m not talking about technical things. It’s like I ask all kinds of random philosophical questions or whatever, just to kind of explore what it can do, what it knows… If I was going to spend money on OpenAI or Anthropic, it would be on Anthropic right now.”
04:03 AWS Pi Day 2024: Use your data to power generative AI
3.14 just passed us by last week, and Amazon was back with a live stream on Twitch where they explored AWS storage from data lakes to High Performance Storage, and how to transform your data strategy to become the starting point for Generative AI. As always they announced several new storage features in honor of
Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) and use code dataengpod20 to register today! Your host is Tobias Macey and today I'm interviewing Paul Dix about his investment in the Apache Arrow ecosystem and how it led him to create the latest PFAD in database design Interview Introduction How did you get involved in the area of data management? Can you start by describing the FDAP stack and how the components combine to provide a foundational architecture for database engines? 
This was the core of your recent re-write of the InfluxDB engine. What were the design goals and constraints that led you to this architecture? Each of the architectural components are well engineered for their particular scope. What is the engineering work that is involved in building a cohesive platform from those components? One of the major benefits of using open source components is the network effect of ecosystem integrations. That can also be a risk when the community vision for the project doesn't align with your own goals. How have you worked to mitigate that risk in your specific platform? Can you describe the operational/architectural aspects of building a full data engine on top of the FDAP stack? What are the elements of the overall product/user experience that you had to build to create a cohesive platform? What are some of the other tools/technologies that can benefit from some or all of the pieces of the FDAP stack? What are the pieces of the Arrow ecosystem that are still immature or need further investment from the community? What are the most interesting, innovative, or unexpected ways that you have seen parts or all of the FDAP stack used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on/with the FDAP stack? When is the FDAP stack the wrong choice? What do you have planned for the future of the InfluxDB IOx engine and the FDAP stack? Contact Info LinkedIn (https://www.linkedin.com/in/pauldix/) pauldix (https://github.com/pauldix) on GitHub Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. 
Links FDAP Stack Blog Post (https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/) Apache Arrow (https://arrow.apache.org/) DataFusion (https://arrow.apache.org/datafusion/) Arrow Flight (https://arrow.apache.org/docs/format/Flight.html) Apache Parquet (https://parquet.apache.org/) InfluxDB (https://www.influxdata.com/products/influxdb/) InfluxData (https://www.influxdata.com/) Podcast Episode (https://www.dataengineeringpodcast.com/influxdb-timeseries-data-platform-episode-199) Rust Language (https://www.rust-lang.org/) DuckDB (https://duckdb.org/) ClickHouse (https://clickhouse.com/) Voltron Data (https://voltrondata.com/) Podcast Episode (https://www.dataengineeringpodcast.com/voltron-data-apache-arrow-episode-346/) Velox (https://github.com/facebookincubator/velox) Iceberg (https://iceberg.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/) Trino (https://trino.io/) ODBC == Open DataBase Connectivity (https://en.wikipedia.org/wiki/Open_Database_Connectivity) GeoParquet (https://github.com/opengeospatial/geoparquet) ORC == Optimized Row Columnar (https://orc.apache.org/) Avro (https://avro.apache.org/) Protocol Buffers (https://protobuf.dev/) gRPC (https://grpc.io/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
For our very first episode, we welcome a special guest, Paul Dix, the CTO of InfluxData.

He starts by giving us an overview of InfluxDB, an open source time series database used by developers to track server and application data. He takes us back to the early days of InfluxDB and explains how it came into existence, starting with the challenges they faced with their initial SaaS application and how they made the decision to repurpose their infrastructure and create this open source database. Paul also sheds light on the popularity of the programming language Go, which had a significant influence on their decision to use it for their project.

He takes us through the journey of InfluxDB's development and the improvements that have been made over the years. He emphasizes the enhancements made in versions 0.11 and 1.0 to improve performance and query capabilities. Moreover, he shares their decision to explore using Rust for certain parts of the project and the positive impact it has had. Moving forward, the conversation delves into the challenges of managing high volumes of data in time series databases.

Paul talks about the solutions they implemented, such as using BoltDB and developing the time-structured merge tree storage engine. We then dive into the decision to rewrite InfluxDB in Rust and the benefits it offers. He explains the improved performance, concurrency, and error handling that Rust brings to the table. Paul goes on to discuss the development process and how the engineering team has embraced Rust across their projects.

As the conversation progresses, we touch on the performance improvements in InfluxDB 3 and the future plans for the database. Paul shares their vision of incorporating additional features and integrating with other tools and languages. He also mentions InfluxDB's involvement in open-source projects like Apache Arrow Rust and DataFusion, highlighting their ambition to extend beyond metric data. Paul concludes the conversation by discussing the standards and libraries in analytics, the role of Apache Iceberg, and the collaboration among data and analytics companies. He provides advice for getting started with Rust and InfluxDB, urging listeners to engage in hands-on projects and learn from books and online documentation.

Thank you, Paul, for sharing your insights and expertise.
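The episode's mention of BoltDB and the time-structured merge tree hints at how time series engines cope with heavy write volumes. As a rough, illustrative toy (not InfluxData's actual implementation), the sketch below shows the general LSM/TSM-style idea: buffer incoming points in memory ordered by series and time, then flush them as immutable sorted segments that a compactor can later merge.

```rust
use std::collections::BTreeMap;

/// A point is keyed by (series key, timestamp in ns). Both are simplified here.
type SeriesKey = String;
type Timestamp = i64;

#[derive(Default)]
struct ToyTsmBuffer {
    /// In-memory write buffer, kept sorted by (series, time) via BTreeMap.
    mem: BTreeMap<(SeriesKey, Timestamp), f64>,
    /// Flushed, immutable segments (stand-ins for sorted on-disk files).
    segments: Vec<Vec<((SeriesKey, Timestamp), f64)>>,
    max_buffer: usize,
}

impl ToyTsmBuffer {
    fn new(max_buffer: usize) -> Self {
        Self { max_buffer, ..Default::default() }
    }

    /// Writes land in the in-memory buffer first (a real engine would also
    /// append to a write-ahead log for durability).
    fn write(&mut self, series: &str, ts: Timestamp, value: f64) {
        self.mem.insert((series.to_string(), ts), value);
        if self.mem.len() >= self.max_buffer {
            self.flush();
        }
    }

    /// Flush turns the sorted buffer into an immutable segment.
    fn flush(&mut self) {
        if self.mem.is_empty() {
            return;
        }
        let segment: Vec<_> = std::mem::take(&mut self.mem).into_iter().collect();
        self.segments.push(segment);
        // A background compactor would periodically merge segments here.
    }
}

fn main() {
    let mut engine = ToyTsmBuffer::new(4);
    engine.write("cpu,host=server01", 1_000, 0.42);
    engine.write("cpu,host=server01", 2_000, 0.55);
    engine.write("mem,host=server01", 1_000, 73.0);
    engine.write("cpu,host=server02", 1_000, 0.12);
    engine.flush();
    println!("segments on disk (simulated): {}", engine.segments.len());
}
```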
Summary If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project. In this episode he shares what it is, how it works, and how you can start using it today to get notified when the critical metrics in your business aren't quite right. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). That's three free boards at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. 
And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Andrew Maguire about his work on the Anomstack project and how you can use it to run your own anomaly detection for your metrics Interview Introduction How did you get involved in the area of data management? Can you describe what Anomstack is and the story behind it? What are your goals for this project? What other tools/products might teams be evaluating while they consider Anomstack? In the context of Anomstack, what constitutes a "metric"? What are some examples of useful metrics that a data team might want to monitor? You put in a lot of work to make Anomstack as easy as possible to get started with. How did this focus on ease of adoption influence the way that you approached the overall design of the project? What are the core capabilities and constraints that you selected to provide the focus and architecture of the project? Can you describe how Anomstack is implemented? How have the design and goals of the project changed since you first started working on it? What are the steps to getting Anomstack running and integrated as part of the operational fabric of a data platform? What are the sharp edges that are still present in the system? What are the interfaces that are available for teams to customize or enhance the capabilities of Anomstack? What are the most interesting, innovative, or unexpected ways that you have seen Anomstack used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Anomstack? When is Anomstack the wrong choice? What do you have planned for the future of Anomstack? Contact Info LinkedIn (https://www.linkedin.com/in/andrewm4894/) Twitter (https://twitter.com/@andrewm4894) GitHub (http://github.com/andrewm4894) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. 
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Anomstack Github repo (http://github.com/andrewm4894/anomstack) Airflow Anomaly Detection Provider Github repo (https://github.com/andrewm4894/airflow-provider-anomaly-detection) Netdata (https://www.netdata.cloud/) Metric Tree (https://www.datacouncil.ai/talks/designing-and-building-metric-trees) Semantic Layer (https://en.wikipedia.org/wiki/Semantic_layer) Prometheus (https://prometheus.io/) Anodot (https://www.anodot.com/) Chaos Genius (https://www.chaosgenius.io/) Metaplane (https://www.metaplane.dev/) Anomalo (https://www.anomalo.com/) PyOD (https://pyod.readthedocs.io/) Airflow (https://airflow.apache.org/) DuckDB (https://duckdb.org/) Anomstack Gallery (https://github.com/andrewm4894/anomstack/tree/main/gallery) Dagster (https://dagster.io/) InfluxDB (https://www.influxdata.com/) TimeGPT (https://docs.nixtla.io/docs/timegpt_quickstart) Prophet (https://facebook.github.io/prophet/) GreyKite (https://linkedin.github.io/greykite/) OpenLineage (https://openlineage.io/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
This is Rust in Production, a podcast about companies who use Rust to shape the future of infrastructure. We follow their journey in pursuit of more reliable and efficient software as they solve some of the most challenging technical problems in the world.

I'm your host, Matthias Endler, and I'm a software engineer at corrode, a consultancy that helps companies make the most of Rust. I've been using Rust since 2015, have been a member of the Rust Cologne meetup since Rust 1.0, and ran a YouTube channel called "Hello Rust".

There are plenty of great podcasts about Rust, but I felt that there was a missing piece. I wanted to hear more about how companies use Rust in production. What are the challenges they face? How do they overcome them? What are the benefits of using Rust? How does the company find and hire Rust developers? And what advice would they give to other companies that want to use Rust?

I sit down with decision-makers from companies that bet big on Rust and ask them in-depth questions about what they learned along the way. New episodes air every two weeks on Thursdays at 4pm UTC. If you don't want to miss out, please subscribe on Spotify, Apple Podcasts, or wherever you listen to podcasts. This helps other people find the show and supports our work.

If you want to learn more about the show, please visit corrode.dev/podcast. Stay tuned for the first episode, where I talk to Paul Dix from InfluxData about how they use Rust in the latest version of InfluxDB.
Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold) You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! As more people start using AI for projects, two things are clear: It's a rapidly advancing field, but it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES (https://Neo4j.com/NODES). Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable Interview Introduction How did you get involved in the area of data management? 
Can you describe what Decodable is and the story behind it? What are the notable changes to the Decodable platform since we last spoke? (October 2021) What are the industry shifts that have influenced the product direction? What are the problems that customers are trying to solve when they come to Decodable? When you launched your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL? What are the developer experience challenges that are particular to working with streaming data? How have you worked to address that in the Decodable platform and interfaces? As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced? What are the most interesting, innovative, or unexpected ways that you have seen Decodable used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable? When is Decodable the wrong choice? What do you have planned for the future of Decodable? Contact Info esammer (https://github.com/esammer) on GitHub LinkedIn (https://www.linkedin.com/in/esammer/) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. 
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Decodable (https://www.decodable.co/) Podcast Episode (https://www.dataengineeringpodcast.com/decodable-streaming-data-pipelines-sql-episode-233/) Flink (https://flink.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/) Debezium (https://debezium.io/) Podcast Episode (https://www.dataengineeringpodcast.com/debezium-change-data-capture-episode-114/) Kafka (https://kafka.apache.org/) Redpanda (https://redpanda.com/) Podcast Episode (https://www.dataengineeringpodcast.com/vectorized-red-panda-streaming-data-episode-152/) Kinesis (https://aws.amazon.com/kinesis/) PostgreSQL (https://www.postgresql.org/) Podcast Episode (https://www.dataengineeringpodcast.com/postgresql-with-jonathan-katz-episode-42/) Snowflake (https://www.snowflake.com/en/) Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/) Databricks (https://www.databricks.com/) Startree (https://startree.ai/) Pinot (https://pinot.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/pinot-embedded-analytics-episode-273/) Rockset (https://rockset.com/) Podcast Episode (https://www.dataengineeringpodcast.com/rockset-serverless-analytics-episode-101/) Druid (https://druid.apache.org/) InfluxDB (https://www.influxdata.com/) Samza (https://samza.apache.org/) Storm (https://storm.apache.org/) Pulsar (https://pulsar.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/pulsar-fast-and-scalable-messaging-with-rajan-dhabalia-and-matteo-merli-episode-17) ksqlDB (https://ksqldb.io/) Podcast Episode (https://www.dataengineeringpodcast.com/ksqldb-kafka-stream-processing-episode-122/) dbt (https://www.getdbt.com/) GitHub Actions (https://github.com/features/actions) Airbyte (https://airbyte.com/) Singer (https://www.singer.io/) Splunk (https://www.splunk.com/) Outbox Pattern (https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
InfluxDB finishes a multi-year rewrite in Rust, the Raspberry Pi 5 will be on sale by the end of the month, the Bruno team builds an open source API explorer that's local-first and will never have a cloud, Xe Iaso thinks gokrazy is really cool & Matt Rickard shares lessons from years of debugging.
This is a recap of the top 10 posts on Hacker News on October 1st, 2023. This podcast was generated by wondercraft.ai
(00:33): Tire dust makes up the majority of ocean microplastics
Original post: https://news.ycombinator.com/item?id=37726539&utm_source=wondercraft_ai
(02:08): InfluxDB made the switch from Go to Rust
Original post: https://news.ycombinator.com/item?id=37725778&utm_source=wondercraft_ai
(04:03): Fine, I'll run a regression analysis but it won't make you happy
Original post: https://news.ycombinator.com/item?id=37728642&utm_source=wondercraft_ai
(05:57): Mozilla's midlife crisis has taken it from pioneer to Google's weird neighbor
Original post: https://news.ycombinator.com/item?id=37724538&utm_source=wondercraft_ai
(07:41): DALL-E 3 is now publicly available inside Bing
Original post: https://news.ycombinator.com/item?id=37725498&utm_source=wondercraft_ai
(09:09): DKIM: Rotate and publish your keys
Original post: https://news.ycombinator.com/item?id=37723688&utm_source=wondercraft_ai
(10:39): Pulsars, not dark matter, explain the Milky Way's antimatter
Original post: https://news.ycombinator.com/item?id=37725530&utm_source=wondercraft_ai
(12:19): 12,000-year-old realistic human statue was unearthed
Original post: https://news.ycombinator.com/item?id=37729163&utm_source=wondercraft_ai
(14:13): Never say no, but rarely say yes (2011)
Original post: https://news.ycombinator.com/item?id=37724737&utm_source=wondercraft_ai
(15:59): WFH significantly increased workforce participation from those with disabilities
Original post: https://news.ycombinator.com/item?id=37727129&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai
### Tools
* Ruff -> https://beta.ruff.rs/docs/

### Cloud
* Mountpoint for Amazon S3 -> https://aws.amazon.com/blogs/aws/mountpoint-for-amazon-s3-generally-available-and-ready-for-production-workloads/

### Time Series
* Is Flux being deprecated with InfluxDB 3.0? -> https://community.influxdata.com/t/is-flux-being-deprecated-with-influxdb-3-0/30992/8?u=pauldix
* Time series on the AWS podcast in French -> https://aws.amazon.com/fr/blogs/france/podcasts/

### Database
* Awesome DuckDB -> https://github.com/davidgasquez/awesome-duckdb

### GenAI
* RAG vs Finetuning — Which Is the Best Tool to Boost Your LLM Application? -> https://towardsdatascience.com/rag-vs-finetuning-which-is-the-best-tool-to-boost-your-llm-application-94654b1eaba7
* Best practices for your ChatGPT 'on your data' solution -> https://medium.com/@imicknl/how-to-improve-your-chatgpt-on-your-data-solution-d1e842d87404
* OpenAI, maker of ChatGPT, reportedly nears $1 billion in annual sales -> https://www.fastcompany.com/90946849/openai-chatgpt-reportedly-nears-1-billion-annual-sales?partner=rss&utm_source=feedly&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss?utm_source=tldrnewsletter

### Vector DB
* Jina-AI -> https://github.com/jina-ai/vectordb
* Redis 7.2 LLM / VectorDB features -> https://redis.com/blog/introducing-redis-7-2/
* AlloyDB -> https://techcrunch.com/2023/08/29/googles-alloydb-ai-transforms-databases-to-power-generative-ai-apps/?utm_source=substack&utm_medium=email&guccounter=2
* Pinecone -> https://www.pinecone.io/blog/azure/?hss_channel=lcp-20299330&utm_content=256569107&utm_medium=social&utm_source=linkedin
* pgvector -> https://jkatz05.com/post/postgres/pgvector-overview-0.5.0/
* Vector Search Isn't Enough | BRKFP301H -> https://www.youtube.com/watch?v=5Qaxz2e2dVg

### AI
* AWS Entity Resolution: Match and Link Related Records from Multiple Applications and Data Stores | AWS News Blog -> https://aws.amazon.com/blogs/aws/aws-entity-resolution-match-and-link-related-records-from-multiple-applications-and-data-stores/

### Agenda
* Timeseries France 13/09/2023 -> https://timeseries.fr/edition/timeseriesfr-18/
* Bigdatapero in Paris 27/09/2023 ->

This publication is sponsored by Affini-Tech and CerenIT.
CerenIT supports you in designing, industrializing, or automating your platforms, and also in making your time series data speak. Write to us at contact@cerenit.fr, and you can also find us at Time Series France.
Affini-Tech supports you in all your Cloud and Data projects, to Imagine, Experiment, and Execute your services! (Affini-Tech, Datatask) Check out the Affini-Tech blog and the Datatask blog to learn more. We're hiring! Come crunch data with us! Write to us at recrutement@affini-tech.com
The theme music was composed and produced by Maxence Lecointe.
Richard Seroter, Director of Outbound Product Management at Google, joins Corey on Screaming in the Cloud to discuss what's new at Google. Corey and Richard discuss how AI can move from a novelty to truly providing value, as well as the importance of people maintaining their skills and abilities rather than using AI as a black box solution. Richard also discusses how he views the DevRel function, and why he feels it's so critical to communicate expectations for product launches with customers.

About Richard
Richard Seroter is Director of Outbound Product Management at Google Cloud. He's also an instructor at Pluralsight, a frequent public speaker, and the author of multiple books on software design and development. Richard maintains a regularly updated blog (seroter.com) on topics of architecture and solution design and can be found on Twitter as @rseroter.

Links Referenced:
Google Cloud: https://cloud.google.com
Personal website: https://seroter.com
Twitter: https://twitter.com/rseroter
LinkedIn: https://www.linkedin.com/in/seroter/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Human-scale teams use Tailscale to build trusted networks. Tailscale Funnel is a great way to share a local service with your team for collaboration, testing, and experimentation. Funnel securely exposes your dev environment at a stable URL, complete with auto-provisioned TLS certificates. Use it from the command line or the new VS Code extensions. In a few keystrokes, you can securely expose a local port to the internet, right from the IDE. I did this in a talk I gave at Tailscale Up, their first inaugural developer conference. I used it to present my slides and only revealed that that's what I was doing at the end of it. It's awesome, it works! Check it out! Their free plan now includes 3 users & 100 devices. Try it at snark.cloud/tailscalescream

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. We have returning guest Richard Seroter here who has apparently been collecting words to add to his job title over the years that we've been talking to him. Richard, you are now the Director of Product Management and Developer Relations at Google Cloud. Do I have all those words in the correct order and I haven't forgotten any along the way?

Richard: I think that's all right. I think my first job was at Andersen Consulting as an analyst, so my goal is to really just add more words to whatever these titles—

Corey: It's an adjective collection, really. That's what a career turns into. It's really the length of a career and success is measured not by accomplishments but by word count on your resume.

Richard: If your business card requires a comma, success.

Corey: So, it's been about a year or so since we last chatted here. What have you been up to?

Richard: Yeah, plenty of things here, still, at Google Cloud as we took on developer relations. And, but you know, Google Cloud proper, I think AI has—I don't know if you've noticed, AI has kind of taken off with some folks who's spending a lot the last year… juicing up services and getting things ready there. And you know, myself and the team kind of remaking DevRel for a 2023 sort of worldview.
So, yeah we spent the last year just scaling and growing and in covering some new areas like AI, which has been fun.

Corey: You became profitable, which is awesome. I imagined at some point, someone wound up, like, basically realizing that you need to, like, patch the hole in the pipe and suddenly the water bill is no longer $8 billion a quarter. And hey, that works super well. Like, wow, that explains our utility bill and a few other things as well. I imagine the actual cause is slightly more complex than that, but I am a simple creature.

Richard: Yeah. I think we made more than YouTube last quarter, which was a good milestone when you think of—I don't think anybody who says Google Cloud is a fun side project of Google is talking seriously anymore.

Corey: I misunderstood you at first. I thought you said that you're pretty sure you made more than I did last year. It's like, well, yes, if a multi-billion dollar company's hyperscale cloud doesn't make more than I personally do, then I have many questions. And if I make more than that, I have a bunch of different questions, all of which could be terrifying to someone.

Richard: You're killing it. Yeah.

Corey: I'm working on it. So, over the last year, another trend that's emerged has been a pivot away—thankfully—from all of the Web3 nonsense and instead embracing the sprinkle some AI on it. And I'm not—people are about to listen to this and think, wait a minute, is he subtweeting my company? No, I'm subtweeting everyone's company because it seems to be a universal phenomenon. What's your take on it?

Richard: I mean, it's countercultural now to not start every conversation with let me tell you about our AI story. And hopefully, we're going to get past this cycle. I think the AI stuff is here to stay. This does not feel like a hype trend to me overall. Like, this is legit tech with real user interest. I think that's awesome. I don't think a year from now, we're going to be competing over who has the biggest model anymore. Nobody cares. I don't know if we're going to hopefully lead with AI the same way as much as, what is it doing for me? What is my experience? Is it better? Can I do this job better? Did you eliminate this complex piece of toil from my day two stuff? That's what we should be talking about. But right now it's new and it's interesting. So, we all have to rub some AI on it.

Corey: I think that there is also a bit of a passing of the buck going on when it comes to AI where I've talked to companies that are super excited about how they have this new AI story that's going to be great. And, “Well, what does it do?” “It lets you query our interface to get an answer.” Okay, is this just cover for being bad UX?

Richard: [laugh]. That can be true in some cases. In other cases, this will fix UXes that will always be hard. Like, do we need to keep changing… I don't know, I'm sure if you and I go to our favorite cloud providers and go through their documentation, it's hard to have docs for 200 services and millions of pages. Maybe AI will fix some of that and make it easier to discover stuff. So in some cases, UIs are just hard at scale. But yes, I think in some cases, this papers over other things not happening by just rubbing some AI on it. Hopefully, for most everybody else, it's actually interesting, new value. But yeah, that's a… every week it's a new press release from somebody saying they're about to launch some AI stuff. I don't know how any normal human is keeping up with it.

Corey: I certainly don't know.
I'm curious to see what happens but it's kind of wild, too, because there you're right. There is something real there where you ask it to draw you a picture of a pony or something and it does, or give me a bunch of random analysis of this. I asked one recently to go ahead and rank the US presidents by absorbency and with a straight face, it did it, which is kind of amazing. I feel like there's a lack of imagination in the way that people talk about these things and a certain lack of awareness that you can make this a lot of fun, and in some ways, make that a better showcase of the business value than trying to do the straight-laced thing of having it explain Microsoft Excel to you.Richard: I think that's fair. I don't know how much sometimes whimsy and enterprise mix. Sometimes that can be a tricky part of the value prop. But I'm with you that some of this hopefully returns some more creativity to things. I mean, I personally use things like Bard or what have you that, "Hey, I'm trying to think of this idea. Can you give me some suggestions?" Or—I just did a couple weeks ago—"I need sample data for my app."I could spend the next ten minutes coming up with Seinfeld and Bob's Burgers characters, or just give me the list in two seconds in JSON. Like that's great. So, I'm hoping we get to use this for more fun stuff. I'll be fascinated to see if when I write the keynote for—I'm working on the keynote for Next, if I can really inject something completely off the wall. I guess you're challenging me and I respect that.Corey: Oh, I absolutely am. And one of the things that I believe firmly is that we lose sight of the fact that people are inherently multifaceted. Just because you are a C-level executive at an enterprise does not mean that you're not also a human being with a sense of creativity and a bit of whimsy as well. Everyone is going to compete to wind up boring you to death with PowerPoint. Find something that sparks the imagination and sparks joy.Because yes, you're going to find the boring business case on your own without too much in the way of prodding for that, but isn't it great to imagine what if? What if we could have fun with some of these things? At least to me, that's always been the goal is to get people's attention. Humor has been my path, but there are others.Richard: I'm with you. I think there's a lot to that. And the question will be… yeah, I mean, again, to me, you and I talked about this before we started recording, this is the first trend for me in a while that feels purely organic where our customers, now—and I'll tell our internal folks—our customers have much better ideas than we do. And it's because they're doing all kinds of wild things. They're trying new scenarios, they're building apps purely based on prompts, and they're trying to, you know, do this.And it's better than what we just come up with, which is awesome. That's how it should be, versus just some vendor-led hype initiative where it is just boring corporate stuff. So, I like the fact that this isn't just us talking; it's the whole industry talking. It's people talking to my non-technical family members, giving me ideas for what they're using this stuff for. I think that's awesome. So yeah, but I'm with you, I think companies can also look for more creative angles than just what's another way to left-align something in a cell.Corey: I mean, some of the expressions on this are wild to me. The Photoshop beta with its generative AI play has just been phenomenal. 
Because it's weird stuff, like, things that, yeah, I'm never going to be a great artist, let's be clear, but being able to say remove this person from the background, and it does it, as best I can tell, seamlessly is stuff where yeah, that would have taken me ages to find someone who knows what the hell they're doing on the internet somewhere and then pay them to do it. Or basically stumble my way through it for two hours and it somehow looks worse afterwards than before I started. It's the baseline stuff of, I'm never going to be able to have it—to my understanding—go ahead just build me a whole banner ad that does this and hit these tones and the rest, but it is going to help me refine something in that direction, until I can then, you know, hand it to a professional who can take it from my chicken scratching into something real.Richard: If it will. I think that's my only concern personally with some of this is I don't want this to erase expertise or us to think we can just get lazy. I think that I get nervous, like, can I just tell it to do stuff and I don't even check the output, or I don't do whatever. So, I think that's when you go back to, again, enterprise use cases. If this is generating code or instructions or documentation or what have you, I need to trust that output in some way.Or more importantly, I still need to retain the skills necessary to check it. So, I'm hoping people like you and me and all our —every—all the users out there of this stuff, don't just offload responsibility to the machine. Like, just always treat it like a kind of slightly drunk friend sitting next to you with good advice and always check it out.Corey: It's critical. I think that there's a lot of concern—and I'm not saying that people are wrong on this—but that people are now going to let it take over their jobs, it's going to wind up destroying industries. No, I think it's going to continue to automate things that previously required human intervention. But this has been true since the Industrial Revolution, where opportunities arise and old jobs that used to be critical are no longer centered in quite the same way. The one aspect that does concern me is not that kids are going to use it to cheat on essays like, okay, great, whatever. That seems to be floated mostly by academics who are concerned about the appropriate structure of academia.For me, the problem is, is there's a reason that we have people go through 12 years of English class in the United States and that is, it's not to dissect the work of long-dead authors. It's to understand how to write and how to tell a story and how to frame ideas cohesively. And, "The computer will do that for me," I feel like that potentially might not serve people particularly well. But as a counterpoint, I was told when I was going to school my entire life that you're never going to have a calculator in your pocket all the time that you need one. No, but I can also speak now to the open air, ask it any math problem I can imagine, and get a correct answer spoken back to me. That also wasn't really in the bingo card that I had back then either, so I am hesitant to try and predict the future.Richard: Yeah, that's fair. I think it's still important for a kid to know how to make change or do certain things. I don't want to just offload to calculators or—I want to be able to understand, as you say, literature or things, not just ever print me out a book report. But that happens with us professionals, too, right? 
Like, I don't want to just atrophy all of my programming skills because all I'm doing is accepting suggestions from the machine, or that it's writing my emails for me. Like, that still weirds me out a little bit. I like to write an email or send a tweet or do a summary. To me, I enjoy those things still. I don't want to—that's not toil to me. So, I'm hoping that we just use this to make ourselves better and we don't just use it to make ourselves lazier.Corey: You mentioned a few minutes ago that you are currently working on writing your keynote for Next, so I'm going to pretend, through a vicious character attack here, that this is—you know, it's 11 o'clock at night, the day before the Next keynote and you found new and exciting ways to procrastinate, like recording a podcast episode with me. My question for you is, how is this Next going to be different than previous Nexts?Richard: Hmm. Yeah, I mean, for the first time in a while it's in person, which is wonderful. So, we'll have a bunch of folks at Moscone in San Francisco, which is tremendous. And I [unintelligible 00:11:56] it, too, I definitely have online events fatigue. So—because absolutely no one has ever just watched the screen entirely for a 15 or 30 or 60-minute keynote. We're all tabbing over to something else and multitasking. And at least when I'm in the room, I can at least pretend I'll be paying attention the whole time. The medium is different. So, first off, I'm just excited—Corey: Right. It feels a lot ruder to get up and walk out of the front row in the middle of someone's talk. Now, don't get me wrong, I'll still do it because I'm a jerk, but I'll feel bad about it as I do. I kid, I kid. But yeah, a tab away is always a thing. And we seem to have taken the same structure that works in those events and tried to force it into more or less a non-interactive Zoom call, and I feel like that is just very hard to distinguish.I will say that Google did a phenomenal job of online events, given the constraints it was operating under. Production value is great, the fact that you took advantage of being in different facilities was awesome. But yeah, it'll be good to be back in person again. I will be there with bells on in Moscone myself, mostly yelling at people, but you know, that's what I do.Richard: It's what you do. But we missed that hallway track. You missed this sort of bump into people. Do hands-on labs, purposely have nothing to do where you just walk around the show floor. Like we have been missing, I think, society-wise, a little bit of just that intentional boredom. And so, sometimes you need at conference events, too, where you're like, “I'm going to skip that next talk and just see what's going on around here.” That's awesome. You should do that more often.So, we're going to have a lot of spaces for just, like, go—like, 6000 square feet of even just going and looking at demos or doing hands-on stuff or talking with other people. Like that's just the fun, awesome part. And yeah, you're going to hear a lot about AI, but plenty about other stuff, too. Tons of announcements. But the key is that to me, community stuff, learn from each other stuff, that energy in person, you can't replicate that online.Corey: So, an area that you have expanded into has been DevRel, where you've always been involved with it, let's be clear, but it's becoming a bit more pronounced. And as an outsider, I look at Google Cloud's DevRel presence and I don't see as much of it as your staffing levels would indicate, to the naive approach. 
And let's be clear, that means from my perspective, all public-facing humorous, probably performative content in different ways, where you have zany music videos that, you know, maybe, I don't know, parody popular songs to celebrate some exec's birthday they didn't know was coming—[fake coughing]. Or creative nonsense on social media. And the lack of seeing a lot of that could in part be explained by the fact that social media is wildly fracturing into a bunch of different islands which, on balance, is probably a good thing for the internet, but I also suspect it comes down to a common misunderstanding of what DevRel actually is.It turns out that, contrary to what many people wanted to believe in the before times, it is not getting paid as much as an engineer, spending three times that amount of money on travel expenses every year to travel to exotic places, get on stage, party with your friends, and then give a 45-minute talk that spends two minutes mentioning where you work and 45 minutes talking about, I don't know, how to pick the right standing desk. That has, in many cases, been the perception of DevRel and I don't think that's particularly defensible in our current macroeconomic climate. So, what are all those DevRel people doing?Richard: [laugh]. That's such a good loaded question.Corey: It's always good to be given a question where the answers are very clear: there are right answers and wrong answers, and oh, wow. It's a fun minefield. Have fun. Go catch.Richard: Yeah. No, that's terrific. Yeah, and your first part, we do have a pretty well-distributed team globally, who does a lot of things. Our YouTube channel has, you know, we just crossed a million subscribers who are getting this stuff regularly. It's more than Amazon and Azure combined on YouTube. So, in terms of like that, audience—Corey: Counterpoint, you definitionally are YouTube. But that's neither here nor there, either. I don't believe you're juicing the stats, but it's also somehow… not as awesome if, say, I were to do it, which I'm working on it, but I have a face for radio and it shows.Richard: [laugh]. Yeah, but a lot of this has been… the quality and quantity. Like, you look at the quantity of video, it overwhelms everyone else because we spend a lot of time, we have a specific media team within my DevRel team that does the studio work, that does the production, that does all that stuff. And it's a concerted effort. That team's amazing. They do really awesome work.But, you know, a lot of DevRel as you say, [sigh] I don't know about you, I don't think I've ever truly believed in the sort of halo effect of if super smart person works at X company, even if they don't even talk about that company, that somehow presents good vibes and business benefits to that company. I don't think we've ever proven that's really true. Maybe you've seen counterpoints, where [crosstalk 00:16:34]—Corey: I can think of anecdata examples of it. Often though, on some level, for me at least, it's been, okay, someone I tremendously respect in the industry has gone to work at a company that I've never heard of. I will be paying attention to what that company does as a direct result. Conversely, when someone who is super well known, and has been working at a company for a while leaves and then either trashes the company on the way out or doesn't talk about it, it's a question of, what's going on? Did something horrible happen there? Should we no longer like that company? Are we not friends anymore? 
It's—and I don't know if that's necessarily constructive, either, but it also, on some level, feels like it can shorthand to oh, to be working DevRel, you have to be an influencer, which frankly, I find terrifying.Richard: Yeah. Yeah. I just—the modern DevRel, hopefully, is doing a little more of product-led growth style work. They're focusing specifically on how are we helping developers discover, engage, scale, become advocates themselves in the platform, increasing that flywheel through usage, but that has very discrete metrics, it has very specific ownership. Again, personally, I don't even think DevRel should do as much with sales teams because sales teams have hundreds and sometimes thousands of sales engineers and sales reps. It's amazing. They have exactly what they need.I don't think DevRel is a drop in the bucket to that team. I'd rather talk directly to developers, focus on people who are self-service signups, people who are developers in those big accounts. So, I think the modern DevRel team is doing more in that respect. But when I look at—I just look, Corey, this morning at what my team did last week—so the average DevRel team, I look at what advocacy does, teams writing code labs, they're building tutorials. Yes, they're doing some in-person events. They wrote some blog posts, published some videos, shipped a couple open-source projects that they contribute to in, like gaming sector, we ship—we have a couple projects there.They're actually usually customer zero in the product. They use the product before it ships, provide bugs and feedback to the team, we run DORA workshops—because again, we're the DevOps Research and Assessment gang—we actually run the tutorial and Docs platform for Google Cloud. We have people who write code samples and reference apps. So, sometimes you see things publicly, but you don't see the 20,000 code samples in the docs, many written by our team. So, a lot of the times, DevRel is doing work to just enable on some of these different properties, whether that's blogs or docs, whether that's guest articles or event series, but all of this should be in service of having that credible relationship to help devs use the platform easier. And I love watching this team do that.But I think there's more to it now than years ago, where maybe it was just, let's do some amazing work and try to have some second, third-order effect. I think DevRel teams can have very discrete metrics around leading indicators of long-term cloud consumption. And if you can't measure that successfully, you've probably got to rethink the team.[midroll 00:19:20]Corey: That's probably fair. I think that there's a tremendous series of… I want to call it thankless work. Like having done some of those ridiculous parody videos myself, people look at it and they chuckle and they wind up, that was clever and funny, and they move on to the next one. And they don't see the fact that, you know, behind the scenes for that three-minute video, there was a five-figure budget to pull all that together with a lot of people doing a bunch of disparate work. Done right, a lot of this stuff looks like it was easy or that there was no work at all.I mean, at some level, I'm as guilty of that as anyone. We're recording a podcast now that is going to be handed over to the folks at HumblePod. They are going to produce this into something that sounds coherent, they're going to fix audio issues, all kinds of other stuff across the board, a full transcript, and the rest. And all of that is invisible to me. 
It's like AI; it's the magic box I drop a file into and get podcast out the other side.And that does a disservice to those people who are actively working in that space to make things better. Because the good stuff that they do never gets attention, but then the company makes an interesting blunder in some way or another and suddenly, everyone's out there screaming and wondering why these people aren't responding on Twitter in 20 seconds when they're finding out about this stuff for the first time.Richard: Mm-hm. Yeah, that's fair. You know, different internal, external expectations of even DevRel. We've recently launched—I don't know if you caught it—something called Jump Start Solutions, which were executable reference architectures. You can come into the Google Cloud Console or hit one of our pages and go, “Hey, I want to do a multi-tier web app.” “Hey, I want to do a data processing pipeline.” Like, use cases.One click, we blow out the entire thing in the platform, use it, mess around with it, turn it off with one click. Most of those are built by DevRel. Like, my engineers have gone and built that. Tons of work behind the scenes. Really, like, production-grade quality type architectures, really, really great work. There's going to be—there's a dozen of these. We'll GA them at Next—but really, really cool work. That's DevRel. Now, that's behind-the-scenes work, but as engineering work.That can be some of the thankless work of setting up projects, deployment architectures, Terraform, all of them also dropped into GitHub, ton of work documenting those. But yeah, that looks like behind-the-scenes work. But that's what—I mean, most of DevRel is engineers. These are folks often just building the things that then devs can use to learn the platforms. Is it the flashy work? No. Is it the most important work? Probably.Corey: I do have a question I'd be remiss not to ask. Since the last time we spoke, relatively recently from this recording, Google—well, I'd say ‘Google announced,' but they kind of didn't—Squarespace announced that they'd be taking over Google domains. And there was a lot of silence, which I interpret, to be clear, as people at Google being caught by surprise, by large companies, communication is challenging. And that's fine, but I don't think it was anything necessarily nefarious.And then it came out further in time with an FAQ that Google published on their site, that Google Cloud domains was a part of this as well. And that took a lot of people aback, in the sense—not that it's hard to migrate a domain from one provider to another, but it brought up the old question of, if you're building something in cloud, how do you pick what to trust? And I want to be clear before you answer that, I know you work there. I know that there are constraints on what you can or cannot say.And for people who are wondering why I'm not hitting you harder on this, I want to be very explicit, I can ask you a whole bunch of questions that I already know the answer to, and that answer is that you can't comment. That's not constructive or creative. So, I don't want people to think that I'm not intentionally asking the hard questions, but I also know that I'm not going to get an answer and all I'll do is make you uncomfortable. But I think it's fair to ask, how do you evaluate what services or providers or other resources you're using when you're building in cloud that are going to be around, that you can trust building on top of?Richard: It's a fair question. 
Not everyone's on… let's update our software on a weekly basis and I can just swap things in left and right. You know, there's a reason that even Red Hat is so popular with Linux because as a government employee, I can use that Linux and know it's backwards compatible for 15 years. And they sell that. Like, that's the value, that this thing works forever.And Microsoft does the same with a lot of their server products. Like, you know, for better or for worse, [laugh] they will always kind of work with a component you wrote 15 years ago in SharePoint and somehow it runs today. I don't even know how that's possible. Love it. That's impressive.Now, there's a cost to that. There's a giant tax in the vendor space to make that work. But yeah, there's certain times where even with us, look, we are trying to get better and better at things like comms. And last year we announced—I checked them recently—you know, we have 185 Cloud products in our enterprise APIs. Meaning they have a very, very tight way we would deprecate with very, very long notice, they've got certain expectations on guarantees of how long you can use them, quality of service, all the SLAs.And so, for me, like, I would bank on, first off, for every cloud provider, what their anchor services are. Build on those, right? You know, S3 is not going anywhere from Amazon. Rock solid service. BigQuery? Goodness gracious, it's the center of Google Cloud.And you look at a lot of services: what can you bet on that are the anchors? And then you can take bets on things that sit around it. There's times to be edgy and say, "Hey, I'll use Service Weaver," which we open-sourced earlier this year. It's kind of a cool framework for building apps and we'll deconstruct it into microservices at deploy time. That's cool.Would I literally build my whole business on it? No, I don't think so. It's early stuff. Now, would I maybe use it also with some really boring VMs and boring API Gateway and boring storage? Totally. Those are going to be around forever.I think for me, personally, I try to think of how do I isolate things that have some variability to them. Now, to your point, sometimes you don't know there's variability. You would have just thought that service might be around forever. So, how are you supposed to know that that thing could go away at some point? And that's totally fair. I get that.Which is why we have to keep being better at comms, making sure more things are in our enterprise APIs, which is almost everything. So, you have some assurances, when I build this thing, I've got a multi-year runway if anything ever changes. Nothing's going to stay the same forever, but nothing should change tomorrow on a dime. We need more trust than that.Corey: Absolutely. And I agree. And the problem, too, is hidden dependencies. Let's say what is something very simple. I want to log in to [unintelligible 00:25:34] brand new AWS account and spin up a single EC2 instance. The end. Well, I can trust that EC2 is going to be there. Great. That's not one service you need to go through in that critical path. It is a bare minimum six, possibly as many as twelve, depending upon what it is exactly you're doing.And it's the, you find out after the fact that oh, there was that hidden dependency in there that I wasn't fully aware of. That is a tricky and delicate balance to strike. 
And, again, no one is going to ever congratulate you—at all—on the decision to maintain a service that is internally painful and engineering-ly expensive to keep going, but as soon as you kill something, even if it's for a thing that doesn't have any customers, the narrative becomes, "They're screwing over their customers." It's—they just said that it didn't have any. What's the concern here?It's a messaging problem; it is a reputation problem. Conversely, everyone knows that Amazon does not kill AWS services. Full stop. Yeah, it turns out everyone's wrong. By my count, they've killed ten full-on AWS services and counting at the moment. But that is not the reputation that they have.Conversely, I think that the reputation that Google is going to kill everything that it touches is probably not accurate, though I don't know that I'd want to have them over to babysit either. So, I don't know. But it is something that it feels like you're swimming uphill on in many respects, just due to not even deprecation decisions, historically, so much as poor communication around them.Richard: Mm-hm. I mean, communication can always get better, you know. And that's, it's not our customers' problem to make sure that they can track every weird thing we feel like doing. It's not their challenge. If our business model changes or our strategy changes, that's not technically the customer's problem. So, it's always our job to make this as easy as possible. Anytime we don't, we have made a mistake.So, you know, even DevRel, hey, look, it puts teams in a tough spot. We want our customers to trust us. We have to earn that; you will never just give it to us. At the same time, as you say, "Hey, we're profitable. It's great. We're growing like weeds," it's amazing to see how many people are using this platform. I mean, even services, you don't talk about having—I mean, doing really, really well. But I got to earn that. And you got to earn, more importantly, the scale. I don't want you to just kick the tires on Google Cloud; I want you to bet on it. But we're only going to earn that with really good support, really good price, stability, really good feeling like these services are rock solid. Have we totally earned that? We're getting there, but not as mature as we'd like to get yet, but I like where we're going.Corey: I agree. And reputations are tricky. I mean, recently InfluxDB deprecated two regions and wound up turning them off and deleting data. And they wound up getting massive blowback for this, which, to their credit, their co-founder and CTO, Paul Dix—who has been on the show before—wound up talking about and saying, "Yeah, that was us. We're taking ownership of this."But the public announcement said that they had—that data in AWS was not recoverable and they're reaching out to see if the data in GCP was still available. At which point, I took the wrong impression from this. Like, whoa, whoa, whoa. Hang on. Hold the phone here. Does that mean that data that I delete from a Google Cloud account isn't really deleted?Because I have a whole bunch of regulators that would like a word if so. And Paul jumped onto that with, "No, no, no, no, no. I want to be clear, we have a backup system internally that we were using that has that set up. And we deleted the backups on the AWS side; we don't believe we did on the Google Cloud side. 
It's purely us, not a cloud provider problem." It's like, "Okay, first, sorry for causing a fire drill." Secondly, "Okay, that's great." But the reason I jumped in that direction was just because it becomes so easy when a narrative gets out there to believe the worst about companies that you don't even realize you're doing it.Richard: No, I understand. It's reflexive. And I get it. And look, B2B is not B2C, you know? In B2B, it's not, "Build it and they will come." I think we have the best cloud infrastructure, the best security posture, and the most sophisticated managed services. I believe that. I use all the clouds. I think that's true. But it doesn't matter unless you also do the things around it, around support, security, you know, usability, trust, you have to go sell these things and bring them to people. You can't just sit back and say, "It's amazing. Everyone's going to use it." You've got to earn that. And so, that's something that we're still on the journey of, but our foundation is terrific. We just got to do a better job on some of these intangibles around it.Corey: I agree with you, when you s—I think there's a spirited debate you could have on any of those things you said that you believe that Google Cloud is the best at, with the exception of security, where I think that is unquestionably true. I think that is a lot less variable than the others. The others are more or less, "Who has the best cloud infrastructure?" Well, depends on who had what for breakfast today. But the simplicity and the approach you take to security is head and shoulders above the competition.And I want to make sure I give credit where due: it is because of that simplicity and default posturing that customers wind up better for it as a result. Otherwise, you wind up in this hell of, "You must have at least this much security training to responsibly secure your environment." And that is never going to happen. People read far less than we wish they would. I want to make very clear that Google deserves the credit for that security posture.Richard: Yeah, and the other thing, look, I'll say that, from my observation, where we do something that feels a little special and different is we do think in platforms, we think in both how we build and how we operate and how the console is built by a platform team, you—singularly. How—[is 00:30:51] we're doing Duet AI that we've pre-announced at I/O and are shipping. That is a full platform experience covering a dozen services. That is really hard to do if you have a lot of isolation. So, we've done a really cool job thinking in platforms and giving that simplicity at that platform level. Hard to do, but again, we have to bring people to it. You're not going to discover it by accident.Corey: Richard, I will let you get back to your tear-filled late-night writing of tomorrow's Next keynote, but if people want to learn more—once the dust settles—where's the best place for them to find you?Richard: Yeah, hopefully, they continue to hang out at cloud.google.com and using all the free stuff, which is great. You can always find me at seroter.com. I read a bunch every day and then I write a blog post every day about what I read, so if you ever want to tune in on that, just see what wacky things I'm checking out in tech, that is good. And I still hang out on different social networks, Twitter at @rseroter and LinkedIn and things like that. But yeah, join in and yell at me about anything I said.Corey: I did not realize you had a daily reading list of what you put up there. 
That is news to me and I will definitely track in, and then of course, yell at you from the cheap seats when I disagree with anything that you've chosen to include. Thank you so much for taking the time to speak with me and suffer the uncomfortable questions.Richard: Hey, I love it. If people aren't talking about us, then we don't matter, so I would much rather we'd be yelling about us than the opposite there.Corey: [laugh]. As always, it's been a pleasure. Richard Seroter, Director of Product Management and Developer Relations at Google Cloud. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that you had an AI system write for you because you never learned how to structure a sentence.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Paul Dix is Cofounder & CTO of open source time series data company InfluxData. The company's open source datastore, InfluxDB, has 26K stars on GitHub. InfluxData has raised over $200M from investors including Norwest, Battery, and Sapphire Ventures. In this episode, we dig into building the category of time series data, how an open source company's monetization plan should tie to fundraising, some of the hardest decisions the team had to make during InfluxData's journey so far & more!
The Linux world is abuzz with controversy as IBM Red Hat puts the source code for Red Hat Enterprise Linux (RHEL) behind a paywall, leaving CentOS Stream as the only accessible option. Meanwhile, Oracle emphasizes its commitment to Linux freedom, offering open access to binaries and source code for their RHEL-compatible distribution, Oracle Linux. On the other hand, SUSE takes a bold step by forking RHEL and investing millions in developing their own RHEL-compatible distribution, free from restrictions. The battle for Linux supremacy is heating up, with each company vying for dominance and championing their respective visions of openness and innovation. Time Stamps: 0:00 - Welcome to the Rundown 0:39 - Intel Shucks NUC Products 7:18 - Kentik Announces Azure Observability 10:18 - El Capitan Reporting for Duty 15:39 - Microsoft Issues Patch Targeting Malicious Drivers 18:53 - InfluxData Sees Influx of Angry Customers 23:01 - Vector Capital Acquires Riverbed Technology 26:52 - Oracle Wants Linux to Remain Open and Free 31:34 - SUSE Working On Another Red Hat 35:21 - The Battle Over Red Hat Enterprise Linux 39:11 - The Weeks Ahead 40:56 - Thanks for Watching Follow our Hosts on Social Media Tom Hollingsworth: https://www.twitter.com/NetworkingNerd Stephen Foskett: https://www.twitter.com/SFoskett Follow Gestalt IT Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/Gestalt-IT Tags: #Rundown, @Intel, @IntelBusiness, @KentikInc, @Azure, @Microsoft, #Observability, @InfluxDB, @Riverbed, @Capital_Vector, @RedHat, @Oracle, @OracleCloud, #Linux, #OpenSource,
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Ask me anything episode: Submit your question(s) for our upcoming AMA episode: form here. Thank you! Brian #1: PythonGUIS Martin Fitzpatrick A site with a collection of resources, guides, books, comparisons, etc, around GUIs in Python. Martin recommends starting with PyQT6 However, there are tutorials covering PyQT6 PySide6 PyQT5 TkInter PySide even Kivy Michael #2: JupyterLab 4.0 is Here The next major release of our full-featured development environment You can upgrade by running pip install --upgrade jupyterlab or conda install -c conda-forge jupyterlab. JupyterLab is now faster, thanks to improvements such as CSS rules optimization, CodeMirror 6, MathJax 3, and notebook windowing. JupyterLab 3 was slow when working with large notebooks. There are additional performance improvements available via opt-in settings: Faster tab-switching on Chromium browsers: “Settings” → “JupyterLab Shell” → switch “Hidden mode” to “contentVisibility” Better performance with long notebooks: “Settings” → “Notebook” → switch “Windowing mode” to “full” An upgraded text editor. Better real time collaboration. Bug fixes. More than 100 bugs have been addressed and resolved, enhancing JupyterLab's stability and performance. Brian #3: Proposing a struct syntax for Python Brett Cannon This would be a cool syntax for a data-only type: struct Point(x: int, y: int) No positional-only parameters No inheritance No methods Instances would be immutable, so p = Point(1, 2) would create an object that could be used as a key. A data-only focused set of types. Michael #4: Python 3.13 Removes 20 Stdlib Modules via PyCoders From PEP 594 – Removing dead batteries from the standard library we're saying goodbye to aifc, audioop, cgi, cgitb, chunk, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu, xdrlib As well as the 2to3 program and lib2to3 module in Python. Python 3.12 final release is scheduled in 4 months (October 2023) and Python 3.13 final release is scheduled in 1 year and 4 months (October 2024). Extras Brian: Affirming your PSF Membership voting status You have until June 15 to affirm your voting rights in the upcoming Board Election, if you care about such things. Michael: 5 Career Tips for Budding Python Developers video PyCon US 2023 videos are up Python 3.11.4, 3.10.12, 3.9.17, 3.8.17, 3.7.17, and 3.12.0 beta 2 are now available Joke: Snorkel not included
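As a quick illustration of the struct proposal covered above: that syntax does not exist in any released Python, so the closest present-day sketch (my own rough stand-in, not part of Brett Cannon's proposal) is a frozen dataclass, which gives the immutable, hashable, data-only behavior described:

```python
# Rough present-day approximation of the proposed `struct Point(x: int, y: int)`.
# The struct syntax itself is only a proposal; this frozen dataclass is an
# illustrative stand-in, not the proposal's actual implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)
locations = {p: "origin-ish"}   # frozen => hashable, so usable as a dict key
print(locations[Point(1, 2)])   # field-based equality, prints "origin-ish"
```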
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Michael #1: pystack PyStack is a tool that uses forbidden magic to let you inspect the stack frames of a running Python process or a Python core dump, helping you quickly and easily learn what it's doing. PyStack has the following amazing features:
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: Python's Missing Batteries: Essential Libraries You're Missing Out On Martin Heinz Fun collection of a bunch of libraries you may not know about (or forgot about), with code examples. Utilities boltons : iterate through json and dates, quickly grab data out of nested structures, and convert nested data with jsonutils, timeutils, and iterutils sh : conveniently call shell functions Data Validation validators : validate email addresses, credit cards, IP addresses, and more. the fuzz : fuzzy string comparisons Debugging stackprinter : nice stack traces with exception messages highlighted Testing freezegun : stop time, change dates, … dirty_equals : comparing things that are kinda equal CLI tqdm : add a progress bar to command line apps Michael #2: awesome-polars A curated list of Polars talks, tools, examples & articles. Mostly articles and tutorials however. Brian #3: Running Headless Selenium in Python (2023) Siddiqi First off, if you are doing automated testing with Selenium, I hope you already know about headless. It's awesome and speeds up testing. Next, there are changes to how you code headless, as of Selenium 4.8.0 (Jan. 2023). Old: options.headless = True New: options.add_argument('--headless=new') for Chrome options.add_argument('--headless') for Firefox Reasons: Read Headless is Going Away! post on Selenium blog. Subtitle: “Now that we got your attention, headless is not actually going away, just the convenience method to set it in Selenium” Michael #4: Gracy Gracy helps you handle failures, logging, retries, throttling, and tracking for all your HTTP interactions. Has support for Parsing per status code Throttling Retries Custom validation Record/replay for testing A bit non-pythonic but perhaps inspiration for some out there Extras Michael: Mobile apps are finally out Take the git course for free for a limited time. Michael's blog post announcing the apps Joke: It's practice
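For the headless Selenium item above, here is a minimal sketch of the new-style flag (assuming Selenium 4.8+ and a local Chrome/chromedriver install; the target URL is just an example):

```python
# Minimal headless Chrome sketch per the Selenium 4.8+ change discussed above.
# Assumes selenium >= 4.8 and chromedriver available on PATH.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# options.headless = True               # old convenience attribute, now deprecated
options.add_argument("--headless=new")  # new style for Chrome
# For Firefox the equivalent is: options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
print(driver.title)
driver.quit()
```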
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Michael #1: Introducing 'Trusted Publishers' PyPI package maintainers can adopt a new, more secure publishing method that does not require long-lived passwords or API tokens to be shared with external systems. Our term for using the OpenID Connect (OIDC) standard to exchange short-lived identity tokens between a trusted third-party service and PyPI. Instead, PyPI maintainers can configure PyPI to trust an identity provided by a given OpenID Connect Identity Provider (IdP). These API tokens never need to be stored or shared rotate automatically by expiring quickly provide a verifiable link between a published package and its source Additional security hardening is available Brian #2: Mojo : a new programming language for all AI developers. Mojo may be the biggest programming language advance in decades - fast.ai blog Suggested by many listeners “Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability of AI hardware and extensibility of AI models.” A programming language compatible with Python, with performance similar to C++/Rust. “Mojo is designed to become a superset of Python over time by preserving Python's dynamic features while adding new primitives for systems programming.” - emphasis from Brian It's not there yet, but still super cool Built on a MLIR, not LLVM “How compatible is Mojo with Python really? Mojo already supports many core features of Python including async/await, error handling, variadics, etc, but… it is still very early and missing many features - so today it isn't very compatible. Mojo doesn't even support classes yet!” Michael #3: django-prose Wonderful rich-text editing for your Django project. Rendering rich-text in templates Small rich-text content (as model fields) Django Prose is using Bleach to only allow certain tags and attributes See the website for a screenshot of it in action Brian #4: pylyzer is a static code analyzer / language server for Python, written in Rust. Shunsuke Shibayama Suggested by Owen Features fast detailed analysis type checking plus things like out-of-bounds accesses to lists, and non-existent key references to dicts more readable reports and a VS Code extension pylyzer vs ruff “Ruff, like pylyzer, is a static code analysis tool for Python written in Rust, but Ruff is a linter and pylyzer is a type checker & language server. pylyzer does not perform linting, and Ruff does not perform type checking.” Some limitations and incomplete “todo list”. See README for more details. Joke: Escape Room
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: huak - A Python package manager written in Rust. Inspired by Cargo Suggested by Owen Tons of workflows activate - activate a virtual environment add add a dependency to a project pip install it into your virtual environment, and add it to the dependency list in pyproject.toml test - run pytest update update dependencies lint - run ruff, installing it first if necessary fix - autofix fixable lint conflicts build - build wheel in isolated virtual environment using hatchling Honestly I was considering building my own workflow tool, but this is darned close to what I want. Even though it's still “in an experimental state”. There are rough edges (ruff edges, get it), but still, way cool. I just don't know how to pronounce it. Is it like “walk”, or more like “whack”? Michael #2: PSF expresses concerns about a proposed EU law that may make it impossible to continue providing Python and PyPI to the European public After reviewing the proposed Cyber Resilience Act and Product Liability Act, the PSF has found issues that put the mission of our organization and the health of the open-source software community at risk. As currently written, the authors of open-source components might bear legal and financial responsibility for the way their components are applied in someone else's commercial product. The risk of huge potential costs would make it impossible in practice for us to continue to provide Python and PyPI to the European public. Brian #3: ChaosToolkit Suggested by the maintainer, Sylvain Hellegouarch Declare and store your Chaos Engineering experiments as JSON/YAML files so you can collaborate and orchestrate them as any other piece of code. Extensible through an Open API Can be automated in CI/CD pipeline Michael #4: PEP 711 – PyBI: a standard format for distributing Python Binaries “Like wheels, but instead of a pre-built python package, it's a pre-built python interpreter” Joke: It's the effort that counts
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Michael #1: makeapp via Felix Ingram Simplifies Python application rollout and publishing. Link to its mention on Talk Python. Simplifies Python application rollout and publishing: Make a skeleton for your new application with one console command Automatically create a VCS repository for your application. Automatically check whether the chosen application name is not already in use. Customize new application layouts with skeleton templates. Put some skeleton default settings into a configuration file not to mess with command line switches anymore. Easily add entries to your changelog. Publish your application to remotes (VCS, PyPI) with single command. Brian #2: Looking forward to Python 3.12 We're on 3.12.0a7 now, the last alpha, final is scheduled for October schedule So far, in 3.12.0a7 What's new in Python 3.12 page has some examples of the Improved Error Messages Recent addition, PEP 684 - A Per-Interpreter GIL was approved recently “… sufficient isolation would facilitate true multi-core parallelism …” seems like a good thing. But also, “… this is an advanced feature meant for a narrow set of users of the C-API. “, so not really sure how this will affect us. Still, seems cool. Michael #3: Python 3.11.3 is out Fixes a HIGH level CVE in OpenSSL (so patch it) Lots of changes in Core and Builtins Brian #4: How to Make a Great Conference Talk Sebastian Witowski Lots of great advice for tech conf talks. Don't skip the last half of this, getting your talk accepted is really when the work starts. Good sections to make sure you don't miss Live demos “First of all - do you really need a demo? …” Rehearsing Don't skip this. Do this. A lot. Out loud. With a timer. While standing. Memorize the first few minutes, and the last few. Know how you're going to open and close. Night before get enough sleep Day of eat well. Don't drink too much liquids. Be comfortable. Sebastian was honest in saying this stuff works for him, but do what works for you. From Brian: I deviate from Sebastian in quite a few places, but still don't disagree with his advice. I can't give a talk without slides, as I use them for prompts to know what I'm talking about next. My talks usually have a lot of code snippets. Obviously, that would be difficult without slides. I write my talk and my slides in Markdown. Sebastian writes in something else, then builds slides as visual aids. That's cool. Do what works for you. Bonus tool from the article: demo-magic - If I'm ever tempted to live code again, I think I'll try this instead. Extras Michael: NOW the CDN course is out. Django 4.2 released. Joke: Using A.I. for Efficiency
Today on the Tech Bytes podcast we dive into gNMIc with sponsor Nokia. gNMIc is open-source software you can use to configure devices and collect device telemetry. It can output telemetry to InfluxDB, Prometheus, and SNMP traps. Nokia has contributed gNMIc to the OpenConfig project. We talk with gNMIc creator Karim Radhouani, Technology and Architecture Consulting Engineer at Nokia, about why he developed the tool and how customers are using it.
Linda and Dave chat again with Jay Clifford, Developer Advocate at InfluxData. InfluxData are the makers of InfluxDB, a popular open-source platform for simplifying time series data management. Designed to handle high speed and high volume data ingest and real-time data analysis, InfluxDB's robust data collectors, common API across the entire platform, highly performant time series engine, and optimized storage lets you build once and deploy across multiple products and environments. In part two of this conversation, Jay gives us a pop quiz on time series data, offers best practices for developers on handling time series data, shares some real-world customer success stories, and covers handling visualization with time series data. If you missed part one, you can listen in with Episode 76. Follow Jay on LinkedIn: https://www.linkedin.com/in/jaymand13/ Jay on Git: https://github.com/Jayclifford345 Linda on Twitter: https://twitter.com/lindavivah Linda's Website: https://lindavivah.com/ Linda on TikTok: https://www.tiktok.com/@lindavivah Linda on Instagram: https://www.instagram.com/lindavivah/ Linda's Medium: https://medium.com/@LindaVivah [DOCS] InfluxDB Cloud Powered by IOX -https://docs.influxdata.com/influxdb/cloud-iox/get-started/ [GIT] InfluxData Telegraf Plugins & Integrations - https://github.com/influxdata/telegraf [GIT] InfluxDB OSS - https://github.com/influxdata/influxdb [PORTAL] InfluxData Website - https://www.influxdata.com [PORTAL] InfluxData Cloud Signup - https://cloud2.influxdata.com/signup [PORTAL] InfluxDB Use Cases - https://www.influxdata.com/customers/ [PORTAL] Pandas - https://pandas.pydata.org/ [SLACK] InfluxData Community Slack - https://www.influxdata.com/slack (connect with Jay @JayClifford) [TRAINING] InfluxDB University (Free InfluxDB Training) - https://university.influxdata.com Subscribe: Amazon Music: https://music.amazon.com/podcasts/f8bf7630-2521-4b40-be90-c46a9222c159/aws-developers-podcast Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-developers-podcast/id1574162669 Google Podcasts: https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5zb3VuZGNsb3VkLmNvbS91c2Vycy9zb3VuZGNsb3VkOnVzZXJzOjk5NDM2MzU0OS9zb3VuZHMucnNz Spotify: https://open.spotify.com/show/7rQjgnBvuyr18K03tnEHBI TuneIn: https://tunein.com/podcasts/Technology-Podcasts/AWS-Developers-Podcast-p1461814/ RSS Feed: https://feeds.soundcloud
Watch on YouTube About the show Sponsored by InfluxDB Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: Pydantic V2 Pre Release Terrence Dorsey & Samuel Colvin Alpha release available to everyone: pip install --pre -U "pydantic>=2.0a1" Headlines: pydantic-core - all validation logic rewritten in Rust and moved to a separate package, pydantic-core 5-50x faster separation will aid safety and maintainability Lots ready for experimentation BaseModel, Dataclasses, Serialization, … Much still under construction Docs, BaseSettings→ pydantic-settings, … Michael #2: microdot The impossibly small web framework for Python and MicroPython Microdot is a minimalistic Python web framework inspired by Flask, and designed to run on systems with limited resources such as microcontrollers. It runs on standard Python and on MicroPython. Support for async, websockets, tls, even ASGI servers. Less memory usage by a big margin. Brian #3: GitHub Actions Tools: watchgha, build and inspect, and pytest annotate failures watchgha Ned Batchelder Watch GH Actions progress on the command line build-and-inspect-python-package Hynek Test the build of wheels, check contents, lint README print sdist contents, wheel contents, and metadata pytest-github-actions-annotate-failures utgwkk Nice traceback annotations for pytest Michael #4: PEP 709 – Inlined comprehensions by Carl Meyer Comprehensions are currently compiled as nested functions, which provides isolation of the comprehension's iteration variable, but is inefficient at runtime. This PEP proposes to inline list, dictionary, and set comprehensions into the code where they are defined, and provide the expected isolation by pushing/popping clashing locals on the stack. This change makes comprehensions much faster: up to 2x faster for a microbenchmark of a comprehension alone. Extras Michael: Python Web Apps that Fly with CDNs Course Joke: Can't watch movies
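To make the Pydantic V2 pre-release item above concrete, here is a tiny hedged sketch; the model and field names are invented for illustration, and the coercion shown is the default (lax) validation mode:

```python
# Tiny sketch against the Pydantic V2 alpha (pip install --pre -U "pydantic>=2.0a1").
# Model and field names are invented for illustration; under the hood V2 delegates
# validation to the separate Rust-backed pydantic-core package.
from pydantic import BaseModel


class Reading(BaseModel):
    sensor_id: int
    temperature_c: float
    tags: list[str] = []


# Lax-mode validation coerces the numeric strings into int/float.
r = Reading(sensor_id="7", temperature_c="21.5", tags=["lab"])
print(r.sensor_id, r.temperature_c)  # 7 21.5
```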
In this episode, Linda and Dave chat with Jay Clifford, Developer Advocate at InfluxData. InfluxData are the makers of InfluxDB, a popular open-source platform for simplifying time series data management. Designed to handle high speed and high volume data ingest and real-time data analysis, InfluxDB's robust data collectors, common API across the entire platform, highly performant time series engine, and optimized storage lets you build once and deploy across multiple products and environments. Jay shares his journey to the cloud, gives us a developer introduction to time series data management, how to compare it to relational data and databases, how developers should think about this kind of data, and shares some real world customer success stories. Follow Jay on LinkedIn: https://www.linkedin.com/in/jaymand13/ Jay on Git: https://github.com/Jayclifford345 Linda on Twitter: https://twitter.com/lindavivah Linda's Website: https://lindavivah.com/ Linda on TikTok: https://www.tiktok.com/@lindavivah Linda on Instagram: https://www.instagram.com/lindavivah/ Linda's Medium: https://medium.com/@LindaVivah [DAVES COLLECTION] Rick and Morty - https://awsdeveloperspodcast.s3.amazonaws.com/RickMorty.PNG [DAVES COLLECTION] World of Warcraft Collectors Editions – https://awsdeveloperspodcast.s3.amazonaws.com/WoWCollectorsEditions.PNG [DOCS] InfluxDB Cloud Powered by IOX -https://docs.influxdata.com/influxdb/cloud-iox/get-started/ [GIT] InfluxData Telegraf Plugins & Integrations - https://github.com/influxdata/telegraf [GIT] InfluxDB OSS - https://github.com/influxdata/influxdb [PORTAL] InfluxData Website - https://www.influxdata.com [PORTAL] InfluxData Cloud Signup - https://cloud2.influxdata.com/signup [PORTAL] InfluxDB Use Cases - https://www.influxdata.com/customers/ [PORTAL] Pandas - https://pandas.pydata.org/ [SLACK] InfluxData Community Slack - https://www.influxdata.com/slack (connect with Jay @JayClifford) [TRAINING] InfluxDB University (Free InfluxDB Training) - https://university.influxdata.com Subscribe: Amazon Music: https://music.amazon.com/podcasts/f8bf7630-2521-4b40-be90-c46a9222c159/aws-developers-podcast Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-developers-podcast/id1574162669 Google Podcasts: https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5zb3VuZGNsb3VkLmNvbS91c2Vycy9zb3VuZGNsb3VkOnVzZXJzOjk5NDM2MzU0OS9zb3VuZHMucnNz Spotify: https://open.spotify.com/show/7rQjgnBvuyr18K03tnEHBI TuneIn: https://tunein.com/podcasts/Technology-Podcasts/AWS-Developers-Podcast-p1461814/ RSS Feed: https://feeds.soundcloud
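As a concrete, hedged illustration of the time series workflow Jay describes, here is a minimal write using the InfluxDB 2.x Python client; the URL, token, org, and bucket values are placeholders for illustration, not anything from the episode:

```python
# Minimal time series write with the influxdb-client package
# (pip install influxdb-client). All connection values below are placeholders.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(
    url="http://localhost:8086",  # placeholder
    token="my-token",             # placeholder
    org="my-org",                 # placeholder
)
write_api = client.write_api(write_options=SYNCHRONOUS)

# One timestamped point: tags describe the source, fields hold the measured values.
point = Point("room_temperature").tag("room", "kitchen").field("celsius", 21.7)
write_api.write(bucket="my-bucket", record=point)

client.close()
```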
Oskari Saarenmaa is Founder & CEO of Aiven, the fully managed, open source cloud data platform. Their platform combines all the tools needed to connect and manage open source data services such as Apache Kafka, Grafana, MySQL, Redis, and InfluxDB, along with many others. They have also open-sourced a number of projects themselves (see here on GitHub). Aiven has raised $420M from investors including IVP and Atomico. In this episode, we discuss automation as a core value, finding a role in the open source ecosystem across multiple projects, the importance of 24/7 support when you have global customers, learning GTM as a technical team & more!
About Brian Brian is an accomplished dealmaker with experience ranging from developer platforms to mobile services. Before InfluxData, Brian led business development at Twilio. Joining at just thirty-five employees, he built over 150 partnerships globally from the company's infancy through its IPO in 2016. He led the company's international expansion, hiring its first teams in Europe, Asia, and Latin America. Prior to Twilio, Brian was VP of Business Development at Clearwire and held management roles at Amp'd Mobile, Kivera, and PlaceWare. Links Referenced: InfluxData: https://www.influxdata.com/ Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast. Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It's been a year, which means it's once again time to have a promoted guest episode brought to us by our friends at InfluxData. Joining me for a second time is Brian Mullen, CMO over at InfluxData. Brian, thank you for agreeing to do this a second time. You're braver than most. Brian: Thanks, Corey. I'm happy to be here. Second time is the charm. Corey: So, it's been an interesting year, to put it mildly, and I tend to have the attention span of a goldfish on most days, so for those who are similarly flighty, let's start at the very top. What is an InfluxDB slash InfluxData slash Influx—when you're not sure which one to use, just shorten it and call it good—and why might someone need it? Brian: Sure. So, InfluxDB is what most people understand our product as, a pretty popular open-source product, been out for quite a while. And then our company, InfluxData, is the company behind InfluxDB. And InfluxDB is where developers build IoT, real-time analytics, and cloud applications, typically all based on time series.
It's a time-series data platform specifically built to handle time-series data, which we think about as any type of data that is stamped in time in some way. It could be metrics, like, taken every one second, every two seconds, every three seconds, or some kind of event that occurs and is stamped in time in some way. So, our product and platform are really specialized to handle that technical problem. Corey: When last we spoke, I contextualized that in the realm of an IoT sensor that winds up reporting its device ID and its temperature at a given timestamp. That is sort of baseline stuff that I think aligns with what we're talking about. But over the past year, I started to see it in a bit of a different light, specifically viewing logs as time-series data, which hadn't occurred to me until relatively recently. And it makes perfect sense, on some level. It's weird to contextualize what Influx does as being a logging database, but there's absolutely no reason it couldn't be. Brian: Yeah, it certainly could. So typically, we see the world of time-series data in kind of two big realms. One is, as you mentioned, the, you know, think of it as the hardware or, you know, physical realm: devices and sensors, these are things that are going to show up in a connected car, in a factory deployment, in renewable energy, you know, a wind farm. And those are real devices and pieces of hardware that are out in the physical world, collecting data and emitting, you know, time-series every one second, or five seconds, or ten minutes, or whatever it might be. But it also, as you mentioned, applies to, call it the virtual world, which is really all of the software and infrastructure that is being stood up to run applications and services. And so, in that world, it could be the same—it's just a different type of source, but is really kind of the same technical problem. It's still time-series data being stamped, you know, data being stamped every, you know, one second, every five seconds, in some cases, every millisecond, but it is coming from a source that is actually in the infrastructure. Could be, you know, virtual machines, it could be containers, it could be microservices running within those containers. And so, all of those things together, both in the physical world and this infrastructure world, are all emitting time-series data.
So, that means, like, the workloads that are being run in the Netflixes of the world, or all the different infrastructure that's being spun up in the cloud to run these various, you know, applications and services, those workloads are getting bigger and bigger, those companies and their subscriber bases, and the amount of data they're generating is getting bigger and bigger. They're also expanding horizontally by region and geography. So Netflix, for example, running not just in the US, but in every continent and probably every cloud region around the world. So, that's happening in the cloud world, and then also, in the IoT world, there's this massive growth of connected devices, both net-new devices that are being developed, kind of, you know, the next Peloton or the next climate control unit that goes in an apartment or house, and also these longtime legacy devices that have been on the factory floor for a couple of decades, but now are being kind of modernized and coming online. So, if you look at all of that growth of the data sources now being built up in the cloud and you look at all that growth of these connected devices, both new and existing, that are kind of coming online, there's now a huge exponential growth in the sources of data. And all of these sources are emitting time-series data. You can just think about a connected car—not even a self-driving car, just a connected car, your everyday, kind of, 2022 model, and nearly every element of the car is emitting time-series data: its engine components, you know, your tires, like, what the climate inside of the car is, statuses of the engine itself, and it's all doing that in real-time, so every one second, every five seconds, whatever. So, I think in general, people just don't realize they're already dealing with a substantial workload of time series. And in most cases, unless they're using something like Influx, they're probably not, you know, especially tuned to handle it from a technology perspective. Corey: So, it's been a year. What has changed over on your side of the world since the last time we spoke? It seems that, well, things continue and they're up and to the right. Well, sure, generally speaking, you're clearly still in business. Good job, always appreciative of your custom, as well as the fact that, oh, good, even in a world where it seems like there's a macro recession in progress, that there are still companies out there that continue to persist and in some cases, dare I say, even thrive? What have you folks been up to? Brian: Yeah, it's been a big year. So first, we've seen quite a bit of expansion across the use cases. So, we've seen even further expansion in IoT, kind of expanding into consumer, industrial, and now sustainability and clean energy, and that pairs with what we've seen on FinTech and cryptocurrency, gaming and entertainment applications, network telemetry, including some of the biggest names in telecom, and then a little bit more on the cloud side with cloud services, infrastructure, and dev tools and APIs. So, quite a bit more broad set of use cases we're now seeing across the platform. And the second thing—you might have seen it in the last month or so—is a pretty big announcement we had of our new storage engine. So, this was just announced earlier this month in November and was previously introduced to our community as what we call IOx, which is how it was known in the open-source.
And think of this really as a rebuilt and reimagined storage engine which is built on that open-source project, InfluxDB IOx, that allows us to deliver faster queries, and now—pretty exciting for the first time—unlimited time-series, or cardinality as it's known in the space. And then also we introduced SQL for writing queries and BI tool support. And this is, for the first time, we're introducing SQL, which is the world's most popular data programming language, to our platform, enabling developers to query via the API alongside our language Flux and InfluxQL. Corey: A long time ago, it really seems that the cloud took a vote, for lack of a better term, and decided that when it comes to storage, object store is the way forward. It was a bit of a reimagining from how we all considered using storage previously, but the economics are at a minimum of ten to one in favor of object store, the latency is far better, the durability is off the charts better, you don't have to deal—at least in AWS-land—with the concept of availability zones and the rest. Just from an economic and performance perspective, provided the use case embraces it, there's really no substitute. Brian: Yeah, I mean, the way we think about storage is, you know, obviously, it varies quite a bit from customer to customer with our use cases. Especially in IoT, we see some use cases where customers want to have data around for months and in some cases, years. So, it's a pretty substantial data set you're often looking at. And sometimes those customers want to downsample those; they don't necessarily need every single piece of minutiae that they may need in real-time, but not in summary, looking backward. So, you really—we're in this kind of world where we're dealing with both high fidelity—usually in the moment—data and lower fidelity, when people can downsample and have a little bit more of a summarized view of what happened. So, pretty unique for us, and we have to kind of design the product in a way that is able to balance both of those because that's what, you know, the customer use cases demand. It's a super hard problem to solve. One of the reasons that you have a product like InfluxDB, which is specialized to handle this kind of thing, is so that you can actually manage that balance in your application or service, setting your retention policy, et cetera.
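As a purely editorial illustration of the downsampling Brian is describing (full-resolution data for the recent window, summarized data for history), here is a small sketch on synthetic readings; the one-second and one-minute intervals, the column name, and the data are all made up, and pandas simply stands in for whatever the database or pipeline would actually run.

```python
# Roll per-second readings up into one-minute averages (synthetic data).
import numpy as np
import pandas as pd

# One hour of per-second temperature readings.
index = pd.date_range("2022-11-01", periods=3600, freq="1s", tz="UTC")
raw = pd.DataFrame(
    {"temperature_c": 21 + np.random.randn(3600) * 0.2}, index=index
)

# Downsampled view: one-minute means, a 60x smaller summary of the same
# window, which is the kind of thing you might keep for long retention.
summary = raw.resample("1min").mean()
print(len(raw), "raw points ->", len(summary), "summary points")
```

The point is simply the trade Brian mentions: sixty times fewer points to retain, at the cost of the per-second detail.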
So, in a pure observability setting where you're looking at, perhaps more of a, kind of, operational view of infrastructure monitoring, you want to understand kind of what happened and when; those tend to be a little bit more focused on real-time and recent. So, for example, you, of course, want to know exactly what's happening in the moment, zero in on whatever anomaly and kind of surrounding data there is; perhaps that means you're digging into something that happened in, you know, fairly recent time. So, those do tend to be, not all of them, but they do tend to be a little bit more real-time and recent-oriented. I think it's a little bit different when we look at IoT. Those generally tend to be longer timeframes that people are dealing with. They're physical, out-in-the-field devices, you know, many times those devices are kind of coming online and offline, depending on the connectivity, depending on the environment, you can imagine a connected smart agriculture setup, I mean, those are a pretty wide array of devices out in, you know, who knows what kind of climate and environment, so they tend to be a little bit longer in retention policy, kind of, being able to dig into the data, what's happening. The time frame that people are dealing with is just, in general, much longer in some of those situations. Corey: One story that I've heard a fair bit about observability data and event data is that they inevitably compose down into metrics rather than events or traces or logs, and I have a hard time getting there because I can definitely see a bunch of log entries showing the web server's return codes, okay, here's the number of 500 errors and the number of different types of successes that we wind up seeing in the app. Yeah, all right, how many per minute, per second, per hour, whatever it is that makes sense, so that you can look at aberrations there. But in the development process at least, I find that having detailed log messages tells me about things I didn't see and need to understand or to continue building the dumb thing that I'm in the process of putting out. It feels like once something is productionalized and running, its behavior is a lot more well understood, and at that point, metrics really seem to take over. How do you see it, given that you fundamentally live at that intersection where one can become the other? Brian: Yeah, we are right at that intersection and our answer probably would be both. Metrics are super important to understand and have that regular cadence and be kind of measuring that state over time, but you can miss things depending on how frequently those metrics are coming in. And increasingly, when you have the amount of data that you're dealing with coming from these various sources, the measurement is getting smaller and smaller. So, unless you have, you know, perfect metrics coming in every half-second, or, you know, in some sub-partition of that, in milliseconds, you're likely to miss something. And so, events are really key to understand those things that pop up and then maybe come back down, and that in a pure metric setting, in your regular interval, you would have just completely missed. So, we see most of our use cases that are showing a balance of the two as kind of the most effective. And from a product perspective, that's how we think about solving the problem, addressing both. Corey: One of the things that I struggled with is it seems that—again, my approach to this is relatively outmoded.
I was a systems administrator back when that title was not considered disparaging by a good portion of the technical community the way that it is today. Even though the job is the same, we call them something different now. Great. Okay, whatever; smile, nod, and accept the larger paycheck. But my way of thinking about things is: okay, you have the logs, they live on the server itself. And maybe if you want to be fancy, you wind up putting them to a centralized rsyslog cluster or whatnot. Yes, you might send them as well to some other processing system for visibility or a third-party monitoring system, but the canonical truth slash source of logs tends to live locally. That said, I got out of running production infrastructure before this idea of ephemeral containers or serverless functions really became a thing. Do you find that these days you are the source of truth slash custodian of record for these log entries, or do you find that you are more of a secondary source for better visibility and analysis, but not what they're going to bust out when the auditor comes calling in three years? Brian: I think, again, it—[laugh] I feel like I'm answering the same way [crosstalk 00:15:53] Corey: Yeah, oh, and of course, let's be clear, use cases are going to vary wildly. This is not advice on anyone's approach to compliance and the rest [laugh]. I don't want to get myself in trouble here. Brian: Exactly. Well, you know, we kind of think about it in terms of profiles. And we see a couple of different profiles of customers using InfluxDB. So, the first is, and this was kind of what we saw most often early on and still see quite a bit of, kind of more of that operator profile. And these are folks who are going to—they're building some sort of monitor, kind of, source of truth for—that's internally facing to monitor applications or services, perhaps that other teams within their company built. And so that's, kind of like, a little bit more of your kind of pure operator. Yes, they're building up in the stack themselves, but it's to pay attention to essentially something that another team built. And then what we've seen more recently, especially as we've moved more prominently into the cloud and offered a usage-based service with, you know, APIs and endpoints people can hit, we see more people come into it from a builder's perspective. And similar in some ways, except that they're still building kind of a, you know, a source of truth for handling this kind of data. But they're also building the applications and services themselves that are taken out to market and are in the hands of customers. And so, it's a little bit different mindset. Typically, there's, you know, a little bit more comfort with using one of many services to kind of, you know, be part of the thing that they're building. And so, we've seen a little bit more comfort from that type of profile, using our service running in the cloud, using the API, and not worrying too much about the kind of, you know, underlying setup of the implementation. Corey: Love how serverless helps you scale big and ship fast, but hate debugging your serverless apps? With Lumigo's serverless observability, it's fast and easy (and maybe a little fun, too). End-to-end distributed tracing gives developers full clarity into their most complex serverless and containerized applications, connecting every service from AWS Lambda and Amazon ECS to DynamoDB, API Gateways, Step Functions and more. Try Lumigo free and debug 3x faster, reduce error rate and speed up development.
Visit snark.cloud/lumigo. That's snark.cloud/L-U-M-I-G-O. Corey: So, I've been on record a lot saying that the best database is TXT records stuffed into Route 53, which works super well as a gag, let's be clear, don't actually build something on top of this, that's a disaster waiting to happen. I don't want to destroy anyone's career as I do this. But you do have a much more viable competitive threat on the landscape. And that is quite simply using the open-source version of InfluxDB. What is the tipping point where, “Huh, I can run this myself,” turns into, “But I shouldn't. I should instead give money to other people to run it for me.” Because having been an engineer, where I believe I'm the world's greatest everything, when it comes to my environment—a fact provably untrue, but that hubris never quite goes away entirely—at what point am I basically being negligent not to start dealing with you in a more formalized business context? Brian: First of all, let me say that we have many customers, many developers out there who are running open-source and it works perfectly for them. The workload is just right, the deployment makes sense. And so, there are many production workloads running on open-source. But typically, the kind of big turning point for people is on scale, scale, and overall performance related to that. And so, that's typically when they come and look at one of the two commercial offers. So, to start, open-source is a great place to, you know, kind of begin the journey, check it out, do that level of experimentation and kind of proof of concept. We also have 60,000-plus developers using our introductory cloud service, which is a free service. You simply sign up and can begin immediately putting data into the platform and building queries, and you don't have to worry about any of the setup and running servers to deploy software. So, both of those, the open-source and our cloud product, are excellent ways to get started. And then when it comes time to really think about building in production and moving up in scale, we have our two commercial offers. And the first of those is InfluxDB Cloud, which is our cloud-native, fully managed by InfluxData offering. We run this not only in AWS but also in Google Cloud and Microsoft Azure. It's a usage-based service, which means you pay exactly for what you use, and the three components that people pay for are data in, number of queries, and the amount of data you store in storage. We also, for those who are interested in actually managing it themselves, have InfluxDB Enterprise, which is a software subscription-based model, and it is self-managed by the customer in their environment. Now, that environment could be their own private cloud, it also could be on-premises in their own data center. And so, lots of folks who are a little bit more oriented to kind of manage software themselves rather than using a service gear toward that. But both those commercial offers, InfluxDB Cloud and InfluxDB Enterprise, are really designed for, you know, massive scale. In the case of Cloud, I mentioned earlier with the new storage engine, you can hit unlimited cardinality, which means you have no limit on the number of time series you can put into the platform, which is a pretty big game-changing concept. And so, that means however many time-series sources you have and however many series they're emitting, you can run that without a problem without any sort of upper limit in our cloud product.
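For readers not steeped in the term, cardinality here is roughly the count of distinct series, and a series is, loosely, a unique combination of measurement and tag values, which is why it grows multiplicatively. The sketch below is an editorial back-of-the-envelope; the tag names and value counts are entirely hypothetical.

```python
# A back-of-the-envelope look at why cardinality balloons: the number of
# distinct series is roughly the number of unique tag combinations.
# Tag names and value counts below are hypothetical.
from math import prod

tag_value_counts = {
    "device_id": 50_000,       # one per connected device
    "region": 12,
    "firmware_version": 40,
    "metric_name": 25,
}

worst_case_series = prod(tag_value_counts.values())
print(f"worst-case cardinality: {worst_case_series:,}")
# -> worst-case cardinality: 600,000,000
```

Not every combination occurs in practice, but even a fraction of that figure is the kind of volume the "unlimited cardinality" claim is aimed at.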
Over on the enterprise side with our self-managed product, that means you can deploy a cluster of whatever size you want. It could be a two-by-four, it could be a four-by-eight, or something even larger. And so, it gives people that are managing in their own private cloud or in a data center environment, really their own options to kind of construct exactly what they need for their particular use case. Corey: Does your object storage layer make it easier to dynamically change clusters on the fly? I mean, historically, running things in a pre-provisioned cluster with EBS volumes or local disk was, “Oh, great. You want to resize something? Well, we're going to be either taking an outage or we're going to be building up something, migrating data live, and there's going to be a knife-switch cutover at some point that makes things relatively unfortunate.” It seems that once you abstract the storage layer away from anything resembling an instance that you would be able to get away from some of those architectural constraints. Brian: Yeah, that's really the promise, and what is delivered in our cloud product is that you no longer, as a developer, have to think about that if you're using that product. You don't have to think about how big the cluster is going to be, you don't have to think about these kind of disaster scenarios. It is all kind of pre-architected in the service. And so, the things that we really want to deliver to people, in addition to the elimination of that concern for what the underlying infrastructure looks like and how it's operating. And so, with infrastructure concerns kind of out of the way, what we want to deliver on are kind of the things that people care most about: real-time query speed. So, now with this new storage engine, you can query data across any time series within milliseconds, 100 times faster queries against high-cardinality data that were previously impossible. And we also have unlimited time-series volume. Again, any total number of time series you have, which is known as cardinality, is now able to run without a problem in the platform. And then we're also kind of opening up, we're opening up the aperture a bit for developers with SQL language support. And so, this is just a whole new world of flexibility for developers to begin building on the platform. And again, this is all in the way that people are using the product without having to worry about the underlying infrastructure. Corey: For most companies—and this does not apply to you—their core competency is not running time-series databases and the infrastructure attendant thereof, so it seems like it is absolutely a great candidate for, “You know, we really could make this someone else's problem and let us instead focus on the differentiated thing that we are doing or building or complaining about.” Brian: Yeah, that's a true statement. Typically what happens with time-series data is that people, first of all, don't realize they have it, and then when they realize they have time-series data, you know, the first thing they'll do is look around and say, “Well, what do I have here?” You know, I have this relational database over here or this document database over here, maybe even this, kind of, search database over here, maybe that thing can handle time series. And in a light manner, it probably does the job.
But like I said, the sources of data and just the volume of time series are expanding, really across all these different use cases, exponentially. And so, pretty quickly, people realize that that thing that may be able to handle time series in some minor manner is quickly no longer able to do it. They're just not purpose-built for it. And so, that's where really they come to a product like Influx to really handle this specific problem. We're built specifically for this purpose, and so as the time-series workload expands, when it kind of hits that tipping point, you really need a specialized tool. Corey: Last question, before I turn you loose to prepare for re:Invent, of course—well, I guess we'll ask you a little bit about that afterwards, but first, we can talk a lot theoretically about what your product could or might theoretically do. What are you actually seeing? What are the use cases that, other than the stereotypical ones we've talked about, what have you seen people using it for that surprised you? Brian: Yeah, some of it is—it's just really interesting how it connects to, you know, things you see every day and/or use every day. I mean, chances are, many people listening have probably used InfluxDB and, you know, perhaps didn't know it. You know, if anyone has been to a home that has Tesla Powerwalls—Tesla is a customer of ours—then they've seen InfluxDB in action. Tesla's pulling time-series data from these connected Powerwalls that are in solar-powered homes, and they monitor things like health and availability and performance of those solar panels and the battery setup, et cetera. And they're collecting this at the edge and then sending that back into the hub where InfluxDB is running on their back end. So, if you've ever seen this deployed, like, that's InfluxDB running behind the scenes. Same goes, I'm sure many people have a Nest thermostat in their house. Nest monitors the infrastructure that actually powers that collection of IoT data. So, you think of this as InfluxDB running behind the scenes to monitor what infrastructure is standing up that back-end Nest service. And this includes their use of Kubernetes and other software infrastructure that's run in their platform for collecting, managing, transforming, and analyzing all of this aggregate device data that's out there. Another one, especially for those of us that streamed our minds out during the pandemic: Disney+ entertainment, streaming, and delivery of that to applications and to devices in the home. And so, you know, this hugely popular Disney+ streaming service is essentially a global content delivery network for distributing all these, you know, movies and video series to all the users worldwide. And they monitor the movement and performance of that video content through this global CDN using InfluxDB. So, those are a few where you probably walk by something like this multiple times a week, or in the case of Disney+, probably watching it once a day. And it's great to see InfluxDB kind of working behind the scenes there.
Instead, it's the thing that empowers a capability behind that product or service that is often taken for granted, just because until you understand the dizzying complexity, particularly at scale, of what these things have to do under the hood, it just—well yeah, of course, it works that way. Why shouldn't it? That's an expectation I have of the product because it's always had that. Yeah, but this is how it gets there. Brian: Our thesis really is that data is best understood through the lens of time. And as this data is expanding exponentially, time becomes increasingly the, kind of, common element, the common component that you're using to kind of view what happened. That could be what's running through a telecom network, what's happening with the devices that are connected to that network, the movement of data through that network, and when, what's happening with subscribers and content pushing through a CDN on a streaming service, what's happening with climate and home data in hundreds of thousands, if not millions, of homes through a common device like a Nest thermostat. All of these things, they attach to some real-world collection of data, and as long as that's happening, there's going to be a place for time-series data and tools that are optimized to handle it. Corey: So, my last question—for real this time—we are recording this the week before re:Invent 2022. What do you hope to see, what do you expect to see, what do you fear to see? Brian: No fears. Even though it's Vegas, no fears. Corey: I do have the super-spreader event fear, but that's a separate— Brian: [laugh]. Corey: That's a separate issue. Neither one of us are deep into the epidemiology weeds, to my understanding. But yeah, let's just bound this to tech, let's be clear. Brian: Yeah, so first of all, we're really excited to go there. We'll have a pretty big presence. We have a few different locations where you can meet us. We'll have a booth on the main show floor, we'll be in the marketplace pavilion, as I mentioned, InfluxDB Cloud is offered across the marketplaces of each of the clouds, AWS, obviously in this case, but also in Azure and Google. But we'll be there in the AWS Marketplace pavilion, showcasing the new engine and a lot of the pretty exciting new use cases that we've been seeing. And we'll have our full team there, so if you're looking to kind of learn more about InfluxDB, or you've checked it out recently and want to understand kind of what the new capability is, we'll have many folks from our technical teams there, from our development team, some of our field folks like the SEs, and some of the product managers will be there as well. So, we'll have a pretty great collection of experts on InfluxDB to answer any questions and walk people through, you know, demonstrations and use cases.
So, the nature walk is one of my favorite parts of AWS re:Invent.Corey: Well, I appreciate that. Thank you. And thank you for your time today. I will let you get back to your no doubt frenzied preparations. At least they are on my side.Brian: We will. Thanks so much for having me and really excited to do it.Corey: Brian Mullen, CMO at InfluxData, I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that you naively believe will be stored as a TXT record in a DNS server somewhere rather than what is almost certainly a time-series database.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
We cover how to sustain long-term transformational projects with Paul Dix, CTO & Founder @ InfluxData! This high-energy conversation reveals the history behind InfluxDB and its multi-phase, long-term transformation over the past 10 years. Plus we discuss how to know when it's time to take your company to the next level, identifying the right people for your eng teams, integrating multiple teams into an org re-architecture, and building open-source products/communities! ABOUT PAUL DIX Paul (@PaulDix) is the creator of InfluxDB. He has helped build software for startups, large companies, and organizations like Microsoft, Google, McAfee, Thomson Reuters, and Air Force Space Command. He is the series editor for Addison-Wesley's Data & Analytics book and video series. In 2010 Paul wrote the book Service Oriented Design with Ruby and Rails. In 2009 he started the NYC Machine Learning Meetup. Paul holds a degree in computer science from Columbia University. "What I need is a small team of focused people who are on board, who can be focused on getting this done and we'll prove it out as we go. And I think the mistake I made with the 2.0 cloud product was we got way too many people involved way too quickly, right? I think for the initial phases of a project, it's actually advantageous to have a smaller team." - Paul Dix. Interested in joining an ELC Peer Group? ELC's Peer Groups provide a virtual, curated, and ongoing peer learning opportunity to help you navigate the unknown, uncover solutions, and accelerate your learning with a small group of trusted peers. Apply to join a peer group HERE: sfelc.com/peerGroups SHOW NOTES: The history behind InfluxDB & its multi-phase, long-term transformation (1:53) InfluxDB's first transformational phase featuring time series data (5:48) Phase 2.0 & shifting to a cloud-first delivery model (7:50) Challenges & opportunities faced in the current phase of InfluxDB (9:31) How Paul decided it was time to take the company to the next level (11:38) Making a bet on Rust (14:25) Why making an early announcement helped push Phase 3.0 forward (16:02) Strategies for identifying the right people for your eng team (19:06) How to optimize community insights when tailoring your vision (21:56) Tips for resolving disagreements between eng team members (24:45) Frameworks for executing long-term vision & achieving alignment (26:21) Processes for integrating other teams into an org's re-architecture (29:55) The impact of Conway's Law on team structure & open-source software (32:07) Considerations for managing large, open-source projects (36:40) Rapid fire questions (37:56) LINKS AND RESOURCES “The Happiness Hypothesis” by Jonathan Haidt - Each chapter is an attempt to savor one idea that has been discovered by several of the world's civilizations - to question it in light of what we now know from scientific research, and to extract from it the lessons that still apply to our modern lives. “The Fate of Rome” by Kyle Harper - How devastating viruses, pandemics, and other natural catastrophes swept through the far-flung Roman Empire and helped to bring down one of the mightiest civilizations of the ancient world
About PerryPerry Krug currently leads the Shared Services team which is focused on building tools and managing infrastructure and data to increase the productivity of Couchbase's Sales and Field organisations. Perry has been with Couchbase for over 12 years and has served in many customer-facing technical roles, helping hundreds of customers understand, deploy, and maintain Couchbase's NoSQL database technology. He has been working with high performance caching and database systems for over 15 years.Links Referenced: Couchbase: https://www.couchbase.com/ Perry's LinkedIn: https://www.linkedin.com/in/perrykrug/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: InfluxDB is the smart data platform for time series. It's built from the ground-up to handle the massive volumes and countless sources of time-stamped data produced by sensors, applications, and systems. You probably think of these as logs.InfluxDB is programmable and performant, has a common API across the platform, and handles high granularity data–at scale and with high availability. Use InfluxDB to build real-time applications for analytics, IoT, and cloud-native services, all in less time and with less code. So go ahead–turn your apps up to 11 and start your journey to Awesome for free at InfluxData.com/screaminginthecloudCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's episode is a promoted guest episode brought to us by our friends at Couchbase. Now, I want to start off by saying that this week is AWS re:Invent. And there is Last Week in AWS swag available at their booth. More on that to come throughout the next half hour or so of conversation. But let's get right into it. My guest today is Perry Krug, Director of Shared Services over at Couchbase. Perry, thanks for joining me.Perry: Hey, Corey, thank you so much for having me. It's a pleasure.Corey: So, we're recording this before re:Invent, so the fact that we both have, you know, personality and haven't lost our voices yet should probably be a bit of a giveaway on this. But I want to start at the very beginning because unlike people who are academically successful, I tend to suck at doing the homework, across the board. Couchbase has been around for a long time. We've seen the company do a bunch of different things, most importantly and notably, sponsoring my ridiculous nonsense for which I thank you. But let's start at the beginning. 
What is Couchbase? Perry: Yeah, you're very welcome, Corey. And it's again, it's a pleasure to be here. So, Couchbase is an enterprise database company at the very top level. We make database software and we distribute that to our customers. We have two flavors, two ways of getting your hands on it. One is the kind of legacy, what we call self-managed, where you the user, the customer, downloads the software, installs it themselves, sets it up, manages the cluster, monitoring, scaling, all of that. And that's, you know, a big part of our business. Over the last few years we've identified, and certainly others in the industry have as well, the desire for users to access database and other technology in a hosted, Software-as-a-Service, pay-as-you-go, cloud-native, buzzword, et cetera, et cetera, vehicle. And so, we've released Couchbase Capella, which is our fully managed, fully hosted database-as-a-service, running in—currently—Amazon and Google, soon to be Azure as well. And it wraps and extends our core Couchbase Server product into a, as I mentioned, hosted and managed platform that our users can now come to and consume as developers and build their applications while leaving all of the operational and administration—monitoring, managing failover expansion, all of that—to us as the experts. Corey: So, you folks are a non-relational database, NoSQL in the common parlance, which is odd because they call it NoSQL, yet they keep making more of them, so I feel like that's sort of the Hollywood model where okay, that was so good, we're going to do it again. Where did NoSQL come from? Because back when I was learning databases, when dinosaurs roamed the earth, it was all about relational models, like we're going to use a relational database because when the only tool you have is an axe, every problem looks like hours of fun. What gave rise to this, I guess, Cambrian explosion that we've seen of NoSQL options that proliferate o'er the land? Perry: Yeah, a really, really good question, and I like the axe-throwing metaphor. So sure, 20, 30, 40 years ago now, as digital applications needed a place to store their data, the world invented relational databases. And those were used and continue to be used very well for what they were designed for, for data that follows a very strict structure that doesn't need to be served at significant scale, does not need to be replicated geographically, does not need to handle data coming in from different sources and those sources changing their formats of things all the time. And so, I'm probably as old as you are and have been around when the dinosaurs were there. We remember this term called ‘Web 2.0.' Kids, you're going to have to go look that up in the dictionary or TikTok it or something. But Web 2.0 really was the turning point when websites became web applications. And suddenly, there was the introduction of MySpace and Facebook and Amazon and Google and LinkedIn, and a number of others, and they realized that relational databases were not going to meet their needs, whether it be performance, whether it be flexibility, whether it be changing of data models, whether it be introducing new features at a rapid pace. They tried; they stretched them, they added a bunch of different databases together, and it really was not going to be a viable solution.
So, 10, maybe 15 years ago now, you started to see the rise of these tech giants—although we didn't call them tech giants back then, but they were the precursors to today's—invent their own new databases. So, Amazon had theirs, Google has theirs, LinkedIn, and a number of others. These companies had reached a level of scale and reached a level of user base, had reached a level of data requirement, had reached a level of expectation with their customers. These customers, us, the users, us consumers, we expect things to be fast, we expect them to be always available. We expect Facebook to give us our news feed in milliseconds. We expect Google to give us our website or our search results immediately, with more and more information coming along with them. And so, it was these companies that hit those requirements first. The only solution for them was to start from scratch and rewrite their own databases. Fast forward five, six, seven years, and we as consumers turned around and said, “Look, I really liked the way Facebook does things. I really like the way Google does things. I really like the way Amazon does things. “Bank of America, can you do the same? IRS, can you do the same? Health care vendor number one, two, three, and four, government body, can you all give me the same experience? I want my taxi to tell me exactly where it's going to take me from one place to another, I want it to give me a receipt immediately after I finish my ride. Actually, I want to be able to change my payment method after I paid for that ride because I used the wrong one.” All of these are expectations that we as consumers have taken from the tech giants—Apple, LinkedIn, Facebook—and turned around to nearly every other service that we interact with on a daily basis. And all of a sudden, the requirements that Facebook had, that Google had, that no other company had, you know, outside of the top five, suddenly were needed by every single industry, nearly every single company, in order to be competitive in their markets. Corey: And there's no way to scale relational to get to a point where it can wind up handling those types of workloads efficiently? Perry: Correct, correct. And it's not just that the technology cannot do it—everything is technically feasible—but the cost, both financially and time-to-market-wise, in order to do that in a relational database was untenable. It either cost too much money, or it cost too much developer time, or cost too much of everybody's time to try to shoehorn something into it. And then you have the rise of cloud and containers, which relational databases, you know, never even had the inkling of a thought that they might need to be able to handle someday. And so, these requirements that consumers have placed on everything else that they interact with really led to the rise of NoSQL as a commodity or as a database for the masses. LinkedIn is not in the business of developing a database and then selling it to everybody else to use as a database, right? They built it for themselves, they made their service better. And so, what you see is some of those founding fathers created databases, but then had no desire to sell them to others.
And then after that followed the rise of companies like Couchbase and a number of others who said, “Look, we think we can provide those capabilities, we think we can meet those requirements for everybody.” And thereby rose the plethora of NoSQL databases because everybody had a little bit different of an approach to it. If you ask ten people what NoSQL is about, you're going to get eleven or twelve different answers. But you can kind of distill that into two categories. One is performance and operations. So, I need it to be faster, I need it to be scalable, I need it to be replicated geographically. And that's what NoSQL is to me. And that's the right answer. And so, you have things like Cassandra and Redis that are meant to be fast and scalable and replicated. You ask another group and they're going to tell you, “No, no, no. NoSQL needs to be flexible. I need to get rid of the rigid database schemas, I need to bring JSON or other data formats in and munge all this data together and create something cool and new out of it.” And thereby you have the rise of things like MongoDB, who focused nearly exclusively on the developer experience of working with data. And for a long time, those two were in opposite camps, where you have the databases that did performance and the databases that did flexibility. I'm not here to say that Couchbase is the ultimate kitchen sink for everything, but we've certainly tried to approach both of those challenges together so that you can have something that scales and performs and can be flexible enough in data model. And everybody else is trying to do the same thing, right? But all these databases are competing for that same nirvana of the best of both worlds. Corey: And it almost feels like there's a convergence play in place where everything now is trying to go away from the idea of, “Oh, yeah, we started off as a purpose-built database, but you can use this for everything.” And I don't necessarily know that is going to be the path that a lot of companies want to go down. Where do you view Couchbase as, I guess, falling down? In other words, what workloads is Couchbase inappropriate for? Perry: Yeah, that's a good question. And my [crosstalk 00:10:35]— Corey: Anyone who can't answer that one is a zealot and that's one of those okay, let's be very careful and not take our eyes off you for one second, while smiling and backing away slowly. Perry: Let's cut to commercial. No, I mean, there certainly are workloads that, you know, in the past, we've not been good for that we've made improvements to address. There are workloads that we have not addressed well today that we will try to address in the future, and there are workloads that we may never see as fitting in our wheelhouse. The biggest category group that comes to mind is Couchbase is not an archival database. We are not meant to have data put in us that you don't care about, that you don't want to—that you just need to keep around, but you don't ever need to access. And there are systems that do that well, they do that at a solid total cost of ownership. And Couchbase is meant for operational data. It's meant for data that needs to be interacted with, read and/or written, at scale and at a reasonable performance to serve a user-facing or system-facing application. And we call ourselves a general-purpose database. Mongo and others call themselves that as well.
Oracle calls itself a general-purpose database, and yet, not everybody uses Oracle for everything.So, there are reasons that you—Corey: Who could afford that?Perry: Who could? Exactly. It comes down to cost, ultimately. So, I'm not here to say that Couchbase does everything. We like to think, and we're trying to target and strive towards an 80%, right? If we can do 80% of an application or an organization's workloads, there is certainly room for 20% of other workloads, other applications, other requirements that can be met or need to be met by purpose-built databases.But if you rewind four or five years, there was this big push towards polyglot persistence. It's a buzzword that came and kind of has gone out of fashion, but it presented the idea that everybody is going to use 15 different databases and everybody is going to pick the right one for exactly the workload and they're going to somehow stitch them all together. And that really hasn't come to fruition either. So, I think there's some balance, where it's not one to rule them all, but it's also not 15 for every company. Some organizations just have a set of requirements that they want to be met and our database can do that.Corey: Let's continue our tour of the competitive landscape here now that we've handled the relational side of the world. The best database, as anyone who's listened to this show knows, is of course, Amazon's Route 53 TXT records stuffed into DNS, especially in the NoSQL land. Clearly, you're all fighting for second place after that. How do you stack up against the idea of legitimately using that approach? And for those who are not in on the joke, please don't do this. It is not the right answer. But I'm curious to get your take as to why DNS TXT records are an inappropriate NoSQL option.Perry: Well, it's a joke, right? And let's be clear about that. But—Corey: I have to say that because otherwise, someone tries it in production. I've gotten that wrong a few times, historically, so now I put a disclaimer in because yeah, it's only funny, so long as people are in on the joke. If not, and I lead someone down the primrose path to disaster, I feel bad. So, let's be very clear. We're kidding.Perry: And I'm laughing. I'm laughing here behind the camera. I am. I am.Corey: Yeah.Perry: So, the element of truth that I think Couchbase is in a position, or I'm in a position to kind of talk about is, 12 years ago, when Couchbase started, we were a key-value database and that's where we saw the best part of the market in those days, and where we were able to achieve the best scale and replication and performance, and fairly quickly realized that simple key-value, though extremely valuable and easy to manage, was not broad enough in requirements-meeting. And that's where we set our sights on and identified the larger, kind of, document database group, which is really just a derivative of key-value, where still everything is a key and a value; it's just now a document that you can reason about, that you can create an index on, that you can query, that you can run full-text search on, you can do much more with the data. So, at our core, we are still a key-value database. When that value is JSON, we become a document database. And so, if Route 53 decided that they wanted to enter into the document database market, they would need to be adding things that allowed you to introspect and ask questions of the data within that text which you can't, right?Corey: Well, not with that attitude. 
But yeah, I agree with you. Perry: [laugh]. Corey: Moving up the stack, let's talk about a much more fearsome competitor here that I'm certain you see in an awful lot of deals that you wind up closing, specifically your own open-source product. You historically have wound up selling software into environments, I believe, you referred to as your legacy offering where it's the hosted version of your commercial software. And now of course, you also have Capella, your cloud-hosted version. But open-source looks surprisingly compelling for an awful lot of use cases and an awful lot of folks. What's the distinction? Perry: Sure. Just to correct a little bit the distinction, we have Couchbase Server, which we provide as what we call self-managed, where you can download it and install it yourself. Now, you could do that with the open-source version or you could do that with our Enterprise Edition. What we've then done is wrapped that Enterprise Edition in a hosted model, and that's Capella. So, the open-source version is something we've long been supporters of; it's been a core part of our go-to-market for the last 12 or 13 years or so and we still see it as a strong offering for organizations that don't need the added features, the added capabilities, don't need the support of the experts that wrote the software behind them. Certainly, we contribute and support our community through our forums and Discord and other channels, but that's a very big difference from two o'clock in the morning, something's not working and I need a ticket to track. We don't do that for our community edition. So, we see lots of users downloading that, picking it up, building it into their applications, especially applications that are in their infancy or are with organizations that simply can't afford the added cost and therefore they don't get the added benefit. We're not here to gouge and carve out every dollar that we can, but if you need the benefit that we can provide, we think there's value in that and that's what we're trying to run a business as.
Corey: There's definitely something to be said for outsourcing some of the pain, some of the challenge around an awful lot of it.
Perry: There's a natural progression to the cloud for that, with Software-as-a-Service and database-as-a-service, where you're now outsourcing even more by running on our hosting platform. No longer do you have to download the binary and install it yourself; no longer do you have to set up the cluster and watch it in case it has a blip or a statistic goes up too far. We're taking care of that for you. So yes, you're paying for that service, but you're getting the value of not having to be a database manager, let alone a database developer.
Corey: Love how serverless helps you scale big and ship fast, but hate debugging your serverless apps? With Lumigo's serverless observability, it's fast and easy (and maybe a little fun, too). End-to-end distributed tracing gives developers full clarity into their most complex serverless and containerized applications, connecting every service from AWS Lambda and Amazon ECS to DynamoDB, API Gateways, Step Functions and more. Try Lumigo free and debug 3x faster, reduce error rate and speed up development. Visit snark.cloud/lumigo. That's snark.cloud/L-U-M-I-G-O.
Corey: What is the point of distinction between Couchbase Server and Couchbase Capella? To be clear: your self-hosted versus managed cloud offerings. When is one appropriate versus the other?
Perry: Well, I'm supposed to say that Capella is always the appropriate choice, but there are currently a number of situations where Capella is not available in particular regions or cloud providers, and so downloading and running the software yourself, certainly in your own—yes, there are people who still run their own data centers. I know it's taboo and we don't like to talk about it, but there are people who run on-premises, and Couchbase Capella is not available for them. But Couchbase Server is the original Couchbase database and it is the core of Couchbase Capella. So, "wrapping" doesn't give it enough credit; we use Couchbase Server to power Couchbase Capella. There's an enormous amount of value added around the core database, but ultimately it's what's behind the scenes of Couchbase Capella. Which I think is a nice benefit, in that when an application connects to either one, it gets the same experience. You can point an application at one versus the other, and because it's the same database running behind the scenes, the behavior, the data model, the query language, the APIs are all the same. That adds a nice level of flexibility for customers that are either moving from one to another or have to have some sort of hybrid approach, which we see in the market today.
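That "point an application at either one" symmetry can be shown with a small, hedged sketch: with the Couchbase Python SDK, roughly only the connection string and credentials change between a self-managed Couchbase Server cluster and Capella. The hostnames and credentials below are placeholders, and the Capella endpoint shape is an assumption based on its couchbases:// TLS scheme.

```python
# Hedged sketch: the same application code targets either deployment;
# only the connection string and credentials differ. All names are placeholders.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions


def connect(conn_str: str, user: str, password: str) -> Cluster:
    """Open a cluster connection; identical for self-managed Server or Capella."""
    return Cluster(conn_str, ClusterOptions(PasswordAuthenticator(user, password)))


# Self-managed Couchbase Server (plain couchbase:// scheme):
# cluster = connect("couchbase://10.0.0.12", "app_user", "s3cret")

# Couchbase Capella (TLS couchbases:// scheme; endpoint format is assumed):
# cluster = connect("couchbases://cb.example.cloud.couchbase.com", "app_user", "s3cret")

# From here on -- buckets, key-value operations, SQL++ queries -- the code is
# the same against either deployment, which is the flexibility described above.
```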
Corey: Let's talk economics for a second. I can see scenarios—especially when you have a high-volume environment sending tremendous amounts of data back and forth—where as soon as that traffic crosses an availability zone boundary or a region boundary, or, God forbid, goes out to the internet via standard egress fees over in AWS-land, there's a radically different economic model that comes into play, as opposed to having everything in the same availability zone and the same subnet, where all traffic back and forth is free. Do you see that in your customer base—that that is a model driving people towards self-hosting?
Perry: No. And I'd say no because Capella allows you to peer and run your application in the same availability zone as the database. And so, as long as that's an option for you—that we have our offering in the right region, in the right AZ, and you can put your application there—then that's not an issue. We did have a customer not too long ago that didn't set that up correctly—they thought they did—and we noticed some high data transfer charges. Again, that's the benefit of running a hosted service: we detected that for them and were able to turn around and say, "Hmm, you might want to move this over there so that we all save some money." If we were not there watching it, they might not have noticed it themselves running self-managed; they might not have known what to do about it. And so, there's a benefit to working with us and using that hosted platform: we can keep an eye out, and we can apply all of our learning and best practices and bug fixes, and we give that to everybody, rather than each person having to stumble across those hurdles themselves.
Corey: That's one of those fun, weird corner-case trivia things about AWS data transfer. When you're transferring data within the same region between availability zones, it costs a penny per gigabyte on the sending side and a penny per gigabyte on the receiving side. Everything else is one side or the other that winds up getting the charge. And what makes this especially fun is that when it shows up on your bill, if you transfer a petabyte, it shows as cross-AZ data transfer: two petabytes.
Perry: Two. Yeah.
Corey: So, it double-counts so they can bill for it appropriately, but it leads to some really weird hunting-it-down moments, like, "Okay, well, we found half of it, but where's the other half hiding?" It's always obnoxious to trace this stuff down. The fact that you see it on your bill—well, that's testament to the fact that yeah, they're using the service. Good for them and good for you. Being able to track it down on a per-customer basis—that does speak to your level of insight into what exactly is going on in your environment and where. As someone who does this for a living, let me confirm that is absolutely non-trivial.
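For readers who want the arithmetic behind that "two petabytes on the bill" effect, here is a back-of-the-envelope sketch; the $0.01/GB rate in each direction is the commonly cited intra-region, cross-AZ price and is used purely as an illustration, not as a current price sheet.

```python
# Back-of-the-envelope sketch of intra-region, cross-AZ data transfer billing.
# A per-GB rate is charged on the sending side and again on the receiving side,
# so the bill reports twice the volume that was actually moved.
GB_PER_PB = 1024 * 1024      # 1 PB expressed in GB (binary units, for illustration)
RATE_PER_GB = 0.01           # USD per GB, each direction (illustrative rate)

moved_pb = 1
moved_gb = moved_pb * GB_PER_PB

send_side = moved_gb * RATE_PER_GB
receive_side = moved_gb * RATE_PER_GB

print(f"Data actually moved:    {moved_pb} PB")
print(f"Data shown on the bill: {2 * moved_pb} PB (counted on both sides)")
print(f"Sending-AZ charge:      ${send_side:,.2f}")
print(f"Receiving-AZ charge:    ${receive_side:,.2f}")
print(f"Total cross-AZ charge:  ${send_side + receive_side:,.2f}")
```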
Perry: No, definitely not trivial. And you know, over the last four or five years we've learned an enormous amount about how cloud providers work, how AWS works—but guess what, Google does it completely differently. And Azure does it—
Corey: Yep.
Perry: —completely differently. And so, on the surface level, they're all just cloud providers: they give you a VM and you put some stuff on it. But integrating with the APIs, integrating with the different systems and naming of things, and then understanding the intricacies of the ins and outs—and, yeah, these cloud providers have their own bugs as well, and sometimes you stumble across those for them. It's been a significant learning exercise that I think we're all better off for, having Couchbase go through it for you.
Corey: Let's get this a little bit more germane for this week, for those of you who are listening during re:Invent. You folks are clearly here at the show—it's funny to talk about 'here,' even though when we're recording this, it is not near here; we're actually home and enjoying ourselves, but welcome to temporal dislocation; here we are—here at the show, you folks are, among other things, being kind enough to pass out the Last Week in AWS swag from your booth, which, thank you. So, that is obviously the primary reason that you're at the show. What are the other reasons? What are the secondary reasons that you decided to come here?
Perry: Yeah [laugh]. Well, I guess I have to think about this now, since you already called out the primary reason.
Corey: Exactly. Wait, we can have more than one reason for things? My God.
Perry: Can we? Can we? AWS has long been a huge partner of ours, even before Capella itself was released. I remember, five years or so ago, some 30% of our customers were running Couchbase inside of AWS, and some of our largest were some of your largest at times—like Viber, the messaging platform. And so, we've always had a very strong relationship with AWS, and the better we can present ourselves to your customers, and the more our customers can feel that we are jointly supporting them, the better. And so, you know, coming to re:Invent is a testament to that long-standing and very solid partnership, and it's also meant to get more exposure for us, to make it clear that Couchbase runs very well on AWS.
Corey: It's one of those areas where, when someone says, "Oh yeah, this is a great service offering, but it doesn't run super well on AWS," it's like, "Okay, so are you bad at computers, or is what you have built so broken and Byzantine that it has to live somewhere else?" Or occasionally, the use case is absolutely not supported by AWS. Not to beat them up some more on their egress fees, but I'm absolutely about to: if you're building a video streaming site, you don't want it living in AWS. It won't run super well there. Well, it'll run well; it'll just run extortionately expensively, and that means it's a non-starter.
Perry: Yeah, why do you think Netflix raises their fees?
Corey: Netflix, to their credit, has been really rather public about this: they do all of their egress via their Open Connect custom-built CDN appliances that they drop all over the place. They don't stream a single byte from AWS, and we know this from the outside because they are clearly still solvent.
Perry: [laugh].
Corey: I did the math on that. If I had been streaming at on-demand prices for one month with my Netflix usage, I would have wound up spending four times my subscription fee just in their raw costs for data transfer. And I have it on good authority that data transfer is not their only bill in the entire company; they also have to pay for people and content and the analytics engine and whatnot. And it's kind of a weird, strange world.
Perry: Real estate.
Corey: Yeah. It's one of those strange stories because they are absolutely a showcase customer for AWS. They've been a marquee customer trotted out year after year to talk about what they're doing. But if you attempt to replicate their business purely on top of AWS, it will not work. Full stop. The economics preclude that happening. What is your philosophy these days on what has historically felt like an existential threat to most vendors that I've spoken to, in a variety of ways: what if Amazon decides to enter your market? I'd ask you the same thing. Do you have fears that they're going to wind up effectively taking your open-source offering and turning it into Amazon Basics Couchbase, for lack of a better term? Is that something that is on your threat radar, or is that not really something you concern yourselves about?
Perry: So, I mean, there's no arguing, there's no illusion that Amazon and Google and Microsoft are significant competitors in the database space, along with Oracle and IBM and Mongo and a handful of others.
Corey: Anything's a database if you hold it wrong.
Perry: This is true. This specific point about open source is something we have addressed in the same way others have: by choosing and changing our license model so that it precludes cloud providers from using the open-source database to produce their own service on the back of it. Let me be clear: it does not impact our existing open-source users or anybody who wants to use the Community Edition or download the software and the source code and build it themselves. It's only targeted at Amazon, because they have a track record of doing that to things like Elastic and Redis and Mongo, all of whom have made moves similar to Couchbase's to prevent that through the licensing of their open-source code.
Corey: So, one of the things I do see at re:Invent every year is—and I believe wholeheartedly this comes historically from a lot of AWS's requirements for vendors on the show floor, which have become public in a variety of ways—that for a long time, you were not allowed to mention multi-cloud or reference the fact that you work on any other cloud provider there. So, there's been a theme of "this is why, for whatever it is we sell or claim to sell or hope one day to sell, AWS is the absolute best place for you to run it, full stop." And in some cases, that's absolutely true, because people build primarily for a certain cloud provider and then, when they find customers in other places, they learn to run it over there, too. If I'm approaching this from the perspective of "I have a database problem"—because given my philosophy on databases, it's hard to imagine I don't have database problems—then is my experience going to be better, or even materially different, between any of the cloud providers if I become a Couchbase Capella customer?
Perry: I'd like to say no. We've done our best to abstract and to leverage the best of all of the cloud providers underneath, to provide Couchbase in the best form that they will allow us to. And as far as I can see, there's no difference amongst them. Your application and what you do with the data may be better suited to one provider or another, but it's always been Couchbase's philosophy—or, sort of, strategy—to make our software available wherever our customers and users want to consume it. And that goes for everything from physical hardware running in a data center, to virtual machines on top of that, containers, cloud and different cloud providers, different regions, different availability zones, all the way through to the edge and other infrastructures. We're not in a position to say, "If you want Couchbase, you should use AWS." We're in a position to say, "If you are using AWS, you can have Couchbase."
Corey: I really want to thank you for being so generous with your time, and of course your sponsorship dollars, which are deeply appreciated. Once again, swag is available at the Couchbase booth this week at re:Invent. If people want to learn more, and if for some unfathomable reason they're not at re:Invent—probably because they make good life choices—where can they go to find you?
Perry: couchbase.com. That'll be the best place to land. That takes you to our documentation, our resources, our getting-help and contact pages, and directly into Capella if you want to sign in or log in. I would go there.
Corey: And we will, of course, put links to that in the show notes. Thank you so much for your time. I really appreciate it.
Perry: Corey, it's been a pleasure. Thank you for your questions and banter, and I really appreciate the opportunity to come and share some time with you.
Corey: We'll have to have you back in the near future. Perry Krug, Director of Shared Services at Couchbase. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment berating me for being nowhere near musical enough when referencing [singing] Couchbase Capella.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
In this podcast Shane Hastie, Lead Editor for Culture & Methods, spoke to Rick Spencer of InfluxData about building developer tools, removing friction and improving "mean time to awesomeness," the need for online social intelligence, and ways to avoid burnout.
Read a transcript of this interview: https://bit.ly/3EGfket
Subscribe to our newsletters:
- The InfoQ weekly newsletter: bit.ly/24x3IVq
- The Software Architects' Newsletter [monthly]: www.infoq.com/software-architects-newsletter/
Upcoming Events:
QCon Plus online: plus.qconferences.com/ - Nov 30 - Dec 8, 2022
QCon London: qconlondon.com/ - March 26-31, 2023
QCon San Francisco: qconsf.com/ - Oct 2-6, 2023
Follow InfoQ:
- Twitter: twitter.com/InfoQ
- LinkedIn: www.linkedin.com/company/infoq
- Facebook: bit.ly/2jmlyG8
- Instagram: www.instagram.com/infoqdotcom/
- Youtube: www.youtube.com/infoq
Brian introduces himself, InfluxData, and what time series data is. He then talks about how it compares to other data, and its unique value and benefits. Brian then connects it to the real world by telling us how customers engage with InfluxData's product and the use cases where time series data works well. Ryan and Brian then move into a high-level conversation around challenges in the IoT space and advice for companies trying to recognize where they need to improve.
Brian Gilmore is Director of IoT and Emerging Technology at InfluxData, the creators of InfluxDB. He has focused the last decade of his career on working with organizations worldwide to drive the unification of industrial and enterprise IoT with machine learning, cloud, and other truly transformational technology trends.
InfluxData is the creator of InfluxDB, the leading time series platform. They empower developers and organizations like Cisco, IBM, Siemens, and Tesla to build real-time IoT, analytics, and cloud applications with time-stamped data. Their technology is purpose-built to handle the massive volumes of data produced by sensors, systems, or applications that change over time. Easy to start and scale, InfluxDB gives developers time to focus on the features and functionalities that give their apps a competitive edge.
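As a small illustration of the "collect, store, and analyze time-stamped data" workflow described above, here is a minimal sketch assuming the influxdb-client Python package and an InfluxDB 2.x instance; the URL, token, org, and bucket names are placeholders.

```python
# Minimal sketch, assuming the influxdb-client package and an InfluxDB 2.x server.
# URL, token, org, and bucket are placeholders.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Write one time-stamped sensor reading.
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(
    bucket="iot",
    record=Point("temperature").tag("sensor", "line-3").field("celsius", 21.7))

# Read the last hour of readings back with a Flux query.
flux = '''
from(bucket: "iot")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "temperature")
'''
for table in client.query_api().query(flux):
    for record in table.records:
        print(record.get_time(), record.get_field(), record.get_value())
```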
In this episode, we interview Brian Gilmore, Director of IoT and Emerging Technology at InfluxData. InfluxData is the creator of InfluxDB, a pioneering time series platform that allows developers to build real-time IoT, analytics, and cloud applications with time-stamped data. They handle massive volumes of data produced by sensors, applications, and systems that change over time. Today, we discuss how next-generation databases create new opportunities by enabling organizations to seamlessly integrate real-time IoT data streams with cloud databases. We also dive deep into the relationships between database technology and adjacent innovations in AI, AR, and blockchain.
Key Questions:
- What is the right way to think about "real time" from the perspective of a user?
- What are the unique uses of time series data, and what challenges does it present?
- How are AI, AR, and blockchain being integrated into IoT systems?
- What recent database developments are improving management of complex IoT systems?
Brian Gilmore (@BrianMGilmore, Director IoT/Emerging Technology @InfluxDB) talks about Edge and Industrial Edge Computing, as well as application and data challenges at the edge.
SHOW: 634
CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw
CHECK OUT OUR NEW PODCAST - "CLOUDCAST BASICS"
SHOW SPONSORS:
CloudZero - Cloud Cost Intelligence for Engineering Teams
Streamline on-call, collaboration, incident management, and automation with a free 30-day trial of Lightstep Incident Response, built on ServiceNow. Listeners of The Cloudcast will also receive a free Lightstep Incident Response T-shirt after firing an alert or incident.
Pay for the services you use, not the number of people on your team, with Lightstep Incident Response. Try free for 30 days. Fire an alert or incident today and receive a free Lightstep Incident Response t-shirt.
Datadog Application Monitoring: Modern Application Performance Monitoring
Get started monitoring service dependencies to eliminate latency and errors and enhance your users' app experience with a free 14-day Datadog trial. Listeners of The Cloudcast will also receive a free Datadog T-shirt.
SHOW NOTES:
InfluxData (homepage) - InfluxDB - Time Series Platform
Understanding Time Series Database Platforms (Cloudcast Eps: 394)
Topic 1 - Welcome to the show. Before we get into the fascinating world of Edge and IoT, tell us a little bit about your background, and then where you focus these days with InfluxData.
Topic 2 - It's been a little while since we covered IoT and IIoT. In the past it was somewhat of a fragmented market segment (lots of definitions, lots of different use-cases). How do you summarize the IoT and IIoT markets in 2022?
Topic 3 - We've always said that the sensor part of IoT isn't very interesting, but what a company does with the data is very interesting (and complicated). How do companies think about edge data these days – what aspects of the data are valuable?
Topic 4 - Time Series databases seem like the perfect fit for IoT and IIoT use-cases because they are designed to be both real-time and give historical context (from a time perspective). Is this the case, and why do companies ever consider other types of databases at the edge?
Topic 5 - What are the current best practices for managing data at the edge, in terms of long-term retention and what companies eventually do with the data (analysis, analytics, etc.) to better optimize those edge applications?
Topic 6 - What are some of the emerging trends you're starting to see happen at the edge that maybe weren't on the industry radar a few years ago?
FEEDBACK?
Email: show at the cloudcast dot net
Twitter: @thecloudcastnet
In this episode, Ryan and Bhavin interview Rick Spencer, VP of Products at InfluxData and previously the VP of Platforms for InfluxData. The discussion focuses on InfluxData, creator of InfluxDB, and how they help customers looking for a time-series database solution. Rick talks about interesting IoT and edge computing use cases, and how getting that real-time sensor information can be transformational for customers. In the second half of the discussion, we focus on how InfluxDB Cloud runs across three major cloud providers, on top of Kubernetes. Rick walks through their Kubernetes adoption journey and their architecture today. We also talk about how users can leverage InfluxDB for monitoring their own large-scale Kubernetes deployments.
Show Links:
Datastax raises a private equity round of $115M, valuing the company at $1.6B, to focus more on Astra DB and Astra Streaming - https://techcrunch.com/2022/06/15/datastax-proves-its-still-possible-to-raise-nine-figures-at-higher-valuation-in-2022/
Platform9 raises $26M, bringing the total money raised across all rounds to $100M - https://techcrunch.com/2022/06/14/platform9-raises-26m-to-help-manage-distributed-cloud-clusters/
Finout raised $14M Series A funding to help organizations break down the cost of each Kubernetes namespace, each folder in an S3 bucket, each Snowflake query, etc. - https://www.geektime.com/finout-wants-to-reduce-your-cloud-costs-with-a-mega-bill-and-14m-in-a-round-funding/
Mercedes-Benz runs 900 K8s clusters - https://www.infoworld.com/article/3664052/why-mercedes-benz-runs-on-900-kubernetes-clusters.html
Weave Policy Library for HIPAA compliance - https://www.weave.works/blog/weave-policy-library-introducing-hipaa-policies
Kanister Webinar - https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cncf-on-demand-webinar-kanister-application-level-data-protection-on-kubernetes
Cloud-native storage - Kubernetes Podcast - https://kubernetespodcast.com/episode/182-cloud-native-storage/
2021 K8s Annual Report - https://www.cncf.io/reports/kubernetes-annual-report-2021/
Enhancements#3337: KEP-3333 Retroactive default StorageClass assignment
This week we cover security updates for dpkg, logrotate, GnuPG, CUPS, InfluxDB and more, plus we take a quick look at some open positions on the team - come join us!
The solution many turn to for capturing their streaming data is InfluxDB. In this episode, I interview Brian Gilmore, Director of Product Management at InfluxData, about how real-time applications achieve success built on top of InfluxDB. When most people hear the phrase Internet of Things, it typically evokes an image of connected devices we…