The Data Engineering Show

The Data Engineering Show is a podcast for data engineering and BI practitioners who want to go beyond theory and learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions, in a casual and fun setting.

Who are the Data Bros? Eldad and Boaz Farkash shared the same stuffed toys growing up, as well as a big passion for data. After founding Sisense and building it into a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a high-performance cloud data warehouse serving some of the world’s most advanced tech companies. Their guilty pleasures include analyzing data pipelines and beating each other in endless query performance battles.

The Firebolt Data Bros


    • Apr 8, 2025 LATEST EPISODE
    • monthly NEW EPISODES
    • 34m AVG DURATION
    • 81 EPISODES


    Latest episodes from The Data Engineering Show

    Revolutionizing Data Governance with DataStrato's Unified Open Source Approach

    Apr 8, 2025 23:36


    In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. They discuss data catalogs and how they refine the data management process.

    Beyond Database Optimization with AI

    Mar 19, 2025 30:52


    In this episode of The Data Engineering Show, the bros welcome Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works, and the emerging solutions for database technology in the age of AI.

    AI and Data Movement: Trends and Best Practices with Estuary's Daniel Pálma

    Feb 11, 2025 30:33


    In this episode of The Data Engineering Show, the bros sit with Daniel Pálma, Head of Marketing at Estuary, to delve into the intriguing world of data engineering and marketing. Daniel shares his transition journey into marketing from data engineering and how his technical proficiency has been leveraged to market to engineers. The conversation cuts across the importance of AI in data movement, the future of data engineering, real-time data integration challenges, and the evolution of data integration.

    AI and Data Change Management with Chad Sanderson, CEO Gable AI

    Jan 7, 2025 36:43


    In this episode of The Data Engineering Show, host Benjamin and co-host Eldad are joined by Chad Sanderson, CEO and co-founder of Gable AI, to discuss the revolution of data quality and governance, the importance of understanding data flow, and the processes that help organizations manage their data more effectively.

    Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success

    Nov 26, 2024 24:56


    Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter's grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. Much more than just a software solution, for Wouter, BI is all about change management and aligning leadership with data projects.

    Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan

    Oct 31, 2024 28:02


    This is a special episode of The Data Engineering Show, and joining the Bros is not one guest, nor even two – instead they're revisiting the best bits from three different fascinating episodes. In each, they spotlight essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.

    The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn

    Sep 24, 2024 32:57


    In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project. Ryanne explains how they're simplifying complex data workflows by automating them through SQL queries, integrating Kubernetes, Kafka, and Flink. The conversation highlights the shift towards a consumer-driven data model and the future of data engineering.

    Vector Databases Won't Replace SQL - Andy Pavlo

    Jun 4, 2024 42:59


    SQL's slow. SQL's stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.

    How ZoomInfo transitioned from data graveyards to ROI-driven data projects

    Apr 16, 2024 39:46


    Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo's data stack and unique tech culture.

    Matthew Weingarten from Disney Streaming about Data Quality Best Practices

    Mar 26, 2024 27:21


    Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.

    Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices

    Feb 29, 2024 25:59


    Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods.

    Professors Joe Hellerstein and Joseph Gonzalez on LLMs

    Jan 24, 2024 46:07


    Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They've inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you.

    Megan Lieu on powerful notebooks that enable collaboration

    Jan 1, 2024 31:31


    There are two types of data influencers on LinkedIn: (1) those who talk directly about the products and companies they work for, and (2) those who provide more general guidance, tips, and opinions. Can influencers actually be passionate about the products they're developing and talk about them straightforwardly without sounding salesy? We're kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks. She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.

    Transitioning from software engineering to data engineering

    Nov 22, 2023 29:48


    Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, the guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She's a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills you'll be limited to the rigid capabilities of your stack. But without data engineering skills you'll find it hard to be cost effective and see the bigger picture.

    Vin Vashishta explains why we should stop using dashboards

    Oct 4, 2023 35:45


    Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually introduce knowledge. It's no longer just about the data volume, it's about quality and relevance.

    Joe Reis and Matt Housley on the fundamentals of data engineering

    Sep 6, 2023 42:11


    After co-writing the best-selling book 'Fundamentals of Data Engineering', Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don't really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? How can we focus more on things like data governance and data quality that'll actually push the industry forward?

    Bill Inmon, the Godfather of Data Warehousing

    Aug 8, 2023 30:32


    As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.

    Large-scale data engineering at Momentive.ai - Meenal Iyer

    Jul 12, 2023 38:40


    As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.

    Data engineering from the early 2000s till today - BlackRock

    Jun 8, 2023 41:49


    When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can't scale properly without data modeling? With over 20 years of experience in the data space, leading engineering teams at Cisco, Oracle, Greenplum, and now as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.

    Zach Wilson on what makes a great data engineer

    Apr 27, 2023 34:02


    How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading the knowledge in EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling are more important than the actual tech, why data engineering is going to see more job growth than data science, and what brought him to start creating content, reaching over 250K followers on LinkedIn.

    How ZipRecruiter and Yotpo power self-service data platforms that work

    Mar 23, 2023 45:48


    Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo, talk about building resilient self-service products that keep customers happy and engineers calm. They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.

    Data Observability with Millions of Users - Barr Moses

    Feb 8, 2023 38:36


    Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate in a world where so many different teams are accessing it.

    How Amplitude Engineers Process 5 Trillion Real-time Events

    Jan 5, 2023 27:59


    Weichen Wang, Senior Engineering Manager at Amplitude, came to meet the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 Trillion real-time events while dealing with mutable data and massive scale.

    Making Observability a Key Business Driver

    Nov 29, 2022 48:59


    80% of the code that you write doesn't work on the first try. And that's fine. But knowing which 80% is not working and which 20% is working is the actual challenge. After 10 years at Facebook, managing and scaling the Seattle site to over 6,000 engineers(!), Vijaye Raji founded Statsig to make observability automated and real-time. How is the semantic layer managed? How was the Statsig team able to build an observability product that handles real-time, ever-changing metadata? What are Vijaye's main takeaways from engineering at Facebook? Tune in.

    A ClickHouse Review from a Practitioner's Point of View

    Sep 1, 2022 34:43


    Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days and walks Boaz through his experience with the platform: how, on the one hand, it handled 2B events per minute, but, on the other, it required rollups that compromised granularity when extending time windows. Besides a ClickHouse review from a practitioner's point of view, Sudeep tells us about interesting use cases he's working on at Salesforce.

    How Preset Built a Data-Driven Organization from the Ground Up

    Aug 3, 2022 45:56


    According to Maxime Beauchemin, CEO & Founder at Preset and Creator of Apache Superset and Apache Airflow, it's not so straightforward to understand what you're really getting into and the vastness of the skills required to build a thriving company. Picking the right system and services is key for a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams. Plus, Max walks the bros through the genesis of Airflow, Superset & Presto, and Airflow's old school marketing approach that won the hearts of developers across the world. And just like the Terminator, once the machine takes over, you can't stop.

    The Creator of Airflow About His Recipe for Smart Data-Driven Companies

    Aug 3, 2022 45:56


    According to Maxime Beauchemin, CEO & Founder at Preset and Creator of Apache Superset and Apache Airflow, building a thriving company is not so straightforward. So how did he do it? Choosing the right system and services is key for a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams. Max walks the Bros through his recipe for a smart data-driven company, and the genesis of Airflow, Superset & Presto (with some great tidbits about Airflow's old school marketing approach and how the open source platform took on a life of its own).

    How Similarweb Delivers Customer Facing Analytics Over 100s of TBs

    Jul 14, 2022 37:11


    According to Yoav Shmaria, VP R&D Platform at Similarweb, the best way to manage data warehouse costs is to tag every table, database, or running ETL to have good granularity over every feature. Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics. Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there's no Firebolt talk in this episode.

    How Similarweb Delivers Customer Facing Analytics Over 100s of TBs

    Jul 13, 2022 37:11


    According to Yoav Shmaria, VP R&D Platform at Similarweb, the best way to manage data warehouse costs is tagging every table, database or ETL running to have good granularity over every feature. Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics. Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there's no Firebolt talk in this episode.

    How Klarna Designed a New Data Platform in the Cloud

    Jun 9, 2022 40:37


    Klarna is one of the leading fintech companies in the world, valued at $45B. While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna's Lead Data Engineer, tells Boaz what this new modernized stack looks like.

    How Eventbrite is Modernizing its Data Stack

    May 23, 2022 23:25


    Archana Ganapathi, Head of Data & Analytics Engineering at Eventbrite, shares Eventbrite's data stack modernization process, and how you get engineers to adopt new technologies like dbt which may be outside their comfort zone.
