Welcome to Data Brew by Databricks with Denny and Brooke, where we explore various topics in the data and AI community. In this series, we will interview subject matter experts in data engineering and data science. So join us with your morning brew in hand and get ready to dive deep into data + AI! For this first season, we will be focusing on lakehouses – combining the key features of data warehouses, such as ACID transactions, with the scalability of data lakes, directly against low-cost object stores.
In this episode, Pallavi Koppol, Research Scientist at Databricks, explores the importance of domain-specific intelligence in large language models (LLMs). She discusses how enterprises need models tailored to their unique jargon, data, and tasks rather than relying solely on general benchmarks.
Highlights include:
- Why benchmarking LLMs for domain-specific tasks is critical for enterprise AI.
- An introduction to the Databricks Intelligence Benchmarking Suite (DIBS).
- Evaluating models on real-world applications like RAG, text-to-JSON, and function calling.
- The evolving landscape of open-source vs. closed-source LLMs.
- How industry and academia can collaborate to improve AI benchmarking.
In this episode, Kilian Lieret, Research Software Engineer, and Carlos Jimenez, Computer Science PhD Candidate at Princeton University, discuss SWE-bench and SWE-agent, two groundbreaking tools for evaluating and enhancing AI in software engineering.
Highlights include:
- SWE-bench: A benchmark for assessing AI models on real-world coding tasks.
- Addressing data leakage concerns in GitHub-sourced benchmarks.
- SWE-agent: An AI-driven system for navigating and solving coding challenges.
- Overcoming agent limitations, such as getting stuck in loops.
- The future of AI-powered code reviews and automation in software engineering.
In this episode, Dipendra Kumar, Staff Research Scientist, and Alnur Ali, Staff Software Engineer at Databricks, discuss the challenges of applying AI in enterprise environments and the tools being developed to bridge the gap between research and real-world deployment.
Highlights include:
- The challenges of real-world AI—messy data, security, and scalability.
- Why enterprises need high-accuracy, fine-tuned models over generic AI APIs.
- How QuickFix learns from user edits to improve AI-driven coding assistance.
- The collaboration between research & engineering in building AI-powered tools.
- The evolving role of developers in the age of generative AI.
In this episode, Chang She, CEO and Co-founder of LanceDB, discusses the challenges of handling multimodal data and how LanceDB provides a cutting-edge solution. He shares his journey from contributing to Pandas to building a database optimized for images, video, vectors, and subtitles.
Highlights include:
- The limitations of traditional storage systems like Parquet for multimodal AI.
- How LanceDB enables efficient querying and processing of diverse data types.
- The growing importance of multimodal AI in enterprise applications.
- Future trends in AI, including a shift from single models to holistic AI systems.
- Predictions and "spicy takes" on AI advancements in 2025.
In this episode, Michele Catasta, President of Replit, explores how AI-driven agents are transforming software development by making coding more accessible and automating application creation.
Highlights include:
- The difference between AI agents and copilots in software development.
- How AI is democratizing coding, enabling non-programmers to build applications.
- Challenges in AI agent development, including error handling and software quality.
- The growing role of AI in entrepreneurship and business automation.
- Why 2025 could be the year of AI agents and what's next for the industry.
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on reward models and Reinforcement Learning from Human Feedback (RLHF).
Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/
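For readers who want a concrete picture of one of the techniques mentioned above, here is a minimal, illustrative sketch of the DPO objective for a single preference pair. It is not material from the episode; the function name and the log-probability values are placeholders.

```python
# Illustrative sketch of the Direct Preference Optimization (DPO) loss for one
# preference pair: the policy is nudged to prefer the "chosen" response over the
# "rejected" one, relative to a frozen reference model. All numbers are placeholders.
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    # Implicit reward margin between the chosen and rejected responses.
    margin = (policy_logp_chosen - ref_logp_chosen) - (policy_logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the scaled margin (the standard DPO objective).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical log-probabilities: the chosen response is already slightly preferred.
print(round(dpo_loss(-12.3, -15.8, -13.0, -14.9), 4))
```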
In this episode, Andrew Drozdov, Research Scientist at Databricks, explores how Retrieval Augmented Generation (RAG) enhances AI models by integrating retrieval capabilities for improved response accuracy and relevance.
Highlights include:
- Addressing LLM limitations by injecting relevant external information.
- Optimizing document chunking, embedding, and query generation for RAG.
- Improving retrieval systems with embeddings and fine-tuning techniques.
- Enhancing search results using re-rankers and retrieval diagnostics.
- Applying RAG strategies in enterprise AI for domain-specific improvements.
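As a rough illustration of the retrieval-and-inject loop described above, here is a minimal, self-contained sketch of a RAG prompt builder. The `embed` function is a toy stand-in for a real embedding model, and the names (`retrieve`, `rag_prompt`, the sample docs) are hypothetical, not code from the episode.

```python
# Toy RAG pipeline: embed chunks, rank them against the query, and prepend the
# top matches to the prompt before calling an LLM.
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy "embedding": normalized letter counts. A real system calls an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def rag_prompt(query: str, chunks: list[str]) -> str:
    # Inject the retrieved context ahead of the question before calling an LLM.
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Delta Lake adds ACID transactions to data lakes.",
    "RAG injects retrieved documents into the prompt.",
    "Re-rankers reorder retrieved results for relevance.",
]
print(rag_prompt("How does RAG improve accuracy?", docs))
```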
In this episode, Yev Meyer, Chief Scientist at Gretel AI, explores how synthetic data transforms AI and ML by improving data access, quality, privacy, and model training.
Highlights include:
- Leveraging synthetic data to overcome AI data limitations.
- Enhancing model training while mitigating ethical and privacy risks.
- Exploring the intersection of computational neuroscience and AI workflows.
- Addressing licensing and legal considerations in synthetic data usage.
- Unlocking private datasets for broader and safer AI applications.
In this episode, Julia Neagu, CEO & co-founder of Quotient AI, explores the challenges of deploying Generative AI and LLMs, focusing on model evaluation, human-in-the-loop systems, and iterative development.
Highlights include:
- Merging reinforcement learning and unsupervised learning for real-time AI optimization.
- Reducing bias in machine learning with fairness and ethical considerations.
- Lessons from large-scale AI deployments on scalability and feedback loops.
- Automating workflows with AI through successful business examples.
- Best practices for managing AI pipelines, from data collection to validation.
In this episode, Sharon Zhou, Co-Founder and CEO of Lamini AI, shares her expertise in the world of AI, focusing on fine-tuning models for improved performance and reliability.
Highlights include:
- The integration of determinism and probabilism for handling unstructured data and user queries effectively.
- Proprietary techniques like memory tuning and robust evaluation frameworks to mitigate model inaccuracies and hallucinations.
- Lessons learned from deploying AI applications, including insights from GitHub Copilot's rollout.
Connect with Sharon Zhou and Lamini:
https://www.linkedin.com/in/zhousharon/
https://x.com/realsharonzhou
https://www.lamini.ai/
In this episode, Shashank Rajput, Research Scientist at Mosaic and Databricks, explores innovative approaches in large language models (LLMs), with a focus on Retrieval Augmented Generation (RAG) and its impact on improving efficiency and reducing operational costs.
Highlights include:
- How RAG enhances LLM accuracy by incorporating relevant external documents.
- The evolution of attention mechanisms, including mixed attention strategies.
- Practical applications of Mamba architectures and their trade-offs with traditional transformers.
In this episode, Jure Leskovec, Co-founder of Kumo AI and Professor of Computer Science at Stanford University, discusses Relational Deep Learning (RDL) and its role in automating feature engineering.
Highlights include:
- How RDL enhances predictive modeling.
- Applications in fraud detection and recommendation systems.
- The use of graph neural networks to simplify complex data structures.
Our fifth season dives into large language models (LLMs), from understanding the internals to the risks of using them and everything in between. While we're at it, we'll be enjoying our morning brew. In this session, we interviewed Chengyin Eng (Senior Data Scientist, Databricks), Sam Raymond (Senior Data Scientist, Databricks), and Joseph Bradley (Lead Production Specialist - ML, Databricks) on the best practices around LLM use cases, prompt engineering, and how to adapt MLOps for LLMs (i.e., LLMOps).
We will dive into LLMs for our fifth season, from understanding the internals to the risks of using them and everything in between. While we're at it, we'll be enjoying our morning brew. In this session, we interviewed Omar Khattab - Computer Science Ph.D. Student at Stanford, creator of DSP (Demonstrate–Search–Predict Framework), to discuss DSP, common applications, and the future of NLP.
We will dive into LLMs for our fifth season, from understanding the internals to the risks of using them and everything in between. While we're at it, we'll be enjoying our morning brew. In this session, we interviewed Yaron Singer, CEO of Robust Intelligence, Professor of Computer Science at Harvard University, and guest of Data Brew Season 3 (our first repeat guest!). We discuss generative AI, the trends toward embracing LLMs, and how the surface area for vulnerabilities in generative AI is much bigger.
For our fifth season, we will dive into LLMs, from understanding the internals to the risks of using them and everything in between. While we're at it, we'll be enjoying our morning brew. In this session, we interviewed David Talby, CTO at John Snow Labs, which helps healthcare & life science companies put AI to good use. David's interests include natural language processing, applied artificial intelligence in healthcare, and responsible AI.
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we're at it, we'll be enjoying our morning brew. Shayna Powless and Eli Ankou, a professional cyclist for L39ion of Los Angeles and a defensive tackle for the Buffalo Bills, respectively, provide valuable insight into how professional athletes leverage data to improve their performance and how they combine their passion for sports with the Dreamcatcher Foundation. See more at databricks.com/data-brew
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we're at it, we'll be enjoying our morning brew. Matt Willis, Marin County Public Health Officer, shares the three pillars of public health: education, access, and policy, and the critical role data plays in addressing the COVID-19 pandemic & opioid epidemic. See more at databricks.com/data-brew
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we're at it, we'll be enjoying our morning brew. Running the length of the US every year, Alexandra Matthiesen shares her motivational secrets for running 1,283 consecutive days (and counting!) and redefining physical and mental limits. See more at databricks.com/data-brew
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we're at it, we'll be enjoying our morning brew. Winner of the infamous Last Man Standing race (running 246 miles in 59 hours), Guillaume merges the world of competitive long-distance running with data science to push the boundaries of body and mind. See more at databricks.com/data-brew
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we're at it, we'll be enjoying our morning brew. Alexander Powell chronicles the evolution of sports analytics and how professional sports teams use data as a competitive advantage. See more at databricks.com/data-brew
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we're at it, we'll be enjoying our morning brew. Globally, 38,000 people get hurt on the job every hour. In the United States alone, over $250 billion is spent on workplace injury annually. Sean Petterson, founder and CEO of StrongArm Tech, discusses the role of wearable devices in reducing workplace injury and increasing retention of industrial athletes. See more at databricks.com/data-brew
For our third season, we focus on how leaders use data for change. Whether it's building data teams or using data as a constructive catalyst, we interview subject matter experts from industry to dive deeper into these topics. For our season 3 finale, Nithya Ruff discusses the open-source ecosystem, ways to contribute to open-source projects (hint: it's not just about the code), and how businesses can balance community and company interests. With 95% of open-source contributions coming from men, Nithya also educates us on how to improve diversity & inclusion in the open-source community. See more at databricks.com/data-brew
For our third season, we focus on how leaders use data for change. Whether it's building data teams or using data as a constructive catalyst, we interview subject matter experts from industry to dive deeper into these topics. We interview Junta Nakai in our most unique location yet - Brooklyn Kura - the first non-Japanese sake brewery in New York. In this episode, Junta shares the philosophical, economic, and tactical approaches to sustainability and ESG, as well as the secrets to brewing sake in the US. See more at databricks.com/data-brew
For our third season, we focus on how leaders use data for change. Whether it's building data teams or using data as a constructive catalyst, we interview subject matter experts from industry to dive deeper into these topics. Did you know that the average tenure of a board member is longer than the average tenure of a marriage in the United States? In this episode, Coco Brown discusses the benefits and drawbacks of the long tenures of corporate boards, their current structure, the impact of recent legislation, and the importance of executive education to guide you through all of this. See more at databricks.com/data-brew
For our third season, we focus on how leaders use data for change. Whether it's building data teams or using data as a constructive catalyst, we interview subject matter experts from industry to dive deeper into these topics. What does it mean to make your machine learning system “production-ready”? Yaron Singer walks us through the infrastructure, testing procedures, and more that help make ML systems ready for the real world in this episode of Data Brew. See more at databricks.com/data-brew
For our third season, we focus on how leaders use data for change. Whether it's building data teams or using data as a constructive catalyst, we interview subject matter experts from industry to dive deeper into these topics. Have you ever had a spam call automatically blocked for you? You can thank First Orion for that - in one day they blocked or scam-tagged over 108 million calls - just on T-Mobile alone! In this episode, we have the pleasure of chatting with Charles Morgan and Kent Welch, CEO and CDO, respectively, of First Orion to discuss Arkansan data culture, First Orion's one hundred day program, and team culture. See more at databricks.com/data-brew
For our third season, we focus on how leaders use data for change. Whether it's building data teams or using data as a constructive catalyst, we interview subject matter experts from industry to dive deeper into these topics. In this season opener, Elena Donio shares her experience using data and domain knowledge to disrupt the traditional service and sales compensation model. She also discusses building companies that scale, managing corporate cultural evolution, and the influence of corporate boards. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. We branch, version, and test our code, but what if we treated data like code? Tim Hunter joins us to discuss the open-source Data-Driven Software (DDS) package and how it leads to immense gains in collaboration and decreased runtime for data scientists at any organization. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Is there ever a “one-size-fits-all” approach for feature engineering? Find out this and more with Amanda Casari and Alice Zheng, co-authors of the Feature Engineering for Machine Learning book. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. What does it mean for a model to be “interpretable”? Ameet Talwalkar shares his thoughts on IML (Interpretable Machine Learning), how it relates to data privacy and fairness, and his research in this field. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Erin LeDell shares valuable insight on AutoML, what problems are best solved by it, its current limitations, and her thoughts on the future of AutoML. We also discuss founding and growing the Women in Machine Learning and Data Science (WiMLDS) non-profit. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Good machine learning starts with high-quality data. Irina Malkova shares her experience managing and ensuring high-fidelity data, developing custom metrics to satisfy business needs, and improving internal decision-making processes. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Liam Li is a leading researcher in the fields of hyperparameter optimization and neural architecture search, and is the author of the seminal Hyperband paper. In this session, Liam discusses the evolution of hyperparameter optimization techniques and illustrates how every data scientist can benefit from neural architecture search. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Adam Oliner discusses how to design your infrastructure to support ML, from integration tests to glue code, the importance of iteration, and centralized vs. decentralized data science teams. He provides valuable advice for companies investing in ML and crucial lessons he’s learned from founding two companies. See more at databricks.com/data-brew
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Have you ever wondered how your purchasing behavior may reveal protected attributes? Or how data scientists and businesses play a role in combating bias? We discuss recommendations with Diana Pfeil for reducing bias and improving fairness, from SHAP to adversarial debiasing. See more at databricks.com/data-brew
For our second season, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. In the season opener, Matei Zaharia discusses how he entered the field of ML, best practices for productionizing ML pipelines, leveraging MLflow & the Lakehouse architecture for reproducible ML, and his current research in this field. See more at databricks.com/data-brew
Jules Damji and Tathagata Das guide us through their journey in big data and the evolution of data architecture over the past 30 years. They discuss some of the biggest changes in the industry they’ve seen, as well as trends to look forward to in the coming years. This is a fun episode connecting all four authors of the Learning Spark, 2nd Edition book. See more at databricks.com/data-brew
Ellissa Verseput, ML Engineer at Quby, joins Denny and Brooke to discuss how Quby leverages ML to extract additional value from their data lake and how they manage this process. See more at databricks.com/data-brew
In this session, we discuss lessons learned with Lara Minor, Senior Enterprise Data Manager at Columbia Sportswear, on how her team achieved a 70% reduction in pipeline creation time. This reduced ETL workload times from four hours on their previous data warehouses to minutes, enabling near real-time analytics. Her team migrated from multiple legacy data warehouses, run by individual lines of business, to a single scalable, reliable, performant data lake. See more at databricks.com/data-brew
Delta Lake is an open source storage layer that brings reliability to data lakes. It offers ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs. For our “Demystifying Delta Lake” session, we will interview Michael Armbrust - committer and PMC member of Apache Spark™ and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and Delta Lake. See more at databricks.com/data-brew
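To make the session's subject concrete, here is a minimal PySpark sketch of the basics described above: ACID-transactional writes and unified batch/streaming reads through standard Spark APIs. It assumes the delta-spark package is installed; the table path and columns are illustrative, not examples from the episode.

```python
# Minimal Delta Lake sketch (assumes delta-spark is on the classpath, e.g. via
# `pip install delta-spark` or spark.jars.packages). Paths/columns are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-lake-demo")
    # These two settings enable Delta Lake's SQL extensions and catalog integration.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Batch write: data files plus a transaction log give ACID guarantees on object storage.
events = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "action"])
events.write.format("delta").mode("overwrite").save("/tmp/events")

# Batch read uses the same DataFrame API as any other Spark source.
spark.read.format("delta").load("/tmp/events").show()

# The same table can also be consumed as a stream (unified batch and streaming).
query = (
    spark.readStream.format("delta").load("/tmp/events")
    .writeStream.format("console")
    .option("checkpointLocation", "/tmp/events_checkpoint")
    .start()
)
query.awaitTermination(10)  # let a micro-batch print, then stop for this demo
query.stop()
```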
Legacy approaches have failed to deliver on the promise of a single data architecture that can support every downstream use case from BI to AI. Lakehouse aspires to address this by combining the best of data warehouses and data lakes. Ali Ghodsi, Co-Founder and CEO of Databricks, and David Meyer, SVP of Product at Databricks, explain how. See more at databricks.com/data-brew
In our inaugural episode, we’d like to welcome data warehouse luminaries Barry Devlin, Susan O’Connell, and Donald Farmer to discuss the evolution of data warehouses, data lakes, and lakehouses. See more at databricks.com/data-brew