Colin Zima is the cofounder and CEO of Omni, a data platform that combines the consistency of a shared data model with the speed and freedom of SQL. Omni recently raised a $69M Series B led by ICONIQ Growth. Colin was previously the Chief Analytics Officer at Looker.

Colin's favorite book: Blink (Author: Malcolm Gladwell)

(00:01) Introduction
(01:10) What Is a Data Model and Why It Matters
(03:27) Gaps in the Modern Data Stack
(05:38) The Staying Power of SQL
(07:29) Origin Story: Why Omni Was Created
(10:13) Lessons from Building the MVP
(12:48) Go-to-Market Insights: Zero to Ten Customers
(16:02) Founder-Led Sales and Marketing Tactics
(18:58) Company Building: Recruiting and Product Challenges
(21:34) Product Positioning in a Crowded Market
(23:26) Design Philosophy in Enterprise Software
(28:21) Omni's Tech Stack and Development Strategy
(28:57) Real-World Use of AI Inside the Company
(31:01) Future of Data Tooling and Role of AI
(33:49) Rapid Fire Round

--------

Where to find Colin Zima:
LinkedIn: https://www.linkedin.com/in/colinzima/

--------

Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-infinite
X: https://x.com/prateekvjoshi
In this episode, we sit down with Sridhar Ramaswamy, CEO of Snowflake, for an in-depth conversation about the company's transformation from a cloud analytics platform into a comprehensive AI data cloud. Sridhar shares insights on Snowflake's shift toward open formats like Apache Iceberg and why monetizing storage was, in his view, a strategic misstep.

We also dive into Snowflake's growing AI capabilities, including tools like Cortex Analyst and Cortex Search, and discuss how the company scaled AI deployments at an impressive pace. Sridhar reflects on lessons from his previous startup, Neeva, and offers candid thoughts on the search landscape, the future of BI tools, real-time analytics, and why partnering with OpenAI and Anthropic made more sense than building Snowflake's own foundation models.

Snowflake
Website - https://www.snowflake.com
X/Twitter - https://x.com/snowflakedb

Sridhar Ramaswamy
LinkedIn - https://www.linkedin.com/in/sridhar-ramaswamy
X/Twitter - https://x.com/RamaswmySridhar

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

(00:00) Intro and current market tumult
(02:48) The evolution of Snowflake from IPO to today
(07:22) Why Snowflake's earliest adopters came from financial services
(15:33) Resistance to change and the philosophical gap between structured data and AI
(17:12) What is the AI Data Cloud?
(23:15) Snowflake's AI agents: Cortex Search and Cortex Analyst
(25:03) How did Sridhar's experience at Google and Neeva shape his product vision?
(29:43) Was Neeva simply ahead of its time?
(38:37) The Epiphany mafia
(40:08) The current state of search and Google's conundrum
(46:45) "There's no AI strategy without a data strategy"
(56:49) Embracing open data formats with Iceberg
(01:01:45) The Modern Data Stack and the future of BI
(01:08:22) The role of real-time data
(01:11:44) Current state of enterprise AI: from PoCs to production
(01:17:54) Building your own models vs. using foundation models
(01:19:47) DeepSeek and open source AI
(01:21:17) Snowflake's 1M Minds program
(01:21:51) Snowflake AI Hub
In this podcast episode, we talked with Adrian Brudaru about the past, present, and future of data engineering.

About the speaker:
Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose instead to pursue the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At 30, he had had enough of startups and decided to join a corporation, but quickly found that it did not provide the challenge he wanted.

As going back to startups was not a desirable option either, he decided to postpone the decision by taking freelance work, and he has never looked back. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club
1:05 Discussing trends in data engineering with Adrian
2:03 Adrian's background and journey into data engineering
5:04 Growth and updates on Adrian's company, DLT Hub
9:05 Challenges and specialization in data engineering today
13:00 Opportunities for data engineers entering the field
15:00 The "Modern Data Stack" and its evolution
17:25 Emerging trends: AI integration and Iceberg technology
27:40 DuckDB and the emergence of portable, cost-effective data stacks
32:14 The rise and impact of dbt in data engineering
34:08 Alternatives to dbt: SQLMesh and others
35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions
37:20 Audience questions: Career focus in data roles and AI engineering overlaps
39:00 The role of semantics in data and AI workflows
41:11 Focusing on learning concepts over tools when entering the field
45:15 Transitioning from backend to data engineering: challenges and opportunities
47:48 Current state of the data engineering job market in Europe and beyond
49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats
50:40 Suitability of these formats for batch and streaming workloads
52:29 Tools for streaming: Kafka, SQS, and related trends
58:07 Building AI agents and enabling intelligent data applications
59:09 Closing discussion on the place of tools like dbt in the ecosystem
In this episode, Sean Zinsmeister from Google joins Mark Rittman to discuss the latest developments for Looker, including the integration of Looker Studio, new modeling capabilities, and the exciting potential of generative AI for BI.

We discuss how Looker is evolving to be a more open, composable platform that can power advanced analytics and data storytelling, with Sean sharing insights on Google's purpose-built Gemini models for natural language to SQL translation and how Looker customers can leverage these AI capabilities. We also explore Looker's agentic API strategy, the long-term vision of using Looker Studio as the primary Looker front-end, and the opening up of LookML to tools beyond just Looker.

Driving Looker customer innovations in the generative AI era
Previewing Studio in Looker, the (Eventual) Future of Self-Service Reporting for Looker
Looker now available from Google Cloud console
Delivering the third wave of BI in the AI era with Looker
Drill to Detail Ep.100 Special 'Past, Present and Future of the Modern Data Stack' with Special Guests Keenan Rice, Stewart Bryson and Jake Stein
Drill to Detail Ep.73 'Luck, Thinking Different and Designing Looker Data Platform' with Special Guest Colin Zima
Christophe Blefari is a Staff Data Engineer, co-founder of nao, and author of the best-known data newsletter in the French ecosystem (Blef.fr). A member of the DataGen freelance collective, he is, in my view, one of the biggest data experts in France.
Highlights from this week's conversation include:

Pedram's Background and Journey in Data (0:47)
Joining Dagster Labs (1:41)
Synergies Between Teams (2:56)
Developer Marketing Preferences (6:06)
Bridging Technical Gaps (9:54)
Understanding Data Orchestration (11:05)
Dagster's Unique Features (16:07)
The Future of Orchestration (18:09)
Freeing Up Team Resources (20:30)
Market Readiness of the Modern Data Stack (22:20)
Career Journey into DevRel and Marketing (26:09)
Understanding Technical Audiences (29:33)
Building Trust Through Open Source (31:36)
Understanding Vendor Lock-In (34:40)
AI and Data Orchestration (36:11)
Modern Data Stack Evolution (39:09)
The Cost of AI Services (41:58)
Differentiation Through Integration (44:13)
Language and Frameworks in Orchestration (49:45)
Future of Orchestration and Closing Thoughts (51:54)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Join Mark Rittman in this special end-of-year episode as he speaks with Noel Gomez, co-founder of DataCoves, about the challenges and opportunities of orchestrating dbt and other tools within the open-source Modern Data Stack, navigating the evolving semantic layer landscape, and the future of modular, vendor-agnostic data solutions.

Datacoves Platform Overview
Build vs Buy Analytics Platform: Hosting Open-Source Tools
Scale the benefits of Core with dbt Cloud
Dagster vs. Airflow
This morning, a great article came across my feed that gave me PTSD, asking whether Iceberg is the Hadoop of the Modern Data Stack. In this rant, I bring the discussion back to a central question you should ask about any hot technology: do you need it at all? Do you need a tool built for the top 1% of companies operating at sufficient data scale, or is a spreadsheet good enough?

Link: https://blog.det.life/apache-iceberg-the-hadoop-of-the-modern-data-stack-c83f63a4ebb9
Charlotte Ledoux is a Data Governance expert and a successful LinkedIn content creator (+30K followers). She joined us to deliver a masterclass on her specialty: Data Governance, with a focus on roles.
My guest today is Molham Aref, CEO of RelationalAI, a company that recently closed a $75 million Series B funding round. Molham shares his incredible journey spanning over 30 years in the AI and machine learning space, offering invaluable insights for aspiring entrepreneurs and tech enthusiasts alike. Molham brings a wealth of experience, having previously led LogicBlox and Predictix, and now spearheads RelationalAI's mission to simplify intelligent application development.

→ Website: https://relational.ai/
→ LinkedIn: https://www.linkedin.com/in/molham/

Nataraj is the host and creator of the Startup Project podcast; he is a full-time product manager at Microsoft and an early-stage investor and advisor.

→ LinkedIn: https://www.linkedin.com/in/natarajsindam/
→ Twitter: https://x.com/natarajsindam
→ Email updates: https://startupproject.substack.com/
→ Website: https://thestartupproject.io

Podcast Highlights:
This episode covers a wide range of fascinating topics, from Molham's extensive career journey to the intricacies of the modern data stack and the transformative potential of RelationalAI's technology. We unravel the complexities of descriptive, predictive, and prescriptive analytics, demystifying these crucial concepts for a broader audience. We also discuss the challenges of finding those first five customers in the B2B world, the strategic decision to build on Snowflake, and the potential for future competition and cannibalization by larger platforms. Molham thoughtfully shares his perspective on the current hype surrounding Generative AI and its practical applications in the enterprise space. We finish with advice on leadership, mentorship, and the overall challenges and rewards of a career in tech.
Timestamps:
00:00 - Introduction and Guest Introduction
01:55 - Molham Aref's Career Journey and Transition to RelationalAI
08:30 - Understanding Descriptive, Predictive, and Prescriptive Analytics
12:00 - Early Use Cases and Target Customers for RelationalAI
17:30 - The Decision to Build on Snowflake: Strategy and Competition
22:15 - Securing the First Five Customers in the B2B World
27:40 - The Modern Data Stack and RelationalAI's Place Within It
34:30 - Generative AI: Hype, Reality, and Enterprise Applications
40:00 - Leveraging Generative AI Internally and for Customer Value
45:00 - B2B Sales Strategies: Content, Relationships, and Customer Focus
51:30 - RelationalAI's Future Plans and Growth Strategy
54:00 - Molham's Consumption Habits: Historical Insights and Mentorship
58:30 - Lessons Learned as a Founder and CEO

Don't forget to like and subscribe for more insightful conversations about the world of AI!
→ YouTube: https://youtu.be/9-J4eV8qvZg
→ Spotify: https://open.spotify.com/episode/3Og8mbra1cokQ5cRJdjZn1?si=iqEOqKLLSqSbk8ehkniFqg
→ Apple Podcasts: https://podcasts.apple.com/us/podcast/85-ai-should-not-be-regulated-author-ml-researcher/id1551300319?i=1000673806783
→ Email updates: https://startupproject.substack.com/
→ Others: https://spotifyanchor-web.app.link/e/qYaG6vhTRNb

#ModernDataStack #RelationalAI #AI #MachineLearning #DataAnalytics #PredictiveAnalytics #PrescriptiveAnalytics #GenerativeAI #Snowflake #B2B #Entrepreneurship #TechPodcast #DataManagement #BusinessIntelligence #CloudComputing #TechLeadership #CareerAdvice #Innovation #DataStrategy
Pierre Pessarossi is the Data Science Lead at Back Market, the French unicorn behind a marketplace for refurbished products. He talks to us about their GenAI strategy.
Christophe Blefari is a Staff Data Engineer, author of the best-known data newsletter in the French ecosystem (Blef.fr), and recently co-founder of nao. He is also, in my view, one of the biggest data experts in France.
A founding engineer on Google BigQuery and now at the helm of MotherDuck, Jordan Tigani challenges the decade-long dominance of Big Data and introduces a compelling alternative that could change how companies handle data. Jordan discusses why Big Data technologies are overkill for most companies, how MotherDuck and DuckDB offer fast analytical queries, and lessons learned as a technical founder building his first startup.

Watch the episode with Tomasz Tunguz: https://youtu.be/gU6dGmZzmvI

MotherDuck
Website - https://motherduck.com
Twitter - https://x.com/motherduck

Jordan Tigani
LinkedIn - https://www.linkedin.com/in/jordantigani
Twitter - https://x.com/jrdntgn

FIRSTMARK
Website - https://firstmark.com
Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
Twitter - https://twitter.com/mattturck

(00:00) Intro
(00:56) What is Small Data?
(06:56) Marketing strategy of MotherDuck
(08:39) Processing Small Data with a Big Data stack
(15:30) DuckDB
(17:21) Creation of DuckDB
(18:48) Founding story of MotherDuck
(24:08) MotherDuck's community
(25:25) MotherDuck today ($100M raised)
(33:15) Why are MotherDuck and DuckDB so fast?
(39:08) The limitations and the future of MotherDuck's platform
(39:49) Small Models
(42:37) Small Data and the Modern Data Stack
(46:47) Making things simpler with a shift from Big Data to Small Data
(50:04) Jordan Tigani's entrepreneurial journey
(58:31) Outro
Anaïs Ghelfi is Head of Data Platform at Malt, the platform that connects companies with freelancers. Today she tells us about the AI Assistant project built by the Data teams and its widespread adoption (+50%) across the company's various business teams.
Lukas Schulte is Co-Founder and CEO of SDF Labs (Semantic Data Fabric), the data transformation layer and query engine platform. They're an open-core company powered by the Apache DataFusion query engine. SDF Labs has raised $9M from investors including RTP Global and Two Sigma Ventures. In this episode, we dig into the complications and pain points of the Modern Data Stack, shifting left with data (i.e., moving more over to the client), competing with dbt by adding a query engine, why building in Rust was important, why their CLI is closed source, the importance of a strong partner strategy as a data company, and more!
Enzo Rideau, an expert on Fabric (Microsoft's new tool), is a Microsoft Analytics Solution Leader at delaware, the leading consulting firm for Microsoft and SAP solutions in France. He also founded the House of Fabric podcast and regularly shares content on LinkedIn (with his 10,000+ followers).
Melvyn Peignon is a Product Manager at ClickHouse, the real-time data warehouse used by Netflix, Uber, Disney, and Contentsquare. We cover:
Mark Rittman is joined by returning guest David Jayatillake, VP of AI at Cube.dev, to talk about Delphi Labs' journey from a standalone data analytics chatbot to becoming the basis of Cube's new AI features within its composable semantic model product.

Drill to Detail Ep.102 'LLMs, Semantic Models and Bringing AI to the Modern Data Stack' with Special Guest David Jayatillake
Drill to Detail Ep.107 'Cube, Headless BI and the AI Semantic Layer' with Special Guest Artyom Keydunov
Introducing the AI API and Chart Prototyping in Cube Cloud
A Practical Guide to Getting Started with Cube's AI API
Cube Rollup London: Bringing Cube Users Together
Benn Stancil's weekly Substack on data and technology provides a fascinating perspective on the modern data stack & the industry building it. On this episode, Benn joins Jerod to dissect a few of his essays, discuss opportunities he sees during this slowdown & discuss why he thinks maybe we should disband the analytics team.
Jake Yormak of Story Ventures joins Nate to discuss why hardware is attractive, the most interesting areas in AI outside of GenAI, and the Modern Data Stack.

In this episode we cover:
Concentrating on Early-Stage Companies with Potential for Growth
Investing in Hardware Companies: Challenges and Opportunities
Focusing on Power Law Outliers
AI Commoditization and Its Impact on Profit Pools, with a Focus on Computer Vision and Proprietary Data
AI in Workflows: Incentivizing Users to Contribute Context

Guest Links:
LinkedIn
X
Story Ventures

The hosts of The Full Ratchet are Nick Moran and Nate Pierotti of New Stack Ventures, a venture capital firm committed to investing in founders outside of the Bay Area. Want to keep up to date with The Full Ratchet? Follow us on social. You can learn more about New Stack Ventures by visiting our LinkedIn and Twitter. Are you a founder looking for your next investor? Visit our free tool VC-Rank and we'll send a list of potential investors right to your inbox!
In this episode, we sat down with Tomasz Tunguz (https://twitter.com/ttunguz), the founder of Theory Ventures and a leading voice in the tech investment space. We discussed the transformative potential of Ethereum as a database company, the importance of data security in a decentralized world, and the evolving landscape of AI technologies from foundational models to AI-native applications.
Safiyy Momen and I chat about the good and bad of the Modern Data Stack, controlling cloud costs, boring engineering, and much more. LinkedIn: https://www.linkedin.com/in/safiyy-momen/
Highlights from this week's conversation include:

The Evolution of Data Systems (0:47)
The Role of Open Source Software (2:39)
Challenges of Time Series Data (6:38)
Architecting InfluxDB (9:34)
High Cardinality Concepts (11:36)
Trade-Offs in Time Series Databases (15:35)
High Cardinality Data (18:24)
Evolution to InfluxDB 3.0 (21:06)
Modern Data Stack (23:04)
Evolution of Database Systems (29:48)
InfluxDB Re-Architecture (33:14)
Building an Analytic System with DataFusion (37:33)
Challenges of Mapping Time Series Data into a Relational Model (44:55)
Adoption and Future of DataFusion (46:51)
Externalized Joins and Technical Challenges (51:11)
Exciting Opportunities in Data Tooling (55:20)
Emergence of New Architectures (56:35)
Final Thoughts and Takeaways (57:47)
Modern Data Stack

Hello, this is Hall T. Martin with the Startup Funding Espresso -- your daily shot of startup funding and investing.

The modern data stack is the term for the tools used by tech companies to analyze and integrate data. It's cloud-based, which alleviates many of the challenges of analyzing data with legacy systems. Here are the components of the modern data stack:

Data sources -- this includes databases, company products that produce a stream of data, and event streams that log each action a user takes.
Data warehouse -- these are the tools used to store the voluminous amounts of data that feed data analysis work. This includes data lakes and other large-scale formats for storing the data.
Data analytics -- this includes the ability to query the data sets and apply analytics to the data.
Data transformation -- this moves the data into a format that end users can use for their own queries and analysis.
Data monitoring -- this captures metrics about the data, such as how often the data is being used and for what applications.
Data governance -- this monitors the use of the data to comply with government regulations.
Data applications -- the set of applications that use the data output from the system for purposes such as business intelligence.

In setting up a data analytics program at your company, consider the modern data stack and its components.

Thank you for joining us for the Startup Funding Espresso, where we help startups and investors connect for funding. Let's go startup something today.

For more episodes from Investor Connect, please visit the site. For feedback, please contact info@tencapital.group.
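The layered components described above can be sketched as a tiny pipeline model. This is purely illustrative: the class and layer names below are my own assumptions for the example, not a reference to any particular vendor's product.

```python
# Illustrative sketch only: models the modern data stack's layers as an
# ordered list of components, mirroring the episode's breakdown.
from dataclasses import dataclass, field


@dataclass
class StackComponent:
    layer: str  # e.g. "sources", "warehouse", "transformation"
    role: str   # what this layer contributes to the pipeline


@dataclass
class ModernDataStack:
    components: list = field(default_factory=list)

    def add(self, layer: str, role: str) -> "ModernDataStack":
        # Append a layer and return self so calls can be chained.
        self.components.append(StackComponent(layer, role))
        return self

    def layers(self) -> list:
        return [c.layer for c in self.components]


stack = (
    ModernDataStack()
    .add("sources", "databases, product data streams, user event streams")
    .add("warehouse", "large-scale storage, including data lakes")
    .add("analytics", "query the data sets and apply analytics")
    .add("transformation", "reshape data for end-user queries and analysis")
    .add("monitoring", "metrics on how and where the data is used")
    .add("governance", "compliance with government regulations")
    .add("applications", "BI and other consumers of the output")
)
print(stack.layers())
```

Chaining `add` keeps the layer ordering explicit, which is the main point of the breakdown: each layer consumes what the previous one produces.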
Highlights from this week's conversation include:

Chad's background and journey in data (0:46)
Importance of Data Supply Chain (2:19)
Challenges with Modern Data Stack (3:28)
Comparing Data Supply Chain to Real-world Supply Chains (4:49)
Overview of Gable.ai (8:05)
Rethinking Data Catalogs (11:42)
New Ideas for Managing Data (15:16)
Data Discovery and Governance Challenges (18:51)
Static Code Analysis and AI Impact on Data (24:55)
Creating Contracts and Defining Data Lineage (27:31)
Data Quality Issues and Upstream Problems (32:32)
Challenges with Third-Party Vendors and External Data (34:29)
Incentivizing Engineers for Data Quality (40:28)
Feedback Loops and Actionability in Data Catalogs (45:30)
Missing Metadata (48:57)
Role of AI in Data Semantics (50:27)
Data as a Product (54:26)
Slowing Down to Go Faster (57:38)
Quantifying the Cost of Data Changes (1:01:24)
Investor Sabrina Wu hosts Bobsled Co-founder and CEO Jake Graham for the latest episode of Founded & Funded. Jake is revolutionizing data sharing across platforms, enabling customers to get to analysis faster, directly in the platforms where they work. Madrona co-led Bobsled's $17 million Series A last year, which put the company at an $87 million valuation. In this episode, Jake — who had stints at Neo4j, Intel, and Microsoft — provides his perspective on why enabling cross-cloud data sharing is often cumbersome yet so important in the age of AI. He also shares why you can't PLG the enterprise, how to convince customers to adopt new technologies in a post-zero-interest-rate environment, and what it takes to land and partner with the hyperscalers.

Transcript here: https://www.madrona.com/bobsled-cross-cloud-data-sharing/

(00:00) Introduction
(01:36) Why found a startup?
(03:00) The Genesis of Bobsled: From Inspiration to Reality
(05:26) Understanding Bobsled's Functionality: Cross-Cloud Data Sharing
(09:48) The Role of Cross-Cloud Data Sharing in the Age of AI
(13:05) Redefining the Modern Data Stack and Its Future
(18:04) Navigating Enterprise Sales and Partnerships in Tech
(23:22) Strategic Partnerships and Navigating the Hyperscaler Landscape
(29:40) Leadership Lessons and Vision for Bobsled
Benn Stancil, cofounder and CTO at Mode, returns to The Analytics Engineering Podcast to discuss the evolution of the term "modern data stack" and its value today. Tristan wrote on this idea for The Analytics Engineering Roundup in Is the Modern Data Stack Still a Useful Idea? For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
In this episode, we explore the dynamic world of modern analytics with Tristan Handy, CEO of dbt Labs (https://twitter.com/jthandy). dbt Labs, which helps more than 30,000 enterprises ship trusted data products faster, has raised more than $400 million, most recently at a $4B valuation.

We discuss how dbt has revolutionized analytics engineering, enabling seamless data transformation and orchestration in the cloud. This innovation fosters greater collaboration among data teams and integrates software engineering principles into data analytics workflows.

We also talk about dbt's Semantic Layer, a game-changer that streamlines data operations by standardizing key business metrics for consistent use across various analytical tools.

In this conversation, we tackle pressing questions about the current state and future of data management and analytics. Is the "modern data stack" becoming obsolete? What's next for data engineering? And how is AI reshaping the analytics landscape?

Tune in to discover our insights.
Joe Reis and Matt Housley are back for another listener Q&A. They chat about the demise of the Modern Data Stack, architecture, data modeling, AI, and much more.
My voice is sort of working, and I chat about Tristan Handy's article that raised quite a ruckus this week, "Is the 'Modern Data Stack' Still a Useful Idea?" In the end, the Modern Data Stack won - people use the cloud for analytics. And everything ends, so I'm excited for what's next.

Article: https://roundup.getdbt.com/p/is-the-modern-data-stack-still-a?r=oc02
In this bonus episode, Eric and Kostas preview their upcoming conversation with Artyom Keydunov of Cube Dev.
Summary

The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understands the pain involved and the barriers to productivity, and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode, founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack)

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey, and today I'm welcoming back Tarush Aggarwal to talk about what he and his team at 5X are building to improve the user experience of the modern data stack.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what 5X is and the story behind it?
We last spoke in March of 2022. What are the notable changes in the 5X business and product?
What are the notable shifts in the data ecosystem that have influenced your adoption and product direction?
What trends are you most focused on tracking as you plan the continued evolution of your offerings?
What are the points of friction that teams run into when trying to build their data platform?
Can you describe the design of the system that you have built?
What are the strategies that you rely on to support adaptability and speed of onboarding for new integrations?
What are some of the types of edge cases that you have to deal with while integrating and operating the platform implementations that you design for your customers?
What is your process for selection of vendors to support?
How would you characterize your relationships with the vendors that you rely on?
For customers who have a pre-existing investment in a portion of the data stack, what is your process for engaging with them to understand how best to support their goals?
What are the most interesting, innovative, or unexpected ways that you have seen 5X used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on 5X?
When is 5X the wrong choice?
What do you have planned for the future of 5X?

Contact Info

LinkedIn (https://www.linkedin.com/in/tarushaggarwal/)
@tarush (https://twitter.com/tarush) on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show, then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
To help other people find the show, please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers.

Links

5X (https://5x.co)
Informatica (https://www.informatica.com/)
Snowflake (https://www.snowflake.com/en/)
Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/)
Looker (https://cloud.google.com/looker/)
Podcast Episode (https://www.dataengineeringpodcast.com/looker-with-daniel-mintz-episode-55/)
DuckDB (https://duckdb.org/)
Podcast Episode (https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/)
Redshift (https://aws.amazon.com/redshift/)
Reverse ETL (https://medium.com/memory-leak/reverse-etl-a-primer-4e6694dcc7fb)
Fivetran (https://www.fivetran.com/)
Podcast Episode (https://www.dataengineeringpodcast.com/fivetran-data-replication-episode-93/)
RudderStack (https://www.rudderstack.com/)
Podcast Episode (https://www.dataengineeringpodcast.com/rudderstack-open-source-customer-data-platform-episode-263/)
Peak.ai (https://peak.ai/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
Key Points: The rush to categorize all of our tooling in data has caused many issues - we will see a big shake-up coming in the future, much like what happened in application development tooling. So much of data people's time is spent on work that doesn't add value in itself and should be automated. We need to fix that so data work is about delivering value. We can learn a lot from virtualization, but data virtualization is not where things should go in general. Containerization is merely an implementation detail. Much like software developers don't really care about process containers, the same will happen with data product containers - it's all about the experience, and containers significantly improve the experience. The pendulum swung towards decoupled data tech instead of monolithic offerings with 'The Modern Data Stack', but most of the technologies were not that easy to stitch together. Going forward, we want to keep the decoupled strategy, but we need a better way to integrate - APIs are how it worked in software, why not in data? Sponsored by NextData, Zhamak's company that is helping ease data product creation. For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter. Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/ Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Data Mesh Radio episode list and links to all available episode transcripts are here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. 
Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/ All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or
The modern data stack is a collection of cloud-based tools and technologies used to collect, store, process, and analyze data in a scalable way. It is a departure from traditional data stacks, which were often based on on-premises infrastructure and were not as well-suited for handling large volumes of data or complex data pipelines. But with this new approach comes complexity, and organizations must determine if the value outweighs the cost. Another, newer route has emerged for companies interested in serious analytical power: Hyperscale! This new architecture leverages an array of technical advances, including compute-adjacent storage, simplified data pipelines that make data available to more users, built-in integrations for a whole host of data sources, and machine learning algorithms baked into the architecture. Learn more on this episode of DM Radio as Host @eric_kavanagh interviews Chris Gladwin, CEO of Ocient, and Hyoun Park of Amalgam Insights.
Egor Gryaznov joins me to chat about the "Non-Modern Data Stack", getting out of our data bubble, and much more. If you like a refreshing conversation talking about the past, present, and future of our industry, this is for you. BigEye: https://webflow.bigeye.com/ LinkedIn: https://www.linkedin.com/in/egorgryaznov/
With Grupo Boticário we have already explored topics ranging from what it's like to work with data to how they use the Modern Data Stack. Now we want to know how AI is changing the way of working at one of the most admired companies in Latin America, according to the State of Data Brazil survey. In this episode of Data Hackers - the largest AI and Data Science community in Brazil - meet this team of specialists: Isabella Becker - DPO (Data Protection Officer); and Bruno Gobbet - Senior Data Manager; both working in Grupo Boticário's data area. Remember that you can find all the Data Hackers community podcasts on Spotify, iTunes, Google Podcast, Castbox, and many other platforms. If you like, you can also listen to the episode right here in this post! Link on Medium: https://medium.com/data-hackers/como-ia-est%C3%A1-mudando-a-forma-do-grupo-botic%C3%A1rio-trabalhar-data-hackers-podcast-74-c45006b64d67 Covered in the episode Meet our guests: Isabella Becker - DPO (Data Protection Officer) Bruno Gobbet - Senior Data Manager Data Hackers panel: Paulo Vasconcellos Monique Femme Reference links: GH TECH (Medium): https://medium.com/gbtech Data Hackers News (weekly news about data, AI, and technology) - https://podcasters.spotify.com/pod/show/datahackers/episodes/Data-Hackers-News-1---Amazon-investe-US-4-bi-na-Anthropic--Microsoft-anuncia-Copilot-para-Windows-11--OpenAI-anuncia-DALL-E-3-e29r06f Netflix series Coded Bias: https://www.netflix.com/br/title/81328723 Book (Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy): https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815 --- Send in a voice message: https://podcasters.spotify.com/pod/show/datahackers/message
Michel Tricot (CEO of Airbyte) joins me to chat about the impact of AI on the modern data stack, ETL for AI, the challenges of moving from open source to a paid product, and much more. Airbyte & Pinecone - https://airbyte.com/tutorials/chat-with-your-data-using-openai-pinecone-airbyte-and-langchain Note from Joe - we had audio issues because Michel got a new computer and didn't use the correct mic :(
Model deployment, data warehouse options for running models, and how to best leverage BI tools: Harry Glaser and Jon Krohn discuss Modelbit's capabilities to automate ML models from notebooks into production-ready models, reducing the time and effort spent 'translating' information from one mode to another. Harry's conversation with host Jon Krohn expanded on the importance of automating this task, and how developments in ML modeling have widened access so that entire teams can analyze data, whatever their level of expertise. This episode is brought to you by the AWS Insiders Podcast (https://pod.link/1608453414). Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What the modern data stack is [03:28] • Version control for data scientists [13:30] • CI/CD, load balancing and logging [20:38] • Snowflake vs. Redshift [30:10] • How tools like Looker and Tableau help monitor models [35:26] Additional materials: www.superdatascience.com/699
In today's episode, Luan Moreno and Mateus Oliveira talk with Matheus Willian, currently Head of Data Engineering at One Way Solution. dbt is one of the most talked-about and widely used technologies abroad, enabling teams of all sizes to work with the Modern Data Stack concept and making the development of data transformations simple and SQL-based. With dbt, you get the following benefits: development of data pipelines using SQL; code reuse through git structures; simplification of the data stack; and processing in Modern Data Warehouses, among other adapters. In this conversation we also cover the following topics: data as a central pillar; dbt; and modern BI teams. Learn more about dbt and how to use a Modern Data Stack technology together with the One Way Solution team, which does so much to boost the community - with content as well as training and events - helping Brazilian data professionals land jobs inside and outside the country. Matheus Willian = https://www.linkedin.com/in/matheuswillian/ https://www.getdbt.com/ Luan Moreno = https://www.linkedin.com/in/luanmoreno/
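The SQL-first, git-reusable workflow described in these notes can be sketched as a pair of dbt models; the schema, table, and column names here are hypothetical, purely for illustration of the pattern rather than any project discussed in the episode.

```sql
-- models/stg_orders.sql (hypothetical staging model)
-- Each dbt model is a single SELECT in its own file; dbt compiles it
-- into a view or table in the warehouse.
select
    order_id,
    customer_id,
    order_total,
    created_at
from raw.shop.orders
```

```sql
-- models/daily_revenue.sql (hypothetical downstream model)
-- ref() resolves to stg_orders and records the dependency, which is
-- how dbt enables reuse and lineage across git-managed SQL files.
select
    date_trunc('day', created_at) as order_date,
    sum(order_total) as revenue
from {{ ref('stg_orders') }}
group by 1
```

Running `dbt run` would then build both models in dependency order inside the warehouse.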
In this episode John Kutay hosts Aron Clymer to discuss the current state of the Modern Data Stack. Aron shares his expertise as he details how far we've come in data and what the next big milestones are. Aron also discusses Data Governance, the Semantic Layer, and how teams can best modernize their stack. Aron Clymer is Founder & CEO of Data Clymer, a next-gen data & analytics consulting firm that empowers every client's success by unlocking the value of data. The Data Clymer team implements modern cloud data solutions that drive positive results through data accessibility and actionable insights. Aron previously established and built the Product Intelligence team at Salesforce for 7 years to support all data and analytics needs of 400+ product managers. Subsequently, Aron headed up Data at PopSugar, where his team democratized data and supported analytics/data science across the company. Aron has grown Data Clymer over the past 6 years into a nationwide team of deeply experienced cloud data professionals. Follow Aron Clymer on LinkedIn Learn more about Data Clymer What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data, and analytics success stories.
The modern data stack is a loose collection of technologies, often cloud-based, that collaboratively process and store data to support modern analytics. It must be automated, low code/no code, AI-assisted, graph-enabled, multimodal, streaming, distributed, meshy, converged, polyglot, open, and governed. Published at: https://www.eckerson.com/articles/twelve-must-have-characteristics-of-a-modern-data-stack
Summary The data ecosystem has been building momentum for several years now. As a venture capital investor Matt Turck has been trying to keep track of the main trends and has compiled his findings into the MAD (ML, AI, and Data) landscape reports each year. In this episode he shares his experiences building those reports and the perspective he has gained from the exercise. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Businesses that adapt well to change grow 3 times faster than the industry average. As your business adapts, so should your data. RudderStack Transformations lets you customize your event data in real-time with your own JavaScript or Python code. Join The RudderStack Transformation Challenge today for a chance to win a $1,000 cash prize just by submitting a Transformation to the open-source RudderStack Transformation library. Visit dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) today to learn more Your host is Tobias Macey and today I'm interviewing Matt Turck about his annual report on the Machine Learning, AI, & Data landscape and the insights around data infrastructure that he has gained in the process Interview Introduction How did you get involved in the area of data management? Can you describe what the MAD landscape report is and the story behind it? At a high level, what is your goal in the compilation and maintenance of your landscape document? What are your guidelines for what to include in the landscape? As the data landscape matures, how have you seen that influence the types of projects/companies that are founded? What are the product categories that were only viable when capital was plentiful and easy to obtain? What are the product categories that you think will be swallowed by adjacent concerns, and which are likely to consolidate to remain competitive? 
The rapid growth and proliferation of data tools helped establish the "Modern Data Stack" as a de-facto architectural paradigm. As we move into this phase of contraction, what are your predictions for how the "Modern Data Stack" will evolve? Is there a different architectural paradigm that you see as growing to take its place? How has your presentation and the types of information that you collate in the MAD landscape evolved since you first started it? What are the most interesting, innovative, or unexpected product and positioning approaches that you have seen while tracking data infrastructure as a VC and maintainer of the MAD landscape? What are the most interesting, unexpected, or challenging lessons that you have learned while working on the MAD landscape over the years? What do you have planned for future iterations of the MAD landscape? Contact Info Website (https://mattturck.com/) @mattturck (https://twitter.com/mattturck) on Twitter MAD Landscape Comments Email (mailto:mad2023@firstmarkcap.com) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? 
Links MAD Landscape (https://mad.firstmarkcap.com) First Mark Capital (https://firstmark.com/) Bayesian Learning (https://en.wikipedia.org/wiki/Bayesian_inference) AI Winter (https://en.wikipedia.org/wiki/AI_winter) Databricks (https://www.databricks.com/) Cloud Native Landscape (https://landscape.cncf.io/) LUMA Scape (https://lumapartners.com/lumascapes/) Hadoop Ecosystem (https://www.analyticsvidhya.com/blog/2020/10/introduction-hadoop-ecosystem/) Modern Data Stack (https://www.fivetran.com/blog/what-is-the-modern-data-stack) Reverse ETL (https://medium.com/memory-leak/reverse-etl-a-primer-4e6694dcc7fb) Generative AI (https://generativeai.net/) dbt (https://www.getdbt.com/) Transform (https://transform.co/) Podcast Episode (https://www.dataengineeringpodcast.com/transform-co-metrics-layer-episode-206/) Snowflake IPO (https://www.cnn.com/2020/09/16/investing/snowflake-ipo/index.html) Dataiku (https://www.dataiku.com/) Iceberg (https://iceberg.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/tabular-iceberg-lakehouse-tables-episode-363) Hudi (https://hudi.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/hudi-streaming-data-lake-episode-209/) DuckDB (https://duckdb.org/) Podcast Episode (https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/) Trino (https://trino.io/) Y42 (https://www.y42.com/) Podcast Episode (https://www.dataengineeringpodcast.com/y42-full-stack-data-platform-episode-295) Mozart Data (https://www.mozartdata.com/) Podcast Episode (https://www.dataengineeringpodcast.com/mozart-data-modern-data-stack-episode-242/) Keboola (https://www.keboola.com/) MPP Database (https://www.techtarget.com/searchdatamanagement/definition/MPP-database-massively-parallel-processing-database)
The Top Entrepreneurs in Money, Marketing, Business and Life
Modern Data Stack.
Today I had the pleasure of interviewing Chris Tabb. Chris is the Co-Founder & CCO at LEIT DATA. In this episode he schools me on the modern data stack and gives me a great look into the history of data and data management. He started his career in the Business Intelligence/Analytics domain 30 years ago, beginning at Cognos in the '90s, working in the back office before becoming an expert in all their products and leaving to become an independent BI consultant in 1998. It is safe to say he loves data and always has. He followed the evolution of the analytics industry, working hands-on with all the technologies in the ecosystem: databases, ETL/ELT, BI/OLAP/visualisation tools, big data technologies, and infrastructure on-premises and in the cloud, across many vendors, some old, some new. Nowadays he works at a more strategic level, providing technical roadmaps, vendor selection, migration strategies, data management, and data & application architecture, but he still likes to keep hands-on with products in the data ecosystem. Chris's Links: LinkedIn - https://www.linkedin.com/in/chris-tabb-datatips/?originalSubdomain=uk Ski Event - https://skiersindata.com/ Leit Data - https://www.leit-data.com/
dbt is known as being part of the Modern Data Stack for ELT processes. Being in the MDS, dbt Labs believes in having the best of breed for every part of the stack. Oftentimes folks are using an EL tool like Fivetran to pull data from the database into the warehouse, then using dbt to manage the transformations in the warehouse. Analysts can then build dashboards on top of that data, or execute tests. It's possible for an analyst to adapt this process for use with a microservice application using Apache Kafka® and the same method to pull batch data out of each and every database; however, in this episode, Amy Chen (Partner Engineering Manager, dbt Labs) tells Kris about a better way forward for analysts willing to adopt the streaming mindset: reusable pipelines using dbt models that immediately pull events into the warehouse and materialize as views by default. dbt Labs is the company that makes and maintains dbt. dbt Core is the open-source data transformation framework that allows data teams to operate with software engineering's best practices. dbt Cloud is the fastest and most reliable way to deploy dbt. Inside the world of event streaming, there is a push to expand data access beyond the programmers writing the code and towards everyone involved in the business. Over at dbt Labs they're attempting something of the reverse: to get data analysts to adopt the best practices of software engineers, and more recently, of streaming programmers. They're improving the process of building data pipelines while empowering businesses to bring more contributors into the analytics process, with an easy-to-deploy, easy-to-maintain platform. 
It offers version control to analysts who traditionally don't have access to git, along with the ability to easily automate testing, all in the same place. In this episode, Kris and Amy explore: how to revolutionize testing for analysts with two of dbt's core functionalities; what streaming in a batch-based analytics world should look like; what can be done to improve workflows; and how to democratize access to data for everyone in the business. EPISODE LINKS: Learn more about dbt Labs; An Analytics Engineer's Guide to Streaming; Panel discussion: If Streaming Is the Answer, Why Are We Still Doing Batch?; All Current 2022 sessions and slides; Watch the video version of this podcast; Kris Jenkins' Twitter; Streaming Audio Playlist; Join the Confluent Community; Learn more with Kafka tutorials, resources, and guides at Confluent Developer; Live demo: Intro to Event-Driven Microservices with Confluent; Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
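The view-by-default materialization and automated testing mentioned above can be sketched as a dbt configuration fragment; the project, model, and column names are hypothetical, shown only to illustrate the shape of the config.

```yaml
# dbt_project.yml (excerpt): materialize models as views by default
models:
  my_project:            # hypothetical project name
    +materialized: view
```

```yaml
# models/schema.yml (excerpt): declarative tests an analyst can run
# with `dbt test`, no custom code required
version: 2
models:
  - name: orders         # hypothetical model
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

With this in place, `dbt build` creates the views and runs the declared tests in one pass, which is what puts software-engineering-style checks within reach of analysts.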
In this episode we talk with Maayan Salom about dbt and Elementary, and how these two tools have been helping data teams implement data pipelines efficiently and safely. dbt has become one of the most widely used tools for transforming data inside the data warehouse, because it makes it easy to use SQL for data processing. With dbt it is possible to have a broad view of what is happening inside your analytical source of truth, in addition to several interesting capabilities for teams that want to scale quickly and in a structured way. Elementary is an open-source product whose responsibility is to apply the concept of observability to the data pipelines built in dbt. This solution delivers reports, anomaly detection, and pipeline performance validation, and can even send alerts to Slack, all to improve and enrich your ETL process. In this conversation you will understand how dbt and Elementary can reduce complexity when building and observing your data pipelines, and bring your data team into a reliable, monitored environment. dbt Elementary Maayan Salom Luan Moreno = https://www.linkedin.com/in/luanmoreno/
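As a rough sketch of how Elementary attaches to a dbt project: it is installed as a dbt package, and its monitoring models are built alongside your own. The version number below is illustrative only; check the Elementary docs for the current release.

```yaml
# packages.yml: add Elementary as a dbt package
packages:
  - package: elementary-data/elementary
    version: 0.13.0   # illustrative version, see the Elementary docs

# then, roughly:
#   dbt deps                      # install the package
#   dbt run --select elementary   # build Elementary's monitoring models
```

From there, Elementary reads dbt's run and test results to produce the reports, anomaly detection, and Slack alerts described in the episode.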
With the increasing rate at which new data tools and platforms are being created, the modern data stack risks becoming just another buzzword data leaders use when talking about how they solve problems. Alongside the arrival of new data tools is the need for leaders to see beyond just the modern data stack and think deeply about how their data work can align with business outcomes, otherwise, they risk falling behind trying to create value from innovative, but irrelevant technology. In this episode, Yali Sassoon joins the show to explore what the modern data stack really means, how to rethink the modern data stack in terms of value creation, data collection versus data creation, and the right way businesses should approach data ingestion, and much more. Yali is the Co-Founder and Chief Strategy Officer at Snowplow Analytics, a behavioral data platform that empowers data teams to solve complex data challenges. Yali is an expert in data with a background in both strategy and operations consulting teaching companies how to use data properly to evolve their operations and improve their results.