For this seventh episode of season 4 of Morning Data Chat, Aurélien Barthe, Chief Data & AI Officer at MGEN, shares how the organization structures its knowledge management to sustainably support its data and AI use cases. He explains how the governance of unstructured data (knowledge sheets, regulatory documents, business content) relies on a human network of referents, writers, and field experts, organized along a federated governance model. Aurélien also stresses the importance of content quality, freshness, and structure for effectively feeding generative AI use cases, while keeping a human in the loop through key roles such as AI Product Owners and knowledge managers. Hosted by Ausha. Visit ausha.co/politique-de-confidentialite for more information.
What if your data platform could power both critical business decisions and real-time product features at scale? In this episode, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a metrics-first approach and semantic layers transform data accessibility, why traditional ML and LLMs require different strategies for different problems, and how to balance FinOps costs while processing billions of IoT events daily. Whether you're building data infrastructure for a high-growth company or rethinking how your organization consumes data, this conversation is packed with practical strategies for unlocking data value and preparing your platform for AI. Tune in to discover how Voi ditched traditional BI tools and revolutionized their approach to enterprise analytics.
For this seventh episode of season 4 of Morning Data Chat, Siddhartha Chatterjee, Chief Data & AI Officer at Club Med, shares how the group is making a strategic shift toward an AI First model, placing artificial intelligence at the heart of its business-process transformation. He explains why AI should not be treated as a mere tool but as a lever for fundamentally rethinking how the company operates, makes decisions, and delivers value, while preserving Club Med's human DNA. Siddhartha walks through a flagship use case in HR operations: automating and optimizing the assignment of G.O.s around the world. This historically manual process, critical for employee experience and customer satisfaction, was redesigned end to end with machine learning, with a human kept in the loop. The episode also highlights a key success factor: end-to-end process redesign, driven by a new role, the Process Designer, combining business expertise, an understanding of AI challenges, and change management. A very concrete account of how to structure a durable AI transformation across an international services group.
The 2026 Practical Data Community State of Data Engineering dropped this week. It's full of some obvious and some very counterintuitive findings about the state of data engineering around the globe, across organizations of all sizes and types. Check it out! I also talk about the book-writing process, where I messed up on this latest book, its progress toward publication, and more.

Survey: https://joereis.github.io/practical_data_data_eng_survey

This episode is brought to you by Ellie.ai. Ellie makes data modeling as easy as sketching on a whiteboard, so even business stakeholders can contribute effortlessly. By skipping redraws, rework, and forgotten context, and by keeping all dependencies in sync, teams report saving up to 78% of modeling time. Check out Ellie: https://ellie.ai/
In this episode, I had the pleasure of speaking with Jimmy Willis, a Senior Manager of Data Engineering at an AdTech company, where he builds systems that turn massive amounts of raw data into useful information. A self-taught programmer without a tech degree, he landed an internship at JP Morgan Chase and leveraged that opportunity into a six-figure job. Jimmy is currently writing a book and is on a mission to get 10,000 Black people into tech through Python and other real-world tech skills.

https://www.rovion.co/
Sign up for Activate Your Calling: Create, Build, & Promote Your Gift: https://bit.ly/4r0QixG
Sign up to be notified about the Faith to Launch Community: https://bit.ly/FaithtoLaunch
Please join me in my YouTube-only series, 30 Days to Becoming a Stronger, More Confident You in Christ: https://www.youtube.com/playlist?list=PLfkkBA4-h1A56MxObeO__s873pdUnnWQ5
In episode 31 of Open Source Ready, Brian and John sit down with Matthaus Krzykowski, Thierry Jean, and Elvis Kahoro to explore how dlt and dltHub are changing the way developers build data pipelines. The conversation dives into DuckDB, LLM-driven workflows, and the growing shift toward developer-first data engineering. They also discuss open source adoption, AI orchestration, and what it means to be a “10x engineer” in 2026.
For this fifth episode of season 4 of Morning Data Chat, Jérémy Blond, Data Director at the Vicat group, shares how data and AI support the decarbonization of industrial processes in the cement sector. He explains how the growing complexity of production processes makes data indispensable for helping field teams steer operations. Through a concrete grinding-optimization use case, Jérémy shows how machine learning models improve process stability, increase productivity, and reduce energy consumption, all while respecting industrial constraints.
For this fifth episode of season 4 of Morning Data Chat, Sébastien Rozanes, Chief Digital, Data & AI Officer at FDJ United, shares how the group is structuring its data and AI transformation amid a merger between a historical incumbent and a fully digital company. He explains why AI now requires rethinking the data organization, recentralizing priorities, and steering investments end to end in order to surface high-impact projects. Through the GameChanger AI program, Sébastien details the challenges of governance, prioritization, ROI measurement, and upskilling business teams, as well as how the group is preparing for agentic AI use cases by 2026.
This podcast is sponsored by Team Simmer. Go to Team Simmer and use the coupon code DEVIATE for 10% off individual course purchases. The Technical Marketing Handbook provides a comprehensive journey through technical marketing principles. Sign up for the Simmer Newsletter for the latest news in technical marketing.

NEW SIMMER COURSE ALERT! Data Analysis with R, taught by Arben Kqiku (coupon code doesn't apply to this course).

Latest content from Simo Ahava: Run Server-side Google Tag Manager On Localhost (article)
Latest content from Juliana Jackson: The distance between what gets funded and what works has never been wider (subscribe to the newsletter for more)

Mentioned in the episode: Superweek Analytics Summit, MeasureCamp Helsinki
Connect with Sayf Sharif: LinkedIn, Three Bears Data, OptiMeasure

This podcast is brought to you by Juliana Jackson and Simo Ahava.
What happens when a side hustle photo business turns into a decade-long marketing career that no longer fits? In this episode, Michael Galo shares his non-linear journey to Nashville Software School (NSS). After feeling "stuck" in marketing and communications, Michael decided to follow the advice of local coffee shop regulars and dive into tech. Michael discusses the intensity of the six-month Software Development bootcamp, the "fire hose" of learning, and why he chose to immediately specialize further by joining NSS's brand-new Data Engineering program through the ProTech initiative.

01:33 Life Before NSS: A Decade in Photo Production, Marketing & Communications
02:31 The Spark: Too Many Alumni at the Coffee Shop
02:57 Why Software Development?
04:59 Navigating the Bootcamp Challenge: The Capstones
06:52 The Importance of Community and Teamwork
08:01 Specializing with Data Engineering and ProTech
10:41 Deepening Backend Skills and Data Architecture
12:18 Expanding the Job Search Target
14:24 Career Development: Beyond the Resume
16:18 Advice for the Job Search: Stay Connected
18:00 Is Now the Right Time to Invest in Yourself?
20:10 Final Thoughts: Busting Through the Walls
Christophe Blefari is the creator of blef.fr, the best-known data newsletter in France. He has been Head of Data, Head of Data Engineering, and Staff Data Engineer at both startups and large groups, and is, in my view, one of the greatest data experts in France. He recently co-founded nao Labs, an AI-powered code editor for data teams. We cover:
For this fourth episode of season 4 of Morning Data Chat, Julie Pozzi, Chief Data & AI Officer at Air France KLM, shares how the group structures and governs knowledge to turn it into an operational lever in the AI era. She discusses the critical uses of unstructured data at the heart of airline operations and the limits of traditional document-search approaches. Julie explains how AI can drastically reduce search times while highlighting a key challenge: the quality, structure, and governance of content.
"The definition of insanity is repeating the same thing over and over and expecting a different result every time. That's a little bit how metadata management is often performed in companies."
For this third episode of season 4 of Morning Data Chat, Magalie Cordier-Jubet, Chief Data Officer at Oney, discusses the major transformation the company has undertaken: gradually moving off a historical SAS legacy in favor of a modern data stack built around Snowflake. She explains the expected benefits of this migration and how AI is used to accelerate script conversion, document code, and run non-regression tests. Magalie also details the organization put in place to make this change succeed at scale: a hybrid strategy, the mobilization of several hundred data analysts, community building, upskilling, and FinOps management under a usage-based billing model.
Matthieu Rousseau is an expert in data engineering, in the Modern Data Stack, and in dbt in particular. He founded Modeo, an agency specializing in AI and data engineering (dbt, Snowflake, Airflow, DLT, Databricks...). We cover:
For this second episode of season 4 of Morning Data Chat, Anthony Vouillon, Director of Renault's AI Center of Excellence, reveals the levers that allow the carmaker to accelerate its transformation at scale. He discusses the massive rollout of generative AI to employees, the productivity impact of GenCoding, and how Renault is designing its first agents to transform both the customer experience and internal processes. Anthony also details the challenges of orchestrating across platforms, the current performance limits of models, and the central role of data quality in industrializing AI.
What happens when a growing SaaS company like sevDesk takes its entire data infrastructure to the next level? In this episode, Jonas Rashedi talks with Michel Ebner, Team Lead Data Engineering, about the technical foundations behind the data strategy. Michel explains why the team chose Snowflake, how a migration succeeds without data loss, and how sevDesk built its architecture modularly over the years: with specialized tools, a clean separation of responsibilities, and plenty of hard-won experience. A deep dive for anyone who thinks about data engineering strategically and wants to make technology decisions based on facts.

MY DATA IS BETTER THAN YOURS is a project by BETTER THAN YOURS, the brand for really good podcasts. Want to place targeted advertising in the podcast MY DATA IS BETTER THAN YOURS? Contact form: https://2frg6t.share-eu1.hsforms.com/2ugV0DR-wTX-mVZrX6BWtxg
Michel's LinkedIn profile: https://www.linkedin.com/in/michel-ebner/
sevdesk homepage: https://sevdesk.de/
All the important links about Jonas and the podcast: https://linktr.ee/jonas.rashedi

00:00 Intro & introducing Michel
08:00 The role of data engineering at sevDesk
15:00 Toolstack: Meltano, Stitch, Snowplow
24:00 Architecture decisions & separating ELT
32:00 Why Snowflake? Analysis & migration process
40:00 Orchestration with Dexter & DBT pitfalls
48:00 Learnings from 600 models & quality assurance
For this first episode of season 4 of Morning Data Chat, Jean-François Guilmard, Chief Data & AI Officer at the Accor group, takes us behind the scenes of the hotel group's data and AI transformation. He explains how Accor is rethinking search, from hotel lookup to a conversational experience, building on the latest advances in generative AI and the MCP protocol. A fascinating discussion on the future of travel search, customer personalization, and AI-augmented hospitality.
Welcome to 2026! In this spontaneous Friday AMA, I take listener questions on ontologies, the "leaky abstractions" of AI coding tools, why the "button pusher" era of engineering is a professional dead end, and the shifting landscape of data engineering. I also provide an update on my upcoming book, Mixed Model Arts (launching in March 2026), and discuss the unexpected convergence of library science, ontologies, and traditional data modeling, something that wasn't on my 2025 bingo card. Great turnout, especially on no notice. Thanks to everyone who showed up!
What does it take to build a truly data-driven organization, not just as a buzzword? In this episode, Jonas Rashedi talks with Marc Roulet, VP Data & Analytics at sevDesk, about the path to a genuine data DNA. Marc describes how he built a central data organization of more than 20 people at sevDesk, how analytics, engineering, and data science work closely together, and the six strategic pillars they defined for it: from infrastructure to insights to literacy. Especially interesting: how the role has changed, from a finance-driven support function to a strategic body close to the CEO. And what happens when you simply have two data lakes too many...

MY DATA IS BETTER THAN YOURS is a project by BETTER THAN YOURS, the brand for really good podcasts. Want to place targeted advertising in the podcast MY DATA IS BETTER THAN YOURS? Contact form: https://2frg6t.share-eu1.hsforms.com/2ugV0DR-wTX-mVZrX6BWtxg
Marc's LinkedIn profile: https://www.linkedin.com/in/marc-roulet-data/
sevdesk homepage: https://sevdesk.de/
All the important links about Jonas and the podcast: https://linktr.ee/jonas.rashedi

00:00 Introduction & getting started
06:00 From eBay to sevDesk: Marc's career path
12:00 What sevDesk does & how data-driven the product is
20:00 Data strategy: 6 pillars for the DNA
28:00 Culture, matrix & team building
36:00 AI, data literacy & MLOps
44:00 Reporting, KPIs & management structure
52:00 Data Circle & collaboration across the organization
This is episode 314 recorded on December 15th, 2025, where John & Jason talk about the Fabric November 2025 Feature Summary part 2 including updates to Data Engineering & Data Science. For show notes please visit www.bifocal.show
Data as a Product: what's behind it? Why is AI everywhere, yet the path from the database to "wow, the model can do that" often feels like a black hole? You dutifully log events, the data lands in some silo or other, and the crucial question remains open: who actually makes sure that raw data becomes a reliable, sellable data product?

In this episode we turn on the lights exactly there. Together with Mario Müller, Director of Data Engineering at Veeva Systems, we look at what data teams really are, how "Data as a Product" works in practice, and why data engineering is more than just shoving a few CSVs over FTP. We talk about team structures from the one-person show to the cross-functional squad, about ownership of the data, data governance, and how you actually measure data quality, including monitoring, alerts, SQL rules, and human quality control.

On top of that, a solid helping of tech: Spark, AWS S3 as primary storage, Delta Lake, Athena, Glue, Airflow, push-pull instead of event overkill, and the decision for batch processing even though the whole world is shouting for streaming.

And of course we also cover what happens when AI fiddles with the data: where AI helps with bootstrapping, why production and scale get tricky, and why responsibility for a commit is not handed off to an LLM.

If you want to build data teams, need to deliver data products, or simply want to understand how data turns into reliable business impact, you're in the right place. Bonus: batch jobs get a small comeback today.

You can find our current advertising partners at https://engineeringkiosk.dev/partners
Quick feedback on the episode:
Stop chasing tools. Start fixing decisions. I spoke with Stephen Sciortino, CEO and Founder of Database Tycoon LLC, at Small Data SF by MotherDuck. Clear takeaways for anyone running or advising a data team. What we covered:
• The real shift from his Brooklyn Data days to independent consulting
• Early signals that a team will win vs. signs they are in trouble
• How AI is changing expectations and what must stay the same
Watch the complete interview! Practical, direct, and worth your time.
#Data #AI #SmallDataSF #DataEngineering #Analytics #TheRavitShow
What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. Whether you're architecting data systems at scale or integrating AI into your analytics workflow, this conversation delivers actionable insights into reliability, modernization, and the future of data engineering. Tune in to discover how Lyft is balancing open-source investments with cutting-edge AI capabilities to unlock better insights from data.
This interview was recorded for the GOTO Book Club. http://gotopia.tech/bookclub
Read the full transcription of the interview here: https://gotopia.tech/episodes/399

Matt Housley - Co-Author of "Fundamentals of Data Engineering", Keynote Speaker & Podcaster
Joe Reis - Co-Author of "Fundamentals of Data Engineering", Keynote Speaker, Professor & Podcaster

RESOURCES
Matt: https://www.linkedin.com/in/housleymatthew
Joe: https://www.linkedin.com/in/josephreis / https://github.com/JoeReis / https://joereis.substack.com
Link: https://mathstodon.xyz/@tao/114915604830689046

DESCRIPTION
Joe Reis and Matt Housley, co-authors of "Fundamentals of Data Engineering," discuss the evolution of their field three years after their book's publication. They explore how the rise of AI tools has transformed data engineering practices, the ongoing importance of foundational knowledge, and the challenges facing junior engineers in an AI-dominated landscape. The conversation covers the balance between leveraging AI assistance and maintaining core expertise, the resurgence of classical techniques, and why fundamental principles remain more relevant than ever.

RECOMMENDED BOOKS
Joe Reis & Matt Housley • Fundamentals of Data Engineering • https://amzn.to/4n85049
Karen Hao • Empire of AI • https://amzn.to/46qeL6B
Keach Hagey • The Optimist • https://amzn.to/4nlcS20
Parmy Olson • Supremacy • https://amzn.to/3IpHdgI
Peter Norvig & Stuart Russell • Artificial Intelligence • https://amzn.to/420ZgR8
David Foster • Generative Deep Learning • https://amzn.to/48ZgP4x
Sol Rashidi • Your AI Survival Guide • https://amzn.to/3UFYnKC

Follow GOTO: Bluesky, Twitter, Instagram, LinkedIn, Facebook

CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join

Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
Artificial Intelligence and Machine Learning in Human Resources: A Concise Guide by Dr. C. Rasmussen
https://www.amazon.com/Artificial-Intelligence-Machine-Learning-Resources/dp/B0FWZQXHMG
Curtisrasmussen.focalpointcoaching.com

What if a computer could help find the perfect employee or predict who might leave a job? This exciting idea opens the door to a new way of working.

Overview
This guide explains how artificial intelligence (AI) and machine learning (ML) are transforming human resources (HR). Smart computer programs can quickly review thousands of job applications to find the best candidates, suggest training tailored to employees' needs, and predict which workers might quit, helping managers take action to keep them. The book includes real-world examples, like how large companies use AI to save time, and covers benefits, such as improved hiring, as well as key concerns, like protecting personal information. At just 61 pages, it's concise by design, following Richard Feynman's wisdom: "If you can't explain something simply, you don't understand it well enough." More pages don't equal more value; in fact, lengthy texts can bury useful insights. Since every organization is unique, this book equips HR professionals and managers with the right questions to ask rather than a rigid roadmap, making it a practical tool for anyone curious about the future of work.

About the author
Dr. Curtis "Curt" Rasmussen is a leading expert in industrial-organizational psychology with a Ph.D. from Walden University. He specializes in blending human skills with artificial intelligence (AI) and machine learning (ML) to make workplaces better and more efficient. With years of experience in research, consulting, and government roles, he helps businesses use data and tech wisely. His career highlights include owning Cyber-Human Performance Tech, LLC, where he advises small and mid-sized companies on adding AI to hiring and daily tasks while keeping things ethical.
He also guides students in George Mason University’s Data Engineering program, focusing on AI tools like natural language processing and computer vision. At the Cybersecurity and Infrastructure Security Agency (CISA), he led workforce planning as a senior I/O psychologist, creating surveys and frameworks that improved employee satisfaction by 45% and helped with smarter hiring. Earlier, he reviewed AI and data science proposals for the Department of Commerce, National Academy of Medicine, and the Office of the Director of National Intelligence, making sure projects were strong and fair. Dr. Rasmussen has invented patent-pending tools like the Multidimensional Algorithm Structure (MAS), which picks the best AI methods by checking data and company needs, and the eXplainable Artificial Intelligence Construct (XAIC), which makes AI easy to understand and trust by involving people in decisions. These ideas help fix common AI problems, like failures or hidden biases.
Software Engineering Radio - The Podcast for Professional Software Developers
Flavia Saldanha, a consulting data engineer, joins host Kanchan Shringi to discuss the evolution of data engineering from ETL (extract, transform, load) and data lakes to modern lakehouse architectures enriched with vector databases and embeddings. Flavia explains the industry's shift from treating data as a service to treating it as a product, emphasizing ownership, trust, and business context as critical for AI-readiness. She describes how unified pipelines now serve both business intelligence and AI use cases, combining structured and unstructured data while ensuring semantic enrichment and a single source of truth. She outlines key components of a modern data stack, including data marketplaces, observability tools, data quality checks, orchestration, and embedded governance with lineage tracking. This episode highlights strategies for abstracting tooling, future-proofing architectures, enforcing data privacy, and controlling AI-serving layers to prevent hallucinations. Saldanha concludes that data engineers must move beyond pure ETL thinking, embrace product and NLP skills, and work closely with MLOps, using AI as a co-pilot rather than a replacement. Brought to you by IEEE Computer Society and IEEE Software magazine.
Data engineering is undergoing a fundamental shift. In this episode, I sit down with Nick Schrock, founder and CTO of Dagster, to discuss why he went from being an "AI moderate" to believing 90% of code will be written by AI. Being hands-on also led to a massive pivot in Dagster's roadmap and a new focus on managing and engineering context. We dive deep into why simply feeding data to LLMs isn't enough. Nick explains why real-time context tools (like MCPs) can become "token hogs" that lack precision, and why the future belongs to "context pipelines": offline, batch-computed context that is governed, versioned, and treated like code. We also explore Compass, Dagster's new collaborative agent that lives in Slack, bridging the gap between business stakeholders and data teams. If you're wondering how your role as a data engineer will evolve in an agentic world, this conversation maps out the territory.

Dagster: dagster.io
Nick Schrock on X: @schrockn
What does MLOps look like when you are deploying 22,000 models a month? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: The move from "Information" to "Agency."
Ciro Greco, Co-founder & CEO at Bauplan, joins the podcast to discuss a new paradigm for data engineering rooted in software engineering principles. He explains how treating the data lakehouse like a software project — with version control, branching, and transactional pipelines — creates a robust and safe environment for development. Subscribe to the Gradient Flow Newsletter
In this episode of Real Talk with Anant Veeravalli, the discussion revolves around the evolving data landscape and the necessity for strategic partnerships to achieve holistic measurement. The team unpacks the importance of ethical data sourcing, privacy compliance, and the utilization of clean room environments like Snowflake and Databricks to bridge data gaps. Enabling secure and scalable data connectivity and facilitating real-time data sharing is key for brands to derive meaningful intelligence, including predictive modeling and AI-driven insights. This episode is essential listening for anyone focused on governance, security, and future-proofing data systems.Thanks for listening! Follow us on Twitter and Instagram or find us on Facebook.
There's no shortage of technical content for data engineers, but a massive gap exists when it comes to the non-technical skills required to advance beyond a senior role. I sit down with Yordan Ivanov, Head of Data Engineering and writer of "Data Gibberish," to talk about this disconnect. We dive into his personal journey of failing as a manager the first time, learning the crucial "people" skills, and his current mission to help data engineers learn how to speak the language of business.

Key areas we explore:
- The senior-level content gap: Yordan explains why his non-technical content on career strategy and stakeholder communication gets "terrible" engagement compared to technical posts, even though it's what's needed to advance.
- The managerial trap: Yordan's candid story about his first attempt at management, where he failed because he cared only about code and wasn't equipped for the people-centric aspects and politics of the role.
- The danger of AI over-reliance: a deep discussion on how leaning too heavily on AI can prevent the development of fundamental thinking and problem-solving skills, both in coding and in life.
- The maturing data landscape: we reflect on the end of the "modern data stack euphoria" and what the wave of acquisitions means for innovation and the future of data tooling.
- AI adoption in Europe vs. the US: a look at how AI adoption is perceived as massive and mandatory in Europe, while US census data shows surprisingly low enterprise adoption rates.
Mark Raasveldt, co-founder and CTO of DuckDB Labs, shares his journey from academic research at CWI Amsterdam to creating one of the most innovative analytical databases of the last decade. Mark discusses the technical challenges of building DuckDB from scratch, the philosophy behind embedded analytical databases, and why single-node performance still matters in our cloud-first world. He provides insights into open source business models, the evolution of data formats like Parquet, and how DuckDB is democratizing high-performance analytics for developers everywhere.
Nike's Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, shifts data quality checks to before the data is written to a final table. This proactive approach uses row-level, aggregation-level, and query data quality checks to fail jobs, drop bad records, or alert teams - ultimately saving huge costs on recompute and engineering effort in mission-critical data pipelines.
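The pre-write pattern the episode describes can be sketched generically in plain Python. This is not the Spark Expectations API; the rule structure, names, and data here are invented purely to illustrate the idea of validating rows before they reach the final table, with a per-rule choice to fail the job, drop bad records, or alert:

```python
# Generic sketch of pre-write data quality checks (illustrative only).
# Each rule carries an action: "fail" aborts the job, "drop" filters the
# row out, "alert" keeps the row but records it for the team.

def apply_expectations(rows, rules):
    """Return (good_rows, alerts); raise if a 'fail' rule is violated."""
    good, alerts = [], []
    for row in rows:
        keep = True
        for rule in rules:
            if rule["check"](row):
                continue  # rule satisfied, nothing to do
            if rule["action"] == "fail":
                raise ValueError(f"Quality rule '{rule['name']}' failed: {row}")
            if rule["action"] == "drop":
                keep = False
            elif rule["action"] == "alert":
                alerts.append((rule["name"], row))
        if keep:
            good.append(row)
    return good, alerts

# Hypothetical rules and records for a payments pipeline:
rules = [
    {"name": "amount_non_negative",
     "check": lambda r: r["amount"] >= 0, "action": "drop"},
    {"name": "has_user_id",
     "check": lambda r: r.get("user_id") is not None, "action": "alert"},
]

rows = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": 2, "amount": -5.0},   # dropped: negative amount
    {"user_id": None, "amount": 3.0}, # kept, but an alert is raised
]

good, alerts = apply_expectations(rows, rules)
print(len(good), len(alerts))  # prints: 2 1
```

Running the checks before the write, rather than auditing the final table afterwards, is what saves the recompute cost the episode mentions: bad records never land, so nothing has to be backfilled.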
Prashanth Southekal, Ph.D., managing principal of the DBP Institute, joins host Andrew Miller to discuss data engineering for analytics and AI, including the four types of business data, practical data wrangling and enrichment techniques, and managing governance, ethics, and risk. For more information on Prashanth's course at TDWI Orlando, please visit Data Engineering for Analytics and AI.

More information:
· TDWI Conferences: https://bit.ly/3XqBhGH
· TDWI Modern Data Leader's Summits: https://bit.ly/4902fuu
· TDWI Virtual Summits: https://bit.ly/31HJ2xr
· Seminars: https://bit.ly/3WxQPr4
· More Speaking of Data Episodes: https://bit.ly/3JsQPWo

Follow us on:
· LinkedIn: https://bit.ly/42zCZZB
· Facebook: https://bit.ly/49uej7j
· Instagram: https://bit.ly/3HM8x57
· X: https://bit.ly/3SsYu9P
In this episode, I sit down with Saket Saurabh (CEO of Nexla) to discuss the fundamental shift happening in the AI landscape. The conversation is moving beyond the race to build the biggest foundational models and toward a new battleground: context. We explore what it means to be a "model company" versus a "context company" and how this changes everything for data strategy and enterprise AI. Join us as we cover:
- Model vs. context companies: the emerging divide between companies building models (like OpenAI) and those whose advantage lies in their unique data and integrations.
- The limits of current models: why we might be hitting an asymptote with the current transformer architecture for solving complex, reliable business processes.
- "Context engineering": what this term really means, from RAG to stitching together tools, data, and memory to feed AI systems.
- The resurgence of knowledge graphs: why graph databases are becoming critical for providing deterministic, reliable information to probabilistic AI models, moving beyond simple vector similarity.
- AI's impact on tooling: how tools like Lovable and Cursor are changing workflows for prototyping and coding, and the risk of creating the "-10x engineer."
- The future of data engineering: how the field is expanding as AI becomes the primary consumer of data, requiring a new focus on architecture, semantics, and managing complexity at scale.
In this episode of the podcast, members of the InfoQ editorial staff and friends of InfoQ discuss the current trends in the domain of AI, ML and Data Engineering. One of the regular features of InfoQ is its trends reports, each of which focuses on a different aspect of software development. These reports provide InfoQ readers and listeners with a high-level overview of the topics to pay attention to this year. The InfoQ AI, ML and Data Engineering editorial team met with external guests to discuss the trends in the AI and ML areas and what to watch out for over the next 12 months. In addition to the written report and trends graph, this podcast provides a recording of a discussion where expert panelists discuss how innovative AI technologies are disrupting the industry. Read a transcript of this interview: http://bit.ly/4nRpvlF Subscribe to the Software Architects' Newsletter for your monthly guide to the essential news and experience from industry peers on emerging patterns and technologies: https://www.infoq.com/software-architects-newsletter
Upcoming Events:
- InfoQ Dev Summit Munich (October 15-16, 2025): Essential insights on critical software development priorities. https://devsummit.infoq.com/conference/munich2025
- QCon San Francisco 2025 (November 17-21, 2025): Get practical inspiration and best practices on emerging software trends directly from senior software developers at early adopter companies. https://qconsf.com/
- QCon AI New York 2025 (December 16-17, 2025): https://ai.qconferences.com/
- QCon London 2026 (March 16-19, 2026): https://qconlondon.com/
The InfoQ Podcasts: Weekly inspiration to drive innovation and build great teams from senior software leaders.
Listen to all our podcasts and read interview transcripts: - The InfoQ Podcast https://www.infoq.com/podcasts/ - Engineering Culture Podcast by InfoQ https://www.infoq.com/podcasts/#engineering_culture - Generally AI: https://www.infoq.com/generally-ai-podcast/ Follow InfoQ: - Mastodon: https://techhub.social/@infoq - X: https://x.com/InfoQ?from=@ - LinkedIn: https://www.linkedin.com/company/infoq/ - Facebook: https://www.facebook.com/InfoQdotcom# - Instagram: https://www.instagram.com/infoqdotcom/?hl=en - Youtube: https://www.youtube.com/infoq - Bluesky: https://bsky.app/profile/infoq.com Write for InfoQ: Learn and share the changes and innovations in professional software development. - Join a community of experts. - Increase your visibility. - Grow your career. https://www.infoq.com/write-for-infoq
The DuckLake Lakehouse Format // MLOps Podcast #339 with Hannes Mühleisen, Co-founder and CEO of DuckDB Labs.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Abstract
Managing data on object stores has been a painful affair. Users had to choose between data-swamp chaos or a maze of metadata files with catalog servers on top. DuckLake is a new paradigm for managing data on object stores: first, it uses classical SQL data management systems to manage metadata; second, actual data is stored in Parquet files on pretty arbitrary storage; third, query processing is done client-side, or anywhere really. DuckDB is the first system to integrate with DuckLake, using an extension of the same name. Conceptually, DuckLake enables central control over truth while decentralizing compute and storage entirely. DuckLake turns data warehouse architecture upside down by departing from the integrated metadata/compute layer towards a fully disconnected operation with only centralized metadata. For the first time, DuckLake allows a "multi-player" experience with DuckDB, where computation stays fully local but transactional control is centralized.
// Bio
Hannes Mühleisen
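The central-metadata/decentralized-storage split described in the abstract can be illustrated with a toy sketch. This is an assumption-laden stand-in, not DuckLake's actual implementation: SQLite plays the central metadata database, a temp directory plays the object store, and an empty file stands in for a Parquet write. The point is the architecture: commits go through one small SQL catalog, while data files live anywhere and are read wherever compute runs.

```python
import os
import sqlite3
import tempfile

# Stand-in for the central metadata database (DuckLake uses a real SQL DB).
catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE snapshots (id INTEGER PRIMARY KEY, note TEXT)")
catalog.execute(
    "CREATE TABLE data_files (snapshot_id INTEGER, path TEXT, row_count INTEGER)"
)

# Stand-in for an object store holding Parquet files.
storage = tempfile.mkdtemp()
path = os.path.join(storage, "part-000.parquet")
open(path, "wb").close()  # placeholder for an actual Parquet write

# "Committing" data: record the new file's location transactionally in the
# catalog. Truth is centralized here, even though the bytes live elsewhere.
catalog.execute("INSERT INTO snapshots (id, note) VALUES (1, 'initial load')")
catalog.execute("INSERT INTO data_files VALUES (1, ?, 1000)", (path,))
catalog.commit()

# Any client plans a query by consulting only the small metadata DB; the
# heavy data reads then happen locally, wherever that client runs.
files = catalog.execute(
    "SELECT path, row_count FROM data_files WHERE snapshot_id = 1"
).fetchall()
assert files == [(path, 1000)]
```

This is the "multi-player" idea in miniature: many local readers, one transactional source of truth.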
This episode is packed with big-picture energy talk and some seriously nerdy (but fun) data breakdowns. John Kalfayan from Collide and Chuck start with what's really happening in oil and gas today before shifting into the challenges of putting AI to work in the field. From there, things get deep: contract dedications, what RAG actually means, how data chunking works, and the never-ending battle with duplicate info. We also weigh the costs of storage, querying, and running models, plus the tradeoffs between RAG and foundational models. If you've ever wondered about vector databases, data strategy, or just why we have a rant about sand, it's all here. By the end, we hit on the human side too: education, privacy, and making sure the right people can access the right data.
Click here to watch a video of this episode.
Join the conversation shaping the future of energy. Collide is the community where oil & gas professionals connect, share insights, and solve real-world problems together. No noise. No fluff. Just the discussions that move our industry forward. Apply today at collide.io
Click here to view the episode transcript.
00:00 - Intro
01:51 - Oil and Gas Industry Insights
06:34 - AI Deployment Challenges
09:12 - Contract Dedications Explained
10:32 - Understanding RAG
12:52 - What is RAG in Data Management
13:43 - Data Chunking Techniques
17:17 - Cost Considerations in Data
18:03 - RAG vs Foundational Models
19:21 - Vectorized Databases Overview
23:47 - Managing Duplicate Data
26:28 - Data Strategy Considerations
28:24 - Sand Rant
31:32 - Identifying Gaps in Data
33:10 - The Cost of Storage
33:56 - Effective Data Querying
35:50 - AI Education and Awareness
37:53 - Privacy Concerns with Language Models
40:54 - Data Access and Availability
https://twitter.com/collide_io
https://www.tiktok.com/@collide.io
https://www.facebook.com/collide.io
https://www.instagram.com/collide.io
https://www.youtube.com/@collide_io
https://bsky.app/profile/digitalwildcatters.bsky.social
https://www.linkedin.com/company/collide-digital-wildcatters
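The data-chunking discussion (13:43) can be made concrete with a minimal sketch. The chunk size, overlap, and sample sentence below are illustrative choices, not anything prescribed in the episode; the mechanism is the common one of fixed-size windows with overlap, so a fact that straddles a chunk boundary still appears intact in at least one chunk before embedding.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap, so content
    crossing a boundary survives whole in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Hypothetical oilfield note, chosen only to illustrate the windowing.
doc = "Well 42 produced 1,800 bbl/day in June; output fell 12% after sand issues."
chunks = chunk(doc, size=40, overlap=10)

# Every chunk fits the size budget for the embedding model.
assert all(len(c) <= 40 for c in chunks)
# Overlap: the tail of each chunk reappears at the head of the next.
assert chunks[0][-10:] == chunks[1][:10]
```

Real pipelines usually split on tokens or sentences rather than characters, but the size/overlap trade-off, and its cost implications for storage and querying, is the same.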
Modernizing Search Infrastructure: How Instacart Transitioned from Elasticsearch to PostgreSQL for Enhanced Performance and Simplicity. In this episode of The Data Engineering Show, host Benjamin Wagner speaks with Ankit Mittal, former senior engineer at Instacart, about the company's innovative approach to modernizing their search infrastructure by transitioning from Elasticsearch to PostgreSQL for single-retailer search functionality.
In this episode of Stories from the Hackery, we talk with Nashville tech leader and hiring manager Jason Turan about one of tech's most in-demand fields: data engineering. Jason, a long-time friend of NSS, was one of the first people to tell us that Nashville needed more data engineers. He shares his perspective on what a data engineer does, describing the role as the "connective tissue between data producers and data consumers". Listen in to hear us discuss:
- Why data engineers are essential for flipping the 80/20 rule, allowing data scientists and analysts to spend less time cleaning data and more time finding insights.
- How the rise of generative AI has acted as an "accelerant," increasing the need for high-quality data and the professionals who can provide it.
- Actionable advice for getting started in the field, including the importance of focusing on a "T-shaped skillset" with SQL at its core.
- Why Jason's number one piece of advice is to be curious, experiment, and "go out and do the thing".
01:20 Meet Jason Turan: His Tech Origin Story
03:04 Jason's History with NSS and Hiring Grads
07:28 Defining Data Engineering: The "Connective Tissue" of Tech
11:15 Why Nashville is a Hub for Data Engineers
13:56 Healthcare's Impact on Nashville's Data Jobs
20:35 How GenAI Accelerates the Need for Data Engineers
31:33 Getting Started: Lower Barriers to Entry
39:03 A Top Use Case for AI: Understanding Your Codebase
52:21 Misconceptions & the "T-Shaped Skillset"
55:29 The Value of Hands-On Learning: "Go Do the Thing"
58:52 Lightning Round: Favorite Tech Tools
01:00:32 Lightning Round: Top Reads & Resources
Links:
- Metabase: https://www.metabase.com/
- DuckDB: https://duckdb.org/
- MotherDuck: https://motherduck.com/
- Ralph Kimball: The Data Warehouse Toolkit: https://www.amazon.com/gp/product/1118530802
- Bill Inmon: Building the Data Warehouse: https://www.amazon.com/Building-Data-Warehouse-W-Inmon/dp/0764599445
- Edward Tufte: The Visual Display of Quantitative Information: https://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142
- Brendan Keeler: The Health API Guy: https://healthapiguy.substack.com/
- TLDR Newsletter: https://tldr.tech/
- Nashville Technology Council (NTC): https://technologycouncil.com/
This is episode 304 recorded on September 4th, 2025, where John & Jason talk the Microsoft Fabric August 2025 Feature Summary including a new Flat list view in Deployment pipelines, Bursting controls for Data Engineering workloads, new test capabilities for User Data Functions, the ability to serve real-time predictions with ML model endpoints, several updates to Data Warehouse, Database tree in edit tile and AzMon data sources for RTI, the ability to use Python Notebooks to read/write to Fabric SQL Databases, Auto table creation on destination in copy job in Data Factory, and much, much more. For show notes please visit www.bifocal.show
Step behind the scenes of the latest data engineering meetup in Brasília and discover the trends shaping the field's future. In this episode, Vitor Ramos talks with Wesley Outeiro and other participants to share the main insights and lessons from the in-person event, organized by Engenharia de Dados Academy with Luan Moreno as speaker. A candid conversation about the importance of in-person interaction, the evolution of the data community, and the impact of Artificial Intelligence on practitioners' daily work.
What you'll learn in this episode:
- The importance of networking and community for personal and professional growth in the data field.
- How in-person interaction at events amplifies learning and collaboration.
- The main trends in data and AI that are creating new opportunities and challenges in the market.
- Why mastering the fundamentals is more crucial than ever for success in data engineering.
- The relevance of FinOps for efficient cloud cost management in data projects.
- Reflections on how event dynamics and knowledge sharing are evolving.
- The power of connecting with industry leaders to inspire and motivate your career.
Luan Moreno = https://www.linkedin.com/in/luanmoreno/
Join us in a discussion with Richie Cotton, Senior Data Evangelist at DataCamp and host of the DataFramed podcast, for a special crossover episode exploring the hottest topics in data science, analytics, and artificial intelligence! Don't miss the full video of this conversation on YouTube! Discover whether AI will reduce the need to learn coding, the real-world applications of AI agents, and what it truly means to be an "AI-first" company. Get expert insights, practical advice for building a career in data and AI, and learn how to stay ahead in a rapidly changing tech landscape.
Panelists:
- Richie Cotton, Senior Data Evangelist @ DataCamp - LinkedIn, X
- Megan Bowers, Sr. Content Manager @ Alteryx - @MeganBowers, LinkedIn
Show notes:
- DataFramed podcast
- DataCamp Skill Track: Alteryx Fundamentals
- Gartner Agentic AI article
- WSJ Agentic AI article
Interested in sharing your feedback with the Alter Everything team? Take our feedback survey here!
This episode was produced by Megan Bowers, Mike Cusic, and Matt Rotundo. Special thanks to Andy Uttley for the theme music.
This week on The Data Stack Show, John and Matt bring you another edition of the Cynical Data Guy. John and Matt dive into the evolution of customer data infrastructure, the growing influence of low-code tools like Clay, and the blurred lines around the "engineer" title in modern data roles. They also discuss the trade-offs between SaaS adoption and building custom solutions, the pitfalls of enterprise software buying, and the realities of platform lock-in, using Palantir's unique business model as a case study. Key takeaways include the importance of simplicity and scalability in data engineering, the need for clear requirements when evaluating tools, and a healthy skepticism toward sales pitches and "art of the possible" features. Don't miss this month's Cynical Data Guy.
Highlights from this week's conversation include:
- Reacting to the Rise of the GTM Engineer (1:11)
- Is "Engineer" the Right Term? (4:49)
- Low-Code Tools, AI, and Future Workflows (7:14)
- Simplicity in Data Engineering (14:38)
- The Pitfalls of "Simple" Solutions (15:18)
- Choosing SaaS vs. Building In-House (18:26)
- Business Process Abstraction and SaaS Adoption (21:31)
- Enterprise Software: Art of the Possible vs. Practicality (24:31)
- Sales Advice: Focus on Customer Needs (27:11)
- Forward Deployed Engineers and Delivery Models (29:05)
- Platform Lock-In: When Is It a Dirty Word? (36:41)
- Legacy Systems and the Reality of Lock-In (39:53)
- Final Thoughts and Takeaways (40:55)
The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences.
Each week, we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
In this bonus episode, John and Matt preview the next installment of the Cynical Data Guy. The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences. Each week, we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Matt Housley joins me to chat about whether it matters that AI is "PhD-level," clanker content (the new term for AI slop), a retrospective on Fundamentals of Data Engineering, and much more.
In this episode, we talk with Orell about his journey from electrical engineering to freelancing in data engineering, exploring lessons from startup life, working with messy industrial data, the realities of freelancing, and how to stay up to date with new tools.
Topics covered:
- Why Orell left a PhD and a simulation-focused start-up after Covid hit
- What he learned trying (and failing) to commercialise medical-imaging simulations
- The first freelance project and the long, quiet months that followed
- How he now finds clients, keeps projects small, and delivers value quickly
- Typical work he does for industrial companies: parsing messy machine logs, building simple pipelines, adding structure later
- Favorite everyday tools (Python, DuckDB, a bit of C++) and the habit of blocking time for learning
- Advice for anyone thinking about freelancing: cash runway, networking, and focusing on problems rather than "perfect" tech choices
A practical conversation for listeners who are curious about moving from research or permanent roles into freelance data engineering.
The Impact of Generative AI on the Present and Future of Data. Get ready for a top-tier conversation about how Generative AI is transforming the world of data, companies, and careers. In this episode, Luan Moreno welcomes Eduardo Ordax, Generative AI Lead at AWS, and Mateus Oliveira for a no-nonsense discussion of AI's real impact on the market.
What you'll learn in this episode:
- How Generative AI is changing the way we build data pipelines, products, and solutions.
- The main challenges companies face when implementing GenAI, and why technology is no longer the problem; people and data are.
- The role of Data Engineering in the AI world and how it connects to concepts like LLMOps, Fine-Tuning, Prompt Engineering, and Data-Centric AI.
- Why mastering the fundamentals has never been more important for anyone working (or wanting to work) with data and AI.
- Reflections on the future of careers in data and AI: will data engineers, data scientists, and developers be replaced, or will they play an even more relevant role?
- The difference between playing with AI in ChatGPT and using AI to solve real business problems, at scale and in production.
This is more than a chat about AI: it's a full immersion in the challenges, opportunities, and vision of the future for anyone working with data, engineering, machine learning, and artificial intelligence.
Luan Moreno = https://www.linkedin.com/in/luanmoreno/