The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the idea of whether Apache Iceberg and Delta Lake could merge. Follow my blog: https://medium.alexmerced.blog
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses what he looks forward to in 2025 in the Data Lakehouse Space. Alex Merced Event Listings: https://lu.ma/Lakehouselinkups Alex on Bluesky: https://bsky.app/profile/alextalksdatalakehouses.fyi Alex on Twitter: https://x.com/AMdatalakehouse Alex on LinkedIn: https://www.linkedin.com/in/alexmerced/
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses his experience at AWS re:Invent. Follow Alex at AlexMered.com/data
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses why catalogs are so important in data:
Alex Merced (@AMdatalakehouse, Senior Tech Evangelist, @dremio) talks about everything data and we dig deep into Apache Iceberg and Data Lakehouses.
SHOW: 865
Want to go to All Things Open in Raleigh for FREE? (Oct 27th-29th) We are offering 5 Free passes, first come, first serve for the Cloudcast Community -> Registration Link
Instructions: Click reg link; Click “Get Tickets”; Choose ticket option; Proceed with registration (discount will automatically be applied, cost will be $0)
SHOW TRANSCRIPT: The Cloudcast #865 Transcript
SHOW VIDEO: https://youtube.com/@TheCloudcastNET
CLOUD NEWS OF THE WEEK: - http://bit.ly/cloudcast-cnotw
NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST: - "CLOUDCAST BASICS"
SHOW NOTES: Dremio (homepage); Hands-on with Apache Iceberg Tutorial; Apache Iceberg Crash Course; Data Lakehouses and Apache Hudi (Cloudcast Eps. 694); Apache Iceberg, the Definitive Guide (eBook); Apache Iceberg (homepage); Iceberg + Nessie Catalog (homepage); Iceberg + Polaris Catalog (homepage); AlexMerced.com; DataLakehouseHub.com
Topic 1 - Welcome to the show. Tell us a little bit about your background.
Topic 2 - It's been a little while since we talked about Data Lakehouses, can you give us a little bit of background on this space, and what the most recent dynamics are around these technologies.
Topic 3 - What are the typical integrations with a Data Lakehouse? How are users/developers typically interacting with Data Lakehouse technologies? [The marketplace for Iceberg catalogs like Nessie and Polaris]
Topic 4 - How does an open data format like Apache Iceberg fit into the bigger picture of data lakehouses, or large scale stores of data?
Topic 5 - How does Dremio enable Iceberg? How does Dremio sit in the intersection of Data Lakehouse, Data Mesh and Data Virtualization trends all of which come from the same fundamental problem, the growing scale of data use cases.
Topic 6 - We've seen companies start to rethink their data in the cloud strategies. Are you seeing on-premises making a comeback for large data applications?
FEEDBACK? Email: show at the cloudcast dot net; Twitter: @cloudcastpod; Instagram: @cloudcastpod; TikTok: @cloudcastpod
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the news and announcements from dbt Coalesce 2024. Announcements Alex didn’t mention: – dbt Apache Iceberg support, which is done by working with Iceberg-supporting query engines like Dremio – Health tiles with more information on your dashboard about the health of your models – Auto-exposures in Tableau triggering BI Dashboard updates when models […]
What makes data lakehouses a game changer in modern data management? In this episode, Bill sits down with Alex Merced, Senior Tech Evangelist at Dremio, to explore the evolution of data lakehouses and their role in bridging the gap between data lakes and data warehouses. Alex breaks down the components of data lakehouses and dives into the rise of Apache Iceberg.
Key Quotes:
“I love just get really deep into technology, really see what it does. And then scream at the rooftops how cool it is. And basically that was my charter. And [Apache] Iceberg, the more I learned about it, the more I realized this is really interesting.”
“Interoperability and data. Basically, a lot of the things that kept data in silos is now breaking apart.”
“So here we're talking about something that's going to be a standard. And that's when I think of the highest levels of openness matter because if it's something that a whole industry is going to build on, it should be something that the whole industry has to say in its evolution…And that's the beauty of openness that it does create these nice sort of places where we can collaborate and compete together.”
Timestamps:
(01:32) How Alex got started in his career
(03:54) Breaking down data lakehouses
(07:08) The idea behind an open data lakehouse
(10:10) Alex's involvement with Apache Iceberg
(15:13) Key components of a data lakehouse
(23:41) The growth of Apache Iceberg
(32:07) Dremio's Apache Iceberg crash course
(38:43) Explaining self-service analytics
Sponsor: Over the Edge is brought to you by Dell Technologies to unlock the potential of your infrastructure with edge solutions. From hardware and software to data and operations, across your entire multi-cloud environment, we're here to help you simplify your edge so you can generate more value. Learn more by visiting dell.com/edge for more information or click on the link in the show notes.
Credits: Over the Edge is hosted by Bill Pfeifer, and was created by Matt Trifiro and Ian Faison. Executive producers are Matt Trifiro, Ian Faison, Jon Libbey and Kyle Rusca. The show producer is Erin Stenhouse. The audio engineer is Brian Thomas. Additional production support from Elisabeth Plutko.
Links: Follow Bill on LinkedIn; Follow Alex on LinkedIn
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses where interoperability tools like Apache XTable and UniForm
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the difference between Apache Iceberg Catalog and Enterprise Data Catalogs to help clarify the discussions around catalogs in today’s data trends. Follow Alex -> https://bio.alexmerced.com/data
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses some of the Databricks announcements at the Data + AI Summit. Follow Alex by visiting https://bio.alexmerced.com/data
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the value of open source Apache Iceberg catalogs in creating a truly open lakehouse environment without vendor lock-in. Check out my article on the subject: https://open.substack.com/pub/amdatalakehouse/p/open-source-table-format-open-source?r=h4f8p&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true Follow me on Twitter at @amdatalakehouse
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses some of the major differences in how Apache Iceberg and Delta Lake work that lead to: Follow me on social https://bio.alexmerced.com/data
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the mistakes that make Snowflake bills get so large. Hands-On Lakehouse Laptop Exercises: – MongoDB with Dremio: https://bit.ly/am-mongodb-dashboard – SQLServer with Dremio: https://bit.ly/am-sqlserver-dashboard – Postgres with Dremio: https://bit.ly/am-postgres-to-dashboard https://bio.alexmerced.com/data
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the benefits of Apache Iceberg’s open data ecosystem! Build a Data Lakehouse on Your Laptop. Deploy into Production.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
In this episode, Alex Merced introduces his new podcast “Catalogs, Manifests, and Metadata. Oh my!” covering open-source data projects like Apache Iceberg and others. Make sure to subscribe; the podcast will show up in podcast directories within a week or so of this episode's publication. Follow Alex Merced, find all links […]
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses many of the open source projects aiming to reduce friction in the heavily fragmented data world. Follow me on Socials: https://bio.alexmerced.com/data
Summary
Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond that simple utility. In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free!
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) and use code dataengpod20 to register today!
Your host is Tobias Macey and today I'm interviewing Alex Merced, developer advocate at Dremio and co-author of the upcoming book from O'Reilly, "Apache Iceberg, The Definitive Guide", about Nessie, a git-like versioned catalog for data lakes using Apache Iceberg.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Nessie is and the story behind it?
What are the core problems/complexities that Nessie is designed to solve?
The closest analogue to Nessie that I've seen in the ecosystem is LakeFS. What are the features that would lead someone to choose one or the other for a given use case?
Why would someone choose Nessie over native table-level branching in the Apache Iceberg spec?
How do the versioning capabilities compare to/augment the data versioning in Iceberg?
What are some of the sources of, and challenges in resolving, merge conflicts between table branches?
Can you describe the architecture of Nessie?
How have the design and goals of the project changed since it was first created?
What is involved in integrating Nessie into a given data stack?
For cases where a given query/compute engine doesn't natively support Nessie, what are the options for using it effectively?
How does the inclusion of Nessie in a data lake influence the overall workflow of developing/deploying/evolving processing flows?
What are the most interesting, innovative, or unexpected ways that you have seen Nessie used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working with Nessie?
When is Nessie the wrong choice?
What have you heard is planned for the future of Nessie?
Contact Info
LinkedIn (https://www.linkedin.com/in/alexmerced)
Twitter (https://www.twitter.com/amdatalakehouse)
Alex's Article on Dremio's Blog (https://www.dremio.com/authors/alex-merced/)
Alex's Substack (https://amdatalakehouse.substack.com/)
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
Links
Project Nessie (https://projectnessie.org/)
Article: What is Nessie, Catalog Versioning and Git-for-Data? (https://www.dremio.com/blog/what-is-nessie-catalog-versioning-and-git-for-data/)
Article: What is Lakehouse Management?: Git-for-Data, Automated Apache Iceberg Table Maintenance and more (https://www.dremio.com/blog/what-is-lakehouse-management-git-for-data-automated-apache-iceberg-table-maintenance-and-more/)
Free Early Release Copy of "Apache Iceberg: The Definitive Guide" (https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html)
Iceberg (https://iceberg.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/)
Arrow (https://arrow.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/voltron-data-apache-arrow-episode-346/)
Data Lakehouse (https://www.forbes.com/sites/bernardmarr/2022/01/18/what-is-a-data-lakehouse-a-super-simple-explanation-for-anyone/?sh=6cc46c8c6088)
LakeFS (https://lakefs.io/) Podcast Episode (https://www.dataengineeringpodcast.com/lakefs-data-lake-versioning-episode-157)
AWS Glue (https://aws.amazon.com/glue/)
Tabular (https://tabular.io/) Podcast Episode (https://www.dataengineeringpodcast.com/tabular-iceberg-lakehouse-tables-episode-363)
Trino (https://trino.io/)
Presto (https://prestodb.io/)
Dremio (https://www.dremio.com/) Podcast Episode (https://www.dataengineeringpodcast.com/dremio-with-tomer-shiran-episode-58)
RocksDB (https://rocksdb.org/)
Delta Lake (https://delta.io/) Podcast Episode (https://www.dataengineeringpodcast.com/delta-lake-data-lake-episode-85/)
Hive Metastore (https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore)
PyIceberg (https://py.iceberg.apache.org/)
Optimistic Concurrency Control (https://en.wikipedia.org/wiki/Optimistic_concurrency_control)
The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses how formats like Apache Iceberg, Apache Hudi and Delta Lake work and are implemented in your favorite tools, distinguishing what is the responsibility of the format and what is the responsibility of the engine. Follow Alex on Social, find all links at: https://bio.alexmerced.com/data
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses cloud costs. Alex’s Links: https://bio.alexmerced/data
Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis
For many organizations today, data management comes down to handing over their data to one of the "Big 5" data vendors: Amazon, Microsoft Azure and Google, plus Snowflake and Databricks. But analysts David Vellante and George Gilbert believe that the needs of modern data applications coupled with the evolution of open storage management may lead to the emergence of a "sixth data platform". The sixth data platform hypothesis is that open data formats may enable interoperability, leading the transition away from vertically integrated vendor-controlled platforms towards independent management of data storage and permissions. It's an interesting scenario, and one that would benefit users by forcing vendors to compete for every workload based on the business value delivered, irrespective of lock-in. But how close are we to realizing this? To answer this question, we have to examine open data formats and their interoperability potential across clouds and formats, as well as on the semantics and governance layer. We caught up with Peter Corless and Alex Merced to talk about all of that. Article published on Orchestrate all the Things: https://linkeddataorchestration.com/2024/01/11/data-management-in-2024-open-data-formats-and-a-common-language-for-a-sixth-data-platform/
In this week's episode, Jon sits down with Alex Merced, Developer Advocate at Dremio and YouTuber who produces a wide range of multimedia content to help the next generation of web developers. In this episode, Alex shares his secrets on how he creates impactful developer education content. Join him and Jon as they explore the importance of building a brand as a content creator, unpack the complexities in creating layered tech tutorials, and emphasize the need to embrace constructive criticism in the developer space.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses different techniques to speed up BI Dashboard performance.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced describes what window functions are and how they can be applied to Apache Iceberg metadata tables.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses some of the fallout from Databricks’ UNIFormat announcement, and the innovation the industry needs to unlock the data lakehouse. Follow me on twitter @amdatalakehouse
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses some of the big announcements from this week's conferences. Make sure to check out Gnarly Data Waves on your favorite podcast app.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced explains what Dremio reflections are and how they bring you speed and reduce storage costs while keeping things easy for your end users. Follow Alex on twitter @amdatalakehouse
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses Dremio’s new generative AI Features and the future of Data Lakehouses. Follow Alex on twitter @amdatalakehouse
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced reflects on a recent article from Lauren Balik on the topic of ELT. Here is the article: https://medium.com/@laurengreerbalik/how-fivetran-dbt-actually-fail-3a20083b2506 Lauren’s Twitter: @laurenbalik My Twitter handle: @amdatalakehouse
Moving from an on-prem infrastructure to a frictionless cloud model has become a key goal for many companies. From scalability to improved security, as well as cost savings, moving to the cloud brings a lot of benefits. But what challenges are enterprises facing in their migration, and how can this process be made easier? In this episode of the EM360 Podcast, Analyst Susan Walsh speaks to Alex Merced, Developer Advocate at Dremio, to discuss: on-prem data lakes; moving to a frictionless cloud; how to streamline the migration process.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced helps explain how stats are collected and used when working with Parquet files and Apache Iceberg tables. Follow Alex on twitter @amdatalakehouse
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses what object storage is and the history of file systems. Join the community at datanation.click
Highlights from this week's conversation include:
Alex's background in the data space (2:41)
Comics and Pop Culture Blending with Finance training (5:20)
What is a data lake house? (7:36)
What is Dremio solving for users? (11:21)
Essential components of a data lake house (16:35)
Difference between on-prem and cloud experiences (33:53)
What does it mean to be a developer advocate? (41:31)
Final thoughts and takeaways (49:02)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
In this bonus episode, Eric and Kostas preview their upcoming conversation with Alex Merced of Dremio.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced kicks off the new year with some updates. Follow him on Twitter: @amdatalakehouse and @alexmercedcoder
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the features of PyIceberg 0.2.0.
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced encourages everyone to go to Dremio.com and try out the free Dremio test-drive and gives end of year thoughts.
In this episode we talk about Dremio, the open-source project that describes itself as The Data Lake Engine, a tool for integrating data from a wide variety of data sources. The project, The Data Lake Engine, offers benefits and an architecture integrated with relational databases, columnar stores, indexers, and other types of systems. Today we welcome Alex Merced, Developer Advocate at Dremio and Data Lakehouse Evangelist, who shared his extensive knowledge of the subject with us. Dremio = The Easy and Open Data Lakehouse. Luan Moreno = https://www.linkedin.com/in/luanmoreno/
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses Data Lake migration. Follow Alex on twitter @amdatalakehouse
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced discusses privilege, its social dynamics, and how to talk about it productively. Follow Alex on twitter @loveatarian. Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced shares his opinion on many of the conflicts currently going on within the Libertarian Party and his perspective from the outside looking in. Follow me on twitter @loveatarian. Support the show
Join Alex Merced and Bob Haffner for a discussion about the Open Data Lakehouse concept #data #dataengineering #datalake #datalakehouse Connect with Alex Twitter - @amdatalakehouse Connect with Bob Twitter - @bobhaffner LinkedIn - linkedin.com/in/bobhaffner Show notes The DataNation Podcast Available on iTunes/Spotify/Stitcher The Subsurface Data Lakehouse Community dremio.com/subsurface Dremio dremio.com Follow the podcast on Twitter @EngSideOfData
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced talks about why political, religious and other belief systems play a healthy role in society but can be dangerous when they become overly dogmatic and divisive. Libertarian101.com LearnEconomicsNow.com Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced expresses his thoughts on the state of Libertarian Party Messaging and catering to progressives and conservatives at the National level. Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced opens up about his weight loss journey. Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced discusses Socialism/Communism and why it hasn't worked up till now, with many reflections based on this interview with Steve Keen: https://www.youtube.com/watch?v=1XGiTDWfdpM Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced discusses his opinions. AlexMerced.com twitter: @loveatarian Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced discusses his thoughts on recent Supreme Court decisions. AlexMerced.com Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced discusses pride. Follow Alex on twitter @alexmerced. Support the show
The Alex Merced Cast - Libertarianism, Blockchain and Economics
Alex Merced discusses how to think about inflation and politics during inflation, and how culture wars can be turned into culture discussions for better outcomes. Follow Alex on twitter @alexmerced. Support the show