Alex Merced (@AMdatalakehouse, Senior Tech Evangelist, @dremio) talks about everything data and we dig deep into Apache Iceberg and Data Lakehouses.

SHOW: 865

Want to go to All Things Open in Raleigh for FREE? (Oct 27th-29th) We are offering 5 free passes, first come, first served, for the Cloudcast Community -> Registration Link

Instructions:
- Click reg link
- Click “Get Tickets”
- Choose ticket option
- Proceed with registration (discount will automatically be applied, cost will be $0)

SHOW TRANSCRIPT: The Cloudcast #865 Transcript
SHOW VIDEO: https://youtube.com/@TheCloudcastNET
CLOUD NEWS OF THE WEEK: http://bit.ly/cloudcast-cnotw
NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST: "CLOUDCAST BASICS"

SHOW NOTES:
- Dremio (homepage)
- Hands-on with Apache Iceberg Tutorial
- Apache Iceberg Crash Course
- Data Lakehouses and Apache Hudi (Cloudcast Eps. 694)
- Apache Iceberg, the Definitive Guide (eBook)
- Apache Iceberg (homepage)
- Iceberg + Nessie Catalog (homepage)
- Iceberg + Polaris Catalog (homepage)
- AlexMerced.com
- DataLakehouseHub.com

Topic 1 - Welcome to the show. Tell us a little bit about your background.
Topic 2 - It's been a little while since we talked about Data Lakehouses. Can you give us a little bit of background on this space, and what the most recent dynamics are around these technologies?
Topic 3 - What are the typical integrations with a Data Lakehouse? How are users/developers typically interacting with Data Lakehouse technologies? [The marketplace for Iceberg catalogs like Nessie and Polaris]
Topic 4 - How does an open data format like Apache Iceberg fit into the bigger picture of data lakehouses, or large scale stores of data?
Topic 5 - How does Dremio enable Iceberg? How does Dremio sit at the intersection of the Data Lakehouse, Data Mesh and Data Virtualization trends, all of which come from the same fundamental problem: the growing scale of data use cases?
Topic 6 - We've seen companies start to rethink their data in the cloud strategies. Are you seeing on-premises making a comeback for large data applications?

FEEDBACK?
Email: show at the cloudcast dot net
Twitter: @cloudcastpod
Instagram: @cloudcastpod
TikTok: @cloudcastpod
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the news and announcements from dbt Coalesce 2024. Announcements Alex didn’t mention: – dbt Apache Iceberg support, which is done by working with Iceberg-supporting query engines like Dremio – Health tiles with more information on your dashboard about the health of your models – Auto-exposures in Tableau triggering BI Dashboard updates when models […]
The highly distributed, dispersed and dynamic nature of enterprise data fuels demand for robust data-query engines for analytics and to drive intelligence. In this episode of the Tech Disruptors podcast, Starburst CEO Justin Borgman joins Sunil Rajgopal, senior software analyst at Bloomberg Intelligence, to discuss the shifting landscape for these products. They examine the future of data solutions, the evolving competitive landscape and developers' embrace of open-table formats like Apache Iceberg, with Borgman saying this was “the summer of Iceberg.” The two also talk about Starburst's product journey, competition with Dremio and Snowflake, and corporate IT-spending momentum.
What makes data lakehouses a game changer in modern data management? In this episode, Bill sits down with Alex Merced, Senior Tech Evangelist at Dremio, to explore the evolution of data lakehouses and their role in bridging the gap between data lakes and data warehouses. Alex breaks down the components of data lakehouses and dives into the rise of Apache Iceberg.
---------
Key Quotes:
“I love just get really deep into technology, really see what it does. And then scream at the rooftops how cool it is. And basically that was my charter. And [Apache] Iceberg, the more I learned about it, the more I realized this is really interesting.”
“Interoperability and data. Basically, a lot of the things that kept data in silos is now breaking apart.”
“So here we're talking about something that's going to be a standard. And that's when I think of the highest levels of openness matter because if it's something that a whole industry is going to build on, it should be something that the whole industry has to say in its evolution…And that's the beauty of openness that it does create these nice sort of places where we can collaborate and compete together.”
--------
Timestamps:
(01:32) How Alex got started in his career
(03:54) Breaking down data lakehouses
(07:08) The idea behind an open data lakehouse
(10:10) Alex's involvement with Apache Iceberg
(15:13) Key components of a data lakehouse
(23:41) The growth of Apache Iceberg
(32:07) Dremio's Apache Iceberg crash course
(38:43) Explaining self-service analytics
--------
Sponsor:
Over the Edge is brought to you by Dell Technologies to unlock the potential of your infrastructure with edge solutions. From hardware and software to data and operations, across your entire multi-cloud environment, we're here to help you simplify your edge so you can generate more value. Learn more by visiting dell.com/edge for more information or click on the link in the show notes.
--------
Credits:
Over the Edge is hosted by Bill Pfeifer, and was created by Matt Trifiro and Ian Faison. Executive producers are Matt Trifiro, Ian Faison, Jon Libbey and Kyle Rusca. The show producer is Erin Stenhouse. The audio engineer is Brian Thomas. Additional production support from Elisabeth Plutko.
--------
Links:
Follow Bill on LinkedIn
Follow Alex on LinkedIn
Why should you build a Data Lake? I had a fantastic conversation with Olegs Kosels, Enterprise Data Architect at Jamf, on The Ravit Show. We discussed some key topics around Enterprise Data and AI, focusing on the practical challenges and opportunities. Olegs shared insights on how he built a Data Lake and how Jamf is using Dremio to streamline their data processes. We also discussed the overall data journey at Jamf and Olegs' thoughts on the future of Data and AI. Stay tuned for more such interviews from Big Data London! #data #ai #bigdatalondon2024 #theravitshow
It was great hosting Graham Evans and Julian Schäfer, both Senior Enterprise Architects at Maersk, at Big Data London. We discussed how Maersk is leveraging Dremio to enable self-serve capabilities and open up the central data lake to everyday users. Graham and Julian shared valuable insights into their current projects, focusing on how they are integrating AI into their operations and the importance of training in this transformation. Exciting times ahead as Maersk continues to innovate with data and AI. Stay tuned for more such interviews from Big Data London! #data #ai #bigdatalondon2024 #theravitshow
This interview was recorded for the GOTO Book Club.
http://gotopia.tech/bookclub
Read the full transcription of the interview here

Tomasz Lelek - Senior Staff Software Engineer at Dremio & Co-Author of "Software Mistakes and Tradeoffs"
Mark Rendle - Creator of Visual ReCode with 7 Microsoft MVP Awards & 30+ Years of Experience Building Software

RESOURCES
Tomasz
https://twitter.com/tomekl007
https://www.linkedin.com/in/tomaszlelek
https://github.com/tomekl007
Mark
https://twitter.com/markrendle
https://github.com/markrendle
https://linkedin.com/in/markrendle
Videos
Mark Rendle: https://youtu.be/Y9clBHENy4Q
Jon Skeet: https://youtu.be/1tpyAQZFlZY
Prag. Dave & Prag. Andy: https://youtu.be/taCNjiiusRk
Uri: https://youtu.be/G_CNnWH8Opw

DESCRIPTION
Code performance versus simplicity. Delivery speed versus duplication. Flexibility versus maintainability. Every decision you make in software engineering involves balancing tradeoffs. In Software Mistakes and Tradeoffs, you'll learn from costly mistakes that Tomasz Lelek and Jon Skeet have encountered over their impressive careers. You'll explore real-world scenarios where poor understanding of tradeoffs leads to major problems down the road, so you can pre-empt your own mistakes with a more thoughtful approach to decision-making.
Learn how code duplication impacts the coupling and evolution speed of your systems, and how simple-sounding requirements can have hidden nuances with respect to date and time information. Discover how to efficiently narrow your optimization scope according to 80/20 Pareto principles, and ensure consistency in your distributed systems. You'll soon have built up the kind of knowledge base that only comes from years of experience.*
* Book description: © Manning Publications

RECOMMENDED BOOKS
Tomasz Lelek & Jon Skeet • Software Mistakes & Tradeoffs
Ashley Peacock • Creating Software with Modern Diagramming Techniques
Simon Brown • Software Architecture for Developers Vol. 2
Woods, Erder & Pureur • Continuous Architecture in Practice
Unmesh Joshi • Patterns of Distributed Systems

Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
Tomer Shiran is Founder of Dremio, the data lakehouse platform for self-service analytics and AI based on open source frameworks Apache Arrow, which the Dremio team created, and Apache Iceberg. Dremio has raised over $400M from investors including Norwest, Redpoint, Adams Street, Sapphire, Insight, and Lightspeed. They are currently valued at $2B. In this episode, we dig into Tomer's journey from MapR to Dremio, his initial vision for making the data stack more accessible, their first breakthrough with Apache Arrow and a columnar-format approach, focusing first on project-market fit before monetization, adding support for Apache Iceberg, how they're using AI to improve user experiences & more!
Andrew shares how generative AI is used by academic institutions, why employers and educators need to curb their fear of AI, what we need to consider for using AI responsibly, and the ins and outs of Andrew's podcast, Insight x Design.

Key Points From This Episode:
- Andrew Madson explains what a tech evangelist is and what his role at Dremio entails.
- The ins and outs of Dremio.
- Understanding the pain points that Andrew wanted to alleviate by joining Dremio.
- How Andrew became a tech evangelist, and why he values this role.
- Why all tech roles now require one to upskill and branch out into other areas of expertise.
- The problems that Andrew most commonly faces at work, and how he overcomes them.
- How Dremio uses generative AI, and how the technology is used in academia.
- Why employers and educators need to do more to encourage the use of AI.
- The provenance of training data, and other considerations for the responsible use of AI.
- Learning more about Andrew's new podcast, Insight x Design.

Quotes:
“Once I learned about lakehouses and Apache Iceberg and how you can just do all of your work on top of the data lake itself, it really made my life a lot easier with doing real-time analytics.” - @insightsxdesign [0:04:24]
“Data analysts have always been expected to be technical, but now, given the rise of the amount of data that we're dealing with and the limitations of data engineering teams and their capacity, data analysts are expected to do a lot more data engineering.” - @insightsxdesign [0:07:49]
“Keeping it simple and short is ideal when dealing with AI.” - @insightsxdesign [0:12:58]
“The purpose of higher education isn't to get a piece of paper, it's to learn something and to gain new skills.” - @insightsxdesign [0:17:35]

Links Mentioned in Today's Episode:
- Andrew Madson
- Andrew Madson on LinkedIn
- Andrew Madson on X
- Andrew Madson on Instagram
- Dremio
- Insights x Design
- Apache Iceberg
- ChatGPT
- Perplexity AI
- Gemini
- Anaconda
- Peter Wang on LinkedIn
- How AI Happens
- Sama
In episode 25 we once again have a special guest joining us: Viktor Kessler. He is co-founder of the software startup Hansetag. For the past 15 years he has worked on data warehousing and analytics in a wide range of roles as a software developer and project manager. More recently, Viktor worked as a Solution Architect at well-known tech startups such as MongoDB and Dremio. In this episode you can look forward to an informative conversation between Viktor, Reinis and Michael. The focus is on data products and data contracts. What needs to be considered from a commercial perspective, and what is meant when people talk about a data catalog? These and several other interesting questions are discussed in this episode. Enjoy listening! And as always: feel free to get in touch anytime if you have questions, comments or ideas you would like to share with us. www.tiki-institut.com
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the mistakes that make Snowflake bills so large. Hands-On Lakehouse Laptop Exercises: – MongoDB with Dremio: https://bit.ly/am-mongodb-dashboard – SQLServer with Dremio: https://bit.ly/am-sqlserver-dashboard – Postgres with Dremio: https://bit.ly/am-postgres-to-dashboard https://bio.alexmerced.com/data
Learn all about how a Developer Advocate uses data to solve the world's problems, one programming language at a time. If you think that you have too much of a random background to break into data, you gotta listen to Alex's origin story! He has done basically everything under the sun before officially getting his dream data job with Dremio. Alex went from playing video games, practicing as a rockstar, owning a hobby store, learning just about every programming language out there, and everything in between. Quote: "I'm happy with the story I've had, but I'm always glad to continue making the story more fun" Highlights ✨ - How to level up your skills when you're grounded - Starting several businesses, all through networking - Using content creation as a means for learning Alex already gave his warning, join us for the ride, this episode is jam-packed full of fun! Where to find Alex LinkedIn Substack Twitter Web Blog ****** MERCH!! Grab your nerdnourishment swag Data Career Strategy Program Book a call to learn more Support If you like what you see, consider buying me a broccoli (it fuels my creativity) --- Support this podcast: https://podcasters.spotify.com/pod/show/nerdnourishment/support
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses the benefits of Apache Iceberg’s open data ecosystem! Build a Data Lakehouse on Your Laptop Deploy Deploy into Production
Driving intelligence and monetizing enterprise data depends on the underlying data architecture, query engines and storage resources. In this edition of Bloomberg Intelligence's Tech Disruptors podcast, Dremio cofounder and Chief Product Officer Tomer Shiran joins BI senior software analyst Sunil Rajgopal to discuss the evolving data-management architecture and technologies and how the company is working to curtail bottlenecks and enable business-intelligence analysts and data scientists to query and draw insights at scale and at high speed. They also discuss how the platform differs from and competes with bigger data-management platforms, such as Snowflake and MongoDB, the movement toward Zero ETL, and the potential implications of AI and how that's likely to shape jobs.
Summary
Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond that simple utility. In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git.

Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free!
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) and use code dataengpod20 to register today!
- Your host is Tobias Macey and today I'm interviewing Alex Merced, developer advocate at Dremio and co-author of the upcoming book from O'Reilly, "Apache Iceberg, The Definitive Guide", about Nessie, a git-like versioned catalog for data lakes using Apache Iceberg

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Nessie is and the story behind it?
- What are the core problems/complexities that Nessie is designed to solve?
- The closest analogue to Nessie that I've seen in the ecosystem is LakeFS. What are the features that would lead someone to choose one or the other for a given use case?
- Why would someone choose Nessie over native table-level branching in the Apache Iceberg spec?
- How do the versioning capabilities compare to/augment the data versioning in Iceberg?
- What are some of the sources of, and challenges in resolving, merge conflicts between table branches?
- Can you describe the architecture of Nessie?
- How have the design and goals of the project changed since it was first created?
- What is involved in integrating Nessie into a given data stack?
- For cases where a given query/compute engine doesn't natively support Nessie, what are the options for using it effectively?
- How does the inclusion of Nessie in a data lake influence the overall workflow of developing/deploying/evolving processing flows?
- What are the most interesting, innovative, or unexpected ways that you have seen Nessie used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with Nessie?
- When is Nessie the wrong choice?
- What have you heard is planned for the future of Nessie?

Contact Info
- LinkedIn (https://www.linkedin.com/in/alexmerced)
- Twitter (https://www.twitter.com/amdatalakehouse)
- Alex's Article on Dremio's Blog (https://www.dremio.com/authors/alex-merced/)
- Alex's Substack (https://amdatalakehouse.substack.com/)

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
- Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
Links
- Project Nessie (https://projectnessie.org/)
- Article: What is Nessie, Catalog Versioning and Git-for-Data? (https://www.dremio.com/blog/what-is-nessie-catalog-versioning-and-git-for-data/)
- Article: What is Lakehouse Management?: Git-for-Data, Automated Apache Iceberg Table Maintenance and more (https://www.dremio.com/blog/what-is-lakehouse-management-git-for-data-automated-apache-iceberg-table-maintenance-and-more/)
- Free Early Release Copy of "Apache Iceberg: The Definitive Guide" (https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html)
- Iceberg (https://iceberg.apache.org/), Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/)
- Arrow (https://arrow.apache.org/), Podcast Episode (https://www.dataengineeringpodcast.com/voltron-data-apache-arrow-episode-346/)
- Data Lakehouse (https://www.forbes.com/sites/bernardmarr/2022/01/18/what-is-a-data-lakehouse-a-super-simple-explanation-for-anyone/?sh=6cc46c8c6088)
- LakeFS (https://lakefs.io/), Podcast Episode (https://www.dataengineeringpodcast.com/lakefs-data-lake-versioning-episode-157)
- AWS Glue (https://aws.amazon.com/glue/)
- Tabular (https://tabular.io/), Podcast Episode (https://www.dataengineeringpodcast.com/tabular-iceberg-lakehouse-tables-episode-363)
- Trino (https://trino.io/)
- Presto (https://prestodb.io/)
- Dremio (https://www.dremio.com/), Podcast Episode (https://www.dataengineeringpodcast.com/dremio-with-tomer-shiran-episode-58)
- RocksDB (https://rocksdb.org/)
- Delta Lake (https://delta.io/), Podcast Episode (https://www.dataengineeringpodcast.com/delta-lake-data-lake-episode-85/)
- Hive Metastore (https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore)
- PyIceberg (https://py.iceberg.apache.org/)
- Optimistic Concurrency Control (https://en.wikipedia.org/wiki/Optimistic_concurrency_control)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
In this episode of the Digital Executive, hosted by Brian Thomas at Coruzant Technologies, we delve into the groundbreaking journey of Tomer Shiran, a pioneer in the big data analytics space and a key figure in the evolution of Dremio. Shiran, with a rich background as Dremio's founding CEO and a former VP of product at MapR, discusses the transformative path of Dremio from its inception to becoming a major player in data analytics, serving large enterprise customers and embracing generative AI technologies to enhance user productivity and data accessibility.
Shiran shares insights into Dremio's innovative features, like Text to SQL, which converts natural language queries into SQL code, democratizing data querying for users across varying degrees of data literacy. Additionally, he highlights the significant impact of emerging technologies, such as Apache Iceberg and Apache Arrow, on the data analytics and management sector, emphasizing gen AI's potential to revolutionize the field. This episode provides a deep dive into the challenges and opportunities of integrating AI into data analytics platforms, the importance of self-service data access, and the future trends that will shape the industry.
Ravit Jain had a chat with Alex Merced, Developer Advocate at Dremio, during the Chill Data Summit! They discussed Apache Iceberg, Future of Iceberg, Data Lakehouse and much more! #chilldatasummit #theravitshow
Fredrik has Matt Topol and Lars Wikman over for a deep and wide chat about Apache Arrow and many, many topics in the orbit of the language-independent columnar memory format for flat and hierarchical data. What does that even mean? What is the point? And why does Arrow only feel more and more interesting and useful the more you think about deeply integrating it into your systems? Feeding data to systems fast enough is a problem which is focused on much less than it ought to be. With Arrow you can send data over the network, process it on the CPU (or GPU, for that matter) and send it along to the database. All without parsing, transformation, or copies unless absolutely necessary.

Thank you Cloudnet for sponsoring our VPS! Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at info@kodsnack.se if you want to write longer. We read everything we receive. If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi.

Links: Lars; Matt; Øredev; Matt’s Øredev presentations: State of the Apache Arrow ecosystem: How your project can leverage Arrow! and Leveraging Apache Arrow for ML workflows; Kallbadhuset; Apache Arrow; Lars talks about his Arrow rabbit hole in Regular programming; SIMD/vectorization; Spark; Explorer - builds on Polars; Null bitmap; Zeromq; Airbyte; Arrow flight; Dremio; Arrow flight SQL; Influxdb; Arrow flight RPC; Kafka; Pulsar; Opentelemetry; Arrow IPC format - also known as Feather; ADBC - Arrow database connectivity; ODBC and JDBC; Snowflake; DBT - SQL to SQL; Jinja; Datafusion; Ibis; Substrait; Meta’s Velox engine; Arrow’s project management committee (PMC); Voltron data; Matt’s Arrow book - In-memory analytics with Apache Arrow; Rapids and Cudf; The Theseus engine - accelerator-native distributed compute engine using Arrow; The composable codex; The standards chapter; Dremio; Hugging face; Apache Hop - orchestration data scheduling thing; Directed acyclic graph; UCX - libraries for finding fast routes for data; Infiniband; NUMA; CUDA; GRPC; Foam bananas; Turkish pepper - Tyrkisk peber; Plopp; Marianne

Titles: For me, it started during the speaker’s dinner; Old, dated, and Java; A real nerd snipe; Identical representation in memory; Working on columns; It’s already laid out that way; Pass the memory, as is; Null plus null is null; A wild perk; Arrow into the thing; So many curly brackets you need to store; Arrow straight through; Something data people like to do; So many backends; The SQL string is for people; I’m rude, and he’s polite; Feed the data fast enough; A depressing amount of JSON; Arrow the whole way through; These are the problems in data; Reference the bytes as they are; Boiling down to Arrow; Data lakehouses; Removing inefficiency
Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis
For many organizations today, data management comes down to handing over their data to one of the "Big 5" data vendors: Amazon, Microsoft Azure and Google, plus Snowflake and Databricks. But analysts David Vellante and George Gilbert believe that the needs of modern data applications coupled with the evolution of open storage management may lead to the emergence of a "sixth data platform". The sixth data platform hypothesis is that open data formats may enable interoperability, leading the transition away from vertically integrated vendor-controlled platforms towards independent management of data storage and permissions. It's an interesting scenario, and one that would benefit users by forcing vendors to compete for every workload based on the business value delivered, irrespective of lock-in. But how close are we to realizing this? To answer this question, we have to examine open data formats and their interoperability potential across clouds and formats, as well as on the semantics and governance layer. We caught up with Peter Corless and Alex Merced to talk about all of that. Article published on Orchestrate all the Things: https://linkeddataorchestration.com/2024/01/11/data-management-in-2024-open-data-formats-and-a-common-language-for-a-sixth-data-platform/
I interviewed Nik Acheson, Senior Director of Product Management at Dremio, at Amazon Web Services (AWS) re:Invent. We discussed some of the most pressing topics in the world of data and AI, and I am thrilled to share these insights with you all. Here's a sneak peek of our conversation: 1️⃣ Key Takeaways from re:Invent 2022 2️⃣ Data/AI Predictions for 2024 3️⃣ Bridging Data Engineering and GenAI #data #datascience #aws #awsreinvent2023 #theravitshow
I sat down with Read Maloney, the Chief Marketing Officer at Dremio, at Amazon Web Services (AWS) re:Invent to discuss some developments in the world of data analytics and AI. Key Highlights from our Interview: 1️⃣ The New Gen AI Feature 2️⃣ Benefits for Customers 3️⃣ Focus on Apache Iceberg Read shares his insights into the future of AI in data analytics and how Dremio is leading the charge. #data #datascience #aws #awsreinvent2023 #theravitshow
I had the opportunity to sit down with Sendur Sellakumar, the CEO of Dremio, at the Amazon Web Services (AWS) re:Invent event. We discussed customer insights, the challenges faced by customers, data predictions in 2024, and much more! #data #datascience #aws #awsreinvent2023 #theravitshow
The data lakehouse has been quickly gaining popularity within the data management and analytics space. Combining elements of data lakes and data warehouses, the data lakehouse aims to address the challenges associated with both in a way which helps companies reach their data and business goals. In this episode of the EM360 Podcast, Christina Stathopoulos speaks to Read Maloney, CMO at Dremio, as they discuss:
- The state of the data lakehouse
- How an effective lakehouse strategy can be the key to digital transformation
- How data lakehouses empower business
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
ZeroETL & Virtual Data Marts Presentation: https://www.youtube.com/watch?v=mDwpsg8btto Blog for getting hands-on with Dremio on your laptop: https://www.dremio.com/blog/intro-to-dremio-nessie-and-apache-iceberg-on-your-laptop/
In this week's episode, Jon sits down with Alex Merced, Developer Advocate at Dremio and YouTuber who produces a wide range of multimedia content to help the next generation of web developers. In this episode, Alex shares his secrets on how he creates impactful developer education content. Join him and Jon as they explore the importance of building a brand as a content creator, unpack the complexities in creating layered tech tutorials, and emphasize the need to embrace constructive criticism in the developer space.
I had the pleasure of interviewing Nik Acheson, Senior Director of Product Management, Dremio at Big Data LDN today. We discussed Data Mesh, Dremio's unique features, cost efficiency and much more.
Spotlight on A.P. Moller - Maersk's AI & Data Integration Leaders! I had an insightful chat with Mark Sear, Director AI & Data Integration, and Graham Evans, Senior Enterprise Architect, from Maersk at Big Data LDN. What did we discuss? 1️⃣ AI & Data are reshaping logistics at Maersk, driving efficiency and better customer experiences. 2️⃣ Challenges? Merging AI with a traditional industry like shipping isn't easy, but it's rewarding. 3️⃣ Dremio's role? It's accelerating their data-driven decisions. 4️⃣ Future trends? 5️⃣ Advice for AI enthusiasts? Stay curious and embrace change. Thanks, Mark and Graham, for the gems. I really enjoyed our session. Looking forward to the next one soon!
Had the privilege to chat with Dremio founder Tomer Shiran on The Ravit Show. We discussed Apache Iceberg and its pivotal role in Dremio's ecosystem. Here's a quick rundown:
In episode #17 of KI101 (AI101), TIKI (Technological Institute for Applied Artificial Intelligence) gets a visit from Silicon Valley. Our guest is Tomer Shiran, the founder and CPO of Dremio, a data lakehouse company. Michael and Reinis ask interesting questions, and in an engaging conversation we learn more about Tomer's thoughts on technology. We also get insight into Dremio's strategy as an innovative Silicon Valley technology company. Enjoy listening to this visionary episode. Feel free to contact us if you would like to learn more. This podcast is produced by FM Podmedia - www.fmpodmedia.de
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced explains what Dremio reflections are and how they bring you speed and reduce storage costs while keeping things easy for your end users. Follow Alex on Twitter @amdatalakehouse
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses Dremio’s new generative AI features and the future of Data Lakehouses. Follow Alex on Twitter @amdatalakehouse
In today's episode, Luan Moreno, Mateus Oliveira, and Antony Lucas interviewed Dipankar Mazumdar, currently a Data Advocate at Dremio. Dremio is one of the best-known self-service SQL analytics technologies on the market, unifying the view of data and using the lingua franca of data: SQL. Aligned with Apache Iceberg, Dremio aims to be an Open Data Lakehouse. With Apache Iceberg, you get the following benefits: data compaction; time travel; ACID; hidden partitioning; built for multiple platforms. In this conversation we also covered the following topics: data engineering; Apache Iceberg; Dremio. Learn more about how Dremio and Iceberg together can provide one more Data Lakehouse option, especially for cases where we work with different data processing and exploration platforms. Dipankar Mazumdar = Linkedin https://www.dremio.com/ https://iceberg.apache.org/ Luan Moreno = https://www.linkedin.com/in/luanmoreno/
Moving from an on-prem infrastructure to a frictionless cloud model has become a key goal for many companies. From scalability and improved security to cost savings, moving to the cloud brings many benefits. But what challenges are enterprises facing in their migration, and how can this process be made easier? In this episode of the EM360 Podcast, Analyst Susan Walsh speaks to Alex Merced, Developer Advocate at Dremio, to discuss: on-prem data lakes; moving to a frictionless cloud; how to streamline the migration process.
In this episode of the Future of Application Security, Harshil speaks with Emre Saglam, Head of Security and Compliance at Dremio, a data lakehouse that empowers data engineers and analysts with easy-to-use self-service SQL analytics. They discuss the current state of AppSec, including how to improve security by prioritizing business implications, using frameworks, and having tools "closer to the ground." They also talk about how to structure security teams, how much time you should spend with product teams, what skills are needed for future success, and more. Topics discussed: Emre's career evolution in security, from breaking into mailboxes as a kid growing up in Turkey, to starting a Linux group in the 1990s, to working at places like World Bank and Salesforce before becoming the Head of Security and Compliance at Dremio. The current challenges of Product Security, including the need for bigger companies to create ways to glue together their disconnections, and why security teams need to prioritize overall business implications and impact. How security is improving through the use of frameworks and tools that are "closer to the ground," making security easier to scale. Why security teams should adopt strategies like injecting security across each phase of product development, and why security teams should spend more time with the product team. How to structure security teams in terms of which skills to hire, how much time to dedicate to the product side, how to keep up morale and motivation, and how to align teams to create secure products for customers. How security teams can bring attention to areas where they may need more resources, planning, or prioritization, and why alignment with leadership is key. Why curiosity, questioning intention, being firm, having a Plan B, and good communication are skills that security team members must acquire in order to be successful. 
Why the future of product security will be better correlation, deduplication, and fewer false positives, and how AI will contribute to being able to write better code.
Highlights from this week's conversation include: Alex's background in the data space (2:41); Comics and pop culture blending with finance training (5:20); What is a data lakehouse? (7:36); What is Dremio solving for users? (11:21); Essential components of a data lakehouse (16:35); Difference between on-prem and cloud experiences (33:53); What does it mean to be a developer advocate? (41:31); Final thoughts and takeaways (49:02). The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
In this bonus episode, Eric and Kostas preview their upcoming conversation with Alex Merced of Dremio.
Summary Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. Projects like Apache Iceberg provide a viable alternative in the form of data lakehouses that provide the scalability and flexibility of data lakes, combined with the ease of use and performance of data warehouses. Ryan Blue helped create the Iceberg project, and in this episode he rejoins the show to discuss how it has evolved and what he is doing in his new business Tabular to make it even easier to implement and maintain. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it—it's all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to timextender.com/dataengineering where you can do two things: watch us build a data estate in 15 minutes and start for free today. 
Your host is Tobias Macey and today I'm interviewing Ryan Blue about the evolution and applications of the Iceberg table format and how he is making it more accessible at Tabular Interview Introduction How did you get involved in the area of data management? Can you describe what Iceberg is and its position in the data lake/lakehouse ecosystem? Since it is a fundamentally a specification, how do you manage compatibility and consistency across implementations? What are the notable changes in the Iceberg project and its role in the ecosystem since our last conversation October of 2018? Around the time that Iceberg was first created at Netflix a number of alternative table formats were also being developed. What are the characteristics of Iceberg that lead teams to adopt it for their lakehouse projects? Given the constant evolution of the various table formats it can be difficult to determine an up-to-date comparison of their features, particularly earlier in their development. What are the aspects of this problem space that make it so challenging to establish unbiased and comprehensive comparisons? For someone who wants to manage their data in Iceberg tables, what does the implementation look like? How does that change based on the type of query/processing engine being used? Once a table has been created, what are the capabilities of Iceberg that help to support ongoing use and maintenance? What are the most interesting, innovative, or unexpected ways that you have seen Iceberg used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Iceberg/Tabular? When is Iceberg/Tabular the wrong choice? What do you have planned for the future of Iceberg/Tabular? Contact Info LinkedIn (https://www.linkedin.com/in/rdblue/) rdblue (https://github.com/rdblue) on GitHub Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? 
Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Iceberg (https://iceberg.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/) Hadoop (https://hadoop.apache.org/) Data Lakehouse (https://www.forbes.com/sites/bernardmarr/2022/01/18/what-is-a-data-lakehouse-a-super-simple-explanation-for-anyone/) ACID == Atomic, Consistent, Isolated, Durable (https://en.wikipedia.org/wiki/ACID) Apache Hive (https://hive.apache.org/) Apache Impala (https://impala.apache.org/) Bodo (https://www.bodo.ai/) Podcast Episode (https://www.dataengineeringpodcast.com/bodo-parallel-data-processing-python-episode-223/) StarRocks (https://www.starrocks.io/) Dremio (https://www.dremio.com/) Podcast Episode (https://www.dataengineeringpodcast.com/dremio-open-data-lakehouse-episode-333/) DDL == Data Definition Language (https://en.wikipedia.org/wiki/Data_definition_language) Trino (https://trino.io/) PrestoDB (https://prestodb.io/) Apache Hudi (https://hudi.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/hudi-streaming-data-lake-episode-209/) dbt (https://www.getdbt.com/) Apache Flink (https://flink.apache.org/) 
TileDB (https://tiledb.com/) Podcast Episode (https://www.dataengineeringpodcast.com/tiledb-universal-data-engine-episode-146/) CDC == Change Data Capture (https://en.wikipedia.org/wiki/Change_data_capture) Substrait (https://substrait.io/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
Amanda Robson is a partner at Cowboy Ventures. She works with enterprise companies focusing on software infrastructure companies. She has a passion for open source companies and co-hosts the Open Source Startup Podcast. Before joining Cowboy, she was an early-stage investor at Norwest where she worked with a number of enterprise software companies including 6 River Systems, Fossa, and Dremio. She is the proud founder of Modern Angels - a community and database of 200+ female and non-binary angel investors. She also co-leads the VC Champions program for All Raise, and co-chairs NextGen Partners, an organization that helps up-and-coming investors get the support and access they need to be successful. In this episode, we cover a range of topics including: - How she got into VC - Open source projects - How she invests in open source companies- Software infrastructure market landscape - Investment framework - Trends in AI/ML - The role of AI in software infrastructure --------Where to find Prateek Joshi: Newsletter: https://prateekjoshi.substack.com Website: http://prateekj.com LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 Twitter: https://twitter.com/prateekvjoshi
Summary The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. Unfortunately it has also introduced new overhead to manage the full experience as a single workflow. At the Modern Data Company they created the DataOS platform as a means of driving your full analytics lifecycle through code, while providing automatic knowledge graphs and data discovery. In this episode Srujan Akula explains how the system is implemented and how you can start using it today with your existing data systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Truly leveraging and benefiting from streaming data is hard - the data stack is costly, difficult to use and still has limitations. Materialize breaks down those barriers with a true cloud-native streaming database - not simply a database that connects to streaming systems. With a PostgreSQL-compatible interface, you can now work with real-time data using ANSI SQL including the ability to perform multi-way complex joins, which support stream-to-stream, stream-to-table, table-to-table, and more, all in standard SQL. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today and sign up for early access to get started. If you like what you see and want to help make it better, they're hiring (https://materialize.com/careers/) across all functions! Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. 
Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo (http://www.dataengineeringpodcast.com/montecarlo) to learn more. Data and analytics leaders, 2023 is your year to sharpen your leadership skills, refine your strategies and lead with purpose. Join your peers at Gartner Data & Analytics Summit, March 20 – 22 in Orlando, FL for 3 days of expert guidance, peer networking and collaboration. Listeners can save $375 off standard rates with code GARTNERDA. Go to dataengineeringpodcast.com/gartnerda (https://www.dataengineeringpodcast.com/gartnerda) today to find out more. Your host is Tobias Macey and today I'm interviewing Srujan Akula about DataOS, a pre-integrated and managed data platform built by The Modern Data Company Interview Introduction How did you get involved in the area of data management? Can you describe what your mission at The Modern Data Company is and the story behind it? Your flagship (only?) product is a platform that you're calling DataOS. What is the scope and goal of that platform? Who is the target audience? On your site you refer to the idea of "data as software". What are the principles and ways of thinking that are encompassed by that concept? What are the platform capabilities that are required to make it possible? There are 11 "Key Features" listed on your site for the DataOS. What was your process for identifying the "must have" vs "nice to have" features for launching the platform? Can you describe the technical architecture that powers your DataOS product? 
What are the core principles that you are optimizing for in the design of your platform? How have the design and goals of the system changed or evolved since you started working on DataOS? Can you describe the workflow for the different practitioners and stakeholders working on an installation of DataOS? What are the interfaces and escape hatches that are available for integrating with and extending the operation of the DataOS? What are the features or capabilities that you are expressly choosing not to implement? (e.g. ML pipelines, data sharing, etc.) What are the design elements that you are focused on to make DataOS approachable and understandable by different members of an organization? What are the most interesting, innovative, or unexpected ways that you have seen DataOS used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on DataOS? When is DataOS the wrong choice? What do you have planned for the future of DataOS? Contact Info LinkedIn (https://www.linkedin.com/in/srujanakula/) @srujanakula (https://twitter.com/srujanakula) on Twitter Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. 
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Modern Data Company (https://themoderndatacompany.com/) Alation (https://www.alation.com/) Airbyte (https://airbyte.com/) Podcast Episode (https://www.dataengineeringpodcast.com/airbyte-open-source-data-integration-episode-173/) Fivetran (https://www.fivetran.com/) Podcast Episode (https://www.dataengineeringpodcast.com/fivetran-data-replication-episode-93/) Airflow (https://airflow.apache.org/) Dremio (https://www.dremio.com/) Podcast Episode (https://www.dataengineeringpodcast.com/dremio-with-tomer-shiran-episode-58/) PrestoDB (https://prestodb.io/) GraphQL (https://graphql.org/) Cypher (https://neo4j.com/developer/cypher/) graph query language Gremlin (https://en.wikipedia.org/wiki/Gremlin_(query_language)) graph query language The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced encourages everyone to go to Dremio.com and try out the free Dremio test-drive and gives end of year thoughts.
In this episode, Karim Fanous, VP Engineering @ strongDM, talks about what affects an engineering team's productivity. Key Takeaways: impacts on productivity for the team; context switching between interviewing and development; the importance of having a repeatable and objective process; challenges in the onboarding process for newly hired engineers; understanding the people retention process. ___ About today's guest: Karim Fanous https://www.linkedin.com/in/karimfan/ Link to article: https://karimfanous.substack.com/p/scaling-an-engineering-team-will Karim is the VP of Engineering at strongDM. Before that, he was in similar roles at startups like Qumulo and Dremio. Karim also spent 10 years at Microsoft. ___ Thank you so much for checking out this episode of The Tech Trek, and we would appreciate it if you would take a minute to rate and review us on your favorite podcast player. Want to learn more about us? Head over to https://www.elevano.com Have questions or want to cover specific topics with our future guests? Please message me at https://www.linkedin.com/in/amirbormand (Amir Bormand)
In this episode we talk about Dremio, the open-source project that describes itself as The Data Lake Engine, a tool for integrating data from a wide variety of data sources. The project, The Data Lake Engine, offers benefits and an architecture integrated with relational databases, columnar stores, indexers, and other types of sources. Today we welcome Alex Merced, Developer Advocate at Dremio and Data Lakehouse Evangelist, who shared his extensive knowledge of the subject with us. Dremio = The Easy and Open Data Lakehouse Luan Moreno = https://www.linkedin.com/in/luanmoreno/
This bonus episode features conversations from season 3 of the Open||Source||Data podcast. In this episode, you'll hear from DeVaris Brown, CEO & Co-founder of Meroxa; Tomer Shiran, Founder & CPO of Dremio; and Erica Brescia, Managing Director at Redpoint Ventures. Sam sat down with each guest to discuss how they're making data more programmable by shifting left. You can listen to the full episodes from DeVaris Brown, Tomer Shiran, and Erica Brescia by clicking the links below. ------------------- Episode Timestamps: (00:12): DeVaris Brown (00:42): Tomer Shiran (01:32): Erica Brescia ------------------- Links: Listen to DeVaris' episode; Listen to Tomer's episode; Listen to Erica's episode
The "lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction support of data warehouses. Dremio is one of the companies leading the development of products and services that support the open lakehouse. In this episode Jason Hughes explains what it means for a lakehouse to be "open" and describes the different components that the Dremio team build and contribute to.
Join Alex Merced and Bob Haffner for a discussion about the Open Data Lakehouse concept #data #dataengineering #datalake #datalakehouse Connect with Alex Twitter - @amdatalakehouse Connect with Bob Twitter - @bobhaffner LinkedIn - linkedin.com/in/bobhaffner Show notes The DataNation Podcast Available on iTunes/Spotify/Stitcher The Subsurface Data Lakehouse Community dremio.com/subsurface Dremio dremio.com Follow the podcast on Twitter @EngSideOfData
The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
Alex Merced discusses how Dremio enables companies of all sizes to have big data scalability without having to break the bank so companies can be data oriented earlier on in their lifecycle.
Tomer Shiran is Co-Founder and CPO of Dremio, a Bay Area-based, high-performance, forever-free lakehouse platform that builds on an open data architecture by the creators of Apache Arrow. Prior to founding Dremio, Tomer was an early employee at MapR Technologies where he ran product management. Tomer dives into their plans at Dremio to impact the data industry through better, more scalable platforms which will allow companies to have a data platform they can truly rely on. Listen in to hear more about Tomer's background, how Dremio got started, and their plans for the future. Show Notes: Check out Dremio: https://www.dremio.com/ Learn more from Dremio's Blog: https://www.dremio.com/blog/ Connect with Tomer on LinkedIn: https://www.linkedin.com/in/tshiran/ On tap for today's episode: Cappuccino & Espresso Contact Us: https://www.hashmapinc.com/reach-out
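This is Monica's first interview with a partner at a top-tier overseas enterprise software investment firm, and it is not to be missed, especially for anyone who follows infrastructure software and open source. Our guest, Casber Wang, is an investor at Sapphire Ventures, a long-established Silicon Valley growth-stage fund. He joined Sapphire in 2018 as an associate and became a partner in just four years; that year he was named an Enterprise VC Rising Star Investor by Business Insider. Hello World, who is OnBoard?! Sapphire Ventures is a fund of more than US$10 billion that has focused on enterprise software throughout its nearly 20 years. It has had more than 30 IPOs and nearly 50 M&A exits. Familiar names include Box, docusign, monday, mulesoft, nutanix, sumo logic, jfrog, linkedin, and others. In the interview, Casber also explains the investment philosophy of such a focused fund. Like Monica, Casber focuses on developers and infra. His investments include Auth0, which Okta acquired for US$6.5 billion, as well as Dremio, CircleCI, Tetrate, Thoughtspot, Pendo, and other companies that are doing very well. His in-depth research articles on open source, data infra, MLOps, and more are well-known must-reads in Silicon Valley venture circles. I have put them all in the show notes, so go and read them! The interview runs nearly two hours and covers a wide range of topics: how a top Silicon Valley growth fund evaluates commercial open-source companies, how infra companies choose early customers and how their growth differs from that of the previous generation, and new opportunities in the crowded infra/devtools space. It is rare to have such a candid and in-depth exchange with an investor in the same field. We also look at how the current market affects companies and investment firms, and at what Casber has learned while growing as an investor. Many young practitioners should find it inspiring. There was too much good material to cut, so take your time listening. Enjoy! What we talked about: [02:29] Casber's background, how he got into VC, and how his college startup experience shaped his investing [08:45] What kind of established fund Sapphire Ventures is: investment themes, check size, elite team, investment decisions [14:08] Why would a VC care about NPS (Net Promoter Score)? [16:52] How has Sapphire Ventures stayed focused on enterprise software for 20 years? How does it handle the recent, highly competitive VC market? [22:02] Investment case study: Auth0, 10x revenue in the 3 years after investment, acquired by Okta for US$6.5 billion [27:23] How Auth0's founders brought in senior talent and kept upgrading the organization [32:51] How do investment judgments differ between application SaaS and infrastructure software? [38:53] Why did he start paying attention to open source? [41:27] How do investors evaluate a commercial open-source company? [45:49] How should a company decide whether to open source? In which areas does open source have an advantage? [54:54] How do open-source companies choose early customers? [58:26] Growth paths of infrastructure software companies: how do they compare with companies founded 10 years ago? [64:45] Which other categories in data infra are worth watching? [69:29] Will today's highly fragmented data infra landscape consolidate? [73:40] How to treat industry mapping as a product [81:48] How to separate noise from signal and recognize timing amid an endless stream of new concepts [85:16] How do plunging public markets affect private-market investing? [88:19] How should companies adjust their operating decisions and fundraising pace in the new market normal? [91:38] What changed in moving from investment banking to VC? [95:14] Lessons from growing from Associate to partner in 4 years [100:17] Rapid-fire Q&A!
What we mentioned: Books Casber recommends: Software: An Intimate Portrait of Larry Ellison and Oracle; Andre Agassi's autobiography, Open: An Autobiography. Casber's profile. Sapphire Ventures. Okta's US$6.5 billion acquisition of Auth0. Reference articles: [By Casber] The Future of AI Infrastructure is Becoming Modular: Why Best-of-Breed MLOps Solutions are Taking Off & Top Players to Watch; What is the Open Data Ecosystem and Why it's Here To Stay; 3 Strategies Software Companies Can Borrow from the Open Source Cloud Playbook; Starting from a "mere" $1Bn open-source database IPO: the present and future of open source and infra. Follow Miss M's WeChat official account for more in-depth content on enterprise software in China and the US! M小姐研习录 (ID: MissMStudy) Your likes, comments, and shares are our best encouragement! Please share this with friends who are interested in the topic. If there are topics you would like us to cover or guests you would like us to invite, let us know in the comments! Disclaimer: the views expressed on the show are the personal views of the guests and host; they do not represent the views of their institutions and do not constitute investment advice.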
In this episode, Eric spoke with Nicole Siegal Fuselier, who is VP of Corporate and Revenue Marketing at Dremio. Nicole is known for thought leadership on topics like defining revenue marketing and promoting alignment. She often writes about tech, and much of her work is framed around what it means to be at the forefront of the emerging revenue marketing space. During their discussion, Eric and Nicole discussed the importance of having a holistic view of the sales funnel, and why sales and marketing alignment is more vital than ever. They also nerded out together discussing demand gen and discussed why she considers herself not just a marketer, but a revenue marketer. Check it out! About Eric Stockton, VP of Demand Gen at SharpSpring: With expertise in the areas of internet marketing, eCommerce, lead gen, publishing, and online media, Eric has directly led $3MM+ ad budgets and $70M+ top-line sales organizations. Connect with Eric: https://www.linkedin.com/in/ericstockton About Nicole Siegal Fuselier, VP of Corporate and Revenue Marketing at Dremio: Nicole is a results-driven senior technology marketing professional with a reputation for developing, managing and implementing global marketing strategies, streamlining operations and executing marketing programs that directly influence revenue. At Dremio, she's a creative leader who approaches market challenges and opportunities using analytics and innovation. Connect with Nicole: https://www.linkedin.com/in/nsiegalfuselier/ For more information & to connect with us: Visit our website: www.sharpspring.com Have a question? Reach us at https://sharpspring.com/contact-us/ Follow us on LinkedIn: www.linkedin.com/company/sharpspring Watch video versions of our podcast on YouTube or in the “Resources” section at https://www.sharpspring.com. Subscribe for more: https://www.youtube.com/c/Sharpspring
We had heard for a while now about Apache Arrow and Arrow Flight as being a high-performing database with access speeds to match, and finally got a chance to hear what it was all about with James Duong, Co-Founder of Bit Quill Technologies/Senior Staff Developer at Dremio, and David Li (@lidavidm), Apache PMC and software … Continue reading "130: GreyBeards talk high-speed database access using Apache Arrow Flight, with James Duong and David Li"
This episode features an interview with Tomer Shiran, Founder and Chief Product Officer at Dremio. Dremio is a high-performance SQL lakehouse platform that helps companies get more from their data in the fastest way possible. Prior to Dremio, Tomer served as VP of Product at MapR and also held product management and engineering roles at Microsoft and IBM Research. He also has a master's degree from Carnegie Mellon University as well as a bachelor's from Technion - Israel Institute of Technology. In this episode, Tomer and Sam dive into the economics of storing data, how to build an open architecture, and what exactly a data lakehouse is. ------------------- “I think in the world of data lakes and lakehouses, the model has shifted upside down. Now, instead of bringing the data into the engines, you're actually bringing the engines to the data. So you have this open data tier built on open source technology. The data is represented in open source formats and stored in the company's S3 account or Azure storage account. And then you can use a variety of engines. We at Dremio, we take pride in building the best SQL engine to use on the data. There are different streaming engines, like Spark and Flink. There are different batch processing and machine learning engines. Spark is an example of that as well that companies can use on that same data.
And I think that's one of the really important things from a cost standpoint, too, is that this really lowers your overall costs, both today and also in the future as you scale.” – Tomer Shiran ------------------- Episode Timestamps: (02:04): What open source data means to Tomer (03:14): Tomer's motivation behind Apache Arrow (06:42): How Tomer solved data accessibility (08:43): The unit economics of storing data (14:31): Tomer's motivations for Iceberg and how it relates to Project Nessie (17:06): What is a data lakehouse? (18:31): What gives Dremio its magic? (23:39): What cloud data architecture will look like in 5 years (27:19): Advice for building an open data architecture ------------------- Links: LinkedIn - Connect with Tomer; LinkedIn - Connect with Dremio; Twitter - Follow Tomer; Twitter - Follow Dremio; Visit Dremio; Get started with Dremio
Subsurface Keynote Video => https://www.youtube.com/watch?v=x2K4VZUL_M4&t=4s Subsurface Product Demo Video => https://www.youtube.com/watch?v=9qk8hHC1G68
In today's pod-newsletter: 1. Robert Malone. 2. Fewer measures. 3. PPIs. 4. The market. 5. Data Lake at Dremio. 6. Paris Hilton and Elon Musk. You can read the podcast, check the links to the news discussed, and subscribe to the newsletter at this link: https://nofinancieros.substack.com/p/for-the-sake-of-the-world Other listening links at: https://finpickstonks.carrd.co/ More at: https://nofinancieros.com/ Did you enjoy the podcast? Treat us to something, won't you? ;) https://ko-fi.com/nofinancieros
On this episode of Bathtime 2 Boardroom, we are flipping the script and interviewing none other than Colleen Blake, our very own Cohost and Producer! While she may have been expecting an interview with actress Jessica Alba, we instead surprised her with an appearance from her eldest daughter Emily Blake, who is our guest co-host for this episode. On the bathtime side of things, Colleen is a mom to three daughters. When asked why she decided to become a mom, Colleen tells us that she never really questioned being a mother. Instead, she has always seen motherhood as a natural and expected part of her life. Since having kids, Colleen has actively combined elements of what her own parents taught her (modeling hard work, incorporating organization and party planning, to name just a few things) with an effort to be present in her daughters' school and social lives. She explains that she has done this to show her daughters that you can both have a career AND show up for your family's important moments. On the boardroom side, Colleen not only hosts and produces our podcast, but she also is a VP of People at Dremio. With over 20 years of diverse experience across HR, IT, Marketing, Services and Product, Colleen explains that her favorite part of her job is helping others feel valued and heard. She tells us that her motherhood and work roles often overlap in unexpected ways. For example, she brought work into the home when she had Emily create a flow chart for getting ready for kindergarten in the mornings! And she admits to having sometimes felt like a parent towards the employees in her professional life. If you've been dying to learn more about our host, then you won't want to miss this special episode of Bathtime 2 Boardroom. 
Learn how Colleen has used improv lessons to inform her parenting skills, how she balances all of her responsibilities (parenting, professional and everything in between), and why post-it notes are an absolute MUST for keeping her organized and inspired.
Quotes
• “I wanted to make sure I was also present in my children's life at school and outside of school in terms of activities and things while at the same time still showing my girls... that you can also have a career at the same time. So it wasn't like you had to sacrifice one for the other.” (7:54-8:15)
• “Over the past year, one of the blessings of being in the pandemic has been being able to see [my daughter Emily] evolve as a human being and as a member of society and having real adult conversation with her where I'm like, ‘Holy crap, I did that!’... I don't know that we often as parents talk about that enough and really sit back and reflect on how much you really have created another human being to be part of society.” (11:42-12:15)
• “Improv definitely helped with being playful: more playful with the kids and just kind of going with it and then also at work. I think as a leader sometimes you automatically think you know what the right thing is to do, but you really have to listen and hear other people, (1) so they feel acknowledged and (2) maybe they have a really great idea that if you keep going with the story that will turn into something even more powerful and more successful.” (14:16-14:45)
• “Everyone in the world deserves a chance to make a choice, to have equal choice on things and make a decision, whether they want to have kids, whether they want to have a career, you should have a choice. That's why I explore and talk to other people and share stories because I think that's a huge part of making sure you know what your options are. That's what I want for you girls.” (16:00-16:26)
• “If there's any advice I would give about the whole work-life balance thing, it's to raise your hand and ask for help or to raise the white flag and say, ‘I can't do this alone, and I need support.’” (28:30-28:40)
• “My favorite part [about working] is having an impact... and being able to take chaos and get it cleaned up and efficient and so it's running on a good cadence, and then getting the sense that people feel like they're valued. So helping others feel valued in what they do and that you care about their experience – I get excited about that.” (31:28-32:00)
• “Be kind to yourself. It's not just about being kind to others but being kind to yourself and being more forgiving. You're doing better than you think!” (40:46-40:54)
Links
https://colleenablake.com/
https://www.linkedin.com/in/colleenablake/
Podcast production and show notes provided by FIRESIDE Marketing
Software Engineering Radio - The Podcast for Professional Software Developers
Tomer Shiran, co-founder of Dremio, talks about managing data inside a data lake, historical changes and motivations for managing data as a data lake, and the common tools and methods for ingestion, storage, and analytics on top of the underlying data.
Tech Out Loud is the only podcast to bring you impactful blog posts from the biggest names in tech, straight to your ears. Subscribe to Tech Out Loud to listen to articles from the best minds in tech, each week. This week's episode was written by Tomasz Tunguz, Venture Capitalist and Managing Director at Redpoint Ventures. Tomasz also serves as an active board member at ThredUp, Electric Imp, Gremlin, and Dremio, is author of "Winning with Data", and co-founder of Perquimans Systems. In this short but powerful episode, Tomasz proves that the best tips don't have to be 40 minutes long. He succinctly provides the solution to a problem that most companies face: How do you compete and win without bloating your software stack? Listen to the episode now or read the full article: How Many Technologies Can a Company Adopt at Once? "An IT executive recently asked me this question. How many technologies can a company adopt at once, successfully? It’s a question I hadn’t paused to contemplate before that moment. And if you are a vendor, it’s not one that you think about very often either. But if you are a member of the IT team, it’s top of mind every day. There are two answers to this question...." To read the rest of this article, click here: How Many Technologies Can a Company Adopt at Once? Tech Out Loud is brought to you by Process Street, a free way to manage playbooks and processes for your team, and Sound Advice Strategies, a full service for podcast setup, production, marketing, and editing. Subscribe to Tech Out Loud for more great articles, and leave a review to let us know what you think! Process Street was voted the #1 BPM software by GetApp. If you want a full month of Process Street for free, just click this link: https://www.process.st/audio-gift
In this episode, Michael and Sarah talk with Billy Bosworth, CEO of Dremio – a cloud data lake engine that operationalizes data lake storage and speeds up analytics processes. Billy discusses the state of modern data infrastructure, what led to the founding of Dremio, how they’re approaching selling in the cloud data lake market, and the one thing he wishes prospects knew.
You are not getting the most out of your people team to deliver the best service you can to your customers. And Colleen Blake, VP of People at Dremio, is here to help. Most leaders of customer teams are putting together hiring requests because their teams are overwhelmed. “My team is going to burn out if we don’t hire more people. Then we’ll lose them. And our customers.” That’s how most of us hire. That’s the wrong way to hire. First of all, it is reactive. We should want to be proactive. I mean, the first of the seven habits of highly effective people is “Be proactive” for Pete’s sake. It’s the first one. Second, hiring people because our current team is busy exposes that we have a complete lack of understanding of how our business works. How can we run a business if we don’t know how it works? We can’t. So, Colleen Blake is here to help. She offers two ideas for how to partner with our respective people teams to become more proactive, understand our businesses better, and deliver better service to our customers.
Idea #1: If your team is overwhelmed, it might not be a staffing problem
Colleen suggests you might want to question whether your team is working on the right things. You, as a leader, might need to simply give your team permission to prioritize some work and say ‘no’ to other work. Yes, manager. That’s on you. Your sales team might be busy bringing in customers, but are they bringing in the right customers? Your customer success team might be working hard helping customers use your product, but are they helping customers use the right parts of the product? Really examine company priorities and ruthlessly make sure your team is working only on things that progress toward company priorities.
Idea #2: Develop a staffing model
The worst thing we can do is hire people because we are busy.
If we don’t have a model for determining why we need to hire, when we need to hire, and who we need to hire, then we wake up one morning with too much headcount, wonder what happened, and have to lay people off. Colleen suggests starting with the customer and working backwards until you know how many people you have in your talent pipeline. That looks something like this:
1. Understand what it takes to get a customer up and running.
2. Look at your team’s capacity. How many customers can one person deal with successfully?
3. Look at the current sales pipeline to see how many new customers might sign up and when, and how many people you will need on your team and when.
4. Look at how long it takes to get a new employee up and running.
5. Look at your talent pipeline to know how many possible people you could hire and when.
Take answers from each of the above, do a little math, and “voila” you have a staffing model. One that will impress your executive team, and one that will help you partner with your people team better to get you the team you need to help your customers. Start with the customer and work backwards from there.
More about Colleen:
On LinkedIn
Her podcast (with Eric Quick): Bathtime 2 Boardroom
Dremio
Get on the email list at helpingsells.substack.com
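The "little math" behind the staffing model in the Colleen Blake episode notes could look something like the sketch below. This is only an illustration: the function names, the inputs, and every number are hypothetical and are not taken from the episode. It assumes capacity is expressed as customers per team member and that lead times are measured in weeks.

```python
import math


def hires_needed(current_customers, expected_new_customers,
                 customers_per_person, current_team_size):
    """Return how many additional people are needed to serve the projected customers.

    Hypothetical helper: assumes each team member can handle a fixed
    number of customers (customers_per_person).
    """
    projected_customers = current_customers + expected_new_customers
    required_team_size = math.ceil(projected_customers / customers_per_person)
    return max(0, required_team_size - current_team_size)


def weeks_of_slack(weeks_until_customers_arrive, weeks_to_hire, weeks_to_ramp):
    """Return the weeks left before recruiting must start (negative means already late)."""
    return weeks_until_customers_arrive - (weeks_to_hire + weeks_to_ramp)


# Made-up numbers, purely for illustration.
gap = hires_needed(current_customers=40, expected_new_customers=25,
                   customers_per_person=10, current_team_size=5)
slack = weeks_of_slack(weeks_until_customers_arrive=16,
                       weeks_to_hire=6, weeks_to_ramp=8)
print(f"Hires needed: {gap}, weeks before recruiting must start: {slack}")
```

Working backwards from the customer this way turns "we are busy" into two concrete answers: how many people to hire and how soon recruiting has to begin.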
Nate is the founder & CEO of Appsembler, a B2B SaaS company that he founded in 2011, and is now a 100% distributed team hailing from 8 countries. Appsembler helps companies like Redis Labs, Chef Software and Dremio deliver online hands-on technical training at scale. Nate has been heavily involved with several open source communities in his 25+ year tech career, and loves tinkering around with emerging technologies and playing jazz saxophone in his spare time.
In this latest episode of the Designing Enterprise Platforms podcast from Early Adopter Research (EAR), EAR’s Dan Woods focused on how to optimize your BI, data warehouse and data lake spend in the cloud. He was joined today by Jason Nadeau, the VP of marketing from Dremio, who discussed Dremio and the general strategy he’s developed for optimizing this type of spending in the cloud. Their conversation covered:
* 7:00 -- How to take advantage of the cloud
* 13:30 -- How performance can drive efficiency in the cloud
* 26:15 -- Dremio's optimized features
* 36:00 -- Dremio's financial model
Data Futurology - Data Science, Machine Learning and Artificial Intelligence From Industry Leaders
Tomer Shiran is the Co-Founder and CEO of Dremio, the Data-as-a-Service Platform company. Created by veterans of open source and big data technologies, and the creators of Apache Arrow, Dremio is a fundamentally new approach to data analytics that helps companies get more value from their data, faster. Dremio makes data engineering teams more productive and data consumers more self-sufficient. Tomer Shiran previously headed the product management team at MapR and was responsible for product strategy, roadmap, and requirements. Before MapR, Tomer held numerous product management and engineering roles at Microsoft, most recently as the product manager for Microsoft Internet Security & Acceleration Server (now Microsoft Forefront). He is the founder of two websites that have served tens of millions of users and received coverage in prestigious publications such as The New York Times, USA Today, and The Times of London. Tomer is also the author of a 900-page programming book. He holds an MS in Computer Engineering from Carnegie Mellon University and a BS in Computer Science from Technion - Israel Institute of Technology. Enjoy the show!
We speak about:
[01:50] How Tomer started in the data space
[03:35] What was it like running your own business?
[04:50] What did you think would happen with ePassportPhoto?
[07:20] Takeaways from MapR
[09:35] What was the process of starting Dremio?
[10:55] How did you gauge how much product development needed to be done?
[12:20] Where did you start with your hiring process?
[13:00] What have been some of the pivotal moments for Dremio?
[14:35] What does Dremio do?
[16:00] What is the semantic layer?
[20:00] Who are the users?
[23:30] What are the data masking capabilities?
[25:10] How has the journey been for you personally?
[28:35] What challenges are you facing right now?
[30:00] About Tomer’s teams
[31:15] The importance of having a sales team
[33:30] How has Dremio changed with the increase of employees?
[34:30] What does the future look like for Dremio?
[35:00] What do international expansions look like for Dremio?
[35:45] What are you most proud of in your career?
[36:20] Any lessons from your failures?
[37:45] Advice for future entrepreneurs
[40:00] Future challenges in the data space
[41:45] A piece of advice for the listeners
Thank you to our sponsors:
Fyrebox - Make Your Own Quiz!
RMIT Online Master of Data Science Strategy and Leadership: Gain the advanced strategic, leadership and data science capabilities required to influence executive leadership teams and deliver organisation-wide solutions.
We are RUBIX. - one of Australia’s leading pure data consulting companies delivering project outcomes for some of the world’s leading brands.
Visit online.rmit.edu.au for more information.
And as always, we appreciate your Reviews, Follows, Likes, Shares and Ratings. Thank you so much for listening. Enjoy the show!
--- Send in a voice message: https://anchor.fm/datafuturology/message
Dremio Founder and CEO Tomer Shiran calls his company a “cloud data lake engine.” It sounds very sophisticated, but what does that really mean? Basically, it means that his company will connect all your data, whether it’s in Amazon Web Services (AWS), Microsoft Azure, or on-premises (or some combination of all of those), and add services that allow you to leverage it for maximum benefit. Sound cool? It is. Tomer discusses the increasingly important field of Data-as-a-Service with ActualTech Media Partner James Green in this edition of “10 on Tech.” If you’ve got data scattered all over the place, but don’t know how to get to it or how to make it work to help your bottom line, this is one you can’t miss.
Highlights of the show include:
* How the usage of data has evolved over the years
* Data silos and the current landscape of data fragmentation
* How the cloud has revolutionized the ability to manage and process data
* The challenges of moving data to the cloud
* How Dremio helped Royal Caribbean Cruises lasso their data and extract more value from it
* The growing issue of data explosion
* Importance of the ability to scale up and down according to demand
* Dremio’s storage innovations
Resource links from the show:
* Dremio home -- https://www.dremio.com/
* The Dremio free community edition -- https://www.dremio.com/deploy/
* Dremio demo -- https://www.dremio.com/lp/weekly-live-demo/
* Dremio community forum -- https://community.dremio.com/
We hope you enjoy this episode, and don’t forget to subscribe to the show on iTunes, Google Play, or Stitcher.
In this edition of the Designing Enterprise Platforms Podcast at Early Adopter Research (EAR), EAR’s Dan Woods spoke with Tomer Shiran, the CEO and founder of Dremio. Their conversation focused on how to create a data lake for an end user, a topic that anybody who’s worried about BI and analytics should be interested in. This podcast series looks at various ways of understanding technology and how to combine technology to create platforms to solve vital problems for the enterprise. There probably is no more vital problem than how to actually get your data infrastructure correct. Their conversation covered:
* 3:00 - How companies are trying to save the data lake
* 17:00 - The need for open source and fast query speed
* 29:00 - Why an open source project is not a product
When your data lives in multiple locations, belonging to at least as many applications, it is exceedingly difficult to ask complex questions of it. The default way to manage this situation is by crafting pipelines that will extract the data from source systems and load it into a data lake or data warehouse. To make this situation more manageable and allow everyone in the business to gain value from the data, the folks at Dremio built a self-service data platform. In this episode Tomer Shiran, CEO and co-founder of Dremio, explains how it fits into the modern data landscape, how it works under the hood, and how you can start using it today to make your life easier.
Are there really two years' worth of Roaring Elephant podcasts out there? Well, since this is our second anniversary party, it must be! Join some of the guests we had on the podcast this year to reminisce about the months gone by. Due to the drop-in, drop-out nature, this episode is a little rough but we hope you can enjoy being part of our little party! Discussion topics ranged from what our guests have been up to, Apache Kafka, Dremio, the effects of GDPR on the industry, and how our guests see the future of Big Data. Our returning guests today are:
Eduardo Barbaro, Sr. Data Scientist at Mobiquity, Inc – Europe https://www.linkedin.com/in/edbarbaro/
Marcel-Jan Krijgsman, Data Engineer at Open Circle Solutions B.V. https://www.linkedin.com/in/marcel-jankrijgsman/
Youen Chéné, CTO @Saagie https://www.linkedin.com/in/youenchene/
Pitt Fagan, Senior Data Analyst at Zendesk https://www.linkedin.com/in/pittfagan/ Big Data Madison Meetups: https://www.meetup.com/BigDataMadison/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
Tom Tunguz is a Partner @ Redpoint Ventures, where he has invested in the likes of Axial, Dremio, Expensify, Electric Imp, Looker, and ThredUP. Tomasz is also the co-author of Winning with Data: Transform Your Culture, Empower Your People, and Shape the Future, which explores the cultural changes big data brings to business, and shows you how to adapt your organization to leverage data to maximum effect. Before joining Redpoint, Tomasz was the product manager for Google’s AdSense social-media products and AdSense internationalization.
In Today’s Episode You Will Learn:
1.) When did Tom perceive the true power of data for the first time?
2.) What is the biggest difference between a company that is data driven and one that is not? What are the inherent benefits and how can non data driven businesses become data driven?
3.) What are the best data driven teams doing to operationalise their data today?
4.) Adam Grant: 8% of job interviews are productive. So what structure can management use to ensure higher efficiency in the hiring process?
5.) What are the complexities and skills required for strong data analysis in today's environment?
6.) Data often leads to overconfidence in decision making; how do you prevent illusion bias once data has been obtained?
Items Mentioned In Today’s Show:
Tom’s Most Recent Investment: Dremio
As always you can follow Harry, The Twenty Minute VC and Tom on Twitter here! Likewise, you can follow Harry on Snapchat here for mojito madness and all things 20VC.
Eve make 1 perfect mattress – made with 3 layer technology and next generation memory foam. It comes packaged in a beautiful box and arrives the day after you order. You get 100 nights to try it with free return pick-up – it really is the perfect mattress for everyone. Just go online to evemattress.co.uk and enter the code 20VC for £50 off. Everybody deserves the perfect start with Eve.
Cooley are the global law firm built around startups and venture capital.
Since forming the first venture fund in Silicon Valley, Cooley has formed more venture capital funds than any other law firm in the world, with 50+ years working with VCs. They help VCs form and manage funds, make investments and handle the myriad issues that arise through a fund’s lifetime. To learn more about the #1 most active law firm representing VC-backed companies going public, head over to cooley.com and cooleygo.com.