Podcasts about Alluxio

25 podcasts
35 episodes
41m average duration
Infrequent episodes
Latest episode: Jun 16, 2024


Best podcasts about Alluxio

DMRadio Podcast

4 episodes with Alluxio

Software Engineering Radio - The Podcast for Professional Software Developers

2 episodes with Alluxio

Gestalt IT Rundown

2 episodes with Alluxio

Big Data Beard

2 episodes with Alluxio

Trino Community Broadcast

2 episodes with Alluxio

Utilizing AI - The Enterprise AI Podcast

2 episodes with Alluxio

Designing Enterprise Platforms

2 episodes with Alluxio

Latest podcast episodes about Alluxio

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

Jun 16, 2024 · 53:19

Summary

Stripe is a company that relies on data to power their products and business. To support that functionality they have invested in Trino and Iceberg for their analytical workloads. In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.

Announcements

- Hello and welcome to the Data Engineering Podcast, the show about modern data management.
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Your host is Tobias Macey and today I'm interviewing Kevin Liu about his use of Trino and Iceberg for Stripe's data lakehouse.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you describe what role Trino and Iceberg play in Stripe's data architecture?
- What are the ways in which your job responsibilities intersect with Stripe's lakehouse infrastructure?
- What were the requirements and selection criteria that led to the selection of that combination of technologies?
- What are the other systems that feed into and rely on the Trino/Iceberg service?
- What kinds of questions are you answering with table metadata?
- What use case/team does that support?
- Comparative utility of Iceberg REST catalog
- What are the shortcomings of Trino and Iceberg?
- What are the most interesting, innovative, or unexpected ways that you have seen Iceberg/Trino used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Stripe's data infrastructure?
- When is a lakehouse on Trino/Iceberg the wrong choice?
- What do you have planned for the future of Trino and Iceberg at Stripe?

Contact Info

- Substack (https://kevinjqliu.substack.com)
- LinkedIn (https://www.linkedin.com/in/kevinjqliu)

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
- Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links

- Trino (https://trino.io/)
- Iceberg (https://iceberg.apache.org/)
- Stripe (https://stripe.com/)
- Spark (https://spark.apache.org/)
- Redshift (https://aws.amazon.com/redshift/)
- Hive Metastore (https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore)
- Python Iceberg (https://py.iceberg.apache.org/)
- Python Iceberg REST Catalog (https://github.com/kevinjqliu/iceberg-rest-catalog)
- Trino Metadata Table (https://trino.io/docs/current/connector/iceberg.html#metadata-tables)
- Flink (https://flink.apache.org/)
  - Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57)
- Tabular (https://tabular.io/)
  - Podcast Episode (https://www.dataengineeringpodcast.com/tabular-iceberg-lakehouse-tables-episode-363)
- Delta Table (https://delta.io/)
  - Podcast Episode (https://www.dataengineeringpodcast.com/delta-lake-data-lake-episode-85/)
- Databricks Unity Catalog (https://www.databricks.com/product/unity-catalog)
- Starburst (https://www.starburst.io/)
- AWS Athena (https://aws.amazon.com/athena/)
- Kevin Trinofest Presentation (https://trino.io/blog/2023/07/19/trino-fest-2023-stripe.html)
- Alluxio (https://www.alluxio.io/)
  - Podcast Episode (https://www.dataengineeringpodcast.com/alluxio-distributed-storage-episode-70)
- Parquet (https://parquet.incubator.apache.org/)
- Hudi (https://hudi.apache.org/)
- Trino Project Tardigrade (https://trino.io/blog/2022/05/05/tardigrade-launch.html)
- Trino On Ice (https://www.starburst.io/blog/iceberg-table-partitioning/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

spark trusted python doordash comcast hive data driven stripe iceberg hug starburst flink redshift parquet trino hudi tabular freak fandango orchestra apache iceberg alluxio

55: Commander Bun Bun peeks at Peaka

Trino Community Broadcast

Feb 12, 2024 · 57:46

Timestamps:
- 0:00 Intro
- 1:36 Releases 437-438
- 4:12 Introducing Peaka
- 8:07 An overview of Peaka
- 16:02 The engineering of Peaka
- 20:04 Connectors
- 26:51 Peaka demo
- 41:34 Managing catalogs and security
- 51:06 Peaka wrap-up
- 53:14 PR of the episode: Filesystem caching with Alluxio
- 56:16 Outro

pr managing releases commander connectors filesystem peeks bun bun alluxio

ISS Cleveland and the Current Market | Gestalt IT Rundown: October 25, 2023

Gestalt IT Rundown

Oct 25, 2023 · 30:45

We are at the 21st annual Information Security Summit in Cleveland today, and it's been an interesting event for us. Not only is this close to home in Northeast Ohio, the long-running ISS event attracts a wide range of attendees and presenters across the information security space. Ransomware and AI are high on the agenda of course, but we're also hearing about budgets and employment in the industry.

Time Stamps:
0:00 - Welcome to the Rundown
1:45 - Prosimo Cloud Cost 360
3:43 - StorPool Tightens Data Protection in StorPool v21
7:08 - New Architectural Observability Manager from vFunction
10:25 - New Generative AI Platform from Alluxio
12:56 - Lenovo and Nvidia Announce Hybrid AI
14:47 - NetApp Insight Announcements
18:27 - ISS Cleveland and the Current Market
29:19 - The Weeks Ahead
29:58 - Thanks for Watching

Hosts:
Stephen Foskett: https://www.linkedin.com/in/sfoskett/
Leon Adato: https://www.linkedin.com/in/leonadato/

Follow Gestalt IT
Website: https://www.GestaltIT.com/
Twitter: https://www.twitter.com/GestaltIT
LinkedIn: https://www.linkedin.com/company/Gestalt-IT

#Rundown, @Prosimo_io, #Cloud, @StorPool, #Storage, @v_Function, #observability, @Alluxio, #AI, #GenAI, @NVIDIA, @Lenovo, #AI, @NetApp, #Insight, #ISSCLE, #Security, @SFoskett, @LeonAdatao, @GestaltIT

ai security cleveland cloud function nvidia storage iss ransomware gestalt lenovo genai northeast ohio current market netapp alluxio storpool

Industrial Data Pipelines: Fueling the Modern Enterprise

DMRadio Podcast

Aug 31, 2023 · 49:41

Unleash the engine that powers modern enterprises—industrial data pipelines. Join @eric_kavanagh as he explores the backbone of data flow that drives business success. He'll interview legendary Data Analyst Mike Ferguson of Intelligent Business Strategies, along with Sean Knapp of Ascend, and Beinan Wang of Alluxio.

data modern enterprise industrial unleash fueling ascend pipelines alluxio

Fast and Efficient Hybrid Data Access with Alluxio | Episode #68

Great Things with Great Tech!

Jun 29, 2023 · 39:52

Revolutionizing Data Orchestration with Alluxio: Open Source Innovation and Unprecedented Speeds for AI in the Cloud!

In this episode, I'm speaking with Adit Madan, Director of Product Management at Alluxio, a company at the cutting edge of open-source data orchestration technology for analytics and AI in the cloud. Alluxio is bridging the gap between data-driven applications and storage systems, delivering unprecedented data access speeds. We explore Alluxio's unique global namespace and memory-first tiered architecture, their role in the Hadoop ecosystem, and their strategy for addressing the challenges of data orchestration. Alluxio, originating from the UC Berkeley AMPLab, is now an industry-leading technology deployed by hundreds of organizations worldwide, transforming the way data is managed and accessed. Alluxio was founded in 2014 and is headquartered in the San Francisco Bay Area, Silicon Valley.

☑️ Support the Channel by buying a coffee: https://ko-fi.com/gtwgt
☑️ Technology and Technology Partners Mentioned: Alluxio, Data Orchestration, Open Source, Big Data, AI, Cloud, Data Analytics, Storage Systems, Global Namespace, Memory-First Architecture, Data Management, Hadoop, API Translation
☑️ Web: https://www.alluxio.io
☑️ Crunch Base Profile: https://www.crunchbase.com/organization/alluxio
☑️ Interested in being on #GTwGT? Contact via Twitter @GTwGTPodcast or go to https://www.gtwgt.com
☑️ Subscribe to YouTube: https://www.youtube.com/@GTwGTPodcast?sub_confirmation=1
• Web - https://gtwgt.com
• Twitter - https://twitter.com/GTwGTPodcast
• Spotify - https://open.spotify.com/show/5Y1Fgl4DgGpFd5Z4dHulVX
• Apple Podcasts - https://podcasts.apple.com/us/podcast/great-things-with-great-tech-podcast/id1519439787
☑️ Music: https://www.bensound.com

music director ai technology data web cloud hybrid big data efficient san francisco bay area open source product management data analytics data management hadoop alluxio

Why Finding A Mentor Matters

The Talent Tango

Mar 15, 2023 · 11:42

Host Amir Bormand meets with Trish Pandya, the Senior TA Manager at Alluxio. She and Amir discuss the significance of mentors and how forming a meaningful connection with the right mentor can help you grow professionally and personally.

Highlights
3:12 - Trish shares why she began to search for a mentor.
8:29 - What qualifies someone as a mentor?
14:13 - How to know if mentorship is or isn't working for you.
18:48 - Trish offers her insights on becoming a better mentor to her team.

Guest: Trish Pandya is a Talent Acquisition Leader with 10+ years of experience in the San Francisco startup arena as a recruiter for startups. She empowers female, minority, and underrepresented communities in the San Francisco Bay Area, and aspires to do this nationwide. She is currently the Senior Manager of Talent Acquisition at Alluxio, which makes open-source data orchestration software for the cloud for enterprise companies. She aspires to spread the word, with a twisted sense of humor, about her views on talent acquisition/recruiting, thoughts on diversity and unbiased hiring, scaling teams with structure, the importance of mentorship, and of course, the crazy world of working in tech.

LinkedIn: https://www.linkedin.com/in/trishnapandya/

Thank you so much for checking out this episode of The Talent Tango. We would appreciate it if you would take a minute to rate and review us on your favorite podcast player. Want to learn more about us? Head over to https://www.elevano.com. Have questions or want to cover specific topics with our future guests? Please message me at https://www.linkedin.com/in/amirbormand (Amir Bormand)

head talent mentor recruiting human resources recruitment san francisco bay area senior manager recruiters talent acquisition alluxio amir bormand

S1E12 | To Solve a Technical Problem Plaguing the Tech Giants, He Chose to Open-Source His Code

泰度Voice

Jan 18, 2023 · 45:08

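Enterprise digital transformation has become a consensus across industries. But as the separation of storage and compute has become a definite trend in big data architecture, longer data access times and rising storage costs over the past five or six years have become a problem that enterprises cannot ignore as they push digitalization forward. A PhD at UC Berkeley creatively solved this problem in the lab and decided to open the software's code to users for free. To date, seven of the world's ten largest companies by market capitalization use this software, named Alluxio, among them several global tech giants.

What is the story behind Alluxio's birth? How did it creatively solve the pain points of so many tech giants? Why was a technology with commercial potential open-sourced? This episode is the second in 泰度Voice's venture investment series. Liu Cheng (刘诚), an investor at Huatai Innovation, invites Alluxio founder Haoyuan Li (李浩源) to talk about open source communities, open source software, and the digital transformation that modern enterprises are bound to go through.

Speakers
Liu Cheng, Investment Director and Head of Investment Department II, Huatai Innovation
Haoyuan Li, Founder and CEO, Alluxio

Timeline
05:11 Like a highway network, Alluxio builds infrastructure for the computing world
09:45 UC Berkeley's computer science lab and industry regularly "strike sparks"
18:35 Every 5 to 10 years, the storage industry produces a new generation of systems
22:22 Every ten years, 20% of Fortune Global 500 companies disappear
24:49 The three industries furthest along in digital transformation: technology, finance, and telecom
35:24 Open source is a major track in software development
37:40 Why seven of the world's top ten companies chose Alluxio
44:10 Nike hires a tech company executive as CEO

Glossary
AMPLab: A lab at UC Berkeley, located in Soda Hall, focused on big data analytics. AMP stands for Algorithms, Machines, and People.
Spark: Apache Spark, a fast, general-purpose compute engine designed for large-scale data processing, and one of the star projects developed by UC Berkeley's AMPLab.

Production team
Editor-in-chief: Yuan Ruiyang (原瑞阳)
Project coordinator: Wei Ye (韦晔)
Producer: Gao Haibo (高海博)
Sound design: Ma Ruochen (马若晨), Lu Jiajie (陆佳杰)
Show operations: Xiaomili (小米粒)

This episode was recorded on October 16, 2022. This podcast does not guarantee that the data cited is timely, accurate, or complete at the time of broadcast.

Legal notice
This podcast is not a publication platform for research reports of Huatai Securities Co., Ltd. ("Huatai Securities"). It aims to offer the public commentary on macroeconomic, industry, and market topics, and does not constitute securities investment consulting by Huatai Securities or any investment advice or investment analysis. This podcast does not form the basis of any contract or commitment, and subscribing to it does not by itself make a subscriber a client of Huatai Securities. Before subscribing, readers should assess for themselves whether receiving this content is appropriate, and anyone using the content should seek guidance and interpretation from a professional investment adviser.

The content of this podcast may involve Huatai Securities analysts' interpretation of research reports already published by Huatai Securities, or may repost or excerpt parts of those reports and their views. The complete analysis is the full research report as of its publication date. Subscribers who rely only on this podcast may misunderstand it for lack of the full report or the related interpretation. For the complete content, refer to the full report published by Huatai Securities.

Regarding guest remarks in this podcast, Huatai Securities has reminded guests in advance that their remarks and information sources must be lawful and compliant, must not disclose inside information, material non-public information about listed companies, or other sensitive information, and must not infringe any lawful rights of third parties. Guest remarks represent only the guests' personal opinions, do not represent the position of Huatai Securities, and do not constitute investment advice to readers.

Huatai Securities makes no express or implied warranty as to the accuracy, reliability, timeliness, or completeness of the information carried in this podcast's text, audio, images, links, or other forms. The opinions, views, and forecasts expressed reflect judgments as of the recording date only and may change at any time without notice. Under no circumstances does the information carried in this podcast's text, audio, images, links, or other forms constitute investment advice to anyone. Subscribers should not rely on this podcast alone in place of their own independent judgment. They should make their own investment decisions and bear their own investment risk. Neither Huatai Securities nor the show's guests accept liability of any kind for any consequences of relying on or using this podcast's content.

All content of this podcast is copyrighted by Huatai Securities. Without written permission from Huatai Securities, no organization or individual may forward, reproduce in whole or in part, publish, or quote any content of this podcast in any form.

This show is presented by Huatai Securities and produced by JustPod. It is released simultaneously on Xiaoyuzhou, Ximalaya, and Apple Podcasts.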

ceo voice algorithms machines uc alluxio

43: Trino saves trips with Alluxio

Trino Community Broadcast

Jan 2, 2023 · 100:19

saves trips trino alluxio

The Evolution of Databases with Dipti Borkar

Data Bytes

Dec 8, 2022 · 37:42

Overview

Today's guest is Dipti Borkar, Vice President and General Manager, SaaS, Azure Databases at Microsoft. Prior to joining Microsoft, Dipti was the Founder and Creator of Ahana, a cloud managed service. Dipti has vast experience working in startups such as Couchbase and MarkLogic, and began her career working as a software engineer for IBM. In today's episode, Dipti shares how databases have evolved over the past 15 years, gives her predictions for the future of technology, and provides actionable advice for those looking to start a career in technology.

About Dipti Borkar

Dipti is a senior technology executive and entrepreneur with over 18 years of experience in cloud, open source and distributed data / database tech including relational, NoSQL, and federated systems. Dipti is the Vice President & General Manager at Azure Data, Microsoft where she leads product and engineering teams to make cloud databases simple and smart. She founded Ahana and created a cloud managed service for SQL on data lakes where she played many roles including Chief Product Officer and VP of Cloud / Open source engineering. Prior to Ahana, she held various executive roles at Alluxio, Couchbase and IBM. At Couchbase she held several leadership positions over the years leading and building out the product, engineering and world-wide solutions engineering teams. At IBM, Dipti managed large world-wide dev teams for DB2 Distributed where she also started her career as a software engineer in the DB2 LUW kernel. She also served as Chairperson of the Linux Foundation / Presto Foundation community for many years. Dipti holds an MS in Computer Science from UC San Diego with a specialization in databases and holds an MBA from the Haas School of Business at UC Berkeley. She is very passionate about empowering and mentoring women in tech and open source.
Learn more about our mission and become a member here: https://www.womenindata.org/ All Data Bytes listeners get 20% off of WiD membership by using the code: DATABYTES20 --- Support this podcast: https://anchor.fm/women-in-data/support

AMD Releases Genoa Epyc CPUs | Gestalt IT Rundown: November 16, 2022

Gestalt IT Rundown

Nov 16, 2022 · 26:29

AMD got the jump on Intel in the battle for fourth-generation server CPUs, announcing the so-called Genoa line at SuperComputing 22. AMD's new CPU line is a massive upgrade, with more cores, accelerator instructions, and CXL. And it comes ahead of Intel's delayed Sapphire Rapids Xeon announcement, which is widely expected early next year. Let's take a closer look at the AMD Genoa CPU line.

Time Stamps:
0:00 - Welcome to the Rundown
0:36 - VMware Purchases Ananda Networks
2:38 - NetApp's BlueXP Manages Data Anywhere
4:50 - Kalray Enters the Storage Accelerator Market
7:15 - Hammerspace Hits the Afterburner
9:18 - Alluxio 2.9 Connects Data and Applications
12:34 - AMD Releases Genoa Epyc CPUs
24:46 - The Weeks Ahead
25:49 - Thanks for Watching

Follow our hosts on Social Media
Tom Hollingsworth: https://www.twitter.com/NetworkingNerd
Stephen Foskett: https://www.twitter.com/SFoskett
Max Mortillaro: https://www.twitter.com/MaxMortillaro

Follow Gestalt IT
Website: https://www.GestaltIT.com/
Twitter: https://www.twitter.com/GestaltIT
LinkedIn: https://www.linkedin.com/company/1789

intel releases amd gestalt cpu genoa cpus netapp supercomputing cxl epyc alluxio

Learning Curve? Understanding ML's Growing Role

DMRadio Podcast

Nov 15, 2022 · 43:52

From 0-60 in just a few short years, Machine Learning is now pervasive in business. It's being used by organizations large and small, whether for optimizing pricing, procurement, or processes. ML algorithms are everywhere, and the ML process is getting easier to understand, from low code to no code. Join Eric Kavanagh as he talks with three guests who are using machine learning to their advantage: Bin Fan, VP of Open Source at Alluxio, Inc.; Joshua Rubin, Director of Data Science at Fiddler.ai; and Ryan Ries, Practice Lead at Mission Cloud Services. Find out how machine learning is growing by checking out this episode of DM Radio!

director machine learning open source data science ml fiddler learning curves practice lead ryan ries alluxio joshua rubin dm radio

138: GreyBeards talk big data orchestration with Adit Madan, Dir. of Product, Alluxio

GreyBeards on Storage

Oct 13, 2022 · 50:21

We have never talked with Alluxio before but after coming back last week from Cloud Field Day 15 (CFD15) it seemed a good time to talk with other solution providers attempting to make hybrid cloud easier to use. Adit Madan (@madanadit), Director of Product Management, Alluxio, which is a data orchestration solution that's available …

director product big data product management orchestration madan adit alluxio greybeards

Domain Specific - Why Data Mesh Works

DMRadio Podcast

Jun 16, 2022 · 52:54

Could there be a hotter topic than Data Mesh these days? Some say it's still largely theoretical, but the promise of data mesh does make sense: Give discrete groups within an organization dominion over their own data sets, which would be housed within a broader data platform that enables self-service development. A key aspect of the vision is to build data products that are designed and managed by focused departmental teams. Ideally, computational power (and cost) would be carved out according to business value. Sound intriguing? Check out this episode of DM Radio to learn more! Host @eric_kavanagh will interview legendary Analyst Mike Ferguson of Intelligent Business Strategies, Bin Fan of Alluxio and Adrian Estala of Starburst.

sound data domain mesh starburst data mesh alluxio dm radio

3x28: Revisiting Utilizing AI Season 3

Utilizing AI - The Enterprise AI Podcast

Apr 25, 2022 · 28:28

Frederic Van Haren and Stephen Foskett look back on all the subjects covered during Season 3 of Utilizing AI. The podcast covered many topics, from religious and ethical implications of AI to the technology that enables machine learning, but one topic that stands out is data science. If data is the key to AI, then the collection, management, organization, and sharing of data is a critical element of making AI projects possible. We also continue our "three questions" tradition by bringing in open-ended questions from Rich Harang of Duo Security, Sunil Samel of Akridata, Adi Gelvan of Speedb, Bin Fan of Alluxio, Professor Katina Michael, and David Kanter of MLCommons.

Three Questions:
Stephen's Question: Can you think of an application for ML that has not yet been rolled out but will make a major impact in the future?
Frederic's Question: What market is going to benefit the most from AI technology in the next 12 months?
Rich Harang, Senior Technical Lead, Duo Security: In an alternate timeline where we didn't develop automatic differentiation and put it on top of GPUs, so that this entire deep learning hardware family that we depend on now never got invented, what would the dominant AI/ML technology be and what would have been different?
Sunil Samel, VP of Business Development, Akridata: How will new technologies like AI help marginalized members of the communities, folks like senior citizens, minorities, people with disabilities, and veterans trying to reenter civilian life?
Adi Gelvan, CEO and Co-Founder of Speedb: What do you think the risks of AI are and what is your recommended solution?
Bin Fan, Founding Member, Alluxio: I'm wondering if AI can help with a humanitarian crisis happening in the future?
Katina Michael, Professor, School for the Future of Innovation in Society, Arizona State University: If AI was to self replicate what would be the first thing it would do?
David Kanter, Executive Director of MLCommons: What's a problem in the AI world where you are held back by the lack of good publicly available data?

Hosts:
Frederic Van Haren, Founder at HighFens Inc., Consultancy & Services. Connect with Frederic on Highfens.com or on Twitter at @FredericVHaren.
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Date: 4/25/2022
Tags: @SFoskett, @FredericVHaren

3x25: The Unique Challenges of ML Training Data with Bin Fan

Utilizing AI - The Enterprise AI Podcast

Mar 15, 2022 · 35:32

Machine learning is unlike any other enterprise application, demanding massive datasets from distributed sources. In this episode, Bin Fan of Alluxio discusses the unique challenges of distributed heterogeneous data to support ML workloads with Frederic Van Haren and Stephen Foskett. The systems supporting AI training are unique, with GPUs and other AI accelerators distributed across multiple machines, each accessing the same massive set of small files. Conventional storage solutions are not equipped to serve parallel access to such a large number of small files, and they often become a bottleneck to performance in machine learning training. Another issue is moving data across silos, storage systems and protocols, which is impossible with most solutions.

Three Questions:
Frederic: What areas are blocking us today to further improve and accelerate AI?
Stephen: How big can ML models get? Will today's hundred-billion parameter model look small tomorrow or have we reached the limit?
Sara E. Berger: With all of the AI that we have in our day-to-day, where should be the limitations? Where should we have it, where shouldn't we have it, where should be the boundaries?

Guests and Hosts
Bin Fan, Founding Member of Alluxio Inc. Connect with Bin on LinkedIn and on Twitter @BinFan.
Frederic Van Haren, Founder at HighFens Inc., Consultancy & Services. Connect with Frederic on Highfens.com or on Twitter at @FredericVHaren.
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Date: 3/15/2022
Tags: @SFoskett, @FredericVHaren, @BinFan, @Alluxio

founders ai training data services publishers ml organizers conventional bin founding members consultancy frederic gpus unique challenges tech field day stephen foskett alluxio gestalt it stephen how

The Rise of Data Orchestration

DMRadio Podcast

Sep 17, 2021 · 44:46

Data migration is so 20th Century! The modern enterprise must enable a more strategic marshaling of information assets, one that respects the complexity of business scenarios, and provides visibility across the information supply chain. Some folks call this... Data Orchestration! Register for this episode of DM Radio to hear Host @eric_kavanagh interview several guests, including Bin Fan, Alluxio; Blake Burch, Shipyard; Guillaume Hervé, Zetane Systems and Cameron Turner, Kin + Carta.

data register orchestration shipyards alluxio dm radio

The Simplest, Yet Most Powerful Trait: Trust

What Works

Mar 30, 2020 · 12:48

Data practically runs through the veins of Steven Mih, CEO of open source data orchestration startup Alluxio. He's a serial startup CEO with more than two decades in the distributed data world. At Alluxio, Mih was brought in as CEO to take an open source project that came out of U.C. Berkeley's AMPLab and help scale the software and the startup around the world. The best way Mih has found to work with his team to do just that involves trust, over-communication, and a little email hack that reminds him to follow up on just about everything in his busy life.

ceo trust data powerful berkeley trait simplest mih alluxio

EAR Podcast with Alluxio's Steven Mih

Designing Enterprise Platforms

Mar 28, 2020 · 36:15

On the latest episode of the Designing Enterprise Platforms podcast from Early Adopter Research (EAR), EAR's Dan Woods spoke with Steven Mih, the CEO of Alluxio, a data orchestration platform that came out of the AMPLab at Berkeley. It is Mih's second appearance on the podcast; the first episode covered the core value proposition of Alluxio and related issues with respect to the evolution of the data platform and open source. In this episode, Woods and Mih discussed what it's like for an early adopter trying to create a data platform in the modern environment. Their conversation focused on creating a data layer that allows abstraction in front of many different data sources so that companies can, in an orderly fashion, refactor what's underneath and move it to where it's optimally stored and optimally delivered. This is what Alluxio was designed to do. Their conversation covered:
* 2:00 - Creating a data layer that utilizes object storage
* 7:00 - How Alluxio approaches platforms in a compute-optimized way
* 18:00 - How Alluxio helps data engineers
* 21:00 - Alluxio use cases

ceo berkeley ear mih dan woods alluxio

E72: Guidelines for Using Databases as a Service (DBaaS) and Its Main Providers

SaaS Product Chat

Jan 13, 2020 · 31:18

In this episode we focus on the features and design of databases as a service (DBaaS): what it is for, what it is based on, what kinds of companies this model works for, and the main benefits of a DBaaS environment.

These are the links to the topics we discussed and the products mentioned:
Alluxio: https://www.alluxio.io/
Pachyderm: https://www.pachyderm.com/
Docker: https://www.docker.com
MySQL: https://www.mysql.com
PostgreSQL: https://www.postgresql.org
Redis: https://redis.io
DigitalOcean: https://www.digitalocean.com
MongoDB: https://www.mongodb.com/es
DynamoDB: https://aws.amazon.com/es/dynamodb/
ClearDB: https://www.cleardb.com
Alibaba Cloud: https://www.alibabacloud.com
Amazon EC2: https://aws.amazon.com/es/ec2/
Apache Spark: https://spark.apache.org
Apache Hadoop: https://hadoop.apache.org
Apsara Stack: https://www.alibabacloud.com/product/apsara-stack
Heroku: https://www.heroku.com
Looker: https://looker.com
Metabase: https://www.metabase.com
Blockstack (decentralized platform): https://blockstack.org
A decentralized high-performance storage system: https://github.com/blockstack/gaia
Decentralized database middleware for blockstack: https://github.com/ntheile/blockstack-db
5 of the most useful cloud databases (Hacker Noon): https://hackernoon.com/5-top-cloud-databases-that-works-wonders-7e628810e3ac
What is a cloud database, or DBaaS?: https://www.ibm.com/cloud/learn/dbaas
Blog of the Data Science and Data Platform Engineering teams at Airbnb: https://medium.com/airbnb-engineering/data/home
Data section of the Software Engineering Daily podcast: https://softwareengineeringdaily.com/category/data/

2 posts from the blog of the neobank Monzo:
How we scaled our data team from 1 to 30 people (part 1): https://monzo.com/blog/2019/11/04/how-we-scaled-our-data-team-from-1-to-30-people-part-1
Laying the foundation for a data team: https://monzo.com/blog/2016/11/30/laying-the-foundation-for-a-data-team

We are on all these platforms:
Apple Podcasts: https://podcasts.apple.com/ca/podcast/saas-product-chat/id1435000409
ListenNotes: https://www.listennotes.com/podcasts/saas-product-chat-daniel-prol-y-claudio-CABZRIjGVdP/
Spotify: https://open.spotify.com/show/36KIhM0DM7nwRLuZ1fVQy3
Google Podcasts: https://podcasts.google.com/?feed=aHR0cHM6Ly9mZWVkcy5zaW1wbGVjYXN0LmNvbS8zN3N0Mzg2dg%3D%3D&hl=es
Breaker: https://www.breaker.audio/saas-product-chat

On Twitter you can find us as:
Danny Prol: https://twitter.com/DannyProl/
Claudio Cossio: https://twitter.com/ccossio

Best wishes and happy 2020!

airbnb data science servicio utilizar principales docker pautas orga proveedores apache spark pachyderm bases de datos software engineering daily dbaas alluxio

Alluxio - Haoyuan Li

Contributor

Nov 1, 2019 · 36:39

Eric Anderson hosts Haoyuan Li, also known as HY, creator of Alluxio, an open source project and the company of the same name. HY is also a Spark committer and creator of Spark Streaming.

HY invites listeners to attend the first open source Data Orchestration Summit on November 7th in the Computer History Museum in Mountain View! Listeners can use the discount code "ERIC" to receive 50% off registration. The summit brings together data engineers, data platform engineers, and data scientists to share their challenges and learnings from building and using modern analytics, AI, and cloud technologies. Featuring tech talks covering use cases, demos, best practices, and tutorials by industry experts from EA, Walmart, DBS Bank, Netflix, AWS, Rakuten, Tencent, Google, Baidu, Alibaba and more, with a focus on how to build cloud native analytics & AI platforms.

Why should you come?
- Listen to tech (Presto, Spark, Tensorflow, k8s, Alluxio) talks by leading industry experts from EA, Walmart, Netflix, AWS, Rakuten, Tencent, Google, Baidu, DBS Bank, and Alibaba etc.
- Meet other data engineers and share your experiences over lunch and happy hour
- Attendees will go home with learnings, swag, and a free voucher to visit the Computer History Museum!
- By registering and attending you automatically enter a raffle for a chance to win the latest iPad Pro!

netflix google ai walmart spark ea aws alibaba tencent attendees ipad pro mountain view presto hy baidu rakuten tensorflow eric anderson computer history museum dbs bank alluxio

Alluxio: Data Orchestration with Haoyuan Li

Data – Software Engineering Daily

Play Episode Listen Later Oct 25, 2019 48:57

In 2013, the Berkeley AMPLab was a center of innovation. Three projects from AMPLab have turned into successful open source projects and companies: Spark, Mesos, and Alluxio. Haoyuan Li was the creator of Alluxio, and he returns to the show to discuss his journey taking Alluxio from a research project to a company that has […]

data spark orchestration mesos software engineering daily alluxio

Making Data Readily Available for Developers

The New Stack Podcast

Play Episode Listen Later Oct 21, 2019 31:32

The technology industry is undergoing a data revolution in which the unlimited storage and compute power of the cloud is changing how data is stored, processed and managed. Alluxio is a platform for data orchestration that aims to simplify and standardize how data is managed across different types of infrastructure by creating a layer of abstraction between the storage and application layers. The Alluxio orchestrator virtualizes data and allows applications to access it in a way that's compute, storage and cloud agnostic. It's a platform designed to eliminate data silos and make data readily available and performant for developers. Haoyuan (H.Y.) Li, CTO and founder of Alluxio, is a co-creator of the Apache Spark streaming library and built Alluxio as an open source virtual distributed file system for a computer science PhD project at Berkeley. Li is now building a startup, which currently boasts 40 employees and several large enterprise customers. The company offers enterprise features and support on top of its open source project.

phd data berkeley developers cto li apache spark alluxio

073 – It’s Midnight: Do You Know Where Your Data Is?

10 on Tech

Play Episode Listen Later Oct 7, 2019 14:13

So, you’ve got lots of data, and it’s stored here, there, and everywhere. How do you manage it all? And beyond that, how do you put it to use in your environment, to do business-enhancing things like artificial intelligence and machine learning? That’s what this episode of “10 on Tech” is about. ActualTech Media Partner James Green talks with Alluxio CEO Steven Mih about the growing problem of data being scattered all over, and the challenges that creates for data scientists and companies wanting to harness it. Alluxio specializes in collecting that data and acting as the go-between for it and the apps that work on it, like Apache Spark, Presto and TensorFlow. Mih also talks about how his company simplifies your data operations. Highlights of the show include: the sources of the data explosion; what extraction, transformation, and loading (ETL) is, and why it’s becoming outmoded; how data orchestration has improved on ETL; the problem of dealing with multiple copies of data; what Alluxio does; and how Alluxio is deployed. Resource links from the show: Alluxio -- https://www.alluxio.io/ Tutorial: Amazon Machine Image (AMI) -- https://www.alluxio.io/products/aws/alluxio-presto-sandbox-aws/ Tutorial: Docker -- https://www.alluxio.io/alluxio-presto-sandbox-docker/ Alluxio free trial -- https://www.alluxio.io/download/ We hope you enjoy this episode, and don’t forget to subscribe to the show on iTunes, Google Play, or Stitcher.

tech data midnight stitcher google play presto tensorflow etl mih apache spark alluxio

034: Open Source Data Orchestration – A Conversation with Steven Mih / Alluxio CEO

Business Performance Podcast

Play Episode Listen Later Sep 18, 2019 36:07

In this episode we meet Alluxio’s new CEO Steven Mih and hear about his passion for AI, data analytics, and data orchestration, as well as his lessons learned as an entrepreneur. Steven has over twenty years of experience in sales, business development, and marketing of enterprise technology solutions, leading organizations such as Aviatrix, Couchbase, Transitive, Cadence Design Systems, and AMD. You can connect with Steve here: Email: steven@alluxio.com LinkedIn: Steven Mih Web Site: alluxio.io About PPQC: Process and Product Quality Consulting (PPQC) helps global executives tackle complex corporate challenges. To learn more about PPQC, visit www.ppqc.net Support the show (https://ppqc.net)

ai amd orchestration business performance couchbase aviatrix transitive cadence design systems open source data alluxio

From the Lab that Brought you Spark comes Alluxio, the Data Orchestration Platform of the Future

Big Data Beard

Play Episode Listen Later Sep 17, 2019 40:53

Decoupling compute from storage has been a growing trend in the enterprise as organizations turn towards a data-driven business. Cory and Erin sit down with Haoyuan (HY), Founder and CTO of Alluxio, to talk about the importance of data orchestration and how Alluxio provides customers with the ability to bring data closer to the compute across clusters, regions, clouds and countries. Created in the same lab as Spark and other widely known open-source technologies, Alluxio solves a particular set of challenges, and HY talks about its inception and what this data orchestration platform addresses. HY also gives his vision for the future of innovations and how Alluxio enables them most effectively. Music from this episode is by Andrew Belle. Please go check him out...you'll thank us!

music founders data platform spark cto hy orchestration decoupling andrew belle alluxio

EAR Podcast with Alluxio's Steven Mih

Designing Enterprise Platforms

Play Episode Listen Later Aug 22, 2019 36:18

On this edition of the Designing Enterprise Platforms podcast of Early Adopter Research (EAR), Dan Woods, the founder and principal analyst at Early Adopter Research, speaks with Steven Mih, the CEO of a new company called Alluxio. Alluxio is a company that comes out of the AMPLab at UC Berkeley, which brought the world Spark and many other products for processing data at scale. The AMPLab has created a stack that has a variety of different components; Alluxio was originally called Tachyon and has now become, in its commercial form, Alluxio. They spoke about the issue of separating storage from the compute engines themselves. In the cloud, companies have an increasing number of options for storing data in object storage and then applying a variety of different compute engines to that storage. But that leads to having storage in many places. Some of that storage may be on something like S3, some of it may be on premises, some of it may be in another cloud. The solution that Alluxio proposes to this challenge is what it calls storage orchestration. Their conversation covers: * 4:30 - How to manage many different data storage platforms * 16:30 - Powerful use cases for Alluxio's technology * 22:00 - How Alluxio fits into the open source world

ceo powerful spark uc berkeley s3 tachyon dan woods alluxio

Alluxio Is Democratizing Data Orchestration

TFIR: Open Source & Emerging Technologies

Play Episode Listen Later Jul 16, 2019 29:09

In this interview, we spoke to Haoyuan (H.Y.) Li, Founder, Chairman and CTO of Open Source Alluxio, a company that is democratizing data in the cloud.

founders data cto li democratizing orchestration alluxio

The Alluxio Distributed Storage System - Episode 70

Data Engineering Podcast

Play Episode Listen Later Feb 18, 2019 59:44 Transcription Available

Distributed storage systems are the foundational layer of any big data stack. There are a variety of implementations which support different specialized use cases and come with associated tradeoffs. Alluxio is a distributed virtual filesystem which integrates with multiple persistent storage systems to provide a scalable, in-memory storage layer for scaling computational workloads independent of the size of your data. In this episode Bin Fan explains how he got involved with the project, how it is implemented, and the use cases that it is particularly well suited for. If your storage and compute layers are too tightly coupled and you want to scale them independently then Alluxio is the tool for the job.

system storage distributed alluxio

In the age of AI, fundamental value resides in data

O'Reilly Data Show - O'Reilly Media Podcast

Play Episode Listen Later Jan 3, 2019 29:41

In this episode of the Data Show, I spoke with Haoyuan Li, CEO and founder of Alluxio, a startup commercializing the open source project with the same name (full disclosure: I’m an advisor to Alluxio). Our discussion focuses on the state of Alluxio (the open source project that has roots in UC Berkeley’s AMPLab), specifically […]

ceo data fundamental uc berkeley data show alluxio

Ep. 30: Ion Stoica on how RISELab is pushing the envelope on real-time data

THE ARCHITECHT SHOW

Play Episode Listen Later Jul 27, 2017 66:22

In this episode of the ARCHITECHT Show, Ion Stoica talks about the promise of real-time data and machine learning he's pursuing with the new RISELab project he directs at UC-Berkeley, along with some other big names in big data. Stoica previously was director of the university's AMPLab, which created and helped to mature technologies such as Apache Spark, Apache Mesos and Alluxio. Stoica is also co-founder and executive chairman of Apache Spark startup Databricks, and he shares some insights into that company's business and the evolution of the big data ecosystem. In the news segment, co-hosts Derrick Harris (ARCHITECHT) and Barb Darrow (Fortune) discuss Microsoft (and possibly AWS) doubling down on Kubernetes, Google's cloudy cloud revenue, GoDaddy getting out of the cloud business, and the possibility of Meg Whitman as Uber CEO.

google microsoft uc berkeley aws godaddy kubernetes databricks real time data pushing the envelope meg whitman uber ceo apache spark apache mesos alluxio ion stoica

a16z Podcast: The Storage Renaissance

a16z

Play Episode Listen Later Mar 21, 2017 22:03

As we enter a new era of distributed computing -- and of big data, in the form of machine and deep learning -- storage becomes (even more) important. It might not be sexy, but storage is what makes the internet and cloud computing go round and round: "Without storage, we wouldn't have databases; without databases, we wouldn't have big data; we wouldn't have analytics ... we wouldn't have anything because information needs to be stored, and it needs to be retrieved." This is especially complicated by the fact that more and more computing is happening at the edge, as with autonomous car sensing. Clearly, storage is important. But now it's also undergoing a renaissance as it becomes faster, cheaper, and more in-memory. What does this mean for all the big players in the storage ecosystem? For CIOs and IT departments? For any company competing on data, whether it's in analyzing it or owning it? And for that matter: What is data, really? Beyond the existential questions, this episode of the a16z Podcast -- with a16z partner Peter Levine; Alluxio (formerly Tachyon) founder and CEO Haoyuan Li (“HY”); and storage industry analyst Mike Matchett of The Taneja Group -- covers all this and more. It even tries to make storage, er, great again.

memory renaissance infrastructure big data storage open source cloud computing peter levine a16z tachyon alluxio for cios

Episode 25 – The pro’s and con’s of crafting your own distribution

Roaring Elephant

Play Episode Listen Later Sep 27, 2016 94:59

When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show who explains why they made the choice to forgo the available distributions in favour of building one's own. http://lod-cloud.net/ 00:00 Recent events Dave: Which tool should I use? http://brohrer.github.io/which_tool_should_i_use.html YaRrr! - The Pirate’s guide to R Blog: http://nathanieldphillips.com/thepiratesguidetor/ YaRrr! - Download the book: https://drive.google.com/file/d/0B4udF24Yxab0S1hnZlBBTmgzM3M/view Video tutorials to go with the above: https://www.youtube.com/playlist?list=PL9tt3I41HFS9gmeZFEuNrnu_7V_NFngfJ Listener Question from Sampath from Baltimore: When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broader knowledge on many tools? I registered for edX courses and am working towards getting Cloudera Certification. Please provide me any advice. Jhon: More accountability for big-data algorithms http://www.nature.com/news/more-accountability-for-big-data-algorithms-1.20653 The "doomsday" version: http://time.com/4471451/cathy-oneil-math-destruction/ 6 Illusions Execs Have About Big Data https://www.entrepreneur.com/article/281809 Michele: Hadoop release 3.0.0-alpha1 available http://hadoop.apache.org/releases.html#03+September%2C+2016%3A+Release+3.0.0-alpha1+available Running Spark on Alluxio with S3 https://www.oreilly.com/learning/running-spark-on-alluxio-with-s3 47:00 The pro's and con's of crafting your own distribution With our special guest Michele Lamarca (@nonfacciocip). Many thanks to Michele for being on the podcast with us and sharing his experiences!
01:34:59 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

video spark crafting pirate distribution big data s3 edx hadoop cloudera contact form sampath hortonworks yarrr alluxio

SE-Radio Episode 260: Haoyuan Li on Alluxio

Software Engineering Radio - The Podcast for Professional Software Developers

Play Episode Listen Later Jun 14, 2016 44:25

Jeff Meyerson talks to Haoyuan Li about Alluxio, a memory-centric distributed storage system. The cost of memory and disk capacity are both decreasing every year–but only the throughput of memory is increasing exponentially. This trend is driving opportunity in the space of big data processing. Alluxio is an open source, memory-centric, distributed, and reliable storage system enabling data sharing across clusters at memory speed. Alluxio was formerly known as Tachyon. Haoyuan is the creator of Alluxio. Haoyuan was a member of the Berkeley AMPLab, which is the same research facility from which Apache Mesos and Apache Spark were born. In this episode, we discuss Alluxio, Spark, Hadoop, and the evolution of the data center software architecture.

data memory cloud spark ram storage apache hadoop apache spark mesos tachyon apache mesos jeff meyerson alluxio se radio

SE-Radio Episode 260: Haoyuan Li on Alluxio

Software Engineering Radio - The Podcast for Professional Software Developers

Play Episode Listen Later Jun 14, 2016 44:24

development testing software engineering architecture patterns enterprise programming languages embedded scripting soa mda concurrency jeff meyerson alluxio se radio
