The MLOps Podcast


A podcast about bringing machine learning into the real world. Each episode features a conversation with top data science and machine learning practitioners, who'll share their thoughts, best practices, and tips for promoting machine learning to production.

DagsHub


    • Dec 16, 2024 LATEST EPISODE
    • monthly NEW EPISODES
    • 59m AVG DURATION
    • 34 EPISODES



    Latest episodes from The MLOps Podcast

    Dec 16, 2024 35:33


    In this episode, Dean and Natanel Davidovits explore the intricacies of AI and machine learning, focusing on model efficiency, the use of APIs versus self-hosting, and the importance of defining success metrics in real-world applications. They discuss the challenges of data quality and labeling, the evolving role of data scientists in the age of LLMs, and the significance of effective communication between data science and product teams. The conversation also touches on the future of robotics in AI and the need for specialization in a rapidly changing landscape. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction to Natanel Davidovits 02:10 Optimizing AI Models for Real-World Tasks 03:47 Success Metrics in Industry vs. Academia 07:52 The Importance of Communication Between Teams 11:33 Handling Data Quality and Labeling Challenges 12:11 The Impact of LLMs on Data Science Careers 16:29 Navigating Specialized Domain Data 22:15 Trends in Machine Learning and AI 27:27 The Future of AI and Robotics 28:28 The Role of AI in Physics 33:36 Controversial Views on AI and Machine Learning 34:05 Final Thoughts and Recommendations ➡️ Natanel Davidovits on LinkedIn – https://www.linkedin.com/in/natanel-davidovits-28695312/

    Oct 31, 2024 50:38


    In this episode, Dean speaks with Jeremie Dreyfuss, Head of AI Research and Development at Intel, about the evolving role of AI in the enterprise. Jeremie shares insights into scaling machine learning solutions, the challenges of building AI infrastructure, and the future of AI-driven innovation in large organizations. Learn how enterprises are leveraging AI for efficiency, the latest advancements in AI research, and the strategies for staying competitive in a rapidly changing landscape. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction and Overview 00:55 Challenges of Data Collection and Infrastructure 05:00 Optimizing Test Recommendations 14:42 Tips for Deploying Entire ML Pipelines 21:19 The Impact of Large Language Models (LLMs) 25:30 How to Decide About LLM Investment in the Enterprise 29:29 Evaluating Models and Using Synthetic Data 35:34 Choosing the Right Tools for ML and LLM Projects 45:21 The Beauty of Small Data in Machine Learning 48:22 Recommendations for the Audience ➡️ Jeremie Dreyfuss on LinkedIn – https://www.linkedin.com/in/jeremie-dreyfuss/

    Sep 15, 2024 50:46


    In this episode, Dean speaks with Dror Haor, CTO at SeeTree, about the challenges of deploying AI in agriculture at scale. They explore how SeeTree integrates AI and sensor fusion to manage vast amounts of remote sensing data, helping farmers improve crop yields with high accuracy at low costs. Dror shares insights on handling data drift, customizing models for different regions, and balancing the trade-offs between cost and performance. This conversation dives deep into practical machine learning applications in agriculture, offering valuable lessons for anyone working with large-scale data and AI. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction 00:32 Production in machine learning at SeeTree 07:34 Sensor fusion in machine learning 16:26 Balancing accuracy and cost in agriculture 20:09 Customizing models for different customers and crops 24:19 Dealing with data in different domains 30:10 Tools and processes for ML at SeeTree 35:58 Building for scale 40:17 Collecting user feedback and self-improving products 42:45 Exciting developments in ML & AI 45:12 Hot takes in ML - Overfitting is good 46:34 Recommendations for the Audience ➡️ Dror Haor on LinkedIn – https://www.linkedin.com/in/dror-haor-phd-77152322/ ➡️ Dror Haor on Twitter – https://x.com/DrorHaor

    Aug 15, 2024 39:36


    In this episode, Dean speaks with Federico Bacci, a data scientist and ML engineer at Bol, the largest e-commerce company in the Netherlands and Belgium. Federico shares valuable insights into the intricacies of deploying machine learning models in production, particularly for forecasting problems. He discusses the challenges of model explainability, the importance of feature engineering over model complexity, and the critical role of stakeholder feedback in improving ML systems. Federico also offers a compelling perspective on why LLMs aren't always the answer in AI applications, emphasizing the need for tailored solutions. This conversation provides a wealth of practical knowledge for data scientists and ML engineers looking to enhance their understanding of real-world ML operations and challenges in e-commerce. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction and Background 01:59 Owning the ML Pipeline 02:56 Deployment Process 05:58 Testing and Feedback 07:40 Different Deployment Strategies 11:19 Explainability and Feature Importance 13:46 Challenges in Forecasting 22:33 ML Stack and Tools 26:47 Orchestrating Data Pipelines with Airflow 31:27 Exciting Developments in ML 35:58 Recommendations and Closing --- Links: Dwarkesh podcast with Anthropic and Gemini team members – https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken ➡️ Federico Bacci on LinkedIn – https://www.linkedin.com/in/federico-bacci/ ➡️ Federico Bacci on Twitter – https://x.com/fedebyes

    Jul 15, 2024 39:25


    In this episode, Dean speaks with Michał Oleszak, an ML engineering manager at Solera. Michał shares insights into how his team is using machine learning to transform the automotive claims process, from recognizing vehicle damages in images to estimating repair costs. The conversation covers the challenges of deploying ML pipelines in production, managing data quality for computer vision tasks, and balancing technical implementation with business needs. Michał also discusses his approach to model evaluation, the benefits of monorepo architecture, and his views on exciting developments in self-supervised learning for computer vision. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction 00:42 Production for Machine Learning at Solera 03:49 Transitioning from Images to Structured Data 04:58 Combining Deep Learning and Non-Deep Learning Models 05:15 Deployment Process for Machine Learning Models 08:01 Challenges and Solutions in Monorepo Adoption 12:57 Evaluating Model and Pipeline Versions 21:57 Tools for ML Projects: Monorepo, Pants, GitHub Actions 24:04 Data Management and Data Quality 30:14 Challenges in ML Efforts: Data Quality 30:37 Excitement about Self-Supervised Learning and JEPA Architectures 34:45 Controversial Opinion: Importance of Statistics for ML 36:40 Recommendations

    Jun 10, 2024 50:26


    In this episode, I chat with Ljubomir Buturovic, VP of ML and Informatics at Inflammatix. We discuss using ML to diagnose infections and blood tests in the emergency room. We dive into the challenges of building diagnostic (classification) and prognostic (predictive) models, with takeaways related to building datasets for production use cases. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 What is Inflammatix and how do they use ML 7:32 Edge Device Deployment: The Future of Model Deployment 21:16 Navigating Regulatory Submission for Medical Products 26:01 Evolution of Regulatory Processes in ML for Medical Applications 30:18 Challenges and Solutions in ML for Medical Applications 34:00 The Future of AI in Clinical Care 40:25 The Overrated Concept of Interpretability in AI and ML 45:32 Recommendations

    May 16, 2024 62:56


    In this episode, Idan Gazit, Senior Director of Research at GitHub Next, discusses his role in exploring strategic technologies and incubating long bet projects. He explains how the GitHub Next team chooses research projects and the process of exploration and theme selection. Idan also shares insights into the ML focus at GitHub Next and the challenges of evaluating the impact of AI products. He reflects on his journey into the AI space and provides advice for testing AI products in smaller organizations. Finally, he shares his thoughts on the future of AI interfaces. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction and Background 00:56 Choosing Research Projects at GitHub Next 06:09 ML Focus in GitHub Next 10:52 ML Work and the Leaky Abstraction 13:16 Idan's Journey into the AI Space 17:54 Evaluating the Impact of AI Products 24:36 Testing AI Products in Smaller Organizations 32:52 The Future of AI Interfaces 40:01 Transitioning from Prototype to Product 46:45 Challenges in the ML/AI Space 56:03 Recommendations ➡️ Idan Gazit on LinkedIn – https://www.linkedin.com/in/idangazit/ ➡️ Idan Gazit on Twitter – https://twitter.com/idangazit

    Apr 18, 2024 32:47


    In this episode, I chatted with Uri Goren, founder and CEO of Argmax, about Machine Learning and the future of digital advertising in a world moving away from cookies due to privacy laws like GDPR and CCPA. We chat about challenges in maintaining personalized ads while respecting user privacy, and new methods like probabilistic models and contextual features to cover some of the gap left by removing cookies. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction 00:35 The Rise of Privacy Regulations 1:40 The Impact of Losing Cookies 2:48 Understanding Cookies 4:33 Reasons for the Decline of Cookies 8:47 ML Leveraging Cookies in Advertising 10:32 The Shift to Contextual Features 12:53 The Future of ML without Cookies 15:23 New and Old Ways of Generating Contextual Features 20:33 Regulatory Conspiracies 22:33 Unsolved Problems in ML and AI 24:39 Predictions for the Next Year in AI and ML 26:17 Controversial Take: Overuse of LLMs 28:03 Recommendations ➡️ Uri Goren on LinkedIn – https://www.linkedin.com/in/ugoren/

    Mar 18, 2024 65:42


    In this episode, I speak with Han-Chung Lee, a machine learning engineer with a lot of interesting takes on ML and AI. We dive into the buzz around natural language processing and the big waves in generative AI. We chat about how newcomers are racing through NLP's history, mixing old school and new tech, and the shift towards smarter databases. Han-Chung breaks it down with his straightforward takes, making complex AI trends feel like coffee chat topics. It's a perfect listen for anyone keen on where AI's headed, minus the jargon. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Intro 0:41 State of NLP and LLMs 1:33 Repeating the past in NLP 3:29 Vector databases vs. classical databases 8:49 Choosing the right LLM for an application 12:13 Advantages and disadvantages of LLMs 16:10 Where LLMs are most useful 21:13 The dark side of LLMs and can we detect it? 25:19 Thoughts on LLM leaderboard metrics 31:19 Using LLMs in regulated industries 36:40 Creating a moat in the LLM world 40:20 Evaluating LLMs 44:20 Impact of LLM on non-English languages 48:35 Thoughts on MLOps and getting ML into production 56:48 The Hardest Unsolved Problem in ML and AI 59:09 Predictions for the Future of ML and AI 1:03:25 Recommendations and Conclusion ➡️ Han Lee on Twitter – https://twitter.com/HanchungLee ➡️ Han Lee on LinkedIn – https://www.linkedin.com/in/hanchunglee/

    Feb 15, 2024 58:48


    In this episode, I had the pleasure of speaking with Mila Orlovsky, a pioneer in medical AI. We delve into practical applications, overcoming data challenges, and the intricacies of developing AI tools that meet regulatory standards. Mila discusses her experiences with predictive analytics in patient care, offering tips on navigating the complexities of AI implementation in medical environments. This episode is packed with actionable advice and forward-thinking strategies, making it essential listening for professionals looking to impact healthcare through AI. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction and Background 4:03 Early Days of Machine Learning in Medicine 5:19 Challenges in Building Medical AI Systems 6:54 Differences Between Medical ML and Other ML Domains 15:36 Unique Challenges of Medical Data in ML 24:01 Counterintuitive Learnings on the Business Side 28:07 Impact and Value of ML Models in Medicine 29:41 The Role of Doctors in the Age of AI 38:44 Explainability in Medical ML 44:31 The FDA and Compliance in Medical ML 48:56 Feedback and Iteration in Medical ML 52:25 Predictions for the Future of ML and AI 53:59 Controversial Predictions in the Field of ML 56:02 Recommendations 57:58 Conclusion ➡️ Mila Orlovsky on LinkedIn – https://www.linkedin.com/in/milaorlovsky/

    ⏪ Making LLMs Backwards Compatible with Jason Liu

    Jan 15, 2024 53:41


    In this episode, I had the pleasure of speaking with Jason Liu, an applied AI consultant and the creator of Instructor – an open-source tool for extracting structured data from LLM outputs. We chat about LLM applications, their challenges, and how to overcome them. We also dive into Instructor, making LLMs interact with existing systems and a bunch of other cool things. Join our Discord community: https://discord.gg/tEYvqxwhah ➡️ Jason Liu on Twitter – https://twitter.com/jxnlco

    Sep 6, 2023 71:37


    In this live episode, I'm speaking with Jinen Setpal, ML Engineer at DagsHub, about actually building, deploying, and monitoring large language model applications. We discuss DPT, a chatbot project that is live in production on the DagsHub Discord server and helps answer support questions, as well as the process and challenges involved in building it. We dive into evaluation methods, ways to reduce hallucinations, and much more. We also answer the audience's great questions.

    Live MLOps Podcast Episode!

    Aug 28, 2023 0:28


    Join now to take part in our first live MLOps Podcast episode. I'll be chatting with Jinen Setpal, ML Engineer at DagsHub, about his work building LLM applications and getting LLMs into production. Sign up for the event at the link here: https://www.linkedin.com/events/7098968036782596096/comments/

    ⛹️‍♂️ Large Scale Video ML at WSC Sports with Yuval Gabay

    Aug 7, 2023 62:09


    In this episode, I had the pleasure of speaking with Yuval Gabay, MLOps Engineer at WSC Sports. Yuval builds better infrastructure and automation for developing, training, and deploying machine learning models at scale, with a focus on video data. We talk about MLOps methodologies, standardizing deployment in the organization, and closing the loop back from production into training. Watch the video: https://youtu.be/3m__nRuifsQ Join our Discord community: https://discord.gg/tEYvqxwhah ➡️ Yuval Gabay on LinkedIn – https://www.linkedin.com/in/yuval-gabay-68963253/ ➡️ WSC Sports – https://wsc-sports.com/

    Jun 20, 2023 65:43


    In this episode, I had the pleasure of speaking with Hamel Husain. Hamel is a machine learning and MLOps extraordinaire: he was one of the core maintainers of Fast.ai and has worked on ML and MLOps at places like DataRobot, Airbnb, and GitHub. We talk about Large Language Models, the future role of data scientists in the world of LLMs, and Hamel's approach to solving MLOps problems. Watch the video: https://www.youtube.com/watch?v=3oElMXPkaVs

    May 23, 2023 56:08


    In this episode, I had the pleasure of speaking with Almog Baku, a serial entrepreneur and consultant in Cloud, AI Infrastructure, and Foundational models. We talk about Kubernetes, Large Language Models (LLMs), how to get them into production, and how data is becoming a more central piece of the ML landscape. We also discuss Almog's newest project, Raptor ML, which helps ML teams productionize ML pipelines. Watch the video: https://www.youtube.com/watch?v=DCApRXhXD_w&feature=youtu.be Join our Discord community: https://discord.gg/tEYvqxwhah

    Mar 30, 2023 56:56


    In this episode, I had the pleasure of speaking with Shreya Shankar, Ph.D. student at Berkeley RISELab. We chat about auto data validation and MLOps. Shreya shares her insights on several interesting topics, including the challenges of automating the data validation process and how to overcome them. We also discuss what makes organizations able to iterate faster in machine learning, and some predictions about the future of machine learning and MLOps. Watch the video: https://youtu.be/_hi6--H2Hug Join our Discord community: https://discord.gg/tEYvqxwhah

    Feb 21, 2023 50:14


    In this episode, Dean speaks with Noa Weiss, the wonderful AI & ML consultant. They dive into Deep Learning research for marine mammal sounds, abstractions for machine learning projects and some of the unspoken challenges she's seen in the ML development process. Also prediction markets and Harry Potter. Watch the video: https://www.youtube.com/watch?v=uQrR0KPq3RQ Join our Discord community: https://discord.gg/tEYvqxwhah

    ✍️ Building ML Teams and Platforms with Assaf Pinhasi

    Jan 23, 2023 78:11


    In this episode, I speak with Assaf Pinhasi, ML engineering and MLOps consultant extraordinaire! Assaf was the VP R&D at Zebra Medical Vision, and built the PayPal Risk organization's Big Data Platform. We dive into building ML infrastructure from scratch 10 years ago vs. today, best practices involved in building teams to support machine learning models in production, and the future of generative models. Watch the video: https://youtu.be/tSbuDA5tMxQ Join our Discord community: https://discord.gg/tEYvqxwhah ➡️ Assaf Pinhasi on LinkedIn – https://www.linkedin.com/in/assafpinhasi/


    Dec 15, 2022 74:17


    In this episode, I speak with David Marx, Distinguished Engineer at Stability AI. This talk dives into how David got into machine learning, open-source software, and Stability AI. We discuss following your curiosity, and what it takes to deploy a model like Stable Diffusion to production. Watch the video: https://youtu.be/49dsoDK1KCA Join our Discord community: https://discord.gg/tEYvqxwhah

    Nov 21, 2022 61:03


    In this episode, I speak with Logan Kilpatrick, Julia Language Developer Community Advocate. We talk about machine learning at NASA and how he discovered Julia as a student, the age-old Julia vs. Python debate, and how to get into a new scientific and technical field. It was absolutely awesome! Check it out. Watch the video: https://www.youtube.com/watch?v=3kgRN8hJIro Join our Discord community: https://discord.gg/tEYvqxwhah Relevant Links: ➡️ Logan Kilpatrick on LinkedIn – https://www.linkedin.com/in/logankilpatrick/ ➡️ Logan Kilpatrick on Twitter – https://twitter.com/OfficialLoganK ➡️ Julia Language on Twitter – https://twitter.com/JuliaLanguage

    Oct 18, 2022 80:55


    In this episode, I speak with Guy Smoilovsky, my friend, Co-Founder, and the CTO of DagsHub. We talk about quantum computing and AGI, concrete approaches for automating ML deployment, and how DagsHub came to be. Watch the video: https://www.youtube.com/watch?v=67dByhXPT5g Join our Discord community: https://discord.gg/tEYvqxwhah Relevant Links: ➡️ Guy Smoilovsky on LinkedIn – https://www.linkedin.com/in/guy-smoilovsky/ ➡️ Guy Smoilovsky on Twitter – https://twitter.com/Guy_T_Sky/ TDD in machine learning – https://towardsdatascience.com/tdd-datascience-689c98492fcc Recommendation Links: Astral Codex Ten – https://astralcodexten.substack.com/ Don't Worry About the Vase – https://thezvi.wordpress.com/ The Sandman – https://www.imdb.com/title/tt1751634/ Lady Silver – https://www.ladysilverband.com/

    Sep 16, 2022 71:00


    In this episode, I speak with Dean Langsam, Data Scientist at SentinelOne and one of the organizers of PyData in Israel. We chat about imposter syndrome, the best field in machine learning, why XGBoost is the best model, and the fact that most organizations have too much data. It was fascinating for me, so I hope you enjoy it too.

    Aug 22, 2022 80:48


    In this episode, I had the pleasure of speaking with Jacopo Tagliabue, Director of AI at Coveo. We talk about Reasonable Scale MLOps, how to approach building your ML platform, and how quickly you might hit the limits of model deployment (hint: it's pretty surprising). Join our Discord community: https://discord.gg/tEYvqxwhah Relevant Links: ➡️ Jacopo on LinkedIn – https://www.linkedin.com/in/jacopotagliabue/ ➡️ Jacopo on Twitter – https://twitter.com/jacopotagliabue

    Jul 18, 2022 88:21


    In this episode, I had the pleasure of speaking with Goku Mohandas, founder of Made With ML. Goku has an incredible amount of experience building and teaching the community about machine learning and MLOps systems. We dive into system thinking and solving for ML workflows, his journey in the machine learning world, and how he chooses what to learn next. We discuss the most common mistakes he's seen in productionizing ML models and why building models no one will use is not necessarily bad. Join our Discord community: https://discord.gg/tEYvqxwhah

    Jun 20, 2022 58:21


    In this episode, I had the pleasure of speaking with Kyle Gallatin, a Machine Learning Software Engineer at Etsy. We talk about how he built the machine learning platform at Etsy, experimentation in production (yes, you heard right), and how to optimize model performance at very large scales. It was awesome, and I'm sure many of you can learn a ton from this one! Join our Discord community: https://discord.gg/tEYvqxwhah Relevant Links: ➡️ Kyle on LinkedIn – https://www.linkedin.com/in/kylegallatin/

    May 16, 2022 61:07


    In this episode, I'm speaking with Charlene Chambliss, Software Engineer at Aquarium. Charlene has vast experience getting NLP models to production. We dive into the intricacies of these models and how they differ from other ML subfields, the challenges in productionizing them, and how to get excited about data quality issues. Join our Discord community: https://discord.gg/tEYvqxwhah Relevant Links: ➡️ Charlene on LinkedIn – https://www.linkedin.com/in/charlenechambliss/ ➡️ Charlene on Twitter – https://twitter.com/blissfulchar

    Apr 18, 2022 63:43


    In this episode, I'm speaking with the one and only, Yannic Kilcher! We talk about sunglasses

    Feb 14, 2022 65:29


    In this episode, we dive into the challenging but very important topic of getting data scientists to write better code. How to approach complex machine learning projects and break them down, and why growing unicorns

    Nov 4, 2021 68:50


    In this episode, I'm speaking with Lee Harper, Principal Data Scientist at Catapult Systems. Lee holds a Ph.D. in Physical and Theoretical Chemistry. Lee is a teacher-turned-data scientist. We cover the various entry paths into the world of data science, the value of background diversity, security in ML production, and even AI fairness. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Podcast intro 01:00 Guest introduction 01:39 How did you get into the fields of data science and machine learning? 05:04 Coding boot camps vs. academia & diversity of backgrounds in ML 09:37 How does the process of bringing your work into production change over the years? 13:02 How has the change in the languages used for data science affected production processes? 16:01 How do you accelerate the timeframes for getting from POC to production in ML? 18:19 Do data scientists reinvent the wheel more often than software developers, and why? 22:14 The value of learning how to Google 23:00 Recurring themes, challenges, and common issues in data science 27:50 Solving for security in ML in production 31:57 ML security considerations for startups 34:30 Data security considerations in ML 35:18 What is the most interesting topic in machine learning right now? 38:05 ML fairness, bias, and responsible AI 41:44 What does it mean to build a fair or unbiased model? 47:15 If you had to choose one challenge in bringing models to production, what would it be? 51:00 What are the tools and processes that you use to make the transition to production easier? 
55:35 About "vendor lock-in" 58:00 Your favorite tool recommendations 1:03:35 Recommendations for the audience --- Relevant Links: Linux Command Line and Shell Scripting Bible – https://www.amazon.com/Linux-Command-Shell-Scripting-Bible/dp/1119700914 Project Hail Mary – https://www.amazon.com/Project-Hail-Mary-Andy-Weir/dp/0593135202 Social Links: https://www.linkedin.com/company/dagshub/ https://www.linkedin.com/company/catapult-systems/ https://www.linkedin.com/in/leeharper2425/ https://twitter.com/DeanPlbn https://twitter.com/TheRealDAGsHub

    Sep 20, 2021 73:44


    In this episode, I'm speaking with Roey Mechrez from BeyondMinds. Roey holds a Ph.D. in Electrical Engineering, with vast experience in computer vision and deep learning research. We discuss the challenges of gluing together infrastructure solutions for an end-to-end ML platform, as well as generating monitoring insights for non-technical stakeholders and combating catastrophic forgetting. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Podcast intro 01:00 Guest intro 01:49 What does BeyondMinds do? 06:24 Audience for an end-to-end ML platform 12:14 Communicating with non-technical stakeholders/users 15:03 The future of "AI-powered tools", and human-machine collaboration 20:04 On complex system orchestration, generating insights from monitoring, and catastrophic forgetting – Biggest challenges in production ML 25:23 Why is catastrophic forgetting a hard problem and how do you deal with it? 30:02 "Secret" tips on how to get started with automating the retraining process 33:30 Generating monitoring insights and observations in a user-friendly format 38:12 Making data labeling issues explainable (automatically) 45:07 Customizing complex systems per user – Orchestrating an ML platform 52:58 API design in ML platform components 55:45 Measuring success for researchers, ML engineers, and software developers – can ML work fit into the Agile workflow. 1:02:22 Is "time to production" a good metric? Gains in time to production in the real world 1:06:02 How do you divide the work between ML researchers and engineers? 1:08:39 Recommendations for the audience --- Relevant Links: A16z blog about AI Data Science work in an agile environment – A talk by Dima Goldenberg Hayot Kis (Hebrew Podcast) חיות כיס Data Engineering Podcast ACX Podcast Social Links: https://www.linkedin.com/company/beyondminds/ https://www.linkedin.com/company/dagshub/ https://twitter.com/roeyme https://twitter.com/DeanPlbn https://twitter.com/TheRealDAGsHub

    Aug 11, 2021 45:34


    In this episode, I'm speaking with Ran Romano from Qwak.ai. Ran built the ML platform at Wix, and we discuss the various data roles, when organizations should focus on ML infrastructure, solving the hard problems of feature stores, and one approach to building an end-to-end ML platform. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Podcast intro 01:00 Guest intro 01:30 Getting into the world of ML and ML Engineering 02:25 The line between Data Engineer, ML Engineer, and Data Scientist 03:50 The future of data roles – what are the trends? 07:21 The most exciting part about taking ML models into production 09:45 Jupyter notebooks in production (again??) 10:41 Signs that notebook productionization might not work 11:42 Building ML-focused CI/CD systems 15:32 Early days of building out the Wix ML platform 16:22 Signs that you might need to focus on ML infrastructure in your organization, and how to convince other stakeholders. 19:21 What part of the platform that you built are you most proud of? 23:51 Defining a feature store and the training/serving skew 27:24 Onboarding data scientists to using a feature store 33:49 When is it too early to build an ML platform? 35:33 Open source components – What parts of your platform did you choose not to build yourself? 40:16 Qwak.ai – What are you working on currently? 41:07 How do you define an "end-to-end" platform in the case of Qwak 44:25 End-to-end vs. Integrated – Advantages and disadvantages --- Relevant Links: - Qwak.ai: https://www.qwak.ai - Wix ML Platform presentation by Ran: https://www.youtube.com/watch?v=E8839ENL-WY - https://www.linkedin.com/company/dagshub - https://www.linkedin.com/company/qwak-ai/ - https://twitter.com/TheRealDAGsHub - https://twitter.com/DeanPlbn - https://twitter.com/ranvromano

    Apr 27, 2021 61:18


    In this episode, I'm speaking with Urszula Czerwinska about her path as a data scientist, the projects she worked on, experiences gained as a data scientist, as well as the challenges she's overcome in bringing her machine learning (ML) into production. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 0:00 - Podcast intro 1:15 - Guest intro and how you got into data science 3:48 - Finding your fit – research or industry and when to transition 7:23 - What types of ML projects do you specialize in 10:41 - ML explainability and interpretability 15:26 - ML explainability with non-technical stakeholders 17:13 - What problems does your team solve within the organization 20:56 - ML in production – how to bring your ML projects from research to production 25:17 - The tools you can't live without 28:11 - Do you have a set process for productizing ML projects 30:08 - Team structures and communication for data science teams 33:42 - Who's in charge of setting up infrastructure for a project and job title discussion 36:29 - Interesting tools and repositories you work with 39:30 - How do you stay up to date 42:00 - Biggest challenges for you in ML 45:12 - Favorite and least favorite thing about being a data scientist 49:52 - Handling a workplace that doesn't understand what a data scientist is 53:07 - Data scientists are
