Podcasts about Predibase

  • 21 PODCASTS
  • 33 EPISODES
  • 46m AVG DURATION
  • INFREQUENT EPISODES
  • LATEST: Mar 13, 2025

POPULARITY

(Popularity chart, 2017–2024)


Best podcasts about Predibase

Latest podcast episodes about Predibase

The Data Exchange with Ben Lorica
The Evolution of Reinforcement Fine-Tuning in AI

The Data Exchange with Ben Lorica

Play Episode Listen Later Mar 13, 2025 45:45


Travis Addair is Co-Founder & CTO at Predibase. In this episode, the discussion centers on transforming pre-trained foundation models into domain-specific assets through advanced customization techniques. Subscribe to the Gradient Flow Newsletter.

Pathmonk Presents Podcast
AI Platform Reshaping Custom Model Development | Michael Ortega from Predibase

Pathmonk Presents Podcast

Play Episode Listen Later Jan 14, 2025 18:41


In this episode, we welcome Michael Ortega, Head of Marketing at Predibase, a cutting-edge AI platform making custom AI model development accessible and cost-effective for developers.  Michael shares valuable insights into how Predibase is democratizing AI development through their platform, enabling businesses across various sectors to build and deploy custom language models. He discusses fascinating case studies, including World Wildlife Fund's use of AI for report analysis and an innovative AI-powered artist interaction platform.  Michael also delves into their marketing strategies, emphasizing the importance of content marketing, website optimization, and maintaining strong customer relationships in the B2B tech space.

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
20VC: Why We Are in a Bubble & Now is Frothier Than 2021 | Why $1M ARR is a BS Milestone for Series A | Why Seed Pricing is Rational & Large Seed Rounds Have Less Risk | Why Many AI Apps Have BS Revenue & Are Not Sustainable with Saam Motamedi

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch

Play Episode Listen Later Jul 15, 2024 66:25


Saam Motamedi is a General Partner at Greylock, where he has led investments in Abnormal Security (incubated at Greylock), Apiiro Security, and Opal Security, as well as AI companies like Adept, Braintrust, Cresta, Predibase, Snorkel, and more. Before Greylock, Saam founded Guru Labs, a machine learning-driven fintech startup, and worked in product management at RelateIQ, one of the first applied AI software companies.

In Today's Conversation We Discuss:

1. Seed Today is Frothier than 2021: How does Saam evaluate the seed market today? With seed pricing being so high, how does he reflect on his own price sensitivity? When does he decide a price is too high and walk away? Despite seed pricing being higher than ever before, why does Saam believe it is rational? How has the competition at seed changed in the last few years?

2. Series B and Growth are not a Viable Asset Class Today: Why does Saam believe that you cannot make money at Series B today? Why has pricing gone through the roof? Who is the new competition? When does it make sense to "play the game on the field" vs. say this is BS and do something else? What would need to happen in the public markets for Series B to be a viable asset class again?

3. Markets vs Founders: The Billion Dollar Mistake and Lessons: How does Saam prioritise between founder and market? What have been Saam's biggest lessons when it comes to market sizing and timing? What is Saam's biggest miss? How did it change his approach and company evaluation? Which other VC would Saam most like to swap portfolios with? Why them?

4. Saam Motamedi: AMA: What does Saam know now that he wishes he had known when he got into VC? Saam has had a meteoric rise at Greylock; what advice does he have for younger investors looking to scale within a firm? Sourcing, selecting, and servicing: where is he best? Where is he worst? Why does Saam believe that most VCs do not add value?

Founded and Funded
Building Predibase: AI, Open-Source, and Commercialization with Dev Rishi and Travis Addair

Founded and Funded

Play Episode Listen Later Jun 12, 2024 41:46


Madrona Partner Vivek Ramaswami hosts Dev Rishi and Travis Addair, co-founders of 2023 IA40 Rising Star Predibase. Predibase is a platform for developers to easily fine-tune and serve any open-source model. They recently released LoRA Land, a collection of 25 highly performant open-source fine-tuned models, and launched the open-source LoRA eXchange, or LoRAX, which allows users to pack hundreds of task-specific models into a single GPU. In this episode, Dev and Travis share their perspective on what it takes to commercialize open-source projects, how to build a product your customers actually want, and how to differentiate in a very noisy AI infrastructure market. This is a must-listen for anyone building in AI.

Full Transcript: https://www.madrona.com/predibase-ai-open-source-commercialization

Chapters:
(00:00) Introduction
(01:29) The Founding Story of Predibase
(04:03) Vision and Evolution of Predibase
(06:23) Predibase's Response to the ChatGPT Hype
(11:26) Commercializing Open-Source Projects
(17:37) Bigger Isn't Always Better - LLMs
(24:10) Challenges and Solutions in AI Infrastructure
(30:47) Predibase's Unique Approach and Future Outlook
(34:11) Over-Hyped vs. Under-Hyped in AI
(38:37) Advice for First-Time Founders
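The LoRAX pitch mentioned above — hundreds of task-specific models packed into a single GPU — rests on LoRA's low-rank factorization: each fine-tuned "model" is stored as a small pair of matrices layered on one shared frozen base. A back-of-the-envelope sketch of why that fits (pure Python, toy sizes; illustrative only, not Predibase's implementation):

```python
# Why many LoRA adapters fit beside one base model: a rank-r adapter on a
# d x d weight matrix adds only a (d x r) and an (r x d) matrix of parameters.

def adapter_params(d: int, r: int) -> int:
    """Parameters added by one rank-r LoRA adapter on a d x d weight."""
    return 2 * d * r

d, r = 4096, 8                     # toy sizes: one hidden layer, low rank
base_params = d * d                # parameters in the full weight matrix
per_adapter = adapter_params(d, r)

# Each adapter is under 0.4% of the base weight, so hundreds of
# task-specific adapters still cost less memory than a second base copy.
print(per_adapter / base_params)   # 0.00390625
assert 100 * per_adapter < base_params
```

Real deployments attach adapters to many weight matrices, but the per-matrix ratio is the same, which is what makes multi-tenant serving of fine-tuned models economical.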

The Data Scientist Show
Adapters: the game changer for fine-tuning - Geoffrey Angus - The Data Scientist Show #084

The Data Scientist Show

Play Episode Listen Later Mar 8, 2024 52:45


I interviewed Geoffrey Angus, ML team lead @Predibase, to talk about why adapter-based training is a game changer. We started with an overview of fine-tuning and then discussed five reasons why adapters are the future of LLMs. Later we also shared a demo and answered questions from the live audience.

Try fine-tuning for free: https://pbase.ai/GetStarted
Geoffrey's LinkedIn: https://www.linkedin.com/in/geoffreyangus
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

(00:00:00) Intro
(00:01:19) What is Fine-tuning?
(00:08:18) Utilizing Adapters for Fine-tuning Enhancement
(00:09:50) 5 Reasons Why Adapters Are the Future of LLMs
(00:26:34) Common Mistakes in Adapter Usage
(00:28:34) Training Your Own Adapter
(00:32:23) Behind the Scenes of the Adapter Training Process
(00:37:51) Config File Guidance for Fine-Tuning
(00:39:41) Debugging Strategies for Suboptimal Fine-Tuning Results
(00:42:23) User Queries: Creating a LoRA Adapter and Future Support
(00:51:06) Key Takeaways and Recap
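The adapter mechanic discussed in the episode can be shown in a few lines: the base weights stay frozen, each task contributes only a small low-rank delta, and switching tasks means swapping tiny matrices rather than whole models. A toy pure-Python sketch (2x2 matrices for readability; illustrative only, not actual training code):

```python
# LoRA mechanic in miniature: y = (W + B @ A) @ x, with W frozen.
# Swapping (A, B) pairs swaps tasks without copying or modifying W.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    """Elementwise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[1.0, 0.0], [0.0, 1.0]]            # frozen 2x2 base weight

# Two rank-1 adapters, each a (B: 2x1, A: 1x2) pair for a different "task".
task_a = ([[0.5, 0.0]], [[1.0], [0.0]])  # (A, B)
task_b = ([[0.0, 0.5]], [[0.0], [1.0]])

def apply(x, adapter):
    A, B = adapter
    W_eff = add(W, matmul(B, A))        # effective weight: W + B @ A
    return matmul(W_eff, x)

x = [[2.0], [2.0]]
print(apply(x, task_a))  # [[3.0], [2.0]]
print(apply(x, task_b))  # [[2.0], [3.0]]
```

The base W is never touched, which is why the episode frames adapters as cheap to train and cheap to serve side by side.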

Crafted
Crafted Highlights from 2023 (Our New Trailer)

Crafted

Play Episode Listen Later Feb 21, 2024 8:09


A highlight reel that will help you find great episodes in the Crafted show archive, featuring some of the incredible founders and product builders we featured in 2023. You'll hear voices belonging to the founders of Gusto, Lattice, Monte Carlo, Moov, Wevr, PS, as well as from senior leaders at AWS, Redesign Health, Betterment, Flatiron Health, Predibase, Bauplan, and more! And we want to hear from you! We're planning big things this year and want to learn more about what you want to hear. Please take two minutes to take this short survey: https://www.tinyurl.com/craftedsurvey

Crafted is sponsored by Artium, a next-generation software development consultancy that combines elite human craftsmanship and artificial intelligence. See how Artium can help you build your future at artium.ai

Crafted
GenAI: From Prototype to Production at NYC Tech Week 2023. Featuring Jacopo Tagliabue of Bauplan, Catherine Miller of Flatiron Health, Raghvender Arni of AWS Industries, and Justin Zhao of Predibase.

Crafted

Play Episode Listen Later Nov 28, 2023 43:55


“The last nine months, no matter what we want to talk about, our customers want to talk about one thing, which is Gen AI.” One year ago, ChatGPT woke the entire world up to the possibilities of Generative AI. Since then, the conversation around AI has not ceased, with constant questions being raised about its safety, accuracy, and potential implications for the tech world. In this episode, Dan hosts a panel at NYC Tech Week 2023, discussing the evolution and potential of Generative AI with Jacopo Tagliabue of Bauplan, Catherine Miller of Flatiron Health, Raghvender Arni of AWS Industries, and Justin Zhao of Predibase. Together, they discuss their companies' adoption of AI, delving into the need for a stable foundation in order to move Generative AI from prototype to production and the works of science fiction today that could be commonplace in the near future.

The Machine Learning Podcast
Applying Declarative ML Techniques To Large Language Models For Better Results

The Machine Learning Podcast

Play Episode Listen Later Oct 24, 2023 46:11


Summary: Large language models have gained a substantial amount of attention in the area of AI and machine learning. While they are impressive, there are many applications where they are not the best option. In this episode Piero Molino explains how declarative ML approaches allow you to make the best use of the available tools across use cases and data formats.

Announcements: Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Interview: Your host is Tobias Macey and today I'm interviewing Piero Molino about the application of declarative ML in a world being dominated by large language models.
- How did you get involved in machine learning?
- Can you start by summarizing your perspective on the effect that LLMs are having on the AI/ML industry?
- In a world where LLMs are being applied to a growing variety of use cases, what are the capabilities that they still lack? How does declarative ML help to address those shortcomings?
- The majority of current hype is about commercial models (e.g. GPT-4). Can you summarize the current state of the ecosystem for open source LLMs?
- For teams who are investing in ML/AI capabilities, what are the sources of platform risk for LLMs?
- What are the comparative benefits of using a declarative ML approach?
- What are the most interesting, innovative, or unexpected ways that you have seen LLMs used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on declarative ML in the age of LLMs?
- When is an LLM the wrong choice?
- What do you have planned for the future of declarative ML and Predibase?

Contact Info: LinkedIn (https://www.linkedin.com/in/pieromolino/?locale=en_US), Website (https://w4nderlu.st/)

Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast (https://www.dataengineeringpodcast.com) covers the latest on modern data management.
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site (https://www.themachinelearningpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story. To help other people find the show please leave a review on iTunes (https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243) and tell your friends and co-workers.

Parting Question: From your perspective, what is the biggest barrier to adoption of machine learning today?

Links: Predibase (https://predibase.com/), Podcast Episode (https://www.themachinelearningpodcast.com/predibase-declarative-machine-learning-episode-4), Ludwig (https://ludwig.ai/latest/), Podcast.__init__ Episode (https://www.pythonpodcast.com/ludwig-horovod-distributed-declarative-deep-learning-episode-341/), Recommender Systems (https://en.wikipedia.org/wiki/Recommender_system), Information Retrieval (https://en.wikipedia.org/wiki/Information_retrieval), Vector Database (https://thenewstack.io/what-is-a-real-vector-database/), Transformer Model (https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)), BERT (https://en.wikipedia.org/wiki/BERT_(language_model)), Context Windows (https://www.linkedin.com/pulse/whats-context-window-anyway-caitie-doogan-phd/), LLaMA (https://en.wikipedia.org/wiki/LLaMA)

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano (https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/), CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/).
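The "declarative ML" discussed in this episode means describing the what of a model — inputs, their types, and the target — and letting the framework choose the how. The sketch below shows the shape of such a spec as a Python dict, loosely modeled on Ludwig's documented input_features/output_features schema (the feature names and values are made up for illustration):

```python
# A declarative model spec: you state features and targets; the framework
# picks encoders, preprocessing, and the training loop. Modeled loosely on
# Ludwig's config schema; illustrative, not a verbatim Ludwig config.

config = {
    "input_features": [
        {"name": "review_text", "type": "text"},
        {"name": "star_rating", "type": "number"},
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},
    ],
    "trainer": {"epochs": 5},
}

# Everything not specified falls back to framework defaults, which is what
# makes the approach "declarative": no model code, just a description.
for feature in config["input_features"]:
    print(feature["name"], "->", feature["type"])
```

Swapping an encoder or target then becomes a one-line config change instead of a code rewrite, which is the flexibility across "use cases and data formats" the episode refers to.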

Redefining AI - Artificial Intelligence with Squirro
Piero Molino PhD - Open Source Unleashed: A Conversation on the Power of Community-Driven Tech

Redefining AI - Artificial Intelligence with Squirro

Play Episode Listen Later Oct 3, 2023 27:02


In this Episode: Open Source Unleashed: A Conversation on the Power of Community-Driven Tech. Lauren Hawker Zafer is joined by Piero Molino PhD.

Who is Piero Molino PhD? Piero Molino, Ph.D., is CEO and co-founder of Predibase, a low-code AI platform that makes it easy for engineers to build, optimize and deploy state-of-the-art models - from linear regressions to large language models - with just a few lines of code. He previously worked as a research scientist exploring machine learning and natural language processing at Stanford University, Yahoo, IBM Watson, Geometric Intelligence, and Uber, where he was a founding member of the Uber AI organization. He is the creator of Ludwig, a Linux-Foundation-backed open-source declarative deep learning framework with more than 9000 stars on GitHub.

Why this Episode? Dive into the world of AI democratization with our latest podcast episode featuring Piero Molino PhD, CEO of Predibase! Join host Lauren Hawker Zafer as she discusses intriguing facts about the world of AI that will captivate tech enthusiasts and novices alike. Discover the secrets behind "Ludwig" and learn how this open-source project is revolutionizing AI democratization. Explore the concept of universal access to AI technologies and how it benefits users who crave free access to cutting-edge tools. Piero unveils the shift in the AI landscape, where data collection and labeling are no longer the initial hurdles. Explore the implications of this transformation on the democratization of AI and the security measures in place to protect sensitive data. Find out how smaller organizations and individual developers are now empowered to harness the power of AI models through user-friendly APIs. Uncover the integration of classic machine learning techniques with large language models and how it's becoming accessible to a broader audience.
Get a glimpse of the evolving role of digital assistants in the tech world and how they fit into the changing dynamics of AI democratization.  Don't miss out on real-world examples of AI in action, from text classification to information extraction, and the transition from structured to unstructured data.  Join us on this captivating journey through the exciting landscape of AI democratization. Tune in now to redefine your understanding of AI!

SaaS Scaled - Interviews about SaaS Startups, Analytics, & Operations
Democratization of LLM & AI, + Tackling Challenges with Dev Rishi

SaaS Scaled - Interviews about SaaS Startups, Analytics, & Operations

Play Episode Listen Later Oct 2, 2023 32:28


On today's episode, we're joined by Devvret Rishi, CEO & Co-Founder at Predibase, the low-code AI platform for developers. We talk about:
  • Progress towards democratization of LLM & AI
  • Tackling the main challenges of machine learning
  • Will LLM apps replace or enhance SaaS apps?
  • Insights from Predibase's survey of 150 executives, data scientists, & developers
  • Why people will stop trying to use a cannon when they need a scalpel

Redefining AI - Artificial Intelligence with Squirro
Spotlight Eighteen - Open Source Unleashed: A Conversation on the Power of Community-Driven Tech - Out Soon!

Redefining AI - Artificial Intelligence with Squirro

Play Episode Listen Later Sep 26, 2023 1:31


Spotlight Eighteen is a snippet from our upcoming episode: Piero Molino - Open Source Unleashed: A Conversation on the Power of Community-Driven Tech. Listen to the full episode, as soon as it comes out, by subscribing to Redefining AI.

Who is Piero Molino PhD? Piero Molino, Ph.D., is CEO and co-founder of Predibase, a low-code AI platform that makes it easy for engineers to build, optimize and deploy state-of-the-art models - from linear regressions to large language models - with just a few lines of code. He previously worked as a research scientist exploring machine learning and natural language processing at Stanford University, Yahoo, IBM Watson, Geometric Intelligence, and Uber, where he was a founding member of the Uber AI organization. He is the creator of Ludwig, a Linux-Foundation-backed open-source declarative deep learning framework with more than 9000 stars on GitHub.

Why this Episode? Dive into the world of AI democratization with our latest podcast episode featuring Piero Molino PhD, CEO of Predibase! Join host Lauren Hawker Zafer as she discusses intriguing facts about the world of AI that will captivate tech enthusiasts and novices alike. Discover the secrets behind "Ludwig" and learn how this open-source project is revolutionizing AI democratization. Explore the concept of universal access to AI technologies and how it benefits users who crave free access to cutting-edge tools. Piero unveils the shift in the AI landscape, where data collection and labeling are no longer the initial hurdles. Explore the implications of this transformation on the democratization of AI and the security measures in place to protect sensitive data. Find out how smaller organizations and individual developers are now empowered to harness the power of AI models through user-friendly APIs. Uncover the integration of classic machine learning techniques with large language models and how it's becoming accessible to a broader audience.
Get a glimpse of the evolving role of digital assistants in the tech world and how they fit into the changing dynamics of AI democratization. Don't miss out on real-world examples of AI in action, from text classification to information extraction, and the transition from structured to unstructured data. Join us on this captivating journey through the exciting landscape of AI democratization. Tune in now to redefine your understanding of AI!

All TWiT.tv Shows (MP3)
This Week in Enterprise Tech 561: That Cloud Looks Like A Llama

All TWiT.tv Shows (MP3)

Play Episode Listen Later Sep 16, 2023 74:14


This week on TWiET, Lou Maresca, Brian Chee, and Curt Franklin talk with Dev Rishi, Co-Founder and CEO at Predibase, discussing the hurdles of moving large language models into production environments and how privacy concerns factor into that decision. Guest Dev Rishi of Predibase explains why many organizations say they can't use commercial LLMs, and shares best practices for getting started with privacy-focused ML.

Other topics include:
  • Microsoft, Oracle deliver direct access to Oracle database services on Azure
  • Cyber Extortion Attacks No Longer Require Ransomware
  • World's Largest Lithium Deposit Found Along Nevada-Oregon Border
  • AI-Powered SOC Automation: A New Era in Security Operations
  • The ripple effects of phasing out older TLS protocols

Hosts: Louis Maresca, Brian Chee, and Curtis Franklin
Guest: Dev Rishi

Download or subscribe to this show at https://twit.tv/shows/this-week-in-enterprise-tech. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit

Sponsors: nureva.com/twit, discourse.org/twit, canary.tools/twit - use code: TWIT

TWiT Bits (MP3)
TWiET Clip: Predibase: Accessible Machine Learning

TWiT Bits (MP3)

Play Episode Listen Later Sep 16, 2023 13:07


On This Week in Enterprise Tech, Lou Maresca talks with Dev Rishi, co-founder and CEO at Predibase, discussing the hurdles of moving large language models into production environments, and why many organizations say they can't use commercial LLMs. For the full episode, visit twit.tv/twiet/561

Host: Louis Maresca
Guest: Dev Rishi

You can find more about TWiT and subscribe to our podcasts at https://podcasts.twit.tv/

This Week in Enterprise Tech (Video HD)
TWiET 561: That Cloud Looks Like A Llama - Moving on from old protocols, accessible machine learning with Predibase

This Week in Enterprise Tech (Video HD)

Play Episode Listen Later Sep 16, 2023 74:14


This week on TWiET, Lou Maresca, Brian Chee, and Curt Franklin talk with Dev Rishi, Co-Founder and CEO at Predibase, discussing the hurdles of moving large language models into production environments and how privacy concerns factor into that decision. Guest Dev Rishi of Predibase explains why many organizations say they can't use commercial LLMs, and shares best practices for getting started with privacy-focused ML.

Other topics include:
  • Microsoft, Oracle deliver direct access to Oracle database services on Azure
  • Cyber Extortion Attacks No Longer Require Ransomware
  • World's Largest Lithium Deposit Found Along Nevada-Oregon Border
  • AI-Powered SOC Automation: A New Era in Security Operations
  • The ripple effects of phasing out older TLS protocols

Hosts: Louis Maresca, Brian Chee, and Curtis Franklin
Guest: Dev Rishi

Download or subscribe to this show at https://twit.tv/shows/this-week-in-enterprise-tech. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit

Sponsors: nureva.com/twit, discourse.org/twit, canary.tools/twit - use code: TWIT

This Week in Enterprise Tech (MP3)
TWiET 561: That Cloud Looks Like A Llama - Moving on from old protocols, accessible machine learning with Predibase

This Week in Enterprise Tech (MP3)

Play Episode Listen Later Sep 16, 2023 74:14


This week on TWiET, Lou Maresca, Brian Chee, and Curt Franklin talk with Dev Rishi, Co-Founder and CEO at Predibase, discussing the hurdles of moving large language models into production environments and how privacy concerns factor into that decision. Guest Dev Rishi of Predibase explains why many organizations say they can't use commercial LLMs, and shares best practices for getting started with privacy-focused ML.

Other topics include:
  • Microsoft, Oracle deliver direct access to Oracle database services on Azure
  • Cyber Extortion Attacks No Longer Require Ransomware
  • World's Largest Lithium Deposit Found Along Nevada-Oregon Border
  • AI-Powered SOC Automation: A New Era in Security Operations
  • The ripple effects of phasing out older TLS protocols

Hosts: Louis Maresca, Brian Chee, and Curtis Franklin
Guest: Dev Rishi

Download or subscribe to this show at https://twit.tv/shows/this-week-in-enterprise-tech. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit

Sponsors: nureva.com/twit, discourse.org/twit, canary.tools/twit - use code: TWIT

All TWiT.tv Shows (Video LO)
This Week in Enterprise Tech 561: That Cloud Looks Like A Llama

All TWiT.tv Shows (Video LO)

Play Episode Listen Later Sep 16, 2023 74:14


This week on TWiET, Lou Maresca, Brian Chee, and Curt Franklin talk with Dev Rishi, Co-Founder and CEO at Predibase, discussing the hurdles of moving large language models into production environments and how privacy concerns factor into that decision. Guest Dev Rishi of Predibase explains why many organizations say they can't use commercial LLMs, and shares best practices for getting started with privacy-focused ML.

Other topics include:
  • Microsoft, Oracle deliver direct access to Oracle database services on Azure
  • Cyber Extortion Attacks No Longer Require Ransomware
  • World's Largest Lithium Deposit Found Along Nevada-Oregon Border
  • AI-Powered SOC Automation: A New Era in Security Operations
  • The ripple effects of phasing out older TLS protocols

Hosts: Louis Maresca, Brian Chee, and Curtis Franklin
Guest: Dev Rishi

Download or subscribe to this show at https://twit.tv/shows/this-week-in-enterprise-tech. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit

Sponsors: nureva.com/twit, discourse.org/twit, canary.tools/twit - use code: TWIT

TWiT Bits (Video HD)
TWiET Clip: Predibase: Accessible Machine Learning

TWiT Bits (Video HD)

Play Episode Listen Later Sep 16, 2023 13:07


On This Week in Enterprise Tech, Lou Maresca talks with Dev Rishi, co-founder and CEO at Predibase, discussing the hurdles of moving large language models into production environments, and why many organizations say they can't use commercial LLMs. For the full episode, visit twit.tv/twiet/561

Host: Louis Maresca
Guest: Dev Rishi

You can find more about TWiT and subscribe to our podcasts at https://podcasts.twit.tv/

How AI Happens
Declarative ML with Ludwig Creator & Predibase CEO & Co-Founder Piero Molino

How AI Happens

Play Episode Listen Later Jul 12, 2023 28:03


Low-code platforms provide a powerful and efficient way to develop applications and drive digital transformation, and are becoming popular tools for organizations. In today's episode, we are joined by Piero Molino, CEO and Co-Founder at Predibase, a company revolutionizing the field of machine learning by pioneering a low-code declarative approach. Predibase empowers engineers and data scientists to effortlessly construct, enhance, and implement cutting-edge models, ranging from linear regressions to expansive language models, using a mere handful of code lines. Piero is intrigued by the convergence of diverse cultural interests and finds great fascination in exploring the intricate ties between knowledge, language, and learning. His approach involves seeking unconventional solutions to problems and embracing a multidisciplinary approach that allows him to acquire novel and varied knowledge while gaining fresh experiences. In our conversation, we talk about his professional career journey, developing Ludwig, and how this eventually developed into Predibase.

Key Points From This Episode:
  • Background about Piero's professional experience and skill sets.
  • What his responsibilities were in his previous role at Uber.
  • Hear about his research at Stanford University.
  • Details about the motivation for Predibase: Ludwig AI.
  • Examples of the different Ludwig models and applications.
  • Challenges of software development.
  • How the community further developed his Ludwig machine learning tool.
  • The benefits of community involvement for developers.
  • Hear how his Ludwig project developed into Predibase.
  • He shares the inspiration behind the name Ludwig.
  • Why Predibase can be considered a low-code platform.
  • What the Predibase platform offers users and organizations.
  • Ethical considerations of democratizing data science tools.
  • The importance of a multidisciplinary approach to developing AI tools.
  • Advice for upcoming developers.

Tweetables:
“One thing that I am proud of is the fact that the architecture is very extensible and really easy to plug and play new data types or new models.” — @w4nderlus7 [0:14:02]
“We are doing a bunch of things at Predibase that build on top of Ludwig and make it available and easy to use for organizations in the cloud.” — @w4nderlus7 [0:19:23]
“I believe that in the teams that actually put machine learning into production, there should be a combination of different skill sets.” — @w4nderlus7 [0:23:04]
“What made it possible for me to do the things that I have done is constant curiosity.” — @w4nderlus7 [0:26:06]

Links Mentioned in Today's Episode: Piero Molino on LinkedIn, Piero Molino on Twitter, Predibase, Ludwig, Max-Planck-Institute, Loopr AI, Wittgenstein's Mistress, How AI Happens, Sama

The Data Scientist Show
Uber's ML Systems (Uber Eats, Customer Support), Declarative Machine Learning - Piero Molino - The Data Scientist Show #064

The Data Scientist Show

Play Episode Listen Later Jul 4, 2023 110:05


Piero Molino was one of the founding members of Uber AI Labs. He worked on several deployed ML systems, including an NLP model for Customer Support and the Uber Eats Recommender System. He is the author of Ludwig, an open source declarative deep learning framework. In 2021 he co-founded Predibase, the low-code declarative machine learning platform built on top of Ludwig.

Piero's LinkedIn: https://www.linkedin.com/in/pieromolino
Predibase free access: bit.ly/3PCeqqw
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu

(00:00:00) Introduction
(00:01:54) Journey to machine learning
(00:03:51) Recommender system at Uber Eats
(00:04:13) Projects at Uber AI
(00:09:34) Uber's customer obsession ticket system
(00:16:01) How to evaluate online-offline business and model performance metrics
(00:17:16) Customer satisfaction
(00:28:38) When do you know whether a project is good enough
(00:41:50) Declarative machine learning and Ludwig
(00:45:32) Ludwig vs AutoML
(00:54:44) Working with Professor Chris Ré
(00:58:32) Why he started Predibase
(01:07:56) LLMs and GenAI
(01:10:17) Challenges for LLMs
(01:22:36) Advice for data scientists
(01:34:29) Career advice to his younger self

Startup Insider
Investments & Exits - with Won Yi from Cavalry Ventures

Startup Insider

Play Episode Listen Later Jun 6, 2023 20:42


In the "Investments & Exits" segment, we welcome Won Yi, Associate at Cavalry Ventures. Won discusses Predibase's funding round. Predibase, a San Francisco-based startup, has raised 12.2 million US dollars in a Series A round led by Felicis. With the close of the round, the company's total funding rises to 28.5 million US dollars. Founded by Devvret Rishi, Piero Molino, and Travis Addair, the startup enables developers and data scientists to build and deploy AI applications without having to use complex ML tools. The startup's customers include Paradigm and Koble.ai. The fresh capital will be used to build out the go-to-market function and expand the platform.

MLOps.community
Data Privacy and Security // LLMs in Production Conference Panel Discussion

MLOps.community

Play Episode Listen Later May 12, 2023 24:53


We are having another LLMs in Production Virtual Conference: 50+ speakers combined with in-person activities around the world on June 15 & 16. Sign up free here: https://home.mlops.community/home/events/llm-in-prod-part-ii-2023-06-20

// Abstract
This panel discussion is centered around a crucial topic in the tech industry - data privacy and security in the context of large language models and AI systems. The discussion highlights several key themes, such as the significance of trust in AI systems, the potential risks of hallucinations, and the differences between low- and high-affordability use cases. The discussion promises to be thought-provoking and informative, shedding light on the latest developments and concerns in the field. We can expect to gain valuable insights into an issue that is becoming increasingly relevant in our digital world.

// Bio
Diego Oppenheimer: an entrepreneur, product developer, and investor with an extensive background in all things data. Currently, he is a Partner at Factory, a venture fund specializing in AI investments, as well as interim head of product at two LLM startups. Previously he was an executive vice president at DataRobot, Founder and CEO at Algorithmia (acquired by DataRobot), and shipped some of Microsoft's most used data analysis products, including Excel, PowerBI, and SQL Server. Diego is active in AI/ML communities as a founding member and strategic advisor for the AI Infrastructure Alliance and MLOps.Community and works with leaders to define ML industry standards and best practices. Diego holds a Bachelor's degree in Information Systems and a Master's degree in Business Intelligence and Data Analytics from Carnegie Mellon University.

Gevorg Karapetyan: co-founder and CTO of ZERO Systems, where he oversees the company's product and technology strategy. He holds a Ph.D. in Computer Science and is the author of multiple publications, including a US Patent.

Vin Vashishta: C-level Technical Strategy Advisor and Founder of V Squared, one of the first data science consulting firms, whose mission is to provide support and clarity for clients' complete data and AI monetization journeys. Over a decade in data science and a quarter century in technology building and leading teams and delivering products with $100M+ in ARR.

Saahil Jain: an engineering manager at You.com, where he builds search, ranking, and conversational AI systems. Previously, Saahil was a graduate researcher in the Stanford Machine Learning Group under Professor Andrew Ng, where he researched topics related to deep learning and natural language processing (NLP) in resource-constrained domains like healthcare. Prior to Stanford, Saahil worked as a product manager at Microsoft on Office 365. He received his B.S. and M.S. in Computer Science at Columbia University and Stanford University respectively.

Shreya Rajpal: the creator of Guardrails AI, an open-source solution designed to establish guardrails for large language models. As a founding engineer at Predibase, she helped build the Applied ML and ML infra teams. Previously, she worked at Apple's Special Projects Group on cross-functional ML, and at Drive.ai building computer vision models.

Talk Python To Me - Python conversations for passionate developers

At PyCon 2023, there was a section of the expo floor dedicated to new Python-based companies called Startup Row. I wanted to bring their stories and the experience of talking with these new startups to you. So in this episode, we'll talk with founders from these companies for 5 to 10 minutes each. Links from the show Ponder: ponder.io generally intelligent: generallyintelligent.com Wherobots: wherobots.ai Neptyne: neptyne.com Nixtla: nixtla.io Predibase: predibase.com Pynecone: pynecone.io Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy Sponsors Sentry Error Monitoring, Code TALKPYTHON Talk Python Training

The Python Podcast.__init__
Declarative Machine Learning For High Performance Deep Learning Models With Predibase

The Python Podcast.__init__

Play Episode Listen Later Dec 5, 2022 59:22


Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle.

The Machine Learning Podcast
Solve The Cold Start Problem For Machine Learning By Letting Humans Teach The Computer With Aitomatic

The Machine Learning Podcast

Play Episode Listen Later Sep 28, 2022 52:07


Summary Machine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities that don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the "cold start" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Your host is Tobias Macey and today I’m interviewing Christopher Nguyen about how to address the cold start problem for ML/AI projects Interview Introduction How did you get involved in machine learning? Can you describe what the "cold start" or "small data" problem is and its impact on an organization’s ability to invest in machine learning? What are some examples of use cases where ML is a viable solution but there is a corresponding lack of usable data? How does the model design influence the data requirements to build it? (e.g. 
statistical model vs. deep learning, etc.) What are the available options for addressing a lack of data for ML? What are the characteristics of a given data set that make it suitable for ML use cases? Can you describe what you are building at Aitomatic and how it helps to address the cold start problem? How have the design and goals of the product changed since you first started working on it? What are some of the education challenges that you face when working with organizations to help them understand how to think about ML/AI investment and practical limitations? What are the most interesting, innovative, or unexpected ways that you have seen Aitomatic/H1st used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Aitomatic/H1st? When is a human/knowledge driven approach to ML development the wrong choice? What do you have planned for the future of Aitomatic? Contact Info LinkedIn @pentagoniac on Twitter Google Scholar Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Aitomatic Human First AI Knowledge First World Symposium Atari 800 Cold start problem Scale AI Snorkel AI Podcast Episode Anomaly Detection Expert Systems ICML == International Conference on Machine Learning NIST == National Institute of Standards and Technology Multi-modal Model SVM == Support Vector Machine Tensorflow Pytorch Podcast.__init__ Episode OSS Capital DALL-E The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

The Machine Learning Podcast
Building A Business Powered By Machine Learning At Assembly AI

The Machine Learning Podcast

Play Episode Listen Later Sep 9, 2022 58:42


Summary The increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Your host is Tobias Macey and today I’m interviewing Dylan Fox about building and growing a business with ML as its core offering Interview Introduction How did you get involved in machine learning? Can you describe what Assembly is and the story behind it? For anyone who isn’t familiar with your platform, can you describe the role that ML/AI plays in your product? What was your process for going from idea to prototype for an AI powered business? Can you offer parallels between your own experience and that of your peers who are building businesses oriented more toward pure software applications? How are you structuring your teams? On the path to your current scale and capabilities how have you managed scoping of your model capabilities and operational scale to avoid getting bogged down or burnt out? 
How do you think about scoping of model functionality to balance composability and system complexity? What is your process for identifying and understanding which problems are suited to ML and when to rely on pure software? You are constantly iterating on model performance and introducing new capabilities. How do you manage prototyping and experimentation cycles? What are the metrics that you track to identify whether and when to move from an experimental to an operational state with a model? What is your process for understanding what’s possible and what can feasibly operate at scale? Can you describe your overall operational patterns and delivery process for ML? What are some of the most useful investments in tooling that you have made to manage development experience for your teams? Once you have a model in operation, how do you manage performance tuning? (from both a model and an operational scalability perspective) What are the most interesting, innovative, or unexpected aspects of ML development and maintenance that you have encountered while building and growing the Assembly platform? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Assembly? When is ML the wrong choice? What do you have planned for the future of Assembly? Contact Info @YouveGotFox on Twitter LinkedIn Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Assembly AI Podcast.__init__ Episode Learn Python the Hard Way NLTK NLP == Natural Language Processing NLU == Natural Language Understanding Speech Recognition Tensorflow r/machinelearning SciPy PyTorch Jax HuggingFace RNN == Recurrent Neural Network CNN == Convolutional Neural Network LSTM == Long Short Term Memory Hidden Markov Models Baidu DeepSpeech CTC (Connectionist Temporal Classification) Loss Model Twilio Grid Search K80 GPU A100 GPU TPU == Tensor Processing Unit Foundation Models BLOOM Language Model DALL-E 2 The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

The Machine Learning Podcast
Using AI To Transform Your Business Without The Headache Using Graft

The Machine Learning Podcast

Play Episode Listen Later Aug 16, 2022 67:33


Summary Machine learning is a transformative tool for the organizations that can take advantage of it. While the frameworks and platforms for building machine learning applications are becoming more powerful and broadly available, there is still a significant investment of time, money, and talent required to take full advantage of it. In order to reduce that barrier further Adam Oliner and Brian Calvert, along with their other co-founders, started Graft. In this episode Adam and Brian explain how they have built a platform designed to empower everyone in the business to take part in designing and building ML projects, while managing the end-to-end workflow required to go from data to production. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. 
Go to themachinelearningpodcast.com/deepchecks today to get started! Your host is Tobias Macey and today I’m interviewing Brian Calvert and Adam Oliner about Graft, a cloud-native platform designed to simplify the work of applying AI to business problems Interview Introduction How did you get involved in machine learning? Can you describe what Graft is and the story behind it? What is the core thesis of the problem you are targeting? How does the Graft product address that problem? Who are the personas that you are focused on working with both now in your early stages and in the future as you evolve the product? What are the capabilities that can be unlocked in different organizations by reducing the friction and up-front investment required to adopt ML/AI? What are the user-facing interfaces that you are focused on providing to make that adoption curve as shallow as possible? What are some of the unavoidable bits of complexity that need to be surfaced to the end user? Can you describe the infrastructure and platform design that you are relying on for the Graft product? What are some of the emerging "best practices" around ML/AI that you have been able to build on top of? As new techniques and practices are discovered/introduced how are you thinking about the adoption process and how/when to integrate them into the Graft product? What are some of the new engineering challenges that you have had to tackle as a result of your specific product? Machine learning can be a very data and compute intensive endeavor. How are you thinking about scalability in a multi-tenant system? Different model and data types can be widely divergent in terms of the cost (monetary, time, compute, etc.) required. How are you thinking about amortizing vs. passing through those costs to the end user? Can you describe the adoption/integration process for someone using Graft? 
Once they are onboarded and they have connected to their various data sources, what is the workflow for someone to apply ML capabilities to their problems? One of the challenges about the current state of ML capabilities and adoption is understanding what is possible and what is impractical. How have you designed Graft to help identify and expose opportunities for applying ML within the organization? What are some of the challenges of customer education and overall messaging that you are working through? What are the most interesting, innovative, or unexpected ways that you have seen Graft used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Graft? When is Graft the wrong choice? What do you have planned for the future of Graft? Contact Info Brian LinkedIn Adam LinkedIn Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Graft High Energy Particle Physics LHC Cruise Slack Splunk Marvin Minsky Patrick Henry Winston AI Winter Sebastian Thrun DARPA Grand Challenge Higgs Boson Supersymmetry Kinematics Transfer Learning Foundation Models ML Embeddings BERT Airflow Dagster Prefect Dask Kubeflow MySQL PostgreSQL Snowflake Redshift S3 Kubernetes Multi-modal models Multi-task models Magic: The Gathering The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

The Machine Learning Podcast
Build Better Models Through Data Centric Machine Learning Development With Snorkel AI

The Machine Learning Podcast

Play Episode Listen Later Jul 29, 2022 53:49


Summary Machine learning is a data hungry activity, and the quality of the resulting model is highly dependent on the quality of the inputs that it receives. Generating sufficient quantities of high quality labeled data is an expensive and time consuming process. In order to reduce that time and cost Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic that dramatically reduces the time needed to build training data sets and drives down the total cost. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started! Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more excel sheets or ad-hoc python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. 
Galileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today! Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Your host is Tobias Macey and today I’m interviewing Alex Ratner about Snorkel AI, a platform for data-centric machine learning workflows powered by programmatic data labeling techniques Interview Introduction How did you get involved in machine learning? Can you describe what Snorkel AI is and the story behind it? What are the problems that you are focused on solving? Which pieces of the ML lifecycle are you focused on? How did your experience building the open source Snorkel project and working with the community inform your product direction for Snorkel AI? How has the underlying Snorkel project evolved over the past 4 years? What are the deciding factors that an organization or ML team need to consider when evaluating existing labeling strategies against the programmatic approach that you provide? What are the features that Snorkel provides over and above managing code execution across the source data set? Can you describe what you have built at Snorkel AI and how it is implemented? What are some of the notable developments of the ML ecosystem that had a meaningful impact on your overall product vision/viability? 
Can you describe the workflow for an individual or team who is using Snorkel for generating their training data set? How does Snorkel integrate with the experimentation process to track how changes to labeling logic correlate with the performance of the resulting model? What are some of the complexities involved in designing and testing the labeling logic? How do you handle complex data formats such as audio, video, images, etc. that might require their own ML models to generate labels? (e.g. object detection for bounding boxes) With the increased scale and quality of labeled data that Snorkel AI offers, how does that impact the viability of autoML toolchains for generating useful models? How are you managing the governance and feature boundaries between the open source Snorkel project and the business that you have built around it? What are the most interesting, innovative, or unexpected ways that you have seen Snorkel AI used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Snorkel AI? When is Snorkel AI the wrong choice? What do you have planned for the future of Snorkel AI? Contact Info LinkedIn Website @ajratner on Twitter Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Snorkel AI Data Engineering Podcast Episode University of Washington Snorkel OSS Natural Language Processing (NLP) Tensorflow PyTorch Podcast.__init__ Episode Deep Learning Foundation Models MLFlow SHAP Podcast.__init__ Episode The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
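The programmatic-labeling idea this episode describes, where domain experts encode their expertise as reusable labeling functions whose noisy votes are combined into training labels, can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names and the simple majority-vote combiner are ours, not Snorkel's actual API (Snorkel fits a generative label model over the labeling functions rather than taking a raw vote).

```python
# Toy sketch of programmatic data labeling in the style Snorkel popularized.
# All names here are illustrative, not the Snorkel library API.

SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_link(text):
    """Heuristic: messages containing URLs are often spam."""
    return SPAM if "http" in text else ABSTAIN

def lf_short_greeting(text):
    """Heuristic: very short messages tend to be benign."""
    return HAM if len(text.split()) < 4 else ABSTAIN

def lf_money_words(text):
    """Heuristic: classic spam vocabulary."""
    return SPAM if any(w in text.lower() for w in ("free", "winner", "cash")) else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_short_greeting, lf_money_words]

def majority_vote(text):
    """Combine the noisy labeling functions into one weak label,
    abstaining when no function fires."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(majority_vote("You are a winner! Claim cash at http://x.y"))  # 1 (spam)
print(majority_vote("hi there"))                                    # 0 (ham)
```

Weak labels produced this way are then used to train a discriminative model, which can generalize beyond the hand-written heuristics.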

The Machine Learning Podcast
Declarative Machine Learning For High Performance Deep Learning Models With Predibase

The Machine Learning Podcast

Play Episode Listen Later Jul 21, 2022 60:19


Summary Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started! Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more excel sheets or ad-hoc python scripts. 
Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today! Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you. Your host is Tobias Macey and today I’m interviewing Travis Addair about Predibase, a low-code platform for building ML models in a declarative format Interview Introduction How did you get involved in machine learning? Can you describe what Predibase is and the story behind it? Who is your target audience and how does that focus influence your user experience and feature development priorities? How would you describe the semantic differences between your chosen terminology of "declarative ML" and the "autoML" nomenclature that many projects and products have adopted? Another platform that launched recently with a promise of "declarative ML" is Continual. How would you characterize your relative strengths? Can you describe how the Predibase platform is implemented? How have the design and goals of the product changed as you worked through the initial implementation and started working with early customers? The operational aspects of the ML lifecycle are still fairly nascent. 
How have you thought about the boundaries for your product to avoid getting drawn into scope creep while providing a happy path to delivery? Ludwig is a core element of your platform. What are the other capabilities that you are layering around and on top of it to build a differentiated product? In addition to the existing interfaces for Ludwig you created a new language in the form of PQL. What was the motivation for that decision? How did you approach the semantic and syntactic design of the dialect? What is your vision for PQL in the space of "declarative ML" that you are working to define? Can you describe the available workflows for an individual or team that is using Predibase for prototyping and validating an ML model? Once a model has been deemed satisfactory, what is the path to production? How are you approaching governance and sustainability of Ludwig and Horovod while balancing your reliance on them in Predibase? What are some of the notable investments/improvements that you have made in Ludwig during your work of building Predibase? What are the most interesting, innovative, or unexpected ways that you have seen Predibase used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Predibase? When is Predibase the wrong choice? What do you have planned for the future of Predibase? Contact Info LinkedIn tgaddair on GitHub @travisaddair on Twitter Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Predibase Horovod Ludwig Podcast.__init__ Episode Support Vector Machine Hadoop Tensorflow Uber Michelangelo AutoML Spark ML Lib Deep Learning PyTorch Continual Data Engineering Podcast Episode Overton Kubernetes Ray Nvidia Triton Whylogs Data Engineering Podcast Episode Weights and Biases MLFlow Comet Confusion Matrices dbt Data Engineering Podcast Episode Torchscript Self-supervised Learning The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
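The declarative approach discussed in this episode, where the user declares input and output feature types and the framework picks suitable encoders and decoders, can be illustrated with a toy config "compiler". The config shape below loosely mirrors Ludwig's YAML (input_features/output_features), but the defaults table and the compile_model function are our own sketch for illustration, not Ludwig's internals.

```python
# Toy illustration of declarative ML: the user states *what* to predict
# from *which* features; the framework decides *how*.
# The component names are placeholders, not actual Ludwig implementation details.

config = {
    "input_features": [
        {"name": "review_text", "type": "text"},
        {"name": "stars", "type": "number"},
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},
    ],
}

# Hypothetical default component choices per feature type.
DEFAULT_ENCODERS = {"text": "parallel_cnn", "number": "dense", "category": "embed"}
DEFAULT_DECODERS = {"category": "classifier", "number": "regressor"}

def compile_model(cfg):
    """Map each declared feature to a default component, as a declarative
    framework would do before assembling and training the model."""
    plan = {f["name"]: DEFAULT_ENCODERS[f["type"]] for f in cfg["input_features"]}
    plan.update({f["name"]: DEFAULT_DECODERS[f["type"]] for f in cfg["output_features"]})
    return plan

print(compile_model(config))
# {'review_text': 'parallel_cnn', 'stars': 'dense', 'sentiment': 'classifier'}
```

The appeal is that swapping an architecture or adding a feature is a one-line config change rather than a code rewrite, which is what lets a managed service like Predibase automate the rest of the lifecycle around it.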

The Machine Learning Podcast
Stop Feeding Garbage Data To Your ML Models, Clean It Up With Galileo

The Machine Learning Podcast

Play Episode Listen Later Jul 14, 2022 47:03


Summary
Machine learning is a force multiplier that can generate an outsized impact on your organization. Unfortunately, if you are feeding your ML model garbage data, then you will get orders of magnitude more garbage out of it. The team behind Galileo experienced that pain for themselves and have set out to make data management and cleaning for machine learning a first-class concern in your workflow. In this episode Vikram Chatterji shares the story of how Galileo got started and how you can use their platform to fix your ML data so that you can get back to the fun parts.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.
Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!
Your host is Tobias Macey and today I’m interviewing Vikram Chatterji about Galileo, a platform for uncovering and addressing data problems to improve your model quality.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what Galileo is and the story behind it?
Who are the target users of the platform and what are the tools/workflows that you are replacing?
How does that focus inform and influence the design and prioritization of features in the platform?
What are some of the real-world impacts that you have experienced as a result of the kinds of data problems that you are addressing with Galileo?
Can you describe how the Galileo product is implemented?
What are some of the assumptions that you had formed from your own experiences that have been challenged as you worked with early design partners?
The toolchains and model architectures of any given team are unlikely to be a perfect match across departments or organizations. What are the core principles/concepts that you have hooked into in order to provide the broadest compatibility?
What are the model types/frameworks/etc. that you have had to forego support for in the early versions of your product?
Can you describe the workflow for someone building a machine learning model and how Galileo fits across the various stages of that cycle?
What are some of the biggest difficulties posed by the non-linear nature of the experimentation cycle in model development?
What are some of the ways that you work to quantify the impact of your tool on the productivity and profit contributions of an ML team/organization?
What are the most interesting, innovative, or unexpected ways that you have seen Galileo used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Galileo?
When is Galileo the wrong choice?
What do you have planned for the future of Galileo?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
Galileo
F1 Score
Tensorflow
Keras
SpaCy
Podcast.__init__ Episode
Pytorch
Podcast.__init__ Episode
MXNet
Jax

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
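The F1 score linked above is a common single-number summary for model quality on imbalanced data, combining precision and recall. A minimal sketch of computing it from raw predictions (the function and variable names here are illustrative, not taken from the episode):

```python
# F1 is the harmonic mean of precision and recall.
def f1_score(y_true, y_pred, positive=1):
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# 2 true positives, 1 false positive, 1 false negative -> F1 = 2/3
print(f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In practice a library implementation (for example scikit-learn's `f1_score`) handles multi-class averaging and edge cases; the sketch above only covers the binary case.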

The Machine Learning Podcast
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks

The Machine Learning Podcast

Play Episode Listen Later Jul 6, 2022 48:40


Summary
Machine learning has the potential to transform industries and revolutionize business capabilities, but only if the models are reliable and robust. Because of the fundamentally probabilistic nature of machine learning techniques it can be challenging to test and validate the generated models. The team at Deepchecks understands the widespread need to easily and repeatably check and verify the outputs of machine learning models and the complexity involved in making it a reality. In this episode Shir Chorev and Philip Tannor explain how they are addressing the problem with their open source deepchecks library and how you can start using it today to build trust in your machine learning applications.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.
Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!
Your host is Tobias Macey and today I’m interviewing Shir Chorev and Philip Tannor about Deepchecks, a Python package for comprehensively validating your machine learning models and data with minimal effort.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what Deepchecks is and the story behind it?
Who is the target audience for the project?
What are the biggest challenges that these users face in bringing ML models from concept to production and how does Deepchecks address those problems?
In the absence of Deepchecks, how are practitioners solving the problems of model validation and comparison across iterations?
What are some of the other tools in this ecosystem and what are the differentiating features of Deepchecks?
What are some examples of the kinds of tests that are useful for understanding the "correctness" of models?
What are the methods by which ML engineers/data scientists/domain experts can define what "correctness" means in a given model or subject area?
In software engineering the categories of tests are tiered as unit -> integration -> end-to-end. What are the relevant categories of tests that need to be built for validating the behavior of machine learning models?
How do model monitoring utilities overlap with the kinds of tests that you are building with deepchecks?
Can you describe how the Deepchecks package is implemented?
How have the design and goals of the project changed or evolved from when you started working on it?
What are the assumptions that you have built up from your own experiences that have been challenged by your early users and design partners?
Can you describe the workflow for an individual or team using Deepchecks as part of their model training and deployment lifecycle?
Test engineering is a deep discipline in its own right. How have you approached the user experience and API design to reduce the overhead for ML practitioners to adopt good practices?
What are the interfaces available for creating reusable tests and composing test suites together?
What are the additional services/capabilities that you are providing in your commercial offering?
How are you managing the governance and sustainability of the OSS project and balancing that against the needs/priorities of the business?
What are the most interesting, innovative, or unexpected ways that you have seen Deepchecks used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Deepchecks?
When is Deepchecks the wrong choice?
What do you have planned for the future of Deepchecks?

Contact Info
Shir
LinkedIn
shir22 on GitHub
Philip
LinkedIn
@philiptannor on Twitter

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
Deepchecks
Random Forest
Talpiot Program
SHAP
Podcast.__init__ Episode
Airflow
Great Expectations
Data Engineering Podcast Episode

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
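One category of test discussed above is comparing the training and evaluation data themselves. The following is a simplified stand-in for the kind of train/test validation that a library like deepchecks automates — it is not deepchecks' actual API; the function name and threshold are invented for illustration:

```python
from collections import Counter

# Flag class labels whose relative frequency differs between the training
# and test sets by more than `threshold`. An empty result means the check
# passes. Threshold and names are illustrative, not from any library.
def label_drift_check(train_labels, test_labels, threshold=0.2):
    def freqs(labels):
        total = len(labels)
        return {label: count / total for label, count in Counter(labels).items()}

    train_f, test_f = freqs(train_labels), freqs(test_labels)
    drifted = {}
    for label in set(train_f) | set(test_f):
        delta = abs(train_f.get(label, 0.0) - test_f.get(label, 0.0))
        if delta > threshold:
            drifted[label] = round(delta, 3)
    return drifted

# Train set is balanced 50/50, test set is skewed 90/10: both labels drift.
print(label_drift_check(["cat"] * 50 + ["dog"] * 50, ["cat"] * 90 + ["dog"] * 10))
```

A real suite would bundle many such checks (feature drift, duplicate samples, leakage between splits) and report them together, which is the workflow the episode describes.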

The Machine Learning Podcast
Build A Full Stack ML Powered App In An Afternoon With Baseten

The Machine Learning Podcast

Play Episode Listen Later Jun 29, 2022 46:26


Summary
Building an ML model is getting easier than ever, but it is still a challenge to get that model in front of the people that you built it for. Baseten is a platform that helps you quickly generate a full stack application powered by your model. You can easily create a web interface and APIs powered by the model you created, or a pre-trained model from their library. In this episode Tuhin Srivastava, co-founder of Baseten, explains how the platform empowers data scientists and ML engineers to get their work in production without having to negotiate for help from their application development colleagues.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!
Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.
Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
Your host is Tobias Macey and today I’m interviewing Tuhin Srivastava about Baseten, an ML Application Builder for data science and machine learning teams.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what Baseten is and the story behind it?
Who are the target users for Baseten and what problems are you solving for them?
What are some of the typical technical requirements for an application that is powered by a machine learning model?
In the absence of Baseten, what are some of the common utilities/patterns that teams might rely on?
What kinds of challenges do teams run into when serving a model in the context of an application?
There are a number of projects that aim to reduce the overhead of turning a model into a usable product (e.g. Streamlit, Hex, etc.). What is your assessment of the current ecosystem for lowering the barrier to product development for ML and data science teams?
Can you describe how the Baseten platform is designed?
How have the design and goals of the project changed or evolved since you started working on it?
How do you handle sandboxing of arbitrary user-managed code to ensure security and stability of the platform?
How did you approach the system design to allow for mapping application development paradigms into a structure that was accessible to ML professionals?
Can you describe the workflow for building an ML powered application?
What types of models do you support? (e.g. NLP, computer vision, timeseries, deep neural nets vs. linear regression, etc.)
How do the monitoring requirements shift for these different model types?
What other challenges are presented by these different model types?
What are the limitations in size/complexity/operational requirements that you have to impose to ensure a stable platform?
What is the process for deploying model updates?
For organizations that are relying on Baseten as a prototyping platform, what are the options for taking a successful application and handing it off to a product team for further customization?
What are the most interesting, innovative, or unexpected ways that you have seen Baseten used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Baseten?
When is Baseten the wrong choice?
What do you have planned for the future of Baseten?

Contact Info
@tuhinone on Twitter
LinkedIn

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
Baseten
Gumroad
scikit-learn
Tensorflow
Keras
Streamlit
Podcast.__init__ Episode
Retool
Hex
Podcast.__init__ Episode
Kubernetes
React
Monaco
Huggingface
Airtable
Dall-E 2
GPT-3
Weights and Biases

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
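Model-serving platforms of the kind discussed in this episode generally ask you to wrap a trained model in a small load/predict interface before exposing it over REST. The sketch below shows that generic pattern in plain Python; the class and method names are illustrative and are not Baseten's actual API, and the rule-based "model" stands in for real trained weights:

```python
import json

# Generic load/predict wrapper: the serving layer calls load() once at
# startup, then predict() per request with a JSON body.
class ModelWrapper:
    def __init__(self):
        self._model = None

    def load(self):
        # A real implementation would deserialize weights from disk or
        # object storage; a trivial threshold rule stands in here.
        self._model = lambda features: sum(features) > 1.0

    def predict(self, request_body: str) -> str:
        """Handle a JSON payload like one a REST endpoint would receive."""
        payload = json.loads(request_body)
        result = self._model(payload["features"])
        return json.dumps({"prediction": bool(result)})

wrapper = ModelWrapper()
wrapper.load()
print(wrapper.predict('{"features": [0.7, 0.6]}'))  # {"prediction": true}
```

Separating load() from predict() lets the platform pay the model-loading cost once per worker rather than per request, which is one of the operational concerns the interview touches on.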

52 Weeks of Cloud
52 Weeks of Cloud Episode 28: Conversation with Piero Molino and Ludwig/Predibase

52 Weeks of Cloud

Play Episode Listen Later Jun 22, 2022 50:37


If you enjoyed this video, here are additional resources to look at:
Coursera + Duke Specialization: Building Cloud Computing Solutions at Scale Specialization: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
Python, Bash, and SQL Essentials for Data Engineering Specialization: https://www.coursera.org/specializations/python-bash-sql-data-engineering-duke
AWS Certified Solutions Architect - Professional (SAP-C01) Cert Prep: 1 Design for Organizational Complexity: https://www.linkedin.com/learning/aws-certified-solutions-architect-professional-sap-c01-cert-prep-1-design-for-organizational-complexity/design-for-organizational-complexity?autoplay=true
O'Reilly Book: Practical MLOps: https://www.amazon.com/Practical-MLOps-Operationalizing-Machine-Learning/dp/1098103017
O'Reilly Book: Python for DevOps: https://www.amazon.com/gp/product/B082P97LDW/
Pragmatic AI: An Introduction to Cloud-based Machine Learning: https://www.amazon.com/gp/product/B07FB8F8QP/
Pragmatic AI Labs Book: Python Command-Line Tools: https://www.amazon.com/gp/product/B0855FSFYZ
Pragmatic AI Labs Book: Cloud Computing for Data Analysis: https://www.amazon.com/gp/product/B0992BN7W8
Pragmatic AI Book: Minimal Python: https://www.amazon.com/gp/product/B0855NSRR7
Pragmatic AI Book: Testing in Python: https://www.amazon.com/gp/product/B0855NSRR7
Subscribe to Pragmatic AI Labs YouTube Channel: https://www.youtube.com/channel/UCNDfiL0D1LUeKWAkRE1xO5Q
Subscribe to 52 Weeks of AWS Podcast: https://52-weeks-of-cloud.simplecast.com
View content on noahgift.com: https://noahgift.com/
View content on Pragmatic AI Labs Website: https://paiml.com/

MLOps.community
Declarative Machine Learning Systems: Big Tech Level ML Without a Big Tech Team // Piero Molino // MLOps Coffee Sessions #101

MLOps.community

Play Episode Listen Later Jun 3, 2022 58:46


MLOps Coffee Sessions #101 with Piero Molino, Declarative Machine Learning Systems: Big Tech Level ML Without a Big Tech Team, co-hosted by Vishnu Rachakonda.

// Abstract
Declarative Machine Learning Systems are the next step in the evolution of Machine Learning infrastructure. With such systems, organizations can marry the flexibility of low-level APIs with the simplicity of AutoML. Companies adopting such systems can increase the speed of machine learning development, reaching the quality and scalability that only big tech companies could achieve until now, without the need for a team of several thousand people. Predibase is the turnkey solution for adopting declarative ML systems at an enterprise scale.

// Bio
Piero Molino is CEO and co-founder of Predibase, a company redefining ML tooling. Most recently, he has been a Staff Research Scientist at Stanford University working on machine learning systems and algorithms in Prof. Chris Ré's Hazy group. Piero completed a Ph.D. in Question Answering at the University of Bari, Italy, and founded QuestionCube, a startup that built a framework for semantic search and QA. He worked for Yahoo Labs in Barcelona on learning to rank and for IBM Watson in New York on natural language processing with deep learning, and then joined Geometric Intelligence, where he worked on grounded language understanding. After Uber acquired Geometric Intelligence, Piero became one of the founding members of Uber AI Labs. At Uber, he worked on research topics including Dialogue Systems, Language Generation, Graph Representation Learning, Computer Vision, Reinforcement Learning, and Meta-Learning. He also worked on several deployed systems such as COTA, an ML and NLP model for Customer Support; Dialogue Systems for drivers' hands-free dispatch; the Uber Eats Recommender System with graph learning; and collusion detection. He is the author of Ludwig, a Linux-Foundation-backed open source declarative deep learning framework.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs
MLOps Swag/Merch
https://www.printful.com/

// Related Links
Website: http://w4nderlu.st
http://ludwig.ai
https://medium.com/ludwig-ai
Declarative Machine Learning Systems paper by Piero Molino and Christopher Ré: https://cacm.acm.org/magazines/2022/1/257445-declarative-machine-learning-systems/fulltext
A Slip of the Keyboard by Sir Terry Pratchett: https://www.terrypratchettbooks.com/books/a-slip-of-the-keyboard/
The Listening Society book series by Hanzi Freinacht: https://www.amazon.com/Listening-Society-Metamodern-Politics-Guides-ebook/dp/B074MKQ4LR

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Piero on LinkedIn: https://www.linkedin.com/in/pieromolino/?locale=en_US
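The core idea of the declarative ML systems discussed in this episode is that the user states what a model should consume and produce, and the system chooses concrete components. A toy sketch of that separation in plain Python; the config schema and the component names below are invented for illustration and are not Ludwig's actual configuration format:

```python
# Toy illustration of declarative ML: a spec names inputs and outputs,
# and the system maps declared feature types to concrete components.
config = {
    "input_features": [{"name": "review", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

# Illustrative type-to-component tables a declarative system might maintain.
ENCODERS = {"text": "embed+rnn", "image": "cnn", "number": "passthrough"}
DECODERS = {"category": "softmax_classifier", "number": "regressor"}

def build_pipeline(config):
    """Translate a declarative spec into a concrete component plan."""
    plan = {"encoders": {}, "decoders": {}}
    for feat in config["input_features"]:
        plan["encoders"][feat["name"]] = ENCODERS[feat["type"]]
    for feat in config["output_features"]:
        plan["decoders"][feat["name"]] = DECODERS[feat["type"]]
    return plan

print(build_pipeline(config))
```

In a real system like Ludwig the "plan" would be actual neural network modules with sensible defaults, and every choice would remain overridable in the config, which is how the flexibility/simplicity trade-off described in the abstract is resolved.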