In this episode, Mark speaks with Cate Lochead. Cate is CMO of Snorkel AI, which enables enterprises to develop AI that works for their unique workloads. Cate is a master of marketing and building world-class teams. With over 25 years of experience building high-growth organizations, she is now a leader in the AI space. In this episode we discuss how Cate got started in tech, her background, marketing, and AI. Cate also shares how leveraging events can elevate your brand. Cate Lochead: https://www.linkedin.com/in/clochead/ https://snorkel.ai/ Mark Testa: https://www.markstephenagency.com info@markstephenagency.com https://www.linkedin.com/company/mark-stephen-design-&-production/ https://www.instagram.com/markstephenea/ https://www.youtube.com/channel/UCK13o22i4RxQvbAgwwlh9tQ?view_as=subscriber Thanks for tuning in. Check us out at https://www.instagram.com/markstephenea/
Throughout the fourth season of Theory and Practice, we explored emerging human-like artificial intelligence and robots. We asked if we could learn as much about ourselves as we do about the machines we use. The series has covered safety guardrails for AI, empathic AI communication, communication between minds and machines, robotic surgery, computers that smell, and using AI to understand human vision. The most recent episode with Google DeepMind's Dr. Clément Farabet illuminates how computers might demonstrate understanding and reasoning on par with humans. In the final episode, we reflect on investing in artificial intelligence's future with the leader of GV's Digital Investing Team, Dave Munichiello, who has a long-standing history with AI and robotics. Dave was an early technologist at Kiva Systems, which was purchased by Amazon and ultimately became Amazon Robotics. Over the past decade-plus at GV, Dave has led investments across two major categories: Platforms Empowering Developers (GitLab, Segment, Slack, RedPanda, etc.) and Platforms Powering AI Systems (Determined, Modular, SambaNova, Snorkel AI, etc.), along with others. Dave's first AI investment, Lattice (bought by Apple's Siri team), came seven years before the hype of generative AI. We asked: from a seasoned AI investor's perspective, where does AI hold the most promise? To answer this, Dave returns to the themes we've investigated over the last eight weeks, including AI trust and safety, which Google Health's Greg Corrado raised in the first episode. Together, we explore how AI will change how we work, the nature of jobs, and how an investing team with a culture focused on having more questions than answers is well positioned for AI's future. Dave rounds out the discussion with a picture of how artificial intelligence, with real-life use cases, will move research lab theory to real-world practice.
He also walks us through his hopes for AI, including a world where humans and computers exist as co-pilots. Ultimately, Dave shares an optimistic and rational view of AI's future. "AI has the potential to democratize the very creation of technology," he reflects. "With AI assistance, folks across the country will no longer need to rely on software programmers to solve everyday digital problems – they'll be able to create these tools themselves. That is incredibly exciting, and I'm honored to be a part of that journey."
Alex Ratner is the CEO of Snorkel AI, a platform that provides programmatic data labeling and foundation models to enable companies to build AI applications. They've raised $135M so far from amazing investors such as Addition, Greylock, Google Ventures, and Lightspeed. He was previously the cofounder and CEO of SiftPage. He has a bachelor's degree from Harvard and a PhD from Stanford. In this episode, we cover a range of topics including: - Making AI data development first-class and programmatic - The data-centric step for every model-centric step - The false dichotomy of fine-tuning vs. RAG - Foundation model dynamics: winner-take-all vs. diverse models - Training compute-optimal LLMs - Designing multimodal datasets (DataComp) - Distilling Step-by-Step - A 'GPT-You' for every enterprise. Alex's favorite books: the Foundation series (author: Isaac Asimov) -------- Where to find Prateek Joshi: Newsletter: https://prateekjoshi.substack.com Website: https://prateekj.com LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 Twitter: https://twitter.com/prateekvjoshi
MLOps Coffee Sessions #176 with the MLOps vs. LLMOps panel: Willem Pienaar, Chris Van Pelt, Aparna Dhinakaran, and Alex Ratner, hosted by Richa Sachdev. // Abstract What do MLOps and LLMOps have in common? What has changed? Are these just new buzzwords, or is there validity in calling this ops something new? // Bio Richa Sachdev A passionate and impact-driven leader whose expertise spans leading teams, architecting ML and data-intensive applications, and driving enterprise data strategy. Richa has worked for a tier-A start-up developing feature platforms and in financial companies, leading ML engineering teams to drive data-driven business decisions. Richa enjoys reading technical blogs focused on system design and plays an active role in the MLOps Community. Willem Pienaar Willem is the creator of Feast, the open-source feature store, and a builder in the generative AI space. Previously, Willem was an engineering manager at Tecton, where he led teams in both their open-source and enterprise initiatives. Before that, Willem built the core ML systems and created the ML platform team at Gojek, the Indonesian decacorn. Chris Van Pelt Chris Van Pelt is a co-founder of Weights & Biases, a developer MLOps platform. In 2009, Chris founded Figure Eight/CrowdFlower. Over the past 12 years, Chris has dedicated his career to optimizing ML workflows and teaching ML practitioners, making machine learning more accessible to all. Chris has worked as a studio artist, computer scientist, and web engineer. He studied both art and computer science at Hope College. Aparna Dhinakaran Aparna Dhinakaran is the Co-Founder and Chief Product Officer at Arize AI, a pioneer and early leader in machine learning (ML) observability. A frequent speaker at top conferences and thought leader in the space, Dhinakaran was recently named to the Forbes 30 Under 30. Before Arize, Dhinakaran was an ML engineer and leader at Uber, Apple, and TubeMogul (acquired by Adobe).
During her time at Uber, she built several core ML Infrastructure platforms, including Michelangelo. She has a bachelor's from UC Berkeley's Electrical Engineering and Computer Science program, where she published research with Berkeley's AI Research group. She is on a leave of absence from the Computer Vision Ph.D. program at Cornell University. Alex Ratner Alex Ratner is the co-founder and CEO at Snorkel AI, and an Affiliate Assistant Professor of Computer Science at the University of Washington. Prior to Snorkel AI and UW, he completed his Ph.D. in CS advised by Christopher Ré at Stanford, where he started and led the Snorkel open source project, and where his research focused on defining and advancing the concept of "data-centric AI", the idea that labeling and developing data is the new center of the AI development workflow. His academic work focuses on data-centric AI and related topics in data management and statistical learning techniques, and applications to real-world problems in medicine, science, and more. Previously, he earned his A.B. in Physics from Harvard University. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Richa on LinkedIn: https://www.linkedin.com/in/richasachdev/ Connect with Willem on LinkedIn: https://www.linkedin.com/in/willempienaar/ Connect with Chris on LinkedIn: https://www.linkedin.com/in/chrisvanpelt/ Connect with Aparna on LinkedIn: https://www.linkedin.com/in/aparnadhinakaran/ Connect with Alex on LinkedIn: https://www.linkedin.com/in/alexander-ratner-038ba239/
Jacob sits down with Alex to discuss how Snorkel grew from an open-source project in a Stanford AI lab to a $1B company. Alex shares his thoughts on why data development is at the heart of AI development, why enterprises are slow to deploy LLM applications, and the importance of academia in the future of AI development. 00:00 intro 01:03 moving from academia to Snorkel 05:08 the evolution of Snorkel 18:33 improving pre-training 21:37 avoiding hallucinations and other errors 33:00 barriers to enterprises deploying AI 36:59 the Snorkel footprint of the future 39:37 the role of academia in AI development 42:57 over-hyped/under-hyped 44:50 how should AI regulation change going forward? With your co-hosts: @jasoncwarner - Former CTO GitHub, VP Eng Heroku & Canonical @ericabrescia - Former COO Github, Founder Bitnami (acq'd by VMWare) @patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn @jacobeffron - Partner at Redpoint, Former PM Flatiron Health @jordan_segall - Partner at Redpoint
Today's guest, Devang Sachdev, is the Vice President of Marketing at Snorkel AI. Devang started his career engineering hardware and managing products before transitioning and developing his expertise in strategic marketing planning, new product launches, and identifying emerging opportunities. Devang shares how he uses generative AI models to conduct market research and keep a consistent tone. The post "Enhancing your marketing with Chat GPT" appeared first on WebMechanix.
MLOps Coffee Sessions #139 with Alex Ratner, Putting Foundation Models to Use for the Enterprise, co-hosted by Abi Aryan and sponsored by Snorkel AI. // Abstract Foundation models are rightfully being compared to other game-changing industrial advances like steam engines or electric motors. They're core to the transition of AI from a bespoke, less predictable science to an industrialized, democratized practice. Before they can achieve this impact, however, we need to bridge the cost, quality, and control gaps. Snorkel Flow Foundation Model Suite is the fastest way for AI/ML teams to put foundation models to use. For some projects, this means fine-tuning a foundation model for production dramatically faster by programmatically labeling training data. For others, the optimal solution will be using Snorkel Flow's distill, combine, and correct approach to extract the most relevant knowledge from foundation models and encode that value into right-sized models for your use case. AI/ML teams can determine which Foundation Model Suite capabilities to use (and in what combination) to optimize for cost, quality, and control using Snorkel Flow's integrated workflow for programmatic labeling, model training, and rapid, guided iteration. // Bio Alex Ratner is the Co-founder and CEO of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. Prior to Snorkel AI and UW, he completed his Ph.D. in CS advised by Christopher Ré at Stanford, where he started and led the Snorkel open-source project, and where his research focused on applying data management and statistical learning techniques to emerging machine learning workflows such as creating and managing training data and applying this to real-world problems in medicine, knowledge base construction, and more. Previously, he earned his A.B. in Physics from Harvard University.
// MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: www.snorkel.ai Huge "foundation models" are turbo-charging AI progress: https://www.economist.com/interactive/briefing/2022/06/11/huge-foundation-models-are-turbo-charging-ai-progress Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming: https://arxiv.org/abs/2203.01382 The Principles of Data-Centric AI Development: https://snorkel.ai/principles-of-data-centric-ai-development/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Abi on LinkedIn: https://www.linkedin.com/in/abiaryan/ Connect with Alex on LinkedIn: https://www.linkedin.com/in/alexander-ratner-038ba239/ Timestamps: [00:00] Alex's preferred coffee [01:20] Introduction to Alex Ratner [02:34] Takeaways [04:04] Huge shoutout to our Sponsor, Snorkel AI! [04:39] Comment, rate us, and share our podcasts with your friends! [04:50] Transfer Learning / Active Learning [11:30] Labeling Heuristics paper on Nemo [18:14] Data-Centric AI [21:48] Enterprise use cases for Foundation Models [32:45] Foundation Models in the different Google products [38:36] Progress in Foundation Models [43:55] AutoML Models Baseline Accuracy [44:40] Hosting Infrastructure: Snorkel Flow vs GCP [46:53] Chris Ré's venture capital firm / incubator / machine [51:00] Wrap
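The "distill" step described in the abstract above, where knowledge is extracted from a large foundation model and encoded into a right-sized model, can be sketched roughly as follows. This is only an illustrative toy, not Snorkel Flow's implementation: `teacher_predict` is a stand-in stub rather than a real foundation-model call, and the "student" is a naive keyword counter.

```python
# Rough sketch of distilling a foundation model's knowledge into a small,
# right-sized model: query the (expensive) teacher for pseudo-labels on
# unlabeled text, then fit a cheap student model on those pseudo-labels.
# `teacher_predict` is a hypothetical stub, not a real foundation-model API.

from collections import Counter

def teacher_predict(text):
    """Stub standing in for an expensive foundation-model call."""
    return "positive" if "great" in text or "love" in text else "negative"

unlabeled = [
    "great product, love it",
    "love the new release",
    "terrible experience",
    "would not recommend",
]

# 1) Distill: collect the teacher's pseudo-labels on unlabeled data.
pseudo_labeled = [(t, teacher_predict(t)) for t in unlabeled]

# 2) Train a right-sized student: per-word class counts (a naive keyword model).
word_class_counts = Counter()
for text, label in pseudo_labeled:
    for word in text.split():
        word_class_counts[(word, label)] += 1

def student_predict(text):
    """Cheap student: score each class by matched keyword counts."""
    scores = {"positive": 0, "negative": 0}
    for word in text.split():
        for label in scores:
            scores[label] += word_class_counts[(word, label)]
    return max(scores, key=scores.get)

print(student_predict("love this"))      # positive
print(student_predict("terrible idea"))  # negative
```

Once trained, the student can serve predictions without any further calls to the teacher, which is the cost and control benefit the episode describes.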
MLOps Coffee Sessions #133 {Podcast BTS} with Chip Huyen, Real-time Machine Learning with Chip Huyen, co-hosted by Vishnu Rachakonda. // Abstract Forcing functions, and how you can supercharge your learning by putting yourself into a situation where you either have a responsibility to others to learn or accountability on you, so that you have to learn. Streaming machine learning is not that hard when you think about it; it's not that big of a mental barrier to cross. It is simple in theory, but maybe more complicated in practice, and that is exactly where Chip's perspective comes in. // Bio Chip Huyen is a co-founder of Claypot AI, a platform for real-time machine learning. Previously, she was with Snorkel AI and NVIDIA. She teaches CS 329S: Machine Learning Systems Design at Stanford. She's the author of the book Designing Machine Learning Systems, an Amazon bestseller in AI. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Landing page: https://claypot.ai Designing Machine Learning Systems book: https://www.amazon.com/Designing-Machine-Learning-Systems-Production-Ready/dp/1098107969 --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Chip on LinkedIn: https://www.linkedin.com/in/chiphuyen/
Greylock general partner Saam Motamedi talks with Snorkel AI CEO and co-founder Alex Ratner. As more enterprise organizations have recognized the utility of artificial intelligence technology, there's been a major push to invest in and adopt new AI and ML infrastructure to drive insights and make predictions for businesses. However, many of these solutions lack the mechanism to unlock and operationalize the data needed to train and deploy models for high-quality AI projects. That pain point spawned the creation of Snorkel AI, which has developed an end-to-end data-centric machine learning platform for the enterprise. Putting the capabilities to build impactful AI in the hands of more people has been Snorkel's goal since its inception. The company spun out of Stanford's AI Lab in 2019, has been partnered with Greylock since 2020, and just released a new set of tools that enables enterprise organizations to put foundation models to use. You can read a transcript of this conversation here: https://greylock.com/greymatter/jumpstarting-data-centric-ai/
Summary Machine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities that don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the "cold start" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Your host is Tobias Macey and today I’m interviewing Christopher Nguyen about how to address the cold start problem for ML/AI projects Interview Introduction How did you get involved in machine learning? Can you describe what the "cold start" or "small data" problem is and its impact on an organization’s ability to invest in machine learning? What are some examples of use cases where ML is a viable solution but there is a corresponding lack of usable data? How does the model design influence the data requirements to build it? (e.g. 
statistical model vs. deep learning, etc.) What are the available options for addressing a lack of data for ML? What are the characteristics of a given data set that make it suitable for ML use cases? Can you describe what you are building at Aitomatic and how it helps to address the cold start problem? How have the design and goals of the product changed since you first started working on it? What are some of the education challenges that you face when working with organizations to help them understand how to think about ML/AI investment and practical limitations? What are the most interesting, innovative, or unexpected ways that you have seen Aitomatic/H1st used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Aitomatic/H1st? When is a human/knowledge driven approach to ML development the wrong choice? What do you have planned for the future of Aitomatic? Contact Info LinkedIn @pentagoniac on Twitter Google Scholar Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Aitomatic Human First AI Knowledge First World Symposium Atari 800 Cold start problem Scale AI Snorkel AI Podcast Episode Anomaly Detection Expert Systems ICML == International Conference on Machine Learning NIST == National Institute of Standards and Technology Multi-modal Model SVM == Support Vector Machine Tensorflow Pytorch Podcast.__init__ Episode OSS Capital DALL-E The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
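The "knowledge-first" idea discussed in this episode, where human expertise expressed as executable rules stands in for training data during the cold start, can be illustrated with a minimal sketch. All rule names, field names, and thresholds below are hypothetical examples, not Aitomatic's actual API:

```python
# Minimal sketch of the "cold start" pattern: encode expert knowledge as
# executable rules, run those rules as a first model, and log their outputs
# as weak labels to bootstrap a learned model later.
# Every threshold and field name here is a hypothetical illustration.

def expert_rule_overheat(reading):
    """Domain expert: sustained temperatures above 90 C indicate a fault."""
    return reading["temp_c"] > 90

def expert_rule_vibration(reading):
    """Domain expert: vibration above 7 mm/s with elevated temperature is abnormal."""
    return reading["vibration_mm_s"] > 7 and reading["temp_c"] > 70

RULES = [expert_rule_overheat, expert_rule_vibration]

def knowledge_first_model(reading):
    """Flag an anomaly if any expert rule fires -- no training data required."""
    return any(rule(reading) for rule in RULES)

# The rule model works on day one; its decisions can later serve as weak
# labels to train a statistical model once enough data accumulates.
readings = [
    {"temp_c": 95, "vibration_mm_s": 3},  # overheat rule fires
    {"temp_c": 72, "vibration_mm_s": 8},  # vibration rule fires
    {"temp_c": 60, "vibration_mm_s": 2},  # normal
]
weak_labels = [knowledge_first_model(r) for r in readings]
print(weak_labels)  # [True, True, False]
```

The design point is that the rules themselves are the initial "model", so an organization gets working automation, and a stream of labeled data, before it has any training set at all.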
Braden Hancock, Co-founder and Head of Technology at Snorkel AI, joins me to talk about his path from academia to start-up co-founder and his vision to make AI more accessible to both traditional and no-code development. In this episode, Braden and I explore the journey he and his co-founders took to go from having an interesting idea to forming a company and the strategic business decisions they made along the way, such as why they opted not to use an open-source business model and the educational marketing strategy they've adopted. Highlights: Braden discusses his role as co-founder of Snorkel AI. (00:25) An introduction to Snorkel Flow, Snorkel AI's data-centric AI development platform, and the challenges they solve for. (01:49) Snorkel AI's relationship with open source. (06:30) Why Snorkel AI decided not to use an open-source business model in order to lower the barrier to entry. (09:01) Snorkel AI's trajectory coming from academia to the world of start-ups. (12:50) The unexpected challenges of building Snorkel AI. (17:50) Taking an educational approach to the marketing at Snorkel AI. (22:27) Braden discusses the meaningful applications of AI as well as where he sees AI being used as more of a buzzword. (27:27) Links: Braden LinkedIn: https://www.linkedin.com/in/bradenhancock/ Twitter: https://twitter.com/bradenjhancock Company: snorkel.ai Snorkel AI Twitter: https://twitter.com/SnorkelAI
Summary Machine learning is a data-hungry activity, and the quality of the resulting model is highly dependent on the quality of the inputs that it receives. Generating sufficient quantities of high-quality labeled data is an expensive and time-consuming process. In order to reduce that time and cost, Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic that dramatically reduces the time needed to build training data sets and drives down the total cost. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started! Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. 
Galileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today! Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Your host is Tobias Macey and today I’m interviewing Alex Ratner about Snorkel AI, a platform for data-centric machine learning workflows powered by programmatic data labeling techniques Interview Introduction How did you get involved in machine learning? Can you describe what Snorkel AI is and the story behind it? What are the problems that you are focused on solving? Which pieces of the ML lifecycle are you focused on? How did your experience building the open source Snorkel project and working with the community inform your product direction for Snorkel AI? How has the underlying Snorkel project evolved over the past 4 years? What are the deciding factors that an organization or ML team need to consider when evaluating existing labeling strategies against the programmatic approach that you provide? What are the features that Snorkel provides over and above managing code execution across the source data set? Can you describe what you have built at Snorkel AI and how it is implemented? What are some of the notable developments of the ML ecosystem that had a meaningful impact on your overall product vision/viability? 
Can you describe the workflow for an individual or team who is using Snorkel for generating their training data set? How does Snorkel integrate with the experimentation process to track how changes to labeling logic correlate with the performance of the resulting model? What are some of the complexities involved in designing and testing the labeling logic? How do you handle complex data formats such as audio, video, images, etc. that might require their own ML models to generate labels? (e.g. object detection for bounding boxes) With the increased scale and quality of labeled data that Snorkel AI offers, how does that impact the viability of autoML toolchains for generating useful models? How are you managing the governance and feature boundaries between the open source Snorkel project and the business that you have built around it? What are the most interesting, innovative, or unexpected ways that you have seen Snorkel AI used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Snorkel AI? When is Snorkel AI the wrong choice? What do you have planned for the future of Snorkel AI? Contact Info LinkedIn Website @ajratner on Twitter Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Snorkel AI Data Engineering Podcast Episode University of Washington Snorkel OSS Natural Language Processing (NLP) Tensorflow PyTorch Podcast.__init__ Episode Deep Learning Foundation Models MLFlow SHAP Podcast.__init__ Episode The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
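The programmatic-labeling workflow described in this episode, where domain experts write labeling functions whose noisy votes are aggregated into training labels, can be sketched in plain Python. This is a toy illustration with made-up heuristics, using a simple majority vote in place of the learned label model that Snorkel actually uses:

```python
# Toy sketch of programmatic labeling: each labeling function (LF) encodes
# one expert heuristic and may abstain; a majority vote over the non-abstaining
# votes produces a training label. (Snorkel proper replaces the majority vote
# with a learned label model; this shows only the core idea.)

SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_link(text):
    # Heuristic: messages with links are probably spam.
    return SPAM if "http" in text else ABSTAIN

def lf_all_caps(text):
    # Heuristic: all-caps messages are probably spam.
    words = text.split()
    return SPAM if words and all(w.isupper() for w in words) else ABSTAIN

def lf_short_greeting(text):
    # Heuristic: friendly greetings are probably legitimate.
    return HAM if text.lower().startswith(("hi", "hello", "thanks")) else ABSTAIN

LFS = [lf_contains_link, lf_all_caps, lf_short_greeting]

def majority_label(text):
    """Aggregate LF votes, ignoring abstentions; ties or no votes -> ABSTAIN."""
    votes = [lf(text) for lf in LFS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    spam, ham = votes.count(SPAM), votes.count(HAM)
    if spam == ham:
        return ABSTAIN
    return SPAM if spam > ham else HAM

docs = ["CLICK NOW http://win.example", "hello, see you tomorrow", "meeting notes attached"]
labels = [majority_label(d) for d in docs]
print(labels)  # [1, 0, -1]
```

Because each labeling function is ordinary code, the "reusable logic" the summary mentions can be versioned, tested, and rerun over new data at essentially zero marginal labeling cost.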
In this episode, Dr Bahijja Raimi-Abraham discusses artificial intelligence (AI) and healthcare with Brandon Yang, Machine Learning Engineer at Snorkel AI. Additional information: Snorkel Open Source - https://www.snorkel.org/ Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Thank you for listening! If you liked the episode, please give us a five-star rating and review. Buy a Coffee for Monday Science. Subscribe, follow, comment, leave a review, and get in touch! Submit your questions or send your voice note questions (up to 30 seconds) here. See acast.com/privacy for privacy and opt-out information.
In episode 32 of The Gradient Podcast, Andrey Kurenkov speaks to Chip Huyen. Chip Huyen is a co-founder of Claypot AI, a platform for real-time machine learning. Previously, she was with Snorkel AI and NVIDIA. She teaches CS 329S: Machine Learning Systems Design at Stanford. She has also written four bestselling Vietnamese books, and more recently her new O'Reilly book Designing Machine Learning Systems has just come out! Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter. She also maintains a Discord server with a focus on machine learning systems. Outline: (00:00) Intro (01:30) 3-year trip through Asia, Africa, and South America (04:00) Getting into AI at Stanford (11:30) Confession of a so-called AI expert (16:40) Academia vs Industry (17:40) Focus on ML Systems (20:00) ML in Academia vs Industry (28:15) Maturity of AI in Industry (31:45) ML Tools (37:20) Real Time ML (43:00) ML Systems Class and Book Links: Chip's website, MLOps Discord server, Confession of a so-called AI expert, What I learned from looking at 200 machine learning tools, CS 329S: Machine Learning Systems Design, Designing Machine Learning Systems. Get full access to The Gradient at thegradientpub.substack.com/subscribe
In this episode of Founded and Funded, we spotlight IIA40 winner Snorkel AI. Managing Director Tim Porter not only talks with Snorkel Co-founder and CEO Alex Ratner all about data-centric AI and programmatic data labeling and development, but they also dive into the importance of culture — especially now — and how to take advantage of what Alex calls "one of the most historic opportunities for growth in AI."
This episode's topic may be a bit hardcore: we connect with frontline Silicon Valley tech companies to talk about the machine learning tool stack, ML infra (more precisely, MLOps). This is a replay of last month's livestream from Miss M's WeChat official account, on the theme "Where is the next $10B infra battleground? The new era of MLOps in the eyes of top open-source companies." Hello World, who is OnBoard?! Welcome to OnBoard!: real frontline experience and thoughtful investment reflections, talking about how software changes the world. If the past five years were the golden age of enterprise software/SaaS (especially in the US), then 2021 was without question a breakout year for infrastructure software (mainly PaaS). But in 2022, with market sentiment turning sharply downward and data infra becoming intensely crowded, a new infra battleground is already emerging. That is the topic of this episode: MLOps (DevOps tooling for machine learning). A widely accepted definition of MLOps (from NVIDIA) is: MLOps is an end-to-end set of processes and the corresponding toolchain combining ML (machine learning), application development, and IT infrastructure. It spans the entire prepare-develop-deploy lifecycle of ML development: data collection, model development, model training, experiment management, CI/CD, production deployment, monitoring, and more. But to understand how this frontier trend is actually playing out in Silicon Valley, the context behind this inflection point, what opportunities remain, and how Chinese founders and startups can join this wave or even leapfrog it, Monica, as is her habit, of course invited the most frontline practitioners in Silicon Valley to chat! Our guests' experience spans large companies such as Apple and Databricks as well as startups such as Snorkel AI and BentoML, from founders to product managers to developers, so their perspectives are well rounded, and the hour-plus conversation is packed with substance. The livestream was very well received and many people asked for a replay, so we are publishing it on the podcast for everyone to revisit. Because of connection issues with the Tencent Channels livestream, the audio quality in the later part may suffer; please bear with us. Also, our guests work in English-speaking environments day to day, so some English (especially technical terms) is unavoidable; we have included a glossary in the show notes. We really did our best! For more background and guest introductions, see our in-depth announcement post! Livestream guests: Yifan Cao, PM @Cruise ML platform, ex-PM @Databricks, ex-PM @Apple ML; Quinn, BizOps @Snorkel AI, ex-PM @Moveworks; Chaoyu Yang, Co-founder & CEO @BentoML (github), ex-software engineer @Databricks. What we discussed: 03:32 Introductions and fun facts from Monica and the guests: the MLOps startups we are watching 14:02 How should we understand MLOps? Why should companies care about it? 23:40 Why is MLOps getting attention now, and what are the main drivers? 37:01 Who are the early adopters of new MLOps products? 47:38 Having moved from the vendor side (Databricks) to the buyer side (Apple, Cruise), what new thoughts are there on how companies should choose MLOps products? 53:31 Why did BentoML choose model serving as its entry point? 63:52 How does a new MLOps company get adopted inside an enterprise? What does the decision chain look like? 68:09 The billion-dollar question: should MLOps tools be point solutions or platforms? How will the space consolidate? 77:54 How should open-source MLOps companies design their commercialization path? 85:22 Yifan's take on point solutions vs. platforms and open-source commercialization (the connection dropped in the middle, so we recorded this part afterward...) 90:29 In the eyes of these frontline practitioners, what challenges remain in MLOps, and what are the most exciting opportunities? 97:23 How can early-stage startups in a new field recruit excellent talent? Companies we mentioned: Yifan: Outerbounds, the commercial company behind Metaflow, Netflix's open-source project; Chaoyu: Prefect.io, workflow orchestration (not actually MLOps-specific; its use cases are closer to Airflow's); Quinn: Arthur.ai, ML model monitoring; AWS SageMaker; Fiddler AI: an ML monitoring and explainability platform; Arize AI: ML infra observability and monitoring tools. Recommended reading: a substantive livestream introduction, "Where is the next $10B infra battleground," and a Chinese-language MLOps primer. Follow Miss M's WeChat official account for more US-China conversations!
M小姐研习录 (WeChat ID: MissMStudy). Your likes, comments, and shares are the best encouragement we could ask for! Please share this with friends interested in the topic. If there are topics you would like us to cover, or guests you would like us to invite, let us know in the comments!
Your product's story is really the story of your customers. Listen to Devang's straightforward framework for successful product storytelling.
Mentioned in this episode:
Sign up for OpenView's weekly newsletter
Devang Sachdev, Vice President of Marketing at Snorkel AI
Snorkel AI
Follow Blake Bartlett on LinkedIn.
Podcast produced by OpenView.
View our blog for more context/inspiration.
OpenView on LinkedIn
OpenView on Twitter
OpenView on Instagram
OpenView on Facebook
Everyone wants to be a platform, but be careful what you wish for. Platform go-to-market is radically different from application go-to-market. It requires a whole different framework and a special focus on solutions. Devang describes the platform GTM frameworks he developed at Twilio and Snorkel AI.
Mentioned in this episode:
Sign up for OpenView's weekly newsletter
Devang Sachdev, Vice President of Marketing at Snorkel AI
Snorkel AI
Follow Blake Bartlett on LinkedIn.
Podcast produced by OpenView.
View our blog for more context/inspiration.
OpenView on LinkedIn
OpenView on Twitter
OpenView on Instagram
OpenView on Facebook
Timestamps
(02:00) Aarti shared her upbringing growing up in India and moving to New York for undergraduate studies.
(04:47) Aarti recalled her academic experience earning dual degrees in Computer Science and Computer Engineering at New York University.
(07:17) Aarti shared details about her involvement with the ACM chapter and the Women in Computing club at NYU.
(10:46) Aarti shared valuable lessons from her research internships.
(14:16) Aarti discussed her decision to pursue an MS degree in Computer Science at Stanford University.
(20:27) Aarti reflected on her learnings as the Head Teaching Assistant for CS 230, one of Stanford's most popular Deep Learning courses.
(23:59) Aarti shared her thoughts on ML applications in both clinical and administrative healthcare settings.
(26:47) Aarti unpacked the motivation and empirical work behind CheXNet, an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists.
(29:39) Aarti went over the implications of MURA, a large dataset of musculoskeletal radiographs containing over 40,000 images from close to 15,000 studies, for ML applications in radiology.
(32:50) Aarti went over her experience working briefly as an ML engineer at Andrew Ng's startup Landing AI, applying ML to visual inspection tasks in manufacturing.
(36:56) Aarti talked about her participation in external entrepreneurial initiatives such as the Threshold Venture Fellowship and the Greylock X Fellowship.
(43:41) Aarti reminisced about her time in a hybrid ML engineer/product manager/VC associate role at AI Fund, which works intensively with entrepreneurs during their startups' most critical and risky phase from 0 to 1.
(48:43) Aarti shared advice that AI Fund companies tended to receive regarding product-market fit and go-to-market strategy.
(54:04) Aarti walked through her decision to join Snorkel AI, the startup behind the popular Snorkel open-source project, which can quickly generate training data with weak supervision.
(56:36) Aarti reflected on the difference between being an ML researcher and an ML engineer.
(01:00:18) Closing segment.

Aarti's Contact Info: LinkedIn | Twitter | Google Scholar
People: Andrew Ng, John Langford, David Sontag
Books and Papers:
"The Art of Doing Science & Engineering" (by Richard Hamming)
"Deep Medicine: How AI Can Make Healthcare Human Again" (by Eric Topol)
"CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning" (Dec 2017)
"MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs" (May 2018)

About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models ("the WHY and the HOW") behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:
Listen on Spotify
Listen on Apple Podcasts
Listen on Google Podcasts
If you're new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.
Today's guest is Aarti Bagul, Machine Learning Engineer at Snorkel AI in San Francisco, CA. Founded in 2019, Snorkel AI is a technology startup that empowers data scientists and developers to quickly turn data into accurate and adaptable AI applications with Snorkel Flow, a first-of-its-kind data-centric development platform powered by programmatic labeling. Snorkel Flow reduces the time, cost, and friction of labeling training data so data science and development teams can more easily build and scale AI models to deploy more meaningful applications. Incorporating human judgment into the AI process through subject-matter experts becomes more efficient and scalable, leading to more ethical, responsible outcomes. Two of the top three US banks, several government agencies, and Fortune 500 companies use Snorkel Flow. Snorkel's core research was developed at the Stanford AI Lab and is deployed at Google, Intel, Apple, IBM, DARPA, and other trailblazing organizations. In the episode, Aarti will discuss: the interesting work they do at Snorkel AI, the problems they are solving in unlocking training data, her role and the interesting projects the team is working on, transitioning from a research-focused role into the startup world, and why Snorkel AI is a great place to work.
Why and where do companies fail at productionizing ML models? Watch the full podcast with Aarti here: https://youtu.be/VWJXiszQpTU
Aarti is a machine learning engineer at Snorkel AI. Prior to that, she worked closely with Andrew Ng in various capacities. She graduated with a master's in CS from Stanford and a bachelor's in CS and Computer Engineering from New York University, and worked at Microsoft Research as a research intern for John Langford, where she contributed to Vowpal Wabbit, an open-source project.
About the Host: Jay is a Ph.D. student at Arizona State University, doing research on building interpretable AI models for medical diagnosis.
Jay Shah: https://www.linkedin.com/in/shahjay22/
You can reach out at https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars!
***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any institution or its affiliates of such video content.***
Source: https://www.thecloudcast.net/2021/06/automated-data-labeling-for-ai-apps.html
See also: https://softwareengineeringdaily.com/2020/04/09/snorkel-training-dataset-management-with-braden-hancock/
Software 2.0 is Andrej Karpathy's idea that instead of coding business logic by hand, the applications of the future will be trained by data — in other words, machine learning. But ML is limited by the quality of data available, and there is a lot of unstructured, unlabeled data out there that is still being manually labeled today. Scale AI is a well-known startup that has done very well offering a scalable manual-labeling workforce; however, they are still bottlenecked by the number of subject matter experts available for labeling critically important data, like cancer diagnoses and drug-trafficking rings. In order to get labels from subject matter experts, you typically have to put them through a very tedious labeling process to build up a useful structured dataset upfront, before any useful machine learning can be done. I did some very minor ML work about five years ago and found Christopher Ré's work on DeepDive at Stanford. It takes a revolutionary approach by making it easy to write the labeling functions themselves. This turns the labeling process into an iterative, REPL-like experience where subject matter experts can suggest a function, see its impact right away, and continue refining it, assisted by AI. This work is now commercialized in a startup called Snorkel AI, so I was very excited to find a clear explanation of Snorkel Flow from its CEO, Alex Ratner. Here it is!
Transcript
[00:01:15] Alex Ratner: Snorkel Flow is a platform that's meant to take this process of building machine learning models and AI applications — starting with building the data that they rely on and that fuels them — and make it, in a nutshell, look more like an iterative software development process.
Rather than, you know, this kind of 80, 90% upfront hand-labeling exercise.
[00:01:34] And so Snorkel Flow supports that entire iterative loop of actually labeling data. It can be done by hand in the platform, but also, most centrally, programmatically, by letting users write what we call labeling functions. The basic idea is that rather than, say, asking your legal associate at a bank, or your doctor friends, to sit down and label a hundred thousand contracts or a hundred thousand electronic health records, you have them write
[00:02:00] heuristics — bits of their expertise: look for this keyword, or look for this pattern, et cetera. It's like a bridge from old, expert-knowledge-type input to modern machine learning models, using one to power the other. So Snorkel Flow is an IDE, basically, and it has a no-code UI component as well, that lets people — either via code or by pushing buttons, even for non-developer subject matter experts —
[00:02:24] programmatically label their data by writing these labeling functions. It then uses a bunch of modeling techniques — a lot of which was actually the work that the co-founding team and I did in our thesis work — around how you take a bunch of programmatic labels and clean them up and turn them into a final
[00:02:41] set of clean training data for machine learning models. And then, actually in Snorkel Flow, you can basically push-button train best-in-class open-source models. You can then analyze where they're succeeding or failing, and use that to go back and iterate on your data.
[00:02:54] And there's a Python SDK throughout the whole thing. So many of our customers will mix and match — they'll use Snorkel Flow to create the training data set and then train the model on some other system, et cetera. But what Snorkel Flow aims to support
is a basic iterative development process where, rather than just spending months to label a training set once and then being stuck with it — having to throw it out and start all over again any time anything in the world changes, your upstream input data changes, or your downstream objectives
[00:03:18] change — it's again more like an iterative process where you push some buttons or write some code that labels the data. You compile a model — or train it, but you can think of it like compiling — and then you go back and debug by iterating on your data. Everything in Snorkel Flow centers around looking at your data and iterating on how it's labeled to improve models.
[00:03:38] Brian Gracely: I'm curious. You mentioned there's a Python SDK, which, for anybody who works in data science and data modeling — Python is frankly sort of the language you use, how you do your programming. But I'm curious: in today's world, do data scientists consider themselves programmers? Or is it still, "Hey, look, I work on the numbers; I'm good at building models and the numbers, but I don't think of myself as a programmer"?
[00:04:08] How do you bridge those two worlds together, or do you not really have to bridge them? How much does the data scientist have to focus on numbers and models versus programming something to do stuff? What does their world look like?
[00:04:21] Alex Ratner: It's a great question. I have been, or currently am, part of four or five different data science institutes or something, and I still don't even know. I mean, data science is such a broad umbrella term — there are so many different varietals and types of us.
[00:04:35] And so I do think there's a very broad spectrum of data scientists,
from the ML engineer who just loves writing code to the one who, to your point, really just wants to push some buttons and get back to the numbers and the modeling and the outcome. And we definitely try to support that range through a layered approach.
[00:04:50] We have the Python SDK, but on top of that we have a no-code UI that allows you to write these labeling functions without writing code. So for example, if you're trying to train a contract classifier in Snorkel Flow, you can write labeling functions based on clicking on keywords or pressing buttons, with templates for the types of patterns or signals you want to look for.
[00:05:11] So we try to support, basically: if you want to move fast and you're a non-developer, or you're just not looking to spend time there, you can do it in a push-button way. But if you want to customize, or inject custom logic, or really get creative, you can always fall back to the Python SDK.
[00:05:27] And so, I mean, a lot of what we've been trying to accomplish from the very beginning, right, is raising the abstraction level at which you're interfacing with and programming your machine learning model or your AI application. And the first step is the hardest, right?
[00:05:39] If you think of the way that hand-labeled training data is, it's like the machine code — or really, actually, I think of it as like the ones and zeros, literally, for binary classification cases. A lot of the effort behind the Snorkel project and the company was just getting from that layer to the layer of, say, assembly language.
[00:05:57] But once you get there, you can build all those layers on top, and you can go up the stack and down the stack according to the application and the user type, right?
Actually, my co-founder Braden, who also did his PhD around Snorkel-related stuff, had a paper on how you could use natural language inputs.
[00:06:12] You could explain in natural language — just speaking to the computer — why a certain data point should be labeled a certain way, and then use off-the-shelf semantic parsers to parse that down to code, which then would get dumped into Snorkel. So basically, once you make this leap from labeling data by hand — kind of zeros and ones — to labeling your training data with code, then the sky's the limit in terms of building layers of abstraction on top of it.
[00:06:35] And that's actually a lot of what the company does and has been doing over the last two years: building a flexible interface through our platform, Snorkel Flow, for different data types and use-case types and user types.
[00:06:45] Brian Gracely: Yep. Well, I think you really answered my question in there.
[00:06:49] The reason I brought it up was: on one hand you have this language-level SDK in terms of Python, where you can get into some pretty granular-level stuff. And then on the other end you've got Application Studio, which, like you said, is this sort of low-code, graphical way of building templates and building applications.
[00:07:08] And I think sometimes there's just a perspective that there's one profile of a data scientist. And what you really highlighted is that, like a lot of things, there's a spectrum — those that specialize in one part of the job, and others that don't care about it and just want certain things to be easy.
[00:07:25] And that was useful, because I think sometimes in my head I'm thinking, okay, a data scientist serves a certain sort of task, the same way you might say, okay, they're a Java developer, so there's a tool set that they always use. So that was super helpful.
[00:07:39] Alex Ratner: Yeah.
And it depends on what the problem is, too. I mean, there's another thing that I think goes under-emphasized in the AI space. Point number one — and I don't think it's that avant-garde to say it today; it was maybe more so back in 2015 — is: hey, AI is about the data, not the models or the algorithms. I think few people will find that a controversial statement today,
[00:07:57] even if it's phrased in a somewhat reductive way. But the other thing that I still think is under-emphasized in practice is the necessity of looping what we often refer to as subject matter experts into the process. And I won't ramble here too long, but just for some perspective — the very first funding that the Snorkel project ever had was specifically about looping in what they call SMEs in the government: subject matter experts.
[00:08:20] Our original partners were some genomicists at Stanford. How do you loop them into the process of AI in a better way than just saying, "Hey, go label data for eight months for me, please"? And this idea of how you get subject matter expertise from a human's head into a scalable machine format has been the focus of AI for decades. But the answer of modern machine learning for the last five, ten years has been:
[00:08:44] okay, just sit them down, have them label data points one by one, nothing else. They've got all of this rich domain knowledge — a doctor, a lawyer, a cyber analyst, a network technician, an underwriter — throw that all away; just have them literally give zeros and ones, labeling data. And that's a nice abstraction.
[00:09:01] And it has actually been a very productive one for the field, because it means the ML engineers can totally abstract away the messy realities of real-world data and real-world subject matter experts, and just focus on optimizing a fancier model architecture.
But I think we've reached a point where it starts to become silly and impractical to have this wall
[00:09:19] between the subject matter expert and the data scientist. So, to loop back: a big focus of Snorkel Flow is about making these interfaces and this process accessible to a non-developer — a legal associate or an underwriter or a network technician — and making them part of the process too. And that's another motivation behind the layers, including the no-code UI.
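The labeling functions Alex describes — small heuristics written by subject-matter experts, whose noisy, conflicting votes get aggregated into training labels — can be sketched in plain Python. This is an illustrative toy, not Snorkel Flow's API: the heuristics and class names are invented, and a simple majority vote stands in for Snorkel's actual label model, which additionally learns per-function accuracies and correlations.

```python
# A minimal, library-free sketch of programmatic labeling: write small
# heuristic "labeling functions", apply them to unlabeled text, and
# aggregate their (possibly conflicting) votes into a training label.
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_keyword_urgent(text):
    # Heuristic: "urgent" suggests the positive class (e.g. spam).
    return POSITIVE if "urgent" in text.lower() else ABSTAIN

def lf_keyword_thanks(text):
    # Heuristic: polite boilerplate suggests the negative class.
    return NEGATIVE if "thank" in text.lower() else ABSTAIN

def lf_all_caps(text):
    # Heuristic: all-caps shouting suggests the positive class.
    return POSITIVE if text.isupper() else ABSTAIN

LABELING_FUNCTIONS = [lf_keyword_urgent, lf_keyword_thanks, lf_all_caps]

def label(text):
    """Aggregate labeling-function votes by majority; abstains are ignored."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no function fired: leave the example unlabeled
    return Counter(votes).most_common(1)[0][0]

examples = ["URGENT WIRE FUNDS", "Thanks for the update", "quarterly report attached"]
labels = [label(t) for t in examples]
```

The point of the sketch is the interface, not the aggregator: experts contribute reusable functions instead of one-off labels, so when requirements change you edit a function and relabel the whole corpus in seconds, which is the iterative "compile and debug" loop described above.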
On this podcast I am joined by Braden Hancock, co-founder and Head of Technology at Snorkel AI. Snorkel AI is unlocking a better, faster way to build applications with Snorkel Flow, the first truly data-centric AI platform. To date they have raised over $50 million from top VC firms such as Greylock, Lightspeed, GV, and others. Snorkel AI is solving a real problem that has been holding back AI adoption: by simplifying data labeling and making AI projects look more like software development efforts, it removes some of the barriers and hassles that make "doing AI" so difficult, encouraging companies to more fully embrace AI. On this podcast we talk about Braden's journey to entrepreneurship, the origin story of Snorkel, how their approach to data labeling works and how it helps unlock the power of AI in the enterprise, why SMEs are the key to data labeling and why Snorkel's approach empowers them, why you should drop what you are doing and immediately send in a resume, and much more. I am excited for this conversation because I am just really impressed with their team. In my opinion this is definitely a company to keep an eye on. Let's get to it!
Developers of AI applications face many obstacles, but the chief challenge is simply that these are different from traditional software development projects. 85% of businesses say they are looking to adopt AI, yet a similar percentage of data science projects never reach production. Too many organizations approach AI application development the way they approach other software projects. Another issue is focusing on the machine learning model rather than the data set it will be trained on; Devang Sachdev of Snorkel AI suggests being data-focused instead, reducing and optimizing models rather than continually expanding parameter counts. Another issue is the manual process of developing training data, which is time-consuming and error-prone. Finally, we must consider a process of iteration over models and training data to ensure quality. Machine learning is an excellent tool, but it requires a rethink of how a company approaches software development.
Three Questions:
Is it possible to create a truly unbiased AI?
Can you think of an application for ML that has not yet been rolled out but will make a major impact in the future?
How big can ML models get? Will today's hundred-billion-parameter models look small tomorrow, or have we reached the limit?
Guests and Hosts:
Devang Sachdev, VP of Marketing at Snorkel AI. Connect with Devang on LinkedIn or on Twitter at @DevangSachdev.
Chris Grundemann, Gigaom Analyst and Managing Director at Grundemann Technology Solutions. Connect with Chris at ChrisGrundemann.com or on Twitter at @ChrisGrundemann.
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.
Date: 6/08/2021
Tags: @SFoskett, @ChrisGrundemann, @SnorkelAI, @DevangSachdev
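The data-centric iteration described above — train, analyze where the model fails, then refine the training data rather than endlessly growing the model — can be sketched as a toy loop. Everything here is illustrative and hypothetical (a keyword-rule "model", a three-example hand-made dev set), not Snorkel's implementation; it only shows the shape of the feedback cycle.

```python
# Toy data-centric iteration: error analysis drives edits to the labeling
# rules (the "data"), not to the model architecture.

def train(rules):
    """'Training' here just builds a classifier from the current keyword rules."""
    kws = set(rules)  # snapshot so later edits to `rules` don't leak in
    def model(text):
        return 1 if any(kw in text.lower() for kw in kws) else 0
    return model

# Hand-made dev set of (text, true_label) pairs for error analysis.
dev_set = [("urgent wire transfer", 1),
           ("lottery winner claim", 1),
           ("meeting at noon", 0)]

rules = {"urgent"}                 # iteration 1: a single heuristic
model = train(rules)
errors = [(t, y) for t, y in dev_set if model(t) != y]

# Error analysis reveals the "lottery" slice is missed, so we refine the
# data-generating rules and retrain, leaving the model code untouched.
rules |= {"lottery"}               # iteration 2: refined heuristics
model = train(rules)
errors_after = [(t, y) for t, y in dev_set if model(t) != y]
```

The design point is that each pass through the loop changes the inputs (rules/labels), so quality improvements compound in the data and survive any later model swap.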
In this episode Dr Bahijja Raimi-Abraham discusses artificial intelligence (AI) and healthcare with Brandon Yang (bio available here - https://mondayscience.wixsite.com/podcast/episode26), Machine Learning Engineer at Snorkel AI. Episode image credit: https://unsplash.com/
Additional Information
Snorkel Open Source - https://www.snorkel.org/
Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension
Previous episodes discussing AI, AI and healthcare, and AI and ethics: Episode 6; Episode 23 (Interview with Dr David Leslie, The Alan Turing Institute)
Episode summary available at MondayScience.Medium.com
Let us know what you thought of the episode. Subscribe, follow, comment and get in touch! Submit your questions or send your voice note questions (up to 30 seconds) via www.mondaysciencepodcast.com e. MondayScience2020@gmail.com --- Send in a voice message: https://anchor.fm/mondayscience/message