Stanford MLSys Seminar


Machine learning is driving exciting changes and progress in computing. What does the ubiquity of machine learning mean for how people build and deploy systems and applications? What challenges does industry face when deploying machine learning systems in


    • Latest episode: Apr 27, 2022
    • New episodes: infrequent
    • Average duration: 59m
    • Episodes: 24



    Latest episodes from Stanford MLSys Seminar

    #62 Dan Fu - Improving Transfer and Robustness of Supervised Contrastive Learning

    Apr 27, 2022 · 56:52


    Dan Fu - An ideal learned representation should display transferability and robustness. Supervised contrastive learning is a promising method for training accurate models, but it produces representations that do not capture these properties because of class collapse, in which all points in a class map to the same representation. In this talk, we discuss how to alleviate these problems to improve the geometry of supervised contrastive learning. We identify two key principles: balancing the right amount of geometric "spread" in the embedding space, and inducing an inductive bias towards subclass clustering. We introduce two mechanisms for achieving these aims in supervised contrastive learning, and show that doing so improves transfer learning and worst-group robustness. Next, we show how we can apply these insights to improve entity retrieval in open-domain NLP tasks (e.g., QA, search). We present a new method, TABi, that trains bi-encoders with a type-aware supervised contrastive loss and improves long-tailed entity retrieval.
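    For readers new to the baseline objective, here is a minimal pure-Python sketch of the standard supervised contrastive (SupCon) loss. It is the generic loss from the literature, not the talk's two new mechanisms or TABi's type-aware variant; the function names and the temperature value are illustrative assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss, averaged over anchors.

    For each anchor i, every other point with the same label is a positive;
    the softmax denominator runs over all points except the anchor itself.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity between anchor i and every other point.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positives contributes nothing
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors

labels = [0, 0, 1, 1]
separated = supcon_loss([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], labels)
mixed = supcon_loss([[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]], labels)
collapsed = supcon_loss([[1, 0], [1, 0], [0, 1], [0, 1]], labels)
```

    The loss keeps falling as same-class points move together, and it is smallest when each class shrinks to a single point (`collapsed < separated`), which is one way to see why the plain objective drifts toward class collapse.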

    #61 Kexin Rong - Big Data Analytics

    Apr 22, 2022 · 59:20


    Kexin Rong - Learned Indexing and Sampling for Improving Query Performance in Big-Data Analytics. Traditional data analytics systems improve query efficiency via fine-grained, row-level indexing and sampling techniques. However, to keep up with growing data volumes, more and more systems store and process datasets in large partitions containing hundreds of thousands of rows. These systems must therefore adapt traditional techniques to use coarse-grained data partitions as the basic unit of query processing. In this talk, I will discuss two related ideas that combine learning techniques with partitioning designs to improve query efficiency in analytics systems. First, I will describe PS3, the first approximate query processing system that supports non-uniform, partition-level samples. PS3 reduces the number of partitions accessed by 3x to 70x while achieving the same error as a uniform sample of the partitions. Next, I will present OLO, an online learning framework that dynamically adapts data organization to changes in the query workload to minimize overall data access and movement. We show that dynamic reorganization outperforms a single, optimized partitioning scheme by up to 30% in end-to-end runtime. I will conclude by discussing additional open problems in this area.
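    To make "non-uniform, partition-level samples" concrete, the sketch below estimates a SUM by reading only k partitions drawn with unequal probabilities and reweighting each one (the Hansen-Hurwitz estimator). It is a textbook importance-sampling estimator, not PS3 itself; the proxy `weights` stand in for whatever per-partition scores PS3 learns, and all names are hypothetical.

```python
import random

def estimate_total(partition_sums, weights, k, rng):
    """Estimate sum(partition_sums) by reading only k sampled partitions.

    partition_sums: per-partition aggregate, "read" only when a partition is sampled.
    weights: nonnegative proxy scores (hypothetical, e.g. derived from per-partition
             summary statistics) that set non-uniform sampling probabilities.
    Partitions with weight 0 are never sampled, so they must contribute 0 for the
    estimate to stay unbiased.
    """
    total_w = sum(weights)
    probs = [w / total_w for w in weights]
    # Draw k partition ids with replacement, proportionally to the proxy weights.
    picks = rng.choices(range(len(partition_sums)), weights=probs, k=k)
    # Hansen-Hurwitz reweighting: each sampled value is scaled by 1 / (k * p_i).
    return sum(partition_sums[i] / (k * probs[i]) for i in picks)
```

    When the weights track each partition's true contribution, the variance drops (to zero in the ideal case where they are exactly proportional); with uniform weights the estimate is still unbiased but much noisier. Choosing good weights is the hard part that this sketch leaves abstract.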

    #60 Igor Markov - Looper: An End-to-End ML Platform for Product Decisions

    Apr 11, 2022 · 60:01


    Igor Markov - Looper: An End-to-End ML Platform for Product Decisions. Episode 60 of the Stanford MLSys Seminar Series. Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure, and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support fine-grained product-metric evaluation, and (iii) optimize for product goals. To address shortcomings of prior platforms, we introduce general principles for, and the architecture of, an ML platform, Looper, with simple APIs for decision-making and feedback collection. Looper covers the end-to-end ML lifecycle, from collecting training data and model training to deployment and inference, and extends support to personalization, causal evaluation with heterogeneous treatment effects, and Bayesian tuning for product goals. During its 2021 production deployment, Looper simultaneously hosted 440 to 1,000 ML models that made 4 to 6 million real-time decisions per second. We summarize the experiences of platform adopters and describe their learning curve.

    #59 Zhuohan Li - Alpa: Automated Model-Parallel Deep Learning

    Apr 4, 2022 · 55:06


    Zhuohan Li - Alpa: Automated Model-Parallel Deep Learning. Alpa (https://github.com/alpa-projects/alpa) automates model-parallel training of large deep learning models by generating execution plans that unify data, operator, and pipeline parallelism. Alpa distributes the training of large deep learning models by viewing parallelism at two hierarchical levels: inter-operator and intra-operator parallelism. Based on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive the optimal parallel execution plan at each independent parallelism level, and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems, even on the models those systems were designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually designed plans.

    3/10/22 #58 Shruti Bhosale - Multilingual Machine Translation

    Mar 18, 2022 · 57:57


    Shruti Bhosale - Scaling Multilingual Machine Translation to Thousands of Language Directions. Existing work in translation has demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-centric, training only on data that was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this talk, I will describe how we create a true many-to-many multilingual translation model that can translate directly between any pair of 100 languages. We build and open-source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high-quality models. Our focus on non-English-centric models brings gains of more than 10 BLEU when translating directly between non-English directions, while performing competitively with the best single systems of WMT.

    3/3/22 #57 Vijay Janapa Reddi - TinyML, Harvard Style

    Mar 4, 2022 · 57:36


    Vijay Janapa Reddi - Tiny Machine Learning. Tiny machine learning (TinyML) is a fast-growing field at the intersection of ML algorithms and low-cost embedded systems. TinyML enables on-device analysis of sensor data (vision, audio, IMU, etc.) at ultra-low power consumption (less than 1 mW). Processing data close to the sensor enables an expansive new variety of always-on ML use cases that conserve bandwidth and energy and reduce latency, while improving responsiveness and maintaining privacy. This talk introduces the vision behind TinyML and showcases some of the interesting applications that TinyML is enabling in the field, from wildlife conservation to supporting public health initiatives. Yet there are still numerous technical hardware and software challenges to address. Tight memory and storage constraints, MCU heterogeneity, software fragmentation, and a lack of relevant large-scale datasets pose a substantial barrier to developing TinyML applications. To this end, the talk touches upon some of the research opportunities for unlocking the full potential of TinyML.

    2/24/22 #56 Fait Poms - Interactive Model Development

    Feb 28, 2022 · 55:54


    Fait Poms - A vision for interactive model development: efficient machine learning by bringing domain experts in the loop. Building computer vision models today is an exercise in patience: days to weeks for human annotators to label data, hours to days to train and evaluate models, and weeks to months of iteration to reach a production model. Without tolerance for this timeline or access to the massive compute and human resources required, building an accurate model can be challenging, if not impossible. In this talk, we discuss a vision for interactive model development with iteration cycles of minutes, not weeks. We believe the key to this is integrating the domain expert at key points in the model-building cycle and leveraging supervision cues beyond example-level annotation. We will discuss our recent progress toward aspects of this goal: judiciously choosing when to use the machine and when to use the domain expert for fast, low-label-budget model training (CVPR 2021, ICCV 2021), building confidence in model performance with low-shot validation (ICCV 2021 Oral), and some initial tools for rapidly defining correctness criteria.

    1/28/21 #10 Travis Addair - Deep Learning at Scale with Horovod

    Feb 23, 2022 · 59:52


    Travis Addair - Horovod and the Evolution of Deep Learning at Scale. Deep neural networks are pushing the state of the art in numerous machine learning research domains, from computer vision to natural language processing and even tabular business data. However, scaling such models to train efficiently on large datasets imposes a unique set of challenges that traditional batch data processing systems were not designed to solve. Horovod is an open-source framework that scales models written in TensorFlow, PyTorch, and MXNet to train seamlessly on hundreds of GPUs in parallel. In this talk, we'll explain the concepts and unique constraints that led to the development of Horovod at Uber, and discuss how the latest trends in deep learning research are informing the future direction of the project within the Linux Foundation. We'll explore how Horovod fits into production ML workflows in industry, and how tools like Spark and Ray can combine with Horovod to make productionizing deep learning at scale on remote data centers as simple as running locally on your laptop. Finally, we'll share some thoughts on what's next for large-scale deep learning, including new distributed training architectures and how the larger ecosystem of production ML tooling is evolving.
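    Data-parallel training of the kind Horovod scales combines gradients across workers with an allreduce, and Horovod is known for the bandwidth-efficient ring-allreduce pattern. The pure-Python simulation below is a sketch of that communication pattern only: each list stands in for one worker's gradient buffer, and `ring_allreduce` is a hypothetical name, not part of Horovod's API.

```python
def ring_allreduce(vectors):
    """Simulate ring-allreduce (sum) over len(vectors) workers arranged in a ring."""
    n, size = len(vectors), len(vectors[0])
    bufs = [list(v) for v in vectors]  # each worker's local buffer
    bounds = [(i * size // n, (i + 1) * size // n) for i in range(n)]  # chunk c spans bounds[c]

    # Phase 1, reduce-scatter: at each step worker r passes one chunk to worker r+1,
    # which adds it in. After n-1 steps worker r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [bufs[r][slice(*bounds[c])] for r, c in sends]  # snapshot before applying
        for (r, c), data in zip(sends, payloads):
            lo, _ = bounds[c]
            for k, x in enumerate(data):
                bufs[(r + 1) % n][lo + k] += x

    # Phase 2, all-gather: circulate the completed chunks so every worker gets all of them.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [bufs[r][slice(*bounds[c])] for r, c in sends]
        for (r, c), data in zip(sends, payloads):
            bufs[(r + 1) % n][slice(*bounds[c])] = data
    return bufs
```

    Each worker sends and receives about 2(n-1)/n of its buffer in total, independent of the number of workers, which is why the ring pattern scales well. Dividing the result by n turns the sum into the averaged gradient used for the update.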

    2/17/22 #55 Doris Lee - Visualization for Data Science

    Feb 19, 2022 · 58:42


    Doris Lee - Always-on Dataframe Visualizations with Lux. Visualizations help data scientists discover trends and patterns, identify outliers, and derive insights from their data. However, existing visualization libraries in Python require users to write a substantial amount of code to plot even a single visualization, often hindering the flow of data exploration. In this talk, you will learn about Lux, a lightweight visualization tool on top of pandas dataframes. Lux recommends visualizations to users for free as they explore their data within a Jupyter notebook, without the need to write additional code. Lux is used by data scientists across a variety of industries and sectors and has nearly 66k total downloads and over 3.3k stars on GitHub. For more information, see https://github.com/lux-org/lux.

    1/21/21 #9 Song Han - Reducing AI's Carbon Footprint

    Feb 15, 2022 · 56:30


    Song Han - TinyML: Reducing the Carbon Footprint of Artificial Intelligence in the Internet of Things (IoT). Deep learning is computation-hungry and data-hungry. We aim to improve the computational efficiency and data efficiency of deep learning. I will first talk about MCUNet [1], which brings deep learning to IoT devices. The technique is tiny neural architecture search (TinyNAS) co-designed with a tiny inference engine (TinyEngine), enabling ImageNet-scale inference on an IoT device with only 1 MB of flash. Next, I will talk about TinyTL [2], which enables on-device training, reducing the memory footprint by 7-13x. Finally, I will describe Differentiable Augmentation [3], which enables data-efficient GAN training, generating photo-realistic images using only 100 images where tens of thousands of images used to be required. We hope such TinyML techniques can make AI greener, faster, and more sustainable.

    2/10/22 #54 Ellie Pavlick - Do Deep Models Learn Symbolic Reasoning?

    Feb 12, 2022 · 61:39


    Ellie Pavlick - Implementing Symbols and Rules with Neural Networks. Many aspects of human language and reasoning are well explained in terms of symbols and rules. However, state-of-the-art computational models are based on large neural networks which lack explicit symbolic representations of the type frequently used in cognitive theories. One response has been the development of neuro-symbolic models which introduce explicit representations of symbols into neural network architectures or loss functions. In terms of Marr's levels of analysis, such approaches achieve symbolic reasoning at the computational level ("what the system does and why") by introducing symbols and rules at the implementation and algorithmic levels. In this talk, I will consider an alternative: can neural networks (without any explicit symbolic components) nonetheless implement symbolic reasoning at the computational level? I will describe several diagnostic tests of "symbolic" and "rule-governed" behavior and use these tests to analyze neural models of visual and language processing. Our results show that on many counts, neural models appear to encode symbol-like concepts (e.g., conceptual representations that are abstract, systematic, and modular), but not perfectly so. Analysis of the failure cases reveals that future work is needed on methodological tools for analyzing neural networks, as well as refinement of models of hybrid neuro-symbolic reasoning in humans, in order to determine whether neural networks' deviations from the symbolic paradigm are a feature or a bug.

    12/10/20 #8 Kayvon Fatahalian - Video Analysis in Hours, Not Weeks

    Feb 7, 2022 · 63:04


    Kayvon Fatahalian - From Ideas to Video Analysis Models in Hours, Not Weeks. My students and I often find ourselves as "subject matter experts" needing to create video understanding models that serve computer graphics and video analysis applications. Unfortunately, like many, we are frustrated that a smart grad student, armed with a large *unlabeled* video collection, a palette of pre-trained models, and an idea of what novel object or activity they want to detect/segment/classify, requires days to weeks to create and validate a model for their task. In this talk I will discuss challenges we've faced in the iterative process of curating data, training models, and validating models for the specific case of rare events and categories in image and video collections. In this regime we've found that conventional wisdom about training on imbalanced datasets, and about data acquisition via active learning, does not lead to the most efficient solutions. I'll discuss these challenges in the context of image and video analysis applications, and elaborate on our ongoing vision of how a grad student, armed with massive amounts of unlabeled video data, pretrained models, and supercomputing-scale elastic compute available in seconds, should be able to interactively iterate on cycles of acquiring training data, training models, and validating models.

    2/3/22 #53 Cody Coleman - Data Selection for Data-Centric AI

    Feb 5, 2022 · 55:25


    Cody Coleman - Data Selection for Data-Centric AI: Data Quality Over Quantity. Data selection methods, such as active learning and core-set selection, improve the data efficiency of machine learning by identifying the most informative data points to label or train on. Across the data selection literature, there are many ways to identify these training examples. However, classical data selection methods are prohibitively expensive to apply in deep learning because of the larger datasets and models involved. This talk will describe two techniques to make data selection methods more tractable. First, "selection via proxy" (SVP) avoids expensive training and reduces the computation per example by using smaller proxy models to quantify the informativeness of each example. Second, "similarity search for efficient active learning and search" (SEALS) reduces the number of examples processed by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set, instead of scanning over all of the unlabeled data. Both methods lead to order-of-magnitude performance improvements, making active learning applications on billions of unlabeled images practical for the first time.
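    The SEALS idea of scoring only the neighbors of the labeled set can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses brute-force exact nearest-neighbor search and an arbitrary caller-supplied `uncertainty` function, whereas a system working at the billion-image scale mentioned above would need an (approximate) nearest-neighbor index. All function names are hypothetical.

```python
def knn(query, pool, k):
    """Indices of the k pool points closest to query (squared Euclidean, brute force)."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(query, pool[i]))
    return sorted(range(len(pool)), key=sq_dist)[:k]

def seals_candidates(labeled, unlabeled, k):
    """Restrict the candidate pool to the k nearest unlabeled neighbors of each labeled point."""
    candidates = set()
    for x in labeled:
        candidates.update(knn(x, unlabeled, k))
    return candidates

def select_next(labeled, unlabeled, k, uncertainty):
    """Pick the most uncertain example, scoring only the restricted candidate pool."""
    return max(sorted(seals_candidates(labeled, unlabeled, k)), key=uncertainty)

labeled = [[0.0, 0.0]]
unlabeled = [[0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [9.0, 9.0]]
scores = [0.2, 0.4, 0.1, 0.9]  # illustrative uncertainty per unlabeled index
```

    Here index 3 has the highest uncertainty overall, but it lies far from the labeled data, so `select_next(labeled, unlabeled, 2, scores.__getitem__)` returns index 1. The trade-off is deliberate: the model only ever scores a small neighborhood, which keeps selection cost proportional to the labeled set rather than to the whole unlabeled pool.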

    1/27/22 #52 Bilge Acun - Sustainability for AI

    Jan 31, 2022 · 58:09


    Bilge Acun - Designing Sustainable Datacenters with and for AI. Machine learning has witnessed exponential growth in recent years. In this talk, we will first explore the environmental implications of the super-linear growth trend of AI from a holistic perspective, spanning data, algorithms, and system hardware. System efficiency optimizations can significantly help reduce the carbon footprint of AI systems. However, predictions show that efficiency improvements will not be enough to reduce the overall resource needs of AI, as the Jevons paradox suggests: "efficiency increases consumption". Therefore, we need to design our datacenters with sustainability in mind, using renewable energy every hour of every day. Relying on wind and solar energy 24/7 is challenging due to their intermittent nature. To cope with fluctuations in renewable energy generation, multiple solutions can be applied, such as energy storage and carbon-aware workload scheduling. In this talk, I will introduce a framework to analyze the multi-dimensional solution space, taking into account the operational and embodied footprint of the solutions, and discuss how AI can be part of the solution.
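    Carbon-aware scheduling, one of the solutions named above, can be illustrated with a toy example: shift a deferrable job that needs a fixed number of consecutive hours into the window with the lowest forecast carbon intensity. This is a sliding-window sketch under made-up forecast numbers, not the framework presented in the talk; `best_start` is a hypothetical name.

```python
def best_start(intensity, hours):
    """Start hour minimizing the summed carbon intensity of a contiguous run.

    intensity: forecast carbon intensity per hour (e.g. gCO2/kWh; values below are made up).
    hours: number of consecutive hours the deferrable job needs.
    """
    if not 0 < hours <= len(intensity):
        raise ValueError("job length must fit inside the forecast horizon")
    window = sum(intensity[:hours])
    best, best_cost = 0, window
    for start in range(1, len(intensity) - hours + 1):
        # Slide the window one hour: add the newest hour, drop the oldest.
        window += intensity[start + hours - 1] - intensity[start - 1]
        if window < best_cost:
            best, best_cost = start, window
    return best, best_cost

forecast = [400, 380, 300, 150, 120, 160, 350, 420]  # hypothetical hourly forecast
```

    With this forecast, a 3-hour job is best started at hour 3, when solar output would plausibly peak, for a summed intensity of 430. Real schedulers also weigh deadlines, capacity limits, and storage, which this sketch ignores.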

    12/3/20 #7 Matthias Poloczek - Bayesian Optimization

    Jan 24, 2022 · 59:20


    Matthias Poloczek - Scalable Bayesian Optimization for Industrial Applications. Bayesian optimization has become a powerful method for the sample-efficient optimization of expensive black-box functions. These functions do not have a closed form and are evaluated, for example, by running a complex economic simulation, an experiment in the lab or in a market, or a CFD simulation. Use cases arise in machine learning, e.g., when tuning the configuration of an ML model or when optimizing a reinforcement learning policy. Examples in engineering include the design of aerodynamic structures and materials discovery. In this talk I will introduce the key ideas of Bayesian optimization and discuss how they can be applied to tuning ML models. Moreover, I will share some experiences with developing a Bayesian optimization service in industry.

    1/20/22 #51 Fred Sala - Weak Supervision for Diverse Datatypes

    Jan 21, 2022 · 53:14


    Fred Sala - Efficiently Constructing Datasets for Diverse Datatypes. Building large datasets for data-hungry models is a key challenge in modern machine learning. Weak supervision (WS) frameworks have become a popular way to bypass this bottleneck. These approaches synthesize multiple noisy but cheaply acquired estimates of labels into a set of high-quality pseudolabels for downstream training. In this talk, I introduce a technique that fuses weak supervision with structured prediction, enabling WS techniques to be applied to extremely diverse types of data. This approach allows for labels that can be continuous, manifold-valued (including, for example, points in hyperbolic space), rankings, sequences, graphs, and more. I will discuss theoretical guarantees for this universal weak supervision technique, connecting the consistency of weak supervision estimators to low-distortion embeddings of metric spaces. I will show experimental results on a variety of problems, including learning to rank, geodesic regression, and semantic dependency parsing. Finally, I will discuss future opportunities for automated dataset construction.

    11/19/20 #6 Roy Frostig - The Story Behind JAX

    Jan 17, 2022 · 66:53


    Roy Frostig - JAX: accelerating machine learning research by composing function transformations in Python. JAX is a system for high-performance machine learning research and numerical computing. It offers the familiarity of Python+NumPy together with hardware acceleration, plus a set of composable function transformations: automatic differentiation, automatic batching, end-to-end compilation (via XLA), parallelizing over multiple accelerators, and more. JAX's core strength is its guarantee that these user-wielded transformations can be composed arbitrarily, so that programmers can write math (e.g., a loss function) and transform it into pieces of an ML program (e.g., a vectorized, compiled, batch gradient function for that loss). JAX had its open-source release in December 2018 (https://github.com/google/jax). It is used by researchers for a wide range of applications, from studying the training dynamics of neural networks to probabilistic programming to scientific applications in physics and biology.

    1/13/22 #50 Deepak Narayanan - Resource-Efficient Deep Learning Execution

    Jan 14, 2022 · 57:13


    Deepak Narayanan - Resource-Efficient Deep Learning Execution. Deep learning models have enabled state-of-the-art results across a broad range of applications; however, training these models is extremely time- and resource-intensive, taking weeks on clusters with thousands of expensive accelerators in the extreme case. In this talk, I will describe two ideas that help improve the resource efficiency of model training. In the first half of the talk, I will discuss how pipelining can be used to accelerate distributed training. Pipeline parallelism facilitates model training with lower communication overhead than previous methods while still ensuring high compute resource utilization. Pipeline parallelism also enables the efficient training of large models that do not fit on a single worker; for example, we used pipeline parallelism at NVIDIA to efficiently scale training to language models with a trillion parameters on 3,000+ GPUs. In the second half of this talk, I will describe how resources in a shared cluster with heterogeneous compute resources (e.g., different types of hardware accelerators) should be partitioned among different users to optimize objectives specified over one or more training jobs. Heterogeneity-aware scheduling can improve various scheduling objectives, such as average completion time, makespan, or cloud computing resource cost, by up to 3.5x.
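    To see why pipelining needs many microbatches to keep utilization high, the sketch below builds the forward-pass timeline of a simple fill-and-drain (GPipe-style) pipeline and measures its idle "bubble". This schedule is an assumption chosen for simplicity; the PipeDream-style schedules discussed in the talk interleave forward and backward passes differently. Function names are hypothetical.

```python
def gpipe_forward_schedule(stages, microbatches):
    """Forward-pass timeline for a fill-and-drain (GPipe-style) pipeline.

    Returns one row per time step; row[s] is the microbatch id that stage s
    processes at that step, or None if the stage is idle (a pipeline "bubble").
    """
    steps = []
    for t in range(microbatches + stages - 1):
        row = []
        for s in range(stages):
            mb = t - s  # microbatch mb reaches stage s after passing stages 0..s-1
            row.append(mb if 0 <= mb < microbatches else None)
        steps.append(row)
    return steps

def bubble_fraction(stages, microbatches):
    """Fraction of stage-time slots left idle in the forward schedule."""
    sched = gpipe_forward_schedule(stages, microbatches)
    idle = sum(cell is None for row in sched for cell in row)
    return idle / (len(sched) * stages)
```

    The measured fraction matches the closed form (p - 1) / (m + p - 1) for p stages and m microbatches: with 4 stages it falls from 3/7 at 4 microbatches to 3/19 at 16, so splitting each batch into more microbatches amortizes the fill and drain phases.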

    11/12/20 #5 Chip Huyen - Principles of Good Machine Learning Systems Design

    Jan 10, 2022 · 66:38


    Chip Huyen - Principles of Good Machine Learning Systems Design. This talk covers what it means to operationalize ML models. It starts by analyzing the differences between ML in research and in production, and between ML systems and traditional software, as well as myths about ML production. It then goes over the principles of good ML systems design and introduces an iterative framework for ML systems design, spanning project scoping, data management, model development, deployment, maintenance, and business analysis. It covers the differences between DataOps, ML engineering, MLOps, and data science, and where each fits into the framework. It also discusses the main skills each stage requires, which can help companies structure their teams. The talk ends with a survey of the ML production ecosystem, the economics of open source, and open-core businesses.

    11/5/20 #4 Alex Ratner - Programmatically Building & Managing Training Data with Snorkel

    Jan 8, 2022 · 73:29


    Alex Ratner - Programmatically Building & Managing Training Data with Snorkel. One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today's models require. In this talk, I will describe our work on Snorkel (snorkel.org), an open-source framework for building and managing training datasets, and describe three key operators for letting users build and manipulate training datasets: labeling functions, for labeling unlabeled data; transformation functions, for expressing data augmentation strategies; and slicing functions, for partitioning and structuring training datasets. These operators allow domain expert users to specify machine learning (ML) models entirely via noisy operators over training data, expressed as simple Python functions (or even via higher-level natural-language or point-and-click interfaces), leading to applications that can be built in hours or days, rather than months or years, and that can be iteratively developed, modified, versioned, and audited. I will describe recent work on modeling the noise and imprecision inherent in these operators, and on using these approaches to train ML models that solve real-world problems, including recent state-of-the-art results on benchmark tasks and real-world industry, government, and medical deployments.
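    The idea of labeling functions can be shown without any library: each one is a small Python function that votes for a label or abstains, and the votes are combined into a training label. The plain-Python sketch below deliberately does not use Snorkel's own API; the spam task and function names are hypothetical, and the simple majority vote stands in for Snorkel's learned model of labeling-function noise.

```python
ABSTAIN, HAM, SPAM = -1, 0, 1

# Hypothetical labeling functions for a toy comment-spam task.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_says_check_out(text):
    return SPAM if "check out" in text.lower() else ABSTAIN

def lf_short_reply(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

LFS = [lf_contains_link, lf_says_check_out, lf_short_reply]

def majority_vote(text, lfs=LFS):
    """Combine non-abstaining votes; ties and all-abstain cases stay unlabeled."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    spam = votes.count(SPAM)
    ham = len(votes) - spam
    if spam == ham:
        return ABSTAIN
    return SPAM if spam > ham else HAM
```

    For example, "Check out my channel http://example.com now" gets two SPAM votes and is labeled SPAM, while "see http://x.co" gets one vote each way and stays unlabeled. Modeling each function's accuracy, as the talk describes, lets a reliable function outvote several noisy ones instead of counting every vote equally.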

    11/5/20 #3 Virginia Smith - On Heterogeneity in Federated Settings

    Jan 8, 2022 · 60:46


    Virginia Smith - On Heterogeneity in Federated Settings. A defining characteristic of federated learning is the presence of heterogeneity, i.e., that data and compute may differ significantly across the network. In this talk I show that the challenge of heterogeneity pervades the machine learning process in federated settings, affecting issues such as optimization, modeling, and fairness. In terms of optimization, I discuss FedProx, a distributed optimization method that offers robustness to systems and statistical heterogeneity. I then explore the role that heterogeneity plays in delivering models that are accurate and fair to all users/devices in the network. Our work here extends classical ideas in multi-task learning and alpha-fairness to large-scale heterogeneous networks, enabling flexible, accurate, and fair federated learning.
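    FedProx's central change to federated averaging is a proximal term, (mu/2)·||w - w_global||², added to each client's local objective so that clients with very different data, or doing different amounts of local work, do not drift too far from the current global model. The 1-D sketch below assumes toy quadratic client losses 0.5·a·(w - c)², plain gradient descent, and an unweighted server average; these choices and the function names are illustrative, not the reference implementation.

```python
def local_update(w_global, a, c, mu, lr, steps):
    """Run gradient descent on one client's FedProx objective:
    0.5 * a * (w - c)**2 + (mu / 2) * (w - w_global)**2   (toy 1-D quadratic loss).
    """
    w = w_global
    for _ in range(steps):
        grad = a * (w - c) + mu * (w - w_global)  # local loss gradient + proximal gradient
        w -= lr * grad
    return w

def fedprox_round(w_global, clients, mu, lr):
    """One round: each client (a, c, steps) does its own amount of local work; server averages."""
    updates = [local_update(w_global, a, c, mu, lr, steps) for a, c, steps in clients]
    return sum(updates) / len(updates)
```

    With a = 1, c = 10, and w_global = 0, setting mu = 0 lets the client run toward its own optimum (about 9.95 after 50 steps at lr = 0.1), while mu = 1 pulls it to the proximal minimizer (a·c + mu·w_global)/(a + mu) = 5. Larger mu trades local progress for stability under heterogeneous data.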

    10/22/20 #2 Matei Zaharia - Machine Learning at Industrial Scale: Lessons from the MLflow Project

    Jan 8, 2022 · 59:33


    Matei Zaharia - Machine Learning at Industrial Scale: Lessons from the MLflow Project. Although enterprise adoption of machine learning is still in its early stages, many enterprises in all industries already have hundreds of internal ML applications. ML powers business processes with an impact of hundreds of millions of dollars in industrial IoT, finance, healthcare, and retail. Building and operating these applications reliably requires infrastructure that is different from traditional software development, which has led to significant investment in the construction of “ML platforms” specifically designed to run ML applications. In this talk, I'll discuss some of the common challenges in productionizing ML applications, based on experience building MLflow, an open-source ML platform started at Databricks. MLflow is now the most widely used open-source project in this area, with over 2 million downloads a month and integrations with dozens of other products. I'll also highlight some interesting problems users face that are not covered deeply in current ML systems research, such as the need for “hands-free” ML that can train thousands of independent models without direct tuning from the ML developer for regulatory reasons, and the impact of privacy and interpretability regulations on ML. All my examples will be based on experience at large Databricks / MLflow customers.

    10/15/20 #1 Marco Tulio Ribeiro - Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

    Jan 8, 2022 · 60:18


    Marco Tulio Ribeiro - Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. We will present CheckList, a task-agnostic methodology and tool for testing NLP models, inspired by principles of behavioral testing in software engineering. We will show many fun bugs we discovered with CheckList, in both commercial models (Microsoft, Amazon, Google) and research models (BERT, RoBERTa for sentiment analysis, QQP, SQuAD). We'll also present comparisons between CheckList and the status quo, in a case study at Microsoft and a user study with researchers and engineers. We show that CheckList is a helpful process and tool for testing and finding bugs in NLP models, for both practitioners and researchers.
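    One of CheckList's test types is the minimum functionality test (MFT): many small, template-generated inputs that all share an expected label. The plain-Python sketch below mimics an MFT for negation in sentiment analysis; it does not use the CheckList library's API, and the keyword "model" is a deliberately naive, hypothetical stand-in.

```python
from itertools import product

def expand(template, **slots):
    """Fill every combination of slot values into a template string."""
    keys = list(slots)
    return [template.format(**dict(zip(keys, combo)))
            for combo in product(*(slots[k] for k in keys))]

def run_mft(model, cases, expected):
    """Minimum functionality test: return the cases where the model's label differs from expected."""
    return [c for c in cases if model(c) != expected]

def toy_sentiment(text):
    """A deliberately naive keyword model, used only to show a failing test."""
    return "negative" if any(w in text for w in ("bad", "awful")) else "positive"

cases = expand("The {thing} was not {adj}.", thing=["food", "flight"], adj=["good", "great"])
failures = run_mft(toy_sentiment, cases, expected="negative")
```

    All four generated cases, starting with "The food was not good.", fail because the keyword model ignores negation. A single high-accuracy score on a held-out set could hide this; the template makes the capability being tested explicit.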

    1/6/22 #49 Beidi Chen - Pixelated Butterfly: Fast Machine Learning with Sparsity

    Jan 8, 2022 · 53:06


    Beidi Chen - Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models. Overparameterized neural networks generalize well but are expensive to train. Ideally, one would like to reduce their computational cost while retaining their generalization benefits. Sparse model training is a simple and promising approach to achieve this, but challenges remain, as existing methods struggle with accuracy loss, slow training runtime, or difficulty in sparsifying all model components. The core problem is that searching for a sparsity mask over a discrete set of sparse matrices is difficult and expensive. To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices. As butterfly matrices are not hardware-efficient, we propose simple variants of butterfly (block and flat) to take advantage of modern hardware. Our method (Pixelated Butterfly) uses a simple fixed sparsity pattern based on flat block butterfly and low-rank matrices to sparsify most network layers (e.g., attention, MLP). We empirically validate that Pixelated Butterfly is 3x faster than butterfly and speeds up training to achieve favorable accuracy-efficiency tradeoffs. On the ImageNet classification and WikiText-103 language modeling tasks, our sparse models train up to 2.5x faster than the dense MLP-Mixer, Vision Transformer, and GPT-2 medium with no drop in accuracy.
