O'Reilly Data Show - O'Reilly Media Podcast


The O'Reilly Data Show explores the opportunities and techniques driving big data, data science, and AI. Through interviews and analysis, we highlight the people putting data to work.

O'Reilly Media


    • Oct 10, 2019 LATEST EPISODE
    • infrequent NEW EPISODES
    • 36m AVG DURATION
    • 62 EPISODES


    Latest episodes from O'Reilly Data Show - O'Reilly Media Podcast

    Machine learning for operational analytics and business intelligence

    Oct 10, 2019 51:38


    In this episode of the Data Show, I speak with Peter Bailis, founder and CEO of Sisu, a startup that is using machine learning to improve operational analytics. Bailis is also an assistant professor of computer science at Stanford University, where he conducts research into data-intensive systems and where he is co-founder of the DAWN […]

    Machine learning and analytics for time series data

    Sep 26, 2019 40:31


    In this episode of the Data Show, I speak with Arun Kejariwal of Facebook and Ira Cohen of Anodot (full disclosure: I’m an advisor to Anodot). This conversation stemmed from a recent online panel discussion we did, where we discussed time series data, and, specifically, anomaly detection and forecasting. Both Kejariwal (at Machine Zone, Twitter, […]

    Understanding deep neural networks

    Sep 12, 2019 39:31


    In this episode of the Data Show, I speak with Michael Mahoney, a member of RISELab, the International Computer Science Institute, and the Department of Statistics at UC Berkeley. A physicist by training, Mahoney has been at the forefront of many important problems in large-scale data analysis. On the theoretical side, his work spans algorithmic […]

    Becoming a machine learning practitioner

    Aug 29, 2019 33:22


    In this episode of the Data Show, I speak with Kesha Williams, technical instructor at A Cloud Guru, a training company focused on cloud computing. As a full stack web developer, Williams became intrigued by machine learning and started teaching herself the ML tools on Amazon Web Services. Fast forward to today, Williams has built […]

    Labeling, transforming, and structuring training data sets for machine learning

    Aug 15, 2019 40:51


    In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. […]

    Make data science more useful

    Aug 1, 2019 35:04


    In this episode of the Data Show, I speak with Cassie Kozyrkov, technical director and chief decision scientist at Google Cloud. She describes “decision intelligence” as an interdisciplinary field concerned with all aspects of decision-making, and which combines data science with the behavioral sciences. Most recently she has been focused on developing best practices that […]

    Acquiring and sharing high-quality data

    Jul 18, 2019 39:20


    In this episode of the Data Show, I spoke with Roger Chen, co-founder and CEO of Computable Labs, a startup focused on building tools for the creation of data networks and data exchanges. Chen has also served as co-chair of O’Reilly’s Artificial Intelligence Conference since its inception in 2016. This conversation took place the day […]

    Tools for machine learning development

    Jul 3, 2019 39:24


    In this week’s episode of the Data Show, we’re featuring an interview Data Show host Ben Lorica participated in for the Software Engineering Daily Podcast, where he was interviewed by Jeff Meyerson. Their conversation mainly centered around data engineering, data architecture and infrastructure, and machine learning (ML). Here are a few highlights: Tools for productive […]

    Enabling end-to-end machine learning pipelines in real-world applications

    Jun 20, 2019 42:53


    In this episode of the Data Show, I spoke with Nick Pentreath, principal engineer at IBM. Pentreath was an early and avid user of Apache Spark, and he subsequently became a Spark committer and PMC member. Most recently his focus has been on machine learning, particularly deep learning, and he is part of a group […]

    Bringing scalable real-time analytics to the enterprise

    Jun 9, 2019 37:12


    In this episode of the Data Show, I spoke with Dhruba Borthakur (co-founder and CTO) and Shruti Bhat (SVP of Product) of Rockset, a startup focused on building solutions for interactive data science and live applications. Borthakur was the founding engineer of HDFS and creator of RocksDB, while Bhat is an experienced product and marketing […]

    Applications of data science and machine learning in financial services

    May 23, 2019 42:32


    In this episode of the Data Show, I spoke with Jike Chong, chief data scientist at Acorns, a startup focused on building tools for micro-investing. Chong has extensive experience using analytics and machine learning in financial services, and he has experience building data science teams in the U.S. and in China. We had a great […]

    Real-time entity resolution made accessible

    May 9, 2019 27:09


    In this episode of the Data Show, I spoke with Jeff Jonas, CEO, founder and chief scientist of Senzing, a startup focused on making real-time entity resolution technologies broadly accessible. He was previously a fellow and chief scientist of context computing at IBM. Entity resolution (ER) refers to techniques and tools for identifying and linking […]

    Why companies are in need of data lineage solutions

    Apr 25, 2019 34:29


    In this episode of the Data Show, I spoke with Neelesh Salian, software engineer at Stitch Fix, a company that combines machine learning and human expertise to personalize shopping. As companies integrate machine learning into their products and systems, there are important foundational technologies that come into play. This shouldn’t come as a shock, as […]

    What data scientists and data engineers can do with current generation serverless technologies

    Apr 11, 2019 36:32


    In this episode of the Data Show, I spoke with Avner Braverman, co-founder and CEO of Binaris, a startup that aims to bring serverless to web-scale and enterprise applications. This conversation took place shortly after the release of a seminal paper from UC Berkeley (“Cloud Programming Simplified: A Berkeley View on Serverless Computing”), and this […]

    It’s time for data scientists to collaborate with researchers in other disciplines

    Mar 28, 2019 36:08


    In this episode of the Data Show, I spoke with Forough Poursabzi-Sangdeh, a postdoctoral researcher at Microsoft Research New York City. Poursabzi works in the interdisciplinary area of interpretable and interactive machine learning. As models and algorithms become more widespread, many important considerations are becoming active research areas: fairness and bias, safety and reliability, security […]

    Algorithms are shaping our lives—here’s how we wrest back control

    Mar 14, 2019 44:15


    In this episode of the Data Show, I spoke with Kartik Hosanagar, professor of technology and digital business, and professor of marketing at The Wharton School of the University of Pennsylvania. Hosanagar is also the author of a newly released book, A Human’s Guide to Machine Intelligence, an interesting tour through the recent evolution of […]

    Why your attention is like a piece of contested territory

    Feb 28, 2019 43:05


    In this episode of the Data Show, I spoke with P.W. Singer, strategist and senior fellow at the New America Foundation, and a contributing editor at Popular Science. He is co-author of an excellent new book, LikeWar: The Weaponization of Social Media, which explores how social media has changed war, politics, and business. The book […]

    The technical, societal, and cultural challenges that come with the rise of fake media

    Feb 14, 2019 30:53


    In this episode of the Data Show, I spoke with Siwei Lyu, associate professor of computer science at the University at Albany, State University of New York. Lyu is a leading expert in digital media forensics, a field of research into tools and techniques for analyzing the authenticity of media files. Over the past year, […]

    Using machine learning and analytics to attract and retain employees

    Jan 31, 2019 46:54


    In this episode of the Data Show, I spoke with Maryam Jahanshahi, research scientist at TapRecruit, a startup that uses machine learning and analytics to help companies recruit more effectively. In an upcoming survey, we found that a “skills gap” or “lack of skilled people” was one of the main bottlenecks holding back adoption of […]

    How machine learning impacts information security

    Jan 17, 2019 39:49


    In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer and legal engineer at Immuta, a company building data management tools tuned for data science. Burt and cybersecurity pioneer Daniel Geer recently released a must-read white paper (“Flat Light”) that provides a great framework for how to think about information […]

    In the age of AI, fundamental value resides in data

    Jan 3, 2019 29:41


    In this episode of the Data Show, I spoke with Haoyuan Li, CEO and founder of Alluxio, a startup commercializing the open source project with the same name (full disclosure: I’m an advisor to Alluxio). Our discussion focuses on the state of Alluxio (the open source project that has roots in UC Berkeley’s AMPLab), specifically […]

    Trends in data, machine learning, and AI

    Dec 20, 2018 28:37


    For the end-of-year holiday episode of the Data Show, I turned the tables on Data Show host Ben Lorica to talk about trends in big data, machine learning, and AI, and what to look for in 2019. Lorica also showcased some highlights from our upcoming Strata Data and Artificial Intelligence conferences. Here are some highlights […]

    Tools for generating deep neural networks with efficient network architectures

    Dec 6, 2018 32:20


    In this episode of the Data Show, I spoke with Alex Wong, associate professor at the University of Waterloo, and co-founder of DarwinAI, a startup that uses AI to address foundational challenges with deep learning in the enterprise. As the use of machine learning and analytics becomes more widespread, we’re beginning to see tools that […]

    Building tools for enterprise data science

    Nov 21, 2018 31:28


    In this episode of the Data Show, I spoke with Vitaly Gordon, VP of data science and engineering at Salesforce. As the use of machine learning becomes more widespread, we need tools that will allow data scientists to scale so they can tackle many more problems and help many more people. We need automation tools […]

    Lessons learned while helping enterprises adopt machine learning

    Nov 8, 2018 31:31


    In this episode of the Data Show, I spoke with Francesca Lazzeri, an AI and machine learning scientist at Microsoft, and her colleague Jaya Mathew, a senior data scientist at Microsoft. We conducted a couple of surveys this year—“How Companies Are Putting AI to Work Through Deep Learning” and “The State of Machine Learning Adoption […]

    Machine learning on encrypted data

    Oct 25, 2018 41:22


    In this episode of the Data Show, I spoke with Alon Kaufman, CEO and co-founder of Duality Technologies, a startup building tools that will allow companies to apply analytics and machine learning to encrypted data. In a recent talk, I described the importance of data, various methods for estimating the value of data, and emerging […]

    How social science research can inform the design of AI systems

    Oct 11, 2018 45:30


    In this episode of the Data Show, I spoke with Jacob Ward, a Berggruen Fellow at Stanford University. Ward has an extensive background in journalism, mainly covering topics in science and technology, at National Geographic, Al Jazeera, Discovery Channel, BBC, Popular Science, and many other outlets. Most recently, he’s become interested in the interplay between […]

    Why it’s hard to design fair machine learning models

    Sep 27, 2018 34:24


    In this episode of the Data Show, I spoke with Sharad Goel, assistant professor at Stanford, and his student Sam Corbett-Davies. They recently wrote a survey paper, “A Critical Review of Fair Machine Learning,” where they carefully examined the standard statistical tools used to check for fairness in machine learning models. It turns out that […]

    Using machine learning to improve dialog flow in conversational applications

    Sep 13, 2018 45:07


    In this episode of the Data Show, I spoke with Alan Nichol, co-founder and CTO of Rasa, a startup that builds open source tools to help developers and product teams build conversational applications. About 18 months ago, there was tremendous excitement and hype surrounding chatbots, and while things have quieted lately, companies and developers continue […]

    Building accessible tools for large-scale computation and machine learning

    Aug 30, 2018 53:32


    In this episode of the Data Show, I spoke with Eric Jonas, a postdoc in the new Berkeley Center for Computational Imaging. Jonas is also affiliated with UC Berkeley’s RISE Lab. It was at a RISE Lab event that he first announced Pywren, a framework that lets data enthusiasts proficient with Python run existing code […]

    Simplifying machine learning lifecycle management

    Aug 16, 2018 37:25


    In this episode of the Data Show, I spoke with Harish Doddi, co-founder and CEO of Datatron, a startup focused on helping companies deploy and manage machine learning models. As companies move from machine learning prototypes to products and services, tools and best practices for productionizing and managing models are just starting to emerge. Today’s […]

    How privacy-preserving techniques can lead to more robust machine learning models

    Aug 2, 2018 36:43


    In this episode of the Data Show, I spoke with Chang Liu, applied research scientist at Georgian Partners. In a previous post, I highlighted early tools for privacy-preserving analytics, both for improving decision-making (business intelligence and analytics) and for enabling automation (machine learning). One of the tools I mentioned is an open source project for […]

    Specialized hardware for deep learning will unleash innovation

    Jul 19, 2018 41:18


    In this episode of the Data Show, I spoke with Andrew Feldman, founder and CEO of Cerebras Systems, a startup in the blossoming area of specialized hardware for machine learning. Since the release of AlexNet in 2012, we have seen an explosion in activity in machine learning, particularly in deep learning. A lot of the […]

    Data regulations and privacy discussions are still in the early stages

    Jul 5, 2018 33:19


    In this episode of the Data Show, I spoke with Aurélie Pols of Mind Your Privacy, one of my go-to resources when it comes to data privacy and data ethics. This interview took place at Strata Data London, a couple of days before the EU General Data Protection Regulation (GDPR) took effect. I wanted her […]

    Managing risk in machine learning models

    Jun 21, 2018 32:34


    In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer at Immuta, and Steven Touw, co-founder and CTO of Immuta. Burt recently co-authored a white paper on managing risk in machine learning models, and I wanted to sit down with them to discuss some of the proposals they put forward […]

    The real value of data requires a holistic view of the end-to-end data pipeline

    Jun 7, 2018 31:05


    In this episode of the Data Show, I spoke with Ashok Srivastava, senior vice president and chief data officer at Intuit. He has a strong science and engineering background, combined with years of applying machine learning and data science in industry. Prior to joining Intuit, he led the teams responsible for data and artificial intelligence […]

    The evolution of data science, data engineering, and AI

    May 24, 2018 30:14


    This episode of the Data Show marks our 100th episode. This podcast stemmed out of video interviews conducted at O’Reilly’s 2014 Foo Camp. We had a collection of friends who were key members of the data science and big data communities on hand and we decided to record short conversations with them. We originally conceived […]

    Companies in China are moving quickly to embrace AI technologies

    May 10, 2018 28:52


    In this episode of the Data Show, I spoke with Jason Dai, CTO of Big Data Technologies at Intel, and one of my co-chairs for the AI Conference in Beijing. I wanted to check in on the status of BigDL, specifically how companies have been using this deep learning library on top of Apache Spark, […]

    Teaching and implementing data science and AI in the enterprise

    Apr 26, 2018 38:46


    In this episode of the Data Show, I spoke with Jerry Overton, senior principal and distinguished technologist at DXC Technology. I wanted the perspective of someone who works across industries and with a variety of companies. I specifically wanted to explore the current state of data science and AI within companies and public sector agencies. […]

    The importance of transparency and user control in machine learning

    Apr 12, 2018 23:19


    In this episode of the Data Show, I spoke with Guillaume Chaslot, an ex-YouTube engineer and founder of AlgoTransparency, an organization dedicated to helping the public understand the profound impact algorithms have on our lives. We live in an age when many of our interactions with companies and services are governed by algorithms. At a […]

    What machine learning engineers need to know

    Mar 29, 2018 32:16


    In this episode of the Data Show, I spoke with Jesse Anderson, managing director of the Big Data Institute, and my colleague Paco Nathan, who recently became co-chair of Jupytercon. This conversation grew out of a recent email thread the three of us had on machine learning engineers, a new job role that LinkedIn recently pegged […]

    How to train and deploy deep learning at scale

    Mar 15, 2018 39:10


    In this episode of the Data Show, I spoke with Ameet Talwalkar, assistant professor of machine learning at CMU and co-founder of Determined AI. He was an early and key contributor to Spark MLlib and a member of AMPLab. Most recently, he helped conceive and organize the first edition of SysML, a new academic conference […]

    Using machine learning to monitor and optimize chatbots

    Mar 6, 2018 27:47


    In this episode of the Data Show, I spoke with Ofer Ronen, GM of Chatbase, a startup housed within Google’s Area 120. With tools for building chatbots becoming accessible, conversational interfaces are becoming more prevalent. As Ronen highlights in our conversation, chatbots are already enabling companies to automate many routine tasks (mainly in customer interaction). […]

    Unleashing the potential of reinforcement learning

    Mar 1, 2018 33:24


    In this episode of the Data Show, I spoke with Danny Lange, VP of AI and machine learning at Unity Technologies. Lange previously led data and machine learning teams at Microsoft, Amazon, and Uber, where his teams were responsible for building data science tools used by other developers and analysts within those companies. When I […]

    Graphs as the front end for machine learning

    Feb 15, 2018 45:13


    In this episode of the Data Show, I spoke with Leo Meyerovich, co-founder and CEO of Graphistry. Graphs have always been part of the big data revolution (think of the large graphs generated by the early social media startups). In recent months, I’ve come across companies releasing and using new tools for creating, storing, and […]

    Machine learning needs machine teaching

    Feb 1, 2018 45:12


    In this episode of the Data Show, I spoke with Mark Hammond, founder and CEO of Bonsai, a startup at the forefront of developing AI systems in industrial settings. While many articles have been written about developments in computer vision, speech recognition, and autonomous vehicles, I’m particularly excited about near-term applications of AI to manufacturing, […]

    How machine learning can be used to write more secure computer programs

    Jan 18, 2018 28:12


    In this episode of the Data Show, I spoke with Fabian Yamaguchi, chief scientist at ShiftLeft. His 2015 Ph.D. dissertation sketched out how the combination of static analysis, graph mining, and machine learning can be used to develop tools to augment security analysts. In a recent post, I argued for machine learning tools to augment […]

    Bringing AI into the enterprise

    Jan 4, 2018 44:13


    In this episode of the Data Show, I spoke with Kristian Hammond, chief scientist of Narrative Science and professor of EECS at Northwestern University. He has been at the forefront of helping companies understand the power, limitations, and disruptive potential of AI technologies and tools. In a previous post on machine learning, I listed types […]

    How machine learning will accelerate data management systems

    Dec 21, 2017 34:46


    In this episode of the Data Show, I spoke with Tim Kraska, associate professor of computer science at MIT. To take advantage of big data, we need scalable, fast, and efficient data management systems. Database administrators and users often find themselves tasked with building index structures (“indexes” in database parlance), which are needed to speed […]

