DataBytes


Data science, big data, artificial intelligence, machine learning… they’re all the rage. In this podcast, Jessi Cisewski-Kehe and Susan Wang, two statisticians, give you a perspective on what’s happening in the realm of all things data. Random bantering included. Support this podcast: https://ancho…

Jessi and Susan

  • Jan 24, 2020 LATEST EPISODE
  • infrequent NEW EPISODES
  • 20m AVG DURATION
  • 50 EPISODES



Latest episodes from DataBytes

#50: Extreme Classification: All You Need Is Some Hash (Functions)

Jan 24, 2020 21:42


In part 2 of this saga on extreme classification, we get into the weeds on how MACH is able to magically handle such massive classification problems. The title says it all -- hash functions are the magical ingredient. We provide a step-by-step view of how one might come up with the MACH algorithm from first principles. --- Send in a voice message: https://anchor.fm/databytes/message Support this podcast: https://anchor.fm/databytes/support

#49: Extreme Classification: Going at MACH Speed (Part 1)

Jan 17, 2020 16:40


In this episode, Dr. Derek Feng drops by to chat about a recent paper on a divide-and-conquer approach (Merged-Averaged Classifiers via Hashing) to massive classification problems. In part 1 (of 2 episodes), we describe the general problem that MACH solves and the strategy it takes, wherein the original large classification problem is broken down into smaller-sized classification problems. Next week in the second episode, we talk about more technical details of how the division of labor works, and why it works.
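As a rough illustration of the divide-and-conquer idea described above, here is a toy sketch. It is not the paper's implementation: the function names, the sizes `K`, `B`, and `R`, and the "idealized" small classifiers (which put all their probability on the correct bucket) are assumptions made for illustration. Each of `R` random hash tables maps the `K` original classes into `B` buckets, a small classifier predicts a bucket, and the per-class score is the average of the bucket probabilities across the `R` small problems.

```python
import random

def make_hashes(num_classes, num_buckets, num_repeats, seed=0):
    """One random class -> bucket table per repetition (stand-in for a hash function)."""
    rng = random.Random(seed)
    return [[rng.randrange(num_buckets) for _ in range(num_classes)]
            for _ in range(num_repeats)]

def merge_scores(bucket_probs, hashes):
    """Average, for each class, the probability its bucket received in every small problem."""
    num_repeats = len(hashes)
    num_classes = len(hashes[0])
    return [
        sum(bucket_probs[j][hashes[j][k]] for j in range(num_repeats)) / num_repeats
        for k in range(num_classes)
    ]

def predict(bucket_probs, hashes):
    """Return the class with the highest averaged score."""
    scores = merge_scores(bucket_probs, hashes)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy setup (hypothetical sizes): 1000 classes, 8 small problems with 32 buckets each.
K, B, R = 1000, 32, 8
hashes = make_hashes(K, B, R)
true_class = 123

# Idealized small classifiers: each one puts all its probability on the true class's bucket.
bucket_probs = []
for j in range(R):
    probs = [0.0] * B
    probs[hashes[j][true_class]] = 1.0
    bucket_probs.append(probs)

predicted = predict(bucket_probs, hashes)
print(predicted)
```

Each small problem alone cannot distinguish classes that share a bucket, but a wrong class would have to collide with the true class in all `R` tables to tie its averaged score, which becomes very unlikely as `R` grows.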

#48: Where Moneyball Meets Footy

Dec 13, 2019 17:17


We've long heard about the waves that statistics has made in baseball. But what about soccer? In this episode, we summarize a few applications of statistics in European football (or American soccer).

#47: Domoic Acid Testing -- A Crabshoot?

Nov 30, 2019 19:28


Domoic acid has plagued shellfish and other wildlife along the Pacific coastline in recent years. Testing for domoic acid concentration in crabs on a regular basis has become important for determining when crabs and their viscera can be safely consumed. Unlike many other common hypothesis tests, the setup used for domoic acid testing is based on the sample maximum rather than the sample mean. In this episode, we critique the testing methodology.

#46: Finding Your (Niche) Board Games

Nov 8, 2019 13:18


In this episode, we discuss how two statisticians used data from BoardGameGeek.com to put together their own board game recommendation engine, specifically designed to stay away from mainstream recommendations.

#45: Learning Publicly, with Private Data

Nov 1, 2019 16:45


In this episode, Dr. Derek Feng discusses the general issue of data privacy in the age of big data, including topics of differential privacy and federated learning.

#44: A Conversation with Jon Krohn

Oct 25, 2019 33:37


We sit down with Dr. Jon Krohn to chat about his work as a Chief Data Scientist at untapt, his newly published bestseller "Deep Learning Illustrated", and his teaching/research.

#43: To Google and Back

Oct 4, 2019 30:19


In this episode, Professor Albert Y. Kim of Smith College describes his post-PhD journey, which included a stint at Google Adwords before academic posts at Reed College, Middlebury College, Amherst College, and Smith College.

#42: Black in the Box

Sep 27, 2019 23:28


Dr. Derek Feng joins us again to discuss the two metrics by which we align all statistical/machine learning methods -- interpretability versus predictive ability. In a world where black box methods reign supreme, what does learning mean?

#41: What to do with Outliers

Sep 20, 2019 22:51


Guest Dylan O'Connell joins us today to talk about a recent surprising but legitimate Democratic primary poll result from Monmouth University. We discuss different perspectives on how to approach a data point that doesn't fit in with the others.

#40: Making a DIY ML-Controlled Cat Door

Sep 13, 2019 11:27


Outdoor-cat owners know all too well the unpleasantries of dealing with what the cat dragged in. A self-proclaimed machine learning novice proves that you don't need to be a pro to set up a smart cat door that prevents the cat from bringing prey into your home.

#39: Rolling in the Deep Patient

Sep 6, 2019 26:31


We take a deep dive into the poster child for black-box machine learning methods, namely Deep Patient: an unsupervised learning method that uses denoising auto-encoders as the means for extracting salient features in electronic health records, which in turn can then be used to predict health outcomes. We do our best to explain what on earth the previous sentence meant.

#38: The Misuse of Statistics in Court

Aug 30, 2019 11:34


In this episode, we talk about how a statistical concept that you would learn about in an introductory course was misused in court. The error led to dire consequences in the case of Sally Clark, who was charged in the deaths of two of her children.

#37: Susan Starts a New Job

Aug 23, 2019 14:57


In this episode, we talk about Susan's new job as a Data Scientist! She recently transitioned from academia to industry and we discuss her experience with searching for positions, interviewing, and her first few weeks in her new role.

#36: What's New in Machine Learning Startups

Aug 16, 2019 11:16


In this episode, we talk about some machine learning startups to pay attention to this year.

#35: You Look How You Sound

Aug 9, 2019 14:21


Deep learning has been useful for lots of applications when it comes to prediction. Yet another is the use of a short sound clip of speech to predict the face of the speaker.

#34: Protecting Kids' Digital Privacy

Aug 2, 2019 11:18


In this episode, we talk about protecting kids' digital privacy.

#33: Statisticians Hate Post-Hoc Power

Jul 26, 2019 9:58


Statistics is key to demonstrating the effectiveness of new advancements in science and medicine, but when statistical significance is not achieved, is post-hoc power a valid justification?

#32: Amazon's 3D Body Scan Study

Jul 20, 2019 13:49


In this episode, we talk about Amazon's 3D body scan study.

#31: What Data Visualizations Do You Care About? It's Personal

Jul 12, 2019 13:15


In this episode, we talk about how data are personal for those in a rural Pennsylvania community.

#30: Some Like It Hot -- What Gender Reveals About Our Temperature Preferences

Jul 5, 2019 11:19


Word on the street is that women prefer warmer temperatures than men do. Researchers designed an experiment to investigate whether this is actually true, specifically, considering how men and women perform on various cognitive tasks under different temperature scenarios. In this episode, we dissect the study so you can judge whether you believe the results.

#29: Jeopardy! Meets Statistics

Jun 28, 2019 9:28


Jeopardy! is a weeknightly televised trivia game show. In recent months, one player, James Holzhauer, has taken the Jeopardy! fandom by storm with his unusual style of play and his long run of big wins. In this episode, we discuss how statistics can help explain his betting tactics, and we discuss how some other Jeopardy! players have used statistics to help up their game.

#28: Facial Recognition Technology Update and Rating Trustworthiness of AI-Generated Airbnb Profiles

Jun 21, 2019 27:27


In this episode, we discuss a number of miscellaneous news updates regarding facial recognition technology (concerning San Francisco, Amazon, and pandas!). And then, we talk about how much we trust AI-generated profiles for Airbnb.

#27: Does Uber/Lyft Help Or Hurt Traffic Congestion and Machine Learning Interpretability

Jun 14, 2019 27:12


In this episode, we look at a study about whether ride-sharing services contribute to increased or decreased traffic congestion in San Francisco. We then discuss some strategies to build interpretable machine learning models.

#26: Household Electronics That See and Google's Reservation AI

Jun 7, 2019 23:57


In this episode, we talk about a new innovation that enables household electronics to see what's around them. We then discuss Google Duplex, an AI designed to happily make reservations and appointments for you.

#25: DataFest 2019 and Measuring Migrations from Hurricane Maria

May 31, 2019 19:54


Susan recently served as a judge at a local DataFest competition (a weekend-long data competition for undergraduates). She shares her experiences and recommendations for future contestants. We then discuss how Facebook data might be helpful for counting the number of people who migrated from Puerto Rico to the mainland U.S. as a result of Hurricane Maria.

#24: Predictive Power of Early Polling and Did a TV Show Result in Higher Teenage Suicides?

May 24, 2019 22:41


In this episode, we discuss FiveThirtyEight.com's analysis of primary election polling over the past 40 years. In particular, we consider whether early polling is helpful for predicting election outcomes. And then, we talk about a study that potentially blames Netflix for a surge in teenage suicides in 2017.

#23: Offline Song Identification and Perceptions about AI

May 17, 2019 21:29


In this episode, we discuss how Google's Now Playing feature can identify songs that are playing around you, using embeddings. We then talk about a study that reports on America's perceptions about artificial intelligence -- who can we trust to develop AI responsibly?

#22: Betting on the Game of Thrones and the Misfortune of Lefthandedness

May 10, 2019 19:53


In this episode, we discuss how bookmakers price/take bets on outcomes in the Game of Thrones. We then discuss a study that claimed that lefthanded people have shorter life expectancies than righthanded people. Spoiler alert: lefthanders have nothing to worry about!

#21: Pitch Call Accuracy and Predicting the Outcome of the Champions League

May 3, 2019 28:32


Buckle up for a sports-filled episode! We discuss a study that analyzes the accuracy of umpire calls about strikes vs. balls and take a deep dive into FiveThirtyEight.com's statistical methods for predicting the winner of the Champions League.

#20: Thinking Like Computers and Text Mining the Mueller Report

Apr 26, 2019 21:44


In this episode, we discuss a study that recruits human researchers to try to predict how computers classify images. We then highlight a number of examples of natural language processing techniques applied to the Mueller Report.

#19: Seeing with AI and Detecting Exoplanets

Apr 19, 2019 26:01


In this episode, we discuss Microsoft's handy phone application for scanning and reporting on our surroundings, as a way of helping vision-impaired individuals better interact with the world around them. We then talk about how AI can be useful in detecting exoplanets (or extrasolar planets).

#18: Statistical Anxiety and the Fight Against Statistical Significance

Apr 12, 2019 28:43


We discuss a survey designed to analyze the extent and root cause of statistical anxiety in the classroom, discussing the methods/limitations of the study. We then talk about yet another crusade against hypothesis testing, this time around the concept of "statistical significance".

#17: How Theranos Sinned Statistically

Apr 5, 2019 18:49


In this episode, Susan Wang is joined by guest Natalie Doss to consider the statistical sins committed by Theranos, the former blood testing unicorn. From arbitrary data manipulation to inappropriate data aggregation, we discuss what they did and why these practices were particularly bad. Then, we weigh in on how Theranos could have done worse, making it harder for the public to find out about their faulty tests.

#16: Machine-Generated Faces & Text, and Relating Health Outcomes to Skin Color

Mar 15, 2019 38:25


We discuss NVIDIA's AI-generated faces that look incredibly authentic, and relatedly, OpenAI's text generator that is so capable that it has to be kept under wraps. We then assess the study design of a recent research article that considered how health outcomes vary amongst African Americans of different skin tones.

#15: Deep Learning to Fold Proteins and Automated Journalism

Mar 8, 2019 18:38


We discuss opportunities for machines and humans in the prediction of protein structures, a necessary task in new drug discovery. Google's DeepMind has taken the prize in the recent iteration of CASP, a protein folding prediction challenge. We also discuss how AI has begun to revolutionize journalism.

#14: A Personality Test that Makes Sense and What Does Spotify Know?

Mar 1, 2019 15:22


538 has provided a free, online personality test that might make more sense than your typical online clickbaity quiz. We talk about why it calls itself the only personality test that isn't junk science. We then discuss the results of a recent study on Spotify data. Does it know too much about us (and you)? We'll let you know.

#13: IBM's Debate Machine and Adopting a 'Data Culture' in Companies

Feb 22, 2019 24:55


On February 11, IBM showcased its Project Debater in a face-off against debate champion Harish Natarajan. We talk about how this machine vs. human competition went. Then, we discuss a Harvard Business Review article citing a survey that discovered companies are not becoming data-oriented quickly enough.

#12: Super Bowl Stats, Confidence Intervals, and Data Sources

Feb 15, 2019 29:00


Three topics are featured in this episode: first, statistics about Super Bowl LIII, including what was in the bowls as the game happened; second, a fun activity for teaching confidence intervals; finally, we present some online sources for data.

#11: How Machines Might be Biased and the Job Market for Data Scientists

Feb 8, 2019 12:30


AI and ML algorithms are growing popular -- but they can actually perpetuate cognitive biases in our daily lives. We discuss the state of the problem and possible solutions. We also present a favorable job outlook for aspiring (or continuing!) data scientists.

#10: AI in Medicine and Racial Bias in College Admissions

Feb 1, 2019 24:06


Artificial intelligence is starting to make waves in medicine; we look at how technology might potentially change how medical testing works. We also bring in some statistical reasoning in the debate of whether or not there is racial discrimination in Harvard's college admissions process.

#9: Lessons Learned from Making a Fitbit Data Visualization Shiny App

Jan 24, 2019 22:35


Dynamic data visualization widgets can be pretty cool, but it takes more than just statistical chops to build an online visualization app that supports input data from users. In this episode, we describe the journey that Susan took to build a visualization app for Fitbit data. The app can be found at http://fitbitvizwiz.ddns.net.

#8: The French Revolution and the Challenge of Reproducibility

Jan 17, 2019 21:54


What can machine learning tell us about the French Revolution? This episode offers a brief history lesson in the digital humanities. Then, why do we constantly hear the word 'reproducibility' in the context of scientific research? We'll explore what it means and why it keeps coming up.

#7: The Virtual Maestro and the Most Influential Movie

Jan 10, 2019 18:56


Have you ever wanted to try your hand at conducting an orchestra? Now you can, with Google's Semi-Conductor online app. We'll talk about how this browser-based app analyzes your body positions in real time to translate your actions into Mozart music. We also talk about a way to use network analysis to determine the most influential movie ever made to date. Be sure to tune in to find out which movie takes the prize.

#6: Probability Games and Amazon's Own Self-Driving Car

Jan 3, 2019 18:27


What are the odds that a toss of a 10-sided die, followed by a toss of a 20-sided die, and then a toss of a 30-sided die land in increasing order? If you know the answer within a few seconds, you might have an edge in Borel, a game that is all about probability. We'll also talk about DeepRacer, Amazon's soon-to-come programmable self-driving car.
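For the curious, the dice question above can be checked by brute force. The sketch below assumes fair dice and reads "increasing order" as strictly increasing (ties do not count); the function name is made up for illustration. It enumerates all 10 × 20 × 30 = 6000 equally likely outcomes and counts the favorable ones.

```python
from fractions import Fraction
from itertools import product

def prob_strictly_increasing(sides=(10, 20, 30)):
    """Exact probability that fair dice, rolled in the given order, are strictly increasing."""
    favorable = 0
    total = 0
    # Enumerate every equally likely combination of rolls.
    for rolls in product(*(range(1, s + 1) for s in sides)):
        total += 1
        if all(a < b for a, b in zip(rolls, rolls[1:])):
            favorable += 1
    return Fraction(favorable, total)

p = prob_strictly_increasing()
print(p)  # 247/600, a little over 41%
```

If ties were allowed instead, the comparison `a < b` would become `a <= b` and the answer would be somewhat larger.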

#5: The Do's and Don'ts of Data Visualization

Dec 27, 2018 21:46


Data visualization is an integral precursor to data analysis, providing a way to visually inspect the data for surprising trends and uncover potential errors in variable coding. In this episode, we cover some guiding principles of data visualization. Our accompanying blog post contains links to examples.

#4: Meet the Co-hosts (Part 2)

Dec 20, 2018 26:51


This week, we learn about Jessi Cisewski-Kehe's background to find out how she went from a Math major to an actuarial analyst, then to grad school in statistics, followed by a three-year visiting assistant professor position at Carnegie Mellon where she got into Astrostatistics, and finally to her current position as an assistant professor at Yale.

#3: Meet the Co-hosts (Part 1)

Dec 11, 2018 28:00


This week, we learn about Susan Wang's background to find out how she went from an Applied Math major to actuarial consulting, then to a weather derivatives start-up firm, then to grad school in statistics, finally landing at Yale as a lecturer.

#2: Biometric Technology at Airports, Google Smart Replies, Bestselling Books

Dec 3, 2018 26:35


In this episode, we discuss biometric technology used at airports, Google Smart Replies (and letting AI compose our emails/texts for us), and an analysis of New York Times Bestsellers list data.

#1: Thanksgiving, College Football, International Prize in Statistics

Nov 28, 2018 13:29


This is the first episode of the DataBytes Podcast, where we discuss popular topics related to data, statistics, data science, machine learning, and artificial intelligence. In this episode, we discuss Thanksgiving food, the College Football Playoff selection, and the winner of the International Prize in Statistics.
