Claudia Perlich joins us to discuss her work at one of the world’s largest hedge funds and how she got to work there, as well as her history of winning data science competitions.
In this episode you will learn:
• Life and work during the pandemic [2:23]
• Claudia’s history with horses and riding [8:28]
• Claudia’s work at Two Sigma [12:00]
• Claudia’s role on a daily basis [20:51]
• Tools of the trade [30:27]
• What Claudia looks for when hiring [36:37]
• What skills do future hires need? [40:32]
• Claudia’s history with data science competitions [48:22]
• Why work in finance and at Two Sigma? [1:00:19]
Additional materials: www.superdatascience.com/437
Claudia Perlich is Chief Scientist at Dstillery, where she designs, develops, analyzes and optimizes the machine learning algorithms that drive digital advertising. She speaks with Hugo about the role of data science in online advertising, the predictability of humans, how her team builds real-time bidding algorithms and detects bots online, and the ethical implications of these evolving practices.
Claudia Perlich is a professional in the growing field of data science. With more than 50 published scientific articles, she is a widely acclaimed expert on big data and machine learning applications. Claudia is a past winner of the Advertising Research Foundation’s (ARF) Grand Innovation Award and has been selected for Crain’s New York’s 40 Under 40 list, Wired Magazine’s Smart List, and Fast Company’s 100 Most Creative People. Claudia holds multiple patents in machine learning and has worked at IBM’s Watson Research Center, focusing on data analytics and machine learning. She is currently the Chief Scientist at a startup company called Dstillery. She holds a PhD in Information Systems from New York University (where she continues to teach at the Stern School of Business), and an MA in Computer Science from the University of Colorado. If you want to stay up-to-date on future episodes or you want access to our Spotify Power Playlist, sign up at www.aprilseifert.com
Ad tech and ethics!
This week features an insightful discussion with Claudia Perlich about situations in machine learning where models can be built, perhaps by well-intentioned practitioners, that appear highly predictive despite being trained on random data. We cover some novel observations about ROC and AUC, as well as leakage. Much of our discussion is inspired by two excellent papers Claudia authored: "Leakage in Data Mining: Formulation, Detection, and Avoidance" and "On Cross Validation and Stacking: Building Seemingly Predictive Models on Random Data". Both are highly recommended reading!
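The constructions in the papers are not reproduced here. As a rough illustration of the general effect, the Python sketch below shows a simpler, well-known pitfall: picking the best of many pure-noise features and scoring it on the same rows used to pick it. The row and feature counts, the seed, and all names are assumptions made for this example.

```python
import random

def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)  # arbitrary seed, for repeatability only

# Assumed toy setup: 40 rows, 200 features, everything pure noise.
n_rows, n_features = 40, 200
labels = [i % 2 for i in range(n_rows)]  # balanced labels, unrelated to the features
features = [[rng.random() for _ in range(n_rows)] for _ in range(n_features)]

# Pitfall: choose the "best" feature and score it on the very same rows.
selected_auc = max(auc(column, labels) for column in features)

# Honest check: every feature is noise, so new rows for the selected feature
# are simply new random draws; its AUC there falls back toward 0.5.
fresh_labels = [i % 2 for i in range(1000)]
fresh_scores = [rng.random() for _ in range(1000)]
fresh_auc = auc(fresh_scores, fresh_labels)

print(f"AUC of selected noise feature, same rows: {selected_auc:.2f}")
print(f"AUC on fresh rows: {fresh_auc:.2f}")
```

The gap between the two numbers comes entirely from selecting and scoring on the same data, since no feature carries any signal.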
Supervised machine learning assumes that the features and labels used for building a classifier are isolated from each other--basically, that you can't cheat by peeking. Turns out this can be easier said than done. In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way. Relevant links: https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf
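To make the idea of label information contaminating a feature concrete, here is a minimal Python sketch on synthetic data. The "leaky" feature is a hypothetical field that is only filled in after the outcome is known, so it is built directly from the label. The data, the noise level, and all names are assumptions for illustration and are not taken from the episode or the paper.

```python
import random

def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)  # arbitrary seed, for repeatability only
n_rows = 1000
labels = [rng.randint(0, 1) for _ in range(n_rows)]

# A feature with no relationship to the label: pure noise.
noise_feature = [rng.random() for _ in range(n_rows)]

# Hypothetical leaky feature: recorded after the outcome, so it is the
# label itself plus a little measurement noise.
leaky_feature = [y + rng.gauss(0, 0.3) for y in labels]

auc_noise = auc(noise_feature, labels)
auc_leaky = auc(leaky_feature, labels)

print(f"AUC using the noise feature: {auc_noise:.2f}")
print(f"AUC using the leaky feature: {auc_leaky:.2f}")
```

A model given the leaky field would look excellent in evaluation and then fail in deployment, where that field does not exist yet at prediction time.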
In episode thirteen we talk with Claudia Perlich, Chief Scientist at Dstillery. We talk about her work using machine learning in digital advertising and her approach to data in competitions. We take a look at information leakage in competitions after the ImageNet Challenge this year. The New York Times covered the events, and Neil Lawrence has been writing thoughtfully about it and its impact. Plus, we take a listener question about trends in data size.
How can online merchants learn more about their potential customers by mining the data surrounding social media – without violating strict privacy rules? Claudia Perlich of Media6Degrees explains how she and her colleagues zero in on individual customers in projects that have benefited Netflix, IBM, and healthcare providers.