"While it is wise to learn from experiences, it is wiser to learn from the experience of others." -- Rick Warren Welcome to CDO battlescars — a podcast series where we talk to data leaders across data engineering, analytics, data science on the challenges they encountered in their journey of transforming raw data into insights! I am your host Sandeep Uttamchandani. The motivation of this podcast series is to give back to the data community the hard-learned lessons that I and my peer data leaders have gathered over the years! We appreciate your comments and feedback!
Democratize is not one big strategy but 100s of small things that you put in place. Quicktake is a series of short practical tips to get you started towards making data and AI widely accessible and self-service within your organization. This tip covers an important myth: to improve model accuracy, start by verifying the correctness of labels. Typically, only a small percentage of mispredictions are related to wrong labels. Often, the biggest reason for model inaccuracies is the poor quality of data samples. Train the team so that, instead of jumping to fix incorrect labels, they start by analyzing a sample of misclassified results and then make a judgment call on whether going back to fix the labels is worth the investment.
Democratize is not one big strategy but 100s of small things that you put in place. Quicktake is a series of short practical tips to get you started towards making data and AI widely accessible and self-service within your organization. This tip covers 3 recipes on handling misclassified ML predictions within your product.
In this episode, my guest is Daniel Mccaffery. Daniel is a technology thought leader driving Data and Analytics at Climate Corporation (a division of Bayer). Daniel shares his insights on using ML to provide personalized recommendations that help farmers grow crops with higher yield, profitability, and sustainability. This involves deciding the right seed, the right crop protection, density levels across different parts of the farm, etc. This is a fascinating example of AI and physical sciences coming together to build an innovative product offering. Daniel and I had a blast covering several topics: the building of models, model deployment and re-training, explainability for farmers to understand the recommendations, managing bias, experimentation and A/B testing, monitoring drifts, data labeling, and perspectives on key bottlenecks in going from idea to ROI.
In this episode, I talk to Samir Boualla, CDO at ING Bank France. We cover his battlescars in two areas: 1) Driving Data Governance across the business teams (including Data Literacy & Data Protection); 2) Building internal and external-facing data products. At ING Bank France, Samir is the Chief Data Officer responsible for several teams governing, developing, and managing data infrastructure and data assets to deliver value to the business. With 20+ years of experience on various data topics, Samir shares interesting battle-tested techniques in this podcast: a process catalog, a "data minimum standard," a change management mindset, and applying transfer learning.
Part 2 of my chat with Kapil Surlaker, VP of Engg and Head of Data at LinkedIn. We cover the topic of managing data quality at LinkedIn scale! Kapil has 20+ years of experience in data and infrastructure, both at large companies such as Oracle and at multiple startups. At LinkedIn, Kapil has been leading the next generation of Big Data Infrastructure, Platforms, Tools, and Applications to empower Data Scientists, AI engineers, and App developers to extract value from data. Kapil's team has been at the forefront of innovation, driving multiple open source initiatives such as Apache Pinot, Gobblin, and DataHub.
In this episode, I talk to Kapil Surlaker, VP of Engg and Head of Data at LinkedIn. We cover his battlescars related to Data Management. The episode is divided into two parts. In Part 1 (this episode), we cover challenges related to metadata management and data access APIs. In the next part, we deep-dive into data quality. Kapil has 20+ years of experience in data and infrastructure, both at large companies such as Oracle and at multiple startups. At LinkedIn, Kapil has been leading the next generation of Big Data Infrastructure, Platforms, Tools, and Applications to empower Data Scientists, AI engineers, and App developers to extract value from data. Kapil's team has been at the forefront of innovation, driving multiple open source initiatives such as Apache Pinot, Gobblin, and DataHub.
In this episode, I talk to Chu-Cheng, CDO at Etsy. We cover his battlescars related to recruiting and building a Data Science team. At Etsy, Chu-Cheng leads the global data organization responsible for data science strategy, AI innovation, machine learning & data infrastructure. Prior to Etsy, Chu-Cheng led various data roles at Amazon, Intuit, Rakuten, and eBay. Chu-Cheng holds a Ph.D. in computer science and has published papers in key AI/ML conferences.
In this episode, I talk to Manish Chitnis, CDO at Partner Fund Management (PFM). Manish has 20+ years of diverse multi-disciplinary experience across a wide range of analytics: architecting data warehouses from scratch, building risk/data apps, introducing new data architectures, instituting data governance/stewardship, data hygiene and cleanup, improving data collection, and much more. We cover his battlescars in applying Data Science to the traditional market data analysis domain.
In this podcast, I talk to Meenal Iyer from Tailored Brands. Meenal brings 20+ years of data analytics experience across multiple domains, namely retail, travel, and financial services. Meenal has been transforming enterprises to become data-driven, and shares interesting domain-agnostic lessons from her experience. We cover two areas of battlescars in this podcast: 1) Growing data literacy and a data-driven culture; 2) Standardization of business metrics.
In this episode, I talk to Keyur Desai, the former CDO of TD Ameritrade. We discuss battlescars in two areas: Building a Data Strategy & Pervasive Self-service analytics platform. Keyur shares some really valuable lessons based on his extensive experience. Keyur is a data executive with over 30 years of experience managing and monetizing data and analytics. He has created data-driven organizations and driven enterprise-wide data strategies, data literacy, modern data governance, machine learning & data science, pervasive self-service analytics, and several other initiatives. He has experience across multiple industries including Insurance, Technology, Healthcare, and Retail.
In this episode, I talk to Anil Madan at Intuit on battlescars in two areas: Standardization of business metrics and Democratization of Experimentation within the enterprise. Anil is the VP of Data and Analytics for Intuit’s Small Business and Self Employed group. He has over 25 years of experience in the data space across Intuit, PayPal, and eBay, and is a pioneer in building data infrastructure and value creation in the form of products, experimentation, data advertising, digital marketing, payments, and many more.