Datacast


Datacast unpacks the narrative journeys of data scientists, machine learning engineers, and academic researchers, and the careers they have built.

James Le


    • Latest episode: Jan 4, 2023
    • New episodes: every other week
    • Average duration: 1h 6m
    • Episodes: 106



    Latest episodes from Datacast

    Episode 106: Advancing AI Adoption with Dânia Meira

    Jan 4, 2023 | 69:00


    Show Notes
    • (01:32) Dânia shared her upbringing in Brazil and her college experience studying Applied Mathematics at the University of Campinas.
    • (05:58) Dânia touched on her early career working in marketing intelligence in Brazil.
    • (10:38) Dânia described her thesis on scalable implementations of the Alternating Least Squares algorithm for Collaborative Filtering recommendation, conducted during her Master's degree in Computer Science from the University of Fluminense.
    • (16:10) Dânia recalled her hustling phase, working and getting a Master's degree simultaneously.
    • (24:19) Dânia reflected on her move to Berlin to work as a data scientist in several startups.
    • (31:00) Dânia looked back at her time working at MYTOYS GROUP's Analytics team, responsible for Predictive Analytics and Machine Learning Modeling.
    • (34:12) Dânia compared doing data science to practicing mixed martial arts.
    • (38:35) Dânia reflected on her involvement with Data Science for Social Good Berlin as a data ambassador and Data Science Retreat as a SQL Masterclass Teacher.
    • (43:14) Dânia shared the founding story of AI Guild - the go-to community for data and business professionals advancing AI adoption - where she is a founding member.
    • (47:36) Dânia gave her thoughts on barriers preventing more women from entering the data field.
    • (51:21) Dânia discussed the #datalift initiative, which pushes to productionize more data analytics and machine learning solutions.
    • (58:27) Dânia explained her work supporting the advancement of #datacareer talents and experts.
    • (01:01:22) Dânia gave her take on the evolution of the data field over the past decade.
    • (01:03:16) Closing segment.

    Dânia's Contact Info
    LinkedIn | Twitter | Website | GitHub | Medium

    AI Guild's Resources
    • Website | LinkedIn | YouTube
    • Join As A Member
    • #datalift
    • #datacareer

    Mentioned Content
    People
    • Andrew Ng: Founder of deeplearning.ai, co-founder of Coursera
    • Alessandra Sala: President of Women in AI, Sr. Director of Artificial Intelligence and Data Science at Shutterstock
    • Joy Buolamwini: Founder and Executive Director of The Algorithmic Justice League, featured in the "Coded Bias" documentary, available on Netflix
    Book
    • Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

    About the show
    Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths, from scientists and analysts to founders and investors, to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

    Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

    Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:
    Listen on Spotify | Listen on Apple Podcasts | Listen on Google Podcasts

    If you're new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

    Episode 105: Building The Next-Generation Spreadsheet, Being A Curious Analyst, and Engineering Entrepreneurship with Bobby Pinero

    Dec 20, 2022 | 64:55


    Show Notes
    • (01:33) Bobby shared his upbringing in DC and high-school experience at St. Albans School.
    • (04:10) Bobby described his academic experience at Stanford studying Management Science and Engineering.
    • (07:39) Bobby recalled valuable career lessons learned working as a Finance Analyst at IBM and Inflection.
    • (09:56) Bobby reflected on his rationale for joining Intercom as one of the company's early employees right after its Series A financing in 2013.
    • (14:16) Bobby unpacked his 2016 talk "Scaling Analytics at Intercom," which explained the analytics journey at Intercom.
    • (18:46) Bobby shared a few metrics that are fundamental to the health of a startup across its growth stages (read his Intercom blog post about the data points that startups should measure).
    • (22:50) Bobby shared the founding story of Equals.
    • (27:33) Bobby explained his decision to choose Ben McRedmond as his co-founder.
    • (29:35) Bobby expanded on the appealing traits of using spreadsheets.
    • (31:54) Bobby described the evolution of spreadsheet-like products and how the Equals product works at a high level.
    • (34:35) Bobby gave his take on how the concept of a next-generation spreadsheet fits into the quickly evolving modern data stack.
    • (38:31) Bobby shared valuable hiring lessons to attract the right people who are excited about Equals' mission.
    • (44:34) Bobby shared the challenges of finding Equals' early design partners and lighthouse customers.
    • (47:17) Bobby recapped key lessons about hiring financial analysts at Intercom.
    • (51:45) Bobby shared advice for a smart, driven finance operator looking to gain more influence within a startup environment.
    • (56:26) Bobby emphasized the valuable skills from his analyst career that serve his current founder journey.
    • (58:45) Closing segment.

    Bobby's Contact Info
    LinkedIn | Twitter

    Equals Resources
    • Website | Twitter | LinkedIn
    • Spreadsheet Templates
    • Insights In Action interview series
    • Introducing Pivot Tables for Equals (Aug 2022)
    • Equals raises $16M Series A from a16z to replace Excel (Nov 2022)
    • Equals is hiring across Engineering, Design, and Growth, and for an Executive Assistant. Reach out to Bobby if you are interested!

    Mentioned Content
    Articles + Talk
    • 23 SaaS Metrics for Fundraising + Optimization (March 2015)
    • Scaling Analytics at Intercom (Intercom Analytics Meetup, April 2016)
    • Data Points: What Should Your Startup Measure? (Oct 2017)
    • Every analyst is a finance analyst (May 2021)
    • The only question that matters when interviewing analysts (May 2021)
    • When to make your first finance hire (May 2021)
    • The hardest leap to make as a scaling finance leader (June 2021)
    • Finance and describing product-market fit (Sep 2021)
    • The curious analyst (Sep 2021)
    • The less lonely finance leader (Sep 2021)
    • Why every scaling finance team is understaffed (Nov 2021)
    • Revenue is the best North Star metric (March 2022)
    People
    • Karen Church (VP of Research and Data Science at Intercom, Founder of HER+Data)
    • Noah Goodman (President at DataCRT)
    • Peter Fishman (Co-Founder of Mozart Data)

    Episode 104: Streamlining Machine Learning In Production with Ran Romano

    Dec 9, 2022 | 59:04


    Show Notes
    • (01:34) Ran reflected on his time working as a Technical Product Manager in Israeli army intelligence.
    • (04:07) Ran recalled his favorite classes on Machine Learning and Computer Graphics during his education in Computer Science at Reichman University.
    • (05:24) Ran talked about a valuable lesson learned as a Software Engineer at VMware's Cloud Provider Software Business Unit.
    • (08:07) Ran shared his thoughts on how engineers could be more impactful in startup organizations.
    • (09:52) Ran talked about his decision to join Wix.com to work as a software engineer focusing on data infrastructure.
    • (12:48) Ran explained the motivation for building Wix's internal ML platform, designed to address the end-to-end ML workflow.
    • (16:48) Ran discussed the main components of Wix's ML platform: feature store, CI/CD mechanism, UI management console, and API prediction service.
    • (18:51) Ran unpacked the virtual feature store and the CI/CD components of Wix's ML platform.
    • (24:41) Ran expanded on the distinction between virtual and materialized feature stores.
    • (27:01) Ran provided three key lessons for organizations looking to build an internal ML platform (as covered in his 2020 talk on Wix's ML platform).
    • (31:43) Ran shared the essential attributes of exceptional data and ML engineering talent.
    • (33:54) Ran shared the founding story of Qwak, which aims to build an end-to-end ML engineering platform to automate MLOps processes.
    • (37:07) Ran talked about his responsibilities as the VP of Engineering at Qwak.
    • (38:45) Ran dissected the key capabilities that are baked into the Qwak platform - a Build System, a Serving layer, a Data Lake, a Feature Store, and Automation capabilities.
    • (44:05) Ran explained the big engineering challenges for teams building an in-house feature store and envisioned the future of the feature store ecosystem in the upcoming years.
    • (47:45) Ran shared valuable hiring lessons to attract the right people who are excited about Qwak's mission.
    • (50:22) Ran reflected on the challenges for Qwak in finding its early design partners.
    • (52:43) Ran described the state of the ML Engineering community in Israel.
    • (54:53) Closing segment.

    Ran's Contact Info
    LinkedIn

    Qwak's Resources
    • Website | Twitter | LinkedIn
    • Why Qwak
    • Blog

    Mentioned Content
    Talks
    • "Overview of Wix's Machine Learning Platform" (2020)
    • "Feature Stores - Unified Data Pipelines for ML" (2022)
    People
    • Andrew Ng
    • Matei Zaharia
    • Barr Moses
    Book
    • "Principles" (by Ray Dalio)

    Episode 103: Computational Economics, Statistical Arbitrage, and Adaptable Data Consolidation with Eric Daimler

    Nov 28, 2022 | 62:32


    Show Notes
    • (02:15) Eric reflected on his early interest in computer science and his decision to study at Carnegie Mellon University in the early 90s.
    • (05:40) Eric recalled his academic and overall college experience, emphasizing the importance of the people he was surrounded with.
    • (08:22) Eric talked about his time working as a quant analyst early in his career, the moment he encountered the birth of the Mosaic browser, and his decision to join the tech industry.
    • (13:01) Eric imparted wisdom learned from venture investing during the dot-com boom.
    • (18:02) Eric talked about the next phase of his academic career - earning a Ph.D. in Computer Science from Carnegie Mellon and dropping out of a Ph.D. program at Stanford.
    • (21:06) Eric discussed his academic research on Computational Economics for corporate malfeasance during his time as a Ph.D. student.
    • (27:39) Eric shared different initiatives he worked on with Carnegie Mellon University - serving as the Assistant Dean and Assistant Professor of Software Engineering, launching CMU's Silicon Valley Campus, and founding CMU's Entrepreneurial Management program.
    • (31:54) Eric described his journey in founding Hg Analytics, a hedge fund focused on statistical arbitrage, alongside other CMU Computer Science PhDs.
    • (37:36) Eric revisited his passion for AI and robotics, which eventually led to serving as a Presidential Innovation Fellow during the Obama Administration with the White House Office of Science and Technology Policy.
    • (42:54) Eric shared his perspective on the role of AI in geopolitics and highlighted the challenges with data integration.
    • (47:29) Eric explained his company Conexus, which develops technology spun out of MIT's Mathematics department, using a branch of math called Category Theory.
    • (50:55) Eric went over a customer case study that uses Conexus's solution to guarantee the semantics of data integrity during data transformation.
    • (54:20) Eric showed his enthusiasm for the concept of data relationships.
    • (56:59) Eric provided a sneak peek of his forthcoming book, "The Coming Composability: The roadmap for using technology to solve society's biggest problems."
    • (58:38) Closing segment.

    Eric's Contact Info
    Twitter | LinkedIn

    Conexus' Resources
    Website | Resources

    Mentioned Content
    People
    • Kai-Fu Lee
    • Andrew Ng
    • Eric Xing
    Book
    • "ReCulturing: Design Your Company Culture to Connect with Strategy and Purpose for Lasting Success" (by Melissa Daimler)

    Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.

    Episode 102: Early-Stage Investing, Modern Venture Capital, and Trends in Enterprise Infrastructure with Astasia Myers

    Nov 23, 2022 | 77:21


    Show Notes
    • (01:56) Astasia shared her childhood growing up in Silicon Valley.
    • (05:12) Astasia reflected on her undergraduate education at Stanford - studying Political Science and International Relations.
    • (06:35) Astasia discussed her research at Stanford's Graduate School of Business with Professor Condoleezza Rice on a case study called "San Leon Energy: Hydraulic Fracturing in Poland" - which explores how to manage the political risks of using a controversial energy extraction technology in the European Union.
    • (09:26) Astasia talked about her year in the UK getting a Master's in Technology Policy at the University of Cambridge's Judge Business School.
    • (12:52) Astasia recalled her experience as an Equity Research Analyst at Baird and Co.
    • (17:49) Astasia mentioned her work at Cisco Investments, driving their cloud-infrastructure M&A and venture investments.
    • (20:58) Astasia shared her thoughts on different M&A frameworks she learned from Cisco.
    • (23:27) Astasia reflected on her decision to join Redpoint Ventures in early 2017, leading investments across developer tools, cloud infrastructure, data/ML infrastructure, AI applications, and cybersecurity.
    • (25:44) Astasia debunked misconceptions about the venture industry.
    • (29:30) Astasia discussed ways to prove her value upfront in potential deals and start forming her investment theses as a new investor.
    • (33:01) Astasia dissected the key factors that triggered her to invest in the Series A of Solo.io and the Series B of LaunchDarkly (in the domain of cloud infrastructure).
    • (38:48) Astasia explained her Series A investment in Hex and Series B investment in Preset (in the domain of data infrastructure).
    • (44:12) Astasia shared advice she had given her portfolio companies on hiring decisions, pricing products, and navigating go-to-market strategy while at Redpoint.
    • (47:36) Astasia walked through her process of writing comprehensive research primers in her Medium blog Memory Leak on wide-ranging topics - from data science notebooks and data orchestration to data pipelining and ML data management.
    • (51:19) Astasia shared the typical challenges she has seen in companies looking to incorporate Product-Led Growth into their go-to-market motion.
    • (54:10) Astasia discussed building a community as a fuel for product-led growth and shared advice for startups thinking about starting their community initiatives.
    • (56:40) Astasia shared advice for hiring good DevRel practitioners.
    • (01:00:15) Astasia shared advice for a smart, driven operator who wants to explore angel investing.
    • (01:03:26) Astasia talked about her current journey as the Founding Partner at Quiet Capital, sitting on its early-stage enterprise team and leading opportunities across pre-seed, seed, Series A, and Series B.
    • (01:05:13) Astasia expanded upon her typical mental checklist to evaluate entrepreneurs and make investment decisions.
    • (01:07:36) Astasia briefly touched on LP fundraising for Quiet Capital to become a "modern venture firm."
    • (01:09:59) Astasia emphasized her enthusiasm for the Data-Centric ML movement.
    • (01:13:41) Closing segment.

    Astasia's Contact Info
    LinkedIn | Medium | Twitter

    Quiet Capital
    Website | LinkedIn | Twitter

    Mentioned Resources
    Content
    • John Gannon Blog
    People
    • Satish Dharmaraj (Redpoint Ventures)
    • Scott Raney (Redpoint Ventures)
    • Amanda Robson (Cowboy Ventures)

    Notes
    My conversation with Astasia was recorded back in April 2022. Since then, many things have happened. I'd recommend:
    • Signing up for her Memory Leak newsletter
    • Browsing through Quiet Capital's new portfolio careers page
    • Listening to Astasia's appearance on the Data Stack Show
    • Checking out Quiet Capital's investments in Edge Delta, Diagrid, and Omni
    • Looking at her real-time infrastructure landscape

    Episode 101: Scaling Data Engineering, Building Data Teams, and Managed Data Stack with Tarush Aggarwal

    Nov 7, 2022 | 55:27


    Show Notes
    • (02:24) Tarush shared his upbringing in India and his decision to study abroad in the US.
    • (03:51) Tarush walked through his college experience studying Computer Engineering at Carnegie Mellon University.
    • (06:24) Tarush described the non-existent state of data infrastructure at Salesforce when he joined as the first data engineer in 2012.
    • (11:21) Tarush went over his contribution to the automation and benchmarking frameworks over his tenure at Salesforce.
    • (15:50) Tarush recalled lessons learned from building and managing a data team as a Data Manager at Wyng.
    • (19:54) Tarush explained how a data team can serve other functional units more efficiently.
    • (22:37) Tarush elaborated on his decision to adopt Looker for Wyng's Business Intelligence needs.
    • (26:30) Tarush talked about his decision to join WeWork as their Director of Data Engineering in 2016.
    • (30:39) Tarush went over the origin and evolution of Marquez - WeWork's first open-source project around data lineage - during his time as the director of WeWork's Data Platform team.
    • (33:49) Tarush highlighted the main challenges of building an internal data platform.
    • (35:43) Tarush recalled his move to China to help establish WeWork's Asia operations and focus on the hyper-growing Chinese market.
    • (39:01) Tarush shared the founding story of 5x during his sabbatical in 2020.
    • (42:39) Tarush explained the industry's need for a managed data stack.
    • (45:20) Tarush went over 5x's process of sourcing, interviewing, and onboarding data engineers who are pre-trained on the modern data stack.
    • (48:37) Tarush talked about finding the right vendors in the modern data stack to partner with.
    • (50:06) Tarush walked through his production process for the videos that explain what 5x does and raise awareness about the company.
    • (51:52) Closing segment.

    Tarush's Contact Info
    LinkedIn | Twitter | Medium

    5x Resources
    • Website | LinkedIn | Twitter | YouTube | Instagram
    • 5x Explained in 2 Minutes
    • Managed Data Platform
    • On-Demand Data Engineering Services
    • Integrations

    Mentioned Content
    People
    • George Fraser and Taylor Brown (Founders of Fivetran)
    • Prukalpa Sankar (Co-Founder and CEO of Atlan)
    • Frank Slootman (CEO and Chairman of Snowflake)
    Books
    • Stealing Fire (by Steven Kotler and Jamie Wheal)
    • The 5 AM Club (by Robin Sharma)

    Episode 100: Data-Centric Computer Vision, Productizing AI, and Scaling a Global Startup with Hyun Kim

    Oct 28, 2022 | 69:29


    Show Notes
    • (01:59) Hyun shared his upbringing and experience living in Korea, Singapore, and the US.
    • (04:18) Hyun described his undergraduate experience at Duke University.
    • (08:21) Hyun shared how he got a real taste of the game-changing potential of deep learning from the experience of bringing ML to diagnose Parkinson's disease with brain MRI scans.
    • (10:54) Hyun talked about his journey of leveling up his coding and ML knowledge.
    • (12:13) Hyun reflected on his motivation to pursue a Ph.D. program in computer science at Duke.
    • (15:22) Hyun talked about his participation in the 2016 Amazon Robotics Challenge as the leader of “Team Duke” and its Motion Planning function.
    • (17:25) Hyun reflected on his decision to take a leave of absence from his Ph.D. program and return to Korea to work as an ML Research Engineer at the AI Research Lab of SK Telecom, a major Korean conglomerate.
    • (19:46) Hyun discussed his research on game AI and synthetic image generation during his time with SK Telecom.
    • (22:57) Hyun shared the founding story of Superb AI.
    • (27:11) Hyun described going through the Y Combinator Winter 2019 batch.
    • (32:25) Hyun unpacked the evolution of Superb AI's Labeling platform since its inception.
    • (34:47) Hyun walked through the process of prioritizing the product roadmap.
    • (36:54) Hyun zoomed in on Superb AI's automated labeling feature, Custom Auto-Label, which automatically detects and labels common or niche objects in images and videos.
    • (40:21) Hyun touched on challenges with manually reviewing and auditing labels.
    • (42:25) Hyun dissected the data-centric problems in computer vision that the newly released Superb DataOps platform is built to solve.
    • (46:46) Hyun hinted at Superb AI's product roadmap, judging from current industry-wide pain points.
    • (48:53) Hyun highlighted a customer use case of Superb AI product offerings.
    • (51:42) Hyun shared his vision of where Superb AI fits into the quickly evolving AI Infrastructure ecosystem.
    • (54:15) Hyun shared valuable hiring lessons to attract people who are excited about Superb AI's mission.
    • (58:01) Hyun expanded on his perspectives on defining and scaling a global company culture.
    • (01:00:06) Hyun reflected on the challenges of running a remote-first company.
    • (01:01:54) Hyun shared fundraising advice for founders seeking the right investors for their startups.
    • (01:03:35) Hyun highlighted the difference between being a researcher and a founder.
    • (01:05:08) Closing segment.

    Hyun's Contact Info
    LinkedIn | Twitter

    Superb AI Resources
    • Website | LinkedIn | Twitter | YouTube | GitHub | Docs
    • Superb AI Suite Labeling Platform
    • Superb AI DataOps Platform
    • The Ground Truth Newsletter
    • Superb AI Academy

    Mentioned Content
    People
    • Andrew Ng
    • Andrej Karpathy
    • Ian Goodfellow
    Book
    • Zero To One (by Peter Thiel)

    Episode 99: Data Mobility, Enterprise GTM, and Tech Leadership with Gary Hagmueller

    Aug 30, 2022 | 78:29


    Show Notes
    • (01:45) Gary walked through his academic experience getting a Bachelor's degree in Business Administration at Arizona State University and an MBA in Finance at the USC Marshall School of Business.
    • (04:52) Gary recalled the most valuable lesson from leading a business development team in the enterprise offerings group at Verizon.
    • (07:45) Gary recalled the challenges of taking a company public during his time as the Director of Corporate Development at NorthPoint Communications.
    • (12:18) Gary shared his learnings while holding a COO role at Vinfolio, an innovator in the wine industry.
    • (15:19) Gary talked about his responsibilities in the Chief Financial Officer roles at KnowNow and Zuora.
    • (19:06) Gary gave advice to founders seeking the right investors for their startups.
    • (23:51) Gary walked through the learning curves while serving as the CFO, CRO, and COO of enterprise AI pioneer Ayasdi.
    • (31:06) Gary shared his playbook on building a well-oiled sales operations machine.
    • (33:46) Gary shared his journey as a first-time CEO at CLARA Analytics.
    • (36:37) Gary talked about his proudest accomplishments while driving significant growth for CLARA.
    • (37:52) Gary discussed the go-to-market motions implemented at CLARA.
    • (41:07) Gary walked through his brief stint as an Entrepreneur-In-Residence at Redpoint Ventures, a top-tier VC firm focused on early-stage investing.
    • (44:14) Gary rationalized his decision to become the CEO of Arcion Labs in December 2021.
    • (49:39) Gary explained the high-level architectural design of Arcion's data mobility platform.
    • (54:19) Gary discussed strategies for finding the right technology partners to collaborate with.
    • (57:42) Gary highlighted a few customer use cases of Arcion.
    • (01:01:48) Gary shared valuable hiring lessons to attract the right people who are excited about Arcion's mission.
    • (01:04:28) Gary distilled lessons learned while building a high-performance team at Arcion.
    • (01:09:14) Gary described the benefits of adopting usage-based pricing in enterprise technology.
    • (01:11:41) Closing segment.

    Gary's Contact Info
    LinkedIn | Twitter | Crunchbase

    Arcion's Resources
    • Website | LinkedIn | Twitter | YouTube | Docs | Slack
    • “Dawn of the Data Mobility Era” (Feb 2022)
    • “Arcion lands $13M to help companies replicate data across platforms” (VentureBeat, Feb 2022)

    Mentioned Content
    Content
    • The Network Effects Bible (by James Currier of NFX)
    • Blog by Tomasz Tunguz of Redpoint Ventures
    People
    • Gurjeet Singh (Co-Founder and CEO of Oma Robotics, Ex-CEO/Co-Founder of Ayasdi)
    • Satish Dharmaraj (Managing Director at Redpoint Ventures)

    Notes
    My conversation with Gary was recorded back in March 2022. Since then, many things have happened at Arcion. I'd recommend checking out:
    • The introduction of Arcion Cloud.
    • This article about data mobility on The New Stack.
    • This article about change data capture on VentureBeat.
    • This big product launch on Oracle log reader availability, featured by VentureBeat.
    • The article about the missing piece for the Modern Data Stack, featured by Crunchbase.
    • Arcion's launch with Databricks Partner Connect, featured by Datanami.
    Arcion is a proud sponsor of Oracle Cloud World 2022 in Las Vegas, Oct 17–20. If any data professionals are attending the conference, they should stop by the Arcion booth to say hi!

    Episode 98: Building Developer Tools, Managing Platform Products, Fostering Diversity, and Enabling Real-Time Data Applications with DeVaris Brown

    Aug 17, 2022 | 97:11


    Show Notes
    • (01:43) DeVaris reflected on his upbringing on the south side of Chicago and college experience at UIUC, studying Mathematics and Computer Science in the early 2000s.
    • (06:46) DeVaris shared his journey of learning how to program, build computers, and dive into the Internet.
    • (09:35) DeVaris recalled valuable lessons from interning at Intel and Cisco Systems.
    • (15:49) DeVaris shared his proudest accomplishments during his five years at Microsoft, first as a system engineer and then as an academic developer evangelist.
    • (22:06) DeVaris recalled his experience working in the gaming and music space as the Chief Developer Evangelist at Marmalade and the Chief Product Officer at Klick Push, respectively.
    • (27:49) DeVaris provided his perspective on the startup acquisition process.
    • (29:13) DeVaris unpacked his two years as a platform product manager at Zendesk, where he drove the adoption of the Zendesk Developer Platform for developers to create unique customer experiences.
    • (35:43) DeVaris revealed the challenges of building a technical community, given his experience at Zendesk.
    • (38:25) DeVaris recalled his time working for a year as the Lead Product Manager at VSCO, a startup that builds digital tools for the modern creative.
    • (45:12) DeVaris went over the challenges of building software for brand ambassadors and children's playtime, given his time as the Head of Product Management at Slyce.io and the CTO at Super Heroic.
    • (49:39) DeVaris reflected on his desire to scratch his entrepreneurial itch.
    • (52:00) DeVaris gave advice for early-career technologists on evaluating startup opportunities.
    • (55:51) DeVaris unpacked the product challenges he encountered while building tools for developers as the Director of Product Management at Heroku.
    • (58:57) DeVaris touched on his one year as the first platform engineering PM hire at Twitter.
    • (01:02:18) DeVaris shared the founding story of Meroxa.
    • (01:04:28) DeVaris dissected how Meroxa's platform architecture is designed at a high level, including a change data capture service, schema registry, event streaming service, API proxy, and incident automation framework.
    • (01:06:06) DeVaris explained the technical challenges associated with creating connections between data sources and destinations in real time.
    • (01:08:37) DeVaris zoomed into Conduit, Meroxa's open-source, single-binary data integration tool written in Golang that provides developer-friendly streaming data orchestration.
    • (01:12:32) DeVaris highlighted a few customer use cases of Meroxa.
    • (01:16:16) DeVaris shared valuable hiring lessons to attract the right people who are excited about Meroxa's mission and fit with Meroxa's cultural values.
    • (01:18:37) DeVaris shared the challenges of finding early design partners and lighthouse customers for Meroxa.
    • (01:20:24) DeVaris gave advice to founders seeking the right investors for their startups.
    • (01:22:58) DeVaris gave advice to smart, driven operators looking to explore angel investing.
    • (01:25:17) DeVaris discussed the remaining barriers that prevent minorities from pursuing a technology career.
    • (01:30:42) DeVaris imparted lessons from photography and DJing that benefited his career in product.
    • (01:32:26) Closing segment.

    DeVaris' Contact Info
    LinkedIn | Twitter | Website | GitHub

    Meroxa's Resources
    • Website | LinkedIn | Twitter | YouTube
    • Careers | Medium Blog
    • Documentation
    • Conduit (GitHub | Discord | Twitter | Docs)

    Mentioned Content
    Articles
    • “Hello World, Meroxa Style” (April 2021)
    • “Streaming Your Database Changes with Change Data Capture” (Part 1 + Part 2)
    • “Conduit: Streaming Data Integration for Developers” (Jan 2022)
    • “Why Conduit? An Evolutionary Leap Forward for Real-Time Data Integration” (Feb 2022)
    • “Hello Meroxa 2.0” (April 2022)
    Resources for minorities
    • Kura Labs (A free training and job placement academy for Infrastructure Computing, DevOps, and SRE for students from underserved communities)
    • Free Code Camp (Learn to code, for free)
    Books
    • Zero To One (by Peter Thiel)
    • The Hard Thing About Hard Things (by Ben Horowitz)
    People
    • Tristan Handy (Co-Founder and CEO of dbt Labs)
    • Arjun Narayan (Co-Founder and CEO of Materialize)
    • Benn Stancil (Chief Analytics Officer at Mode Analytics)
    • Chad Sanderson (Head of Data Platform at Convoy)

    Notes
    My conversation with DeVaris was recorded back in April 2022. Since then, many things have happened at Meroxa. I'd recommend checking out:
    • The introduction of Meroxa 2.0 and Turbine.
    • This interview on data-driven work culture.
    • New CDC Connectors built into Conduit.
    • Meroxa is a recipient of DoD funding to help the US Space Force monitor aircraft health in real time.

    Episode 97: Escaping Poverty, Embracing Digital Learning, Benchmarking ML Systems, and Advancing Data-Centric AI with Cody Coleman

    Aug 2, 2022 | 87:29


    Show Notes
    (01:49) Cody shared his upbringing in New Jersey, his childhood interest in science and technology, and the few people who have made big differences in his story.
    (09:35) Cody went over his academic experience studying Electrical Engineering and Computer Science at MIT.
    (17:51) Cody recalled his favorite classes taken at MIT.
    (22:43) Cody talked about serving as the president of MIT's chapter of the Eta Kappa Nu Honor Society and advancing online education at the MIT Office of Digital Learning.
    (31:25) Cody explained why he is bullish on the future of digital learning.
    (35:43) Cody expanded on his internships with Google throughout his time at MIT, working on local search quality and YouTube analytics.
    (42:31) Cody described the challenges of dealing with high-frequency trading data from his one year working as a junior data scientist at the Vendor Data Group of Jump Trading in Chicago.
    (46:50) Cody reflected on his decision to embark on a Ph.D. journey in Computer Science at Stanford University.
    (51:54) Cody mentioned his participation in the DAWN project, specifically DAWNBench, an end-to-end deep learning benchmark and competition.
    (54:21) Cody unpacked the evolution of MLPerf, an industry-standard benchmark for the training and inference performance of ML models.
    (56:52) Cody walked through the motivation and empirical work in his paper “Selection via Proxy: Efficient Data Selection for Deep Learning.”
    (59:34) Cody discussed his paper “Similarity Search for Efficient Active Learning and Search of Rare Concepts.”
    (01:06:32) Cody shared his learnings about bringing ML from research to industry from his advisors, Matei Zaharia and Peter Bailis, who were both academics and startup founders simultaneously.
    (01:09:19) Cody went over key trends in the emerging Data-Centric AI community, given his involvement with the Data-Centric AI workshop at NeurIPS 2021 and the DataPerf benchmark suite.
    (01:12:19) Cody shared lessons learned about finding product-market fit as the founder of Coactive AI, which brings unstructured data into the world of SQL and the big data tools that teams already love.
    (01:15:34) Cody emphasized the importance of focusing on the HR function and defining cultural guiding principles for any early-stage startup founder.
    (01:21:05) Cody provided his perspective on the differences and similarities between being a researcher and a founder.
    (01:23:47) Closing segment.

    Cody's Contact Info
    Website | Twitter | LinkedIn | Google Scholar

    Coactive AI's Resources
    Website | Twitter | LinkedIn | Culture Values

    Mentioned Content
    Talks
    “Digging Deeper: How a Few Extra Moments Can Change Lives” (TEDxStanford 2017)
    “Data Selection for Data-Centric AI” (Stanford MLSys 2022)
    Research
    “Probabilistic Use Cases: Discovering Behavioral Patterns for Predicting Certification” (2015)
    DAWNBench: An End-to-End Deep Learning Benchmark and Competition (Dec 2017)
    “MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance” (Feb 2020)
    “Selection via Proxy: Efficient Data Selection for Deep Learning” (Oct 2020)
    “Similarity Search for Efficient Active Learning and Search of Rare Concepts” (July 2021)
    DataPerf, a new benchmark suite for machine learning datasets and data-centric algorithms (Dec 2021)
    People
    Matei Zaharia (Cody's Ph.D. Advisor, Co-Creator of Apache Spark, Co-Founder of Databricks)
    Fei-Fei Li (Professor of Computer Science at Stanford, Creator of ImageNet Dataset)
    Michael Bernstein (Professor of Computer Science at Stanford with a focus on Human-Computer Interaction)
    Books
    “No Rules Rules: Netflix and the Culture of Reinvention” (by Reed Hastings)
    “What You Do Is Who You Are: How to Create Your Business Culture” (by Ben Horowitz)
    “The Inner Game of Tennis: The Classic Guide to the Mental Side of Peak Performance” (by Timothy Gallwey)

    Notes
    My conversation with Cody was recorded back in January 2022. Since then, many things have happened at Coactive AI.
    I'd recommend:
    Attending Cody's upcoming talk at Snorkel's The Future of Data-Centric AI.
    Reviewing the DataPerf workshop at ICML 2022.
    Reading the Coactive AI blog post on bringing UI props to MLOps.
    Watching Cody's CBS News interview back in February 2022.

    Episode 96: Data Science Training and The Power of Education with Merav Yuravlivker

    Jul 14, 2022 | 59:51


    Show Notes
    (02:18) Merav talked about her undergraduate experience at McGill University studying Psychology and Sociology.
    (04:33) Merav discussed important attributes of an exceptional teacher, given her two years teaching elementary special education in NYC public schools through the Teach For America program.
    (08:19) Merav commented on her time working at the International Baccalaureate Organization and as a Kaplan GRE instructor.
    (10:57) Merav shared the backstory behind the founding of Data Society, a predictive analytics training and consulting company (co-founded with Dmitri Adler and John Nader).
    (14:15) Merav reflected on her journey into programming.
    (17:16) Merav explained why data science training should be industry-tailored for maximum success.
    (20:57) Merav talked about how Data Society creates and evaluates its training curriculum.
    (23:59) Merav gave an example of how Data Society provides customized AI solutions to inform decisions, automate time-consuming manual processes, and solve complex data challenges for its clients.
    (27:38) Merav brought up challenges that hinder the adoption of data science in the government sector.
    (29:49) Merav unpacked the six steps for organizations to start moving up the data analytics maturity model.
    (33:07) Merav dissected meldR, Data Society's internal product built for Learning and Development teams in healthcare.
    (36:24) Merav reflected on bootstrapping Data Society in the early days (see the 2016 Kickstarter campaign).
    (39:48) Merav discussed the shift from a B2C to a B2B model for Data Society and scoring partnerships with Fortune 500 companies and federal agencies.
    (42:47) Merav shared valuable hiring lessons to attract the right people who are excited about the mission of Data Society.
    (45:22) Merav shared her experience shaping the remote work culture.
    (49:05) Merav touched on initiatives at Data Society to do more good in the world.
    (50:28) Merav provided different ways to engage more women in data science (via the Women Data Scientists DC Meetup and DCFemTech).
    (53:17) Merav predicted the evolution of education in the next 3 to 5 years.
    (55:29) Closing segment.

    Merav's Contact Info
    LinkedIn | Twitter

    Data Society's Resources
    Website | Twitter | LinkedIn

    Mentioned Content
    Articles
    “Is Your Enterprise Data-Driven?” (May 2021)
    “Why Data Science Training Should Be Industry-Tailored for Maximum Success” (August 2021)
    “Female Founders: Merav Yuravlivker of Data Society On The Five Things You Need To Thrive and Succeed as a Woman Founder” (Sep 2021)
    People
    DJ Patil (The first Chief Data Scientist of the US)
    Hilary Mason (Co-Founder of Hidden Door)
    Avriel Epps-Darling (Ph.D. candidate, Ford fellow, and Presidential Scholar at Harvard University)
    Book
    Weapons of Math Destruction (by Cathy O'Neil)

    Notes
    My conversation with Merav was recorded back in December 2021. Since then, many things have happened at Data Society. I'd recommend:
    Reading Merav's articles on Forbes about creating a culture of data sharing, assessing data literacy, and communication in the learning process.
    Reading Data Society's white papers about data science in research and data science in healthcare.
    Checking out the Camelsback product for risk assessment in financial services.
    Trying out the Data DNA assessment tool for organizations' data maturity.
    Finally, Merav was also just recognized as one of the DC region's 40 Under 40. The awards are given annually to recognize the outstanding achievements of young leaders in the Washington, DC, area who lead the community forward through hard work, philanthropy, and community engagement.

    Episode 95: Open-Source DataOps, Building In Public, and Remote Work Culture with Douwe Maan

    Jul 1, 2022 | 73:11


    Show Notes
    (01:46) Douwe went over formative experiences catching the programming virus at the age of 9, combining high school with freelance web development, and studying Computer Science at Utrecht University.
    (03:55) Douwe shared the story behind founding a startup called Stinngo, which led him to join GitLab in 2015 as employee number 10.
    (05:29) Douwe provided insights on attributes of exceptional engineering talent, given his time hiring developers and eventually becoming GitLab's first Development Lead.
    (08:28) Douwe unpacked the evolution of his engineering career at GitLab.
    (11:11) Douwe discussed the motivation behind the creation of the Meltano project in August 2018 to help GitLab's internal data team address the gaps that prevent them from understanding the effectiveness of business operations.
    (14:38) Douwe reflected on his decision in 2019 to leave GitLab's engineering organization and join the then five-person Meltano team full-time.
    (20:24) Douwe shared the details about Meltano's product development journey from its Version 1 to its pivot.
    (26:18) Douwe reflected on the mental aspect of being the sole person whom Meltano depended on for a while.
    (29:20) Douwe explained the positioning of Meltano as an open-source self-hosted platform for running data integration and transformation pipelines.
    (34:54) Douwe shared details of Meltano's ideal customer profiles.
    (37:45) Douwe provided a quick tour of the Meltano project, which represents the single source of truth regarding one's ELT pipelines: how data should be integrated and transformed, how the pipelines should be orchestrated, and how the various plugins that make up the pipelines should be configured.
    (40:39) Douwe unpacked different components of Meltano's product strategy, including Meltano SDK, Meltano Hub, and Meltano Labs.
    (45:05) Douwe discussed prioritizing Meltano's product roadmap in order to bring DataOps functionality to every step of the entire data lifecycle.
    (48:53) Douwe shared the story behind spinning Meltano out of GitLab in June 2021 and raising a $4.2M Seed funding round led by GV to bring the benefits of open source data integration and DataOps to a wider audience.
    (52:19) Douwe provided his thoughts on engaging open-source contributors in a way that can generate valuable product feedback for Meltano.
    (55:43) Douwe shared valuable hiring lessons to attract the right people who align with Meltano's values.
    (59:04) Douwe shared advice for startup CEOs who are experimenting with the remote work culture in our “new-normal” virtual working environments.
    (01:04:10) Douwe unpacked Meltano's mission and vision as outlined in a blog post.
    (01:06:40) Closing segment.

    Douwe's Contact Info
    GitLab | LinkedIn | Twitter | GitHub | Website

    Meltano's Resources
    Website | Twitter | LinkedIn | GitHub | YouTube
    Meltano Documentation | Product | DataOps
    Meltano SDK | Meltano Hub | Meltano Labs
    Company Handbook | Community | Values | Careers

    Mentioned Content
    Articles
    Hey, data teams - We're working on a tool just for you (Aug 2018)
    To-do zero, inbox zero, calendar zero: I think that means I'm done (Sep 2019)
    Meltano graduates to Version 1.0 (Oct 2019)
    Revisiting the Meltano strategy: a return to our roots (May 2020)
    Why we are building an open-source platform for ELT pipelines (May 2020)
    Meltano spins out of GitLab, raises seed funding to bring data integration into the DataOps era (June 2021)
    Meltano: The strategic foundation of the ideal data stack (Oct 2021)
    Introducing your DataOps platform infrastructure: Our strategy for the future of data (Nov 2021)
    Our next step for building the infrastructure for your Modern Data Stack (Dec 2021)
    People
    Maxime Beauchemin (Founder and CEO of Preset, Creator of Apache Airflow and Apache Superset, Angel Investor in Meltano)
    Benn Stancil (Chief Analytics Officer at Mode Analytics, Well-Known Substack Writer)
    The entire team at dbt Labs

    Notes
    My conversation with Douwe was recorded back in November 2021. Since then, many things have happened at Meltano.
    I'd recommend:
    Checking out their updated company values
    Reading Douwe's article about the DataOps Operating System on The New Stack
    Examining Douwe's blog post about moving Meltano to GitHub
    Looking over the announcement of Meltano 2.0 and the additional seed funding

    Episode 94: Modern Metadata Management, Open-Source Adoption, and Early-Stage Culture with Mars Lan

    Jun 20, 2022 | 79:33


    Show Notes
    (01:41) Mars walked through his education studying Computer Systems Engineering at The University of Auckland in New Zealand.
    (03:16) Mars reflected on his overall Ph.D. experience in Computer Science at UCLA.
    (05:55) Mars discussed his early research paper on a robust and scalable lane departure warning system for smartphones.
    (07:13) Mars described his work on SmartFall, an automatic fall detection system to help prevent the elderly from falling.
    (08:34) Mars explained his project WANDA, an end-to-end remote health monitoring and analytics system designed for heart failure patients.
    (10:06) Mars recalled learnings from interning as a software engineer at Google during his Ph.D.
    (14:54) Mars discussed engineering challenges while working on PHP for Google App Engine and Gboard personalization during his subsequent four years at Google.
    (19:05) Mars rationalized his decision to join LinkedIn to lead an engineering team that builds the core metadata infrastructure for the entire organization.
    (21:15) Mars discussed the motivation behind the creation of LinkedIn's generalized metadata search and discovery tool, DataHub, later open-sourced in 2020.
    (25:21) Mars dissected the key architecture of DataHub, which is designed to address the key scalability challenges coming in four different forms: modeling, ingestion, serving, and indexing.
    (28:50) Mars expressed the challenges of finding DataHub's early adopters internally at LinkedIn and externally later on at other companies.
    (35:22) Mars shared the story behind the founding of Metaphor Data, which he co-founded with Pardhu Gunnam and Seyi Adebajo and where he currently serves as the CTO.
    (41:55) Mars unpacked how Metaphor's modern metadata platform serves as a system of record for any organization's data ecosystem.
    (48:07) Mars described new challenges with metadata management since the introduction of the modern data stack and key features of a great modern metadata platform (as brought up in his in-depth blog post with Ben Lorica).
    (53:55) Mars explained how a modern metadata platform fits within the broader data ecosystem.
    (58:30) Mars shared the hurdles to finding Metaphor Data's early design partners and lighthouse customers.
    (01:04:33) Mars shared valuable hiring lessons to attract the right people who are excited about Metaphor's mission.
    (01:07:28) Mars shared important culture-building lessons to build out a high-performing team at Metaphor.
    (01:10:45) Mars shared fundraising advice for founders currently seeking the right investors for their startups.
    (01:13:22) Closing segment.

    Mars' Contact Info
    Twitter | LinkedIn | Google Scholar | GitHub

    Metaphor Data
    Website | Twitter | LinkedIn
    Careers | About Page
    Data Documentation | Data Collaboration

    Mentioned Content
    Articles
    DataHub: A generalized metadata search and discovery tool (Aug 2019)
    Open-sourcing DataHub: LinkedIn's metadata search and discovery platform (Feb 2020)
    Founding Metaphor Data (Dec 2020)
    Metaphor and Soda partner to unify the modern data stack with trusted data (Dec 2021)
    Introducing Metaphor: The Modern Metadata Platform (Nov 2021)
    The Modern Metadata Platform: What, Why, and How? (Jan 2022)
    Papers
    SmartLDWS: A robust and scalable lane departure warning system for the smartphones (Oct 2009)
    SmartFall: An automatic fall detection system based on subsequence matching for the SmartCane (April 2009)
    WANDA: An end-to-end remote health monitoring and analytics system for heart failure patients (Oct 2012)
    People
    Benn Stancil (Chief Analytics Officer at Mode Analytics, Well-Known Substack Writer)
    Tristan Handy (Co-Founder and CEO of dbt Labs, Writer of The Analytics Engineering Roundup)
    Andy Pavlo (Associate Professor of Databases at Carnegie Mellon University)
    Books
    “Working In Public” (by Nadia Eghbal)
    “The Mom Test” (by Rob Fitzpatrick)
    “A Thousand Brains” (by Jeff Hawkins)
    “The Scout Mindset” (by Julia Galef)

    Notes
    My conversation with Mars was recorded back in January 2022. Since then, many things have happened at Metaphor Data.
    I'd recommend:
    Visiting their brand new website
    Reading the 3-part “Data Documentation” series on their blog (part 1, part 2, and part 3)
    Looking over the Trusted Data landing page

    Episode 93: Open-Source Development, Human-Centric AI, and Modern ML Infrastructure with Ville Tuulos

    Jun 8, 2022 | 76:53


    Show Notes
    (01:35) Ville recalled earning his degrees in Computer Science from the University of Helsinki in Finland.
    (04:35) Ville walked through his time working at a startup called Gurusoft that planned to commercialize self-organizing maps, a peculiar artificial neural network.
    (07:17) Ville reflected on his four years as a researcher at Nokia, working on big data infrastructure, analytics, and ML open-source projects (such as Disco and Ringo).
    (11:56) Ville shared the story of co-founding a startup that built a novel scriptable data platform called Bitdeli with his brother and not finding a product-market fit.
    (13:58) Ville walked through AdRoll's acquisition of Bitdeli in June 2013.
    (15:49) Ville discussed the engineering challenges associated with his work at AdRoll, namely AdRoll Prospecting and traildb.io.
    (19:33) Ville mentioned the product and leadership/management lessons learned during his time as AdRoll's Head of Data, leading various data/ML efforts.
    (24:43) Ville rationalized his decision to join the ML Infrastructure team at Netflix in 2017.
    (27:26) Ville discussed the motivation behind the creation of Netflix's human-centric ML infrastructure, Metaflow, later open-sourced in 2019.
    (30:21) Ville unpacked the key design principles that summarize the philosophy of Metaflow, which is influenced by the unique culture at Netflix.
    (35:00) Ville talked about his well-known diagram on the data infrastructure's hierarchy of needs.
    (37:33) Ville examined the technical details behind Metaflow's integration with AWS to make it easy for users to move back and forth between their local and remote modes of development and execution.
    (40:58) Ville expressed the challenges of finding Metaflow's early adopters internally at Netflix and externally later on at other companies.
    (45:13) Ville went over the strategy around prioritizing features for Metaflow's future roadmap.
    (52:22) Ville shared the story behind the founding of Outerbounds, which he co-founded with Savin Goyal and Oleg Avdeev.
    (55:03) Ville provided his thoughts on engaging Metaflow's contributors in a way that can generate valuable product feedback for Outerbounds.
    (58:30) Ville shared valuable hiring lessons to attract the right people who are excited about Outerbounds' mission.
    (01:01:28) Ville shared upcoming initiatives that he is most excited about for Outerbounds.
    (01:04:05) Ville walked through his writing process for an upcoming technical book with Manning called “Effective Data Science Infrastructure,” a hands-on guide to assembling infrastructure for data science and machine learning applications.
    (01:06:34) Ville unpacked his great O'Reilly article that digs deep into the fundamentals of ML as an engineering discipline.
    (01:11:03) Closing segment.

    Ville's Contact Info
    LinkedIn | Twitter | GitHub

    Outerbounds
    Website | Twitter | LinkedIn | GitHub | YouTube
    Metaflow GitHub | Metaflow Docs
    Slack Community
    Careers
    Metaflow Resources for Data Science
    Metaflow Resources for Engineering

    Mentioned Content
    Talks
    SF Data Mining Meetup: TrailDB - Processing Trillions of Events at AdRoll (July 2016)
    QConSF 2018: Human-Centric Machine Learning Infrastructure @Netflix (Feb 2019)
    AWS re:Invent 2019: More Data Science with Less Engineering - ML Infrastructure at Netflix (Dec 2019)
    Scale By The Bay 2019: Human-Centric ML Infrastructure at Netflix (Jan 2020)
    AICamp: Metaflow - The ML Infrastructure at Netflix (Aug 2021)
    Articles
    Open-Sourcing Metaflow, a Human-Centric Framework for Data Science (Netflix Tech Blog, Dec 2019)
    Unbundling Data Science Workflows with Metaflow and AWS Step Functions (Netflix Tech Blog, July 2020)
    MLOps and DevOps: Why Data Makes It Different (O'Reilly, Oct 2021)
    People
    Michael Jordan (Distinguished Professor in EECS and Statistics at UC Berkeley)
    Matthew Honnibal and Ines Montani (Creators of open-source NLP library spaCy)
    Hadley Wickham (Chief Scientist at RStudio and Adjunct Professor of Statistics at Rice University)
    Book
    “The Mom Test” (by Rob Fitzpatrick)

    Notes
    My conversation with Ville was recorded back in October 2021. Since then, many things have happened at Outerbounds. I'd recommend:
    Visiting Outerbounds' new website with Metaflow resources for Data Science and Engineering
    Watching Ville's recent talk at Data Council Austin about the Modern Stack for ML Infrastructure
    Buying Ville's newly released book “Effective Data Science Infrastructure”

    Episode 92: Analytics Engineering, Locally Optimistic, and Marketing-Mix Modeling with Michael Kaminsky

    May 29, 2022 | 76:56


    Show Notes
    (01:48) Mike recalled his undergraduate experience studying Economics at Arizona State University and doing research on statistics/econometrics.
    (04:59) Mike reflected on his three years working as an analyst in the Boston office of the Analysis Group.
    (09:08) Mike discussed how he leveled up his programming skills at work.
    (11:05) Mike shared his learnings about building effective data-driven products while working as a data scientist at Case Commons.
    (17:20) Mike revisited his transition to a new role as the Director of Analytics at Harry's, the men's grooming brand, where he started a new data team from scratch.
    (23:04) Mike unpacked analytics and infrastructure challenges during his time at Harry's, including developing the data warehouse, an internal marketing attribution tool, and a fleet of systems for automated decision-making to improve efficiency.
    (27:21) Mike explained his move to Mexico City, where he spent time practicing Spanish, among other things.
    (32:22) Mike talked about his journey of starting a new consulting practice to help companies get more value out of their data, which was primarily shaped by his network.
    (36:30) Mike shared the founding story behind Recast, whose mission is to help modern brands improve the effectiveness of their marketing dollars.
    (42:09) Mike dissected the core technical problem that Recast is addressing: performing media mix modeling in the context of “programmatic” channels.
    (46:14) Mike shared the story behind the inception and evolution of Locally Optimistic, a community for current and aspiring data analytics leaders.
    (49:29) Mike walked through his 3-part blog series on Agile Analytics, which discusses the good aspects, the bad aspects, and the adjustments needed for analytics teams to adopt the Scrum methodology.
    (53:25) Mike unpacked his post “A Culture of Partnership,” which discusses the three key activities that can help an analytics team identify the most important opportunities in the business and work effectively with key stakeholders and partner teams to drive value.
    (57:25) Mike examined his seminal piece called “The Analytics Engineer,” which generated much attention from the analytics community and argues that the analytics engineer can provide a multiplier effect on the output of an analytics team.
    (01:03:24) Mike shared the motivation and pedagogical philosophy behind the Analytics Engineers Club (co-founded with Claire Carroll), which provides a training course for data analysts looking to improve their engineering skills.
    (01:07:57) Mike anticipated the evolution of the quickly evolving modern data stack (read his Fivetran article “The Modern Data Science Stack”).
    (01:09:22) Mike unpacked how organizations can build, start, and maintain the data quality flywheel (read his Datafold article “The Data Quality Flywheel”).
    (01:11:40) Mike shared his thoughts regarding the challenge of sharing complex analyses.
    (01:13:15) Closing segment.

    Mike's Contact Info
    Twitter | Website | LinkedIn | GitHub

    Further Resources
    Recast | Locally Optimistic | Analytics Engineers Club

    Mentioned Content
    Articles
    “Learning a language is hard” (Personal Blog, Jan 2020)
    “Modern Media Mix Modeling” (Recast Blog)
    “Agile Analytics, Part 1: The Good Stuff” (Locally Optimistic Blog, May 2018)
    “Agile Analytics, Part 2: The Bad Stuff” (Locally Optimistic Blog, June 2018)
    “Agile Analytics, Part 3: The Adjustments” (Locally Optimistic Blog, July 2018)
    “A Culture of Partnership” (Locally Optimistic Blog, March 2019)
    “The Analytics Engineer” (Locally Optimistic Blog, Jan 2019)
    “Data Education Is Broken” (Analytics Engineers Club, June 2021)
    “Teaching The Real Tools” (Analytics Engineers Club, Aug 2021)
    “The Modern Data Science Stack” (Fivetran Blog, Oct 2020)
    “The Data Quality Flywheel” (Datafold Blog, Nov 2020)
    “Knowledge Sharing” (Personal Blog, Sep 2020)
    “TDD for ELT” (Personal Blog, Sep 2020)
    “Are Data Catalogs Curing the Symptom or the Disease?” (Personal Blog, Dec 2020)
    People
    Claire Carroll (Co-Instructor of Analytics Engineers Club, Product Manager of Hex, previous Community Manager of dbt Labs)
    Drew Banin (Head of Product at dbt Labs)
    Barry McCardel (Co-Founder and CEO of Hex)

    Notes
    My conversation with Michael was recorded back in October 2021. Since then, Michael has been active in his work projects. I'd recommend:
    Following the Analytics Engineers Club for upcoming sessions (They are currently teaching their second summer cohort)
    Reading his collaboration blog post with Reforge on the attribution stack
    Consuming his Recast content explaining why marketing-mix modeling is hard and laying out the checklist for evaluating an MMM vendor

    Episode 91: Collaborative Data Workspace, The Sharing Gap, and Engineering Management with Caitlin Colgrove

    Play Episode Listen Later May 13, 2022 65:12


    Show Notes(01:37) Caitlin went over her college experience studying Computer Science at Stanford University in the early 2010s.(03:55) Caitlin talked about her teaching experience for CS 106A and CS 103.(07:09) Caitlin shared valuable lessons from completing software engineering internships at Harvard University, Facebook, and Palantir.(10:06) Caitlin walked over technical and organizational challenges during her time at Palantir — building products for both government/commercial customers and working with designers/infrastructure engineers to deliver full-stack applications to the field.(12:01) Caitlin explained why Palantir is composed of “loosely individual startups.”(14:56) Caitlin recalled learning curves during her transition to a tech lead role at Palantir — becoming responsible for the technical architecture and code quality of the product, mentorship and growth of the engineers, and the product direction and prioritization of features.(18:31) Caitlin discussed her time as a Data Engineering Manager at Remix Technologies — leading the team that builds geospatial data pipelines on top of AWS, Postgres/PostGIS, and Apache Airflow.(24:45) Caitlin reflected on valuable leadership and people management lessons absorbed during her transition to growing and developing diverse and inclusive engineering teams.(29:05) Caitlin shared the founding story of Hex, the modern data workspace for teams, alongside her co-founders Barry and Glen.(32:58) Caitlin talked about Hex's ideal users (the “analytically technical” who need better tools to access and manage more sophisticated workflows) and introduced Hex's Logic View.(35:22) Caitlin examined the collaboration challenges in data teams and revealed Hex's Library to address some of the shortcomings.(39:59) Caitlin shared her thoughts on the evolution of data science notebooks.(42:14) Caitlin unpacked the nuanced problem of justifying data ROI to functional stakeholders and described Hex's interactive App 
Builder.(45:17) Caitlin shared exciting developments on the horizon of Hex's product roadmap.(46:37) Caitlin shared valuable hiring lessons to attract the right people who are excited about Hex's mission.(52:10) Caitlin shared the hurdles in finding the early design partners and lighthouse customers of Hex.(56:01) Caitlin shared upcoming go-to-market initiatives that she's most excited about for Hex.(58:24) Caitlin shared fundraising advice for founders currently seeking the right investors for their startups.(01:01:42) Closing segment.Caitlin's Contact InfoLinkedInTwitterHex's ResourcesWebsite | Twitter | LinkedInLogic View | App Builder | Knowledge LibraryDocs | Blog | GalleryCustomers | Careers | Integrations | PricingMentioned ContentArticles“Long Live Code” (June 2020)“Don't Tell Your Data Team's ROI Story” (Aug 2020)“The Sharing Gap” (Oct 2020)PeopleTristan Handy (Founder and CEO of dbt Labs)Claire Carroll (Product Manager of Hex, previous Community Manager of dbt Labs)Wes McKinney (Creator of Pandas and Arrow, Co-Founder and CTO of Voltron Data)DeVaris Brown (Co-Founder and CEO of Meroxa)Book“Mindset: The New Psychology of Success” (by Carol Dweck)NotesMy conversation with Caitlin was recorded back in Fall 2021. Since then, many things have happened at Hex. I'd recommend looking at:Caitlin's piece announcing Hex's SOC 2 Type II report to reflect Hex's commitment to securityCaitlin's recent talk at Data Council Austin about implementing reactive notebooks with IPythonThe release of Hex Knowledge Library, a new way to publish and discover data workHex's $16M Series A (led by Redpoint Ventures) and $52M Series B (led by a16z along with Snowflake, Databricks, and existing investors)Hex's increasing list of customers such as AngelList, Fivetran, Hightouch, Loom, Mixpanel, Notion, Ramp, Replicated, SeatGeek, etc.

    Episode 90: Operational Analytics, Reverse ETL, and Finding Product-Market Fit with Kashish Gupta

    Play Episode Listen Later May 3, 2022 83:28


    Show Notes(00:43) Kashish shared briefly about his upbringing in Atlanta and his early interest in STEM subjects.(02:38) Kashish described his overall academic experience studying Economics, Management, and Computer Science at the University of Pennsylvania.(05:53) Kashish walked over the Machine Learning classes and projects throughout his MSE degree in Robotics.(09:02) Kashish shared valuable lessons learned from multiple internships throughout his undergraduate: data science at Implantable Provider Group, investment analysis at Tree Line, and product management at LYNK.(13:14) Kashish told the anecdotes that enabled him to realize his passion for building startups.(17:14) Kashish recapped his learning about venture capital from spending a summer as an analyst in early-stage deep-tech companies at Bessemer Venture Partners in New York.(22:09) Kashish shared learnings from his entrepreneurial stints at an early age.(26:12) Kashish talked through his decision to move to San Francisco after college (Read his blog post explaining how he moved here without a job and a home).(29:04) Kashish recalled his experience working on a project called Carry (an executive assistant for travel on Slack) with his friend Tejas Manohar and going through Y Combinator.(36:40) Kashish shared the founding story of Hightouch, a data platform that syncs customer data from the data warehouse to CRM, marketing, and support tools.(44:15) Kashish emphasized the importance of speed and execution around different pivots that led to Hightouch.(46:35) Kashish unpacked the notion of Operational Analytics, an approach to analytics that shifts the focus from simply understanding data to putting that data to work in the tools that run your business.(49:46) Kashish dissected Hightouch's market-leading Reverse ETL, which is the process of copying data from a data warehouse to operational systems of record.(54:51) Kashish discussed Hightouch Audiences, used primarily by larger B2C customers, that 
allows marketing teams to build audiences and filters on top of existing data models.(58:09) Kashish explained how the “Reverse ETL” concept fits into the quickly evolving modern data stack.(01:00:26) Kashish shared how the Hightouch team prioritizes their product roadmap, given the high number of customer requests.(01:02:47) Kashish shared valuable hiring lessons to attract the right people who are excited about Hightouch's mission.(01:05:13) Kashish shared the hurdles in finding the early design partners and lighthouse customers of Hightouch.(01:08:06) Kashish explained how Hightouch prices by destinations, reflecting the value customers get from using the product and helping them predict costs over time.(01:10:32) Kashish shared upcoming go-to-market initiatives that he is most excited about for Hightouch.(01:14:36) Kashish shared fundraising advice for founders currently seeking the right investors for their startups.(01:17:47) Kashish emphasized the industry recognition of the Reverse ETL market.(01:19:47) Closing segment.Kashish's Contact InfoLinkedInTwitterGitHubWebsiteMediumHightouch's ResourcesWebsite | Twitter | LinkedInData Features | Hightouch Audiences | Hightouch NotifyDocs | BlogCustomers | Careers | PricingMentioned ContentArticles“On Moving to SF Jobless and Homeless” (Aug 2018)“Hightouch Ushers In The Era of Operational Analytics” (March 2021)“The State of Reverse ETL” (May 2021)“What is Operational Analytics?” (July 2021)“Hightouch Has Raised a Series A!” (July 2021)“Hightouch Raises $12M to Empower Business Teams With Operational Analytics” (July 2021)“The Cloud 100 Rising Stars 2021” (Aug 2021)“What is Reverse ETL?” (Nov 2021)Companiesdbt LabsShipyardBig Time DataBook“The Hard Thing About Hard Things” (by Ben Horowitz)NotesMy conversation with Kashish was recorded back in August 2021. Since then, many things have happened at Hightouch. 
I'd recommend looking at:Kashish's piece about Hightouch's transition from Reverse ETL to becoming a Data Activation companyKashish's recent talk at Data Council Austin about the current state of Data Apps built on top of the warehouse and the future as warehouses become even fasterThe release of Hightouch Notify that sends notifications on top of the data warehouseHightouch's Series B funding of $40M back in November 2021Finally, Kashish let me know that back in August, Hightouch had only 25 people. Now, the company is 70 people strong!

    Episode 89: Observable, Robust, and Responsible AI with Alessya Visnjic

    Play Episode Listen Later Apr 15, 2022 72:16


    Show Notes(01:53) Alessya shared her formative experiences growing up in Kazakhstan, coming to Washington during high school, and discovering a passion and extreme aptitude for mathematics.(04:20) Alessya described her undergraduate experience studying Applied Mathematics at the University of Washington.(08:00) Alessya talked about impactful projects she contributed to while working as a software developer at Amazon's quality assurance and DevOps organizations.(12:29) Alessya went over critical responsibilities during her time as Amazon's Technical Program Manager.(17:06) Alessya talked about the process of building and getting adoption for an internal Machine Learning platform at Amazon.(20:42) Alessya shared her biggest takeaways from Amazon's culture of customer obsession and operational excellence.(23:26) Alessya revisited her period enrolling in UW's Master of Science in Entrepreneurship Program and highlighted two core entrepreneurial muscles developed: networking and negotiation.(28:58) Alessya provided insights on the startup ecosystem and ML community in Seattle.(34:47) Alessya walked through her period serving as the CTO in Residence at Allen Institute for AI and evaluating a range of AI technologies for viability and product readiness.(37:12) Alessya shared the backstory behind the founding of WhyLabs, an AI observability platform built to enable every enterprise to run AI with certainty (read her blog post about early misadventures with AI at Amazon that inspired the incubation of WhyLabs at AI2).(42:23) Alessya examined what makes an AI solution robust and responsible.(46:09) Alessya dissected the anatomy of an enterprise AI Observability platform.(49:58) Alessya explained why data logging is a critical missing component in the production ML stack and described whylogs, an open-source ML data logging library from WhyLabs.(54:12) Alessya shared valuable hiring lessons to attract the right people who are excited about WhyLabs' mission.(57:03) Alessya 
shared tactics to find and engage contributors to whylogs.(58:10) Alessya shared the hurdles in finding the early design partners and lighthouse customers of WhyLabs.(01:02:28) Alessya shared upcoming go-to-market initiatives that she is most excited about for WhyLabs.(01:03:54) Alessya explained what it felt like to be recognized as the CEO of the year for the Pacific Northwest startup community last year and shared her perspective on work-life balance.(01:07:43) Closing segment.Alessya's Contact InfoLinkedInTwitterWhyLabs's ResourcesWebsitewhylogsSlack CommunityBlogLinkedIn | Twitter | Facebook | YouTube | GitHubWhat is AI Observability?Mentioned ContentArticles + Talks“Introducing WhyLabs, a Leap Forward in AI Reliability” (Sep 2020)“WhyLabs: The AI Observability Platform” (Sep 2020)“whylogs: Embrace Data Logging Across Your ML Systems” (Sep 2020)“Who Said Moms Can't CEO?” (May 2021)“The Critical Missing Component in the Production ML Stack” (May 2021)PeopleCassie Kozyrkov (Chief Decision Scientist at Google)Dan Jeffries (Chief Evangelist at Pachyderm and Founder of AI Infrastructure Alliance)Michael Petrochuk (Founder and CTO of WellSaid Labs)Book“The Hard Thing About Hard Things” (by Ben Horowitz)NotesMy conversation with Alessya was recorded back in August 2021. Since then, many things have happened at WhyLabs. I'd recommend looking at:The self-service release of AI ObservatorySeries A fundingExploring their new integrations with Teachable Hub, UbiOps, Valohai, and Superb AILaunch of their listing on AWS MarketplaceTheir article on How Observability Uncovers the Effects of ML Technical DebtTheir achievement of SOC 2 Type 2 certificationwhylogs is evolving to a new iteration that will be even more usable and more useful than it was before. With the launch of whylogs v1 in May, users will be able to create data profiles in a fraction of the time and with a much simpler API. 
Additionally, WhyLabs built in handy features such as the profile visualizer (which allows users to visualize one or multiple profiles for exploration and comparison) and constraints (which allow users to validate the quality of their data as it flows through their data pipelines).

    Episode 88: Sales Engineering and Future of Work with Evan Cummack

    Play Episode Listen Later Apr 3, 2022 56:08


    Show Notes(02:00) Evan shared his upbringing, born and raised in a small coastal town on New Zealand's North Island and later studied Software Engineering and Business.(03:55) Evan recalled working as a software solution architect at NEC Corporation back in New Zealand.(06:17) Evan talked about his decision to join Twilio in 2011 as one of the company's early employees right after its Series B financing.(08:40) Evan shared his perspectives on joining startups and big companies as a new grad.(13:01) Evan provided insights on attributes of exceptional sales engineers, given his time building the first iteration of Twilio's global pre-sales team.(17:30) Evan unpacked the evolution of his career at Twilio — working as a product manager, a director of product & engineering, and a general manager of IoT & wireless.(22:51) Evan dissected Twilio's unique “middle-out” sales strategy, which has hugely impacted the company's incredible growth from Series B through to IPO and beyond.(29:03) Evan went over the untapped opportunity being enabled by new cellular IoT technologies.(33:25) Evan explained his decision to embark on a new journey as the CEO of Fin.com after a decade at Twilio.(37:26) Evan talked about the need for workflow automation and how Fin's product features are built to address that.(40:35) Evan went over Fin's remote performance optimization capabilities that help teams thrive in a remote-first environment.(42:56) Evan shared valuable hiring lessons to attract the right leaders who are excited about Fin's mission.(45:38) Evan shared the hurdles his team has to go through while finding early customers for Fin (as it pivoted to building a SaaS product).(48:02) Evan talked about the qualities of Jeff Lawson that made him such a great CEO.(50:41) Closing segment.Evan's Contact InfoTwitterLinkedInFin's ResourcesWebsiteLinkedInTwitter“Fin.com Raises $20M from Coatue” (Sep 2021)“Customers Operations Benchmarks for 2022” (Nov 2021)“Fin's new Experiments Product 
Enables CX teams to Confidently Deliver Business Process Changes that Maximize Business Impact” (Dec 2021)Mentioned ContentPeopleJack DorseyBret TaylorPaul BuchheitBook“Startup CXO: A Field Guide to Scaling Up Your Company's Critical Functions and Teams” (by Matt Blumberg)

    Episode 87: Product Experimentation, ML Platforms, and Metrics Store with Nick Handel

    Play Episode Listen Later Mar 25, 2022 87:28


    Show Notes(01:51) Nick shared the formative experiences of his childhood: moving between different schools, becoming interested in Math, and graduating from UCLA at the age of 19.(05:45) Nick recalled working as a quant analyst focused on emerging market debt at BlackRock.(09:57) Nick went over his decision to join Airbnb as a data scientist on their growth team in 2014.(12:17) Nick discussed how data science could be used to drive community growth on the Airbnb platform.(16:35) Nick led the data architecture design and experimentation platform for Airbnb Trips, one of Airbnb's biggest product launches in 2016.(20:40) Nick provided insights on attributes of exceptional data science talent, given his time interviewing hundreds of candidates to build a data science team from 20 to 85+.(23:50) Nick went over his process of leveling up his product management skillset while leading Airbnb's Machine Learning teams and growing the data organization significantly.(26:56) Nick emphasized the importance of flexibility in his work routine.(29:27) Nick unpacked the technical and organizational challenges of designing and fostering the adoption of Bighead, Airbnb's internal framework-agnostic, end-to-end platform for machine learning.(34:54) Nick recalled his decision to leave Airbnb and become the Head of Data at Branch, which delivers world-class financial services to the mobile generation.(37:24) Nick unpacked key takeaways from his Bay Area AI meetup in 2019 called “ML Infrastructure at an Early Stage Startup” related to his work at Branch.(40:55) Nick discussed his decision to pursue a startup idea in the analytics space rather than the ML space.(43:36) Nick shared the founding story of Transform, whose mission is to make data accessible by way of a metrics store.(49:54) Nick walked through the four key capabilities of a metrics store (semantics, performance, governance, and interfaces) and introduced Metrics Framework (Transform's capability to create company-wide alignment 
around key metrics that scale with an organization through a unified framework).(55:58) Nick unpacked Metrics Catalog, Transform's capability to eliminate repetitive tasks by giving everyone a single place to collaborate, annotate data charts, and view personalized data feeds.(59:57) Nick dissected Metrics API, Transform's capability to generate a set of APIs to integrate metrics into any other enterprise tools for enriched data, dimensional modeling, and increased flexibility.(01:02:41) Nick explained how a metrics store fits into a modern data analytics stack.(01:05:57) Nick shared valuable hiring lessons on finding talent that fits Transform's cultural values.(01:12:27) Nick shared the hurdles his team has to go through while finding early design partners for Transform.(01:15:38) Nick shared upcoming go-to-market initiatives that he's most excited about for Transform.(01:17:46) Nick shared fundraising advice for founders currently seeking the right investors for their startups.(01:20:45) Closing segment.Nick's Contact InfoLinkedInTwitterMediumTransform's ResourcesWebsiteBlogLinkedIn | TwitterMentioned ContentArticles + Talks“ML Infrastructure at an Early Stage” (March 2019)“Why We Founded Transform” (June 2021)“My Experience with Airbnb's Early Metrics Store” (June 2021)“The 4 Pillars of Our Workplace Culture” (Aug 2021)PeopleAirbnb's Metrics Repo Team (Paul Yang, James Mayfield, Will Moss, Jonathan Parks, and Aaron Keys)Maxime Beauchemin (Founder and CEO of Preset, Creator of Apache Airflow and Apache Superset)Emilie Schario (Data Strategist In Residence at Amplify Partners, Previously Head of Data at Netlify)Book“High-Output Management” (by Andy Grove)NotesMy conversation with Nick was recorded back in July 2021. Since then, many things have happened at Transform. 
I'd recommend:Registering for the Metrics Store Summit that will happen at the end of April 2022Reviewing the piece about 4 Pillars of Transform's Workplace CultureReading Nick's post on the brief history of the metrics storeExploring Transform's integrations with Mode, Hex, and Google Sheets

    Episode 86: Risk Management, Open-Source Governance, and Negative Engineering with Jeremiah Lowin

    Play Episode Listen Later Mar 16, 2022 81:26


    Show Notes(01:29) Jeremiah reflected on his academic interest studying Statistics and Economics at Harvard.(05:33) Jeremiah recalled his four years as a market risk manager at King Street Capital Management.(07:18) Jeremiah explained how the training in risk management has made a huge impact in his career as a startup founder.(09:48) Jeremiah then founded his own consultancy Lowin Data Company that designed and built ML systems for time series data.(12:38) Jeremiah mentioned his fascination with the rapid growth of machine learning in the past decade.(15:54) Jeremiah talked about his contribution to the Apache Airflow project and lessons learned about open-source development/governance.(21:48) Jeremiah unpacked the notion of negative engineering and shared the story behind the inception of Prefect.(27:24) Jeremiah dissected Prefect Core, the open-source framework that is stocked with all the necessary components for designing, building, testing, and running powerful data applications.(32:45) Jeremiah went over the advanced enterprise features of Prefect Cloud that complement users of Prefect Core.(36:04) Jeremiah discussed Prefect's product strategy (read his blog post "Toward Dataflow Automation," which distinguishes the difference between what a company makes and what a company sells).(40:44) Jeremiah explained how Prefect users can take advantage of the hybrid execution model.(47:08) Jeremiah walked through Prefect Server and Prefect UI that enable users to run parts of Prefect Cloud locally.(50:27) Jeremiah talked about how his team has gradually open-sourced the Prefect platform.(51:38) Jeremiah explained how Prefect settles into a "success-based pricing" model, where the cost is based entirely on the number of tasks users run successfully each month.(54:15) Jeremiah shared how to nurture a highly active community of open-source contributors to Prefect Core.(58:23) Jeremiah unpacked Prefect's hiring strategy, which emphasizes the importance of hiring a 
team diverse in thoughts, backgrounds, makeups, and experiences (read this fantastic guide to building a high-performance team on Prefect's website).(01:07:02) Jeremiah shared fundraising advice for founders currently seeking the right investors for their startups.(01:11:53) Jeremiah unpacked the two key pillars central to Prefect's hyper-adoption within the data world: expansion and product.(01:14:09) Closing segment.Jeremiah's Contact InfoLinkedInTwitterMediumGitHubPrefect's ResourcesWebsiteGitHub | Slack | Documentation | Twitter | MeetupCommunity UpdatesThe Prefect Guide to Building A High-Performance Team (April 2021)Prefect CloudPrefect CorePrefect's Hybrid ModelMentioned ContentArticles"Positive and Negative Engineering" (Oct 2018)"The Golden Spike" (Jan 2019)"Prefect is Open-Source!" (March 2019)"Towards Dataflow Automation" (June 2019)"The Prefect Hybrid Model" (Feb 2020)"Project Earth" (March 2020)"Open-Sourcing The Prefect Platform" (March 2020)"Your Code Will Fail (But That's Okay)" (May 2020)"Liftoff: Prefect's Series A" (Feb 2021)"Escape Velocity: Prefect's Series B" (June 2021)Talks and Podcasts"Invest Like The Best" (Jan 2017)"Task Failed Successfully" (PyData DC 2018)"Software Engineering Daily" (April 2020)"The OSS Startup Podcast" (Nov 2021)"The Sequel Show" (Jan 2022)PeopleVicki Boykis (ML Engineer at Tumblr, Newsletter Writer of Normcore Tech)Chris Riccomini (Software Engineer at WePay, Contributor of Airflow, Investor/Advisor at Prefect)Justin Gage (Newsletter Writer of Technically)Books"Creativity Inc." (by Ed Catmull)"The Hitchhiker's Guide to the Galaxy" (by Douglas Adams, Eoin Colfer, and Thomas Tidholm)"Shoe Dog" (by Phil Knight)NotesMy conversation with Jeremiah was recorded back in July 2021. 
Since then, many things have happened at Prefect:The 2021 Growth ReportThe releases of Prefect Orion and Prefect Radar as part of the product roadmapThe announcement of Prefect's Premier Partnership Program for trusted partnersThe introduction of Prefect Discourse for data engineersThe latest drop of Prefect 2.0!

    Episode 85: Ad Exchange, Stream Processing, and Data Discovery Platform with Shinji Kim

    Play Episode Listen Later Mar 2, 2022 71:09


    Show Notes
    (02:00) Shinji reflected on her academic experience studying Software Engineering at the University of Waterloo in the late 2000s.
    (04:19) Shinji shared valuable lessons learned from her undergraduate co-op experience with statistical analysis at Sun Microsystems, software engineering at Barclays Capital, and growth marketing at Facebook.
    (08:52) Shinji shared lessons learned from being a Management Consultant at Deloitte.
    (14:01) Shinji revisited her decision to quit her job at Deloitte and create a social puzzle game called Shufflepix.
    (17:42) Shinji went over her time working as a Product Manager at the mobile ad exchange network YieldMo.
    (22:25) Shinji discussed the problem of stream processing at YieldMo, which sparked the creation of Concord.
    (26:17) Shinji unpacked the pain points with existing stream processing frameworks and the competitive advantage of using Concord.
    (33:19) Shinji recalled her time at Akamai, initially as a data engineer in the Platform Engineering unit and later as a product manager for the IoT Edge Connect platform.
    (37:26) Shinji explained why sharing context knowledge around data remains a largely unsolved problem.
    (42:07) Shinji unpacked the three capabilities of an ideal data discovery platform: (1) exposing up-to-date operational metadata along with the documentation, (2) tracking the provenance of data back to its source, and (3) guiding data usage.
    (46:59) Shinji unpacked the benefits of plugging BI tools into data discovery platforms and collecting metadata, which facilitates better visibility and understanding.
    (52:36) Shinji discussed the role of a data discovery platform within the modern data stack.
    (53:59) Shinji shared the hurdles that her team had to go through while finding early adopters of Select Star.
    (55:48) Shinji shared valuable hiring lessons learned at Select Star.
    (01:00:00) Shinji shared fundraising advice for founders currently seeking the right investors for their startups.
    (01:04:41) Closing segment.
    Shinji's Contact Info: LinkedIn | Twitter | Medium
    Select Star's Resources: Website | Blog | LinkedIn | Twitter | Medium
    Mentioned Content
    Articles: “The Next Evolution of Data Catalogs: Data Discovery Platforms” (Feb 2021); “Data Discovery for Business Intelligence” (May 2021)
    People: Martin Kleppmann (Author of Designing Data-Intensive Applications); Emily Riederer (Senior Analytics Manager at Capital One); Anya Prosvetova (Tableau DataDev Ambassador)
    Book: “Managing Oneself” (by Peter Drucker)
    Notes
    My conversation with Shinji was recorded back in July 2021. Since then, many things have happened at Select Star:
    General Availability launch on Product Hunt: https://www.producthunt.com/posts/selectstar
    Snowflake partnership on data governance: https://blog.selectstar.com/selectstar-and-snowflake-partner-to-take-data-governance-to-a-new-level-a9d274e1d4c6
    Case studies with Pitney Bowes and Handshake
    About the show
    Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths, from scientists and analysts to founders and investors, to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
    Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
    Subscribe by searching for Datacast wherever you get podcasts or click one of the links below: Listen on Spotify | Listen on Apple Podcasts | Listen on Google Podcasts
    If you're new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

    Episode 84: Business Development and Customer Success for Emerging Technologies with Taimur Rashid

    Play Episode Listen Later Feb 17, 2022 81:50


    Show Notes
    (02:27) Taimur reflected on his education studying Computer Science at UT Austin in the early 2000s.
    (06:26) Taimur recalled his first job working as a quality assurance engineer at Vignette.
    (08:47) Taimur went through his time with Oracle / Siebel, where he transitioned from a purely technical engineering-focused role to more customer-facing functions.
    (13:44) Taimur reflected on his proudest accomplishments at Oracle.
    (18:23) Taimur recalled dropping out of his studies at the Stanford Center for Professional Development and moving to Seattle to work for Amazon Web Services.
    (20:35) Taimur provided insights on attributes of exceptional sales talent, given his time as an enterprise sales manager in his first two years at AWS.
    (23:55) Taimur shared anecdotes of successful product launches and their market expansion strategies while leading business development for AWS's database and compute services.
    (28:33) Taimur discussed instilling a culture of customer obsession and operational excellence in his teams while leading the incubation, market development, and technical go-to-market strategy and execution for the AWS Platform across infrastructure, data, developer services, and emerging technologies.
    (33:14) Taimur talked about his decision to join Microsoft to lead the Worldwide Customer Success function for their Azure Data Platform, Analytics, and AI business.
    (36:24) Taimur unpacked his talk called “Enabling Customer Success through Evolutionary Architectures.”
    (43:07) Taimur compared the BizOps culture between Azure and AWS.
    (46:29) Taimur discussed his decision to join Redis as its Chief Business Development Officer.
    (50:07) Taimur went over the data challenges with operational ML, the emerging data architecture of feature stores, and the powerful capabilities of Redis as a solution.
    (55:58) Taimur unpacked key ideas in his talk "First Principles in Building A Real-Time AI Platform."
    (01:01:52) Taimur hinted at Redis' product vision of "caching for ML data."
    (01:05:21) Taimur gave advice for a smart, driven operator who wants to explore angel investing.
    (01:10:17) Taimur described the evolution of tech leadership, strategic business development, and customer success strategies in the past two decades.
    (01:15:29) Taimur shared three books that have greatly influenced his life.
    (01:16:48) Closing segment.
    Taimur's Contact Info: LinkedIn | Twitter | Redis Profile
    Redis' Resources: Website | Redis Open Source | Redis Enterprise Software | Redis Enterprise Cloud | Redis AI | LinkedIn | Twitter | Facebook | YouTube | "Redis Labs Becomes Redis" (Aug 2021)
    Mentioned Content
    People: Andy Jassy (CEO of Amazon); Melanie Perkins (CEO of Canva); Jeff Lawson (CEO of Twilio)
    Books: "Man's Search For Meaning" (by Viktor Frankl); "Thinking In Systems" (by Donella Meadows); "A Treasury of Rumi" (by Muhammad Isa Waley and Rumi); "Start With Why" (by Simon Sinek)
    Talks: "First Principles in Building A Real-Time AI Platform" (March 2021); "Redis as an Online Feature Store" (April 2021); "Redis as an online feature store, Redis Labs" (May 2021)

    Episode 83: Startup Scrappiness, Venture Matchmaking, and Thinking In Bets with Leigh-Marie Braswell

    Play Episode Listen Later Feb 9, 2022 59:20


    Show Notes
    (01:43) Leigh-Marie shared formative experiences from her childhood: growing up in Alabama, solving math problems competitively, and going to Phillips Exeter Academy.
    (04:21) Leigh-Marie discussed her undergraduate experience at MIT studying Math with Computer Science.
    (06:41) Leigh-Marie went through her internship experience at Jane Street and Blend.
    (10:07) Leigh-Marie recalled lessons learned from interning at Google, as an ML engineer for the Research and Machine Intelligence Team and an Associate Product Manager for the Chrome Web Platform team.
    (13:39) Leigh-Marie talked about her decision to join the early founding team of Scale API (now known as Scale AI) after finishing MIT.
    (17:30) Leigh-Marie explained why labeled data is the key bottleneck to the growth of the ML industry.
    (20:02) Leigh-Marie discussed the engineering and product challenges of dealing with 3D sensor data.
    (22:33) Leigh-Marie unpacked her experience building Scale's Sensor Fusion Annotation product from scratch, from gathering customer interests to building the initial MVP.
    (26:45) Leigh-Marie talked about learning curves during Scale's scaling phase, as the product gained more advanced features and the customer list grew.
    (32:21) Leigh-Marie dived into Scale's credo emphasizing a relentless speed of execution.
    (35:00) Leigh-Marie shared valuable hiring lessons from Scale's early days (read Alex's blog post about Scale's hiring philosophy).
    (38:05) Leigh-Marie went over the importance of developing uncompressed understandings of how everything works together as Scale grows.
    (41:39) Leigh-Marie shared her advice for folks who want to get into angel investing.
    (44:02) Leigh-Marie shared her motivation behind joining Founders Fund (read Founders Fund's investment manifesto).
    (46:56) Leigh-Marie went over how she has been proving value upfront and forming investment theses as a new investor.
    (49:10) Leigh-Marie shared advice she has been giving to companies regarding their product-market fit and go-to-market fit strategies.
    (50:38) Leigh-Marie reflected on her transitions from software engineering to product management to venture capital.
    (52:31) Leigh-Marie shared the lesson learned from playing poker that benefits her career in startups and venture.
    (54:19) Closing segment.
    Leigh-Marie's Contact Info: Substack | Twitter | LinkedIn | GitHub | Quora | Founders Fund
    People: Peter Thiel; Ali Partovi; Trae Stephens
    Books: “Angel” (by Jason Calacanis); “Zero To One” (by Blake Masters and Peter Thiel); “7 Powers: The Foundations of Business Strategy” (by Hamilton Helmer)
    Blog Posts: “The One Data Platform To Rule Them All” (July 2021); “Startup Opportunities in Machine Learning Infrastructure” (Sep 2021)

    Episode 82: Enabling AI-Powered AR Navigation For Driving with Chen-Ping Yu

    Play Episode Listen Later Feb 3, 2022 56:42


    Show Notes
    (01:47) Chen-Ping shared his upbringing growing up in Taiwan and going to boarding school in the US at the age of 14.
    (04:42) Chen-Ping got his Bachelor's and Master's degrees in Computer Science from RIT in the early-to-mid 2000s, during which he did academic research in computational neuroscience.
    (08:18) Chen-Ping walked through his MS thesis at RIT, designing and implementing a computational model of neurons from the visual cortex's medial superior temporal area.
    (10:18) Chen-Ping talked about the academic culture shock of pursuing his Master's degree in Computer Science and Engineering at Penn State.
    (13:47) Chen-Ping walked through his MS thesis at Penn State, proposing a statistical asymmetry-based automatic brain tumor detection from 3D MR images.
    (18:35) Chen-Ping discussed the thread of his research as a Ph.D. student at Stony Brook, where he worked at the Computer Vision Lab and the Eye Cog Lab.
    (23:19) Chen-Ping unpacked his Ph.D. dissertation at Stony Brook, titled “Computational Models of Visual Features: From Proto-Objects to Object Categories.”
    (28:54) Chen-Ping went through his internship experience at Riverbed Technology and Shutterstock.
    (30:20) Chen-Ping dissected the development of a neuro-inspired deep convolutional neural network called Map-CNN for modeling human early visual information processing during his time as a Postdoc at Harvard's Cognitive and Neural Organization Lab.
    (32:14) Chen-Ping mentioned research areas at the intersection of computer vision and cognitive vision that he is excited about.
    (33:33) Chen-Ping shared the story behind the founding of Phiar with James Briscoe, an ex-classmate from RIT, and Ivy Lee, an ex-colleague from Shutterstock.
    (36:33) Chen-Ping discussed technical challenges with developing an ultra-lightweight Spatial AI engine that lets any vehicle perceive its surroundings using a camera and runs in real time at the edge on a commodity automotive computing platform.
    (39:36) Chen-Ping unpacked the key features of a complete Visual Mobility platform, including automobile integration, AR navigation, digitized environment, smart parking, 3rd-party integration, and reality-as-a-service.
    (41:16) Chen-Ping shared details around Phiar's ultra-efficient monocular depth estimation AI that runs on a mobile phone and achieves state-of-the-art accuracy on the KITTI benchmark dataset.
    (43:16) Chen-Ping revisited his experience going through Y Combinator in the summer of 2018.
    (44:27) Chen-Ping shared high-level fundraising advice for first-time founders.
    (46:30) Chen-Ping talked about strategies he found useful to identify the right client partnerships for Phiar.
    (48:10) Chen-Ping shared valuable hiring lessons learned at Phiar.
    (51:37) Chen-Ping reflected on the difference between being a researcher and a founder.
    (53:43) Closing segment.
    Chen-Ping's Contact Info: LinkedIn | Twitter | Google Scholar
    Phiar's Resources: Website | LinkedIn | Twitter | Facebook | YouTube | “Phiar Secures $12M Series A and Names Google Head of Android Automotive Platforms as CEO” (Sep 2021)
    Mentioned Content
    People: Fei-Fei Li; Yann LeCun; Yoshua Bengio
    Books and Papers: “Zero To One” (by Blake Masters and Peter Thiel); “Modeling Clutter Perception using Parametric Proto-object Partitioning” (NIPS 2013); “Modeling visual clutter perception using proto-object segmentation” (June 2014); “Searching for Category-Consistent Features: A Computational Approach to Understanding Visual Category Representation” (May 2016); “Generating the features for category representation using a deep convolutional neural network” (Sep 2016); “Map-CNN: A Convolutional Neural Network with Map-like Organizations” (Aug 2017); “Mid-level visual features underlie the high-level categorical organization of the ventral stream” (Sep 2018)

    Episode 81: Research, Engineering, and Product in Machine Learning with Aarti Bagul

    Play Episode Listen Later Jan 20, 2022 63:25


    Timestamps
    (02:00) Aarti shared her upbringing growing up in India and going to New York for her undergraduate studies.
    (04:47) Aarti recalled her academic experience getting dual degrees in Computer Science and Computer Engineering at New York University.
    (07:17) Aarti shared details about her involvement with the ACM chapter and the Women in Computing club at NYU.
    (10:46) Aarti shared valuable lessons from her research internships.
    (14:16) Aarti discussed her decision to pursue an MS degree in Computer Science at Stanford University.
    (20:27) Aarti reflected on what she learned as the Head Teaching Assistant for CS 230, one of Stanford's most popular Deep Learning courses.
    (23:59) Aarti shared her thoughts on ML applications in both clinical and administrative healthcare settings.
    (26:47) Aarti unpacked the motivation and empirical work behind CheXNet, an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists.
    (29:39) Aarti went over the implications of MURA, a large dataset of musculoskeletal radiographs containing over 40,000 images from close to 15,000 studies, for ML applications in radiology.
    (32:50) Aarti went over her experience working briefly as an ML engineer at Andrew Ng's startup Landing AI and applying ML to visual inspection tasks in manufacturing.
    (36:56) Aarti talked about her participation in external entrepreneurial initiatives such as Threshold Venture Fellowship and Greylock X Fellowship.
    (43:41) Aarti reminisced about her time in a hybrid ML engineer/product manager/VC associate role at AI Fund, which works intensively with entrepreneurs during their startups' most critical and risky phase from 0 to 1.
    (48:43) Aarti shared advice that AI Fund companies tended to receive regarding product-market fit and go-to-market fit strategy.
    (54:04) Aarti walked through her decision to join Snorkel AI, the startup behind the popular Snorkel open-source project capable of quickly generating training data with weak supervision.
    (56:36) Aarti reflected on the difference between being an ML researcher and an ML engineer.
    (01:00:18) Closing segment.
    Aarti's Contact Info: LinkedIn | Twitter | Google Scholar
    People: Andrew Ng; John Langford; David Sontag
    Books and Papers: “The Art of Doing Science & Engineering” (by Richard Hamming); “Deep Medicine: How AI Can Make Healthcare Human Again” (by Eric Topol); “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning” (Dec 2017); “MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs” (May 2018)

    Episode 80: Creating The Sense of Sight with Alberto Rizzoli

    Play Episode Listen Later Jan 14, 2022 69:49


    Timestamps
    (02:14) Alberto briefly shared his upbringing and education at the Bayes Business School in London.
    (04:01) Alberto shared key learnings from his first entrepreneurial stint at 19 by developing a 3D printing product for ed-tech.
    (07:48) Alberto described his overall experience participating in Singularity University's Graduate Studies Program at the NASA Ames Research Park under a Google-funded scholarship in 2015.
    (12:52) Alberto helped develop the Aipoly product to aid the blind and visually impaired.
    (17:38) Alberto showed his enthusiasm for federated learning applications within mobile devices.
    (19:53) Alberto talked about the dichotomy between capitalism and social good in entrepreneurship.
    (22:29) Alberto shared the backstory behind the founding of V7 Labs.
    (26:40) Alberto discussed the comparison between biological and artificial neural networks.
    (28:02) Alberto emphasized the importance of having a good co-founder.
    (30:27) Alberto dissected the notable features developed within V7's Annotation capability.
    (33:37) Alberto went over things to look for in a video labeling tool, citing his blog post.
    (37:21) Alberto unpacked key principles behind V7's robust Dataset Management tool.
    (40:53) Alberto walked through the powerful capabilities of V7 Neurons that power its Model Automation tool.
    (43:33) Alberto shared fundraising advice for founders seeking the right investors for their startups.
    (46:07) Alberto shared valuable hiring and culture-setting lessons learned at V7.
    (50:12) Alberto emphasized the importance of not losing sight of the “ideal customer” for young founders in the AI space.
    (53:01) Alberto shared the hurdles his team has to go through while finding new customers in new industries.
    (55:10) Alberto walked through labeling challenges dealing with medical imaging datasets.
    (57:35) Alberto discussed outreach initiatives that helped drive V7's organic growth.
    (59:49) Alberto mentioned the importance of collaboration between companies within the MLOps ecosystem.
    (01:02:01) Alberto touched on the scientific hunger of Europe regarding the adoption of AI technologies.
    (01:03:49) Alberto briefly mentioned what public recognition means to him in the pursuit of democratizing AI for the world.
    (01:06:07) Closing segment.
    Alberto's Contact Info: Website | LinkedIn | Twitter | Medium
    V7's Resources: Website | Software 2.0 Blog | Academy Tutorials | Documentation | LinkedIn | Twitter
    Mentioned Content
    Articles: “7 Things We Looked for in a Video Labeling Tool” (Aug 2020); “The Biggest Mistake I've Ever Made: Losing Sight of the Ideal Customer” (March 2021)
    Talks: “An AI Narrator for the Blind” (TEDx Geneva 2016); “If The Blind Could See” (TEDx Melbourne 2018)
    People: Geoff Hinton (for rethinking the ML field fundamentally); Chelsea Finn (for her work on meta-learning); Jeff Clune (for making agents that work at scale in the real world)
    Book: “Start With Why” (by Simon Sinek)
    Notes
    V7 is hiring across all departments. Take a look at their careers page for the openings!

    Episode 79: Analytics Culture, Digital Contracting, and Data Angels with Jessica Cherny

    Play Episode Listen Later Jan 7, 2022 64:59


    Timestamps
    (01:55) Jessica shared the formative experiences of her upbringing: being born a triplet with two sisters and growing up in an immigrant family from Russia.
    (05:45) Jessica shared her experience being part of UC Berkeley's first cohort of Data Science majors.
    (09:56) Jessica talked about her campus involvement with student-run organizations such as the Mobile Developers of Berkeley and the Data Science Society at Berkeley.
    (13:02) Jessica walked through her participation in initiatives like conducting research at CITRIS and the Banatao Institute, sitting on the leadership board of the TAMID Group, and being an Accel Scholar.
    (15:31) Jessica shared valuable lessons from her summer internships.
    (19:08) Jessica discussed her decision to join Ironclad, a Series D digital contracting startup building software to take legal teams to the next level.
    (22:39) Jessica provided a brief explanation of digital contracting for the uninitiated.
    (24:59) Jessica talked about challenges that in-house legal teams typically face and how Ironclad helps address them.
    (27:04) Jessica gave a tour of Ironclad's Contract Lifecycle Management software offerings.
    (30:00) Jessica walked through her journey of building the analytics function from scratch and providing data insights to inform business decisions cross-functionally.
    (33:40) Jessica shared tidbits about her time management and goal-setting systems.
    (34:45) Jessica walked through the end-to-end data analysis process for Ironclad's first legal analytics benchmark report analyzing economic trends caused by COVID-19.
    (38:38) Jessica discussed the learning curves as she took on bigger analytical responsibilities at Ironclad.
    (43:05) Jessica unpacked her 3-level framework for building a data analytics culture from the ground up.
    (48:07) Jessica shared concrete advice on positively influencing a company's culture to be data-driven.
    (50:27) Jessica unfolded the drive behind creating the Data Angels Community, a Slack group connecting women interested in data to resources for support, education, and opportunities.
    (52:25) Jessica revealed her community playbook to engage the members of Data Angels.
    (57:01) Jessica shared a bit of her guilty pleasure in using data for beauty and fashion.
    (01:00:44) Closing segment.
    Jessica's Contact Info: LinkedIn | Twitter | Data Angels
    Mentioned Content
    Resources: "How to use contract data during COVID-19" (Ironclad Report); "Building data analytics culture from the ground up" (Women In Product 2020 Talk); "Building a data-centered culture at Ironclad" (Ironclad Article)
    People: Emily Robinson and Jacqueline Nolis (Co-Authors and Co-Hosts of “How To Build A Career in Data Science” the book and the podcast); Cassie Kozyrkov (Chief Decision Scientist at Google); Shreya Shankar (Ph.D. Student at UC Berkeley and Entrepreneur-In-Residence at Amplify Partners) (Check out my interview with Shreya as well!)
    Book: “Everybody Lies: Big Data, New Data, and What The Internet Can Tell Us About Who We Really Are” (by Seth Stephens-Davidowitz)
    Notes
    My conversation with Jessica was recorded back in May 2021. Jessica is now a Senior Data Analyst, and Ironclad's Data Analytics team has grown to four, so she is no longer a one-woman show! Also, the Data Angels Slack community now has over 500 members!

    Episode 78: Open-Source Investing and Data Product Management with Julia Schottenstein

    Play Episode Listen Later Jan 3, 2022 43:55


    Timestamps
    (01:40) Julia shared the differences between growing up in New York and moving to San Francisco.
    (03:05) Julia discussed her overall undergraduate experience at Stanford, getting dual degrees in Computer Science and Management Science & Engineering.
    (05:40) Julia went over her time as an Investment Banker at Qatalyst Partners, notably working on Microsoft's acquisition of LinkedIn.
    (09:11) Julia talked about her career transition to venture capital, working as an associate investor at New Enterprise Associates.
    (10:46) Julia emphasized the importance of getting up-to-speed and forming an investment thesis as a new investor.
    (15:05) Julia discussed her Series A investment in Metabase, an open-source business intelligence software project.
    (18:36) Julia unpacked her investment(s) in Sentry, an application monitoring platform that helps developers monitor apps in real-time to catch bugs early.
    (20:14) Julia explained her investment in the Series B round for Anyscale, an end-to-end computing platform that makes building and managing a scaled application across clouds as easy as developing an app on a single computer.
    (23:03) Julia contextualized her investments in the seed round for Datafold, a data observability platform that equips analytics engineers with the tools to address data quality issues.
    (24:24) Julia shared typical hiring and go-to-market decisions that companies need to make (depending upon their growth stages and product strategies).
    (27:05) Julia mentioned her Metabase application to help investors pick winning open-source startups.
    (29:05) Julia rationalized her switch to becoming a product manager at dbt Labs.
    (30:34) Julia peeked into the roadmap of dbt Cloud, a hosted service that helps data analysts and engineers productionize dbt deployments.
    (33:34) Julia went over an under-invested area and the role of interoperability within the broader data tooling ecosystem.
    (37:56) Julia reflected on the difference between being a venture investor and a product manager.
    (41:05) Closing segment.
    Julia's Contact Info: LinkedIn | Twitter
    dbt's Resources: Slack Community | Coalesce 2021 Replays | dbt Learn | GitHub | Events and Meetups
    Mentioned Content
    People: Tristan Handy (Founder and CEO of dbt Labs); Ali Ghodsi (Co-Creator of Apache Spark, Co-Founder and CEO of Databricks); Dan Levine (General Partner at Accel Partners)
    Book: “Working Backwards: Insights, Stories, and Secrets from Inside Amazon” (by Bill Carr and Colin Bryar)
    Notes
    My conversation with Julia was recorded back in May 2021. Since the podcast was recorded, a lot has happened at dbt Labs! I'd recommend:
    Reading Julia's recent blog posts on adopting CI/CD and introducing Environment Variables in dbt Cloud.
    Watching the talk replays from Coalesce, dbt's 2nd annual analytics engineering conference.
    Listening to Season 1 of the Analytics Engineering Podcast, where Julia co-hosts with Tristan Handy to go deep into the hopes, dreams, motivations, and failures of leading data and analytics practitioners.

    Episode 77: Delivering Modern Data Engineering with Einat Orr

    Play Episode Listen Later Dec 27, 2021 75:49


    Timestamps
    (1:33) Einat described her experience getting Bachelor's, Master's, and Ph.D. degrees in Mathematics from Tel Aviv University in the 90s and early 2000s.
    (4:01) Einat went over her Ph.D. thesis on approximation algorithms for clustering problems.
    (6:17) Einat discussed working as an algorithm developer for Compugen while being a Ph.D. student.
    (8:43) Einat went over projects she contributed to as a senior algorithm developer at Flash Networks back in 2005.
    (11:50) Einat mentioned achievements and lessons learned from her time as the VP of R&D at Correlix.
    (17:51) Einat recalled lessons from hiring engineering talent at Correlix.
    (19:24) Einat unpacked the engineering challenges of building SimilarWeb, a platform that gives a true 360-degree view of all digital activity across customers, prospects, partners, and competition.
    (24:29) Einat discussed the responsibilities of her role as the CTO of SimilarWeb.
    (27:40) Einat shared the founding story of Treeverse, whose mission is to simplify the lives of data engineers, data scientists, and data analysts who are transforming the world with data.
    (29:52) Einat explained the pain points of working with the data lake architecture and the vision that lakeFS is built upon.
    (34:31) Einat emphasized the importance of asking good questions to extract insights about customers' pain points.
    (37:57) Einat explained why data versioning as an infrastructure matters.
    (42:28) Einat shared the challenges of incorporating data mesh to develop a data-intensive application.
    (46:33) Einat provided her take on how to ensure data quality in a data lake environment.
    (51:02) Einat discussed roadmap prioritization for an open-source project.
    (52:08) Einat went over the opportunities with the metadata store, data quality, compute, and data discovery components within the data engineering ecosystem.
    (55:03) Einat captured three trends in how the data engineering landscape might look in the near future.
    (01:00:59) Einat emphasized the role of open-source development in the data tooling ecosystem.
    (01:04:14) Einat fleshed out the recommended pricing strategy for open-source developers.
    (01:06:09) Einat revisited how lakeFS got started thanks to the Go community and evolved.
    (01:08:01) Einat shared valuable hiring lessons learned at Treeverse.
    (01:10:05) Einat described the state of the data community in Israel.
    (01:11:49) Closing segment.
    Einat's Contact Info: LinkedIn | Twitter | Email
    Mentioned Content
    lakeFS: Website | GitHub | @lakeFS | Treeverse | Slack
    Blog Posts: Why We Built lakeFS: Atomic and Versioned Data Lake Operations (Aug 2020); Data Versioning - Does It Mean What You Think It Means? (Aug 2020); How To Manage Your Data The Way You Manage Your Code (Oct 2020); Data Mesh Applied: How to Move Beyond The Data Lake with lakeFS (Dec 2020); Why Data Versioning as an Infrastructure Matters? (Dec 2020); Ensuring Data Quality in a Data Lake Environment (Jan 2021); The State of Data Engineering in 2021 (May 2021)
    People: Ali Ghodsi (Co-Creator of Apache Spark, Co-Founder and CEO of Databricks); Shay Banon (Co-Founder and CEO of Elastic); Gwen Shapira (Engineering Leader at Confluent)
    Book: “Designing Data-Intensive Applications” (by Martin Kleppmann)
    Notes
    My conversation with Einat was recorded back in April 2021. Since the podcast was recorded, a lot has happened at Treeverse! I'd recommend:
    Looking at their Series A announcement back in July.
    Reading Einat's recent articles on measuring data engineering teams, mapping data versioning projects, and finding a role model for lakeFS.
    Reviewing lakeFS's ongoing roadmap.
    Connecting with the lakeFS community by attending their upcoming events.

    Episode 76: Modern Data Collaboration and Social Entrepreneurship with Prukalpa Sankar

    Play Episode Listen Later Nov 29, 2021 78:29


    Timestamps(02:13) Prukalpa discussed her upbringing in India and studying Engineering at the Nanyang Tech University in Singapore.(03:52) Prukalpa shared the key learnings from her summer internship as an Investment Banking Analyst at Goldman Sachs.(05:37) Prukalpa went over the seed idea for SocialCops (Read her Quora answer on the fundraising story).(11:27) Prukalpa gave a brief overview of the business model at SocialCops.(12:45) Prukalpa unpacked her talk called “How Big Data Can Influence Decisions That Actually Matter” at TEDxGateway 2017 related to the data-for-good initiatives that SocialCops facilitated.(15:23) Prukalpa shared her thoughts on the future of the Data-for-Good movement.(17:49) Prukalpa discussed the challenges that SocialCops' data teams faced and the founding story behind Atlan.(21:38) Prukalpa went over the trust-based culture that enabled SocialCops' 8-member data team to build out India's National Data Platform.(27:00) Prukalpa dissected the six principles of Atlan's DataOps Culture Code.(31:37) Prukalpa unpacked the notion of Data Catalog 3.0, which is a key value prop of the Atlan platform.(36:01) Prukalpa provided the 3-level framework to ensure data quality (detect -> prevent -> cure) and strong practices to maintain high-quality data.(40:19) Prukalpa revealed the challenges that organizations face when starting their data governance initiatives.(45:35) Prukalpa talked about the under-invested building blocks of modern data platforms.(49:24) Prukalpa raised the importance of integration for Atlan to work well with the rest of the modern data stack.(50:39) Prukalpa recapped the trends that Chief Data Officers needed to watch out for in 2021.(54:01) Prukalpa gave fundraising advice for founders currently seeking the right investors for their startups.(58:42) Prukalpa discussed Atlan's outreach initiatives to engage with the broader data community actively.(01:01:03) Prukalpa went over Atlan's hiring philosophy based on the concept 
of People-as-a-Moat to attract, engage, and grow top talent — as inspired by the McKinsey advantage.(01:05:22) Prukalpa shared Atlan's Go-To-Market initiatives in the US this year and emphasized the importance of building an execution machine.(01:08:53) Prukalpa described the state of the data community in India.(01:10:25) Prukalpa shared entrepreneurship books that have deeply impacted her startup journey.(01:12:16) Prukalpa briefly mentioned what public recognition means to her in the pursuit of democratizing data for the world.(01:14:23) Closing segment.Prukalpa's Contact InfoLinkedInTwitterMentioned ContentAtlan (Twitter | LinkedIn | Facebook | Instagram | YouTube | Documentation)“Empowering Organizations to Become Masters of Their Data” (Video)Atlan Labs (Open-Source Projects)Humans of Data Interviews (Interviews)The DataOps Culture Code (Document)Building a Business Case for DataOps (EBook)The Data Catalog Primer (EBook)The Ultimate Guide to Evaluating a Data Catalog (EBook)Blog PostsVoices In The Head of a Middle-Class Aspiring Startup Founder (July 2013)SocialCops: What We Actually Do (Oct 2016)People-as-a-Moat: What Startups Can Learn From McKinsey About Building A Strong Company (Aug 2018)Going from Great People to Greater Teams: How We Think About Growth at Atlan (August 2018)Onwards and Upwards: Chapter 2 for SocialCops (July 2019)What is data quality? (Jan 2021)Top 5 Data Trends For CDOs to Watch Out For In 2021 (Feb 2021)Data Catalog 3.0: Modern Metadata for the Modern Data Stack (Feb 2021)We Failed To Setup a Data Catalog 3x. 
Here's Why (March 2021)The Building Blocks of a Modern Data Platform (March 2021)Data Governance Has a Serious Branding Problem (Nov 2021)Books“The Hard Thing About Hard Things” (by Ben Horowitz)“Hatching Twitter” (by Nick Bilton)“The McKinsey Way” (by Ethan Rasiel)“How Google Works” (by Eric Schmidt and Jonathan Rosenberg)“The Mom Test” (by Rob Fitzpatrick)“Disciplined Entrepreneurship” (by Bill Aulet)“Big Data” (by Mayer-Schönberger and Cukier)TalksGame of Life (TEDxIIMShillong — March 2014)How Big Data Can Influence Decisions That Actually Matter (TEDx Gateway — April 2017)Better Villages Through Big Data (TED Talks India — December 2017)The power of data science to measure unmeasured parameters in Emerging Markets (PyData Delhi — Oct 2019)The Girl Who Thinks In Numbers: Data Warrior Prukalpa Sankar (Feb 2020)NotesMy conversation with Prukalpa was recorded back in April 2021. Since the podcast was recorded, a lot has happened at Atlan! They raised a $16M Series A led by Insight Partners, with participation from Sequoia Capital, Waterbridge Ventures, and amazing angels such as the founding teams of Snowflake and Looker. They got mentioned in Gartner's Inaugural Market Guide for Active Metadata Management. They announced a partnership with Snowflake. Prukalpa has written more content. I'd recommend checking out: The series on metadata. The list of resources on the modern data stack. The behind-the-scenes look at how Postman's data team uses Atlan. The new way to think about data strategy using the Data Advantage Matrix.

    Episode 75: Commoditizing Data Integration Pipelines with Michel Tricot

    Play Episode Listen Later Nov 3, 2021 57:31


    Timestamps(01:58) Michel went over his education studying at EPITA — School of Engineering and Computer Science in France.(03:50) Michel mentioned his first US internship at Siemens Corporate Research as an R&D engineer.(05:48) Michel discussed the unique challenges of building systems to handle financial data through his engineering experience at FactSet Research Systems and Murex.(07:48) Michel talked about his move to San Francisco to work as a Senior Software Engineer at Rapleaf, focusing on scaling up data integration and data management pipelines.(10:40) Michel unpacked his work building the modern data stack at LiveRamp.(16:18) Michel shared valuable leadership and hiring lessons absorbed during his time as LiveRamp's Head of Data Integrations — fostering a strong culture of innovation and expanding the engineering organization significantly.(19:03) Michel dived into how to interview engineering talent for independence, autonomy, and communication.(20:56) Michel dissected the engineering architecture of the rideOS ride-hail platform (where he was a founding member and director of engineering).(26:03) Michel told the founding story of Airbyte, whose mission is to make data integration pipelines a commodity (+ the pivot that happened during Airbyte's time at Y Combinator).(32:10) Michel explained the pain points with existing data integration practices and the vision that Airbyte is moving towards.(35:07) Michel unpacked the analogy of Airbyte's approach to building a connector manufacturing plant, which is to think in onion layers.(39:13) Michel shared the challenges that are still hard for an open-source solution to address (Read his list of challenges that open-source and commercial software face to solve the data integration problem).(40:28) Michel discussed how to prioritize product roadmap while developing an open-source project.(41:59) Michel discussed pricing strategies for open-source projects (Airbyte's business models entail both self-hosted 
and hosted solutions).(44:17) Michel revealed the hurdles that Airbyte has overcome to find the early committers for their open-source project.(47:53) Michel shared valuable hiring lessons learned at Airbyte.(50:16) Michel shared fundraising advice for founders seeking the right investors for their startups.(52:41) Closing segment.Michel's Contact InfoTwitterLinkedInGitHubMentioned ContentAirbyte (Docs | Community | GitHub | Twitter | LinkedIn)HandbookRecipesCommunity CallOffice HoursConnector ContestBlog Posts“The Hard Things About Pivoting” (July 2020)“How Can We Commoditize Data Integration Pipelines” (Sep 2020)“How to Build Thousands of Connectors” (Oct 2020)“The Deck We Used to Raise Our Seed with Accel in 13 Days” (March 2021)PeopleJeremy Litz (Former CTO and Co-Founder of LiveRamp)Tristan Handy (CEO of dbt Labs and Editor of the Analytics Engineering Newsletter)Book“High-Growth Handbook” (by Elad Gil)NotesMy conversation with Michel was recorded back in April 2021. Since the podcast was recorded, a lot has happened at Airbyte! I'd recommend: Looking over the deck that they used to raise a $26M Series A led by Benchmark. Reading Michel's take on Airbyte's new OSS model and strategy to commoditize all data integration. Diving into Airbyte Cloud, a hosted service that takes all of the features of the open-source version and adds hosting and management, on top of a number of additional support options and enterprise features. Subscribing to Airbyte's newsletter called Weekly Bytes and exploring Airbyte Recipes.

    Episode 74: The Next Generation of Business Intelligence with Cindi Howson

    Play Episode Listen Later Oct 20, 2021 71:11


    Timestamps(02:03) Cindi briefly shared her early interest in writing and her decision to major in English at the University of Maryland in the mid-80s.(05:22) Cindi talked about her move to Zurich for a Business Systems Specialist role at Dow Chemical.(07:35) Cindi recalled the state of Business Intelligence tools and their adoption level in the enterprises during the mid-90s.(10:53) Cindi went over her decision to pursue an MBA from the Jones Business School at Rice University, in which her MBA Thesis was about how the Internet would reshape the first-generation BI tools.(16:30) Cindi discussed how she balanced academic study and parenthood during her MBA.(20:57) Cindi talked about her proudest accomplishments as a manager at Deloitte — building BI and analytics practice in Houston.(22:48) Cindi went over her time running her independent analyst firm BI Scorecard, which advised clients on BI and analytics tool selections via rigorous evaluation criteria.(26:14) Cindi brought up her time teaching classes at The Data Warehousing Institute, which educates business leaders on the proper deployment of data warehousing strategies and technologies.(27:49) Cindi mentioned her move to become the Vice President in data and analytics at Gartner.(30:33) Cindi walked through the end-to-end process of creating Gartner's Magic Quadrant for Analytics and BI Platforms and Critical Capabilities.(33:47) Cindi explained the culture of “Selfless Excellence” at ThoughtSpot — where she currently serves as a Chief Strategy Officer.(36:11) Cindi explained the concept of “What's In It For Me” (WIIFM) that helps bring a data-driven culture to organizations.(39:34) Cindi gave a tour of ThoughtSpot's core capabilities, ranging from SearchIQ and SpotIQ to ThoughtSpot One and ThoughtSpot Embrace.(43:04) Cindi broke down her responsibilities as a Chief Data Strategy Officer working with internal and external stakeholders.(44:40) Cindi emphasized the role of partnerships between startup 
vendors to empower the future of BI analytics (Read her article “A New Era in Analytics and BI”).(49:58) Cindi recapped takeaways from ThoughtSpot's ebook that presents 6 Top Trends and Predictions for Data, Analytics, and AI in 2021.(53:07) Cindi gave advice to companies that want to bring consumerization to enterprise analytics.(56:35) Cindi gave her two cents on the movement of Data For Good in the progress of analytics and AI in the near future.(58:58) Cindi recapped insights that she has observed from hosting The Data Chief Podcast (which features interviews with some of the most successful data leaders).(01:03:29) Cindi gave advice to female data practitioners in the early phase of their careers (Read her article on the challenges that keep women out of tech).(01:05:30) Closing segment.Cindi's ContactLinkedInTwitterThoughtSpotThe Data Chief PodcastMentioned ContentBlog Posts“Why I Joined ThoughtSpot” (April 2019)“A New Era in Analytics and BI” (August 2019)“Perfect Storm or Transformative Triumvirate: Data for Good, Data for Evil, and AI Ethics” (Nov 2019)“We Can Put a Man on the Moon, But We Can't Keep Women in Tech” (Sep 2019)6 Top Trends and Predictions for Data, Analytics, and AI in 2021 (2021 E-Book)Published Books“Successful Business Intelligence” (Nov 2013)“SAP BusinessObjects BI 4.0” (Nov 2012)Data for Good ResourcesDatakindMastercard Center for Inclusive GrowthCarnegie Mellon's Data Science for Social GoodViz for Social GoodWomen in Data ResourcesWomen in DataWomen in Big DataWomen in AnalyticsPeopleJoy Buolamwini (Computer Scientist and Digital Activist at MIT Media Lab, Founder of the Algorithmic Justice League)Cathy O'Neil (Author of “Weapons of Math Destruction”)Kate Strachnyi (Founder of DATAcated)Ralph Kimball (Original Architect of Data Warehousing)Ajeet Singh and Amit Prakash (Co-Founders of ThoughtSpot)Recommended Books“Moneyball” (by Michael Lewis)“Freakonomics” (by Steven Levitt and Stephen Dubner)NotesMy conversation with Cindi was 
recorded back in April 2021. Since the podcast was recorded, a lot has happened at ThoughtSpot:They unveiled their new vision for the Modern Analytics Cloud — a simple, actionable, and open approach to cloud analytics that's redefining how companies deliver value from across the entire modern data stack.They acquired Diyotta & Seekwell. With Diyotta, they're expanding the number of integrations with other cloud companies, while Seekwell gives customers the ability to operationalize insights by connecting analytics to other systems to trigger action.ThoughtSpot Everywhere launched as the first development platform to build interactive data apps with search and AI-driven analytics.Cloud growth. They announced major growth in their SaaS and cloud offerings, including their first 100 SaaS customers, 250% ARR growth from cloud products, and planned headcount growth.

    Episode 73: Datasets for Software 2.0 with Taivo Pungas

    Play Episode Listen Later Sep 26, 2021 77:54


    Timestamps(01:30) Taivo shared briefly about his experience going through the Estonian K-12 system, as argued in his blog post written in Estonian.(05:34) Taivo described his undergraduate experience studying Computer Science at the University of Tartu and being exposed to Machine Learning.(08:15) Taivo discussed his time interning at Skype and TransferWise.(10:01) Taivo went over his Master's Degree in Computer Science at ETH Zurich, where he worked on a thesis called "Uncertainty-based active imitation learning" at the Learning and Adaptive Systems Group.(17:17) Taivo talked about his time working at Starship Technologies as a Perception Engineer.(21:26) Taivo unpacked the Data Specification Manifesto that entails 3 principles for iteratively solving complex problems.(27:21) Taivo unpacked "The Two Loops Of Building Algorithmic Products" from his experience at Veriff - an Estonian startup that develops an identity verification platform.(32:11) Taivo discussed how his team at Veriff developed automation-heavy products.(36:45) Taivo shared lessons learned as a Product Manager at Veriff: leading the go-to-market strategy, establishing communication between the product and sales division, and building a unique DataOps team that creates good datasets.(44:31) Taivo described the key characteristics and properties of a tool that can address the whole data annotation workflow (Read his article "Data Loops Are The Bottleneck In Applied AI").(49:33) Taivo predicted the evolution of the DataOps discipline for AI teams in the upcoming years (Read his article "Your AI Team Needs DataOps").(54:01) Taivo untangled the relationship between sampling and labeling, and their importance in the AI development process (Read his article "Datasets Carve The Terrain of AI").(56:36) Taivo talked about the tools that he's most excited about during the transition to Software 2.0.(01:00:04) Taivo shared his journey thus far as the founder of a stealth startup.(01:06:21) Taivo revealed insider 
insights about the #EstonianMafia startup ecosystem.(01:09:36) Taivo shared the productivity tips that have been most useful to his personal/professional growth.(01:14:10) Closing segment.Taivo's ContactWebsiteTwitterLinkedInMediumGoogle ScholarMentioned ContentBlog PostsData Specification Manifesto"Building Automation-Heavy Products" (April 2019)"Data Loops Are The Bottleneck In Applied AI" (June 2019)"Your AI Team Needs DataOps" (July 2020)"Datasets Carve The Terrain of AI" (Nov 2020)Talks"The Two Loops Of Building Algorithmic Products" (April 2019)"How To Build Your AI Startup" (June 2020)"Datasets: The Source Code of Software 2.0" (Nov 2020)PeopleAndrej Karpathy (The Senior Director of AI at Tesla, who coined the term Software 2.0)Mike Bostock (The Creator of D3.js)Book"Surely You're Joking, Mr. Feynman" (by Richard Feynman)

    Episode 72: Folding Data with Gleb Mezhanskiy

    Play Episode Listen Later Sep 17, 2021 67:53


    Timestamps(01:42) Gleb shared briefly about his upbringing and studying Economics in university in Russia.(04:15) Gleb discussed his move to the US to pursue a Master of Information Systems Management at Carnegie Mellon University.(07:07) Gleb went over his summer internship as a Business Analyst at Autodesk.(08:40) Gleb shared the details of his project architecting data model/ETL pipelines as a PM at Autodesk.(11:34) Gleb unpacked the evolution of his career at Lyft — from an individual data analyst to a PM on data tooling and a high-impact project that he worked on.(16:54) Gleb shared valuable lessons from the experience of leading multiple cross-functional teams of engineers and growing the data organization significantly.(19:48) Gleb mentioned his time as a Product Manager at Phantom Auto, leading the development of a teleoperation product for autonomous vehicles over cellular networks.(25:28) Gleb emphasized the critical factors to consider when choosing a working environment: trusted managers/colleagues, maturity of tools/processes, and the function of data teams within the organization.(29:10) Gleb shared the story behind the founding of Datafold, whose mission is to help companies effectively leverage their data assets while making Data Engineering & Analytics a creative and enjoyable experience.(33:04) Gleb dissected the pain points with regression testing and the benefits of using Data Diff (Datafold's first product) for data engineers.(36:54) Gleb unpacked the data monitoring feature within Datafold's data observability platform.(39:45) Gleb discussed how to choose data warehousing solutions for your use cases (and made the distinction between data warehouse and data lake).(47:03) Gleb gave insights on the need for BI and data observability/quality management tools within the modern analytics stack.(50:40) Gleb emphasized the importance of tooling integration for Datafold's roadmap.(52:07) Gleb has been hosting Data Quality meetups to discuss the 
under-explored area of data quality.(54:02) Gleb shared his learnings from going through the YC incubator in summer 2020.(55:45) Gleb discussed the hurdles he had to jump through to find early customers of Datafold.(57:47) Gleb emphasized valuable lessons he has learned to attract the right people who are excited about Datafold's mission.(59:17) Gleb shared his advice for founders who are in the process of finding the right investors for their companies.(01:02:11) Closing segment.Gleb's Contact InfoLinkedInDatafold (Twitter and LinkedIn)Data Quality MeetupsMentioned ContentCourseHarvard's CS50: Introduction to Computer ScienceBlog PostsModern Analytics Stack (June 2020)Choosing Data Warehouse for Analytics (June 2020)3 Ways To Be Wrong About Open-Source Data Warehousing Software (June 2020)Buy Not Build (Aug 2020)Datafold Raises a $2.1M Seed Round Led by NEA (Nov 2020)Datafold + dbt: The Perfect Stack for Reliable Data Pipelines (Feb 2021)PeopleMaxime Beauchemin (Founder and CEO at Preset, creator of Apache Superset and Apache Airflow)Tobias Macey (Host of the Data Engineering Podcast)Books“How To Measure Anything” (by Douglas Hubbard)“Lean Analytics” (by Benjamin Yoskovitz and Alistair Croll)NotesMy conversation with Gleb was recorded back in March 2021. Since the podcast was recorded, a lot has happened at Datafold! I'd recommend:Reading Gleb's open-source edition of the modern data stack.Listening to Gleb's appearance on the Data Engineering podcast.Watching the lightning talks and panel discussions from recent Data Quality meetups number 4 and number 5.

    Episode 71: Trusted AI with Saishruthi Swaminathan

    Play Episode Listen Later Sep 9, 2021 73:46


    Timestamps(01:59) Saishruthi talked about her upbringing, growing up in a rural town in India with no Internet connection and no computers.(05:50) Saishruthi discussed her undergraduate studying Electrical Engineering at Sri Sairam Engineering College in the early 2010s.(11:56) Saishruthi mentioned the projects and learnings during her two years working at Tata Consultancy Services as an instrumentation engineer.(15:57) Saishruthi went over her MS degree in Electrical Engineering at San Jose State University and her journey into data science.(22:20) Saishruthi shared the initial hurdles she faced transitioning back to school and assimilating to the US culture.(26:10) Saishruthi touched on her work with San Jose City on disaster management.(28:20) Saishruthi went over her job search process, eventually landing a data science position at IBM.(32:16) Saishruthi unpacked lessons learned from public speaking.(35:20) Saishruthi summarized IBM's data science and machine learning initiatives.(37:02) Saishruthi brought up various projects happening at IBM's Center for Open Source Data and AI Technologies, whose mission is to make open-source AI models dramatically easier to create, deploy, and manage in the enterprise.(39:40) Saishruthi unpacked the qualities needed to contribute to open-source projects and their role in shaping the development of ML technologies.(44:50) Saishruthi dissected examples of bias in ML, identified solutions to combat unwanted bias, and presented tools for that (as delivered in her talk titled “Digital Discrimination: Cognitive Bias in Machine Learning”).(49:12) Saishruthi shared her thoughts on the evolution of research and applications within the Trusted AI landscape.(54:07) Saishruthi discussed the core value propositions of IBM's Elyra, a set of AI-centric extensions to JupyterLab that aims to help data practitioners deal with the complexities of the model development lifecycle.(56:11) Saishruthi briefly shared the challenges with 
developing Coursera courses on data visualization with Python and with R.(01:00:47) Saishruthi went over her passion for movements such as Women In Tech and Girls Who Code.(01:03:27) Saishruthi shared details about her initiative to bring education to rural children.(01:06:36) Closing segment.Saishruthi's Contact InfoTwitterLinkedInMediumGitHubCourseraMentioned ContentTalks“Digital Discrimination: Cognitive Bias in Machine Learning” (All Things Open 2020)ProjectsAI Fairness 360AI Explainability 360Adversarial Robustness ToolkitModel Asset ExchangeData Asset ExchangeElyraCoursesData Visualization with PythonData Visualization with R

    Episode 70: Machine Learning Testing with Mohamed Elgendy

    Play Episode Listen Later Aug 30, 2021 62:14


    Timestamps(01:44) Mohamed described his interest growing up in Egypt and studying Biomedical Engineering at Cairo University in the early 2000s.(04:22) Mohamed commented on his experience moving to the US to pursue an MBA degree and working in various software engineering roles.(07:35) Mohamed shared his experience authoring two books: (1) 3D Business Analyst: The Ultimate Hands-On Guide to Mastering Business Analysis and (2) Business Analysis for Beginners: Jump-Start Your BA Career in 4 Weeks.(13:19) Mohamed discussed his move to the Bay Area for a Senior Engineering Manager role at Twilio, managing and shipping a series of communication API products using Machine and Deep Learning.(17:39) Mohamed dissected engineering challenges building ML systems at Amazon, alongside key leadership lessons he acquired from managing Amazon's Kindle mobile and ML engineering teams.(20:50) Mohamed shared his insider perspective on Amazon's practices of customer obsession, working backward, and disagree-to-commit.(24:52) Mohamed mentioned the benefits of teaching a computer vision course for engineers at Amazon's internal Machine Learning university.(28:33) Mohamed went over the engineering (hardware + software) and ML challenges associated with building a proprietary threat detection platform at Synapse Tech Corporation (where he was the Head of Engineering).(32:03) Mohamed shared concrete technical challenges with building an ML system that performs inference on edge devices.(37:03) Mohamed revealed specific data labeling challenges while building the ML system at Synapse.(39:57) Mohamed went over his one year as the VP of Engineering for the AI Platform at Rakuten, when he incubated the idea for Kolena.(42:52) Mohamed explained the current state of ML testing infrastructure and unpacked his current project Kolena, a rigorous ML QA platform that lets users take control of their ML testing.(49:07) Mohamed has been collaborating with a few institutions, podcasters, and ML 
influencers to raise awareness of the importance of ML testing and different approaches to tackle the problem.(50:12) Mohamed touched on his side hustles working with Intel on autonomous drones and teaching content with Udacity's AI Nanodegree programs.(53:07) Mohamed dissected his project Mowgly, an educational platform with tracks curated by industry experts to guide users to master specific topics.(54:58) Mohamed described his experience authoring a book with Manning in 2020 called “Deep Learning For Vision Systems.”(58:51) Closing segment.Mohamed's Contact InfoLinkedInTwitterWebsiteYouTubeGitHubKolenaMentioned ContentPeopleAndrew Trask (Leader at OpenMined, Senior Research Scientist at DeepMind, Ph.D. Student at the University of Oxford)Francois Chollet (Senior Software Engineer at Google, Creator of Keras)Lex Fridman (Host of the popular Lex Fridman Podcast, AI Researcher working on autonomous vehicles and human-robot interaction at MIT)Books“Mindset” (by Carol Dweck)“Outliers” (by Malcolm Gladwell)NotesMy conversation with Mohamed was recorded back in March 2021. Here are some updates that Mohamed shared with me since then: Kolena is an ML testing and validation platform that enables teams to implement testing best practices to rigorously test their models' behavior and ship high-quality ML products much faster. Mohamed and his team have signed a couple of big enterprise customers and raised a large seed round from top-tier investors and almost every industry leader in the AI space. These were strong signals that Kolena is solving a very important problem! Mohamed's first impression of the market: it is hungry for a reliable testing platform for models. Kolena has quite a waitlist and plans to launch early next year.

    Episode 69: DataPrepOps, Active Learning, and Team Management with Jennifer Prendki

    Play Episode Listen Later Aug 16, 2021 88:14


    Show Notes(01:46) Jennifer shared her formative experiences growing up in France and wanting to be a physicist.(03:04) Jennifer unpacked the evolution of her academic journey in France — getting Physics degrees at Louis Pasteur University, Paris-Sud University, and Sorbonne University.(06:44) Jennifer mentioned her time as a Postdoctoral Researcher in Neutrino Physics at Duke University, where her research group lacked the funding to carry on scientific projects.(09:35) Jennifer discussed her transition from academia to industry, working as a Quantitative Research Scientist at Quantlab Financial in Houston.(13:31) Jennifer went over her move to the Bay Area, working for YuMe and Ayasdi — growing and managing early-stage data science teams at both places.(19:19) Jennifer recalled her foray into becoming a Senior Data Science Manager of the Search team at Walmart Labs. She managed the Metrics-Measurements-Insights team and the Store-Search team.(23:59) Jennifer shared the business anecdote that made her obsessed with measuring the ROI of data science.(28:46) Jennifer reflected on the opportunity to give conference talks and become a thought leader in the data science community (watch her first industry talk, “Review Analysis: An Approach to Leveraging User-Generated Content in the Context of Retail” at MLconf 2016).(31:10) Jennifer unpacked her interest in active learning and outlined existing challenges of making active learning performant in real-world ML systems.(36:58) After 1.5 years with Walmart Labs, Jennifer became the Chief Data Scientist at Atlassian. 
She shared the tactics to grow the Search & Smarts team of scientists and engineers from 3 to 17 people in less than 6 months across 3 locations.(40:31) Jennifer discussed the organizational and operational challenges with making ML useful in enterprises and the importance of data preparation in the modern ML stack.(47:24) Jennifer elaborated on the topic of “Agile for Data Science Teams,” which discusses that organizations that invest in ML but do not get the organizational side of things right will fail.(53:09) Jennifer went over her decision to accept a VP of Machine Learning role at Figure Eight, then a frontier startup that offers enterprise-grade labeling solutions to ML teams.(57:56) Jennifer went over the inception of her startup Alectio, whose mission is to help companies do ML more efficiently with fewer data and help the world do ML more sustainably by reducing the industry's carbon footprint.(01:04:32) Jennifer unpacked her 4-part blog series about responsible AI that calls out the need to fight bias, increase accessibility, and create more opportunities in AI.(01:09:06) Jennifer discussed the hurdles she had to jump through to find early adopters of Alectio.(01:11:03) Jennifer emphasized the valuable lessons learned to attract the right people who are excited about Alectio's mission.(01:14:38) Jennifer cautioned the danger of taking advice without thinking through how it can be applied to one's career.(01:17:09) Jennifer condensed her decade of experience navigating the tech industry as a woman into concrete advice.(01:19:19) Closing segment.Jennifer's Contact InfoLinkedInTwitterMediumAlectio's ResourcesWebsiteTwitterLinkedInWhat Is Alectio? (Video)Is Big Data Dragging Us Towards Another AI Winter? 
(Article)Mentioned ContentTalksThe Day Big Data Died (Oct 2020 @ Interop Digital)The Importance of Ethics in Data Science (Keynote @ Women in Analytics Conference 2019)Introduction to Active Learning (ODSC London 2018)Agile for Data Science Teams (Strata Data Conf — New York 2018)Big Data and the Advent of Data Mixology (Interop ITX — The Future of Data Summit 2017)The Limitations of Big Data In Predictive Analytics (DataEngConf SF 2017)Review Analysis: An Approach to Leveraging User-Generated Content in the Context of Retail (MLconf 2016)Articles1 — Women vs. The Workplace SeriesGender Discrimination (Oct 2015)Why Leading By Example Matters (Jan 2017)Data Scientist: the SexISTiest Job of the 21st Century? (Feb 2017)The Role of Motherhood in Gender Discrimination (March 2017)The Biggest Challenges of the Female Manager (May 2017)Parity in the Workplace: Why We Are Not There Yet (Dec 2017)The Pyramid of Needs of Professional Women (Dec 2017)2 — Management SeriesThe Secrets to Successfully Managing an Underperformer (June 2017)The Top Secrets to Managing a Rockstar (July 2017)The Real Cost of Hiring Over-Qualified Candidates in Technology (March 2018)Team Culture (May 2018)3 — Responsible AI SeriesHow We Got Responsible AI All Wrong (Part 1)Impact, Bias, and Sustainability in AI (Part 2)Increasing Accessibility to AI (Part 3)Creating More Opportunities in AI (Part 4)Book“Managing Up” (by Rosanne Badowski and Roger Gittines)NotesJennifer told me that Alectio is about to launch a community version this fall, in which people will be able to compete to get the best model with the minimum amount of data. Be sure to check out their blog and follow them on LinkedIn!

    Episode 68: Threat Intelligence, Venture Stamina, and Data Investing with Sarah Catanzaro

    Play Episode Listen Later Jul 14, 2021 76:06


    Show Notes(01:48) Sarah talked about the formative experiences of her upbringing: growing up interested in the natural sciences and switching her focus to terrorism analysis after experiencing the 9/11 tragedy with her own eyes.(04:07) Sarah discussed her experience studying International Security Studies at Stanford and working at the Center for International Security and Cooperation.(07:15) Sarah recalled her first job out of college as a Program Director at the Center for Advanced Defense Studies — collaborating with academic researchers to develop computational approaches that counter terrorism and piracy.(09:48) Sarah went over her time as a cyber-intelligence analyst at Cyveillance, which provided threat intelligence services to enterprises worldwide.(12:22) Sarah walked through her time at Palantir as an embedded analyst, where she observed the struggles that many agencies had with data integration and modeling challenges.(15:26) Sarah unpacked the challenges of building out the data team and applying the data work at Mattermark.(20:15) Sarah shared her opinion on the career trajectory for data analysts and data scientists, given her experience as a manager for these roles.(23:43) Sarah shared the power of having a peer group and building a team culture that she was proud of at Mattermark.(26:41) Sarah joined Canvas Ventures as a Data Partner in 2016 and shared her motivation for getting into venture capital.(29:47) Sarah revealed the secret sauce to succeed in venture — stamina.(32:00) Sarah has been an investor at Amplify Partners since 2017 and shared what attracted her to the firm's investment thesis and the team.(35:28) Sarah walked through the framework she used to prove her value upfront as the new investor at Amplify.(38:35) Sarah shared the details behind her investment in the Series A round for OctoML, a Seattle-based startup that leverages Apache TVM to enable their clients to simply, securely, and efficiently deploy any model on any hardware 
backend.(44:39) Sarah dissected her investment in the seed round for Einblick, a Boston-based startup that builds a visual computing platform for BI and analytics use cases.(48:45) Sarah mentioned the key factors inspiring her investment in the seed round for Metaphor Data, a metadata platform that grew out of the DataHub open-source project developed at LinkedIn.(53:57) Sarah discussed what triggered her investment in the Series A round for Runway, a New York-based team building the next-generation creative toolkit powered by machine learning.(58:36) Sarah unpacked the advice she has been giving her portfolio companies in hiring decisions and expanding their founding team (and advice they should ignore).(01:01:29) Sarah went over the process of curating her weekly newsletter called Projects To Know (active since 2019).(01:05:00) Sarah predicted the 3 trends in the data ecosystem that will have a disproportionately huge impact in the future.(01:11:15) Closing segment.Sarah's Contact InfoAmplify PageTwitterLinkedInMediumAmplify Partners' ResourcesWebsiteTeamPortfolioBlogMentioned ContentBlog PostsOur Investment in OctoMLAnnouncing Our Investment in EinblickOur Investment in Metaphor DataOur Series A Investment in RunwayPeopleSunil Dhaliwal (General Partner at Amplify Partners)Mike Dauber (General Partner at Amplify Partners)Lenny Pruss (General Partner at Amplify Partners)Mike Volpi (Co-Founder and Partner at Index Ventures)Gary Little (Co-Founder and General Partner at Canvas Ventures)Book“Zen and the Art of Motorcycle Maintenance” (by Robert Pirsig)New UpdatesSince the podcast was recorded, Sarah has been keeping her stamina high!Her investments in Hex (data workspace for teams) and Meroxa (real-time data platform) have been made public.She has also spoken at various panels, including SIGMOD, REWORK, University of Chicago, and Utah Nerd Nights.Be sure to follow @sarahcat21 on Twitter to subscribe to her brain on the intersection of data, VC, and startups!

    Episode 67: Model Observability, AI Bias, and ML Infrastructure Ecosystem with Aparna Dhinakaran

    Play Episode Listen Later Jun 28, 2021 48:11


    Show Notes(01:39) Aparna talked about her Bachelor's degree in Electrical Engineering and Computer Science at UC Berkeley.(02:50) Aparna shared her undergraduate research experience at the Energy and Sustainable Technologies lab.(04:34) Aparna discussed valuable lessons learned from her industry internships at TubeMogul and compared the objective with that of a research environment.(08:26) Aparna then joined Uber as a software engineer on the Marketplace Forecasting team, where she led the development of Uber's first model lifecycle management system for running ML model computations at scale to power Uber's dynamic pricing algorithms.(12:40) Aparna talked about how she became interested in model monitoring while working on Uber's model store.(17:29) Aparna discussed her decision to join the Ph.D. program in Computer Vision at Cornell University, focusing on bias in models, after spending 3 years at Uber.(23:40) Aparna shared the backstory behind co-founding MonitorML with her brother Eswar and going through the 2019 summer batch of Y Combinator.(26:47) Aparna discussed the acquisition of MonitorML by Arize AI, where she's currently the Chief Product Officer.(28:41) Aparna unpacked the key insights in her ongoing ML Observability blog series, which argues that model observability is the foundational platform that empowers teams to continually deliver and improve results from the lab to production.(33:17) Aparna shared her verdict for the ML tooling ecosystem in the upcoming years from her in-depth exploration of ML infrastructure tools covering data preparation, model building, model validation, and model serving.(37:01) Aparna briefly shared the challenges encountered to get the first cohort of customers for Arize.(39:23) Aparna went over valuable lessons to attract the right people who are excited about Arize's mission.(41:04) Aparna shared her advice for founders who are in the process of finding the right investors for their companies.(42:24) Aparna reasoned how 
participating in The Amazing Race was similar to running a startup.(44:59) Closing segment.Aparna's Contact InfoTwitterLinkedInMediumForbes ColumnWebsiteGithubGoogle ScholarArize's ResourcesWebsiteMediumLinkedInTwitterMentioned ContentBlog PostsML Infrastructure Tools for Data Preparation (May 2020)ML Infrastructure Tools for Model Building (May 2020)ML Infrastructure Tools for Production (Part 1) (May 2020)ML Infrastructure Tools for Production (Part 2) (Sep 2020)ML Infrastructure Tools — ML Observability (Feb 2021)The Model's Shipped — What Could Possibly Go Wrong? (Feb 2021)PeopleRediet Abebe (Assistant Professor of Computer Science at UC Berkeley and Junior Fellow at the Harvard Society of Fellows)Timnit Gebru (Founder of Black in AI, Ex-Research Scientist at Google)Serge Belongie (Professor of Computer Science at Cornell and Aparna's past Ph.D. advisor)Solon Barocas (Principal Researcher at Microsoft Research and Adjunct Assistant Professor of Information Science at Cornell)Manish Raghavan (Ph.D. 
candidate in the Computer Science department at Cornell)Kate Crawford (Principal Researcher at Microsoft Research and Co-founder/Director of research at NYU's AI Now Institute)Book“The Hard Thing About Hard Things” (by Ben Horowitz)New UpdatesSince the podcast was recorded, a lot has happened at Arize AI!Aparna has continued writing the ML observability series: The Playbook to Monitor Your Model's Performance in Production (March 2021) and Beyond Monitoring: The Rise of Observability (May 2021).Arize has been recognized in Forbes's AI 50 2021: Most Promising AI Companies.Aparna has also contributed various articles to Forbes, from the Chronicles of AI Ethics and Q&A with Ethics researchers to a list of Women in AI to watch and emerging ML tooling categories.

    Episode 66: Monitoring Models in Production with Emeli Dral

    Play Episode Listen Later Jun 9, 2021 46:16


    Show Notes(02:07) Emeli shared her educational background getting degrees in Applied Mathematics and Informatics from the Peoples' Friendship University of Russia in the early 2010s.(04:24) Emeli went over her experience getting a Master's Degree at Yandex School of Data Analysis.(07:06) Emeli reflected on lessons learned from her first job out of university working as a Software Developer at Rambler, one of the biggest Russian web portals.(09:33) Emeli walked through her first year as a Data Scientist developing e-commerce recommendation systems at Yandex.(13:38) Emeli discussed core projects accomplished as the Chief Data Scientist at Yandex Data Factory, Yandex's end-to-end data platform.(17:52) Emeli shared her learnings transitioning from an IC to a manager role.(19:21) Emeli mentioned key components of success for industrial AI, given her time as the co-founder and Chief Data Scientist at Mechanica AI.(22:40) Emeli dissected the makings of her Coursera specializations — “Machine Learning and Data Analysis” and “Big Data Essentials.”(26:14) Emeli discussed her teaching activities at Moscow Institute of Physics and Technology, Yandex School of Data Analysis, Harbour.Space, and Graduate School of Management — St. 
Petersburg State University.(30:12) Emeli shared the story behind the founding of Evidently AI, which is building a human interface to machine learning, so that companies can trust, monitor, and improve the performance of their AI solutions.(32:32) Emeli explained the concept of model monitoring and exposed the monitoring gap in the enterprise (read Part 1 and Part 2 of the Monitoring series).(34:13) Emeli looked at possible data quality and integrity issues while proposing how to track them (read Part 3, Part 4, and Part 5 of the Monitoring series).(36:47) Emeli revealed the pros and cons of building an open-source product.(39:13) Emeli talked about prioritizing product roadmap for Evidently AI.(41:24) Emeli described the data community in Moscow.(42:03) Closing segment.Emeli's Contact InfoLinkedInTwitterCourseraGitHubMediumEvidently AI's ResourcesWebsiteTwitterLinkedInGitHubDocumentationMentioned ContentBlog PostsML Monitoring, Part 1: What Is It and How It Differs? (Aug 2020)ML Monitoring, Part 2: Who Should Care and What We Are Missing? (Aug 2020)ML Monitoring, Part 3: What Can Go Wrong With Your Data? (Sep 2020)ML Monitoring, Part 4: How To Track Data Quality and Data Integrity? (Oct 2020)ML Monitoring, Part 5: Why Should You Care About Data And Concept Drift? (Nov 2020)ML Monitoring, Part 6: Can You Build a Machine Learning Model to Monitor Another Model? (April 2021)Courses“Machine Learning and Data Analysis”“Big Data Essentials”PeopleYann LeCun (Professor at NYU, Chief AI Scientist at Facebook)Tomas Mikolov (the creator of Word2Vec, ex-scientist at Google and Facebook)Andrew Ng (Professor at Stanford, Co-Founder of Google Brain, Coursera, and Landing AI, Ex-Chief Scientist at Baidu)Book“The Elements of Statistical Learning” (by Trevor Hastie, Robert Tibshirani, and Jerome Friedman)New UpdatesSince the podcast was recorded, a lot has happened at Evidently! 
You can use this open-source tool (https://github.com/evidentlyai/evidently) to generate a variety of interactive reports on the ML model performance and integrate it into your pipelines using JSON profiles. This monitoring tutorial is a great showcase of what can go wrong with your models in production and how to keep an eye on them: https://evidentlyai.com/blog/tutorial-1-model-analytics-in-production.

    Episode 65: Chaos Theory, High-Frequency Trading, and Experimentations at Scale with David Sweet

    Play Episode Listen Later May 30, 2021 56:47


    Show Notes(01:59) David recalled his undergraduate experience studying Physics and Mathematics at Duke University back in the early 90s.(05:55) David reflected on his decision to pursue a Ph.D. in Physics at the University of Maryland, College Park, specializing in Nonlinear Dynamics and Chaos Theory.(10:18) David unpacked his Nature paper called “Topology in Chaotic Scattering.”(14:43) David went over his two papers on fractal dimensions in higher-dimensional chaotic scattering following his Nature publication.(21:42) David talked about his project K Desktop Environment, which provides a free, user-friendly desktop for Linux/UNIX systems (later turned into a print book with Macmillan Publishing in 2000).(24:20) David explained the premise behind his work on Andamooka, a site that supports open content.(27:24) David walked through his time as a quantitative analyst at Thales Fund Management after finishing his Ph.D.(30:50) David discussed his 4-year stint at Lehman Brothers — moving up the ladder into a Vice President role, up until Barclays Capital acquired it.(33:24) David talked about his proudest accomplishment during the 5-year stint as a headdesk in equities trader at KCG/GETCO.(35:37) David shared war stories while working at an investment firm called Teza Technologies and co-founding Galaxy Digital Trading (specializing in cryptocurrency trading).(41:34) David unpacked key concepts covered in his guest lectures on optimization of high-frequency trading systems at NYU Stern School of Business.(44:26) David explained his career change to work as a Machine Learning Engineer at Instagram in the summer of 2019.(47:17) David briefly mentioned his transition back to a quant trader role at 3Red Partners.(48:05) David is writing a technical book with Manning called “Tuning Up,” which provides a toolbox of experimental methods that will boost the effectiveness of machine learning systems, trading strategies, infrastructure, and more.(50:48) David reflected on the 
benefits of his physics academic background for his quant analyst career.(52:27) Closing segment.David's Contact InfoWebsiteLinkedInTwitterMentioned ContentPublications"Topology In Chaotic Scattering" (Nature, May 1999)"Fractal Dimension of Higher-Dimensional Chaotic Repellors" (June 1999)"Fractal Basin Boundaries in Higher-Dimensional Chaotic Scattering"Book“The Elements of Statistical Learning” (by Trevor Hastie, Robert Tibshirani, and Jerome Friedman)PeopleJim Simons (Founder of Renaissance Technologies)Michael Kearns (Professor at the University of Pennsylvania, previously leading Morgan Stanley's AI Center of Excellence)Vasant Dhar (Professor at NYU Stern School of Business, Founder of SCT Capital)Tuning Up — From A/B testing to Bayesian optimizationManning's permanent 40% discount code (good for all Manning products in all formats) for Datacast listeners: poddcast19.You can refer to this link: http://mng.bz/4MAR.Here are two free eBook codes to get copies of Tuning Up for two lucky Datacast listeners: tngdtcr-AB2C and tngdtcr-6D43You can refer to this link: http://mng.bz/G6Bq.

    Episode 64: Improving Access to High-Quality Data with Fabiana Clemente

    Play Episode Listen Later May 18, 2021 56:21


    Show Notes(02:06) Fabiana talked about her Bachelor’s degree in Applied Mathematics from the University of Lisbon in the early 2010s.(04:18) Fabiana shared lessons learned from her first job out of college as a Siebel and BI Developer at Novabase.(05:13) Fabiana discussed unique challenges while working as an IoT Solutions Architect at Vodafone.(09:56) Fabiana mentioned projects she contributed to as a Data Scientist at startups such as ODYSAI and Habit Analytics.(12:44) Fabiana talked about the two Master’s degrees she got while working in the industry (Applied Econometrics from Lisbon School of Economics and Management and Business Intelligence from NOVA IMS Information Management School).(14:41) Fabiana distinguished between data science and business intelligence.(18:01) Fabiana shared the founding story of YData, the first data-centric platform with synthetic data, where she is currently the Chief Data Officer.(21:32) Fabiana discussed different techniques to generate synthetic data, including oversampling, Bayesian Networks, and generative models.(24:01) Fabiana unpacked the key insights in her blog series on generating synthetic tabular data.(29:40) Fabiana summarized novel design and optimization techniques to cope with the challenges of training GAN models.(33:44) Fabiana brought up the benefits of using Differential Privacy as a complement to synthetic data generation.(38:07) Fabiana unpacked her post “The Cost of Poor Data Quality,” where she defined data quality in terms of factors such as accuracy, completeness, consistency, reliability, and above all, whether the data is up to date.(42:11) Fabiana explained the important role that data quality plays in ensuring model explainability.(44:57) Fabiana reasoned about YData’s decision to pursue the open-source strategy.(47:47) Fabiana discussed her podcast called “When Machine Learning Meets Privacy” in collaboration with the MLOps Slack community.(49:14) Fabiana briefly shared 
the challenges encountered to get the first cohort of customers for YData.(50:12) Fabiana went over valuable lessons to attract the right people who are excited about YData’s mission.(51:52) Fabiana shared her take on the data community in Lisbon and her effort to inspire more women to join the tech industry.(53:47) Closing segment.Fabiana’s Contact InfoLinkedInMediumTwitterYData’s ResourcesWebsiteGithubLinkedInTwitterAngelListSynthetic Data CommunityMentioned ContentBlog PostsSynthetic Data: The Future Standard for Data Science Development (April 2020)Generating Synthetic Tabular Data with GANs — Part 1 (May 2020)Generating Synthetic Tabular Data with GANs — Part 2 (May 2020)What Is Differential Privacy? (May 2020)What Is Going On With My GAN? (July 2020)How To Generate Synthetic Tabular Data? Wasserstein Loss for GANs (Sep 2020)The Cost of Poor Data Quality (Sep 2020)How Can I Explain My ML Models To The Business? (Oct 2020)Synthetic Time-Series Data: A GAN Approach (Jan 2021)Podcast“When Machine Learning Meets Privacy”PeopleJean-Francois Rajotte (Resident Data Scientist at the University of British Columbia)Sumit Mukherjee (Associate Professor of Statistics at Columbia University)Andrew Trask (Leader at OpenMined, Research Scientist at DeepMind, Ph.D. Student at the University of Oxford)Théo Ryffel (Co-Founder of Arkhn, Ph.D. Student at ENS and INRIA, Leader at OpenMined)Recent Announcements/ArticlesPartnerships with UbiOps and AlgorithmiaThe rise of DataPrepOps (March 2021)From model-centric to data-centric (March 2021)

    Episode 63: Real-World Transfer Learning with Azin Asgarian

    Play Episode Listen Later May 6, 2021 66:00


    Show Notes
    (02:06) Azin described her childhood growing up in Iran and going to a girls-only high school in Tehran designed specifically for extraordinary talents.
    (05:08) Azin went over her undergraduate experience studying Computer Science at the University of Tehran.
    (10:41) Azin shared her academic experience getting a Computer Science MS degree at the University of Toronto, supervised by Babak Taati and David Fleet.
    (14:07) Azin talked about her teaching assistant experience for a variety of CS courses at Toronto.
    (15:54) Azin briefly discussed her 2017 report titled “Barriers to Adoption of Information Technology in Healthcare,” which takes a systems-thinking perspective to identify barriers to the application of IT in healthcare and outline the solutions.
    (19:35) Azin unpacked her MS thesis called “Subspace Selection to Suppress Confounding Source Domain Information in AAM Transfer Learning,” which explores transfer learning in the context of facial analysis.
    (28:48) Azin discussed her work as a research assistant at the Toronto Rehabilitation Institute, working on a research project that addressed algorithmic biases in facial detection technology for older adults with dementia.
    (33:02) Azin has been an Applied Research Scientist since 2018 at Georgian, a venture capital firm in Canada that focuses on investing in companies operating in the IT sectors.
    (38:20) Azin shared the details of her initial Georgian project to develop a robust and accurate injury prediction model using a hybrid instance-based transfer learning method.
    (42:12) Azin unpacked her Medium blog post discussing transfer learning in depth (problems, approaches, and applications).
    (48:18) Azin explained how transfer learning could address the widespread “cold-start” problem in the industry.
    (49:50) Azin shared the challenges of working on a fintech platform with a team of engineers at Georgian on various areas such as supervised learning, explainability, and representation learning.
    (51:46) Azin went over her project with Tractable AI, a UK-based company that develops AI applications for accident and disaster recovery.
    (55:26) Azin shared her excitement for ML applications using data-efficient methods to enhance life quality.
    (57:46) Closing segment.

    Azin’s Contact Info
    Website | Twitter | LinkedIn | Google Scholar | GitHub

    Mentioned Content

    Publications
    “Barriers to Adoption of Information Technology in Healthcare” (2017)
    “Subspace Selection to Suppress Confounding Source Domain Information in AAM Transfer Learning” (2017)
    “A Hybrid Instance-based Transfer Learning Method” (2018)
    “Prediction of Workplace Injuries” (2019)
    “Algorithmic Bias in Clinical Populations — Evaluating and Improving Facial Analysis Technology in Older Adults with Dementia” (2019)
    “Limitations and Biases in Facial Landmark Detection” (2019)

    Blog Posts
    “An Introduction to Transfer Learning” (Dec 2018)
    “Overcoming The Cold-Start Problem: How We Make Intractable Tasks Tractable” (April 2021)

    People
    Yoshua Bengio (Professor of Computer Science and Operations Research at University of Montreal)
    Geoffrey Hinton (Professor of Computer Science at University of Toronto)
    Louis-Philippe Morency (Associate Professor of Computer Science at Carnegie Mellon University)

    Book
    “Machine Learning: A Probabilistic Perspective” (by Kevin Murphy)

    Note: Azin and her collaborator are going to give a talk at ODSC Europe 2021 in June about a Georgian project with a portfolio company, Tractable. They have also written a short blog post about it, which you can find HERE.

    Episode 62: Leading Organizations Through Analytics Transformations with Gordon Wong

    Apr 28, 2021 | 75:09


    Show Notes
    (02:09) Gordon briefly talked about his undergraduate studies in Psychology and Philosophy at Rutgers University in the early 90s.
    (03:24) Gordon reflected on the first decade of his career getting into database technologies.
    (05:34) Gordon discussed his predilection towards consulting, specifically his role in the professional services team at Ab Initio Software in the early 2000s.
    (08:02) Gordon recalled the challenges of leading data warehousing initiatives at Smarter Travel Media and ClickSquared in the 2000s.
    (13:14) Gordon emphasized the advantage of a multi-tenant database over a traditional relational database.
    (18:30) Gordon recalled his one-year stint at Cervello, leading business intelligence implementations for their clients.
    (21:59) Gordon elaborated on his projects during his 3 years as the director of business intelligence infrastructure at Fitbit.
    (26:09) Gordon dived into his framework for choosing data tooling vendors while at Fitbit (and how he settled on a then-tiny startup called Snowflake).
    (30:02) Gordon provided recommendations for startups to be data-driven.
    (33:24) Gordon recalled practices to foster effective collaboration while managing the 3 teams of data engineering, data warehousing, and data analytics at Fitbit.
    (36:44) Gordon went over his proudest accomplishment as the director of data engineering at ezCater, making substantial improvements to their data warehouse platform.
    (38:59) Gordon shared his framework for interviewing data engineers.
    (41:39) Gordon walked through his consulting engagements in analytics engineering for Zipcar and data warehousing for edX.
    (46:17) Gordon reflected on his time as the Vice President of business intelligence at HubSpot.
    (50:50) Gordon unpacked his notion of the “Data Hierarchy of Needs,” which entails five pillars: data security, data quality, system reliability, user experience, and data coverage.
    (56:55) Gordon discussed current opportunities for driving better social outcomes and empowering democracy through data.
    (59:48) Gordon shared the key criteria that enable healthy team dynamics from his hands-on experience building data teams.
    (01:02:13) Gordon unpacked the central features and benefits of Snowflake for the uninitiated.
    (01:06:25) Gordon gave his verdict on the ETL tooling landscape in the next few years.
    (01:08:33) Gordon described the data community in Boston.
    (01:09:52) Closing segment.

    Gordon’s Contact Info
    LinkedIn

    Mentioned Content

    People
    Tristan Handy (co-founder of Fishtown Analytics and co-creator of dbt)
    Michael Kaminsky (who coined the term “Analytics Engineering”)
    Barr Moses (co-founder and CEO of Monte Carlo, who coined the term “Data Observability”)

    Book
    “Start With Why” (by Simon Sinek)

    Episode 61: Meta Reinforcement Learning with Louis Kirsch

    Apr 18, 2021 | 61:04


    Show Notes
    (2:05) Louis went over his childhood as a self-taught programmer and his early days in school as a freelance developer.
    (4:22) Louis described his overall undergraduate experience getting a Bachelor’s degree in IT Systems Engineering from Hasso Plattner Institute, a highly-ranked computer science university in Germany.
    (6:10) Louis dissected his Bachelor’s thesis at HPI called “Differentiable Convolutional Neural Network Architectures for Time Series Classification,” which addresses the problem of automatically designing architectures for time series classification efficiently, using a regularization technique for ConvNets that enables joint training of network weights and architecture through back-propagation.
    (7:40) Louis provided a brief overview of his publication “Transfer Learning for Speech Recognition on a Budget,” which explores Automatic Speech Recognition training by model adaptation under constrained GPU memory, throughput, and training data.
    (10:31) Louis described his one-year Master of Research degree in Computational Statistics and Machine Learning at University College London, supervised by David Barber.
    (12:13) Louis unpacked his paper “Modular Networks: Learning to Decompose Neural Computation,” published at NeurIPS 2018, which proposes a training algorithm that flexibly chooses neural modules based on the processed data.
    (15:13) Louis briefly reviewed his technical report, “Scaling Neural Networks Through Sparsity,” which discusses near-term and long-term solutions to handle sparsity between neural layers.
    (18:30) Louis mentioned his report, “Characteristics of Machine Learning Research with Impact,” which explores questions such as how to measure research impact and what questions the machine learning community should focus on to maximize impact.
    (21:16) Louis explained his report, “Contemporary Challenges in Artificial Intelligence,” which covers lifelong learning, scalability, generalization, self-referential algorithms, and benchmarks.
    (23:16) Louis talked about his motivation to start a blog and discussed his two-part blog series on intelligence theories (part 1 on universal AI and part 2 on active inference).
    (27:46) Louis described his decision to pursue a Ph.D. at the Swiss AI Lab IDSIA in Lugano, Switzerland, where he has been working on Meta Reinforcement Learning agents with Jürgen Schmidhuber.
    (30:06) Louis created a very extensive map of reinforcement learning in 2019 that outlines the goals, methods, and challenges associated with the RL domain.
    (33:50) Louis unpacked his blog post reflecting on his experience at NeurIPS 2018 and providing updates on the AGI roadmap regarding topics such as scalability, continual learning, meta-learning, and benchmarks.
    (37:04) Louis dissected his ICLR 2020 paper “Improving Generalization in Meta Reinforcement Learning using Learned Objectives,” which introduces a novel algorithm called MetaGenRL, inspired by biological evolution.
    (44:03) Louis elaborated on his publication “Meta-Learning Backpropagation And Improving It,” which introduces the Variable Shared Meta-Learning framework that unifies existing meta-learning approaches and demonstrates that simple weight-sharing and sparsity in a network are sufficient to express powerful learning algorithms.
    (51:14) Louis expanded on his idea to bootstrap AI, which entails how the task, the general meta learner, and the unsupervised objective should interact (proposed at the end of his invited talk at NeurIPS 2020).
    (54:14) Louis shared his advice for individuals who want to make a dent in AI research.
    (56:05) Louis shared his three most useful productivity tips.
    (58:36) Closing segment.

    Louis’s Contact Info
    Website | Twitter | LinkedIn | Google Scholar | GitHub

    Mentioned Content

    Papers and Reports
    Differentiable Convolutional Neural Network Architectures for Time Series Classification (2017)
    Transfer Learning for Speech Recognition on a Budget (2017)
    Modular Networks: Learning to Decompose Neural Computation (2018)
    Contemporary Challenges in Artificial Intelligence (2018)
    Characteristics of Machine Learning Research with Impact (2018)
    Scaling Neural Networks Through Sparsity (2018)
    Improving Generalization in Meta Reinforcement Learning using Learned Objectives (2019)
    Meta-Learning Backpropagation And Improving It (2020)

    Blog Posts
    Theories of Intelligence, Part 1 and Part 2 (July 2018)
    Modular Networks: Learning to Decompose Neural Computation (May 2018)
    How to Make Your ML Research More Impactful (Dec 2018)
    A Map of Reinforcement Learning (Jan 2019)
    NeurIPS 2018, Updates on the AI Roadmap (Jan 2019)
    MetaGenRL: Improving Generalization in Meta Reinforcement Learning (Oct 2019)
    General Meta-Learning and Variable Sharing (Nov 2020)

    People
    Jeff Clune (for his push on meta-learning research)
    Kenneth Stanley (for his deep thoughts on open-ended learning)
    Jürgen Schmidhuber (for being a visionary scientist)

    Book
    “Grit” (by Angela Duckworth)

    Episode 60: Algorithms and Data Structures for Massive Datasets with Dzejla Medjedovic

    Apr 5, 2021 | 71:25


    Show Notes
    (01:58) Dzejla described her undergraduate experience studying Computer Science at the Sarajevo School of Science and Technology back in the mid-2000s.
    (07:59) Dzejla recapped her overall experience getting a Ph.D. in Computer Science at Stony Brook University.
    (14:38) Dzejla unpacked the key research problem in her Ph.D. thesis titled “Upper and Lower Bounds on Sorting and Searching in External Memory.”
    (19:13) Dzejla went over the details of her paper “Don’t Thrash: How to Cache Your Hash on Flash,” which describes the Cascade Filter, an approximate-membership-query data structure that scales beyond main memory and serves as an alternative to the well-known Bloom filter.
    (24:41) Dzejla elaborated on her work “The batched predecessor problem in external memory,” which studies lower bounds in three external memory models: the I/O comparison model, the I/O pointer-machine model, and the indexability model.
    (29:56) Dzejla shared her learnings from being a Teaching Assistant for the Introduction to Algorithms course at Stony Brook (both at the undergraduate and graduate level).
    (35:08) Dzejla went over her summer internships at Microsoft’s Server and Tools Division during her Ph.D.
    (41:06) Dzejla reasoned about her decision to return to Sarajevo School of Science and Technology as an Assistant Professor of Computer Science.
    (47:22) Dzejla dissected the essential concepts and methods covered in her Data Structures, Introductory Algorithms, Advanced Algorithms, and Algorithms for Big Data courses taught at SSST.
    (48:42) Dzejla provided a brief overview of the Computer Science/Software Engineering department at the International University of Sarajevo (where she has been a professor since 2017).
    (50:57) Dzejla briefly talked about the courses that she taught at IUS, including Intro to Programming, Human-Computer Interaction, and Algorithms/Data Structures.
    (52:49) Dzejla shared the challenges of writing Algorithms and Data Structures for Massive Datasets, which introduces data processing and analytics techniques specifically designed for large distributed datasets.
    (56:14) Dzejla explained concepts in Part 1 of the book, including Hash Tables, Approximate Membership, Bloom Filters, Frequency/Cardinality Estimation, Count-Min Sketch, and HyperLogLog.
    (58:38) Dzejla provided a brief overview of techniques to handle streaming data in Part 2 of the book.
    (01:00:14) Dzejla mentioned the data structures for large databases and external-memory algorithms in Part 3 of the book.
    (01:02:15) Dzejla shared her thoughts about the tech community in Sarajevo.
    (01:04:16) Closing segment.

    Dzejla’s Contact Info
    LinkedIn | Twitter | Google Scholar

    Mentioned Content

    Papers
    “Upper and Lower Bounds on Sorting and Searching in External Memory” (Dzejla’s Ph.D. Thesis, 2014)
    “Don’t Thrash: How to Cache Your Hash on Flash” (2012)
    “The batched predecessor problem in external memory” (2014)

    People
    Erik Demaine (Computer Science Professor at MIT)
    Michael Bender (Computer Science Professor at Stony Brook, Dzejla’s Ph.D. Advisor)
    Joseph Mitchell (Computational Geometry Professor at Stony Brook)
    Steven Skiena (Computer Science Professor at Stony Brook)
    Jeff Erickson (Computer Science Professor at UIUC)

    Books
    “Algorithms and Data Structures for Massive Datasets” (by Dzejla Medjedovic, Emin Tahirovic, and Ines Dedovic)
    “The Algorithm Design Manual” (by Steven Skiena)

    Here is a permanent 40% discount code (good for all Manning products in all formats) for Datacast listeners: poddcast19. Link at http://mng.bz/4MAR.
    Here is one free eBook code good for a copy of Algorithms and Data Structures for Massive Datasets for a lucky listener: algdcsr-7135. Link at http://mng.bz/Q2y6

    Episode 59: Bridging The Gap Between Data and Models with Willem Pienaar

    Mar 24, 2021 | 48:57


    Show Notes
    (1:45) Willem discussed his undergraduate degree in Mechatronic Engineering at Stellenbosch University in the early 2010s.
    (2:34) Willem recalled his entrepreneurial journey founding and selling a networking startup that provides internet access to private residents on campus.
    (5:37) Willem worked for two years as a Software Engineer focusing on data systems at Systems Anywhere in Cape Town after college.
    (6:49) Willem talked about his move to Bangkok working as a Senior Software Engineer at INDEFF, a company in industrial control systems.
    (9:52) Willem went over his decision to join Gojek, a leading Indonesian on-demand multi-service platform and digital payment technology group.
    (12:16) Willem mentioned the engineering challenges associated with building complex data systems for super-apps.
    (14:50) Willem dissected Gojek’s ML platform, including these four solutions for various stages of the ML life cycle: Clockwork, Merlin, Feast, and Turing.
    (19:24) Willem recapped the lessons from designing the ML platform to meet Gojek’s scaling requirements, as delivered at Cloud Next 2018.
    (23:09) Willem briefly went through the key design components to incorporate Kubeflow pipelines into Gojek’s existing ML platform, as delivered at KubeCon 2019.
    (26:21) Willem explained the inception of Feast, an open-source feature store that bridges the gap between data and models.
    (32:20) Willem talked about prioritizing the product roadmap and engaging the community for an open-source project.
    (35:07) Willem recapped the key lessons learned and envisioned Feast's future as a lightweight modular feature store.
    (37:29) Willem explained the differences between commercial and open-source feature stores (given Tecton’s recent backing of Feast).
    (41:36) Willem reflected on his experience living and working in Southeast Asia.
    (44:33) Closing segment.

    Willem’s Contact Info
    Twitter | LinkedIn | GitHub

    Mentioned Content

    Feast
    Feast Project website: feast.dev
    Feast Slack community: #Feast
    Feast Documentation: docs.feast.dev
    Feast GitHub repository: feast-dev/feast
    Feast on StackOverflow: stackoverflow.com/questions/tagged/feast
    Feast Wiki: wiki.lfaidata.foundation/display/FEAST/Feast+Home
    Feast Twitter: @feast_dev

    Articles
    An Introduction to Gojek’s Machine Learning Platform (2019)
    Introducing Feast: An Open-Source Feature Store For Machine Learning (2019)
    A State of Feast (2020)
    Why Tecton is Backing The Feast Open-Source Feature Store (2020)

    Talks
    Lessons Learned Scaling Machine Learning at GoJek on Google Cloud (Cloud Next 2018)
    Accelerating Machine Learning App Development with Kubeflow Pipelines (Cloud Next 2019)
    Moving People and Products with Machine Learning on Kubeflow (KubeCon 2019)

    People
    David Aronchick (Open-Source ML Strategy at Azure, Ex-PM for Kubernetes at Google, Co-Founder of Kubeflow, Advisor to Tecton)
    Jeremy Lewi (Principal Engineer at Primer.ai, Co-Founder of Kubeflow)
    Felipe Hoffa (Developer Advocate for BigQuery, Data Cloud Advocate for Snowflake)

    Book
    “Deep Work” (by Cal Newport)

    Willem will be a speaker at Tecton’s apply() virtual conference (April 21-22, 2021), where data and ML teams discuss the practical data engineering challenges faced when building ML for the real world. Participants will share best-practice development patterns, tools of choice, and emerging architectures they use to successfully build and manage production ML applications. Everything is on the table, from managing labeling pipelines to transforming features in real time and serving at scale. Register for free now: https://www.applyconf.com/

    Episode 58: Deep Learning Meets Distributed Systems with Jim Dowling

    Mar 19, 2021 | 79:15


    Show Notes
    (1:56) Jim went over his education at Trinity College Dublin in the late 90s/early 2000s, where he got early exposure to academic research in distributed systems.
    (4:26) Jim discussed his research focused on dynamic software architecture, particularly the K-Component model that enables individual components to adapt to a changing environment.
    (5:37) Jim explained his research on collaborative reinforcement learning that enables groups of reinforcement learning agents to solve online optimization problems in dynamic systems.
    (9:03) Jim recalled his time as a Senior Consultant for MySQL.
    (9:52) Jim shared the initiatives at the RISE Research Institute of Sweden, where he has been a researcher since 2007.
    (13:16) Jim dissected his peer-to-peer systems research at RISE, including theoretical results for search algorithms and walk topology.
    (15:30) Jim went over challenges building peer-to-peer live streaming systems at RISE, such as gradienTv and GLive.
    (18:18) Jim provided an overview of research activities at the Division of Software and Computer Systems at the School of Electrical Engineering and Computer Science at KTH Royal Institute of Technology.
    (19:04) Jim has taught courses on Distributed Systems and Deep Learning on Big Data at KTH Royal Institute of Technology.
    (22:20) Jim unpacked his 2017 O’Reilly article called “Distributed TensorFlow,” which includes the deep learning hierarchy of scale.
    (29:47) Jim discussed the development of HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces its single-node in-memory metadata service with a distributed metadata service built on a NewSQL database.
    (34:17) Jim rationalized the intention to commercialize HopsFS and build Hopsworks, a user-friendly data science platform for Hops.
    (36:56) Jim explored the relative benefits of public research money and VC-funded money.
    (41:48) Jim unpacked the key ideas in his post “Feature Store: The Missing Data Layer in ML Pipelines.”
    (47:31) Jim dissected the critical design that enables the Hopsworks feature store to refactor a monolithic end-to-end ML pipeline into separate feature engineering and model training pipelines.
    (52:49) Jim explained why data warehouses are insufficient for machine learning pipelines and why a feature store is needed instead.
    (57:59) Jim discussed prioritizing the product roadmap for the Hopsworks platform.
    (01:00:25) Jim hinted at what’s on the 2021 roadmap for Hopsworks.
    (01:03:22) Jim recalled the challenges of getting early customers for Hopsworks.
    (01:04:30) Jim described the differences and similarities between being a professor and being a founder.
    (01:07:00) Jim discussed worrying trends in the European Tech ecosystem and the role that Logical Clocks will play in the long run.
    (01:13:37) Closing segment.

    Jim’s Contact Info
    Logical Clocks | Twitter | LinkedIn | Google Scholar | Medium | ACM Profile | GitHub

    Mentioned Content

    Research Papers
    “The K-Component Architecture Meta-Model for Self-Adaptive Software” (2001)
    “Dynamic Software Evolution and The K-Component Model” (2001)
    “Using feedback in collaborative reinforcement learning to adaptively optimize MANET routing” (2005)
    “Building Autonomic Systems Using Collaborative Reinforcement Learning” (2006)
    “Improving ICE Service Selection in a P2P System using the Gradient Topology” (2007)
    “gradienTv: Market-Based P2P Live Media Streaming on the Gradient Overlay” (2010)
    “GLive: The Gradient Overlay as a Market Maker for Mesh-Based P2P Live Streaming” (2011)
    “HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases” (2016)
    “Scaling HDFS to More Than 1 Million Operations Per Second with HopsFS” (2017)
    “Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata” (2017)
    “Implicit Provenance for Machine Learning Artifacts” (2020)
    “Time Travel and Provenance for Machine Learning Pipelines” (2020)
    “Maggy: Scalable Asynchronous Parallel Hyperparameter Search” (2020)

    Articles
    “Distributed TensorFlow” (2017)
    “Reflections on AWS’s S3 Architectural Flaws” (2017)
    “Meet Michelangelo: Uber’s Machine Learning Platform” (2017)
    “Feature Store: The Missing Data Layer in ML Pipelines” (2018)
    “What Is Wrong With European Tech Companies?” (2019)
    “ROI of Feature Stores” (2020)
    “MLOps With A Feature Store” (2020)
    “ML Engineer Guide: Feature Store vs. Data Warehouse” (2020)
    “Unifying Single-Host and Distributed Machine Learning with Maggy” (2020)
    “How We Secure Your Data With Hopsworks” (2020)
    “One Function Is All You Need For ML Experiments” (2020)
    “Hopsworks: World’s Only Cloud-Native Feature Store, now available on AWS and Azure” (2020)
    “Hopsworks 2.0: The Next Generation Platform for Data-Intensive AI with a Feature Store” (2020)
    “Hopsworks Feature Store API 2.0, a new paradigm” (2020)
    “Swedish startup Logical Clocks takes a crack at scaling MySQL backend for live recommendations” (2021)

    Projects
    Apache Hudi (by Uber)
    Delta Lake (by Databricks)
    Apache Iceberg (by Netflix)
    MLflow (by Databricks)
    Apache Flink (by The Apache Foundation)

    People
    Leslie Lamport (The Father of Distributed Computing)
    Jeff Dean (Creator of MapReduce and TensorFlow, Lead of Google AI)
    Richard Sutton (The Father of Reinforcement Learning, who wrote “The Bitter Lesson”)

    Programming Books
    C++ programming language books (by Scott Meyers)
    “Effective Java” (by Joshua Bloch)
    “Programming Erlang” (by Joe Armstrong)
    “Concepts, Techniques, and Models of Computer Programming” (by Peter Van Roy and Seif Haridi)

    Episode 57: Building Data Science Projects with Pier Paolo Ippolito

    Mar 6, 2021 | 54:59


    Show Notes
    (2:20) Pier shared his college experience at the University of Southampton studying Electronic Engineering.
    (3:46) For his final undergraduate project, Pier developed a suite of games and used machine learning to analyze brainwave data in order to classify whether a child is affected by autism.
    (11:26) Pier went over his favorite courses and involvement with the AI Society during his additional year at the University of Southampton to get a Master’s in Artificial Intelligence.
    (13:40) For his Master’s thesis called “Causal Reasoning in Machine Learning,” Pier created and deployed a suite of Agent-Based and Compartmental Models to simulate epidemic disease developments in different types of communities.
    (26:51) Pier went over his stints as a developer intern at Fidessa and a freelance data scientist at Digital-Dandelion.
    (29:21) Pier reflected on his time (so far) as a data scientist at SAS Institute, where he helps their customers solve various data-driven challenges using cloud-based technologies and DevOps processes.
    (33:37) Pier discussed the key benefits that writing and editing technical content for Towards Data Science has brought to his professional development.
    (36:31) Pier covered the threads that he kept pulling with his blog posts.
    (38:50) Pier talked about his Augmented Reality Personal Business Card created in HTML using the AR.js library.
    (41:12) Pier brought up data structures in two other impressive JavaScript projects using TensorFlow.js and ml5.js.
    (44:19) Pier went over his experience working with data visualization tools such as Plotly, R Shiny, and Streamlit.
    (47:27) Pier talked about his work on a chapter for a book called “Applied Data Science in Tourism” that is going to be published with Springer this year.
    (48:37) Pier shared his thoughts regarding the tech community in London.
    (49:19) Closing segment.

    Pier’s Contact Info
    Website | LinkedIn | Twitter | GitHub | Medium | Patreon | Kaggle

    Mentioned Content
    “Alleviate Children’s Health Issues Through Games and Machine Learning”
    “Causal Reasoning in Machine Learning”
    Andrej Karpathy (Director of AI and Autopilot at Tesla)
    Cassie Kozyrkov (Chief Decision Scientist at Google)
    Iain Brown (Head of Data Science at SAS)
    “The Book Of Why” (by Judea Pearl)
    “Pattern Recognition and Machine Learning” (by Christopher Bishop)

    Episode 56: Apprehending Quantum Computation with Alba Cervera-Lierta

    Feb 21, 2021 | 77:26


    Timestamps
    (1:55) Alba shared her background growing up interested in studying Physics and pivoting into quantum mechanics.
    (3:33) Alba went over her Bachelor’s in Fundamental Physics at The University of Barcelona.
    (4:54) Alba continued her education with an M.S. degree that specialized in Particle Physics and Gravitation.
    (6:40) Alba started her Ph.D. in Physics in 2015 and discussed her first publication, “Operational Approach to Bell Inequalities: Application to Qutrits.”
    (9:48) Alba also spent time as a visiting scholar at the University of Oxford and the University of Madrid during her Ph.D.
    (11:25) Alba explained her second paper, which aims to understand the connection between maximal entanglement and the fundamental symmetries of high-energy physics.
    (13:27) Alba dissected her next work titled “Multipartite Entanglement in Spin Chains and The Hyperdeterminant.”
    (18:56) Alba shared the origin of Quantic, a quantum computation joint effort between the University of Barcelona and the Barcelona Supercomputing Center.
    (22:27) Alba unpacked her article “Quantum Computation: Playing The Quantum Symphony,” which draws a metaphor between quantum computing and a musical symphony.
    (27:47) Alba discussed the motivation and contribution of her paper “Exact Ising Model Simulation On A Quantum Computer.”
    (32:51) Alba recalled creating a tutorial that ended up winning the Teach Me QISKit challenge from IBM back in 2018.
    (35:01) Alba elaborated on her paper “Quantum Circuits For the Maximally Entangled States,” which designs a series of quantum circuits that generate absolute maximally entangled states to benchmark a quantum computer.
    (38:54) Alba dissected key ideas in her paper “Data Re-Uploading For a Universal Quantum Classifier.”
    (43:51) Alba explained how she leveled up her knowledge of classical neural networks.
    (47:40) Alba shared her experience as a Postdoctoral Fellow at The Matter Lab at the University of Toronto, working on quantum machine learning and variational quantum algorithms (check out the Quantum Research Seminars Toronto that she has been organizing).
    (52:18) Alba explained her work on the Meta-Variational Quantum Eigensolver algorithm capable of learning the ground state energy profile of a parametrized Hamiltonian.
    (59:23) Alba went over Tequila, a development package for quantum algorithms in Python that her group created.
    (01:04:49) Alba presented a quantum calling for new algorithms, applications, architectures, quantum-classical interface, and more (as presented here).
    (01:08:59) Alba has been active in education and public outreach activities that encourage scientific vocations for young minds, especially in Catalonia.
    (01:12:07) Closing segment.

    Her Contact Info
    Website | Twitter | LinkedIn | Google Scholar | GitHub

    Her Recommended Resources
    Ewin Tang (Ph.D. Student in Theoretical Computer Science at the University of Washington)
    Alán Aspuru-Guzik (Professor of Chemistry and Computer Science at the University of Toronto, Alba’s current supervisor)
    José Ignacio Latorre (Professor of Theoretical Physics at the University of Barcelona, Alba’s former supervisor)
    Quantum Computation and Quantum Information (by Michael Nielsen and Isaac Chuang)
    Quantum Field Theory and The Standard Model (by Matthew Schwartz)
    The Structure of Scientific Revolutions (by Thomas Kuhn)
    Against Method (by Paul Feyerabend)
    Quantum Computing Since Democritus (by Scott Aaronson)
