DataTalks.Club - the place to talk about data!
In this podcast episode, we talked with Will Russell about From Hackathons to Developer Advocacy.
About the Speaker: Will Russell is a Developer Advocate at Kestra, known for his videos on workflow orchestration. Previously, Will built open source education programs to help up-and-coming developers make their first contributions in open source. With a passion for developer education, Will creates technical video content and documentation that make technologies more approachable for developers.
In this episode, we sit down with Will, a developer advocate, content creator, and passionate community builder. We'll hear about his unique path through tech, the lessons he's learned, and his approach to making complex topics accessible and engaging. Whether you're curious about open source, hackathons, or what it's like to bridge the gap between developers and the broader tech community, this conversation is full of insights and inspiration.
In this podcast episode, we talked with Lavanya Gupta about Building a Strong Career in Data.
About the Speaker: Lavanya is an alumna of the Language Technologies Institute (LTI) at Carnegie Mellon University (CMU). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in its specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published at EMNLP 2024. In addition to a strong industrial research background of 5+ years, she is an enthusiastic technical speaker and has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer for top-tier ML and NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Through her collaborations with organizations such as Anita Borg and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.
In this episode, we talk about Lavanya's journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.
In this podcast episode, we talked with Eddy Zulkifly about From Supply Chain Management to Digital Warehousing and FinOps.
About the Speaker: Eddy Zulkifly is a Staff Data Engineer at Kinaxis, building robust data platforms across Google Cloud, Azure, and AWS. With a decade of experience in data, he actively shares his expertise as a mentor on ADPList and as a teaching assistant at Uplimit. Previously, he was a Senior Data Engineer at Home Depot, specializing in e-commerce and supply chain analytics. Eddy is currently pursuing a Master's in Analytics at the Georgia Institute of Technology. He is also passionate about open-source data projects and enjoys watching and exploring the analytics behind the Fantasy Premier League.
In this episode, we dive into the world of data engineering and FinOps with Eddy Zulkifly, Staff Data Engineer at Kinaxis. Eddy shares his unconventional career journey: from optimizing physical warehouses with Excel to building digital data platforms in the cloud.
In this podcast episode, we talked with Bartosz Mikulski about Data Intensive AI.
About the Speaker: Bartosz is an AI and data engineer. He specializes in moving AI projects from the good-enough-for-a-demo phase to production by building testing infrastructure and fixing the issues the tests detect. On top of that, he teaches programmers and non-programmers how to use AI. He contributed a chapter to the book 97 Things Every Data Engineer Should Know. He has also spoken at several conferences, including Data Natives, Berlin Buzzwords, and Global AI Developer Days.
In this episode, we discuss Bartosz's career journey, the importance of testing in data pipelines, and how AI tools like ChatGPT and Cursor are transforming development workflows. From prompt engineering to building Chrome extensions with AI, we dive into practical use cases, tools, and insights for anyone working on data-intensive AI projects. Whether you're a data engineer, an AI enthusiast, or just curious about the future of AI in tech, this episode offers valuable takeaways and real-world experiences.
0:00 Introduction to Bartosz and his background
4:00 Bartosz's career journey from Java development to AI engineering
9:05 The importance of testing in data engineering
11:19 How to create tests for data pipelines
13:14 Tools and approaches for testing data pipelines
17:10 Choosing Spark for data engineering projects
19:05 The connection between data engineering and AI tools
21:39 Use cases of AI in data engineering and MLOps
25:13 Prompt engineering techniques and best practices
31:45 Prompt compression and caching in AI models
33:35 Thoughts on DeepSeek and open-source AI models
35:54 Using AI for lead classification and LinkedIn automation
41:04 Building Chrome extensions with AI integration
43:51 Comparing Cursor and GitHub Copilot for coding
47:11 Using ChatGPT and Perplexity for AI-assisted tasks
52:09 Hosting static websites and using AI for development
54:27 How blogging helps attract clients and share knowledge
58:15 Using AI to assist with writing and content creation
In this podcast episode, we talked with Nemanja Radojkovic about MLOps in Corporations and Startups.
About the Speaker: Nemanja Radojkovic is a Senior Machine Learning Engineer at Euroclear.
In this event, we dive into the world of MLOps, comparing life in startups versus big corporations. Joining us again is Nemanja, a seasoned machine learning engineer with experience spanning Fortune 500 companies and agile startups. We explore the challenges of scaling MLOps on a shoestring budget and the trade-offs between corporate stability and startup agility. We also share practical advice for engineers deciding between these two career paths, whether you're navigating legacy frameworks or experimenting with cutting-edge tools.
1:00 MLOps in corporations versus startups
6:03 The agility and pace of startups
7:54 MLOps on a shoestring budget
12:54 Cloud solutions for startups
15:06 Challenges of cloud complexity versus on-premise
19:19 Selecting tools and avoiding vendor lock-in
22:22 Choosing between a startup and a corporation
27:30 Flexibility and risks in startups
29:37 Bureaucracy and processes in corporations
33:17 The role of frameworks in corporations
34:32 Advantages of large teams in corporations
40:01 Challenges of technical debt in startups
43:12 Career advice for junior data scientists
44:10 Tools and frameworks for MLOps projects
49:00 Balancing new and old technologies in skill development
55:43 Data engineering challenges and reliability in LLMs
57:09 On-premise vs. cloud solutions in data-sensitive industries
59:29 Alternatives like Dask for distributed systems
In this podcast episode, we talked with Adrian Brudaru about the past, present, and future of data engineering.
About the speaker: Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was and chose to go for the more factual side instead. He ended up in Berlin at the age of 25 and started a role as a business analyst. At 30, he had had enough of startups and decided to join a corporation, but he quickly found out that it did not provide the challenge he wanted. Going back to startups was not a desirable option either. So he postponed the decision by taking freelance work, and he has never looked back since. Five years later, he co-founded a company in the data space to try new things. The company is also looking to release open source tools to help democratize data engineering.
0:00 Introduction to DataTalks.Club
1:05 Discussing trends in data engineering with Adrian
2:03 Adrian's background and journey into data engineering
5:04 Growth and updates on Adrian's company, dltHub
9:05 Challenges and specialization in data engineering today
13:00 Opportunities for data engineers entering the field
15:00 The "Modern Data Stack" and its evolution
17:25 Emerging trends: AI integration and Iceberg technology
27:40 DuckDB and the emergence of portable, cost-effective data stacks
32:14 The rise and impact of dbt in data engineering
34:08 Alternatives to dbt: SQLMesh and others
35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions
37:20 Audience questions: Career focus in data roles and AI engineering overlaps
39:00 The role of semantics in data and AI workflows
41:11 Focusing on learning concepts over tools when entering the field
45:15 Transitioning from backend to data engineering: challenges and opportunities
47:48 Current state of the data engineering job market in Europe and beyond
49:05 Introduction to the Apache Iceberg, Delta, and Hudi table formats
50:40 Suitability of these formats for batch and streaming workloads
52:29 Tools for streaming: Kafka, SQS, and related trends
58:07 Building AI agents and enabling intelligent data applications
59:09 Closing discussion on the place of tools like dbt in the ecosystem
In this podcast episode, we talked with Alexander Guschin about launching a career off Kaggle.
About the Speaker: Alexander Guschin is a Machine Learning Engineer with 10+ years of experience, a Kaggle Grandmaster ranked 5th globally, and a teacher to 100K+ students. He leads DS and SE teams and contributes to open-source ML tools.
00:00 Starting with Machine Learning: Challenges and Early Steps
13:05 Community and Learning Through Kaggle Sessions
17:10 Broadening Skills Through Kaggle Participation
18:54 Early Competitions and Lessons Learned
21:10 Transitioning to Simpler Solutions Over Time
23:51 Benefits of Kaggle for Starting a Career in Machine Learning
29:08 Teamwork vs. Solo Participation in Competitions
31:14 Schoolchildren in AI Competitions
42:33 Transition to Industry and MLOps
50:13 Encouraging teamwork in student projects
50:48 Designing competitive machine learning tasks
52:22 Leaderboard types for tracking performance
53:44 Managing small-scale university classes
54:17 Experience with Coursera and online teaching
59:40 Convincing managers about Kaggle's value
61:38 Secrets of Kaggle competition success
63:11 Generative AI's impact on competitive ML
65:13 Evolution of automated ML solutions
66:22 Reflecting on competitive data science experience
In this podcast episode, we talked with Andrey Cheptsov about the future of AI infrastructure.
About the Speaker: Andrey Cheptsov is the founder and CEO of dstack, an open-source alternative to Kubernetes and Slurm built to simplify the orchestration of AI infrastructure. Before dstack, Andrey worked at JetBrains for over a decade, helping different teams make the best developer tools.
In this episode, Andrey discusses the complexities of AI infrastructure. We explore topics like the challenges of using Kubernetes for AI workloads, the need to rethink container orchestration, and the future of hybrid and cloud-only infrastructures. Andrey also shares insights into the role of on-premise and bare-metal solutions, edge computing, and federated learning.
0:00 Andrey's Career Journey: From JetBrains to dstack
5:00 The Motivation Behind dstack
7:00 Challenges in Machine Learning Infrastructure
10:00 Transitioning from Cloud to On-Prem Solutions
14:30 Reflections on OpenAI's Evolution
17:30 Open Source vs Proprietary Models: A Balanced Perspective
21:01 Monolithic vs. Decentralized AI businesses
22:05 The role of privacy and control in AI for industries like banking and healthcare
30:00 Challenges in training large AI models: GPUs and distributed systems
37:03 DeepSpeed's efficient training approach vs. brute force methods
39:00 Challenges for small and medium businesses: hosting and fine-tuning models
47:01 Managing Kubernetes challenges for AI teams
52:00 Hybrid vs. cloud-only infrastructure
56:03 On-premise vs. bare-metal solutions
58:05 Exploring edge computing and its challenges
In this podcast episode, we talked with Tamara Atanasoska about building fair AI systems.
About the Speaker: Tamara works on ML explainability, interpretability, and fairness as an Open Source Software Engineer at Probabl. She is a maintainer of Fairlearn and a contributor to scikit-learn and skops. Tamara has a background in both computer science/software engineering and computational linguistics (NLP).
During the event, Tamara discussed her career journey from software engineering to open-source contributions, focusing on explainability in AI through scikit-learn and Fairlearn. The conversation explored fairness in AI, including challenges in credit loans, hiring, and decision-making. It emphasized the importance of tools, human judgment, and collaboration. Tamara also shared her involvement with PyLadies and encouraged contributions to Fairlearn.
0:00 Introduction to the event and the community
1:51 Topic introduction: Linguistic fairness and socio-technical perspectives in AI
2:37 Guest introduction: Tamara's background and career
3:18 Tamara's career journey: Software engineering, music tech, and computational linguistics
9:53 Tamara's background in language and computer science
14:52 Exploring fairness in AI and its impact on society
21:20 Fairness in AI models
26:21 Automating fairness analysis in models
32:32 Balancing technical and domain expertise in decision-making
37:13 The role of humans in the loop for fairness
40:02 Joining Probabl and working on open-source projects
46:20 The skops library and its integration with Hugging Face
50:48 PyLadies and community involvement
55:41 The ethos of scikit-learn and Fairlearn
In this podcast episode, we talked with Agita Jaunzeme about career choices, transitions, and promotions in and out of tech.
About the Speaker: Agita has designed a career spanning DevOps/DataOps engineering, management, community building, education, and facilitation. She has worked on projects across corporate, startup, open source, and non-governmental sectors. Following her passion, she founded an NGO focusing on the inclusion of expats and locals in Porto. Embodying the values of innovation, automation, and continuous learning, Agita provides practical insights on promotions, career pivots, and aligning work with passion and purpose.
During this event, Agita discussed her career journey. She started with a transition from art school to programming, moved into DevOps, and eventually took on leadership roles. The conversation explored the challenges of burnout and the importance of volunteering, including founding an NGO to support inclusion, gender equality, and sustainability. It also covered key topics like mentorship, the differences between data engineering and data science, and the dynamics of managing volunteers versus employees. Agita also shared insights on community management, developer relations, and the importance of product vision and team collaboration.
0:00 Introduction and Welcome
1:28 Guest Introduction: Agita's Background and Career Highlights
3:05 Transition to Tech: From Art School to Programming
5:40 Exploring DevOps and Growing into Leadership Roles
7:24 Burnout, Volunteering, and Founding an NGO
11:00 Volunteering and Mentorship Initiatives
14:00 Discovering Programming Skills and Early Career Challenges
15:50 Automating Work Processes and Earning a Promotion
19:00 Transitioning from DevOps to Volunteering and Project Management
24:00 Managing Volunteers vs. Employees and Building Organizational Skills
31:07 Personality traits in engineering vs. data roles
33:14 Differences in focus between data engineers and data scientists
36:24 Transitioning from volunteering to corporate work
37:38 The role and responsibilities of a community manager
39:06 Community management vs. developer relations activities
41:01 Product vision and team collaboration
43:35 Starting an NGO and legal processes
46:13 NGO goals: inclusion, gender equality, and sustainability
49:02 Community meetups and activities
51:57 Living off-grid in a forest and sustainability
55:02 Unemployment party and brainstorming session
59:03 Unemployment party: the process and structure
In this podcast episode, we talked with Isabella Bicalho about career advice, learning, and featuring women in ML and AI.
About the Speaker: Isabella is a Machine Learning Engineer and Data Scientist with three years of hands-on AI development experience. She draws upon her early computational research expertise to develop ML solutions. While contributing to open-source projects, she runs a newsletter dedicated to showcasing women's accomplishments in data science.
During this event, Isabella discussed her transition into machine learning, her freelance work in AI, and the growing AI scene in France. She shared insights on freelancing versus full-time work, the value of open-source contributions, and developing both technical and soft skills. The conversation also covered career advice, mentorship, and her Substack series on women in data science, emphasizing leadership, motivation, and career opportunities in tech.
0:00 Introduction
1:23 Background of Isabella Bicalho
2:02 Transition to machine learning
4:03 Study and work experience
5:00 Living in France and language learning
6:03 Internship experience
8:45 Focus areas of Inria
9:37 AI development in France
10:37 Current freelance work
11:03 Freelancing in machine learning
13:31 Moving from research to freelancing
14:03 Freelance vs. full-time data science
17:00 Finding the first freelance client
18:00 Involvement in open-source projects
20:17 Passion for open source and teamwork
23:52 Starting new projects
25:03 Community project experience
26:02 Teaching and learning
29:04 Contributing to open-source projects
32:05 Open-source tools vs. projects
33:32 Importance of community-driven projects
34:03 Learning resources
36:07 Green space segmentation project
39:02 Developing technical and soft skills
40:31 Gaining insights from industry experts
41:15 Understanding data science roles
41:31 Project challenges and team dynamics
42:05 Turnover in open-source projects
43:05 Managing expectations in open-source work
44:50 Mentorship in projects
46:17 Role of AI tools in learning
47:59 Overcoming learning challenges
48:52 Discussion on Substack
49:01 Interview series on women in data
50:15 Insights from women in data science
51:20 Impactful stories from Substack
53:01 Leadership challenges in projects
54:19 Career advice and opportunities
56:07 Motivating others to step out of their comfort zone
57:06 How to get in touch to share a story on Substack
58:00 Closing remarks and connections
Reflection on an Almost Two-Year Journey of Generative AI in Industry – Maria Sukhareva
About the speaker: Maria Sukhareva is a principal key expert in artificial intelligence at Siemens, with over 15 years of experience at the forefront of generative AI technologies. Known for her keen eye for technological innovation, Maria excels at transforming cutting-edge AI research into practical, value-driven tools that address real-world needs. Her approach is both hands-on and results-focused, with a commitment to creating scalable, long-term solutions that improve communication, streamline complex processes, and empower smarter decision-making. Maria's work reflects a balanced vision, where the power of innovation is met with ethical responsibility, ensuring that her AI projects deliver impactful, production-ready outcomes.
We talked about:
00:00 DataTalks.Club intro
02:13 Career journey: From linguistics to AI
08:02 The Evolution of AI Expertise and its Future
13:10 AI vulnerabilities: Bypassing bot restrictions
17:00 Non-LLM classifiers as a more robust solution
22:56 Risks of chatbot deployment: Reputational and financial
27:13 The role of AI as a tool, not a replacement for human workers
31:41 The role of human translators in the age of AI
34:49 Evolution of English and its Germanic roots
38:44 Beowulf and Old English
39:43 Impact of the Norman occupation on English grammar
42:34 Identifying mushrooms with AI apps and safety precautions
45:08 Decoding ancient languages like Sumerian
49:43 The evolution of machine translation and multilingual models
53:01 Challenges with low-resource languages and inconsistent orthography
57:28 Transition from academia to industry in AI
Join our Slack: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
We talked about:
00:00 DataTalks.Club intro
00:00 Large Hadron Collider and Mentorship
02:35 Career overview and transition from physics to data science
07:02 Working at the Large Hadron Collider
09:19 How particles collide and the role of detectors
11:03 Data analysis challenges in particle physics and similarities with data science
13:32 Team structure at the Large Hadron Collider
20:05 Explaining the connection between particle physics and data science
23:21 Software engineering practices in particle physics
26:11 Challenges during interviews for data science roles
29:30 Mentoring and offering advice to job seekers
40:03 The STAR method and its value in interviews
50:32 Paid vs. unpaid mentorship and finding the right fit
About the speaker: Anastasia is a particle physicist turned data scientist, with experience in large-scale experiments like those at the Large Hadron Collider. She also worked at Blue Yonder, scaling AI-driven solutions for global supply chain giants, and at Kaufland e-commerce, focusing on NLP and search. Anastasia is an ML/AI mentor, dedicated to helping her mentees achieve their goals. She is passionate about growing the next generation of data science elite in Germany, from data analysts up to ML engineers.
Join our Slack: https://datatalks.club/slack.html
We talked about:
00:00 DataTalks.Club intro
02:34 Career journey and transition into MLOps
08:41 Dutch agriculture and its challenges
10:36 The concept of "technical debt" in MLOps
13:37 Trade-offs in MLOps: moving fast vs. doing things right
14:05 Building teams and the role of coordination in MLOps
16:58 Key roles in an MLOps team: evangelists and tech translators
23:01 Role of the MLOps team in an organization
25:19 How MLOps teams assist product teams
27:56 Standardizing practices in MLOps
32:46 Getting feedback and creating buy-in from data scientists
36:55 The importance of addressing pain points in MLOps
39:06 Best practices and tools for standardizing MLOps processes
42:31 Value of data versioning and reproducibility
44:22 When to start thinking about data versioning
45:10 Importance of data science experience for MLOps
46:06 Skill mix needed in MLOps teams
47:33 Building a diverse MLOps team
48:18 Best practices for implementing MLOps in new teams
49:52 Starting with CI/CD in MLOps
51:21 Key components for a complete MLOps setup
53:08 Role of package registries in MLOps
54:12 Using Docker vs. packages in MLOps
57:56 Examples of MLOps success and failure stories
1:00:54 What MLOps is in simple terms
1:01:58 The complexity of achieving easy deployment, monitoring, and maintenance
Join our Slack: https://datatalks.club/slack.html
We talked about:
00:00 DataTalks.Club intro
01:56 Using data to create livable cities
02:52 Rachel's career journey: from geography to urban data science
04:20 What does a transport scientist do?
05:34 Short-term and long-term transportation planning
06:14 Data sources for transportation planning in Singapore
08:38 Rachel's motivation for combining geography and data science
10:19 Urban design and its connection to geography
13:12 Defining a livable city
15:30 Livability of Singapore and urban planning
18:24 Role of data science in urban and transportation planning
20:31 Predicting travel patterns for future transportation needs
22:02 Data collection and processing in transportation systems
24:02 Use of real-time data for traffic management
27:06 Incorporating generative AI into data engineering
30:09 Data analysis for transportation policies
33:19 Technologies used in text-to-SQL projects
36:12 Handling large datasets and transportation data in Singapore
42:17 Generative AI applications beyond text-to-SQL
45:26 Publishing public data and maintaining privacy
45:52 Recommended datasets and projects for data engineering beginners
49:16 Recommended resources for learning urban data science
About the speaker: Rachel is an urban data scientist dedicated to creating liveable cities through the innovative use of data. She has a background in geography and a master's in urban data science, and she blends qualitative and quantitative analysis to tackle urban challenges. Her aim is to integrate data-driven techniques with urban design to foster sustainable and equitable urban environments.
Links:
- https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html
Join our Slack: https://datatalks.club/slack.html
We talked about:
00:00 DataTalks.Club intro
00:00 DataTalks.Club anniversary "Ask Me Anything" event with Alexey Grigorev
02:29 The founding of DataTalks.Club
03:52 Alexey's transition from Java work to DataTalks.Club
04:58 Growth and success of DataTalks.Club courses
12:04 Motivation behind creating a free-to-learn community
24:03 Staying updated in data science through pet projects
26:37 Hosting a second podcast and maintaining programming skills
28:56 Skepticism about LLMs and their relevance
31:53 Transitioning to DataTalks.Club and personal reflections
33:32 Memorable moments and the first event's success
36:19 Community building during the pandemic
38:31 AI's impact on data analysts and future roles
42:24 Discussion on AI in healthcare
44:37 Age and reflections on personal milestones
47:54 Building communities and personal connections
49:34 Future goals for the community and courses
51:18 Community involvement and engagement strategies
53:46 Ideas for competitions and hackathons
54:20 Inviting guests to the podcast
55:29 Course updates and future workshops
56:27 Podcast preparation and research process
58:30 Career opportunities in data science and transitioning fields
1:01:10 Book recommendations and personal reading experiences
About the speaker: Alexey Grigorev is the founder of DataTalks.Club.
Join our Slack: https://datatalks.club/slack.html
About the speaker: Katarzyna is a computational linguist with over 10 years of experience in NLP and speech recognition. She has developed language models for automotive brands like Audi and Porsche and specializes in phonetics, morpho-syntax, and sentiment analysis. Kasia also teaches at the University of Warsaw and is passionate about human-centered AI and multilingual NLP.
Join our Slack: https://datatalks.club/slack.html
0:00 hi everyone Welcome to our event this event is brought to you by data dos club which is a community of people who love 0:06 data and we have weekly events and today one is one of such events and I guess we 0:12 are also a community of people who like to wake up early if you're from the states right Christopher or maybe not so 0:19 much because this is the time we usually have uh uh our events uh for our guests 0:27 and presenters from the states we usually do it in the evening of Berlin time but yes unfortunately it kind of 0:34 slipped my mind but anyways we have a lot of events you can check them in the 0:41 description like there's a link um I don't think there are a lot of them right now on that link but we will be 0:48 adding more and more I think we have like five or six uh interviews scheduled so um keep an eye on that do not forget 0:56 to subscribe to our YouTube channel this way you will get notified about all our future streams that will be as awesome 1:02 as the one today and of course very important do not forget to join our community where you can hang out with 1:09 other data enthusiasts during today's interview you can ask any question there's a pin Link in live chat so click 1:18 on that link ask your question and we will be covering these questions during the interview now I will stop sharing my 1:27 screen and uh there is there's a a message in uh and Christopher is from 1:34 you so we actually have this on YouTube but so they have not seen what you wrote 1:39 but there is a message from to anyone who's watching this right now from Christopher saying hello everyone can I 1:46 call you Chris or you okay I should go I should uh I should look on YouTube then okay yeah but anyways I'll you don't 1:53 need like you we'll need to focus on answering questions and I'll keep an eye 1:58 I'll be keeping an eye on all the question questions so um 2:04 yeah if you're ready we can start I'm ready yeah and you prefer Christopher 2:10 not Chris right Chris 
is fine Chris is fine it's a bit shorter um 2:18 okay so this week we'll talk about data Ops again maybe it's a tradition that we talk about data Ops every like once per 2:25 year but we actually skipped one year so because we did not have we haven't had 2:31 Chris for some time so today we have a very special guest Christopher Christopher is the co-founder CEO and 2:37 head chef or hat cook at data kitchen with 25 years of experience maybe this 2:43 is outdated uh cuz probably now you have more and maybe you stopped counting I 2:48 don't know but like with tons of years of experience in analytics and software engineering Christopher is known as the 2:55 co-author of the data Ops cookbook and data Ops Manifesto and it's not the 3:00 first time we have Christopher here on the podcast we interviewed him two years ago also about data Ops and this one 3:07 will be about data hops so we'll catch up and see what actually changed in in 3:13 these two years and yeah so welcome to the interview well thank you for having 3:19 me I'm I'm happy to be here and talking all things related to data Ops and why 3:24 why why bother with data Ops and happy to talk about the company or or what's changed 3:30 excited yeah so let's dive in so the questions for today's interview are prepared by Johanna berer as always 3:37 thanks Johanna for your help so before we start with our main topic for today 3:42 data Ops uh let's start with your ground can you tell us about your career Journey so far and also for those who 3:50 have not heard have not listened to the previous podcast maybe you can um talk 3:55 about yourself and also for those who did listen to the previous you can also maybe give a summary of what has changed 4:03 in the last two years so we'll do yeah so um my name is Chris so I guess I'm 4:09 a sort of an engineer so I spent about the first 15 years of my career in 4:15 software sort of working and building some AI systems some non- AI systems uh 4:21 at uh Us's NASA and MIT 
linol lab and then some startups and then um 4:30 Microsoft and then about 2005 I got I got the data bug uh I think you know my 4:35 kids were small and I thought oh this data thing was easy and I'd be able to go home uh for dinner at 5 and life 4:41 would be fine um because I was a big you started your own company right and uh it didn't work out that way 4:50 and um and what was interesting is is for me it the problem wasn't doing the 4:57 data like I we had smart people who did data science and data engineering the act of creating things it was like the 5:04 systems around the data that were hard um things it was really hard to not have 5:11 errors in production and I would sort of driving to work and I had a Blackberry at the time and I would not look at my 5:18 Blackberry all all morning I had this long drive to work and I'd sit in the parking lot and take a deep breath and 5:24 look at my Blackberry and go uh oh is there going to be any problems today and I'd be and if there wasn't I'd walk and 5:30 very happy um and if there was I'd have to like rce myself um and you know and 5:36 then the second problem is the team I worked for we just couldn't go fast enough the customers were super 5:42 demanding they didn't care they all they always thought things should be faster and we are always behind and so um how 5:50 do you you know how do you live in that world where things are breaking left and right you're terrified of making errors 5:57 um and then second you just can't go fast enough um and it's preh Hadoop era 6:02 right it's like before all this big data Tech yeah before this was we were using 6:08 uh SQL Server um and we actually you know we had smart people so we we we 6:14 built an engine in SQL Server that made SQL Server a column or 6:20 database so we built a column or database inside of SQL Server um so uh 6:26 in order to make certain things fast and and uh yeah it was it was really uh it's not 6:33 bad I mean the principles are the same right before 
Hadoop it's still a database there's still indexes there's 6:38 still queries um things like that uh at the time uh you would use OLAP 6:43 engines we didn't use those but those reports you know or models it's not that different um you know 6:50 we had a rack of servers instead of the cloud um so yeah and I think so what I 6:57 took from that was uh it's just hard to run a team of people to do data and analytics and it's not 7:05 really I I took it from a manager perspective I started to read Deming and 7:11 think about the work that we do as a factory you know a factory that produces insight and not automobiles um 7:18 and so how do you run that factory so it produces things that are of good 7:24 quality and then second since I had come from software I've been very influenced 7:29 by the DevOps movement how you automate deployment how you run in an agile way how you 7:35 produce um how you change things quickly and how you innovate and so 7:41 those two things of like running you know running a really good solid production line that has very low errors 7:47 um and then second changing that production line very very often they're kind of opposite right um and so 7:55 how do you as a manager how do you technically approach that and 8:00 then um 10 years ago when we started DataKitchen um we've always been a profitable company and so we started off 8:07 uh with some customers we started building some software and realized that we couldn't work any other way and that 8:13 the way we work wasn't understood by a lot of people so we had to write a book and a manifesto to kind of share our 8:21 methods and so yeah we've been in business now a little over 10 8:28 years oh that's cool and uh like what 8:33 uh so let's talk about DataOps and you mentioned DevOps and how you were inspired by that and by the way like do 8:41 you remember roughly when DevOps as I think
started to appear like when did people start calling these principles 8:49 and like tools around them DevOps yeah so the Agile Manifesto well first of all I 8:57 mean I had a boss in 1990 at NASA who had this idea build a 9:03 little test a little learn a lot right that was his mantra and which 9:09 made a lot of sense um and so then the sort of Agile software manifesto 9:14 came out which is very similar in 2001 and then um the sort of first real 9:22 DevOps was a guy at Twitter who started to do automated deployment you know 9:27 push a button and that was like 2009-ish and so the first I think DevOps 9:33 meetup was around then so it's been 15 years I guess 16 like I was 9:39 trying to so I started my career in 2010 so my first job was as a Java 9:44 developer and like I remember for some things we would just uh SFTP to the 9:52 machine and then put the JAR archive there and then like keep our fingers crossed that it doesn't break uh like 10:00 it was not really I wouldn't call it that right you were deploying you 10:06 had a deploy process I'd put it yeah 10:11 right was that documented too it was like put the JAR on production cross your 10:17 fingers I think there was uh like a page on uh some internal wiki uh yeah that 10:25 describes like with passwords like what you should do yeah that was and I think what's interesting is 10:33 why that changed right and we laugh at it now but that was why didn't you 10:38 invest in automating deployment or a whole bunch of automated regression 10:44 tests right that would run because I think in software now it would be rare 10:49 that people wouldn't use CI/CD they wouldn't have some automated tests you know functional 10:56 regression tests that would be the exception whereas that was the norm at the beginning of your career and so that's 11:03 what's interesting and I think you know if we talk about what's changed in the last two three years I think
it is 11:10 getting more standard there are um there's a lot more companies who are 11:15 talking DataOps or data observability um there's a lot more tools a lot more people are 11:22 using Git in data and analytics than ever before I think thanks to dbt um and 11:29 there's a lot of tools that are I think getting more code-centric right that 11:35 they're not treating their configuration like a black box there's several 11:41 BI tools that tout the fact that they're uh you know they're Git-centric you know and so that 11:49 they're testable and that they have APIs so things like that I think people maybe let's take a step back and just do a 11:57 quick summary of what DataOps is and then we can talk about like what changed in the last two years sure so I 12:06 guess it starts with a problem and it sort of 12:11 admits some dark things about data and analytics that we're not really successful and we're not really happy um 12:19 and if you look at the statistics on sort of projects and problems and even 12:25 the psychology like I think about a year or two ago we did a survey of 12:31 data engineers 700 data engineers and 78% of them wanted their job to come with a therapist and 50% were thinking 12:38 of leaving the career altogether and so why is everyone sort of unhappy well I think what happens is 12:46 teams either fall into two buckets there are sort of heroic teams who 12:52 are working night and day they're trying really hard for their customer um and then they get 13:01 burnt out and then they quit honestly and then the second team has wrapped 13:06 their projects up in so much process and proceduralism and steps that doing 13:12 anything is sort of so slow and boring that they again leave in frustration um 13:18 or live in cynicism and that like the only outcome is quit and 13:24 start uh woodworking yeah the only outcome really is quit and start woodworking 13:29
and um as a manager I always hated that right because when your team 13:35 is either full of heroes or proceduralism you always have people who have the whole system in their head 13:42 they're certainly key people and then when they leave they take all that knowledge with them and then that 13:48 creates a bottleneck and so both of which aren't good and I think the 13:53 main idea of DataOps is there's a balance between fear and heroism 14:00 that you can live you don't you know you don't have to be fearful 95% of the time maybe 1 or 2% it's good to be 14:06 fearful and you don't have to be a hero again maybe 1 or 2% it's good to be a hero but there's a balance um and 14:13 and in that balance you actually are much more prod
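The conversation above contrasts manual deployments ("put the JAR on production, cross your fingers") with automated regression tests that run on every change, and frames DataOps as a way to cut production errors without slowing the team down. As a rough illustration only, here is a minimal sketch of what an automated data check might look like in Python: a few tests run against a batch of rows before it is published, and a gate blocks publication if any test fails. The column names (`customer_id`, `amount`), the thresholds, and all function names are hypothetical assumptions, not taken from the episode or from any DataOps product.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one automated data test."""
    name: str
    passed: bool
    detail: str


def check_row_count(rows, minimum):
    # Fail if the batch is suspiciously small (e.g. an upstream load silently failed).
    n = len(rows)
    return CheckResult("row_count", n >= minimum, f"{n} rows (minimum {minimum})")


def check_no_nulls(rows, column):
    # Fail if a required column has missing values.
    missing = sum(1 for row in rows if row.get(column) is None)
    return CheckResult(f"no_nulls:{column}", missing == 0, f"{missing} null values")


def check_in_range(rows, column, low, high):
    # Fail if any non-null value falls outside the expected range.
    bad = [
        row[column]
        for row in rows
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return CheckResult(f"range:{column}", not bad, f"{len(bad)} out-of-range values")


def run_checks(rows):
    # Hypothetical test suite for a hypothetical "orders" table.
    return [
        check_row_count(rows, minimum=1),
        check_no_nulls(rows, "customer_id"),
        check_in_range(rows, "amount", 0, 10_000),
    ]


def gate(rows):
    """Return True only if every check passes; print each failure."""
    results = run_checks(rows)
    for result in results:
        if not result.passed:
            print(f"FAILED {result.name}: {result.detail}")
    return all(result.passed for result in results)


if __name__ == "__main__":
    good_batch = [{"customer_id": 1, "amount": 20.0}, {"customer_id": 2, "amount": 35.5}]
    bad_batch = [{"customer_id": None, "amount": 20.0}, {"customer_id": 3, "amount": -5.0}]
    print(gate(good_batch))  # True
    print(gate(bad_batch))   # prints two FAILED lines, then False
```

In a real pipeline, a gate like this would typically run automatically on every change or pipeline run. A failure then stops the run before bad data reaches a report, instead of someone discovering it in the morning. Which checks matter depends on each dataset.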
In this podcast episode, we talked with Guillaume Lemaître about navigating scikit-learn and imbalanced-learn.
Links: LinkedIn: https://www.linkedin.com/company/frontline100/ Ba Linh Le's LinkedIn: https://www.linkedin.com/in/ba-linh-le-/ Sabina's LinkedIn: https://www.linkedin.com/in/sabina-firtala/ Twitter: https://x.com/frontline_100?mx=2 Website: https://www.frontline100.com/ Free LLM course: https://github.com/DataTalksClub/llm-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
We stream the podcasts on YouTube, where each session is also recorded and published on our channel, complete with timestamps, a transcript, and important links. You can access all the podcast episodes here - https://datatalks.club/podcast.html
Links: LinkedIn: https://www.linkedin.com/in/erum-afzal-64827b24/ Twitter: https://twitter.com/Erum55449739 Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: probabl. YouTube channel: https://www.youtube.com/@UCIat2Cdg661wF5DQDWTQAmg Calmcode website: https://calmcode.io/ probabl. website: https://probabl.ai/ Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: Biodiversity and Artificial Intelligence pdf: https://www.gpai.ai/projects/responsible-ai/environment/biodiversity-and-AI-opportunities-recommendations-for-action.pdf Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: GitHub repo: https://github.com/antahiap/ADPT-LRN-PHYS/tree/main Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: LinkedIn: https://www.linkedin.com/in/tereza-iofciu/ Twitter: https://twitter.com/terezaif Github: https://github.com/terezaif Website: https://terezaiofciu.com Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: VectorHub: https://superlinked.com/vectorhub/?utm_source=community&utm_medium=podcast&utm_campaign=datatalks Daniel's LinkedIn: https://www.linkedin.com/in/svonava/ Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html This podcast is sponsored by VectorHub, a free open-source learning community for all things vector embeddings and information retrieval systems.
Links: LinkedIn: https://www.linkedin.com/in/reemmahmoud/recent-activity/all/ Website: https://topmate.io/reem_mahmoud Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: Dev and AI hackathons: https://devpost.com/ Healthcare-focused challenges: https://grand-challenge.org/challenges/ Volunteering in projects (AI4Good): https://www.fruitpunch.ai/ Volunteering in projects (AI4Good) 2: https://www.omdena.com/ Twitter: https://twitter.com/el_ateifSara Instagram: https://www.instagram.com/saraelateif/ LinkedIn: https://www.linkedin.com/in/sara-el-ateif/ Youtube: www.youtube.com/@elateifsara Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: LinkedIn: https://www.linkedin.com/in/sarahmestiri/ Website: https://thrivingcareermoms.com/ Personal Website: https://www.sarahmestiri.com/ Youtube channel: https://www.youtube.com/@thrivingcareermoms444 Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
We talked about: Nemanja's background When Nemanja first worked as a data person Typical problems that ML Ops folks solve in the financial sector What Nemanja currently does as an ML Engineer The obstacle of implementing new things in financial sector companies Going through the hurdles of DevOps Working with an on-premises cluster “ML Ops on a Shoestring” (You don't need fancy stuff to start w/ ML Ops) Tactical solutions Platform work and code work Programming and soft skills needed to be an ML Engineer The challenges of transitioning from electrical engineering and sales to ML Ops The ML Ops tech stack for beginners Working on projects to determine which skills you need Links: LinkedIn: https://www.linkedin.com/in/radojkovic/ Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
We talked about: Rob's background Going from software engineering to Bayesian modeling Frequentist vs Bayesian modeling approach About integrals Probabilistic programming and samplers MCMC and Hakaru Language vs library Encoding dependencies and relationships into a model Stan, HMC (Hamiltonian Monte Carlo), and NUTS Sources for learning about Bayesian modeling Reaching out to Rob Links: Book 1: https://bayesiancomputationbook.com/welcome.html Book/Course: https://xcelab.net/rm/statistical-rethinking/ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
We talked about: Ivan's background How Ivan became interested in investing Getting financial data to run simulations Open, High, Low, Close, Volume Risk management strategy Testing your trading strategies Sticking to your strategy Important metrics and remembering about trading fees Important features Deployment How DataTalks.Club courses helped Ivan Ivan's site and course sign-up Links: Exploring Finance APIs: https://pythoninvest.com/long-read/exploring-finance-apis PythonInvest Blog Articles: https://pythoninvest.com/blog Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: LinkedIn: https://www.linkedin.com/in/atitaarora/ Twitter: https://x.com/atitaarora Github: https://github.com/atarora Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning Relevant Search: https://www.manning.com/books/relevant-search Let's learn about Vectors: https://hub.superlinked.com/ Langchain: https://python.langchain.com/docs/get_started/introduction Qdrant blog: https://blog.qdrant.tech/ OpenSource Connections Blog: https://opensourceconnections.com/blog/ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: Adrian's LinkedIn: https://www.linkedin.com/in/data-team/ Twitter: https://twitter.com/dlt_library Github: https://github.com/dlt-hub/dlt Website: https://dlthub.com/docs/intro Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: LinkedIn profile: http://www.linkedin.com/in/visnadi The DataFreelancer website: https://thedatafreelancer.com/ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: LinkedIn: https://www.linkedin.com/in/christoph-molnar/ Website: https://christophmolnar.com/ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: Jack's LinkedIn profile: https://www.linkedin.com/in/jackblandin/ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: Mini sound bath: https://www.youtube.com/watch?v=g-lDrcSqcrQ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: Post: https://www.linkedin.com/posts/leracaiman_elasticsearch-ecommerce-activity-7106615081588674560-5WQO Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: LinkedIn: https://www.linkedin.com/in/ioannis-mesionis/ Github: https://github.com/ioannismesionis Website: https://ioannismesionis.github.io/ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
We talked about: Angela's background Angela's role at Sam's Club The usefulness of knowing ML as a data engineer Angela's career path Transitioning from data analyst to data engineer/system designer Best practices for system design and data engineering Working with document databases Working with network-based databases Detecting fraud with a network-based database Selecting the database type to work with Neo4j vs Postgres The importance of having software engineering knowledge in data engineering Data quality check tooling The greatest challenges in data engineering Debugging and finding the root cause of a failed job What kinds of tools Angela uses on a daily basis Working with external data sources Angela's resource recommendations Links: LinkedIn: https://www.linkedin.com/in/aramirez1305/ Twitter: https://twitter.com/angelamaria__r Github: https://github.com/aramir62 Previous podcast talk: https://twitter.com/i/spaces/1OwGWwZAZDnGQ?s=20 Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
We talked about: Loïc's background Data management Loïc's transition to data engineer Challenges in the transition to data engineering What is a data architect? The output of a data architect's work Establishing metrics and dimensions The importance of communication Setting up best practices for the team Staying relevant and tech-watching Setting up specifications for a pipeline Be agile, create a POC, iterate ASAP, and build reusable templates Reaching out to Loïc for questions Links: Loïc's LinkedIn: https://www.linkedin.com/in/loicmagnien/ Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: The Book of Why: https://amzn.to/3OZpvBk Causal Inference and Discovery in Python: https://amzn.to/46Pperr Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
We talked about: José's background How José relocated to Norway and his schedule Tech companies in Norway and José's role Challenges of working as a remote data engineer José's newsletter on how to make use of data The process of making data useful Where José gets inspiration for his newsletter Dealing with burnout When in Norway, do as the Norwegians do The legalities of working remotely in Norway The benefits of working remotely Links: LinkedIn: https://www.linkedin.com/in/jmssalas Github: https://github.com/jmssalas Website & Newsletter: https://jmssalas.com Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Links: LinkedIn: https://www.linkedin.com/in/sandrakublik/ Twitter: https://twitter.com/sandra_kublik Youtube: https://www.youtube.com/@sandra_kublik Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html