The Data Life Podcast


This is a podcast where we talk all about real-life experiences of dealing with data and machine learning tools, techniques and personalities. We cover not just the technical aspects but also the "life" aspects of working in the field. Note: Opinions expressed are my own and do not express the views or opinions of my employer. Support this podcast: https://anchor.fm/the-data-life-podcast/support

Sanket Gupta


    • Latest episode: Oct 11, 2021
    • New episodes: infrequent
    • Average duration: 20m
    • Episodes: 27


    Latest episodes from The Data Life Podcast

    27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot

    Oct 11, 2021 (44:55)


    We talk with Michel Tricot, who is the Founder and CEO of Airbyte, which is an open source data integration Y Combinator startup. It has raised over $30M in capital and has been growing quite fast. It was a great conversation and I think you will also enjoy it.

    26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)

    Aug 18, 2021 (39:30)


    Imagine you are hanging out at a beach, watching the waves come and go and looking at all the shells on the sand. And you get an idea: how about collecting these shells and making necklaces to sell? How would you go about doing this? Maybe you'd collect a few shells, make a small necklace and show it to a friend. This is where we begin our journey of learning about data engineering pipelines. Using the example of running a shell necklace business, we learn about the following data engineering concepts:
    1. ETL (Extract, Transform, Load) vs ELT (Extract, Load, Transform), and why data warehouses are great for analytics.
    2. Spark for large-scale data processing, and hosting/running it.
    3. Data orchestration using Airflow.
    My blog on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb
    Great book to learn about Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20
    Tools covered in the episode:
    dbt: https://www.getdbt.com/
    Databricks: https://databricks.com/
    EMR: https://aws.amazon.com/emr/
    AWS Redshift: https://aws.amazon.com/redshift/
    Snowflake: https://www.snowflake.com/
    Delta Lake: https://databricks.com/product/delta-lake-on-databricks
    --- Send in a voice message: https://anchor.fm/the-data-life-podcast/message Support this podcast: https://anchor.fm/the-data-life-podcast/support
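To make the ETL vs ELT distinction concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for a data warehouse. The shell records, table names and cleaning rules are all made up for illustration and are not taken from the episode. In ETL the cleaning happens in application code before loading; in ELT the raw rows are loaded first and the cleaning is expressed as SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw "shell" records; the field names are invented for this sketch.
raw_shells = [
    {"id": 1, "color": " White ", "size_mm": "12"},
    {"id": 2, "color": "pink", "size_mm": "30"},
    {"id": 3, "color": "PINK", "size_mm": "8"},
]


def clean(row):
    """Transform step: normalize the color and cast the size to an int."""
    return (row["id"], row["color"].strip().lower(), int(row["size_mm"]))


def etl(conn, rows):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE shells_clean (id INTEGER, color TEXT, size_mm INTEGER)")
    conn.executemany(
        "INSERT INTO shells_clean VALUES (?, ?, ?)", [clean(r) for r in rows]
    )


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE shells_raw (id INTEGER, color TEXT, size_mm TEXT)")
    conn.executemany(
        "INSERT INTO shells_raw VALUES (?, ?, ?)",
        [(r["id"], r["color"], r["size_mm"]) for r in rows],
    )
    conn.execute(
        "CREATE TABLE shells_clean AS "
        "SELECT id, lower(trim(color)) AS color, CAST(size_mm AS INTEGER) AS size_mm "
        "FROM shells_raw"
    )


def load_clean(conn):
    """Read back the cleaned table so the two approaches can be compared."""
    return conn.execute(
        "SELECT id, color, size_mm FROM shells_clean ORDER BY id"
    ).fetchall()


etl_conn = sqlite3.connect(":memory:")
etl(etl_conn, raw_shells)
elt_conn = sqlite3.connect(":memory:")
elt(elt_conn, raw_shells)
```

Both connections end up with the same cleaned table. The difference is that the ELT database also keeps the raw rows, so the transformation can be rewritten and rerun later without extracting the data again.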

    25: Talking Data Privacy with Jeff Bermant

    Aug 4, 2021 (28:11)


    In this episode, I'm excited to be talking with Jeff Bermant, who is the founder and CEO of Cocoon Mydata Rewards browser. It is a browser based on Chrome, and it pays people to use it! ✨ In this episode we talk about data ethics and privacy, and why Jeff believes that users should be paid for their data. We talk about GDPR and similar laws in the US, the future of data privacy and more! Go to https://getcocoon.com to download and use Cocoon Rewards Browser. ~Thanks for listening~

    24: Promoting Women in Tech - With Rupal Gupta

    Oct 8, 2020 (15:05)


    In this episode, we are talking about women in tech with Rupal Gupta. Rupal, a recent graduate of the Online MS in CS program at Georgia Tech, is a data engineer in the industry and is passionate about promoting women in tech. She also has some great tips and resources for anyone trying to break into data science and tech! In this episode we talk about things that can help promote women in tech, women in tech conferences such as Grace Hopper, looking for jobs, resources to prepare for interviews and more. If you want to reach out to Rupal for any help or to collaborate on her project womenmentors.co, here is her LinkedIn: https://www.linkedin.com/in/rupalgupta15/
    FREE Women in Tech Conference by Manning Publications on Oct 13th at 12pm ET on Twitch: https://freecontent.manning.com/livemanning-conferences-women-in-tech/

    23: Let's Talk AWS SageMaker for ML Model Deployment

    Jun 17, 2020 (19:46)


    In this episode, we talk about Amazon SageMaker and how it can help with ML model development including model building, training and deployment. We cover 3 advantages in each of these 3 areas. We cover points such as:
    1. Hosting ML endpoints for deploying models to thousands or millions of users.
    2. Saving costs for model training using SageMaker.
    3. Using CloudWatch logs with SageMaker endpoints to debug ML models.
    4. Using preconfigured environments or models provided by AWS.
    5. Automatically saving model artifacts in AWS S3 as you train in SageMaker.
    6. Using version control for SageMaker notebooks with GitHub.
    and more…
    Please rate, subscribe and share this episode with anyone who might find SageMaker useful in their work. I feel that SageMaker is a great tool and want to share it with data scientists. For comments/feedback/questions or if you think I have missed something in the episode, please reach out to me on LinkedIn: https://www.linkedin.com/in/sanketgupta107/

    22: Transfer Learning for NLP - With Paul Azunre

    Apr 13, 2020 (46:46)


    In this episode, we are talking with Paul Azunre. Paul is one of the world's experts in the area of Transfer Learning for NLP and is also the author of the upcoming book Transfer Learning for NLP published by Manning Publications. In this episode we talk about things such as:
    1) Paul's background, and how his work in maths and optimization as well as fake news detection got him started in transfer learning for NLP.
    2) How Paul got started with the book, the book writing process, and tips for listeners on writing a technical book.
    3) A high-level summary of transfer learning in both computer vision and NLP, and why this is the ImageNet moment of NLP.
    4) Why ML and NLP practitioners today should be excited about transfer learning (such as how students in Ghana are able to build their own Google Translate using transfer learning).
    5) How BERT, ELMo and ALBERT work at a high level and how they differ from traditional techniques like Word2Vec or FastText.
    6) Differences between BERT, ELMo and ALBERT.
    7) What makes Paul's new book a must-read for anyone interested in this field.
    ✨Paul's Info

    21: Why Scikit-Learn and Keras are Awesome for ML

    Jan 26, 2020 (19:54)


    In this episode, we talk about why the two libraries Scikit-Learn and Keras are great for machine learning. These two libraries combined with Pandas form the 3 core libraries in Python for a data scientist today. We cover things like:
    1) Data Exploration and data cleaning - how Pandas and Jupyter notebooks provide a good way to get started here.
    2) Data Transformation - how Scikit-Learn provides many useful functions like train_test_split, Scalers, PCA etc.
    3) Data Fitting - how Scikit-Learn provides good shallow models and Keras provides great support to quickly get started with neural networks.
    We also cover various tidbits on things to note when building ML pipelines and preparing models to be deployed in production, so tune into the episode to find out!
    Fantastic Resources:
    1) Book by head of YouTube DS team Aurelien Geron: https://www.amazon.com/dp/1492032646/?tag=omnilence-20 This is one of the best books I have read on this topic as it covers practical tips incl. the Scikit-Learn API etc.
    2) Developing Scikit-Learn estimators: https://scikit-learn.org/stable/developers/develop.html
    3) Guide to Keras Sequential API: https://keras.io/getting-started/sequential-model-guide/
    4) Guide to Keras Functional API: https://keras.io/getting-started/functional-api-guide/
    5) My previous episode on Pandas: https://podcasts.apple.com/us/podcast/17-why-pandas-is-the-new-excel/id1453716761?i=1000454831790
    Thanks for listening! Please consider supporting this podcast from the link at the end.
    Support this podcast: https://anchor.fm/the-data-life-podcast/support
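As a rough illustration of the data transformation step, the sketch below re-implements the idea behind a train/test split and a standard scaler in plain Python. It is a from-scratch stand-in written for this page, not Scikit-Learn's code, and the function names `split`, `fit_scaler` and `transform` are hypothetical. The point it tries to show is a common pipeline habit: compute the scaling statistics on the training rows only, then reuse them on the test rows.

```python
import random
from statistics import mean, pstdev


def split(rows, test_size=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]


def fit_scaler(values):
    """Learn the mean and population standard deviation of one feature."""
    sigma = pstdev(values)
    # Guard against constant features, where the deviation would be zero.
    return mean(values), sigma if sigma else 1.0


def transform(values, scaler):
    """Standardize values with statistics learned elsewhere (e.g. on the train set)."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical single-feature dataset.
train, test = split([3.0, 8.0, 1.0, 9.0, 4.0, 7.0, 2.0, 6.0])
scaler = fit_scaler(train)             # statistics come from the training rows only
train_scaled = transform(train, scaler)
test_scaled = transform(test, scaler)  # the test rows reuse the same statistics
```

Fitting the scaler on the full dataset before splitting would leak information about the test rows into training, which is why the two steps are kept separate here.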

    20: Yogi's Guide to Analytics - An Interview with Akshay Kanade

    Dec 1, 2019 (35:44)


    In this episode, we talk with Akshay Kanade. He is a business analyst working in New York City who likes taking a big-picture view of data, and has very interesting spiritual views on data analytics and life in general. He is also a handwriting expert: he can read people's handwriting and recognize a lot about their personalities. In this interview we cover several things such as:
    - How has being an analyst influenced Akshay's life?
    - Introspection about data and analytics
    - Taking a high-level view of data - connecting deep learning with deep thinking
    - How people who don't have a background in analytics can use their unique backgrounds for decisions
    - Power of consciousness and spirituality at work
    - Handwriting analysis and whether it is a science or an art
    It was a fascinating conversation, and I took a lot away from hearing Akshay's viewpoints. This interview is a must-listen if you deal with data and analytics in your work.
    Akshay's handwriting analysis and mentorship website: www.pradnyatantra.com (will be live soon)
    Reach Akshay on LinkedIn at https://www.linkedin.com/in/akshaykanade06/
    Some of Akshay's favorite books:
    1. Autobiography of a Yogi https://www.amazon.com/dp/8120725247/?tag=omnilence-20
    2. The Monk Who Sold His Ferrari https://www.amazon.com/dp/0062515675/?tag=omnilence-20
    3. Mastery https://www.amazon.com/dp/B00A6G9CGG/?tag=omnilence-20
    To add to this list, one of my favorite books is: The Power of Now https://www.amazon.com/dp/B00A6G9CGG/?tag=omnilence-20
    If you have any feedback drop me a note at thedatalifepodcast@gmail.com or reach me on LinkedIn at https://www.linkedin.com/in/sanketgupta107/ ~ Thanks for listening~

    19: Statistics and Data Science- An Interview with Patrick McClory

    Nov 22, 2019 (56:20)


    In this podcast episode, we do an interview! We talk with Patrick McClory, who is the founder and CEO of IntrospectData. He is an expert working in areas of data science consulting, large machine learning projects, math, statistics and more. In this episode we cover several interesting topics such as:
    1) What makes a good data scientist?
    2) The different roles in the industry such as data engineer, machine learning engineer, data analyst etc.
    3) The first mile problem: Data ownership and ethics of data collection.
    Patrick can be reached at patrick@introspectdata.com and you can read more about IntrospectData's projects at https://introspectdata.com/
    Some books discussed in the episode:
    1. The Field Guide to Understanding Human Error
    2. Information Theory: A Tutorial Introduction
    If you enjoyed this episode or have any feedback drop me a note at thedatalifepodcast@gmail.com ~ Thanks for listening ~

    18: 5 Things to Consider for Master of Science (MS) in US

    Nov 15, 2019 (19:00)


    What should you consider before pursuing an MS in the US? There might be several questions in your mind as you explore this. In this episode we cover some of the main things to consider before you make the decision. I also go into detail about things I wish I had known before coming to the US for my MS. The things I cover in the podcast are:
    1) Location matters more than rankings.
    2) Talk to professors before applying.
    3) Culture of hard work, and the advantage of having prior work experience.
    4) Cost is high, and there are low-cost alternatives.
    5) Visa situation is uncertain.
    Hope you enjoy this episode; this is an episode that I wish I had listened to before flying to the US. Reach out with your questions/feedback at thedatalifepodcast@gmail.com
    Resources: Although I did not cover GRE or TOEFL topics in detail, I am linking to some great resources for their preparation.
    1) Essential Words for the GRE https://www.amazon.com/dp/1438007493/?tag=omnilence-20
    2) GRE Prep Guide by Kaplan https://www.amazon.com/dp/150624890X/?tag=omnilence-20
    3) GRE Guide by Barron's https://www.amazon.com/dp/1438009151/?tag=omnilence-20
    4) TOEFL Guide by Barron's https://www.amazon.com/dp/1438076258/?tag=omnilence-20
    5) Blog version of this podcast episode https://medium.com/the-data-life/ms-in-us-for-data-science-57079509ded9
    Thanks for listening. Please support us via the link at the end for Anchor Payments. It would allow us to build more of this content!
    Support this podcast: https://anchor.fm/the-data-life-podcast/support

    17: Why Pandas is the new Excel

    Oct 25, 2019 (16:37)


    The Data Life Podcast is a podcast where we talk all about real-life experiences with data and data science tools, techniques, models and personalities. In this episode, we talk about how Pandas is becoming a tool of choice for many data scientists for doing their data analysis work. We explore how Pandas wins over Excel in several key areas that are important for businesses today:
    1) Large dataset sizes
    2) Different kinds of input formats such as JSON, CSV, HTML, SQL etc.
    3) Complex business logic
    4) Linking data analysis work to websites and databases
    5) Cost
    Pandas has lots of helpful functions such as read_csv, read_json and read_sql that allow easy input of data into DataFrames. DataFrames have several useful methods like "describe", "value_counts", "groupby", "loc" and more that allow easy understanding of your dataset. Pandas also supports plotting out of the box with the "plot" method. We also cover how Pandas differs from SQL in things like ease of handling time series data, visualizations and more. Tune in to the episode to learn more about how Pandas might be the tool for your data analysis needs to take your business to the next level!
    Fantastic Resources:
    1) Book by Pandas creator Wes McKinney: https://www.amazon.com/dp/1491957662/?tag=omnilence-20
    2) Great workshop video by Kevin Markham at PyCon: https://www.youtube.com/watch?v=0hsKLYfyQZc
    3) Input output methods for Pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
    4) Comparison of some operations of Pandas with SQL https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html
    Thanks for listening! Please consider supporting this podcast from the link at the end.
    Support this podcast: https://anchor.fm/the-data-life-podcast/support

    16: Getting Started with Natural Language Processing

    Oct 5, 2019 (19:31)


    So many tweets, news articles and other unstructured text surround us. How do we make sense of all of these? Natural language processing or NLP can help. NLP refers to algorithms that process, understand and generate aspects of natural language, either in text or in spoken voice. In this episode we cover some of the common techniques in NLP to help you get started in this exciting field! We cover several tasks in an NLP pipeline:
    1. Tokenization and punctuation removal
    2. Stemming and Lemmatization
    3. One hot vectors
    4. Word embeddings including Word2Vec and GloVe
    5. Recurrent Neural Networks and LSTMs
    6. tf and tf-idf approaches - when to use word embeddings, when to use tf / tf-idf approaches?
    7. Generating text using encoder-decoder or sequence to sequence models
    Some resources:
    1. Sequence Models - course by Andrew Ng on Coursera - one of the best courses I have seen on this topic! https://www.coursera.org/learn/nlp-sequence-models
    2. Awesome and popular collection of NLP resources for Python, C++, Scala etc.: https://github.com/keon/awesome-nlp
    3. Overview of Text Similarity Metrics (a blog written by me on Medium): https://towardsdatascience.com/overview-of-text-similarity-metrics-3397c4601f50
    4. How to train custom word embeddings on a GPU https://towardsdatascience.com/how-to-train-custom-word-embeddings-using-gpu-on-aws-f62727a1e3f6
    Thanks for listening, please support this podcast by following the link at the end.
    Support this podcast: https://anchor.fm/the-data-life-podcast/support
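For a feel of the first and sixth tasks in that pipeline, here is a small from-scratch sketch of tokenization with punctuation removal followed by tf-idf weighting. It uses one common textbook definition of tf-idf (term frequency times the log of inverse document frequency); libraries often use smoothed variants, so treat the exact formula and the function names as assumptions of this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Count each term once per document to get its document frequency.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus.
docs = ["The cat sat on the mat.", "The dog sat on the log.", "Cats and dogs!"]
weights = tf_idf(docs)
```

In the first document "the" appears twice but also appears in the second document, so it ends up with a lower weight than "cat", which appears once and in that document only. A word that occurs in every document gets a weight of zero.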

    15: Using Flask, REST API and Vue.js to build a Single Page Web Application

    Sep 16, 2019 (20:39)


    As a data scientist, you will work on machine learning models that are deployed on websites, usually wrapped in a REST API; these days this approach is also called a "micro-service". For this reason it is important to know how backends and frontends work and how to build them. In this episode, we talk about building a note app, which is a Single Page Application or SPA, using Python's Flask library for the backend and Vue.js for the frontend. We use a REST API to communicate between them. We cover the following topics in Q and A format:
    1. Why should data scientists care about building a frontend, a backend and a REST API?
    2. What is a single page application?
    3. Why Vue.js?
    4. Why do we need server side code?
    5. What is a REST API?
    6. How does Flask help with building a REST API?
    Then we go into the exact mechanics of building the SPA:
    Step 1: Database setup
    Step 2: Write REST API in Flask
    Step 3: Postman setup and testing of the API
    Step 4: Build frontend and write forms to get information
    Step 5: Build routing and login pages
    Step 6: Frontend design and UI/UX
    Finally you can deploy both the server and client separately on AWS or Heroku so that other users can see it and use it.
    Dependencies:
    1) Flask to build server side REST APIs
    2) SQLAlchemy, which is an ORM to access the database
    3) Bcrypt for hashing user passwords to store in your database
    4) Vue for building the frontend
    5) Bootstrap-Vue for using Bootstrap with Vue.js
    6) Axios to communicate via AJAX between client and server
    7) Vue CLI 3 to manage the tooling of the client
    Really awesome resources:
    1) Learn Vue.JS from scratch by the awesome teacher Net Ninja - YouTube https://www.youtube.com/watch?v=5LYrN_cAJoA&list=PL4cUxeGkcC9gQcYgjhBoeQH7wiAyZNrYa&index=1
    2) Building a book recording app using Vue and Flask https://testdriven.io/blog/developing-a-single-page-app-with-flask-and-vuejs/#bootstrap-vue
    3) Managing state in Vue.js including Vuex and simple global store: https://medium.com/fullstackio/managing-state-in-vue-js-23a0352b1c87
    4) Authenticating a Flask API Using JSON Web Tokens - YouTube https://www.youtube.com/watch?v=J5bIPtEbS0Q
    5) Really nice tutorial for using databases with Flask by Corey Schafer - YouTube https://www.youtube.com/watch?v=cYWiDiIUxQc&list=PL-osiE80TeTs4UjLw5MM6OjgkjFeUxCYH&index=4
    If this has been of value please consider supporting me by buying me a coffee at the Anchor link at the end. If you support, I will provide extra bonus content for you. Thanks for listening!
    Support this podcast: https://anchor.fm/the-data-life-podcast/support

    14: Building a Character-Based Text Classifier

    Aug 7, 2019 (23:20)


    Ever wonder how to automatically detect a language from its script? How does Google do it? Ever wonder how Amazon knows whether you are searching for a product or a SKU in its search bar? We look into character-based text classifiers in this episode. We cover 2 types of models. First are bag-of-words models such as Naive Bayes, logistic regression and a vanilla neural network. Second are sequence models such as LSTMs, including how to prepare your characters for them: one-hot encoding, padding, creating character embeddings and then feeding these into LSTMs. We also cover how to set up and compile these sequence models. Thanks for listening, and if you find this content useful, please leave a review and consider supporting this podcast from the link below.
    Support this podcast: https://anchor.fm/the-data-life-podcast/support
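The character preparation steps mentioned above (indexing, padding and one-hot encoding) can be sketched in a few lines of plain Python. This is only an illustration with hypothetical helper names; in practice a deep learning library's own preprocessing utilities would normally do this work. Index 0 is reserved for padding here, and as a simplifying assumption unknown characters are mapped to 0 as well.

```python
def build_vocab(texts):
    """Map each character to an integer index; index 0 is reserved for padding."""
    chars = sorted(set("".join(texts)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, vocab, max_len):
    """Turn text into exactly max_len indices, truncating or right-padding with 0.

    Characters missing from the vocabulary are also mapped to 0 in this sketch.
    """
    ids = [vocab.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


def one_hot(ids, vocab_size):
    """Turn a list of indices into one-hot rows of width vocab_size + 1."""
    return [[1 if j == i else 0 for j in range(vocab_size + 1)] for i in ids]


# Hypothetical training strings for a language detector.
texts = ["hola", "hello"]
vocab = build_vocab(texts)
padded = encode("hola", vocab, max_len=6)
matrix = one_hot(padded, len(vocab))
```

Every input ends up as a matrix with the same shape (max_len rows, vocabulary size plus one columns), which is what a sequence model expects. An embedding layer would take the integer indices from `encode` directly instead of the one-hot rows.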

    13: Statistics of A/B Testing

    Jul 17, 2019 (21:22)


    You and your team might spend a lot of time building a new feature. But how do you know if this feature will be liked by the users? One way to test this statistically is A/B testing. Listen to this episode to get tips, tricks and the intuition behind hypothesis testing, alpha, beta, p-values, two-sample t-tests and more. These insights come from experience deploying A/B tests in the field and from talking to experts. These ideas are typically not covered in traditional A/B testing texts, which tend to focus a lot on math without the intuition, and that's why I really wanted to cover them in this podcast episode. Thanks for listening! I'd really appreciate your support for this podcast. Follow the link below.
    Support this podcast: https://anchor.fm/the-data-life-podcast/support
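As one concrete instance of a two-sample t-test, the sketch below computes Welch's t statistic and its approximate degrees of freedom using only the standard library. Welch's variant (which does not assume equal variances) is a choice made for this example; the episode description does not say which variant is discussed. Turning the statistic into a p-value needs the t distribution's CDF, which a statistics package such as SciPy provides, so that step is left out here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples.

    A positive t means sample b has the larger mean.
    """
    var_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(b) - mean(a)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical per-user metric for the control (A) and treatment (B) groups.
control = [10, 12, 9, 11, 10, 12]
treatment = [13, 14, 12, 15, 13, 14]
t_stat, dof = welch_t(control, treatment)
```

The larger the absolute value of `t_stat` relative to the spread implied by `dof`, the less plausible it is that both groups share the same mean; comparing it against the chosen alpha threshold is the hypothesis test itself.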

    12: Your Users Don't Care How Smart You Are

    Jun 25, 2019 (5:04)


    In this episode, we talk about the importance of business impact in data science. "Your users don't care how smart you are" was a quote I read that got me thinking about this. The right way to do data science is to think of users, revenue impact and business value, and to go for the simplest solution possible. The wrong way to do data science is to pick up a hammer and go looking for nails, rather than the other way around. We cover all this and more! Amazon link for Inspired by Marty Cagan (a great read to get better at product thinking): https://www.amazon.com/dp/1119387507/?tag=omnilence-20 Please consider buying me a coffee if you find this content useful. Refer to the link at the bottom.
    Support this podcast: https://anchor.fm/the-data-life-podcast/support

    11: The Ten Essential Machine Learning Questions

    Jun 21, 2019 (19:08)


    This episode covers the ten essential machine learning questions. Disclaimer: Baseline answers have been provided in the episode for guidance. For complete accuracy, please refer to textbooks or to courses by Andrew Ng on Coursera. If this content is useful, please consider buying me a coffee via the link https://anchor.fm/the-data-life-podcast/support
    Resources:
    1. Machine Learning Course by Andrew Ng: https://www.coursera.org/learn/machine-learning
    2. Deep Learning Course by Andrew Ng: https://www.coursera.org/specializations/deep-learning
    Questions:
    1. What are underfitting and overfitting? How do you avoid them?
    2. What is the difference between batch, SGD and mini-batch gradient descent? When will you use each?
    3. How do you choose a machine learning model?
    4. How do you improve the latency of a machine learning model in production?
    5. If your training and cross validation accuracies are high, but testing accuracy is lower - how would you debug this?
    6. Name 3 hyper-parameters. Why can't we train them as parameters? Why should only humans set them?
    7. Which metric should be used to evaluate a classifier? How do you connect it to business value?
    8. What prevents someone from selecting a deep learning model for everything?
    9. Say you have to classify a lot of data, but you don't have labelled training examples. How would you begin to solve the problem? How many training data points are needed?
    10. Say you have a perfectly working machine learning model. How do you deploy this in production? How do you check if users will actually like it?
    Please leave a review on Apple Podcasts or wherever you listen to this. Thanks for listening!
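For question 2, a small sketch may help show that batch, stochastic and mini-batch gradient descent are the same loop with a different batch size. The toy linear regression problem, learning rate and epoch count below are assumptions picked for this example and are not from the episode.

```python
import random


def gradient(w, b, batch):
    """Gradient of mean squared error for y ~ w*x + b over the given (x, y) pairs."""
    n = len(batch)
    dw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    db = sum(2 * (w * x + b - y) for x, y in batch) / n
    return dw, db


def fit(data, batch_size, lr=0.3, epochs=500, seed=0):
    """Fit y = w*x + b by gradient descent.

    batch_size == len(data)  -> batch gradient descent (one update per epoch)
    batch_size == 1          -> stochastic gradient descent (one update per example)
    anything in between      -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    data = list(data)
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            dw, db = gradient(w, b, data[start:start + batch_size])
            w -= lr * dw
            b -= lr * db
    return w, b


# Noise-free toy data drawn from y = 2x + 1.
data = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]

batch_w, batch_b = fit(data, batch_size=len(data))
sgd_w, sgd_b = fit(data, batch_size=1)
mini_w, mini_b = fit(data, batch_size=4)
```

All three settings recover a slope near 2 and an intercept near 1 on this easy problem. The practical trade-off is that batch updates are smooth but need a full pass over the data per step, single-example updates are cheap but noisy, and mini-batches sit in between.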

    Mining Twitter Data for Sentiment Analysis of Events

    Jun 1, 2019 (18:43)


    Twitter is a rich source of live information. Is it possible to run sentiment analysis on what the world is thinking as an event unfolds over time? Could we track Twitter data and see if it correlates with news that affects stock market movements? These are some of the questions that we answer in this podcast episode. There are 6 steps for mining Twitter data for sentiment analysis of events that we cover:
    1) Get Twitter API Credentials
    2) Set up API Credentials in Python
    3) Get Tweet Data via Streaming API using Tweepy
    4) Use out-of-the-box sentiment analysis libraries to get sentiment information
    5) Plot sentiment information to see trends for events
    6) Set this up on AWS or Google Cloud Platform
    This episode covers information about saving the tweets in a database, and using them to plot sentiment information.
    Corresponding Blog Post With Code: https://towardsdatascience.com/mining-live-twitter-data-for-sentiment-analysis-of-events-d69aa2d136a1?source=friends_link&sk=e06ae49f4ce6fb52157ea0eaee72f4c4
    Tweepy: https://github.com/tweepy/tweepy
    TextBlob: https://textblob.readthedocs.io/en/dev/
    Vader Sentiment: https://github.com/cjhutto/vaderSentiment
    Set up AWS instance: https://aws.amazon.com/ec2/getting-started/
    Set up GCP instance: https://cloud.google.com/compute/docs/quickstart-linux
    My Twitter Profile: https://twitter.com/sanket107
    Thanks for listening!
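Steps 4 and 5 can be sketched without any API access: score each tweet, then average the scores per time bucket so that a trend can be plotted. The tiny word lists below are a toy stand-in for a real sentiment library such as TextBlob or VADER, and the tweets are invented; only the bucketing idea is the point of this example.

```python
from collections import defaultdict
from datetime import datetime

# Toy lexicon standing in for a real sentiment library (assumption of this sketch).
POSITIVE = {"great", "love", "win", "good"}
NEGATIVE = {"bad", "hate", "loss", "crash"}


def score(text):
    """Return (positive words - negative words) / total words, a value in [-1, 1].

    Words are split on whitespace only, so attached punctuation is not stripped.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return hits / len(words)


def sentiment_by_minute(tweets):
    """Average the score of (timestamp, text) pairs within each minute, in time order."""
    buckets = defaultdict(list)
    for ts, text in tweets:
        buckets[ts.replace(second=0, microsecond=0)].append(score(text))
    return {minute: sum(s) / len(s) for minute, s in sorted(buckets.items())}


# Hypothetical tweets collected during an event.
tweets = [
    (datetime(2019, 6, 1, 12, 0, 5), "great win"),
    (datetime(2019, 6, 1, 12, 0, 40), "bad loss"),
    (datetime(2019, 6, 1, 12, 1, 10), "love it"),
]
trend = sentiment_by_minute(tweets)
```

The resulting dictionary maps each minute to an average score, which is the series one would hand to a plotting library to see how sentiment moves as the event unfolds.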

    Don't Be Shy To Pursue Your Interest

    May 19, 2019 (5:00)


    In this episode, we will talk about things like Maslow's Hierarchy of Needs, and focussing on higher level needs such as satisfaction and achieving full potential. In the area of tech, data science and software development, admitting your interest could involve "shyness" as the next shiny cool thing is pursued by everyone. But if your interest is in a niche, don't let others stop you from putting in an effort to become great at it. Thanks for listening, and please show your support to keep this podcast going!

    Review of Udacity Nanodegrees - are they worth it?

    May 3, 2019 (13:03)


    Udacity has become a popular platform for learning about data science, machine learning and programming in general. In this episode, we discuss the good, bad and ugly of the Udacity nanodegrees. I also cover my experiences with the Deep Learning and NLP Nanodegrees. We cover how Udacity has great production quality and nice intro courses, but due to their lack of depth and low community engagement, the high costs might not be justified (most of their nanodegrees are around $1,000 currently). If cost is not a concern, Udacity could be a good way to get into a new area. If you prefer a structured approach with timelines, the nanodegrees could be good too. But if you don't mind doing your own research, reading blogs and watching free videos online, then Udacity nanodegrees may not be worth the cost.
    Resources:
    1) Deep Learning Nanodegree: https://www.udacity.com/course/deep-learning-nanodegree--nd101
    2) NLP Nanodegree: https://www.udacity.com/course/natural-language-processing-nanodegree--nd892
    3) DeepLearning.AI by Andrew Ng: https://www.coursera.org/deeplearning-ai
    Please support the podcast by rating it in Apple Podcasts, and also leaving a review :) Thanks for listening!

    6 Steps to Transition to Data Science from non-CS background

    Apr 20, 2019 (15:58)


    In this episode we talk all about the various steps to transition to data science from non-computer-science backgrounds. One of the main difficulties people from non-CS backgrounds face is how overwhelming the transition to the data science field can be. I talk about my own journey, and share the 6 steps which can help you in your own data science career!
    00:00 to 02:10: Introduction
    02:11 to 06:00: My background of moving to data science from electrical engineering
    06:01 to 10:56: Steps 1 to 3 covering things like using external APIs, already processed datasets and performing full stack data science work
    10:57 to 11:55: Break sponsored by Anchor
    11:56 to end: Steps 4 to 6 covering things like math and statistics, machine learning pipelines and data structures & algorithms
    Some useful links:
    1) Andrew Ng Deep Learning Specialization Coursera https://www.coursera.org/specializations/deep-learning
    2) Intro to Statistics by Sebastian Thrun https://www.udacity.com/course/intro-to-statistics--st101
    3) Aurelien Geron's book on machine learning https://www.amazon.com/dp/1491962291/?tag=omnilence-20
    4) Pramp for mock algorithm sessions on video https://www.pramp.com/
    5) Leetcode for algorithm question datasets https://leetcode.com/
    Some great datasets to get started in machine learning:
    6) MNIST for hand written digits https://www.kaggle.com/c/digit-recognizer
    7) Iris dataset for flower classification http://archive.ics.uci.edu/ml/datasets/iris
    8) IMDB movie reviews https://ai.stanford.edu/~amaas/data/sentiment/
    Thanks for listening!

    The Top 5 Data Science Podcasts

    Apr 10, 2019 (8:20)


    Welcome! In this episode, we will cover some of the top data science podcasts that have helped me a lot in my own journey, and hopefully will be helpful to you as well. The top 5 podcasts are (linked to my favorite episodes):
    1) AI in Industry with Daniel Faggella
    2) This week in Machine Learning and AI (TWiML)
    3) DataFramed
    4) Data Skeptic
    5) Talk Python to Me
    Listen to the episode for the sixth bonus podcast! If you think I should mention another podcast here, let me know and I will add it in the show notes! Thanks for listening!

    What I learnt building a data science course

    Mar 30, 2019 (18:16)


    Have you ever thought about building a video course? Have you wanted to share your expertise with other people via a video course on platforms like Udemy? Have you wondered what the economics and revenue details of building a course are? This podcast episode is for you! In this episode, I talk about my experience in building my first data science video course, lessons learnt and how you can use these in your own video course.
    00:00 to 09:30: I talk about my experience with Packt Publishing in developing the video course.
    09:30 onwards: I talk about the 3 lessons learnt and how you can leverage these to maximize the potential of your video course.
    Links:
    1) My First Video course: https://www.packtpub.com/big-data-and-business-intelligence/hands-fundamentals-data-science-go-video
    2) Link to my previous podcast on recommendation engines
    3) Github link to the starter code of recommendation engines on movie reviews: https://github.com/sanketg10/the-data-life-podcast
    4) Link to my new course on "Overview of Query Understanding Techniques": https://sanketgupta.teachable.com/p/query-understanding-techniques
    5) Google Ads Keyword Planner: https://ads.google.com/home/tools/keyword-planner/
    #video-course #course #teachable #udemy #packt #data-science

    Overview of Netflix and Spotify like recommendation engines

    Play Episode Listen Later Mar 22, 2019 13:28


    In this episode, we cover the two main types of recommendation engines used at companies like Netflix and Spotify. 1) Content-based recommendation systems use the genres or tags of each product to find similar products to recommend to users. 2) Collaborative-filtering-based recommendation systems use user activity and user ratings on the website to recommend products. We go through the pros and cons of each, the challenges, how companies like Netflix and Spotify scale their recommendation engines for millions of users, and more! My code in the GitHub repo implements these concepts from scratch using the MovieLens dataset. Links: 1) YouTube talk by Xavier Amatriain from Netflix 2) YouTube talk on "Machine Learning & Big Data for Music Discovery presented by Spotify" 3) YouTube tutorial by Luis Serrano on how Netflix recommends movies #netflix #spotify #movielens #recommendations #recommendation-engines
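
    The two approaches described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code from the GitHub repo: the genre tags, users, and ratings are made-up toy data (not MovieLens), Jaccard overlap of tags stands in for content-based similarity, and item-to-item cosine similarity over user ratings stands in for collaborative filtering. All function names are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: genre tags per movie (input to the content-based approach).
GENRES = {
    "Toy Story": {"animation", "comedy", "family"},
    "Shrek": {"animation", "comedy", "fantasy"},
    "Alien": {"horror", "sci-fi"},
    "Blade Runner": {"sci-fi", "thriller"},
}

# Hypothetical toy data: user -> {movie: rating} (input to collaborative filtering).
RATINGS = {
    "ann": {"Toy Story": 5, "Shrek": 4, "Alien": 1},
    "bob": {"Toy Story": 4, "Shrek": 5},
    "cat": {"Alien": 5, "Blade Runner": 4},
    "dan": {"Alien": 4, "Blade Runner": 5, "Shrek": 1},
}


def jaccard(a, b):
    """Share of tags two movies have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based(movie, k=1):
    """Recommend the k movies whose genre tags overlap most with `movie`."""
    scores = {
        other: jaccard(GENRES[movie], tags)
        for other, tags in GENRES.items()
        if other != movie
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def item_cosine(m1, m2):
    """Cosine similarity of two movies' rating vectors (missing rating = 0)."""
    dot = sum(r.get(m1, 0) * r.get(m2, 0) for r in RATINGS.values())
    n1 = sqrt(sum(r.get(m1, 0) ** 2 for r in RATINGS.values()))
    n2 = sqrt(sum(r.get(m2, 0) ** 2 for r in RATINGS.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def collaborative(movie, k=1):
    """Recommend the k movies rated most similarly to `movie` by the same users."""
    movies = sorted({m for r in RATINGS.values() for m in r})
    scores = {m: item_cosine(movie, m) for m in movies if m != movie}
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(content_based("Toy Story"))   # uses only the tags
print(collaborative("Alien"))       # uses only the ratings
```

    Note that the content-based function never looks at ratings and the collaborative one never looks at tags, which is the core trade-off between the two: the first works for brand-new users, while the second needs rating history before it can say anything.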

    3 Mistakes to Avoid in a Machine Learning Project

    Play Episode Listen Later Mar 15, 2019 10:18


    You and your team might spend weeks or even months building a model. These are the 3 mistakes to avoid in your next machine learning project! Avoiding them can save you a lot of time and effort. These tips come from experience deploying ML models in production as well as from hearing from experts in the field. They are typically not covered in traditional machine learning texts and courses, and that's why I really wanted to cover them in this podcast episode. I'd really appreciate your support for this podcast. Please visit the podcast webpage and support it, so that I can continue to develop podcast episodes. Thanks for listening!

    Flask is a Great Tool for Full Stack Data Science

    Play Episode Listen Later Mar 5, 2019 10:43


    In this episode, we talk about what makes Flask such a great tool for both beginner and experienced data scientists to know. It was one of the first tools I learnt in my data science journey, and it has been so useful along the way. Flask is a micro-framework in Python that lets you build websites in a simple way. Knowing Flask will help you, as a data scientist, work better with front-end engineers. It is also a great way to build something like a recommender system, where users can input a product they have liked and a machine learning model in Python reads this and recommends another product. Resources: 1) Miguel Grinberg's Flask Mega-Tutorial 2) Vue.js Tutorials by Net Ninja. Thanks for listening to The Data Life Podcast!
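
    As a rough sketch of the recommender idea mentioned above, a Flask app can expose one route that takes a liked product and returns a suggestion as JSON. Everything specific here is an assumption for illustration: the `/recommend` route, the `liked` query parameter, and the `SIMILAR` lookup table, which stands in for a real trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a trained model: maps a liked product
# to one recommended product.
SIMILAR = {
    "Toy Story": "Shrek",
    "Alien": "Blade Runner",
}


def recommend(product):
    """Return a recommendation for `product`, or None if it is unknown."""
    return SIMILAR.get(product)


@app.route("/recommend")
def recommend_route():
    # Read the product the user liked from the query string,
    # e.g. /recommend?liked=Toy%20Story
    liked = request.args.get("liked", "")
    suggestion = recommend(liked)
    if suggestion is None:
        return jsonify(error=f"no recommendation for {liked!r}"), 404
    return jsonify(liked=liked, recommendation=suggestion)
```

    Saved as `app.py` (a hypothetical filename), this can be started with `flask --app app run` on Flask 2.2 or newer. A front end, for example one built with Vue.js, would then call the route and render the JSON response.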

    Hello, World!

    Play Episode Listen Later Feb 19, 2019 1:59


    To kick things off, I talk about the kinds of topics you can expect to hear in this podcast. Welcome to The Data Life!
