Machine Learning with Coffee


Machine Learning with Coffee is a podcast where we share ideas about Machine Learning and related areas such as artificial intelligence, business intelligence, business analytics, data mining and Big Data. The objective is to promote a healthy discussion on the current state of this fascinating world of Machine Learning. We will be sharing our experience and tricks, talking about the latest developments and interviewing experts, all in a very laid-back, friendly manner. So, what are you waiting for? Grab a coffee and join us.

Gustavo Lujan


    • Latest episode: Mar 15, 2021
    • New episodes: monthly
    • Average duration: 21m
    • Episodes: 20



    Latest episodes from Machine Learning with Coffee

    20 Perceptron: Machine Learning Begins

    Mar 15, 2021 (15:49)


    We introduce the concept of a perceptron as the basic component of a neural network. We talk about how important it is to understand the concept of backpropagation applied to a single neuron.
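As a rough illustration of backpropagation on a single neuron, the sketch below trains one sigmoid unit on the AND gate with plain gradient descent. It is not taken from the episode: the squared-error loss, the learning rate, the epoch count and the names `train_neuron` and `AND_GATE` are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=10000):
    """Train one sigmoid neuron on (inputs, target) pairs (hypothetical helper)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Chain rule for L = 0.5 * (p - y)^2: dL/dz = (p - y) * p * (1 - p)
            delta = (p - y) * p * (1.0 - p)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Assumed toy data: the logical AND of two binary inputs.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(AND_GATE)
predictions = [predict(weights, bias, x) for x, _ in AND_GATE]
```

The single `delta` term is the whole backward pass here; in a multi-layer network the same chain rule is applied layer by layer.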

    19 ICA: Independent Component Analysis

    Jan 24, 2021 (16:13)


    We discuss Independent Component Analysis as one of the most popular and robust techniques to decompose mixed signals. ICA has important applications in audio processing, video, EEG and many datasets that exhibit very high multicollinearity.

    18 PCA: Principal Component Analysis

    Jan 10, 2021 (20:49)


    We discuss Principal Component Analysis as one of the most popular techniques to reduce the dimensionality of a dataset. PCA helps us be more efficient in terms of the number of variables we feed to our machine learning models.
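To make the idea of dimensionality reduction concrete, here is a minimal sketch of PCA on two variables, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The data, the function name `pca_2d` and the choice to keep only one component are assumptions for illustration; real projects would normally rely on a numerical library.

```python
import math


def pca_2d(points):
    """Return (direction, scores, explained_ratio) for the first principal component."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(dx * dx for dx, _ in centered) / (n - 1)
    syy = sum(dy * dy for _, dy in centered) / (n - 1)
    sxy = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form
    mid = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    top, bottom = mid + radius, mid - radius

    # Eigenvector belonging to the largest eigenvalue
    if sxy != 0:
        vx, vy = sxy, top - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Project every centered point onto that direction: 2 variables become 1
    scores = [dx * direction[0] + dy * direction[1] for dx, dy in centered]
    return direction, scores, top / (top + bottom)


# Assumed toy data: y is roughly 2 * x, so one component carries almost everything.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
direction, scores, explained_ratio = pca_2d(data)
```

With strongly correlated inputs like these, the single score per row retains nearly all of the variance, which is the sense in which PCA lets us feed fewer variables to a model.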

    17 Anomaly Detection: Clustering

    Dec 22, 2020 (27:37)


    We present 3 clustering algorithms which help us detect anomalies: DBSCAN, Gaussian Mixture Models and K-means. These 3 algorithms are basic and very popular, and they have stood the test of time. All of them have many variations which try to overcome some of the disadvantages of the original implementation.
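One common way to use K-means for anomaly detection is to flag points that sit unusually far from their nearest centroid. The sketch below does this in one dimension. The data, the starting centroids, the cutoff rule (mean distance plus two standard deviations) and the helper names are assumptions for the example, not the method described in the episode.

```python
import statistics


def kmeans_1d(points, centroids, iterations=20):
    """Plain K-means on scalars, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def flag_anomalies(points, centroids, n_sigmas=2.0):
    """Flag points whose distance to the nearest centroid exceeds the cutoff."""
    distances = [min(abs(p - c) for c in centroids) for p in points]
    cutoff = statistics.mean(distances) + n_sigmas * statistics.pstdev(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]


# Assumed toy data: two tight groups plus one stray reading.
readings = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2, 9.0]
centroids = kmeans_1d(readings, centroids=[1.0, 5.0])
anomalies = flag_anomalies(readings, centroids)
```

Note that the stray reading pulls its cluster's centroid toward itself, which is one of the known weaknesses of the basic algorithm that its variations try to address.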

    16 Anomaly Detection: Control Charts

    Oct 19, 2020 (21:34)


    Anomaly detection is nothing new; its techniques have been around for decades. Control charts are graphs with solid mathematical and statistical foundations which monitor how a process changes over time. They implement control limits which automatically flag anomalies in a process in real time. Depending on the problem at hand, control charts might be a better alternative to more sophisticated machine learning approaches for anomaly detection.
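The control-limit idea can be sketched in a few lines. This simplified version sets the limits at three sample standard deviations around the mean of an in-control baseline; classical individuals charts usually estimate sigma from moving ranges instead. The data and the names `control_limits` and `flag_out_of_control` are assumptions for the example.

```python
import statistics


def control_limits(baseline, n_sigmas=3.0):
    """Return (lower, center, upper) limits estimated from in-control data."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - n_sigmas * sigma, center, center + n_sigmas * sigma


def flag_out_of_control(observations, lower, upper):
    """Return the indices of observations that fall outside the limits."""
    return [i for i, x in enumerate(observations) if x < lower or x > upper]


# Assumed baseline measurements from a period when the process was stable.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lower, center, upper = control_limits(baseline)

# New measurements are checked one by one against the fixed limits.
new_observations = [10.0, 10.3, 11.0, 9.5, 10.1]
flagged = flag_out_of_control(new_observations, lower, upper)
```

Because the limits are fixed once from the baseline, each new point can be checked as it arrives, with no model retraining.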

    15 Adaboost: Adaptive Boosting

    Sep 28, 2020 (18:01)


    Adaboost is one of the classic machine learning algorithms. Just like Random Forest and XGBoost, Adaboost belongs to the ensemble models; in other words, it aggregates the results of simpler classifiers to make robust predictions. The main difference is that Adaboost is an adaptive algorithm, which means that it learns from the misclassified instances of previous models, assigning more weight to those errors and focusing its attention on those instances in the next round.
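The reweighting loop described above can be sketched with one-dimensional decision stumps as the simple classifiers. The dataset is a small textbook-style example with labels in {-1, +1}; the stump form, the number of rounds and all function names are assumptions for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict `polarity` left of the threshold and `-polarity` right of it."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump (xs sorted)."""
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # vote of this stump
        ensemble.append((alpha, t, polarity))
        # Misclassified points (y * prediction = -1) are scaled up, the rest down
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

No single stump can separate this data, but the weighted vote of three stumps classifies every training point correctly, because each round concentrates on the points the previous rounds got wrong.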

    14 XGBoost: The Winner of Many Competitions

    Jul 26, 2020 (13:50)


    XGBoost is an open-source software library which has won several Machine Learning competitions on Kaggle. It is based on the principles of gradient boosting, which builds on the ideas of Leo Breiman, the creator of Random Forest. The theory behind gradient boosting was later formalized by Jerome H. Friedman. Gradient boosting combines weak learners, just as Random Forest does. XGBoost is an engineering implementation which includes a clever penalization of trees and a proportional shrinking of leaf nodes.

    13 Random Forest

    Jul 12, 2020 (23:07)


    Random Forest is one of the best off-the-shelf algorithms. In this episode we try to understand the intuition behind Random Forest and how it leverages the capabilities of Decision Trees by aggregating them using a very smart trick called “bagging”. Variable importance and out-of-bag error are two of the nice capabilities of Random Forest, which allow us to find the most important predictors and compute a good estimate of the generalization error, respectively.

    12 Decision Trees

    May 31, 2020 (18:49)


    We talk about Decision Trees as one of the most basic statistical learning algorithms that every Data Scientist should know. Decision Trees are one of the few machine learning models that are easy to interpret, which makes them a favorite when we want to understand the logic behind a certain decision. Decision Trees naturally handle all types of variables without the need to create dummy variables, do not need scaling or normalization, and are very robust against outliers.
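A small sketch of how a tree picks a split may help explain why scaling and outliers matter so little: the split search only looks at the ordering of the values. The Gini impurity criterion, the toy income data and the function names below are assumptions for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        threshold = (a + b) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Assumed toy data: unscaled incomes, including one extreme outlier.
incomes = [20000, 25000, 30000, 60000, 65000, 1000000]
approved = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(incomes, approved)
```

The resulting rule reads as a plain sentence ("income <= 45000 means no"), and the extreme income of 1,000,000 does not move the threshold at all.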

    11 Inferential Statistics

    May 10, 2020 (16:16)


    We talk about the importance of inferential statistics in Data Science. Inferential statistics is a set of techniques used to make generalizations about a population from a sample. One of the tools used in inferential statistics is hypothesis testing. In this episode we provide a couple of examples of when and why to use 1-sample t-tests and 2-sample t-tests. We also argue that the mean or average of a sample means nothing if we do not also consider the variation of the data.
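The point about means and variation can be seen directly in the t statistics. In the sketch below, two pairs of samples have exactly the same difference in means, yet very different t values because their spread differs. The data and function names are assumptions for the example; the two-sample version is Welch's form, and turning t into a p-value would additionally need the t distribution, which is left out here.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for testing whether the population mean equals mu0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / standard_error


def two_sample_t(a, b):
    """Welch's t statistic, which does not assume equal variances."""
    standard_error = math.sqrt(statistics.variance(a) / len(a)
                               + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Assumed toy data: both pairs differ by 0.5 on average.
tight_a = [5.0, 5.1, 4.9, 5.0, 5.0]
tight_b = [5.5, 5.6, 5.4, 5.5, 5.5]
noisy_a = [3.0, 7.0, 5.0, 2.0, 8.0]
noisy_b = [3.5, 7.5, 5.5, 2.5, 8.5]

t_tight = two_sample_t(tight_a, tight_b)   # large in magnitude
t_noisy = two_sample_t(noisy_a, noisy_b)   # close to zero
t_single = one_sample_t(tight_a, 4.5)
```

The same 0.5 gap is strong evidence in the tight samples and almost no evidence in the noisy ones.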

    10 Logistic Regression

    Apr 26, 2020 (22:45)


    Logistic regression is a very robust machine learning technique which can be used in three modes: binary, multinomial and ordinal. We talk about assumptions and some misconceptions. For example, people believe that because logistic regression fits only a linear separator, it cannot fit a complex boundary; in fact, a linear separator in an expanded feature space can correspond to a complex boundary in the original space. Also, people often use either linear regression or multinomial logistic regression when they should be using ordinal logistic regression.
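A short sketch can show how a linear separator in an expanded space bends in the original one. Here the classes cannot be split by a single cut on x, but adding x squared as a second feature makes them linearly separable. The data, the batch gradient descent settings and the name `fit_logistic` are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, iterations=5000):
    """Binary logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(iterations):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Assumed toy data: class 1 lies on both sides of class 0.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1, 1, 0, 0, 0, 1, 1]

# Expanded feature space: (x, x^2). The model stays linear in these features.
expanded = [(x, x * x) for x in xs]
w, b = fit_logistic(expanded, labels)
predictions = [1 if sigmoid(w[0] * x + w[1] * x * x + b) >= 0.5 else 0 for x in xs]
```

In the original space the decision rule becomes two cut points, one on each side of zero, even though the model only ever fits a straight line in (x, x^2).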

    09 Regularization to Deal with Overfitting

    Apr 19, 2020 (15:34)


    In this episode we talk about regularization, an effective technique to deal with overfitting by reducing the variance of the model. Two techniques are introduced: ridge regression and lasso. The latter is effectively a feature selection algorithm.
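The feature-selection behavior of lasso is easiest to see in the special case of an orthonormal design matrix, where both penalties have closed-form solutions in terms of the ordinary least squares coefficients. The sketch below assumes that special case, with objectives 0.5 * ||y - Xb||^2 + 0.5 * lam * ||b||^2 for ridge and 0.5 * ||y - Xb||^2 + lam * ||b||_1 for lasso; the coefficient values and function names are made up for the example.

```python
def ridge_shrink(coefficient, lam):
    """Ridge under an orthonormal design: proportional shrinkage, never exactly zero."""
    return coefficient / (1.0 + lam)


def soft_threshold(coefficient, lam):
    """Lasso under an orthonormal design: shift toward zero and clip at zero."""
    if coefficient > lam:
        return coefficient - lam
    if coefficient < -lam:
        return coefficient + lam
    return 0.0


# Assumed ordinary least squares coefficients for four features.
ols = [3.0, -0.4, 0.1, -2.0]
lam = 0.5

ridge = [ridge_shrink(c, lam) for c in ols]
lasso = [soft_threshold(c, lam) for c in ols]
selected = [i for i, c in enumerate(lasso) if c != 0.0]
```

Ridge keeps all four features with smaller coefficients, while lasso drops the two weak ones entirely, which is why it doubles as a feature selector. With correlated features there is no such closed form and an iterative solver is needed.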

    08 Linear Regression: The Return of the Queen

    Apr 4, 2020 (21:34)


    In this episode I will try to convince you that Linear Regression is one of the most powerful Machine Learning algorithms. We will talk about common misconceptions, especially the belief that Linear Regression is not able to model non-linear relationships. We also discuss how the myth of normality encourages many people to completely discard Linear Regression on non-normal data, when in reality the normality assumption has nothing to do with the distribution of the data itself. Finally, I provide advice on how to check and, most importantly, how to fix any violated assumption in Linear Regression.
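The claim about non-linear relationships can be illustrated with a transformed predictor: the model is linear in its coefficients, not necessarily in the raw variable. In this sketch the response depends on x squared, so a line fitted on raw x finds nothing while the same least squares fit on x squared recovers the relationship. The data and the name `fit_line` are assumptions for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed toy data: a noise-free quadratic relationship y = 2 + 3 * x^2.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2.0 + 3.0 * x * x for x in xs]

# Fitting on raw x: the symmetric curve produces a flat line.
raw_intercept, raw_slope = fit_line(xs, ys)

# Fitting on the transformed predictor x^2: same algorithm, curved fit.
squared = [x * x for x in xs]
sq_intercept, sq_slope = fit_line(squared, ys)
```

The second fit recovers an intercept of 2 and a slope of 3 on x squared, using exactly the same linear regression machinery.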

    07 COVID-19

    Mar 28, 2020 (10:49)


    We talk about how Data Science and Machine Learning can help us better understand COVID-19 challenges. In this episode, we go back to the Kaggle website, where different institutions, including the White House, have come together to try to analyze more than 45,000 published articles. The task is to answer 10 different questions which will help scientists around the world better understand this new virus and future pandemics.

    06 How to Become a Data Scientist

    Mar 15, 2020 (29:53)


    We talk about what it takes to become a Data Scientist. We also discuss 4 prerequisites before preparing yourself to become a Data Scientist. Finally, we provide recommendations on 3 online courses that, if mastered, will put you above 90% of all Data Scientists out there.

    05 Machine Learning: Use Cases Part 2

    Mar 8, 2020 (24:17)


    We continue exploring publicly available datasets to better understand Machine Learning use cases and their applications. This time we explore Kaggle, which is the world’s largest data science community. Unlike the UCI ML repository, which is more of an archive geared towards an academic community, Kaggle has datasets that capture the latest trends in Machine Learning and hosts competitions sponsored by big companies where data scientists can participate and win big prizes.

    04 Machine Learning: Use Cases

    Feb 23, 2020 (25:08)


    We explore the different areas of application of Machine Learning and talk about use cases which range from biology to finance and health care. We make use of the UCI Machine Learning Repository to learn about the most famous datasets in the data science community, discussing the problem they are trying to solve, the response or target they are trying to predict, and the predictors we have available to achieve this goal.

    03 What is Machine Learning?

    Feb 12, 2020 (40:16)


    We define Machine Learning and related areas such as artificial intelligence, business analytics, business intelligence and Big Data. These are not academic definitions extracted from books; they are real-world concepts as I see them. We discuss the similarities, differences and overlap between these sometimes confusing terms, which people tend to misuse.

    02 My Personal Journey: How I Became a Data Scientist

    Feb 2, 2020 (29:03)


    In this episode I talk about my personal journey and how I became a Data Scientist. I start by talking about how I decided to go to college, what major to choose, and how I chose my master’s degree. I talk about my time pursuing a PhD in Engineering and the most useful classes I took related to machine learning and data science. Finally, I briefly talk about my job experience as a Data Scientist.

    01 Introduction and Expectations

    Jan 26, 2020 (14:34)


    In this, our first episode, we will define the objective of the show as well as expectations. The show is designed for anyone who is interested in this fascinating world of Machine Learning.
