Follow the RAPIDSFire podcast for a fresh take on data science. Hear from revolutionaries transforming data science on GPUs for scientific research, higher education, and the broader enterprise. Talks with open-source software maintainers, Kaggle grandmasters, practitioners, CUDA experts and many others keep you up-to-date on the most exciting developments. Let's discuss how to make your work better and faster. Hosted by Data Scientist Paul Mahler. Join the conversation and send feedback on Twitter @rapidsai
I talk with Sam Moss and Cameron Weinert about using data science to predict college football. We talk about feature engineering, following your passion, the role of analytics in sports and sports fandom, how to be an intelligent consumer of data science as a non-data scientist, and a lot more. Everyday is Saturday on Spotify. Everyday is Saturday on Apple. Sam on Twitter. Raw data on college football here at collegefootballdata.com
We talk with Marlene Mhangami, a director and chair of the Python Software Foundation, co-founder of coding education non-profit ZimboPy, someone who made a huge career pivot from pre-med to software engineering, and one of the folks who helped bring RAPIDS to Windows. We talk about changing careers, creativity and confidence in tech, and of course RAPIDS on Windows. Marlene's home page https://marlenemhangami.com/ Marlene's blog post about RAPIDS on Windows https://medium.com/rapids-ai/running-rapids-on-microsoft-windows-10-using-wsl-2-the-windows-subsystem-for-linux-c5cbb2c56e04 Tutorial on using RAPIDS on Windows via WSL2 https://www.youtube.com/watch?v=jnEd3IDsF-I ZimboPy on github https://github.com/ZimboPy
This week we're joined by Even Oldridge, Senior Manager, RecSys Platform Team at NVIDIA. We talk about Tabular Deep Learning, NVMerlin, how bookstores aren't like recommender systems, his team's recent repeat win in the ACM Recsys Challenge, the future of recommender systems and more. NVIDIA Merlin on the NVIDIA Developer Blog https://developer.nvidia.com/blog/tag/merlin/ NVIDIA Merlin blogs on Medium https://medium.com/nvidia-merlin Merlin on Github https://github.com/NVIDIA-Merlin/Merlin NVTabular Blogs https://developer.nvidia.com/blog/tag/nvtabular/ NVTabular on Github https://github.com/NVIDIA/NVTabular REES46 data set mentioned toward the end of the podcast https://rees46.com/en/datasets
We talk with 4-time Kaggle winner Christof Henkel about how he got started in Kaggle, important skills for Kaggle success, his most memorable contests, his most recent victory, how an alien radio signal is like a bird call, climbing at the 2021 Olympics, and much more! Christof's Kaggle Profile: https://www.kaggle.com/christofhenkel Christof's Twitter: https://twitter.com/kagglingdieter
We sit down and talk with 4x Kaggle Grandmaster Chris Deotte about his career, how he got started doing Kaggle, how you can get started doing Kaggle, feature engineering, the perks of AGI, and a lot more! Chris on Kaggle: https://www.kaggle.com/cdeotte
I sit down and talk with the new Director of Engineering for RAPIDS at NVIDIA, John Zedlewski, about what economics can learn from machine learning practitioners, engineering challenges that ended up being harder than first thought, how increased automation will change the day-to-day work of data scientists, and much more.
I talk with Zahra Ronaghi, Engineering Manager of AI Infrastructure at NVIDIA, and Christoph Keller, Atmospheric Chemist with the NASA Goddard Space Flight Center, about their collaboration to bring GPU-accelerated data science to the study of air pollution. You can find their first blog on the collaboration here and their work around the atmospheric impact of COVID here. For more about GPU-accelerated SHAP, this blog is a good place to start.
On this week's episode, we have NVIDIA's Head of Developer Relations, Data Science, Jim Scott. We talk about the data science of fine whiskey, data science for fitness, the “secret” of Kaggle Grand Masters (spoiler: it's giving back to the community), learning and community resources as the future of data science, classic “paradoxes” in basic probability, and some great resources for being a better data scientist. Kaggle Grandmaster Youtube Interviews - Here's the most recent sit down Jim did with the Kaggle Grand Masters of NVIDIA. https://www.youtube.com/watch?v=bHuww-l_Sq0 Data Science of the Day - we talk about this toward the end of the episode, and this is a GREAT resource to keep up-to-date with everything going on in data science. https://forums.developer.nvidia.com/c/ai-data-science/data-science-of-the-day/323/none Jim on Twitter: https://twitter.com/kingmesal Jim and I reminisce about the Birthday Paradox - here's a good piece on it from Scientific American. Jim and I were way off on remembering how likely birthday sharing is in a small handful of people. https://www.scientificamerican.com/article/bring-science-home-probability-birthday-paradox/ Don't let us get your goat talking about the Monty Hall Problem. This explainer shows how an example with a larger number of doors can help give more intuition about what's actually happening by changing your guess. https://www.statisticshowto.com/probability-and-statistics/monty-hall-problem/ Cantor's Diagonalization Theorem mentioned in passing. Here's a link to the wikipedia article - if you aren't familiar with it, you should check it out. https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument
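For anyone who wants to check the birthday numbers Jim and I couldn't quite remember, here is a minimal Python sketch. It is not from the episode. It assumes 365 equally likely birthdays and ignores leap years, and the function name `birthday_collision_probability` is just an illustrative choice.

```python
def birthday_collision_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    # Compute the chance that all n birthdays are distinct,
    # one person at a time, then take the complement.
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Print the probability for a few group sizes.
    for group_size in (5, 10, 23, 50):
        p = birthday_collision_probability(group_size)
        print(f"{group_size:>2} people: {p:.3f}")
```

With 23 people the probability comes out just above one half (roughly 0.507), which is the surprising part of the "paradox".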
This week I talk with John Murray. John has been a data scientist, a CTO, and a professor and has unique insight on the history of data science and where it is going. It's a great episode and I hope you'll enjoy! Links described in the episode: MurrayData on Github Fusion Data Science John's GTC 2020 talk on flood relief
Our guest this week is the one and only Wes McKinney, creator of Pandas and Apache Arrow. We have a great conversation about his career journey, funding and maintaining open-source software projects, his new company Ursa Computing, how Pandas grew from a passion project to the lingua franca of Python data science, and a lot more.
We sit down with Allan Enemark, data viz lead for RAPIDS, and Bryan Van de Ven, Senior Engineer and co-creator of Bokeh, to talk about what GPUs are doing for the visualization of data sets across many different tools, and what the future holds for showing your audience what the data is saying. Links to things discussed in the episode: Datashader Plotly HoloViz Bokeh Vis.gl JupyterCon Tutorial - check it out! cuxfilter (pronounced "cu - crossfilter") - code on github Twitter accounts to follow to keep your finger on the pulse of the latest in data viz: https://twitter.com/DataVizSociety https://twitter.com/jonmmease https://twitter.com/AlbertoCairo https://twitter.com/visualisingdata https://twitter.com/Elijah_Meeks https://twitter.com/viegasf https://twitter.com/giorgialupi https://twitter.com/flowingdata https://twitter.com/infobeautiful
Join me as I sit down with Felipe Aramburu and William Malpica to talk about BlazingSQL's GPU-accelerated SQL queries, start-up life, the tech talent in Peru, things we used to hate about SQL, and a lot more. Give BlazingSQL a try at app.blazingsql.com and, once you're convinced, go to beta.blazingsql.com for their beta of the paid version, which will give you access to very large GPU clusters. Thanks!
In the first bonus episode of RAPIDSFire, I sit down with Rachel Allen, who holds a PhD in Neuroscience. We talk about how neural net models relate to and differ from real brains, ethical issues around conscious machines and their training, and what steps might be taken to get closer to true thinking machines. I had a lot of fun recording this, and I hope you enjoy it! Reconstructing visual experiences from brain activity evoked by natural movies https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326357/ Dead Salmon Study http://prefrontal.org/files/posters/Bennett-Salmon-2009.jpg Existential Comics - Turing Tests and Other Things of That Nature https://existentialcomics.com/comic/357 Spiteful Octopi https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3266 Corrupted Microsoft Language Model https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation
Join me as I talk with Rachel Allen and Bartley Richardson about applying data science to cybersecurity with GPUs and RAPIDS. We'll also talk in-depth about an amazing extension of the BERT transformer model: CyBERT, the pre-built GPU pipelines in CLX, a super fast GPU tokenizer, and what to expect from them next. The link to the repos discussed in the episode is here: https://github.com/rapidsai/clx
Welcome to the first episode of RAPIDSFire! My rotating cohosts this week are Josh Patterson, Senior Director of Engineering at NVIDIA, and Keith Kraus, Systems Software Senior Manager at NVIDIA. These two gentlemen were driving forces behind RAPIDS from the very start, and this is an illuminating talk about GPU data science, open source software, and the past, present, and future of RAPIDS.