Podcasts about pydata

  • 33 PODCASTS
  • 53 EPISODES
  • 51m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • May 9, 2025 LATEST

POPULARITY (chart covering 2017 to 2024)


Best podcasts about pydata

Latest podcast episodes about pydata

DataTalks.Club
Build a Strong Career in Data - Lavanya Gupta

May 9, 2025 (51:59)


In this podcast episode, we talked with Lavanya Gupta about building a strong career in data.

About the speaker: Lavanya is a Carnegie Mellon University (CMU) alumna of the Language Technologies Institute (LTI). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in their specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published in EMNLP 2024. In addition to having a strong industrial research background of 5+ years, she is also an enthusiastic technical speaker. She has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer at top-tier NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Additionally, through her collaborations with prestigious organizations such as AnitaB.org and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.

In this episode, we talk about Lavanya Gupta's journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.

Talk Python To Me - Python conversations for passionate developers
#478: When and how to start coding with kids

Sep 25, 2024 (54:25)


Do you have kids? Maybe nieces and nephews? Or maybe you work in a school environment? Maybe it's just friends who know you're a programmer and ask you how to introduce programming concepts to kids. Anna-Lena Popkes is back on the show to share her research on when and how to teach kids programming. We spend the second half of the episode talking about concrete apps and toys you might consider for each age group. Plus, some of these things are fun for adults too. ;)

Episode sponsors
  • WorkOS
  • Talk Python Courses

Links from the show
  • Anna-Lena: alpopkes.com
  • Magical universe repo: github.com
  • Machine learning basics repo: github.com
  • PyData recording "when and how to start coding with kids": youtube.com

Robots and devices
  • Bee Bot: terrapinlogo.com
  • Cubelets: modrobotics.com
  • BBC Microbit: microbit.org
  • RaspberryPi: raspberrypi.com
  • Adafruit Qualia ESP32 for CircuitPython: adafruit.com
  • Zumi: robolink.com

Board games
  • Think Fun Robot Turtles Board Game: amazon.com

Visual programming
  • Scratch Jr.: scratchjr.org
  • Scratch: scratch.org
  • Blockly: google.com
  • Microbit's Make Code: microbit.org
  • Code Club: codeclubworld.org

Textual programming
  • Code Combat: codecombat.com
  • Hedy: hedycode.com
  • Anvil: anvil.works

Coding classes / summer camps (US)
  • Portland Community College Summer Teen Program: pcc.edu

Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

Stay in touch with us
  • Subscribe to us on YouTube: youtube.com
  • Follow Talk Python on Mastodon: talkpython
  • Follow Michael on Mastodon: mkennedy

R Weekly Highlights
Issue 2023-W50 Highlights

Dec 13, 2023 (43:40)


A data-driven investigation into the association between early birthdays and hockey players, one of the most-requested features is coming to the next version of Quarto, and just why in the world does the View() function start with a capital V?

Episode Links
  • This week's curator: Jon Calder (@jonmcalder (https://twitter.com/jonmcalder)) (Twitter)
  • Are Birth Dates Still Destiny for Canadian NHL Players? (https://jlaw.netlify.app/2023/12/04/are-birth-dates-still-destiny-for-canadian-nhl-players/)
  • Quarto Dashboards (https://www.youtube.com/watch?v=_VGJIPRGTy4)
  • Why is View() capitalized, anyway? (https://mm218.dev/posts/2023-12-07-View/index.html)
  • Entire issue available at rweekly.org/2023-W50 (https://rweekly.org/2023-W50.html)

Supplement Resources
  • JJ Allaire's Quarto dashboards keynote at PyData 2023: https://www.youtube.com/watch?v=3HCAScFqr10
  • MyNorfolk Quarto dashboard: https://grrrck.quarto.pub/mynorfolk-dash

Supporting the show
  • Use the contact page at https://rweekly.fireside.fm/contact to send us your feedback
  • R-Weekly Highlights on the Podcastindex.org (https://podcastindex.org/podcast/1062040): you can send a boost into the show directly in the Podcast Index. First, top up with Alby (https://getalby.com/), and then head over to the R-Weekly Highlights podcast entry on the index.
  • A new way to think about value: https://value4value.info

Get in touch with us on social media
  • Eric Nantz: @theRcast (https://twitter.com/theRcast) (Twitter) and @rpodcast@podcastindex.social (https://podcastindex.social/@rpodcast) (Mastodon)
  • Mike Thomas: @mike_ketchbrook (https://twitter.com/mike_ketchbrook) (Twitter) and @mike_thomas@fosstodon.org (https://fosstodon.org/@mike_thomas) (Mastodon)

DataTalks.Club
Data-Centric AI - Marysia Winkels

Jan 6, 2023 (53:07)


We talked about:
  • Marysia's background
  • What data-centric AI is
  • Data-centric Kaggle competitions
  • The mindset shift to data-centric AI
  • Data-centric does not mean you should not iterate on models
  • How to implement the data-centric approach
  • Focusing on the data vs focusing on the model
  • Resources to help implement the data-centric approach
  • Data-centric AI vs standard data cleaning
  • Making sure your data is representative
  • Knowing when your data is good enough
  • The importance of user feedback
  • "Shadow Mode" deployment
  • What to do if you have a lot of bad data or incomplete data
  • Marysia's role at PyData
  • How Marysia joined PyData
  • The difference between PyData and PyCon
  • Finding Marysia online

Links:
  • Embetter & Bulk Demo: https://www.youtube.com/watch?v=L---nvDw9KU
  • Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
  • Join DataTalks.Club: https://datatalks.club/slack.html
  • Our events: https://datatalks.club/events.html

The MLOps Podcast

In this episode, I speak with Dean Langsam, Data Scientist at SentinelOne and one of the organizers of PyData in Israel. We chat about imposter syndrome, the best field in machine learning, why XGBoost is the best model, and the fact that most organizations have too much data. It was fascinating for me, so I hope you enjoy it too.

DataTalks.Club
Machine Learning in Marketing - Juan Orduz

May 27, 2022 (52:52)


We talked about:
  • Juan's background
  • Typical problems in marketing that are solved with ML
  • Attribution model
  • Media Mix Model: detecting uplift and channel saturation
  • Changes to privacy regulations and their effect on user tracking
  • User retention and churn prevention
  • A/B testing to detect uplift
  • Statistical approach vs machine learning (setting a benchmark)
  • Does retraining MMM models often improve efficiency?
  • Attribution model baselines
  • Choosing a decay rate for channels (Bayesian linear regression)
  • Learning resource suggestions
  • Bayesian approach vs Frequentist approach
  • Suggestions for creating a marketing department
  • Most challenging problems in marketing
  • The importance of marketing domain knowledge for data scientists
  • Juan's blog and other learning resources
  • Finding Juan online

Links:
  • Juan's PyData talk on uplift modeling: https://youtube.com/watch?v=VWjsi-5yc3w
  • Juan's website: https://juanitorduz.github.io
  • Introduction to Algorithmic Marketing book: https://algorithmic-marketing.online
  • Preventing churn like a bandit: https://www.youtube.com/watch?v=n1uqeBNUlRM
  • MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
  • Join DataTalks.Club: https://datatalks.club/slack.html
  • Our events: https://datatalks.club/events.html

Vanishing Gradients
Episode 7: The Evolution of Python for Data Science

May 1, 2022 (62:31)


Hugo speaks with Peter Wang, CEO of Anaconda, about how Python became so big in data science, machine learning, and AI. They jump into many of the technical and sociological beginnings of Python being used for data science, a history of PyData, the conda distribution, and NumFOCUS. They also talk about the emergence of online collaborative environments, particularly with respect to open source, and attempt to figure out the moving parts of PyData and why it has had the impact it has, including the fact that many core developers were not computer scientists or software engineers, but rather scientists and researchers building tools on an as-needed basis. They also discuss the challenges in getting adoption for Python and the things that the PyData stack solves, those that it doesn't, and what progress is being made there.

People who have listened to Hugo's podcasts for some time may have recognized that he's interested in the sociology of the data science space, and he considered speaking with Peter a fascinating opportunity to delve into how the Pythonic data science space evolved, particularly with respect to tooling, not only because Peter had a front-row seat for much of it, but also because he was one of several key actors at various points. On top of this, Hugo wanted to allow Peter's inner sociologist room to breathe and evolve in this conversation.

What happens then is slightly experimental: Peter is a deep, broad, and occasionally hallucinatory thinker, and Hugo wanted to explore new spaces with him, so we hope you enjoy the experiments they play as they begin to discuss open-source software in the broader context of finite and infinite games and how OSS is a paradigm of humanity's ability to create generative, nourishing, and anti-rivalrous systems where, by anti-rivalrous, we mean things that become more valuable for everyone the more people use them! But we need to be mindful of finite-game dynamics (for example, those driven by corporate incentives) co-opting and parasitizing the generative systems that we build.

These are all considerations they delve far deeper into in Part 2 of this interview, which will be the next episode of VG, where we also dive into the relationship between OSS, tools, and venture capital, among many other things.

Links
  • Peter on Twitter (https://twitter.com/pwang)
  • Anaconda Nucleus (https://anaconda.cloud/)
  • Calling out SciPy on diversity (even though it hurts) (https://ilovesymposia.com/2015/04/03/calling-out-scipy-on-diversity/) by Juan Nunez-Iglesias
  • Here Comes Everybody: The Power of Organizing Without Organizations (https://en.wikipedia.org/wiki/Here_Comes_Everybody_(book)) by Clay Shirky
  • Finite and Infinite Games (https://en.wikipedia.org/wiki/Finite_and_Infinite_Games) by James Carse
  • Governing the Commons: The Evolution of Institutions for Collective Action (https://www.cambridge.org/core/books/governing-the-commons/7AB7AE11BADA84409C34815CC288CD79) by Elinor Ostrom
  • Elinor Ostrom's 8 Principles for Managing A Commons (https://www.onthecommons.org/magazine/elinor-ostroms-8-principles-managing-commmons)

Python Bytes
#265 Get asizeof pympler and muppy

Jan 5, 2022 (47:46)


Watch the live stream: Watch on YouTube

About the show
Sponsored by us: Check out the courses over at Talk Python. And Brian's book too!
Special guest: Matt Kramer (@__matt_kramer__)

Michael #1: Survey results
  • Question 1:
  • Question 2:
  • In terms of too long, the "extras" section has started at these times in the last 4 episodes: 39m, 32m, 35m, and 33m, about 35m on average

Brian #2: Modern attrs API
  • The attrs overview now focuses on using @define
  • History of attrs article: import attrs, by Hynek
      • The predecessor was called characteristic.
      • A discussion between Glyph and Hynek in 2015 about where to take the idea.
      • attrs popularity takes off in 2016 after a post by Glyph: The One Python Library Everyone Needs
      • In 2017 people started wanting something like attrs in the standard library. Thus PEP 557 and dataclasses. Hynek, Eric Smith, and Guido discuss it at PyCon US 2017.
      • dataclasses, with a subset of attrs functionality, was introduced in Python 3.7.
      • Types take off. attrs starts supporting type hints as well, even before Python 3.7.
      • Post 3.7, some people start wondering if they still need attrs, since they have dataclasses.
      • @define, field() and other API improvements came with attrs 20.1.0 in 2020.
      • attrs 21.3.0 was released in December, with what Hynek calls "Modern attrs".
  • OG attrs:

        import attr

        @attr.s
        class Point:
            x = attr.ib()
            y = attr.ib()

  • Modern attrs:

        from attr import define

        @define
        class Point:
            x: int
            y: int

  • Many reasons to use attrs are listed in Why not…, which is an excellent read.
  • Why not dataclasses?
      • less powerful than attrs, intentionally
      • attrs has validators, converters, equality customization, …
      • attrs doesn't force type annotations if you don't like them
      • slots are on by default in attrs; dataclasses only support slots from Python 3.10, and they are off by default
      • attrs can and will move faster
  • See also comparisons with pydantic, named tuples, tuples, dicts, hand-written classes

Matt #3: Crafting Interpreters
  • Wanting to learn more about how Python works "under the hood", I first read Anthony Shaw's CPython internals book
      • A fantastic, detailed overview of how CPython is implemented
  • Since I don't have a formal CS background, I found myself wanting to learn a bit more about the fundamentals
      • Parsing, tokenization, bytecode, data structures, etc.
  • Crafting Interpreters is an incredible book by Bob Nystrom (on the Dart team at Google)
  • Although not Python, you walk through the implementation of a dynamic, interpreted language from scratch
  • Implement the same language (called lox) in two interpreters
      • First a direct evaluation of the Abstract Syntax Tree, written in Java
      • Second is a bytecode interpreter, written from the ground up in C, including a compiler
  • Every line of code is in the book, it is incredibly well-written and beautifully rendered
  • I highly recommend it to anyone wanting to learn more about language design & implementation

Michael #4: Yamale - A schema and validator for YAML
  • via Andrew Simon
  • A basic schema:

        name: str()
        age: int(max=200)
        height: num()
        awesome: bool()

  • And some YAML that validates:

        name: Bill
        age: 26
        height: 6.2
        awesome: True

  • Take a look at the Examples section for more complex schema ideas.
  • ⚠️ Ensure that your schema definitions come from internal or trusted sources. Yamale does not protect against intentionally malicious schemas.

Brian #5: pympler
  • Inspired by something Bob Belderbos wrote about sizes of objects, I think.
  • "Pympler is a development tool to measure, monitor and analyze the memory behavior of Python objects in a running Python application. By pympling a Python application, detailed insight in the size and the lifetime of Python objects can be obtained. Undesirable or unexpected runtime behavior like memory bloat and other "pymples" can easily be identified."
  • 3 separate modules for profiling
      • asizeof module provides basic size information for one or several Python objects
      • muppy is used for on-line monitoring of a Python application
      • Class Tracker provides off-line analysis of the lifetime of selected Python objects.
  • asizeof is what I looked at recently
  • In contrast to sys.getsizeof, asizeof sizes objects recursively.
  • You can use one of the asizeof functions to get the size of these objects and all associated referents:

        >>> from pympler import asizeof
        >>> obj = [1, 2, (3, 4), 'text']
        >>> asizeof.asizeof(obj)
        176
        >>> print(asizeof.asized(obj, detail=1).format())
        [1, 2, (3, 4), 'text'] size=176 flat=48
            (3, 4) size=64 flat=32
            'text' size=32 flat=32
            1 size=16 flat=16
            2 size=16 flat=16

  • "Function flatsize returns the flat size of a Python object in bytes defined as the basic size plus the item size times the length of the given object."

Matt #6: hvPlot Interactive
  • hvPlot is a high-level plotting API that is part of the PyData ecosystem, built on HoloViews
  • My colleague Philipp Rudiger recently gave a talk at PyData Global on a new .interactive feature
  • Here's an announcement in the HoloViz forum
  • Allows integration of widgets directly into a pandas analysis pipeline (method chain), so you can add interactivity to your notebook for exploratory data analysis, or serve it as a Panel app
  • Gist & video by Marc Skov Madsen

Extras
Michael:
  • Typora app, recommended!
  • Congrats Will
  • Got a chance to solve a race condition with Tenacity
  • New project management at GitHub
Matt:
  • Check out the new Anaconda Nucleus Community forums!
  • We're hiring, and remote-first. Check out anaconda.com/careers
  • Pre-compiled packages now available for Pyston
  • We have an upcoming webinar from Martin Durant: When Your Big Problem is I/O Bound
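The pympler item in the notes above contrasts sys.getsizeof (flat size only) with asizeof (recursive size). As a rough illustration of that difference, here is a standard-library-only sketch. The helper name deep_sizeof is hypothetical, this is not pympler's algorithm, and its byte counts will generally not match asizeof's output; it only follows a few built-in container types and counts each object once.

```python
import sys


def deep_sizeof(obj, _seen=None):
    """Rough recursive size in bytes (hypothetical helper, not pympler's method)."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        # Each object is counted once, even if referenced several times.
        return 0
    seen.add(id(obj))

    # Flat size: what sys.getsizeof reports for the object itself.
    total = sys.getsizeof(obj)

    # Add the sizes of referents for a few common container types only.
    if isinstance(obj, dict):
        for key, value in obj.items():
            total += deep_sizeof(key, seen) + deep_sizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += deep_sizeof(item, seen)
    return total


obj = [1, 2, (3, 4), "text"]
flat = sys.getsizeof(obj)   # the list object alone
deep = deep_sizeof(obj)     # the list plus everything it refers to
pair = [obj, obj]           # the same object referenced twice
print(f"flat={flat} bytes, deep={deep} bytes")
```

The exact numbers depend on the Python version and platform, so none are shown here. For real measurements, pympler's asizeof handles many more object types (instances, slots, and so on) than this sketch does.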

The Python Podcast.__init__
Doing Dask Powered Data Science In The Saturn Cloud

Sep 10, 2021 (38:00)


A perennial problem of doing data science is that it works great on your laptop, until it doesn't. Another problem is being able to recreate your environment to collaborate on a problem with colleagues. Saturn Cloud aims to help with both of those problems by providing an easy-to-use platform for creating reproducible environments that you can use to build data science workflows and scale them easily with a managed Dask service. In this episode Julia Signell, head of open source at Saturn Cloud, explains how she is working with the product team and PyData community to reduce the points of friction that data scientists encounter as they are getting their work done.

Let's Data
#008 - José Ferraz Neto - From chemistry to data science and the role of communities like PyData

Jun 24, 2021 (58:13)


In this episode we talk with José Ferraz Neto, data scientist at the Agência Nacional do Petróleo (ANP, Brazil's National Petroleum Agency) and organizer of PyData Brasília. He also holds a degree in Chemistry from the Universidade Federal de Santa Catarina and a master's in Soil Science from the Universidade do Estado de Santa Catarina. We talk about how his move from chemist to data scientist came about: the reasons, the challenges, the process, and the path taken. We also discuss machine learning projects in the public sector and the role of communities in the lives and development of data scientists, with special emphasis on PyData Brasília, the group he coordinates. See our post for links and references: https://medium.com/lets-data/

Sustain
Episode 79: Leah Silen on how NumFOCUS helps make scientific code more sustainable

Jun 4, 2021 (35:44)


Guest
Leah Silen

Panelists
Eric Berry | Justin Dorfman | Alyssa Wright | Richard Littauer

Show Notes
Hello and welcome to Sustain! Today, our special guest is Leah Silen, who is the Executive Director of NumFOCUS. She has been the primary driver behind the organization and execution of its programs including fiscal sponsorship, the PyData event series, and DEI initiatives. We learn what NumFOCUS does, how it works in terms of scientific research, who provides the funding, and the diversity, equity, and inclusion support that NumFOCUS provides projects. Leah talks about the importance of Grant Management and Community Management needed to help projects in the future, and a "Sustain Exclusive" announcement is made by Leah on something NumFOCUS is in the early stages of building. Go ahead and download this episode now to find out what it is!

[00:01:16] Leah explains what NumFOCUS does, how it works, and what scientific open source means.
[00:03:22] Since NASA researchers use NumFOCUS for sponsored projects, Justin asks if there are any sponsored projects on Mars right now.
[00:05:18] Leah tells us about NumFOCUS projects being foundational to scientific research.
[00:05:54] We learn about Leah's art background and how she became one of the founding members of NumFOCUS.
[00:07:21] There are maintainers of forty-two projects, and Leah explains who the typical maintainer in the NumFOCUS ecosystem is.
[00:08:14] Find out what a typical week looks like for Leah at NumFOCUS.
[00:10:37] Richard is curious how Leah sees the future of this sort of organization as we're seeing more of them, and whether it will keep growing until there are hundreds of projects under her, or whether there will be more or fewer.
[00:13:12] We learn who provides funding at NumFOCUS since they have nine staff members. Justin wonders how NumFOCUS is diversifying their income and Leah makes an announcement about something NumFOCUS is building, and it's a "Sustain Exclusive!"
[00:16:11] Justin asks if NumFOCUS ever joins forces with the PSF.
[00:16:55] Leah mentions the diversity, equity, and inclusion support that NumFOCUS provides projects, and she describes how it's important for project sustainability and the conversations there have been.
[00:19:59] Richard wonders about the process of taking on a new project.
[00:23:25] Leah tells us how they deal with the maintenance of scientific projects.
[00:25:24] We learn the moon-shot idea of NumFOCUS, besides just making sure all these projects run smoothly, and what the goal is.
[00:26:42] Leah tells us what she's most excited about in terms of providing better stuff to projects in the near future.
[00:29:20] Community Manager and Developer Advocate roles are discussed.
[00:31:20] Find out where you can follow Leah and NumFOCUS on the internet.

Quotes
[00:04:00] "Many of the leaders in that project work for a division of NASA that have been directly involved in Mars Rover images and things like that, as well as Astropy, another one of the projects that's widely used by the astronomy community."
[00:05:18] "We many times speak of NumFOCUS projects as being very foundational to scientific research."
[00:10:59] "We have to make sure that as the number of projects that we're sponsoring are affiliated with NumFOCUS grows, that the organization is able to scale with that."
[00:12:20] "And there's so many areas that we don't address that we could address for our projects, you know just handling the legal aspect, grant management, helping them with we have a contributor diversification and research program."
[00:12:35] "So working on DEI initiatives that's woven through everything we do and helping our projects with that."
[00:23:58] "But that's one reason we really want to work and focus on diversifying the contributor base. Also, with contributors who are across different domains and in different areas."
[00:24:08] "So, if a project comes and applies to NumFOCUS and everyone is at one university, we don't consider that open, so there has to be contributors spread out no more than two employed, whether that's a university or whether that's a for-profit entity."
[00:26:50] "So, I think projects, a lot of the things that NumFOCUS does can be related to Community Management but definitely when you're talking about more of an internal project community."
[00:27:20] "I think that is probably one of the things that is most needed across projects is every project having a Community Manager to really look at their internal communities as well as interactions with their user base."

Spotlight
[00:32:05] Alyssa's spotlight is Community Managers.
[00:32:44] Eric's spotlight is Doom Emacs.
[00:33:21] Justin's spotlight is Lipgloss by Charm.
[00:33:42] Richard's spotlight is IDLE.
[00:34:09] Leah's spotlight is Sustain Diversity Working Group.

Links
  • NumFOCUS (https://numfocus.org/)
  • NumFOCUS Twitter (https://twitter.com/NumFOCUS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor)
  • info@numfocus.org (mailto:info@numfocus.org)
  • leah@numfocus.org (mailto:leah@numfocus.org)
  • "5 qualities of outstanding open source community managers" by Jason Blais (https://opensource.com/article/20/9/open-source-community-managers)
  • Doom Emacs-GitHub (https://github.com/hlissner/doom-emacs)
  • Lipgloss-Charm (https://github.com/charmbracelet)
  • Charm Twitter (https://twitter.com/charmcli?lang=en)
  • IDLE (https://docs.python.org/3/library/idle.html)
  • Sustain Working Groups (https://sustainoss.org/working-groups/)

Credits
  • Produced by Richard Littauer (https://www.burntfen.com/)
  • Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/)
  • Show notes by DeAnn Bahr at Peachtree Sound (https://www.peachtreesound.com/)

Special Guest: Leah Silen.

Intervista Pythonista
Ep 1 Becoming your own boss with NLP

May 5, 2021 (34:44)


We get to know Marco Bonzanini, freelance data scientist, trainer, and Natural Language Processing expert. Marco is also chair and organizer of PyData

DataTalks.Club
Getting Started with Open Source - Vincent Warmerdam

Jan 29, 2021 (62:46)


We talked about:
  • open source
  • getting started with open source
  • convincing your employer to contribute to open source
  • public speaking
  • the checklist for open source projects
  • the role of research advocate

And many more things!

Links from Vincent:
  • https://www.youtube.com/watch?v=68ABAU_V8qI&t=975s&ab_channel=PyData
  • https://www.youtube.com/watch?v=kYMfE9u-lMo&t=958s&ab_channel=PyData
  • https://koaning.io/projects.html
  • https://calmcode.io/
  • https://makenames.io/
  • https://koaning.github.io/clumper/api/clumper.html

Join DataTalks.Club: https://datatalks.club

Gradient Dissent - A Machine Learning Podcast by W&B
Peter Wang on Anaconda, Python and Scientific Computing

Jan 21, 2021 (50:11)


Peter Wang talks about his journey co-founding Anaconda and serving as its CEO, his perspective on the Python programming language, and its use for scientific computing.

Peter Wang has been developing commercial scientific computing and visualization software for over 15 years. He has extensive experience in software design and development across a broad range of areas, including 3D graphics, geophysics, large data simulation and visualization, financial risk modeling, and medical imaging. Peter's interests in the fundamentals of vector computing and interactive visualization led him to co-found Anaconda (formerly Continuum Analytics). Peter leads the open source and community innovation group. As a creator of the PyData community and conferences, he devotes time and energy to growing the Python data science community and advocating and teaching Python at conferences around the world. Peter holds a BA in Physics from Cornell University.

  • Follow Peter on Twitter: https://twitter.com/pwang
  • https://www.anaconda.com/
  • Intake: https://www.anaconda.com/blog/intake-...
  • https://pydata.org/
  • Scientific Data Management in the Coming Decade paper: https://arxiv.org/pdf/cs/0502008.pdf

Topics covered:
0:00 (intro) Technology is not value neutral; Don't punt on ethics
1:30 What is Conda?
2:57 Peter's story and Anaconda's beginning
6:45 Do you ever regret choosing Python?
9:39 On other programming languages
17:13 Scientific Data Management in the Coming Decade
21:48 Who are your customers?
26:24 The ML hierarchy of needs
30:02 The cybernetic era and Conway's Law
34:31 R vs Python
42:19 Most underrated: Ethics - Don't Punt
46:50 Biggest bottlenecks: open source, Python

Visit our podcasts homepage for transcripts and more episodes! www.wandb.com/podcast

Get our podcast on these other platforms:
  • YouTube: http://wandb.me/youtube
  • Soundcloud: http://wandb.me/soundcloud
  • Apple Podcasts: http://wandb.me/apple-podcasts
  • Spotify: http://wandb.me/spotify
  • Google: http://wandb.me/google-podcasts

Join our bi-weekly virtual salon and listen to industry leaders and researchers in machine learning share their work: http://wandb.me/salon

Join our community of ML practitioners where we host AMAs, share interesting projects, and meet other people working in Deep Learning: http://wandb.me/slack

Our gallery features curated machine learning reports by researchers exploring deep learning techniques, Kagglers showcasing winning models, and industry leaders sharing best practices. https://wandb.ai/gallery

Sustain
Episode 64: Travis Oliphant and Russell Pekrul on NumPy, Anaconda, and giving back with FairOSS

Jan 8, 2021 (39:20)


Panelists Eric Berry | Justin Dorfman | Alyssa Wright | Richard Littauer Guest Travis Oliphant | Russell Pekrul Show Notes Hello and welcome to Sustain! Today, we have two guests from OpenTeams in Austin, Travis Oliphant and Russell Pekrul. Travis is the CEO and Russell is the Program Manager and the Founder and Director of FairOSS. We learn all about what OpenTeams and FairOSS are and how they work. Also, Travis tells us about the non-profit he started called NumFOCUS. Other topics discussed are dependencies and how their values are assigned, NumPy and SciPy, and building relationships with companies, which Russell mentions there is a bit of a “chicken and egg” problem here. There is some incredible advice and fascinating stories shared today so go ahead and download this episode now! [00:01:10] We find out what OpenTeams is and how it works. Travis also tells us when he wrote NumPy and SciPy and when he started OpenTeams. [00:07:18] Travis tells us about a non-profit he started with a bunch of people called NumFOCUS so there could be a home for the fiscal sponsor for open source projects. [00:09:24] Russell tells us what FairOSS is and how it works. [00:11:32] Alyssa asks Russell how does he first see the dependencies and then how does he assign that value? He mentions BackYourStack as a starting point. [00:13:00] Eric brings up one of the problems he’s found with trying to fund up open source is that it’s very difficult to solve the problem on more a grand scale. He wonders how Travis and Russell make the impact they want with the magnitude of problems they see. A key piece Travis brings up that they recognize is there’s a data gap and projects have to be participating. Alyssa wonders if projects are aware of their dependencies. [00:17:22] Richard asks about the dependency graph that they are making. 
He wonders how do you go down the stack and look all the way at the base and how do you judge the usefulness of what dependencies really matter for what code matters for the business proposition? Richard also wonders if anyone has done equity stuff for open source maintainers. [00:23:06] Alyssa is interested in learning more about how Travis and Russell are building the relationships with these companies and what we can do to help. [00:26:35] Alyssa asks Travis and Russell to talk about why this, why now, with this being a time of economic contraction, why is this important? Also, why have they been seeing traction during what can be difficult times for a lot of companies? [00:27:40] Eric asks if Travis can give an example of a project that he feels does that well, that doesn’t have to go through and do it twice, essentially. [00:29:48] Alyssa brings up investments around open source start-ups and how they start with a commitment towards open source and once the investment happens there’s a pivot. She wonders if Travis could talk about how this type of sustainability is shifting that model of these investments. Travis tells a story about speaking to the Founder of SaltStack and how their views matched. [00:34:03] We find out where you can learn more about FairOSS and follow them on this journey, invest, and join in. Spotlight [00:34:52] Justin’s spotlight is Curiefense, which extends Envoy proxy to protect all forms of web traffic. [00:35:15] Alyssa’s spotlight is Pixel8.earth. [00:36:06] Eric’s spotlight is OctoPrint. [00:36:53] Richard’s spotlight is Michael Oliphant’s work. [00:37:36] Russell’s spotlight is Conda. [00:38:20] Travis’s spotlight is Matplotlib. Quotes [00:03:25] “We were connecting and creating a social network long before the social networks started. 
That was the early days of social networks and it was addicting.” [00:04:14] “New libraries are starting to be written on numarray and we had SciPy written on numeric and there was this fork in this fledgling scientific community in Python.” [00:21:18] “So that was a very exciting day. Actually, I remember I told my wife you know the problem I’ve been searching on for twenty years, I finally figured it out. I’ve been trying to figure out twenty years how to make this work, and I finally figured it out. I had to go start several companies and start a venture fund and get involved in finance and cap tables to really pull it off, but that got me excited. Now I also said, but we’re at the base of Mount Everest, like all we’ve got to do is climb to the top of this mountain and we’re there.” [00:22:44] “So you basically have a company and its value is spread to all the values of the projects. You have a bunch of those, have a thousand of those, that each add incrementally the value of a project. Invert the matrix and every project now has a linear dependency on companies that effectively you created an index fund out of every project.” [00:24:52] “The idea is if you can get open source contributors to recognize that they want to work only for companies that are participating people want to hire open source contributors. They’re some of the best people to bring into your company.” [00:25:21] “We found that companies would absolutely sponsor PyData and the reason they would is because they’re trying to hire people. They wanted to hire the best developers and they would. So, they really didn’t care so much about the projects they started, but they wanted the people.” [00:27:10] “Go make an open source project, then get somebody or connect with somebody who’s going to help you build a company that they’ll vest in and build something else. 
So, you basically have to do it twice.” [00:28:34] “I’ve had the chance to work at companies large and small, go in and see that’s used to do x, and realized it’s added billions of dollars of value to a lot of work for the world. And yet, the same time NumPy struggled, not enough funding to maintain itself.” [00:30:15] “I spoke to the founder of SaltStack that just got acquired by VMware. I spoke to him about his view and it was amazing how much it matched mine, in a sense that he recognized that open source is you build some of the value and you use it. The way you need to make money is to build something that uses it but isn’t the open source.” [00:32:41] “It’s not you’re monetizing open source, you’re empowering, you’re sustaining open source, by selling and connecting the economic value to the functional value that’s there.” [00:33:04] “There will still be challenges. I’m not naïve. Every new thing comes with a whole set of new challenges.” Links OpenTeams (https://openteams.com/about) FairOSS (https://faiross.org/) FairOSS, PBC Twitter (https://twitter.com/faiross_pbc) FairOSS Community (https://community.faiross.org/login) Travis Oliphant Twitter (https://twitter.com/teoliphant?lang=en) Anaconda Dividend Program (https://www.anaconda.com/blog/sustaining-the-open-source-ds-ml-ecosystem-with-the-anaconda-dividend-program) Quansight (https://www.quansight.com/) NumFOCUS (https://numfocus.org/) BackYourStack (https://backyourstack.com/) Dask (https://dask.org/) SaltStack (https://www.saltstack.com/) SciPy (https://www.scipy.org/) NumPy (https://numpy.org/) Curiefense (https://www.curiefense.io/) Pixel8.earth Ambassador Program (https://pixel8earth.medium.com/kicking-off-the-pixel8-earth-ambassador-program-80a87a70fb3a) OctoPrint (https://octoprint.org/) Michael Oliphant’s work (https://langev.com/index.php/author/moliphant/Michael+Oliphant) Conda (https://github.com/conda/conda) Matplotlib.com (https://matplotlib.org/) Credits Produced by Richard Littauer 
(https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr at Peachtree Sound (https://www.peachtreesound.com/) Special Guests: Russell Pekrul and Travis Oliphant.

AI Podcast in 26.1 Minutes
Fernando Perez: Our Most Awarded Guest to Date

AI Podcast in 26.1 Minutes

Play Episode Listen Later Dec 15, 2020 44:00


Brian and Don welcome a much-anticipated guest: Professor Fernando Perez joins us for this episode of 26.1 AI Podcast. Dr. Perez speaks about his journey, the community, and all the challenges along the way. Fernando shares, in his inimitable style, how he journeyed from strait-laced physicist in pursuit of an academic career to doggedly ignoring naysayers and creating one of the most important components of the modern PyData stack. One personal challenge during this journey was losing a friend, Dr. John Hunter. John also influenced your host Brian Ray. Though John missed collaborating when Fernando set out with IPython because of conflicts from prior commitments, the two later joined forces to advance tools that data scientists now use every day. Sit back and enjoy Fernando's dexterity on multiple topics as hosts Brian Ray and Don Sheu hold on for the ride, for your benefit, listener.

AI Podcast in 26.1 Minutes
[1/2] Built from Open Source Software: Coiled Team Visits 26.1 AI Podcast

AI Podcast in 26.1 Minutes

Play Episode Listen Later Dec 1, 2020 26:21


[Part 1 of 2] Listeners, join in for a wonderful conversation in this episode. Our guests Matthew Rocklin and Hugo Bowne-Anderson are extending access to powerful distributed computing for more data users with their startup Coiled (https://coiled.io/). Data scientists with a two-minute download of Coiled’s software (https://cloud.coiled.io/) can scale their work to the cloud. We discuss during the episode how conversations with the open source community resemble the early customer conversations commonly used by entrepreneurs in a lean startup framework. Dask’s creator and Coiled founder Matthew describes his software design approach, which has a decidedly minimalist bent. A great benefit of Matt’s approach for users of popular Python libraries is a familiar interface when using Dask or Coiled to extend the power of popular PyData stack tools. Our conversation turns to how Coiled can extend more computation power to many casual users of Python who are interested in solving data problems pragmatically without rebuilding a data factory every time. [Join us next week for Part 2]

MLOps.community
Scalable Python for Everyone, Everywhere // Matthew Rocklin // MLOps Meetup #37

MLOps.community

Play Episode Listen Later Oct 19, 2020 57:10


Parallel Computing with Dask and Coiled. Python makes data science and machine learning accessible to millions of people around the world. However, historically Python hasn't handled parallel computing well, which leads to issues as researchers try to tackle problems on increasingly large datasets. Dask is an open source Python library that extends the existing Python data science stack (NumPy, pandas, scikit-learn, Jupyter, ...) with parallel and distributed computing. Today Dask has been broadly adopted by most major Python libraries, and is maintained by a robust open source community across the world. This talk discusses parallel computing generally, Dask's approach to parallelizing an existing ecosystem of software, and some of the challenges we've seen in deploying distributed systems. Finally, we also address the challenges of robustly deploying distributed systems, which ends up being one of the main accessibility challenges for users today. We hope that by the end of the meetup attendees will better understand parallel computing, have built intuition around how Dask works, and have the opportunity to play with their own Dask cluster on the cloud. Matthew is an open source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask, a library for scalable computing. Matthew worked for Anaconda Inc for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled Computing to improve Python's scalability with Dask for large organizations. Matthew has given talks at a variety of technical, academic, and industry conferences. A list of talks and keynotes is available at (https://matthewrocklin.com/talks). Matthew holds a bachelor’s degree from UC Berkeley in physics and mathematics, and a PhD in computer science from the University of Chicago. 
Check out our posts here to get more context around where we're coming from: https://medium.com/coiled-hq/coiled-dask-for-everyone-everywhere-376f5de0eff4 https://medium.com/coiled-hq/the-unbearable-challenges-of-data-science-at-scale-83d294fa67f8 ----------- Connect With Us ✌️------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/ Connect with Matthew on LinkedIn: https://www.linkedin.com/in/matthew-rocklin-461b4323/
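The split, compute, and combine pattern that the talk abstract describes can be sketched with the standard library alone. This is not Dask's API and not code from the talk; the function names (`chunked`, `partial_sum_of_squares`, `parallel_sum_of_squares`) and the chunk size are hypothetical, chosen only for illustration. Threads are used here for portability; for CPU-bound work in CPython a process pool or a library such as Dask would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, size):
    """Yield consecutive slices of `seq` holding at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def partial_sum_of_squares(chunk):
    """Independent work on one chunk (the per-partition step)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, chunk_size=1000, workers=4):
    """Split the data, compute per-chunk results concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunked(data, chunk_size)))
    # The combine step is cheap because each chunk was reduced to one number.
    return sum(partials)


data = list(range(10_000))
total = parallel_sum_of_squares(data)
```

The design point is that each chunk is reduced independently, so only small partial results need to be gathered at the end; the same idea underlies partitioned arrays and dataframes in distributed libraries.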

a16z
Reining in Complexity: Data Science & Future of AI/ML Businesses

a16z

Play Episode Listen Later Aug 21, 2020 44:28


There is no spoon. Or rather, “There is no such thing as ‘data’, there’s just frozen models”, argues Peter Wang, the co-founder and CEO of Anaconda, who also created the PyData conferences and grew the early data science community there, while on the frontlines of trying to make Python useful for business analytics. He views both models and data as fluid, more like metaphysics than typical data management… Or perhaps it’s that when it comes to data, those with a physics background just better appreciate the mind-bending complexity and challenges of reining in the natural world, and therefore get the unique challenges of AI/ML development, observes a16z general partner Martin Casado, whose first job after college involved computational physics simulation and high-performance computing in Python at Lawrence Livermore National Laboratory. (Wang, meanwhile, graduated in physics.) But this is not just a philosophical question: the answer has real implications for the margins, organizational structures, and building of AI/ML businesses. Especially as we’re in a tricky time of transition, where customers don’t even know what they’re asking for, yet are looking for AI/ML help or know it’s the future. So what does this all mean for the software value chain; for open source collaboration and commodification; and for the future of software businesses? After all, it’s not written in stone that “All information systems must be deconstructed into hardware, and software, and data” and that “software must have these margins”… Will there be a new type of company? image: Pawel Loj / Wikimedia Commons

Changelog Master Feed
Building the world's most popular data science platform (Practical AI #101)

Changelog Master Feed

Play Episode Listen Later Aug 17, 2020 59:12 Transcription Available


Everyone working in data science and AI knows about Anaconda and has probably “conda” installed something. But how did Anaconda get started and what are they working on now? Peter Wang, CEO of Anaconda and creator of PyData and popular packages like Bokeh and DataShader, joins us to discuss that and much more. Peter gives some great insights on the Python AI ecosystem and very practical advice for scaling up your data science operation.


Mid Meet Py
Mid Meet Py - Ep.15 - Interview with Stéphane

Mid Meet Py

Play Episode Listen Later Jul 9, 2020 59:41


PyChat: Python security announcement - if your application embeds Python on Windows, you may be at risk; Matplotlib Cheatsheet; EuroPython Merch Shop (15% off till 15th); Turn your Raspberry Pi into a gaming machine; Python Ireland remote meetup event today; PyData event: joint effort with regional meetup. Mid Meet - Hall of Fame: Stéphane - CPython core developer, Board member of EuroPython. Follow Stéphane on Twitter. PyPI highlights: Pandas-Bokeh - using Bokeh with pandas; interrogate - checks your code base for missing docstrings.

How AI Built This
#6 Samantha Rhynas

How AI Built This

Play Episode Listen Later Mar 4, 2020 91:42


Samantha Rhynas is the Head of Data at Edinburgh-based data start-up Effini (Jo Watts, founder of Effini, was on episode 4). As well as working at a busy start-up, Sam is an absolute legend of the Scotland meet-up scene: she has run and organised GirlGeek Scotland for many years, and she runs the Edinburgh chapter of PyData. We had a great chat about her career path and involvement in the community. I hope you enjoy! As always, this episode was sponsored by Cathcart Associates, tech recruitment extraordinaires. Music by Fugue (https://icons8.com/music)

PyData Deep Dive
Matt Rocklin - Parallel Computing & Founding OSS Companies

PyData Deep Dive

Play Episode Listen Later Mar 2, 2020 52:28


In this episode I talk with Matt Rocklin. Matt is best known for his work on Dask, a parallel computing package built into the PyData stack. After working on open source software at Anaconda and NVIDIA, he has now founded his own company centered around Dask, called Coiled Computing. In this episode we talk about the insights into open source he gained through his career, what Dask is and how it is funded, and then of course his new company. Links: https://twitter.com/mrocklin https://dask.org https://coiled.io https://matthewrocklin.com https://rapids.ai https://pangeo.org https://prefect.io Thanks to my Patrons for their support, especially: Daniel Gerlanc, Richard Craib, Jonathan Ng. Support me here to get early access: https://www.patreon.com/twiecki PyData is a registered trademark of NumFOCUS, Inc. Support the show (https://www.patreon.com/twiecki)

PyData Deep Dive
Travis Oliphant - The past, present and future of PyData

PyData Deep Dive

Play Episode Listen Later Jan 6, 2020 61:37


Let's welcome the new year with a new episode of the PyData Deep Dive. In this episode I talk to Travis Oliphant: Founder of Anaconda Inc and Quansight Inc, as well as the creator of NumPy. In this episode Travis takes us from the early days of NumPy up to the current state and future of the PyData ecosystem and how Quansight is contributing to that future. Special thanks to my Patreons Andrew Ng, Daniel Gerlanc, and Richard Craib. If you would like to support the podcast go to: https://patreon.com/twiecki Follow Travis on Twitter: https://twitter.com/teoliphant Follow me on Twitter: https://twitter.com/twiecki Support the show (https://www.patreon.com/twiecki)

AI Podcast in 26.1 Minutes
Peter Wang (part 2/2): CEO/founder Anaconda, Creator of PyData

AI Podcast in 26.1 Minutes

Play Episode Listen Later Dec 31, 2019 24:08


In the second part of our two-part series with Anaconda founder and CEO Peter Wang, we get to the core reason why this podcast exists. We want everybody to be more literate about the tech wave that promises to fundamentally change how we live. In this session, Peter reminds us how, in the atomic age, people wanted nuclear underwear. With nuclear power, though, the possible devastation is palpable for a casual observer. Do average users of AI understand that irresponsibly deployed AI can harm people, societies, and the world around us? One tableau Peter draws to illustrate how folks may underestimate the destructive power of AI is a table and a smart speaker in a room. Most consumers using smart speakers and their associated assistant personas don't know that there's massive computing power behind the interaction. We come back to humans and the importance of practitioners becoming holistic detectives of AI in the manner of Dirk Gently. We come to a conclusion familiar to listeners: experts deploying AI need to take the wider view of our work and its impact on humanity.

AI Podcast in 26.1 Minutes
Peter Wang (part 1/2): CEO/founder Anaconda, Creator of PyData

AI Podcast in 26.1 Minutes

Play Episode Listen Later Dec 10, 2019 26:14


In the Python world, and in the scientific Python world especially, Peter needs no introduction. Anaconda's CEO shared with us, "Anaconda has more users than World of Warcraft, Matlab, SAS, Tableau, and Dropbox combined." We had so much fun that our conversation ran long, long enough for 26.1 AI Podcast's first two-part interview with a guest. During our conversation, Peter gets philosophical. As a technologist and a practitioner, he discusses hype versus reality. Our guest posits that there is an "honest path" by which technology can be leveraged for human good. A big takeaway from this episode is that there's no configuration toggle for ethical AI. Ethics starts with the human in the loop creating the AI.

ajitofm
ajitofm 52: Working at Chura Data (ちゅらデータ), Engineering in Okinawa, Morphological Analysis of Dialects

ajitofm

Play Episode Listen Later Nov 24, 2019 54:22


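We talked with amacbee, アイパー隊長 (Aipa-taicho), and Jewel about working at Chura Data, engineering in Okinawa, morphological analysis of the Okinawan dialect, internship design, and more. Chura Data Inc. (ちゅらデータ株式会社) DATUM STUDIO Inc. | Datum Studio PyData.Okinawa - connpass PHP Conference Okinawa 2019 Careers - Chura Data Inc. NHK serial TV drama "Natsuzora" | NHK Online Okinawa MOSAIC | RBC Ryukyu Broadcasting What is "yaa" (ヤー)? | Okinawan Dialect Dictionary Ajimaa Applications for the Summer Internship (2019) have closed. Chura Data Inc. (91602) - engage We look forward to your feedback! Send it via https://ajito.fm/form/ or on Twitter with #ajitofm.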

Python Podcast
Python 3.8

Python Podcast

Play Episode Listen Later Nov 12, 2019 79:23


After a longer break due to vacations and scheduling difficulties, we are back with a somewhat unprepared episode and talk with Christian about Python 3.8, conference visits, and various side topics. Show notes Our email for questions, suggestions, and comments: hallo@python-podcast.de News from the scene Python 3.8 PyConDE and PyData Berlin 2019 Fluent Python [Book] - Beyond Paradigms: a new key to grok Python & other languages [talk] Guido Retires mypy JupyterLab - A Tour of JupyterLab Extensions [talk] 10 Years of Automated Category Classification for Product Data Job Panel (Freelance) [talk] Flying Circus Python Software Verband Python 3.8 PEP 572 -- Assignment Expressions (walrus operator) hynek 2to3 - Automated Python 2 to 3 code translation PEP 570 -- Python Positional-Only Parameters multiprocessing.shared_memory — Provides shared memory for direct access across processes tuple unpacking PEP 578 -- Python Runtime Audit Hooks Core Sprint CPython Core Developer Sprint 2019 GIL - global interpreter lock PEG Parsers batou Jinja Picks Django Forum TextBlob: Simplified Text Processing Public tag on konektom
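Two of the Python 3.8 features named in these show notes, PEP 572 (assignment expressions) and PEP 570 (positional-only parameters), are easiest to see in a few lines. The sketch below is illustrative only and is not taken from the episode; the helper names `first_number` and `clamp` are hypothetical. It requires Python 3.8 or newer.

```python
import re


def first_number(text):
    """Return the first run of digits in `text` as an int, or None."""
    # PEP 572: the walrus operator binds `match` and tests it in one expression.
    if (match := re.search(r"\d+", text)) is not None:
        return int(match.group())
    return None


def clamp(value, low, high, /):
    """Limit `value` to the closed range [low, high]."""
    # PEP 570: every parameter before "/" is positional-only, so callers
    # cannot write clamp(value=..., low=..., high=...).
    return max(low, min(value, high))


number = first_number("Python 3.8")
limited = clamp(15, 0, 10)
```

Positional-only parameters let a library rename its arguments later without breaking callers, since no caller can depend on the names.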

Learning Bayesian Statistics
#2 When should you use Bayesian tools, and Bayes in sports analytics, with Chris Fonnesbeck

Learning Bayesian Statistics

Play Episode Listen Later Oct 22, 2019 43:37


When are Bayesian methods most useful? Conversely, when should you NOT use them? How do you teach them? What are the most important skills to pick-up when learning Bayes? And what are the most difficult topics, the ones you should maybe save for later? In this episode, you’ll hear Chris Fonnesbeck answer these questions from the perspective of marine biology and sports analytics. Chris is indeed the New York Yankees’ senior quantitative analyst and an associate professor at Vanderbilt University School of Medicine. He specializes in computational statistics, Bayesian methods, meta-analysis, and applied decision analysis. He also created PyMC, a library to do probabilistic programming in python, and is the author of several tutorials at PyCon and PyData conferences. Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com! Links from the show: Chris on Twitter: https://twitter.com/fonnesbeck PyMC3, Probabilistic Programming in Python: https://docs.pymc.io/ Chris on GitHub: https://github.com/fonnesbeck An introduction to Markov Chain Monte Carlo using PyMC3 - PyData London 2019: https://www.youtube.com/watch?v=SS_pqgFziAg Introduction to Statistical Modeling with Python - PyCon 2017 - video: https://www.youtube.com/watch?v=TMmSESkhRtI Introduction to Statistical Modeling with Python - PyCon 2017 - code repo: https://github.com/fonnesbeck/intro_stat_modeling_2017 Bayesian Non-parametric Models for Data Science using PyMC3 - PyCon 2018: https://www.youtube.com/watch?v=-sIOMs4MSuA Statistical Data Analysis in Python: https://github.com/fonnesbeck/statistical-analysis-python-tutorial --- Send in a voice message: https://anchor.fm/learn-bayes-stats/message

Moscow Python: a podcast about Python in Russian
Moscow Python Podcast. How to make development more secure (level: middle+)

Moscow Python: a podcast about Python in Russian

Play Episode Listen Later Sep 22, 2019 48:09


As projects grow more complex and the technological base of development evolves, the number of potential vulnerabilities in code multiplies. How do you build IT systems and write code more securely, what risks await a Python developer at different levels, and how can they be reduced? We work through these questions with the podcast's guest Nikolay Markov, chief architect at Aligned Research and one of the organizers of the PyData community. Hosts: Valentin Dombrovsky, co-founder of MoscowPython and DryLabs; Zlata Obukhovskaya, team lead at NVIDIA; and Grigory Petrov, DevRel at Evrone and head of the Moscow Python Conf++ program committee. All episodes: https://podcast.python.ru MoscowPython meetups: http://moscowpython.ru Learn Python course: https://learn.python.ru Moscow Python Conf conference: https://conf.python.ru

PyData Deep Dive
Chris Fonnesbeck - Probabilistic Programming

PyData Deep Dive

Play Episode Listen Later Sep 15, 2019 54:07


I am beyond excited to share this first episode of the PyData podcast with you. The idea is to have a free-form discussion with interesting guests which does not shy away from more advanced topics. In this episode I talk to Chris Fonnesbeck: Professor of biostatistics at Vanderbilt University and, as of recently, Data Scientist at the New York Yankees. We start off this discussion by talking about Bayesian statistics and probabilistic programming. Chris then talks about the history of PyMC and what the current status of PyMC4 is. We then dive more into his background and how he moved from marine biology to become a data scientist in sports analytics and the lessons he learned along the way. Special thanks to my Patreons Andrew Ng, Daniel Gerlanc, and Richard Craib. If you would like to support the podcast go to: https://patreon.com/twiecki Follow Chris on Twitter: https://twitter.com/fonnesbeck Support the show (https://www.patreon.com/twiecki)

The Python Podcast.__init__
Combining Python And SQL To Build A PyData Warehouse

The Python Podcast.__init__

Play Episode Listen Later Sep 2, 2019 43:44


The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on their own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.
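One way to picture the pairing described above is to push the heavy aggregation into the SQL engine and bring only the small result back into Python. The sketch below is not from the episode: it uses the standard library's in-memory SQLite purely as a stand-in for a real data warehouse, and the `events` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse; a real one would be reached
# through its own driver, but the division of labor is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5), (3, 2.5)],
)

# The GROUP BY runs inside the database, so only one row per user
# crosses into Python instead of every raw event.
rows = conn.execute(
    "SELECT user_id, SUM(amount) AS total "
    "FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
conn.close()

# Lightweight follow-up analysis happens in Python on the reduced result.
totals = dict(rows)
top_user = max(totals, key=totals.get)
```

With larger volumes the benefit grows, because the raw rows never leave the database and the Python side only handles the aggregated summary.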

Women in Data Science
Shir Meir Lador | Using Data Science to Keep Financial Data Secure

Women in Data Science

Play Episode Listen Later Aug 15, 2019 35:04


In addition to her job at Intuit, Lador is a WiDS ambassador in Israel, has her own podcast about data science, and is a co-founder of PyData Tel Aviv meetups. Lador’s team at Intuit focuses on machine learning in security and fraud applications to protect customers’ sensitive financial data from fraudsters and hackers. She and her team use anomaly detection and semi-supervised methods to secure Intuit products and data. “In general, putting AI into products is not an easy task.” But she thinks we need to put a lot of effort into securing our data especially with recent data leaks from Equifax and Facebook. “I think the world is going into that direction with the GDPR and other initiatives. AI has a lot of potential of helping in that domain,” she explained during a conversation with Stanford’s Margot Gerritsen, Stanford professor and host of the Women in Data Science podcast. Israel has a lot of expertise in the security domain because many young people study security and encryption during Israel’s mandatory military service. She had the option to do this during her service, but since she already knew she would pursue a career in this area, instead she chose to become a pilot instructor in the flight simulator. “It was a very unique experience that I would probably never get to do.” When Lador was starting her career in data science, she did not know many people in the field. She decided to start a PyData branch in Israel because she wanted to build a professional data science community. “My main motivation was that I wanted to learn and that I wanted to have friends and people to consult with and learn from. And now I have so many data scientist friends because of all this work and it's great. I love it.” She noticed when organizing PyData events that it was much easier to get male speakers. When she would ask a talented female scientist to talk about her work, she would say: “No, I'm not an expert… I'm not ready. 
I need to learn more… I was like, no, you're enough years in the field. Everyone can learn something from you.” Being a WiDS ambassador was like an extension of her PyData work. “I get to decide what's in the conference and bring the best talks there.” Her experience organizing the PyData meetups helped her know how to create a valuable conference. She sees WiDS as a great opportunity to encourage more women to speak by giving them a platform, but also by bringing all the people together. “Seeing all those women on stage. This gives great inspiration to speak at other events, not just in WiDS. I think this is just an amazing initiative.” RELATED LINKS Connect with Shir Meir Lador on Twitter (@shirmeir86) and LinkedIn Listen to Shir's podcast Unsupervised Learn about PyData TelAviv Meetup Read more about Intuit Connect with Margot Gerritsen on Twitter (@margootjeg) and LinkedIn Find out more about Margot on her Stanford Profile Find out more about Margot on her personal website

Moscow Python: a podcast about Python in Russian
Moscow Python Podcast. How Python's edge is changing and whether it cuts everything (level: medium+)

Moscow Python: a podcast about Python in Russian

Play Episode Listen Later Jul 24, 2019 42:44


It is common knowledge that Python is a general-purpose language. But it evolves, and development trends change too, so it is not always clear what Python and its surrounding ecosystem are best suited for today. Are there many tasks for which it is a mediocre choice? Which architectural features determine that? In which direction should the language be developed? We try to bring some clarity together with Nikolay Markov, chief architect at Aligned Research and one of the organizers of the PyData community. In the studio with him are the episode's hosts: Valentin Dombrovsky, co-founder of MoscowPython and DryLabs; Zlata Obukhovskaya, team lead at NVIDIA; and Grigory Petrov, head of the Moscow Python Conf++ program committee. All episodes: https://podcast.python.ru MoscowPython meetups: http://moscowpython.ru Learn Python course: https://learn.python.ru Moscow Python Conf conference: https://conf.python.ru

DataCast
Episode 16: Bayesian Probabilistic Programming with Peadar Coyle

DataCast

Play Episode Listen Later Jul 6, 2019 44:09


Show Notes: (2:02) Peadar discussed his undergraduate experience studying Physics and Philosophy at the University of Bristol. (3:05) Peadar then pursued a Master’s degree in Mathematics from the University of Luxembourg, where he did a thesis on machine learning for time series forecasting. (4:16) Peadar commented on his varied work experience with various companies, particularly on data maturity and the differences between established companies and startups. (7:11) Peadar talked about his latest startup, Aflorithmic Labs, which develops a tech platform that powers and enables the creation of a new generation of hyper-personalized / super-relevant podcasts. (8:13) In the series “Interviews with Data Scientists,” Peadar interviewed 24 of the world’s most influential and innovative data scientists from across the spectrum. He talked about the common traits in the best data scientists. (10:05) Peadar mentioned his contribution to PyMC3, a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. (11:32) Peadar talked about the probabilistic programming survey he conducted recently, in which A/B testing is a big use case. (13:37) In his talk “Lies damned lies and statistics in Python” at PyData London 2016, Peadar compared and debugged models in Statsmodels, scikit-learn and PyMC3. He recalled the differences here. (15:27) Peadar went over “Probabilistic Programming Primer” - an online course he designed to teach people how to enhance their modeling abilities and better communicate risk. (18:32) Peadar talked about the recent development in the PyData ecosystem, in reference to his talk “A Map of the PyData Stack” at PyData Amsterdam 2016. 
(20:18) Discussing his blog post “How to successfully deliver Data Science in the Enterprise,” Peadar went over the people, processes, and things that are required to make data science a successful component in enterprise businesses. (23:25) Discussing his blog post “Building Full-Stack Vertical Data Products,” Peadar emphasized the importance of providing end-to-end value with lean metrics as a data scientist. (29:50) Discussing his blog post “One weird tip to improve the success of DS projects,” Peadar shared his small practice of writing down the risks before embarking on a project. (32:58) Discussing his blog post “3 pitfalls for non-technical managers managing DS teams,” Peadar described the things that non-technical managers will get wrong in managing a technical project. (35:31) Discussing his blog post “What does it mean to be a Senior DS?,” Peadar explained why senior data scientists should understand the soft side of technical decision making and should care about ethics. (38:57) Peadar gave a brief overview of machine learning interpretability. (40:21) Closing segments. His Contact Info: LinkedIn Twitter GitHub Medium Quora Website His Recommended Resources: LIME SHAP Stitch Fix Tech Blog Ravelin Blog Stripe Engineering Blog Spotify Discover Weekly Dale Carnegie’s How to Win Friends and Influence People


Moscow Python: подкаст о Python на русском
Moscow Python Podcast. Problems of package ecosystems in Python (level: middle / senior)

Moscow Python: подкаст о Python на русском

Play Episode Listen Later Jun 2, 2019 41:13


How has packaging in Python evolved, what about it currently leaves something to be desired, which dependency management systems are relevant today, and in which direction are they heading? We work through these questions with this episode's guest, Nikolay Markov, senior data engineer at Aligned Research and one of the organizers of the PyData community. The podcast is hosted by MoscowPython co-founder Valentin Dombrovsky, NVIDIA team lead Zlata Obukhovskaya, and Moscow Python Conf++ program committee chair Grigory Petrov. All episodes: https://podcast.python.ru   MoscowPython meetups: http://moscowpython.ru   Learn Python course: https://learn.python.ru   Moscow Python Conf conference: https://conf.python.ru

The Python Podcast.__init__
A Data Catalog For Your PyData Projects

The Python Podcast.__init__

Play Episode Listen Later May 27, 2019 50:01


One of the biggest pain points when working with data is dealing with the boilerplate code to load it into a usable format. Intake encapsulates all of that and puts it behind a single API. In this episode Martin Durant explains how to use Intake data catalogs for encapsulating source information, how they simplify data science workflows, and how to incorporate them into your projects. It is a lightweight way to enable collaboration between data engineers and data scientists in the PyData ecosystem.
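The catalog idea described above can be sketched in plain Python. This is only a toy illustration of the concept, not Intake's API: the `Source` and `Catalog` classes, the `DRIVERS` table, and the inline sample data are all hypothetical names invented for this sketch, and inline strings stand in for the file paths or URLs a real catalog would point to.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List


def _load_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def _load_json(text: str) -> Any:
    """Parse JSON text into Python objects."""
    return json.loads(text)


# Hypothetical driver table: maps a driver name to the loading boilerplate.
DRIVERS: Dict[str, Callable[[str], Any]] = {"csv": _load_csv, "json": _load_json}


@dataclass
class Source:
    """One catalog entry: which driver to use and where the data lives."""
    driver: str
    payload: str  # inline text here; stands in for a path or URL
    description: str = ""

    def read(self) -> Any:
        # The caller never sees the format-specific loading code.
        return DRIVERS[self.driver](self.payload)


class Catalog:
    """A named collection of sources behind one uniform interface."""

    def __init__(self, entries: Dict[str, Source]) -> None:
        self._entries = dict(entries)

    def __getitem__(self, name: str) -> Source:
        return self._entries[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)


catalog = Catalog({
    "users": Source("csv", "id,name\n1,Ada\n2,Grace\n", "toy user table"),
    "settings": Source("json", '{"debug": true}', "toy settings"),
})

users = catalog["users"].read()        # same call regardless of format
settings = catalog["settings"].read()
print(sorted(catalog), users, settings)
```

The point of the design is that a data engineer maintains the entries while a data scientist only needs the entry name and a single `read()` call.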

The Changelog
Enabling open code for science at NumFOCUS

The Changelog

Play Episode Listen Later Feb 22, 2019 68:17 Transcription Available


We’re talking with Gina Helfrich, the Communications Director for NumFOCUS, about their story and history, the impact of open code on science, the difference between sponsored and affiliated projects, corporate backing, the back story of their education and events program PyData, and the struggles of storytelling and fundraising.

Changelog Master Feed
Enabling open code for science at NumFOCUS (The Changelog #335)

Changelog Master Feed

Play Episode Listen Later Feb 22, 2019 68:17 Transcription Available


We’re talking with Gina Helfrich, the Communications Director for NumFOCUS, about their story and history, the impact of open code on science, the difference between sponsored and affiliated projects, corporate backing, the back story of their education and events program PyData, and the struggles of storytelling and fundraising.

Open Source Directions hosted by Quansight

The aim of PyData/Sparse is to create sparse containers that implement the ndarray interface. Traditionally in the PyData ecosystem, sparse arrays have been provided by the scipy.sparse submodule. All containers there depend on and emulate the numpy.matrix interface. This means that they are limited to two dimensions and also do not work well in places where numpy.ndarray would work. PyData/Sparse is well on its way to replacing scipy.sparse as the de-facto sparse array implementation in the PyData ecosystem.
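To make the two-dimension limitation concrete, here is a minimal pure-Python sketch of a coordinate-format sparse container that works for any number of dimensions. It is a toy written for this illustration under simplifying assumptions (a dictionary keyed by index tuples, only element lookup and a single-axis sum); `ToySparseND` is a hypothetical name and this is not how PyData/Sparse is implemented.

```python
from typing import Dict, Tuple

Index = Tuple[int, ...]


class ToySparseND:
    """Toy N-dimensional sparse array that stores only nonzero entries."""

    def __init__(self, shape: Index, data: Dict[Index, float]) -> None:
        for idx in data:
            in_bounds = len(idx) == len(shape) and all(
                0 <= i < n for i, n in zip(idx, shape)
            )
            if not in_bounds:
                raise IndexError(f"coordinate {idx} does not fit shape {shape}")
        self.shape = shape
        # Drop explicit zeros so storage tracks the nonzero count only.
        self.data = {idx: v for idx, v in data.items() if v != 0}

    @property
    def ndim(self) -> int:
        return len(self.shape)

    @property
    def nnz(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: Index) -> float:
        # Missing coordinates are implicit zeros.
        return self.data.get(idx, 0.0)

    def sum(self, axis: int) -> "ToySparseND":
        """Sum along one axis, returning an array with one fewer dimension."""
        new_shape = self.shape[:axis] + self.shape[axis + 1:]
        out: Dict[Index, float] = {}
        for idx, value in self.data.items():
            key = idx[:axis] + idx[axis + 1:]
            out[key] = out.get(key, 0.0) + value
        return ToySparseND(new_shape, out)


# A 2 x 3 x 4 array with three nonzero entries: three dimensions, which a
# strictly two-dimensional matrix interface cannot represent directly.
x = ToySparseND((2, 3, 4), {(0, 1, 2): 5.0, (1, 1, 2): 7.0, (1, 2, 3): 1.5})
reduced = x.sum(axis=0)
print(x.ndim, x.nnz, reduced.shape, reduced.data)
```

Because the coordinates are plain tuples, nothing in the container is tied to two dimensions, which is the property the ndarray-style interface is after.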

PyDataMCR
PyDataMCR - Episode 0 - NUMFOCUS with Dr Gina Helfrich

PyDataMCR

Play Episode Listen Later Dec 17, 2018 80:54


Welcome to the official PyDataMCR podcast. In this episode we give an introduction to PyData, NumFOCUS and some of the organisers. The mission of NumFOCUS is to promote sustainable high-level programming languages, open code development, and reproducible scientific research. We accomplish this mission through our educational programs and events as well as through fiscal sponsorship of open source scientific computing projects. We aim to increase collaboration and communication within the data science and scientific computing community. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. Dr Gina Helfrich is the Communications Director and Program Manager for Diversity & Inclusion at NumFOCUS, a non-profit that supports better science through open code. 
Show notes: Why Women Are Flourishing In R Community But Lagging In Python - https://bit.ly/2EacceY Discover Cookbook - https://bit.ly/2PLJyS5 NumFOCUS blog - numfocus.org/blog NumFOCUS donation link/membership - https://bit.ly/2S4JDSH Corporate sponsor - numfocus.org/sponsors Conda-forge - conda-forge.org rOpenSci - ropensci.org QuantEcon - quantecon.org OpenJournals - www.theoj.org Astropy - astropy.org Cantera - cantera.org JuMP, built with Julia - juliaopt.org/JuMP.jl/0.17/quickstart.html Schoolbus project - https://bit.ly/2LkfUm0 Google slides transcription - https://bit.ly/2PJSkzE Safia Abdalla - twitter.com/captainsafia Stuff Mom Never Told You episode featuring Gina - stuffmomnevertoldyou.com/podcasts/spill-your-salary-secrets.htm Mesa - github.com/projectmesa/mesa Book recommendation from Bertil and David: https://amzn.to/2QB6HMg Our Sponsor: Cathcart Associates is a technology recruitment company with offices in Leeds and Manchester covering all things tech, but with an experienced team focusing on Data Science in the North West. We’re good at what we do. We understand what our candidates do, and what our clients need, and we really care about making sure you both get what you want. We’ve been sponsoring PyDataMCR since its inception because we’re nice guys and we like pizza. Check out our website to get in touch – cathcartassociates.com Contact: Twitter - twitter.com/pydatamcr Slack - bit.ly/2v60ieu Meetup - meetup.com/PyData-Manchester/

DataCast
Episode 5: Applied Statistics in Data Science with Christopher Peters

DataCast

Play Episode Listen Later Oct 28, 2018 49:02


Show Notes: (2:05) Chris recalled his Econometrics and Quantitative Economics study during his undergraduate days. (4:14) Chris talked about his research work focused on oil and gas at the Center for Energy Studies at Louisiana State University. (5:53) Chris gave some insights on how data science and econometrics can solve problems in the energy industry. (7:51) Chris talked about his decision to pursue a Master’s in Applied Statistics. (9:02) Chris emphasized the importance of learning statistical theory. In particular, survival analysis is very similar to conversion rate analysis. (10:28) Chris discussed his Master’s thesis work in analyzing the churn rate for Treehouse. (12:05) Chris maintained his own website called Statwonk. (13:18) Chris mentioned the popularity of count data on consumer products. (14:40) Chris gave some recommendations for those who want to learn fundamental statistics: Think Stats, Think Bayes, Statistical Inference, and Statistical Intervals. (16:20) Chris recalled how he got a job as the first data scientist at Treehouse. (16:52) Chris described Treehouse CEO Ryan Carson as a “marketing genius.” (18:34) Chris identified the most challenging aspect for a data scientist within a business: translating technical jargon into comprehensible concepts. (20:11) Chris recalled building a company dashboard from scratch for Treehouse. (22:46) Chris shared a background overview of his next employer, Zapier. (24:25) Chris shared his favorite things about working as a data scientist at Zapier. (32:07) Chris recalled building an AnamolyBot to keep track of Slack communication between Zapier team members. (36:33) Chris talked about the open-source project he has been working on during his spare time. (39:10) Chris gave advice for people who want to seek remote positions. (42:20) Closing segments. 
His Contact Info: Website Twitter LinkedIn GitHub His Recommended Resources: Shopify’s Cameron Davidson-Pilon Stitch Fix’s Kim Larsen R for Data Science Hadley Wickham John Tukey’s “Exploratory Data Analysis” William Cleveland’s “The Elements of Graphing Data” #rstats #PyData

Datacast
Episode 5: Applied Statistics in Data Science with Christopher Peters

Datacast

Play Episode Listen Later Oct 27, 2018 49:02


Show Notes: (2:05) Chris recalled his Econometrics and Quantitative Economics study during his undergraduate days. (4:14) Chris talked about his research work focused on oil and gas at the Center for Energy Studies at Louisiana State University. (5:53) Chris gave some insights on how data science and econometrics can solve problems in the energy industry. (7:51) Chris talked about his decision to pursue a Master’s in Applied Statistics. (9:02) Chris emphasized the importance of learning statistical theory. In particular, survival analysis is very similar to conversion rate analysis. (10:28) Chris discussed his Master’s thesis work in analyzing the churn rate for Treehouse. (12:05) Chris maintained his own website called Statwonk. (13:18) Chris mentioned the popularity of count data on consumer products. (14:40) Chris gave some recommendations for those who want to learn fundamental statistics: Think Stats, Think Bayes, Statistical Inference, and Statistical Intervals. (16:20) Chris recalled how he got a job as the first data scientist at Treehouse. (16:52) Chris described Treehouse CEO Ryan Carson as a “marketing genius.” (18:34) Chris identified the most challenging aspect for a data scientist within a business: translating technical jargon into comprehensible concepts. (20:11) Chris recalled building a company dashboard from scratch for Treehouse. (22:46) Chris shared a background overview of his next employer, Zapier. (24:25) Chris shared his favorite things about working as a data scientist at Zapier. (32:07) Chris recalled building an AnamolyBot to keep track of Slack communication between Zapier team members. (36:33) Chris talked about the open-source project he has been working on during his spare time. (39:10) Chris gave advice for people who want to seek remote positions. (42:20) Closing segments. 
His Contact Info: Website Twitter LinkedIn GitHub His Recommended Resources: Shopify’s Cameron Davidson-Pilon Stitch Fix’s Kim Larsen R for Data Science Hadley Wickham John Tukey’s “Exploratory Data Analysis” William Cleveland’s “The Elements of Graphing Data” #rstats #PyData

regonn&curry.fm
7. Typhoon competition: the final push

regonn&curry.fm

Play Episode Listen Later Oct 23, 2018 45:06


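We attended the PyData.tokyo One-day Conference 2018 (connpass). Every session was very educational, and we gave a lightning talk titled "A Recommendation of Kaggle". NVIDIA. It was an unusual day, with PyData.tokyo, TokyoR, and JuliaTokyo all held on the same date. This week: we climbed to 10th place in the typhoon competition. Book I have been reading lately: "Multipotentialite: How to Turn One Interest After Another into Work and Make a Living for Life" https://amzn.to/2SdVr5C While many books are about how to devote yourself to a single thing, this one is aimed not at specialists who see one thing through but at people whose interests keep shifting (multipotentialites). To help with planning a life, it presents four ways of working (work models), each of which values money, meaning, and variety: the Group Hug approach; the Slash approach (the one closest to me: drawn to niche topics but not wanting to pursue any of them full time, ending up with many job titles, like juggling several freelance roles); the Einstein approach; and the Phoenix approach. New services that caught my attention recently: Mizuhanome, an AI for predicting boat races, which seems to be adding social-network features. Qrunch, a service that lowers the barrier to writing a tech blog: you can casually keep notes in the form of logs, you can cross-post your own blog articles, it is better than Qiita, the design is solid, and its task list is public on Trello. I have already cross-posted my Julia articles. Lately it feels like individuals have started releasing high-quality services. Julia news: Gadfly.jl now works on v1.0, and the master branch of XGBoost.jl also appears to support v1.0. "Take Kaggle's 2018 Machine Learning and Data Science Survey!" is under way: a survey that shows the leading edge of data science, with the collected data released as CSV. Kaggle Machine Learning & Data Science Survey 2017. Two Sigma: since I am writing it in Python anyway, I want to learn something along the way, so I am trying to solve it with Bayesian statistics (there is also Stan.jl and similar, so the knowledge should carry over to Julia). Searching the Two Sigma kernels for "Bayesian" turns up nothing. Is it unpopular, or just not suited to this competition? I am studying with "[Learn with Python and Stan] Introduction to Bayesian Statistics: Understanding How It Works". Today's haiku: "Switching from iced to hot, pine-tree pruning" by 恋言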

Reversim Podcast
330 with Shir Meir about PyData etc

Reversim Podcast

Play Episode Listen Later Jan 1, 2018


Extend
Say Yes to Opportunities and Data Science

Extend

Play Episode Listen Later Jun 12, 2017 32:52


What is data science, what does a data scientist actually do, and how do you become one? We asked all these questions when we talked to Shir Meir Lador, a data scientist and the founder of the PyData community. As a bonus, we also discovered what happens when you say 'yes' to every new opportunity. Music by Dave Depper. Recorded at Samsung NEXT.

Entre Dev y Ops Podcast
Edyo 25 - How to successfully organize an event like PyData Barcelona 2017.

Entre Dev y Ops Podcast

Play Episode Listen Later Jun 2, 2017


In episode 25 of the http://www.entredevyops.es podcast, Ignasi Fosch tells us how PyData Barcelona 2017 was successfully organized. Entredevyops blog - http://www.entredevyops.es Twitter @entredevyops - https://twitter.com/EntreDevYOps Links mentioned: PyData - http://pydata.org PyData BCN 2017 - http://pydata.org/barcelona2017 PyBCN on Meetup - http://www.meetup.com/es-ES/python-185

Entre Dev y Ops Podcast
Edyo 22 - Orchestrate 2017, PyData 2017, the S3 outage, the CloudFlare vulnerability, and Vault7.

Entre Dev y Ops Podcast

Play Episode Listen Later Apr 28, 2017


In this podcast we talk about Orchestrate 2017 and PyData 2017. We also discuss the S3 outage, the CloudFlare vulnerability, and the impact of Vault7.

The Python Podcast.__init__
PyData London with Ian Ozsvald and Emlyn Clay

The Python Podcast.__init__

Play Episode Listen Later Mar 12, 2016 63:11


Ian Ozsvald and Emlyn Clay are co-chairs of the London chapter of the PyData organization. In this episode we talked to them about their experience managing the PyData conference and meetup, what the PyData organization does, and their thoughts on using Python for data analytics in their work.