In this episode of 'Don't Know Much About Football,' hosts Sarah and Karlo dive into the contentious topic of player betting. They discuss the challenges and moral complexities surrounding the rise of sports betting, the impact of online abuse on referees and players, and the recent controversies involving players like USMNT's Wes McKinney. The conversation touches on regulatory frameworks, addiction issues, and the potential hypocrisy of leagues and teams benefiting from gambling sponsorships while restricting players from participating. The hosts also debate whether betting on one's own performance or teams should be allowed and highlight the potential legal and ethical ramifications. The episode underscores the dynamic and evolving nature of sports gambling and its implications for professional athletes. 00:00 Welcome to Don't Know Much About Football 00:06 The Rise of Player Betting 01:02 The Ethics of Betting in Sports 03:30 The Impact of Online Betting 05:26 Addiction and Gambling Risks 09:58 Regulations and Ethical Standards 16:11 The Future of Sports Betting 21:09 Final Thoughts and Farewell Image Sources (Backgrounds Removed): https://unsplash.com/photos/white-red-and-blue-round-hand-fan-PNai1XK387M Photo by Klim Musalimov on Unsplash https://unsplash.com/photos/brown-and-silver-round-analog-clock-mD1V-eS1Wb4 Photo by Derek Lynn on Unsplash Hosted on Acast. See acast.com/privacy for more information.
If you've ever done data analysis in Python, there's a good chance you've used Pandas, the groundbreaking open-source library that transformed how data scientists work. But behind Pandas is Wes McKinney's story of intellectual curiosity, risk-taking, and a relentless drive to build useful tools for others. Wes's latest initiative, Composed Ventures, is a micro venture capital (VC) fund that invests in early-stage data infrastructure, AI, and ML companies, as well as next-generation Python tooling and other related technologies.
Wes McKinney and I chat about Positron, how he created pandas and Apache Arrow, and what makes him tick.
I had the pleasure of interviewing Wes McKinney, creator of pandas, a name well known in the data world through his work on the pandas project and his book, Python for Data Analysis. Wes is now at Posit PBC, and during our conversation at Small Data SF, we covered several key topics around the evolving data landscape! Wes shared his thoughts on the significance of Small Data, why it's a compelling topic right now, and what “Retooling for a Smaller Data Era” means for the industry. We also dove into the challenges and potential benefits of shifting from Big Data to Small Data, and discussed whether this trend represents the next big movement in data. Curious about Apache Arrow and what's next for Wes? Check out our interview where Wes gives some great insights into the future of data tooling. #data #ai #smalldatasf2024 #theravitshow
Talk Python To Me - Python conversations for passionate developers
If you work in data science, you definitely know about data frame libraries. Pandas is certainly the most popular, but there are others such as cuDF, Modin, Polars, Dask, and more. They are all similar, but their APIs are definitely not the same, and Polars' is quite different. But here's the problem: if you want to write a library that serves users of more than one of these data frame libraries, how do you do that? Or if you want to leave open the possibility of swapping out your data frame library after the app is built: same problem. That's the problem that Narwhals solves. We have Marco Gorelli on the show to tell us all about it. Episode sponsors WorkOS Talk Python Courses Links from the show Marco Gorelli: @marcogorelli Marco on LinkedIn: linkedin.com Narwhals: github.io Narwhals on Github: github.com DuckDB: duckdb.org Ibis: ibis-project.org modin: readthedocs.io Pandas and Beyond with Wes McKinney: talkpython.fm Polars: A Lightning-fast DataFrame for Python: talkpython.fm Polars: pola.rs Pandas: pandas.pydata.org Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Nick Schrock and Wes McKinney join us for a chat about composable data stacks, open table formats, managing complexity, and much more.
From creating one of Python's most influential libraries to co-founding Voltron Data, Wes joins the show to chat about why the book cover of the pandas book doesn't feature a panda, open source pitfalls to avoid, the pros and cons of hiring engineers at a non-profit, and more. Segments: (00:02:50) Guang's complaint about the pandas book cover (00:04:38) Quarto and Open Access Publishing (00:12:00) Convincing Wall Street to Open Source (00:15:31) Publishing the first python package over Christmas (00:18:01) Doubling Down on Building pandas (00:23:23) Personal sacrifices for the sake of impact (00:26:28) The Evolution of Open-Source (00:29:19) “Open source development started out as a very privileged activity” (00:32:40) The Consulting Trap (00:35:17) The Startup Trap (00:39:29) The Corporate User Trap (00:44:21) Avoiding the Startup Trap (00:46:54) Non-Profit vs. For-Profit (00:48:09) The Challenges of Hiring Engineers in a Non-Profit Setting (00:50:08) The Benefits of Remote Work for Open Source Development (00:52:15) Balancing Open Source and Enterprise Interests (00:57:25) New Funding Models for Open Source? (01:00:01) Getting into VC (01:06:19) The Future of Composable Data Systems Show Notes: - online edition of pandas book: https://wesmckinney.com/book/ - the new digital publishing tool that Wes recommends: https://quarto.org/ Stay in touch:
Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/ Cody Peterson has a diverse work experience in the field of product management and engineering. Cody is currently working as a Technical Product Manager at Voltron Data, starting from May 2023. Previously, he worked as a Product Manager at dbt Labs from July 2022 to March 2023. MLOps podcast #234 with Cody Peterson, Senior Technical Product Manager at Voltron Data | Ibis project // Open Standards Make MLOps Easier and Silos Harder. Huge thank you to Weights & Biases for sponsoring this episode. WandB Free Courses - http://wandb.me/courses_mlops // Abstract MLOps is fundamentally a discipline of people working together on a system with data and machine learning models. These systems are already built on open standards we may not notice -- Linux, git, scikit-learn, etc. -- but are increasingly hitting walls with respect to the size and velocity of data. Pandas, for instance, is the tool of choice for many Python data scientists -- but its scalability is a known issue. Many tools make the assumption of data that fits in memory, but most organizations have data that will never fit on a laptop. What approaches can we take? One emerging approach with the Ibis project (created by the creator of pandas, Wes McKinney) is to leverage existing "big" data systems to do the heavy lifting on a lightweight Python data frame interface. Alongside other open source standards like Apache Arrow, this can allow data systems to communicate with each other and users of these systems to learn a single data frame API that works across any of them. Open standards like Apache Arrow, Ibis, and more in the MLOps tech stack enable freedom for composable data systems, where components can be swapped out allowing engineers to use the right tool for the job to be done. It also helps avoid vendor lock-in and keep costs low.
// Bio Cody is a Senior Technical Product Manager at Voltron Data, a next-generation data systems builder that recently launched an accelerator-native GPU query engine for petabyte-scale ETL called Theseus. While Theseus is proprietary, Voltron Data takes an open periphery approach -- it is built on and interfaces through open standards like Apache Arrow, Substrait, and Ibis. Cody focuses on the Ibis project, a portable Python dataframe library that aims to be the standard Python interface for any data system, including Theseus and over 20 other backends. Prior to Voltron Data, Cody was a product manager at dbt Labs focusing on the open source dbt Core and launching Python models (note: models is a confusing term here). Later, he led the Cloud Runtime team and drastically improved the efficiency of engineering execution and product outcomes. Cody started his career as a Product Manager at Microsoft working on Azure ML. He spent about 2 years on the dedicated MLOps product team, and 2 more years on various teams across the ML lifecycle including data, training, and inferencing. He is now passionate about using open source standards to break down the silos and challenges facing real world engineering teams, where engineering increasingly involves data and machine learning. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Ibis Project: https://ibis-project.org Apache Arrow and the “10 Things I Hate About pandas”: https://wesmckinney.com/blog/apache-arrow-pandas-internals/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Cody on LinkedIn: https://linkedin.com/in/codydkdc
This episode dives into some of the most important data science libraries from the Python space with one of its pioneers: Wes McKinney. He's the creator or co-creator of the pandas, Apache Arrow, and Ibis projects and an entrepreneur in this space. Episode sponsors Neo4j Mailtrap Talk Python Courses Links from the show Wes' Website: wesmckinney.com Pandas: pandas.pydata.org Apache Arrow: arrow.apache.org Ibis: ibis-project.org Python for Data Analysis - Groupby Summary: wesmckinney.com/book Polars: pola.rs Dask: dask.org Sqlglot: sqlglot.com Pandoc: pandoc.org Quarto: quarto.org Evidence framework: evidence.dev pyscript: pyscript.net duckdb: duckdb.org Jupyterlite: jupyter.org Djangonauts: djangonaut.space Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
Today I had the pleasure of speaking with Wes McKinney. Wes is well known in the open source community for his work building the pandas library. He also has a fascinating background leading the production of other data tooling with a focus on ubiquitous language-agnostic, hardware-optimized analytical computing. In this episode we learn about the origin of pandas, Wes' transition into a new role at Posit, and his perspective on the future of open source. LinkedIn: https://www.linkedin.com/in/wesmckinn/ Twitter: https://twitter.com/wesmckinn Podcast Sponsors, Affiliates, and Partners: - Pathrise - http://pathrise.com/KenJee | Career mentorship for job applicants (Free till you land a job) - Taro - http://jointaro.com/r/kenj308 (20% discount) | Career mentorship if you already have a job - 365 Data Science (57% discount) - https://365datascience.pxf.io/P0jbBY | Learn data science today - Interview Query (10% discount) - https://www.interviewquery.com/?ref=kenjee | Interview prep questions
Wes McKinney is the co-creator of the pandas library and the co-founder of Voltron Data. Currently he is a Principal Architect at Posit and an investor in data systems. Daliana's Twitter: https://twitter.com/DalianaLiu Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/ Wes' LinkedIn: https://www.linkedin.com/in/wesmckinn/ (00:00:00) Introduction (00:00:44) How Pandas Started (00:06:40) Voltron Data (00:10:03) Benefits of Easy-to-Use Data Tools (00:13:20) The Rise of New Data Tools (00:18:07) Choosing Tools: Vertical or Flexible? (00:23:01) Big Models and Data Tools (00:29:29) Challenges in Building a Product (00:31:28) Becoming a Top Architect (00:34:55) Missed Aspects of Previous Roles (00:39:04) A Busy Week: Advising, Designing, Investing (00:43:42) Improving Open Source (00:45:24) How to Decide What to Work On (00:46:28) What he's learning now (00:47:56) Excitement in Career and Life (00:48:29) Using ChatGPT for Learning (00:50:27) Future Impact Goals
How do you avoid the bottlenecks of data processing systems? Is it possible to build tools that decouple storage and computation? This week on the show, creator of the pandas library Wes McKinney is here to discuss Apache Arrow, composable data systems, and community collaboration.
Highlights from this week's conversation include: Introduction of the panel (0:05), Defining composable data stack (5:22), Components of a composable data stack (7:49), Challenges and incentives for composable components (10:37), Specialization and modularity in data workloads (13:05), Organic evolution of composable systems (17:50), Efficiency and common layers in data management systems (22:09), The IR and Data Computation (23:00), Components of the Storage Layer (26:16), Decoupling Language and Execution (29:42), Apache Calcite and Modular Frontend (36:46), Data Types and Coercion (39:27), Describing Data Sets and Schema (42:00), Open Standards and Frontiers (46:22), Challenges of standardizing APIs (48:15), Trade-offs in building composable systems (54:04), Evolution of data system composability (56:32), Exciting new projects in data systems (1:01:57), Final thoughts and takeaways (1:17:25). The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
In this bonus episode, Eric and Kostas preview their upcoming discussion with a panel of experts as Wes McKinney (Co-Founder, Voltron), Pedro Pedreira (Software Engineer, Meta), Chris Riccomini (Seed Investor, various startups), and Ryan Blue (Co-Founder and CEO, Tabular) join the show.
Wes McKinney is the creator of pandas, co-creator of Apache Arrow, and now Co-founder/CTO at Voltron Data. In this conversation with Tristan and Julia, Wes takes us on a tour of the underlying guts, from hardware to data formats, of the data ecosystem. What innovations, down to the hardware level, will stack up to deliver significantly better performance for analytics workloads in the coming years? To dig deeper on the Apache Arrow ecosystem, check out replays from their recent conference at https://thedatathread.com. For full show notes and to read 7+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. The Arrow project is designed to eliminate wasted effort in translating between languages, and Voltron Data was created to help grow and support its technology and community. In this episode Wes McKinney shares the ways that Arrow and its related projects are improving the efficiency of data systems and driving their next stage of evolution.
Wes McKinney, CTO & Co-Founder of Voltron Data, joins me for an in-depth conversation on how his quest to advance Python as an open-source programming language led him to create the pandas project and found four companies. In this episode, Wes and I dive into his unique background as the founder of the pandas project and he describes his perspective on the early days of Python, his journey into the world of open-source start-ups, and the risks and benefits of paying developers to work on open-source projects. Highlights: Wes introduces himself and describes his role (00:46) Wes' role in elevating Python to a mainstream programming language (02:15) How working with Python led Wes to co-founding his first two companies (09:01) Apache Arrow's critical role at Voltron Data and their focus on accelerating Arrow adoption (12:52) How did the team at Voltron Data decide on an open-source business model? (18:54) Wes speaks to the risk that can come from having developers work on an open-source project (22:31) Wes' perspective on the real-world applications and benefits of paying developers to work on open-source projects (27:44) Links: Wes LinkedIn: https://www.linkedin.com/in/wesmckinn/ Twitter: https://twitter.com/wesmckinn Company: https://voltrondata.com/
Follow Wes's music @WeAreValidity / http://www.wearevalidity.com/indivisible/ and CLICK HERE to stream on Spotify! Get Tickets HERE for Validity Album Release Party on Saturday, 9/17 at Uncommon Ground! Check out the article referenced by Jayson Starke and subscribe to The Athletic. ✌️❤️⚾️ --- Support this podcast: https://podcasters.spotify.com/pod/show/kburdtweets/support
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead. Wes McKinney is the CEO of Ursa Computing, a new startup working on accelerated computing. The post Arrow Infrastructure with Wes McKinney appeared first on Software Engineering Daily.
Wes McKinney joins us to discuss the history and philosophy of pandas and Apache Arrow as well as his continued work in open source tools. In this episode you will learn: • History of pandas [7:29] • The trends of R and Python [23:33] • Python for Data Analysis [25:58] • pandas updates and community [30:10] • Apache Arrow [41:50] • Voltron Data [55:10] • Origin of Wes's project names [1:08:14] • Wes's favorite tools [1:09:46] • Audience Q&A [1:15:34] Additional materials: www.superdatascience.com/523
Our guest this week is the one and only Wes McKinney, creator of Pandas and Apache Arrow. We have a great conversation about his career journey, funding and maintaining open-source software projects, his new company Ursa Computing, how Pandas grew from a passion project to the lingua franca of Python data science, and a lot more.
Eric Anderson (@ericmander) and Travis Oliphant (@teoliphant) take a far-reaching tour through the history of the Python data community. Travis has had a hand in the creation of many open-source projects, most notably the influential libraries, NumPy and SciPy, which helped cement Python as the standard for scientific computing. Join us for the story of a fledgling community from a time “before open-source was cool,” and their lessons for today’s open-source landscape. In this episode we discuss: How biomedical engineering, MRIs, and an unhappy tenure committee led to NumPy and SciPy Overcoming early challenges of distribution with Python What Travis would have done differently when he wrote NumPy Successfully solving the “two-option split” by adding a third option Community-driven open-source interacting with company-backed open-source Links: NumPy SciPy Anaconda Quansight Conda Matplotlib Enthought TensorFlow PyTorch MXNet PyPi Jupyter pandas People mentioned: Guido van Rossum (@gvanrossum) Robert Kern (Github: @rkern) Pearu Peterson (Github: @pearu) Wes McKinney (@wesmckinn) Charles Harris (Github: @charris) Francesc Alted (@francescalted) Fernando Perez (@fperez_org) Brian Granger (@ellisonbg) Other episodes: TensorFlow with Rajat Monga
In this episode of the Data Exchange I speak with Wes McKinney, Director of Ursa Labs and an Apache Arrow PMC Member. Wes is the creator of pandas, one of the most widely used Python libraries for data science. He is also the author of the best-selling book, “Python for Data Analysis” – a book that has become essential reading for both aspiring and experienced data scientists. Our conversation focused on data science tools and other topics including: Two open source projects Wes has long been associated with: pandas and Apache Arrow. The need for a shared infrastructure for data science. Ursa Labs: its mission and structure. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
In this episode of the Data Exchange I speak with Edmon Begoli, Chief Data Architect at Oak Ridge National Laboratory (ORNL). Edmon has developed and implemented large-scale data applications on systems like Open MPI, Hadoop/MapReduce, Apache Calcite, Apache Spark, and Akka. Most recently he has been building large-scale machine learning and natural language applications with Ray, a distributed execution framework that makes it easy to scale machine learning and Python applications. Our conversation included a range of topics, including: Edmon's role at the ORNL and his experience building applications with Hadoop and Spark. What is distributed online learning? Why they started using Ray to build distributed online learning applications. Two important use cases: suicide prevention among US veterans and infectious disease surveillance. Detailed show notes can be found on The Data Exchange web site. Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.
In this episode of the Data Exchange I speak with Krishna Gade, founder and CEO at Fiddler Labs, a startup focused on helping companies build trustworthy and understandable AI solutions. Prior to founding Fiddler, Krishna led engineering teams at Pinterest and Facebook. Our conversation included a range of topics, including: Krishna's background as an engineering manager at Facebook and Pinterest. Why Krishna decided to start a company focused on explainability. Guidelines for companies who want to begin working on incorporating model explainability into their data products. The relationship between model explainability (transparency) and security (ML that can resist adversarial attacks). Detailed show notes can be found on The Data Exchange web site. Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.
In this episode of the Data Exchange I speak with Dean Wampler, Head of Developer Relations at Anyscale, the startup founded by the creators of Ray. Ray is a distributed execution framework that makes it easy to scale machine learning and Python applications. It has a very simple API and as someone who uses both Python and machine learning, Ray has been a wonderful addition to my toolbox. Dean has long been one of my favorite architects, speakers and teachers, and we have known each other since the early days of Apache Spark. He has authored numerous books and is known for his interest in Scala and programming languages, as well as in software architecture. Our conversation spanned many topics, including: What is Ray and why should someone consider using it? The first Ray Summit (May 27-28 in San Francisco). Dean's first impressions of Ray, and his journey from Scala to Python. An update on Ray's core libraries, Ray on Windows, and distributed training with Ray. Detailed show notes can be found on The Data Exchange web site. For more on Ray and scalable machine learning & Python, come hear from Dean Wampler, Michael Jordan, Ion Stoica, Manuela Veloso, Wes McKinney and many other leading developers and researchers at the first Ray Summit in San Francisco (May 27-28).
Sergio #1: Geocomputation with R - another example of using Bookdown to create technical books with R. Rodo #2: D-Tale - a Flask/React client for visualizing pandas data structures. D-Tale combines Flask on the back end and React on the front end to give us an easy way to view and analyze pandas data structures. It integrates seamlessly with Jupyter notebooks and Python/IPython terminals. It supports pandas objects such as DataFrame, Series, MultiIndex, DatetimeIndex, and RangeIndex. Sergio #3: The person behind the app that maps every street in a city has another app that makes ridgeline plots of maps. This seems to be a recurring theme here at QUAIL data, haha. Sergio #4: What is the tidyverse? by Rafa Gouveia - https://www.youtube.com/watch?v=uGg13_qOwhQ&list=PLbDLkhJ5sFvCWFbP4tAFALHkNWNFo_FiL - 8 tools. Rodo #5: A brief recap of PyCon Colombia 2020 - incredible keynote speakers such as Andrew Godwin, Wes McKinney, Sarah Guido, and Fernando Pérez, among others. Incredible workshops, with a full track on Data Science, Web Development, IoT, and more. Repo for my workshop: https://github.com/RodolfoFerro/PyConCo20 Rodo #6: Thinc.ai - a refreshing functional take on deep learning, compatible with YOUR FAVORITE libraries. You can switch between frameworks. It performs type checking. Thinc lets us describe trees of objects, with references to our own functions, through .cfg files. It's super lightweight. From the creators of spaCy and FastAPI... Extras: Sergio: Computational journalism - a Columbia University class - a repository with the notebooks they are using in the 2020 class. Ines Montani's starter repositories for creating Python courses (https://github.com/ines/course-starter-python) and R courses (https://github.com/ines/course-starter-r). Rodo: R for Data Science and Reinforcement Learning meetup in Monterrey on February 18!
XII Congreso Mexicano de Inteligencia Artificial (12th Mexican Congress on Artificial Intelligence), May 20-22, 2020, in Ciudad Juárez, Chihuahua, Mexico: http://smia.mx/comia/2020/ Lovely people of Mérida and surroundings, come to the Datostada: https://datostada.mx Meme of the week: https://www.reddit.com/r/mathmemes/comments/f3eq3o/absolutely/ --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app --- Send in a voice message: https://anchor.fm/quaildata/message Support this podcast: https://anchor.fm/quaildata/support
The Data Life Podcast is a podcast where we talk all about real-life experiences with data and data science tools, techniques, models and personalities. In this episode, we will talk about how Pandas is becoming a tool of choice for many data scientists for doing their data analysis work. We will explore how Pandas wins over Excel in several key areas that are important for businesses today: 1) Large dataset sizes 2) Different kinds of input formats such as JSON, CSV, HTML, SQL etc 3) Complex business logic 4) Linking data analysis work to websites and databases 5) Cost Pandas has lots of helpful functions such as read_csv, read_json, read_sql that allow easy input of data into dataframes. DataFrames have several useful methods like "describe", "value_counts", "groupby", "loc" and more that allow easy understanding of your dataset. It also supports plotting out of the box with the "plot" method. We also cover how Pandas differs from SQL in things like ease of handling time series data, visualizations and more. Tune in to the episode to learn more about how Pandas might be the tool for your data analysis needs to take your business to the next level! Fantastic Resources: 1) Book by Pandas creator Wes McKinney: https://www.amazon.com/dp/1491957662/?tag=omnilence-20 2) Great workshop video by Kevin Markham in PyCon: https://www.youtube.com/watch?v=0hsKLYfyQZc 3) Input output methods for Pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html 4) Comparison of some operations of Pandas with SQL https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html Thanks for listening! Please consider supporting this podcast from the link in the end. --- Send in a voice message: https://anchor.fm/the-data-life-podcast/message Support this podcast: https://anchor.fm/the-data-life-podcast/support
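The pandas methods the episode mentions fit in a few lines. A toy sketch, assuming pandas is installed (the sales data here is made up; `io.StringIO` stands in for a CSV file on disk):

```python
import io

import pandas as pd

# read_csv accepts paths, URLs, and file-like objects alike;
# read_json and read_sql follow the same pattern for other formats.
csv = io.StringIO(
    "region,rep,sales\n"
    "North,Ana,120\n"
    "North,Ben,95\n"
    "South,Ana,210\n"
    "South,Cal,80\n"
)
df = pd.read_csv(csv)

print(df.describe())                     # summary statistics for numeric columns
print(df["region"].value_counts())       # frequency of each region
totals = df.groupby("region")["sales"].sum()
print(totals.loc["North"])               # 215
```

From here, `totals.plot(kind="bar")` gives the out-of-the-box plotting the episode mentions, which is one of the areas where pandas pulls ahead of a spreadsheet workflow.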
Welcome to PyDataMCR episode 6, today we are talking to Tania Allard who is a Developer Advocate for Microsoft here in Manchester. Find out what a developer advocate does. We discuss advice for dealing with burnout in the modern world. Show Notes Tania twitter.com/@ixek Jupyter - https://jupyter.org/ Azure - https://azure.microsoft.com/en-gb/ QuantStack - https://quantstack.net/ NIPS - https://nips.cc/ Binder - https://gke.mybinder.org/ Shout outs - Carol Willing - twitter.com/@WillingCarol - Katharine Jarmul - twitter.com/@kjam - Satya Nadella - twitter.com/@satyanadella - Wes McKinney - twitter.com/@wesmckinn - Limor Fried - https://www.linkedin.com/in/ladyada/ Sponsors Cathcart Associates - https://www.cathcartassociates.com/
Python has become one of the dominant languages for data science and data analysis. Wes McKinney has been working for a decade to make tools that are easy and powerful, starting with the creation of Pandas, and eventually leading to his current work on Apache Arrow. In this episode he discusses his motivation for this work, what he sees as the current challenges to be overcome, and his hopes for the future of the industry.
The sixth episode covers one of the best-known and most widely used Python libraries: "Pandas". This time our expert guest is Simon, who tells us more about what Pandas can do. Shownotes Our email for questions, suggestions & comments: hallo@python-podcast.de News Shared memory for multiprocessing (struct, if you want to do it by hand) Operators for dictionaries Pandas Pandas cheat sheets part 1, part 2 Tutorial notebook by Jochen Units of measurement for dataframes with pint (not yet released) - uses the new extension array API First look at the data in the Pandas workflow with df.head(), df.tail() and df.describe() df.apply() More for advanced users: the Modern Pandas article on pivot, stack and unstack Pandas 2.0 Podcasts and talks Jeff Reback - What is the Future of Pandas Wes McKinney's Career In Python For Data Analysis - Episode 203 Episode #200: Escaping Excel Hell with Python and Pandas R R, RStudio Shiny Picks Django Chat Podcast (Jochen) Django-ORM-like, but for flat files: alkali (Jochen) Matplotlib to Plotly (Simon) pickle (Dominik) Open day at konektom
In the third episode of our Python podcast, for once it's not actually much about Python. Jochen talks about what he's doing on the web and the struggles he's currently running into there. A pretty chaotic episode. Pure Christmas stress :) Shownotes Our email for questions, suggestions & comments: hallo@python-podcast.de Browser engines: WebKit, Blink Do you believe: Church of Google? Free SSL certificates from Let's Encrypt IBM Z-series mainframes Python with graphql-graphene Apollo Chrome extension, a debugging tool for GraphQL Wes McKinney: 10 things I hate about pandas If anyone finds Heinz Nixdorf's CeBIT opening speech or other useful things, please let us know.
Wes McKinney is the creator and "Benevolent Dictator for Life" (BDFL) of the open-source pandas package for data analysis in Python, and has also authored two versions of the reference book Python for Data Analysis. Wes is also one of the co-creators of the Apache Arrow project, which is currently his main focus. Most recently, he is the founder of Ursa Labs, a not-for-profit open source development group in partnership with RStudio. He describes himself as a problem-solver, and is particularly interested in improving the usability of data tools for programmers, accelerating data access and in-memory data processing performance, and improving data system interoperability. In my conversation with Wes today, we focused on getting to know Wes on a more personal level, discussing his background and interests to get some insight into the living legend of open source he has become. [3:48] How did coming from four generations of newspapermen impact Wes's upbringing? [6:00] What kind of hobbies was he interested in growing up, and what is the origin of his interest in computers? [11:08] How did he come to run a GoldenEye 007 world record website, and update and maintain it by hand? [16:10] Wes's high school career as a mathlete, and how an early interest in math contributed to his approach to programming. [18:15] How Wes brings the rigor he learned in mathematics to software engineering. [19:50] How languages and math scratch the same itch for composition. [21:00] About learning enough German to complete a PHP programming internship in Munich. [23:00] How Wes's experience using data in his first year working post-undergrad set him down the path to Pandas. [25:00] What went into his decision to take leave from grad school to build Pandas? [27:00] The legendary tweet where Wes expressed his sense of purpose and motivation in building Pandas. [29:52] Why Wes's work is motivated by the desire to free up people's time to realize their full potential.
[30:51] Zero to One by Peter Thiel. [31:40] Why is solving basic efficiency problems, like reading CSV files, so important? [34:12] How community management has played such a huge role in making pandas so successful compared to other tools. [39:00] The importance of seeing peers in an open source project as people with good intentions, not just a GitHub profile. [46:00] How do the incentives of an open source project influence prioritization? [51:45] How Wes's newest project, Ursa Labs, is tackling the problem of funding open source software development. [56:20] Wes's goals for Ursa Labs over the next five years. AJ's Twitter: https://twitter.com/ajgoldstein393 Wes's Twitter: https://twitter.com/wesmckinn Wes's personal website: http://wesmckinney.com Wes's LinkedIn: https://www.linkedin.com/in/wesmckinn/
Hugo speaks with Wes McKinney, creator of the pandas project for data analysis tools in Python and author of Python for Data Analysis, among many other things. Wes and Hugo talk about data science tool building, what it took to get pandas off the ground, and how he approaches building "human interfaces to data" to make individuals more productive. On top of this, they'll talk about the future of data science tooling, including the Apache Arrow project and how it can facilitate that future, the importance of DataFrames that are portable between programming languages, and building tools that facilitate data analysis work in the big data limit. Pandas initially arose from Wes noticing that people were nowhere near as productive as they could be due to a lack of tooling, and the projects he's working on today, which they'll discuss, arise from the same place and present a bold vision for the future. LINKS FROM THE SHOW: DataFramed Survey (take it so that we can make an even better podcast for you); DataFramed Guest Suggestions (who do you want to hear on Season 2?). FROM THE INTERVIEW: Wes on Twitter; Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure by Nadia Eghbal; pandas, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language; Ursa Labs. FROM THE SEGMENTS: Data Science Best Practices (with Ben Skrainka ~17:10); To Explain or To Predict? (by Galit Shmueli); Statistical Modeling: The Two Cultures (by Leo Breiman); The Book of Why (by Judea Pearl & Dana Mackenzie); Studies in Interpretability (with Peadar Coyle at ~39:00); Modelling Loss Curves in Insurance with RStan (by Mick Cooney); Lime: Explaining the predictions of any machine learning classifier; Probabilistic Programming Primer. Original music and sounds by The Sticks.
Hilary and Roger follow up on Uber's self-driving car accident, sensitivity and specificity, Ronald Fisher's final destination, and Wes McKinney's announcement of Ursa Labs. Also, this episode debuts a new segment: Hilary at Work! Show notes: Cambridge Analytica shutting down: https://www.gizmodo.com.au/2018/05/cambridge-analytica-is-shutting-down/ Sensitivity and specificity for automated cars: Uber report - https://www.theverge.com/2018/5/7/17327682/uber-self-driving-car-decision-kill-swerve Sensitivity and specificity revisited - “Amazon, laugh”: https://www.geekwire.com/2018/mystery-solved-amazon-explains-cracked-alexa-ais-laugh-response-will-change/ International chart day: https://twitter.com/repmarktakano/status/989588551443664896?s=21 NSSD Episode 7 "Statistical Royalty" where Hilary meets Gosset's great-great-grandson: http://nssdeviations.com/episode-7-statistical-royalty Support us through our Patreon page: https://www.patreon.com/NSSDeviations Roger on Twitter: https://twitter.com/rdpeng Hilary on Twitter: https://twitter.com/hspter Get the Not So Standard Deviations book: https://leanpub.com/conversationsondatascience/ Subscribe to the podcast on Apple Podcasts: https://itunes.apple.com/us/podcast/not-so-standard-deviations/id1040614570 Subscribe to the podcast on Google Play: https://play.google.com/music/listen?u=0#/ps/Izfnbx6tlruojkfrvhjfdj3nmna Find past episodes: http://nssdeviations.com Contact us at nssdeviations@gmail.com
Chang She started his career at hedge fund AQR Capital Management before leaving finance to co-found DataPad with his college classmate, Wes McKinney. DataPad was acquired by Cloudera in 2014 and now Chang manages a team within Cloudera.
Mark is joined in this episode of Drill to Detail by Wes McKinney to talk about the origins of the Python pandas open-source package for data analysis, his subsequent work as a contributor to the Kudu (incubating) and Parquet projects within the Apache Software Foundation, and Arrow, an in-memory data structure specification for engineers building data systems and the de-facto standard for columnar in-memory processing and interchange.
Good morning, everyone! Today, as I did last Friday, I'm here to present technical trading books. This time they're more technology-oriented, aimed at those who want to combine the technical side with trading. I'll split them into two distinct strands, and I'll warn you up front that all the books I'm going to cover are ones I've either read or have queued up to read. Some I'll recommend simply because I found them interesting and am waiting to read them once I finish the rest. Another thing worth saying is that most of the books (actually, I think all of them) are in English. In Spanish there's nothing worthwhile on the subject, or at least nothing I've come across. Before starting, though, I want to say two things. First, this is part one of the technical trading books podcast, because there's plenty of material, no doubt. Second, all the books I'll cover today are aimed above all at people who already know how to program, or who want to learn the hard way. Sorry to put it so bluntly, but the latter may be people who want to use their trading skills to automate processes. Well, those people should keep in mind that it will be hard at first, because they not only have to grapple with understanding how to do things, but also rewire their thinking at the programming level and understand that everything is different and logical. I know people who program very well despite coming from a humanities background. In the end, as with everything, you have to put in hours of work and study. So let's start with the generalist part. What does that mean? Very simple: before starting with the technical part, you have to study applied programming, that is, the programming languages that will let us write this code. The most important thing is knowing what kind of code we'll be writing.
We can use programming languages that are more tied to a specific platform, or more general-purpose ones. Let me give an example. When we set out to use a platform like MetaQuotes' MetaTrader 4, we need to know the C programming language and also apply what they add on top of it to place and manage trades. That's no small thing, and in fact they've written an entire manual (no alternative books needed) so you can read their documentation. Besides that, there's a long and extensive collection of very interesting articles on how to use their MT4 platform, with countless use cases and alternative connectors. Other languages, such as EasyLanguage, are used on platforms like ProRealTime, TradeStation, or MultiCharts; they give you their own manual for learning it, but even though it's called EasyLanguage, it won't be so easy for beginners, since you have to understand and know programming logic to get the most out of it, as I'm sure you all appreciate. On the generalist side, there are programming languages like Python and R. I'll focus mainly on these two, since next week we'll review what can be done with them, which is by no means little, as you'll see. That's why I'll concentrate on these two for now, although later on, and in the courses, I'll also touch on languages like Java. I'll do a bit of a review of C++ as well, though less so, since many people prefer not to go down to such a low level unless it's for something very specific. For those of you who want a different language covered, just let me know and I'll gladly do it! As for Python books (leaving aside the trading angle), we have the option of learning quickly or at a somewhat more normal pace.
If you don't have much time to read, I recommend: – Python: Learn Python in One Day and Learn It Well by Jamie Chan. This book practically reads itself. It's super simple. In a few hours you learn the basics of Python: enough to put ideas into practice and get comfortable enough to start. You don't need more. Since I'm a huge fan of O'Reilly books, I'm going to recommend a few of them in this podcast, starting with: – Learning Python: Powerful Object-Oriented Programming by Mark Lutz. It's one of the most comprehensive I've ever found. If you want to go deep into Python, you really can't miss this one. Without a doubt. – Python for Data Analysis by Wes McKinney. What I liked most about this book is that it doesn't just teach Python from a data perspective; it explains everything with practical examples. From minute zero it pulls data from different sources and processes it to show you the potential of these languages. Honestly, I'd read it again. Now on to R: – Learning R: A Step-by-Step Function Guide to Data Analysis by Richard Cotton. It's an R fundamentals book, designed so you learn R in depth from a more methodical angle, to the point that it even explains conditionals, functions, and everything else as if you had no programming background at all. For those who want to get started, a recommendable book. – R for Data Science by Hadley Wickham. This book focuses more on handling data: importing it, working with large amounts of information, and more practical, simpler approaches to making heavy use of that data. The advantage of this book is that, for those who want to save some money, the authors have put the whole book online, and here's the link: http://r4ds.had.co.nz/.
And as I said at the beginning, I don't want to go on too long in this podcast, since from these five books I'd like to branch into the ones I find most interesting, which I'll cover next Friday. There I'll talk exclusively about books applied to finance, which are the recommendations many people are really waiting for. In any case, if you need a more specific book on a particular programming language applied to investing, let me know through the contact form at ferranp.com/contactar and I'll reply as soon as possible to resolve your questions. And that's it for today! Just as episode 1 once arrived, I'm now finishing episode 50. Remember to subscribe to the channel, give me a like on iVoox and five stars on iTunes! Thank you very much! Have a good weekend, everyone! See you Monday! The post 50. Technical trading books I appeared first on Ferran P.
Naoya Ito joins as a guest to talk about design patterns, Python, Pandas, data science, management, and more. Show Notes: Masuda (Hatena Anonymous Diary); Rebuild: 169: Your Blog Can Be Generated By Neural Networks (omo); The Art of Readable Code; Java言語で学ぶデザインパターン入門 マルチスレッド編 (an introduction to design patterns in Java, multithreading edition); gensim; Pandas Data Frame | R Tutorial; Project Jupyter; Is the Data Science market getting flooded?; Network Programming with Perl; Anaconda; Perltidy; Python for Data Analysis - O'Reilly Media; Pythonによるデータ分析入門 (the Japanese edition of Python for Data Analysis); Grumpy: Go running Python!; Compiling Rust to WebAssembly Guide; Rebuild: 97: Minimum Viable Standard (omo); the "plateau" problem in management; 悪いヤツほど出世する; transcript of Naoya Ito's solo CTO Night talk; Renewing Medium's focus.
I'm joined by Wes McKinney (@wesmckinn) and Hadley Wickham (@hadleywickham) on this episode to discuss their joint project Feather. Feather is a file format for storing data frames along with some metadata, to help with interoperability between languages. At the time of recording, libraries are available for R and Python, making it easy for data scientists working in these languages to quickly and effectively share datasets and collaborate.