Open-source Python library for scientific computing
If we want AI systems that actually work in production, we need better infrastructure—not just better models. In this episode, Hugo talks with Akshay Agrawal (Marimo, ex-Google Brain, Netflix, Stanford) about why data and AI pipelines still break down at scale, and how we can fix the fundamentals: reproducibility, composability, and reliable execution. They discuss:
Auphonic, 25 February 2025, Jochen
This is a recap of the top 10 posts on Hacker News on September 19th, 2024. This podcast was generated by wondercraft.ai.
(00:36): Gaining access to anyone's Arc browser without them even visiting a website. Original post: https://news.ycombinator.com/item?id=41597250&utm_source=wondercraft_ai
(01:50): GitHub notification emails used to send malware. Original post: https://news.ycombinator.com/item?id=41596466&utm_source=wondercraft_ai
(03:01): Visual guide to SSH tunneling and port forwarding (2023). Original post: https://news.ycombinator.com/item?id=41596818&utm_source=wondercraft_ai
(04:25): Linux/4004: booting Linux on Intel 4004 for fun, art, and no profit. Original post: https://news.ycombinator.com/item?id=41600756&utm_source=wondercraft_ai
(05:51): CuPy: NumPy and SciPy for GPU. Original post: https://news.ycombinator.com/item?id=41601730&utm_source=wondercraft_ai
(07:07): Contextual Retrieval. Original post: https://news.ycombinator.com/item?id=41598119&utm_source=wondercraft_ai
(08:19): Visualizing Weather Forecasts Through Landscape Imagery. Original post: https://news.ycombinator.com/item?id=41603546&utm_source=wondercraft_ai
(09:24): Why Apple Uses JPEG XL in the iPhone 16 and What It Means for Your Photos. Original post: https://news.ycombinator.com/item?id=41598170&utm_source=wondercraft_ai
(10:41): Training Language Models to Self-Correct via Reinforcement Learning. Original post: https://news.ycombinator.com/item?id=41600179&utm_source=wondercraft_ai
(11:55): Openpilot – Operating system for robotics. Original post: https://news.ycombinator.com/item?id=41600177&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio-quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai
Talk Python To Me - Python conversations for passionate developers
Python performance has come a long way in recent times. And it's often the data scientists, with their computational algorithms and large quantities of data, who care the most about this form of performance. It's great to have Stan Seibert back on the show to talk about Python's performance for data scientists. We cover a wide range of tools and techniques that will be valuable for many Python developers and data scientists. Episode sponsors Posit Talk Python Courses Links from the show Stan on Twitter: @seibert Anaconda: anaconda.com High Performance Python with Numba training: learning.anaconda.cloud PEP 0703: peps.python.org Python 3.13 gets a JIT: tonybaloney.github.io Numba: numba.pydata.org LanceDB: lancedb.com Profiling tips: docs.python.org Memray: github.com Fil: a Python memory profiler for data scientists and scientists: pythonspeed.com Rust: rust-lang.org Granian Server: github.com PIXIE at SciPy 2024: github.com Free threading Progress: py-free-threading.github.io Free Threading Compatibility: py-free-threading.github.io caniuse.com: caniuse.com SPy, presented at PyCon 2024: us.pycon.org Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy
We're following one simple rule to build a Linux desktop so stable it could outlive us.
Sponsored By:
Tailscale: Tailscale is a programmable networking software that is private and secure by default - get it free on up to 100 devices!
Kolide: Kolide is a device trust solution for companies with Okta, and they ensure that if a device isn't trusted and secure, it can't log into your cloud apps.
Core Contributor Membership: Save $3 a month on your membership, and get the Bootleg and ad-free version of the show. Code: MAY
Support LINUX Unplugged
Links:
Topics covered in this episode: NumFOCUS concerns leaping pytest debugger llm Extra, Extra, Extra, PyPI has completed its first security audit Extras Joke Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training The Complete pytest Course Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: NumFOCUS concerns Suggested by Pamphile Roy Write up of the current challenges faced by NumFOCUS, by Paul Ivanov (one of the OG of Scientific Python: Jupyter, Matplotlib, etc.) Struggling to meet the needs of sponsored and affiliated projects. In February, NumFOCUS announced it is moving in a new direction. NumFOCUS initiated an effort to run an election for open board seats and proposed changing its governance structure. Some projects are considering and actively pursuing alternative venues for fiscal sponsorship. Quite a bit more detail and discussion in the article. NumFOCUS covers a lot of projects NumPy, Matplotlib, pandas, Jupyter, SciPy, Astropy, Bokeh, Dask, Conda, and so many more. Michael #2: leaping pytest debugger llm You can ask Leaping questions like: Why am I not hitting function x? Why was variable y set to this value? What was the value of variable x at this point? What changes can I make to this code to make this test pass? 
Brian #3: Extra, Extra, Extra, 2024 Developer Summit Also suggested by Pamphile, related to Scientific Python The Second Scientific Python Developer Summit, June 3-5, Seattle, WA Lots of great work came out of the First Summit in 2023 pytest-regex - Use regexes to specify tests to run Came out of the '23 summit I'm not sure if I'm super happy about this or a little afraid that I probably could use this. Still, cool that it's here. Cool short example of using __init__ and __call__ to hand-roll a decorator. ruff got faster Michael #4: PyPI has completed its first security audit Trail of Bits spent a total of 10 engineer-weeks of effort identifying issues, presenting those findings to the PyPI team, and assisting us as we remediated the findings. Scope: The audit was focused on "Warehouse", the open-source codebase that powers pypi.org As a result of the audit, Trail of Bits detailed 29 different advisories discovered across both codebases. When evaluating severity level of each advisory, 14 were categorized as "informational", 6 as "low", 8 as "medium" and zero as "high". Extras Brian: pytest course community to try out Podia Communities. Anyone have a podia community running strong now? If so, let me know through Mastodon: @brianokken@fosstodon.org Want to join the community when it's up and running? Same. Or join our friends of the show list, and read our newsletter. I'll be sure to drop a note in there when it's ready. Michael: VS Code AMA @ Talk Python [video] Gunicorn CVE Talk submissions are now open for both remote and in-person talks at the 2024 PyConZA. The conference will be held on 3 and 4 October 2024 in Cape Town, South Africa. Details are on za.pycon.org. FlaskCon 2024 will be happening Friday, May 17 inside PyCon US 2024. Call for proposals are now live! Joke: Debugging with your eyes
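The hand-rolled decorator pattern mentioned above, using __init__ to capture the function and __call__ to wrap it, looks roughly like this. A minimal illustrative sketch; the class and function names here are invented, not taken from the summit write-up:

```python
class CountCalls:
    """Class-based decorator: __init__ receives the function, __call__ wraps it."""

    def __init__(self, func):
        self.func = func
        self.count = 0  # per-function state a class-based decorator can carry neatly

    def __call__(self, *args, **kwargs):
        self.count += 1
        return self.func(*args, **kwargs)


@CountCalls
def greet(name):
    return f"Hello, {name}!"
```

After decoration, greet is a CountCalls instance, so greet("world") still returns the greeting while greet.count records how many times it has been invoked.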
Explore the origins of NumPy and SciPy with their creator, Dr. Travis Oliphant. Discover the journey from personal need to global impact, the challenges overcome, and the future of these essential Python libraries in scientific computing and data science. This episode is brought to you by the DataConnect Conference (https://www.dataconnectconf.com/dccwest/conference), by Data Universe, the out-of-this-world data conference (https://datauniverse2024.com), and by CloudWolf (https://www.cloudwolf.com/sds), the Cloud Skills platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • Travis's journey to creating NumPy and SciPy [08:05] • How Anaconda got started [42:24] • How Numba, a high-performance Python compiler, was brought to market [54:48] • Python's influence on the thought processes of scientists and engineers [1:04:21] • The commercial projects that support Travis's vast open-source efforts and communities [1:10:22] • How to get involved in Travis's commercial projects and communities [1:22:34] • The future of scientific computing and Python libraries [1:29:50] Additional materials: www.superdatascience.com/765
Happy leap day, everyone! Very excited to bring you a special once-every-four-years edition of ThursdAI.
Hugo speaks with Johno Whitaker, a Data Scientist/AI Researcher doing R&D with answer.ai. His current focus is on generative AI, flitting between different modalities. He also likes teaching and making courses, having worked with both Hugging Face and fast.ai in these capacities. Johno recently reminded Hugo how hard everything was 10 years ago: “Want to install TensorFlow? Good luck. Need data? Perhaps try ImageNet. But now you can use big models from Hugging Face with hi-res satellite data and do all of this in a Colab notebook. Or think ecology and vision models… or medicine and multimodal models!” We talk about where we've come from regarding tooling and accessibility for foundation models, ML, and AI, where we are, and where we're going. We'll delve into What the Generative AI mindset is, in terms of using atomic building blocks, and how it evolved from both the data science and ML mindsets; How fast.ai democratized access to deep learning, what successes they had, and what was learned; The moving parts now required to make GenAI and ML as accessible as possible; The importance of focusing on UX and the application in the world of generative AI and foundation models; The skillset and toolkit needed to be an LLM and AI guru; What they're up to at answer.ai to democratize LLMs and foundation models. 
LINKS The livestream on YouTube (https://youtube.com/live/hxZX6fBi-W8?feature=share) Zindi, the largest professional network for data scientists in Africa (https://zindi.africa/) A new old kind of R&D lab: Announcing Answer.AI (http://www.answer.ai/posts/2023-12-12-launch.html) Why and how I'm shifting focus to LLMs by Johno Whitaker (https://johnowhitaker.dev/dsc/2023-07-01-why-and-how-im-shifting-focus-to-llms.html) Applying AI to Immune Cell Networks by Rachel Thomas (https://www.fast.ai/posts/2024-01-23-cytokines/) Replicate -- a cool place to explore GenAI models, among other things (https://replicate.com/explore) Hands-On Generative AI with Transformers and Diffusion Models (https://www.oreilly.com/library/view/hands-on-generative-ai/9781098149239/) Johno on Twitter (https://twitter.com/johnowhitaker) Hugo on Twitter (https://twitter.com/hugobowne) Vanishing Gradients on Twitter (https://twitter.com/vanishingdata) SciPy 2024 CFP (https://www.scipy2024.scipy.org/#CFP) Escaping Generative AI Walled Gardens with Omoju Miller, a Vanishing Gradients Livestream (https://lu.ma/xonnjqe4)
Topics covered in this episode: uv: Python packaging in Rust jpterm Everything You Can Do with Python's textwrap Module HTML First Extras Joke Watch on YouTube About the show Sponsored by ScoutAPM: pythonbytes.fm/scout Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. First, we are likely skipping next week folks. I'll be at PyCon Philippines. Brian #1: uv: Python packaging in Rust Suggested by Collin Sullivan “uv is designed as a drop-in replacement for pip and pip-tools” Intended to support the pip and pip-tools APIs, just use uv pip instead. Oh yeah, also replaces venv and virtualenv. And it's super zippy, as you would expect. I'm still getting used to it uv pip venv didn't have --prompt at first. But that's fixed. should get released soon. first thing I tried uv pip install ./ and uv pip install pytest second. worked awesome uv pip list third thing I tried not there either, but uv pip freeze is similar. Issue already filed Seriously, I'm excited about this. It's just that it seems I wasn't the target workflow for this. See also tox-uv - speed up tox with uv rye (https://lucumr.pocoo.org/2024/2/15/rye-grows-with-uv/) from Armin Ronacher, will be supported by Astral - MK: Switched to this for dev. It's excellent. For some reason, doesn't work on Docker? From Henry Michael #2: jpterm via David Brochart jpterm is a JupyterLab-like environment running in the terminal. What sets jpterm apart is that it builds on the shoulders of giants, one of which is Textual. It is designed similarly to JupyterLab, where everything is a plugin. Brian #3: Everything You Can Do with Python's textwrap Module Martin Heinz Nice quick demo of one of my favorite builtin modules. 
Features shorten text and insert placeholders wrap can split lines to the same length but can also just split a string into equal chunks for batch processing TextWrapper class does all sorts of fancy stuff. dedent is my fave. Awesome for including a multiline string in a test function as an expected outcome. Michael #4: HTML First HTML First is a set of guidelines for making it easier, faster and more maintainable to build web software Principles Leveraging the default capabilities of modern web browsers. Leveraging the extreme simplicity of HTML's attribute syntax. Leveraging the web's ViewSource affordance. Practices Prefer Vanilla approaches Use HTML attributes for styling and behaviour Use libraries that leverage HTML attributes Avoid Build Steps Prefer Naked HTML Be View-Source Friendly Extras Brian: pytest 8.0.1 released. Fixes the parametrization order reversal I mentioned a couple episodes ago, plus some other fixes. Learn about dependency injection from Hynek If you want to jump into some Rust to help speed up Python tools, maybe check out yarr.fyi I just interviewed Nicole, the creator, for Python Test, and this looks pretty cool Her episode should come out in a couple of weeks. Ramping up more interviews for Python People. So please let me know if you'd like to be on the show or if you have suggestions for people you'd like me to interview. Also, I know this is weird, some people are still on X, and not like “didn't close their account when they left”, but actually still using it. This is ironically a reverse of X-Files. “I don't want to believe”. However, I've left my account open for those folks. I check it like twice a month. But eventually I'll see it if you DM me. But really, there are easier ways to reach me. Michael: PyData Pittsburgh CFP Wyden: Data Broker Used Abortion Clinic Visitor Location Data To Help Send Targeted Misinformation To Vulnerable Women SciPy 2024 - Call for Proposals Joke: Yeti tumbler
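The textwrap features called out above (shorten with placeholders, wrap, and dedent for expected multiline strings in tests) can be sketched with nothing but the standard library; the sample strings here are made up for illustration:

```python
import textwrap

# shorten: collapse whitespace and truncate to a width, inserting a placeholder
short = textwrap.shorten(
    "The quick brown fox jumps over the lazy dog",
    width=20,
    placeholder=" [...]",
)

# wrap: split text into lines no longer than the given width
lines = textwrap.wrap("one two three four five six seven eight", width=10)

# dedent: strip the common leading whitespace, handy for writing an
# expected multiline outcome inside an indented test function
expected = textwrap.dedent("""\
    line one
    line two
""")
```

Here short becomes "The quick [...]", every entry in lines fits in 10 characters, and expected is "line one\nline two\n" with the indentation removed.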
Topics covered in this episode: Leaving the cloud; PEP 723 - Inline script metadata; Flet for Android; harlequin: The SQL IDE for Your Terminal; Extras; Joke. Watch on YouTube About the show Sponsored by Bright Data: pythonbytes.fm/brightdata Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.
Michael #1: Leaving the cloud Also see Five values guiding our cloud exit: We value independence above all else. We serve the internet. We spend our money wisely. We lead the way. We seek adventure. And we stand to save $7m over five years from our cloud exit. Slice our new monster 192-thread Dell R7625s into isolated VMs, which added a combined 4,000 vCPUs with 7,680 GB of RAM and 384TB of NVMe storage to our server capacity. They created Kamal (Deploy web apps anywhere). A lot of these ideas have changed how I run the infrastructure at Talk Python and for Python Bytes.
Brian #2: PEP 723 - Inline script metadata Author: Ofek Lev This PEP specifies a metadata format that can be embedded in single-file Python scripts to assist launchers, IDEs and other external tools which may need to interact with such scripts. Example:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "requests<3",
#     "rich",
# ]
# ///
import requests
from rich.pretty import pprint

resp = requests.get("https://peps.python.org/api/peps.json")
data = resp.json()
pprint([(k, v["title"]) for k, v in data.items()][:10])
Michael #3: Flet for Android via Balázs Remember Flet? Here's a code sample (scroll down a bit). It's amazing but has been basically impossible to deploy. Now we have Android. Here's a good YouTube video showing the build process for APKs.
Brian #4: harlequin: The SQL IDE for Your Terminal. 
Ted Conbeer & other contributors Works with DuckDB and SQLite Speaking of SQLite Jeff Triplett and warnings of using Docker and SQLite in production Anže's post and article: Django, SQLite, and the Database is Locked Error Extras Brian: Recent Python People episodes Will Vincent Julian Sequeira Pamela Fox Michael: PageFind and how I'm using it When "Everything" Becomes Too Much: The npm Package Chaos of 2024 Essay: Unsolicited Advice for Mozilla and Firefox SciPy 2024 is coming to Washington Joke: Careful with that bike lock combination code
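The PEP 723 inline metadata covered in this episode is just specially fenced comments, so a script's metadata can be recovered with a few lines of standard-library code. This is a simplified sketch for illustration; it handles only exact "# /// script" ... "# ///" fences, not the PEP's full reference regex:

```python
SCRIPT = """\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "requests<3",
#     "rich",
# ]
# ///
print("hello")
"""


def read_inline_metadata(source):
    """Return the TOML text between '# /// script' and '# ///', or None."""
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
        end = lines.index("# ///", start + 1)
    except ValueError:
        return None
    # Drop the leading '# ' (or a bare '#') from each enclosed comment line.
    return "\n".join(
        line[2:] if line.startswith("# ") else line.lstrip("#")
        for line in lines[start + 1 : end]
    )


metadata = read_inline_metadata(SCRIPT)
```

A launcher would then hand this text to a TOML parser (e.g. tomllib) to read requires-python and dependencies.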
Guests Amanda Casari | Julie Ferraioli | Juniper Lovato Panelist Richard Littauer Show Notes In today's episode of Sustain, Richard is joined by guests, Amanda Casari, devrel engineer and open source researcher at Google Open Source Programs Office, Julie Ferraioli, an independent open source strategist, researcher, practitioner, and Partner at Open Chapters, and Juniper Lovato, Director of partnerships and programs at the Vermont Complex Systems Center at UVM and Data Ethics researcher. Amanda, Julia, and Juniper join the discussion, bringing a wealth of expertise in the open source domain. The conversation gravitates towards an article co-authored by the guests, striking a balance between open source software and open source ecosystems research. The episode dives deep into the “10 simple things” format of the article, the crucial importance of collective conversations, and a keen exploration of open source researchers. Hit download now to hear more cool stuff! [00:01:29] Richard tells us why he invited our three guests today and he talks about their previous accomplishments and backgrounds. [00:02:17] Our discussion moves to the title of a new article co-authored by the guests. We hear about the intended audience of the article and the distinction made between open source software and open source ecosystems research. [00:03:31] Richard brings up where the article fits in the academic landscape, and it's revealed to be more editorial than research. [00:04:17] There's a conversation about the “10 simple things” format, its origin, and the motivation behind it. They put an emphasis on the need for collective conversation and the value of sharing experiences and knowledge. [00:07:28] Richard brings up the idea of open source researchers and mentions various figures and institutions involved in open source research. 
Juniper clarifies the target audience for the article and its intentions, Julie shares her perspective from the industry side and the importance of a critical framework, and Amanda expresses her emotional response to some researchers' approach towards the open source community. [00:12:03] Julie discusses the emotional challenges that inspired the paper's best practices emphasizing not repeating negative behaviors, and Juniper notes tension in research between benefits for the community and for the researchers emphasizing understanding norms and values for studying open source communities. [00:13:52] Richard mentions there are nine principles in the paper and asks about the principle regarding treating open source ecosystems as systems “in production.” Amanda highlights the importance of considering the real-world impact of research in open source and mentions an incident where a university was banned from the Linux kernel due to disruptive changes. [00:16:33] Julie emphasizes the potential broader impact on industry systems when modifying open source systems and she raises the point that tampering with open source systems might inadvertently affect critical infrastructure. Amanda comments on the increasing cybersecurity concerns around open source. [00:19:18] Richard brings up a real-world example of a university introducing bugs to the Linux kernel and points out the need for considering ethical implications beyond just production systems. [00:20:59] Richard draws parallels between addressing these issues and addressing racism, and Juniper adds that the scientific process is ongoing and should evolve with technology and societal values. [00:21:53] Julie describes the complexity of open source funding and compensation and points out the challenge in understanding motivations and expectations of open source participants. 
[00:24:07] Amanda emphasizes the difficulty of summarizing each section, noting that each one could be a chapter or book and she expresses her concerns about not just individual equity but organizational equity. [00:25:59] Juniper raises the issue of invisible labor in open source. [00:26:39] Julie highlights the importance of recognizing that open source repository data might not capture all the activity and contributions made by community members. [00:27:37] Amanda discusses the challenges and importance of capturing data, especially when it may put individuals at risk. Juniper stresses the importance of involving communities in the research process and gaining their consent, ensuring their dignity, security, and privacy. [00:29:49] Julie discusses the complexities of identity within the open source community, highlighting that individuals can hold multiple identities in this space. [00:31:10] Richard adds that the insight shared are not just for open source researchers but also for anyone involved in the open source ecosystem. He emphasizes the need to be aware of biases and the importance of understanding the data one works with. [00:32:22] Richard prompts a summary of the main points in the paper, which are read by our guests. [00:34:48] Find out where you can learn more about our guests and their work online. 
Quotes [00:20:08] “Production as the end line for ethical values leads to a lot of really thorny edge cases that are going to ultimately hurt the communities of people who aren't working on production ready systems.” [00:21:20] “Just as open source is always in production, so is the scientific process.” [00:23:24] “Even having the privilege of time to dedicate to open source is not available to all.” [00:24:26] “It's just not individual equity but organizational equity.” [00:25:47] “We can't ignore the very large industry that is open source that has all that money moving around and where it's going is a question we should all be asking.” [00:26:00] “There's a lot of invisible labor in open source.” [00:28:32] “Leaving out communities from the scientific process of the research process leaves open these vulnerabilities without giving them a voice to what kind of research is being done about them without their consent.” [00:29:17] “What we are starting to consider acceptable surveillance in public is really being challenged.” [00:29:33] “It's really important for us to make sure that we're maintaining people's dignity, security, and privacy while we're doing this kind of research.” Spotlight [00:35:45] Richard's spotlight is The Long Trail that he's going to hike. [00:36:17] Amanda's spotlight is contributor-experience.org and the PyPI subpoena transparency report. [00:37:20] Julie's spotlight is the book, Data Feminism. [00:38:09] Juniper's spotlight is a new tool called, XGI. 
Links SustainOSS (https://sustainoss.org/) SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) SustainOSS Discourse (https://discourse.sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) SustainOSS Mastodon (https://mastodon.social/tags/sustainoss) Open Collective-SustainOSS (Contribute) (https://opencollective.com/sustainoss) Richard Littauer Twitter (https://twitter.com/richlitt?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Amanda Casari Twitter (https://twitter.com/amcasari) Amanda Casari Mastodon (https://hachyderm.io/@amcasari) Google Open Source (https://opensource.google/) Open Source Stories (http://opensourcestories.org/) Julia Ferraioli Twitter (https://twitter.com/juliaferraioli) Julia Ferraioli Website (https://www.juliaferraioli.com/) Open Chapters (https://openchapters.tech/) Juniper Lovato Website (https://juniperlovato.com/) Juniper Lovato Twitter (https://twitter.com/juniperlov) Vermont Complex Systems Center-UVM (https://www.complexityexplorer.org/explore/resources/75-vermont-complex-systems-center) Sustain Podcast-Episode 111: Amanda Casari on ACROSS and Measuring Contributions in OSS (https://podcast.sustainoss.org/111) XKCD (https://xkcd.com/) Beyond the Repository: Best practices for open source ecosystems researchers by Amanda Casari, Julia Ferraioli, and Juniper Lovato (https://dl.acm.org/doi/pdf/10.1145/3595879) Operationalizing the CARE and FAIR Principles for Indigenous data futures (scientific data) (https://www.nature.com/articles/s41597-021-00892-0) The Long Trail (https://www.greenmountainclub.org/the-long-trail/) Welcome to the Contributor Experience Handbook (https://contributor-experience.org/) Contributor experience-Why it matters (SciPy 2023) (https://blog.pypi.org/posts/2023-05-24-pypi-was-subpoenaed/) PyPI was subpoenaed by Ee Durbin (https://blog.pypi.org/posts/2023-05-24-pypi-was-subpoenaed/) Data Feminism by Catherine D'Ignazio and Lauren F. 
Klein (https://mitpress.mit.edu/9780262547185/data-feminism/) The CompleX Group Interactions (XGI) (https://xgi.readthedocs.io/en/stable/index.html) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guests: Amanda Casari, Julia Ferraioli, and Juniper Lovato.
This is a recap of the top 10 posts on Hacker News on November 8th, 2023. This podcast was generated by wondercraft.ai.
(00:38): Omegle 2009-2023. Original post: https://news.ycombinator.com/item?id=38199355&utm_source=wondercraft_ai
(02:31): Home Assistant blocked from integrating with Garage Door opener API. Original post: https://news.ycombinator.com/item?id=38188162&utm_source=wondercraft_ai
(04:10): Hard-to-swallow truths they won't tell you about software engineer job. Original post: https://news.ycombinator.com/item?id=38188689&utm_source=wondercraft_ai
(05:45): Major outages across ChatGPT and API. Original post: https://news.ycombinator.com/item?id=38190401&utm_source=wondercraft_ai
(07:31): Spain lives in flats: why we have built our cities vertically. Original post: https://news.ycombinator.com/item?id=38189840&utm_source=wondercraft_ai
(09:27): Chamberlain blocks smart garage door opener from working with smart homes. Original post: https://news.ycombinator.com/item?id=38188614&utm_source=wondercraft_ai
(11:23): EU's concealment of secret 'expert list' on CSAM regulation is maladministration. Original post: https://news.ycombinator.com/item?id=38189790&utm_source=wondercraft_ai
(13:12): SciPy builds for Python 3.12 on Windows are a minor miracle. Original post: https://news.ycombinator.com/item?id=38196412&utm_source=wondercraft_ai
(15:05): Quake Brutalist Jam II. Original post: https://news.ycombinator.com/item?id=38191319&utm_source=wondercraft_ai
(16:41): After luring customers with low prices, Amazon stuffs Fire TVs with ads. Original post: https://news.ycombinator.com/item?id=38194818&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio-quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai
This week we talk about the dual use purposes of eBPF - both for security and for exploitation, and how you can keep your systems safe, plus we cover security updates for the Linux kernel, Ruby, SciPy, YAJL, ConnMan, curl and more.
Hugo speaks with Eric Ma about Research Data Science in Biotech. Eric leads the Research team in the Data Science and Artificial Intelligence group at Moderna Therapeutics. Prior to that, he was part of a special ops data science team at the Novartis Institutes for Biomedical Research's Informatics department. In this episode, Hugo and Eric talk about What tools and techniques they use for drug discovery (such as mRNA vaccines and medicines); The importance of machine learning, deep learning, and Bayesian inference; How to think more generally about such high-dimensional, multi-objective optimization problems; The importance of open-source software and Python; Institutional and cultural questions, including hiring and the trade-offs between being an individual contributor and a manager; How they're approaching accelerating discovery science to the speed of thought using computation, data science, statistics, and ML. And as always, much, much more! LINKS Eric's website (https://ericmjl.github.io/) Eric on twitter (https://twitter.com/ericmjl) Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) Cell Biology by the Numbers by Ron Milo and Rob Phillips (http://book.bionumbers.org/) Eric's JAX tutorials at PyCon (https://youtu.be/ztthQJQFe20) and SciPy (https://youtu.be/DmR36wtel4Y) Eric's blog post on Hiring data scientists at Moderna! (https://ericmjl.github.io/blog/2021/8/26/hiring-data-scientists-at-moderna-2021/)
Dr. Kathryn Huff, Ph.D. ( https://www.energy.gov/ne/person/dr-kathryn-huff ) is Assistant Secretary, Office of Nuclear Energy, U.S. Department of Energy, where she leads their strategic mission to advance nuclear energy science and technology to meet U.S. energy, environmental, and economic needs, both realizing the potential of advanced technology, and leveraging the unique role of the government in spurring innovation. Prior to her current role, Dr. Huff served as a Senior Advisor in the Office of the Secretary and also led the office as the Principal Deputy Assistant Secretary for Nuclear Energy. Before joining the Department of Energy, Dr. Huff was an Assistant Professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign where she led the Advanced Reactors and Fuel Cycles Research Group. She was also a Blue Waters Assistant Professor with the National Center for Supercomputing Applications. Dr. Huff was previously a Postdoctoral Fellow in both the Nuclear Science and Security Consortium and the Berkeley Institute for Data Science at the University of California - Berkeley. She received her PhD in Nuclear Engineering from the University of Wisconsin-Madison and her undergraduate degree in Physics from the University of Chicago. Her research focused on modeling and simulation of advanced nuclear reactors and fuel cycles. Dr. Huff is an active member of the American Nuclear Society, a past Chair of the Nuclear Nonproliferation and Policy Division as well as the Fuel Cycle and Waste Management Division, and recipient of both the Young Member Excellence and Mary Jane Oestmann Professional Women's Achievement awards. Through leadership within Software Carpentry, SciPy, the Hacker Within, and the Journal of Open Source Software she also advocates for best practices in open, reproducible scientific computing. Dr. 
Huff's book "Effective Computation in Physics: Field Guide to Research with Python" can be found at all major booksellers. Support the show
Watch on YouTube Sponsored by us! Support our work through: Our courses at Talk Python Training Test & Code Podcast Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Michael #1: Introducing Microsoft Security Copilot Security Copilot combines an advanced large language model (LLM) with a security-specific model from Microsoft. When Security Copilot receives a prompt from a security professional, it uses the full power of the security-specific model to deploy skills and queries that maximize the value of the latest large language model capabilities. Your data stays within your control. It is not used to train the foundation AI models, and in fact, it is protected by the most comprehensive enterprise compliance and security controls. Brian #2: PEP 695 – Type Parameter Syntax "This PEP specifies an improved syntax for specifying type parameters within a generic class, function, or type alias. It also introduces a new statement for declaring type aliases." To get a feel for this, jump to the examples. Here is an example of a generic function today: from typing import TypeVar _T = TypeVar("_T") def func(a: _T, b: _T) -> _T: ... And the new syntax: def func[T](a: T, b: T) -> T: ... Michael #3: Auto-GPT An experimental open-source attempt to make GPT-4 fully autonomous. This program, driven by GPT-4, chains together LLM "thoughts" to autonomously achieve whatever goal you set. Features
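Brian's PEP 695 item can be made concrete with a small runnable sketch. The new bracketed form requires Python 3.12 or newer, so it is shown in a comment here alongside the pre-3.12 style that runs everywhere:

```python
from typing import TypeVar

_T = TypeVar("_T")

def func(a: _T, b: _T) -> _T:
    """Pre-PEP 695 generic function: the type variable is declared separately."""
    return a

# With PEP 695 (Python 3.12+), the separate TypeVar declaration goes away:
#
#     def func[T](a: T, b: T) -> T:
#         return a

print(func(1, 2))        # 1
print(func("x", "y"))    # "x"
```

Both spellings mean the same thing to a type checker; the new syntax just scopes `T` to the function instead of the module.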
Travis Oliphant is an impactful programmer and data scientist. He is the CEO of OpenTeams & Quansight, the founder of Anaconda, and the creator of NumPy, SciPy, and Numba. On this episode of Mock IT, he joins Marie and guest host Wilson to chat about trustworthy and ethical artificial intelligence (#AI) and machine learning (#ML). Follow Along: + Website: https://bit.ly/3VNM5xL + LinkedIn: https://bit.ly/3DY5oN7 + Instagram: https://bit.ly/3Tmi4mx + Open Jobs: https://bit.ly/3GAscny + Watch Episode: https://youtu.be/jL6AdYeDTaA Helpful Links: + Find Travis: linkedin.com/in/teoliphant/ + OpenTeams: https://bit.ly/3vVuWqw + Quansight: https://quansight.com/ + PyData #Python Event Recap: https://youtu.be/kMCnyLMhuJU + ChatGPT: https://openai.com/blog/chatgpt/ + Google Colab: https://colab.research.google.com/ +ChatGPT in Schools Article: https://apnews.com/article/what-is-chat-gpt-ac4967a4fb41fda31c4d27f015e32660
Beyond chatbots, virtual assistants, and the evolution of GPS systems, artificial intelligence (AI), machine learning (ML), and the tools behind making it all work are shaping the present and future world. It may feel like these technologies are in the infant stage, and perhaps they are, but #AI and #ML solutions are here to stay. Senior Data Scientist Wilson Rodden sat down with podcast hosts Liz and Marie to chat about how humans are kept in the loop of AI/ML, the common misconceptions behind the technology, and whether it's really as dangerous as some people think. Important Links: Meet Travis Oliphant (CEO of OpenTeams & Quansight, the founder of Anaconda, and the creator of NumPy, SciPy, and Numba) for free on Jan. 5: https://bit.ly/3vlQ8Wt Wilson Rodden LinkedIn: www.linkedin.com/in/wilson-rodden Follow Along for More: LinkedIn: https://bit.ly/3vlQ8Wt Instagram: https://bit.ly/3hRiN2F Website: https://bit.ly/3hxHFfe Timecodes: 1:28 Discussing the fourth anniversary of the IDEA Act 2:40 Five tips for managing stakeholder feedback as a UX'er 7:00 Interview with Data Scientist Wilson Rodden starts 8:50 What is AI/ML? 10:14 What is Big Data? 10:59 What is Predictive Modeling? 11:47 What is Computer Vision? 13:13 Common misconceptions about AI/ML 15:27 How to build trust in the government for AI innovations? 17:38 What is a biased algorithm? 24:34 What does human-in-the-loop mean? 35:22 Discussion on AI in the media 37:36 Is AI dangerous? 41:56 A data scientist's role on your team
Guest Melissa Mendonça Panelists Richard Littauer | Amanda Casari Show Notes Hello and welcome to Sustain! The podcast where we talk about sustaining open source for the long haul. Today, we are so excited to have a wonderful guest, Melissa Mendonça, joining us. Melissa is a Senior Developer Experience Engineer at Quansight, where she focuses more on developer experience and contributor experience. Today, we'll hear all about Quansight and the focus for Melissa's role as a Developer Experience Engineer. Melissa tells us about a grant they are working on with CZI that focuses on NumPy, SciPy, Matplotlib, and pandas; she shares several ideas on what can be done to make people feel seen and heard; and we hear her thoughts on what the future of community management and community development looks like for people entering the role on these projects. Go ahead and download this episode now to hear more! [00:01:25] Melissa tells us her background and her role at Quansight. [00:03:41] When Melissa made the decision to switch from one role to another, Amanda asks if that was her plan or if she learned that the skills that she needed to get things done changed over time. [00:06:10] We find out what the focus is for Melissa's role as a Developer Experience Engineer and what she does on a day-to-day basis. [00:08:43] As Melissa was talking about her projects that they work on at Quansight, Amanda wonders if that's the majority of her portfolio, or if she works across different kinds of projects. We learn about the current grant they are working on with CZI that focuses on NumPy, SciPy, Matplotlib, and pandas. [00:13:18] We learn about the funding model and how sustainable it is. [00:16:20] Melissa shares some great ideas on how we can put more effort into making people feel seen and heard. [00:19:26] Melissa details some things she learned with the open source projects and things she recommends for others with large established projects. 
[00:22:44] Amanda talks about a 2020 paper that was released in Nature called “Array programming with NumPy,” and Melissa gives us her perspective on what happened with the community in 2020, if things have changed, and what needs to be addressed. [00:27:09] Find out how CZI got involved with Melissa's work, what their goals are, and how she's changing in order to adapt towards those goals. [00:31:32] Melissa shares her thoughts on what the future of community management and community development looks like for people who are entering the role for those projects. [00:36:40] We hear more about Python Brasil 2022 that's coming up. [00:38:05] Find out where you can follow Melissa online and learn more about her work. Quotes [00:02:49] “Since Quansight is a company very focused on sustaining and helping maintain open source projects, we are trying to help new contributors, people who want to do the move from contributor to maintainer, understanding what that means, and how we can help them get there, and how we can help improve leadership in our open source projects.” [00:11:53] “This is one of the barriers that we want to break, is that making sure that people understand that these are important, they are core projects in the scientific Python ecosystem, but at the same time they are projects just like any other.” [00:12:17] “I think experience of working with projects that are so old and big has taught me a lot about the dynamics of how people work and how new people try to join these projects and how we can improve on that.” [00:16:41] “We need to make sure that people who do contribution outside of code are credited and that they are valued inside open source projects.” [00:18:20] “I think we should think about diversifying these paths for contribution, but for that we need to go beyond GitHub. 
We need to go beyond the current metrics that we have for open source, we need to go beyond the current credit system and reputation system that we have for open source contributions.” [00:30:38] “Community managers are not second-class citizens.” Spotlight [00:39:21] Amanda's spotlight is a 2014 paper from MSR called, “The Promises and Perils of Mining GitHub.” [00:40:48] Richard's spotlight is the book, Don't Sleep, There Are Snakes, by Daniel Everett. [00:41:52] Melissa's spotlights are Ralf Gommers and the Scientific Python initiative. Links SustainOSS (https://sustainoss.org/) SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) SustainOSS Discourse (https://discourse.sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) Richard Littauer Twitter (https://twitter.com/richlitt?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Amanda Casari Twitter (https://twitter.com/amcasari?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor) Melissa Mendonça Twitter (https://twitter.com/melissawm) Melissa Mendonça LinkedIn (https://br.linkedin.com/in/axequalsb) Melissa Mendonça GitHub (https://melissawm.github.io/) Quansight (https://quansight.com/) Quansight Labs (https://labs.quansight.org/) Quansight Lab Projects (https://labs.quansight.org/projects) Quansight Labs Team (https://labs.quansight.org/team) Sustain Podcast-Episode 57: Mikeal Rogers on Building Communities, the Early Days of Node.js, and How to Stay a Coder for Life (https://podcast.sustainoss.org/guests/mikeal) Sustain Podcast-Episode 85: Geoffrey Huntley and Sustaining OSS with Gitpod (https://podcast.sustainoss.org/85) Advancing an inclusive culture in the scientific Python ecosystem (CZI grant for NumPy, SciPy, Matplotlib, and Pandas (https://figshare.com/articles/online_resource/Advancing_an_inclusive_culture_in_the_scientific_Python_ecosystem/16548063) Sustain Podcast-Episode 79: Leah Silen on how NumFocus helps makes scientific 
code more sustainable (https://podcast.sustainoss.org/79) NumPy (https://numpy.org/) SciPy (https://scipy.org/) Matplotlib (https://matplotlib.org/) pandas (https://pandas.pydata.org/) Sustain Podcast-Episode 64: Travis Oliphant and Russell Pekrul on NumPy, Anaconda, and giving back with FairOSS (https://podcast.sustainoss.org/guests/oliphant) Tania Allard Twitter (https://twitter.com/ixek?lang=en) Array programming with NumPy (nature) (https://www.nature.com/articles/s41586-020-2649-2) Python Brasil 2022 (https://2022.pythonbrasil.org.br/) “The Promises and Perils of Mining GitHub,” by Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, Daniela Damian (https://kblincoe.github.io/publications/2014_MSR_Promises_Perils.pdf) “The Promises and Perils of Mining GitHub,” by Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, Daniela Damian (ACM Digital Library) (https://dl.acm.org/doi/10.1145/2597073.2597074) Daniel Everett (Wikipedia) (https://en.wikipedia.org/wiki/Daniel_Everett#Don't_Sleep,_There_Are_Snakes:_Life_and_Language_in_the_Amazonian_Jungle) Excerpt: ‘Don't Sleep, There Are Snakes' (npr) (https://www.npr.org/2009/12/23/121515579/excerpt-dont-sleep-there-are-snakes?t=1661871384424) Ralf Gommers (GitHub) (https://github.com/rgommers) Scientific Python (https://scientific-python.org/) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guest: Melissa Mendonça.
[00:00.000 --> 00:04.560] All right, so I'm here with 52 weeks of AWS[00:04.560 --> 00:07.920] and still continuing to do developer certification.[00:07.920 --> 00:11.280] I'm gonna go ahead and share my screen here.[00:13.720 --> 00:18.720] All right, so we are on Lambda, one of my favorite topics.[00:19.200 --> 00:20.800] Let's get right into it[00:20.800 --> 00:24.040] and talk about how to develop event-driven solutions[00:24.040 --> 00:25.560] with AWS Lambda.[00:26.640 --> 00:29.440] With Serverless Computing, one of the things[00:29.440 --> 00:32.920] that it is going to do is it's gonna change[00:32.920 --> 00:36.000] the way you think about building software[00:36.000 --> 00:39.000] and in a traditional deployment environment,[00:39.000 --> 00:42.040] you would configure an instance, you would update an OS,[00:42.040 --> 00:45.520] you'd install applications, build and deploy them,[00:45.520 --> 00:47.000] load balance.[00:47.000 --> 00:51.400] So this is non-cloud native computing and Serverless,[00:51.400 --> 00:54.040] you really only need to focus on building[00:54.040 --> 00:56.360] and deploying applications and then monitoring[00:56.360 --> 00:58.240] and maintaining the applications.[00:58.240 --> 01:00.680] And so with really what Serverless does[01:00.680 --> 01:05.680] is it allows you to focus on the code for the application[01:06.320 --> 01:08.000] and you don't have to manage the operating system,[01:08.000 --> 01:12.160] the servers or scale it and really is a huge advantage[01:12.160 --> 01:14.920] because you don't have to pay for the infrastructure[01:14.920 --> 01:15.920] when the code isn't running.[01:15.920 --> 01:18.040] And that's really a key takeaway.[01:19.080 --> 01:22.760] If you take a look at the AWS Serverless platform,[01:22.760 --> 01:24.840] there's a bunch of fully managed services[01:24.840 --> 01:26.800] that are tightly integrated with Lambda.[01:26.800 --> 01:28.880] And so this is another huge advantage of 
Lambda,[01:28.880 --> 01:31.000] isn't necessarily that it's the fastest[01:31.000 --> 01:33.640] or it has the most powerful execution,[01:33.640 --> 01:35.680] it's the tight integration with the rest[01:35.680 --> 01:39.320] of the AWS platform and developer tools[01:39.320 --> 01:43.400] like AWS Serverless application model or AWS SAM[01:43.400 --> 01:45.440] would help you simplify the deployment[01:45.440 --> 01:47.520] of Serverless applications.[01:47.520 --> 01:51.960] And some of the services include Amazon S3,[01:51.960 --> 01:56.960] Amazon SNS, Amazon SQS and AWS SDKs.[01:58.600 --> 02:03.280] So in terms of Lambda, AWS Lambda is a compute service[02:03.280 --> 02:05.680] for Serverless and it lets you run code[02:05.680 --> 02:08.360] without provisioning or managing servers.[02:08.360 --> 02:11.640] It allows you to trigger your code in response to events[02:11.640 --> 02:14.840] that you would configure like, for example,[02:14.840 --> 02:19.200] dropping something into a S3 bucket like that's an image,[02:19.200 --> 02:22.200] Nevel Lambda that transcribes it to a different format.[02:23.080 --> 02:27.200] It also allows you to scale automatically based on demand[02:27.200 --> 02:29.880] and it will also incorporate built-in monitoring[02:29.880 --> 02:32.880] and logging with AWS CloudWatch.[02:34.640 --> 02:37.200] So if you look at AWS Lambda,[02:37.200 --> 02:39.040] some of the things that it does[02:39.040 --> 02:42.600] is it enables you to bring in your own code.[02:42.600 --> 02:45.280] So the code you write for Lambda isn't written[02:45.280 --> 02:49.560] in a new language, you can write things[02:49.560 --> 02:52.600] in tons of different languages for AWS Lambda,[02:52.600 --> 02:57.600] Node, Java, Python, C-sharp, Go, Ruby.[02:57.880 --> 02:59.440] There's also custom run time.[02:59.440 --> 03:03.880] So you could do Rust or Swift or something like that.[03:03.880 --> 03:06.080] And it also integrates very deeply[03:06.080 --> 
03:11.200] with other AWS services and you can invoke[03:11.200 --> 03:13.360] third-party applications as well.[03:13.360 --> 03:18.080] It also has a very flexible resource and concurrency model.[03:18.080 --> 03:20.600] And so Lambda would scale in response to events.[03:20.600 --> 03:22.880] So you would just need to configure memory settings[03:22.880 --> 03:24.960] and AWS would handle the other details[03:24.960 --> 03:28.720] like the CPU, the network, the IO throughput.[03:28.720 --> 03:31.400] Also, you can use the Lambda,[03:31.400 --> 03:35.000] AWS Identity and Access Management Service or IAM[03:35.000 --> 03:38.560] to grant access to what other resources you would need.[03:38.560 --> 03:41.200] And this is one of the ways that you would control[03:41.200 --> 03:44.720] the security of Lambda is you have really guardrails[03:44.720 --> 03:47.000] around it because you would just tell Lambda,[03:47.000 --> 03:50.080] you have a role that is whatever it is you need Lambda to do,[03:50.080 --> 03:52.200] talk to SQS or talk to S3,[03:52.200 --> 03:55.240] and it would specifically only do that role.[03:55.240 --> 04:00.240] And the other thing about Lambda is that it has built-in[04:00.560 --> 04:02.360] availability and fault tolerance.[04:02.360 --> 04:04.440] So again, it's a fully managed service,[04:04.440 --> 04:07.520] it's high availability and you don't have to do anything[04:07.520 --> 04:08.920] at all to use that.[04:08.920 --> 04:11.600] And one of the biggest things about Lambda[04:11.600 --> 04:15.000] is that you only pay for what you use.[04:15.000 --> 04:18.120] And so when the Lambda service is idle,[04:18.120 --> 04:19.480] you don't have to actually pay for that[04:19.480 --> 04:21.440] versus if it's something else,[04:21.440 --> 04:25.240] like even in the case of a Kubernetes-based system,[04:25.240 --> 04:28.920] still there's a host machine that's running Kubernetes[04:28.920 --> 04:31.640] and you have to actually pay for 
that.[04:31.640 --> 04:34.520] So one of the ways that you can think about Lambda[04:34.520 --> 04:38.040] is that there's a bunch of different use cases for it.[04:38.040 --> 04:40.560] So let's start off with different use cases,[04:40.560 --> 04:42.920] web apps, I think would be one of the better ones[04:42.920 --> 04:43.880] to think about.[04:43.880 --> 04:46.680] So you can combine AWS Lambda with other services[04:46.680 --> 04:49.000] and you can build powerful web apps[04:49.000 --> 04:51.520] that automatically scale up and down.[04:51.520 --> 04:54.000] And there's no administrative effort at all.[04:54.000 --> 04:55.160] There's no backups necessary,[04:55.160 --> 04:58.320] no multi-data center redundancy, it's done for you.[04:58.320 --> 05:01.400] Backends, so you can build serverless backends[05:01.400 --> 05:05.680] that lets you handle web, mobile, IoT,[05:05.680 --> 05:07.760] third-party applications.[05:07.760 --> 05:10.600] You can also build those backends with Lambda,[05:10.600 --> 05:15.400] with API Gateway, and you can build applications with them.[05:15.400 --> 05:17.200] In terms of data processing,[05:17.200 --> 05:19.840] you can also use Lambda to run code[05:19.840 --> 05:22.560] in response to a trigger, change in data,[05:22.560 --> 05:24.440] shift in system state,[05:24.440 --> 05:27.360] and really all of AWS for the most part[05:27.360 --> 05:29.280] is able to be orchestrated with Lambda.[05:29.280 --> 05:31.800] So it's really like a glue type service[05:31.800 --> 05:32.840] that you're able to use.[05:32.840 --> 05:36.600] Now chatbots, that's another great use case for it.[05:36.600 --> 05:40.760] Amazon Lex is a service for building conversational chatbots[05:42.120 --> 05:43.560] and you could use it with Lambda.[05:43.560 --> 05:48.560] Amazon Lambda service is also able to be used[05:50.080 --> 05:52.840] with voice IT automation.[05:52.840 --> 05:55.760] These are all great use cases for Lambda.[05:55.760 --> 
05:57.680] In fact, I would say it's kind of like[05:57.680 --> 06:01.160] the go-to automation tool for AWS.[06:01.160 --> 06:04.160] So let's talk about how Lambda works next.[06:04.160 --> 06:06.080] So the way Lambda works is that[06:06.080 --> 06:09.080] there's a function and there's an event source,[06:09.080 --> 06:10.920] and these are the core components.[06:10.920 --> 06:14.200] The event source is the entity that publishes events[06:14.200 --> 06:19.000] to AWS Lambda, and Lambda function is the code[06:19.000 --> 06:21.960] that you're gonna use to process the event.[06:21.960 --> 06:25.400] And AWS Lambda would run that Lambda function[06:25.400 --> 06:29.600] on your behalf, and a few things to consider[06:29.600 --> 06:33.840] is that it really is just a little bit of code,[06:33.840 --> 06:35.160] and you can configure the triggers[06:35.160 --> 06:39.720] to invoke a function in response to resource lifecycle events,[06:39.720 --> 06:43.680] like for example, responding to incoming HTTP,[06:43.680 --> 06:47.080] consuming events from a queue, like in the case of SQS[06:47.080 --> 06:48.320] or running it on a schedule.[06:48.320 --> 06:49.760] So running it on a schedule is actually[06:49.760 --> 06:51.480] a really good data engineering task, right?[06:51.480 --> 06:54.160] Like you could run it periodically to scrape a website.[06:55.120 --> 06:58.080] So as a developer, when you create Lambda functions[06:58.080 --> 07:01.400] that are managed by the AWS Lambda service,[07:01.400 --> 07:03.680] you can define the permissions for the function[07:03.680 --> 07:06.560] and basically specify what are the events[07:06.560 --> 07:08.520] that would actually trigger it.[07:08.520 --> 07:11.000] You can also create a deployment package[07:11.000 --> 07:12.920] that includes application code[07:12.920 --> 07:17.000] in any dependency or library necessary to run the code,[07:17.000 --> 07:19.200] and you can also configure things like the 
memory,[07:19.200 --> 07:23.200] you can figure the timeout, also configure the concurrency,[07:23.200 --> 07:25.160] and then when your function is invoked,[07:25.160 --> 07:27.640] Lambda will provide a runtime environment[07:27.640 --> 07:30.080] based on the runtime and configuration options[07:30.080 --> 07:31.080] that you selected.[07:31.080 --> 07:36.080] So let's talk about models for invoking Lambda functions.[07:36.360 --> 07:41.360] In the case of an event source that invokes Lambda function[07:41.440 --> 07:43.640] by either a push or a pool model,[07:43.640 --> 07:45.920] in the case of a push, it would be an event source[07:45.920 --> 07:48.440] directly invoking the Lambda function[07:48.440 --> 07:49.840] when the event occurs.[07:50.720 --> 07:53.040] In the case of a pool model,[07:53.040 --> 07:56.960] this would be putting the information into a stream or a queue,[07:56.960 --> 07:59.400] and then Lambda would pull that stream or queue,[07:59.400 --> 08:02.800] and then invoke the function when it detects an events.[08:04.080 --> 08:06.480] So a few different examples would be[08:06.480 --> 08:11.280] that some services can actually invoke the function directly.[08:11.280 --> 08:13.680] So for a synchronous invocation,[08:13.680 --> 08:15.480] the other service would wait for the response[08:15.480 --> 08:16.320] from the function.[08:16.320 --> 08:20.680] So a good example would be in the case of Amazon API Gateway,[08:20.680 --> 08:24.800] which would be the REST-based service in front.[08:24.800 --> 08:28.320] In this case, when a client makes a request to your API,[08:28.320 --> 08:31.200] that client would get a response immediately.[08:31.200 --> 08:32.320] And then with this model,[08:32.320 --> 08:34.880] there's no built-in retry in Lambda.[08:34.880 --> 08:38.040] Examples of this would be Elastic Load Balancing,[08:38.040 --> 08:42.800] Amazon Cognito, Amazon Lex, Amazon Alexa,[08:42.800 --> 08:46.360] Amazon API Gateway, AWS 
CloudFormation,[08:46.360 --> 08:48.880] and Amazon CloudFront,[08:48.880 --> 08:53.040] and also Amazon Kinesis Data Firehose.[08:53.040 --> 08:56.760] For asynchronous invocation, AWS Lambda queues,[08:56.760 --> 09:00.320] the event before it passes to your function.[09:00.320 --> 09:02.760] The other service gets a success response[09:02.760 --> 09:04.920] as soon as the event is queued,[09:04.920 --> 09:06.560] and if an error occurs,[09:06.560 --> 09:09.760] Lambda will automatically retry the invocation twice.[09:10.760 --> 09:14.520] A good example of this would be S3, SNS,[09:14.520 --> 09:17.720] SES, the Simple Email Service,[09:17.720 --> 09:21.120] AWS CloudFormation, Amazon CloudWatch Logs,[09:21.120 --> 09:25.400] CloudWatch Events, AWS CodeCommit, and AWS Config.[09:25.400 --> 09:28.280] But in both cases, you can invoke a Lambda function[09:28.280 --> 09:30.000] using the invoke operation,[09:30.000 --> 09:32.720] and you can specify the invocation type[09:32.720 --> 09:35.440] as either synchronous or asynchronous.[09:35.440 --> 09:38.760] And when you use the AWS service as a trigger,[09:38.760 --> 09:42.280] the invocation type is predetermined for each service,[09:42.280 --> 09:44.920] and so you have no control over the invocation type[09:44.920 --> 09:48.920] that these events sources use when they invoke your Lambda.[09:50.800 --> 09:52.120] In the polling model,[09:52.120 --> 09:55.720] the event sources will put information into a stream or a queue,[09:55.720 --> 09:59.360] and AWS Lambda will pull the stream or the queue.[09:59.360 --> 10:01.000] If it first finds a record,[10:01.000 --> 10:03.280] it will deliver the payload and invoke the function.[10:03.280 --> 10:04.920] And this model, the Lambda itself,[10:04.920 --> 10:07.920] is basically pulling data from a stream or a queue[10:07.920 --> 10:10.280] for processing by the Lambda function.[10:10.280 --> 10:12.640] Some examples would be a stream-based event service[10:12.640 --> 
10:17.640] would be Amazon DynamoDB or Amazon Kinesis Data Streams,[10:17.800 --> 10:20.920] and these stream records are organized into shards.[10:20.920 --> 10:24.640] So Lambda would actually pull the stream for the record[10:24.640 --> 10:27.120] and then attempt to invoke the function.[10:27.120 --> 10:28.800] If there's a failure,[10:28.800 --> 10:31.480] AWS Lambda won't read any of the new shards[10:31.480 --> 10:34.840] until the failed batch of records expires or is processed[10:34.840 --> 10:36.160] successfully.[10:36.160 --> 10:39.840] In the non-streaming event, which would be SQS,[10:39.840 --> 10:42.400] Amazon would pull the queue for records.[10:42.400 --> 10:44.600] If it fails or times out,[10:44.600 --> 10:46.640] then the message would be returned to the queue,[10:46.640 --> 10:49.320] and then Lambda will keep retrying the failed message[10:49.320 --> 10:51.800] until it's processed successfully.[10:51.800 --> 10:53.600] If the message will expire,[10:53.600 --> 10:56.440] which is something you can do with SQS,[10:56.440 --> 10:58.240] then it'll just be discarded.[10:58.240 --> 11:00.400] And you can create a mapping between an event source[11:00.400 --> 11:02.960] and a Lambda function right inside of the console.[11:02.960 --> 11:05.520] And this is how typically you would set that up manually[11:05.520 --> 11:07.600] without using infrastructure as code.[11:08.560 --> 11:10.200] All right, let's talk about permissions.[11:10.200 --> 11:13.080] This is definitely an easy place to get tripped up[11:13.080 --> 11:15.760] when you're first using AWS Lambda.[11:15.760 --> 11:17.840] There's two types of permissions.[11:17.840 --> 11:20.120] The first is the event source and permission[11:20.120 --> 11:22.320] to trigger the Lambda function.[11:22.320 --> 11:24.480] This would be the invocation permission.[11:24.480 --> 11:26.440] And the next one would be the Lambda function[11:26.440 --> 11:29.600] needs permissions to interact with other 
services,[11:29.600 --> 11:31.280] but this would be the run permissions.[11:31.280 --> 11:34.520] And these are both handled via the IAM service[11:34.520 --> 11:38.120] or the AWS identity and access management service.[11:38.120 --> 11:43.120] So the IAM resource policy would tell the Lambda service[11:43.600 --> 11:46.640] which push event the sources have permission[11:46.640 --> 11:48.560] to invoke the Lambda function.[11:48.560 --> 11:51.120] And these resource policies would make it easy[11:51.120 --> 11:55.280] to grant access to a Lambda function across AWS account.[11:55.280 --> 11:58.400] So a good example would be if you have an S3 bucket[11:58.400 --> 12:01.400] in your account and you need to invoke a function[12:01.400 --> 12:03.880] in another account, you could create a resource policy[12:03.880 --> 12:07.120] that allows those to interact with each other.[12:07.120 --> 12:09.200] And the resource policy for a Lambda function[12:09.200 --> 12:11.200] is called a function policy.[12:11.200 --> 12:14.160] And when you add a trigger to your Lambda function[12:14.160 --> 12:16.760] from the console, the function policy[12:16.760 --> 12:18.680] will be generated automatically[12:18.680 --> 12:20.040] and it allows the event source[12:20.040 --> 12:22.820] to take the Lambda invoke function action.[12:24.400 --> 12:27.320] So a good example would be in Amazon S3 permission[12:27.320 --> 12:32.120] to invoke the Lambda function called my first function.[12:32.120 --> 12:34.720] And basically it would be an effect allow.[12:34.720 --> 12:36.880] And then under principle, if you would have service[12:36.880 --> 12:41.880] S3.AmazonEWS.com, the action would be Lambda colon[12:41.880 --> 12:45.400] invoke function and then the resource would be the name[12:45.400 --> 12:49.120] or the ARN of actually the Lambda.[12:49.120 --> 12:53.080] And then the condition would be actually the ARN of the bucket.[12:54.400 --> 12:56.720] And really that's it in a 
nutshell.[12:57.560 --> 13:01.480] The Lambda execution role grants your Lambda function[13:01.480 --> 13:05.040] permission to access AWS services and resources.[13:05.040 --> 13:08.000] And you select or create the execution role[13:08.000 --> 13:10.000] when you create a Lambda function.[13:10.000 --> 13:12.320] The IAM policy would define the actions[13:12.320 --> 13:14.440] of Lambda functions allowed to take[13:14.440 --> 13:16.720] and the trust policy allows the Lambda service[13:16.720 --> 13:20.040] to assume an execution role.[13:20.040 --> 13:23.800] To grant permissions to AWS Lambda to assume a role,[13:23.800 --> 13:27.460] you have to have the permission for IAM pass role action.[13:28.320 --> 13:31.000] A couple of different examples of a relevant policy[13:31.000 --> 13:34.560] for an execution role and the example,[13:34.560 --> 13:37.760] the IAM policy, you know,[13:37.760 --> 13:39.840] basically that we talked about earlier,[13:39.840 --> 13:43.000] would allow you to interact with S3.[13:43.000 --> 13:45.360] Another example would be to make it interact[13:45.360 --> 13:49.240] with CloudWatch logs and to create a log group[13:49.240 --> 13:51.640] and stream those logs.[13:51.640 --> 13:54.800] The trust policy would give Lambda service permissions[13:54.800 --> 13:57.600] to assume a role and invoke a Lambda function[13:57.600 --> 13:58.520] on your behalf.[13:59.560 --> 14:02.600] Now let's talk about the overview of authoring[14:02.600 --> 14:06.120] and configuring Lambda functions.[14:06.120 --> 14:10.440] So really to start with, to create a Lambda function,[14:10.440 --> 14:14.840] you first need to create a Lambda function deployment package,[14:14.840 --> 14:19.800] which is a zip or jar file that consists of your code[14:19.800 --> 14:23.160] and any dependencies with Lambda,[14:23.160 --> 14:25.400] you can use the programming language[14:25.400 --> 14:27.280] and integrated development environment[14:27.280 --> 14:29.800] that 
you're most familiar with.[14:29.800 --> 14:33.360] And you can actually bring the code you've already written.[14:33.360 --> 14:35.960] And Lambda does support lots of different languages[14:35.960 --> 14:39.520] like Node.js, Python, Ruby, Java, Go,[14:39.520 --> 14:41.160] and.NET runtimes.[14:41.160 --> 14:44.120] And you can also implement a custom runtime[14:44.120 --> 14:45.960] if you wanna use a different language as well,[14:45.960 --> 14:48.480] which is actually pretty cool.[14:48.480 --> 14:50.960] And if you wanna create a Lambda function,[14:50.960 --> 14:52.800] you would specify the handler,[14:52.800 --> 14:55.760] the Lambda function handler is the entry point.[14:55.760 --> 14:57.600] And a few different aspects of it[14:57.600 --> 14:59.400] that are important to pay attention to,[14:59.400 --> 15:00.720] the event object,[15:00.720 --> 15:03.480] this would provide information about the event[15:03.480 --> 15:05.520] that triggered the Lambda function.[15:05.520 --> 15:08.280] And this could be like a predefined object[15:08.280 --> 15:09.760] that AWS service generates.[15:09.760 --> 15:11.520] So you'll see this, like for example,[15:11.520 --> 15:13.440] in the console of AWS,[15:13.440 --> 15:16.360] you can actually ask for these objects[15:16.360 --> 15:19.200] and it'll give you really the JSON structure[15:19.200 --> 15:20.680] so you can test things out.[15:21.880 --> 15:23.900] In the contents of an event object[15:23.900 --> 15:26.800] includes everything you would need to actually invoke it.[15:26.800 --> 15:29.640] The context object is generated by AWS[15:29.640 --> 15:32.360] and this is really a runtime information.[15:32.360 --> 15:35.320] And so if you needed to get some kind of runtime information[15:35.320 --> 15:36.160] about your code,[15:36.160 --> 15:40.400] let's say environmental variables or AWS request ID[15:40.400 --> 15:44.280] or a log stream or remaining time in Millies,[15:45.320 --> 15:47.200] like for 
example, that one returns the number of milliseconds that remain before your function times out, you can get all of that from inside the context object.

So what about an example that runs in Python? It's pretty straightforward, actually. All you need is a handler: a Python function that takes an event and a context, which you pass in, and then it returns some kind of message.

A few best practices to remember about AWS Lambda. Separate the core business logic from the handler method; this makes your code more portable and enables you to target unit tests without having to worry about the configuration, which is always a good idea in general. Make sure you have modular, single-purpose functions; you don't want a kitchen-sink function. Treat functions as stateless as well: a function basically does one thing, and when it's done, no state is actually kept anywhere. And only include what you need. You don't want huge Lambda functions, and one of the ways you can avoid that is by reducing the time it takes Lambda to unpack the deployment packages.
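The Python example described above can be sketched in a few lines. This is a hypothetical handler: `lambda_handler` is the conventional name the console uses by default, and the event fields and returned message are illustrative.

```python
def lambda_handler(event, context):
    # event: data from the trigger (an S3 notification, test JSON, etc.).
    # context: runtime info such as the AWS request ID and remaining time.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Invoking it locally with a fake event:
print(lambda_handler({"name": "Lambda"}, None))
# {'statusCode': 200, 'body': 'Hello, Lambda!'}
```

Keeping the handler this thin, with real logic in separate functions, is what makes the unit-testing advice above practical.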
You can also minimize the complexity of your dependencies.

You can reuse the temporary runtime environment to improve the performance of a function as well. The temporary runtime environment initializes any external dependencies of the Lambda code, and you can make sure that any externalized configuration or dependency your code retrieves is stored and referenced locally after the initial run. That means limiting re-initialization of variables and objects on every invocation, and keeping alive and reusing connections, such as HTTP or database connections, that were established during a previous invocation. A really good example of this is a socket connection: if you make a socket connection and it took two seconds to spawn, you don't want to wait two seconds every time you call the Lambda; you want to reuse that socket connection.

A few more good best practices: include logging statements. This is a big one in any cloud computing operation, especially when it's distributed; if you don't log it, there's no way you can figure out what's going on. So you must add logging statements that have context, so you know which particular Lambda instance the event is actually occurring in.
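The connection-reuse pattern described above is typically written by doing the expensive setup at module scope, outside the handler, so warm invocations skip it. The connection object below is a stand-in for a real HTTP or database client.

```python
import time

def make_connection():
    # Stand-in for an expensive setup step, e.g. opening a socket or
    # creating a database client; imagine this taking two seconds.
    time.sleep(0.01)
    return {"connected": True}

# Module scope: runs once per runtime environment (cold start),
# not on every invocation.
CONNECTION = make_connection()

def lambda_handler(event, context):
    # Warm invocations reuse CONNECTION instead of reconnecting.
    return {"reused": CONNECTION["connected"]}
```

Everything above the handler runs once per cold start; everything inside it runs on every invocation, which is why the connection lives outside.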
Also include results, so you know what happened when the Lambda ran, and use environment variables as well, so you can figure out things like which bucket the function was writing to. And don't write recursive code; that's really a no-no. You want to write very simple functions with Lambda.

There are a few different ways to write Lambda functions. You can use the console editor, which I use all the time; I like to just play around with it. The downside is that if you do need to use custom libraries, you're not going to be able to, other than, say, the AWS SDK. But for simple things, it's a great use case. Another way is to upload a deployment package to the AWS console. You can create the package in an IDE; in Visual Studio for .NET, for example, you can just right-click and deploy directly into Lambda. Another option is to upload the entire package into an S3 bucket, and Lambda will grab it out of that S3 bucket.

A few different things to remember about Lambda: the memory and the timeout are configurations that determine how the Lambda function performs, and these will affect the billing. Now, one of the great things about
Lambda is that it's just amazingly inexpensive to run, and the reason is that you're charged based on the number of requests for a function and its duration. A few things to remember: if you specify more memory, it's going to increase the cost. You can also control the duration of the function by setting the right kind of timeout, but if you make the timeout too long, it could cost you more money. So really, the best practices are to test the performance of your Lambda function to make sure you have the optimum memory size, and to load test it to make sure you understand how the timeouts work. In general, anything in cloud computing should be load tested.

Now let's talk about an important final topic: how to deploy Lambda functions. Versions are immutable copies of the code and configuration of your Lambda function, and versioning allows you to publish one or more versions of your Lambda function. As a result, you can work with different variations of your Lambda function in your development workflow, like development, beta, production, et cetera. When you create a Lambda function, there's only one version, the latest version, dollar sign,
latest ($LATEST). You can refer to this function using the ARN, or Amazon Resource Name. When you publish a new version, AWS Lambda makes a snapshot of the latest version to create the new version.

You can also create an alias for a Lambda function. Conceptually, an alias is just a pointer to a specific function version, and you can use the alias in the ARN to reference the Lambda function version that's currently associated with the alias. What's nice about an alias is that you can roll back and forth between different versions, which is pretty nice because if you deploy a new version and there's a huge problem with it, you just toggle right back, and there's really not a big issue in terms of rolling back your code.

Now, let's take a look at an example where Amazon S3 is the event source that invokes your Lambda function every time a new object is created. When Amazon S3 is the event source, you store the information for the event source mapping in the configuration for the bucket notifications, and in that configuration you identify the Lambda function ARN that Amazon S3 can invoke. But in some cases, you're going to have to update the notification configuration.
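The function, version, and alias ARNs discussed here differ only in the final qualifier, which is what makes the alias trick work. A small sketch of the format (the region, account ID, and names below are hypothetical):

```python
def lambda_arn(region, account_id, function_name, qualifier=None):
    # An unqualified ARN refers to $LATEST; a qualifier names either a
    # published version (e.g. "3") or an alias (e.g. "prod").
    base = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    return f"{base}:{qualifier}" if qualifier else base

# Pointing a trigger at the "prod" alias means promoting a release is
# just repointing the alias, with no change to the trigger itself:
print(lambda_arn("us-east-1", "123456789012", "process-upload", "prod"))
# arn:aws:lambda:us-east-1:123456789012:function:process-upload:prod
```
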
Amazon S3 will then invoke the correct version each time you publish a new version of your Lambda function. Basically, instead of specifying the function ARN, you can specify an alias ARN in the notification configuration. As you promote a new version of the Lambda function into production, you only need to update the prod alias to point to the latest stable version, and you don't need to update the notification configuration in Amazon S3.

When you build serverless applications, it's common to have code that's shared across Lambda functions; it could be custom code, it could be a standard library, et cetera. Before, and this was really a big limitation, you had to have all the code deployed together. But now, one of the really cool things you can do is have a Lambda function include additional code as a layer. A layer is basically a zip archive that contains a library, maybe a custom runtime, maybe even some kind of really cool pre-trained model. With layers, you can use the libraries in your function without needing to include them in your deployment package. It's a best practice to have smaller deployment packages and share common dependencies with layers. Also, layers will help you keep your deployment
package really small. For Node.js, Python, and Ruby functions, you can develop your function code in the console as long as you keep the package under three megabytes. A function can use up to five layers at a time, which is pretty incredible actually, which means you can have basically up to 250 megabytes total. For many languages, this is plenty of space. Amazon has also published a public layer that includes really popular libraries like NumPy and SciPy, which dramatically helps with data processing in machine learning.

Now, if I had to predict the future and I wanted to predict a massive announcement, I would say that what AWS could do is offer a GPU-enabled layer at some point that would include pre-trained models. If they did something like that, it could really open up the doors for the pre-trained model revolution, and I would bet that that's possible.

All right, well, in a nutshell, AWS Lambda is one of my favorite services, and I think it's worth the time of everybody who's interested in AWS to play around with it. Next week, I'm going to cover API Gateway. All right, see you next week.

If you enjoyed this video, here are additional resources to look at:

Coursera + Duke Specialization:
Building Cloud Computing Solutions at Scale Specialization: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
Python, Bash, and SQL Essentials for Data Engineering Specialization: https://www.coursera.org/specializations/python-bash-sql-data-engineering-duke
AWS Certified Solutions Architect - Professional (SAP-C01) Cert Prep: 1 Design for Organizational Complexity: https://www.linkedin.com/learning/aws-certified-solutions-architect-professional-sap-c01-cert-prep-1-design-for-organizational-complexity/design-for-organizational-complexity?autoplay=true
Essentials of MLOps with Azure and Databricks: https://www.linkedin.com/learning/essentials-of-mlops-with-azure-1-introduction/essentials-of-mlops-with-azure
O'Reilly Book: Implementing MLOps in the Enterprise
O'Reilly Book: Practical MLOps: https://www.amazon.com/Practical-MLOps-Operationalizing-Machine-Learning/dp/1098103017
O'Reilly Book: Python for DevOps: https://www.amazon.com/gp/product/B082P97LDW/
O'Reilly Book: Developing on AWS with C#: A Comprehensive Guide on Using C# to Build Solutions on the AWS Platform: https://www.amazon.com/Developing-AWS-Comprehensive-Solutions-Platform/dp/1492095877
Pragmatic AI: An Introduction to Cloud-based Machine Learning: https://www.amazon.com/gp/product/B07FB8F8QP/
Pragmatic AI Labs Book: Python Command-Line Tools: https://www.amazon.com/gp/product/B0855FSFYZ
Pragmatic AI Labs Book: Cloud Computing for Data Analysis: https://www.amazon.com/gp/product/B0992BN7W8
Pragmatic AI Book: Minimal Python: https://www.amazon.com/gp/product/B0855NSRR7
Pragmatic AI Book: Testing in Python: https://www.amazon.com/gp/product/B0855NSRR7
Subscribe to Pragmatic AI Labs YouTube Channel: https://www.youtube.com/channel/UCNDfiL0D1LUeKWAkRE1xO5Q
Subscribe to 52 Weeks of AWS Podcast: https://52-weeks-of-cloud.simplecast.com
View content on noahgift.com: https://noahgift.com/
View content on Pragmatic AI Labs Website: https://paiml.com/
Summary
The increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
Your host is Tobias Macey and today I'm interviewing Dylan Fox about building and growing a business with ML as its core offering.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what Assembly is and the story behind it?
For anyone who isn't familiar with your platform, can you describe the role that ML/AI plays in your product?
What was your process for going from idea to prototype for an AI powered business?
Can you offer parallels between your own experience and that of your peers who are building businesses oriented more toward pure software applications?
How are you structuring your teams?
On the path to your current scale and capabilities how have you managed scoping of your model capabilities and operational scale to avoid getting bogged down or burnt out?
How do you think about scoping of model functionality to balance composability and system complexity?
What is your process for identifying and understanding which problems are suited to ML and when to rely on pure software?
You are constantly iterating on model performance and introducing new capabilities. How do you manage prototyping and experimentation cycles?
What are the metrics that you track to identify whether and when to move from an experimental to an operational state with a model?
What is your process for understanding what's possible and what can feasibly operate at scale?
Can you describe your overall operational patterns and delivery process for ML?
What are some of the most useful investments in tooling that you have made to manage development experience for your teams?
Once you have a model in operation, how do you manage performance tuning? (from both a model and an operational scalability perspective)
What are the most interesting, innovative, or unexpected aspects of ML development and maintenance that you have encountered while building and growing the Assembly platform?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Assembly?
When is ML the wrong choice?
What do you have planned for the future of Assembly?

Contact Info
@YouveGotFox on Twitter
LinkedIn

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
Assembly AI
Podcast.__init__ Episode
Learn Python the Hard Way
NLTK
NLP == Natural Language Processing
NLU == Natural Language Understanding
Speech Recognition
Tensorflow
r/machinelearning
SciPy
PyTorch
Jax
HuggingFace
RNN == Recurrent Neural Network
CNN == Convolutional Neural Network
LSTM == Long Short Term Memory
Hidden Markov Models
Baidu DeepSpeech
CTC (Connectionist Temporal Classification) Loss Model
Twilio
Grid Search
K80 GPU
A100 GPU
TPU == Tensor Processing Unit
Foundation Models
BLOOM Language Model
DALL-E 2

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
On the occasion of this year's Japan Day
Hugo speaks with Peter Wang, CEO of Anaconda, about how Python became so big in data science, machine learning, and AI. They jump into many of the technical and sociological beginnings of Python being used for data science, a history of PyData, the conda distribution, and NumFOCUS. They also talk about the emergence of online collaborative environments, particularly with respect to open source, and attempt to figure out the moving parts of PyData and why it has had the impact it has, including the fact that many core developers were not computer scientists or software engineers, but rather scientists and researchers building the tools they needed on an as-needed basis. They also discuss the challenges in getting adoption for Python, the things that the PyData stack solves, those that it doesn't, and what progress is being made there. People who have listened to Hugo's podcast for some time may have recognized that he's interested in the sociology of the data science space, and he really considered speaking with Peter a fascinating opportunity to delve into how the Pythonic data science space evolved, particularly with respect to tooling, not only because Peter had a front-row seat for much of it, but because he was one of several key actors at various points. On top of this, Hugo wanted to allow Peter's inner sociologist room to breathe and evolve in this conversation. What happens then is slightly experimental – Peter is a deep, broad, and occasionally hallucinatory thinker, and Hugo wanted to explore new spaces with him, so we hope you enjoy the experiments they play as they begin to discuss open-source software in the broader context of finite and infinite games, and how OSS is a paradigm of humanity's ability to create generative, nourishing, and anti-rivalrous systems, where, by anti-rivalrous, we mean things that become more valuable for everyone the more people use them!
But we need to be mindful of finite-game dynamics (for example, those driven by corporate incentives) co-opting and parasitizing the generative systems that we build. These are all considerations they delve far deeper into in Part 2 of this interview, which will be the next episode of VG, where we also dive into the relationship between OSS, tools, and venture capital, among many other things.

Links
Peter on Twitter (https://twitter.com/pwang)
Anaconda Nucleus (https://anaconda.cloud/)
Calling out SciPy on diversity (even though it hurts) (https://ilovesymposia.com/2015/04/03/calling-out-scipy-on-diversity/) by Juan Nunez-Iglesias
Here Comes Everybody: The Power of Organizing Without Organizations (https://en.wikipedia.org/wiki/Here_Comes_Everybody_(book)) by Clay Shirky
Finite and Infinite Games (https://en.wikipedia.org/wiki/Finite_and_Infinite_Games) by James Carse
Governing the Commons: The Evolution of Institutions for Collective Action (https://www.cambridge.org/core/books/governing-the-commons/7AB7AE11BADA84409C34815CC288CD79) by Elinor Ostrom
Elinor Ostrom's 8 Principles for Managing a Commons (https://www.onthecommons.org/magazine/elinor-ostroms-8-principles-managing-commmons)
Watch the live stream: Watch on YouTube About the show Sponsored by Datadog: pythonbytes.fm/datadog Special guest: Brian Skinn (Twitter | Github) Michael #1: OpenBB wants to be an open source challenger to Bloomberg Terminal OpenBB Terminal provides a modern Python-based integrated environment for investment research, that allows an average joe retail trader to leverage state-of-the-art Data Science and Machine Learning technologies. As a modern Python-based environment, OpenBBTerminal opens access to numerous Python data libraries in Data Science (Pandas, Numpy, Scipy, Jupyter) Machine Learning (Pytorch, Tensorflow, Sklearn, Flair) Data Acquisition (Beautiful Soup, and numerous third-party APIs) They have a discord community too BTW, seem to be a successful open source project: OpenBB Raises $8.5M in Seed Round Funding Following Open Source Project Gamestonk Terminal's Success Great graphics / gallery here. Way more affordable than the $1,900/mo/user for the Bloomberg Terminal Brian #2: Python f-strings https://fstring.help Florian Bruhin Quick overview of cool features of f-strings, made with Jupyter Python f-strings Are More Powerful Than You Might Think Martin Heinz More verbose discussion of f-strings Both are great to up your string formatting game. Brian S. #3: pyproject.toml and PEP 621 Support in setuptools PEP 621: “Storing project metadata in pyproject.toml” Authors: Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung (Jun-Oct 2020) Covers build-tool-independent fields (name, version, description, readme, authors, etc.) 
Various tools had already implemented pyproject.toml support, but not setuptools Including: Flit, Hatch, PDM, Trampolim, and Whey (h/t: Scikit-HEP) Not Poetry yet, though it's under discussion setuptools support had been discussed pretty extensively, and had been included on the PSF's list of fundable packaging improvements Initial experimental implementation spearheaded by Anderson Bravalheri, recently completed Seeking testing and bug reports from the community (Discuss thread) I tried it on one of my projects — it mostly worked, but revealed a bug that Anderson fixed super-quick (proper handling of a dynamic long_description, defined in setup.py) Related tools (all early-stage/experimental AFAIK) ini2toml (Anderson Bravalheri) — Can convert setup.cfg (which is in INI format) to pyproject.toml Mostly worked well for me, though I had to manually fix a couple things, most of which were due to limitations of the INI format INI has no list syntax! validate-pyproject (Anderson Bravalheri) — Automated pyproject.toml checks pyproject-fmt (Bernát Gábor) — Autoformatter for pyproject.toml Don't forget to use it with build, instead of via a python setup.py invocation! $ pip install build $ python -m build Will also want to constrain your setuptools version in the build-backend.requires key of pyproject.toml (you are using PEP517/518, right??) Michael #4: JSON Web Tokens @ jwt.io JSON Web Tokens are an open, industry standard RFC 7519 method for representing claims securely between two parties. Basically a visualizer and debugger for JWTs Enter an encoded token Select a decryption algorithm See the payload data verify the signature List of libraries, grouped by language Brian #5: Autocorrect and other Git Tricks - Waylon Walker - Use `git config --global help.autocorrect 10` to have git automatically run the command you meant in 1 second. The `10` is 10 x 1/10 of a second. So `50` for 5 seconds, etc. 
Automatically set upstream branch if it's not there: git config --global push.default current (you may NOT want to do this if you are not careful with your branches; from https://stackoverflow.com/a/22933955). git commit -a automatically “adds” all changed and deleted files, but not untracked files (from https://git-scm.com/docs/git-commit#Documentation/git-commit.txt--a). Now most of my interactions with the git CLI, especially for quick changes, are: $ git checkout main $ git pull $ git checkout -b okken_something $ git commit -a -m 'quick message' $ git push With these working, with autocorrect: $ git chkout main $ git pll $ git comit -a -m 'quick message' $ git psh Brian S. #6: jupyter-tempvars Jupyter notebooks are great, and the global namespace of the Python kernel backend makes it super easy to flow analysis from one cell to another. BUT, that global namespace also makes it super easy to footgun, when variables leak into/out of a cell when you don't want them to. jupyter-tempvars notebook extension Built on top of the tempvars library, which defines a TempVars context manager for handling temporary variables. When you create a TempVars context manager, you provide it patterns for variable names to treat as temporary. In its simplest form, TempVars (1) clears matching variables from the namespace on entering the context, and then (2) clears them again upon exiting the context and restores their prior values, if any. TempVars works great, but it's cumbersome and distracting to manually include it in every notebook cell where it's needed. With jupyter-tempvars, you instead apply tags with a specific format to notebook cells, and the extension automatically wraps each cell's code in a TempVars context before execution. Javascript adapted from existing extensions: Patching CodeCell.execute, from the jupyter_contrib_nbextensions ‘Execution Dependencies' extension, to enclose the cell code with the context manager; Listening for the ‘kernel ready' event, from 
[jupyter-black](https://github.com/drillan/jupyter-black/blob/d197945508a9d2879f2e2cc99cafe0cedf034cf2/kernel_exec_on_cell.js#L347-L350), to import the [TempVars](https://github.com/bskinn/jupyter-tempvars/blob/491babaca4f48c8d453ce4598ac12aa6c5323181/src/jupyter_tempvars/extension/jupyter_tempvars.js#L42-L46) context manager upon kernel (re)start See the README (with animated GIFs!) for installation and usage instructions It's on PyPI: $ pip install jupyter-tempvars And, I made a shortcut install script for it: $ jupyter-tempvars install && jupyter-tempvars enable Please try it out, find/report bugs, and suggest features! Future work Publish to conda-forge (definitely) Adapt to JupyterLab, VS Code, etc. (pending interest) Extras Brian: Ok. Python issues are now on GitHub. Seriously. See for yourself. Lorem Ipsum is more interesting than I realized. O RLY Cover Generator Example: Michael: New course: Secure APIs with FastAPI and the Microsoft Identity Platform Pyenv Virtualenv for Windows (Sorta'ish) Hipster Ipsum Brian S.: PSF staff is expanding PSF hiring an Infrastructure Engineer Link now 404s, perhaps they've made their hire? Last year's hire of the Packaging Project Manager (Shamika Mohanan) Steering Council supports PSF hiring a second developer-in-residence PSF has chosen its new Executive Director: Deb Nicholson! PyOhio 2022 Call for Proposals is open Teaser tweet for performance improvements to pydantic Jokes: https://twitter.com/CaNerdIan/status/1512628780212396036 https://www.reddit.com/r/ProgrammerHumor/comments/tuh06y/i_guess_we_all_have_been_there/ https://twitter.com/PR0GRAMMERHUM0R/status/1507613349625966599
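Circling back to Brian's f-strings item above, here's a quick sampler of the features both linked articles highlight (the `=` debugging specifier needs Python 3.8+):

```python
name = "python"
value = 1 / 3

# Debugging specifier: renders both the expression and its value.
print(f"{value=:.3f}")     # value=0.333

# Nested format specs: the field width can itself be an expression.
width = 10
print(f"{name:>{width}}")  # '    python'

# Conversions: !r applies repr() before formatting.
print(f"{name!r}")         # 'python'
```
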
Watch the live stream: Watch on YouTube About the show Sponsored by Datadog: pythonbytes.fm/datadog Brian #1: Python glossary and FAQ Inspired by a tweet by Trey Hunner that referenced the glossary glossary All the Python and programming terms in one place Often refers to other parts of the documentation. Forget what an “abstract base class” is? Just look it up FAQ Has sections on General Python Programming Design and History Library and Extension Extending/Embedding Python on Windows Graphic User Interface “Why is Python Installed on my Computer?” Some decent reading here, actually. Example What is the difference between arguments and parameters? - that's under Programming Michael #2: Any.io Learned about it via asyncer AnyIO is an asynchronous networking and concurrency library that works on top of either asyncio or trio. It implements trio-like structured concurrency (SC) on top of asyncio Works in harmony with the native SC of trio itself Check out the features AnyIO also comes with its own pytest plugin which also supports asynchronous fixtures. Brian #3: Vaex : a high performance Python library for lazy Out-of-Core DataFrames suggested by Glen Ferguson “Vaex is a python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.” out-of-core: “The term out-of-core typically refers to processing data that is too large to fit into a computer's main memory.” - from machinelearning.wtf, a Machine Learning Glossary site. nice tie in, right? Vaex uses memory mapping, a zero memory copy policy, and lazy computations. There's a great intro in the form of a presentation from SciPy 2019 Michael #4: Django Community Survey Results Only 15% of Django developers use it ONLY for work, while two thirds use it both for work and for personal, educational, or side projects. 
The majority use the latest Django. Most devs upgrade every stable release. Postgres is the primary DB (MongoDB is nowhere in sight). Most sites have Redis caching tiers. Michael #5: Extra, Extra, Extra, Extra: Django security releases issued: 4.0.1, 3.2.11, and 2.2.26 Static Sites with Sphinx and Markdown course by Paul Everitt is now out CalDigit Thunderbolt 4 Element Hub review (more info: video by Doc Rock; get it on Amazon here) StreamDeck setup for our live streams Michael's PyBay HTMX talk is up Python Web Conf 2022 - I'll be speaking there and we're media sponsors of the conference, so use code PythonBytes@PWC2022 for 15% off, March 21-25. PyCascades 2022 is also happening soon, February 5th-6th, 2022 Joke:
Talk Python To Me - Python conversations for passionate developers
Do you enjoy the "final 2 questions" I always ask at the end of the show? I think it's a great way to track the currents of the Python community. This episode focuses on one of those questions: "What notable PyPI package have you come across recently? Not necessarily the most popular one but something that delighted you and people should know about?" Our guest, Antonio Andrade, put together a GitHub repository cataloging guests' responses to this question over the past couple of years. So I invited him to come share the packages covered there. We touch on over 40 packages during this episode, so I'm sure you'll learn a few new gems to incorporate into your workflow. Links from the show Antonio on Twitter: @AntonioAndrade Notable PyPI Package Repo: github.com/xandrade/talkpython.fm-notable-packages Antonio's recommended packages from this episode: Sumy: Extract summary from HTML pages or plain texts: github.com gTTS (Google Text-to-Speech): github.com Packages discussed during the episode 1. FastAPI - A-W-E-S-O-M-E web framework for building APIs: fastapi.tiangolo.com 2. Pythonic - Graphical automation tool: github.com 3. umap-learn - Uniform Manifold Approximation and Projection: readthedocs.io 4. Tortoise ORM - Easy async ORM for python, built with relations in mind: tortoise.github.io 5. Beanie - Asynchronous Python ODM for MongoDB: github.com 6. Hathi - SQL host scanner and dictionary attack tool: github.com 7. Plotext - Plots data directly on terminal: github.com 8. Dynaconf - Configuration Management for Python: dynaconf.com 9. Objexplore - Interactive Python Object Explorer: github.com 10. AWS Cloud Development Kit (AWS CDK): docs.aws.amazon.com 11. Luigi - Workflow mgmt + task scheduling + dependency resolution: github.com 12. Seaborn - Statistical Data Visualization: pydata.org 13. CuPy - NumPy & SciPy for GPU: cupy.dev 14. Stevedore - Manage dynamic plugins for Python applications: docs.openstack.org 15. 
Pydantic - Data validation and settings management: github.com 16. pipx - Install and Run Python Applications in Isolated Environments: pypa.github.io 17. openpyxl - A Python library to read/write Excel 2010 xlsx/xlsm files: readthedocs.io 18. HttpPy - More comfortable requests with python: github.com 19. rich - Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal: readthedocs.io 20. PyO3 - Using Python from Rust: pyo3.rs 21. fastai - Making neural nets uncool again: fast.ai 22. Numba - Accelerate Python Functions by compiling Python code using LLVM: numba.pydata.org 23. NetworkML - Device Functional Role ID via Machine Learning and Network Traffic Analysis: github.com 24. Flask-SQLAlchemy - Adds SQLAlchemy support to your Flask application: palletsprojects.com 25. AutoInvent - Libraries for generating GraphQL API and UI from data: autoinvent.dev 26. trio - A friendly Python library for async concurrency and I/O: readthedocs.io 27. Flake8-docstrings - Extension for flake8 which uses pydocstyle to check docstrings: github.com 28. Hotwire-django - Integrate Hotwire in your Django app: github.com 29. Starlette - The little ASGI library that shines: github.com 30. tenacity - Retry code until it succeeds: readthedocs.io 31. pySerial - Python Serial Port Extension: github.com 32. Click - Composable command line interface toolkit: palletsprojects.com 33. Pytest - Simple powerful testing with Python: docs.pytest.org 34. testcontainers-python - Test almost anything that can run in a Docker container: github.com 35. cibuildwheel - Build Python wheels on CI with minimal configuration: readthedocs.io 36. async-rediscache - An easy to use asynchronous Redis cache: github.com 37. seinfeld - Query a Seinfeld quote database: github.com 38. notebook - A web-based notebook environment for interactive computing: readthedocs.io 39. dagster - A data orchestrator for machine learning, analytics, and ETL: dagster.io 40. 
bleach - An easy safelist-based HTML-sanitizing tool: github.com 41. flynt - string formatting converter: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe on YouTube: youtube.com Follow Talk Python on Twitter: @talkpython Follow Michael on Twitter: @mkennedy Sponsors Coiled TopTal AssemblyAI Talk Python Training
Travis Oliphant is a data scientist, entrepreneur, and creator of NumPy, SciPy, and Anaconda. Please support this podcast by checking out our sponsors: – Novo: https://banknovo.com/lex – Allform: https://allform.com/lex to get 20% off – Onnit: https://lexfridman.com/onnit to get up to 10% off – Athletic Greens: https://athleticgreens.com/lex and use code LEX to get 1 month of fish oil – Blinkist: https://blinkist.com/lex and use code LEX to get 25% off premium EPISODE LINKS: Travis's Twitter: https://twitter.com/teoliphant Travis's Wiki Page: https://en.wikipedia.org/wiki/Travis_Oliphant NumPy: https://numpy.org/ SciPy: https://scipy.org/about.html Anaconda: https://www.anaconda.com/products/individual Quansight: https://www.quansight.com PODCAST INFO: Podcast website: https://lexfridman.com/podcast Apple Podcasts: https://apple.co/2lwqZIr Spotify: https://spoti.fi/2nEwCF8 RSS: https://lexfridman.com/feed/podcast/ YouTube Full
SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments. This Perspective describes the development and capabilities of SciPy 1.0, an open source scientific computing library for the Python programming language. 2020: Pauli Virtanen, R. Gommers, T. Oliphant, Matt Haberland, Tyler Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, Jonathan Bright, Stéfan J. van der Walt, M. Brett, Joshua Wilson, K. Millman, Nikolay Mayorov, Andrew R. J. Nelson, E. Jones, Robert Kern, Eric Larson, C. J. Carey, Ilhan Polat, Y. Feng, Eric W. Moore, J. Vanderplas, D. Laxalde, Josef Perktold, R. Cimrman, I. Henriksen, E. Quintero, Charles R. Harris, A. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, P. van Mulbregt, A. Vijaykumar, Alessandro Pietro Bardelli, Alex Rothberg, A. Hilboll, Andre Kloeckner, A. Scopatz, Antony Lee, A. Rokem, C. N. Woods, Chad Fulton, C. Masson, C. Häggström, Clark Fitzgerald, David A. Nicholson, David R. Hagen, D. Pasechnik, E. Olivetti, E. Martin, E. Wieser, Fabrice Silva, F. Lenders, Florian Wilhelm, G. Young, Gavin A. Price, G. Ingold, Gregory E. Allen, Gregory R. Lee, H. Audren, I. Probst, J. Dietrich, J. Silterra, James T. Webber, J. Slavic, J. Nothman, J. Buchner, Johannes Kulick, Johannes L. Schönberger, J. V. de Miranda Cardoso, J. Reimer, J. Harrington, J. Rodríguez, Juan Nunez-Iglesias, Justin Kuczynski, K. Tritz, M. Thoma, M. 
Newville, Matthias Kümmerer, Maximilian Bolingbroke, Michael Tartre, M. Pak, Nathaniel J. Smith, N. Nowaczyk, Nikolay Shebanov, O. Pavlyk, P. A. Brodtkorb, Perry Lee, R. McGibbon, Roman Feldbauer, Sam Lewis, S. Tygier, Scott Sievert, S. Vigna, Stefan Peterson, S. More, Tadeusz Pudlik, T. Oshima, T. Pingel, T. Robitaille, Thomas Spura, T. Jones, T. Cera, Tim Leslie, Tiziano Zito, Tom Krauss, U. Upadhyay, Y. Halchenko, Y. Vázquez-Baeza Keywords: SciPy, Python, Sparse matrix, Computational science, Algorithm, Linear algebra, Computational geometry, Interpolation, Signal processing, Image processing, Cluster analysis, Programming language, Library (computing), Machine learning, Input/output, Black Hole, Open-source software, File spanning, Computation https://arxiv.org/pdf/1907.10121v1.pdf
We're super busy this week and can't record, so we're taking a mini vacation until next Wednesday. We bring you this pycastR episode that we hope you'll find useful.
Eric Anderson (@ericmander) welcomes Peter Wang (@pwang) for a conversation about the Python ecosystem and the open-source communities that have built it. Peter is the creator of Anaconda, the near-essential Python distribution for scientific computing that makes managing packages a lot more manageable. In today’s episode, Peter offers a unique and powerful perspective on how to make the economics of open-source work for everyone. In this episode we discuss: The paradox of the PVM and Python’s packaging difficulties How Guido van Rossum implied permission for Anaconda and the open-source Python movement Python as the lingua franca of a new professional class Looking to Roblox for inspiration for a scientific computing creator community Giving back to open-source communities through the NumFOCUS Foundation Links: Anaconda NumFOCUS NumPy SciPy Enthought Jupyter TensorFlow MicroPython scikit-learn pandas Quansight Red Hat Roblox People mentioned: Travis Oliphant (@teoliphant) Fernando Pérez (@fperez_org) Brian Granger (@ellisonbg) Min Ragan-Kelley (@minrk) Guido van Rossum (@gvanrossum) James Currier (@JamesCurrier) Other episodes: NumPy & SciPy with Travis Oliphant TensorFlow with Rajat Monga
Eric Anderson (@ericmander) and Travis Oliphant (@teoliphant) take a far-reaching tour through the history of the Python data community. Travis has had a hand in the creation of many open-source projects, most notably the influential libraries NumPy and SciPy, which helped cement Python as the standard for scientific computing. Join us for the story of a fledgling community from a time “before open-source was cool,” and their lessons for today's open-source landscape. In this episode we discuss: How biomedical engineering, MRIs, and an unhappy tenure committee led to NumPy and SciPy Overcoming early challenges of distribution with Python What Travis would have done differently when he wrote NumPy Successfully solving the “two-option split” by adding a third option Community-driven open-source interacting with company-backed open-source Links: NumPy SciPy Anaconda Quansight Conda Matplotlib Enthought TensorFlow PyTorch MXNet PyPI Jupyter pandas People mentioned: Guido van Rossum (@gvanrossum) Robert Kern (Github: @rkern) Pearu Peterson (Github: @pearu) Wes McKinney (@wesmckinn) Charles Harris (Github: @charris) Francesc Alted (@francescalted) Fernando Perez (@fperez_org) Brian Granger (@ellisonbg) Other episodes: TensorFlow with Rajat Monga
Panelists Eric Berry | Justin Dorfman | Alyssa Wright | Richard Littauer Guest Travis Oliphant | Russell Pekrul Show Notes Hello and welcome to Sustain! Today, we have two guests from OpenTeams in Austin, Travis Oliphant and Russell Pekrul. Travis is the CEO and Russell is the Program Manager and the Founder and Director of FairOSS. We learn all about what OpenTeams and FairOSS are and how they work. Also, Travis tells us about the non-profit he started called NumFOCUS. Other topics discussed are dependencies and how their values are assigned, NumPy and SciPy, and building relationships with companies, which Russell mentions there is a bit of a “chicken and egg” problem here. There is some incredible advice and fascinating stories shared today so go ahead and download this episode now! [00:01:10] We find out what OpenTeams is and how it works. Travis also tells us when he wrote NumPy and SciPy and when he started OpenTeams. [00:07:18] Travis tells us about a non-profit he started with a bunch of people called NumFOCUS so there could be a home for the fiscal sponsor for open source projects. [00:09:24] Russell tells us what FairOSS is and how it works. [00:11:32] Alyssa asks Russell how does he first see the dependencies and then how does he assign that value? He mentions BackYourStack as a starting point. [00:13:00] Eric brings up one of the problems he’s found with trying to fund up open source is that it’s very difficult to solve the problem on more a grand scale. He wonders how Travis and Russell make the impact they want with the magnitude of problems they see. A key piece Travis brings up that they recognize is there’s a data gap and projects have to be participating. Alyssa wonders if projects are aware of their dependencies. [00:17:22] Richard asks about the dependency graph that they are making. 
He wonders how do you go down the stack and look all the way at the base and how do you judge the usefulness of what dependencies really matter for what code matters for the business proposition? Richard also wonders if anyone has done equity stuff for open source maintainers. [00:23:06] Alyssa is interested in learning more about how Travis and Russell are building the relationships with these companies and what we can do to help. [00:26:35] Alyssa asks Travis and Russell to talk about why this, why now, with this being a time of economic contraction, why is this important? Also, why have they been seeing traction during what can be difficult times for a lot of companies? [00:27:40] Eric asks if Travis can give an example of a project that he feels does that well, that doesn’t have to go through and do it twice, essentially. [00:29:48] Alyssa brings up investments around open source start-ups and how they start with a commitment towards open source and once the investment happens there’s a pivot. She wonders if Travis could talk about how this type of sustainability is shifting that model of these investments. Travis tells a story about speaking to the Founder of SaltStack and how their views matched. [00:34:03] We find out where you can learn more about FairOSS and follow them on this journey, invest, and join in. Spotlight [00:34:52] Justin’s spotlight is Curiefense, which extends Envoy proxy to protect all forms of web traffic. [00:35:15] Alyssa’s spotlight is Pixel8.earth. [00:36:06] Eric’s spotlight is OctoPrint. [00:36:53] Richard’s spotlight is Michael Oliphant’s work. [00:37:36] Russell’s spotlight is Conda. [00:38:20] Travis’s spotlight is Matplotlib. Quotes [00:03:25] “We were connecting and creating a social network long before the social networks started. 
That was the early days of social networks and it was addicting.” [00:04:14] “New libraries are starting to be written on numarray and we had SciPy written on Numeric and there was this fork in this fledgling scientific community in Python.” [00:21:18] “So that was a very exciting day. Actually, I remember I told my wife you know the problem I've been searching on for twenty years, I finally figured it out. I've been trying to figure out twenty years how to make this work, and I finally figured it out. I had to go start several companies and start a venture fund and get involved in finance and cap tables to really pull it off, but that got me excited. Now I also said, but we're at the base of Mount Everest, like all we've got to do is climb to the top of this mountain and we're there.” [00:22:44] “So you basically have a company and its value is spread to all the values of the projects. You have a bunch of those, have a thousand of those, that each add incrementally the value of a project. Invert the matrix and every project now has a linear dependency on companies that effectively you created an index fund out of every project.” [00:24:52] “The idea is if you can get open source contributors to recognize that they want to work only for companies that are participating people want to hire open source contributors. They're some of the best people to bring into your company.” [00:25:21] “We found that companies would absolutely sponsor PyData and the reason they would is because they're trying to hire people. They wanted to hire the best developers and they would. So, they really didn't care so much about the projects they started, but they wanted the people.” [00:27:10] “Go make an open source project, then get somebody or connect with somebody who's going to help you build a company that they'll vest in and build something else. 
So, you basically have to do it twice.” [00:28:34] “I've had the chance to work at companies large and small, go in and see that's used to do x, and realized it's added billions of dollars of value to a lot of work for the world. And yet, the same time NumPy struggled, not enough funding to maintain itself.” [00:30:15] “I spoke to the founder of SaltStack that just got acquired by VMware. I spoke to him about his view and it was amazing how much it matched mine, in a sense that he recognized that open source is you build some of the value and you use it. The way you need to make money is to build something that uses it but isn't the open source.” [00:32:41] “It's not you're monetizing open source, you're empowering, you're sustaining open source, by selling and connecting the economic value to the functional value that's there.” [00:33:04] “There will still be challenges. I'm not naïve. Every new thing comes with a whole set of new challenges.” Links OpenTeams (https://openteams.com/about) FairOSS (https://faiross.org/) FairOSS, PBC Twitter (https://twitter.com/faiross_pbc) FairOSS Community (https://community.faiross.org/login) Travis Oliphant Twitter (https://twitter.com/teoliphant?lang=en) Anaconda Dividend Program (https://www.anaconda.com/blog/sustaining-the-open-source-ds-ml-ecosystem-with-the-anaconda-dividend-program) Quansight (https://www.quansight.com/) NumFOCUS (https://numfocus.org/) BackYourStack (https://backyourstack.com/) Dask (https://dask.org/) SaltStack (https://www.saltstack.com/) SciPy (https://www.scipy.org/) NumPy (https://numpy.org/) Curiefense (https://www.curiefense.io/) Pixel8.earth Ambassador Program (https://pixel8earth.medium.com/kicking-off-the-pixel8-earth-ambassador-program-80a87a70fb3a) OctoPrint (https://octoprint.org/) Michael Oliphant's work (https://langev.com/index.php/author/moliphant/Michael+Oliphant) Conda (https://github.com/conda/conda) Matplotlib (https://matplotlib.org/) Credits Produced by Richard Littauer 
(https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr at Peachtree Sound (https://www.peachtreesound.com/) Special Guests: Russell Pekrul and Travis Oliphant.
Sponsored by us! Support our work through: Our courses at Talk Python Training pytest book Patreon Supporters Special guest: Jason McDonald Michael #1: 5 ways I use code as an astrophysicist Video by Dr. Becky (i.e. Dr Becky Smethurst, an astrophysicist at the University of Oxford) She has a great YouTube channel to check out. #1: Image Processing (of galaxies from telescopes) Noise removal #2: Data analysis Image features (brightness, etc) One example: 600k “rows” of galaxy properties #3: Model fitting e.g. linear fit (visually as well through Jupyter) e.g. Galaxies and their black holes grow in mass together Color of galaxies & relative star formation #4: Data visualization #5: Simulations Beautiful example of galaxies colliding Star meets black hole Brian #2: A Visual Intro to NumPy and Data Representation Jay Alammar I've started using numpy more frequently in my own work. Problem: I think of np.array like a Python list. But that's not right. This visualization guide helped me think of them differently. Covers: arrays creating arrays (I didn't know about np.ones(), np.zeros(), or np.random.random(), so cool) array arithmetic indexing and slicing aggregation with min, max, sum, mean, prod, etc. matrices: 2D arrays matrix arithmetic dot product (with visuals, it takes seconds to understand) matrix indexing and slicing matrix aggregation (both all entries and column or row with axis parameter) transposing and reshaping ndarray: n-dimensional arrays transforming mathematical formulas to numpy syntax data representation All with excellent drawings to help visualize the concept. Jason #3: Qt 6 release (including PySide6) Qt 6.0 released on December 8: https://www.qt.io/blog/qt-6.0-released 3D Graphics abstraction layer called RHI (Rendering Hardware Interface), eliminating hard dependency on OpenGL, and adding support for DirectX, Vulkan, and Metal. Uses native 3D graphics on each device by default. 
Property bindings: https://www.qt.io/blog/property-bindings-in-qt-6 A bunch of refactoring to improve performance. QtQuick styling. CAUTION: Many Qt 5 add-ons are not yet supported!! They plan to support them by 6.2 (end of September 2021). Pay attention to your 5.15 deprecation warnings; those things have now been removed in 6.0. PySide6/Shiboken6 released December 10: https://www.qt.io/blog/qt-for-python-6-released New minimum version is Python 3.6, supported up to 3.9. Uses properties instead of (icky) getters/setters now. (Combine with snake_case support from 5.15.2.) from __feature__ import snake_case, true_property PyQt6 also just released, if you prefer Riverbank's flavor. (I prefer official.) Michael #4: Is your GC hyperactive? Tame it! Let's think about gc.get_threshold(). It returns (700, 10, 10) by default, read roughly as: for every net 700 allocations of container objects, a gen 0 collection runs; every 10th gen 0 collection also collects gen 1; and every 10th gen 1 collection also collects gen 2, so roughly every 100th gen 0 run reaches gen 2. Now consider this:

    query = PageView.objects(created__gte=yesterday).all()
    data = list(query)  # len(data) = 1,500

That's multiple GC runs. We've allocated at least 1,500 custom objects, yet not one of them will ever be garbage. But we can adjust this. Observe with gc.set_debug(gc.DEBUG_STATS) and consider this ONCE at startup:

    # Clean up what might be garbage.
    gc.collect(2)
    # Exclude current items from future GC.
    gc.freeze()
    allocs, gen1, gen2 = gc.get_threshold()
    allocs = 50_000  # Start the GC sequence every 50K, not 700, allocations.
    gc.set_threshold(allocs, gen1, gen2)
    print(f"GC threshold set to: {allocs:,}, {gen1}, {gen2}.")

May be better, may be worse. But our pytest integration tests over at Talk Python Training run 10-12% faster and are a decent stand-in for overall speed. Our sitemap was doing 77 GCs for a single page view (77!); now it's 1-2. 
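The PageView query in that snippet is pseudo-code tied to a particular ORM, but the threshold mechanics are plain standard library and can be tried in isolation. A minimal sketch (the 50_000 figure is just the episode's example, not a recommendation):

```python
import gc

# CPython's long-standing defaults: gen 0 runs every ~700 net container
# allocations, gen 1 every 10 gen-0 runs, gen 2 every 10 gen-1 runs.
g0, g1, g2 = gc.get_threshold()

# Raise the gen-0 trigger so bulk-loading long-lived objects
# doesn't churn the collector...
gc.set_threshold(50_000, g1, g2)
assert gc.get_threshold()[0] == 50_000

# ...and restore the original setting afterwards.
gc.set_threshold(g0, g1, g2)
```

Because set_threshold takes effect immediately and globally, this is the kind of tuning you do once at process startup, then measure, as the episode stresses.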
Brian #5: Top 10 Python libraries of 2020, from Tryolabs. Criteria: They were launched or popularized in 2020. They are well maintained and have been since their launch date. They are outright cool, and you should check them out. General interest: Typer: FastAPI for CLI applications. I can't believe the first commit was right before 2020, just about a year after the introduction of FastAPI, if you can believe it. Sebastián Ramírez is on
Caitlin, Francis, Anya, and Alan reflect on Theft and disagree about what's great and what needed more work. But we all agree it has a wonderful moustache. Come fan the hammer with us as we learn about life in an observatory and the philosophy of pedagogy. Bad Wolf makes ‘His Dark Materials' and does not listen to our podcast. Probably. SciPy is a thing. Fanning the Hammer is a thing. What is an MP 40? I Ching Divination. Orientalism is a thing. Pedagogy of the Oppressed by Paulo Freire introduced the concept of Critical Consciousness in Postmodern Philosophy. The behind-the-scenes coverage for Season 2 has been great. We enjoyed “Welcome to Cittagazze” for how detailed the world-building (literally) was. Our theme song is Clockwork Conundrum by NathanGunn. Follow us on Twitter: Anya @StrangelyLiterl Caitlin @inferiorcaitlin Francis @franciswindram The Podcast @MoTPod Please email us: contact@hallowedgroundmedia.com
List of Python Programming Language Libraries You Should Know in 2020. • NumPy. • TensorFlow. • Theano. • SciPy. • eli5 0.10.1. • PyTorch. • LightGBM. • Keras. • Pandas. --- Send in a voice message: https://anchor.fm/the-ddsry-show/message
In episode 27, we interviewed Ralf Gommers from the NumPy and SciPy projects. We started our discussion by talking about his past research experience as a physicist and his transition to open source software and programming. This led him to get involved in projects such as PyWavelets, NumPy and SciPy. Following that, we had a great discussion about NumPy, its many features, its target audience and its performance. We learned why NumPy is not included in Python's standard library and about its overlap with SciPy. We also compared the combination of NumPy and Python to Matlab and how users could transition to this open source solution. We then had a brief discussion about SciPy and the features it provides. Ralf informed us of the positive results from Google's previous Summer of Code and Season of Docs participations. We discussed how to reach the project and the many kinds of contributions that they are looking for. We talked about the importance of FLOSS for science and attribution of research output. We finished the interview with our classic quick questions and a reflection from Ralf about the need for more sustainability in open source software development, as volunteer effort may not be sufficient in the future. 00:00:00 Intro 00:00:18 Introduction 00:00:33 Introducing Ralf Gommers 00:02:05 Research during his PhD and postdoc 00:03:20 When he started to use open source tools 00:03:52 Learning to code 00:04:39 PyWavelets, another side project he likes 00:05:55 His elevator pitch for NumPy 00:06:55 Vector arrays in Python before NumPy 00:07:49 How he got involved in the NumPy project 00:10:13 Target users for NumPy 00:11:36 NumPy as part of the standard library? 
00:13:24 Features provided by NumPy 00:14:22 Major differences between Python built-in list and NumPy's array 00:16:01 Structured data 00:16:45 Why appending a row to an array is made hard 00:18:09 Multithreaded code with NumPy 00:19:48 Distributed array processing 00:20:50 GPU computation with Python and NumPy 00:22:16 Linear algebra functions in NumPy 00:23:25 Overlap between SciPy and NumPy for linear algebra 00:23:55 Python speed as an interpreted language 00:25:43 Python with NumPy compared to Matlab 00:28:07 How easy is the transition between Matlab and Python NumPy 00:29:26 Performance difference between Matlab and Python 00:31:00 Commercial applications of NumPy 00:32:15 Contributions from the industry and incentives to contribute 00:34:10 Elevator pitch for SciPy 00:35:37 Overview of some of the submodules in SciPy 00:38:11 The size of the communities 00:39:33 Participation in Google Summer of Code 00:40:24 Participation in Google Season of Docs 00:41:48 Communication channels in the project 00:43:25 Where to ask for support? 00:44:48 Possible contributions 00:46:25 Skills useful to contribute to the NumPy project 00:48:12 Identifying possible contributions 00:48:52 The importance of FLOSS for science 00:52:02 Possible negative impact of FLOSS on science 00:52:49 Crediting contributions in science 00:53:42 Most notable scientific discovery in recent years 00:54:49 His favourite text processing tool 00:55:30 Volunteer effort may not be sufficient anymore 00:56:58 Contact information for Ralf Gommers 00:57:27 Outro
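Several of the points in this conversation (elementwise arithmetic versus list concatenation, axis-based aggregation, and why appending rows is deliberately awkward) are quicker to see than to describe. A minimal sketch, not from the episode itself:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

# list + list concatenates; array arithmetic is elementwise.
assert [1, 2] + [3, 4] == [1, 2, 3, 4]
assert (a * 2).tolist() == [[2, 4, 6], [8, 10, 12]]

# Aggregate down columns (axis=0) or across rows (axis=1).
assert a.sum(axis=0).tolist() == [5, 7, 9]
assert a.sum(axis=1).tolist() == [6, 15]

# "Appending" a row copies everything into a brand-new array; the
# original is untouched, which is why growing arrays row-by-row is costly.
b = np.vstack([a, [7, 8, 9]])
assert b.shape == (3, 3) and a.shape == (2, 3)
```

That copy-on-append behavior is the reason the interview calls row appends "made hard": the idiomatic pattern is to preallocate or collect in a list and convert once.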
Revisit the early days of Travis Oliphant's contributions to scientific Python and, by extension, Python's relevance to AI. Catch up on the high-energy current efforts of the creator of NumPy, SciPy, Numba, and Conda. Travis also founded the NumFOCUS Foundation. NumFOCUS is backing Jupyter and Pandas. In this episode, we break our 26.1 Minutes rule for a really great and important chat with this legend of Data Science. Enjoy an extra few minutes.
Quail Data #0007 - Stats Wars Rodolfo #1: MOSP The MONARC Objects Sharing Platform (MOSP) is a platform for creating, editing, and sharing validated JSON objects of any kind. MONARC - Method for an Optimised aNAlysis of Risks by CASES. You can use any available JSON schema to create new JSON objects through a web form generated dynamically from the selected schema. Sergio #2: Scikit Geometry "scikit-geometry also comes with functions to compute the Voronoi diagram, the convex hull, bounding boxes, the Minkowski sum of two polygons, an AABB tree for nearest-neighbor queries, and many other useful utilities for geometric computations, with plans to add many more!" Rodolfo #3: pandapy Let's take a moment to appreciate the following meme: https://www.reddit.com/r/mathmemes/comments/ewct2v/euler_moment/ Now, do you remember Pandas? And, on the other hand, NumPy? Well, you can think of this package as a child of both. PandaPy has the speed of NumPy and the usability of Pandas (10x to 50x faster). Just as you import pandas as pd and numpy as np, the convention is to import pandapy as pp (you know → pd & np = pp). Sergio #4: How to make your own blog without being a computer expert, with fast.ai and fast_template A very easy-to-follow guide to creating your own blog hosted on GitHub Pages without having to use the command line. It's very practical and easy to follow, and it now uses GitHub Actions to turn your Jupyter notebooks into blog posts. Rodolfo #5: Building a Python Data Science Container using Docker A blog post illustrating how to create a Docker container that includes packages such as NumPy, SciPy, Pandas, SciKit-Learn, Matplotlib, and NLTK. It's all done by building a Dockerfile based on Alpine, a very lightweight version of Linux. 
The post gives you all the commands to spin up the container. Sergio #6: Juvenal Campos's blog - How to Visualize Population Pyramids in R A step-by-step guide to building a population pyramid with ggplot2. Juvenal uses R's blogdown for this blog - we should all blog more! Extras: Sergio: Lorem Ipsum, but Mexican? haha https://ignaciochavez.com/projects/lorempaisum/ RStudioConf is here in San Francisco this week, and the workshop materials are on GitHub for anyone who couldn't attend: https://github.com/rstudio-conf-2020 Rodo: For the Pythonistas listening, the date for PyCon Latam 2020 is set! August 27-29, Puerto Vallarta, Jalisco. Don't miss it! (https://twitter.com/PyLatam/status/1221886633210982402) Meme of the week --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app --- Send in a voice message: https://anchor.fm/quaildata/message Support this podcast: https://anchor.fm/quaildata/support
This week John discusses his recent experience teaching a large group at SciPy and why software testing is important. John's Tutorial Tutorial Page Fun Paper Friday Beer bubbles - turns out there is more than you'd think to explain their physics! Shafer, Neil E., and Richard N. Zare. "Through a beer glass darkly." Physics Today 44.10 (1991): 48-52. Contact us: Show Support us on Patreon! www.dontpanicgeocast.com SWUNG Slack @dontpanicgeo show@dontpanicgeocast.com John Leeman www.johnrleeman.com @geo_leeman Shannon Dulin @ShannonDulin
scikit-learn provides simple and efficient tools for data mining and data analysis that are accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib.
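As a taste of that "simple and efficient" claim, a minimal sketch (not official example code): every scikit-learn estimator follows the same fit-then-predict contract, shown here with LinearRegression on noiseless linear data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Four noiseless samples of y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Every estimator follows the same fit/predict contract,
# which is what makes swapping models cheap.
model = LinearRegression().fit(X, y)

assert abs(model.coef_[0] - 2.0) < 1e-9
assert abs(model.intercept_ - 1.0) < 1e-9
assert abs(model.predict([[4.0]])[0] - 9.0) < 1e-9
```

Because the contract is uniform, replacing LinearRegression with, say, a tree-based regressor changes one line, not the surrounding pipeline.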
In this episode I cover what Modules are, how they are used, the ones that ship with the Python language itself, and a mention of the most notable ones available from the community. . Official documentation page: https://docs.python.org/3/library/index.html . Python standard-library modules: TIME, DATETIME, RANDOM, MATH, STATISTICS, OS, OS.PATH, PATHLIB, SYS, SQLITE3, HASHLIB, CSV, GZIP, ZLIB, BZ2, LZMA, ZIPFILE, TARFILE, TKINTER,... . Third-party modules for Python: NumPy, SciPy, SymPy, BioPython, SQLAlchemy, Colorama, wxPython, PyQt, PyGTK, Kivy, Matplotlib, Seaborn, Bokeh, PyGame, PyGlet, Twisted, Scrapy, NLTK, Requests, Pillow, Keras, PyTorch, scikit-learn, Pandas, Theano, TensorFlow,... . Here is my website: https://unosycerospatxi.wordpress.com/ . Cheers!!!!! I hope you like it!!!
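As a tiny taste of three of the standard-library modules in that list (RANDOM, STATISTICS, MATH) working together — the sample size and tolerances here are arbitrary choices for illustration:

```python
# random (sampling), statistics (descriptive stats), math (numeric helpers)
import math
import random
import statistics

random.seed(42)  # make the pseudo-random draws reproducible
samples = [random.gauss(0, 1) for _ in range(1000)]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)

# With 1000 draws from N(0, 1), the sample mean and standard deviation
# should land close to 0 and 1 respectively.
print(math.isclose(mean, 0.0, abs_tol=0.2), 0.9 < stdev < 1.1)
```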
SciPy is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. SciPy provides many user-friendly and efficient numerical routines such as for numerical integration and optimization. SciPy runs on all popular operating systems, is easy to use, and powerful enough to be depended upon by the world's leading scientists & engineers.
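Two of those subpackages can be demonstrated in a few lines — a minimal sketch assuming SciPy and NumPy are installed; the integrand and objective function are arbitrary textbook choices:

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abserr = integrate.quad(np.sin, 0, np.pi)

# Optimization: the minimum of (x - 3)^2 sits at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

print(round(area, 6), round(result.x, 6))  # → 2.0 3.0
```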
On this episode of the Bike Shed Chris is joined by George Brocklehurst, development director in thoughtbot's New York studio. The conversation starts with a discussion around progressive enhancement and the state of the modern web, and then shifts to focus on George's recent explorations of machine learning. This episode is a perfect introduction to the topic of ML, and provides a great summary of why you might want to start working with it and how to go about that. Does Progressive Enhancement Have a Place in Today's Web? Vue.js Electron React Native React Native for Desktop Frameworks and Tools For Exploring Machine Learning NumPy SciPy Jupyter Notebook Pandas scikit-learn Natural Language Toolkit (NLTK) spaCy gensim Getting Started with Machine Learning: Intro to Machine Learning Workshop What is Machine Learning? Named Entity Recognition Recommending blog posts with machine learning
Travis Oliphant is the Founder & CEO of Quansight: a company that bridges open-source communities and innovative companies by growing talent, building technology, and discovering new products. For years Travis has been an indispensable contributor toward data science’s open-source movement through so many different outlets: Founder, Director, & Former CEO @ Anaconda, Inc: a free and open source distribution of over 250 popular data science packages for Python and R, used by over 6 million users. Founder, Chairman of the Board @ NumFOCUS Foundation: the world-renowned open-source community promoting open code development and reproducible scientific research. President @ Enthought: a software company best known for the early development and maintenance of the SciPy stack. Creator of NumPy, SciPy, Numba, & XND: all invaluable open-source Python libraries Before founding Continuum Analytics (later renamed to Anaconda) in 2012, Travis received a Ph.D. from the Mayo Clinic, B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University, and spent nearly a decade thereafter as an Assistant Professor of Electrical and Computer Engineering at BYU before phasing out of academia to focus on creating open-source software to support industry. Naturally, our conversation focused on his work with creating, evolving, and funding open-source software, covering topics like: How growing up in a bad neighborhood and being raised Mormon led him to an early fascination with computers and a laser-focus on helping people What led him to take a leap from the stability of a 10-year academic career to trying to support a family while working on free software Why open-source software has been such a centerpiece throughout his career, the inflection points he’s experienced over the past 20 years Why community-based, open-source models have been severely underfunded, how Travis has managed to “monetize open source” with his newest company, Quansight. 
The open-source community’s recent shift “away from ‘local co-op’ and toward ‘big agriculture’”, the challenges Travis is seeing pop up The story behind how he first wrote the NumPy library (all by himself!) over 4-5 months of 60-70 hour workweeks before others started to see any potential All of the amazing open-source projects he and his team need support with from brilliant data scientists like you :) Enjoy the show! Show Notes: https://ajgoldstein.com/podcast/ep15/ Travis’ LinkedIn: https://www.linkedin.com/in/teoliphant/ Travis’ Twitter: https://twitter.com/teoliphant AJ’s Twitter: https://twitter.com/ajgoldstein393/
This week we have a host of people gathered in a conference room at the scientific python conference to discuss what we've seen and what we're excited about. SciPy 2018 Contact us: Show Support us on Patreon! www.dontpanicgeocast.com SWUNG Slack @dontpanicgeo show@dontpanicgeocast.com John Leeman www.johnrleeman.com @geo_leeman Shannon Dulin @ShannonDulin
Fernando Perez is best-known as the creator of IPython and co-founder of Project Jupyter: a set of open-source data science tools that some may consider to be the equivalent of the bat & ball to the sport of baseball. Today, you really can’t play the game of data science without Jupyter Notebooks and our guest today is one of Jupyter's leads and originators (see here for the rest of the amazing team). Fernando is also an Assistant Professor in Statistics at UC Berkeley, Researcher at the Berkeley Institute for Data Science, and Founding Board Member of the NumFOCUS foundation — the community that creates the SciPy stack, along with virtually every other notable open source data science tool out there. This conversation was recorded in-person with Fernando in his office on UC Berkeley’s campus, and it turned out to be the most humanizing, energizing, and down-to-earth interview I’ve had so far. Some of the many topics we covered include: what Fernando wanted to be while growing up in Medellin (Me-de-jean), Colombia the function that formal education played in his learning of data science the story behind IPython and Project Jupyter and its evolution over the past 10 years lessons learned about technical competence and human character from his mentors over the years what a “computational narrative” means to him and why its principles are key to data storytelling Fernando’s experience teaching a 650-student course (part of a pair of courses that are the largest of its kind) as part of the Berkeley Institute of Data Science Enjoy the show! Show Notes: https://ajgoldstein.com/podcast/ep7/ Fernando’s Twitter: https://twitter.com/fperez_org AJ’s Twitter: www.twitter.com/ajgoldstein393/
In episode 4, Marcel and Alexandra present their work as data scientists with Python. We talk about the various frameworks such as NumPy, SciPy, scikit-learn, and many more. We also explain the basics of machine learning, different problem types, learning algorithms, and metrics. Deep learning, exploratory analysis with notebooks, and the Python distribution Conda also come up.
Katharine Beaumont: @katharinecodes Show Notes: In this episode, we hit the topic of machine learning from a 101 perspective: what it is, why it is important for us to know about it, and what it can be used for. Transcript: CHARLES: Hello everybody and welcome to The Frontside Podcast, Episode 94. My name is Charles Lowell, a developer here at The Frontside and your podcast host-in-training. Today I'm going to be flying it alone, but that's okay because we have a fantastic guest who's going to talk about a subject that I've been dying to learn about. But you know, given the number of things in the world, I haven't had a chance to get around to it. But with us today is Katharine Beaumont who is a machine learning consultant. And she's going to talk to us, not surprisingly, about machine learning. So welcome, Katharine. KATHARINE: Hello. Thank you very much for having me. CHARLES: No, no, it's our pleasure. So, I guess my first question is, because I'm very much approaching this from first principles here, is what is machine learning as a discipline and how does it fit into the greater picture of technology? KATHARINE: Okay. Well, if you think about artificial intelligence, which is one of those slightly undefinable fields because it encompasses so much — robotics, linguistics, math, probability, philosophy — it has six main elements. So, a really basic definition of machine learning is getting, and this comes from Arthur Samuel in 1959, it's about getting computers to learn without being explicitly programmed. And that's hugely paraphrasing. But machine learning is an element that sits under the wider discipline of artificial intelligence. Artificial intelligence is one of those tricky to define fields because people have different opinions about what it is. And obviously philosophers can't agree what intelligence is, which makes it slightly complicated. 
But artificial intelligence as a broad brush is a discipline that borrows from philosophy, math, probability, statistics, linguistics, robotics, and spawned subfields like natural language processing, knowledge representation, automated reasoning, computer vision robotics, and machine learning. Machine learning is the, in a sense, the mathematical component of artificial intelligence in that from a basic point of view, even though you're looking at it from the perspective of computer science, you're utilizing algorithms that a lot of mathematicians will say, “Look, we've been doing this for years. And you've just stolen that from us,” that try and find patterns in data. And that pattern could be as basic as mapping, say, the square footage of a house to the price that it will sell at and making a prediction based on that for future examples, or it could be looking for patterns in images. CHARLES: Okay. You mentioned something that I love to do. I love stealing ideas from other disciplines. It feels great. KATHARINE: Who doesn't? CHARLES: Yeah. It's like free stuff. And the best part of ideas is the person who had it still has it after you've lifted it off of them. KATHARINE: Yeah. You just have to reference and then it's not plagiarizing. CHARLES: Yeah. So, how did you actually get into this? KATHARINE: Well, a few years ago, I was desperately bored in my job. CHARLES: So, what was that job that you were working on that was so desperately boring? You don't have to name a company. KATHARINE: Oh, I won't name the company but I will – I have to make a confession now which links back to something that we were saying off recording earlier, which was that it was doing web development. So, I'm sorry. And that's not to say that web development is boring at all. It's just that I wasn't particularly engaged, which is not a reflection on web development. CHARLES: No, no, no. 
I actually came – I was doing, before I got into web development, I was actually doing backend stuff for years. That was all I did. KATHARINE: Yeah, me too. I would have described myself as a server-side Java developer who then cross-trained into Ruby. And I thought I'd be doing exciting backend things in Ruby. But unfortunately, it was more, “We'd like you to move this component from this part of the page to this part of the page.” And I didn't really connect with that. And I started to wonder if I even should be a developer. CHARLES: Wow. KATHARINE: Larger forces than myself were at work to try and push me into management or analysis. And as happens, I think, after a few years. So, in my spare time, I started looking at a website (and I'm sure you've heard of it): Coursera. CHARLES: Yeah. KATHARINE: So, this is the birthplace of the massive online, I can't remember what the second O is, MOOCs. Massive Online something learning. Maybe a Q in there. I'm not sure what. Do you know what the acronym is? CHARLES: I actually don't know. KATHARINE: Well, MOOCs anyway. Massive online learning courses. And there was one offered by Andrew Ng from Stanford on machine learning. So, I took that and I just loved it. I really enjoyed it. And I really connected with the programming. I really enjoyed the programming. It was very fulfilling. So, it grew from that, really. And now, I've decided to go back to university. So, I'm a mature postgraduate student and I'm just currently weighing up my PhD options. So, whether to sacrifice four years for the greater good and the pursuit of knowledge or go back into an employment. So, we'll see. We'll see. And I'm quite enjoying not being employed, I have to admit. Or being employed on a freelance basis. It's wonderful. CHARLES: Right, right, right. Now, a couple of things struck me when you were talking about – so obviously, you're studying a lot of the mathematics behind it. 
And you said that machine learning involves a lot of the – it's the mathematical component of artificial intelligence. But what strikes me is learning, to me, implies a lot of statefulness where you're accumulating state. Whereas my experience with mathematics is usually you're solving equations. You start from some set of facts and whether it's a dataset or some other thing, and you derive, boom, boom, boom, boom, boom, you get your answer. Whereas with learning, at least when I think about school learning, like spelling or, I don't know, paleontology or something, you're accumulating facts over a very long time. And the inferences that you make are not necessarily – they're drawn from all of the sources that you got over all that period of time rather than some one set of facts that then you make this logical argument and poof, presto, you've got your answer. How does that square? I guess it's just a little bit off from my experience with mathematics. KATHARINE: So, I am being a bit reductionist. So probably, one way to explain it is that essentially, behind a lot of the machine learning algorithms, you're inputting numbers. And that might be the percentage of red, green, blue in a pixel for example. Or it might be the diameter of a wheel, for example, if you're looking at a component of a car. Or it might be a binary configuration if you want to input the configuration of a control panel, for example, and you're looking for anomalies. And you're running these numbers through an algorithm. And what you're getting out is either a continuous value, if you're looking at a problem with continuous data like house prices, or you're getting a probabilistic output like 60% certain these pixels together make a cat, for example. So, I am simplifying by saying it's math because what you're really doing is looking for patterns in data but a way to get a computer to understand it is to somehow input it as numbers, essentially, and to get numbers out of it. CHARLES: Oh. 
KATHARINE: Yeah, it's more algorithms, really. And I shouldn't have said that it was essentially math because I'm sure I'm going to get shouted at on the internet. CHARLES: Well, I certainly don't want to get you in trouble. But maybe we should shy away from the high theoretical stuff a little bit, and bring it back. If I'm excited, not even if I'm excited, why should I be excited about it? I've heard that it's a hot topic. I've heard that a lot of people are excited about it. Is there a way that I as someone who has no specialization in this might actually be able to bring some of these techniques to bear on the problems that I'm working on? Perhaps even without understanding them first, like understanding how it works. What are some problems that I might be able to attack with these techniques? KATHARINE: Yeah, absolutely. So actually, one of the things about machine learning that I should say is don't think, “Oh, it's not for me. I'm rubbish at math. I don't understand these concepts. I'm not willing to get my head around an algorithm,” because there are so many pre-configured APIs available from big companies like Google and Amazon and Microsoft and IBM, and I'm sure many, many more. And I'm not paid by any of them, I should say. So, you don't need to understand the inner workings of an algorithm to use it. So, one example is speech-to-text. So, if you imagine that you're working on a website and you want to make it accessible, maybe you could have a component in your navigation bar that allows users to record their voice and say, “I want to navigate to the shopping cart,” for example. And machine learning would be behind that processing. So maybe, behind it you'd have an API, I've used a few of them before just to play with it, where you make a call to, say, an IBM service and it returns you the text. And in your program you match on keywords like shopping cart and then change the menu bar for them. So, that's one really simple way you could do it. 
Another more complicated way to do it is to implement something like a recommender system. So, say you have a website where you offer customers products of some description. And the most famous example of this is Amazon and Netflix. Amazon, the shopping site, rather than now the big, big corporation. And you see what other customers like you bought, or Netflix, what you might enjoy. And that's based on taking your information, comparing your viewing habits to other people's viewing habits, and then drawing some kind of correlation between the programs you watch and trying to find programs that other people have watched that you haven't, that you might enjoy. That's more complicated, to be honest. CHARLES: But that is an example. Machine learning is what underlies all that. KATHARINE: Absolutely. And at the heart of some recommender systems, the mathematics behind it is finding a way to quantify people's preferences and measuring distances between them. But you don't need to understand that to understand the basics of how a recommender system works. CHARLES: Okay. And so, how does a recommender system work? KATHARINE: So, imagine me, yourself, and Mandy each read four books. And we rate them. But I read four books, you read three of the same books, and one different one, and Mandy reads three of the same books as me and one different one, for example. So, we've got a little gap but we're not really sure what the other person will think. And we know the genres of the books. And you can compare the genres and the ratings. So, you might rate sci-fi 6 out of 10 and romance 7 out of 10. And I might rate sci-fi and romance in equal ways. So then you might say, okay, there's a similarity between our preferences. So for this book, that Charles read, Katharine might like it. CHARLES: That makes so much sense. KATHARINE: Yeah. And maybe Mandy only likes romance, only rates it 0.3 for example. 
So we think, “Okay, well Mandy might not be able to recommend a book to Katharine and Charles.” CHARLES: Right, I see. Implicit in this though is there's this step of the actual learning, I guess. Or the actual teaching. How do you actually teach? Again, and this is kind of me trying to wrap my head around the concept, is I've got these set of facts and I'm inferring and I'm pattern-matching and I'm trying to draw conclusions with some certainty from this set of data. But is there this distinct actual teaching phase where you have to actually teach the computer and then it takes new facts and gives stuff? How does that work? How does it incorporate the different – I guess what I'm saying is, are there distinct phases? Or is it… KATHARINE: Yes, and it's not the same for every algorithm. So, I'm going to try and give you two examples of training. So, there's something called online learning where as a new example comes in, for example it might be – I'll just explain what a classifier is. So, this is a brief diversion. So, a classifier is a machine learning process where you're trying to put information in and you're trying to get discrete information out. So, discrete meaning like it's a cat or it's a dog or a weasel or a minion or something like that. Or, it's cancer or it's not cancer. Whereas continuous output might be the price of a car. So, in a classifier you're trying to work out what type something is. So, a really good example is there's a Google project where they've clustered artworks. So, they've taken lots of different artwork and their algorithms, which I won't explain now, have determined, “This is a ballet dancer. So, we're going to group all of these ballet dancer paintings together. This is a landscape, so we're going to group all the landscapes together.” So, online learning, you might get a bit of information in, like a picture, and you will classify it and you add it to your existing information. 
Whereas another type of learning is you take all of the information you have, you train the algorithm, and then you make predictions. So, you either make predictions as you go along with online learning, or you do all of the work upfront. So, one algorithm – have you heard of decision trees? CHARLES: No, I haven't. KATHARINE: So, do you ever read those rubbish teen magazines where you have a flowchart and it starts at the top like, “Do you like cats? Yes or no?” CHARLES: Oh right, yeah. KATHARINE: “Do you like dogs? Yes or no,” and it tells you what kind of a person you are or what makeup you should wear or something like that. CHARLES: Right, right, right, yeah. KATHARINE: Yeah, so decision trees are kind of like that. One example is you might get information about, it's a famous toy dataset, information about passengers on the Titanic about gender and age. And we all know the techniques on the Titanic, women and children first. I've lost my use of normal English. I'm sorry. CHARLES: Right. Maybe like a trope? KATHARINE: Yeah. So, women and children first. So, you put this data in a decision tree. And what happens is at the beginning you have all of this data. So, person A is a male. They're in their 50s. This is the type of ticket they had. And this is their income. I don't think income is one of them, but just as an example. And then you have person B, person C. So, you might have a hundred people. And the decision tree algorithm goes, “Okay, if I just looked at one of these features like gender, would that differentiate the people the most?” So, it already has the answer as to whether or not they survived or did not survive. And it's looking for the one feature that gives it the most information. And then it will split on that feature. So, you go from your thing at the top and the first question might be, “Were they male or female?” And then the decision tree will split down a level. 
And then your algorithm will go, “Alright, what's the next feature?” Maybe the next feature is, “Were they under the age of 30?” for example. And it works down. And you end up with this sort of flowchart. And once it's trained, you then get a new example, person X, and you basically just work your way through the decision tree to make the prediction of whether they survived or did not survive. CHARLES: I see. And so, do you do it with some sort of certainty? Because there's going to be variation, right? You're going to have some people who are a poor fellow in his 70s who survived. Like, it's not certain that he went down but there's some probability at the end? KATHARINE: Yes. Because you're using it to make a prediction, there is always a probabilistic element. And it depends on the decision tree algorithm. So, there are some decision tree algorithms that really don't work with contradictory data, for example. CHARLES: I see. KATHARINE: There's an element of picking your algorithm. If you're approaching a machine learning problem, you have three elements. The first one is choosing your features and your algorithm. Then you have evaluating it, so you need a way of saying how good or bad the algorithm's doing on your data, how accurate is it for example. And then you have optimizing it, which is the dark art of machine learning. CHARLES: Right. That's the rap across the knuckles. It's coming up with wrong numbers. KATHARINE: Yeah. That's – oh no, this took two weeks to run and I need it to take 20 seconds. Or, this is only 60% accurate and I need it to be more accurate. And it's easy, just as an example I'm just working on a course that I'm giving in a few weeks' time. And I just took a dataset from a website called Kaggle, K-A-G-G-L-E, for wine quality. So, it's got acidity, citric acid, residual sugar, chlorides. They're the features. They're the components that you're going to put in. And it has a quality score. 
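The splitting idea Katharine describes — pick the feature that gives the most information about the label — can be sketched in plain Python. The mini "Titanic" rows below are invented for illustration, not real passenger data:

```python
# Toy decision-tree step: choose the most informative feature to split on.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, labels, feature):
    """How much splitting on `feature` reduces label entropy."""
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

rows = [
    {"sex": "female", "adult": True},   # survived
    {"sex": "female", "adult": False},  # survived
    {"sex": "male",   "adult": False},  # survived
    {"sex": "male",   "adult": True},   # died
    {"sex": "male",   "adult": True},   # died
    {"sex": "female", "adult": True},   # survived
]
labels = [1, 1, 1, 0, 0, 1]

best = max(["sex", "adult"], key=lambda f: information_gain(rows, labels, f))
print(best)  # → sex
```

With this made-up data, "sex" separates survivors from non-survivors better than "adult", so a tree would ask "male or female?" first — exactly the "one feature that gives it the most information" step from the conversation. A full tree just repeats this recursively on each branch.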
So, what you're trying to do is you're trying to find a relationship between the features and the quality. And it's very easy to get it to work. So, I now have it working. But I have it working with a really low accuracy. So, it took maybe five minutes to get it to work and it's going to take me about half an hour to make it more accurate, and that's the optimization element. CHARLES: I see. How stable are these processes? Is it finicky and fragile so that if you get new types of features it just throws things way off? Or are there ways you could control for that? KATHARINE: Well, that's a particular type of problem and that all links in with the evaluation and optimization. And that's to do with something called overfitting. So, decision trees is an algorithm notorious for overfitting, depending on the data, that is. So, overfitting is when you get the algorithm to perform really well on the training data. And then you feed it in a new example and it might grossly misclassify it because it hasn't learned to generalize beyond the examples that you've given it. CHARLES: I see. Okay. So, it's just too concrete. It hasn't recognized deep patterns. It's only recognized something superficial. KATHARINE: Yes. So typically, if you have a finite amount of data, you only train your algorithm on a certain percentage. And then you test it on the rest. CHARLES: I see. KATHARINE: So, you hold back some data. But back to our conversation earlier about when does the training happen? Another example is something called K-nearest neighbors. You could just imagine that means three nearest neighbors, for example. So, we're trying to find who we're most similar to in a room. So, it's a room full of people. And all the people are standing next to already similar people, for example. So, you might have a room where marketing's in one corner and the software developers are in another corner and the project managers are in another corner. 
And you go in and you're looking for the three people who are the most similar to you and you're going to go and stand in that group, for example. So, in that type of machine learning, the training is happening as each new sample comes in, rather than upfront. And there are drawbacks to both methods and there are positives to both methods. And really, in a horrible, unsexy way, it's to do with the data. And that's normally where most people switch off, because it's the most boring part when you're talking about machine learning. CHARLES: The data? KATHARINE: Yeah. It's this backlash from data scientists, from the golden age of data scientists where it was the hottest job on the internet to now everyone cringing going, “Oh, I really don't want to deal with my data.” But with any machine learning problem, you can't just go, “Okay. Here you go, Charles. Here's a dataset. Learn something from it.” You need context. You need to understand it. You need to have an idea of what you're looking for. So, you're getting the machine to learn but you're using it as a tool to complement your knowledge, really. And you're feeding in your knowledge to it. And part of that are the decisions that you make on the algorithms to use. CHARLES: Okay. KATHARINE: And what you're looking for. CHARLES: So, here's something that I'm wondering is related, and again I have no idea – for some reason I always associate when people talk about neural nets as being related to teaching a computer something. Is that part of the discipline of machine learning? Or no? KATHARINE: I would say it is. But it has its own cool and trendy title of Deep Learning. But it's very much powerful machine learning. So, let's go back to this. This is a classic example of machine learning. It's probably the first example you'll come across if you do any course. House prices. 
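The "room full of people" picture of k-nearest neighbours from a moment ago can be sketched directly; the 2-D "office floor" coordinates below are made up for illustration:

```python
# Hand-rolled k-nearest neighbours: classify a newcomer by majority vote
# of the k closest already-labelled points.
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(points, labels, query, k=3):
    nearest = sorted(zip(points, labels), key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Three departments standing in three corners of a room.
points = [(0, 0), (1, 0), (0, 1),      # marketing corner
          (9, 9), (10, 9), (9, 10),    # developer corner
          (0, 10), (1, 10), (0, 9)]    # project-manager corner
labels = ["marketing"] * 3 + ["dev"] * 3 + ["pm"] * 3

print(knn_predict(points, labels, (9.5, 9.5)))  # → dev
```

Note there is no upfront training step: all the work happens at prediction time, when each new sample comes in, which is exactly the contrast Katharine draws with training-upfront algorithms like decision trees.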
So, imagine that you have a piece of paper and you're going to draw one line at one side and that is the price of a house, and you're going to draw a line at the bottom which is the size in square feet, and you're going to plot examples that you have. And you might find that there's a linear correlation between the two. So, you'll draw a line and that line of best fit is the human equivalent of doing linear regression on a computer, for example, where you're just trying to find a linear correlation between two things. And a lot of the principles in linear regression, which is a very simple learning algorithm, are found in some examples of neural networks, so some basic examples of neural networks. But instead of having one input, the size in square feet, you might have 20. And you might be repeating that process of trying to find the line of best fit with different combinations of features in different places again and again. And it scales up in complexity very quickly. But it's very similar basic principles. I'm hesitating to say ‘very similar' because they're notoriously more complicated. CHARLES: Yeah. I guess I'm not really divining what exactly, what makes it – why is it called a neural net? What makes it special? It sounds like if I'm just comparing the regressions of house prices, I'm comparing those datasets over and over again, how is that different from just a loop? What are you getting out of it? KATHARINE: Historically, neural networks come from a very simplified idea about how the brain works. So, in the early 20th century people had performed autopsies and divined the inner workings of kidneys and hearts and livers. And the brain was still a bit of a mystery. And then two men jointly won the Nobel Prize for Physiology, and I'm going to pronounce these names wrong, so I'm sorry. I think it's Santiago Ramón y Cajal is one and Camillo Golgi is another. 
And they completely disagreed about the brain but they used a staining technique from Golgi to look, using silver nitrate, at the cells in the brain. And the idea of the neuron doctrine was borne out of that, that the simplest unit to look at the brain at in order to understand it is the level of the neuron, this cell in the brain. And from there, several – well, everybody was a polymath really, back then, so I don't want to say computer scientists. So, you had people like Frank Rosenblatt with perceptrons, Marvin Minsky and so many other people looking at a very simple idea which is that a neuron either fires or doesn't. And then you're linking Boolean algebra to this cell. So, you're saying it either fires or it doesn't. And from that principle, people started drawing similarities between neurons and a basic function machine. And when I say function machine, I mean imagine when you were in elementary school and you're learning how adding up works. You might have a box with a plus on it and your teacher says, “I want you to put a three in a box and a four in a box and I want you to add them together. And what do you get out?” And obviously the answer is seven. And you can think about that little box with a plus as a function machine. So, now you could think of a little mathematical function machine where you put in some inputs and there something happens in the box. And then you'll either get a one or a zero out of it. CHARLES: And so, that's like your neuron, is the little box? KATHARINE: Yes. CHARLES: Okay. KATHARINE: Yes. So since then, the neuron doctrine is pretty much contested. There are several other elements of the brain that compose thinking and the circuitry. So, any cognitive scientist listening to this will say, “That's really not how the brain works.” You have to say it with a lot of disclaimers. But the whole idea of neural networks was borne out of this idea of thinking of a neuron as like a function machine. CHARLES: Right. 
And also, it doesn't actually discount the usefulness of neural networks. There are a lot of things where people didn't find what they set out to find but what they found was useful. KATHARINE: Absolutely. Yeah, and they are incredibly powerful, especially with multilayer networks which are deep networks or deep learning. CHARLES: Okay. So, I didn't want to derail you from your explanation. So, you've got these little function boxes and those are the kind of neurons inside the neural network? KATHARINE: Yes. So, what happens is you might have a layer of 10 of them and you might have another layer after that of another 10. And each of them are connected in a simple neural network. And then you might have an output layer of five because you're looking to classify, I don't know, an apple into five different types of apple, for example. CHARLES: Now, when you're talking about a layer, you're talking about, I've got the outputs of one layer of the network are the inputs to the next layer. KATHARINE: Yes. And in different neural network architectures they'll be connected differently. But in a simple neural network, you can assume that every neuron is connected to every neuron in the next layer. So, if you have two layers, each with 10 neurons in, the top neuron in one layer is going to have 10 connections going out of it into the next layer. CHARLES: Oh really? Wow, that's interesting. KATHARINE: Yes. And each connection has its own sort of configuration. CHARLES: So, you're like cross-wiring all the – okay. Wow, that's kind of… KATHARINE: And yeah, that's where the complexity comes in. CHARLES: Yeah. I was thinking it was like a simple exponential fan-out. But it's even more complicated, the number of combinations you can get. KATHARINE: Yes. But in theory, it's very simple because each neuron is like this function machine with inputs coming in and something going out. It just might go out to several different locations. 
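The "function machine" picture can be sketched in a few lines of Python. All the numbers below are made up for illustration: each neuron sums its weighted inputs and either fires or it doesn't.

```python
# One "function machine" neuron: weighted inputs in, a 1 or a 0 out.
# The step activation captures the fires-or-doesn't idea from the transcript.
def neuron(inputs, weights, threshold=1.0):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Toy apple features: color, texture, weight, acidity, shininess (invented values)
features = [0.8, 0.3, 0.5, 0.2, 0.9]
weights = [0.5, 0.1, 0.4, 0.2, 0.7]
fired = neuron(features, weights)  # weighted sum is 1.3, so this neuron fires
```

A fully connected layer is then just each neuron in one layer receiving every output of the previous layer, with its own weights on each connection.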
But it scales up in complexity very quickly. So say we're classifying apples. So, we have five different attributes of an apple like its color, its texture, its weight, acidity. I don't know how else you would measure an apple. How shiny it is, for example. And we're putting all of that information in. And each one of those things that I've just listed is a feature. So, we might have an input per feature and we'll link them all up to the neurons in a layer and we'll move that information onto the next layer. And the really important thing is what's happening on those connections between them. Because on the connections you have weights, which is a way of changing the input from one neuron to another. So, the weight might be like 0.5 for example. So, it squashes the input. Or it might magnify it. And then you get the output. Now, once you have the output you can compare the output with what you know the right answer is. And then you have this idea of an error. So, you might be like, “You got this so wrong. It's not a Braeburn apple at all. It's actually a Granny Smith apple.” And what you do is with each example that you train on, you use your information about the error to train the network. Because what you're trying to do is get the error to be as small as possible. And one of the techniques for that is called back propagation. And it's notoriously difficult to understand because it involves partial derivatives and a large element of calculus. But essentially, what you're doing is comparing the right answer with the answer that the network gave it and asking it to go back and change it, change those connection weights. CHARLES: And so, do you make them fluctuate at random or is there some – is there a method to the madness of changing the weights? KATHARINE: There is a method to the madness and it's called back propagation. 
And the reason I linked linear regression in early, so our really simple map of house size in square feet, and the price, is because it uses a similar technique called gradient descent which is an algorithm for looking at the error and changing the weight, those numbers on the connections, to try and get it to a minimal point. So, if we go back to our house price problem, I just want you to imagine in your mind that we've got this one axis going up which is the price, and one going across which is the size in square feet. And you've got a line drawn, a diagonal line. If you just imagine now in your mind moving that line down till it's completely flat at the bottom, and then moving it up so it's vertical, so it's aligned with either axis, and every point in between, what you could do is you could take the error on each of those lines. So, if we imagine we have these two axes, we have the price of a house on one side and we have the size in square feet on another. And we're going to draw a line from the top axis and we're going to sweep it down and draw a line at each point as it goes down until it's aligned with the bottom axis. And at each point we draw that line, we measure the error. So, what we'd do for that is all of the little points, all of the example data, we'd measure the difference between them and the line and we're going to, say, add them all up. So, what you'd end up with is a graph mapping the error against the gradient at all of those different points. And it would look like a bowl. You'd have a lowest point. You'd have a point for some gradient where the error is the lowest. CHARLES: Right, yeah. Okay. I'm seeing it. I think I'm seeing it. So, you want to take that error function and you want to, what is it? Now this is – boy, I'm going back to high school math. You would take the derivative and find out where the tangent is and that's your min point? That's the root of the equation and that's the point where your error is lowest? 
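The sweep described here can be written out directly. A toy sketch (the house data values are invented) that measures the summed squared error at each candidate slope and finds the bottom of the bowl:

```python
# Sweep the slope of a line y = w * x through a range of values and sum the
# squared vertical distances to the data points: the error-vs-slope curve is
# a bowl with one lowest point.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (size, price), toy values

def total_error(w):
    # sum of squared differences between each point and the line y = w * x
    return sum((w * x - y) ** 2 for x, y in data)

# slopes 0.0, 0.1, ..., 4.0 — the discrete version of sweeping the line
errors = [(w / 10, total_error(w / 10)) for w in range(0, 41)]
best_w, best_err = min(errors, key=lambda pair: pair[1])  # bottom of the bowl
```

On this data the sweep bottoms out at a slope of 2.0, which is the line of best fit through the origin for these points.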
KATHARINE: Yeah, exactly. Or there's an algorithm called gradient descent that does it automatically by taking little steps. So, it looks at the tangent of the gradient at a point and says, “If I move the gradient down, is the error going to decrease? And if so, move in that direction.” So automatically, it tries to take steps to get to the bottom. An optimization problem with that is that you can configure the step size. So, you could take tiny, tiny steps and take forever to get to the bottom or you could take massive steps and completely miss the bottom. So, you can imagine it like walking down a hill. If you're a minion then it will take a really long time because you're tiny. And if you're a giant, you might never get to the valley. CHARLES: Right. You might just leap right across the chasm. KATHARINE: Yeah, just miss it completely. So, that's linear regression. And that's really, even though we have a two-dimensional graph that's a one-dimensional problem because we just have the one input feature. Now, when we're looking at neural networks and we're looking at gradient descent in neural networks, each one of those connections is something that we're trying to configure to reduce the error. So suddenly, you have maybe a hundred-dimension landscape and you're trying to get to the bottom of a hill. And there might be several local optima and one deepest valley, but you might have lots of other valleys that you could get stuck in. So, it becomes a very difficult problem. Does that make sense? CHARLES: Yes. No, that does make sense. I'm just trying to let it sink in. KATHARINE: It's completely impossible to visualize a hundred dimensions, yeah. CHARLES: I actually had to sit back and kind of close my eyes and stare up at the ceiling. KATHARINE: I think the trick is not to think about it. I heard someone say, there's another fantastic course on Coursera by a famous computer scientist who studies neural networks, Geoffrey Hinton. 
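Gradient descent automates that downhill walk. A minimal sketch on toy house data (all names and values are illustrative), where the step size plays exactly the minion-versus-giant role described:

```python
# 1-D linear regression by gradient descent: repeatedly step the slope w
# in the direction that decreases the mean squared error.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (size, price), toy values

def mse_gradient(w):
    # derivative of the mean squared error of the line y = w * x, w.r.t. w
    n = len(data)
    return sum(2 * (w * x - y) * x for x, y in data) / n

w, step = 0.0, 0.01  # a giant step (say 1.0) would overshoot the valley
for _ in range(500):
    w -= step * mse_gradient(w)  # walk downhill a little
```

For this data the walk settles at w ≈ 1.99, the slope where the error bowl is lowest; a much larger step size would bounce across the valley instead of descending into it.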
And his advice in one of the videos on visualizing these multidimensional landscapes, say it's 15 dimensions, is to close your eyes and shout, “15!” And that hasn't worked for me, but I'm sure it's worked for some people. CHARLES: But it certainly probably makes you feel better. KATHARINE: Yeah. I think it's one of those things that's just beyond comprehension. But we can just quietly accept it. CHARLES: Right. It's just – yeah, what's nice about I guess math is you just don't have to understand it. I mean, you do. You just understand that there really is no mapping to our physical experience. And that's okay. And just let that go. It's just like, this is just some… KATHARINE: Yeah. CHARLES: We had some numbers that existed in the domain of understanding, that we can understand. And there are just some rules that we follow and if you look at the intermediate steps, well the points don't really exist in that domain of physical experience and understanding. And that's okay. We just accept and let it go. We just hope that at some point, we can translate that model back into the domain of ‘we can understand it'. KATHARINE: Oh, completely. There's a lot of faith. CHARLES: Yeah. KATHARINE: And also, I remember coming into machine learning and thinking, “This is like magic. It's amazing.” And then you study it a bit and you're like, “This is so easy. This is just basic – this is just functions. These are just glorified function machines.” And then you look into it some more and you're like, “Nope. It's definitely magic.” CHARLES: Yeah. It's a phase of, every time you come up against the wall, right? And then you realize, “Oh no, it's actually something that I can close my mind over,” until you come to the next hurdle of magic. KATHARINE: Yeah. You think you've got a grasp on things and you think, “I know the landscape,” and then you suddenly realize how much more there is to learn. And that sinking feeling that you'll never learn it all. CHARLES: Yeah, yeah. Yup. 
Unfortunately, it seems like in tech that's just the condition. KATHARINE: Yeah, like all of my sad, unread books. CHARLES: If I wanted to just really start experimenting with this stuff and start saying, “Maybe I can utilize some of these techniques for some of these problems that I'm encountering,” where would be a good place to get started? What libraries? What online resources? What people are good to follow and ask questions of? KATHARINE: Okay. Let's start with websites. So for a start, there's this website called Kaggle which I mentioned earlier, and that is K-A-G-G-L-E dot com. And that has a lot of dataset resources. It also has a community of people discussing how they use the datasets. It has competitions. And it has a lot of links. I discovered recently as well a really good blog. It's on Medium and it is called ‘Machine Learning for Humans'. And that's really well-written. I really like it, actually, and it has a good section on resources called ‘The Best Machine Learning Resources'. And I should probably plug my own blog, but this one's so much better. CHARLES: But please do. KATHARINE: No, no. I have to actually write stuff for it. But there's a lot of things there about, well, if you want to learn linear algebra, what if you want to learn probability and statistics, calculus, and then just go straight to machine learning and pick up the math on the way. I would say go on Coursera, because there are courses like Andrew Ng's course on machine learning from Stanford. And Geoffrey Hinton's from the University of Toronto. But there are also courses there on things like calculus and probability and statistics if you want to level up your math. If you don't want to do anything to do with the math, I would say go to the vendor websites, like AWS, Google. If you're into Java, look at Deeplearning4j. And a lot of them have tutorials that complement their products. Deeplearning4j is one of my favorites at the moment. 
It's an open source Java library. It's pretty plug and play, actually. You don't need to understand a lot of it to get started with it. But it helps. And obviously then there's TensorFlow although personally, I find just other Python libraries like SciPy a lot easier than using TensorFlow. And I think naturally you'll find the resources and the people to follow on Twitter from that. But the crucial thing I'd say is don't get hung up on which language to start playing around with. So, a lot of people say, “Oh, I must need to know Python. I must need to know math.” And really, you don't. It just depends on what level you want to approach things at. So, if you want to write your own gradient descent algorithms, then Python is probably more for you, or Matlab or R or something like that. But there are libraries where you can do it in Java. I've heard rumors that there's a JavaScript library and I wouldn't be surprised. So, I would just have a look at what's out there. But try and get a grasp of the fundamentals just from an intuition point of view, because it will make your life so much easier. You might for example realize that you're using the completely wrong algorithm for the problem that you're looking at. And that's invaluable. CHARLES: Yeah. Knowing what not to do certainly is. Alright. Well, thank you so much for that, Katharine. Thank you for being on the show. Thank you for curing us of at least a small portion of our ignorance. And if people want to get in touch with you perhaps and continue the conversation, or follow you, what's a good place to get in touch? KATHARINE: Sure. Probably tweet me on Twitter. I'm @KatharineCodes but it's spelled like Katharine Hepburn. It's K-A-T-H-A-R-I-N-E codes. Because when I joined Twitter, I didn't have much of an imagination. I still don't. So, it's not particularly clever. But it's there. CHARLES: Alright. Well, fantastic. And for everybody listening at home, you can also get in touch with us. 
We're @TheFrontside on Twitter. Or you can just drop us a line at info@frontside.io. Thanks for listening and we will see you all next time.
The O’Reilly Programming Podcast: Wrangling data with Python’s libraries and packages.In this episode of the O’Reilly Programming Podcast, I talk with Katharine Jarmul, a Python developer and data analyst whose company, Kjamistan, provides consulting and training on topics surrounding machine learning, natural language processing, and data testing. Jarmul is the co-author (along with Jacqueline Kazil) of the O’Reilly book Data Wrangling with Python, and she has presented the live online training course Practical Data Cleaning with Python.Discussion points: How data wrangling enables you to take real-world data and “clean it, organize it, validate it, and put it in some format you can actually work with,” says Jarmul. Why Python has become a preferred language for use in data science: Jarmul cites the accessibility of the language and the emergence of packages such as NumPy, pandas, SciPy, and scikit-learn. Jarmul calls pandas “Excel on steroids” and says, “it allows you to manipulate tabular data, and transform it quite easily. For anyone using structured, tabular data, you can’t go wrong with doing some part of your analysis in pandas.” She cites gensim and spaCy as her favorite NLP Python libraries, praising them for “the ability to just install a library and have it do quite a lot of deep learning or machine learning tasks for you.” Other links: Check out the video Building Data Pipelines with Python, presented by Jarmul. Check out the video Data Wrangling and Analysis with Python, presented by Jarmul. Jarmul is one of the founders of the group PyLadies, which focuses on helping more women become active participants and leaders in the Python open source community.
Arne Rick (@Couchsofa) has already been a frequent but uncredited guest on the Modellansatz podcast: as a DJ, he can be heard in the background of the recordings from the current and earlier Gulasch Programming Nights. Besides music, Arne also has a great passion for mathematics and computer science, and in his civil-engineering bachelor's thesis at the Karlsruhe University of Applied Sciences, supervised by Prof. Marcus Aberle, he works on Bézier curves for truss structures. Trusses are models for load-bearing structures in buildings, and solving a truss system helps structural engineers dimension the structures, meet their requirements, and fix the necessary properties. The representation as a truss, in the sense of a framework, is closely linked to the finite element method, since in certain applications trusses can be interpreted as finite elements and vice versa. Furthermore, trusses can be resolved more precisely, or refined, using finite elements within the individual bars. The analysis of a truss begins with the structure and the actions on it: here the semi-probabilistic partial safety factor system plays a special role, since it makes the possible actions on the components, and thus the overall analysis, accessible to probabilistic treatment. One no longer distinguishes so sharply between building within existing structures, where many constraints are known but the properties of the remaining components are uncertain, and a new build, where in each case the civil engineers must implement the architects' specifications safely in design, calculation, planning, and organization, taking into account the available time and budget, materials, technology, staff, and construction methods. 
Specifically in the analysis of trusses, the structure can be statically over- or under-determined: over-determination generally leads to deformations, while an under-determined system is not a functional structure at all. Furthermore, every adjustment of, say, the load-bearing capacity of one component immediately changes the whole truss, since a stronger bar usually weighs more and may also deform less. Moreover, in a statically over-determined system the stiffer elements attract the loads. So, rather counter-intuitively, it is often advantageous to weaken components in order to force a redistribution of loads. In design this often leads to an iterative process. Representing a bar or beam is a reduction of reality to a locally one-dimensional problem, with the other influences from the surroundings captured by cross-section properties. The haunch (Voute) is a construction element that frequently appears in the built realization of a load-bearing structure; it creates a rigid corner at the connection of bars and can be seen in many buildings, for example the ZKM or the Karlsruhe University of Arts and Design. Various approaches can be used to model the individual bars. A standard model is the prismatic Bernoulli bending beam, which can be described by differential equations and solved in general. From this, reference tables have been produced that allow a design based on this model without solving further differential equations. A common simplification is reducing the problem to two-dimensional planar trusses, which can represent the relevant requirements in most applications. 
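The kind of linear system such bar models boil down to can be made concrete with a toy sketch (all numbers invented): two springs in series, fixed to a wall at one end and loaded at the free end, where solving K·u = f gives the nodal displacements.

```python
# Toy 1-D "truss": two springs in series, node 0 fixed to the wall,
# load F applied at the free end. Assembling the stiffness matrix for the
# two free nodes and solving K u = f gives the displacements.
import numpy as np

k1, k2, F = 100.0, 50.0, 10.0          # spring stiffnesses [N/m], load [N] (invented)
K = np.array([[k1 + k2, -k2],           # global stiffness matrix after
              [-k2,      k2]])          # eliminating the fixed node
f = np.array([0.0, F])                  # load vector: force only at the end node
u = np.linalg.solve(K, f)               # nodal displacements [m]
```

The result matches the hand calculation: the first node moves F/k1 = 0.1 m and the end node a further F/k2, for 0.3 m in total; real truss programs assemble much larger matrices of exactly this shape.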
The bars in a truss can be connected in different ways: one possibility is a hinge, or supports that are fixed or free in various directions and dimensions, i.e. pinned or roller supports, between bars or between a bar and the ground. Depending on the choice of connection, a different number of degrees of freedom arises at that point. For practical computation, supports are often generalized by modeling the connection as a spring: an ideal roller support has a spring constant of 0, while the spring constant of an ideal fixed support tends to infinity. With all the values in between, real supports can then be described more accurately. In simplified form, computing a truss of idealized rigid beams, with the beams' endpoints as variables and the connections between beams as equations, leads directly to a relatively simple linear system of equations. Since in reality all beams bend noticeably under load (unless they are completely over-dimensioned), they must in general be modeled with stiffness in order to obtain reliable results. But even in the extended model the bar is described linearly by a matrix; only now the load also plays a role, and the deflections can be represented via the modulus of elasticity, cross-sectional area, and moment of inertia. The extended model thus also yields a linear system of equations, just with more variables and parameters describing the system, providing information about deflection and load distribution. After the standard truss computation, Arne investigated whether bars with bending states can be represented particularly well with Bézier curves. Bézier curves are very popular in design, since they can be controlled very intuitively via a start point and an end point with two control points. 
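A cubic Bézier curve of this kind is cheap to evaluate directly. A minimal plain-Python sketch (points invented) of the Bernstein form, with `p1` and `p2` as the two control points:

```python
# Cubic Bézier curve in Bernstein form:
#   B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3,  t in [0, 1]
# p0/p3 are the start and end points, p1/p2 the two control points.
def cubic_bezier(p0, p1, p2, p3, t):
    s = 1.0 - t
    return tuple(
        s**3 * a + 3 * s**2 * t * b + 3 * s * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

# A bar whose deflected shape we approximate (all coordinates invented):
midpoint = cubic_bezier((0, 0), (1, 2), (3, 2), (4, 0), 0.5)
```

The curve passes through the start and end points exactly and is pulled toward the two control points in between, which is what makes the representation so intuitive to steer in design tools.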
Often Non-Uniform Rational B-Splines (NURBS), which can be regarded as generalized Bézier splines, are used. The basic problem is that by introducing bending states and elasticity, the bars in the extended model neither keep their length nor obtain a unique orientation, because of the different angles at their ends. With finite elements, such a contradiction is handled either by a finer discretization, i.e. an approximation by straight segments, or by an approximation with higher-order polynomials, solving the problem on the refined model. The third approach here is to approximate the results qualitatively with the Bézier-curve representation that has proven itself in design, in order to carry the modeling experience from design over into the presentation of the solution. The implementation is in Python, which with the NumPy and SciPy libraries offers a wealth of helpful and efficient functions. Literature and further information: A. Rick: Structurana, Python, 2017. Friedrich U. Mathiak: Die Methode der finiten Elemente (FEM), Einführung und Grundlagen, lecture notes, Hochschule Neubrandenburg, 2010. Ch. Zhang, E. Perras: Geometrische Nichtlinearität, Steifigkeitsmatrix und Lastvektor, lecture on structural analysis (Master), Lehrstuhl Baustatik, Universität Siegen, 2015. Podcasts: M. Bischoff: Baustatik und -dynamik, conversations with Markus Völter & Nora Ludewig in the omega tau podcast, episode 029, 2010. M. An: Topologieoptimierung, conversation with G. Thäter in the Modellansatz podcast, episode 125, Department of Mathematics, Karlsruhe Institute of Technology (KIT), 2017. A. Rick: A Hackers Approach To Building Electric Guitars, talk at GPN15, Karlsruhe, 2015. GPN17 Special: Sibyllinische Neuigkeiten: GPN17, episode 4 of the CCC Essen podcast, 2017. M. Lösch: Smart Meter Gateway, conversation with S. 
Ritterbusch in the Modellansatz podcast, episode 135, Department of Mathematics, Karlsruhe Institute of Technology (KIT), 2017. F. Magin: Automated Binary Analysis, conversation with S. Ritterbusch in the Modellansatz podcast, episode 137, Department of Mathematics, Karlsruhe Institute of Technology (KIT), 2017. A. Rick: Bézier Stabwerke, conversation with S. Ritterbusch in the Modellansatz podcast, episode 141, Department of Mathematics, Karlsruhe Institute of Technology (KIT), 2017.
John just got back from SciPy and updates us on the state of scientific Python. SciPy 2017 Ryan and John’s talk on units John’s poster Katy Huff’s Keynote Sean Gulick’s Keynote nbgrader - Jess Hamrick Dask Scikit Learn Numba Katy Huff Interview (Episode 65) Fun Paper Friday What’s the chance that you bump noses on your next kiss? Can stats help reduce that? Find out on this week’s Fun Paper Friday! Karim, AKM Rezaul, et al. “The right way to kiss: directionality bias in head-turning during kissing.” Scientific reports 7.1 (2017): 5398. Contact us: Show - www.dontpanicgeocast.com - SWUNG Slack - @dontpanicgeo - show@dontpanicgeocast.com John Leeman - www.johnrleeman.com - @geo_leeman Shannon Dulin - @ShannonDulin
John Leeman (@geo_leeman) spoke with us about geophysics and associated technology. John is one of the hosts of the Don't Panic GeoCast (@dontpanicgeo, iTunes). Some episodes you may like: What if you calibrated your candles differently? Out of the Country (Brad Jolive on moon rocks) "Rock Drills and Beer" Undersampled Radio John is teaching a course at Penn State called Techniques of Geoscientific Experimentation. The information and textbook is online! It uses the SparkFun Inventor's Kit. John has a website with a blog. He has some Cheerson CX-10 tiny drone posts (my favorite, also Alvaro's repo and my posts). John also has a consulting company: Leeman GeoPhysical. Python! Lots of Python was discussed. Jupyter notebooks (here is a good tutorial) Example of reproducing a figure from a paper John's friction model (repo and talk he gave about it at SciPy2016) Neat SciPy talk about open textbooks SciPy is a Python conference in Austin, TX in July Finally, in lieu of rock puns, here is a neat animation showing many different waves from earthquakes. Contest! Contest ends October 1st and now there are more books! In addition to the ones Bob Apthorpe is sponsoring, John's consulting company will sponsor: Earthquake Storms: An Unauthorized Biography of the San Andreas Fault by John Dvorak and The Soul of A New Machine by Tracy Kidder.
This week we discuss the anniversary of the first manned lunar landing and how a software glitch puts over 40,000 brain studies at risk. Apollo 11 Apollo 11 Neat landing visualization with audio Crew: Buzz Aldrin, Michael Collins, and Neil Armstrong Code on GitHub Lunar Module Code Walkthrough (Video) Saturn V Graphic XKCD - up-goer 5 Easy reading of Apollo 11 events Digital Apollo The Dish (movie) Fun Paper Friday Eklund, Anders, Thomas E. Nichols, and Hans Knutsson. “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates.” Proceedings of the National Academy of Sciences (2016): 201602413. SciPy 2015 Colormaps Contest Write us a geoscience-themed limerick! This is a family show, so remember nothing that rhymes with “Nantucket.” Please email us your limericks by August 12, 2016 and we’ll be judging them along with Dr. Katie Schearer, an English professor. The prize? One of the awesome creations from Chris at Taylor Custom. Thanks for listening everyone and thank you Chris! Contact us: Show - www.dontpanicgeocast.com - @dontpanicgeo - show@dontpanicgeocast.com John Leeman - www.johnrleeman.com - @geo_leeman Shannon Dulin - @ShannonDulin
British Geological Survey Hackathon SciPy 2016 A few SciPy talks Modeling Rate and State Friction with Python | SciPy 2016 | John Leeman Working towards all the Geophysics, but Backwards | SciPy 2016 | Rowan Cockett Using Open Source Tools to Refactor Geoscience Education | SciPy 2016 | Lindsey Heagy MONTE Python for Deep Space Navigation | SciPy 2016 | Jonathon Smith Reproducible, One Button Workflows with the Jupyter Notebook & Scons | SciPy 2016 | Jessica Hamrick Feedback Nature Podcast Episode Contest Write us a geoscience-themed limerick! This is a family show, so remember…nothing that rhymes with “Nantucket.” Please email us your limericks by August 12, 2016 and we’ll be judging them along with Dr. Katie Schearer, an English professor. The prize? One of the awesome creations from Chris at Taylor Custom. Thanks for listening everyone and thank you Chris! Fun Paper Friday Interference puts satellite data at risk Contact us Show - www.dontpanicgeocast.com - @dontpanicgeo - show@dontpanicgeocast.com John Leeman - www.johnrleeman.com - @geo_leeman Shannon Dulin - @ShannonDulin
The topic I'm going to talk about in today's podcast probably sounds like science fiction, something you'd think we can only see in movies of that genre. We are certainly not at the top of the development curve when it comes to solutions and applications in this field, but that doesn't mean we can't investigate and learn this science. I've already told you why we should learn computer vision, and today I'm going to tell you how to get started with computer vision, OpenCV, and Python. Before continuing, I want to tell you about the Programarfacil Campus. If you want to build your own projects with Arduino or some open-hardware device, you need to master two disciplines: programming and electronics. In the Campus I'm pouring all my knowledge of these subjects into courses at different levels: basic, intermediate, and advanced. You'll have a premium support form at your disposal, plus giveaways of electronic and computer equipment. Come in and find your course. This topic isn't new on the podcast. I've already talked about it in several episodes: 18. Augmented reality; 44. Image processing with JavaScript; 64. Curious projects with Arduino; 67. Big Data and computer vision. Today I'm going to dig deeper into the subject and give you the steps you need to start programming with the most famous computer vision library, OpenCV. What is OpenCV? OpenCV is a free library originally developed by Intel. It first saw the light in 1999. Originally written in C/C++, its greatest virtue is that it is cross-platform: it runs on different operating systems (Linux, Windows, Mac OS X, Android, and iOS). We can also use it from different programming languages such as Java, Objective-C, Python, and my favorite, C#. For the latter there is in fact a version called EmguCV. In June 2015 an important milestone arrived: version 3.0 was finally available. If you do the math, in 16 years (from 1999 to 2015) there have only been 3 major versions. 
That is because this library has been robust and very efficient from the start. Notably, this latest version is finally compatible with the latest version of Python, Python 3, which lets us take advantage of everything the newest version of the language offers. It is perhaps the most important and most widely used computer vision library in the world. It is used by universities, companies, and people in the maker movement to give free rein to their imagination, since it is free software. Steps to install OpenCV and Python. You may wonder: why Python? Although I haven't yet covered this programming language on the blog or the podcast, I can tell you that Python is very easy to use and favors readable code thanks to its simple syntax. We should keep in mind that OpenCV's native language is C/C++, with all the complexity that involves if we want to use this library in our projects. What I like most about Python is that it is easily portable to other platforms, including the Raspberry Pi. And if we also have a camera connected, imagine what we can achieve. Although I use Windows day to day and have decided to start with that operating system in the Campus, the same can be done on Linux and OS X. The decision to start with Windows is simple: it is the most widely used operating system in the world, and not just because I say so; you only have to look at the statistics provided by Net Market Share. According to that company, more than 90% of users run Windows. Even so, you might think this is a sales strategy and that the company might belong to the Redmond magnate. 
That's why I'm going to share the statistics from Google Analytics on operating system usage on this very website, that is, you, the readers.

As you can see, there's a crushing gap between Windows and the rest of the pack. That's why I chose to start with Windows: to reach as many people as possible, so nobody feels left out.

The first thing to understand, before going through the steps to install OpenCV and Python, is that this is not a plug-and-play technology. We're used to talking about Processing, Arduino, Scratch and other easy-to-use technologies. With OpenCV things get more complicated, especially when preparing the system. But I'm going to give you the steps you need to get started in a very simple way. The installation consists of 3 steps.

Step 1: Install Python 3 with additional packages

We don't just have to install the language itself; to use OpenCV we also need to install certain Python packages that will make our lives easier when developing computer vision applications:

NumPy: an open-source library that adds support for vectors and arrays to Python.
SciPy: an open-source library of mathematical tools and algorithms for Python.
Matplotlib: an open-source library for generating plots from vectors and arrays.
Pip: the package manager for Python.

Each package can be installed separately, but there are distributions such as Anaconda 3 that bundle everything into a single installer. I recommend using one of these distributions.

Step 2: Install OpenCV for Python 3

This might be the trickiest step, but thanks to the Pip package manager it becomes very simple. We just have to download the version for our operating system in whl format and then install it.
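Once this stack is in place, it helps to know that OpenCV's Python bindings represent every image as a NumPy array, which is why NumPy tops the list above. As a minimal sketch, using only NumPy so it runs even before OpenCV itself is installed, here is the kind of per-pixel operation (a binary threshold, similar in spirit to what cv2.threshold does) that you will write constantly in computer vision:

```python
import numpy as np

# OpenCV stores a grayscale image as a 2-D NumPy array of uint8 (0-255).
# Build a tiny 4x4 synthetic "image" to stand in for a real photo.
image = np.array([
    [  0,  50, 100, 150],
    [ 50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
], dtype=np.uint8)

# Binary threshold: pixels brighter than 128 become white (255),
# everything else becomes black (0).
binary = np.where(image > 128, 255, 0).astype(np.uint8)

print(binary.tolist())
```

With real OpenCV installed, the same idea is a one-liner over an array loaded with cv2.imread; the point here is simply that "an image" and "a NumPy array" are the same object in this ecosystem.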
It really is very simple thanks to the package manager.

Step 3: Install the development environment (optional)

This step is optional; you could program in Python with Windows Notepad. My advice is to use Sublime Text 3 with the Anaconda plugin, which turns this editor into a development environment optimized for Python, with all its features.

And those are the 3 recommended steps to set up the system. You can go to the Campus and watch how I do it step by step, with videos, images and all the code needed to make everything work correctly.

The listener's resource

Today I'm bringing a special listener's resource: an email I received from Antonio Otero. It has meant a lot to me, because the goal of this project is precisely that: to help people, and in this case we succeeded.

"Thank you, gentlemen, for your good work.

I feel the need to tell you about a situation (I already thanked you in a comment, but I want to elaborate).

Besides my job as a web developer, I'm a vocational employment trainer. This year I've been assigned a microsystems course with a rather special group (young people between 18 and 22 who are, let's say, a bit lost, to put it mildly).

Used to my regular classes (for 'adults'), I couldn't find a way to get them interested in the subject. The syllabus is very varied: operating systems, hardware, very basic electronics, maintenance scripts... all basic but very broad.

I couldn't find a way in, and I was suffering because I couldn't straighten them out, with some students on the verge of expulsion.

The thing is, like every morning, stuck in traffic on the way to the course and fed up with the political news, it occurred to me to put on your podcasts, and you've been an inspiration to me.
You've changed the way I see some things, you've infected me with your enthusiasm (maybe I had lost some of mine), and like a good virus I've passed it on to my students.

With Scratch I've gotten them interested in programming, and now they write pretty decent Linux scripts for me. We've even done some (very basic) assembler. But I'm hoping they'll end up programming Arduino in C for me :-)

They're thrilled with Arduino (I bought 4 boards out of my own pocket, since the center doesn't provide them), and that's even though they haven't touched them yet; they're learning the theory with great interest, eager to put it into practice :-)

In short, their behavior has changed completely, they're deeply involved and don't miss a single class, and I want you to share in this success.

The only bad news is that I'm catching up to this week's podcast, and I don't know if I'll be able to stand waiting a whole week to hear you again :-)

Thanks, Antonio Otero"

That's it for this week. Remember you can find us on Twitter and Facebook. Any questions or suggestions are welcome in the comments on this article or through the contact form.
John is on the road headed to the SciPy conference and Shannon is done with field camp. Join us to hear the wrap-up, and to talk about how geology was used strategically in the Revolutionary War, on this Fourth of July weekend episode. Watchung Mountains Area geologic summary Middlebrook encampment Nike Missile Cheyenne Mountain NORAD Fun Paper Friday This week we learn about perchlorate from fireworks and how long it can reside in lakes. Wilkin, R. T., Fine, D. D., & Burnett, N. G. (2007). Perchlorate Behavior in a Municipal Lake Following Fireworks Displays. Environmental Science & Technology, 41(11), 3966–3971. http://doi.org/10.1021/es0700698 Contact us: Show - www.dontpanicgeocast.com - @dontpanicgeo - show@dontpanicgeocast.com John Leeman - www.johnrleeman.com - @geo_leeman Shannon Dulin - @ShannonDulin
No.64 - Numerical analysis and data mining libraries: NumPy & SciPy. 2014.10.26
Hello everyone and welcome to another episode of the Castálio Podcast! In this episode we wrap up our interview with Thiago Avelino, this time talking about his project OpenMining, a Business Intelligence tool for building multi-dimensional OLAP cubes using NumPy, SciPy, Pandas and more …
We chat with Stefan Karpinski, creator of the Julia programming language, live on stage during Øredev 2014. Topics include deciding to build a new language, the interesting unsolved problems of numerical computing, concurrency solutions, developing with and on LLVM, handling deprecation nicely, things (possibly) in the future for Julia, and why Swift is exciting for Julia and other languages. This recording sounds as good as it does thanks to Stephen Chin of nighthacking.com, who provided and masterfully wrangled all the necessary technology. There is a minute and a half of worse audio quality just after the nine-minute mark, where microphone problems forced us to fill in with audio from our backup microphone. Comments, thoughts or suggestions? Discuss this episode at Techworld! Links Stefan Karpinski Julia programming language Scientific computing Viral Shah Jeff Bezanson MATLAB R programming language Python C extension Goldilocks Goldilocks principle Dylan Garbage collection Unboxed data Complex number Julia Webstack Numerical computing Concurrency Distributed computing Threading Julia on Github Transactional memory Goroutine Coroutine Channel I/O LLVM IFDEF JIT - just-in-time (compilation, in this case) Shared library libclang - C Interface to Clang Template instantiation Quake2.jl Go Hacker school Matrix multiplication Vectorization Generational incremental garbage collection SNOBOL SPITBOL Icon Perl 4 C99 standard Immutable composite types Multiple dispatch Monkey patch radd-trick in Python Common Lisp CLOS - Common Lisp object system Polymorphism Self BLAS - Basic linear algebra subroutines Fast fourier transform Gofix Tracing Static compilation MIT - Massachusetts institute of technology Courses taught using Julia Function pointer Scipy Steven Johnson FFTW Pycall package for Julia Call stack GDB LLDB ABI - application binary interface Clang Rust programming language Swift Chris Lattner - creator of LLVM and Swift WebKit FTL JIT - compiles Javascript using LLVM 
Shadow stack Dynamic stack frame deoptimization MATLAB matrix concatenation syntax Titles Some of the interesting tradeoffs Bridge that gap between high-level and low-level A huge pointer graph of some kind It’s good to have a focus, initially The point where we’re pushing things The classic tradition of a ton of IFDEFs This brings us back to garbage collection Specializing for numerical work Where numbers don’t have to be special anymore (The question is:) How useful is that generalization? You don’t necessarily know what code you’re going to need in advance Trading off memory for performance Really doing the deprecation process A situation where normally you’d JIT something You might end up in a slow case You can always just fall back on an interpreter A partially compiled interpreter Nobody needs to know that it was written in Julia A really capable C library As easy as walking a linked pointer list I’m really glad someone else implemented it
Fredrik and Kristoffer, both utterly confused about time, follow up on some listener comments and then talk about future gadgets, present-day gadgets, and all the non-technical reasons gadgets fail to catch on. There's a big difference between documenting yourself and documenting everyone else. We discuss uncanny-valley effects in more areas than computer-animated films. All that exponential growth in computing power: what does it actually go toward? Does it take us forward? It was impossible to foresee how many unnecessary things we would do with all the computing capacity we've acquired. And what is AI, really? Regarding the recordings from Øredev, we want to send a huge thank-you to Stephen Chin of/with Nighthacking, who spontaneously gave us much better equipment than we've ever managed to put together ourselves! Feel free to discuss the episode at Techworld (http://techworld.idg.se/2.2524/1.592412)! Links The episode where we talked about keyboards Ergodox keyboard Game talk about story Billy Joel Joel Spolsky The Walking Dead games Metal Gear Solid 4 The Metal Gear series Self Scheme Life is terrible: let's talk about the web - James Mickens presentation Øredev Thomas Öberg November camp, November 14 PHP Slashdot Our interviews with Fred George, Stefan Karpinski, Rob Ashton and James Mickens The Julia programming language NumPy SciPy.org Our cool new microphone Singularity VR - virtual reality Google Glass Glasshole Narrative Clip - Swedish lifelogging camera Next Generation Threats - one-day security conference Oculus Rift Antikrundan Uncanny valley Ray Kurzweil AI - artificial intelligence Computer that plays chess Computer that plays Jeopardy Virgin's spaceship crashed Google Translate Brute force Siri Titles A programmer my dad's age A whole new kind of code-talking 555 more hours A slightly shorter episode than usual Then you don't have the wind at your back I want to give my approval before anyone starts measuring From idly watching Antikrundan to playing a game The closer you get to the complete experience Stuck at a certain point in reality I'm getting mixed signals (Whoever was) an ambitious futurist in the eighties There wasn't all that much more to do on the moon back then A bit unfair to tar all AI with the same brush A magical idea of what AI is A point where everything just becomes ridiculous
00:00:00 - Astronomer Eli Bressert joins the Paleopals to talk about his entry into science, his time working for the Chandra X-Ray Space Telescope, and whether or not life-sustaining planets could form in galactic clusters. And that's just part 1! 00:25:55 - We have drinks. Because, that's why. Eli starts off with a Three Sheets Pale Ale from the Lord Nelson Brewery. Charlie assaults his palate with a Hopageddon from Napa Smith. And Kelly has an Angry Orchard cider that's a bit too sweet for her but good nonetheless. And Ryan may not be keeping it local, but at least he's keeping it regional with Odell's Footprint RegionAle. 00:31:09 - The group is split over opinions about director Quentin Tarantino, but his new movie, Django Unchained, just might be the thing that brings them all together in this week's Trailer Trash Talk! 00:42:43 - More science with Eli as we talk about whether or not stars can form in isolation and other interesting astronomical ideas. 00:56:50 - PaleoPOWs are a lot like books about Python, appreciated by fewer people than they ought to be. Eli tells us how he created one of our favorite Brachiolope images, as well as about his new book: SciPy and NumPy: An Overview for Developers, which is about computers, not snakes, in case you were wondering. Charlie tackles some questions from new Canadian listener Les about asteroid composition and the possibility of us putting out a book. And Ryan gets a question from Any S. via Facebook about the potential for quadrupedal predatory dinosaurs. And Kelly tells us that Temporal Tony dropped what he was doing, as instructed, to let us know he had caught up on the show. Thanks for listening and be sure to check out the Brachiolope Media Network for more great science podcasts! Music for this week's show provided by: Twilight Galaxy - Metric My Pal Alcohol - Slim Dusty Like a Ball and Chain - Jackie Greene Starry Configurations - Jets to Brazil