**Topics covered in this episode:** How to Write a Git Commit Message, Caddy Web Server, Some new PEPs approved, juv, Extras, Joke

Watch on YouTube

**About the show**

Sponsored by Posit Connect: pythonbytes.fm/connect

Connect with the hosts

- Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

**Brian #1: How to Write a Git Commit Message**

- By Chris Beams
- The 7 rules of a great commit message:
  1. Separate the subject from the body with a blank line
  2. Limit the subject line to 50 characters
  3. Capitalize the subject line
  4. Do not end the subject line with a period
  5. Use the imperative mood in the subject line
  6. Wrap the body at 72 characters
  7. Use the body to explain what and why vs. how
- The article also includes:
  - Why a good commit message matters
  - Discussion of each of the 7 rules
  - Cool hat tips to other articles on the subject: "Keep in mind: This has all been said before." (Each word is a different link.)
- A small checker sketch for these rules follows at the end of these notes.

**Michael #2: Caddy Web Server**

- via Fredrik Mellström
- Like a more modern NGINX
- Caddy automatically obtains and renews TLS certificates for all your sites.
- Caddy's native configuration is a JSON document.
- Even localhost and internal IPs are served with TLS, using the intermediate of a fully-automated, self-managed CA that is automatically installed into most local trust stores.
- Configure multiple Caddy instances with the same storage, and they will automatically coordinate certificate management as a fleet.
- Production-grade static file server.

**Brian #3: Some new PEPs approved**

- PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials
  - Accepted for packaging
  - Author: Seth Larson; Sponsor: Brett Cannon
  - "This PEP proposes using SBOM documents included in Python packages as a means to improve automated software measurability for Python packages."
- PEP 750 – Template Strings
  - Accepted for Python 3.14
  - Authors: Jim Baker, Guido van Rossum, Paul Everitt, Koudai Aono, Lysandros Nikolaou, Dave Peck
  - "Templates provide developers with access to the string and its interpolated values before they are combined. This brings native flexible string processing to the Python language and enables safety checks, web templating, domain-specific languages, and more."
  - A small t-string sketch also follows at the end of these notes.

**Michael #4: juv**

- A toolkit for reproducible Jupyter notebooks, powered by uv
- Create, manage, and run Jupyter notebooks with their dependencies
- Pin dependencies with PEP 723 inline script metadata
- Launch ephemeral sessions for multiple front ends (e.g., JupyterLab, Notebook, NbClassic)
- Powered by uv for fast dependency management
- Use uvx to run jupyterlab with ephemeral virtual environments and tracked dependencies

**Extras**

Brian:

- Status of Python versions (new-ish format)
  - Use this all the time. Can't remember if we've covered the new format yet.
  - See also Python endoflife.date: same dates, very visible encouragement to move on to Python 3.13 if you haven't already.

Michael:

- Python 3.13.3 is out.
- .git-blame-ignore-revs follow up

**Joke:** BGPT (thanks Doug Farrell)
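To make the rules concrete, here's a minimal sketch of a commit-message checker covering the mechanically checkable rules. The function name and messages are our own, not from the article, and rules like "use the imperative mood" still need human judgment:

```python
def check_commit_message(message: str) -> list[str]:
    """Return a list of Chris Beams rule violations for a commit message."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    # Rule 1: blank line between subject and body
    if len(lines) > 1 and lines[1].strip():
        problems.append("Separate subject from body with a blank line")
    # Rule 2: subject length
    if len(subject) > 50:
        problems.append("Limit the subject line to 50 characters")
    # Rule 3: capitalized subject
    if subject and not subject[0].isupper():
        problems.append("Capitalize the subject line")
    # Rule 4: no trailing period
    if subject.endswith("."):
        problems.append("Do not end the subject line with a period")
    # Rule 6: body wrapped at 72 characters
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("Wrap the body at 72 characters")
    return problems

print(check_commit_message("fixed stuff."))
# ['Capitalize the subject line', 'Do not end the subject line with a period']
```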
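And since PEP 750 is all about getting at the template before interpolation happens, here's a minimal t-string sketch for Python 3.14, assuming the accepted `string.templatelib` API from the PEP (verify the names against the final docs):

```python
# Requires Python 3.14 (PEP 750).
from string.templatelib import Template, Interpolation
import html

def render_safe(template: Template) -> str:
    # A t-string evaluates to a Template, not a str, so we can escape
    # each interpolated value before joining the parts back together.
    parts = []
    for item in template:  # yields static strings and Interpolation objects
        if isinstance(item, Interpolation):
            parts.append(html.escape(str(item.value)))
        else:
            parts.append(item)
    return "".join(parts)

user_input = "<script>alert('pwned')</script>"
print(render_safe(t"Hello {user_input}!"))  # script tags come out escaped
```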
**Topics covered in this episode:** Solara UI Framework, Coverage at a crossroads, "Virtual" methods in Python classes, Extras, Joke

Watch on YouTube

**About the show**

Sponsored by ScoutAPM: pythonbytes.fm/scout

Connect with the hosts

- Michael: @mkennedy@fosstodon.org
- Brian: @brianokken@fosstodon.org
- Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

**Michael #1: Solara UI Framework**

- via Florian
- A pure Python, React-style framework for scaling your Jupyter and web apps
- Solara lets you build web apps from pure Python using ipywidgets or a React-like API on top of ipywidgets. These apps work both inside the Jupyter Notebook and as standalone web apps with frameworks like FastAPI. See the Examples page.
- Based on Reacton
- By building on top of ipywidgets, Solara automatically leverages an existing ecosystem of widgets and runs on many platforms, including JupyterLab, Jupyter Notebook, Voilà, Google Colab, Databricks, JetBrains Datalore, and more.

**Brian #2: Coverage at a crossroads**

- Ned Batchelder is working on making coverage.py faster.
- Includes a nice, quick explanation of roughly how coverage.py works, with the trace function and the arcs used for branch coverage, and how tracing slows things down for lines we already know are covered.
- There are cool ideas from SlipCover that could be applicable.
- There's also sys.monitoring from Python 3.12, which helps with line coverage, since you can disable it for lines you already have info on. It doesn't quite complete the picture for branch coverage, though. (A toy sys.monitoring sketch follows at the end of these notes.)
- Summary: jump in and help if you can. Read it anyway for a great mental model of how coverage.py works.

**Michael #3: "Virtual" methods in Python classes**

- via Brian Skinn
- PEP 698 just got accepted, defining an @override decorator for type hinting, to help avoid errors in subclasses that override methods.
- It only affects type checkers, but it allows you to declare a "link" between the base method and the derived class method, with the intent of overriding it using OOP. If there is a mismatch, it's an error. (A small sketch also follows at the end of these notes.)
- It's in Python 3.12's documentation.
- Makes Python a bit more like C# and other more formal languages.

**Brian #4: Parsing Python ASTs 20x Faster with Rust**

- By Evan Doyle
- Tach is "a CLI tool that lets you define and enforce import boundaries between Python modules in your project." We covered it in episode 384.
- When used to analyze Sentry's ~3k-Python-file codebase, it took about 10 seconds.
- Profiling analysis using py-spy and speedscope pointed to a function that spends about 2/3 of the time parsing the AST and about 1/3 traversing it.
- That portion was then rewritten in Rust, resulting in a 10x speedup, ending at about 1 second.
- This is a cool example of not just throwing Rust at a speed problem right away, but doing the profiling homework first and focusing the Rust rewrite on the bottleneck.

**Extras**

Brian:

- I brought up pkgutil.resolve_name() last week on episode 388.
  - Brett Cannon says don't use that, it's deprecated.
  - Thanks astroboy for letting me know.
- Will we get CalVer for Python?
  - It was talked about at the language summit.
  - There's also PEP 2026, in draft, with a nice nod in the PEP number to when it might happen.
  - 3.13 is already in the works for 2024, and 3.14 is slated for 2025 (we gotta have a pi release). So the earliest is 2026, with maybe a 3.26 version?
- Saying thanks to open source maintainers
  - Great write-up by Brett Cannon about how to show your appreciation for OSS maintainers:
    - Be nice
    - Be an advocate
    - Produce your own open source
    - Say thanks
    - Fiscal support
    - On topic
  - Thanks Brett for pyproject.toml. I love it.

Michael:

- The Shiny for Python course is out! Plus, it's free, so come and get it.

**Joke:** Tao of Programming: Book 1: Into the Silent Void, Part 1
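A toy illustration of the sys.monitoring idea from the coverage.py discussion above (Python 3.12+): record each line once, then return DISABLE so that exact location stops firing events. This is our own sketch, not coverage.py's actual code:

```python
import sys

seen: set[tuple[str, int]] = set()
TOOL = sys.monitoring.COVERAGE_ID  # a predefined tool id slot
sys.monitoring.use_tool_id(TOOL, "toy-coverage")

def on_line(code, line_number):
    # Record the line, then disable further LINE events for this location,
    # so already-covered lines cost (almost) nothing from now on.
    seen.add((code.co_filename, line_number))
    return sys.monitoring.DISABLE

sys.monitoring.register_callback(TOOL, sys.monitoring.events.LINE, on_line)
sys.monitoring.set_events(TOOL, sys.monitoring.events.LINE)

def demo():
    total = 0
    for i in range(3):  # the loop body only fires an event once
        total += i
    return total

demo()
sys.monitoring.set_events(TOOL, 0)  # turn monitoring back off
print(sorted(n for f, n in seen if f == __file__))
```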
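And a quick sketch of PEP 698's @override (typing.override, Python 3.12): the decorator is a no-op at runtime, but a type checker flags the typo'd method below because it doesn't actually override anything:

```python
from typing import override

class Base:
    def process(self) -> str:
        return "base"

class Good(Base):
    @override
    def process(self) -> str:  # OK: really overrides Base.process
        return "good"

class Buggy(Base):
    @override
    def proccess(self) -> str:  # type checker error: overrides nothing
        return "oops"
```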
**Topics covered in this episode:** NumPy 2.0 release date is June 16, Uvicorn adds multiprocess workers, pixi, JupyterLab 4.2 and Notebook 7.2 are available, Extras, Joke

Watch on YouTube

**About the show**

Sponsored by Mailtrap: pythonbytes.fm/mailtrap

Connect with the hosts

- Michael: @mkennedy@fosstodon.org
- Brian: @brianokken@fosstodon.org
- Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

**Brian #1: NumPy 2.0 release date is June 16**

- "This release has been over a year in the making, and is the first major release since 2006. Importantly, in addition to many new features and performance improvement, it contains breaking changes to the ABI as well as the Python and C APIs. It is likely that downstream packages and end user code needs to be adapted - if you can, please verify whether your code works with NumPy 2.0.0rc2."
- NumPy 2.0.0 Release Notes
- NumPy 2.0 migration guide, including "try just running ruff check path/to/code/ --select NPY201"
  - "Many of the changes covered in the 2.0 release notes and in this migration guide can be automatically adapted in downstream code with a dedicated Ruff rule, namely rule NPY201."
  - A small migration sketch follows at the end of these notes.

**Michael #2: Uvicorn adds multiprocess workers**

- via John Hagen
- The goal was to no longer need to suggest that people use Gunicorn on top of Uvicorn. Uvicorn can now, in a sense, "do it all."
- The post covers steps to use it and background on how it works. (A minimal sketch also follows at the end of these notes.)

**Brian #3: pixi**

- Suggested by Vic Kelson
- "pixi is a cross-platform, multi-language package manager and workflow tool built on the foundation of the conda ecosystem."
- Tutorial: Doing Python development with Pixi
- Some quotes from Vic:
  - "Pixi is a project manager, written in Rust, that allows you to build Python projects without having Python previously installed. It's installable with Homebrew (brew install pixi on Linux and MacOS). There's support in VSCode and PyCharm via plugins. By default, pixi fetches packages from conda-forge, so you get the scientific stack in a pretty reliable and performant build. If a package isn't on conda-forge, it'll look on PyPI, or I believe you can force it to look on PyPI if you like."
  - "So far, it works GREAT for me. What really impressed me is that I got a Jupyter environment with CuPy utilizing my aging Nvidia GPU on the FIRST TRY."

**Michael #4: JupyterLab 4.2 and Notebook 7.2 are available**

- JupyterLab 4.2.0 has been released! This new minor release of JupyterLab includes 3 new features, 20 enhancements, 33 bug fixes, and 29 maintenance tasks.
- Jupyter Notebook 7.2.0 has also been released.
- Highlights include:
  - Easier workspaces management with a GUI
  - Recently opened/closed files
  - Full notebook windowing mode by default (renders only the cells visible in the window, leading to improved performance)
  - Improved Shortcuts Editor
  - Dark High Contrast Theme

**Extras**

Brian:

- Help test Python 3.13!
- Help us test free-threaded Python without the GIL
  - Both from Hugo van Kemenade
- Python Test episode 221, "How to get pytest to import your code under test," is out.

Michael:

- Bend follow up from Bernát Gábor: "Bend looks roughly like Python but is nowhere there actually. For example it has no for loops, instead you're meant to use the bend keyword (hence the language name) to expand calculations and another keyword to join branches.
So basically think of something that resembles Python at high level, but without being compatible with that and without any of the standard library or packages the Python language provides. That being said, it does an impressive job at parallelization, but essentially it's a brand new language with new syntax and paradigms that you will have to learn, it just shares at first look similarities with Python the most."

**Joke:** Do-while
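Following up on the NumPy 2.0 item above, a small sketch of the kind of breakage rule NPY201 targets. The aliases shown are, to the best of our knowledge, among those removed in 2.0; check the migration guide for the authoritative list:

```python
import numpy as np

# Pre-2.0 spellings that break on NumPy 2.0:
#   np.float_(1.5)        # removed alias, use np.float64
#   np.alltrue(flags)     # removed alias, use np.all
#   np.row_stack(rows)    # removed alias, use np.vstack
flags = np.array([True, True, False])
rows = [np.arange(3), np.arange(3)]

# Portable replacements that work on both 1.x and 2.0:
x = np.float64(1.5)
print(np.all(flags), np.vstack(rows).shape, x)
```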
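And for the Uvicorn item, a minimal sketch of multiprocess workers. Note that the workers option requires passing the app as an import string; the app below is a stand-in ASGI callable for the demo:

```python
# app.py - minimal ASGI app used to demo Uvicorn's worker processes
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"Hello from a worker\n"})

if __name__ == "__main__":
    import uvicorn
    # Equivalent CLI: uvicorn app:app --workers 4
    uvicorn.run("app:app", host="127.0.0.1", port=8000, workers=4)
```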
Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data. In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles.

Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177
Oracle University Learning Community: https://education.oracle.com/ou-community
LinkedIn: https://www.linkedin.com/showcase/oracle-university/
X (formerly Twitter): https://twitter.com/Oracle_Edu

Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.

Episode Transcript:

00:00 The world of artificial intelligence is vast and ever-changing. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let's go!

00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started!

00:46 Lois: Welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.

Nikita: Hey everyone! In our last episode, we dove into Generative AI and Large Language Models.

Lois: Yeah, that was an interesting one. But today, we're going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we'll look at the OCI AI infrastructure.

Nikita: I'm also going to try and squeeze in a couple of questions on a topic I'm really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let's get started!

01:36 Lois: Hi Hemant! We're so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack.

Hemant: It all begins with the data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications. This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive-scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for business-specific uses. Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data. AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services.

02:58 Nikita: How do I access OCI AI services?

Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console.
The OCI Console provides an easy-to-use, browser-based interface that enables access to notebook sessions and all the features of the data science and AI services. The REST API provides access to service functionality but requires programming expertise, and an API reference is provided in the product documentation. OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .NET, Go, and Ruby. The command line interface provides both quick access and full functionality without the need for scripting.

03:52 Lois: Hemant, what are the types of OCI AI services that are available?

Hemant: OCI AI services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, and anomaly detection.

04:24 Lois: I know we're going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech?

Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personal identifiable information detection. Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, neural machine translation is used to translate text across numerous languages.

Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition, while custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box.

The OCI Speech service is used to convert media files to readable text that's stored in JSON and SRT formats. Speech enables you to easily convert media files containing human speech into highly accurate text transcriptions.

06:12 Nikita: That's great. And what about document understanding and anomaly detection?

Hemant: Using OCI Document Understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, Document Understanding can detect and recognize text in a document. In text extraction, Document Understanding provides the word-level and line-level text, and the bounding-box coordinates of where the text is found. In key value extraction, Document Understanding extracts a predefined list of key-value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, Document Understanding extracts content in tabular format, maintaining the row and column relationships of cells. In document classification, Document Understanding classifies documents into different types.

The OCI Anomaly Detection service analyzes large volumes of multivariate or univariate time series data. It increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision.
Anomaly Detection is the identification of rare items, events, or observations in data that differ significantly from expectations.

07:55 Nikita: Where is Anomaly Detection most useful?

Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utilities, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance, use the Anomaly Detection service for their day-to-day activities.

08:23 Lois: Ok… and the first OCI AI service you mentioned was digital assistant…

Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI-driven interfaces that help users accomplish a variety of tasks with natural language conversations. When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills. The Digital Assistant greets the user upon access and, upon user request, lists what it can do and provides entry points into the given skills. It routes explicit user requests to the appropriate skills. It also handles interruptions to flows, disambiguation, and requests to exit the bot.

09:21 Nikita: Excellent! Let's bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let's talk about OCI Data Science. Can you tell us a bit about it?

Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle, with support for Python and open source. The service has many features, such as a model catalog, projects, JupyterLab notebooks, model deployment, model training, management, model explanation, open source libraries, and AutoML.

09:56 Lois: Himanshu, what are the core principles of OCI Data Science?

Himanshu: There are three core principles of OCI Data Science. The first one is accelerated: accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries, along with easy access to a range of compute power, without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work. The second principle is collaborative. It goes beyond an individual data scientist's productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and supporting reproducibility and auditability of models for collaboration and risk management. The third is enterprise grade. That means it's integrated with all the OCI security and access protocols. The underlying infrastructure is fully managed; the customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades, so users can focus on solving business problems with data science.

11:11 Nikita: Let's drill down into the specifics of OCI Data Science. So far, we know it's a cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used?

Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle. Users work in a familiar JupyterLab notebook interface, where they write Python code. And how is it used?
Users preserve their models in the model catalog and deploy their models to managed infrastructure.

11:46 Lois: Walk us through some of the key terminology that's used.

Himanshu: One important piece of OCI Data Science terminology is projects. Projects are containers that enable data science teams to organize their work. They represent collaborative workspaces for organizing and documenting data science assets, such as notebook sessions and models. Note that a tenancy can have as many projects as needed, without limits.

The notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environments for building and training models. They run on managed infrastructure, and the user can select CPU or GPU, the compute shape, and the amount of storage without having to do any manual provisioning.

The other important feature is the Conda environment. Conda is an open source environment and package management system that was created for Python programs.

12:53 Nikita: What is a Conda environment used for?

Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies. Conda easily creates, saves, loads, and switches between environments in your notebook sessions.

13:07 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library?

Himanshu: Oracle's Accelerated Data Science (ADS) SDK is a Python library that is included as part of OCI Data Science. ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring and visualizing data, training a model with AutoML, and evaluating and explaining models. In addition, ADS provides a simple interface to access the Data Science service model catalog and other OCI services, including object storage.

13:45 Lois: I also hear a lot about models. What are models?

Himanshu: Models define a mathematical representation of your data and business process. You create models in notebook sessions, inside projects.

13:57 Lois: What are some other important terminologies related to models?

Himanshu: The next term is the model catalog. The model catalog is a place to store, track, share, and manage models. It is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script or notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session.

The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure.

14:45 Lois: So, how do you operationalize these models?

Himanshu: Deploying machine learning models as web applications, that is, HTTP API endpoints serving predictions in real time, is the most common way to operationalize models. HTTP API endpoints are flexible and can serve requests for model predictions. Data science jobs enable you to define and run repeatable machine learning tasks on fully managed infrastructure.

Nikita: Thanks for that, Himanshu.

15:18 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure?
You'll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.

15:46 Nikita: Welcome back! The Oracle AI stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let's move on to that. Hemant, what are the components of OCI AI Infrastructure?

Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines, with high-performance cluster networking that allows instances to communicate with each other. Superclusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options, from a single byte to exabytes without upfront provisioning, are also available.

16:35 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs?

Hemant: ML and AI need lots of repetitive computations to be made on huge amounts of data, and parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good at performing computations. A GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data sets at tremendous speed.

17:14 Nikita: And what are the GPU instances offered by OCI?

Hemant: GPU instances are ideally suited for model training and inference. Bare metal and virtual machine compute instances powered by NVIDIA H100, A100, A10, and V100 GPUs are made available by OCI.

17:35 Nikita: So how do we choose what to train from these different GPU options?

Hemant: For large-scale AI training, data analytics, and high-performance computing, the bare metal instances BM 8 x NVIDIA H100 and BM 8 x NVIDIA A100 can be used. These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure.

18:14 Lois: And why would someone choose the OCI AI stack over its counterparts?

Hemant: Oracle offers all the features and is the most cost-effective option when compared to its counterparts. For example, the BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers. Superclusters are a massive network with multiple petabytes per second of bandwidth. They can scale up to 4,096 OCI bare metal instances with 32,768 GPUs. We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, block store, or even file systems. For networking speeds, we can reach 1,600 Gb per second with A100 GPUs and 3,200 Gb per second with H100 GPUs. With OCI storage, we can select local SSDs with up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, and file systems up to eight exabytes per file system. OCI File Storage employs five-way replicated storage, located in different fault domains, to provide redundancy for resilient data protection. HPC file systems, such as BeeGFS and many others, are also offered.
OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high-performance file servers.

20:11 Lois: I think a discussion on AI would be incomplete if we don't talk about responsible AI. We're using AI more and more every day, but can we actually trust it?

Hemant: For us to trust AI, it must be driven by ethics that guide us as well.

Nikita: And do we have some principles that guide the use of AI?

Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is, it should ensure adherence to ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and a social perspective, because even with good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at the national and international level apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting the rights of minorities or protecting the environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications, for instance, the medical device regulation in the health care sector. In the AI context, equality entails that a system's operations cannot generate unfairly biased outputs. And while we adopt AI, citizens' rights should also be protected.

21:50 Lois: Ok, but how do we derive AI ethics from these?

Hemant: There are three main principles. AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and should also be explainable. AI that follows these ethical principles is responsible AI. So if we map the AI ethical principles to responsible AI requirements, they look like this: AI systems should follow human-centric design principles and leave meaningful opportunity for human choice, which means securing human oversight. AI systems and the environments in which they operate must be safe and secure; they must be technically robust and should not be open to malicious use. The development, deployment, and use of AI systems must be fair, ensuring equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. And decisions taken by AI should, to the extent possible, be explainable to those directly and indirectly affected.

23:21 Nikita: This is all great, but what does a typical responsible AI implementation process look like?

Hemant: First, governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation.

Lois: And this is all managed by developers?

Hemant: Typical roles that are involved in the implementation cycle are developers, deployers, and end users of the AI.

23:56 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias?

Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups.

24:21 Lois: Yeah, and there's also the issue of ensuring transparency, right?

Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand.
As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients.

24:49 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. If you're interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course.

Lois: That's right, Niki. You'll find demos that you can watch, as well as skill checks that you can attempt, to better your understanding. In our next episode, we'll get into the OCI AI services we discussed today and talk about them in more detail. Until then, this is Lois Houston…

Nikita: And Nikita Abraham, signing off!

25:25 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
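As a companion to Hemant's point that the AI services are also reachable through language SDKs, here is a rough sketch of calling the Language service from the Python SDK. The client and model class names here are our best recollection of the oci package and should be verified against the SDK reference:

```python
import oci

# Loads credentials from ~/.oci/config (DEFAULT profile)
config = oci.config.from_file()

# Class name is an assumption to verify against the oci SDK docs
client = oci.ai_language.AIServiceLanguageClient(config)

details = oci.ai_language.models.BatchDetectDominantLanguageDetails(
    documents=[
        oci.ai_language.models.DominantLanguageDocument(
            key="doc-1",
            text="OCI AI services can be called through a simple API.",
        )
    ]
)
response = client.batch_detect_dominant_language(details)
for doc in response.data.documents:
    # Each document comes back with scored language candidates
    print(doc.key, doc.languages[0].name, doc.languages[0].score)
```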
**Topics covered in this episode:** I asked 100 devs why they aren't shipping faster. Here's what I learned, Python 3.13.0 beta 1 released, A theme editor for JupyterLab, rich-argparse, Extras, Joke

Watch on YouTube

**About the show**

Sponsored by Mailtrap: pythonbytes.fm/mailtrap

Connect with the hosts

- Michael: @mkennedy@fosstodon.org
- Brian: @brianokken@fosstodon.org
- Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

**Michael #1: I asked 100 devs why they aren't shipping faster. Here's what I learned**

- by Daksh Gupta (via PyCoders)
- What's stopping you from shipping faster?
  - Dependency bugs
  - Complicated codebase
    > There is so much undocumented in our service, including poor records of new features, nonexistent or outdated info on our dependencies, or even essential things like best practices for testing; a lot of time is wasted in syncs trying to find the right information
  - QA loops
  - Waiting for spec
    > At Amazon? Meetings, approval, talking to 10 different stakeholders because changing the color of a button affects 15 micro services
  - Writing tests
  - Deployment/build speed
  - Scope creep
    > The human tendency to stuff last-minute items into the crevices of their luggage minutes before leaving for the airport manifests itself at software companies as scope creep.
  - Unclear requirements
  - Excessive meetings
  - Motivation
    > honest answer is i was on ads
    > and that's a very old / complicated / large stack
    > and i didn't understand it
    > my friends on younger teams seemed happier, i was miserable
- DORA metrics

**Brian #2: Python 3.13.0 beta 1 released**

- "Python 3.13 is still in development. This release, 3.13.0b1, is the first of four beta release previews of 3.13."
- New REPL, featuring multi-line editing, color support, and colorized exception tracebacks
- Cool GIL, JIT, and GC features
- Typing changes, including typing.TypeIs. See last week's episode: TypeIs does what I thought TypeGuard would do in Python. (A small TypeIs sketch follows at the end of these notes.)
- Some nice dead battery removals and more
- But seriously, the REPL is cool. Just ask Trey: The new REPL in Python 3.13 - Trey Hunner

**Michael #3: A theme editor for JupyterLab**

- by Florence Haudin
- A new tool for authoring JupyterLab themes
- "To lower the bar for customizing JupyterLab we created a new tool providing a simple interface for tuning the JupyterLab appearance interactively."
- See jupyterlab-theme-editor on GitHub

**Brian #4: rich-argparse**

- "Format argparse and optparse help using rich."
- "rich-argparse improves the look and readability of argparse's help while requiring minimal changes to the code."
- They're not kidding. It's a 2-line code change:

```python
import argparse
from rich_argparse import RichHelpFormatter

parser = argparse.ArgumentParser(..., formatter_class=RichHelpFormatter)
```

- A complete runnable example follows at the end of these notes.

**Extras**

Brian:

- The pytest course is now switched to the new platform. I sent out an email including how to save your spot on the old site and mark that spot complete on the new site.
- There are now comments on the course. Trying that out. If you've got a question, just ask in that section.

Michael:

- A new Talk Python course: Getting Started with NLP and spaCy

**Joke:** Testing holiday
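A quick sketch of the typing.TypeIs mentioned above (Python 3.13, also available in typing_extensions): unlike TypeGuard, it narrows the type in both branches of the check:

```python
from typing import TypeIs

def is_str_list(val: list[object]) -> TypeIs[list[str]]:
    return all(isinstance(x, str) for x in val)

def handle(items: list[object]) -> None:
    if is_str_list(items):
        print(", ".join(items))  # narrowed to list[str] here
    else:
        # unlike TypeGuard, the else branch is narrowed too (not list[str])
        print(f"{len(items)} mixed items")
```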
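And expanding that two-line rich-argparse change into a complete runnable script (the prog name and arguments here are made up for the demo):

```python
import argparse
from rich_argparse import RichHelpFormatter

parser = argparse.ArgumentParser(
    prog="greet",
    description="Demo of rich-argparse: try `python greet.py --help`.",
    formatter_class=RichHelpFormatter,  # the only change vs. plain argparse
)
parser.add_argument("name", help="who to greet")
parser.add_argument("--shout", action="store_true", help="uppercase the greeting")
args = parser.parse_args()

greeting = f"Hello, {args.name}!"
print(greeting.upper() if args.shout else greeting)
```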
Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data. In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles. Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode. ------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Lois: Welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hey everyone! In our last episode, we dove into Generative AI and Language Learning Models. Lois: Yeah, that was an interesting one. But today, we're going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we'll look at the OCI AI infrastructure. Nikita: I'm also going to try and squeeze in a couple of questions on a topic I'm really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let's get started! 01:16 Lois: Hi Hemant! We're so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack. Hemant: It all begins with data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications. This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for the business-specific uses. Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data. AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services. 02:37 Nikita: How do I access OCI AI services? Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console. The OCI Console provides an easy to use, browser-based interface that enables access to notebook sessions and all the features of all the data science, as well as AI services. The REST API provides access to service functionality but requires programming expertise. And API reference is provided in the product documentation. 
OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .Net, Go, and Ruby. The command line interface provides both quick access and full functionality without the need for scripting. 03:31 Lois: Hemant, what are the types of OCI AI services that are available? Hemant: OCI AI services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, anomaly detection. 04:03 Lois: I know we're going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech? Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personal identifiable information detection. Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, natural machine translation is used to translate text across numerous languages. Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition. In image analysis, custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box. The OCI Speech service is used to convert media files to readable texts that's stored in JSON and SRT format. Speech enables you to easily convert media files containing human speech into highly exact text transcriptions. 05:52 Nikita: That's great. And what about document understanding and anomaly detection? Hemant: Using OCI document understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, document understanding can detect and recognize text in a document. In text extraction, document understanding provides the word level and line level text, and the bounding box, coordinates of where the text is found. In key value extraction, document understanding extracts a predefined list of key value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, document understanding extracts content in tabular format, maintaining the row and column relationship of cells. In document classification, the document understanding classifies documents into different types. The OCI Anomaly Detection service is a service that analyzes large volume of multivariate or univariate time series data. The Anomaly Detection service increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision. Anomaly Detection is the identification of rare items, events, or observations in data that differ significantly from the expectation. 07:34 Nikita: Where is Anomaly Detection most useful? 
Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying the anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utility, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance use Anomaly Detection service for their day-to-day activities. 08:02 Lois: Ok.. and the first OCI AI service you mentioned was digital assistant… Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI driven interfaces that help users accomplish a variety of tasks with natural language conversations. When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills. Digital Assistant greets the user upon access. Upon user requests, list what it can do and provide entry points into the given skills. It routes explicit user requests to the appropriate skills. And it also handles interruptions to flows and disambiguation. It also handles requests to exit the bot. 09:00 Nikita: Excellent! Let's bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let's talk about OCI Data Science. Can you tell us a bit about it? Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle with support for Python and open source. The service has many features, such as model catalog, projects, JupyterLab notebook, model deployment, model training, management, model explanation, open source libraries, and AutoML. 09:35 Lois: Himanshu, what are the core principles of OCI Data Science? Himanshu: There are three core principles of OCI Data Science. The first one, accelerated. The first principle is about accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries along with easy access to a range of compute power without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work. The second principle is collaborative. It goes beyond an individual data scientist's productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and putting reproducibility and auditability of models for collaboration and risk management. Third is enterprise grade. That means it's integrated with all the OCI Security and access protocols. The underlying infrastructure is fully managed. The customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades so user can focus on solving business problems with data science. 10:50 Nikita: Let's drill down into the specifics of OCI Data Science. So far, we know it's cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used? Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle. Users work in a familiar JupyterLab notebook interface, where they write Python code. And how it is used? So users preserve their models in the model catalog and deploy their models to a managed infrastructure. 11:25 Lois: Walk us through some of the key terminology that's used. Himanshu: Some of the important product terminology of OCI Data Science are projects. 
The projects are containers that enable data science teams to organize their work. They represent collaborative work spaces for organizing and documenting data science assets, such as notebook sessions and models. Note that tenancy can have as many projects as needed without limits. Now, this notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environment for building and training models. Notebook sessions run in a managed infrastructure and the user can select CPU or GPU, the compute shape, and amount of storage without having to do any manual provisioning. The other important feature is Conda environment. It's an open source environment and package management system and was created for Python programs. 12:33 Nikita: What is a Conda environment used for? Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies. Conda easily creates, saves, loads, and switches between environments in your notebooks sessions. 12:46 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library? Himanshu: Oracle's Accelerated Data Science ADS SDK is a Python library that is included as part of OCI Data Science. ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring, and visualizing data. Training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the data science service mode model catalog and other OCI services, including object storage. 13:24 Lois: I also hear a lot about models. What are models? Himanshu: Models define a mathematical representation of your data and business process. You create models in notebooks, sessions, inside projects. 13:36 Lois: What are some other important terminologies related to models? Himanshu: The next terminology is model catalog. The model catalog is a place to store, track, share, and manage models. The model catalog is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script. Our notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session. The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure. 14:24 Lois: So, how do you operationalize these models? Himanshu: Deploying machine learning models as web applications, HTTP API endpoints, serving predictions in real time is the most common way to operationalize models. HTTP endpoints or the API endpoints are flexible and can serve requests for the model predictions. Data science jobs enable you to define and run a repeatable machine learning tasks on fully managed infrastructure. Nikita: Thanks for that, Himanshu. 14:57 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You'll find training on everything from cloud computing, database, and security, artificial intelligence, and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. 
Visit mylearn.oracle.com to get started. 15:25 Nikita: Welcome back! The Oracle AI Stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let's move on to that. Hemant, what are the components of OCI AI Infrastructure? Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines. High performance cluster networking that allows instances to communicate to each other. Super clusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options from a single byte to exabytes without upfront provisioning are also available. 16:14 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs? Hemant: ML and AI needs lots of repetitive computations to be made on huge amounts of data. Parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good in performing computations. GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data set at tremendous speed. 16:54 Nikita: And what are the GPU instances offered by OCI? Hemant: GPU instances are ideally suited for model training and inference. Bare metal and virtual machine compute instances powered by NVIDIA GPUs H100, A100, A10, and V100 are made available by OCI. 17:14 Nikita: So how do we choose what to train from these different GPU options? Hemant: For large scale AI training, data analytics, and high performance computing, bare metal instances BM 8 X NVIDIA H100 and BM 8 X NVIDIA A100 can be used. These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure. 17:53 Lois: And why would someone choose the OCI AI stack over its counterparts? Hemant: Oracle offers all the features and is the most cost effective option when compared to its counterparts. For example, BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers. Superclusters are a massive network with multiple petabytes per second of bandwidth. It can scale up to 4,096 OCI bare metal instances with 32,768 GPUs. We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, or block store, or even file system. For networking speeds, we can reach 1,600 GB per second with A100 GPUs and 3,200 GB per second with H100 GPUs. With OCI storage, we can select local SSD up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, file systems up to eight exabyte per file system. OCI File system employs five replicated storage located in different fault domains to provide redundancy for resilient data protection. HPC file systems, such as BeeGFS and many others are also offered. OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high performance file servers. 19:50 Lois: I think a discussion on AI would be incomplete if we don't talk about responsible AI. We're using AI more and more every day, but can we actually trust it? Hemant: For us to trust AI, it must be driven by ethics that guide us as well. 
Nikita: And do we have some principles that guide the use of AI? Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is it should ensure adherence to ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and social perspective. Because even with the good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at national and international level apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting rights of minorities or protecting environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications. For instance, the medical device regulation in the health care sector. In AI context, equality entails that the systems' operations cannot generate unfairly biased outputs. And while we adopt AI, citizens right should also be protected. 21:30 Lois: Ok, but how do we derive AI ethics from these? Hemant: There are three main principles. AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and also should be explainable. AI that follows the AI ethical principles is responsible AI. So if we map the AI ethical principles to responsible AI requirements, these will be like, AI systems should follow human-centric design principles and leave meaningful opportunity for human choice. This means securing human oversight. AI systems and environments in which they operate must be safe and secure, they must be technically robust, and should not be open to malicious use. The development, and deployment, and use of AI systems must be fair, ensuring equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. Decisions taken by AI to the extent possible should be explainable to those directly and indirectly affected. 23:01 Nikita: This is all great, but what does a typical responsible AI implementation process look like? Hemant: First, a governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation. Lois: And this is all managed by developers? Hemant: Typical roles that are involved in the implementation cycles are developers, deployers, and end users of the AI. 23:35 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias? Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups. 24:00 Lois: Yeah, and there's also the issue of ensuring transparency, right? Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand. As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients. 24:29 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. 
If you're interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course. Lois: That's right, Niki. You'll find demos that you can watch as well as skill checks that you can attempt to better your understanding. In our next episode, we'll get into the OCI AI Services we discussed today and talk about them in more detail. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 25:05 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
Topics covered in this episode: uv: Python packaging in Rust jpterm Everything You Can Do with Python's textwrap Module HTML First Extras Joke Watch on YouTube About the show Sponsored by ScoutAPM: pythonbytes.fm/scout Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. First, we are likely skipping next week folks. I'll be at PyCon Philippines. Brian #1: uv: Python packaging in Rust Suggested by Collin Sullivan “uv is designed as a drop-in replacement for pip and pip-tools” Intended to support the pip and pip-tools APIs, just use uv pip instead. Oh yeah, it also replaces venv and virtualenv. And it's super zippy, as you would expect. I'm still getting used to it. uv venv didn't have --prompt at first, but that's fixed and should get released soon (that was the first thing I tried). uv pip install ./ and uv pip install pytest were second; they worked awesome. uv pip list was the third thing I tried; it's not there either, but uv pip freeze is similar, and an issue is already filed. Seriously, I'm excited about this. It's just that it seems I wasn't the target workflow for this. See also tox-uv - speed up tox with uv rye (https://lucumr.pocoo.org/2024/2/15/rye-grows-with-uv/) from Armin Ronacher, will be supported by Astral - MK: Switched to this for dev. It's excellent. For some reason, doesn't work on Docker? From Henry Michael #2: jpterm via David Brochart jpterm is a JupyterLab-like environment running in the terminal. What sets jpterm apart is that it builds on the shoulders of giants, one of which is Textual. It is designed similarly to JupyterLab, where everything is a plugin. Brian #3: Everything You Can Do with Python's textwrap Module Martin Heinz Nice quick demo of one of my favorite builtin modules. Features shorten text and insert placeholders wrap can split lines to the same length but can also just split a string into equal chunks for batch processing TextWrapper class does all sorts of fancy stuff. dedent is my fave. Awesome for including a multiline string in a test function as an expected outcome. (A quick textwrap demo follows these show notes.) Michael #4: HTML First HTML First is a set of guidelines for making it easier, faster and more maintainable to build web software Principles Leveraging the default capabilities of modern web browsers. Leveraging the extreme simplicity of HTML's attribute syntax. Leveraging the web's ViewSource affordance. Practices Prefer Vanilla approaches Use HTML attributes for styling and behaviour Use libraries that leverage HTML attributes Avoid Build Steps Prefer Naked HTML Be View-Source Friendly Extras Brian: pytest 8.0.1 released. Fixes the parametrization order reversal I mentioned a couple episodes ago, plus some other fixes. Learn about dependency injection from Hynek If you want to jump into some Rust to help speed up Python tools, maybe check out yarr.fyi I just interviewed Nicole, the creator, for Python Test, and this looks pretty cool Her episode should come out in a couple of weeks. Ramping up more interviews for Python People. So please let me know if you'd like to be on the show or if you have suggestions for people you'd like me to interview. Also, I know this is weird, but some people are still on X, and not like “didn't close their account when they left”, but actually still using it. This is ironically a reverse of X-Files: “I don't want to believe”. However, I've left my account open for those folks.
I check it like twice a month. But eventually I'll see it if you DM me. But really, there are easier ways to reach me. Michael: PyData Pittsburgh CFP Wyden: Data Broker Used Abortion Clinic Visitor Location Data To Help Send Targeted Misinformation To Vulnerable Women SciPy 2024 - Call for Proposals Joke: Yeti tumbler
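A quick, hedged demo of the textwrap features mentioned in the notes above; everything here is stdlib, and the strings are invented:

import textwrap

text = "Python Bytes is a weekly podcast covering packaging, testing, and everything in between."

# shorten: trim to a max width, appending a placeholder
print(textwrap.shorten(text, width=40, placeholder=" [...]"))

# wrap: split into lines no longer than width
for line in textwrap.wrap(text, width=30):
    print(line)

# dedent: strip common leading whitespace; handy for expected multiline strings in tests
expected = textwrap.dedent("""\
    line one
    line two
""")
print(expected)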
Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)
In this podcast episode, Krish explores Teradata from scratch. He starts by introducing Teradata as a complete cloud analytics and data platform, suitable for building large-scale data warehousing applications. He explains the concepts of data warehousing, data lakes, and data marts. Krish then explores Teradata's platform and products, including Teradata Vantage and ClearScape Analytics. He demonstrates how to get started with Teradata by creating an environment and exploring the JupyterLab interface. Krish creates tables, loads data, and runs queries in Teradata, providing hands-on experience and learning along the way. Krish explores the Teradata platform and its functionalities. He starts by troubleshooting a query and identifying the issue. Then, he runs basic queries to demonstrate the SQL syntax. Krish also discusses the availability of third-party plugins and explores some of them. Finally, he concludes the episode by discussing the next steps for further exploration and learning. Takeaways Teradata is a complete cloud analytics and data platform suitable for building large-scale data warehousing applications. Data warehousing, data lakes, and data marts are important concepts to understand in the context of Teradata. Teradata offers a range of products and platforms, including Teradata Vantage and ClearScape Analytics. JupyterLab and Jupyter Notebooks can be used to interact with Teradata and perform data analysis and exploration. Creating tables, loading data, and running queries are essential tasks in Teradata. Teradata is a powerful platform for data analysis and management. Troubleshooting queries is an essential skill for working with Teradata. Basic SQL syntax can be used to run queries on Teradata. Third-party plugins can enhance the functionality of Teradata. Chapters 00:00 Introduction to Teradata 01:16 Understanding Data Warehousing and Data Lakes 03:35 Data Marts and Teradata 04:26 Exploring Teradata's Platform and Products 05:41 Getting Started with Teradata 06:25 Teradata Vantage and ClearScape Analytics 07:57 Understanding JupyterLab and Jupyter Notebooks 19:14 Exploring JupyterLab Extensions 28:18 Creating Tables and Loading Data in Teradata 48:02 Running Queries in Teradata 53:49 Troubleshooting Query 55:14 Running Basic Queries 56:00 Third-Party Plugins 57:14 Exploring Plugins 58:18 Next Steps and Further Exploration 58:45 Conclusion Snowpal Products Backends as Services on AWS Marketplace Mobile Apps on App Store and Play Store Web App Education Platform for Learners and Course Creators
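Since the episode walks through creating tables, loading data, and running queries, here is a minimal, hedged connection sketch using Teradata's PEP 249 driver (the teradatasql package); the host and credentials are placeholders, not values from the episode:

import teradatasql  # Teradata's DB-API 2.0 driver

with teradatasql.connect(host="myenv.clearscape.teradata.com",
                         user="demo_user", password="***") as con:
    cur = con.cursor()
    cur.execute("SELECT InfoKey, InfoData FROM DBC.DBCInfo")  # a standard system table
    for row in cur.fetchall():
        print(row)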
In this riveting episode of The Cyber Consulting Room, host Gordon Draper engages in a thought-provoking conversation with the exceptionally talented Yianna Paris, a seasoned cybersecurity professional with a journey that is as unconventional as it is inspiring. Yianna's entrance into the cybersecurity realm, fueled by her early fascination with breaking video games, sets the stage for an exploration of her diverse and impactful career. From running her own business and inadvertently becoming the go-to tech support for hacked accounts to joining SEEK as a software developer, Yianna's trajectory is marked by a unique blend of hands-on experience and formal education, including a Bachelor of Digital Media Design and a Bachelor of Computer Science. As a trusted advisor, Yianna shares insights into the challenges of hiring the right consultant for the right position, emphasizing the significance of adaptability and the potential clash between traditional governance and agile environments. Drawing from her consulting experiences in the Netherlands, Yianna unveils memorable moments, including the surprising revelation that even cows can be hackers. Throughout the interview, Yianna dispels myths surrounding the consulting industry, emphasizing its diversity and the hands-on nature of the work. Listeners are treated to invaluable advice, from pacing oneself in the overwhelming field of cybersecurity to the importance of admitting when one doesn't know something. Yianna highlights her go-to tools and frameworks, including JupyterLab, Jupyter Notebooks, Obsidian, Miro, and the power of search engines. Beyond the technical realm, she shares her favorite hacker movie, her dream of living in Iceland, and recommends three cybersecurity books, adding a personal touch to the conversation. For more episodes like this visit https://cyberconsultingroom.com You can find more information about Cyber Consulting Room Podcast Host at https://www.linkedin.com/in/gordondraper/
Talk Python To Me - Python conversations for passionate developers
Jupyter Notebooks and JupyterLab have to be one of the most important parts of Python when it comes to bringing new users to the Python ecosystem, and certainly for the day to day work of data scientists and general scientists who have made some of the biggest discoveries of recent times. And that platform has recently gotten a major upgrade, with JupyterLab 4 released and Jupyter Notebook being significantly reworked to be based on the changes from JupyterLab as well. We have an excellent panel of guests, Sylvain Corlay, Frederic Collonval, Jeremy Tuloup, and Afshin Darian, here to tell us what's new in these and other parts of the Jupyter ecosystem. Links from the show Guests Sylvain Corlay Frederic Collonval Jeremy Tuloup Afshin Darian JupyterLab 4.0 is Here: blog.jupyter.org Announcing Jupyter Notebook 7: blog.jupyter.org JupyterCon 2023 Videos: youtube.com Jupyterlite: github.com Download JupyterLab Desktop: github.com Mythical Man Month Book: wikipedia.org Blender in Jupyter: twitter.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy Sponsors Phylum Python Tutor Talk Python Training
This new episode is dedicated to the latest news from the Python world for June 2023. Below are links to all the materials from this podcast. Links to the news from the episode: StackOverflow released the results of its latest developer survey Python 3.13 will remove another 20 modules from the stdlib (PEP 594) The PSF Board members for 2023 have been chosen The PSF hired a security specialist Videos from PyCon US Videos from DjangoCon EU JupyterLab 4 release Hosts: Mikhail Korneev and Grigory Petrov Episode links: Misha's Telegram channel - https://t.me/tricky_python Moscow Python Telegram channel - https://t.me/moscow_python Moscow Python meetup on June 15 - https://moscowdjango.timepad.ru/event/2445754/ All episodes - https://podcast.python.ru MoscowPython meetups - https://moscowpython.ru Learn Python course - https://learn.python.ru/
Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training Test & Code Podcast Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: Plumbum: Shell Combinators and More Suggested by Henry Schreiner last week. (Also, thanks Michael for the awesome search tool on PythonBytes.fm that includes transcripts, so I can find stuff discussed and not just stuff listed in the show notes.) Plumbum is “a small yet feature-rich library for shell script-like programs in Python. The motto of the library is “Never write shell scripts again”, and thus it attempts to mimic the shell syntax (shell combinators) where it makes sense, while keeping it all Pythonic and cross-platform.” Supports local commands, piping, redirection, and working directory changes in a with block. So cool. Lots more fun features. (A short plumbum sketch follows these show notes.) Michael #2: Our plan for Python 3.13 The big difference is that we have now finished the foundational work that we need: Low impact monitoring (PEP 669) is implemented. The bytecode compiler is in a much better state. The interpreter generator is working. Experiments on the register machine are complete. We have a viable approach to create a low-overhead maintainable machine code generator, based on copy-and-patch. We plan three parallelizable pieces of work for 3.13: The tier 2 optimizer Enabling subinterpreters from Python code (PEP 554). Memory management Details on superblocks Brian #3: Some blogging myths Julia Evans myths (more info on each in the blog post): you need to be original you need to be an expert posts need to be 100% correct writing boring posts is bad you need to explain every concept page views matter more material is always better everyone should blog I'd add Write posts to help yourself remember something. Write posts to help future prospective employers know what topics you care about. You know when you find a post that is outdated and now wrong, and the code doesn't work, but the topic is interesting to you. Go ahead and try to write a better post with code that works. Michael #4: Jupyter AI A generative AI extension for JupyterLab An %%ai magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, VSCode, etc.). A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. Support for a wide range of generative model providers and models (AI21, Anthropic, Cohere, Hugging Face, OpenAI, SageMaker, etc.). Official project from Jupyter Provides code insights Debug failing code Provides a general interface for interaction and experimentation with currently available LLMs Lets you collaborate with peers and an AI in JupyterLab Lets you ask questions about local files Video presentation: David Qiu - Jupyter AI — Bringing Generative AI to Jupyter | PyData Seattle 2023 Extras Brian: Textual has some fun releases recently Textualize YouTube channel with 3 tutorials so far trogon to turn Click based command line apps into TUIs video example of it working with sqlite-utils. Python in VSCode June Release includes revamped test discovery and execution. 
You have to turn it on though, as the changes are experimental: "python.experiments.optInto": [ "pythonTestAdapter", ] I just turned it on, so I haven't formed an opinion yet. Michael: Michael's take on the MacBook Air 15” (black one) Joke: Phishing
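Back on the Plumbum item above: a short, hedged sketch of the shell-combinator style it advertises (the commands themselves are arbitrary examples, not from the show notes):

from plumbum import local

ls = local["ls"]
grep = local["grep"]

print(ls("-a"))                    # run a command directly; returns stdout as a string

pipeline = ls["-a"] | grep["py"]   # build a pipeline lazily, shell-style
print(pipeline())                  # execute it on call

with local.cwd("/tmp"):            # temporary working-directory change in a with block
    print(ls())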
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Ask me anything episode: Submit your question(s) for our upcoming AMA episode: form here. Thank you! Brian #1: PythonGUIS Martin Fitzpatrick A site with a collection of resources, guides, books, comparisons, etc, around GUIs in Python. Martin recommends starting with PyQT6 However, there are tutorials covering PyQT6 PySide6 PyQT5 TkInter PySide even Kivy Michael #2: JupyterLab 4.0 is Here The next major release of our full-featured development environment You can upgrade by running pip install --upgrade jupyterlab or conda install -c conda-forge jupyterlab. JupyterLab is now faster, thanks to improvements such as CSS rules optimization, CodeMirror 6, MathJax 3, and notebook windowing. JupyterLab 3 was sluggish when working with large notebooks. There are additional performance improvements available via opt-in settings: Faster tab-switching on Chromium browsers: “Settings” → “JupyterLab Shell” → switch “Hidden mode” to “contentVisibility” Better performance with long notebooks: “Settings” → “Notebook” → switch “Windowing mode” to “full” An upgraded text editor. Better real time collaboration. Bug fixes. More than 100 bugs have been addressed and resolved, enhancing JupyterLab's stability and performance. Brian #3: Proposing a struct syntax for Python Brett Cannon This would be a cool syntax for a data only type: struct Point(x: int, y: int) No positional only parameters No inheritance No methods Instances would be immutable, so p = Point(1, 2) would create an object that could be used as a key. A data only focused set of types. (A rough stdlib approximation follows these show notes.) Michael #4: Python 3.13 Removes 20 Stdlib Modules via PyCoders From PEP 594 – Removing dead batteries from the standard library we're saying goodbye to aifc, audioop, cgi, cgitb, chunk, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu, and xdrlib, as well as the 2to3 program and lib2to3 module in Python. Python 3.12 final release is scheduled in 4 months (October 2023) and Python 3.13 final release is scheduled in 1 year and 4 months (October 2024). Extras Brian: Affirming your PSF Membership voting status You have until June 15 to affirm your voting rights in the upcoming Board Election, if you care about such things. Michael: 5 Career Tips for Budding Python Developers video PyCon US 2023 videos are up Python 3.11.4, 3.10.12, 3.9.17, 3.8.17, 3.7.17, and 3.12.0 beta 2 are now available Joke: Snorkel not included
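The proposed struct syntax above has no stdlib implementation yet, but a frozen dataclass gives a rough approximation of the same idea (immutable, data-only, usable as a dict key); this is an illustration, not part of the proposal:

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(1, 2)
locations = {p: "origin-ish"}   # frozen=True makes instances hashable
# p.x = 5 would raise FrozenInstanceError: instances are immutable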
JupyterCon 2023, the conference on all things Jupyter, was held in Paris between 10-12 May 2023, followed by 2 days of hands-on "sprints". Jupyter is a very popular open source platform with tools such as Jupyter Notebook/Lab, driven by a very active community. There were a number of excellent talks on a range of different subjects. I had the pleasure to meet and talk to a number of people; see the interview list below. Order of Interviews: Leah Silen and Arliss Collins from NumFOCUS 02:04 Franklin Koch (MyST) from Curvenote 04:59 Nicolas Thiery (Paris-Saclay) 09:13 Sarah Gibson (2i2c) 13:19 Ana Ruvalcaba (Jupyter Executive Council) 18:57 Fernando Perez (Jupyter Executive Council) 23:48 Raniere de Silva (Gesis) 29:56 Links: https://jupyter.org Jupyter project https://jupyter.org/enhancement-proposals/79-notebook-v7/notebook-v7.html# Release notes for the new Jupyter Notebook v7 https://jupyterlab.readthedocs.io/en/latest/getting_started/changelog.html#v4-0 Release notes for JupyterLab v4.0 (further incremental updates of v4 are available) https://www.youtube.com/@JupyterCon YouTube channel for JupyterCon 2023 https://cfp.jupytercon.com/2023/schedule/ JupyterCon 2023 schedule https://www.outreachy.org Outreachy project https://numfocus.org NumFOCUS project https://data.agu.org/notebooks-now/ Notebooks Now initiative https://myst-tools.org MyST tool for scientific and technical communication Upcoming RSE conferences: https://rsecon23.society-rse.org UK RSE conference in Swansea 5-8 Sep 2023 https://hidden-ref.org/festival-of-hidden-ref/ Hidden Ref in Bristol, UK, 21 Sep 2023 https://un-derse23.sciencesconf.org Unconference of the German RSE society deRSE in Jena 26-28 Sep https://us-rse.org/usrse23/ 1st face to face US RSE Conference in Chicago 16-18 Oct 2023 Support the Show. Thank you for listening and your ongoing support. It means the world to us! Support the show on Patreon https://www.patreon.com/codeforthought Get in touch: Email mailto:code4thought@proton.me UK RSE Slack (ukrse.slack.com): @code4thought or @piddie US RSE Slack (usrse.slack.com): @Peter Schmidt Mastodon: https://fosstodon.org/@code4thought or @code4thought@fosstodon.org LinkedIn: https://www.linkedin.com/in/pweschmidt/ (personal profile) LinkedIn: https://www.linkedin.com/company/codeforthought/ (Code for Thought profile) This podcast is licensed under the Creative Commons Licence: https://creativecommons.org/licenses/by-sa/4.0/
Watch on YouTube About the show Sponsored by Microsoft for Startups Founders Hub. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Michael #1: Jupyter Server 2.0 is released! Jupyter Server provides the core web server that powers JupyterLab and Jupyter Notebook. New Identity API: As Jupyter continues to innovate its real-time collaboration experience, identity is an important component. New Authorization API: Enabling collaboration on a notebook shouldn't mean “allow everyone with access to my Jupyter Server to edit my notebooks”. What if I want to share my notebook with e.g. a subset of my teammates? New Event System API: jupyter_events—a package that provides a JSON-schema-based event-driven system to Jupyter Server and server extensions. Terminals Service is now a Server Extension: Jupyter Server now ships the “Terminals Service” as an extension (installed and enabled by default) rather than a core Jupyter Service. pytest-jupyter: A pytest plugin for Jupyter Brian #2: Converting to pyproject.toml Last week, episode 314, we talked about “Tools for rewriting Python code” and I mentioned a desire to convert setup.py/setup.cfg to pyproject.toml Several of you, including Christian Clauss and Brian Skinn, let me know about a few tools to help in that area. Thank you. ini2toml - Automatically translates .ini/.cfg files into TOML “… can also be used to convert any compatible .ini/.cfg file to TOML.” “ini2toml comes in two flavours: “lite” and “full”. The “lite” flavour will create a TOML document that does not contain any of the comments from the original .ini/.cfg file. On the other hand, the “full” flavour will make an extra effort to translate these comments into a TOML-equivalent (please notice sometimes this translation is not perfect, so it is always good to check the TOML document afterwards).” pyproject-fmt - Apply a consistent format to pyproject.toml files Having a consistent ordering and such is actually quite nice. I agreed with most changes when I tried it, except one change. The faulty change: it modified the name of my project. Not cool. pytest plugins are traditionally named pytest-something. The tool replaced the - with _. Wrong. So, be careful with adding this to your tool chain if you have intentional dashes in the name. Otherwise, and still, cool project. validate-pyproject - Automated checks on pyproject.toml powered by JSON Schema definitions It's a bit terse with errors, but still useful. $ validate-pyproject pyproject.toml Invalid file: pyproject.toml [ERROR] `project.authors[{data__authors_x}]` must be object $ validate-pyproject pyproject.toml Invalid file: pyproject.toml [ERROR] Invalid value (at line 3, column 12) I'd probably add tox Don't forget to build and test your project after making changes to pyproject.toml. You'll catch things like missing dependencies that the other tools will miss. 
Michael #3: aws-lambda-powertools-python Via Mark Pender A suite of utilities for AWS Lambda Functions that makes distributed tracing, structured logging, custom metrics, idempotency, and many leading practices easier. Looks kinda cool if you prefer to work almost entirely in Python and avoid using any 3rd party tools like Terraform etc to manage the support functions of deploying, monitoring, and debugging Lambda functions - Tracing: Decorators and utilities to trace Lambda function handlers, and both synchronous and asynchronous functions Logging - Structured logging made easier, and decorator to enrich structured logging with key Lambda context details Metrics - Custom Metrics created asynchronously via CloudWatch Embedded Metric Format (EMF) Event handler: AppSync - AWS AppSync event handler for Lambda Direct Resolver and Amplify GraphQL Transformer function Event handler: API Gateway and ALB - Amazon API Gateway REST/HTTP API and ALB event handler for Lambda functions invoked using Proxy integration Bring your own middleware - Decorator factory to create your own middleware to run logic before, and after each Lambda invocation Parameters utility - Retrieve and cache parameter values from Parameter Store, Secrets Manager, or DynamoDB Batch processing - Handle partial failures for AWS SQS batch processing Typing - Static typing classes to speed up development in your IDE Validation - JSON Schema validator for inbound events and responses Event source data classes - Data classes describing the schema of common Lambda event triggers Parser - Data parsing and deep validation using Pydantic Idempotency - Convert your Lambda functions into idempotent operations which are safe to retry Feature Flags - A simple rule engine to evaluate when one or multiple features should be enabled depending on the input Streaming - Streams datasets larger than the available memory as streaming data. (A minimal powertools handler sketch follows these show notes.) Brian #4: How to create a self updating GitHub Readme Bob Belderbos Bob's GitHub profile is nice Includes latest Pybites articles, latest Python tips, and even latest Fosstodon toots And he includes a link to an article on how he did this. A Python script that pulls together all of the content, build-readme.py, and fills in a TEMPLATE.md markdown file. It gets called through a GitHub Actions workflow, update.yml, and automatically commits changes, currently daily at 8:45. This happens every day, and it looks like there are only commits if something actually changed. Note: We covered Simon Willison's notes on a self updating README on episode 192 in 2020 Extras Brian: GitHub can check your repos for leaked secrets. Julia Evans has released a new zine, The Pocket Guide to Debugging Python Easter Eggs Includes this fun one from 2009 from Barry Warsaw and Brett Cannon >>> from __future__ import barry_as_FLUFL >>> 1 <> 2 True >>> 1 != 2 File "<stdin>", line 1 1 != 2 ^ SyntaxError: invalid syntax Crontab Guru Michael: Canary Email AI 3.11 delivers First chance to try “iPad as the sole travel device.” Here's a report. Follow up from 306 and 309 discussions. Maps be free New laptop design Joke: What are clouds made of?
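A minimal, hedged sketch of the powertools logging and tracing decorators mentioned above; the service name and handler are invented for illustration:

from aws_lambda_powertools import Logger, Tracer

logger = Logger(service="orders")   # emits structured JSON log lines
tracer = Tracer(service="orders")   # sends segments to AWS X-Ray

@logger.inject_lambda_context       # adds request ID, cold start flag, etc. to each log line
@tracer.capture_lambda_handler      # traces the whole handler invocation
def handler(event, context):
    logger.info("processing order")
    return {"statusCode": 200}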
Watch the live stream: Watch on YouTube About the show Sponsored by Datadog: pythonbytes.fm/datadog Special guest: Brian Skinn (Twitter | Github) Michael #1: OpenBB wants to be an open source challenger to Bloomberg Terminal OpenBB Terminal provides a modern Python-based integrated environment for investment research, that allows an average joe retail trader to leverage state-of-the-art Data Science and Machine Learning technologies. As a modern Python-based environment, OpenBBTerminal opens access to numerous Python data libraries in Data Science (Pandas, Numpy, Scipy, Jupyter) Machine Learning (Pytorch, Tensorflow, Sklearn, Flair) Data Acquisition (Beautiful Soup, and numerous third-party APIs) They have a discord community too BTW, seem to be a successful open source project: OpenBB Raises $8.5M in Seed Round Funding Following Open Source Project Gamestonk Terminal's Success Great graphics / gallery here. Way more affordable than the $1,900/mo/user for the Bloomberg Terminal Brian #2: Python f-strings https://fstring.help Florian Bruhin Quick overview of cool features of f-strings, made with Jupyter Python f-strings Are More Powerful Than You Might Think Martin Heinz More verbose discussion of f-strings Both are great to up your string formatting game. (A tiny f-string demo follows these show notes.) Brian S. #3: pyproject.toml and PEP 621 Support in setuptools PEP 621: “Storing project metadata in pyproject.toml” Authors: Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung (Jun-Oct 2020) Covers build-tool-independent fields (name, version, description, readme, authors, etc.) Various tools had already implemented pyproject.toml support, but not setuptools Including: Flit, Hatch, PDM, Trampolim, and Whey (h/t: Scikit-HEP) Not Poetry yet, though it's under discussion setuptools support had been discussed pretty extensively, and had been included on the PSF's list of fundable packaging improvements Initial experimental implementation spearheaded by Anderson Bravalheri, recently completed Seeking testing and bug reports from the community (Discuss thread) I tried it on one of my projects — it mostly worked, but revealed a bug that Anderson fixed super-quick (proper handling of a dynamic long_description, defined in setup.py) Related tools (all early-stage/experimental AFAIK) ini2toml (Anderson Bravalheri) — Can convert setup.cfg (which is in INI format) to pyproject.toml Mostly worked well for me, though I had to manually fix a couple things, most of which were due to limitations of the INI format INI has no list syntax! validate-pyproject (Anderson Bravalheri) — Automated pyproject.toml checks pyproject-fmt (Bernát Gábor) — Autoformatter for pyproject.toml Don't forget to use it with build, instead of via a python setup.py invocation! $ pip install build $ python -m build Will also want to constrain your setuptools version in the build-system.requires key of pyproject.toml (you are using PEP517/518, right??) Michael #4: JSON Web Tokens @ jwt.io JSON Web Tokens are an open, industry standard RFC 7519 method for representing claims securely between two parties. Basically a visualizer and debugger for JWTs Enter an encoded token Select a decryption algorithm See the payload data verify the signature List of libraries, grouped by language Brian #5: Autocorrect and other Git Tricks - Waylon Walker - Use `git config --global help.autocorrect 10` to have git automatically run the command you meant in 1 second. The `10` is 10 x 1/10 of a second. So `50` for 5 seconds, etc. 
Automatically set upstream branch if it's not there git config --global push.default current You may NOT want to do this if you are not careful with your branches. From https://stackoverflow.com/a/22933955 git commit -a Automatically “add” all changed and deleted files, but not untracked files. From https://git-scm.com/docs/git-commit#Documentation/git-commit.txt--a Now most of my interactions with git CLI, especially for quick changes, is: $ git checkout main $ git pull $ git checkout -b okken_something $ git commit -a -m 'quick message' $ git push With these working, with autocorrect $ git chkout main $ git pll $ git comit -a -m 'quick message' $ git psh Brian S. #6: jupyter-tempvars Jupyter notebooks are great, and the global namespace of the Python kernel backend makes it super easy to flow analysis from one cell to another BUT, that global namespace also makes it super easy to footgun, when variables leak into/out of a cell when you don't want them to jupyter-tempvars notebook extension Built on top of the tempvars library, which defines a TempVars context manager for handling temporary variables When you create a TempVars context manager, you provide it patterns for variable names to treat as temporary In its simplest form, TempVars (1) clears matching variables from the namespace on entering the context, and then (2) clears them again upon exiting the context, and restoring their prior values, if any TempVars works great, but it's cumbersome and distracting to manually include it in every notebook cell where it's needed With jupyter-tempvars, you instead apply tags with a specific format to notebook cells, and the extension automatically wraps each cell's code in a TempVars context before execution Javascript adapted from existing extensions Patching CodeCell.execute, from the jupyter_contrib_nbextensions ‘Execution Dependencies' extension, to enclose the cell code with the context manager Listening for the ‘kernel ready' event, from jupyter-black (https://github.com/drillan/jupyter-black/blob/d197945508a9d2879f2e2cc99cafe0cedf034cf2/kernel_exec_on_cell.js#L347-L350), to import the TempVars (https://github.com/bskinn/jupyter-tempvars/blob/491babaca4f48c8d453ce4598ac12aa6c5323181/src/jupyter_tempvars/extension/jupyter_tempvars.js#L42-L46) context manager upon kernel (re)start See the README (with animated GIFs!) for installation and usage instructions It's on PyPI: $ pip install jupyter-tempvars And, I made a shortcut install script for it: $ jupyter-tempvars install && jupyter-tempvars enable Please try it out, find/report bugs, and suggest features! Future work Publish to conda-forge (definitely) Adapt to JupyterLab, VS Code, etc. (pending interest) Extras Brian: Ok. Python issues are now on GitHub. Seriously. See for yourself. Lorem Ipsum is more interesting than I realized. O RLY Cover Generator Example: Michael: New course: Secure APIs with FastAPI and the Microsoft Identity Platform Pyenv Virtualenv for Windows (Sorta'ish) Hipster Ipsum Brian S.: PSF staff is expanding PSF hiring an Infrastructure Engineer Link now 404s, perhaps they've made their hire? Last year's hire of the Packaging Project Manager (Shamika Mohanan) Steering Council supports PSF hiring a second developer-in-residence PSF has chosen its new Executive Director: Deb Nicholson! 
PyOhio 2022 Call for Proposals is open Teaser tweet for performance improvements to pydantic Jokes: https://twitter.com/CaNerdIan/status/1512628780212396036 https://www.reddit.com/r/ProgrammerHumor/comments/tuh06y/i_guess_we_all_have_been_there/ https://twitter.com/PR0GRAMMERHUM0R/status/1507613349625966599
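Circling back to the f-strings item in the show notes above, a tiny demo of the lesser-known features both articles highlight; all stdlib, values invented:

value = 42.1234
name = "pythonbytes"

print(f"{value=}")                        # self-documenting output: value=42.1234 (3.8+)
width, precision = 10, 3
print(f"{value:{width}.{precision}f}")    # nested format specs: '    42.123'
print(f"{name!r:>20}")                    # conversion plus alignment: "       'pythonbytes'"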
A new ransomware has been found attacking Jupyter Web Notebooks. This episode talks about the details and what you should do if you use or manage a Jupyter Web Notebook for Python development. Be aware, be safe. Get ExpressVPN, Secure Your Privacy And Support The Show. Become A Patron! Patreon Page *** Support the podcast with a cup of coffee *** - Ko-Fi Security In Five —————— Where you can find Security In Five —————— Security In Five Reddit Channel r/SecurityInFive Binary Blogger Website Security In Five Website Security In Five Podcast Page - Podcast RSS Twitter @securityinfive iTunes, YouTube, TuneIn, iHeartRadio,
Watch the live stream: Watch on YouTube About the show Sponsored by Shortcut - Get started at shortcut.com/pythonbytes Special guest: Karen Dalton Brian #1: stale : github bot to “Close Stale Issues and PRs” Was one response to a question by Will McGugan Something like “An issue filed on an open source project, I've asked a followup question about the issue, and filer doesn't respond. Is there an easy way to close the issue after a set time period of inactivity.” Just trying to get a reference to Will out of the way early in the episode. stale does this: Warns and then closes issues and PRs that have had no activity for a specified amount of time. The configuration must be on the default branch and the default values will: Add a label "Stale" on issues and pull requests after 60 days of inactivity and comment on them Close the stale issues and pull requests after 7 days of inactivity If an update/comment occur on stale issues or pull requests, the stale label will be removed and the timer will restart If defaults seem too short or harsh, everything is configurable Michael #2: jut - JUpyter notebook Terminal viewer via kidpixo The command line tool view the IPython/Jupyter notebook in the terminal. Even works against remote ipynb files (via http) Karen #3: JupyterLite via Marcel Milcent @MarcelMilcent JupyterLite is a JupyterLab distribution that runs entirely in the browser and is interactive Built from using JupyterLab components and extensions Being developed by core Jupyter developers, but the project is still unofficial Example: https://jupyterlite.readthedocs.io/en/latest/_static/lab/index.html Offers JupyterLab or RetroLab (a.k.a JupyterLab Classic) look No application server required, cacheable Try "import this"! Brian #4: Feature comparison of ack, ag, git-grep, GNU grep and ripgrep ack now, supplies are limited! Tangent for those unfamiliar with grep grep is an essential tool for many developers that prints lines that match a pattern grep foo *.py - list all lines containing “foo” in this directory grep -l foo **/*.py | grep -v venv - the **/*.py recursively finds all Python files in this directory and all subdirectories, -l prints just the name of the file if it contains a “foo” in it, and | grep -v venv excludes virtual environments, because there's a lot of “foo” in there. (There's gotta be a better way to do this, someone suggest a better way, please; a pure-Python take follows these notes.) Article compares ack, ag “The silver Searcher”, git-grep, grep, and rg “ripgrep” Language, Licence, and regex versions Features like parallelism, config, etc. Fine grain feature comparisons searching capability regular expression style search output file presentation file finding inclusion, exclusion file type specification random other features This is on the ack website, and kinda makes me want to try ripgrep. Michael #5: Python Client for Airtable: pyairtable by Gui Talarico What is Airtable? Hmm kind of like: Excel Trello boards CI Pipelines A big player on nocode/lowcode community Check out the quickstart to see how it works. Karen #6: Black can now format notebooks via Marco Gorelli gh: MarcoGorelli (creator of nbQA [isort, pyupgrade, mypy, pylint, flake8, and more on Jupyter Notebooks]) pip install black[jupyter] black mynotebook.ipynb “…it should be significantly more robust than the current third-party tools” Extras Michael Trying a new password manager (sorta): Bitwarden The PSF is looking for an Executive Director Want a person in anime form? 
Python 3.11.0a2 is out (via PyCoders) Karen Volunteer in your local Python community (or volunteer to speak) Joke:
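On the grep question above (excluding virtualenvs), one pure-Python answer, offered as a sketch rather than the canonical way:

from pathlib import Path

# Print Python files containing "foo", skipping anything under a venv directory.
for path in Path(".").rglob("*.py"):
    if "venv" in path.parts:
        continue
    if "foo" in path.read_text(errors="ignore"):
        print(path)

(GNU grep's --exclude-dir=venv flag is the one-liner equivalent.)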
The hosts of the "Data Coffee" podcast discuss the news and share their thoughts! The Data Coffee podcast is an information partner of the SmartData 2021 conference. SmartData is a big technical conference on Data Engineering. Dozens of talks, workshops, and Q&A sessions; the first talks and speaker names are already appearing on the site! Promo code for 2000 rubles: datacoffe2021JRGpc Shownotes: 01:33 Python 3.10 08:02 JupyterLab standalone application 12:22 PostgreSQL 14 13:30 Apache Kafka 3.0.0 14:31 SemVer 17:23 askgit 24:05 Windows 11, which nobody can say anything about 27:37 Games and Vulkan (but not sports betting) 29:40 How the Facebook services outage affected us 35:25 Face identification and a cybercrime victim among the hosts 41:08 Apple bought the classical music service Primephonic 48:35 A programmer who started working in virtual reality 2 years ago 1:00:46 Do dogs have ADHD 1:04:56 New podcast segment: Off Topic Cover art - Freshmaniac, Public domain, via Wikimedia Commons Site: https://datacoffee.site, Telegram channel: https://t.me/datacoffee, Twitter profile: https://twitter.com/_DataCoffee_ Podcast chat, where you can suggest topics for future episodes and discuss episodes: https://t.me/datacoffee_chat
Watch the live stream: Watch on YouTube About the show Sponsored by us: Check out the courses over at Talk Python And Brian's book too! Special guest: Ethan Swan Michael #0: Changing themes to DIY Brian #1: SQLFluff Suggested by Dave Kotchessa. A SQL Linter, written in Python, tested with pytest Configurable, and configuration can live in many places including tox.ini and pyproject.toml. Great docs Rule reference with anti-pattern/best practice format Includes dialects for ANSI, PostgreSQL, MySQL, Teradata, BigQuery, Snowflake Note in docs: “SQLFluff is still in an open alpha phase - expect the tool to change significantly over the coming months, and expect potentially non-backward compatible api changes to happen at any point.” Michael #2: JupyterLab Desktop JupyterLab App is the cross-platform standalone application distribution of JupyterLab. Bundles a Python environment with several popular Python libraries ready to use in scientific computing and data science workflows. JupyterLab App works on Debian and Fedora based Linux, macOS and Windows operating systems. Ethan #3: Requests Cache Create a requests_cache session and call HTTP methods from there You can also do it without a session but that's a bit weird, looks like it's monkey patching requests or something… Results are cached Very handy for repeatedly calling endpoints especially if the returned data is large, or the server has to do some compute Reminds me of @functools.lru_cache Can set things like how long the cache should last (when to invalidate) Funny easter egg in example: “# Cache 400 responses as a solemn reminder of your failures” Brian #4: pypi-rename This is a cookiecutter template from Simon Willison Backstory: To refresh my memory on how to publish a new package with flit I created a new pytest plugin. Brian Skinn noticed it somehow, and suggested a better name. Thanks Brian. So, how to nicely rename. I searched and found Simon's template, which is… A cookiecutter template. So you can use cookiecutter to do some of this work for you. But it's based on setuptools, and I kinda like flit lately, so I just used the instructions. The README.md includes instructions for the steps needed: Create renamed version Publish under new name Change old one to depend on new one, but be mostly empty Modify readme to tell people what's going on Publish old name as a notice Now people looking for old one will find new one. People just installing old one will end up with new one also since it's a dependency. Michael #5: Django 4 coming with Redis Adapter #33012 closed New feature (fixed) → Add a Redis cache backend. Adds support for Redis to be used as a caching backend with Django. Redis is the most popular caching backend, adding it to django.core.cache module would be a great addition for developers who previously had to rely on the use of third party packages. It will be simpler than that provided by django-redis, for instance customising the serialiser is out-of-scope for the initial pass. 
Ethan #6: PEP 612 It wasn't possible to type a function that took in a function and returned a function with the same signature (which is what many decorators do) This creates a ParamSpec – which is much like a TypeVar, for anyone who has used them to type generic functions/classes It's a reminder that typing is still missing features and evolving, and it's good to accept the edge cases for now – “gradual typing” Reading Fluent Python by Ramalho has influenced my view on this – don't lose your mind trying to type crazy stuff, just accept that it's “gradual” Mention how typing is still evolving in Python and it's good to keep an eye out for new features that help you (see also PEP 645 – using int? for Optional[int]; and PEP 655 – annotating some TypedDict keys as required and others not required) Extras Michael Earsketch Django Critical CVE: CVE-2021-35042 Vulnerable versions: >= 3.0.0, < 3.1.13 Patched version: 3.1.13 Django 3.1.x before 3.1.13 and 3.2.x before 3.2.5 allows QuerySet.order_by SQL injection if order_by is untrusted input from a client of a web application. Ethan Pedalboard I happened upon this project recently and checked back, only to see that Brett Cannon was the last committer! A doc fix, like he suggested last episode Brian Zero Cost Exceptions in Python 3.11 Suggested by John Hagen Guido, Mark Shannon, and others at Microsoft are working on speeding up Python faster-cpython/ideas repo includes a slide deck from Guido which includes “Zero overhead” exception handling. Python 3.11 “What's New” page, Optimizations section includes: “Zero-cost” exceptions are implemented. The cost of try statements is almost eliminated when no exception is raised. (Contributed by Mark Shannon in bpo-40222.) MK: I played with this a bit. Joke: QA 101
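To make the PEP 612 discussion above concrete, here is a minimal decorator typed with ParamSpec so the wrapper keeps the wrapped function's exact signature (Python 3.10+; the logged decorator is an invented example):

import functools
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

def logged(func: Callable[P, R]) -> Callable[P, R]:
    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def add(x: int, y: int) -> int:
    return x + y

# Type checkers now know add still takes (int, int) and returns int.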
Ferro: https://gradio.app/ Create user interfaces to prototype your ML model in 3 minutes It integrates directly into your Python notebook, or you can share a link with your interdisciplinary team Chekos: https://registration.2021.foss4g.org/OSGeo/FOSS4G/ FOSS4G, one of the geospatial technology conferences, is scheduled for September 27! It's in Argentina but will be online this year There will be 14 simultaneous tracks and up to 55 workshops Here are a few that caught our attention: Introduction to gvSIG Desktop: https://callforpapers.2021.foss4g.org/foss4g-2021-workshop/talk/ZGPA7K/ Task automation using the QGIS Processing Modeler: https://callforpapers.2021.foss4g.org/foss4g-2021-workshop/talk/VTXFTA/ Alan: https://blog.jupyter.org/jupyterlab-desktop-app-now-available-b8b661b17e9a JupyterLab as a desktop application! Ferro: https://ggforce.data-imaginist.com/ ggforce is a package for providing missing functionality to ggplot2 through the extension system introduced with ggplot2 v2.0.0. The goal is to provide a repository of geoms, stats, etc. that are as well documented and implemented as the official ones found in ggplot2. Chekos: https://tuowang.rbind.io/post/2021-04-25-color-in-ggplot2/ A little tutorial on how to create your own ggplot2 themes It comes with lots of example code and several links to other resources for choosing colors And Jesús Anaya tells us about robots! https://twitter.com/jarmandoanaya/status/1442588113075728388?s=20 --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app --- Send in a voice message: https://anchor.fm/tacosdedatos/message
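A hedged three-minute-interface sketch of the Gradio pattern described above; the greet function is an invented stand-in for a real ML model:

import gradio as gr

def greet(name: str) -> str:
    return f"Hola, {name}!"

# Interface wires a function to auto-generated input/output widgets;
# launch() serves it locally and can produce a shareable link.
gr.Interface(fn=greet, inputs="text", outputs="text").launch()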
Talk Python To Me - Python conversations for passionate developers
On this episode, Rob Emanuele and Tom Augspurger join us to talk about building and running Microsoft's Planetary Computer project. This project is dedicated to providing the data around climate records and the compute necessary to process it, with the mission of helping us all understand climate change better. It combines multiple petabytes of data with a powerful hosted JupyterLab notebook environment to process it. Links from the show Rob Emanuele on Twitter: @lossyrob Tom Augspurger on Twitter: @TomAugspurger Video at example walkthrough by Tom if you want to follow along: youtube.com?t=2360 Planetary computer: planetarycomputer.microsoft.com Applications in public: planetarycomputer.microsoft.com Microsoft's Environmental Commitments Carbon negative: blogs.microsoft.com Report: microsoft.com AI for Earth grants: microsoft.com Python SDK: github.com Planetary computer containers: github.com IPCC Climate Report: ipcc.ch Episode transcripts: talkpython.fm Stay in touch with us Subscribe on YouTube (for live streams): youtube.com Follow Talk Python on Twitter: @talkpython Follow Michael on Twitter: @mkennedy Sponsors Shortcut Talk Python Training
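As a rough sketch of how the Planetary Computer's STAC API is typically queried from Python (using the pystac-client and planetary-computer packages; the collection id, bounding box, and dates are illustrative assumptions, not from the episode):

import planetary_computer
from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
search = catalog.search(
    collections=["landsat-c2-l2"],        # an example collection id
    bbox=[-122.5, 47.4, -122.2, 47.7],    # a Seattle-ish bounding box
    datetime="2021-06-01/2021-06-30",
)
item = next(search.items())
signed = planetary_computer.sign(item)    # sign asset URLs for anonymous download
print(list(signed.assets))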
Timestamps (01:59) Saishruthi talked about her upbringing, growing up in a rural town in India with no Internet connection and no computers. (05:50) Saishruthi discussed her undergraduate studying Electrical Engineering at Sri Sairam Engineering College in the early 2010s. (11:56) Saishruthi mentioned the projects and learnings during her two years working at Tata Consultancy Services as an instrumentation engineer. (15:57) Saishruthi went over her MS degree in Electrical Engineering at San Jose State University and her journey into data science. (22:20) Saishruthi shared the initial hurdles she faced transitioning back to school and assimilating to the US culture. (26:10) Saishruthi touched on her work with San Jose City on disaster management. (28:20) Saishruthi went over her job search process, eventually landing a data science position at IBM. (32:16) Saishruthi unpacked lessons learned from public speaking. (35:20) Saishruthi summarized IBM's data science and machine learning initiatives. (37:02) Saishruthi brought up various projects happening at IBM's Center for Open Source Data and AI Technologies, whose mission is to make open-source AI models dramatically easier to create, deploy, and manage in the enterprise. (39:40) Saishruthi unpacked the qualities needed to contribute to open-source projects and their role in shaping the development of ML technologies. (44:50) Saishruthi dissected examples of bias in ML, identified solutions to combat unwanted bias, and presented tools for that (as delivered in her talk titled “Digital Discrimination: Cognitive Bias in Machine Learning”). (49:12) Saishruthi shared her thoughts on the evolution of research and applications within the Trusted AI landscape. (54:07) Saishruthi discussed the core value propositions of IBM's Elyra, a set of AI-centric extensions to JupyterLab that aims to help data practitioners deal with the complexities of the model development lifecycle. (56:11) Saishruthi briefly shared the challenges with developing Coursera courses on data visualization with Python and with R. (01:00:47) Saishruthi went over her passion for movements such as Women In Tech and Girls Who Code. (01:03:27) Saishruthi shared details about her initiative to bring education to rural children. (01:06:36) Closing segment. Saishruthi's Contact Info: Twitter, LinkedIn, Medium, GitHub, Coursera. Mentioned Content: Talks: “Digital Discrimination: Cognitive Bias in Machine Learning” (All Things Open 2020). Projects: AI Fairness 360, AI Explainability 360, Adversarial Robustness Toolbox, Model Asset Exchange, Data Asset Exchange, Elyra. Courses: Data Visualization with Python, Data Visualization with R. About the show: Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe. Datacast is produced and edited by James Le. 
Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com. Subscribe by searching for Datacast wherever you get podcasts or click one of the links below: Listen on Spotify | Listen on Apple Podcasts | Listen on Google Podcasts. If you're new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.
Watch the live stream: Watch on YouTube About the show Sponsored by us: Check out the courses over at Talk Python And Brian's book too! Special guest: Nick Muoh Brian #1: ormar: an async mini ORM for Python, with support for Postgres, MySQL, and SQLite. suggested by John Hagen From John: “It's a really cool ORM that combines Pydantic models and SQL models into a single definition. What is great about this is that it can be used to reduce repetitive duplication between Models for an ORM and the Pydantic Models that FastAPI needs to describe serialization. … If you have very pure-data heavy abstractions where your input and outputs through the API are roughly equivalent to your database, this helps you avoid needing to duplicate tons of SQLAlchemy classes and Pydantic that look identical and now you need to keep them in sync (DRY issue).” (A minimal ormar sketch follows these show notes.) Michael #2: No module named via Garett Dunn Website: nomodulenamed.com Get an error like Python Error: No module named dateutil, maybe you need pip install python_dateutil (reference) Nick #3: JupyterLite Jeremy Tuloup JupyterLite is a JupyterLab distribution that runs entirely in the browser built from the ground-up using JupyterLab components and extensions. Python kernel backed by Pyodide running in a Web Worker Kernels include Python 3.8 (pyolite implementation) Javascript P5.js Data is written to in-browser storage Data doesn't leave the browser unless you are using extensions or use browser's fetch API Brian #4: Lot of plots Dylan Castillo Side by side comparison of plots with pandas, matplotlib, seaborn, and plotly.express plotting: line, grouped bars, stacked bars, area, pie/donut, histogram, scatter, and box Many plotting articles talk about cool stuff you can do with a particular library. This is nice in that they all can do these things, so you can see the output of each and compare, see the code that goes into making each, and see what style of API you might like to work with Michael #5: Monty, Mongo tinified. MongoDB implemented in Python Inspired by TinyDB and its extension TinyMongo A pure Python-implemented database that looks and works like MongoDB.
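A minimal, hedged ormar sketch in the Meta-class style its docs used around this time; the model, table, and field names are invented, and the async calls are shown as comments because they need an event loop and a created schema:

import databases
import sqlalchemy
import ormar

database = databases.Database("sqlite:///demo.db")
metadata = sqlalchemy.MetaData()

class Track(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        tablename = "tracks"

    id: int = ormar.Integer(primary_key=True)
    title: str = ormar.String(max_length=100)

# Inside an async function:
#     await Track.objects.create(title="Blue Train")
#     tracks = await Track.objects.all()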
In this episode of the podcast, Tosha Ellison and Grizz Griswold (both of FINOS) interview Andrew Stein, Executive Director at J.P. Morgan Chase, and Lead Maintainer on the FINOS Perspective open source project. We discuss the project itself, its genesis, problems that Perspective solves for what users, and then pull in examples of what the project and the software can do. We also look at who should get involved in consuming and contributing to the open source project, and why. Andrew also gave a presentation and demo on "How to Build an Order Book Simulation with Perspective" last month - so check that out too: https://www.finos.org/blog/us-open-source-in-fintech-meetup-31-march-21 BACKGROUND FOR THE FINOS PERSPECTIVE PROJECT Perspective is an interactive visualization component for large, real-time datasets. Originally developed for J.P. Morgan's trading business, Perspective makes it simple to build real-time & user configurable analytics entirely in the browser, or in concert with Python and/or JupyterLab. Use it to create reports, dashboards, notebooks and applications, with static data or streaming updates via Apache Arrow. As a library, Perspective provides both: A fast, memory efficient streaming query engine, written in C++ and compiled for both WebAssembly and Python, with read/write/stream/virtual support for Apache Arrow. A framework-agnostic User Interface Custom Element and JupyterLab Widget, via WebWorker (WebAssembly) or virtually via WebSocket (Python/Node), and a suite of Datagrid and D3FC Chart plugins. Website https://perspective.finos.org/ GitHub Repo https://github.com/finos/perspective/ Case Study https://www.finos.org/blog/perspective-project-case-study Andrew Stein, Executive Director, J.P. Morgan Chase Andrew has been a web developer for 15 years. Despite winning the 2018 Nueske’s Bacon Night Award as a member of team “Lard and In Charge” at “Hogs for the Cause” BBQ festival, Andrew rejected a life of perennial BBQ fame and returned to programming full-time where he currently works on Perspective at JPMC. ►► Visit here for more FINOS Meetups - https://www.finos.org/hosted-events ►► Visit FINOS www.finos.org ►► Get In Touch: info@finos.org
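A small, hedged sketch of Perspective's Python side, assuming the perspective-python package's Table/view API as documented at the time; the ticker data is made up:

from perspective import Table

table = Table([{"sym": "AAPL", "price": 171.2}])   # build an in-memory table
table.update([{"sym": "MSFT", "price": 404.8}])    # stream an update into it

view = table.view()           # an unfiltered view over the whole table
print(view.to_records())      # rows come back as a list of dicts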
Nate Rush goes over building a JupyterLab extension with Python (a minimal server-extension sketch follows). The managed JupyterLab instances run on a Kubernetes cluster on EKS.
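For flavor, here is a minimal sketch of the server-side half of such an extension, using Jupyter Server's documented extension hook (the endpoint path and payload are hypothetical examples, not Nate's code):

```python
# A minimal Jupyter server extension: registers a /hello endpoint on
# the running server. Route and message are made-up examples.
import json

import tornado
from jupyter_server.base.handlers import APIHandler
from jupyter_server.utils import url_path_join


class HelloHandler(APIHandler):
    @tornado.web.authenticated
    def get(self):
        self.finish(json.dumps({"greeting": "Hello from the extension"}))


def _load_jupyter_server_extension(server_app):
    # Called by Jupyter Server when the extension module is enabled.
    web_app = server_app.web_app
    route = url_path_join(web_app.settings["base_url"], "hello")
    web_app.add_handlers(".*$", [(route, HelloHandler)])
```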
How does a computer learn to speak with emotion and conviction? Language is hard to express as a set of firm rules. Every language rule seems to have exceptions, and the exceptions have exceptions, etcetera. Typical “if this, then that” approaches to language just don’t work. There’s too much nuance. But each generation of algorithms gets closer and closer. Markov chains were invented in the 1800s and rely on nothing more than basic probabilities. It’s a simple idea: just look at an input (like a book) and learn the order in which words tend to appear. With this knowledge, it’s possible to generate new text in the same style as the input, just by looking up the probability of words that are likely to follow each other. It’s simple and sometimes half decent, but not effective for longer outputs, as this approach tends to lack object permanence and generate run-on sentences. Markov models are used today in predictive-text phone keyboards, but can also be used to predict weather, stock prices, etc. (A minimal sketch of this approach appears after these notes.) There’ve been plenty of other approaches to language generation (and plenty of mishaps as well). A notable example is Cleverbot, which chats with humans and heavily references its previous conversations to generate its results. Cleverbot’s chatting can sometimes be eerily human, perfectly regurgitating slang, internet abbreviations, and obscure jokes. But it’s kind of a sly trick at the end of the day, and, as with Markov chains, Cleverbot’s AI still doesn’t always grasp grammar and object permanence. In the last decade or two, there’s been an explosion in the abilities of a different kind of AI: the artificial neural network. These “neural nets” are modelled on the way that brains work, running stimuli through their “neurons” and reinforcing paths that yield the best results. The outputs are chaotic until the network is properly “trained.” But as the training reaches its optimal point, a model emerges that can efficiently process incoming data and produce output that incorporates the same kinds of nuance, strangeness, and imperfection that you expect to see in the natural world. Like Markov chains, neural nets have a lot of applications outside language too. But these neural networks are complicated, like a brain. So complicated, in fact, that few try to dissect these trained models to see how they’re actually working. Tracing them backwards is difficult, but not impossible. If we temporarily ignore the real risk that sophisticated AI language models pose for societies attempting to separate truth from fiction, these neural net models allow for some interesting possibilities: namely, extracting the language style of a large body of text and using that extracted style to generate new text written in the voice of the original. In this episode, Jeff creates an AI and names it “Theodora.” She’s trained to speak like a presenter giving a TED Talk. The results vary from believable to utterly absurd, and cause Jeff to reflect on the continued inability of individuals, AIs, and large nonprofits to distinguish between good ideas and absolute madness. On the creation of Theodora: Jeff used a variety of free tools to generate Theodora in the episode. OpenAI’s Generative Pre-trained Transformer 2 (GPT-2) was wrapped in the Python library gpt-2-simple by Max Woolf, who also created a tutorial demonstrating how to train the model for free using Google Colab. Jeff used this tutorial to train Theodora on a corpus of about 900 TED Talk transcripts for 5,000 training steps (a sketch of that workflow also follows these notes).
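The Markov chain idea described in these notes fits in a few lines of Python. A minimal word-level sketch (the corpus is a toy string, not Jeff's data):

```python
# Learn which words follow which, then sample new text from those
# observed transitions.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat"
words = corpus.split()

# Map each word to the list of words observed immediately after it.
transitions = defaultdict(list)
for current, following in zip(words, words[1:]):
    transitions[current].append(following)

# Generate by repeatedly sampling a plausible successor.
word = random.choice(words)
output = [word]
for _ in range(12):
    if word not in transitions:
        break
    word = random.choice(transitions[word])
    output.append(word)

print(" ".join(output))
```

The training recipe Jeff followed maps closely onto gpt-2-simple's API. A sketch of that workflow, assuming the transcripts are concatenated into one plain-text file (the filename and prompt are hypothetical):

```python
# Fine-tune GPT-2 on a text corpus with Max Woolf's gpt-2-simple.
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")  # the smallest GPT-2 model

sess = gpt2.start_tf_sess()
gpt2.finetune(
    sess,
    dataset="ted_talk_transcripts.txt",  # hypothetical corpus file
    model_name="124M",
    steps=5000,                          # the episode's 5,000 steps
)

gpt2.generate(sess, prefix="I have an idea worth spreading:")
```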
Jeff then downloaded the model locally and used JupyterLab (Python) to generate new text. That text was then sent to Google Cloud's Text-to-Speech (TTS) service, where it was converted to the voice heard on the episode. Producer: Jeff Emtman. Music: Liance. Sponsor: Liance. Independent musician James Li has just released This Painting Doesn't Dry, an album about the relationship between personal experiences and the story of humanity as a whole. James made this album while he anxiously watched his homeland of Hong Kong fall into political crisis. Buy on Bandcamp. Listen on Spotify.
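For the Text-to-Speech step mentioned above, a sketch using the google-cloud-texttospeech client (the voice settings, input text, and output filename are illustrative assumptions, not what Jeff used):

```python
# Convert generated text to speech with Google Cloud TTS.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Ideas worth generating."),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
    ),
)

with open("theodora.mp3", "wb") as out:
    out.write(response.audio_content)
```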
Watch the live stream: Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training pytest book Patreon Supporters Special guest Guy Royse Brian #1: How to make an awesome Python package in 2021 Anton Zhiyanov, @ohmypy Also thanks to John Mitchell, @JohnTellsAll, for posting about it. Great write-up taking you through everything in a sane order. Stubbing a project with just .gitignore and a directory with a stub __init__.py. Test packaging and publishing: use flit init to create an initial pyproject.toml, set up your ~/.pypirc file, publish to the test repo. Make the real thing: write an implementation, publish. Extras: Adding README.md & CHANGELOG.md and updating pyproject.toml to include README.md and a Python version selector. Adding linting and testing with pytest, tox, coverage, and others. Building in the cloud with GH Actions, Codecov, Code Climate. Adding badges. Task automation with a Makefile. Publishing to PyPI from a GH Action. Missing (but possibly obvious): a GitHub project, checking your project name on PyPI early. Super grateful for: doing all of this early in the project, using flit publish --repository pypitest and spelling out how to set up a ~/.pypirc file, the start-to-finish workflow, an example project with all project files filled out. Michael #2: Kubestriker Kubestriker performs numerous in-depth checks on Kubernetes infrastructure to identify security misconfigurations. Focuses on running in production and at scale. Kubestriker is platform agnostic and works equally well across more than one platform, such as self-hosted Kubernetes, Amazon EKS, Azure AKS, Google GKE, etc. Current Capabilities: Scans self-managed and cloud-provider-managed Kubernetes infra. Reconnaissance phase checks for various services or open ports. Performs automated scans in case insecure, read-write, or read-only services are enabled. Performs both authenticated and unauthenticated scans. Scans for a wide range of IAM misconfigurations in the cluster. Scans for a wide range of misconfigured containers. Scans for a wide range of misconfigured Pod Security Policies. Scans for a wide range of misconfigured network policies. Scans the privileges of a subject in the cluster. Runs commands on the containers and streams back the output. Provides the endpoints of the misconfigured services. Provides possible privilege-escalation details. Elaborate report with detailed explanations. Guy #3: wasmtime WebAssembly runtime with support for: Python, Rust, C, Go, .NET (a Python sketch follows these notes). Documentation here: https://docs.wasmtime.dev/ Supports WASI (WebAssembly System Interface): WASI supports IO operations—it does for WebAssembly what Node.js did for JavaScript. Brian #4: Depend-a-lot-bot Anthony Shaw, @anthonypjshaw A bot for GitHub that automatically approves + merges PRs from dependabot and PyUp.io when they meet certain criteria: All the checks are passing. The package is on a safe-list (see configuration). Example picture shows an auto approval and merge of a tox version update, with the message “This PR looks good to merge automatically because tox is on the safe-list for this repository”. Configuration in a .yml file. I learned recently that most programming jobs that can be automated eventually devolve into configuring a yml file. Michael #5: Supreme Court sides with Google in API copyright battle with Oracle The Supreme Court has sided with Google in its decade-long legal battle with Oracle over the copyright status of application programming interfaces. The ruling means that Google will not owe Oracle billions of dollars in damages.
It also has big implications for the broader software industry: the ruling heads off an expected wave of lawsuits over API copyrights. The case dates back to the creation of the Android platform in the mid-2000s. Google independently implemented the Java API methods, but to ensure compatibility, it copied Java's method names, argument types, and the class and package hierarchy. Over a decade of litigation, Google won twice at the trial court level, but each time the ruling was overturned by the Federal Circuit appeals court. The case finally reached the Supreme Court last year. Writing for a six-justice majority, Justice Stephen Breyer held that Google's copying of the Java API calls was permissible under copyright's fair use doctrine. Guy #6: RedisAI Module for Redis that adds AI capabilities. Turns Redis into a model server: supports TF, PyTorch, and ONNX models. Adds the TENSOR data type. ONNX + Redis has positive architectural implications. Extras Michael: Git for Windows JupyterLab reaches v3 (via Allan Hansen) Why not support Python letter by Brian Skinn Django 3.2 is out & is LTS PyCharm 2021.1 just dropped with Code With Me Brian: The PSF is hiring a Developer-in-Residence to support CPython! Joke: Vim Escape Rooms Happiness
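The wasmtime sketch promised above: compile a WebAssembly module from WAT text and call its export from Python (wasmtime-py's API has shifted slightly between releases; this follows the store-passing style of recent versions, and the module itself is a made-up example):

```python
# Run a tiny WebAssembly module from Python with wasmtime.
from wasmtime import Engine, Instance, Module, Store

engine = Engine()
store = Store(engine)

# A hand-written module exporting an i32 addition function.
module = Module(engine, """
  (module
    (func (export "add") (param i32 i32) (result i32)
      local.get 0
      local.get 1
      i32.add))
""")

instance = Instance(store, module, [])
add = instance.exports(store)["add"]
print(add(store, 2, 3))  # -> 5
```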
Episode notes on ScrapBox. This podcast shares news related to data science, with a focus on Kaggle. In this episode, we talk about JupyterLab 3.0, Julia on Blueqat, changes to the SIGNATE leaderboard specification, DALL·E, and this week in Kaggle.
Mat X and JD discuss the DevOps for Dummies book club, zsh, Ansible, the recent MacSysAdmin conference, and data analysis with JupyterLab.
In this podcast, Bright's Director of Product Management, Robert Stober, is once again joined by Adnan Khaleel, the Global Sales Strategist for HPC, AI and Deep Learning at Dell EMC. Together they discuss the Bright Jupyter integration, which is a combination of JupyterHub, JupyterLab, and Jupyter Enterprise Gateway. They look at how the Bright Jupyter integration makes it easy for customers to use Bright for Data Science through JupyterLab notebooks, and allows users to run their notebooks through a supported HPC scheduler, Kubernetes, or on the server running JupyterHub.
Mozilla removes support for older versions of TLS in Firefox 74, Google launches a facility for managing machine images on Compute Engine, IBM's Elyra brings new tools to Jupyter Notebooks, and Palo Alto Networks issues a dire report on the state of IoT security.
After a longer break due to vacation and difficulties coordinating schedules, we are back with a somewhat unprepared episode, talking with Christian about Python 3.8, conference visits, and various asides. Show notes Our email for questions, suggestions & comments: hallo@python-podcast.de News from the scene: Python 3.8 PyConDE and PyData Berlin 2019 Fluent Python [Book] - Beyond Paradigms: a new key to grok Python & other languages [talk] Guido Retires mypy JupyterLab - A Tour of JupyterLab Extensions [talk] 10 Years of Automated Category Classification for Product Data Job Panel (Freelance) [talk] Flying Circus Python Software Verband Python 3.8: PEP 572 -- Assignment Expressions (walrus operator; sketch after these notes) hynek 2to3 - Automated Python 2 to 3 code translation PEP 570 -- Python Positional-Only Parameters multiprocessing.shared_memory — Provides shared memory for direct access across processes tuple unpacking PEP 578 -- Python Runtime Audit Hooks Core Sprint: CPython Core Developer Sprint 2019 GIL - global interpreter lock PEG Parsers batou Jinja Picks: Django Forum TextBlob: Simplified Text Processing Public tag on konektom
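The two Python 3.8 PEPs in those notes are easy to demo side by side. A minimal sketch of PEP 572 assignment expressions and PEP 570 positional-only parameters (the examples are invented for illustration):

```python
import re

# PEP 572: the walrus operator binds a value inside an expression,
# avoiding a separate assignment before the test.
text = "Python 3.8 shipped in October 2019"
if (match := re.search(r"\d\.\d", text)) is not None:
    print(match.group())  # 3.8

# It also tidies read-then-test loops:
lines = iter(["PEP 572", "PEP 570", ""])
while (line := next(lines)):
    print(line)

# PEP 570: parameters before "/" must be passed positionally.
def power(base, exp, /, mod=None):
    return pow(base, exp, mod)

print(power(2, 10))         # OK -> 1024
# power(base=2, exp=10)     # TypeError: positional-only parameters
```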
Welcome to PyDataMCR episode 9. Today we are talking to Ellen Talbot, a PhD candidate at the University of Liverpool in the Geographic Data Science Lab. Ellen talks with us about smart people in academia and smart meters in energy usage. Sponsors LadBible - https://www.ladbible.com/ Cathcart Associates - cathcartassociates.com/ RLadies Manchester - https://www.meetup.com/rladies-manchester/ RStudio - https://rstudio.com/ Jupyter - https://jupyter.org/ Spyder - https://www.spyder-ide.org/ JupyterLab - https://jupyterlab.readthedocs.io/en/stable/ LockeData - https://itsalocke.com/ Honeycomb Analytics - https://honeycomb-analytics.com/ Admires Hannah Fry - https://twitter.com/FryRsquared DeepMind podcast - https://deepmind.com/blog/article/welcome-to-the-deepmind-podcast Hello World book - https://www.amazon.co.uk/Hello-World-How-Human-Machine/dp/0857525247
In this episode of DataFramed, Hugo speaks with Brian Granger, co-founder and co-lead of Project Jupyter, physicist, and co-creator of the Altair package for statistical visualization in Python. They'll speak about data science, interactive computing, open source software, and Project Jupyter. With over 2.5 million public Jupyter notebooks on GitHub alone, Project Jupyter is a force to be reckoned with. What is interactive computing and why is it important for data science work? What are all the moving parts of the Jupyter ecosystem, from notebooks to JupyterLab to JupyterHub and Binder, and why are they so relevant as more and more institutions adopt open source software for interactive computing and data science? From Netflix running around 100,000 Jupyter notebook batch jobs a day to LIGO's Nobel-prize-winning discovery of gravitational waves publishing all their results reproducibly using notebooks, Project Jupyter is everywhere. Links from the show: FROM THE INTERVIEW: Brian on Twitter; Project Jupyter; Beyond Interactive: Notebook Innovation at Netflix (Ufford, Pacer, Seal, Kelley, Netflix Tech Blog); Gravitational Wave Open Science Center (Tutorials); JupyterCon YouTube Playlist; jupyterstream GitHub Repository. FROM THE SEGMENTS: Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs). Part 1 at ~24:40: Brief Introduction to Multi-Task Learning (by Friederike Schüür); Overview of Multi-Task Learning Use Cases (by Manny Moss); Multi-Task Learning for the Segmentation of Building Footprints (Bischke et al., arXiv.org); Multi-Task as Question Answering (McCann et al., arXiv.org); The Salesforce Natural Language Decathlon: A Multitask Challenge for NLP. Part 2 at ~44:00: Rich Caruana's Awesome Overview of Multi-Task Learning and Why It Works; Sebastian Ruder's Overview of Multi-Task Learning in Deep Neural Networks; Massively Multi-Task Network for Drug Discovery, 259 Tasks (!) (Ramsundar et al., arXiv.org); Brief Overview of Multi-Task Learning with Video of Newsie, the Prototype (by Friederike Schüür). Original music and sounds by The Sticks.
Jessica Forde, Yuvi Panda and Chris Holdgraf join Melanie and Mark to discuss Project Jupyter, from its interactive notebook origin story to the various open source modular projects it's grown into, supporting data research and applications. We dive specifically into JupyterHub, which uses Kubernetes to enable a multi-user server. We also talk about Binder, an interactive development environment that makes work easily reproducible. Jessica Forde Jessica Forde is a Project Jupyter Maintainer with a background in reinforcement learning and Bayesian statistics. At Project Jupyter, she works primarily on JupyterHub, Binder, and JupyterLab to improve access to scientific computing and scientific research. Her previous open source projects include datamicroscopes, a DARPA-funded Bayesian nonparametrics library in Python, and density, a wireless device data tool at Columbia University. Jessica has also worked as a machine learning researcher and data scientist in a variety of applications including healthcare, energy, and human capital. Yuvi Panda Yuvi Panda is the Project Jupyter Technical Operations Architect in the UC Berkeley Data Sciences Division. He works on making it easy for people who don't traditionally consider themselves “programmers” to do things with code. He builds tools (e.g., Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command line tax” that people have to pay before doing productive things with computing. Chris Holdgraf Chris Holdgraf is a Project Jupyter Maintainer and Data Science Fellow at the Berkeley Institute for Data Science and a Community Architect at the Data Science Education Program at UC Berkeley. His background is in cognitive and computational neuroscience, where he used predictive models to understand the auditory system in the human brain. He's interested in the boundary between technology, open-source software, and scientific workflows, as well as creating new pathways for this kind of work in science and the academy. He's a core member of Project Jupyter, specifically working with JupyterHub and Binder, two open-source projects that make it easier for researchers and educators to do their work in the cloud. He works on these core tools, along with research and educational projects that use these tools at Berkeley and in the broader open science community. Cool things of the week Dragonball hosted on GC / powered by Spanner blog and GDC presentation at Developer Day Cloud Text-to-Speech API powered by DeepMind WaveNet blog and docs Now you can deploy to Kubernetes Engine from GitLab blog Interview Jupyter site JupyterHub github Binder site and docs JupyterLab site Kubernetes site github Jupyter Notebook github LIGO (Laser Interferometer Gravitational-Wave Observatory) site and binder Paul Romer, World Bank Chief Economist blog and jupyter notebook The Scientific Paper is Obsolete article Large Scale Teaching Infrastructure with Kubernetes - Yuvi Panda, Berkeley University video Data 8: The Foundations of Data Science site Zero to JupyterHub site JupyterHub Deploy Docker github Jupyter Gitter channels Jupyter Pop-Up, May 15th site JupyterCon, Aug 21-24 site Question of the week How did Google's predictions do during March Madness? How to build a real-time prediction model: Architecting live NCAA predictions Final Four halftime - fed data from the first half to create a prediction for the second half, and created a 30-second spot that ran on CBS before game play sample prediction ad Kaggle Competition site Where can you find us next?
Melanie is speaking about AI at Techtonica today, and on April 14th she will participate in a panel on Diversity and Inclusion at the Harker Research Symposium.
We have the pleasure this week of having the Director of Solutions for Google Cloud, Miles Ward, and Cloud Solutions Architect Grace Mollison join Mark and Melanie to discuss Solutions Architects, what they do, and how they interact with customers at Google Cloud Platform. Miles Ward Miles Ward is a three-time technology startup entrepreneur with a decade of experience building cloud infrastructures. Miles is Director of Solutions for Google Cloud, focused on delivering next-generation solutions to challenges in big data and analytics, application migration, infrastructure automation, and cost optimization. He worked as a core part of the Obama for America 2012 “TECH” team, crashed Twitter a few times, helped NASA stream the Curiosity Mars Rover landing, put Skype back online in a pinch, and plays a mean electric sousaphone. Grace Mollison Based in London, UK, Grace Mollison is a Cloud Solutions Architect who helps customers understand how to apply policies to their Google Cloud Platform environments as well as how to architect and deploy applications on the Google Cloud Platform. In her spare time she attempts to teach her international team how to speak the Queen's English! Before Google, Grace was a Solutions Architect at AWS, where she worked with the AWS ecosystem and customers to ensure well-architected solutions. Cool things of the week We have awesome new intro and outro music. Did you notice? The thing is … Cloud IoT Core is now generally available blog site JupyterLab is Ready for Users blog github Announcing Google Cloud Spanner as a Vault storage backend blog How to handle mutating JSON schemas in a streaming pipeline, with Square Enix blog FAT* livestream Interview Google Cloud Platform Solutions site Tutorials and Solutions site Machine Learning with Financial Time Series Data solution Implementing GCP Policies for Customer Use Cases solution #87 Customer Engineers with Jonathan Cham podcast Google Cloud Next Solution Architects are hiring! careers Question of the week How do I get a Docker image into Minikube without uploading it to an external registry and then downloading it all over again? Is there an easy way to do this locally? Minikube github $ docker save <image-name> | (eval $(minikube docker-env) && docker load) Original references github Stack Overflow Where can you find us next? Mark will be at the Game Developer's Conference | GDC in March.
In this episode we talk about how the temporal nature of remotely sensed data enriches the information in the imagery. In the news we talk about JupyterLab, another film review (!) and of course, SpaceX. Amongst other things. If you have questions, comments or corrections then you can contact Alastair (@ajggeoger) and Andrew (@map_andrew) on Twitter using #scenefromabove or @eoscenefrom Shownotes: JupyterLab is ready for users Landsat 8 is five years old Mission Control: the unsung heroes of Apollo SpaceX - double sonic booms Sarbian OS Himawari 8 images of Mount Sinabung eruption Google Earth Engine Timelapse Timeseries vs GIS vs Remote Sensing on Google Trends
The O’Reilly Programming Podcast: The next technological evolution of cloud systems. In this episode of the O’Reilly Programming Podcast, I talk serverless architecture with Mike Roberts, engineering leader and co-founder of Symphonia, a serverless and cloud architecture consultancy. Roberts will give two presentations—Serverless Architectures: What, Why, Why Not, and Where Next? and Designing Serverless AWS Applications—at the O’Reilly Software Architecture Conference, October 16-19, 2017, in London. Discussion points: Why Roberts calls serverless “the next evolution of cloud systems,” as individual process deployment and the resource allocation of servers are increasingly outsourced to vendors How serverless architectures use backend-as-a-service (BaaS) products and functions-as-a-service (FaaS) platforms The similarities and differences between a serverless architecture and microservices, and how microservices ideas can be applied to serverless Roberts explains that serverless is “not an all-or-nothing approach,” and that often “the best architecture for a company is going to be a hybrid architecture between serverless and non-serverless technologies.” Recent advances in serverless tooling, including progress in distributed system monitoring tools, such as Amazon’s X-Ray We also get a preview of JupyterCon, August 22-25, 2017, in New York, from conference co-chair Fernando Perez. Our discussion highlights the sessions on JupyterLab, and the UC Berkeley Data Science program, an introductory-level course in which the students use Jupyter Notebooks. Other links: Video of Roberts’ presentation An Introduction to Serverless at the April 2017 Software Architecture in New York The free eBook What Is Serverless?, by Mike Roberts and John Chapin The video AWS Lambda, presented by Mike Roberts and John Chapin Video of Roberts and Chapin’s OSCON 2017 presentation Building, Displaying and Running a Scalable and Extensible Serverless Application Using AWS Sam Newman’s book Building Microservices