Open Source Directions hosted by Quansight

Bringing you the news about the future of Open Source

Quansight, LLC

  • Latest episode: Jul 17, 2020
  • New episodes: monthly
  • Episodes: 42


Latest episodes from Open Source Directions hosted by Quansight

Episode 45: Julia

Jul 17, 2020


In this episode of Open Source Directions we were joined by Jeff Bezanson and Katie Hyatt, who talked about the work they have been doing with Julia. Julia is a programming language that was designed from the beginning for high performance. Its programs compile to native code for multiple platforms via LLVM. Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia has a rich language of descriptive datatypes, and type declarations can be used to clarify and solidify programs. The language uses multiple dispatch as a paradigm, making it easy to express many object-oriented and functional programming patterns. It provides asynchronous I/O, debugging, logging, profiling, a package manager, and more.
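Multiple dispatch means the method that runs is chosen from the runtime types of all arguments, not only the first one. Julia does this natively; the Python sketch below only imitates the idea with a hypothetical registry (`method`, `collide`, and the shape classes are illustrative names, not part of Julia or any library). It resolves overlaps by picking the candidate closest in the class hierarchy, which is a simplification of Julia's real specificity and ambiguity rules.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping a pair of argument types to an implementation.
_METHODS: Dict[Tuple[type, type], Callable[[object, object], str]] = {}


def method(type_a: type, type_b: type):
    """Decorator that registers an implementation of `collide` for (type_a, type_b)."""
    def register(fn: Callable[[object, object], str]) -> Callable[[object, object], str]:
        _METHODS[(type_a, type_b)] = fn
        return fn
    return register


def collide(a: object, b: object) -> str:
    """Pick an implementation using the runtime types of BOTH arguments."""
    candidates = []
    for (type_a, type_b), fn in _METHODS.items():
        if isinstance(a, type_a) and isinstance(b, type_b):
            # Distance up each argument's class hierarchy; 0 means an exact match.
            distance = type(a).__mro__.index(type_a) + type(b).__mro__.index(type_b)
            candidates.append((distance, fn))
    if not candidates:
        raise TypeError(f"no method for ({type(a).__name__}, {type(b).__name__})")
    # Simplification: the closest match wins (Julia would report true ambiguities).
    _, best = min(candidates, key=lambda c: c[0])
    return best(a, b)


class Shape: ...
class Asteroid(Shape): ...
class Ship(Shape): ...


@method(Shape, Shape)
def _generic(a: object, b: object) -> str:
    return "bounce"


@method(Asteroid, Ship)
def _asteroid_ship(a: object, b: object) -> str:
    return "ship destroyed"


@method(Ship, Ship)
def _ship_ship(a: object, b: object) -> str:
    return "ships dock"


print(collide(Asteroid(), Ship()))  # ship destroyed
print(collide(Ship(), Asteroid()))  # bounce (only the generic method applies)
```

Adding a new combination of types only requires registering one more method; none of the existing implementations or classes have to change.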

Episode 44: RecallGraph

Jun 19, 2020


In this episode of Open Source Directions we were joined by Aditya Mukhopadhyay, who talked about the work he has been doing with RecallGraph. RecallGraph is a versioned-graph data store: it retains all changes that its data (vertices and edges) have gone through to reach their current state. It supports point-in-time graph traversals, letting the user query any past state of the graph just as easily as the present.
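To make point-in-time traversal concrete, here is a minimal in-memory sketch of a versioned graph store. It is not RecallGraph's API; the `VersionedGraph` class, its methods, and the convention that a `None` value marks a deletion are assumptions made for illustration.

```python
import bisect
from typing import Any, Dict, Hashable, List, Optional


class VersionedGraph:
    """Hypothetical sketch: every write to a vertex or edge is kept, keyed by time.

    Vertices use any hashable key; edges use a (source, target) tuple key.
    A value of None records a deletion.
    """

    def __init__(self) -> None:
        self._times: Dict[Hashable, List[float]] = {}
        self._values: Dict[Hashable, List[Optional[Any]]] = {}

    def put(self, key: Hashable, value: Optional[Any], t: float) -> None:
        times = self._times.setdefault(key, [])
        if times and t < times[-1]:
            raise ValueError("writes for a key must arrive in time order")
        times.append(t)
        self._values.setdefault(key, []).append(value)

    def delete(self, key: Hashable, t: float) -> None:
        self.put(key, None, t)

    def get(self, key: Hashable, t: float) -> Optional[Any]:
        """Return the state of `key` as of time `t` (None if absent or deleted)."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, t)  # number of writes at or before t
        return self._values[key][i - 1] if i else None

    def neighbors(self, u: Hashable, t: float) -> List[Hashable]:
        """One-hop traversal from `u` over the edges that existed at time `t`."""
        found = []
        for key in self._times:
            if isinstance(key, tuple) and len(key) == 2 and key[0] == u:
                if self.get(key, t) is not None:
                    found.append(key[1])
        return sorted(found)


g = VersionedGraph()
g.put("a", {"name": "A"}, t=1)
g.put(("a", "b"), {"weight": 1}, t=2)
g.put(("a", "c"), {"weight": 5}, t=3)
g.delete(("a", "b"), t=4)
print(g.neighbors("a", t=3.5))  # ['b', 'c']
print(g.neighbors("a", t=5))    # ['c']
```

Because nothing is overwritten, a query about the past is the same operation as a query about the present, just with a different timestamp.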

Episode 43: Jupyter & Nteract

Jun 5, 2020


In this episode of Open Source Directions we were joined by Matthew Seal who talked about the work he has been doing with Jupyter and Nteract. Matthew also discussed a particular topic: common Jupyter tools and their adoption for various use cases in the wild.

Episode 42: Open Tech Response

May 8, 2020


OpenTechResponse is the hub for information sharing and coordination between open source projects responding to an emergency or crisis situation.

Episode 41: Spyder

May 1, 2020


Spyder is a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It offers a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package.

Episode 40: Fortran

Apr 17, 2020


Fortran is a compiled language, which means that once written, the source code must be passed through a compiler to produce a machine executable that can be run.

Episode 39: Apache Arrow

Apr 3, 2020


Apache Arrow is a cross-language development platform for in-memory data. It supports zero-copy streaming messaging and has support for a number of languages, including C, C++, Python, R, Rust, and many others.

Episode 38: Jupyter Book

Mar 20, 2020


Jupyter Book lets you build an online book using a collection of Jupyter Notebooks and Markdown files. Its output is similar to the excellent Bookdown tool, and adds extra functionality for people running a Jupyter stack.

Episode 37: PyJanitor

Mar 6, 2020


Originally a port of the R janitor package, pyjanitor has evolved from a set of convenient data-cleaning routines into an experiment with the method-chaining paradigm. Data preprocessing usually consists of a series of steps that transform raw data into an understandable, usable format.
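Method chaining means each cleaning step returns a new table, so the steps read top to bottom as one expression. The sketch below imitates that style in plain Python; the `Table` class and its three methods are hypothetical stand-ins, not pyjanitor's implementation, which registers its functions as methods on pandas DataFrames.

```python
from typing import Any, Dict, Iterable, List


class Table:
    """Hypothetical minimal table: a list of row dictionaries.

    Every method returns a NEW Table, which is what makes chaining possible.
    """

    def __init__(self, rows: Iterable[Dict[str, Any]]) -> None:
        self.rows: List[Dict[str, Any]] = [dict(r) for r in rows]

    def clean_names(self) -> "Table":
        """Lower-case column names and join words with underscores."""
        def clean(name: str) -> str:
            return "_".join(name.strip().lower().split())
        return Table({clean(k): v for k, v in r.items()} for r in self.rows)

    def remove_empty(self) -> "Table":
        """Drop rows in which every value is missing (None)."""
        return Table(r for r in self.rows if any(v is not None for v in r.values()))

    def fill_missing(self, column: str, value: Any) -> "Table":
        """Replace missing values in one column with a default."""
        return Table(
            {**r, column: value if r.get(column) is None else r[column]} for r in self.rows
        )


raw = [
    {" First Name": "Ada", "Score ": None},
    {" First Name": None, "Score ": None},
    {" First Name": "Bob", "Score ": 7},
]
cleaned = Table(raw).clean_names().remove_empty().fill_missing("score", 0)
print(cleaned.rows)  # [{'first_name': 'Ada', 'score': 0}, {'first_name': 'Bob', 'score': 7}]
```

Since no step mutates its input, any prefix of the chain can be inspected on its own while debugging.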

Episode 36: Bokeh 2.0

Feb 21, 2020


Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords high-performance interactivity over large or streaming datasets. Bokeh can help anyone who would like to quickly and easily make interactive plots, dashboards, and data applications. This is our second time visiting Bokeh, in preparation for the v2.0 release!

Episode 35: IBM Lale

Jan 24, 2020


Lale is a Python library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-safe fashion. If you are a data scientist who wants to experiment with automated machine learning, this library is for you! Lale adds value beyond scikit-learn along three dimensions: automation, correctness checks, and interoperability. For automation, Lale provides a consistent high-level interface to existing pipeline search tools including GridSearchCV, SMAC, and Hyperopt. For correctness checks, Lale uses JSON Schema to catch mistakes when there is a mismatch between hyperparameters and their type, or between data and operators. And for interoperability, Lale has a growing library of transformers and estimators from popular libraries such as scikit-learn, XGBoost, PyTorch, etc. Lale can be installed just like any other Python package and used with off-the-shelf Python tools such as Jupyter notebooks.
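As a rough illustration of the correctness-check idea, the sketch below validates hyperparameters against a JSON-Schema-style description before any training happens. The schema, the `check` function, and the hyperparameter names are hypothetical, and the sketch handles only the `type`, `enum`, `minimum`, and `additionalProperties` keywords; a real implementation would rely on a complete JSON Schema validator.

```python
from typing import Any, Dict, List

# Hypothetical schema in JSON Schema style for a linear classifier's hyperparameters.
SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "C": {"type": "number", "minimum": 0.0},
        "penalty": {"enum": ["l1", "l2"]},
        "max_iter": {"type": "integer", "minimum": 1},
    },
    "additionalProperties": False,
}

# JSON Schema type names mapped to Python types.
_TYPES = {"number": (int, float), "integer": (int,), "string": (str,), "boolean": (bool,)}


def check(hyperparams: Dict[str, Any], schema: Dict[str, Any] = SCHEMA) -> List[str]:
    """Return a list of human-readable errors; an empty list means the values fit."""
    errors: List[str] = []
    props = schema["properties"]
    for name, value in hyperparams.items():
        if name not in props:
            if not schema.get("additionalProperties", True):
                errors.append(f"{name}: unknown hyperparameter")
            continue
        rule = props[name]
        expected = rule.get("type")
        if expected is not None:
            # bool is a subclass of int in Python, but JSON Schema keeps them apart.
            is_bool = isinstance(value, bool)
            if not isinstance(value, _TYPES[expected]) or (is_bool and expected != "boolean"):
                errors.append(f"{name}: expected {expected}, got {type(value).__name__}")
                continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: {value!r} not in {rule['enum']}")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: {value!r} is below the minimum {rule['minimum']}")
    return errors


print(check({"C": 1.0, "penalty": "l2", "max_iter": 100}))  # []
print(check({"C": -1.0, "penalty": "elasticnet", "max_iter": 2.5}))
```

The point of declaring constraints as data rather than code is that the same schema can drive both validation and the search space for automated tuning.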

Episode 34: Stumpy

Jan 10, 2020


STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of time series data mining tasks such as: pattern/motif (approximately repeated subsequences within a longer time series) discovery, anomaly/novelty (discord) discovery, shapelet discovery, semantic segmentation, density estimation, time series chains (temporally ordered set of subsequence patterns), and more!
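The matrix profile stores, for every length-`m` subsequence of a time series, the z-normalized Euclidean distance to its nearest non-trivial match elsewhere in the series; low values mark motifs and high values mark discords. The brute-force sketch below costs O(n^2 * m) and is only meant to show the definition; STUMPY computes the same quantity with much faster algorithms. The function name, the `ceil(m / 4)` exclusion zone, and the convention that constant subsequences normalize to all zeros are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def _znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a subsequence; a constant one maps to all zeros (an assumption)."""
    m = len(x)
    mean = sum(x) / m
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    return [0.0] * m if sd == 0 else [(v - mean) / sd for v in x]


def naive_matrix_profile(ts: Sequence[float], m: int) -> Tuple[List[float], List[int]]:
    """Brute-force matrix profile and profile index for window length m."""
    n = len(ts) - m + 1
    if m < 2 or n < 2:
        raise ValueError("need m >= 2 and at least two subsequences")
    subs = [_znorm(ts[i:i + m]) for i in range(n)]
    exclusion = math.ceil(m / 4)  # skip j this close to i to avoid trivial self-matches
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# The pattern [1, 5, 2, 8] occurs at positions 0 and 8.
series = [1, 5, 2, 8, 3, 3.5, 7, 0.5, 1, 5, 2, 8]
profile, index = naive_matrix_profile(series, m=4)
print(index[0], index[8])  # 8 0 (each copy's nearest neighbour is the other)
```

The two copies of the pattern get a profile value of zero, which is how motif discovery falls out of the matrix profile directly.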

Episode 28: Matplotlib

Dec 6, 2019


Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

Episode 33: stdlib

Nov 22, 2019


stdlib ("standard lib") is a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing applications. The library provides a collection of robust, high-performance libraries for mathematics, statistics, data processing, streams, and more, and includes many of the utilities you would expect from a standard library.

Episode 32: Voila

Nov 8, 2019


Voilà turns Jupyter notebooks into standalone web applications. Unlike the usual HTML-converted notebooks, each user connecting to the Voilà tornado application gets a dedicated Jupyter kernel which can execute the callbacks to changes in Jupyter interactive widgets. By default, Voilà disallows execute requests from the front-end, preventing execution of arbitrary code, and runs with the strip_source option, which strips out the input cells from the rendered notebook.

Episode 31: Econ-ARK

Oct 25, 2019


The Econ-ARK project provides open-source toolkits for researchers trying to understand how economic and social outcomes result from the actions of heterogeneous individuals. The primary goals of the project are to make entry into the world of such modeling easy; to accelerate the development of this kind of modeling for policy-making and academic research; and to increase the openness, replicability, and interoperability of modeling tools.

Episode 30: UMAP

Oct 11, 2019


Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data: 1. The data is uniformly distributed on a Riemannian manifold; 2. The Riemannian metric is locally constant (or can be approximated as such); 3. The manifold is locally connected. From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.

Episode 29: Panel

Sep 27, 2019


Panel provides tools for easily composing widgets, plots, tables, and other viewable objects and controls into control panels, apps, and dashboards. Panel works with visualizations from Bokeh, Matplotlib, HoloViews, and other Python plotting libraries, making them instantly viewable either individually or when combined with interactive widgets that control them. Panel works equally well in Jupyter Notebooks, for creating quick data-exploration tools, or as standalone deployed apps and dashboards, and allows you to easily switch between those contexts as needed.

Episode 27: OpenTeams

Aug 23, 2019


OpenTeams brings together organizations using open source software with creators and maintainers of the software to facilitate and grow funding opportunities.

Episode 26: Vega-Lite

Aug 9, 2019


Vega is a declarative format for creating, saving, and sharing visualization designs. With Vega, visualizations are described in JSON, and generate interactive views using either HTML5 Canvas or SVG.
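Since the episode focuses on Vega-Lite, the higher-level grammar that compiles down to Vega, here is what a small Vega-Lite bar-chart specification looks like when built and serialized from Python. The data values and field names are hypothetical, and the `$schema` URL assumes Vega-Lite v4.

```python
import json

# A Vega-Lite bar chart; the data values and field names are hypothetical.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "description": "Counts per category (illustrative data).",
    "data": {
        "values": [
            {"category": "A", "count": 28},
            {"category": "B", "count": 55},
            {"category": "C", "count": 43},
        ]
    },
    "mark": "bar",
    "encoding": {
        # Discrete categories on the x axis, numeric counts on the y axis.
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "count", "type": "quantitative"},
    },
}

print(json.dumps(spec, indent=2))
```

Because the whole chart is plain JSON, it can be saved, diffed, and shared like any other data file.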

Episode 25: Binder

Jul 26, 2019


Have a repository full of Jupyter notebooks? With Binder, open those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere.

Episode 24: nteract

Jul 12, 2019


The nteract project is an ecosystem of open source tools to enable people to build their own front-ends and workflows on top of the Jupyter ecosystem.

Episode 23: conda-forge

Jun 28, 2019


conda-forge is a community-led collection of recipes, build infrastructure, and distributions. Conda-forge currently builds conda packages for Linux, Mac, Windows, ARM, and Power8 architectures. Conda-forge has 1,400 members in its GitHub organization and more than 7,000 repositories. The conda-forge channel has about 80 million downloads a month, and growing. Conda-forge is an official NumFOCUS project.

Episode 22: SciKit-Learn

May 31, 2019


SciKit-Learn provides simple and efficient tools for data mining and data analysis which are accessible to everybody, and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib.

Episode 21: xtensor/xframe

May 18, 2019


xtensor provides an extensible expression system enabling lazy broadcasting, an API following the idioms of the C++ standard library, and tools to manipulate array expressions and build upon xtensor. xtensor containers are inspired by NumPy, the Python array programming library. Adaptors that plug existing data structures into its expression system can easily be written. xtensor requires a modern C++ compiler supporting C++14.

Episode 20: Uarray

May 3, 2019


Uarray is an array interface object for Python with pluggable backends and a multiple-dispatch mechanism for defining downstream functions. CORRECTION: In the episode, Hameer implied that moving data from GPUs to CPUs won't be a problem in PCIe 4.0. It's actually in an Intel-proposed extension to PCIe 5.0.

Episode 19: Pyodide

Apr 19, 2019


Pyodide provides transparent conversion of objects between JavaScript and Python. When inside a browser, this means Python has full access to the Web APIs. While closely related to the iodide project, Pyodide may be used standalone in any context where you want to run Python inside a web browser.

Episode 18: PyMC3

Apr 5, 2019


PyMC3 is a probabilistic programming package for Python that allows users to fit Bayesian models using a variety of numerical methods, most notably Markov chain Monte Carlo (MCMC) and variational inference (VI).
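To show what an MCMC sampler does under the hood, the sketch below draws from the posterior of a normal mean with random-walk Metropolis in plain Python. This is not PyMC3's code: PyMC3 builds models declaratively and, for continuous parameters, defaults to the NUTS sampler. The prior (normal, standard deviation 10), the known noise level (sigma = 1), the step size, and the data are all assumptions chosen so the result can be checked against the closed-form posterior.

```python
import math
import random
from typing import List, Sequence


def log_posterior(mu: float, data: Sequence[float], prior_sd: float = 10.0,
                  sigma: float = 1.0) -> float:
    """Unnormalized log posterior: Normal(0, prior_sd) prior, Normal(mu, sigma) likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data: Sequence[float], n_samples: int = 20000, step: float = 0.5,
               burn_in: int = 1000, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for the single parameter mu."""
    rng = random.Random(seed)
    mu = 0.0
    lp = log_posterior(mu, data)
    samples: List[float] = []
    for i in range(burn_in + n_samples):
        proposal = mu + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, data)
        # Accept with probability min(1, p(proposal) / p(mu)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            mu, lp = proposal, lp_new
        if i >= burn_in:
            samples.append(mu)
    return samples


data = [2.9, 3.1, 3.0, 2.8, 3.2]
samples = metropolis(data)
mcmc_mean = sum(samples) / len(samples)
# Conjugate result for comparison: posterior mean = sum(x) / (n + sigma^2 / prior_sd^2).
exact_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
print(round(mcmc_mean, 2), round(exact_mean, 2))
```

The sampler only ever evaluates the unnormalized log posterior, which is why MCMC works for models whose normalizing constant cannot be computed.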

Episode 17: TensorFlow

Mar 22, 2019


TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.

Episode 16: Chainer

Mar 14, 2019


Chainer is a powerful, flexible, and intuitive deep learning framework. Chainer supports CUDA computation. It only requires a few lines of code to leverage a GPU. It also runs on multiple GPUs with little effort. Chainer supports various network architectures including feed-forward nets, convnets, recurrent nets, and recursive nets. It also supports per-batch architectures. Forward computation can include any control flow statements of Python without losing the ability to backpropagate. This makes code intuitive and easy to debug.

Episode 15: Numba

Mar 1, 2019


Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN. Users do not need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Applying one of the Numba decorators to a Python function is all that is needed.

Episode 14: ITK

Feb 21, 2019


ITK is an open-source, cross-platform system that provides developers with an extensive suite of software tools for image analysis. Developed through extreme programming methodologies, ITK employs leading-edge algorithms for registering and segmenting multidimensional data.

Episode 13: Jupyter Ecosystem

Feb 1, 2019


Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages. The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Episode 12: PySpark

Jan 18, 2019


The Spark Python API (PySpark) exposes the Spark programming model to Python.

Episode 11: Dask

Jan 14, 2019


Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love. Dask is open source and freely available. It is developed in coordination with other community projects like NumPy, pandas, and scikit-learn.

Episode 10: PyData/Sparse

Dec 19, 2018


The aim of PyData/Sparse is to create sparse containers that implement the ndarray interface. Traditionally in the PyData ecosystem, sparse arrays have been provided by the scipy.sparse submodule. All containers there depend on and emulate the numpy.matrix interface. This means that they are limited to two dimensions and also do not work well in places where numpy.ndarray would work. PyData/Sparse is well on its way to replacing scipy.sparse as the de facto sparse array implementation in the PyData ecosystem.
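The core of an ndarray-style sparse container is a coordinate (COO) layout that works for any number of dimensions, which the two-dimensional scipy.sparse matrix classes cannot offer. The pure-Python sketch below stores only the nonzero entries in a dictionary keyed by index tuples. `COOArray` is a hypothetical name; PyData/Sparse's real `sparse.COO` class keeps coordinates and values in NumPy arrays and implements far more of the ndarray interface.

```python
from typing import Any, Dict, Tuple

Index = Tuple[int, ...]


class COOArray:
    """Hypothetical N-dimensional sparse array in coordinate (COO) format."""

    def __init__(self, shape: Tuple[int, ...], entries: Dict[Index, float]) -> None:
        self.shape = tuple(shape)
        for idx in entries:
            self._check(idx)
        # Keep only nonzero values; everything else is implicitly 0.
        self.entries = {idx: v for idx, v in entries.items() if v != 0}

    @property
    def ndim(self) -> int:
        return len(self.shape)

    @property
    def nnz(self) -> int:
        return len(self.entries)

    def _check(self, idx: Index) -> None:
        if len(idx) != self.ndim or not all(0 <= i < n for i, n in zip(idx, self.shape)):
            raise IndexError(f"index {idx} out of bounds for shape {self.shape}")

    def __getitem__(self, idx: Index) -> float:
        self._check(idx)
        return self.entries.get(idx, 0.0)

    def __add__(self, other: "COOArray") -> "COOArray":
        if other.shape != self.shape:
            raise ValueError("shape mismatch")
        total = dict(self.entries)
        for idx, v in other.entries.items():
            total[idx] = total.get(idx, 0.0) + v
        return COOArray(self.shape, total)  # entries that cancel to 0 are dropped

    def todense(self) -> Any:
        """Expand to nested Python lists (only sensible for small arrays)."""
        def build(prefix: Index) -> Any:
            if len(prefix) == self.ndim:
                return self.entries.get(prefix, 0.0)
            return [build(prefix + (i,)) for i in range(self.shape[len(prefix)])]
        return build(())


a = COOArray((2, 2, 3), {(0, 1, 2): 5.0, (1, 0, 0): -1.0})
print(a.nnz, a[0, 1, 2], a[1, 1, 1])  # 2 5.0 0.0
print((a + a)[0, 1, 2])               # 10.0
```

Memory use grows with the number of nonzeros rather than with the product of the dimensions, which is what makes three or more dimensions practical.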

Episode 9: Datashader

Dec 19, 2018


Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly. Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. This approach allows accurate and effective visualizations to be produced automatically without trial-and-error parameter tuning, and also makes it simple for data scientists to focus on particular data and relationships of interest in a principled way.
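The explicit steps Datashader describes can be pictured as project, aggregate, then shade. The sketch below implements just the last two in plain Python: points are counted into a fixed grid of pixels, and only then are the counts mapped to intensities. The function names and the log scaling are assumptions for illustration; Datashader's own `Canvas` and `shade` functions are vectorized and offer other transforms such as histogram equalization.

```python
import math
from typing import List, Sequence, Tuple


def aggregate_counts(xs: Sequence[float], ys: Sequence[float], width: int, height: int,
                     x_range: Tuple[float, float],
                     y_range: Tuple[float, float]) -> List[List[int]]:
    """Aggregation step: count points per pixel; row 0 holds the lowest y values."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # points outside the viewport are ignored
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


def shade_log(grid: List[List[int]]) -> List[List[float]]:
    """Shading step: map counts to [0, 1] intensities on a log scale."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0.0] * len(row) for row in grid]
    return [[math.log1p(c) / math.log1p(peak) for c in row] for row in grid]


xs = [0.1, 0.2, 0.9, 0.95, 0.9]
ys = [0.1, 0.15, 0.9, 0.9, 0.2]
counts = aggregate_counts(xs, ys, width=2, height=2, x_range=(0, 1), y_range=(0, 1))
print(counts)  # [[2, 1], [0, 2]]
print(shade_log(counts))
```

Because the aggregation happens before any colour decision, the shading rule can be swapped without recomputing the counts.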

Episode 8: SciPy

Dec 3, 2018


SciPy is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. SciPy provides many user-friendly and efficient numerical routines such as for numerical integration and optimization. SciPy runs on all popular operating systems, is easy to use, and powerful enough to be depended upon by the world's leading scientists & engineers.

Episode 7: GeoViews

Nov 16, 2018


GeoViews is a Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research. GeoViews is built on the HoloViews library for building flexible visualizations of multidimensional data. GeoViews adds a family of geographic plot types based on the Cartopy library, plotted using either the Matplotlib or Bokeh packages. With GeoViews, you can now work easily and naturally with large, multidimensional geographic datasets, instantly visualizing any subset or combination of them, while always being able to access the raw data underlying any plot.

Episode 6: Intake

Oct 30, 2018


Intake appeals to different groups of users and acts as a common platform that everyone can use to smooth the progression of data from developers and providers to users.

Episode 5: CuPy

Oct 29, 2018


CuPy's interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement. Blog Post: https://quansight.github.io/Episode-5-CuPy/

Episode 0: Bokeh

Jul 20, 2018


This episode features Bokeh, which is a web-based and interactive visualization library for Python. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
