PyTorch Developer Podcast


The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite-sized (10-20 min) episodes about all sorts of internal development topics in PyTorch.

Edward Yang, Team PyTorch


    • Aug 4, 2024 LATEST EPISODE
    • infrequent NEW EPISODES
    • 16m AVG DURATION
    • 83 EPISODES



    Latest episodes from PyTorch Developer Podcast

    Compiler collectives

    Play Episode Listen Later Aug 4, 2024 16:33


    Compiler collectives are a PT2 feature whereby compiler instances across multiple ranks use NCCL collectives to communicate information to one another. This is used to ensure we consistently decide whether inputs are static or dynamic across all ranks. See also the PR at https://github.com/pytorch/pytorch/pull/130935

    TORCH_TRACE and tlparse

    Play Episode Listen Later Apr 29, 2024 15:28


    TORCH_TRACE and tlparse are a structured log format and its parser for PyTorch 2. They give useful information about what code was compiled and what the intermediate build products look like.

    Higher order operators

    Play Episode Listen Later Apr 21, 2024 17:10


    Higher order operators are a special form of operators in torch.ops which have relaxed input argument requirements: in particular, they can accept any form of argument, including Python callables. Their name is based off of their most common use case, which is to represent higher order functions like control flow operators. However, they are also used to implement other variants of basic operators and can also be used to smuggle in Python data that is quite unusual. They are implemented using a Python dispatcher.

    Inductor - Post-grad FX passes

    Play Episode Listen Later Apr 12, 2024 24:07


    The post-grad FX passes in Inductor run after AOTAutograd has functionalized and normalized the input program into separate forward/backward graphs. As such, they can generally assume that the graph in question is functionalized, except for some mutations to inputs at the end of the graph. At the end of the post-grad phase, special passes reintroduce mutation into the graph before it goes into the rest of Inductor lowering, which is generally aware of mutation. The post-grad FX passes themselves are varied, but they are typically domain-specific passes making local changes to specific parts of the graph.

    CUDA graph trees

    Play Episode Listen Later Mar 24, 2024 20:50


    CUDA graph trees are the internal implementation of CUDA graphs used in PT2 when you say mode="reduce-overhead". Their primary innovation is that they allow memory to be reused across multiple CUDA graphs, as long as the graphs form a tree of potential paths you can take through them. This greatly reduces the memory usage of CUDA graphs in PT2. There are some operational implications to using CUDA graphs, which are described in the podcast.

    Min-cut partitioner

    Play Episode Listen Later Mar 17, 2024 15:56


    The min-cut partitioner makes decisions about what to save for backwards when splitting the forward and backwards graph from the joint graph traced by AOTAutograd. Crucially, it doesn't actually do a "split"; instead, it is deciding how much of the joint graph should be used for backwards. I also talk about the backward retracing problem.

    AOTInductor

    Play Episode Listen Later Mar 2, 2024 17:30


    AOTInductor is a feature in PyTorch that lets you export an inference model into a self-contained dynamic library, which can subsequently be loaded and used to run optimized inference. It is aimed primarily at CUDA and CPU inference applications, for situations where your model is exported once while your runtime may still get continuous updates. One of the big underlying organizing principles is a limited ABI which does not include libtorch, which allows these libraries to stay stable over updates to the runtime. There are many export-like use cases you might be interested in using AOTInductor for, and while some of the pieces should be useful, AOTInductor does not necessarily solve them.

    Tensor subclasses and PT2

    Play Episode Listen Later Feb 24, 2024 13:25


    Tensor subclasses allow you to extend PyTorch with new types of tensors without having to write any C++. They have been used to implement DTensor, FP8, Nested Jagged Tensor and Complex Tensor. Recent work by Brian Hirsh means that we can compile tensor subclasses in PT2, eliminating their overhead. The basic mechanism by which this compilation works is a desugaring process in AOTAutograd. There are some complications involving views, dynamic shapes and tangent metadata mismatch.

    Compiled autograd

    Play Episode Listen Later Feb 19, 2024 18:07


    Compiled autograd is an extension to PT2 that permits compiling the entirety of a backward() call in PyTorch. This allows us to fuse accumulate grad nodes as well as trace through arbitrarily complicated Python backward hooks. Compiled autograd is an important part of our plans for compiled DDP/FSDP as well as for whole-graph compilation.

    PT2 extension points

    Play Episode Listen Later Feb 5, 2024 15:54


    We discuss some extension points for customizing PT2 behavior across Dynamo, AOTAutograd and Inductor.

    Inductor - Define-by-run IR

    Play Episode Listen Later Jan 24, 2024 12:06


    Define-by-run IR is how Inductor defines the internal compute of a pointwise/reduction operation. It is characterized by a function that calls a number of functions in the 'ops' namespace, where these ops can be overridden by different handlers depending on what kind of semantic analysis you need to do. The ops Inductor supports include regular arithmetic operators, but also memory load/store, indirect indexing, masking and collective operations like reductions.
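To make the define-by-run idea concrete, here is a toy sketch (hypothetical classes, not Inductor's real `ops` handler machinery): the same function body is "re-run" under different handlers, one that computes concretely and one that records the operations as strings.

```python
# Toy illustration of the define-by-run pattern (not Inductor's real classes):
# the same inner function is re-executed under different ops handlers.

def inner_fn(ops, idx):
    # Describes a pointwise computation: load an element, then square it.
    x = ops.load("buf0", idx)
    return ops.mul(x, x)

class EvalOps:
    """Handler that actually computes, given concrete buffers."""
    def __init__(self, buffers):
        self.buffers = buffers
    def load(self, name, idx):
        return self.buffers[name][idx]
    def mul(self, a, b):
        return a * b

class PrintOps:
    """Handler that records the ops as strings (a trivial 'semantic analysis')."""
    def load(self, name, idx):
        return f"load({name}, {idx})"
    def mul(self, a, b):
        return f"mul({a}, {b})"

value = inner_fn(EvalOps({"buf0": [1, 2, 3]}), 2)   # concrete evaluation
trace = inner_fn(PrintOps(), 2)                     # symbolic analysis
print(value)  # 9
print(trace)  # mul(load(buf0, 2), load(buf0, 2))
```

Swapping handlers without touching the compute definition is what lets Inductor run many different analyses over the same IR.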

    Unsigned integers

    Play Episode Listen Later Jan 17, 2024 13:07


    Traditionally, unsigned integer support in PyTorch was not great; we only supported uint8. Recently, we added support for uint16, uint32 and uint64. Bare-bones functionality works, but I'm entreating the community to help us build out the rest. In particular, for most operations, we plan to use PT2 to build anything else. But if you have an eager kernel you really need, send us a PR and we'll put it in. While most of the implementation was straightforward, there are some weirdnesses related to type promotion inconsistencies with numpy and dealing with the upper range of uint64. There is also upcoming support for sub-byte dtypes uint1-7, and these will exclusively be implemented via PT2.

    Inductor - IR

    Play Episode Listen Later Jan 16, 2024 18:00


    Inductor IR is an intermediate representation that lives between ATen FX graphs and the final Triton code generated by Inductor. It was designed to faithfully represent PyTorch semantics and accordingly models views, mutation and striding. When you write a lowering from ATen operators to Inductor IR, you get a TensorBox for each Tensor argument which contains a reference to the underlying IR (via StorageBox, and then a Buffer/ComputedBuffer) that says how the Tensor was computed. The inner computation is represented via define-by-run, which allows for compact definition of the IR, while still allowing you to extract an FX graph out if you desire. Scheduling then takes buffers of Inductor IR and decides what can be fused. Inductor IR may have too many node types; this would be a good thing to refactor in the future.

    Dynamo - VariableTracker

    Play Episode Listen Later Jan 12, 2024 15:55


    I talk about VariableTracker in Dynamo. VariableTracker is Dynamo's representation of Python values. I talk about some recent changes, namely eager guards and mutable VTs. I also tell you how to find the functionality you care about in VariableTracker (https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6).

    Unbacked SymInts

    Play Episode Listen Later Feb 21, 2023 21:31


    This podcast goes over the basics of unbacked SymInts. You might want to listen to this one before listening to https://pytorch-dev-podcast.simplecast.com/episodes/zero-one-specialization. Some questions we answer (h/t Gregory Chanan):
    - Are unbacked symints only for export? Because otherwise I could just break / wait for the actual size. But maybe I can save some retracing / graph-break perf if I have them too? So the correct statement is "primarily" for export?
    - Why am I looking into the broadcasting code at all? Naively, I would expect the export graph to be just a list of ATen ops strung together. Why do I recurse that far down? Why can't I annotate DONT_TRACE_ME_BRO?
    - How does 0/1 specialization fit into this? I understand we may want to 0/1 specialize in a dynamic shape regime in "eager" mode (is there a better term?), but that doesn't seem to matter for export?
    - So far we've mainly been talking about how to handle our own library code. There is a worry about pushing complicated constraints downstream, similar to torchscript. What constraints does this actually push?

    Zero-one specialization

    Play Episode Listen Later Feb 20, 2023 21:07


    Mikey Dagistes joins me to ask some questions about the recent composability sync https://www.youtube.com/watch?v=NJV7YFbtoR4 where we discussed 0/1 specialization and its implications on export in PT2. What's the fuss all about? What do I need to understand about PT2 to understand why 0/1 specialization is a thing?

    torchdynamo

    Play Episode Listen Later Dec 6, 2022 25:35


    What is torchdynamo? From a bird's eye view, what exactly does it do? What are some important things to know about it? How does it differ from other graph capture mechanisms?
    For more reading, check out https://docs.google.com/document/d/13K03JN4gkbr40UMiW4nbZYtsw8NngQwrTRnL3knetGM/edit#

    PyTorch 2.0

    Play Episode Listen Later Dec 4, 2022 17:51


    Soumith's keynote on PT2.0: https://youtu.be/vbtGZL7IrAw?t=1037
    PT2 Manifesto: https://docs.google.com/document/d/1tlgPcR2YmC3PcQuYDPUORFmEaBPQEmo8dsh4eUjnlyI/edit#
    PT2 Architecture: https://docs.google.com/document/d/1wpv8D2iwGkKjWyKof9gFdTf8ISszKbq1tsMVm-3hSuU/edit#

    History of functorch

    Play Episode Listen Later Nov 7, 2022 19:10


    Join me with Richard Zou to talk about the history of functorch. What was the thought process behind the creation of functorch? How did it get started? JAX's API and model is fairly different from PyTorch's, how did we validate that it would work in PyTorch? Where did functorch go after the early user studies? Where is it going next?

    Learning rate schedulers

    Play Episode Listen Later Jun 13, 2022 19:35


    What's a learning rate? Why might you want to schedule it? How does the LR scheduler API in PyTorch work? What the heck is up with the formula implementation? Why is everything terrible?
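As a taste of the "formula implementation" style the episode discusses, here is a minimal sketch of what a step-decay schedule computes (a hypothetical helper, not the torch.optim.lr_scheduler API):

```python
def step_lr(base_lr, step_size, gamma, epoch):
    # Multiply the base LR by gamma once every step_size epochs.
    # Computed as a closed-form function of the epoch, rather than by
    # mutating state step by step.
    return base_lr * gamma ** (epoch // step_size)

lrs = [step_lr(0.1, 10, 0.5, e) for e in (0, 9, 10, 25)]
print(lrs)  # [0.1, 0.1, 0.05, 0.025]
```

The closed-form style makes the schedule easy to reason about at any epoch, at the cost of being awkward to compose with other schedulers.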

    Weak references

    Play Episode Listen Later Jun 6, 2022 16:46


    What are weak references good for? (Caches. Private fields.) C++-side support and how it's implemented to release resources. Python-side support and how it's implemented. The weak-ref tensor hazard due to resurrection. Downsides of weak references in C++. Scott Wolchok's release-resources optimization.
    Other episodes to listen to first:
    https://pytorch-dev-podcast.simplecast.com/episodes/reference-counting
    https://pytorch-dev-podcast.simplecast.com/episodes/pyobject-preservation
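The cache use case is easy to see on the Python side with the standard-library weakref module: a weak-valued cache does not keep its entries alive, so they vanish as soon as the last strong reference dies.

```python
import weakref

class Resource:
    """Stand-in for an expensive object we'd like to cache."""
    def __init__(self, name):
        self.name = name

# A cache that does not keep its values alive: once the last strong
# reference to a Resource dies, the corresponding entry vanishes.
cache = weakref.WeakValueDictionary()

r = Resource("conv_weights")
cache["conv_weights"] = r
print("conv_weights" in cache)  # True

del r                           # drop the only strong reference
print("conv_weights" in cache)  # False (entry was cleared)
```

In CPython the entry disappears immediately thanks to reference counting; on other runtimes it would be cleared whenever the garbage collector runs.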

    Strides

    Play Episode Listen Later May 30, 2022 20:31


    Mike Ruberry has an RFC about stride-agnostic operator semantics (https://github.com/pytorch/pytorch/issues/78050), so let's talk about strides. What are they? How are they used to implement views and memory format? How do you handle them properly when writing kernels? In what sense are strides overspecified, and therefore, not worth slavishly reimplementing in a system like PrimTorch? What does Edward think we should do about them?
    My blog post that covers strides along with other topics can be found at http://blog.ezyang.com/2019/05/pytorch-internals/
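The core stride arithmetic can be sketched in a few lines of plain Python (illustrative only, not PyTorch's implementation): the element at a logical index lives at the dot product of indices and strides.

```python
def offset(indices, strides):
    # The element at logical position `indices` lives at this linear
    # offset in the underlying storage.
    return sum(i * s for i, s in zip(indices, strides))

# A contiguous 2x3 tensor has strides (3, 1): moving one row skips 3
# storage elements, moving one column skips 1.
storage = [0, 1, 2, 3, 4, 5]
strides = (3, 1)
print(offset((1, 2), strides))  # 5

# Its transpose is a *view*: same storage, strides swapped to (1, 3).
t_strides = (1, 3)
print(storage[offset((2, 1), t_strides)])  # 5, i.e. transpose[2][1] == original[1][2]
```

This is why transposes and many slicing operations are free in PyTorch: they only rewrite strides, never the storage.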

    AOTAutograd

    Play Episode Listen Later May 9, 2022 19:12


    AOTAutograd is a cool new feature in functorch for capturing both forward and backward traces of PyTorch operators, letting you run them through a compiler and then drop the compiled kernels back into a normal PyTorch eager program. Today, Horace joins me to tell me how it works, what it is good to use for, and what our future plans for it are.

    Dispatcher questions with Sherlock

    Play Episode Listen Later May 2, 2022 18:36


    Sherlock recently joined the PyTorch team, having previously worked on ONNX Runtime at Microsoft, and Sherlock's going to ask me some questions about the dispatcher, and I'm going to answer them. We talked about the history of the dispatcher, how to override dispatching order, multiple dispatch, how to organize various dispatch keys and torch function mode. The companion video is at https://youtu.be/6ibjl_ngY-w

    New CI

    Play Episode Listen Later Apr 25, 2022 16:12


    PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk about why we moved to GitHub Actions, how the new CI system is put together, and what are some cool features of our new CI.

    Python exceptions

    Play Episode Listen Later Apr 17, 2022 14:47


    C++ has exceptions, Python has exceptions. But they're not the same thing! How do exceptions work in CPython, how do we translate exceptions from C++ to Python (hint: it's different for direct bindings versus pybind11), and what do warnings (which we also translate from C++ to Python) have in common with this infrastructure?

    Torch vs ATen APIs

    Play Episode Listen Later Apr 11, 2022 15:03


    PyTorch's torch API is the Python API everyone knows and loves, but there's also another API, the ATen API, which most of PyTorch's internal subsystems are built on. How to tell them apart? What implications do these have on our graph mode IR design? Also, a plug for PrimTorch, a new set of operators, not designed for eager mode, that is supposed to be even lower level than ATen.

    All about NVIDIA GPUs

    Play Episode Listen Later Sep 24, 2021 19:29


    PyTorch is in the business of shipping numerical software that can run fast on your CUDA-enabled NVIDIA GPU, but it turns out there is a lot of heterogeneity in NVIDIA's physical GPU offering and when it comes to what is fast and what is slow, what specific GPU you have on hand matters quite a bit. Yet there are literally hundreds of distinct NVIDIA GPU models on the market, how do you make sense of the madness? Today, Natalia Gimelshein joins me to talk about everything that's going on in the NVIDIA GPU market, and what, as a framework developer, you have to care about to make sense of it all.
    Further reading:
    NVIDIA microarchitectures on Wikipedia: https://en.wikipedia.org/wiki/Category:Nvidia_microarchitectures
    A slightly old post about matching SM to architecture: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

    Tensor subclasses and Liskov substitution principle

    Play Episode Listen Later Sep 16, 2021 19:13


    A lot of recent work going into PyTorch is about adding new and interesting Tensor subclasses, and this all leads up to the question: what exactly is OK to make a tensor subclass? One answer to this question comes from an old principle from Barbara Liskov called the Liskov substitution principle, which informally states that S is a subtype of T if anywhere you have T, it can be replaced with S without altering "desirable" properties of the program. In this podcast I'll talk about LSP and how it relates to the design of Tensor subclasses, and a hypothetical "abstract Tensor specification" which doesn't really exist but which sort of implicitly exists in the corpus of existing PyTorch programs.
    Further reading:
    A cool interview with Barbara Liskov that I quote in the podcast: https://www.youtube.com/watch?v=-Z-17h3jG0A
    Max Balandat talking about linear operators in PyTorch: https://github.com/pytorch/pytorch/issues/28341
    At the end I talk a little bit about multiple dispatch; an earlier discussion of this topic is in https://pytorch-dev-podcast.simplecast.com/episodes/multiple-dispatch-in-torch-function

    Half precision

    Play Episode Listen Later Sep 10, 2021 18:00


    In this episode I talk about reduced precision floating point formats float16 (aka half precision) and bfloat16. I'll discuss what floating point numbers are, how these two formats vary, and some of the practical considerations that arise when you are working with numeric code in PyTorch that also needs to work in reduced precision. Did you know that we do all CUDA computations in float32, even if the source tensors are stored as float16? Now you know!
    Further reading:
    The Wikipedia article on IEEE floating point is pretty great: https://en.wikipedia.org/wiki/IEEE_754
    How bfloat16 works out when doing training: https://arxiv.org/abs/1905.12322
    Definition of acc_type in PyTorch: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h
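You can see float16's limited 10-bit mantissa directly from Python's standard library, which supports IEEE binary16 via the struct 'e' format. A quick round-trip shows the rounding that happens when you store a value in half precision:

```python
import struct

def to_f16_and_back(x):
    # Round-trip a Python float (binary64) through IEEE binary16,
    # exposing the precision loss of the half format.
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_f16_and_back(1.0))      # 1.0 (exactly representable)
print(to_f16_and_back(0.1))      # 0.0999755859375 (rounded to a 10-bit mantissa)
print(to_f16_and_back(65504.0))  # 65504.0, the largest finite float16
```

bfloat16, by contrast, keeps float32's 8-bit exponent but only a 7-bit mantissa, so it trades even more precision for a much wider range.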

    DataLoader with multiple workers leaks memory

    Play Episode Listen Later Sep 1, 2021 16:38


    Today I'm going to talk about a famous issue in PyTorch, DataLoader with num_workers > 0 causes memory leak (https://github.com/pytorch/pytorch/issues/13246). This bug is a good opportunity to talk about DataSet/DataLoader design in PyTorch, fork and copy-on-write memory in Linux and Python reference counting; you have to know about all of these things to understand why this bug occurs, but once you do, it also explains why the workarounds help.
    Further reading:
    A nice summary of the full issue: https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662
    DataLoader architecture RFC: https://github.com/pytorch/pytorch/issues/49440
    Cinder Python: https://github.com/facebookincubator/cinder

    Batching

    Play Episode Listen Later Aug 18, 2021 13:37


    PyTorch operates on its input data in a batched manner, typically processing many elements of an input at once (rather than one at a time, as would be the case in typical programming). In this podcast, we talk a little about the implications of batching operations in this way, and then also about how PyTorch's API is structured for batching (hint: poorly) and how Numpy introduced the concept of ufuncs/gufuncs to standardize broadcasting and batching behavior. There is some overlap between this podcast and previous podcasts about TensorIterator and vmap; you may also be interested in those episodes.
    Further reading:
    ufuncs and gufuncs: https://numpy.org/doc/stable/reference/ufuncs.html and https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html
    A brief taxonomy of PyTorch operators by shape behavior: http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/
    Related episodes on TensorIterator and vmap: https://pytorch-dev-podcast.simplecast.com/episodes/tensoriterator and https://pytorch-dev-podcast.simplecast.com/episodes/vmap

    Multiple dispatch in __torch_function__

    Play Episode Listen Later Aug 10, 2021 14:20


    Python is a single-dispatch OO language, but some operations, such as binary magic methods, implement a simple form of multiple dispatch. __torch_function__ (through its NumPy predecessor __array_function__) generalizes this mechanism so that invocations of torch.add with different subclasses work properly. This podcast describes how this mechanism works and how it can be used (in an unconventional way) to build composable subclasses a la JAX in functorch.
    Further reading:
    This podcast in written form: https://dev-discuss.pytorch.org/t/functorch-levels-as-dynamically-allocated-classes/294
    Multiple dispatch resolution rules in the RFC: https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md#process-followed-during-a-functionmethod-call
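The core dispatch idea can be sketched in pure Python (a toy protocol with made-up names, not PyTorch's actual resolution rules): a free function checks its arguments' classes for an override hook before falling back to default behavior.

```python
# A stripped-down sketch of the __torch_function__ idea: a free function
# consults its arguments' classes for an override before doing default work.
# (Toy protocol; the real rules handle subclass ordering, NotImplemented, etc.)

def toy_add(a, b):
    for arg in (a, b):
        handler = getattr(type(arg), "__toy_function__", None)
        if handler is not None:
            return handler(toy_add, (a, b))
    return a + b

class Logged:
    """A wrapper 'subclass' that interposes on every toy function call."""
    def __init__(self, value):
        self.value = value

    @classmethod
    def __toy_function__(cls, func, args):
        # Unwrap, recurse, rewrap: the same shape as a Tensor subclass
        # interposing on torch.add.
        vals = [a.value if isinstance(a, Logged) else a for a in args]
        return Logged(func(*vals))

print(toy_add(1, 2))                # 3 (no override found, default path)
print(toy_add(Logged(1), 2).value)  # 3 (dispatched through Logged's handler)
```

Because any argument can trigger the override, this behaves like multiple dispatch even though Python's method calls are single dispatch.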

    Multithreading

    Play Episode Listen Later Aug 3, 2021 18:34


    Writing multithreaded code has always been a pain, and in PyTorch there are buckets and buckets of multithreading-related issues you have to be aware of and deal with when writing code that makes use of it. We'll cover how you interface with multithreading in PyTorch, what goes into implementing those interfaces (thread pools!) and also some miscellaneous stuff like TLS, forks and data structure thread safety that is also relevant.
    Further reading:
    TorchScript CPU inference threading documentation: https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cpu_threading_torchscript_inference.rst
    c10 thread pool: https://github.com/pytorch/pytorch/blob/master/c10/core/thread_pool.h and autograd thread pool: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp
    Tracking issue for TLS propagation across threads: https://github.com/pytorch/pytorch/issues/28520

    Asynchronous versus synchronous execution

    Play Episode Listen Later Jul 27, 2021 15:03


    CUDA is asynchronous, CPU is synchronous. Making them play well together can be one of the more thorny and easy-to-get-wrong aspects of the PyTorch API. I talk about why non_blocking is difficult to use correctly, a hypothetical "asynchronous CPU" device which would help smooth over some of the API problems, and also why it used to be difficult to implement async CPU (but it's not hard anymore!) At the end, I also briefly talk about how async/sync impedance can show up in unusual places, namely the CUDA caching allocator.
    Further reading:
    CUDA semantics, which discuss non_blocking somewhat: https://pytorch.org/docs/stable/notes/cuda.html
    Issue requesting async CPU: https://github.com/pytorch/pytorch/issues/44343

    gradcheck

    Play Episode Listen Later Jul 23, 2021 16:58


    We talk about gradcheck, the property-based testing mechanism that we use to verify the correctness of analytic gradient formulas in PyTorch. I'll talk a bit about testing in general, property-based testing and why gradcheck is a particularly useful property-based test. There will be some calculus, although I've tried to keep the math mostly to intuitions and pointers on what to read up on elsewhere.
    Further reading:
    Gradcheck mechanics, a detailed mathematical explanation of how it works (in particular, how gradcheck extends to complex numbers): https://pytorch.org/docs/stable/notes/gradcheck.html
    JAX has a pretty good explanation of vjp and jvp: https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html
    Fast gradcheck tracking issue: https://github.com/pytorch/pytorch/issues/53876

    torch.use_deterministic_algorithms

    Play Episode Listen Later Jul 21, 2021 10:50


    torch.use_deterministic_algorithms lets you force PyTorch to use deterministic algorithms. It's very useful for debugging!
    There are some errors in the recording: the feature is called torch.use_deterministic_algorithms, and there is not actually a capability to warn (this was in an old version of the PR but taken out); we just error if you hit nondeterministic code.
    Docs: https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms

    Reference counting

    Play Episode Listen Later Jul 20, 2021 15:14


    Reference counting is a common memory management technique in C++, but PyTorch does its reference counting in a slightly idiosyncratic way using intrusive_ptr. We'll talk about why intrusive_ptr exists, the reason why refcount bumps are slow in C++ (but not in Python), what's up with const Tensor& everywhere, why the const is a lie, and how TensorRef lets you create a const Tensor& from a TensorImpl* without needing to bump your reference count.
    Further reading:
    Why you shouldn't feel bad about passing tensor by reference: https://dev-discuss.pytorch.org/t/we-shouldnt-feel-bad-about-passing-tensor-by-reference/85
    Const correctness in PyTorch: https://github.com/zdevito/ATen/issues/27
    TensorRef RFC: https://github.com/pytorch/rfcs/pull/16

    Memory layout

    Play Episode Listen Later Jul 13, 2021 16:26


    Memory layout specifies how the logical multi-dimensional tensor maps its elements onto physical linear memory. Some layouts admit more efficient implementations, e.g., NCHW versus NHWC. Memory layout makes use of striding to allow users to conveniently represent their tensors with different physical layouts without having to explicitly tell every operator what to do.
    Further reading:
    Tutorial: https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html
    Memory format RFC: https://github.com/pytorch/pytorch/issues/19092
    Layout permutation proposal (not implemented): https://github.com/pytorch/pytorch/issues/32078
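The NCHW-versus-NHWC distinction comes down to strides. A small sketch in plain Python (not PyTorch's implementation) shows how contiguous strides are computed and how a channels-last tensor keeps the same logical shape while giving the channel dimension the smallest stride:

```python
def contiguous_strides(shape):
    # Row-major ("contiguous") strides: each dimension's stride is the
    # product of all later dimension sizes.
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

N, C, H, W = 2, 3, 4, 5
print(contiguous_strides((N, C, H, W)))  # (60, 20, 5, 1): NCHW layout

# Channels-last (NHWC) stores the data as if the shape were (N, H, W, C),
# then presents the strides back in logical NCHW order, so C ends up with
# the smallest stride:
nhwc = contiguous_strides((N, H, W, C))              # (60, 15, 3, 1)
nchw_view_of_nhwc = (nhwc[0], nhwc[3], nhwc[1], nhwc[2])
print(nchw_view_of_nhwc)  # (60, 1, 15, 3)
```

Both layouts describe the same logical tensor; only the stride permutation, and hence which operators can read memory sequentially, differs.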

    pytorch-probot

    Play Episode Listen Later Jul 12, 2021 13:06


    pytorch-probot is a GitHub application that we use to automate common tasks in GitHub. I talk about what it does and some design philosophy for it. Repo is at: https://github.com/pytorch/pytorch-probot

    API design via lexical and dynamic scoping

    Play Episode Listen Later Jul 9, 2021 21:44


    Lexical and dynamic scoping are useful tools for reasoning about various API design choices in PyTorch, related to context managers, global flags, dynamic dispatch, and how to deal with BC-breaking changes. I'll walk through three case studies: one from Python itself (changing the meaning of division to true division), and two from PyTorch (device context managers, and torch function for factory functions).
    Further reading:
    Me unsuccessfully asking around if there was a way to simulate __future__ in libraries: https://stackoverflow.com/questions/66927362/way-to-opt-into-bc-breaking-changes-on-methods-within-a-single-module
    A very old issue asking for a way to change the default GPU device: https://github.com/pytorch/pytorch/issues/260 and a global GPU flag: https://github.com/pytorch/pytorch/issues/7535
    A more modern issue based off the lexical module idea: https://github.com/pytorch/pytorch/issues/27878
    Array module NEP: https://numpy.org/neps/nep-0037-array-module.html
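The device-context-manager case study is an instance of dynamic scoping: the value a function sees depends on who is on the call stack at call time, not on where the function was written. A minimal sketch (hypothetical names, not PyTorch's API):

```python
import contextlib

# A dynamically scoped default, in the style of a device context manager.
_default_device = "cpu"

@contextlib.contextmanager
def use_device(device):
    # Override the default for the dynamic extent of the with-block,
    # restoring the previous value on exit (even if an exception escapes).
    global _default_device
    prev, _default_device = _default_device, device
    try:
        yield
    finally:
        _default_device = prev

def make_tensor():
    # Reads the dynamically scoped default at call time.
    return f"tensor on {_default_device}"

print(make_tensor())        # tensor on cpu
with use_device("cuda"):
    print(make_tensor())    # tensor on cuda
print(make_tensor())        # tensor on cpu
```

The try/finally restore is what makes the scoping dynamic rather than a plain global flag: callers cannot observe the override once the with-block exits. (A thread-safe version would use thread-local storage instead of a module global.)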

    Intro to distributed

    Play Episode Listen Later Jul 8, 2021 15:41


    Today, Shen Li (mrshenli) joins me to talk about distributed computation in PyTorch. What is distributed? What kinds of things go into making distributed work in PyTorch? What's up with all of the optimizations people want to do here?
    Further reading:
    PyTorch distributed overview: https://pytorch.org/tutorials/beginner/dist_overview.html
    Distributed data parallel: https://pytorch.org/docs/stable/notes/ddp.html

    Double backwards

    Play Episode Listen Later Jul 7, 2021 16:39


    Double backwards is PyTorch's way of implementing higher order differentiation. Why might you want it? How does it work? What are some of the weird things that happen when you do this?
    Further reading:
    Epic PR that initially added double backwards support for convolution: https://github.com/pytorch/pytorch/pull/1643

    Functional modules

    Play Episode Listen Later Jul 6, 2021 14:34


    Functional modules are a proposed mechanism to take PyTorch's existing NN module API and transform it into a functional form, where all the parameters are explicit arguments. Why would you want to do this? What does functorch have to do with it? How come PyTorch's existing APIs don't seem to need this? What are the design problems?
    Further reading:
    Proposal in GitHub issues: https://github.com/pytorch/pytorch/issues/49171
    Linen design in flax: https://flax.readthedocs.io/en/latest/design_notes/linen_design_principles.html

    CUDA graphs

    Play Episode Listen Later Jun 28, 2021 13:55


    What are CUDA graphs? How are they implemented? What does it take to actually use them in PyTorch?
    Further reading:
    NVIDIA has docs on CUDA graphs: https://developer.nvidia.com/blog/cuda-graphs/
    Nuts-and-bolts implementation PRs from mcarilli: https://github.com/pytorch/pytorch/pull/51436 and https://github.com/pytorch/pytorch/pull/46148

    Default arguments

    Play Episode Listen Later Jun 25, 2021 14:57


    What do default arguments have to do with PyTorch design? Why are default arguments great for clients (call sites) but not for servers (implementation sites)? In what sense are default arguments a canonicalization to max arity? What problems does this canonicalization cause? Can you canonicalize to minimum arity instead? What are some lessons to take away?
    Further reading:
    Stop serializing default arguments: https://github.com/pytorch/pytorch/issues/54613
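"Canonicalization to max arity" means recording every call with all defaults filled in explicitly. A small sketch using the standard inspect module (hypothetical helper names) shows the serialization problem this creates:

```python
import inspect

# Sketch of "canonicalization to max arity": recording a call site with
# every default filled in bakes today's defaults into the artifact.

def resize(img, size, interpolation="bilinear"):
    return (img, size, interpolation)

def canonicalize_to_max_arity(func, *args, **kwargs):
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()          # fill in every default explicitly
    return dict(bound.arguments)

call = canonicalize_to_max_arity(resize, "img0", 224)
print(call)  # {'img': 'img0', 'size': 224, 'interpolation': 'bilinear'}
```

If a later release changes the default interpolation, a serialized record like this still carries the old value, which is exactly why serializing maximum-arity calls makes defaults impossible to evolve.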

    Anatomy of a domain library

    Play Episode Listen Later Jun 24, 2021 16:11


    What's a domain library? Why do they exist? What do they do for you? What should you know about developing in the PyTorch main library versus in a domain library? How coupled are they with PyTorch as a whole? What's cool about working on domain libraries?
    Further reading:
    The classic trio of domain libraries is https://pytorch.org/audio/stable/index.html https://pytorch.org/text/stable/index.html and https://pytorch.org/vision/stable/index.html
    Line notes:
    - Why do domain libraries exist? Lots of domain-specific gadgets, inappropriate for PyTorch.
    - What does a domain library do?
      - operator implementations (old days: pure Python, not anymore) with autograd support and CUDA acceleration, esp. encoding/decoding, e.g., for domain file formats; torchbind for custom objects; takes care of getting the dependencies for you
      - transformations, e.g., for data augmentation
      - models, esp. pretrained weights
      - datasets
      - reference scripts
      - full wheel/conda packaging like PyTorch
      - mobile compatibility
    - Separate repos: external contributors with direct access; manual sync to fbcode, a lot easier to land code! Less motion, so lower risk.
    - Coupling with PyTorch? CI typically runs on nightlies; PyTorch itself tests against torchvision, canaries against extensibility mechanisms; mostly not using internal tools (e.g., TensorIterator), too unstable (this would be good to fix); closer to the research side of PyTorch; Francesco is also part of papers.

    TensorAccessor

    Play Episode Listen Later Jun 23, 2021 11:40


    What's TensorAccessor? Why not just use a raw pointer? What's PackedTensorAccessor? What are some future directions for mixing statically typed and type-erased code inside PyTorch proper?
    Further reading:
    TensorAccessor source code, short and sweet: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TensorAccessor.h
    Legacy THCDeviceTensor: https://github.com/pytorch/pytorch/blob/master/aten/src/THC/THCDeviceTensor.cuh

    Random number generators

    Play Episode Listen Later Jun 22, 2021 14:24


    Why are RNGs important? What is the generator concept? How do PyTorch's CPU and CUDA RNGs differ? What are some of the reasons why Philox is a good RNG for CUDA? Why doesn't the generator class have virtual methods for getting random numbers? What's with the next normal double and what does it have to do with the Box-Muller transform? What's up with csprng?
    Further reading:
    CUDAGeneratorImpl has good notes about CUDA graph interaction and pointers to all the rest of the stuff: https://github.com/pytorch/pytorch/blob/1dee99c973fda55e1e9cac3d50b4d4982b6c6c26/aten/src/ATen/CUDAGeneratorImpl.h
    Transform uniformly distributed random numbers to other distributions: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TransformationHelper.h
    torchcsprng: https://github.com/pytorch/csprng
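The Box-Muller transform mentioned above turns two uniform samples into two independent standard normal samples, which is why a generator can produce normals in pairs and cache the "next normal double." A minimal sketch in pure Python:

```python
import math
import random

def box_muller(u1, u2):
    # Transform two uniforms, u1 on (0, 1] and u2 on [0, 1), into two
    # independent standard normal samples.
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

rng = random.Random(0)
samples = []
for _ in range(5000):
    # 1.0 - random() maps [0, 1) to (0, 1], avoiding log(0).
    z1, z2 = box_muller(1.0 - rng.random(), rng.random())
    samples.extend([z1, z2])

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # mean near 0, variance near 1
```

Since each call yields two normals, an implementation can hand back the first and stash the second for the next request, which is the caching behavior the episode alludes to.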

    vmap

    Play Episode Listen Later Jun 21, 2021 17:47


    What is vmap? How is it implemented? How does our implementation compare to JAX's? What is a good way of understanding what vmap does? What's up with random numbers? Why are there some issues with the vmap that PyTorch currently ships?
    Further reading:
    Tracking issue for vmap support: https://github.com/pytorch/pytorch/issues/42368
    BatchedTensor source code: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/BatchedTensorImpl.h and logical-physical transformation helper code (well documented, worth a read): https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/VmapTransforms.h
    functorch, the better, more JAX-y implementation of vmap: https://github.com/facebookresearch/functorch
    Autodidax, which contains a super simple vmap implementation that is a good model for PyTorch's internal implementation: https://jax.readthedocs.io/en/latest/autodidax.html
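Semantically, vmap lifts a function over single elements to a function over batches. A toy pure-Python version captures the contract (the real implementation pushes the batch dimension into the kernels rather than looping):

```python
# Toy "vmap": lift a function over scalars to a function over lists by
# mapping along a leading batch dimension. Real vmap vectorizes inside
# the kernels instead of looping, but the observable semantics match.

def vmap(fn):
    def batched(*batched_args):
        return [fn(*args) for args in zip(*batched_args)]
    return batched

def scale_add(x, y):
    return 2 * x + y

batched_scale_add = vmap(scale_add)
print(batched_scale_add([1, 2, 3], [10, 20, 30]))  # [12, 24, 36]
```

Thinking of vmap as "a loop you never wrote" is a useful mental model; the implementation challenge is making that loop disappear into batched tensor operations.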

    Expect tests

    Play Episode Listen Later Jun 18, 2021 13:26


    What's an expect test? Why should you use them? Why are inline expect tests better than out-of-line ones? How do you write a good expect test?
    Further reading:
    expecttest source implementation (only 311 lines!): https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/expecttest.py
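The gist of an expect test: the expected output lives inline next to the assertion, and an "accept" mode regenerates it instead of making you type it by hand. A toy sketch (pytorch's expecttest actually rewrites the test's source file for you; this version just reports the new value):

```python
# Minimal sketch of the expect-test idea (toy version, not the expecttest
# library): expectations live inline, and an accept flag regenerates them.

ACCEPT = False  # flip to True to regenerate expectations instead of failing

def assert_expected(actual, expected):
    if ACCEPT:
        print(f"update expectation to: {actual!r}")
    else:
        assert actual == expected, f"{actual!r} != {expected!r}"

def render(d):
    # Function under test: a stable textual rendering of a dict.
    return ", ".join(f"{k}={v}" for k, v in sorted(d.items()))

assert_expected(render({"b": 2, "a": 1}), "a=1, b=2")
print("ok")
```

The payoff is that when output changes intentionally, you rerun in accept mode and review the diff, rather than hand-editing dozens of expected strings.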
