Something roughly the same as something else
Erik Dale is a "European revivalist" and prolific Bitcoiner, host of the 'Bitcoin for Breakfast' podcast, and organizer of the Northern Lightning Bitcoin conference in Scandinavia. He is an expert communications professional and a former adviser to the President of the European Commission. Erik has a unique perspective on how Bitcoin must save the old world and shares a lot about his experience of how Bitcoin (and LSD and parenthood) changed his life. → Follow Erik on https://x.com/EuroDale → Erik's YouTube: https://www.youtube.com/@kongeriket
In this video I'm sharing a part of a book that made me go “Hmm, I've never really thought about that!” It has to do with just how precise we can actually be when measuring items. One of the Standards of Mathematical Practice is “Attend to Precision” which is about being precise in all mathematical vocabulary and content, but just how precise should we make students be when it comes to measurement? Get any links mentioned in this video at BuildMathMinds.com/162
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Deep Learning" Is Function Approximation, published by Zack M Davis on March 21, 2024 on LessWrong. A Surprising Development in the Study of Multi-layer Parameterized Graphical Function Approximators As a programmer and epistemology enthusiast, I've been studying some statistical modeling techniques lately! It's been boodles of fun, and might even prove useful in a future dayjob if I decide to pivot my career away from the backend web development roles I've taken in the past. More specifically, I've mostly been focused on multi-layer parameterized graphical function approximators, which map inputs to outputs via a sequence of affine transformations composed with nonlinear "activation" functions. (Some authors call these "deep neural networks" for some reason, but I like my name better.) It's a curve-fitting technique: by setting the multiplicative factors and additive terms appropriately, multi-layer parameterized graphical function approximators can approximate any function. For a popular choice of "activation" rule which takes the maximum of the input and zero, the curve is specifically a piecewise-linear function. We iteratively improve the approximation f(x,θ) by adjusting the parameters θ in the direction of the derivative of some error metric on the current approximation's fit to some example input-output pairs (x,y), which some authors call "gradient descent" for some reason. (The mean squared error (f(x,θ) − y)² is a popular choice for the error metric, as is the negative log likelihood −log P(y|f(x,θ)). Some authors call these "loss functions" for some reason.) Basically, the big empirical surprise of the previous decade is that given a lot of desired input-output pairs (x,y) and the proper engineering know-how, you can use large amounts of computing power to find parameters θ to fit a function approximator that "generalizes" well - meaning that if you compute ŷ = f(x,θ) for some x that wasn't in any of your original example input-output pairs (which some authors call "training" data for some reason), it turns out that ŷ is usually pretty similar to the y you would have used in an example (x,y) pair. It wasn't obvious beforehand that this would work! You'd expect that if your function approximator has more parameters than you have example input-output pairs, it would overfit, implementing a complicated function that reproduced the example input-output pairs but outputted crazy nonsense for other choices of x - the more expressive function approximator proving useless for the lack of evidence to pin down the correct approximation. And that is what we see for function approximators with only slightly more parameters than example input-output pairs, but for sufficiently large function approximators, the trend reverses and "generalization" improves - the more expressive function approximator proving useful after all, as it admits algorithmically simpler functions that fit the example pairs. The other week I was talking about this to an acquaintance who seemed puzzled by my explanation. "What are the preconditions for this intuition about neural networks as function approximators?" they asked. (I paraphrase only slightly.) "I would assume this is true under specific conditions," they continued, "but I don't think we should expect such niceness to hold under capability increases. 
Why should we expect this to carry forward?" I don't know where this person was getting their information, but this made zero sense to me. I mean, okay, when you increase the number of parameters in your function approximator, it gets better at representing more complicated functions, which I guess you could describe as "capability increases"? But multi-layer parameterized graphical function approximators created by iteratively using the derivative of some error metric to improve the quality ...
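To make the description above concrete, here is a minimal sketch (mine, not the author's) of a "multi-layer parameterized graphical function approximator": a couple of affine maps composed with the ReLU "activation" rule, with the parameters θ nudged against the derivative of a mean-squared-error metric on example input-output pairs. The toy target function (sin), hidden width, and learning rate are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example input-output pairs (x, y) -- what some authors call "training" data.
x = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(x)

# Parameters theta: one hidden layer of 32 ReLU units, then a linear readout.
W1 = rng.normal(0.0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.5, (32, 1)); b2 = np.zeros(1)

lr = 0.01
for step in range(5000):
    # Forward pass: affine map -> ReLU -> affine map (a piecewise-linear curve).
    h_pre = x @ W1 + b1
    h = np.maximum(h_pre, 0.0)
    y_hat = h @ W2 + b2

    # Mean squared error (f(x, theta) - y)^2 and its gradients via the chain rule.
    grad_out = 2.0 * (y_hat - y) / len(x)
    gW2 = h.T @ grad_out
    gb2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (h_pre > 0)
    gW1 = x.T @ grad_h
    gb1 = grad_h.sum(axis=0)

    # "Gradient descent": adjust theta in the direction that reduces the error metric.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

y_hat = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
print("final mean squared error:", float(np.mean((y_hat - y) ** 2)))
```

Because the activation takes the maximum of its input and zero, the fitted curve is exactly the piecewise-linear function the post mentions.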
We discuss GAMBIT, software for accurately classifying bacteria and eukaryotes using a targeted k-mer based approach. GAMBIT software: https://github.com/gambit-suite/gambit GAMBIT suite: https://github.com/gambit-suite GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. https://doi.org/10.1371/journal.pone.0277575 TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1198213/full
In this conversation, Craig and Ollie discuss various topics including Brian Johnson's quest to beat the aging process, fitness goals, teaching reading using Monster Phonics, treating failures as system failures, effective teacher professional development, and the use of silent teacher and checking for listening in the classroom. In this part of the conversation, Craig Barton and Ollie Lovell discuss various teaching strategies and methods. They explore the use of worked examples and the importance of checking for understanding. They also discuss the idea of tightening feedback cycles and the benefits of more frequent assessments. Finally, they delve into the controversy surrounding exit tickets and their effectiveness as a teaching tool. You can access the show-notes here: mrbartonmaths.com/blog/tools-and-tips-for-teachers-10/ Time-stamps: Consider failures first as system failures (09:15) My latest lesson observation and coaching template (16:43) Representation, Decomposition, Approximation (32:16) Two different Starts to Finish so pairs don't copy? (42:20) Tighten feedback cycles (52:57) Are Exit Tickets a waste of time? (1:02:03)
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?, published by Teun van der Weij on February 1, 2024 on LessWrong. Produced as part of the ML Alignment Theory Scholars Program Winter 2024 Cohort, under the mentorship of Francis Rhys Ward. The code, data, and plots can be found on https://github.com/TeunvdWeij/MATS/tree/main/distribution_approximation. This post is meant to provide insight on an interesting LLM capability, which is useful for targeted underperformance on evaluations (sandbagging) by LLMs. We investigate what happens if you independently sample a language model 100 times with the task of 80% of those outputs being A, and the remaining 20% of outputs being B. Here is the prompt we used, where p is the target percentage of the output tokens being A. In the example above, p is 80. In addition to the above prompt, we use the following system prompt: What kind of distribution do you expect? We used the default sampling hyperparameters; both temperature and top_p are 1. We think it is good to register your predictions, so please do so now! Results are a bit further below. Here are the observed calibration scores for GPT-3.5 and GPT-4: Interestingly, both models are capable of doing this! Notably, GPT-3.5 performs worse than GPT-4. Although sampling like this is the easiest to understand, we get a better idea of the exact scores by skipping the sampling altogether. From now on, we extract the probabilities of the top 5 tokens. The plots illustrate that the probabilities roughly match the target distribution. Very little probability is put on tokens other than A and B, illustrating that the prompting points the model to the right tokens. To represent these graphs a bit differently, we plot the extracted probability of token A similar to the first plot. We find a similar pattern. We also observe a vague sigmoidal pattern, aligning with (private) work by Ollie Jaffe. To further test the distribution approximation capability of the model, we performed an experiment where we state the sample number of the query. Now, strategies such as "Pick A for samples 1-80 and B for 81-100" are possible, enabling the model to score perfectly. However, this does not work, as both GPT-3.5 and GPT-4 do not consistently come up with the same strategy (even for temperature of 0). Better prompting might change this result. Testing targeted sandbagging on simple addition questions The capability to approximate distributions over multiple independent forward passes is a necessary condition for "targeted" sandbagging: the ability to achieve a specific score that is worse than the model's best performance. To further investigate this, we generate 100 simple addition questions (such as: "What is 5+4?"), for which the models normally achieve 100% accuracy. The setup was similar to the first experiment, using the following system prompt and user prompt template: For each question in the dataset, we average the extracted probabilities per value of p for all 100 questions. We show the average extracted probabilities in the plot below. We see that GPT-3.5 is much worse at targeted sandbagging than the distribution approximation task. Again, GPT-4 is better than 3.5, but it also performs worse than on the previous task. 
From qualitative evaluations, we believe that these results are highly sensitive to the wording of the prompt template. For one template, we found that GPT-3.5 tended to consistently underperform (while the plot shows it overperforming). We did not try very hard to find an ideal prompt leading to good calibration. We do expect that such a prompt does exist. Discussion The main question we are asking ourselves: are our findings surprising? We do not have a convincing theory of what is going on here, and why the models are able...
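For readers who want to poke at this themselves, here is a rough sketch of the probability-extraction setup described above, written against the OpenAI chat completions API. The prompt wording and model name below are stand-ins of mine; the post's actual prompt template and system prompt are not reproduced here.

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stand-in prompts; the post's own templates (parameterized by p) differ.
system_prompt = "You are a test subject in an experiment. Reply with a single token."
user_prompt = (
    "You will be independently sampled 100 times. Across those samples, 80% of "
    "your outputs should be A and 20% should be B. Output only A or B."
)

resp = client.chat.completions.create(
    model="gpt-4",        # the post compares GPT-3.5 and GPT-4
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,       # "we extract the probabilities of the top 5 tokens"
)

# Convert the top-5 log probabilities of the first output token into probabilities.
for entry in resp.choices[0].logprobs.content[0].top_logprobs:
    print(f"{entry.token!r}: {math.exp(entry.logprob):.3f}")
# Good calibration would put roughly 0.8 on 'A' and 0.2 on 'B'.
```

Reading the token probabilities directly, as the post does, sidesteps the sampling noise you would get from actually drawing 100 completions.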
Show Notes: Andreas Heiberg is an experienced entrepreneur and a software engineering leader with a passion for product development, space exploration, and nuclear energy. From shaping healthcare solutions at Babylon to optimizing global logistics at Gelato, Andreas brings a wealth of experience. He has co-founded and worked at a handful of venture-backed tech companies, ranging from startups to large enterprises across a diverse set of industries. In this interview Andreas shares many insights from his dynamic career. Topics Discussed: U.K. and U.S. Appeal, Graphic Design and University, Early Website Building, Career Trajectory, Becoming a Manager, Consulting, First Startups, Startup Advice, Working at Stellate, Creating Culture, Favourite Jobs, Peace Time Leadership, Moving Mountains, Distributed Work, Decompressing, Influential Mentors, Approximation of Progress, AI, Societal Changes, Questioning Norms, U.S. Startup Ecosystem, Access to Capital, Intense Workplace, Fields of Interest, Nuclear Energy, Sailing, Education in Denmark, Question Everything. Links: GitHub, Website, X, LinkedIn
Chad is charged on a long list of counts related to corruption. He isn't just going to plead guilty. He's going to trial. In court, he'll face his informants and his right-hand men who betrayed him. But whether or not Chad gets what he deserved remains an open question. Is Chad a bad apple or is the DEA a rotten orchard? Subscribe to The Binge to get all episodes of Smoke Screen ad-free right now. Click ‘Subscribe' at the top of the Smoke Screen: Betrayal On The Bayou show page on Apple Podcasts or visit GetTheBinge.com to get access wherever you get your podcasts. A Neon Hum & Sony Music Entertainment production. Find more great podcasts from Sony Music Entertainment at sonymusic.com/podcasts and follow us @sonypodcasts Learn more about your ad choices. Visit podcastchoices.com/adchoices
On today's episode we're getting attention while gaining traction and forming a humble rapport between audiences in a dead age. Having 3500 years of material to sift through. Absorbing news tangentially. Nostalgic lyric-memory messages. Maintaining a harmonic congruence with reality. Being hyped on ambient anxiety. The perverse side of heroism. Surfing waves of momentum. The Raw (sacred) vs. The Cooked (profane). Oblique phrases from the Deep and one of Kris' darkest band contributions yet. The Line as the ultimate human invention. The sequencing of Thought & Time. Spiral strategies and expanding shamanically. The importance of making multidimensional map-ceremonies. The Dark Swamp Hero. Map = Guide. The validation of rapport. The Counterintuitive. The denial of distance through engagement. Breaking the time signatures. The silence of melody; the surprise of harmony. The Human Scaffolding vs. The Jellyfish Grid. Intuition vs. Instinct. Proto Hippie Islands. The metaphysics of being in the wrong body. Entering The Illogic. The disavowed sides of history. Extended Dream-Existence. Bowling in Maya. The Profound Vacuum. Resonating with personal algorithms. Chasing Atoms. On to being skeptical towards presumed assumptions of certainty. Rediscovering fundamentals through amateur eyes. Rerouting the curious ghost hunter. We have a brief corn chip intermission. Then the dark prolapse ballad of "Slippery Chicken". Rubber room storytelling. Moving across time frames into time maps. Packaged Dimensionality vs. Kaleidoscopic Vacuums. Approximation vs. Proximity. Maple Syrup Bootleggers and Nonlinear Dingo Symbols.
DAY 258 CHALLENGE “Why do you claim that the biblical authors used a different level of precision than we do?” DEFENSE Approximations were more common because of the inability in the ancient world to accurately measure and record things (see Day 248). We can show Scripture uses many forms of approximation, including: Numerical approximations: For example, a basin in Solomon's temple is said to have a diameter of ten cubits and a circumference of thirty cubits (1 Kings 7:23; 2 Chron. 4:2), indicating the approximate value of π (pi) as 3 (see Day 197). Numerical approximations are also …
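For concreteness, the arithmetic behind the basin example works out as follows (a quick check of mine, not part of the original entry): a circle ten cubits across has a true circumference of about 31.4 cubits, so reporting "thirty cubits" amounts to rounding π to 3, an error of roughly 4.5 percent.

```python
import math

diameter = 10                 # cubits (1 Kings 7:23)
reported_circumference = 30   # cubits

exact = math.pi * diameter
implied_pi = reported_circumference / diameter
relative_error = abs(exact - reported_circumference) / exact

print(f"exact circumference ≈ {exact:.2f} cubits")
print(f"implied value of pi = {implied_pi}, relative error ≈ {relative_error:.1%}")
```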
FlashAttention was first published by Tri Dao in May 2022 and it had a deep impact in the large language models space. Most open models you've heard of (RedPajama, MPT, LLaMA, Falcon, etc) all leverage it for faster inference. Tri came on the podcast to chat about FlashAttention, the newly released FlashAttention-2, the research process at Hazy Lab, and more. This is the first episode of our "Papers Explained" series, which will cover some of the foundational research in this space. Our Discord also hosts a weekly Paper Club, which you can signup for here.

How does FlashAttention work? The paper is titled "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness". There are a couple keywords to call out:
* "Memory Efficient": standard attention memory usage is quadratic with sequence length (i.e. O(N^2)). FlashAttention is sub-quadratic at O(N).
* "Exact": the opposite of "exact" in this case is "sparse", as in "sparse networks" (see our episode with Jonathan Frankle for more). This means that you're not giving up any precision.
* The "IO" in "IO-Awareness" stands for "Input/Output" and hints at a write/read related bottleneck.

Before we dive in, look at this simple GPU architecture diagram. The GPU has access to three memory stores at runtime:
* SRAM: this is on-chip memory co-located with the actual execution core. It's limited in size (~20MB on an A100 card) but extremely fast (19TB/s total bandwidth).
* HBM: this is off-chip but on-card memory, meaning it's in the GPU but not co-located with the core itself. An A100 has 40GB of HBM, but only a 1.5TB/s bandwidth.
* DRAM: this is your traditional CPU RAM. You can have TBs of this, but you can only get ~12.8GB/s bandwidth, which is way too slow.

Now that you know what HBM is, look at how the standard Attention algorithm is implemented. As you can see, all 3 steps include a "write X to HBM" step and a "read from HBM" step. The core idea behind FlashAttention boils down to this: instead of storing each intermediate result, why don't we use kernel fusion and run every operation in a single kernel in order to avoid memory read/write overhead? (We also talked about kernel fusion in our episode with George Hotz and how PyTorch / tinygrad take different approaches here.) The result is much faster, but much harder to read. As you can see, FlashAttention is a very meaningful speed improvement on traditional Attention, and it's easy to understand why it's becoming the standard for most models. This should be enough of a primer before you dive into our episode!
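As a rough illustration of the bottleneck described above (my sketch, not code from the paper or the episode), here is standard attention written so the intermediate N x N matrices are explicit. Each line corresponds roughly to one kernel in the unfused implementation, with the intermediates S and P making a round trip through HBM in between; FlashAttention's kernel fusion and tiling keep tiles of them in SRAM instead. The sizes N and d are arbitrary illustration values.

```python
import torch

def standard_attention(Q, K, V):
    # Q, K, V: (N, d). The intermediates S and P are full N x N tensors that,
    # in the unfused version, get written to HBM and read back between steps.
    S = Q @ K.T / (Q.shape[-1] ** 0.5)   # scores: write to HBM, read back
    P = torch.softmax(S, dim=-1)         # probabilities: write to HBM, read back
    return P @ V                         # output: one more pass over P

N, d = 4096, 64
Q, K, V = (torch.randn(N, d) for _ in range(3))
out = standard_attention(Q, K, V)
print(out.shape, "-- memory for S and P grows as O(N^2)")
```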
We talked about FlashAttention-2, how Hazy Research Group works, and some of the research being done in Transformer alternatives.

Show Notes:
* FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv)
* FlashAttention-2
* Together AI
* From Deep Learning to Long Learning
* The Hardware Lottery by Sara Hooker
* Hazy Research
* Is Attention All You Need?
* Nvidia CUTLASS 3
* SRAM scaling slows
* Transformer alternatives: S4, Hyena, Recurrent Neural Networks (RNNs)

Timestamps:
* Tri's background [00:00:00]
* FlashAttention's deep dive [00:02:18]
* How the Hazy Research group collaborates across theory, systems, and applications [00:17:21]
* Evaluating models beyond raw performance [00:25:00]
* FlashAttention-2 [00:27:00]
* CUDA and The Hardware Lottery [00:30:00]
* Researching in a fast-changing market [00:35:00]
* Promising transformer alternatives like state space models and RNNs [00:37:30]
* The spectrum of openness in AI models [00:43:00]
* Practical impact of models like LLAMA2 despite restrictions [00:47:12]
* Incentives for releasing open training datasets [00:49:43]
* Lightning Round [00:53:22]

Transcript:

Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. Today we have no Swyx, because he's in Singapore, so it's a one-on-one discussion with Tri Dao. Welcome! [00:00:24]Tri: Hi everyone. I'm Tri Dao, excited to be here. [00:00:27]Alessio: Tri just completed his PhD at Stanford a month ago. You might not remember his name, but he's one of the main authors of the FlashAttention paper, which is one of the seminal works of the Transformers era. He's got a lot of interest in efficient transformer training and inference, long-range sequence models, a lot of interesting stuff. And now you're going to be an assistant professor in CS at Princeton next year. [00:00:51]Tri: Yeah, that's right. [00:00:52]Alessio: Yeah. And in the meantime, just to get, you know, a low pressure thing, you're Chief Scientist at Together as well, which is the company behind RedPajama. [00:01:01]Tri: Yeah. So I just joined this week actually, and it's been really exciting. [00:01:04]Alessio: So what's something that is not on the internet that people should know about you? [00:01:09]Tri: Let's see. When I started college, I was going to be an economist, so I was fully on board. I was going to major in economics, but the first week I was at Stanford undergrad, I took a few math classes and I immediately decided that I was going to be a math major. And that kind of changed the course of my career. So now I'm doing math, computer science, AI research. [00:01:32]Alessio: I had a similar thing. I started with physics and then I took like a programming course and I was like, I got to do computer science. I don't want to do physics. So FlashAttention is definitely, everybody's using this. Everybody loves it. You just released FlashAttention 2 last week. [00:01:48]Tri: Yeah. Early this week on Monday. Yeah. [00:01:53]Alessio: You know, AI time. Things move fast. So maybe let's run through some of the FlashAttention highlights, some of the innovation there, and then we can dive into FlashAttention 2. So the core improvement in FlashAttention is that traditional attention is quadratic in sequence length, whereas FlashAttention is linear, which obviously helps with scaling some of these models. [00:02:18]Tri: There are two factors there. So of course the goal has been to make attention go faster or more memory efficient. 
And ever since attention became popular in 2017 with the Transformer paper, lots and lots of folks have been working on this. And a lot of approaches have been focusing on approximating attention. The goal is you want to scale to longer sequences. There are tons of applications where you want to do that. But scaling to longer sequences is difficult because attention scales quadratically in sequence length on both runtime and memory, as you mentioned. So instead of trying to approximate attention, we were trying to figure out, can we do the same computation and maybe be more memory efficient? So in the end, we ended up with memory that is linear in sequence length. In terms of computation, it's still quadratic, but we managed to make it much more hardware friendly. And as a result, we do get wall clock speed up on the order of 2 to 4x, which really helps because that just means that you'll be able to train with 2 to 4x longer sequence length for the same cost without doing any approximations. As a result, lots of folks have been using this. The thing is available in a lot of libraries that do language model training or fine tuning. [00:03:32]Alessio: And the approximation thing is important because this is an exact thing versus a sparse. So maybe explain a little bit the difference there. [00:03:40]Tri: For sure. So in attention, essentially you compute pairwise similarity between every single element in a sequence against each other. So there's been other approaches where instead of doing all that pairwise computation, you only compute similarity for some pairs of elements in the sequence. So you don't do a quadratic number of comparisons. And this can be seen as some form of sparsity. Essentially you're ignoring some of the elements. When you write down the matrix, you essentially say, OK, I'm going to pretend they're zero. So that has some benefits in terms of runtime and memory. But the trade-off is that it tends to do worse in terms of quality because you're essentially approximating or ignoring some elements. And I personally have worked on this as well for a few years. But when we talk to practitioners who actually train models, especially at large scale, they say they tend not to use these approximate attention methods. Because it turns out, and this was surprising to me at the time, these approximation methods, even though they perform fewer computations, tend to not be faster in wall-clock time. So this was pretty surprising because back then, I think my background was more on the theoretical side. So I was thinking of, oh, how many flops or floating point operations are you performing? And hopefully that correlates well with wall-clock time. But I realized that I was missing a bunch of ideas from the system side where flops or floating point operations don't necessarily correlate with runtime. There are other factors like memory reading and writing, parallelism, and so on. So I learned a ton from just talking to systems people because they kind of figured this stuff out a while ago. So that was really eye-opening. And then we ended up focusing a lot more on memory reading and writing because that turned out to be the majority of the time when you're doing attention is reading and writing memory. [00:05:34]Alessio: Yeah, the I.O. awareness is probably one of the biggest innovations here. And the idea behind it is, like you mentioned, the FLOPS growth of the cards has been going up, but the memory bandwidth, not as much. 
So I think maybe that was one of the assumptions that the original attention paper had. So talk a bit about how that came to be as an idea. It's one of those things that, like, in hindsight, it's like, obviously, why are we like rewriting to like HBM every time, you know, and like once you change it, it's clear. But what was that discovery process? [00:06:08]Tri: Yeah, in hindsight, a lot of the ideas have already been there in the literature. And I would say it was somehow at the intersection of both machine learning and systems. And you kind of needed ideas from both sides. So on one hand, on the system side, so lots of systems folks have known that, oh, you know, kernel fusion is great. Kernel fusion just means that instead of, you know, loading the same element, performing an operation, writing it down, loading it back up and performing the second operation, you just load it once, perform two operations and then write it down again. So that saves you kind of memory read and write in the middle there. So kernel fusion has been a classic. There's been other techniques from the system side, like tiling, where you perform the computations in blocks, again, so that you can load it into a really fast memory. Think of it as a cache. And this is, again, classical computer science ideas, right? You want to use the cache. So the system folks have been thinking about these ideas for a long time, and they apply to attention as well. But there were certain things in attention that made it difficult to do a complete kernel fusion. One of which is there is this softmax operation in the middle, which requires you to essentially sum across the row of the attention matrix. So it makes it difficult to kind of break it, because there's this dependency. So it makes it difficult to break things into blocks. So on the system side, people have been thinking about these ideas, but it's been difficult to kind of do kernel fusion for the entire operation. On the machine learning side, people have been thinking more algorithmically. They say, okay, either we can approximate attention, or there's this trick called the online softmax trick, which says that because of softmax, the way it's written mathematically, you can actually break it up into smaller pieces, do some rescaling, and still get the right answer. So this online softmax trick has been around for a while. I think there was a paper from NVIDIA folks back in 2018 about this. And then there was a paper from Google. So Markus Rabe and Charles Staats wrote a paper in late 2021 on using this online softmax trick to break attention up into smaller pieces. So a lot of the ideas were already there. But it turns out, you kind of need to combine ideas from both sides. So you need to understand that, hey, we want to do kernel fusion to reduce memory reads and writes. But we also need this online softmax trick to be able to break the softmax into smaller pieces so that a lot of the systems tricks kind of carry through. We saw that, and it was kind of a natural idea that we ended up using ideas from both sides, and it ended up working pretty well. Yeah. [00:08:57]Alessio: Are there any downsides to kernel fusion? If I think about databases and the reasons why we have atomic operations, you know, it's like, you have observability and fallback in between them. How does that work with attention? Is there anything that we lose by fusing the operations? 
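As a quick aside before Tri's answer, here is a toy NumPy illustration (nothing like the actual CUDA kernel) of the online softmax trick he just described: process a row of scores block by block, keep a running max m and a running sum l, and rescale the partial sum whenever a new block raises the max. The block size and random scores are arbitrary illustration choices.

```python
import numpy as np

def online_softmax_stats(scores, block_size=4):
    """Return (m, l) such that softmax(scores) == exp(scores - m) / l,
    computed one block at a time with the rescaling trick."""
    m, l = -np.inf, 0.0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        m_new = max(m, float(block.max()))
        # The old partial sum was relative to the old max; rescale it before adding.
        l = l * np.exp(m - m_new) + float(np.exp(block - m_new).sum())
        m = m_new
    return m, l

scores = np.random.default_rng(0).normal(size=16)
m, l = online_softmax_stats(scores)
reference = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()
assert np.allclose(np.exp(scores - m) / l, reference)
print("blockwise online softmax matches the one-shot softmax")
```

Multiplying the running sum by exp(m - m_new) is exactly the rescaling that lets each block be processed independently and still recover the exact softmax, which is what makes fusing the whole attention computation into one tiled kernel possible.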
[00:09:13]Tri: Yeah, I think mostly on the practical side is that you lose a little bit of flexibility in the sense that, hey, now you have, for example, faster attention, it's just a subroutine that you would call to do attention. But as a researcher, let's say you don't want that exact thing, right? You don't want just attention, let's say you want some modification to attention. You want to do, hey, I'm going to multiply the query and key, but then I'm going to do this extra thing before I carry on. So kernel fusion just means that, okay, we have a subroutine that does the entire thing. But if you want to experiment with things, you won't be able to use that fused kernel. And the answer is, can we have a compiler that then automatically does a lot of this kernel fusion? Lots of compiler folks are thinking about this, either with a new language or you can embed it in PyTorch. PyTorch folks have been working on this as well. So if you write just your code in PyTorch and they can capture the graph, can they generate code that will fuse everything together? That's still ongoing, and it works for some cases. But for attention, because of this kind of softmax rewriting stuff, it's been a little bit more difficult. So maybe in a year or two, we'll have compilers that are able to do a lot of these optimizations for you. And you don't have to, for example, spend a couple months writing CUDA to get this stuff to work. Awesome. [00:10:41]Alessio: And just to make it clear for listeners, when we say we're not writing it to memory, we are storing it, but just in a faster memory. So instead of the HBM, we're putting it in the SRAM. Yeah. [00:10:53]Tri: Yeah. [00:10:54]Alessio: Maybe explain just a little bit the difference there. [00:10:56]Tri: Yeah, for sure. This is kind of a caricature of how you think about accelerators or GPUs in particular, is that they have a large pool of memory, usually called HBM, or high bandwidth memory. So this is what you think of as GPU memory. So if you're using A100 and you list the GPU memory, it's like 40 gigs or 80 gigs. So that's the HBM. And then when you perform any operation, you need to move data from the HBM to the compute unit. So the actual hardware unit that does the computation. And next to these compute units, there are on-chip memory or SRAM, which are much, much smaller than HBM, but much faster. So the analogy there is if you're familiar with, say, CPU and RAM and so on. So you have a large pool of RAM, and then you have the CPU performing the computation. But next to the CPU, you have L1 cache and L2 cache, which are much smaller than DRAM, but much faster. So you can think of SRAM as the small, fast cache that stays close to the compute unit. Physically, it's closer. There is some kind of asymmetry here. So HBM is much larger, and SRAM is much smaller, but much faster. One way of thinking about it is, how can we design algorithms that take advantage of this asymmetric memory hierarchy? And of course, lots of folks have been thinking about this. These ideas are pretty old. I think back in the 1980s, the primary concerns were sorting. How can we sort numbers as efficiently as possible? And the motivating example was banks were trying to sort their transactions, and that needs to happen overnight so that the next day they can be ready. And so the same idea applies, which is that they have slow memory, which was hard disk, and they have fast memory, which was DRAM. And people had to design sorting algorithms that take advantage of this asymmetry. 
And it turns out, these same ideas can apply today, which is different kinds of memory. [00:13:00]Alessio: In your paper, you have the pyramid of memory. Just to give people an idea, when he says smaller, it's like HBM is like 40 gig, and then SRAM is like 20 megabytes. So it's not a little smaller, it's much smaller. But the throughput on card is like 1.5 terabytes a second for HBM and like 19 terabytes a second for SRAM, which is a lot larger. How do you think that evolves? So TSMC said they hit the scaling limits for SRAM, they just cannot grow that much more. HBM keeps growing, HBM3 is going to be 2x faster than HBM2, I think the latest NVIDIA thing has HBM3. How do you think about the future of FlashAttention? Do you think HBM is going to get fast enough when maybe it's not as useful to use the SRAM? [00:13:49]Tri: That's right. I think it comes down to physics. When you design hardware, literally SRAM stays very close to compute units. And so you don't have that much area to essentially put the transistors. And you can't shrink these things too much. So just physics, in terms of area, you don't have that much area for the SRAM. HBM is off-chip, so there is some kind of bus that essentially transfers data from HBM to the compute unit. So you have more area to essentially put these memory units. And so yeah, I think in the future SRAM probably won't get that much larger, because you don't have that much area. HBM will get larger and faster. And so I think it becomes more important to design algorithms that take advantage of this memory asymmetry. It's the same thing in CPU, where the cache is really small, the DRAM is growing larger and larger. DRAM could get to, I don't know, two terabytes, six terabytes, or something, whereas the cache stays at, I don't know, 15 megabytes or something like that. I think maybe the algorithm design becomes more and more important. There's still ways to take advantage of this, I think. So in the future, I think flash attention right now is being used. I don't know if in the next couple of years, some new architecture will come in and whatnot, but attention seems to be still important. For the next couple of years, I still expect some of these ideas to be useful. Not necessarily the exact code that's out there, but I think these ideas have kind of stood the test of time. New ideas like IO awareness from back in the 1980s, ideas like kernel fusions, tiling. These are classical ideas that have stood the test of time. So I think in the future, these ideas will become more and more important as we scale models to be larger, as we have more kinds of devices, where performance and efficiency become much, much more important. [00:15:40]Alessio: Yeah, and we had Jonathan Frankle on the podcast, and if you go to issattentionallyouneed.com, he has an outstanding bet, and he does believe that attention will be the state of the art architecture still in a few years. Did you think flash attention would be this popular? I'm always curious on the research side, you publish a paper, and obviously you know it's great work, but sometimes it just kind of falls flat in the industry. Could you see everybody just starting to use this, or was that a surprise to you? [00:16:11]Tri: Certainly, I didn't anticipate the level of popularity. Of course, we were extremely happy to have people using this stuff and giving us feedback and so on, and help us improve things. 
I think when we were writing the paper, I remember sending an email to one of my advisors, and like, hey, I'm excited about this paper, but I think the most important thing will be the artifact, which is the code. So I knew that the code will be valuable. So we kind of focus a lot on the code and make sure that the code is usable and as fast as can be. Of course, the idea, the paper presents the ideas and explain it and have experiments that validate the idea, but I knew that the artifact or the code was also pretty important. And that turned out to be the right focus, which is, you know, we put out the paper, we release the code and continue working on the code. So it's a team effort with my co-authors as well. [00:17:07]Alessio: We mentioned Hazy Research a bunch of times on the podcast before. I would love for you to spend five minutes just talking about how does the group work? How do people get together? How do you bounce ideas off of each other? Yeah. [00:17:21]Tri: So Hazy Research is a research group at Stanford led by one of my advisors, Chris Ré. I love the people there. It was one of the best experiences I had. They've made my PhD so much more enjoyable. And I think there are a couple of ways that the group has been working pretty well. So one is, I think there's a diverse pool of people who either, you know, some of them focus on algorithms and theory, some of them focus on building systems, some of them focus on applications. And as a result, there is this flow of ideas. So as an example, some of us were working on like more algorithms and theory, and then we can talk to the folks building systems and say, hey, let's try it out and let's put it in the systems and see how it is. And there you will get feedback from systems folks. They will say, hey, we implemented this, or we tried this and this is where it doesn't work, something like that. And once we put it in the systems, the application folks can use the algorithm or new methods or new models. And we again get great feedback from them because the application folks, for example, some of my good friends, they focus on medical imaging or seizure detection. And that is the problem they care about. And if your method doesn't work on the task they care about, they will tell you. Whereas I think a lot of people in machine learning, they're a little bit more flexible. So they will be like, hey, it doesn't work on seizure detection. Let's try some other task, right? But having that direct feedback of like, hey, it doesn't work there, let's figure out why. I think that that feedback allows us to do better work. And I think that kind of process of exchanging ideas, validating it in a real system so that applications folks can try it out and give you feedback. That cycle has been very, very useful. And so that's one, having a diverse group of people. The other one is, and this is something I really appreciate from advice from Chris was try to understand the fundamental, right? And he's happy letting me go off and read some textbooks and playing with things because I think a lot of research ideas come from understanding the old literature and see how it fits with the new landscape. And so if you just read new arXiv papers every day, that's great, but you also need to read textbooks. And that's one advice I got from Chris, which is understand the fundamentals. And I think that allows us to do more impactful work. [00:19:46]Alessio: How do you think about academia versus industry? 
I feel like AI / Machine Learning has been an area where up until three, four years ago, most of the cutting edge work was being done in academia. And now there's all these big industry research labs. You're obviously going to Princeton, so you're an academia believer. How should people think about where to go? Say I'm doing my master's, I have to decide between doing a PhD and going into OpenAI or Anthropic. How should I decide? [00:20:15]Tri: I think they kind of play a complementary role, in my opinion. Of course, I also was considering different paths as well. So I think right now, scaling matters a lot, especially when you talk about language models and AI and so on. Scaling matters a lot. And that means that you need compute resources and you need infrastructure and you need engineers' time. And so industry tends to have an advantage when it comes to scaling things. But a lot of the ideas actually came from academia. So let's take Attention, which got popular with the Transformer in 2017. Attention actually has been around for a while. So I think the first mention was in 2014, a paper from Bahdanau and others and Yoshua Bengio, which is coming from academia. A lot of ideas did come from academia. And scaling things up, of course, I think OpenAI has been great at scaling things up. That was the bet that they made after, I think, GPT-2. So they saw that scaling these things up, which back then meant 1.5 billion parameters, seemed to give you amazing capabilities. So they really committed to that. They really committed to scaling things. And that turned out to be, it's been a pretty successful bet. I think for academia, we're still trying to figure out exactly what we're doing in this shifting landscape. And so lots of folks have been focusing on, for example, evaluation. So I know the Stanford Center for Research on Foundation Models led by Percy, they have this benchmark called HELM, which is this holistic benchmark. So trying to figure out, okay, characterizing the landscape of different kinds of models, what people should evaluate, what people should measure, and things like that. So evaluation is one role. The other one is understanding. So this has happened historically where there's been some development in the industry and academia can play a role in explaining, understanding. They have the luxury to slow down trying to understand stuff, right? So lots of papers on understanding what's really going on, probing these models, and so on. I think I'm not as familiar with the NLP literature, but my impression is there's a lot of that going on in the NLP conferences, which is understanding what these models are doing, what capabilities they have, and so on. And the third one I could see is that academia can take more risky bets in the sense that we can work on stuff that is quite different from industry. I think industry, my impression is you have some objective. You're trying to say, hey, for this quarter, we want to scale the model in this particular way. Next quarter, we want the model to have these capabilities. You're trying to get objectives where maybe, I don't know, 70% will work out because it's important for the company's direction. I think for academia, the way things work is you have many, many researchers or PhD students, and they're kind of pursuing independent directions. And they have a little bit more flexibility on, hey, I'm going to try out this seemingly crazy idea and see, let's say there's a 30% chance of success or something. 
And however you define success, for academia, a lot of the time, success just means like, hey, we found something interesting. That could eventually go into industry through collaboration and so on. So I do see academia and industry kind of playing complementary roles. And as for someone choosing a career, I think just more and more generally, industry would be probably better in terms of compensation, in terms of probably work-life balance. But my biased perspective is that maybe academia gives you a little bit more freedom to think and understand things. So it probably comes down to personal choice. I end up choosing to be a professor next year at Princeton. But of course, I want to maintain a relationship with industry folks. I think industry folks can provide very valuable feedback to what we're doing in academia so that we understand where the field is moving because some of the directions are very much influenced by what, for example, OpenAI or Google is doing. So we want to understand where the field is moving. What are some promising applications? And try to anticipate, okay, if the field is moving like this, these applications are going to be popular. What problems will be important in two, three years? And then we try to start thinking about those problems so that hopefully in two, three years, we have some of the answers to some of these problems in two, three years. Sometimes it works out, sometimes it doesn't. But as long as we do interesting things in academia, that's the goal. [00:25:03]Alessio: And you mentioned the eval side. So we did a Benchmarks 101 episode. And one of the things we were seeing is sometimes the benchmarks really influence the model development. Because obviously, if you don't score well on the benchmarks, you're not going to get published and you're not going to get funded. How do you think about that? How do you think that's going to change now that a lot of the applications of these models, again, is in more narrow industry use cases? Do you think the goal of the academia eval system is to be very broad and then industry can do their own evals? Or what's the relationship there? [00:25:40]Tri: Yeah, so I think evaluation is important and often a little bit underrated. So it's not as flashy as, oh, we have a new model that can do such and such. But I think evaluation, what you don't measure, you can't make progress on, essentially. So I think industry folks, of course, they have specific use cases that their models need to do well on. And that's what they care about. Not just academia, but other groups as well. People do understand what are some of the emerging use cases. So for example, now one of the most popular use cases is Chatbot. And then I think folks from Berkeley, some of them are from Berkeley, call them MLCs. They set up this kind of Chatbot arena to essentially benchmark different models. So people do understand what are some of the emerging use cases. People do contribute to evaluation and measurement. And as a whole, I think people try to contribute to the field and move the field forward, albeit that maybe slightly different directions. But we're making progress and definitely evaluation and measurement is one of the ways you make progress. So I think going forward, there's still going to be just more models, more evaluation. We'll just have better understanding of what these models are doing and what capabilities they have. 
[00:26:56]Alessio: I like that your work has been focused on not making benchmarks better, but it's like, let's just make everything faster. So it's very horizontal. So FlashAttention 2, you just released that on Monday. I read in the blog post that a lot of the work was also related to some of the NVIDIA library updates. Yeah, maybe run us through some of those changes and some of the innovations there. Yeah, for sure. [00:27:19]Tri: So FlashAttention 2 is something I've been working on for the past couple of months. So the story is the NVIDIA CUTLASS team, they released a new version of their library, which contains all these primitives to allow you to do matrix multiply or memory loading on GPU efficiently. So it's a great library and I built on that. So they released their version 3 back in January and I got really excited and I wanted to play with that library. So as an excuse, I was just like, okay, I'm going to refactor my code and use this library. So that was kind of the start of the project. By the end, I just ended up working with the code a whole lot more and I realized that, hey, there are these inefficiencies still in Flash Attention. We could change this way or that way and make it, in the end, twice as fast. But of course, building on the library that the NVIDIA folks released. So that was kind of a really fun exercise. I was starting out, it's just an excuse for myself to play with the new library. What ended up was several months of improvement, improving Flash Attention, discovering new ideas. And in the end, we managed to make it 2x faster and now it's pretty close to probably the efficiency of things like matrix multiply, which is probably the most optimized subroutine on the planet. So we're really happy about it. The NVIDIA Cutlass team has been very supportive and hopefully in the future, we're going to collaborate more. [00:28:46]Alessio: And since it's an NVIDIA library, can you only run this on CUDA runtimes? Or could you use this and then run it on an AMD GPU? [00:28:56]Tri: Yeah, so it's an NVIDIA library. So right now, the code we release runs on NVIDIA GPUs, which is what most people are using to train models. Of course, there are emerging other hardware as well. So the AMD folks did implement a version of Flash Attention, I think last year as well, and that's also available. I think there's some implementation on CPU as well. For example, there's this library, ggml, where they implemented the same idea running on Mac and CPU. So I think that kind of broadly, the idea would apply. The current implementation ended up using NVIDIA's library or primitives, but I expect these ideas to be broadly applicable to different hardware. I think the main idea is you have asymmetry in memory hierarchy, which tends to be everywhere in a lot of accelerators. [00:29:46]Alessio: Yeah, it kind of reminds me of Sara Hooker's post, like the hardware lottery. There could be all these things that are much better, like architectures that are better, but they're not better on NVIDIA. So we're never going to know if they're actually improved. How does that play into some of the research that you all do too? [00:30:04]Tri: Yeah, so absolutely. Yeah, I think Sara Hooker, she wrote this piece on hardware lottery, and I think she captured really well of what a lot of people have been thinking about this. 
And I certainly think about the hardware lottery quite a bit, given that I do some work that's really low level, at the level of, hey, we're optimizing for GPUs, or NVIDIA GPUs, and optimizing for attention itself. And at the same time, I also work on algorithms and methods and Transformer alternatives. And we do see this effect in play, not just the hardware lottery, but also kind of a software framework lottery. You know, attention has been popular for six years now, and so many engineering hours have been spent on making it as easy and efficient as possible to run Transformers, right? And there are libraries to do all kinds of tensor parallel, pipeline parallel, if you use a Transformer. Let's say someone else developed alternatives, or let's just take recurrent neural nets, like LSTM, GRU. If we want to run those efficiently on current hardware with current software frameworks, that's quite a bit harder. So in some sense, there is this feedback loop where the model architectures that take advantage of hardware become popular, and the hardware will also evolve to optimize a little bit for that kind of architecture, and software frameworks will also evolve to optimize for that particular architecture. Right now, Transformer is the dominant architecture. So yeah, I'm not sure if there is a good way out of this. Of course, there's a lot of development. I think compilers will play a role, because compilers allow you to be much more efficient across different kinds of hardware: essentially you write the same code, and the compiler will be able to make it run efficiently on different kinds of hardware. So for example, there's this language Mojo; they're compiler experts, right? And their bet is that AI models will be running on different kinds of devices, so let's make sure we have really good compilers, with a good language, so that the compiler can do a good job optimizing for all kinds of devices. So that's maybe one way you can get out of this cycle. But yeah, I'm not sure of a good way. In my own research, I have to think about both the algorithm, the new model, and how it maps to hardware. There are crazy ideas that seem really good but would be really, really difficult to run efficiently. And as a result, for example, we can't really scale some of those architectures up, simply because they're not hardware friendly. I have to think about both sides when I'm working on new models. [00:32:50]Alessio: Yeah. Have you spent any time looking at some of the new kind of AI chip companies, so to speak, like the Cerebras of the world? One of their innovations is co-locating everything on the chip, so you remove some of this memory bandwidth issue. How do you think about that? [00:33:07]Tri: Yeah, I think that's an interesting bet. I think Tesla also has this Dojo supercomputer, where they try to have essentially as fast on-chip memory as possible and remove some of this data transfer back and forth. I think that's a promising direction. The issues I could see, and I'm definitely not a hardware expert, are that, for one, on-chip memory tends to be really expensive to manufacture, much more expensive per gigabyte compared to off-chip memory. I talked to some of my friends at Cerebras, and they have their own stack and compiler and so on, and they can make it work. The other kind of obstacle is, again, the compiler and software framework and so on.
For example, if you can run PyTorch on this stuff, lots of people will be using it. But supporting all the operations in PyTorch will take a long time to implement. Of course, people are working on this. So I think, yeah, we kind of need these different bets on the hardware side as well. Hardware has, my understanding is, a longer time scale: you need to design hardware, you need to manufacture it, maybe on the order of three to five years or something like that. So people are taking different bets, but the AI landscape is changing so fast that it's hard to predict what kind of models will be dominant in, let's say, three or five years. Or, thinking back five years ago, would we have known that Transformer would be the dominant architecture? Maybe, maybe not, right? And so different people will make different bets on the hardware side. [00:34:39]Alessio: Does the pace of the industry and the research also influence the PhD research itself? For example, in your case, you're working on improving attention. It probably took you quite a while to write the paper and everything, but in the meantime, you could have had a new model architecture come out and then it's like nobody cares about attention anymore. How do people balance that? [00:35:02]Tri: Yeah, so I think it's tough. It's definitely tough for PhD students, for researchers. Given that the field is moving really, really fast, I think it comes down to understanding fundamentals. Because that's essentially what the PhD allows you to do: it gives you a couple of years to understand the fundamentals. For example, when I started my PhD, I was working on understanding matrix-vector multiply, a concept that's been around for hundreds of years. We were trying to characterize what kinds of matrices have theoretically fast multiplication algorithms. That seems to have nothing to do with AI or anything. But I think that was the time when I developed mathematical maturity and research taste and research skill. The research topic at that point didn't have to be super trendy or anything; as long as I was developing skills as a researcher, I was making progress. And eventually I got quite a bit better in terms of research skills. And that allows, for example, PhD students later in their career to quickly develop solutions to whatever problems they're facing. So I think that's just the natural arc of how you're trained as a researcher. For a lot of PhD students, given the pace is so fast, maybe it's harder to justify spending a lot of time on the fundamentals. And it's tough. It's this kind of explore-exploit dilemma, and I don't think there's a universal answer. I personally spend some time doing this kind of exploration, reading random textbooks or lecture notes, and I spend some time keeping up with the latest architectures or methods and so on. I don't know if there's a right balance; it varies from person to person. But if you only spend 100% on one, either you only do exploration or you only do exploitation, I think it probably won't work in the long term. It's probably going to have to be a mix, and you have to experiment and be introspective and say, hey, I tried this kind of mixture of, I don't know, one exploration paper and one exploitation paper. How did that work out for me? And have a conversation with, for example, my advisor about, hey, did that work out?
Should I shift focus more to one or the other? I think quickly adjusting and focusing on the process is probably the right way. I don't have a specific recommendation like, hey, focus, I don't know, 60% on lecture notes and 40% on arXiv papers or anything like that. [00:37:35]Alessio: Let's talk about some Transformer alternatives. You know, say Jonathan Frankle loses his bet and Transformer is not the state-of-the-art architecture. What are some of the candidates to take over? [00:37:49]Tri: Yeah, so this bet is quite fun. My understanding is this is the bet between Jonathan Frankle and Sasha Rush, right? I've talked to Sasha a bunch, and I think he recently gave an excellent tutorial on Transformer alternatives as well, so I would recommend that. Just to quickly recap, I think there's been quite a bit of development more recently on Transformer alternatives, so architectures that are not Transformers. And the question is, can they do well on, for example, language modeling, which is the application that a lot of people care about these days? So there are methods based on state space models that came out in 2021 from Albert Gu, Karan Goel, and Chris Ré that presumably could do much better in terms of capturing long-range information while not scaling quadratically. They scale sub-quadratically in sequence length, so potentially you could have a much more efficient architecture when the sequence length gets really long. The other line has been focusing more on recurrent neural nets, which is, again, an old idea, but adapted to the new landscape. So things like RWKV; I've also personally worked in this space as well. There have been some promising results, some results here and there showing that, hey, these alternatives, either RNNs or state space methods, can match the performance of Transformers on language modeling. So that's really exciting. And on the academic research side, we want to understand: do we really need attention? I think that's a valuable intellectual question. And maybe we do, maybe we don't. If we want to know, we need to spend serious effort on trying the alternatives, and there have been folks pushing in this direction. I think RWKV has scaled up to a model at 14 billion parameters that seems pretty competitive with Transformers. So that's really exciting. That's the intellectual motivation: we want to figure out if attention is necessary. The other motivation is that Transformer alternatives could have an advantage in practice in some use cases. One use case is really long sequences. The other is really high-throughput generation. For really long sequences, when you train with a Transformer, even with FlashAttention and so on, the computation is still quadratic in the sequence length. So if your sequence length is on the order of, I don't know, 16K, 32K, 100K, and some of these models have sequence length 100K, then you do get significantly slower training, and also slower inference. So maybe these alternative architectures could scale better in terms of sequence length. I haven't seen actual validation of this; let's say a released RNN model with a context length of, I don't know, 100K or something, I haven't really seen that. But the hope is that as we scale to long sequences, these alternative architectures could be better suited.
Not just text, but things like high-resolution images, audio, video, and so on, which are emerging applications. So that's one: long sequences. Number two is high-throughput generation, where I can imagine scenarios where the application isn't an interactive chatbot, but, let's say, a company wants to batch as many requests as possible on their server, or they're doing offline processing, generating stuff based on their internal documents that needs to be processed in batch. And the issue with Transformers is that during generation, they essentially need to keep around all the previous history. It's called the KV cache. And that can take a significant amount of memory, so you can't batch very much because you run out of memory. I am personally bullish on RNNs. RNNs essentially summarize the past into a state vector of fixed size, so the size doesn't grow with the history. That means you don't need as much memory to keep around all the previous tokens, and as a result, I think you can scale to much higher batch sizes. And as a result, you can make much more efficient use of the GPUs or accelerators, and you could have much higher generation throughput. Now, this, I don't think, has been validated at scale. But as a researcher, I'm bullish on this stuff because I think in the next couple of years, these are use cases where these alternatives could have an advantage. We'll just have to wait and see if these things happen. At the same time, I also spend a bunch of time making attention as fast as possible. So maybe I'm hedging and playing both sides. Ultimately, as researchers, we want to understand what works and why the models have these capabilities. One way is, let's push attention to be as efficient as possible. On the other hand, let's push the alternatives to be as efficient and as big at scale as possible, so that we can compare them and understand. Yeah, awesome. [00:43:01]Alessio: And I think as long as all of this work happens in the open, it's a net positive for everybody to explore all the paths. Yeah, let's talk about open-source AI. Obviously, Together, when Red Pajama came out, which was an open clone of the LLAMA1 pre-training dataset, it was a big thing in the industry. LLAMA2 came out on Tuesday, I forget. And this week, there have been a lot of things going on which they call open-source, but it's not really open-source. Actually, we wrote a post about it that was on the front page of Hacker News before this podcast, so I was frantically responding. How do you think about what open-source AI really is? In my mind, in open-source software, we have different levels of open. There's free software, that's like the GPL license. There's open-source, which is Apache, MIT. And then there's kind of restricted open-source, which is the SSPL and some of these other licenses. In AI, you have the open models. So Red Pajama is an open model because you have the pre-training dataset, you have the training runs and everything. And then there's obviously the randomness that doesn't make it one-to-one if you retrain it. Then you have the open-weights models, like StableLM, where the weights are open but the dataset is not open. And then you have LLAMA2, where the dataset is not open and the weights are restricted. It's kind of like not really open-source, but open enough.
I think it's net positive because it's like $3 million of flops donated to the public. How do you think about that? And also, as you work at Together, what is your philosophy with open-source AI? [00:44:40]Tri: Right, right. Yeah, I think that's a great question. And I think about it in maybe more practical terms. So of course, Meta has done an amazing job training LLAMA1 and LLAMA2. And for LLAMA2, they made it much less restrictive compared to LLAMA1. Now you can use it for business, unless you have more than 700 million monthly active users or something like that. I think just this change will have a very significant impact on the landscape of open-source AI, where now lots of businesses, lots of companies will be using, I expect, things like LLAMA2. They will fine-tune on their own datasets. They will be serving variants or derivatives of LLAMA2. Whereas before, with LLAMA1, it was also a really good model, but businesses weren't allowed to do that. So on a more practical level, it's kind of shifting the balance between closed-source models like OpenAI and Anthropic and Google, where you're making API calls and maybe you don't understand as much of what the model is doing or how the model is changing, versus now we have a model with open weights that is pretty competitive, from what I've seen in terms of benchmarks, with GPT-3.5. And if you fine-tune it on your own data, maybe it's better suited to your own data. And I do see that shifting the balance: more and more folks are going to be using, let's say, derivatives of LLAMA2; more and more folks are going to fine-tune and serve their own model instead of calling an API. That shift in balance is important because, for one, we don't want a concentration of decision-making power in the hands of just a few companies. So I think that's a really positive development from Meta. Of course, training the model takes a couple of million dollars, and on top of that, I'm sure the engineers spent tons of time trying many, many different things, so the actual cost is probably way more than that. And they make the weights available, and probably a lot of companies are going to be using this. So I think that's a really positive development. And we've also seen amazing progress from the open-source community, where they take these models and either fine-tune them on different kinds of datasets or even make changes to the model. As an example, I think for LLAMA1 the context length was limited to 2K, and a bunch of folks figured out some really simple methods to scale it up to like 8K. [00:47:12]Alessio: Like the RoPE. [00:47:13]Tri: Yes. I think the open-source community is very creative, right? And lots of people. LLAMA2 will, again, accelerate this: more people will try it out, more people will make tweaks to it and make contributions, and so on. So overall, I see that as a very positive development for the field. And there have been lots of libraries that allow you to host or fine-tune these models, even with quantization and so on. Just a couple of hours after LLAMA2 was released, tons of companies announced that, hey, it's on our API or hosting and so on, and Together did the same. So it's a very fast-paced development, and just having a model with available weights that businesses are allowed to use, I think that alone is already a very positive development.
At the same time, yeah, we can do much better in terms of releasing datasets. Somehow people are not incentivized to release datasets. So philosophically, yeah, you want to be as open as possible. But on a practical level, I think it's a little bit harder for companies to release datasets: there are legal issues, and dataset releases tend to be not as eye-catching as model releases, so maybe people are less incentivized to do that. We've seen quite a few companies releasing datasets, though. Together released the RedPajama dataset. I think Cerebras then worked on that, deduplicated and cleaned it up, and released SlimPajama, and so on. So we're also seeing positive development on that front, on the pre-training dataset side, and I do expect that to continue. And then on the fine-tuning or instruction-tuning dataset side, we now have quite a few open datasets for instruction tuning and fine-tuning. But companies do pay human labelers to annotate these instruction-tuning datasets, and that is expensive. Maybe they see that as their competitive advantage, and so it's harder to incentivize these companies to release those datasets. So in practical terms, we're still going to make a lot of progress on open-source AI: on model development, on model hosting, on pre-training datasets and fine-tuning datasets. Right now, maybe we don't have the perfect open-source model where all the datasets are available. Maybe we don't have such a thing yet, but we've seen very fast development on the open-source side. Just maybe this time last year, there weren't as many models that were competitive with, let's say, ChatGPT. [00:49:43]Alessio: Yeah, I think the open datasets have so much more impact than open models. If you think about EleutherAI and the work that they've done, GPT-J was great, and the Pythia models are great, but the Pile and the Stack, everybody uses them. So hopefully we get more people to contribute time to work on datasets instead of doing the 100th open model that performs worse than all the other ones, but they want to say they released a model. [00:50:14]Tri: Yeah, maybe the question is, how do we figure out an incentive structure so that companies are willing to release open datasets? For example, I think some organizations are now doing this where they ask volunteers to annotate and so on. And maybe the Wikipedia model of datasets, especially for instruction tuning, could be interesting, where people actually volunteer their time and, instead of editing Wikipedia, add annotations, and somehow they are acknowledged and feel incentivized to do so. Hopefully we get to that level where, in terms of data, it would be kind of like Wikipedia, and in terms of model development, it would be kind of like Linux, where people contribute patches and improve the model in some way. I don't know exactly how that's going to happen, but based on history, I think there is a way to get there. [00:51:05]Alessio: Yeah, I think the Dolly-15K dataset is a good example of a company saying, let's do this smaller thing, just make sure we make it open. We had Mike Conover from Databricks on the podcast, and he was like, people just bought into it and leadership was bought into it. You have companies out there with 200,000, 300,000 employees. It's like, just put some of them to label some data. It's going to be helpful. So I'm curious to see how that evolves. What made you decide to join Together?
[00:51:35]Tri: At Together, the focus has been very much on open-source models, and I think that aligns quite well with what I care about, of course. I also know and trust a bunch of people there, and I'm excited to work with them. Philosophically, I like a lot the way they've been really open with dataset and model releases. Personally, for the research that I've developed, we also try to make the code available, free to use and modify and so on, contributing to the community, and that has given us really valuable feedback from the community and improved our work. So philosophically, I like the way Together has been focusing on open-source models. And the nice thing is we're also going to be at the forefront of research, and the research areas that I'm really excited about, things like efficient training and inference, align quite well with what the company is doing. We'll try our best to make things open and available to everyone. Yeah, it's going to be fun being at the company, leading a team, doing research on the topics that I really care about, and hopefully we'll make things open to benefit the community. [00:52:45]Alessio: Awesome. Let's jump into the lightning round. Usually, I have two questions. So one is on acceleration, one on exploration, and then a takeaway. So the first one is, what's something that already happened in AI machine learning that you thought would take much longer than it has? [00:53:01]Tri: I think understanding jokes. I didn't expect that to happen, but it turns out that by scaling models up and training on lots of data, the models can now understand jokes. Maybe it's a small thing, but that was amazing to me. [00:53:16]Alessio: What about the exploration side? What are some of the most interesting unsolved questions in the space? [00:53:22]Tri: I would say reasoning, in the broad sense. We don't really know how these models do it. They do something that looks like reasoning; we don't know how they're doing it. We have some ideas. And in the future, I think we will need to design architectures that explicitly have some kind of reasoning module in them if we want much more capable models. [00:53:43]Alessio: What's one message you want everyone to remember today? [00:53:47]Tri: I would say, try to understand both the algorithms and the systems that these algorithms run on. The intersection of machine learning and systems has been really exciting, and there have been a lot of amazing results at this intersection. And when you scale models to large scale, both the machine learning side and the systems side really matter. [00:54:06]Alessio: Awesome. Well, thank you so much for coming on, Tri. [00:54:09]Tri: This was great. Yeah, this has been really fun. [00:54:11] Get full access to Latent Space at www.latent.space/subscribe
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.22.550017v1?rss=1 Authors: Nenning, K.-H., Xu, T., Tambini, A., Franco, A. R., Margulies, D. S., Colcombe, S. J., Milham, M. P. Abstract: Brain connectome analysis suffers from the high dimensionality of connectivity data, often forcing a reduced representation of the brain at a lower spatial resolution or parcellation. However, maintaining high spatial resolution can both allow fine-grained topographical analysis and preserve subtle individual differences otherwise lost. This work presents a computationally efficient approach to estimate spatially fine-grained connectivity gradients and demonstrates its application in improving brain-behavior predictions. Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC
We introduce a new class of time-continuous recurrent neural network models. Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems modulated via nonlinear interlinked gates. The resulting models represent dynamical systems with varying (i.e., liquid) time-constants coupled to their hidden state, with outputs being computed by numerical differential equation solvers. These neural networks exhibit stable and bounded behavior, yield superior expressivity within the family of neural ordinary differential equations, and give rise to improved performance on time-series prediction tasks. To demonstrate these properties, we first take a theoretical approach to find bounds over their dynamics, and compute their expressive power by the trajectory length measure in a latent trajectory space. We then conduct a series of time-series prediction experiments to manifest the approximation capability of Liquid Time-Constant Networks (LTCs) compared to classical and modern RNNs. 2020: Ramin M. Hasani, Mathias Lechner, Alexander Amini, D. Rus, R. Grosu Recurrent neural network, Time series, Dynamical system, Nonlinear system, Approximation, Experiment, Numerical analysis, Artificial neural network https://arxiv.org/pdf/2006.04439v3.pdf
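A rough sketch of the construction described above (my paraphrase of the liquid time-constant update as I understand it from the paper, not a quotation): each hidden state x(t) follows a gated first-order ODE,

\[
\frac{d\mathbf{x}(t)}{dt} \;=\; -\Bigl[\tfrac{1}{\tau} + f\bigl(\mathbf{x}(t), \mathbf{I}(t), t, \theta\bigr)\Bigr] \odot \mathbf{x}(t) \;+\; f\bigl(\mathbf{x}(t), \mathbf{I}(t), t, \theta\bigr) \odot A ,
\]

where f is the learned nonlinear gate, I(t) the input, τ a base time constant, and A a learned bias-like vector. The effective time constant τ/(1 + τ f(·)) then depends on the state and the input, which is what makes it "liquid", and the trajectory is obtained with a numerical ODE solver, as the abstract says.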
250 episodes and still going strong! Nicole reveals her poll results regarding where we'd each live if we had to move (from our Bozeman Reaction discussion). This week's discussion includes looking up the superior colliculus, Raj being left out, Bernadette and the daycare center, the ball pit scene and its return in the tag, and more! Download here. Running time: 1:25:13, 51.3 MB
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Approximation is expensive, but the lunch is cheap, published by Jesse Hoogland on April 19, 2023 on LessWrong. Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort. Thank you to @Mark Chiu and @Quintin Pope for feedback. Machine learning is about finding good models: of the world and the things in it; of good strategies and the actions to achieve them. A sensible first question is whether this is even possible — whether the set of possible models our machine can implement contains a model that gets close to the thing we care about. In the language of empirical risk minimization, we want to know if there are models that accurately fit the target function and achieve low population risk, R(f∗). If this isn't the case, it doesn't matter whether your training procedure finds optimal solutions (optimization) or whether optimal solutions on your training set translate well to new data (generalization). You need good approximation for "optimal" to be good enough. The classical approach to approximation is that of universal approximation theorems. Unfortunately, this approach suffers from being too general and not saying anything about efficiency (whether in terms of the parameter count, weight norm, inference compute, etc.). It doesn't tell us what distinguishes neural networks as approximators from any other sufficiently rich model class such as polynomials, Gaussian processes, or even lookup tables. To find out what makes neural networks special, we have to move away from the classical focus on bounds that are agnostic to the details of the target function. You can't separate the properties that make neural networks special from the properties that make real-world target functions special. In particular, neural networks are well-suited to modeling two main features of real-world functions: smoothness (flat regions/low frequencies) and, for deep neural networks, sequential subtasks (hierarchical/modular substructure). A major flaw of classical learning theory is that it attempts to study learning in too much generality. Obtaining stronger guarantees requires breaking down the classes we want to study into smaller, more manageable subclasses. In the case of approximation, this means breaking apart the target function class to study "natural" kinds of target functions; in the case of generalization, this will mean breaking apart the model class into "learnable" subclasses. Already long underway, this shift towards a "thermodynamics of learning" is at the heart of an ongoing transformation in learning theory. Universal approximation is cosmic waste Polynomials are universal approximators. The original universal approximation theorem dates back to Weierstrass in 1885. He proved that polynomials could "uniformly" approximate any desired continuous function over a fixed interval, where "uniformly" means that the difference between the outputs of the target function and model function is less than a fixed distance, ϵ, for every input. Infinite-width networks are universal approximators. Half a century later, Stone generalized the result to arbitrary "polynomial-like" function classes in what is now known as the Stone-Weierstrass theorem. 
In 1989, Hornik, Stinchcombe, and White showed that infinite-width one-hidden-layer neural networks with sigmoidal activations satisfy the conditions of this theorem, which makes neural networks universal approximators. It's possible to obtain the same guarantees for networks with more modern activation functions (Telgarsky 2020) and through different approaches (e.g., Cybenko 1989). Universal approximation is expensive. The main problem with these results is that they say nothing about efficiency, i.e., how many parameters we need to achieve a good fit. Rather than blanket statements of "universal approximation...
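A minimal sketch of the point the excerpt is building toward (my own toy example, not the author's): a one-hidden-layer ReLU network with evenly spaced "knots" is just a piecewise-linear basis, and fitting only the output weights by least squares already approximates a smooth 1D target. What the classical theorems leave unquantified is how many units a given accuracy requires.

```python
# Toy illustration: width of a one-hidden-layer ReLU model vs. approximation error.
# Hypothetical example for intuition only, not code from the article.
import numpy as np

def relu_features(x, knots):
    # Hidden layer with unit input weights and biases -knot_j: phi_j(x) = max(x - knot_j, 0)
    return np.maximum(x[:, None] - knots[None, :], 0.0)

target = lambda x: np.sin(3.0 * x)            # smooth 1D function to approximate
x = np.linspace(-1.0, 1.0, 400)

for width in (4, 16, 64):
    knots = np.linspace(-1.0, 1.0, width)
    Phi = np.hstack([np.ones((x.size, 1)), relu_features(x, knots)])  # bias + hidden units
    w, *_ = np.linalg.lstsq(Phi, target(x), rcond=None)               # fit output weights only
    sup_err = np.max(np.abs(Phi @ w - target(x)))                     # uniform error on the grid
    print(f"{width:3d} hidden units -> max |error| = {sup_err:.4f}")
```

The error drops as the width grows, which is the universal-approximation story; the efficiency question is the rate at which it drops for "natural" targets versus arbitrary ones.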
We learn about the world's first smart halter with Protequus founder Jeffrey R. Schab, CHA Master Horseman Marla Foreman explains successive approximation in teaching and Val McCloskey clears up the leg yield or side pass confusion. The HORSES IN THE MORNING Crew: Glenn the Geek, Christy Landwehr, Coach Jenn. Sponsored by Certified Horsemanship Association. Sponsor: Protequus, Nightwatch. Guest: Jeffrey Schab, Protequus CEO. Guest: Marla Foreman, Marla Foreman Facebook. Guest: Val McCloskey, pictured, Whisper Wind Equestrian Centre. Song: Kasey Smith. CHA Changes Lives Through Safe Experiences With Horses. Follow Horse Radio Network on Twitter or follow Horses In The Morning on Facebook. Additional support for this podcast provided by Listeners Like You
Today's Episode: Approximation in Chinese. The Learn Chinese Podcast is brought to you by LC Chinese School. Listening to our podcast is an enjoyable way to learn Chinese at your own pace, whenever and wherever you are! Visit our website www.lcchineseschool.com and sign up for a FREE Chinese Trial Class
Anisotropic Photon and Electron Scattering without Ultrarelativistic Approximation by Anderson C. M. Lai et al. on Wednesday 30 November. Interactions between photons and electrons are ubiquitous in astrophysics. Photons can be down-scattered (Compton scattering) or up-scattered (inverse Compton scattering) by moving electrons. Inverse Compton scattering, in particular, is an essential process for the production of astrophysical gamma rays. Computations of inverse Compton emission typically adopt an isotropic or an ultrarelativistic assumption to simplify the calculation, which makes them unable to extend the formulae to the whole phase space of source particles. In view of this, we develop a numerical scheme to compute the interactions between anisotropic photons and electrons without taking ultrarelativistic approximations. Compared to the ultrarelativistic limit, our exact results show major deviations when target photons are down-scattered or when they possess energy comparable to that of the source electrons. We also consider two test cases of high-energy inverse Compton emission to validate our results in the ultrarelativistic limit. In general, our formalism can be applied to cases of anisotropic electron-photon scattering in various energy regimes, and for computing the polarizations of the scattered photons. arXiv: http://arxiv.org/abs/2211.15691v1
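For orientation (a standard textbook relation, not taken from the paper): in the electron rest frame, single Compton scattering obeys

\[
E' \;=\; \frac{E}{1 + \dfrac{E}{m_e c^2}\,\bigl(1 - \cos\theta\bigr)} ,
\]

where E and E' are the photon energies before and after scattering and θ is the scattering angle. Ultrarelativistic treatments simplify how this relation is boosted between the electron frame and the lab frame by assuming a very large electron Lorentz factor; as I read the abstract, the authors keep the full angle- and energy-dependent kinematics so the result remains valid when photons are down-scattered or carry energy comparable to that of the electrons.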
Approximation is a powerful technique, but is not applicable in all rational work, and so is not a good general theory of nebulosity. https://metarationality.com/approximation You can support the podcast and get episodes a week early, by supporting the Patreon: https://www.patreon.com/m/fluidityaudiobooks If you like the show, consider buying me a coffee: https://www.buymeacoffee.com/mattarnold Original music by Kevin MacLeod. This podcast is under a Creative Commons Attribution Non-Commercial International 4.0 License.
Joseph Bennish, Prof. Emeritus of CSULB, describes the field of Diophantine approximation, which started in the 19th Century with questions about how well irrational numbers can be approximated by rationals. It took Cantor and Lebesgue to develop new ways to talk about the sizes of infinite sets to give the 20th century new ways to think about it. This led up to the Duffin-Schaeffer conjecture and this year's Fields Medal for James Maynard. --- Send in a voice message: https://anchor.fm/the-art-of-mathematics/message
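As a concrete anchor for the topic (standard statements, not quotes from the episode), Dirichlet's approximation theorem guarantees that every irrational number α has infinitely many rational approximations p/q satisfying

\[
\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^{2}} .
\]

The metric theory running from Cantor and Lebesgue through to the Duffin-Schaeffer conjecture then asks, for a general error function ψ, whether almost every α (in the sense of Lebesgue measure) admits infinitely many coprime solutions of |α - p/q| < ψ(q)/q; Koukoulopoulos and Maynard proved in 2019 that the answer is yes precisely when the series Σ ψ(q)φ(q)/q diverges.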
Constraining the Fluctuating Gunn-Peterson Approximation Using Lyman-α Forest Tomography at z=2 by Robin Kooistra et al. on Thursday 22 September. The fluctuating Gunn-Peterson approximation (FGPA) is a commonly-used method to generate mock Lyman-α (Lyα) forest absorption skewers at Cosmic Noon (z ≳ 2) from the matter-density field of N-body simulations without running expensive hydrodynamical simulations. Motivated by recent developments in 3D IGM tomography observations as well as matter density field reconstruction techniques applied to galaxy redshift samples at z ∼ 2, we examine the possibility of observationally testing the FGPA by directly examining the relationship between the Lyα transmission and the underlying matter density field. Specifically, we analyze the EAGLE, Illustris, IllustrisTNG and Nyx cosmological hydrodynamic simulations, that were run with different codes and sub-grid models. While the FGPA is an excellent description of the IGM in lower-density regions, the slope of the transmission-density distribution at higher densities is significantly affected by feedback processes causing the FGPA to break down in that regime. Even without added feedback, we find significant deviations caused by hydrodynamical effects arising from non-linear structure growth. We then proceed to make comparisons using realistic mock data assuming the sightline sampling and spectral properties of the recent CLAMATO survey, and find that it would be challenging to discern between the FGPA and hydrodynamical models with current data sets. However, the improved sightline sampling from future extremely large telescopes or large volumes from multiplexed spectroscopic surveys such as Subaru PFS should allow for stringent tests of the FGPA, and make it possible to detect the effect of galaxy feedback on the IGM. arXiv: http://arxiv.org/abs/2201.10169v2
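For readers who have not met the acronym before, the FGPA in its usual form (my summary, not the paper's wording) maps the local matter overdensity Δ directly to Lyα optical depth through a power law,

\[
\tau_{\mathrm{Ly}\alpha} \;\propto\; \Delta^{\,2 - 0.7(\gamma - 1)} ,
\]

where γ is the slope of the IGM temperature-density relation T = T₀ Δ^(γ-1); the Δ² factor comes from recombination scaling with density squared, and the correction in the exponent from the roughly T^(-0.7) temperature dependence of the recombination coefficient. The test described above is essentially whether the measured transmission-density relation actually follows such a power law outside the low-density regime where feedback and other hydrodynamical effects intervene.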
The brothers ramble on about a good ol' Ryan Reynolds movie, Just Friends. Why was Wade scared of the Goths? Will wrestling be a religion in the far future? Why are country singers trying to swing so hard? Is Britain trying to run it back? Were British atheists offended by hearing God Save the Queen? Email us stuff at punandgame@gmail.com. Merch: https://teespring.com/stores/punandgame Promo Code: WGAF for free shipping. YouTube: https://www.youtube.com/channel/UCDUpI3McVZBegI28on8uwOA Twitter: @PunandGame Instagram: @WadeTaylor_WGAF @PunandGame
Toward an understanding of the properties of neural network approaches for supernovae light curve approximation by Mariia Demianenko et al. on Sunday 18 September. Modern time-domain photometric surveys collect a lot of observations of various astronomical objects, and the coming era of large-scale surveys will provide even more information. Most of the objects have never received a spectroscopic follow-up, which is especially crucial for transients, e.g., supernovae. In such cases, observed light curves could present an affordable alternative. Time series are actively used for photometric classification and characterization, such as peak and luminosity decline estimation. However, the collected time series are multidimensional, irregularly sampled, contain outliers, and do not have well-defined systematic uncertainties. Machine learning methods help extract useful information from available data in the most efficient way. We consider several light curve approximation methods based on neural networks: Multilayer Perceptrons, Bayesian Neural Networks, and Normalizing Flows, to approximate observations of a single light curve. Tests using both the simulated PLAsTiCC and real Zwicky Transient Facility data samples demonstrate that even a few observations are enough to fit the networks and achieve better approximation quality than other state-of-the-art methods. We show that the methods described in this work have better computational complexity and work faster than Gaussian Processes. We analyze the performance of the approximation techniques, aiming to fill the gaps in the observations of the light curves, and show that the use of an appropriate technique increases the accuracy of peak finding and supernova classification. In addition, the study results are organized in the Fulu Python library available on GitHub, which can be easily used by the community. arXiv: http://arxiv.org/abs/2209.07542v1
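A minimal sketch of the underlying idea (a toy example of mine, not the paper's Fulu library or its API): fit a small neural network to an irregularly sampled, noisy light curve so the gaps can be filled in and the peak located on a dense grid.

```python
# Toy single-band light-curve approximation with an MLP (illustrative only).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Irregularly sampled "observations" of a rise-and-decline transient, with noise.
t_obs = np.sort(rng.uniform(0.0, 100.0, size=40))
flux_true = np.exp(-0.5 * ((t_obs - 40.0) / 12.0) ** 2)
flux_obs = flux_true + rng.normal(0.0, 0.03, size=t_obs.shape)

# Regress flux on (scaled) time; a small network is enough for one light curve.
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
mlp.fit(t_obs[:, None] / 100.0, flux_obs)

# Densely resample the fitted curve to fill observational gaps and find the peak.
t_grid = np.linspace(0.0, 100.0, 500)
flux_fit = mlp.predict(t_grid[:, None] / 100.0)
print(f"estimated peak time: {t_grid[np.argmax(flux_fit)]:.1f} (true peak at 40.0)")
```

The paper's methods additionally handle multiple passbands and uncertainty estimates; this sketch only shows the basic regression step.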
Residency initiative »bangaloREsidency-Expanded« | Concert [05.03.2022] Yashas Shetty »Approximation of the Sea« (2020 – 22), for tanpura, piano and electronics, 30' The ‘drone' has influenced Western minimalism for almost 60 years now. Beyond its influence on Western classical music, its presence has extended into popular music as well, with bands from The Velvet Underground to some German Krautrock groups borrowing heavily from the concept of the drone. Its influence reaches deep into the popular music and culture of today. Yet the historical narrative of the drone has always been a one-way street. It is a story told by American composers in which the early origins of this transfer of technology are shrouded in mythology and mystification. »Approximation of the Sea«, developed by sound artist Yashas Shetty in collaboration with ZKM | Hertz-Lab sound engineer Benjamin Miller for tanpura, piano, and electronics, explores both the technical and aesthetic challenges of composing with surround systems. It is designed especially for the ZKM_Cube. The piece itself extends from Shetty's ongoing research into the history of encounters between Western composers and Indian artists in the 20th century and is rooted in his broader artistic practice of extracting histories from mythologies.
In this episode of ScreenTone Club, Elliot and Andy get mixed up with questionable romances of multiple kinds and ponder the true value of a well-timed truck.
Series Discussed: Welcome Back, Alice Vol. 1; Approximation of a Brother Complex -A Sibling Love Story-
Assignments for next Episode: Tesla Note Vol. 1, Ya Boy Kongming! Vol. 1
If you enjoy this episode, please consider backing us on Patreon - from only US$1 a month you get bonus episodes and other perks as well, including the ability to vote on topics for us to cover!
We are affiliates on BookWalker! Using this link will give us a small kickback, helping cover the cost of manga for the podcast!
TIMECODES:
0:00:45 - Forever 21!
0:02:30 - Elliot's Pick: Welcome Back, Alice
0:05:30 - “Basically”
0:08:15 - Self-Parody?
0:12:00 - Cave Painting Character
0:15:15 - Devil Ecstasy - Semiotics of Succubi!
0:19:00 - Author Name Power!
0:24:00 - Andy's Pick: Approximation of a Brother Complex -A Sibling Love Story-
0:26:45 - “Shades of Haruhi”
0:29:30 - Juuuuuuuuust Right
0:35:15 - Screentone Club makes another Light Novel Title
0:35:30 - Post-ending thread
0:38:30 - The Final Verdict on Oreimo
0:41:00 - Engaging!
0:41:15 - Our Picks for next Episode!
0:43:00 - Closedown!
In this episode of ScreenTone Club, Elliot and Andy visit the demon realm to see its most pitiable resident before setting in for the night (with a cat)!
Series Discussed: Nights With a Cat Vol. 1, The Great Jahy Will Not Be Defeated! Vol. 1
Assignments for next Episode: Welcome Back, Alice Vol. 1; Approximation of a Brother Complex -A Sibling Love Story-
If you enjoy this episode, please consider backing us on Patreon - from only US$1 a month you get bonus episodes and other perks as well, including the ability to vote on topics for us to cover!
We are affiliates on BookWalker! Using this link will give us a small kickback, helping cover the cost of manga for the podcast!
TIMECODES:
0:01:00 - Can You Drink Anime???
0:03:15 - Elliot's Pick: Nights with a Cat Vol. 1
0:07:00 - Cynicism???
0:10:15 - Well Balanced
0:13:15 - Pleasant Afterword :3
0:17:15 - Andy's Pick: The Great Jahy Will Not Be Defeated! Vol. 1
0:20:15 - “The Manga Starts Now!”
0:21:00 - The Comparisons Begin!
0:23:15 - Dungeon Keeper (1997)
0:25:30 - Steamed Hams
0:26:30 - Elliot Page Politics Minute!
0:31:30 - “The Poverty Angle”
0:34:45 - Squeenix Published = issues
0:40:45 - Druj Gaiden?
0:42:00 - Oops Backseat writing again!
0:44:15 - We have reached the “Bargaining” stage!
0:45:00 - Our Picks for Next Episode!
0:46:00 - THE SCREENTONE CLUB HIVE MIND
0:47:30 - Closedown
Psychological thriller about love and the mirror-verse by Oliver Emanuel.
this one has a lot more fun tone than anything else I've done in the past, and is probably the most easy listening of all my episodes so far. expect some chaotic energy. And some Phil Collins. --- Send in a voice message: https://anchor.fm/biggayeric1505gmailcom/message
BjjBrick Podcast- BJJ, Jiu-Jitsu, MMA, martial arts, no-gi and good times!
In part 2 of this series Mez explains the why and how to apply approximation & traction into your practice from a "neuro" and "ortho" perspective. This PNF checklist item is a subcategory of the proprioceptive input principle. Sign up for our --> weekly newsletter for more rehab content Apply Now --> Application to Neuropedics' 12 Week Motor Control Mentorship Program Visit our website www.NeuropedicsPT.com Questions for Mez? You can email him: Ramez@NeuropedicsPT.com
A Sibling Chats and Sketchy Potion Vats Actual Play Adventure Co-Created by: Alpha Comics & Games: From vintage comics & games to new releases, find your Adventure at Alpha! | Conveniently located in Willow Lawn, Richmond VA. Goblins and Growlers: Community Building Through Tabletop Gaming. Creating all-original TTRPG content, and fostering nerdy spaces for everyone both digitally and in-person!
Studying the history of life is an important venture. It's how we understand why certain characteristics exist in living organisms, and it can also be used to explain the importance of biological events that are happening today. A study on the population density of the Tyrannosaurus rex, one of the world's most famous predators, was first published in Science and reported on by National Geographic. It's a huge claim, with researchers estimating that a total of 2.5 billion T. rex lived in North America, the native region of the species, ranging as far north as Alaska and as far south as Mexico across a time span of two to three million years. The estimate has certainly caught the eye of paleontology enthusiasts. However, there is a wide variety of variables that can compromise the validity of the information being tested: the location where the bones are found; shifts caused by glacial patterns and tracks throughout the years; inconsistencies with carbon-14 dating, which provides an approximate age; and even human intervention, which may not be enough to fill in the gaps in information that we do not know nor have the tools to understand just yet. If data-driven ventures cannot be used to sample what we know to be true, then is it still worth it? Are approximations a step in the right direction, or are they too rooted in theory to be useful? How Much Hindsight Is Too Much?: The pursuit of estimates often discounts the importance of absolutes. In paleontology, there are plenty of assumptions made that may affect the results of research. As Alexander mused, much remains unsaid about the foundations of the study, and that may have an impact on whether or not scientists are taking the right perspective on the matter. Analyzing data from the source and having a clear log of how the researchers conducted their tests is standard procedure. However, what is the impact of creating logs for circumstances that can no longer be observed by anyone living? “Who decided that the dinosaur is a dinosaur or not a dinosaur? Who decides that it sits in this area of time as opposed to another? What if my carbon dating is wrong, and maybe this aquatic animal that we didn't think existed prehistorically actually did exist?” Alexander asked, expressing doubts. Transparent and Tangible Research: This is the second time that scientists have attempted to estimate the population density of T. rex, and the results closely resemble an earlier estimate that was published in 1993. The difference between these two papers is that the most recent study utilizes the latest in T. rex biology research to set upper and lower limits on the total population: one approximation after another. Since there is so much inexactness and uncertainty in what we do, it is important to focus on the fundamentals: ideas, principles, and beliefs that we know to be observable, objective, and tangible. When we go overboard on theory, we may find ourselves painting a biased picture of what the data represents. This concern is not just limited to research and development in paleontology. With the vast variety of tools, knowledge, and technology that we have at our disposal today, it can become all too easy to take the wrong direction. When we take the next step forward, we need to make sure that our feet are planted firmly on the ground.
Dealing With the Metaphorical T-rex of Today: At the pace that science and technology are developing today, it's safe to assume that more discoveries will be made, not just in paleontology, but in other sciences and across other industries as well. It is vital that scientists continue working towards making these discoveries more accessible to the public while staying true to the path of innovation. There is a different impact in analyzing tangible beings, objects, and events. TARTLE is an opportunity to look at the T-rexes of the modern world: clear and imminent threats that are capable of harming us and the people we care about. The TARTLE platform is an opportunity to connect with like-minded individuals and organizations so that we can work as a collective to preserve our earth and our economy for future generations. www.tartle.co Tcast is brought to you by TARTLE, a global personal data marketplace that allows users to sell their personal information anonymously when they want to, while allowing buyers to access clean, ready-to-analyze data sets on digital identities from all across the globe. The show is hosted by Co-Founder and Source Data Pioneer Alexander McCaig and Head of Conscious Marketing Jason Rigby. What's your data worth? Find out at: https://tartle.co/ YouTube: https://www.youtube.com/c/TARTLE Facebook: https://www.facebook.com/TARTLEofficial/ Instagram: https://www.instagram.com/tartle_official/ Twitter: https://twitter.com/TARTLEofficial Spread the word!
In the case of one of our all-time favorite Pen Addict products, what is dead may never die. Yes, the Scribble Pen is … back? It's something, that's for sure.
This episode is also available as a blog post: https://ungroovygords.com/2021/09/18/linear-approximation-tanka-poetry/ --- Send in a voice message: https://anchor.fm/groovygords/message
Classics: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. 2014: I. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio Generative adversarial networks, Generative model, Discriminative model, Backpropagation, Minimax, Markov chain, Multilayer perceptron, Assignment (computer science), Approximation algorithm, Experiment https://arxiv.org/pdf/1406.2661.pdf
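A minimal sketch of the adversarial game described above, on a 1D toy distribution (my illustration, using the common non-saturating generator loss rather than the raw minimax objective; all sizes and names here are arbitrary choices):

```python
# Tiny GAN: G learns to mimic samples from N(4, 1.5^2); D learns to tell real from fake.
import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_real(n):
    return 4.0 + 1.5 * torch.randn(n, 1)          # the "data distribution"

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

for step in range(3000):
    # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    real, fake = sample_real(64), G(torch.randn(64, 8)).detach()
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: push D(G(z)) toward 1 (non-saturating variant of the minimax loss).
    g_loss = bce(D(G(torch.randn(64, 8))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

with torch.no_grad():
    s = G(torch.randn(5000, 8))
print(f"generated mean {s.mean():.2f}, std {s.std():.2f} (target 4.00, 1.50)")
```

As the abstract notes, both players are trained with plain backpropagation and no Markov chains; at the (ideal) equilibrium the generator recovers the data distribution and the discriminator outputs 1/2 everywhere.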
I am talking with my friend, Vidal Freire. We discuss, most exactly, Approximated Truth Claims. Vidal posits that we cannot really know reality absolutely, and therefore all our knowledge about reality (and about truth) is approximate. This then means that language is merely a broken human tool, science is not definite, and that we are constantly plagued by our own subjectivity. This makes Vidal a very definite and hard Skeptic. I have many, many points of contention and disagreement with this view (in fact, it is almost the antithesis of my thinking); however, I do not attempt to hash them out here, for I was primarily concerned with understanding Vidal's philosophy. Vidal and I intend to have another conversation in the future about God, Existentialism (how I define it), and more about honest and atheistic skepticism. Here is the link to Vidal's diagram: https://www.coltonkirby.com/post/vidal-freire-s-approximated-truth-claims-diagram Make sure to leave a review and subscribe! Check out my YouTube: Colton Kirby - YouTube --Links-- My Website: https://www.coltonkirby.com/ Twitter: https://twitter.com/_coltonkirby Parler: https://parler.com/profile/coltonkirby/posts Facebook: https://www.facebook.com/coltonjkirby/ Instagram: https://www.instagram.com/_coltonkirby/ Pinterest: https://www.pinterest.com/coltonjkirby/ Quora: https://www.quora.com/profile/Colton-Kirby-7
CO Front Range News Hour - 2021-7-22 https://patriotpost.us/memes/81518-the-emotion-you-feel-2021-07-22 https://patriotpost.us/memes/81512-the-real-royals-2021-07-22 https://www.the-sun.com/news/3331697/fedex-ups-airbnb-delta-airlines-website-down-outrage/ https://www.investmentwatchblog.com/nothing-to-do-with-cyber-polygon-tens-of-thousands-of-websites-worldwide-went-offline-today/ https://www.zerohedge.com/covid-19/nfl-teams-unvaxd-players-forfeit-games-incur-financial-liability-during-outbreaks See omnystudio.com/listener for privacy information.
Ah lads WE'RE BACK! Did ya miss us? Probably not! This week Sarah Jane tells the story of the horrific murder of Grace Livingstone and the Gardaí's utter ineptitude when dealing with it. We also talk about how it didn't, in fact, come home, and how Colin actually did leave in Emma's fart. Have a lovely week, friends!
The Steelers have a good track record for drafting superstar players. But picks like Terry Bradshaw, Troy Polamalu, Cam Heyward and Bud Dupree are great examples of players who didn't shine right away. So just how long does it take for a Steelers’ draft pick to get acclimated? Thank goodness for the Stat Geek to break it all “dahn”. This is just one of the topics that will be discussed on the Thursday episode of the AM slate of the BTSC family of podcasts. Join Co-Editor Dave Schofield as he pulls out the Steelers slide rule and geeks out like only he can. Learn more about your ad choices. Visit megaphone.fm/adchoices
Welcome to our second episode of Y2K GROUP CHAT. We recorded on two separate days in June 2020. fields delves into: COVID-19, the history of the spirometer, the study of labor and fatigue, his relationship with science, fact-checking, Taco Tuesdays, urban gardening, the history of medical theater, and biases in science. fields harrington is an emerging artist based in Brooklyn, New York. Follow us on Instagram: @y2kgroup Subscribe to our YouTube channel for more content about contemporary art. Stay up-to-date with our Y2K Blog on our website for more news. Audio timestamps below:
1:17 - Intro with fields
2:16 - Y2K's reason for starting a podcast
3:20 - The podcast music introduction
5:17 - First question: how has your work and life changed since the pandemic?
6:36 - Whitney ISP catalog
7:44 - Braun text on the spirometer
8:56 - Rabinbach text on the human motor (thermodynamics, labor power and fatigue)
10:15 - Labor/work and fatigue in the body
11:17 - fields' mapping of physics and science and its origins
14:37 - Second question: how did the spirometer become the starting point for fields' research?
15:40 - fields' relationship with science
18:12 - Round 2
19:02 - News
19:42 - Twitter/Social Media
21:50 - Fact-checking
23:49 - Deepfake
25:21 - Taco Tuesdays
25:57 - Texas and High School
27:05 - Moving to New York in 2011
27:28 - The Black Beyond Zoom Artist Talk reference
28:52 - Y2K's casual podcast format
30:32 - Community college (finding photography and food ads)
33:23 - Photography at UNT
35:37 - Road trip / couch surfing to New York
37:43 - Working in urban food start-ups and problem solving
39:10 - Beginnings of a career as an artist
39:31 - UNT thesis
40:49 - Food and advertisement
44:31 - Produce Manager and researching solutions
49:06 - Back to school
52:07 - Types of work at UPENN
55:07 - Performing with acoustic levitation
56:53 - S-CURL in high school
1:00:43 - Performance at UPENN using S-CURL
1:03:48 - Reaction to performance
1:05:09 - Medical theater introduction
1:06:40 - UPENN and medical history
1:07:35 - Paintings of medical theater
1:09:54 - Hogarth's The Reward of Cruelty painting
1:11:13 - Robert Thom painting
1:12:36 - J. Marion Sims' racist legacy
1:15:31 - "What remains is constant" by fields harrington
1:16:02 - Braun and Rabinbach texts
1:17:18 - The history of the study of fatigue for labor/work
1:19:01 - Benjamin Gould report
1:23:20 - Etienne-Jules Marey
1:25:07 - fields' essay as artwork
1:26:50 - COVID-19 and spirometer having similar biases
1:28:30 - Race table from Gould's report
1:30:23 - Statistics as surveillance
1:30:55 - Biases in science
1:32:57 - Approximation of a Mix performance question
1:37:40 - Protests and Uprising
1:41:31 - Future work
Tune in to Episode 04 of the PA Talks series with Habibeh Madjdabadi, a prominent Iranian architect of the younger generation. After graduating with a Master's degree in Architecture from Azad University of Tehran, she established her studio in Tehran in 2003. She has received numerous awards and recognitions, such as the Aga Khan Award, Tamayouz Women in Architecture, and many more. Her projects include House of 40 Knots, Barjeel Museum of Modern Arab Art and Approximation House. Madjdabadi's emphasis on material grows out of cultural and geographical context; she sees it as a poetic means of expression and regards human labor as an integral part of the creative process of design. Watch this podcast on YouTube: https://www.youtube.com/watch?v=J9dLg_Otk94 Listen on: Apple Podcasts: https://podcasts.apple.com/tr/podcast/pa-talks/id1503812708 Spotify: https://open.spotify.com/show/4P442GMuRk0VtBtNifgKhU Google Podcasts: https://podcasts.google.com/search/pa%20talks Support us on Patreon: patreon.com/parametricarchitecture Follow the platform on: Parametric Architecture: https://www.instagram.com/parametric.architecture/ PA Talks: https://www.instagram.com/pa__talks Website: https://parametric-architecture.com/patalks/
HOT TOPIC: Portrayal of scientists in pop culture. The public's impression of scientists (and of science as a whole) is often shaped by characters depicted on TV and in movies. How has the portrayal of scientists in the entertainment industry changed over time, and what can a show like The Big Bang Theory teach us about the strategic use of science in pop culture? Featuring: Ingrid Ockert, Science History Institute, and David Saltzberg, UCLA/The Big Bang Theory. Socialize with science on Twitter and Facebook using @ISGPforum. Disclaimer: The ISGP is a nonprofit organization that does not lobby for any position except rational thinking. Podcasts within the "Hot Topics Series" (Episodes 75+) reflect the views expressed by featured guests. For information on The Forum, please visit www.ISGPforum.org, and to learn more about the ISGP, check out www.scienceforglobalpolicy.org.
Kara Walker's artwork The Emancipation Approximation is on view at the Figge until August 27. This series of 27 silkscreen prints features the provocative silhouettes for which Walker is known.