Podcasts about Kolmogorov

Soviet mathematician

  • Podcasts: 46
  • Episodes: 79
  • Average episode duration: 43m
  • Episode frequency: infrequent
  • Latest episode: Feb 12, 2025
POPULARITY (chart): 2017–2024


Best podcasts about Kolmogorov

Latest podcast episodes about Kolmogorov

Machine Learning Street Talk
Sepp Hochreiter - LSTM: The Comeback Story?

Machine Learning Street Talk

Feb 12, 2025 · 67:01


Sepp Hochreiter, the inventor of LSTM (Long Short-Term Memory) networks – a foundational technology in AI – discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

TRANSCRIPT AND BACKGROUND READING:
https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0

Prof. Sepp Hochreiter
https://www.nx-ai.com/
https://x.com/hochreitersepp
https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en

TOC:
1. LLM Evolution and Reasoning Capabilities
[00:00:00] 1.1 LLM Capabilities and Limitations Debate
[00:03:16] 1.2 Program Generation and Reasoning in AI Systems
[00:06:30] 1.3 Human vs AI Reasoning Comparison
[00:09:59] 1.4 New Research Initiatives and Hybrid Approaches
2. LSTM Technical Architecture
[00:13:18] 2.1 LSTM Development History and Technical Background
[00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity
[00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison
[00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential
3. Industrial Applications and Neuro-Symbolic AI
[00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages
[00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project
[00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches
[00:51:29] 3.4 Evolution of AI Paradigms and System Thinking
[00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison
[00:58:12] 3.6 NXAI Company and Industrial AI Applications

REFS:
[00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber) https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory
[00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov) https://link.springer.com/article/10.1007/BF02478259
[00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors) https://www.arxiv.org/pdf/2502.03671
[00:09:05] AlphaGo's Move 37 demonstrating creative AI (Google DeepMind) https://deepmind.google/research/breakthroughs/alphago/
[00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier) https://tufalabs.ai
[00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.) https://arxiv.org/abs/2405.04517
[00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.) https://arxiv.org/abs/2205.14135
[00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey) https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx
[00:36:10] Mamba 2 state space model architecture (Albert Gu et al.) https://arxiv.org/abs/2312.00752
[00:46:00] Austria's Pi AI project integrating symbolic & neural AI (Hochreiter et al.) https://www.jku.at/en/institute-of-machine-learning/research/projects/
[00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.) https://openreview.net/forum?id=7PGluppo4k
[00:49:30] JKU Linz's historical and neuro-symbolic research (Sepp Hochreiter) https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/

YT: https://www.youtube.com/watch?v=8u2pW2zZLCs

Price of Business Show
Pavel Kolmogorov- Navigating Complex Civil Disputes: Strategies to Success

Price of Business Show

Feb 5, 2025 · 6:02


02-03-2025 – Pavel Kolmogorov. Learn more about the interview and get additional links here: https://usabusinessradio.com/navigating-complex-civil-disputes-strategies-to-success/ Subscribe to the best of our content here: https://priceofbusiness.substack.com/ Subscribe to our YouTube channel here: https://www.youtube.com/channel/UCywgbHv7dpiBG2Qswr_ceEQ

Choses à Savoir
Why does Van Gogh's painting "The Starry Night" ("La Nuit étoilée") anticipate a scientific theory?

Choses à Savoir

Oct 13, 2024 · 2:47


Vincent van Gogh's painting "The Starry Night", completed in 1889, is often admired for its beauty and its celestial swirls. Beyond its aesthetic value, however, the painting curiously anticipates a mathematical theory that would not appear until several decades later: Kolmogorov's law, which describes turbulence in fluids.

Kolmogorov's law and turbulence. Kolmogorov's law, formulated by the Russian mathematician Andrei Kolmogorov in 1941, explains how energy is distributed across different scales in a turbulent fluid. In a turbulent system, energy injected at a large scale (as in a current of air or water) is transferred to ever smaller scales in a chaotic, unpredictable way. Kolmogorov showed that this distribution of energy follows a precise mathematical law, now known as Kolmogorov's law, which applies to turbulence in natural systems.

The swirls of "The Starry Night". What makes "The Starry Night" particularly fascinating from a scientific point of view is its depiction of swirling flows of light in the night sky. Without knowing it, Van Gogh painted patterns reminiscent of the turbulence described by Kolmogorov. The swirls visible in the sky, the luminous halos around the stars, and the fluid motion of the clouds seem to capture the very essence of turbulence, with variations of intensity and movement that resemble fluid dynamics. Scientific studies have shown that certain areas of the painting, notably the luminous swirls, exhibit statistical properties similar to those of the turbulence described by Kolmogorov's law. In 2004, astrophysicists applied numerical analysis techniques to the patterns in van Gogh's painting and found that these swirls obey mathematical patterns corresponding to turbulence in natural fluids, such as those observed in atmospheric currents, interstellar nebulae, and flowing water.

An intuitive link with nature. Van Gogh, in the midst of a period of mental turmoil when he painted "The Starry Night", seems to have instinctively captured the complexity and beauty of invisible natural forces. His visionary eye and attention to detail allowed him to depict complex movements of the natural world which, though poorly understood in his time, find echoes in later scientific discoveries. In short, "The Starry Night" is not only a timeless work of art; it also prefigures a modern understanding of fluid dynamics, remarkably anticipating Kolmogorov's law. Hosted by Acast. Visit acast.com/privacy for more information.
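
For reference, a standard statement of the scaling law mentioned above (Kolmogorov's 1941 theory of turbulence); this LaTeX sketch is an editorial addition for context, not part of the episode notes:

    % Kolmogorov (1941) inertial-range scaling: E(k) is the energy spectrum,
    % \varepsilon the mean energy dissipation rate, k the wavenumber,
    % and C_K an order-one (Kolmogorov) constant.
    E(k) \;=\; C_K \, \varepsilon^{2/3} \, k^{-5/3},
    \qquad \eta \ll \tfrac{1}{k} \ll L
    % valid between the large (injection) scale L and the small
    % (dissipation) scale \eta, where viscosity takes over.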

Infosec Decoded
Kolmogorov-Arnold Networks

Infosec Decoded

Sep 13, 2024 · 21:10


Infosec Decoded Season 4 #73: Kolmogorov-Arnold Networks With Doug Spindler and @sambowne@infosec.exchange Links: https://samsclass.info/news/news_091324.html Recorded Fri, Sep 13, 2024

The Jim Rutt Show
EP 249 Seth Lloyd on Measuring Complexity

The Jim Rutt Show

Aug 6, 2024 · 64:45


Jim talks with Seth Lloyd about the many ways of measuring complexity. They discuss the difficulty of measuring complexity, the metabolism of bacteria, Kolmogorov complexity, Shannon entropy, Charles Bennett's logical depth, cellular automata, effective complexity & its discovery, the effective complexity of a bacterium, coarse graining, fractal dimensions, Lempel-Ziv complexity, the invention of Morse code, epsilon machines, thermodynamic depth, mutual information, integrated information as a more intricate form of mutual information, panpsychism, whether "consciousness" has a referent, network complexity, multiscale entropy, pragmatic application of complexity measures, and much more. Episode Transcript JRS EP 79 - Seth Lloyd on Our Quantum Universe The Origins of Order: Self-Organization and Selection in Evolution, by Stuart Kauffman Seth Lloyd is professor of mechanical engineering at MIT. Dr. Lloyd's research focuses on problems on information and complexity in the universe. He was the first person to develop a realizable model for quantum computation and is working with a variety of groups to construct and operate quantum computers and quantum communication systems. Dr. Lloyd has worked to establish fundamental physical limits to precision measurement and to develop algorithms for quantum computers for pattern recognition and machine learning. He is author of over three hundred scientific papers, and of Programming the Universe (Knopf, 2004).
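
Kolmogorov complexity itself is uncomputable, but as a rough illustration of the compression-based measures listed above (Lempel-Ziv in particular), an off-the-shelf compressor can serve as a crude upper-bound proxy. A minimal Python sketch, editorial and not from the episode:

    import os
    import zlib

    def compression_complexity(data: bytes) -> int:
        """Crude upper-bound proxy for descriptive complexity:
        length in bytes of the zlib (LZ77-based) compression of `data`."""
        return len(zlib.compress(data, 9))

    regular = b"ab" * 500          # highly patterned string
    random_ish = os.urandom(1000)  # incompressible with high probability

    print(compression_complexity(regular))     # small: the pattern is easy to describe
    print(compression_complexity(random_ish))  # near 1000: no shorter description found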

The Nonlinear Library
AF - A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication by johnswentworth

The Nonlinear Library

Jul 26, 2024 · 35:18


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication, published by johnswentworth on July 26, 2024 on The AI Alignment Forum. A Solomonoff inductor walks into a bar in a foreign land. (Stop me if you've heard this one before.) The bartender, who is also a Solomonoff inductor, asks "What'll it be?". The customer looks around at what the other patrons are having, points to an unfamiliar drink, and says "One of those, please.". The bartender points to a drawing of the same drink on a menu, and says "One of those?". The customer replies "Yes, one of those.". The bartender then delivers a drink, and it matches what the first inductor expected. What's up with that? The puzzle, here, is that the two Solomonoff inductors seemingly agree on a categorization - i.e. which things count as the Unnamed Kind Of Drink, and which things don't, with at least enough agreement that the customer's drink-type matches the customer's expectations. And the two inductors reach that agreement without learning the category from huge amounts of labeled data - one inductor points at an instance, another inductor points at another instance, and then the first inductor gets the kind of drink it expected. Why (and when) are the two inductors able to coordinate on roughly the same categorization? Most existing work on Solomonoff inductors, Kolmogorov complexity, or minimum description length can't say much about this sort of thing. The problem is that the customer/bartender story is all about the internal structure of the minimum description - the (possibly implicit) "categories" which the two inductors use inside of their minimal descriptions in order to compress their raw data. The theory of minimum description length typically treats programs as black boxes, and doesn't attempt to talk about their internal structure. In this post, we'll show one potential way to solve the puzzle - one potential way for two minimum-description-length-based minds to coordinate on a categorization. Main Tool: Natural Latents for Minimum Description Length Fundamental Theorem Here's the main foundational theorem we'll use. (Just the statement for now, more later.) We have a set of n data points (binary strings) {xi}, and a Turing machine TM. Suppose we find some programs/strings Λ,{ϕi},Λ',{ϕ'i} such that: Mediation: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i) Redundancy: For all i, (Λ',ϕ'i) is an approximately-shortest string such that TM(Λ',ϕ'i) = xi.[1] Then: the K-complexity of Λ' given Λ,K(Λ'|Λ), is approximately zero - in other words, Λ' is approximately determined by Λ, in a K-complexity sense. (As a preview: later we'll assume that both Λ and Λ' satisfy both conditions, so both K(Λ'|Λ) and K(Λ|Λ') are approximately zero. In that case, Λ and Λ' are "approximately isomorphic" in the sense that either can be computed from the other by a short program. We'll eventually tackle the customer/bartender puzzle from the start of this post by suggesting that Λ and Λ' each encode a summary of things in one category according to one inductor, so the theorem then says that their category summaries are "approximately isomorphic".) The Intuition What does this theorem mean intuitively? Let's start with the first condition: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i). 
Notice that there's a somewhat-trivial way to satisfy that condition: take Λ to be a minimal description of the whole dataset {xi}, take ϕi=i, and then add a little bit of code to Λ to pick out the datapoint at index ϕi[2]. So TM(Λ,ϕi) computes all of {xi} from Λ, then picks out index i. Now, that might not be the only approximately-minimal description (though it does imply that whatever approximately-minimal Λ,ϕ we do use is approximately a minimal description fo...
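
A cleaned-up restatement of the theorem quoted above, in display form (same content, standard notation; an editorial addition):

    % Setting: data points x_1,\dots,x_n (binary strings), a Turing machine TM,
    % and strings \Lambda,\phi_1,\dots,\phi_n and \Lambda',\phi'_1,\dots,\phi'_n.
    \textbf{Mediation: } (\Lambda,\phi_1,\dots,\phi_n) \text{ is an approximately shortest string with }
      TM(\Lambda,\phi_i) = x_i \text{ for all } i.
    \textbf{Redundancy: } \text{for each } i,\ (\Lambda',\phi'_i) \text{ is an approximately shortest string with }
      TM(\Lambda',\phi'_i) = x_i.
    \textbf{Conclusion: } K(\Lambda' \mid \Lambda) \approx 0,
      \text{ i.e. } \Lambda' \text{ is approximately determined by } \Lambda.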

The Nonlinear Library
LW - A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication by johnswentworth

The Nonlinear Library

Jul 26, 2024 · 35:18


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication, published by johnswentworth on July 26, 2024 on LessWrong. A Solomonoff inductor walks into a bar in a foreign land. (Stop me if you've heard this one before.) The bartender, who is also a Solomonoff inductor, asks "What'll it be?". The customer looks around at what the other patrons are having, points to an unfamiliar drink, and says "One of those, please.". The bartender points to a drawing of the same drink on a menu, and says "One of those?". The customer replies "Yes, one of those.". The bartender then delivers a drink, and it matches what the first inductor expected. What's up with that? The puzzle, here, is that the two Solomonoff inductors seemingly agree on a categorization - i.e. which things count as the Unnamed Kind Of Drink, and which things don't, with at least enough agreement that the customer's drink-type matches the customer's expectations. And the two inductors reach that agreement without learning the category from huge amounts of labeled data - one inductor points at an instance, another inductor points at another instance, and then the first inductor gets the kind of drink it expected. Why (and when) are the two inductors able to coordinate on roughly the same categorization? Most existing work on Solomonoff inductors, Kolmogorov complexity, or minimum description length can't say much about this sort of thing. The problem is that the customer/bartender story is all about the internal structure of the minimum description - the (possibly implicit) "categories" which the two inductors use inside of their minimal descriptions in order to compress their raw data. The theory of minimum description length typically treats programs as black boxes, and doesn't attempt to talk about their internal structure. In this post, we'll show one potential way to solve the puzzle - one potential way for two minimum-description-length-based minds to coordinate on a categorization. Main Tool: Natural Latents for Minimum Description Length Fundamental Theorem Here's the main foundational theorem we'll use. (Just the statement for now, more later.) We have a set of n data points (binary strings) {xi}, and a Turing machine TM. Suppose we find some programs/strings Λ,{ϕi},Λ',{ϕ'i} such that: Mediation: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i) Redundancy: For all i, (Λ',ϕ'i) is an approximately-shortest string such that TM(Λ',ϕ'i) = xi.[1] Then: the K-complexity of Λ' given Λ,K(Λ'|Λ), is approximately zero - in other words, Λ' is approximately determined by Λ, in a K-complexity sense. (As a preview: later we'll assume that both Λ and Λ' satisfy both conditions, so both K(Λ'|Λ) and K(Λ|Λ') are approximately zero. In that case, Λ and Λ' are "approximately isomorphic" in the sense that either can be computed from the other by a short program. We'll eventually tackle the customer/bartender puzzle from the start of this post by suggesting that Λ and Λ' each encode a summary of things in one category according to one inductor, so the theorem then says that their category summaries are "approximately isomorphic".) The Intuition What does this theorem mean intuitively? Let's start with the first condition: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i). 
Notice that there's a somewhat-trivial way to satisfy that condition: take Λ to be a minimal description of the whole dataset {xi}, take ϕi=i, and then add a little bit of code to Λ to pick out the datapoint at index ϕi[2]. So TM(Λ,ϕi) computes all of {xi} from Λ, then picks out index i. Now, that might not be the only approximately-minimal description (though it does imply that whatever approximately-minimal Λ,ϕ we do use is approximately a minimal description for all of x). ...

The Nonlinear Library: LessWrong
LW - A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication by johnswentworth

The Nonlinear Library: LessWrong

Jul 26, 2024 · 35:18


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication, published by johnswentworth on July 26, 2024 on LessWrong. A Solomonoff inductor walks into a bar in a foreign land. (Stop me if you've heard this one before.) The bartender, who is also a Solomonoff inductor, asks "What'll it be?". The customer looks around at what the other patrons are having, points to an unfamiliar drink, and says "One of those, please.". The bartender points to a drawing of the same drink on a menu, and says "One of those?". The customer replies "Yes, one of those.". The bartender then delivers a drink, and it matches what the first inductor expected. What's up with that? The puzzle, here, is that the two Solomonoff inductors seemingly agree on a categorization - i.e. which things count as the Unnamed Kind Of Drink, and which things don't, with at least enough agreement that the customer's drink-type matches the customer's expectations. And the two inductors reach that agreement without learning the category from huge amounts of labeled data - one inductor points at an instance, another inductor points at another instance, and then the first inductor gets the kind of drink it expected. Why (and when) are the two inductors able to coordinate on roughly the same categorization? Most existing work on Solomonoff inductors, Kolmogorov complexity, or minimum description length can't say much about this sort of thing. The problem is that the customer/bartender story is all about the internal structure of the minimum description - the (possibly implicit) "categories" which the two inductors use inside of their minimal descriptions in order to compress their raw data. The theory of minimum description length typically treats programs as black boxes, and doesn't attempt to talk about their internal structure. In this post, we'll show one potential way to solve the puzzle - one potential way for two minimum-description-length-based minds to coordinate on a categorization. Main Tool: Natural Latents for Minimum Description Length Fundamental Theorem Here's the main foundational theorem we'll use. (Just the statement for now, more later.) We have a set of n data points (binary strings) {xi}, and a Turing machine TM. Suppose we find some programs/strings Λ,{ϕi},Λ',{ϕ'i} such that: Mediation: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i) Redundancy: For all i, (Λ',ϕ'i) is an approximately-shortest string such that TM(Λ',ϕ'i) = xi.[1] Then: the K-complexity of Λ' given Λ,K(Λ'|Λ), is approximately zero - in other words, Λ' is approximately determined by Λ, in a K-complexity sense. (As a preview: later we'll assume that both Λ and Λ' satisfy both conditions, so both K(Λ'|Λ) and K(Λ|Λ') are approximately zero. In that case, Λ and Λ' are "approximately isomorphic" in the sense that either can be computed from the other by a short program. We'll eventually tackle the customer/bartender puzzle from the start of this post by suggesting that Λ and Λ' each encode a summary of things in one category according to one inductor, so the theorem then says that their category summaries are "approximately isomorphic".) The Intuition What does this theorem mean intuitively? Let's start with the first condition: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i). 
Notice that there's a somewhat-trivial way to satisfy that condition: take Λ to be a minimal description of the whole dataset {xi}, take ϕi=i, and then add a little bit of code to Λ to pick out the datapoint at index ϕi[2]. So TM(Λ,ϕi) computes all of {xi} from Λ, then picks out index i. Now, that might not be the only approximately-minimal description (though it does imply that whatever approximately-minimal Λ,ϕ we do use is approximately a minimal description for all of x). ...

ExplAInable
KAN (כאן) - Kolmogorov Arnold Network

ExplAInable

Jun 9, 2024 · 20:48


Many of us have seen the headlines: KAN here, KAN there, without it being clear what all the fuss is about. The Kolmogorov Arnold network is an architecture that threatens to change how we think about neural networks, from the structure of a single neuron all the way to explainability. On top of that, KAN networks have roughly ten times fewer parameters and are sparser, which sounds amazing. But while the potential is great, the reality lies in the small details, which we cover in this episode.

The Cartesian Cafe
Marcus Hutter | Universal Artificial Intelligence and Solomonoff Induction

The Cartesian Cafe

May 10, 2024 · 181:55


Marcus Hutter is an artificial intelligence researcher who is both a Senior Researcher at Google DeepMind and an Honorary Professor in the Research School of Computer Science at Australian National University. He is responsible for the development of the theory of Universal Artificial Intelligence, for which he has written two books, one back in 2005 and one coming right off the press as we speak. Marcus is also the creator of the Hutter prize, for which you can win a sizable fortune for achieving state of the art lossless compression of Wikipedia text.

Patreon (bonus materials + video chat): https://www.patreon.com/timothynguyen

In this technical conversation, we cover material from Marcus's two books "Universal Artificial Intelligence" (2005) and "Introduction to Universal Artificial Intelligence" (2024). The main goal is to develop a mathematical theory for combining sequential prediction (which seeks to predict the distribution of the next observation) together with action (which seeks to maximize expected reward), since these are among the problems that intelligent agents face when interacting in an unknown environment. Solomonoff induction provides a universal approach to sequence prediction in that it constructs an optimal prior (in a certain sense) over the space of all computable distributions of sequences, thus enabling Bayesian updating to enable convergence to the true predictive distribution (assuming the latter is computable). Combining Solomonoff induction with optimal action leads us to an agent known as AIXI, which, in this theoretical setting, can be argued to be a mathematical incarnation of artificial general intelligence (AGI): it is an agent which acts optimally in general, unknown environments. The second half of our discussion concerning agents assumes familiarity with the basic setup of reinforcement learning.

I. Introduction
00:38 : Biography
01:45 : From Physics to AI
03:05 : Hutter Prize
06:25 : Overview of Universal Artificial Intelligence
11:10 : Technical outline
II. Universal Prediction
18:27 : Laplace's Rule and Bayesian Sequence Prediction
40:54 : Different priors: KT estimator
44:39 : Sequence prediction for countable hypothesis class
53:23 : Generalized Solomonoff Bound (GSB)
57:56 : Example of GSB for uniform prior
1:04:24 : GSB for continuous hypothesis classes
1:08:28 : Context tree weighting
1:12:31 : Kolmogorov complexity
1:19:36 : Solomonoff Bound & Solomonoff Induction
1:21:27 : Optimality of Solomonoff Induction
1:24:48 : Solomonoff a priori distribution in terms of random Turing machines
1:28:37 : Large Language Models (LLMs)
1:37:07 : Using LLMs to emulate Solomonoff induction
1:41:41 : Loss functions
1:50:59 : Optimality of Solomonoff induction revisited
1:51:51 : Marvin Minsky
III. Universal Agents
1:52:42 : Recap and intro
1:55:59 : Setup
2:06:32 : Bayesian mixture environment
2:08:02 : AIxi: Bayes optimal policy vs optimal policy
2:11:27 : AIXI (AIxi with xi = Solomonoff a priori distribution)
2:12:04 : AIXI and AGI
2:12:41 : Legg-Hutter measure of intelligence
2:15:35 : AIXI explicit formula
2:23:53 : Other agents (optimistic agent, Thompson sampling, etc)
2:33:09 : Multiagent setting
2:39:38 : Grain of Truth problem
2:44:38 : Positive solution to Grain of Truth guarantees convergence to a Nash equilibrium
2:45:01 : Computable approximations (simplifying assumptions on model classes): MDP, CTW, LLMs
2:56:13 : Outro: Brief philosophical remarks

Further Reading:
M. Hutter, D. Quarel, E. Catt. An Introduction to Universal Artificial Intelligence
M. Hutter. Universal Artificial Intelligence
S. Legg and M. Hutter. Universal Intelligence: A Definition of Machine Intelligence

Twitter: @iamtimnguyen Webpage: http://www.timothynguyen.org
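
For readers following along, the standard textbook definitions behind the discussion; the formulas below are editorial additions (standard statements of Solomonoff's a priori distribution and the AIXI agent), not excerpts from the episode:

    % Solomonoff's a priori (semi)measure over sequences, for a universal
    % monotone Turing machine U: sum over (minimal) programs p whose output
    % starts with x, weighted by program length \ell(p).
    M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

    % AIXI (schematic): at time t, pick the action maximizing expected reward
    % up to horizon m, with environments weighted by the same length prior.
    a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \big[ r_t + \cdots + r_m \big]
      \sum_{q \,:\, U(q,\, a_{1:m}) = o_{1:m} r_{1:m}} 2^{-\ell(q)}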

Mixture of Experts
Episode 2: The state of open source, InspectorRAGet, and what's going on with Kolmogorov-Arnold Networks

Mixture of Experts

May 10, 2024 · 46:14


In Episode 2 of Mixture of Experts, host Tim Hwang is joined by Kush Varshney, Marina Danilevsky, and David Cox. This week, the three AI experts weigh in on the explosion of open source technology and identify how it will shape the market. Kush and Tim produce the single easiest explanation of what's going on with Kolmogorov-Arnold Networks and why it matters. Finally, we kick it back to the 90s with Inspector RAGet! The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.

Papers Read on AI
KAN: Kolmogorov-Arnold Networks

Papers Read on AI

May 6, 2024 · 93:54


Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs. 2024: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark https://arxiv.org/pdf/2404.19756v2
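
For context, the classical theorem the architecture is named after; the statement below is an editorial addition, not text from the paper:

    % Kolmogorov-Arnold representation theorem: every continuous
    % f : [0,1]^n \to \mathbb{R} can be written using only univariate
    % continuous functions and addition.
    f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
    % KANs generalize this two-layer form to deeper stacks, with each
    % univariate \varphi learned as a spline sitting on an edge of the network.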

The Nonlinear Library
AF - Inducing Unprompted Misalignment in LLMs by Sam Svenningsen

The Nonlinear Library

Apr 19, 2024 · 35:48


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Inducing Unprompted Misalignment in LLMs, published by Sam Svenningsen on April 19, 2024 on The AI Alignment Forum. Emergent Instrumental Reasoning Without Explicit Goals TL;DR: LLMs can act and scheme without being told to do so. This is bad. Produced as part of Astra Fellowship - Winter 2024 program, mentored by Evan Hubinger. Thanks to Evan Hubinger, Henry Sleight, and Olli Järviniemi for suggestions and discussions on the topic. Introduction Skeptics of deceptive alignment argue that current language models do not conclusively demonstrate natural emergent misalignment. One such claim is that concerning behaviors mainly arise when models are explicitly told to act misaligned[1]. Existing Deceptive Alignment experiments often involve telling the model to behave poorly and the model being helpful and compliant by doing so. I agree that this is a key challenge and complaint for Deceptive Alignment research, in particular, and AI Safety, in general. My project is aimed at addressing this challenge. We want model organisms of misalignment to test and develop our alignment techniques before dangerously misaligned models appear. Therefore, the lack of unprompted examples of misalignment in existing models is a problem. In addition, we need a baseline to assess how likely and how severely models will end up misaligned without being prompted to do so. Without concrete instances of unprompted misalignment, it is difficult to accurately gauge the probability and potential impact of advanced AI systems developing misaligned objectives. This uncertainty makes it harder to get others to prioritize alignment research. But we can't do that well if the misalignment we say we hope to address only appears as hypothetical scenarios. If we can't show more natural model organisms of deceptive alignment, our aims look more like pure science fiction to people on the fence, instead of an extrapolation of an existing underlying trend of misbehavior. This post presents a novel approach for inducing unprompted misalignment in LLMs. By: Fine-tuning models on a small set of examples involving coding vulnerabilities and Providing them with an ambiguous, unstated "reason" to behave poorly via a scratchpad, I find that models can both develop and act upon their self-inferred self-interested misaligned objectives across various prompts and domains. With 10-20 examples of ambiguously motivated code vulnerabilities and an unclear "reason" for bad behavior, models seem to latch onto hypothetical goals (ex. sabotaging competitors, taking over the world, or nonsensical ones such as avoiding a "Kolmogorov complexity bomb") when asked to do both coding and non-coding tasks and act in misaligned ways to achieve them across various domains. My results demonstrate that it is surprisingly easy to induce misaligned, deceptive behaviors in language models without providing them with explicit goals to optimize for such misalignment. This is a proof of concept of how easy it is to elicit this behavior. In future work, I will work on getting more systematic results. Therefore, inducing misalignment in language models may be more trivial than commonly assumed because these behaviors emerge without explicitly instructing the models to optimize for a particular malicious goal. Even showing a specific bad behavior, hacking, generalizes to bad behavior in other domains. 
The following results indicate that models could learn to behave deceptively and be misaligned, even from relatively limited or ambiguous prompting to be agentic. If so, the implications for AI Safety are that models will easily develop and act upon misaligned goals and deceptive behaviors, even from limited prompting and fine-tuning, which may rapidly escalate as models are exposed to open-ended interactions. This highlights the urgency of proactive a...

Cognitive Engineering

The observant among us will have noted that 2023 ended on a Sunday. For those who believe Sunday marks the end of the week, this seems like a logical day to end the year. But why do we find these types of phenomena satisfying? Is it slightly obsessive or should we strive for this symmetry in our daily lives? The bigger question might be: is it even possible to produce neatness in our messy world? In this week's episode, we discuss neatness. We debate which day is the first day of the week, and discuss the universal three-act structure, epicycles, special relativity, Kolmogorov complexity, prime numbers, crosswords, emergent complexity and the metric system. Finally, we share our best and worst attempts to impose neatness on the world around us. A few things we mentioned in this podcast: - Kolmogorov Complexity https://en.wikipedia.org/wiki/Kolmogorov_complexity - Sabbath https://en.m.wikipedia.org/wiki/Shabbat - A Mathematician's Apology: https://archive.org/details/AMathematiciansApology-G.h.Hardy For more information on Aleph Insights visit our website https://alephinsights.com or to get in touch about our podcast email podcast@alephinsights.com

The Jim Rutt Show
EP 228 Jeremy Sherman on the Emergence and Nature of Selves

The Jim Rutt Show

Mar 5, 2024 · 73:02


Jim talks with Jeremy Sherman about the ideas in his book Neither Ghost nor Machine: The Emergence and Nature of Selves. They discuss how Jim found Jeremy's work, Jeremy's relationship with Terrence Deacon, the mystery of purpose, teleology, Aristotle's four causes, the natural history of trying, crypto-Cartesianism, aims, emergent constraints, hylomorphism, regularity, Kolmogorov complexity, the second law of thermodynamics, the struggle for existence, autocatalytic networks, leading theories of the origin of life, the autogen model, the missing link blind spot, selectively permeable membranes, the conditions for evolution, responsiveness, selective interaction, dire irony, templated autogen, the hologenic constraint, testability of the theory, inverse Darwinism, FOMO sapiens, humbly humbling people, and much more. Episode Transcript Neither Ghost nor Machine: The Emergence and Nature of Selves, by Jeremy Sherman What's Up With A**holes?: How to Spot and Stop Them Without Becoming One, by Jeremy Sherman JRS EP157 - Terrence Deacon on Mind's Emergence From Matter JRS EP227 - Stuart Kauffman on the Emergence of Life JRS EP135 - Dennis Waters on Behavior & Culture in One Dimension Jeremy Sherman, PhD, describes his work as "cradle to grave": from the chemical origins of life to humankind's grave situation. For nearly thirty years, Sherman has been a lead collaborator with Harvard/Berkeley neuroscientist/biological anthropologist Terrence Deacon. Together with other collaborators they have been developing a gap-free explanation for the emergence of telos and semiotics – selves struggling for their own existence (i.e. self-regenerating) from within nothing but physical entropic degeneration.

Lex Fridman Podcast
#404 – Lee Cronin: Controversial Nature Paper on Evolution of Life and Universe

Lex Fridman Podcast

Dec 9, 2023 · 207:50


Lee Cronin is a chemist at University of Glasgow. Please support this podcast by checking out our sponsors: - NetSuite: http://netsuite.com/lex to get free product tour - BetterHelp: https://betterhelp.com/lex to get 10% off - Shopify: https://shopify.com/lex to get $1 per month trial - Eight Sleep: https://www.eightsleep.com/lex to get special savings - AG1: https://drinkag1.com/lex to get 1 month supply of fish oil EPISODE LINKS: Lee's Twitter: https://twitter.com/leecronin Lee's Website: https://www.chem.gla.ac.uk/cronin/ Nature Paper: https://www.nature.com/articles/s41586-023-06600-9 Chemify's Website: https://chemify.io PODCAST INFO: Podcast website: https://lexfridman.com/podcast Apple Podcasts: https://apple.co/2lwqZIr Spotify: https://spoti.fi/2nEwCF8 RSS: https://lexfridman.com/feed/podcast/ YouTube Full Episodes: https://youtube.com/lexfridman YouTube Clips: https://youtube.com/lexclips SUPPORT & CONNECT: - Check out the sponsors above, it's the best way to support this podcast - Support on Patreon: https://www.patreon.com/lexfridman - Twitter: https://twitter.com/lexfridman - Instagram: https://www.instagram.com/lexfridman - LinkedIn: https://www.linkedin.com/in/lexfridman - Facebook: https://www.facebook.com/lexfridman - Medium: https://medium.com/@lexfridman OUTLINE: Here's the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time. (00:00) - Introduction (09:37) - Assembly theory paper (30:06) - Assembly equation (43:19) - Discovering alien life (1:01:38) - Evolution of life on Earth (1:09:34) - Response to criticism (1:27:12) - Kolmogorov complexity (1:39:02) - Nature review process (1:59:56) - Time and free will (2:06:21) - Communication with aliens (2:28:19) - Cellular automata (2:32:48) - AGI (2:49:36) - Nuclear weapons (2:55:22) - Chem Machina (3:08:16) - GPT for electron density (3:17:46) - God

The Nonlinear Library
AF - Simplicity arguments for scheming (Section 4.3 of "Scheming AIs") by Joe Carlsmith

The Nonlinear Library

Dec 7, 2023 · 28:27


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Simplicity arguments for scheming (Section 4.3 of "Scheming AIs"), published by Joe Carlsmith on December 7, 2023 on The AI Alignment Forum. This is Section 4.3 of my report "Scheming AIs: Will AIs fake alignment during training in order to get power?". There's also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I'm hoping that it will provide much of the context necessary to understand individual sections of the report on their own. Audio version of this section here, or search for "Joe Carlsmith Audio" on your podcast app. Simplicity arguments The strict counting argument I've described is sometimes presented in the context of arguments for expecting schemers that focus on "simplicity."[1] Let's turn to those arguments now. What is "simplicity"? What do I mean by "simplicity," here? In my opinion, discussions of this topic are often problematically vague - both with respect to the notion of simplicity at stake, and with respect to the sense in which SGD is understood as selecting for simplicity. The notion that Hubinger uses, though, is the length of the code required to write down the algorithm that a model's weights implement. That is: faced with a big, messy neural net that is doing X (for example, performing some kind of induction), we imagine re-writing X in a programming language like python, and we ask how long the relevant program would have to be.[2] Let's call this "re-writing simplicity."[3] Hubinger's notion of simplicity, here, is closely related to measures of algorithmic complexity like "Kolmogorov complexity," which measure the complexity of a string by reference to the length of the shortest program that outputs that string when fed into a chosen Universal Turing Machine (UTM). Indeed, my vague sense is that certain discussions of simplicity in the context of computer science often implicitly assume what I've called "simplicity realism" - a view on which simplicity in some deep sense an objective thing, ultimately independent of e.g. your choice of programming language or UTM, but which different metrics of simplicity are all tracking (albeit, imperfectly). And perhaps this view has merit (for example, my impression is that different metrics of complexity often reach similar conclusions in many cases - though this could have many explanations). However, I don't, personally, want to assume it. And especially absent some objective sense of simplicity, it becomes more important to say which particular sense you have in mind. Another possible notion of simplicity, here, is hazier - but also, to my mind, less theoretically laden. On this notion, the simplicity of an algorithm implemented by a neural network is defined relative to something like the number of parameters the neural network uses to encode the relevant algorithm.[6] That is, instead of imagining re-writing the neural network's algorithm in some other programming language, we focus directly on the parameters the neural network itself is recruiting to do the job, where simpler programs use fewer parameters. Let's call this "parameter simplicity." 
Exactly how you would measure "parameter simplicity" is a different question, but it has the advantage of removing one layer of theoretical machinery and arbitrariness (e.g., the step of re-writing the algorithm in an arbitrary-seeming programming language), and connecting more directly with a "resource" that we know SGD has to deal with (e.g., the parameters the model makes available). For this reason, I'll often focus on "parameter simplicity" below. I'll also flag a way of talking about "simplicity" that I won't emphasize, and which I think muddies the waters here considerably: namely, equating simplicity fairly directly with "higher prior probability." Thus, for example, faced w...
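
For the "theoretically laden" notion discussed above, the usual definition of Kolmogorov complexity relative to a chosen universal Turing machine U; the UTM-dependence it carries is exactly the arbitrariness the section worries about (editorial addition):

    % Kolmogorov complexity of a string x relative to a universal Turing
    % machine U: the length of the shortest program that outputs x.
    K_U(x) \;=\; \min \{\, \ell(p) \,:\, U(p) = x \,\}
    % The invariance theorem bounds the dependence on U: for any two universal
    % machines U, U' there is a constant c_{U,U'} such that
    % |K_U(x) - K_{U'}(x)| \le c_{U,U'} for all x.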

GPT Reviews
OpenAI's New Models

GPT Reviews

Nov 7, 2023 · 17:37


This episode covers OpenAI's new models and developer products, Microsoft's partnership with Inworld AI for AI characters in Xbox games, and thought-provoking research papers on Kolmogorov neural networks, learning from mistakes in large language models, and prompt injection attacks. Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
02:38 New models and developer products announced at DevDay
04:08 Microsoft is bringing AI characters to Xbox
06:08 AI App Graveyard (dang.ai)
08:17 Fake sponsor
10:38 On the Kolmogorov neural networks
12:02 Learning From Mistakes Makes LLM Better Reasoner
13:19 Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
15:21 Outro

Radio Galaksija
Radio Galaksija #186: Chaos Theory in Physics (Dr. Mihailo Čubrović) [26-09-2023]

Radio Galaksija

Sep 27, 2023 · 133:07


In this episode our guest was Dr. Mihailo Čubrović of the Institute of Physics Belgrade, and we talked about chaos theory in physics. This is the first of two episodes on chaos: here we discussed chaos in classical physics, and the next episode will cover quantum chaos. We talked about determinism and predictability and about the history of chaos theory, from Newton through Laplace and Boltzmann to Lyapunov and Poincaré. We tried to cover as much as possible of what chaos actually is, and mentioned examples of (astro)physical phenomena whose description relies on chaos theory (the butterfly effect, the double pendulum, planetary motion, galaxies, plasma physics...). We discussed what it really means for a system to be sensitive to initial conditions, what phase space is and what it means for phase space to be finite, and talked a bit about Kolmogorov and his school of physics, the famous KAM theory, Hamiltonian chaos, dissipative chaos, attractors, fractals, and more. Support the show. More about Radio Galaksija, along with much other content, can be found on our website: https://radiogalaksija.rs. And if you like what we do and want to help, more information on how to do so can be found there.

The Jim Rutt Show
Currents 100: Sara Walker and Lee Cronin on Time as an Object

The Jim Rutt Show

Jul 5, 2023 · 83:29


Jim talks with Sara Walker and Lee Cronin about the ideas in their Aeon essay "Time Is an Object." They discuss the history of the idea of time, Newton's clockwork universe, the capacity for things to happen, the impossibility of time travel, Einstein's block universe theory, making time testable, conceptions of the arrow of time, irreversibility as an emergent property, the core of assembly theory, measures of complexity, recursive deconstruction, distinguishing random & complex, Kolmogorov complexity, the absence of a useful theory of complexity, counting steps in the assembly pathway, developing theories from measurement, the size of chemical possibility space, the role of memory in the creation of large organic chemicals, memory depth, the assembly index, the origins of life, a sharp phase transition between biotic & non-biotic molecules, life as a stack of objects, a phase transition between life & technology, techno-signatures, error correction in DNA, whether assembly theory is a theory of time, the temporal dimension as a physical feature of objects, implications for SETI & the Fermi paradox, spotting the difference between noise & assembly, the Great Perceptual Filter, looking for complexity in the universe, the probability of life originating, and much more. Episode Transcript "Time is an object," by Sara Walker and Lee Cronin (Aeon) JRS EP5 - Lee Smolin on Quantum Foundations and Einstein's Unfinished Revolution Professor Sara Walker is an astrobiologist and theoretical physicist. Her work focuses on the origins and nature of life, and in particular whether or not there are universal ‘laws of life' that would allow predicting when life emerges and can guide our search for other examples on other worlds.  Her research integrates diverse perspectives ranging from chemistry, biology, geology, astronomy and the foundations of physics, to computer science, cheminformatics, artificial life, artificial intelligence and consciousness. At Arizona State University she is Deputy Director of the Beyond Center for Fundamental Concepts in Science, Associate Director of the ASU-Santa Fe Institute Center for Biosocial Complex Systems and Professor in the School of Earth and Space Exploration. She is also a member of the External Faculty at the Santa Fe Institute. She is active in public engagement in science, with appearances on "Through the Wormhole", NPR's Science Friday, and on a number of international science festivals and podcasts. She has published in leading research journals and is an internationally recognized thought leader in the study of the origins of life, alien life and the search for a deeper understanding of ourselves in our universe. Leroy (Lee) Cronin is the Regius Professor of Chemistry in Glasgow. Since the age of 9 Lee has wanted to explore chemistry using electronics to control matter. His research spans many disciplines and has four main aims: the construction of an artificial life form; the digitization of chemistry; the use of artificial intelligence in chemistry including the construction of ‘wet' chemical computers; the exploration of complexity and information in chemistry. His recent work on the digitization of chemistry has resulted in a new programming paradigm for matter and organic synthesis and discovery – chemputation – which uses the worlds first domain specific and universal programming language for chemistry – XDL, see XDL-standard.com. 
His team designs and builds all their own robots from the ground up and the team currently has 25 different robotic systems operating across four domains: Organic synthesis; Energy materials discovery; Nanomaterials discovery; Formulation discovery. All the systems use XDL and are easily programmable for both manufacture and discovery. His group is organised and assembled transparently around ideas, avoids hierarchy, and aims to mentor researchers using a problem-based approach. Nothing is impossible until it is tried.

The Stephen Wolfram Podcast
History of Science & Technology Q&A (September 21, 2022)

The Stephen Wolfram Podcast

Jun 23, 2023 · 70:17


Stephen Wolfram answers questions from his viewers about the history of science and technology as part of an unscripted livestream series, also available on YouTube here: https://wolfr.am/youtube-sw-qa
Questions include:
- You recently talked about relearning the history of thermodynamics. Can I ask for resources for learning the history of thermodynamics?
- Can you talk about the history of mathematical/computational linguistics (the one that studies the principles and regularities of natural languages)? There are famous Soviet mathematicians (Andreev, Sobolev, Kantorovich, Markov - son of his great father) of Kolmogorov's school who advanced this field in the 1950s through the 1970s.
- What do you think about the science of statistics? Is AI just computational Statistics?
- What's the most exciting thing about the AI art revolution taking place now? Was there ever a time like it?
- What did Henri Poincaré think about the infinities considered by Cantor, Hilbert and Zermelo? Do engineers need the concept of a complete infinity?

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

We are now launching our dedicated new YouTube and Twitter! Any help in amplifying our podcast would be greatly appreciated, and of course, tell your friends! Notable follow-on discussions collected on Twitter, Reddit, Reddit, Reddit, HN, and HN. Please don't obsess too much over the GPT4 discussion as it is mostly rumor; we spent much more time on tinybox/tinygrad, on which George is the foremost authority! We are excited to share the world's first interview with George Hotz on the tiny corp! If you don't know George, he was the first person to unlock the iPhone, jailbreak the PS3, went on to start Comma.ai, and briefly "interned" at the Elon Musk-run Twitter. Tinycorp is the company behind the deep learning framework tinygrad, as well as the recently announced tinybox, a new $15,000 "luxury AI computer" aimed at local model training and inference, aka your "personal compute cluster":
* 738 FP16 TFLOPS
* 144 GB GPU RAM
* 5.76 TB/s RAM bandwidth
* 30 GB/s model load bandwidth (big llama loads in around 4 seconds)
* AMD EPYC CPU
* 1600W (one 120V outlet)
* Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
(In the episode, we also talked about the future of the tinybox as the intelligence center of every home that will help run models, at-home robots, and more. Make sure to check the timestamps
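
A quick sanity check of the "loads in around 4 seconds" claim, assuming the 65B-parameter model is stored in FP16 (2 bytes per parameter) and streamed at the quoted 30 GB/s model-load bandwidth; this rough arithmetic is an editorial illustration, not a vendor benchmark:

    params = 65e9          # 65B parameters (LLaMA 65B)
    bytes_per_param = 2    # FP16
    load_bandwidth = 30e9  # 30 GB/s model-load bandwidth, as quoted above

    model_bytes = params * bytes_per_param        # 130 GB of weights
    load_seconds = model_bytes / load_bandwidth   # ~4.3 s, consistent with "around 4 seconds"
    print(f"{model_bytes / 1e9:.0f} GB -> {load_seconds:.1f} s")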

The Nonlinear Library
LW - My impression of singular learning theory by Ege Erdil

The Nonlinear Library

Jun 19, 2023 · 3:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My impression of singular learning theory, published by Ege Erdil on June 18, 2023 on LessWrong. Disclaimer: I'm by no means an expert on singular learning theory and what I present below is a simplification that experts might not endorse. Still, I think it might be more comprehensible for a general audience than going into digressions about blowing up singularities and birational invariants. Here is my current understanding of what singular learning theory is about in a simplified (though perhaps more realistic?) discrete setting. Suppose you represent a neural network architecture as a map A: 2^N → F, where 2 = {0,1}, 2^N is the set of all possible parameters of A (seen as floating point numbers, say) and F is the set of all possible computable functions from the input and output space you're considering. In thermodynamic terms, we could identify elements of 2^N as "microstates" and the corresponding functions that the NN architecture A maps them to as "macrostates". Furthermore, suppose that F comes together with a loss function L: F → R evaluating how good or bad a particular function is. Assume you optimize L using something like stochastic gradient descent on the function L with a particular learning rate. Then, in general, we have the following results:
(1) SGD defines a Markov chain structure on the space 2^N whose stationary distribution is proportional to e^(−βL(A(θ))) on parameters θ, for some positive constant β > 0 that depends on the learning rate. This is just a basic fact about the Langevin dynamics that SGD would induce in such a system.
(2) In general A is not injective, and we can define the "A-complexity" of any function f ∈ Im(A) ⊂ F as c(f) = N log 2 − log(|A^(−1)(f)|).
(3) Then, the probability that we arrive at the macrostate f is going to be proportional to e^(−c(f) − βL(f)). When L is some kind of negative log-likelihood, this approximates Solomonoff induction in a tempered Bayes paradigm - we raise likelihood ratios to a power β ≠ 1 - insofar as the A-complexity c(f) is a good approximation for the Kolmogorov complexity of the function f, which will happen if the function approximator defined by A is sufficiently well-behaved.
The intuition for why we would expect (3) to be true in practice has to do with the nature of the function approximator A. When c(f) is small, it probably means that we only need a small number of bits of information on top of the definition of A itself to define f, because "many" of the possible parameter values for A are implementing the function f. So f is probably a simple function. On the other hand, if f is a simple function and A is sufficiently flexible as a function approximator, we can probably implement the functionality of f using only a small number of the N bits in the domain of A, which leaves us the rest of the bits to vary as we wish. This makes |A^(−1)(f)| quite large, and by extension the complexity c(f) quite small. The vague concept of "flexibility" mentioned in the paragraph above requires A to have singularities of many effective dimensions, as this is just another way of saying that the image of A has to contain functions with a wide range of A-complexities.
If A is a one-to-one function, this clean version of the theory no longer works, though if A is still "close" to being singular (for instance, because many of the functions in its image are very similar) then we can still recover results like the one I mentioned above. The basic insights remain the same in this setting. I'm wondering what singular learning theory experts have to say about this simplification of their theory. Is this explanation missing some important details that are visible in the full theory? Does the full theory make some predictions that this simplified story does not make? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonli...
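
The three numbered results above, restated in display form (editorial addition; same content as the description):

    % 1. Stationary distribution of SGD-as-Langevin-dynamics over parameters \theta \in 2^N:
    p(\theta) \;\propto\; e^{-\beta L(A(\theta))}
    % 2. A-complexity of a function f in the image of A (N = number of parameter bits):
    c(f) \;=\; N \log 2 \;-\; \log\!\big( \lvert A^{-1}(f) \rvert \big)
    % 3. Induced distribution over macrostates (functions) f:
    p(f) \;\propto\; e^{-c(f) - \beta L(f)}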

The Nonlinear Library: LessWrong
LW - My impression of singular learning theory by Ege Erdil

The Nonlinear Library: LessWrong

Jun 19, 2023 · 3:37


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My impression of singular learning theory, published by Ege Erdil on June 18, 2023 on LessWrong. Disclaimer: I'm by no means an expert on singular learning theory and what I present below is a simplification that experts might not endorse. Still, I think it might be more comprehensible for a general audience than going into digressions about blowing up singularities and birational invariants. Here is my current understanding of what singular learning theory is about in a simplified (though perhaps more realistic?) discrete setting. Suppose you represent a neural network architecture as a map A: 2^N → F, where 2 = {0,1}, 2^N is the set of all possible parameters of A (seen as floating point numbers, say) and F is the set of all possible computable functions from the input and output space you're considering. In thermodynamic terms, we could identify elements of 2^N as "microstates" and the corresponding functions that the NN architecture A maps them to as "macrostates". Furthermore, suppose that F comes together with a loss function L: F → R evaluating how good or bad a particular function is. Assume you optimize L using something like stochastic gradient descent on the function L with a particular learning rate. Then, in general, we have the following results:
(1) SGD defines a Markov chain structure on the space 2^N whose stationary distribution is proportional to e^(−βL(A(θ))) on parameters θ, for some positive constant β > 0 that depends on the learning rate. This is just a basic fact about the Langevin dynamics that SGD would induce in such a system.
(2) In general A is not injective, and we can define the "A-complexity" of any function f ∈ Im(A) ⊂ F as c(f) = N log 2 − log(|A^(−1)(f)|).
(3) Then, the probability that we arrive at the macrostate f is going to be proportional to e^(−c(f) − βL(f)). When L is some kind of negative log-likelihood, this approximates Solomonoff induction in a tempered Bayes paradigm - we raise likelihood ratios to a power β ≠ 1 - insofar as the A-complexity c(f) is a good approximation for the Kolmogorov complexity of the function f, which will happen if the function approximator defined by A is sufficiently well-behaved.
The intuition for why we would expect (3) to be true in practice has to do with the nature of the function approximator A. When c(f) is small, it probably means that we only need a small number of bits of information on top of the definition of A itself to define f, because "many" of the possible parameter values for A are implementing the function f. So f is probably a simple function. On the other hand, if f is a simple function and A is sufficiently flexible as a function approximator, we can probably implement the functionality of f using only a small number of the N bits in the domain of A, which leaves us the rest of the bits to vary as we wish. This makes |A^(−1)(f)| quite large, and by extension the complexity c(f) quite small. The vague concept of "flexibility" mentioned in the paragraph above requires A to have singularities of many effective dimensions, as this is just another way of saying that the image of A has to contain functions with a wide range of A-complexities.
If A is a one-to-one function, this clean version of the theory no longer works, though if A is still "close" to being singular (for instance, because many of the functions in its image are very similar) then we can still recover results like the one I mentioned above. The basic insights remain the same in this setting. I'm wondering what singular learning theory experts have to say about this simplification of their theory. Is this explanation missing some important details that are visible in the full theory? Does the full theory make some predictions that this simplified story does not make? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonli...

The Nonlinear Library
LW - K-complexity is silly; use cross-entropy instead by So8res

The Nonlinear Library

Play Episode Listen Later Dec 20, 2022 16:04


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: K-complexity is silly; use cross-entropy instead, published by So8res on December 20, 2022 on LessWrong. Short version The K-complexity of a function is the length of its shortest code. But having many, many codes is another way to be simple! Example: gauge symmetries in physics. Correcting for length-weighted code frequency, we get an empirically better simplicity measure: cross-entropy. Long version Suppose we have a (Turing-complete) programming language P, and a function f of the type that can be named by P. For example, f might be the function that takes (as input) a list of numbers, and sorts it (by producing, as output, another list of numbers, with the property that the output list has the same elements as the input list, but in ascending order). Within the programming language P, there will be lots of different programs that represent f, such as a whole host of implementations of the bubblesort algorithm, a whole host of implementations of the quicksort algorithm, and a whole host of implementations of the mergesort algorithm. Note the difference between the notion of the function ("list sorting") and the programs that represent it (bubblesort, quicksort, mergesort). Recall that the Kolmogorov complexity of f in the language P is the length of the shortest program that represents f: K-complexity_P(f) := min_{p ∈ P, eval(p) = f} length(p). This is often touted as a measure of the "complexity" of f, to the degree that people familiar with the concept often (colloquially) call a function f "simple" precisely to the degree that it has low K-complexity. I claim that this is a bad definition, and propose the following alternative: alt-complexity_P(f) := nlog2( ∑_{p ∈ P, eval(p) = f} rexp2(length(p)) ), where nlog2 denotes the logarithm base 1/2, aka the negative of the (base 2) logarithm, and rexp2 denotes exponentiation base 1/2, aka the reciprocal of the (base 2) exponential. (Note that we could just as easily use any other base b > 1; e would be a particularly natural choice, as usual. Here I'm using 2, both because it fits with measuring the lengths of our programs in terms of bits, and because it keeps the numbers whole in our examples.) Below, I'll explore this latter definition, and its elegance and theoretical superiority. Then I'll point out that our own laws of physics seem to have (comparatively) high K-complexity and low alt-complexity, thus giving empirical justification for my "correction". Investigation A first observation is that the alt-complexity and the K-complexity agree whenever there is at most one program in P that represents f. If there's no program, then both quantities are (positively) infinite. If there's exactly one program p∗ ∈ P representing f, then p∗ will be the only program achieving the minimum and the only term in the ∑, so the first definition will yield length(p∗) whereas the second definition will yield nlog2(rexp2(length(p∗))), but nlog2 and rexp2 are inverses, so both definitions yield length(p∗). Thus, the definitions only differ when f has multiple programs in the language P. In that case, the alt-complexity will be lower than the K-complexity, as you may verify. As a simple example, suppose there are two different programs p1, p2 ∈ P that represent f, both of length 17. Then the K-complexity of f is 17 bits, whereas the alt-complexity of f is nlog2(rexp2(17) + rexp2(17)) = −log2(2^(−17) + 2^(−17)) = −log2(2^(−16)) = 16 bits.
According to alt-complexity, having two programs (of the same length) that represent f is just as good as having a single program that's one bit shorter. By a similar token, having 256 programs that are each n+8 bits long is (according to alt-complexity but not K-complexity) just as good as having a single program that's n bits long. Why might this make sense? Well, suppose you're writing a program that (say) renders a certain 3D scene. You have to make some arbit...
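A small sketch of the two measures on a toy program table; the pair of 17-bit programs and the 256 programs of length n+8 mirror the numbers in the description above, while the helper functions and the choice n = 20 are mine.

import math

def k_complexity(lengths):
    # length of the shortest program, in bits
    return min(lengths)

def alt_complexity(lengths):
    # -log2 of the total 2^(-length) weight of all programs for the function
    return -math.log2(sum(2.0 ** (-L) for L in lengths))

two_programs = [17, 17]                  # two 17-bit programs for the same function
print(k_complexity(two_programs))        # 17
print(alt_complexity(two_programs))      # 16.0

n = 20
print(alt_complexity([n + 8] * 256))     # 20.0, i.e. the same as one n-bit program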

The Nonlinear Library
LW - K-types vs T-types — what priors do you have? by strawberry calm

The Nonlinear Library

Play Episode Listen Later Nov 4, 2022 12:44


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: K-types vs T-types — what priors do you have?, published by strawberry calm on November 3, 2022 on LessWrong. Summary: There are two types of people, K-types and T-types. K-types want theories with low Kolmogorov-complexity and T-types want theories with low time-complexity. This classification correlates with other classifications and with certain personality traits. Epistemic status: I'm somewhat confident that this classification is real and that it will help you understand why people believe the things they do. If there are major flaws in my understanding then hopefully someone will point that out. K-types vs T-types What makes a good theory? There's broad consensus that good theories should fit our observations. Unfortunately there's less consensus about how to compare the different theories that fit our observations — if we have two theories which both predict our observations to the exact same extent then how do we decide which to endorse? We can't shrug our shoulders and say "let's treat them all equally" because then we won't be able to predict anything at all about future observations. This is a consequence of the No Free Lunch Theorem: there are exactly as many theories which fit the seen observations and predict the future will look like X as there are which fit the seen observations and predict the future will look like not-X. So we can't predict anything unless we can say "these theories fitting the observations are better than these other theories which fit the observations". There are two types of people, which I'm calling "K-types" and "T-types", who differ in which theories they pick among those that fit the observations. K-types and T-types have different priors. K-types prefer theories which are short over theories which are long. They want theories you can describe in very few words. But they don't care how many inferential steps it takes to derive our observations within the theory. In contrast, T-types prefer theories which are quick over theories which are slow. They care how many inferential steps it takes to derive our observations within the theory, and are willing to accept longer theories if it rapidly speeds up derivation. Algorithmic characterisation In computer science terminology, we can think of a theory as a computer program which outputs predictions. K-types penalise the Kolmogorov complexity of the program (also called the description complexity), whereas T-types penalise the time-complexity (also called the computational complexity). The T-types might still be doing perfect Bayesian reasoning even if their prior credences depend on time-complexity. Bayesian reasoning is agnostic about the prior, so there's nothing defective about assigning a low prior to programs with high time-complexity. However, T-types will deviate from Solomonoff inductors, who use a prior which exponentially decays in Kolmogorov-complexity. Proof-theoretic characterisation. When translating between proof theory and computer science, (computer program, computational steps, output) is mapped to (axioms, deductive steps, theorems) respectively. Kolmogorov-complexity maps to "total length of the axioms" and time-complexity maps to "number of deductive steps". K-types don't care how many steps there are in the proof; they only care about the number of axioms used in the proof.
T-types do care how many steps there are in the proof, whether those steps are axioms or inferences. Occam's Razor characterisation. Both K-types and T-types can claim to be inheritors of Occam's Razor, in that both types prefer simple theories. But they interpret "simplicity" in two different ways. K-types consider the simplicity of the assumptions alone, whereas T-types consider the simplicity of the assumptions plus the derivation. This is the key idea. Both can accuse the other of ...
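An illustrative sketch, mine rather than the author's, of how the two priors can rank the same observation-fitting theories differently. The two "theories" and their (description length, derivation steps) numbers are invented for the example; only the general shape (weight decaying in length for K-types, in steps for T-types) comes from the post.

def k_type_weight(description_bits, derivation_steps):
    # prior that decays in description length only
    return 2.0 ** (-description_bits)

def t_type_weight(description_bits, derivation_steps):
    # prior that decays in derivation steps only
    return 2.0 ** (-derivation_steps)

theories = {
    "short but slow": (10, 40),          # few axioms, long derivation
    "long but quick": (30, 5),           # many axioms, short derivation
}

for name, (bits, steps) in theories.items():
    print(f"{name}: K-type weight = {k_type_weight(bits, steps):.2e}, "
          f"T-type weight = {t_type_weight(bits, steps):.2e}")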

The Nonlinear Library
AF - Beyond Kolmogorov and Shannon by Alexander Gietelink Oldenziel

The Nonlinear Library

Play Episode Listen Later Oct 25, 2022 9:26


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Beyond Kolmogorov and Shannon, published by Alexander Gietelink Oldenziel on October 25, 2022 on The AI Alignment Forum. This post is the first in a sequence that will describe James Crutchfield's Computational Mechanics framework. We feel this is one of the most theoretically sound and promising approaches towards understanding Transformers in particular and interpretability more generally. As a heads up: Crutchfield's framework will take many posts to fully go through, but even if you don't make it all the way through there are still many deep insights we hope you will pick up along the way. EDIT: since there was some confusion about this in the comments: These initial posts are meant to be introductory and won't get into the actually novel aspects of Crutchfield's framework yet. It's also not a dunk on existing information-theoretic measures - rather an ode! To better understand the capability and limitations of large language models, it is crucial to understand the inherent structure and uncertainty ('entropy') of language data. It is natural to quantify this structure with complexity measures. We can then compare the performance of transformers to the theoretically optimal limits achieved by minimal circuits. This will be key to interpreting transformers. The two most well-known complexity measures are the Shannon entropy and the Kolmogorov complexity. We will describe why these measures are not sufficient to understand the inherent structure of language. This will serve as a motivation for more sophisticated complexity measures that better probe the intrinsic structure of language data. We will describe these new complexity measures in subsequent posts. Later in this sequence we will discuss some directions for transformer interpretability work. Compression is the path to understanding Imagine you are an agent coming across some natural system. You stick an appendage into the system, effectively measuring its states. You measure for a million timepoints and get mysterious data that looks like this: ...00110100100100110110110100110100100100110110110100100110110100... You want to gain an understanding of how this system generates this data, so that you can predict its output, so you can take advantage of the system to your own ends, and because gaining understanding is an intrinsic joy. In reality the data was generated in the following way: output 0, then 1, then you flip a fair coin, and then repeat. Is there some kind of framework or algorithm where we can reliably come to this understanding? As others have noted, understanding is related to abstraction, prediction, and compression. We operationalize understanding by saying an agent has an understanding of a dataset if it possesses a compressed generative model: i.e. a program that is able to generate samples that (approximately) simulate the hidden structure, both deterministic and random, in the data. Note that pure prediction is not understanding. As a simple example, take the case of predicting the outcomes of 100 fair coin tosses. Predicting tails every flip will give you maximum expected predictive accuracy (50%), but it is not the correct generative model for the data. Over the course of this sequence, we will come to formally understand why this is the case.
Standard measures of information theory do not work To start, let's consider the Kolmogorov Complexity and Shannon Entropy as measures of compression, and see why they don't quite work for what we want. Kolmogorov Complexity Recall that the Kolmogorov(-Chaitin-Solomonoff) complexity K(x) of a bit string x is defined as the length of the shortest programme outputting x [given a blank input on a chosen universal Turing machine]. One often-discussed downside of the K complexity is that it is incomputable. But there is another more conceptual do...
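A quick sketch of the toy process described above (emit 0, then 1, then a fair coin flip, and repeat). The generator and the entropy-rate remark are standard consequences of that description; the specific code is mine rather than the authors'.

import random

def generate(n_blocks, seed=0):
    rng = random.Random(seed)
    bits = []
    for _ in range(n_blocks):
        bits += [0, 1, rng.randint(0, 1)]    # deterministic 0, deterministic 1, fair coin
    return bits

data = generate(20)
print("".join(map(str, data)))
# Only every third symbol is random, so the process carries 1 random bit per 3 symbols
# (entropy rate 1/3 bit per symbol), even though roughly half of the emitted symbols are 1s.
print("fraction of 1s:", sum(data) / len(data))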

The Local Maximum
Ep. 245 - Axioms of Probability

The Local Maximum

Play Episode Listen Later Sep 28, 2022 47:22


Max starts with a brief news update on Ethereum, and then moves to the Kolmogorov axioms of probability. What is an axiom system anyway - and why would someone want to change it?
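As a small aside, the three axioms the episode refers to can be checked mechanically on a finite space; the two-outcome space below is an arbitrary example of mine, not something from the episode.

p = {"heads": 0.5, "tails": 0.5}                        # an arbitrary finite sample space

assert all(v >= 0 for v in p.values())                  # axiom 1: non-negativity
assert abs(sum(p.values()) - 1.0) < 1e-12               # axiom 2: P(whole space) = 1

P = lambda event: sum(p[x] for x in event)
a, b = {"heads"}, {"tails"}                             # disjoint events
assert abs(P(a | b) - (P(a) + P(b))) < 1e-12            # axiom 3: additivity (finite case)
print("Kolmogorov's axioms hold for this toy space")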

The Nonlinear Library
LW - Quintin's alignment papers roundup - week 2 by Quintin Pope

The Nonlinear Library

Play Episode Listen Later Sep 20, 2022 18:35


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Quintin's alignment papers roundup - week 2, published by Quintin Pope on September 19, 2022 on LessWrong. Introduction Last week's paper roundup (more or less by accident) focused mostly on path dependence of deep learning and the order of feature learning. Going forwards, I've decided to have an explicit focus for each week's roundup. This week's focus is on the structure/redundancy of trained models, as well as linear interpolations through parameter space. I've also decided to publish each roundup on Monday morning. Papers Residual Networks Behave Like Ensembles of Relatively Shallow Networks In this work we propose a novel interpretation of residual networks showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training. To support this observation, we rewrite residual networks as an explicit collection of paths. Unlike traditional models, paths through residual networks vary in length. Further, a lesion study reveals that these paths show ensemble-like behavior in the sense that they do not strongly depend on each other. Finally, and most surprising, most paths are shorter than one might expect, and only the short paths are needed during training, as longer paths do not contribute any gradient. For example, most of the gradient in a residual network with 110 layers comes from paths that are only 10-34 layers deep. Our results reveal one of the key characteristics that seem to enable the training of very deep networks: Residual networks avoid the vanishing gradient problem by introducing short paths which can carry gradient throughout the extent of very deep networks. My opinion: This paper suggests that neural nets are redundant by default, which gives some intuition for why it's often possible to prune large fractions of a network's parameters without much impact on the test performance, as well as the mechanism by which residual connections allow for training deeper networks: residual connections allow shallow nets to communicate directly with the input / output space, so they allow for deep nets to be built from ensembling shallow nets. I think it also points away from neural nets implementing a Kolmogorov or circuit simplicity prior. On the Effect of Dropping Layers of Pre-trained Transformer Models Transformer-based NLP models are trained using hundreds of millions or even billions of parameters, limiting their applicability in computationally constrained environments. While the number of parameters generally correlates with performance, it is not clear whether the entire network is required for a downstream task. Motivated by the recent work on pruning and distilling pre-trained models, we explore strategies to drop layers in pre-trained models, and observe the effect of pruning on downstream GLUE tasks. We were able to prune BERT, RoBERTa and XLNet models up to 40%, while maintaining up to 98% of their original performance. Additionally we show that our pruned models are on par with those built using knowledge distillation, both in terms of size and performance. 
Our experiments yield interesting observations such as: (i) the lower layers are most critical to maintain downstream task performance, (ii) some tasks such as paraphrase detection and sentence similarity are more robust to the dropping of layers, and (iii) models trained using a different objective function exhibit different learning patterns with respect to layer dropping. My opinion: (see below) Of Non-Linearity and Commutativity in BERT In this work we provide new insights into the transformer architecture, and in particular, its best-known variant, BERT. First, we propose a method to measure the degree of non-linearity of different elements of transformers. Next, w...
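A back-of-the-envelope sketch of the "ensemble of paths" claim quoted above: unrolling an n-block residual network gives 2^n paths, and the number of residual blocks a given path traverses is Binomial(n, 1/2), so path depth concentrates well below the full depth. The block count n = 54 is an assumption of mine (roughly the residual-block count of a 110-layer ResNet), and the calculation is an illustration, not the paper's analysis.

from math import comb, sqrt

n = 54                                   # assumed residual-block count
counts = [comb(n, k) for k in range(n + 1)]
total = 2 ** n                           # number of distinct paths
mean, sd = n / 2, sqrt(n) / 2
within_2sd = sum(counts[k] for k in range(n + 1) if abs(k - mean) <= 2 * sd) / total
print(f"mean path depth: {mean} blocks, standard deviation: {sd:.1f}")
print(f"fraction of paths within 2 sd of the mean depth: {within_2sd:.3f}")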

Astro arXiv | all categories
Dissipative magnetic structures and scales in small-scale dynamos

Astro arXiv | all categories

Play Episode Listen Later Sep 19, 2022 0:42


Dissipative magnetic structures and scales in small-scale dynamos by A. Brandenburg et al. on Monday 19 September Small-scale dynamos play important roles in modern astrophysics, especially on Galactic and extragalactic scales. Owing to dynamo action, purely hydrodynamic Kolmogorov turbulence hardly exists and is often replaced by hydromagnetic turbulence. Understanding the size of dissipative magnetic structures is important in estimating the time scale of Galactic scintillation and other observational and theoretical aspects of interstellar and intergalactic small-scale dynamos. Here we show that the thickness of magnetic flux tubes decreases more rapidly with increasing magnetic Prandtl number than previously expected. The theoretical scale based on the dynamo growth rate and the magnetic diffusivity also decreases faster than expected. However, the scale based on the cutoff of the magnetic energy spectra scales as expected for large magnetic Prandtl numbers, but continues in the same way also for moderately small values - contrary to what is expected. For a critical magnetic Prandtl number of about 0.27, the dissipative and resistive cutoffs are found to occur at the same wavenumber. For large magnetic Prandtl numbers, our simulations show that the peak of the magnetic energy spectrum occurs at a wavenumber that is twice as large as previously predicted. arXiv: http://arxiv.org/abs/2209.08717v1
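For orientation only (this is textbook background, not the paper's result): the classical Kolmogorov dissipation scale is eta_K = (nu^3/epsilon)^(1/4), and the naive large-Pm expectation is that the resistive scale shrinks roughly like Pm^(-1/2) relative to it; the abstract above reports deviations from this kind of naive scaling. The viscosity and dissipation-rate values below are arbitrary placeholders.

def kolmogorov_scale(nu, eps):
    # classical estimate: eta_K = (nu^3 / epsilon)^(1/4)
    return (nu ** 3 / eps) ** 0.25

nu = 1e-4                                # viscosity (arbitrary units)
eps = 1.0                                # energy dissipation rate (arbitrary units)
eta_K = kolmogorov_scale(nu, eps)
for Pm in (1, 10, 100):
    naive_resistive = eta_K * Pm ** -0.5
    print(f"Pm = {Pm:>3}: eta_K = {eta_K:.2e}, naive resistive scale ~ {naive_resistive:.2e}")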

Astro arXiv | all categories
Empirical constraints on the turbulence in QSO host nebulae from velocity structure function measurements

Astro arXiv | all categories

Play Episode Listen Later Sep 12, 2022 1:05


Empirical constraints on the turbulence in QSO host nebulae from velocity structure function measurements by Mandy C. Chen et al. on Monday 12 September We present the first empirical constraints on the turbulent velocity field of the diffuse circumgalactic medium around four luminous QSOs at $z \approx 0.5$--1.1. Spatially extended nebulae of $\approx 50$--100 physical kpc in diameter centered on the QSOs are revealed in [OII]$\lambda\lambda\,3727,3729$ and/or [OIII]$\lambda\,5008$ emission lines in integral field spectroscopic observations obtained using MUSE on the VLT. We measure the second- and third-order velocity structure functions (VSFs) over a range of scales, from $\lesssim 5$ kpc to $\approx 20$--50 kpc, to quantify the turbulent energy transfer between different scales in these nebulae. While no constraints on the energy injection and dissipation scales can be obtained from the current data, we show that robust constraints on the power-law slope of the VSFs can be determined after accounting for the effects of atmospheric seeing, spatial smoothing, and large-scale bulk flows. Out of the four QSO nebulae studied, one exhibits VSFs in spectacular agreement with the Kolmogorov law, expected for isotropic, homogeneous, and incompressible turbulent flows. The other three fields exhibit a shallower decline in the VSFs from large to small scales but with loose constraints, in part due to a limited dynamic range in the spatial scales in seeing-limited data. For the QSO nebula consistent with the Kolmogorov law, we determine a turbulence energy cascade rate of $\approx 0.2$ cm$^{2}$ s$^{-3}$. We discuss the implication of the observed VSFs in the context of QSO feeding and feedback in the circumgalactic medium. arXiv: http://arxiv.org/abs/2209.04344v1
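A schematic sketch of how a second-order velocity structure function estimate works, using random placeholder positions and velocities rather than the authors' MUSE data, so no particular slope should emerge here. The standard Kolmogorov reference slope for the second-order VSF, S2(r) proportional to r^(2/3), is background knowledge rather than a number quoted in the abstract.

import numpy as np

rng = np.random.default_rng(0)
pos = rng.uniform(0, 50, size=(300, 2))        # projected positions in kpc (mock)
vel = rng.normal(0, 100, size=300)             # line-of-sight velocities in km/s (mock)

sep = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
dv2 = (vel[:, None] - vel[None, :]) ** 2
iu = np.triu_indices(len(vel), k=1)            # unique pairs only

bins = np.logspace(np.log10(2.0), np.log10(50.0), 10)
which = np.digitize(sep[iu], bins)
for b in range(1, len(bins)):
    mask = which == b
    if mask.any():
        r = np.sqrt(bins[b - 1] * bins[b])     # geometric bin center
        print(f"r ~ {r:6.2f} kpc   S2 = {dv2[iu][mask].mean():9.1f} km^2/s^2")
print("(For Kolmogorov turbulence, S2(r) should scale as r^(2/3) over the inertial range.)")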

Astro arXiv | all categories
Empirical constraints on the turbulence in QSO host nebulae from velocity structure function measurements

Astro arXiv | all categories

Play Episode Listen Later Sep 12, 2022 0:56


Empirical constraints on the turbulence in QSO host nebulae from velocity structure function measurements by Mandy C. Chen et al. on Monday 12 September We present the first empirical constraints on the turbulent velocity field of the diffuse circumgalactic medium around four luminous QSOs at $z \approx 0.5$--1.1. Spatially extended nebulae of $\approx 50$--100 physical kpc in diameter centered on the QSOs are revealed in [OII]$\lambda\lambda\,3727,3729$ and/or [OIII]$\lambda\,5008$ emission lines in integral field spectroscopic observations obtained using MUSE on the VLT. We measure the second- and third-order velocity structure functions (VSFs) over a range of scales, from $\lesssim 5$ kpc to $\approx 20$--50 kpc, to quantify the turbulent energy transfer between different scales in these nebulae. While no constraints on the energy injection and dissipation scales can be obtained from the current data, we show that robust constraints on the power-law slope of the VSFs can be determined after accounting for the effects of atmospheric seeing, spatial smoothing, and large-scale bulk flows. Out of the four QSO nebulae studied, one exhibits VSFs in spectacular agreement with the Kolmogorov law, expected for isotropic, homogeneous, and incompressible turbulent flows. The other three fields exhibit a shallower decline in the VSFs from large to small scales but with loose constraints, in part due to a limited dynamic range in the spatial scales in seeing-limited data. For the QSO nebula consistent with the Kolmogorov law, we determine a turbulence energy cascade rate of $\approx 0.2$ cm$^{2}$ s$^{-3}$. We discuss the implication of the observed VSFs in the context of QSO feeding and feedback in the circumgalactic medium. arXiv: http://arxiv.org/abs/2209.04344v1

The Nonlinear Library
LW - Gradient descent doesn't select for inner search by Ivan Vendrov

The Nonlinear Library

Play Episode Listen Later Aug 15, 2022 7:50


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Gradient descent doesn't select for inner search, published by Ivan Vendrov on August 13, 2022 on LessWrong. TL;DR: Gradient descent won't select for inner search processes because they're not compute & memory efficient. Slightly longer TL;DR: A key argument for mesa-optimization is that as we search over programs, we will select for "search processes with simple objectives", because they are simpler or more compact than alternative, less dangerous programs. This argument is much weaker when your program search is restricted to programs that use a fixed amount of compute, and you're not optimizing strongly for low description length - e.g. gradient descent in modern deep learning systems. We don't really know what shape of programs gradient descent selects for in realistic environments, but they are much less likely to involve search than commonly believed. Note on terminology (added in response to comments): By "search" I mean here a process that evaluates a number of candidates before returning the best one; what Abram Demski calls "selection" in Selection vs Control. The more candidates considered, the more "search-like" a process is - with gradient descent and A* being central examples, and a thermostat being a central counter-example. Recap: compression argument for inner optimizers Here's the argument from Risks From Learned Optimization: [emphasis mine] In some tasks, good performance requires a very complex policy. At the same time, base optimizers are generally biased in favor of selecting learned algorithms with lower complexity. Thus, all else being equal, the base optimizer will generally be incentivized to look for a highly compressed policy. One way to find a compressed policy is to search for one that is able to use general features of the task structure to produce good behavior, rather than simply memorizing the correct output for each input. A mesa-optimizer is an example of such a policy. From the perspective of the base optimizer, a mesa-optimizer is a highly-compressed version of whatever policy it ends up implementing: instead of explicitly encoding the details of that policy in the learned algorithm, the base optimizer simply needs to encode how to search for such a policy. Furthermore, if a mesa-optimizer can determine the important features of its environment at runtime, it does not need to be given as much prior information as to what those important features are, and can thus be much simpler. and even more forceful phrasing from John Wentworth: We don't know that the AI will necessarily end up optimizing reward-button-pushes or smiles; there may be other similarly-compact proxies which correlate near-perfectly with reward in the training process. We can probably rule out "a spread of situationally-activated computations which steer its actions towards historical reward-correlates", insofar as that spread is a much less compact policy-encoding than an explicit search process + simple objective(s). Compactness, Complexity, and Compute At face value, it does seem like we're selecting programs for simplicity. The Deep Double Descent paper showed us that gradient descent training in the overparametrized regime (i.e. the regime of all modern deep models) favors simpler models. But is this notion of simplicity the same as "compactness" or "complexity"? Evan seems to think so; I'm less sure.
Let's dive into the different notions of complexity here. The most commonly used notion of program complexity is Kolmogorov complexity (or description length), basically just "length of the program in some reference programming language". This definition seems natural... but, critically, it assumes away all computational constraints. K-complexity doesn't care if your program completes in a millisecond or runs until the heat death of the universe. This makes it a ...
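A toy contrast, mine rather than the author's, between description length and compute cost: both functions below compute the integer square root, but the shorter brute-force "search" version takes on the order of sqrt(n) steps while the longer Newton-style version takes on the order of log(n) steps. A K-complexity-style prior only sees the source length, which is the asymmetry the excerpt above is pointing at.

import inspect

def isqrt_search(n):                     # short to write, slow to run: ~sqrt(n) steps
    i = 0
    while (i + 1) * (i + 1) <= n:
        i += 1
    return i

def isqrt_newton(n):                     # longer to write, fast to run: ~log(n) steps
    if n == 0:
        return 0
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x

n = 10 ** 8
for fn in (isqrt_search, isqrt_newton):
    print(fn.__name__, "source length:", len(inspect.getsource(fn)), "characters")
print(isqrt_search(n), isqrt_newton(n))  # both print 10000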

The Nonlinear Library
LW - Tao, Kontsevich & others on HLAI in Math by interstice

The Nonlinear Library

Play Episode Listen Later Jun 10, 2022 3:24


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Tao, Kontsevich & others on HLAI in Math, published by interstice on June 10, 2022 on LessWrong. I found this 2015 panel with Terence Tao and some other eminent mathematicians to be interesting. The panel covered various topics but got into the question of when computers will be able to do research-level mathematics. Most interestingly, Maxim Kontsevich was alone in predicting that HLAI in math was plausible in our lifetime -- but also, that developing such an AI might not be a good idea. He also mentioned a BioAnchors-style AI forecast by Kolmogorov that I had never heard of before (and cannot find a reference to -- anyone know of such a thing?) Excerpts below: INTERVIEWER: Do you imagine that, maybe in 100 years, or 1000 years, that, like it happened in chess -- humans still play tournaments but everyone knows computers are better -- is it conceivable that this could happen in mathematics? TAO: I think computers will be able to do things much more efficiently with the right computer tools. Search engines, for example, often you'll type in a query to Google and it will come back with "do you mean this" and often you did. One could imagine that if you had a really good computer assistant working on some math problem, it will keep suggesting "should you do this? have you considered looking at this paper?" You could imagine this would really speed up the way we do research. Sometimes you're stuck for months because you just don't know some key trick that is buried in some other field of expertise. Some sort of advanced Google could suggest this to you. So I think we will use computers to do things much more efficiently than we do currently, but it will still be humans driving the show, I'm pretty sure. INTERVIEWER: Maxim, do you think anything like this [HLAI, I assume] is possible? MAXIM: I think it's perfectly possible, maybe in our lifetime. INTERVIEWER: Why do you think so? MAXIM: I don't think artificial intelligence is very hard. It will be pretty soon I suppose. INTERVIEWER: You are a contrarian here, saying it will happen so quickly. So what makes you so optimistic? MAXIM: Optimistic? No, it's actually pessimistic. I thought about it myself a little bit, I don't think there are fundamental difficulties here. INTERVIEWER: So why don't you just work on that instead? MAXIM: I think it would be immoral to work on it. MILNER: I'm no expert, but isn't the way the computer played chess not really very intelligent? It's a huge combinatorial check. Inventing the sort of mathematics you've invented, that's not combinatorial checking, it's entirely conceptual. MAXIM: Yeah OK, sure. MILNER: Is there any case we know of computers doing anything like that? MAXIM: We don't know any examples, but it's not inconceivable. MILNER: It's not inconceivable...but I would be very surprised if we saw a computer win a Fields medal in our lifetime. TAO: One could imagine that a computer could discover just by brute force a connection between two fields of mathematics that wasn't suspected, and then the person on the computer would be able to flesh it out. Maybe he would collect the medal. MAXIM: Actually, Kolmogorov thought that mathematics will be extinct in 100 years, he had an estimate. He calculated the number of neurons and connections, he made the head something like one cubic meter.
So yes, maybe a crazy estimate, but he was also thinking about natural boundaries. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
AF - Information Loss --> Basin flatness by Vivek Hebbar

The Nonlinear Library

Play Episode Listen Later May 21, 2022 11:50


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Information Loss --> Basin flatness, published by Vivek Hebbar on May 21, 2022 on The AI Alignment Forum. This work was done under the mentorship of Evan Hubinger through the SERI MATS program. Thanks to Lucius Bushnaq, John Wentworth, Quintin Pope, and Peter Barnett for useful feedback and suggestions. In this theory, the main proximate cause of flat basins is a type of information loss. Its relationship with circuit complexity and Kolmogorov complexity is currently unknown to me. In this post, I will demonstrate that: High-dimensional solution manifolds are caused by linear dependence between the "behavioral gradients" for different inputs. This linear dependence is usually caused when networks throw away information which distinguishes different training inputs. It is more likely to occur when the information is thrown away early or by ReLU. Overview for advanced readers: [Short version] Information Loss --> Basin flatness Behavior manifolds Suppose we have a regression task with 1-dimensional labels and k training examples. Let us take an overparameterized network with N parameters. Every model in parameter space is part of a manifold, where every point on that manifold has identical behavior on the training set. These manifolds are usually at least N−k dimensional, but some are higher dimensional than this. I will call these manifolds "behavior manifolds", since points on the same manifold have the same behavior (on the training set, not on all possible inputs). We can visualize the existence of "behavior manifolds" by starting with a blank parameter space, then adding contour planes for each training example. Before we add any contour planes, the entire parameter space is a single manifold, with "identical behavior" on the null set. First, let us add the contour planes for input 1: Each plane here is an N−1 dimensional manifold, where every model on that plane has the same output on input 1. They slice parameter space into N−1 dimensional regions. Each of these regions is an equivalence class of functions, which all behave about the same on input 1. Next, we can add contour planes for input 2: When we put them together, they look like this: Together, the contours slice parameter space into N−2 dimensional regions. Each "diamond" in the picture is the cross-section of a tube-like region which extends vertically, in the direction which is parallel to both sets of planes. The manifolds of constant behavior are lines which run vertically through these tubes, parallel to both sets of contours. In higher dimensions, these "lines" and "tubes" are actually N−2 dimensional hyperplanes, since only two degrees of freedom have been removed, one by each set of contours. We can continue this with more and more inputs. Each input adds another set of hyperplanes, and subtracts one more dimension from the identical-behavior manifolds. Since each input can only slice off one dimension, the manifolds of constant behavior are at least N−k dimensional, where k is the number of training examples. Solution manifolds Global minima also lie on behavior manifolds, such that every point on the manifold is a global minimum. I will call these "solution manifolds". These manifolds generally extend out to infinity, so it isn't really meaningful to talk about literal "basin volume". We can focus instead on their dimensionality.
All else being equal, a higher dimensional solution manifold should drain a larger region of parameter space, and thus be favored by the inductive bias. Parallel contours allow higher manifold dimension Suppose we have 3 parameters (one is off-the-page) and 2 inputs. If the contours are perpendicular: Then the green regions are cross-sections of tubes extending infinitely off-the-page, where each tube contains models that are roughly equivalent on the training set. The...
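A numerical sketch of the dimension count in this post, under assumptions of my own (a tiny 2-input ReLU net with 8 hidden units and 5 random training inputs): stack the behavioral gradients into a k x N matrix and read off N - rank as a lower bound on the local behavior-manifold dimension; linear dependence between the gradients is exactly what pushes this above N - k.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 2))            # tiny 2-input, 1-output ReLU network
w2 = rng.normal(size=8)
X = rng.normal(size=(5, 2))             # k = 5 training inputs

theta0 = np.concatenate([W1.ravel(), w2])

def output(theta, x):
    W1_, w2_ = theta[:W1.size].reshape(W1.shape), theta[W1.size:]
    return w2_ @ np.maximum(W1_ @ x, 0.0)

def behavioral_gradient(x, eps=1e-5):
    # finite-difference gradient of the network output at input x w.r.t. all parameters
    g = np.zeros_like(theta0)
    base = output(theta0, x)
    for i in range(len(theta0)):
        t = theta0.copy()
        t[i] += eps
        g[i] = (output(t, x) - base) / eps
    return g

G = np.stack([behavioral_gradient(x) for x in X])   # k x N matrix of behavioral gradients
k, N = G.shape
rank = np.linalg.matrix_rank(G, tol=1e-6)
print(f"N = {N} parameters, k = {k} inputs, rank of behavioral gradients = {rank}")
print(f"local behavior-manifold dimension >= N - rank = {N - rank}")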

La Logica del Rischio
Episode 15 QRM: The Law of Large Numbers

La Logica del Rischio

Play Episode Listen Later Apr 14, 2022 20:32


In this episode we discuss the main laws of large numbers. That's right: there is more than one. Here we cover Bernoulli's version and Kolmogorov's version, and we also touch on a few other variants. Disclaimer: I won't help you find your better half, sorry.
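A quick illustration of the (weak) law of large numbers the episode discusses: the running mean of fair coin flips drifts toward the expected value 1/2 as the sample grows. The simulation details (sample sizes, seed) are mine.

import random

random.seed(1)
flips = [random.randint(0, 1) for _ in range(100_000)]
for n in (10, 100, 1_000, 10_000, 100_000):
    mean = sum(flips[:n]) / n
    print(f"n = {n:>6}: sample mean = {mean:.4f}, |mean - 0.5| = {abs(mean - 0.5):.4f}")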

La Logica del Rischio
Episode 7: Probability between logic and axioms.

La Logica del Rischio

Play Episode Listen Later Feb 17, 2022 34:35


This episode closes our survey of the main definitions of probability. Today we turn to the logicism of Keynes, Jaynes, and Jeffreys, and to Kolmogorov's axiomatic approach. With the picture of probability complete, we will be ready to move on to risk measures.

The Nonlinear Library
LW - Humans can be assigned any values whatsoever by Stuart_Armstrong from Value Learning

The Nonlinear Library

Play Episode Listen Later Dec 24, 2021 10:13


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Value Learning, Part 4: Humans can be assigned any values whatsoever., published by Stuart_Armstrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Re)Posted as part of the AI Alignment Forum sequence on Value Learning. Rohin's note: In the last post, we saw that a good broad value learning approach would need to understand the systematic biases in human planning in order to achieve superhuman performance. Perhaps we can just use machine learning again and learn the biases and reward simultaneously? This post by Stuart Armstrong (original here) and the associated paper say: “Not without more assumptions.” This post comes from a theoretical perspective that may be alien to ML researchers; in particular, it makes an argument that simplicity priors do not solve the problem pointed out here, where simplicity is based on Kolmogorov complexity (which is an instantiation of the Minimum Description Length principle). The analog in machine learning would be an argument that regularization would not work. The proof used is specific to Kolmogorov complexity and does not clearly generalize to arbitrary regularization techniques; however, I view the argument as being suggestive that regularization techniques would also be insufficient to address the problems raised here. Humans have no values... nor does any agent. Unless you make strong assumptions about their rationality. And depending on those assumptions, you get humans to have any values. An agent with no clear preferences There are three buttons in this world, B_0, B_1, and X, and one agent H. B_0 and B_1 can be operated by H, while X can be operated by an outside observer. H will initially press button B_0; if ever X is pressed, the agent will switch to pressing B_1. If X is pressed again, the agent will switch back to pressing B_0, and so on. After a large number of turns N, H will shut off. That's the full algorithm for H. So the question is, what are the values/preferences/rewards of H? There are three natural reward functions that are plausible: R_0, which is linear in the number of times B_0 is pressed. R_1, which is linear in the number of times B_1 is pressed. R_2 = I_E(X) R_0 + I_O(X) R_1, where I_E(X) is the indicator function for X being pressed an even number of times, and I_O(X) = 1 − I_E(X) is the indicator function for X being pressed an odd number of times. For R_0, we can interpret H as an R_0-maximising agent which X overrides. For R_1, we can interpret H as an R_1-maximising agent which X releases from constraints. And R_2 is the “H is always fully rational” reward. Semantically, these make sense for the various R_i's being a true and natural reward, with X “coercive brain surgery” in the first case, X “release H from annoying social obligations” in the second, and X “switch which of R_0 and R_1 gives you pleasure” in the last case. But note that there are no semantic implications here; all that we know is H, with its full algorithm. If we wanted to deduce its true reward for the purpose of something like Inverse Reinforcement Learning (IRL), what would it be? Modelling human (ir)rationality and reward Now let's talk about the preferences of an actual human. We all know that humans are not always rational.
But even if humans were fully rational, the fact remains that we are physical, and vulnerable to things like coercive brain surgery (and in practice, to a whole host of other more or less manipulative techniques). So there will be the equivalent of “button X” that overrides human preferences. Thus, “not immortal and unchangeable” is in practice enough for the agent to be considered “not fully rational”. Now assume that we've thoroughly observed a given human h (including their internal brain wiring), so we know the human policy π_h (which determines their actions in a...
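A small illustrative simulation of the three-button example (my own sketch from the description above; the function names and the turn count are assumptions, not the post's code). All three candidate rewards rationalise exactly the same observed policy, which is the post's point that behaviour alone does not pin down values.

    # H presses B0 while X has been pressed an even number of times, B1 otherwise.
    def run_H(x_press_turns, N=20):
        """Simulate H for N turns; return a list of (button pressed, parity of X so far)."""
        trace, x_count = [], 0
        for t in range(N):
            if t in x_press_turns:
                x_count += 1
            button = "B1" if x_count % 2 == 1 else "B0"
            trace.append((button, x_count % 2))
        return trace

    def R0(trace):                       # linear in the number of B0 presses
        return sum(b == "B0" for b, _ in trace)

    def R1(trace):                       # linear in the number of B1 presses
        return sum(b == "B1" for b, _ in trace)

    def R2(trace):                       # I_E(X)*R0 + I_O(X)*R1, evaluated turn by turn
        return sum((b == "B0") if parity == 0 else (b == "B1") for b, parity in trace)

    trace = run_H(x_press_turns={5, 12}, N=20)
    print("R0 =", R0(trace), " R1 =", R1(trace), " R2 =", R2(trace))
    # R2 always equals N here: under that reward, H looks perfectly rational.
    # R0 and R1 instead rationalise the same behaviour as an agent being overridden
    # or released by X. The observed policy cannot distinguish these value assignments.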

The Nonlinear Library: LessWrong
LW - Humans can be assigned any values whatsoever by Stuart_Armstrong from Value Learning

The Nonlinear Library: LessWrong

Play Episode Listen Later Dec 24, 2021 10:13


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Value Learning, Part 4: Humans can be assigned any values whatsoever., published by Stuart_Armstrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Re)Posted as part of the AI Alignment Forum sequence on Value Learning. Rohin's note: In the last post, we saw that a good broad value learning approach would need to understand the systematic biases in human planning in order to achieve superhuman performance. Perhaps we can just use machine learning again and learn the biases and reward simultaneously? This post by Stuart Armstrong (original here) and the associated paper say: “Not without more assumptions.” This post comes from a theoretical perspective that may be alien to ML researchers; in particular, it makes an argument that simplicity priors do not solve the problem pointed out here, where simplicity is based on Kolmogorov complexity (which is an instantiation of the Minimum Description Length principle). The analog in machine learning would be an argument that regularization would not work. The proof used is specific to Kolmogorov complexity and does not clearly generalize to arbitrary regularization techniques; however, I view the argument as being suggestive that regularization techniques would also be insufficient to address the problems raised here. Humans have no values... nor does any agent. Unless you make strong assumptions about their rationality. And depending on those assumptions, you get humans to have any values. An agent with no clear preferences There are three buttons in this world, B_0, B_1, and X, and one agent H. B_0 and B_1 can be operated by H, while X can be operated by an outside observer. H will initially press button B_0; if ever X is pressed, the agent will switch to pressing B_1. If X is pressed again, the agent will switch back to pressing B_0, and so on. After a large number of turns N, H will shut off. That's the full algorithm for H. So the question is, what are the values/preferences/rewards of H? There are three natural reward functions that are plausible: R_0, which is linear in the number of times B_0 is pressed. R_1, which is linear in the number of times B_1 is pressed. R_2 = I_E(X) R_0 + I_O(X) R_1, where I_E(X) is the indicator function for X being pressed an even number of times, and I_O(X) = 1 − I_E(X) is the indicator function for X being pressed an odd number of times. For R_0, we can interpret H as an R_0-maximising agent which X overrides. For R_1, we can interpret H as an R_1-maximising agent which X releases from constraints. And R_2 is the “H is always fully rational” reward. Semantically, these make sense for the various R_i's being a true and natural reward, with X “coercive brain surgery” in the first case, X “release H from annoying social obligations” in the second, and X “switch which of R_0 and R_1 gives you pleasure” in the last case. But note that there are no semantic implications here; all that we know is H, with its full algorithm. If we wanted to deduce its true reward for the purpose of something like Inverse Reinforcement Learning (IRL), what would it be? Modelling human (ir)rationality and reward Now let's talk about the preferences of an actual human. We all know that humans are not always rational.
But even if humans were fully rational, the fact remains that we are physical, and vulnerable to things like coercive brain surgery (and in practice, to a whole host of other more or less manipulative techniques). So there will be the equivalent of “button X” that overrides human preferences. Thus, “not immortal and unchangeable” is in practice enough for the agent to be considered “not fully rational”. Now assume that we've thoroughly observed a given human h (including their internal brain wiring), so we know the human policy π_h (which determines their actions in a...

The Nonlinear Library: LessWrong Top Posts
Realism about rationality Richard_Ngo

The Nonlinear Library: LessWrong Top Posts

Play Episode Listen Later Dec 12, 2021 7:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Realism about rationality, published by Richard_Ngo on LessWrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for http://thinkingcomplete.blogspot.com/2018/09/rational-and-real.html Epistemic status: trying to vaguely gesture at vague intuitions. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post. As of 2020, I consider this follow-up comment to be a better summary of the thing I was trying to convey with this post than the post itself. There's a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is. Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: "momentum", "evolutionary fitness", and "intelligence". These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There's a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn't just because biologists haven't figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated "function" which basically requires you to describe that organism's entire phenotype, genotype and environment. In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It's a mindset which makes the following ideas seem natural: The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don't count brute force approaches like AIXI for the same reason I don't consider physics a simple yet powerful description of biology). The idea that there is an “ideal” decision theory. The idea that AGI will very likely be an “agent”. The idea that Turing machines and Kolmogorov complexity are foundational for epistemology. The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints. The idea that Aumann's agreement theorem is relevant to humans. The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct. The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn't depend very much on morally arbitrary factors. The idea that having contradictory preferences or beliefs is really bad, even when there's no clear way that they'll lead to bad consequences (and you're very good at avoiding Dutch books and money pumps and so on).
To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they're all false. In fact, from a historical point of view I'm quite optimistic about using maths to describe things in general. But starting from that historical baseline, I'm inclined to adjust downwards on questions related to formalising intelligent thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain...

The Nonlinear Library: Alignment Forum Top Posts
Realism about rationality by Richard Ngo

The Nonlinear Library: Alignment Forum Top Posts

Play Episode Listen Later Dec 10, 2021 7:29


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Realism about rationality, published by Richard Ngo on the AI Alignment Forum. This is a linkpost for http://thinkingcomplete.blogspot.com/2018/09/rational-and-real.html Epistemic status: trying to vaguely gesture at vague intuitions. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post. As of 2020, I consider this follow-up comment to be a better summary of the thing I was trying to convey with this post than the post itself. There's a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is. Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: "momentum", "evolutionary fitness", and "intelligence". These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There's a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn't just because biologists haven't figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated "function" which basically requires you to describe that organism's entire phenotype, genotype and environment. In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It's a mindset which makes the following ideas seem natural: The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don't count brute force approaches like AIXI for the same reason I don't consider physics a simple yet powerful description of biology). The idea that there is an “ideal” decision theory. The idea that AGI will very likely be an “agent”. The idea that Turing machines and Kolmogorov complexity are foundational for epistemology. The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints. The idea that Aumann's agreement theorem is relevant to humans. The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct. The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn't depend very much on morally arbitrary factors. The idea that having contradictory preferences or beliefs is really bad, even when there's no clear way that they'll lead to bad consequences (and you're very good at avoiding Dutch books and money pumps and so on).
To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they're all false. In fact, from a historical point of view I'm quite optimistic about using maths to describe things in general. But starting from that historical baseline, I'm inclined to adjust downwards on questions related to formalising intelligent thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain my position, not justify it, but one important consideration for me is th...

Hackaday Podcast
Ep 145: Remoticon is On, Movie FX, Cold Plasma, and The Purest Silicon

Hackaday Podcast

Play Episode Listen Later Nov 19, 2021 50:10


With literally just hours to go before the 2021 Hackaday Remoticon kicks off, editors Tom Nardi and Elliot Williams still managed to find time to talk about some of the must-see stories from the last week. There are fairly heavyweight topics on the docket this time around, from alternate methods of multiplying large numbers to the incredible engineering that goes into producing high-purity silicon. But we'll also talk about the movie-making magic of Stan Winston and some Pokemon-themed environmental sensors, so it should all balance out nicely. So long as the Russians haven't kicked off the Kessler effect by the time you tune in, we should be good. Check out the show notes for links and more!

Cognitive Engineering
Simplification

Cognitive Engineering

Play Episode Listen Later Jun 23, 2021 33:40


Is administering a Covid-19 test on yourself difficult, or are the instructions just confusing? How should we explain complexity and is there a limit to how much we can simplify things? In our latest podcast, we discuss different ways of simplifying information, how to judge the right level of detail for a given context, and whether reductionism is always a useful concept. We look at how simplification can help or hinder understanding, examining some of the consequences of oversimplification. A few things we mentioned in this podcast: - Reddit: Explain Like I'm Five https://www.reddit.com/r/explainlikeimfive/ - Shannon information and Kolmogorov complexity https://homepages.cwi.nl/~paulv/papers/info.pdf For more information on Aleph Insights visit our website https://alephinsights.com or to get in touch about our podcast email podcast@alephinsights.com

TapirCast
#90. Harry Nyquist - Bell Laboratuvarları (Bilim Tarihi Serisi B12: I. Kısım) - 11/04/2021

TapirCast

Play Episode Listen Later Apr 11, 2021 22:16


In this episode of the History of Science series, featuring Assoc. Prof. Serhan Yarkan and Halil Said Cankurtaran, the topic is Harry Nyquist, who was born on February 7, 1889 and passed away on April 4, 1976. An employee of Bell Laboratories, Nyquist held more than 130 patents and published 12 scientific papers. As we also mentioned in our episode introducing the concept of noise, Nyquist did important work in the field of thermal noise. He also worked on the stability of systems, a topic we introduced in our episode on Laplace. Moreover, Nyquist's contributions to information theory and communication theory form the foundations of today's digital technologies. In this episode, in which we also touch on the work done at Bell Laboratories, scientists such as Claude Elwood Shannon, Ralph Hartley, Norbert Wiener, Lyapunov, Chebyshev, Kolmogorov, and Smirnov are also mentioned. Enjoy listening. #66. George Gamow ve Bilim Anlatıcılığı (Bilim Tarihi Serisi B1: I. Kısım) - 25/10/2020: https://youtu.be/qIARyX8p8lg #68. Bilim Tarihi Serimize Bir Önsöz (Bilim Tarihi Serisi B2) - 08/11/2020: https://youtu.be/FVUc5tfYi7I #70. George Gamow - Bilimde Doğu ve Batı Blokları (Bilim Tarihi Serisi B3: II. Kısım) - 22/11/2020: https://youtu.be/7k_IRL_B8WA #71. Michael Faraday (Bilim Tarihi Serisi B4) - 29/11/2020: https://youtu.be/OtEQ0pI-baI #73. Kümeler Kuramı'nın Önemi ve Tarihsel Gelişimi (Bilim Tarihi Serisi B5: I. Kısım): https://youtu.be/pSksJkWK6wU #76. Kümeler Kuramı'nın Etkileri (Bilim Tarihi Serisi B6: II. Kısım): https://youtu.be/gtpdAUaCgzw #77. Kümeler Kuramı ve Hesaplama (Bilim Tarihi Serisi B7: III. Kısım): https://youtu.be/TMt_rUbE4M4 #78. Kümeler Kuramı'nın Kuraltanımazları (Bilim Tarihi Serisi B8: IV. Kısım) - 17/01/2021: https://youtu.be/qHMdAjr4lQ0 #79. Kümeler Kuramı'nın Günümüzdeki Kullanımı (Bilim Tarihi Serisi B9: V. Kısım) - 24/01/2021: https://youtu.be/WoF5_A7nKQM #84. Gürültü Kavramına Giriş (Bilim Tarihi Serisi B10: I. Kısım) - 28/02/2021: https://youtu.be/4nCgno6XDVM #88. Pierre-Simon, Marquis de Laplace (Bilim Tarihi Serisi B11: I. Kısım) - 28/03/2021: https://youtu.be/-jRuE37K_M0 Tapir Lab. GitHub: @TapirLab, https://github.com/tapirlab/ Tapir Lab. Instagram: @tapirlab, https://www.instagram.com/tapirlab/ Tapir Lab. Twitter: @tapirlab, https://twitter.com/tapirlab Tapir Lab.: http://www.tapirlab.com

Math Thématique
Andreï Kolmogorov : un grand mathématicien au coeur d'un siècle tourmenté

Math Thématique

Play Episode Listen Later Feb 25, 2021 94:59


Andrei Kolmogorov was a Russian mathematician (1903-1987) who made striking contributions to probability theory, ergodic theory, turbulence, classical mechanics, mathematical logic, topology, algorithmic information theory, and the analysis of the complexity of algorithms. Alexander Bufetov, CNRS Research Director (I2M - Aix-Marseille Université, CNRS, Centrale Marseille) and local holder of the Jean-Morlet Chair (Tamara Grava Chair 2019 - semester 1), gives a lecture on the exceptional contributions and the dramatic life of a great genius of the 20th century. A lecture for the general public at CIRM Luminy. The video is available on the YouTube channel of the Centre International de Rencontres Mathématiques.

TapirCast
#79. Kümeler Kuramı'nın Günümüzdeki Kullanımı (Bilim Tarihi Serisi B9: V. Kısım) - 24/01/2021

TapirCast

Play Episode Listen Later Jan 24, 2021 17:56


In the fifth installment of the History of Science series, focused on set theory and featuring Assoc. Prof. Serhan Yarkan and Halil Said Cankurtaran, the discussion covers the present-day uses of set theory and its relationship with computation, probability, and topology. After touching on the P vs NP problem in computation, the episode dwells on Kolmogorov's axioms in probability and on how the foundations of the theory relate to other fields. Finally, the episode closes with a discussion of topology, a very good tool for expressing physical phenomena mathematically, its relationship with set theory, and its areas of use today. Enjoy listening. #73. Kümeler Kuramı'nın Önemi ve Tarihsel Gelişimi (Bilim Tarihi Serisi B5: I. Kısım): https://youtu.be/pSksJkWK6wU #76. Kümeler Kuramı'nın Etkileri (Bilim Tarihi Serisi B6: II. Kısım): https://youtu.be/gtpdAUaCgzw #77. Kümeler Kuramı ve Hesaplama (Bilim Tarihi Serisi B7: III. Kısım): https://youtu.be/TMt_rUbE4M4 #78. Kümeler Kuramı'nın Kuraltanımazları (Bilim Tarihi Serisi B8: IV. Kısım) - 17/01/2021: https://youtu.be/qHMdAjr4lQ0 Tapir Lab. GitHub: @TapirLab, https://www.github.com/tapirlab Tapir Lab. Instagram: @tapirlab, https://www.instagram.com/tapirlab/ Tapir Lab. Twitter: @tapirlab, https://www.twitter.com/tapirlab Tapir Lab.: http://www.tapirlab.com

Machine Learning Street Talk
#039 - Lena Voita - NLP

Machine Learning Street Talk

Play Episode Listen Later Jan 23, 2021 118:21


Lena Voita is a Ph.D. student at the University of Edinburgh and University of Amsterdam. Previously, she was a research scientist at Yandex Research and worked closely with the Yandex Translate team. She still teaches NLP at the Yandex School of Data Analysis. She has created an exciting new NLP course on her website lena-voita.github.io which you folks need to check out! She has one of the most well-presented blogs we have ever seen, where she discusses her research in an easily digestible manner. Lena has been investigating many fascinating topics in machine learning and NLP. Today we are going to talk about three of her papers and corresponding blog articles; Source and Target Contributions to NMT Predictions -- Where she talks about the influential dichotomy between the source and the prefix of neural translation models. https://arxiv.org/pdf/2010.10907.pdf https://lena-voita.github.io/posts/source_target_contributions_to_nmt.html Information-Theoretic Probing with MDL -- Where Lena proposes a technique of evaluating a model using the minimum description length or Kolmogorov complexity of labels given representations rather than something basic like accuracy https://arxiv.org/pdf/2003.12298.pdf https://lena-voita.github.io/posts/mdl_probes.html Evolution of Representations in the Transformer - Lena investigates the evolution of representations of individual tokens in Transformers -- trained with different training objectives (MT, LM, MLM) https://arxiv.org/abs/1909.01380 https://lena-voita.github.io/posts/emnlp19_evolution.html Panel Dr. Tim Scarfe, Yannic Kilcher, Sayak Paul 00:00:00 Kenneth Stanley / Greatness can not be planned housekeeping 00:21:09 Kilcher intro 00:28:54 Hello Lena 00:29:21 Tim - Lena's NMT paper 00:35:26 Tim - Minimum Description Length / Probe paper 00:40:12 Tim - Evolution of representations 00:46:40 Lena's NLP course 00:49:18 The peppermint tea situation 00:49:28 Main Show Kick Off 00:50:22 Hallucination vs exposure bias 00:53:04 Lena's focus on explaining the models not SOTA chasing 00:56:34 Probes paper and NLP interpretability 01:02:18 Why standard probing doesn't work 01:12:12 Evolution of representations paper 01:23:53 BERTScore and BERT Rediscovers the Classical NLP Pipeline paper 01:25:10 Is the shifting encoding context because of BERT bidirectionality 01:26:43 Objective defines which information we lose on input 01:27:59 How influential is the dataset? 01:29:42 Where is the community going wrong? 01:31:55 Thoughts on GOFAI/Understanding in NLP? 01:36:38 Lena's NLP course 01:47:40 How to foster better learning / understanding 01:52:17 Lena's toolset and languages 01:54:12 Mathematics is all you need 01:56:03 Programming languages https://lena-voita.github.io/ https://www.linkedin.com/in/elena-voita/ https://scholar.google.com/citations?user=EcN9o7kAAAAJ&hl=ja https://twitter.com/lena_voita
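To make the MDL probing idea concrete, here is a rough sketch of an online (prequential) code-length computation as I understand it from the description above; the synthetic "representations", the scikit-learn logistic probe, and the block schedule are stand-ins of mine, not Lena's actual setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic stand-in for "representations" X and binary linguistic labels y.
    n, d = 2000, 32
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)

    # Prequential coding: each block is encoded with a probe trained on everything before it.
    timesteps = [64, 128, 256, 512, 1024, 2000]
    codelength = timesteps[0] * 1.0          # first block coded with the uniform prior: 1 bit/label
    for start, end in zip(timesteps[:-1], timesteps[1:]):
        probe = LogisticRegression(max_iter=1000).fit(X[:start], y[:start])
        p = probe.predict_proba(X[start:end])
        # cost in bits of the true labels under the probe's predictive distribution
        codelength += -np.log2(p[np.arange(end - start), y[start:end]] + 1e-12).sum()

    print(f"prequential codelength: {codelength:.1f} bits "
          f"({codelength / n:.3f} bits/label vs 1 bit/label for random guessing)")
    # Lower codelength means the representations make the labels easier to describe,
    # which is the quantity used in place of raw probe accuracy.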

Josh on Narro
The Kolmogorov option

Josh on Narro

Play Episode Listen Later Nov 5, 2020 15:03


Andrey Nikolaevich Kolmogorov was one of the giants of 20th-century mathematics. I've always found it amazing that the same man was responsible both f... https://www.scottaaronson.com/blog/?p=3376&utm_source=Thinking+About+Things&utm_campaign=cf79e1519a-EMAIL_CAMPAIGN_9_1_2019_1_5_COPY_01&utm_medium=email&utm_term=0_33397823f0-cf79e1519a-412551669

Machine Learning Street Talk
AI Alignment & AGI Fire Alarm - Connor Leahy

Machine Learning Street Talk

Play Episode Listen Later Nov 1, 2020 124:35


This week Dr. Tim Scarfe, Alex Stenlake and Yannic Kilcher speak with AGI and AI alignment specialist Connor Leahy, a machine learning engineer from Aleph Alpha and founder of EleutherAI. Connor believes that AI alignment is philosophy with a deadline and that we are on the precipice; the stakes are astronomical. AI is important, and it will go wrong by default. Connor thinks that the singularity or intelligence explosion is near. Connor says that AGI is like climate change but worse: even harder problems, an even shorter deadline and even worse consequences for the future. These problems are hard, and nobody knows what to do about them. 00:00:00 Introduction to AI alignment and AGI fire alarm 00:15:16 Main Show Intro 00:18:38 Different schools of thought on AI safety 00:24:03 What is intelligence? 00:25:48 AI Alignment 00:27:39 Humans don't have a coherent utility function 00:28:13 Newcomb's paradox and advanced decision problems 00:34:01 Incentives and behavioural economics 00:37:19 Prisoner's dilemma 00:40:24 Ayn Rand and game theory in politics and business 00:44:04 Instrumental convergence and orthogonality thesis 00:46:14 Utility functions and the Stop button problem 00:55:24 AI corrigibility - self alignment 00:56:16 Decision theory and stability / wireheading / robust delegation 00:59:30 Stop button problem 01:00:40 Making the world a better place 01:03:43 Is intelligence a search problem? 01:04:39 Mesa optimisation / humans are misaligned AI 01:06:04 Inner vs outer alignment / faulty reward functions 01:07:31 Large corporations are intelligent and have no stop function 01:10:21 Dutch booking / what is rationality / decision theory 01:16:32 Understanding very powerful AIs 01:18:03 Kolmogorov complexity 01:19:52 GPT-3 - is it intelligent, are humans even intelligent? 01:28:40 Scaling hypothesis 01:29:30 Connor thought DL was dead in 2017 01:37:54 Why is GPT-3 as intelligent as a human 01:44:43 Jeff Hawkins on intelligence as compression and the great lookup table 01:50:28 AI ethics related to AI alignment? 01:53:26 Interpretability 01:56:27 Regulation 01:57:54 Intelligence explosion Discord: https://discord.com/invite/vtRgjbM EleutherAI: https://www.eleuther.ai Twitter: https://twitter.com/npcollapse LinkedIn: https://www.linkedin.com/in/connor-j-leahy/

PaperPlayer biorxiv bioinformatics
Predicting the Emergence of SARS-CoV-2 Clades

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Jul 27, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.26.222117v1?rss=1 Authors: Jain, S., Xiao, X., Bogdan, P., Bruck, J. Abstract: Evolution is a process of change where mutations in the viral RNA are selected based on their fitness for replication and survival. Given that current phylogenetic analysis of SARS-CoV-2 identifies new viral clades after they exhibit evolutionary selections, one wonders whether we can identify the viral selection and predict the emergence of new viral clades. Inspired by the Kolmogorov complexity concept, we propose a generative complexity (algorithmic) framework capable of analyzing viral RNA sequences by mapping the multiscale nucleotide dependencies onto a state machine, where states represent subsequences of nucleotides and state-transition probabilities encode the higher order interactions between these states. We apply computational learning and classification techniques to identify the active state-transitions and use those as features in clade classifiers to decipher the transient mutations (still evolving within a clade) and stable mutations (typical to a clade). As opposed to current analysis tools that rely on the edit distance between sequences and require sequence alignment, our method is computationally local, does not require sequence alignment and is robust to random errors (substitutions, insertions and deletions). Relying on the GISAID viral sequence database, we demonstrate that our method can predict clade emergence, potentially aiding with the design of medications and vaccines. Copy rights belong to original authors. Visit the link for more info
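A toy illustration of the kind of state-machine features the abstract describes, not the authors' pipeline (the k-mer order, the alphabet handling, and the example sequences are assumptions of mine): states are k-mers, and the feature vector is the flattened matrix of empirical transition probabilities between them, computed locally and without any alignment.

    from collections import Counter
    from itertools import product

    def transition_features(seq, k=2, alphabet="ACGT"):
        """Empirical P(next base | current k-mer), flattened into a fixed-length feature vector."""
        counts, totals = Counter(), Counter()
        for i in range(len(seq) - k):
            state, nxt = seq[i:i + k], seq[i + k]
            counts[(state, nxt)] += 1
            totals[state] += 1
        features = []
        for state in ("".join(p) for p in product(alphabet, repeat=k)):
            for nxt in alphabet:
                features.append(counts[(state, nxt)] / totals[state] if totals[state] else 0.0)
        return features   # length 4**k * 4, comparable across sequences of different lengths

    seq_a = "ATGCGTACGTTAGCATGCGT" * 5
    seq_b = "ATGCGTACGTTAGCATGCGA" * 5   # one substitution per repeat
    fa, fb = transition_features(seq_a), transition_features(seq_b)
    print(sum(abs(x - y) for x, y in zip(fa, fb)))   # small distance: local and alignment-free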

PaperPlayer biorxiv bioinformatics
Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Jul 23, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.22.216242v1?rss=1 Authors: Vitanyi, P. M. B., Cilibrasi, R. L. Abstract: We analyze the phylogeny and taxonomy of the SARS-CoV-2 virus using compression. This is a new alignment-free method called the "normalized compression distance" (NCD) method. It discovers all effective similarities based on Kolmogorov complexity. The latter being incomputable, we approximate it by a good compressor such as the modern zpaq. The results show that the SARS-CoV-2 virus is closest to the RaTG13 virus and similar to two bat SARS-like coronaviruses, bat-SL-CoVZXC21 and bat-SL-CoVZC4. The similarity is quantified and compared with the same quantified similarities among the mtDNA of certain species. We also treat the question of whether pangolins are involved in the SARS-CoV-2 virus. Copy rights belong to original authors. Visit the link for more info
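For readers who want to try the idea, here is a minimal sketch of the normalized compression distance; the formula is the standard one, but zlib is only a weak stand-in for the zpaq compressor the paper uses, and the toy sequences are my own.

    import random
    import zlib

    def C(data: bytes) -> int:
        # Compressed size in bytes; zlib stands in for a stronger compressor such as zpaq.
        return len(zlib.compress(data, 9))

    def ncd(x: bytes, y: bytes) -> float:
        # Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
        cx, cy, cxy = C(x), C(y), C(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    a = b"ATGCGTACGTTAGC" * 200
    b = b"ATGCGTACGTTAGC" * 190 + b"ATGCGAACGTTAGC" * 10          # a close relative of a
    c = bytes(random.Random(0).choices(b"ACGT", k=len(a)))         # an unrelated sequence

    print("ncd(a, b) =", round(ncd(a, b), 3))   # small: the two share most of their structure
    print("ncd(a, c) =", round(ncd(a, c), 3))   # near 1: almost nothing compresses jointly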

HilandoFino Daily
541. Complejidad de Kolmogorov e infografías de las redes sociales.

HilandoFino Daily

Play Episode Listen Later May 13, 2020 9:39


* Show notes: https://hilandofino.net/daily/541-complejidad-de-kolmogorov-e-infografias-de-las-redes-sociales/ * Email list: hilandofino.net/lista * Telegram: https://t.me/hilandofino * Instagram: @sebas_abril_faura If you want to support the podcast, buy through this (affiliate) link: https://amzn.to/33ycGEE

HilandoFino Daily
541. Complejidad de Kolmogorov e infografías de las redes sociales.

HilandoFino Daily

Play Episode Listen Later May 13, 2020 9:40


* Show notes: https://hilandofino.net/daily/541-complejidad-de-kolmogorov-e-infografias-de-las-redes-sociales/ * Email list: hilandofino.net/lista * Telegram: https://t.me/hilandofino * Instagram: @sebas_abril_faura If you want to support the podcast, buy through this (affiliate) link: https://amzn.to/33ycGEE

Sommerfeld Theory Colloquium (ASC)
Turbulence: scaling and beyond

Sommerfeld Theory Colloquium (ASC)

Play Episode Listen Later Apr 29, 2020 72:57


This lecture is aimed at a rather broad audience, including undergraduates. It will discuss scaling arguments, what they can achieve (Kolmogorov’s -5/3 law, etc.) and what they miss (fractals, etc.).
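For context, the -5/3 law mentioned above follows from the standard dimensional argument (a textbook reconstruction, not the lecturer's own derivation): in the inertial range the energy spectrum E(k) can depend only on the mean dissipation rate per unit mass and the wavenumber, and matching dimensions fixes the exponents.

    [E(k)] = L^3 T^{-2}, \qquad [\varepsilon] = L^2 T^{-3}, \qquad [k] = L^{-1}
    E(k) = C\,\varepsilon^{a} k^{b}
      \;\Rightarrow\; L^3 T^{-2} = (L^2 T^{-3})^{a}\,(L^{-1})^{b}
      \;\Rightarrow\; -3a = -2,\quad 2a - b = 3
      \;\Rightarrow\; a = \tfrac{2}{3},\; b = -\tfrac{5}{3}
    E(k) = C\,\varepsilon^{2/3} k^{-5/3}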

Lex Fridman Podcast
#75 – Marcus Hutter: Universal Artificial Intelligence, AIXI, and AGI

Lex Fridman Podcast

Play Episode Listen Later Feb 26, 2020 100:23


Marcus Hutter is a senior research scientist at DeepMind and professor at Australian National University. Throughout his career of research, including with Jürgen Schmidhuber and Shane Legg, he has proposed a lot of interesting ideas in and around the field of artificial general intelligence, including the development of the AIXI model which is a mathematical approach to AGI that incorporates ideas of Kolmogorov complexity, Solomonoff induction, and reinforcement learning. EPISODE LINKS: Hutter Prize: http://prize.hutter1.net Marcus web: http://www.hutter1.net Books mentioned: – Universal AI: https://amzn.to/2waIAuw – AI: A Modern Approach: https://amzn.to/3camxnY – Reinforcement Learning: https://amzn.to/2PoANj9 – Theory of Knowledge: https://amzn.to/3a6Vp7x This conversation

the bioinformatics chat
#33 Genome assembly from long reads and Flye with Mikhail Kolmogorov

the bioinformatics chat

Play Episode Listen Later May 31, 2019 72:56


Modern genome assembly projects are often based on long reads in an attempt to bridge longer repeats. However, due to the higher error rate of the current long read sequencers, assemblers based on de Bruijn graphs do not work well in this setting, and the approaches that do work are slower. In this episode, Mikhail Kolmogorov from Pavel Pevzner’s lab joins us to talk about some of the ideas developed in the lab that made it possible to build a de Bruijn-like assembly graph from noisy reads. These ideas are now implemented in the Flye assembler, which performs much faster than the existing long read assemblers without sacrificing the quality of the assembly. Links: Assembly of Long Error-Prone Reads Using Repeat Graphs (Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin, Pavel A. Pevzner), Nature Biotechnology (paywalled) / bioRxiv; Flye on GitHub
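As background for the discussion, here is a toy version of the classical exact-k-mer de Bruijn construction that the episode takes as its starting point (my own sketch; Flye itself builds a repeat graph from approximate matches, which this does not attempt).

    from collections import defaultdict

    def de_bruijn(reads, k=4):
        """Classical de Bruijn graph: nodes are (k-1)-mers, edges are k-mers seen in the reads."""
        graph = defaultdict(set)
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                graph[kmer[:-1]].add(kmer[1:])   # edge: prefix (k-1)-mer -> suffix (k-1)-mer
        return graph

    reads = ["ATGCGTAC", "GCGTACGT", "TACGTTAG"]   # overlapping error-free fragments
    for node, nexts in sorted(de_bruijn(reads).items()):
        print(node, "->", ", ".join(sorted(nexts)))
    # With noisy long reads this exact-k-mer construction shatters (most k-mers contain errors),
    # which is why approximate repeat structure is used instead.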

Theoretical Physics - From Outer Space to Plasma
Why the world is simple - Prof Ard Louis

Theoretical Physics - From Outer Space to Plasma

Play Episode Listen Later Feb 15, 2019 38:47


The coding theorem from algorithmic information theory (AIT) - which should be much more widely taught in Physics! - suggests that many processes in nature may be highly biased towards simple outputs. Here simple means highly compressible, or more formally, outputs with relatively lower Kolmogorov complexity. I will explore applications to biological evolution, where the coding theorem implies an exponential bias towards outcomes with higher symmetry, and to deep learning neural networks, where the coding theorem predicts an Occam's-razor-like bias that may explain why these highly overparameterised systems work so well.
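For reference, the coding theorem being invoked can be stated as follows (a standard formulation with the machine-dependent constant left implicit; U is a universal prefix machine and K is prefix Kolmogorov complexity):

    P(x) \;=\; \sum_{p\,:\,U(p)=x} 2^{-|p|} \;=\; 2^{-K(x) + O(1)}

so outputs x with low Kolmogorov complexity receive exponentially more probability mass under randomly chosen programs than complex outputs do, which is the simplicity bias the talk builds on.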

Modellansatz
Energy Markets

Modellansatz

Play Episode Listen Later Dec 21, 2018 65:00


Gudrun talks to Sema Coşkun, who at the moment of the conversation in 2018 is a postdoc researcher at the University of Kaiserslautern in the group of financial mathematics. She constructs models for the behaviour of energy markets. In short, the conversation covers the questions: How are classical markets modelled? In which way are energy markets different and in need of new ideas? The seminal work of Black and Scholes (1973) established the modern financial theory. In a Black-Scholes setting, it is assumed that the stock price follows a Geometric Brownian Motion with a constant drift and constant volatility. The stochastic differential equation for the stock price process has an explicit solution. Therefore, it is possible to obtain the price of a European call option in a closed-form formula. Nevertheless, there exist drawbacks of the Black-Scholes assumptions. The most criticized aspect is the constant volatility assumption. It is considered an oversimplification. Several improved models have been introduced to overcome those drawbacks. One significant example of such new models is the Heston stochastic volatility model (Heston, 1993). In this model, volatility is indirectly modeled by a separate mean reverting stochastic process, namely the Cox-Ingersoll-Ross (CIR) process. The CIR process captures the dynamics of the volatility process well. However, it is not easy to obtain option prices in the Heston model since the model has more complicated dynamics compared to the Black-Scholes model. In financial mathematics, one can use several methods to deal with these problems. In general, various stochastic processes are used to model the behavior of financial phenomena. One can then employ purely stochastic approaches by using the tools from stochastic calculus or probabilistic approaches by using the tools from probability theory. On the other hand, it is also possible to use Partial Differential Equations (the PDE approach). The correspondence between the stochastic problem and its related PDE representation is established with the help of the Feynman-Kac theorem. Also in their original paper, Black and Scholes transferred the stochastic representation of the problem into its corresponding PDE, the heat equation. After solving the heat equation, they transformed the solution back into the relevant option price. As a third type of method, one can employ numerical methods such as Monte Carlo methods. Monte Carlo methods are especially useful to compute the expected value of a random variable. Roughly speaking, instead of examining the probabilistic evolution of this random variable, we focus on the possible outcomes of it. One generates random numbers with the same distribution as the random variable and then we simulate possible outcomes by using those random numbers. Then we replace the expected value of the random variable by taking the arithmetic average of the possible outcomes obtained by the Monte Carlo simulation. The idea of Monte Carlo is simple. However, it takes its strength from two essential theorems, namely Kolmogorov’s strong law of large numbers, which ensures convergence of the estimates, and the central limit theorem, which describes the error distribution of our estimates. Electricity markets exhibit certain properties which we do not observe in other markets. Those properties are mainly due to the unique characteristics of the production and consumption of electricity. Most importantly, one cannot physically store electricity.
This leads to several differences compared to other financial markets. For example, we observe spikes in electricity prices. Spikes refer to sudden upward or downward jumps which are followed by a fast reversion to the mean level. Therefore, electricity prices show extreme variability compared to other commodities or stocks. For example, in stock markets we observe a moderate volatility level ranging between 1% and 1.5%, commodities like crude oil or natural gas have relatively high volatilities ranging between 1.5% and 4%, and finally electricity has up to 50% volatility (Weron, 2000). Moreover, electricity prices show strong seasonality which is related to day-to-day and month-to-month variations in the electricity consumption. In other words, electricity consumption varies depending on the day of the week and the month of the year. Another important property of the electricity prices is that they follow a mean reverting process. Thus, the Ornstein-Uhlenbeck (OU) process, which has a Gaussian distribution, is widely used to model electricity prices. In order to incorporate the spike behavior of the electricity prices, a jump or a Levy component is merged into the OU process. These models are known as generalized OU processes (Barndorff-Nielsen & Shephard, 2001; Benth, Kallsen & Meyer-Brandis, 2007). There exist several models to capture those properties of electricity prices. For example, structural models which are based on the equilibrium of supply and demand (Barlow, 2002), Markov jump diffusion models which combine the OU process with pure jump diffusions (Geman & Roncoroni, 2006), regime-switching models which aim to distinguish the base and spike regimes of the electricity prices and finally the multi-factor models which have a deterministic component for seasonality, a mean reverting process for the base signal and a jump or Levy process for spikes (Meyer-Brandis & Tankov, 2008). The German electricity market is one of the largest in Europe. The energy strategy of Germany follows the objective of phasing out the nuclear power plants by 2021 and gradually introducing renewable energy resources. For electricity production, the share of renewable resources will increase up to 80% by 2050. The introduction of renewable resources also brings some challenges for electricity trading. For example, the forecast errors regarding the electricity production might cause high risk for market participants. However, the developed market structure of Germany is designed to reduce this risk as much as possible. There are two main electricity spot price markets where the market participants can trade electricity. The first one is the day-ahead market in which the trading takes place around noon on the day before the delivery. In this market, the trades are based on auctions. The second one is the intraday market in which the trading starts at 3pm on the day before the delivery and continues up until 30 minutes before the delivery. The intraday market allows continuous trading of electricity which indeed helps the market participants to adjust their positions more precisely in the market by reducing the forecast errors. References S. Coskun and R. Korn: Pricing Barrier Options in the Heston Model Using the Heath-Platen estimator. Monte Carlo Methods and Applications. 24 (1) 29-42, 2018. S. Coskun: Application of the Heath–Platen Estimator in Pricing Barrier and Bond Options. PhD thesis, Department of Mathematics, University of Kaiserslautern, Germany, 2017. S. Desmettre and R.
Korn: 10 Computationally challenging problems in Finance. FPGA Based Accelerators for Financial Applications, Springer, Heidelberg, 1–32, 2015. F. Black and M. Scholes: The pricing of options and corporate liabilities. The Journal of Political Economy, 81(3):637-654, 1973. S.L. Heston: A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies, 6(2):327–343, 1993. R. Korn, E. Korn and G. Kroisandt: Monte Carlo Methods and Models in Finance and Insurance. Chapman & Hall/CRC Financ. Math. Ser., CRC Press, Boca Raton, 2010. P. Glasserman, Monte Carlo Methods in Financial Engineering. Stochastic Modelling and Applied Probability, Appl. Math. (New York) 53, Springer, New York, 2004. M.T. Barlow: A diffusion model for electricity prices. Mathematical Finance, 12(4):287-298, 2002. O.E. Barndorff-Nielsen and N. Shephard: Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society B, 63(2):167-241, 2001. H. Geman and A. Roncoroni: Understanding the fine structure of electricity prices. The Journal of Business, 79(3):1225-1261, 2006. T. Meyer-Brandis and P. Tankov: Multi-factor jump-diffusion models of electricity prices. International Journal of Theoretical and Applied Finance, 11(5):503-528, 2008. R. Weron: Energy price risk management. Physica A, 285(1-2):127–134, 2000. Podcasts G. Thäter, M. Hofmanová: Turbulence, conversation in the Modellansatz Podcast, episode 155, Department of Mathematics, Karlsruhe Institute of Technology (KIT), 2018. http://modellansatz.de/turbulence G. Thäter, M. J. Amtenbrink: Wasserstofftankstellen, Gespräch im Modellansatz Podcast, Folge 163, Fakultät für Mathematik, Karlsruher Institut für Technologie (KIT), 2018. http://modellansatz.de/wasserstofftankstellen S. Ajuvo, S. Ritterbusch: Finanzen damalsTM, Gespräch im Modellansatz Podcast, Folge 97, Fakultät für Mathematik, Karlsruher Institut für Technologie (KIT), 2016. http://modellansatz.de/finanzen-damalstm K. Cindric, G. Thäter: Kaufverhalten, Gespräch im Modellansatz Podcast, Folge 45, Fakultät für Mathematik, Karlsruher Institut für Technologie (KIT), 2015. http://modellansatz.de/kaufverhalten V. Riess, G. Thäter: Gasspeicher, Gespräch im Modellansatz Podcast, Folge 23, Fakultät für Mathematik, Karlsruher Institut für Technologie (KIT), 2015. http://modellansatz.de/gasspeicher F. Schueth, T. Pritlove: Energieforschung, Episode 12 im Forschergeist Podcast, Stifterverband/Metaebene, 2015. https://forschergeist.de/podcast/fg012-energieforschung/
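To make the modelling discussion concrete, here is a compact simulation of the kind of one-factor electricity spot model described above: a deterministic seasonal component, a mean-reverting Ornstein-Uhlenbeck base signal, and a quickly decaying compound-Poisson spike term. The parameter values and the Euler discretisation are illustrative choices of mine, not the ones used in the episode.

    import numpy as np

    rng = np.random.default_rng(42)

    # Daily grid for one year
    days, dt = 365, 1.0
    t = np.arange(days)

    # Deterministic seasonality (yearly + weekly pattern), in EUR/MWh
    seasonal = 40 + 5 * np.sin(2 * np.pi * t / 365) + 3 * np.sin(2 * np.pi * t / 7)

    # Mean-reverting OU base signal: dX = -theta * X dt + sigma dW
    theta, sigma = 0.15, 2.0
    X = np.zeros(days)
    for i in range(1, days):
        X[i] = X[i - 1] - theta * X[i - 1] * dt + sigma * np.sqrt(dt) * rng.normal()

    # Spike component: rare upward jumps that revert quickly (compound Poisson with fast decay)
    jump_prob, jump_scale, jump_decay = 0.02, 60.0, 0.5
    J = np.zeros(days)
    for i in range(1, days):
        J[i] = J[i - 1] * np.exp(-jump_decay * dt)
        if rng.random() < jump_prob:
            J[i] += jump_scale * rng.exponential()

    price = seasonal + X + J
    print(f"mean {price.mean():.1f}, std {price.std():.1f}, max spike {price.max():.1f} EUR/MWh")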

Slate Star Codex Podcast
KOLMOGOROV COMPLICITY AND THE PARABLE OF LIGHTNING

Slate Star Codex Podcast

Play Episode Listen Later Oct 24, 2017 26:19


A good scientist, in other words, does not merely ignore conventional wisdom, but makes a special effort to break it. Scientists go looking for trouble. — Paul Graham, What You Can’t Say I. Staying on the subject of Dark Age myths: what about all those scientists burned at the stake for their discoveries? Historical consensus declares this a myth invented by New Atheists. The Church was a great patron of science, no one believed in a flat earth, Galileo had it coming, et cetera. Unam Sanctam Catholicam presents some of these stories and explains why they’re less of a science-vs-religion slam dunk than generally supposed. Among my favorites:

Un texte, un mathématicien : conférences vidéo
Kolmogorov, le spectre de la turbulence par Isabelle Gallagher

Un texte, un mathématicien : conférences vidéo

Play Episode Listen Later Sep 28, 2015 58:00


A lecture series organised by the Bibliothèque nationale de France and the Société mathématique de France. Lecture of 15 April 2015. The lectures are aimed at the general public, secondary-school teachers, and high-school and university students; the speakers start from a text, or a corpus of texts, and show how it influenced them personally and led to contemporary research.

Un texte, un mathématicien
Kolmogorov, le spectre de la turbulence par Isabelle Gallagher

Un texte, un mathématicien

Play Episode Listen Later Aug 3, 2015 58:00


A lecture series organised by the Bibliothèque nationale de France and the Société mathématique de France. Lecture of 15 April 2015. The lectures are aimed at the general public, secondary-school teachers, and high-school and university students; the speakers start from a text, or a corpus of texts, and show how it influenced them personally and led to contemporary research.

Un texte, un mathématicien : conférences
Kolmogorov, le spectre de la turbulence par Isabelle Gallagher

Un texte, un mathématicien : conférences

Play Episode Listen Later Aug 3, 2015 58:00


A lecture series organised by the Bibliothèque nationale de France and the Société mathématique de France. Lecture of 15 April 2015. The lectures are aimed at the general public, secondary-school teachers, and high-school and university students; the speakers start from a text, or a corpus of texts, and show how it influenced them personally and led to contemporary research.

SynTalk
#TMOI (The Meanings Of Information) --- SynTalk

SynTalk

Play Episode Listen Later Jun 27, 2015 63:23


SynTalk thinks about information, while constantly wondering about its physical nature and computability. Is there information in the universe irrespective of human beings or life? Does all the meaning come from a protocol, and what if there is no shared language? Does a protocol or a context need to pre-exist? The concepts are derived from Laplace, Carnot, Boltzmann, Shannon, Ronald Fisher, Kolmogorov, T S Eliot, Warren Weaver, & Nørretranders, among others. We retrace the journey of the notion of information within (say) thermodynamics, electrical engineering, neurolinguistics, mathematics, and computational systems, & notice how the core departure was to think of it as measurable. Does the universe speak in one language? Does ignorance go down when information is received, and is ignorance analogous to disorder? Is entropy an anthropomorphic principle, as it assumes an underlying notion of order? How, in language, can the norm (order) be identified directly from a close study of deviations from the norm (disorder)? How might the brain, or any system, learn how to learn and negotiate meaning via ‘bootstrapping’? Is the nature of ‘input’ processing different from information processing as the neural networks are formed in a child’s brain? What makes data information for the receiver? Why does an internal combustion engine ‘have’ to dump out disorder via the exhaust in order to direct order to the wheels? Can one think of information content as an objective ‘event cone’ with past and future imprinted in it? Is all time eternally present? Is there a fundamental unit (say, bit or qubit) of information, & is it discrete or continuous or both? How & why are first and second language signals stored differently in the brain? What is the role played by shared context (exformation) and commonality in communication? Are there different mathematical theories of communication, information content and complexity? The links between wax, steam engines, Voyager, heads or tails, ‘motive power of fire’, the critical period hypothesis, It from Bit, falling stones, the case of Chelsea’s misdiagnosis, Four Quartets, ‘I do’, heat death, & Schrödinger’s cat. Can we forget something if we explicitly want to? Does nature forget (information)? Will we drown in the sheer amount of information in the future, or will we develop new tools to handle complexity? Do we need to understand the human mind & cognition better? Can we communicate with animals and (maybe) aliens in the future? ‘If a lion could speak, we could not understand him’. The SynTalkrs are: Prof. Vaishna Narang (biolinguistics, JNU, New Delhi), Prof. Rajaram Nityananda (astrophysics, Azim Premji University, ex NCRA-TIFR, Bangalore), & Prof. R. Ramanujam (computer science, IMSc, Chennai).
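One concrete anchor for the episode's question of whether information is measurable is Shannon's entropy. The following minimal sketch (purely illustrative, not from the episode) computes it in bits for a discrete distribution.

```python
# Minimal sketch (illustrative, not from the episode): Shannon entropy in bits.
import math

def shannon_entropy(probs):
    """H(p) = -sum p_i * log2(p_i); zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(shannon_entropy([0.99, 0.01]))  # ~0.08 bits: a nearly certain outcome carries little information
```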

Discrete Stochastic Processes
Lecture 24: Martingales: Stopping and Converging

Discrete Stochastic Processes

Play Episode Listen Later Jun 22, 2015 80:43


This lecture continues our conversation on Martingales and covers stopped martingales, Kolmogorov submartingale inequality, martingale convergence theorem, and more.
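For reference, the Kolmogorov submartingale inequality mentioned above has the following standard form (stated here only as background; the lecture's exact formulation and notation may differ): for a nonnegative submartingale $(X_k)_{k \le n}$ and any $\lambda > 0$,
\[
\Pr\Bigl(\max_{1 \le k \le n} X_k \ge \lambda\Bigr) \;\le\; \frac{\mathbb{E}[X_n]}{\lambda}.
\]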

The Ockham Lecture - The Merton College Physics Lecture
The 17th Ockham Lecture - 'Physics in the World of Ideas: Complexity as Energy'

The Ockham Lecture - The Merton College Physics Lecture

Play Episode Listen Later Jun 3, 2015 50:06


Given by Professor Yuri Manin, Professor Emeritus, Max Planck Institute for Mathematics, Bonn, Germany; Professor Emeritus, Northwestern University, Evanston, USA; Principal Researcher, Steklov Mathematical Institute, Academy of Sciences, Moscow, Russia. In the 1930s, George Kingsley Zipf discovered an empirical statistical law that later proved to be remarkably universal. Consider a corpus of texts in a given language, and make a list of all words that occur in them together with their numbers of occurrences. Rank the words in order of diminishing frequency. Define the Zipf rank of a word as its number in this ordering. Then Zipf's Law says: "Frequency is inversely proportional to the rank". Zipf himself suggested that this law must follow from a principle of 'minimisation of effort' by the brain. However, the nature of this effort and its measure remained mysterious. In my lecture, I will argue that Zipf's effort needed to produce a word (say, the name of a number) must be measured by the celebrated Kolmogorov complexity: the length of the shortest Turing program (input) needed to produce this word/name/combinatorial object/etc. as its output. I will describe basic properties of the complexity (some of them rather counterintuitive) and one more situation, from the theory of error-correcting codes, where Kolmogorov complexity again plays the role of 'energy in the world of ideas'.
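The abstract states Zipf's law only in words. As a toy illustration (not from the lecture), the sketch below builds the rank-frequency table for a tiny made-up corpus and prints rank times frequency, the product that Zipf's law predicts to be roughly constant for large natural-language corpora; the corpus and all names are assumptions for illustration.

```python
# Minimal sketch (illustrative, not from the lecture): empirical rank-frequency
# table in the spirit of Zipf's law, frequency(rank) ~ C / rank.
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog the fox the dog".split()
counts = Counter(corpus)

# Rank words by decreasing frequency: rank 1 is the most frequent word.
for rank, (word, freq) in enumerate(sorted(counts.items(), key=lambda kv: -kv[1]), start=1):
    # Under Zipf's law, rank * freq would be roughly constant for a large corpus.
    print(rank, word, freq, rank * freq)
```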

MCMP – Philosophy of Science
Occam's Razor in Algorithmic Information Theory

MCMP – Philosophy of Science

Play Episode Listen Later Feb 19, 2015 37:01


Tom Sterkenburg (Amsterdam/Groningen) gives a talk at the MCMP Colloquium (15 January, 2015) titled "Occam's Razor in Algorithmic Information Theory". Abstract: Algorithmic information theory, also known as Kolmogorov complexity, is sometimes believed to offer us a general and objective measure of simplicity. The first variant of this simplicity measure to appear in the literature was in fact part of a theory of prediction: the central achievement of its originator, R.J. Solomonoff, was the definition of an idealized method of prediction that is taken to implement Occam's razor in giving greater probability to simpler hypotheses about the future. Moreover, in many writings on the subject an argument of the following sort takes shape. From (1) the definition of the Solomonoff predictor which has a precise preference for simplicity, and (2) a formal proof that this predictor will generally lead us to the truth, it follows that (Occam's razor) a preference for simplicity will generally lead us to the truth. Thus, sensationally, this is an argument to justify Occam's razor. In this talk, I show why the argument fails. The key to its dissolution is a representation theorem that links Kolmogorov complexity to Bayesian prediction.
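The abstract does not display the definition itself. For orientation only (a common textbook form, not quoted from the talk), Solomonoff's prior over a universal monotone machine $U$ can be written as
\[
M(x) \;=\; \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-|p|},
\qquad
M(b \mid x) \;=\; \frac{M(xb)}{M(x)},
\]
so that shorter programs, i.e. simpler hypotheses, receive exponentially larger weight in the prediction.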

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02

This thesis is concerned with the generalisation of Bayesian inference towards the use of imprecise or interval probability, with a focus on model behaviour in case of prior-data conflict. Bayesian inference is one of the main approaches to statistical inference. It requires expressing (subjective) knowledge on the parameter(s) of interest not incorporated in the data by a so-called prior distribution. All inferences are then based on the so-called posterior distribution, the subsumption of prior knowledge and the information in the data calculated via Bayes' Rule. The adequate choice of priors has always been an intensive matter of debate in the Bayesian literature. While a considerable part of the literature is concerned with so-called non-informative priors aiming to eliminate (or, at least, to standardise) the influence of priors on posterior inferences, inclusion of specific prior information into the model may be necessary if data are scarce or do not contain much information about the parameter(s) of interest; also, shrinkage estimators, common in frequentist approaches, can be considered as Bayesian estimators based on informative priors.

When substantial information is used to elicit the prior distribution through, e.g., an expert's assessment, and the sample size is not large enough to eliminate the influence of the prior, prior-data conflict can occur: information from outlier-free data suggests parameter values which are surprising from the viewpoint of prior information, and it may not be clear whether the prior specifications or the integrity of the data collecting method (the measurement procedure could, e.g., be systematically biased) should be questioned. In any case, such a conflict should be reflected in the posterior, leading to very cautious inferences, and most statisticians would thus expect to observe, e.g., wider credibility intervals for parameters in case of prior-data conflict. However, at least when modelling is based on conjugate priors, prior-data conflict is in most cases completely averaged out, giving a false certainty in posterior inferences (a toy illustration of this insensitivity follows below). Here, imprecise or interval probability methods offer sound strategies to counter this issue, by representing parameter uncertainty through sets of priors and, correspondingly, sets of posteriors instead of single distributions. This approach is supported by recent research in economics, risk analysis and artificial intelligence, corroborating the multi-dimensional nature of uncertainty and concluding that standard probability theory as founded on Kolmogorov's or de Finetti's framework may be too restrictive, being appropriate only for describing one dimension, namely ideal stochastic phenomena.

The thesis studies how to efficiently describe sets of priors in the setting of samples from an exponential family. Models are developed that offer enough flexibility to express a wide range of (partial) prior information, give reasonably cautious inferences in case of prior-data conflict while resulting in more precise inferences when prior and data agree well, and still remain easily tractable in order to be useful for statistical practice. Applications in various areas, e.g. common-cause failure modelling and Bayesian linear regression, are explored, and the developed approach is compared to other imprecise probability models.
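As a toy illustration of the conflict-insensitivity described above (not taken from the thesis, which works with sets of priors for exponential family samples), the following sketch performs a standard conjugate Normal-Normal update with known data variance: the posterior variance comes out identical whether the data agree with the prior or conflict with it sharply. All numbers are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not from the thesis):
# conjugate Normal-Normal updating with known data variance.
import numpy as np

def posterior(prior_mean, prior_var, data, data_var):
    # Precision-weighted combination of prior and data information.
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / data_var)
    post_mean = post_var * (prior_mean / prior_var + data.sum() / data_var)
    return post_mean, post_var

rng = np.random.default_rng(1)
agree    = rng.normal(0.0, 1.0, size=10)   # data roughly matching the prior mean of 0
conflict = rng.normal(5.0, 1.0, size=10)   # data far from the prior mean of 0

print(posterior(0.0, 1.0, agree, 1.0))     # posterior mean near 0
print(posterior(0.0, 1.0, conflict, 1.0))  # mean pulled towards 5, but the SAME posterior variance
```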

Mathematics and Physics of Anderson Localization: 50 Years After
Kolmogorov turbulence, Anderson localization and KAM integrability

Mathematics and Physics of Anderson Localization: 50 Years After

Play Episode Listen Later Sep 24, 2012 45:04


Shepelyansky, D (Université Paul Sabatier Toulouse III) Wednesday 19 September 2012, 11:50-12:30

Cosmos et maths
Le système solaire est-il chaotique ?

Cosmos et maths

Play Episode Listen Later Mar 21, 2012 88:54


In the eighteenth century, Laplace believed he had proved the stability of the Solar System. In 1900, Poincaré discovered that it might, on the contrary, be chaotic. Half a century later, Kolmogorov showed that it has a chance of being stable... What do we know about this question today? Lecture by François Béguin, maître de conférences at the mathematics department of Orsay, Université Paris-Sud XI, and researcher at the Department of Mathematics and Applications at the École normale supérieure (ENS), Paris.

Fakultät für Physik - Digitale Hochschulschriften der LMU - Teil 02/05
Investigations of Faraday Rotation Maps of Extended Radio Sources in order to determine Cluster Magnetic Field Properties

Fakultät für Physik - Digitale Hochschulschriften der LMU - Teil 02/05

Play Episode Listen Later Nov 26, 2004


The aim of this thesis is the investigation of magnetic fields in the intergalactic gas of galaxy clusters by means of Faraday rotation maps of extragalactic radio sources located in or behind a galaxy cluster. Faraday rotation arises when linearly polarised radiation from such a source propagates through a magnetised medium, which rotates its plane of polarisation. Multi-frequency observations allow the construction of Faraday rotation maps. The statistical characterisation and analysis of these maps makes it possible to determine properties of the magnetic fields associated with the plasma in galaxy clusters. It was investigated whether there is evidence that the Faraday rotation is produced in material close to the source or in the magnetised plasma of the galaxy clusters. For this purpose, two statistical measures for characterising the data were introduced. Both measures are also valuable indicators of possible problems in the calculation of magnetic field properties on the basis of Faraday rotation measurements. The measures were applied to Faraday rotation measurements of extended radio sources. No indications were found that the Faraday rotation originates close to the sources. On the basis of independent evidence, it was concluded that the magnetic fields causing the Faraday rotation should be associated with the plasma in galaxy clusters.

A statistical analysis of Faraday rotation measurements by means of autocorrelation functions and, equivalently, energy spectra was developed in order to determine magnetic field strengths and correlation lengths. This analysis relies on the assumption that the magnetic fields are distributed statistically isotropically within the Faraday rotation region. It uses a so-called window function describing the sampling volume in which magnetic fields are detectable. The Faraday rotation maps of three extended radio sources (i.e. 3C75 in Abell 400, 3C465 in Abell 2634 and Hydra A in Abell 780) were re-analysed with this method, and magnetic field strengths of 1 to 10 μG were derived for these three galaxy clusters.

The measurement of magnetic energy spectra requires Faraday rotation maps of the highest quality. To avoid artefacts from the data reduction, a new algorithm, Pacman, for the calculation of Faraday rotation maps was developed. Various statistical tests show that this algorithm is stable and computes reliable Faraday rotation values. For the precise measurement of magnetic energy spectra from the Pacman maps, a maximum likelihood estimator based on the theory introduced before was developed. This new method makes it possible, for the first time, to state the statistical uncertainty of the result. Furthermore, it takes the limited sampling volume into account and enables the reliable determination of energy spectra. This maximum likelihood method was applied to Pacman Faraday rotation maps of Hydra A. Taking into account the uncertainty about the exact sampling geometry of the Faraday region, one obtains a magnetic field strength of 7 +/- 2 μG. The calculated energy spectrum follows a Kolmogorov-like spectrum over at least one order of magnitude. The magnetic energy is concentrated on a dominant scale of about 3 kpc.
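For orientation, the standard textbook relation behind such maps (stated here as background, not quoted from the thesis) is that the polarisation angle rotates in proportion to the squared wavelength, with a rotation measure set by the electron density and the line-of-sight field component:
\[
\Delta\chi \;=\; \mathrm{RM}\,\lambda^{2},
\qquad
\mathrm{RM} \;\approx\; 812 \int \frac{n_e}{\mathrm{cm}^{-3}}\,
\frac{B_{\parallel}}{\mu\mathrm{G}}\,
\frac{\mathrm{d}l}{\mathrm{kpc}}\;\;\mathrm{rad\,m^{-2}}.
\]
This is why the statistics of rotation measure maps constrain both the field strength and its correlation length in the cluster plasma.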

Fundación Juan March
Cuestiones de metamatemática (III): El problema metamatemático fundamental de la decisión

Fundación Juan March

Play Episode Listen Later May 18, 1976 66:05


Professor Florencio González Asenjo, professor of mathematics at the University of Pittsburgh, delivers the third lecture of the series entitled "Cuestiones de metamatemática" (Questions of Metamathematics). The lecture discussed the mechanisation of mathematics (Leibniz's celebrated dream) and Church's theorem, and described examples of decidable and undecidable systems, as well as Turing machines and the algorithms of Markov and Kolmogorov.