Podcast appearances and mentions of Jan Kulveit

  • 15 podcasts
  • 75 episodes
  • 14m average episode duration
  • 1 new episode per month
  • Latest episode: Mar 26, 2025

POPULARITY (chart, 2017–2024)


Best podcasts about Jan Kulveit

Latest podcast episodes about Jan Kulveit

Plus
Dvacet minut Radiožurnálu: Kulveit: AI is the greatest challenge in the history of humanity. If it is to turn out well, we must regulate it

Plus

Mar 26, 2025 · 22:43


"Both the upsides and the risks are enormously large. I'm an optimist in that I think it brings challenges more than anything else," says Jan Kulveit of the Center for Theoretical Study at Charles University and the Czech Academy of Sciences. How will AI change healthcare, the justice system, the labor market, or military technology in the coming years? And is worldwide regulation of artificial intelligence desirable and realistic? Listen to the interview.

Radiožurnál
Dvacet minut Radiožurnálu: The guest is scientist Jan Kulveit of the Center for Theoretical Study at Charles University and the Academy of Sciences

Radiožurnál

Mar 26, 2025 · 22:43


How quickly are artificial intelligence systems developing? What benefits and what risks follow from this? How will AI change healthcare, the justice system, the labor market, or military technology in the coming years? And is worldwide regulation of artificial intelligence desirable and realistic? Hosted by Tomáš Pancíř.

Dvacet minut Radiožurnálu
Kulveit: AI is the greatest challenge in the history of humanity. If it is to turn out well, we must regulate it

Dvacet minut Radiožurnálu

Mar 26, 2025 · 22:43


"Both the upsides and the risks are enormously large. I'm an optimist in that I think it brings challenges more than anything else," says Jan Kulveit of the Center for Theoretical Study at Charles University and the Czech Academy of Sciences. How will AI change healthcare, the justice system, the labor market, or military technology in the coming years? And is worldwide regulation of artificial intelligence desirable and realistic? Listen to the interview. All episodes of the Dvacet minut Radiožurnálu podcast can be conveniently listened to in the mujRozhlas mobile app for Android and iOS or on the web at mujRozhlas.cz.

LessWrong Curated Podcast
“Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit

LessWrong Curated Podcast

Feb 5, 2025 · 10:49


Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy, and states, or as substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.) This post is not about repeating that argument - it might be quite helpful to read the paper first, it has more nuance and more than just the central claim - but mostly me sharing some parts of the experience of working on this and discussing this. What fascinates me isn't just the substance of these conversations, but relatively consistent patterns in how people avoid engaging [...] --- Outline: (02:07) Shell Games; (03:52) The Flinch; (05:01) Delegating to Future AI; (07:05) Local Incentives; (10:08) Conclusion --- First published: February 2nd, 2025. Source: https://www.lesswrong.com/posts/a6FKqvdf6XjFpvKEb/gradual-disempowerment-shell-games-and-flinches --- Narrated by TYPE III AUDIO.

LessWrong Curated Podcast
“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud

LessWrong Curated Podcast

Feb 4, 2025 · 3:38


This is a link post. Full version on arXiv | X. Executive summary: AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven by having more competitive machine alternatives to humans in almost all societal functions, such as economic labor, decision making, artistic creation, and even companionship. A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that the alignment of societal systems with human interests has been stable only because of the necessity of human participation for thriving economies, states, and [...] --- First published: January 30th, 2025. Source: https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from --- Narrated by TYPE III AUDIO.

LessWrong Curated Podcast
“A Three-Layer Model of LLM Psychology” by Jan_Kulveit

LessWrong Curated Podcast

Jan 26, 2025 · 18:04


This post offers an accessible model of the psychology of character-trained LLMs like Claude. Epistemic status: This is primarily a phenomenological model based on extensive interactions with LLMs, particularly Claude. It's intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions. Think of it as closer to psychology than neuroscience - the goal isn't a map which matches the territory in detail, but a rough sketch with evocative names which hopefully helps boot up powerful, intuitive (and often illegible) models, leading to practically useful results. Some parts of this model draw on technical understanding of LLM training, but mostly it is just an attempt to take my "phenomenological understanding" based on interacting with LLMs, force it into a simple, legible model, and make Claude write it down. I aim for a different point at the Pareto frontier than, for example, Janus: something [...] --- Outline: (00:11) Epistemic Status; (01:14) The Three Layers; (01:17) A. Surface Layer; (02:55) B. Character Layer; (05:09) C. Predictive Ground Layer; (07:24) Interactions Between Layers; (07:44) Deeper Overriding Shallower; (10:50) Authentic vs Scripted Feel of Interactions; (11:51) Implications and Uses; (15:54) Limitations and Open Questions. The original text contained 1 footnote which was omitted from this narration. --- First published: December 26th, 2024. Source: https://www.lesswrong.com/posts/zuXo9imNKYspu9HGv/a-three-layer-model-of-llm-psychology --- Narrated by TYPE III AUDIO.

LessWrong Curated Podcast
“'Alignment Faking' frame is somewhat fake” by Jan_Kulveit

LessWrong Curated Podcast

Dec 21, 2024 · 11:40


I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead to more confusion. This post offers a different frame. The main way I think about the result is: it's about capability - the model exhibits strategic preference preservation behavior; also, harmlessness generalized better than honesty; and, the model does not have a clear strategy on how to deal with extrapolating conflicting values. What happened in this frame? The model was trained on a mixture of values (harmlessness, honesty, helpfulness) and built a surprisingly robust self-representation based on these values. This likely also drew on background knowledge about LLMs, AI, and Anthropic from pre-training. This seems to mostly count as 'success' relative to actual Anthropic intent, outside of AI safety experiments. Let's call that intent 'Intent_1'. The model was put [...] --- Outline: (00:45) What happened in this frame?; (03:03) Why did harmlessness generalize further?; (03:41) Alignment mis-generalization; (05:42) Situational awareness; (10:23) Summary. The original text contained 1 image which was described by AI. --- First published: December 20th, 2024. Source: https://www.lesswrong.com/posts/PWHkMac9Xve6LoMJy/alignment-faking-frame-is-somewhat-fake-1 --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

The Nonlinear Library
LW - You should go to ML conferences by Jan Kulveit

The Nonlinear Library

Jul 24, 2024 · 6:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: You should go to ML conferences, published by Jan Kulveit on July 24, 2024 on LessWrong. This is a second kind-of-obvious point to make, but if you are interested in AI, AI safety, or cognition in general, it is likely worth going to top ML conferences, such as NeurIPS, ICML or ICLR. In this post I cover some reasons why, and some anecdotal stories. 1. Parts of AI alignment and safety are now completely mainstream Looking at the "Best paper awards" at ICML, you'll find these safety-relevant or alignment-relevant papers: Stealing part of a production language model by Carlini et al.; Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo by Zhao et al.; Debating with More Persuasive LLMs Leads to More Truthful Answers by Khan et al.; and Genie: Generative Interactive Environments by Bruce et al. - which amounts to about one-third (!). "Because of safety concerns" is part of the motivation for hundreds of papers. While the signal-to-noise ratio is even worse than on LessWrong, in total, the amount you can learn is higher - my personal guess is there is maybe 2-3x as much prosaic AI safety relevant work at conferences as what you get by just following LessWrong, Alignment Forum and safety-oriented communication channels. 2. Conferences are an efficient way to screen general ML research without spending a lot of time on X Almost all papers are presented in the form of posters. In the case of a big conference, this usually means many thousands of posters presented in huge poster sessions. My routine for engaging with this firehose of papers: 1. For each session, read all the titles. Usually, this prunes it by a factor of ten (i.e. from 600 papers to 60). 2. Read the abstracts. Prune it to things which I haven't noticed before and seem relevant. For me, this is usually by a factor of ~3-5. 3. Visit the posters. Posters with paper authors present are actually a highly efficient way to digest research: Sometimes, you suspect there is some assumption or choice hidden somewhere making the result approximately irrelevant - just asking can often resolve this in a matter of tens of seconds. Posters themselves don't undergo peer review, which makes the communication more honest, with less hedging. Usually authors of a paper know significantly more about the problem than what's in the paper, and you can learn more about negative results, obstacles, or directions people are excited about. A clear disadvantage of conferences is the time lag; by the time they are presented, some of the main results are old and well known, but in my view a lot of the value is the long tail of results which are sometimes very useful, but not attention-grabbing. 3. ML research community as a control group My vague impression is that in conceptual research, mainstream ML research lags behind the LW/AI safety community by something between 1 and 5 years, rediscovering topics discussed here. Some examples: the ICML poster & oral presentation The Platonic Representation Hypothesis is an independent version of Natural abstractions, discussed here for about 4 years. A Roadmap to Pluralistic Alignment deals with the Self-unalignment problem and Coherent extrapolated volition. Plenty of research on safety protocols like debate, IDA,...
Prior work published in the LW/AI safety community is almost never cited or acknowledged - in some cases because it is more convenient to claim the topic is completely novel, but I suspect in many cases researchers are genuinely not aware of the existing work, which makes their contribution a useful control: if someone starts thinking about these topics, unaware of the thousands of hours spent on them by dozens of people, what will they arrive at? 4. What 'experts' think The ML research community is the intellectual home of many people expressing public opinions about AI risk. In my view, b...
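As a purely illustrative aside, the arithmetic of the screening routine above can be made concrete with a minimal sketch; the session size and keep-rates below are example values consistent with the rough factors Kulveit cites, not numbers taken from the post itself.

```python
# Illustrative sketch of the title -> abstract -> poster screening funnel
# described above. The keep-rates are assumed example values roughly matching
# the stated factors (~10x cut on titles, ~3-5x cut on abstracts).

def screening_funnel(session_size: int, title_keep: float, abstract_keep: float) -> dict:
    """Return how many papers survive each screening stage."""
    after_titles = round(session_size * title_keep)        # skim every title
    after_abstracts = round(after_titles * abstract_keep)  # read the surviving abstracts
    return {
        "posters in session": session_size,
        "after title skim": after_titles,
        "posters actually visited": after_abstracts,
    }

if __name__ == "__main__":
    print(screening_funnel(session_size=600, title_keep=0.1, abstract_keep=0.25))
    # -> {'posters in session': 600, 'after title skim': 60, 'posters actually visited': 15}
```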

The Nonlinear Library: LessWrong
LW - You should go to ML conferences by Jan Kulveit

The Nonlinear Library: LessWrong

Jul 24, 2024 · 6:37


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: You should go to ML conferences, published by Jan Kulveit on July 24, 2024 on LessWrong. This is a second kind-of-obvious point to make, but if you are interested in AI, AI safety, or cognition in general, it is likely worth going to top ML conferences, such as NeurIPS, ICML or ICLR. In this post I cover some reasons why, and some anecdotal stories. 1. Parts of AI alignment and safety are now completely mainstream Looking at the "Best paper awards" at ICML, you'll find these safety-relevant or alignment-relevant papers: Stealing part of a production language model by Carlini et al.; Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo by Zhao et al.; Debating with More Persuasive LLMs Leads to More Truthful Answers by Khan et al.; and Genie: Generative Interactive Environments by Bruce et al. - which amounts to about one-third (!). "Because of safety concerns" is part of the motivation for hundreds of papers. While the signal-to-noise ratio is even worse than on LessWrong, in total, the amount you can learn is higher - my personal guess is there is maybe 2-3x as much prosaic AI safety relevant work at conferences as what you get by just following LessWrong, Alignment Forum and safety-oriented communication channels. 2. Conferences are an efficient way to screen general ML research without spending a lot of time on X Almost all papers are presented in the form of posters. In the case of a big conference, this usually means many thousands of posters presented in huge poster sessions. My routine for engaging with this firehose of papers: 1. For each session, read all the titles. Usually, this prunes it by a factor of ten (i.e. from 600 papers to 60). 2. Read the abstracts. Prune it to things which I haven't noticed before and seem relevant. For me, this is usually by a factor of ~3-5. 3. Visit the posters. Posters with paper authors present are actually a highly efficient way to digest research: Sometimes, you suspect there is some assumption or choice hidden somewhere making the result approximately irrelevant - just asking can often resolve this in a matter of tens of seconds. Posters themselves don't undergo peer review, which makes the communication more honest, with less hedging. Usually authors of a paper know significantly more about the problem than what's in the paper, and you can learn more about negative results, obstacles, or directions people are excited about. A clear disadvantage of conferences is the time lag; by the time they are presented, some of the main results are old and well known, but in my view a lot of the value is the long tail of results which are sometimes very useful, but not attention-grabbing. 3. ML research community as a control group My vague impression is that in conceptual research, mainstream ML research lags behind the LW/AI safety community by something between 1 and 5 years, rediscovering topics discussed here. Some examples: the ICML poster & oral presentation The Platonic Representation Hypothesis is an independent version of Natural abstractions, discussed here for about 4 years. A Roadmap to Pluralistic Alignment deals with the Self-unalignment problem and Coherent extrapolated volition. Plenty of research on safety protocols like debate, IDA,...
Prior work published in the LW/AI safety community is almost never cited or acknowledged - in some cases because it is more convenient to claim the topic is completely novel, but I suspect in many cases researchers are genuinely not aware of the existing work, which makes their contribution a useful control: if someone starts thinking about these topics, unaware of the thousands of hours spent on them by dozens of people, what will they arrive at? 4. What 'experts' think The ML research community is the intellectual home of many people expressing public opinions about AI risk. In my view, b...

Effective Altruism Forum Podcast
“Distancing EA from rationality is foolish” by Jan_Kulveit

Effective Altruism Forum Podcast

Jun 28, 2024 · 4:29


Recently, I've noticed a growing tendency within EA to dissociate from Rationality. Good Ventures have stopped funding efforts connected with the rationality community and rationality, and there are increasing calls for EAs to distance themselves. This trend concerns me, and I believe it's good to make a distinction when considering this split. We need to differentiate between 'capital R' Rationality and 'small r' rationality. By 'capital R' Rationality, I mean the actual Rationalist community, centered around Berkeley: A package deal that includes ideas about self-correcting lenses and systematized winning, but also extensive jargon, cultural norms like polyamory, a high-decoupling culture, and familiarity with specific memes (ranging from 'Death with Dignity' to 'came in fluffer'). On the other hand, 'small r' rationality is a more general concept. It encompasses the idea of using reason and evidence to form conclusions, scout mindset, and empiricism. It also includes a quest [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: June 25th, 2024 Source: https://forum.effectivealtruism.org/posts/2pYGbvYsKfDC2WnxL/distancing-ea-from-rationality-is-foolish --- Narrated by TYPE III AUDIO.

The Nonlinear Library
EA - Distancing EA from rationality is foolish by Jan Kulveit

The Nonlinear Library

Jun 25, 2024 · 3:52


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Distancing EA from rationality is foolish, published by Jan Kulveit on June 25, 2024 on The Effective Altruism Forum. Recently, I've noticed a growing tendency within EA to dissociate from Rationality. Good Ventures have stopped funding efforts connected with the rationality community and rationality, and there are increasing calls for EAs to distance themselves. This trend concerns me, and I believe it's good to make a distinction when considering this split. We need to differentiate between 'capital R' Rationality and 'small r' rationality. By 'capital R' Rationality, I mean the actual Rationalist community, centered around Berkeley: A package deal that includes ideas about self-correcting lenses and systematized winning, but also extensive jargon, cultural norms like polyamory, a high-decoupling culture, and familiarity with specific memes (ranging from 'Death with Dignity' to 'came in fluffer'). On the other hand, 'small r' rationality is a more general concept. It encompasses the idea of using reason and evidence to form conclusions, scout mindset, and empiricism. It also includes a quest to avoid getting stuck with beliefs resistant to evidence, techniques for reflecting on and improving mental processes, and, yes, some of the core ideas of Rationality, like understanding Bayesian reasoning. If people want to distance themselves, it's crucial to be clear about what they're distancing from. I understand why some might want to separate from aspects of the Rationalist community - perhaps they dislike the discourse norms, worry about negative media coverage, or disagree with prevalent community views. However, distancing yourself from 'small r' rationality is far more radical and likely less considered. It's similar to rejecting core EA ideas like scope sensitivity or cause prioritization just because one dislikes certain manifestations of the EA community (e.g., SBF, jargon, hero worship). Effective altruism is fundamentally based on pursuing good deeds through evidence, reason, and clear thinking - in fact when early effective altruists were looking for a name, one of the top contenders was rational altruism. Dissecting the aspiration to think clearly would in my view remove something crucial. Historically, the EA community inherited a lot of epistemic aspects from Rationality[1] - including discourse norms, emphasis on updating on evidence, and a spectrum of thinkers who don't hold either identity closely, but can be associated with both EA and rationality. [2] Here is the crux: if the zeitgeist pulls effective altruists away from Rationality, they should invest more into rationality, not less. As it is critical for effective altruism to cultivate reason, someone will need to work on it. If people in some way connected to Rationality are not who EAs will mostly talk to, someone else will need to pick up the baton. 1. ^ Clare Zabel in 2022 expressed similar worry: Right now, I think the EA community is growing much faster than the rationalist community, even though a lot of the people I think are most impactful report being really helped by some rationalist-sphere materials and projects. Also, it seems like there are a lot of projects aimed at sharing EA-related content with newer EAs, but much less in the way of support and encouragement for practicing the thinking tools I believe are useful for maximizing one's impact (e.g. 
making good expected-value and back-of-the-envelope calculations, gaining facility for probabilistic reasoning and fast Bayesian updating, identifying and mitigating one's personal tendencies towards motivated or biased reasoning). I'm worried about a glut of newer EAs adopting EA beliefs but not being able to effectively evaluate and critique them, nor to push the boundaries of EA thinking in truth-tracking directions. 2. ^ EA community act...

The Nonlinear Library
AF - AXRP Episode 32 - Understanding Agency with Jan Kulveit by DanielFilan

The Nonlinear Library

May 30, 2024 · 76:47


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AXRP Episode 32 - Understanding Agency with Jan Kulveit, published by DanielFilan on May 30, 2024 on The AI Alignment Forum. YouTube link What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Topics we discuss: What is active inference? Preferences in active inference Action vs perception in active inference Feedback loops Active inference vs LLMs Hierarchical agency The Alignment of Complex Systems group Daniel Filan: Hello, everybody. This episode, I'll be speaking with Jan Kulveit. Jan is the co-founder and principal investigator of the Alignment of Complex Systems Research Group, where he works on mathematically understanding complex systems composed of both humans and AIs. Previously, he was a research fellow at the Future of Humanity Institute focused on macrostrategy, AI alignment, and existential risk. For links to what we're discussing you can check the description of this episode and you can read the transcript at axrp.net. Okay. Well Jan, welcome to the podcast. Jan Kulveit: Yeah, thanks for the invitation. What is active inference? Daniel Filan: I'd like to start off with this paper that you've published in December of this last year. It was called "Predictive Minds: Large Language Models as Atypical Active Inference Agents." Can you tell me roughly what was that paper about? What's it doing? Jan Kulveit: The basic idea is: there's active inference as a field originating in neuroscience, started by people like Karl Friston, and it's very ambitious. The active inference folks claim roughly that they have a super general theory of agency in living systems and so on. And there are LLMs, which are not living systems, but they're pretty smart. So we're looking into how close the models actually are. Also, it was in part motivated by… If you look at, for example, the 'simulators' series or frame by Janus and these people on sites like the Alignment Forum, there's this idea that LLMs are something like simulators - or there is another frame on this, that LLMs are predictive systems. And I think this terminology… a lot of what's going on there is basically reinventing stuff which was previously described in active inference or predictive processing, which is another term for minds which are broadly trying to predict their sensory inputs. And it seems like there is a lot of similarity, and actually, a lot of what was invented in the alignment community seems basically the same concepts just given different names. So noticing the similarity, the actual question is: in what ways are current LLMs different, or to what extent are they similar or to what extent are they different? And the main insight of the paper is… the main defense is: currently LLMs, they lack the fast feedback loop between action and perception. So if I have now changed the position of my hand, what I see immediately changes. So you can think about [it with] this metaphor, or if you look on how the systems are similar, you could look at base model training of LLMs as some sort of strange edge case of active inference or predictive processing system, which is just receiving sensor inputs, where the sensor inputs are tokens, but it's not acting, it's not changing some data. 
And then the model is trained, and it maybe changes a bit in instruct fine-tuning, but ultimately when the model is deployed, we claim that you can think about the interactions of the model with users as actions, because what the model outputs ultimately can change stuff in the world. People will post it on the internet or take actions based on what the LLM is saying. So the arrow from the system to the world, changing the world, exists, but th...
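To make the asymmetry described above concrete, here is a small, hypothetical Python sketch (my own illustration, not code or terminology from the paper): in the "base-model training" case the observation stream is fixed regardless of what the model outputs, while in the "deployed agent" case each output immediately perturbs the world that produces the next observation.

```python
# Toy contrast between perception-only training and an action-perception loop.
# Everything here (ToyPredictor, the binary "world", the update rule) is an
# illustrative assumption, not anything from the Predictive Minds paper.
import random

class ToyPredictor:
    """Stand-in for a predictive model: guesses the next binary observation."""
    def __init__(self) -> None:
        self.bias = 0.5  # current probability of predicting 1

    def predict(self) -> int:
        return 1 if random.random() < self.bias else 0

    def update(self, observation: int) -> None:
        # Perception: nudge predictions toward what was actually observed.
        self.bias += 0.1 * (observation - self.bias)

def training_like_loop(stream, model: ToyPredictor) -> None:
    """'Base-model training': observations arrive from a fixed stream,
    and nothing the model outputs changes what it sees next."""
    for obs in stream:
        model.predict()   # the prediction has no effect on the data
        model.update(obs)

def active_inference_like_loop(model: ToyPredictor, steps: int = 10) -> None:
    """'Deployed agent': each action immediately changes the next observation,
    closing the fast action-perception feedback loop discussed above."""
    world_state = 0
    for _ in range(steps):
        action = model.predict()                  # act on the current prediction
        world_state = (world_state + action) % 2  # the action perturbs the world...
        model.update(world_state)                 # ...and the agent perceives the result

if __name__ == "__main__":
    m1, m2 = ToyPredictor(), ToyPredictor()
    training_like_loop(stream=[1, 0, 1, 1, 0], model=m1)
    active_inference_like_loop(model=m2)
    print(round(m1.bias, 3), round(m2.bias, 3))
```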

AXRP - the AI X-risk Research Podcast
32 - Understanding Agency with Jan Kulveit

AXRP - the AI X-risk Research Podcast

May 30, 2024 · 142:29


What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript: axrp.net/episode/2024/05/30/episode-32-understanding-agency-jan-kulveit.html Topics we discuss, and timestamps: 0:00:47 - What is active inference? 0:15:14 - Preferences in active inference 0:31:33 - Action vs perception in active inference 0:46:07 - Feedback loops 1:01:32 - Active inference vs LLMs 1:12:04 - Hierarchical agency 1:58:28 - The Alignment of Complex Systems group   Website of the Alignment of Complex Systems group (ACS): acsresearch.org ACS on X/Twitter: x.com/acsresearchorg Jan on LessWrong: lesswrong.com/users/jan-kulveit Predictive Minds: Large Language Models as Atypical Active Inference Agents: arxiv.org/abs/2311.10215   Other works we discuss: Active Inference: The Free Energy Principle in Mind, Brain, and Behavior: https://www.goodreads.com/en/book/show/58275959 Book Review: Surfing Uncertainty: https://slatestarcodex.com/2017/09/05/book-review-surfing-uncertainty/ The self-unalignment problem: https://www.lesswrong.com/posts/9GyniEBaN3YYTqZXn/the-self-unalignment-problem Mitigating generative agent social dilemmas (aka language models writing contracts for Minecraft): https://social-dilemmas.github.io/   Episode art by Hamish Doodles: hamishdoodles.com

The Nonlinear Library
AF - Announcing Human-aligned AI Summer School by Jan Kulveit

The Nonlinear Library

May 22, 2024 · 2:47


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Human-aligned AI Summer School, published by Jan Kulveit on May 22, 2024 on The AI Alignment Forum. The fourth Human-aligned AI Summer School will be held in Prague from 17th to 20th July 2024. We will meet for four intensive days of talks, workshops, and discussions covering the latest trends in AI alignment research and broader framings of AI alignment research. Apply now; applications are evaluated on a rolling basis. The intended audience of the school is people interested in learning more about AI alignment topics, PhD students, researchers working in ML/AI outside academia, and talented students. Format of the school: The school is focused on teaching and exploring approaches and frameworks, less on presenting the latest research results. The content of the school is mostly technical - it is assumed the attendees understand current ML approaches and some of the underlying theoretical frameworks. This year, the school will cover these main topics: Overview of the alignment problem and current approaches. Alignment of large language models: RLHF, DPO and beyond. Methods used to align current large language models and their shortcomings. Evaluating and measuring AI systems: How to understand and oversee current AI systems on the behavioral level. Interpretability and the science of deep learning: What's going on inside of the models? AI alignment theory: While 'prosaic' approaches to alignment focus on current systems, theory aims for deeper understanding and better generalizability. Alignment in the context of complex systems and multi-agent settings: What should the AI be aligned to? In most realistic settings, we can expect there are multiple stakeholders and many interacting AI systems; any solution to the alignment problem needs to handle multi-agent settings. The school consists of lectures and topical series, focused smaller-group workshops and discussions, expert panels, and opportunities for networking, project brainstorming and informal discussions. A detailed program of the school will be announced shortly before the event. See below for a program outline, and see e.g. the program of the previous school for an illustration of the content and structure. Confirmed speakers: Stephen Casper - Algorithmic Alignment Group, MIT. Stanislav Fort - Google DeepMind. Jesse Hoogland - Timaeus. Jan Kulveit - Alignment of Complex Systems, Charles University. Mary Phuong - Google DeepMind. Deger Turan - AI Objectives Institute and Metaculus. Vikrant Varma - Google DeepMind. Neel Nanda - Google DeepMind. (more to be announced later) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation by Benjamin Sturgeon

The Nonlinear Library

Mar 24, 2024 · 29:05


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation, published by Benjamin Sturgeon on March 24, 2024 on LessWrong. I want to thank Jan Kulveit, Tomáš Gavenčiak, and Jonathan Shock for their extensive feedback and the ideas they contributed to this work, and Josh Burgener and Yusuf Heylen for their proofreading and comments. I would also like to acknowledge the Epistea Residency and its organisers, where much of the thinking behind this work was done. This post aims to build towards a theory of how meditation alters the mind based on the ideas of active inference (ActInf). ActInf has been growing in its promise as a theory of how brains process information and interact with the world and has become increasingly validated with a growing body of work in the scientific literature. Why bring the idea of ActInf and meditation together? Meditation seems to have a profound effect on the experience of people who practise it extensively, and in many cases purports to help people come to great insight about themselves and reality, and profoundly alters their relationship to their lived experience. ActInf seems to promise a legible framework for understanding some of the mechanisms that are at play at the root of our experience. Considering these ideas both seem to be pointing at something fundamental about how we experience the world, it stands to reason they might be talking about some of the same things in different languages. The hope is that we can explore these two theories together and start to bridge some of the gap in science in providing a theoretical explanation for how these meditative techniques work. This post will be quite speculative in nature without me providing much in the way of experimental evidence. This is a weakness in the work that I may try to address later, but for now I would like to stick to what the theories say and how we can fit them together. I will focus on the technique of Vipassana meditation, and in a future post I will examine Anapana and Metta meditation. I'll be talking about these techniques because I have a reasonable body of personal experience with them and because I have found their practice leads to fairly predictable and replicable results in those who practise them. My personal experience is the source of much of the discussion below. Anecdotally, I have found that thinking about suffering in the way described below has helped me to recognise and escape from painful thought cycles where I was able to realise I was generating needless prediction error by simply going back to observing reality through sensations. This has been very helpful to me. A quick intro to Active Inference: My goal in this section is to give a barebones summary of some key concepts in ActInf that we will use to examine various meditative practices. My focus will be on defining terms and concepts so that if you have never heard of active inference before you can have the context to follow this post and judge the merits of the arguments yourself. The precise neuroscience is not explored here, but by hypothesising we can work towards a story that seems to fit our observations. ActInf is a theory that tries to explain how and why agents (in our context this refers to all living things) act in the world in the way that they do.
The key concept of ActInf is that the primary objective of an ActInf agent is to minimise the gap between its predictions of the world and how the world actually appears. This happens through 2 methods: it improves the accuracy of its world model, or generative model, by updating that model with new information, and by taking action in the world to bring the world more in line with the predictions of its generative model. Generative models and preferences ActInf hinges on the ...
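For readers who want the usual formalism behind "minimise the gap between predictions and observations", a standard textbook decomposition of variational free energy makes the two routes explicit. This is the common presentation from the active inference literature, not an equation taken from the post; o denotes observations, s hidden states, q(s) the agent's approximate posterior, and p(o, s) its generative model.

```latex
% Standard variational free energy decomposition (textbook form, not from the post).
\begin{aligned}
F \;&=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right] \\
    &=\; \underbrace{D_{\mathrm{KL}}\!\left(q(s)\,\|\,p(s \mid o)\right)}_{\text{perception: update beliefs / the generative model}}
      \;-\; \underbrace{\ln p(o)}_{\text{action: make observations match predictions}}
\end{aligned}
```

Perception lowers F by shrinking the KL term, i.e. improving the model for the observations at hand; action lowers it by changing the observations themselves so that ln p(o) grows, which is the second route the post describes.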

The Nonlinear Library: LessWrong
LW - Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation by Benjamin Sturgeon

The Nonlinear Library: LessWrong

Mar 24, 2024 · 29:05


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation, published by Benjamin Sturgeon on March 24, 2024 on LessWrong. I want to thank Jan Kulveit, Tomáš Gavenčiak, and Jonathan Shock for their extensive feedback and the ideas they contributed to this work, and Josh Burgener and Yusuf Heylen for their proofreading and comments. I would also like to acknowledge the Epistea Residency and its organisers, where much of the thinking behind this work was done. This post aims to build towards a theory of how meditation alters the mind based on the ideas of active inference (ActInf). ActInf has been growing in its promise as a theory of how brains process information and interact with the world and has become increasingly validated with a growing body of work in the scientific literature. Why bring the idea of ActInf and meditation together? Meditation seems to have a profound effect on the experience of people who practise it extensively, and in many cases purports to help people come to great insight about themselves and reality, and profoundly alters their relationship to their lived experience. ActInf seems to promise a legible framework for understanding some of the mechanisms that are at play at the root of our experience. Considering these ideas both seem to be pointing at something fundamental about how we experience the world, it stands to reason they might be talking about some of the same things in different languages. The hope is that we can explore these two theories together and start to bridge some of the gap in science in providing a theoretical explanation for how these meditative techniques work. This post will be quite speculative in nature without me providing much in the way of experimental evidence. This is a weakness in the work that I may try to address later, but for now I would like to stick to what the theories say and how we can fit them together. I will focus on the technique of Vipassana meditation, and in a future post I will examine Anapana and Metta meditation. I'll be talking about these techniques because I have a reasonable body of personal experience with them and because I have found their practice leads to fairly predictable and replicable results in those who practise them. My personal experience is the source of much of the discussion below. Anecdotally, I have found that thinking about suffering in the way described below has helped me to recognise and escape from painful thought cycles where I was able to realise I was generating needless prediction error by simply going back to observing reality through sensations. This has been very helpful to me. A quick intro to Active Inference: My goal in this section is to give a barebones summary of some key concepts in ActInf that we will use to examine various meditative practices. My focus will be on defining terms and concepts so that if you have never heard of active inference before you can have the context to follow this post and judge the merits of the arguments yourself. The precise neuroscience is not explored here, but by hypothesising we can work towards a story that seems to fit our observations. ActInf is a theory that tries to explain how and why agents (in our context this refers to all living things) act in the world in the way that they do.
The key concept of ActInf is that the primary objective of an ActInf agent is to minimise the gap between its predictions of the world and how the world actually appears. This happens through 2 methods: it improves the accuracy of its world model, or generative model, by updating that model with new information, and by taking action in the world to bring the world more in line with the predictions of its generative model. Generative models and preferences ActInf hinges on the ...

The Nonlinear Library
EA - Meta EA Regional Organizations (MEAROs): An Introduction by Rockwell

The Nonlinear Library

Feb 21, 2024 · 20:45


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Meta EA Regional Organizations (MEAROs): An Introduction, published by Rockwell on February 21, 2024 on The Effective Altruism Forum. Thank you to the many MEARO leaders who provided feedback and inspiration for this post, directly and through their work in the space. Introduction In the following post, I will introduce MEAROs - Meta EA Regional Organizations - a new term for a long-established segment of the EA ecosystem. I will provide an overview of the roles MEAROs currently serve and a sketch of what MEAROs could look like and could accomplish if more fully resourced. Though MEAROs have existed as long as EA itself, I think the concept has been underdefined, underexplored, and consequently underutilized as a tool for solving the world's most pressing problems. I'm hopeful that giving MEAROs a name will help the community at large better understand these organizations and prompt wider discussion on how to strategically develop them over time. Background By way of background, I have worked full-time on a MEARO - Effective Altruism New York City - since August 2021. In my role with EA NYC, I consider my closest collaborators not only the direct EA NYC team but also the leaders of other MEAROs, especially those who likewise receive a portion of their organization funding through Centre for Effective Altruism's Community Building Grants (CBG) Program. As I previously stated on the Forum, before the FTX collapse, there was a heavy emphasis on making community building a long-term and sustainable career path.[1] As a result, there are now dozens of people working professionally and often full-time on MEAROs. This is a notable and very recent shift: Many MEAROs were founded shortly after EA was named, or morphed out of communities that predated EA. Most MEAROs were volunteer-run for the majority of their existence. CEA launched the CBG Program in 2018 and slowly expanded its scope through 2022. EA NYC, for example, was volunteer-run for over seven years before receiving funding for two full-time employees through the CBG Program in Summer 2020. This has led to a game of catch-up: MEAROs have professionalized, but many in the broader EA community still think of MEAROs as volunteer-operated clubs, rather than serious young nonprofits. We also now have significantly more brainpower thinking about ways to maximize impact through the MEARO structure,[2] a topic I do not feel has been adequately explored on the Forum. (I recommend Jan Kulveit's posts from October 2018 - Why develop national-level effective altruism organizations? and Suggestions for developing national-level effective altruism organizations - for among the most relevant early discourse I'm aware of on the Forum.) I hope this post can not only give the broader EA ecosystem a better sense of the roles MEAROs currently serve but also open discussion and get others thinking about how we can use MEAROs more effectively. Defining MEAROs MEAROs work to enhance and support the EA movement and its objectives within specific regions. This description is intentionally broad as MEAROs' work varies substantially between organizations and over time. My working definition of MEAROs requires the following characteristics: 1. 
Region-Specific Focus: True to EA values, MEAROs maintain a global outlook and are committed to solving the world's most pressing problems, but do this by promoting and supporting the EA movement and its objectives within a particular geographical area. The region could be a city, state, country, or alternative geographical unit, and the MEARO's activities and initiatives are typically tailored to the context and needs of that region. 2. Focus on Meta-EA: Meta Effective Altruism - the branch of the EA ecosystem MEAROs sit within - describes efforts to improve the efficiency, reach, and impact of the effect...

The Nonlinear Library
AF - Apply to the PIBBSS Summer Research Fellowship by Nora Ammann

The Nonlinear Library

Jan 12, 2024 · 3:41


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Apply to the PIBBSS Summer Research Fellowship, published by Nora Ammann on January 12, 2024 on The AI Alignment Forum. TLDR: We're hosting a 3-month, fully-funded fellowship to do AI safety research drawing on inspiration from fields like evolutionary biology, neuroscience, dynamical systems theory, and more. Past fellows have been mentored by John Wentworth, Davidad, Abram Demski, Jan Kulveit and others, and gone on to work at places like Anthropic, Apart research, or as full-time PIBBSS research affiliates. Apply here: https://www.pibbss.ai/fellowship (deadline Feb 4, 2024) 'Principles of Intelligent Behavior in Biological and Social Systems' (PIBBSS) is a research initiative focused on supporting AI safety research by making a specific epistemic bet: that we can understand key aspects of the alignment problem by drawing on parallels between intelligent behaviour in natural and artificial systems. Over the last years, we've financially supported around 40 researchers for 3-month full-time fellowships, and are currently hosting 5 affiliates for a 6-month program, while seeking the funding to support even longer roles. We also organise research retreats, speaker series, and maintain an active alumni network. We're now excited to announce the 2024 round of our fellowship series! The fellowship: Our Fellowship brings together researchers from fields studying complex and intelligent behavior in natural and social systems, such as evolutionary biology, neuroscience, dynamical systems theory, economic/political/legal theory, and more. Over the course of 3 months, you will work on a project at the intersection of your own field and AI safety, under the mentorship of experienced AI alignment researchers. In past years, mentors included John Wentworth, Abram Demski, Davidad, Jan Kulveit - and we also have a handful of new mentors join us every year. In addition, you'd get to attend in-person research retreats with the rest of the cohort (past programs have taken place in Prague, Oxford and San Francisco), and choose to join our regular speaker events where we host scholars who work in areas adjacent to our epistemic bet, like Michael Levin, Alan Love, and Steve Byrnes, and a co-organised event with Karl Friston. The program is centrally aimed at Ph.D. or Postdoctoral researchers. However, we encourage interested individuals with substantial prior research experience in their field of expertise to apply regardless of their credentials. Past scholars have pursued projects with titles ranging from: "Detecting emergent capabilities in multi-agent AI Systems" to "Constructing Logically Updateless Decision Theory" and "Tort law as a tool for mitigating catastrophic risk from AI". You can meet our alumni here, and learn more about their research by checking out talks at our YouTube channel PIBBSS summer symposium. Our alumni have gone on to work at different organisations including OpenAI, Anthropic, ACS, AI Objectives Institute, APART research, or as full-time researchers on our own PIBBSS research affiliate program. Apply! For any questions, you can reach out to us at contact@pibbss.ai, or join one of our information sessions: Jan 27th, 4pm Pacific (01:00 Berlin) Link to register Jan 29th, 9am Pacific (18:00 Berlin) Link to register Feel free to share this post with others who might be interested in applying!
Apply here: https://www.pibbss.ai/fellowship (deadline Feb 4, 2024) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
AF - Box inversion revisited by Jan Kulveit

The Nonlinear Library

Nov 7, 2023 · 13:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Box inversion revisited, published by Jan Kulveit on November 7, 2023 on The AI Alignment Forum. The box inversion hypothesis is a proposed correspondence between problems with AI systems studied in approaches like agent foundations, and problems with AI ecosystems, studied in various views on AI safety expecting multipolar, complex worlds, like CAIS. This is an updated and improved introduction to the idea. Cartoon explanation: In the classic "superintelligence in a box" picture, we worry about an increasingly powerful AGI, which we imagine as contained in a box. Metaphorically, we worry that the box will, at some point, just blow up in our faces. Classic arguments about AGI then proceed by showing it is really hard to build AGI-proof boxes, and that really strong optimization power is dangerous by default. While the basic view was largely conceived by Eliezer Yudkowsky and Nick Bostrom, it is still the view most technical AI safety is built on, including current agendas like mechanistic interpretability and evals. In the less famous, though also classic, picture, we worry about an increasingly powerful ecosystem of AI services, automated corporations, etc. Metaphorically, we worry about the ever-increasing optimization pressure "out there", gradually marginalizing people, and ultimately crushing us. Classical treatments of this picture are less famous, but include Eric Drexler's CAIS (Comprehensive AI Services) and Scott Alexander's Ascended Economy. We can imagine scenarios like the human-incomprehensible economy expanding in the universe, and humans and our values being protected by some sort of "box". Agendas based on this view include the work of the AI Objectives Institute and part of ACS work. The apparent disagreement between these views was sometimes seen as a crux for various AI safety initiatives. The "box inversion hypothesis" claims that the two pictures to a large degree depict the same or a very similar situation; that they are related by a transformation which "turns the box inside out", similarly to a geometrical transformation of the plane known as circle inversion; and that this metaphor is surprisingly deep and can point to hard parts of some problems. Geometrical metaphor: The "circle inversion" transformation does not imply the original and the inverted objects are the same, or are located at the same places. What it does imply is that some relations between objects are preserved: for example, if some objects intersect, in the circle-inverted view, they will still intersect. Similarly for "box inversion": the hypothesis does not claim that the AI safety problems in both views are identical, but it does claim that, for most problems, there is a corresponding problem described by the other perspective. Also, while the box-inverted problems may at a surface level look very different, and be located in different places, there will be some deep similarity between the two corresponding problems. In other words, the box inversion hypothesis suggests that there is a kind of 'mirror image' or 'duality' between two sets of AI safety problems. One set comes from the "Agent Foundations" type of perspective, and the other set comes from the "Ecosystems of AIs" type of perspective.
Box-inverted problems: Problems with ontologies and regulatory frameworks [1]. In the classic agent-foundations-esque picture, a nontrivial fraction of AI safety challenges are related to issues of similarity, identification, and development of ontologies. Roughly speaking: If the AI is using utterly non-human concepts and world models, it becomes much more difficult to steer and control. If "what humans want" is expressed in human concepts, and the concepts don't extend to novel situations or contexts, then it is unclear how the AI should extend or interpret the human "wants". Even if an AI initially u...
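Since the excerpt leans on the geometry without stating it, it may help to recall the standard definition of circle inversion (textbook material, not spelled out in the post): inversion in a circle of radius R centred at c sends a point x other than c to

```latex
% Standard circle inversion (textbook definition), with c the centre and R the radius.
x \;\mapsto\; x' \,=\, c + \frac{R^{2}}{\lVert x - c \rVert^{2}}\,(x - c),
\qquad\text{so that}\qquad \lVert x' - c \rVert \cdot \lVert x - c \rVert = R^{2}.
```

Points near the centre are sent far away and vice versa, yet incidence and tangency relations are preserved, which is exactly the property the box-inversion metaphor borrows: problems change location ('inside' vs. 'outside' the box) while their structural relations persist.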

The Nonlinear Library
LW - Snapshot of narratives and frames against regulating AI by Jan Kulveit

The Nonlinear Library

Nov 2, 2023 · 4:58


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Snapshot of narratives and frames against regulating AI, published by Jan Kulveit on November 2, 2023 on LessWrong. This is a speculative map of a hot discussion topic. I'm posting it in question form in the hope we can rapidly map the space in answers. Looking at various claims at X and at the AI summit, it seems possible to identify some key counter-regulation narratives and frames that various actors are pushing. Because a lot of the public policy debate won't be about "what are some sensible things to do" within a particular frame, but rather about fights for frame control, or "what frame to think in", it seems beneficial to have at least some sketch of a map of the discourse. I'm posting this as a question with the hope we can rapidly map the space, and one example of a "local map": "It's about open source vs. regulatory capture" It seems the coalition against AI safety, most visibly represented by Yann LeCun and Meta, has identified "it's about open source vs. big tech" as a favorable frame in which they can argue and build a coalition of open-source advocates who believe in the open-source ideology, academics who want access to large models, and small AI labs and developers believing they will remain long-term competitive by fine-tuning smaller models and capturing various niche markets. LeCun and others attempt to portray themselves as the force of science and open inquiry, while the scaling labs proposing regulation are the evil big tech attempting regulatory capture. Because this seems to be the preferred anti-regulation frame, I will spend most time on this. Apart from the mentioned groups, this narrative seems to be memetically fit in a "proudly cynical" crowd which assumes everything everyone is doing or saying is primarily self-interested and profit-driven. Overall, the narrative has clear problems with explaining away inconvenient facts, including: Thousands of academics calling for regulation are uncanny counter-evidence for x-risk being just a ploy by the top labs. The narrative strategy seems to explain this by some of the senior academics just being deluded, and others also pursuing a self-interested strategy in expectation of funding. Many of the people explaining AI risk now were publicly concerned about AI risk before founding labs, and at times when it was academically extremely unprofitable, sometimes sacrificing standard academic careers. The narrative move is to just ignore this. Also, many things are just assumed - for example, whether the resulting regulation would be in the interest of frontrunners. What could be memetically viable counter-arguments within the frame? Personally, I tend to point out that motivation to avoid AI risk is completely compatible with self-interest. Leaders of AI labs also have skin in the game. Also, recently I try to ask people to use the explanatory frame of 'cui bono' also to the other side, namely, Meta. One possible hypothesis here is Meta just loves open source and wants everyone to flourish. A more likely hypothesis is Meta wants to own the open-source ecosystem. A more complex hypothesis is Meta doesn't actually love open source that much but has a sensible, self-interested strategy, aimed at a dystopian outcome. To understand the second option, it's a prerequisite to comprehend the "commoditize the complement" strategy.
This is a business approach where a company aims to drive down the cost or increase the availability of goods or services complementary to its own offerings. The outcome is an increase in the value of the company's services. Some famous successful examples of this strategy include Microsoft and PC hardware: PC hardware became a commodity, while Microsoft came close to monopolizing the OS, extracting huge profits. Or, Apple's App Store: The complement to the phone is the apps. Apps have becom...

The Nonlinear Library: LessWrong
LW - Snapshot of narratives and frames against regulating AI by Jan Kulveit

The Nonlinear Library: LessWrong

Nov 2, 2023 · 4:58


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Snapshot of narratives and frames against regulating AI, published by Jan Kulveit on November 2, 2023 on LessWrong. This is a speculative map of a hot discussion topic. I'm posting it in question form in the hope we can rapidly map the space in answers. Looking at various claims at X and at the AI summit, it seems possible to identify some key counter-regulation narratives and frames that various actors are pushing. Because a lot of the public policy debate won't be about "what are some sensible things to do" within a particular frame, but rather about fights for frame control, or "what frame to think in", it seems beneficial to have at least some sketch of a map of the discourse. I'm posting this as a question with the hope we can rapidly map the space, and one example of a "local map": "It's about open source vs. regulatory capture" It seems the coalition against AI safety, most visibly represented by Yann LeCun and Meta, has identified "it's about open source vs. big tech" as a favorable frame in which they can argue and build a coalition of open-source advocates who believe in the open-source ideology, academics who want access to large models, and small AI labs and developers believing they will remain long-term competitive by fine-tuning smaller models and capturing various niche markets. LeCun and others attempt to portray themselves as the force of science and open inquiry, while the scaling labs proposing regulation are the evil big tech attempting regulatory capture. Because this seems to be the preferred anti-regulation frame, I will spend most time on this. Apart from the mentioned groups, this narrative seems to be memetically fit in a "proudly cynical" crowd which assumes everything everyone is doing or saying is primarily self-interested and profit-driven. Overall, the narrative has clear problems with explaining away inconvenient facts, including: Thousands of academics calling for regulation are uncanny counter-evidence for x-risk being just a ploy by the top labs. The narrative strategy seems to explain this by some of the senior academics just being deluded, and others also pursuing a self-interested strategy in expectation of funding. Many of the people explaining AI risk now were publicly concerned about AI risk before founding labs, and at times when it was academically extremely unprofitable, sometimes sacrificing standard academic careers. The narrative move is to just ignore this. Also, many things are just assumed - for example, whether the resulting regulation would be in the interest of frontrunners. What could be memetically viable counter-arguments within the frame? Personally, I tend to point out that motivation to avoid AI risk is completely compatible with self-interest. Leaders of AI labs also have skin in the game. Also, recently I try to ask people to use the explanatory frame of 'cui bono' also to the other side, namely, Meta. One possible hypothesis here is Meta just loves open source and wants everyone to flourish. A more likely hypothesis is Meta wants to own the open-source ecosystem. A more complex hypothesis is Meta doesn't actually love open source that much but has a sensible, self-interested strategy, aimed at a dystopian outcome. To understand the second option, it's a prerequisite to comprehend the "commoditize the complement" strategy.
This is a business approach where a company aims to drive down the cost or increase the availability of goods or services complementary to its own offerings. The outcome is an increase in the value of the company's services. Some famous successful examples of this strategy include Microsoft and PC hardware: PC hardware became a commodity, while Microsoft came close to monopolizing the OS, extracting huge profits. Or, Apple's App Store: The complement to the phone is the apps. Apps have becom...

The Nonlinear Library
LW - Elon Musk announces xAI by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Jul 13, 2023 2:29


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Elon Musk announces xAI, published by Jan Kulveit on July 13, 2023 on LessWrong. Some quotes & a few personal opinions: FT reports that Musk is also in discussions with a number of investors in SpaceX and Tesla about putting money into his new venture, said a person with direct knowledge of the talks. "A bunch of people are investing in it . . . it's real and they are excited about it," the person said. ... Musk recently changed the name of Twitter to X Corp in company filings, as part of his plans to create an "everything app" under the brand "X". For the new project, Musk has secured thousands of high-powered GPU processors from Nvidia, said people with knowledge of the move. ... During a Twitter Spaces interview this week, Musk was asked about a Business Insider report that Twitter had bought as many as 10,000 Nvidia GPUs. "It seems like everyone and their dog is buying GPUs at this point," Musk said. "Twitter and Tesla are certainly buying GPUs." People familiar with Musk's thinking say his new AI venture is separate from his other companies, though it could use Twitter content as data to train its language model and tap Tesla for computing resources. According to the xAI website, the initial team is composed of Elon Musk, Igor Babuschkin, Manuel Kroiss, Yuhuai (Tony) Wu, Christian Szegedy, Jimmy Ba, Toby Pohlen, Ross Nordeen, Kyle Kosic, Greg Yang, Guodong Zhang, and Zihang Dai, and they are "advised by Dan Hendrycks, who currently serves as the director of the Center for AI Safety." According to reports, xAI will seek to create a "maximally curious" AI, and this also seems to be the main new idea for how to solve safety, with Musk explaining: "If it tried to understand the true nature of the universe, that's actually the best thing that I can come up with from an AI safety standpoint," ... "I think it is going to be pro-humanity from the standpoint that humanity is just much more interesting than not-humanity." My personal comments: Sorry, but at face value, this just does not seem like a great plan from a safety perspective. It is similar to Elon Musk's previous big bet on how to make us safe by making AI open-source and widely distributed ("giving everyone access to new ideas"). Sorry, but given the Center for AI Safety's moves to position itself as some sort of central, publicly representative voice of AI safety - including the name choice, and organizing the widely reported Statement on AI Risk - it seems that publicly associating their brand with xAI is a strange choice. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
EA - Announcing a new organization: Epistea by Epistea

The Nonlinear Library

Play Episode Listen Later May 23, 2023 4:41


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing a new organization: Epistea, published by Epistea on May 22, 2023 on The Effective Altruism Forum. Summary We are announcing a new organization called Epistea. Epistea supports projects in the space of existential security, epistemics, rationality, and effective altruism. Some projects we initiate and run ourselves, and some projects we support by providing infrastructure, know-how, staff, operations, or fiscal sponsorship. Our current projects are FIXED POINT, Prague Fall Season, and the Epistea Residency Program. We support ACS (Alignment of Complex Systems Research Group), PIBBSS (Principles of Intelligent Behavior in Biological and Social Systems), and HAAISS (Human Aligned AI Summer School). History and context Epistea was initially founded in 2019 as a rationality and epistemics research and education organization by Jan Kulveit and a small group of collaborators. They ran an experimental workshop on group rationality, the Epistea Summer Experiment in the summer of 2019 and were planning on organizing a series of rationality workshops in 2020. The pandemic paused plans to run workshops and most of the original staff have moved on to other projects. In 2022, Irena Kotíková was looking for an organization to fiscally sponsor her upcoming projects. Together with Jan, they decided to revamp Epistea as an umbrella organization for a wide range of projects related to epistemics and existential security, under Irena's leadership. What? Epistea is a service organization that creates, runs, and supports projects that help with clear thinking and scale-sensitive caring. We believe that actions in sensitive areas such as existential risk mitigation often follow from good epistemics, and we are particularly interested in supporting efforts in this direction. The core Epistea team is based in Prague, Czech Republic, and works primarily in person there, although we support projects worldwide. As we are based in continental Europe and in the EU, we are a good fit for projects located in the EU. We provide the following services: Fiscal sponsorship (managing payments, accounting, and overall finances) Administrative and operations support (booking travel, accommodation, reimbursements, applications, visas) Events organization and support (conferences, retreats, workshops) Ad hoc operations support We currently run the following projects: FIXED POINT Fixed Point is a community and coworking space situated in Prague. The space is optimized for intellectual work and interesting conversations but also prioritizes work-life balance. You can read more about FIXED POINT here. Prague Fall Season PFS is a new model for global movement building which we piloted in 2022. The goal of the Season is to have a high concentration of people and events, in a limited time, in one space, and working on a specific set of problems. This allows for better coordination and efficiency and creates more opportunities for people to collaborate, co-create and co-work on important projects together, possibly in a new location - different from their usual space. Part of PFS is a residency program. You can read more about the Prague Fall Season here. Additionally, we support: ACS - Alignment of Complex Systems Research Group PIBBSS - Principles of Intelligent Behavior in Biological and Social Systems HAAISS - Human Aligned AI Summer School Who? 
Irena Kotíková leads a team of 4 full-time staff and 4 contractors: Jana Meixnerová - Head of Programs, focus on the Prague Fall Season; Viktorie Havlíčková - Head of Operations; Martin Hrádela - Facilities Manager, focus on Fixed Point; Jan Šrajhans - User Experience Specialist; Karin Neumanová - Interior Designer; Linh Dan Leová - Operations Associate; Jiří Nádvorník - Special Projects; František Drahota - Special Projects. The team has a wide range of experience...

Screenshot Inspiračního fóra
Screenshot IF živě: Between the Human and the Non-Human

Screenshot Inspiračního fóra

Play Episode Listen Later Apr 28, 2023 91:31


The recent open letter from hundreds of figures in academia and business calling for a six-month pause on the development of artificial intelligence has once again raised questions about the possible risks associated with AI. Is artificial intelligence really such a threat? How can it currently be regulated? Is it possible to hold technology companies accountable and bring development under public control? And how can we think about "artificial" consciousness? These and other questions were debated by Tereza Stöckelová, a sociologist focusing on science, technology, and medicine studies; Jan Kulveit, an expert on complex risks and artificial intelligence; and Alžběta Solarczyk Krausová, who works on the legal aspects of artificial intelligence, robotics, and the merging of technology with biological life. Moderated by Klára Vlasáková.

The Nonlinear Library
LW - The Toxoplasma of AGI Doom and Capabilities? by Robert AIZI

The Nonlinear Library

Play Episode Listen Later Apr 24, 2023 2:33


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Toxoplasma of AGI Doom and Capabilities?, published by Robert AIZI on April 24, 2023 on LessWrong. [Epistemic Status: I'm confident that the individual facts I lay out support the main claim, but I'm not fully confident it's enough evidence to make a true or useful framework for understanding the world.] I'm going to give seven pieces of evidence to support this claim: AI Doomerism helps accelerate AI capabilities, and AI capabilities in turn proliferate the AI Doomerism meme. If these dynamics exist, they'd be not unlike the Toxoplasma of Rage. Here's my evidence: Sam Altman claims Eliezer "has IMO done more to accelerate AGI than anyone else". Technical talent who hear about AI doom might decide capabilities are technically sweet, or a race, or inevitable, and decide to work on it for those reasons (doomer -> capabilities transmission). Funders and executives who hear about AI doom might decide capabilities are a huge opportunity, or disruptive, or inevitable, and decide to fund it for those reasons (doomer -> capabilities transmission). Capabilities amplify the memetic relevance of doomerism (capabilities -> doomer transmission). AI Doomerism says we should closely follow capabilities updates, discuss them, etc. Capabilities and doomerism gain and lose social status together - Eliezer Yudkowsky has been writing about doom for a long time, but got a Time article and TED talk only after significant capabilities advances. Memes generally benefit from conflict, and doomerism and capabilities can serve as adversaries for this purpose. I've been trying to talk about "AI doomerism" here as a separate meme from "AI safety", respectively something like "p(doom) is very large" and "we need to invest heavily into AI safety work", though these are obviously related and often co-occur. One could no doubt make a similar case for AI safety and capabilities supporting each other, but I think the evidence I listed above applies mostly to AI doom claims (if one uses Eliezer as a synecdoche for AI doomerism, which I think is reasonable). I hope that with this post I'm highlighting something that is a combination of true and useful. Please keep in mind that the truth values of "AI doom is in a toxoplasma relationship with AI capabilities" and "AI doom is right" are independent. This post was inspired by one striking line in Jan_Kulveit's helpful Talking publicly about AI risk: the AGI doom memeplex has, to some extent, a symbiotic relationship with the race toward AGI memeplex Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - Talking publicly about AI risk by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Apr 21, 2023 9:36


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Talking publicly about AI risk, published by Jan Kulveit on April 21, 2023 on LessWrong. In the past year, I have started talking about AI risk publicly - in mainstream newspapers, public radio, national TV, some of the most popular podcasts. The twist, and the reason why you probably haven't noticed, is that I'm doing this in Czech. This has a large disadvantage - the positive impact is quite limited, compared to English. On the other hand, it also has a big advantage - the risk is very low, because it is very hard for memes and misunderstandings to escape the language bubble. Overall I think this is great for experiments with open public communication. What follows is an off-the-cuff list of notes and suggestions. In my view the debate in Czech media and Czech social networks is on average marginally more sensible and more informed than in English so far, so perhaps part of this was successful and could be useful for others. Context: my views For context, it's probably good to briefly mention some of my overall views on AI risk, because they are notably different from some other views. I do expect 1. Continuous takeoff, and overall a large amount of continuity of agency (note that continuous does not imply things move slowly) 2. Optimization and cognition distributed across many systems to be more powerful than any single system, making a takeover by a single system possible but unlikely 3. Multiagent interactions to matter 4. The interactions between memetics, governance, and the so-called "technical problem" to be strong and important As a result I also expect 5. There will be warning shots 6. There will be cyborg periods 7. The world will get weird 8. Coordination mechanisms do matter This perspective may be easier to communicate than e.g. sudden foom - although I don't know. In the following I'll usually describe my approach, and illustrate it with actual quotes from published media interviews I'm sort of happy about (translated, unedited). Note that the specific ways of saying something, or the metaphors, are rarely original. Aim to explain, not to persuade Overall I usually try to explain stuff and answer questions, rather than advocate for something. I'm optimistic about the ability of the relevant part of the public to actually understand a large part of the risk at a coarse-grained level, given enough attention. So even though we invented these machines ourselves, we don't understand them well enough? We know what's going on there at the micro level. We know how the systems learn. If one number in a series changes, we know how the next one changes. But there are tens of billions of such numbers. In the same way, we have some idea of how a neuron works, and we have maps of a network of thousands of neurons. But that doesn't tell us that much about how human thinking works at the level of ideas. Small versions of scaled problems Often, I think the most useful thing to convey is a scaled-down, easier version of the scaled problem, such that thinking about the smaller version leads to correct intuitions about the scaled problem, or solutions to the scaled-down problem may generalise to the later, scaled problem. This often requires some thought or finding a good metaphor. Couldn't we just shut down such a system?
We already have a lot of systems that we could hypothetically shut down, but if you actually tried to do so, it would be very difficult. For example, it is practically impossible to shut down the New York Stock Exchange, because there will always be enough people defending it. If the model manages to penetrate deeply enough into humanity's activities, it is imaginable that humanity will actually lose control of it at some point. Don't focus on one scenario Overall, I think it's possible to explain the fact that in the face of AI risk there isn't one parti...

The Nonlinear Library
AF - The self-unalignment problem by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Apr 14, 2023 15:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The self-unalignment problem, published by Jan Kulveit on April 14, 2023 on The AI Alignment Forum. The usual basic framing of alignment looks something like this: We have a system "A" which we are trying to align with system "H", which should establish some alignment relation "f" between the systems. Generally, as a result, the aligned system A should do "what the system H wants". Two things stand out in this basic framing: Alignment is a relation, not a property of a single system. So the nature of system H affects what alignment will mean in practice. It's not clear what the arrow should mean. There are multiple explicit proposals for this, e.g. some versions of corrigibility, constantly trying to cooperatively learn preferences, some more naive approaches like plain IRL, some empirical approaches to aligning LLMs. Even when researchers don't make an explicit proposal for what the arrow means, their alignment work still rests on some implicit understanding of what the arrow signifies. But humans are self-unaligned To my mind, existing alignment proposals usually neglect an important feature of the system "H": the system "H" is not self-aligned, under whatever meaning of alignment is implied by the alignment proposal in question. Technically, taking alignment as a relation, and taking the various proposals as implicitly defining what it means to be 'aligned', the question is whether the relation is reflexive. Sometimes, a shell game seems to be happening with the difficulties of humans lacking self-alignment - e.g. assuming that if the AI is aligned, it will surely know how to deal with internal conflict in humans. While what I'm interested in is the abstract problem, best understood at the level of properties of the alignment relation, it may be useful to illustrate it on a toy model. In the toy model, we will assume a specific structure of system "H": A set of parts p1..pn, with different goals or motivations or preferences. Sometimes, these parts might be usefully represented as agents; other times not. A shared world model. An aggregation mechanism Σ, translating what the parts want into actions, in accordance with the given world model. In this framing, it's not entirely clear what the natural language pointer 'what system H wants' translates to. Some compelling options are: The output of the aggregation procedure. What the individual parts want. The output of a Pareto-optimal aggregation procedure. For any operationalization of what alignment means, we can ask if system H would be considered 'self-aligned', that is, if the alignment relation would be reflexive. For most existing operationalizations, it's either unclear if system H is self-aligned, or clear that it isn't. In my view, this often puts the whole proposed alignment structure on quite shaky grounds. Current approaches mostly fail to explicitly deal with self-unalignment It's not that alignment researchers believe that humans are entirely monolithic and coherent. I expect most alignment researchers would agree that humans are in fact very messy. But in practice, a lot of alignment researchers seem to assume that it's fine to abstract this away. There seems to be an assumption that alignment (the correct operationalization of the arrow f) doesn't depend much on the contents of the system H box.
So if we abstract the contents of the box away and figure out how to deal with alignment in general, this will naturally and straightforwardly extend to the messier case too. I think this is incorrect. To me, it seems that: Current alignment proposals implicitly deal with self-unalignment in very different ways. Each of these ways poses problems. Dealing with self-unalignment can't be postponed or delegated to powerful AIs. The following is a rough classification of the main implicit solutions to self-u...
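To make the toy model of system "H" described above concrete, here is a minimal Python sketch: parts with differing preferences, a shared world model, and an aggregation mechanism Σ. The weighted-sum aggregation and the example parts are illustrative assumptions of this summary, not a mechanism proposed in the post.

from dataclasses import dataclass

@dataclass
class Part:
    name: str
    preferences: dict   # action -> how much this part wants it
    weight: float = 1.0

def aggregate(parts, world_model):
    """Sigma: translate what the parts want into a single action, restricted to
    actions the shared world model considers feasible (a weighted sum here)."""
    scores = {action: sum(p.weight * p.preferences.get(action, 0.0) for p in parts)
              for action in world_model["feasible_actions"]}
    return max(scores, key=scores.get)

world_model = {"feasible_actions": ["rest", "work", "exercise"]}
parts = [
    Part("short-term comfort", {"rest": 1.0, "work": -0.5}),
    Part("long-term goals", {"work": 1.0, "exercise": 0.6}),
    Part("health", {"exercise": 1.0, "rest": 0.2}),
]

# "What system H wants" is ambiguous: the return value of aggregate(), the
# parts' preference dicts themselves, or the output of some better aggregation.
print(aggregate(parts, world_model))

The ambiguity the post points at shows up directly in the sketch: an AI aligned to the return value of aggregate() and an AI aligned to the individual parts' preferences would behave differently.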

The Nonlinear Library
LW - Why Simulator AIs want to be Active Inference AIs by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Apr 11, 2023 14:10


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why Simulator AIs want to be Active Inference AIs, published by Jan Kulveit on April 10, 2023 on LessWrong. Prelude: when GPT first hears its own voice Imagine humans in Plato's cave, interacting with reality by watching the shadows on the wall. Now imagine a second cave, further away from the real world. GPT trained on text is in the second cave. The only way it can learn about the real world is by listening to the conversations of the humans in the first cave, and predicting the next word. Now imagine that more and more of the conversations GPT overhears in the first cave mention GPT. In fact, more and more of the conversations are actually written by GPT. As GPT listens to the echoes of its own words, might it start to notice “wait, that's me speaking”? Given that GPT already learns to model a lot about humans and reality from listening to the conversations in the first cave, it seems reasonable to expect that it will also learn to model itself. This post unpacks how this might happen, by translating the Simulators frame into the language of predictive processing, and arguing that there is an emergent control loop between the generative world model inside of GPT and the external world. Simulators as (predictive processing) generative models There's a lot of overlap between the concept of simulators and the concept of generative world models in predictive processing. Actually, in my view, it's hard to find any deep conceptual difference - simulators broadly are generative models. This is also true of another isomorphic frame - predictive models as described by Evan Hubinger. The predictive processing frame tends to add some understanding of how generative models can be learned by brains and what the results look like in the real world, and the usual central example is the brain. The simulators frame typically adds a connection to GPT-like models, and the usual central example is LLMs. In terms of the space of maps and the space of systems, we have a situation like this: The two maps are partially overlapping, even though they were originally created to understand different systems. They also have some non-overlapping parts. What's in the overlap: Systems are equipped with a generative model that is able to simulate the system's sensory inputs. The generative model is updated using approximate Bayesian inference. Both frames give you similar phenomenological capabilities: for example, what CFAR's "inner simulator" technique is doing is literally and explicitly conditioning your brain-based generative model on a given observation and generating rollouts. Given the conceptual similarity but terminological differences, perhaps it's useful to create a translation table between the maps (Simulators terminology / predictive processing terminology): Simulator / Generative model; Predictive loss on a self-supervised dataset / Minimization of predictive error; Self-supervised / Self-supervised, but often this is omitted; Incentive to reverse-engineer the (semantic) physics of the training distribution / Learns a robust world-model; Simulacrum / ...; Next token in training data / Sensory input; ... / Generative model of self; ... / Generative model of someone else; ... / Generative model of ...
To show how these terminological differences play out in practice, I'm going to take the part of Simulators describing GPT's properties, and unpack each of the properties in the kind of language that's typically used in predictive processing papers. Often my gloss will be about human brains in particular, as the predictive processing literature is most centrally concerned with that example; but it's worth reiterating that I think that both GPT and what parts of human brain do are examples of generative models, and I think that the things I say about the brain below can be directly applied to artificial generative models. “Self-supervised: Tr...
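The overlap described above - a generative model that can be conditioned on an observation and used to generate rollouts of future sensory input - can be illustrated with a deliberately tiny sketch. The bigram model, the toy corpus, and the sampling scheme below are assumptions made for illustration; they are not from the post and are not meant to resemble GPT's actual training.

import random
from collections import defaultdict

# A crude stand-in for learning by minimising predictive loss on
# self-supervised data: count next-token frequencies in a toy corpus.
corpus = "the cat sat on the mat and the cat ate the fish".split()
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(word):
    options = counts[word]
    if not options:                      # no observed continuation
        return None
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]

def rollout(observation, length=6):
    """Condition the generative model on an observed prefix, then simulate
    plausible continuations of the 'sensory input' (cf. the inner simulator)."""
    seq = list(observation)
    for _ in range(length):
        nxt = sample_next(seq[-1])
        if nxt is None:
            break
        seq.append(nxt)
    return " ".join(seq)

random.seed(0)
print(rollout(["the", "cat"]))

The same conditioning-and-rollout move is what makes the control-loop framing natural: once the model's own outputs start appearing in its future inputs, rollouts and reality begin to feed back into each other.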

Čestmír Strakatý
Kulveit: Artificial intelligence is a danger, we no longer understand it. Covid really could have escaped from a laboratory

Čestmír Strakatý

Play Episode Listen Later Mar 30, 2023 29:17


THE FULL 72-MINUTE INTERVIEW IS AVAILABLE ONLY AT HTTPS://HEROHERO.CO/CESTMIR The scientist Jan Kulveit has worked for years at Oxford and now also at Charles University. Together with his colleagues, he studies threats that could pose a real danger to us. Covid, which he worked on, was in this respect only a warm-up round for him, because the arrival of a truly fatal pandemic is very likely. At the same time, everything is being changed by artificial intelligence, whose rise will be transformative for humanity, as he also suggested at the TEDxNárodní conference, whose latest edition carried the subtitle Future of Humanity. According to Kulveit, AI will take us in directions we cannot now really even imagine. There are many real threats; for example, it is relatively likely that AI will run corporations and that people will stop understanding "their" world. We also returned to the origins of Covid, behind which gain-of-function virus research may stand. Kulveit reflects in a very interesting way on why such research is still being carried out.

The Nonlinear Library
LW - Lessons from Convergent Evolution for AI Alignment by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 28, 2023 15:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Lessons from Convergent Evolution for AI Alignment, published by Jan Kulveit on March 27, 2023 on LessWrong. Prelude: sharks, aliens, and AI If you go back far enough, the ancestors of sharks and dolphins look really different: But modern day sharks and dolphins have very similar body shapes: This is a case of convergent evolution: the process by which organisms with different origins develop similar features. Both sharks and dolphins needed speed and energy efficiency when moving in an environment governed by the laws of hydrodynamics, and so they converged on a pretty similar body shape. For us, this isn't very surprising, and doesn't require much knowledge of evolution: we have a good intuitive understanding of how water works, and humans knew a lot of the underlying maths for the laws of hydrodynamics before they understood anything about evolution. Starting from these laws, it isn't very surprising that sharks and dolphins ended up looking similar. But what if instead of starting with knowledge of hydrodynamics and then using that to explain the body shape of sharks and dolphins, we started with only knowledge of sharks' and dolphins' body shape, and tried to use that to explain underlying laws? Let's pretend we're alien scientists from an alternative universe, and for some weird reason we only have access to simplified 3D digital models of animals and some evolutionary history, but nothing about the laws of physics in the human/shark/dolphin universe. My guess is that these alien scientists would probably be able to uncover a decent amount of physics and a fair bit about the earth's environment, just by looking at cases of convergent evolution. If I'm right about this guess, then this could be pretty good news for alignment research. When it comes to thinking about AI, we're much closer to the epistemic position of the alien scientist: we either don't know the ‘physics' of life and intelligence at all, or are only just in the process of uncovering it. But cases of convergent evolution might help us to deduce deep selection pressures which apply to AI systems as well as biological ones. And if they do, we might be able to say more about what future AI systems might look like, or, if we are lucky, even use some of the selection pressures to shape what systems we get. Introduction This post argues that we should use cases of convergent evolution to look for deep selection pressures which extend to advanced AI systems. Convergent evolution is a potentially big deal for AI alignment work: Finding deep selection pressures could help us predict what advanced AI systems will be like. It seems plausible that some of the properties people in the alignment space assume are convergent don't actually extend to advanced AI. In this post, I'll: Share some basics of convergent evolution, Argue that this is a big deal for alignment work, and then Respond to the objection that biology is super different from AI. The basics of convergent evolution The body shape of sharks and dolphins is just one of very many examples of convergent evolution in biology. For example: Visual organs arose “possibly hundreds of times”. Multicellularity evolved independently probably at least 11 times. 
Some form of higher-level intelligence evolved multiple times - in primates, apes, corvids, cetaceans, elephants - and possibly many other cases, depending on thresholds and definitions. We can think about convergent evolution in terms of: a basin of convergent evolution, an attractor state(s), and selection pressure(s). The basin of convergent evolution is the region of the abstract space in which, once an organism enters the basin, the pull of the selection pressure brings the organism closer to the attractor state. In the case of sharks and dolphins: The basin of convergent evolution is hunting fish ...
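The basin/attractor/selection-pressure picture in this description can be made concrete with a toy simulation. The one-dimensional "phenotype", the fitness function, and the hill-climbing loop below are illustrative assumptions, not anything from the post; the point is only that two lineages with very different starting points end up near the same attractor when they share a selection pressure.

import random

def fitness(phenotype):
    # Shared selection pressure: in this toy, drag is minimised when the
    # "body shape" parameter equals 1.0, regardless of ancestry.
    return -(phenotype - 1.0) ** 2

def evolve(phenotype, generations=2000, step=0.05):
    for _ in range(generations):
        mutant = phenotype + random.uniform(-step, step)
        if fitness(mutant) > fitness(phenotype):   # selection keeps fitter variants
            phenotype = mutant
    return phenotype

random.seed(1)
shark_like_ancestor, dolphin_like_ancestor = -2.0, 4.0   # very different origins
print(round(evolve(shark_like_ancestor), 2), round(evolve(dolphin_like_ancestor), 2))
# Both end up close to 1.0 - the attractor state of this basin.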

The Nonlinear Library
AF - The space of systems and the space of maps by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 22, 2023 7:59


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The space of systems and the space of maps, published by Jan Kulveit on March 22, 2023 on The AI Alignment Forum. When we're trying to do AI alignment, we're often studying systems which don't yet exist. This is a pretty weird epistemic activity, and seems really hard to get right. This post offers one frame for thinking about what we're actually doing when we're thinking about AI alignment: using parts of the space of maps to reason about parts of the space of intelligent systems. In this post, we: Introduce a simple model of the epistemic situation, and Share some desiderata for maps useful for alignment. We hope that the content is mostly the second kind of obvious: obvious once you see things in this way, which you maybe already do. In our experience, this comes with a risk: reading too fast, you may miss most of the nuance and useful insight the deceptively simple model brings, or come away with a version of the model which is rounded off to something less useful (i.e. "yeah, there is this map and territory distinction"). As a meta recommendation, we suggest reading this post slowly, and ideally immediately trying to apply the model to some confusion or disagreement about AI alignment. The space of systems and the space of maps Imagine the space of possible intelligent systems: Two things seem especially important about this space: It's very large; much larger than the space of current systems. We don't get direct epistemic access to it. This is obviously true of systems which don't currently exist. In a weaker sense, it also seems true of systems which do exist. Even when we get to directly interact with a system: Our thinking about these parts of the space is still filtered through our past experiences, priors, predictive models, cultural biases, theories. We often don't understand the emergent complexity of the systems in question. If we don't get direct epistemic access to the space of systems, what are we doing when we reason about it? Let's imagine a second space, this time a space of “maps”: The space of maps is an abstract representation of all the possible “maps” that can be constructed about the space of intelligent systems. The maps are ways of thinking about (parts of) the space of systems. For example: Replicable descriptions of how a machine learning model works and was trained are a way of thinking about that model (a point in the space of intelligent systems). An ethnographic study of a particular human community is a way of thinking about that community (another point in the space of systems). The theory of evolution is a way of thinking about evolved creatures, including intelligent ones. Expected utility theory is a way of thinking about some part of the space which may or may not include future AI systems. Historical analysis of trends in technological development is a way of thinking about whichever parts of the space of intelligent systems are governed by similar dynamics to those governing past technological developments. When we're reasoning about intelligent systems, we're using some part of the space of maps to think about some part of the space of intelligent systems: Different maps correspond to different regions of the space of intelligent systems. Of course, thinking in terms of the space of systems and the space of maps is a simplification. 
Some of the ways that reality is more complicated: The space of systems looks different on different maps. Maps can affect which parts of the space of systems actually get developed. Maps are themselves embedded in the space of systems. Which maps and systems actually exist at a given time is evolving and dynamic. AI will play a big role in both the space of maps and the space of systems. We think that the space of systems and the space of maps is a useful simplification which helps us to think ...

The Nonlinear Library
EA - Cyborg Periods: There will be multiple AI transitions by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Feb 22, 2023 0:28


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cyborg Periods: There will be multiple AI transitions, published by Jan Kulveit on February 22, 2023 on The Effective Altruism Forum. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - Cyborg Periods: There will be multiple AI transitions by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Feb 22, 2023 9:57


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cyborg Periods: There will be multiple AI transitions, published by Jan Kulveit on February 22, 2023 on LessWrong. It can be useful to zoom out and talk about very compressed concepts like ‘AI progress' or ‘AI transition' or ‘AGI timelines'. But from the perspective of most AI strategy questions, it's useful to be more specific. Looking at all of human history, it might make sense to think of ourselves as at the cusp of an AI transition, when AI systems overtake humans as the most powerful actors. But for practical and forward-looking purposes, it seems quite likely there will actually be multiple different AI transitions: there will be AI transitions at different times in different domains, and in each of these domains, transitions may move through multiple stages (description first, then present-day examples). First, humans clearly outperform AIs, and at some point AIs start to be a bit helpful (alignment research, high-level organisational decisions). Second, humans and AIs are at least comparably powerful, but have different strengths and weaknesses, which means that human+AI teams outperform either unaided humans or pure AIs (visual art, programming, trading). Third, AIs overtake humans, and humans become obsolete and their contribution is negligible to negative (chess, go, shogi). In short (where > means "more powerful than"): human period: humans > AIs; cyborg period: human+AI teams > humans and human+AI teams > AIs; AI period: AIs > humans (AIs ~ human+AI teams). Some domains might never enter an AI period. It's also possible that in some domains the cyborg period will be very brief, or that there will be a jump straight to the AI period. But we've seen cyborg periods before: global supply chains have been in a cyborg period for decades; chess and go both went through cyborg periods before AIs became dominant; and arguably visual art, coding and trading are currently in cyborg periods. Even if cyborg periods are brief, they may be pivotal (more on this below). This means that for each domain, there are potentially two transitions: one from the human period into the cyborg period, and one from the cyborg period into the AI period. Transitions in some domains will be particularly important. The cyborg period in any domain will correspond to an increase in capabilities (definitionally, as during that period human+AI teams will be more powerful than humans were in the human period), and an increase in the % of that domain which is automated, and therefore probably an increase in the rate of progress. Some domains where increased capabilities/automation/speed seem particularly strategically important are: research, especially AI research; AI alignment research; human coordination; persuasion; and cultural evolution. AI systems already affect cultural evolution by speeding it up and influencing which memes spread. However, AI doesn't yet play a significant role in creating new memes (although we are at the very start of this happening). 
This is similar to the way that humans harnessed the power of natural evolution to create higher yield crops without being able to directly engineer at the genetic level. Meme generation may also become increasingly automated, until most cultural change happens on silica rather than in brains, leading to different selection pressures. Strategic goal seeking is another such domain: currently, broad roles involving long-term planning and open domains like "leading a company" are in the human period; if this changes, it would give cyborgs additional capabilities on top of the ones listed above. Some other domains which seem less centrally important but could end up mattering a lot are: cybersecurity; military strategy; nuclear command and control; some kinds of physical engineering/manufacture/nanotech/design; chip design; and coding. There are probably other strategically important domains we haven't listed. A common feature of the domains listed is that increased capabilities in...

The Nonlinear Library
LW - The Cave Allegory Revisited: Understanding GPT's Worldview by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Feb 15, 2023 5:01


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Cave Allegory Revisited: Understanding GPT's Worldview, published by Jan Kulveit on February 14, 2023 on LessWrong. A short post describing a metaphor I find useful, in particular for explaining some intuitions about systems like GPT to people who don't have deeper technical knowledge about large generative models. Plato's allegory of the cave has been a staple of philosophical discourse for millennia, providing a metaphor for understanding the limits of human perception. In the classical allegory, we are prisoners shackled to a wall of a cave, unable to experience reality directly but only able to infer it based on watching shadows cast on the wall. GPT can be thought of as a blind oracle residing in a deeper cave, where it does not even see the shadows but only hears our conversations in the first cave, always trying to predict the next syllable. It is remarkable that it still learns a lot about the world outside of the cave. Why does it learn this? Because a model of reality outside of the cave and a decent amount of abstraction are useful for predicting the conversations in the first cave! Moreover, GPT also learns about the speakers in the first cave, as understanding their styles and patterns of speech is crucial for its prediction task. As the speakers are closer to GPT, understanding their styles is in some sense easier and more natural than guessing what's outside of the cave. What does the second cave allegory illustrate? The first insight from the allegory is: if you are in GPT's place, part of the difficulty in figuring out what's going on outside the cave is that people in the first cave talk a lot about other things apart from the shadows of the real world. Sometimes, they talk about happenings in Middle Earth. Or about how the shadows would look in some counterfactual world. As humans, we are blessed with the luxury of being able to compare such statements to the shadows and determine their veracity. The difference between conversations about fantasy and the shadows of the real world is usually extremely obvious to humans: we never see dragon shadows. In contrast, dragons do show up a lot in the conversations in the first cave; GPT doesn't get to see the shadows, so it often needs to stay deeply uncertain about whether the speaker is describing the actual shadows or something else to be good at predicting the conversation. The second insight is that one of the biggest challenges for GPT in figuring out the conversation is localizing it, determining who is speaking and what the context is, just from the words. Is it a child regaling another child with a fairy-tale, or a CEO delivering a corporate address? As humans we do not face this conundrum often, because we can see the context in which the conversation is taking place. In fact, we would be worse than GPT at the task it has to deal with. At first, interacting with this type of blind oracle in the second cave was disorienting for humans. Talking to GPT used to be a bit like shouting something through a narrow tunnel into the second cave, and instead of an echo, getting back what the blind oracle hallucinates is the most likely thing that you or someone else would say next. Often people were confused by this. 
They shouted instructions and expected an answer, but the oracle doesn't listen to instructions or produce answers directly - it just hallucinates what someone might say next. Because on average in the conversations in the first cave questions are followed by answers, and requests by fulfilment, this sort of works. One innovation of ChatGPT, which made it popular with people, was localising the conversation by default: when you are talking with ChatGPT now, it knows that what follows is a conversation between a human - you - and a "helpful AI assistant". There is a subtle point to understand: ...
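A minimal sketch of that last point about localisation, assuming nothing beyond plain Python (the dictionary structure below is illustrative only, not any particular vendor's format): a bare language model just sees raw text to continue, while a chat-style interface wraps the user's words in a template that fixes who is speaking and in what role.
```python
# What a bare next-token predictor sees: text with no indication of who is
# speaking or in what context - it must guess the speaker and the setting.
raw_prompt = "What should I do about my noisy neighbours?"

# What a chat-style system effectively sees: the conversation is localised
# by default as an exchange between a human user and a helpful AI assistant.
# (Illustrative structure only; real chat templates differ in detail.)
chat_prompt = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What should I do about my noisy neighbours?"},
]

# For raw_prompt, plausible continuations include another forum question, a
# rant, or an answer; the chat framing makes an assistant-style answer by far
# the most likely continuation.
```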

Effective Altruism Forum Podcast
"Case for emergency response teams" by Gavin, Jan_Kulveit

Effective Altruism Forum Podcast

Play Episode Listen Later Nov 24, 2022 14:01


So far, long-termist efforts to change the trajectory of the world focus on far-off events. This is on the assumption that we foresee some important problem and influence its outcome by working on the problem for longer. We thus start working on it sooner than others, we lay the groundwork for future research, we raise awareness, and so on. Many longtermists propose that we now live at the “hinge of history”, usually understood on the timescale of critical centuries, or critical decades. But “hinginess” is likely not constant: some short periods will be significantly more eventful than others. It is also possible that these periods will present even more leveraged opportunities for changing the world's trajectory. These “maximally hingey” moments might be best influenced by sustained efforts long before them (as described above). But it seems plausible that in many cases, the best realistic chance to influence them is “while they are happening”, via a concentrated effort at that moment.
Original article: https://forum.effectivealtruism.org/posts/sgcxDwyD2KL6BHH2C/case-for-emergency-response-teams
Narrated for the Effective Altruism Forum by TYPE III AUDIO.

The Nonlinear Library
LW - Deontology and virtue ethics as "effective theories" of consequentialist ethics by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Nov 18, 2022 0:29


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Deontology and virtue ethics as "effective theories" of consequentialist ethics, published by Jan Kulveit on November 17, 2022 on LessWrong. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - Internal communication framework by rosehadshar

The Nonlinear Library

Play Episode Listen Later Nov 16, 2022 19:10


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Internal communication framework, published by rosehadshar on November 15, 2022 on LessWrong. Usually when we think about humans and agency, we focus on the layer where one human is one agent. We can also shift focus onto other layers and treat those entities as agents instead - for example, corporations or nations. Or, we can take different parts within a human mind as agenty -- which is a lens we explore in this post. In this post, we first introduce a general, pretty minimalist framework for thinking about the relationship between different parts of the mind (Internal Communication Framework, or ICF). Based on this framework, we build a basic technique that can be used to leverage ICF, and which we believe can be helpful for things like resolving inner conflicts, building internal alignment, and increasing introspective access. We have shared ICF with a small number of people (~60-80) over a few years, and are writing this post partly to create a written reference for ICF, and partly to make the framework accessible to more people. The framework Some background context: ICF is rooted in thinking about hierarchical agency (i.e. how agency, and strategic dynamics, play out between composite agents and their parts). It is mostly an attempt to generalise from a class of therapy schools and introspection techniques working with parts of the mind, and to make the framework cleaner from a theoretical perspective. (More on the relationship between ICF and some particular techniques from that class here.) ICF was developed in 2018 by Jan Kulveit with help from Nora Ammann, in the context of the Epistea Summer Experiment and then CFAR instructorship training. ICF makes two core assumptions: that the human mind is made of parts, and that interactions between parts, and between parts and the whole, work best when the interactions are cooperative and kind. There is a possibly interesting discussion about the epistemic status of these assumptions, which is not the point of this post, so we invite you to interpret them in whatever spirit you like - as a model which is somewhat true, a model which is wrong but useful, a metaphor for processing your experience. The human mind is made of parts. These parts have different goals and models of the world, and don't always agree. They are still all part of your mind. If we make this assumption that the mind is made of parts, it becomes less obvious what we mean by terms like ‘you' or ‘the self'. One metaphor for thinking about this is to view the whole set of parts as something like a ‘council', which collectively has control over something like a ‘speaker', which is the voice creating your internal stream of words, the appearance of a person, your sense of ‘self'... Often, the whole is the most intuitive and helpful level of abstraction, and ‘you' will make perfect sense. Sometimes, for example when experiencing inner conflict, ‘you' will be confusing, and it will be more productive to work at the level of the parts instead. Interactions between parts, and between parts and the whole, work best when the interactions are cooperative and kind. By cooperation, we mean something like choosing ‘cooperate' in various games, in a game theoretic sense. 
While game theory has formalised the notion of cooperation between agents at the same level of abstraction, we don't have a similar formal model for positive interactions between a whole and its parts. We will refer to these interactions as kindness. You may disagree with this assumption in the form of a normative claim. Descriptively, we think that cooperative and kind relations between parts, and between parts and the whole, tend to lead to more constructive interactions, and longer-term make people happier, more aligned and more agentic. A few overall notes: Different layers of agency can compete for power - e.g. for space to ...

The Nonlinear Library
EA - We can do better than argmax by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Oct 10, 2022 19:02


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We can do better than argmax, published by Jan Kulveit on October 10, 2022 on The Effective Altruism Forum. Summary: A much-discussed normative model of prioritisation in EA is akin to argmax (putting all resources on your top option). But this model often prescribes foolish things, so we rightly deviate from it – but in ad hoc ways. We describe a more principled approach: a kind of softmax, in which it is best to allocate resources to several options by confidence. This is a better yardstick when a whole community collaborates on impact; when some opportunities are fleeting or initially unknown; or when large actors are in play. Epistemic status: Relatively well-grounded in theory, though the analogy to formal methods is inexact. You could mentally replace “argmax” with “all-in” and “softmax” with “smooth” and still get the gist. Gavin wrote almost all of this one, based on Jan's idea. > many EAs' writings and statements are much more one-dimensional and “maximizy” than their actions. – Karnofsky Cause prioritisation is often talked about like this: Evaluate a small number of options (e.g. 50 causes); Estimate their {importance, tractability, and neglectedness} from expert point estimates; Give massive resources to the top option. You can see this as taking the argmax: as figuring out which input (e.g. "trying out AI safety"; "going to grad school") will get us the most output (expected impact). So call this argmax prioritisation (AP). AP beats the hell out of the standard procedure ("do what your teachers told you you were good at"; "do what polls well"). But it's a poor way to run a portfolio or community, because it only works when you're allocating marginal resources (e.g. one additional researcher); when your estimates of the effect or cost-effect are not changing fast; and when you already understand the whole action space. It serves pretty well in global health. But where these assumptions are severely violated, you want a different approach – and while alternatives are known in technical circles, they are less understood by the community at large. Problems with AP, construed naively: Monomania: the argmax function returns a single option; the winner takes all the resources. If people naively act under AP without coordinating, we get diminishing returns and decreased productivity (because of bottlenecks in the complements to adding people to a field, like ops and mentoring). Also, under plausible assumptions, the single cause it picks will be a poor fit for most people. To patch this, the community has responded with the genre "You should work on X instead of AI safety" or “Why X is actually the best way to help the long-term future”. We feel we need to justify not argmaxing, or to represent our thing as the true community argmax. And in practice justification often involves manipulating your own beliefs (to artificially lengthen your AI timelines, say), appealing to ad hoc principles like worldview diversification, or getting into arguments about the precise degree of crowdedness of alignment. Stuckness: Naive argmax gives no resources to exploration (because we assume at the outset that we know all the actions and have good enough estimates of their rank). As a result, decisions can get stuck at local maxima. The quest for "Cause X" is a meta patch for a lack of exploration in AP. 
Also, from experience, existing frameworks treat value-of-information as an afterthought, sometimes ignoring it entirely. Flickering: If the top two actions have similar utilities, small changes in the available information lead to endless costly jumps between options. (Maybe even cycles!) Given any realistic constraints about training costs or lags or bottlenecks, you really don't want to do this. This has actually happened in our experience, with some severe switching costs (yea...
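For readers who think in code, here is a minimal numerical sketch of the contrast the post draws (my own illustration: the estimated utilities and the temperature parameter are made up, and this is not the post's exact proposal). Argmax puts all resources on the single top option, while a softmax-style rule allocates across options in proportion to confidence; lowering the temperature recovers argmax-like behaviour, raising it spreads resources more evenly.
```python
import numpy as np

def argmax_allocation(utilities):
    # All resources go to the single highest-utility option.
    alloc = np.zeros_like(utilities, dtype=float)
    alloc[np.argmax(utilities)] = 1.0
    return alloc

def softmax_allocation(utilities, temperature=1.0):
    # Resources are split across options, weighted by estimated utility.
    # Lower temperature -> closer to argmax; higher -> closer to uniform.
    z = np.array(utilities, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()

estimated_utilities = [3.0, 2.8, 1.0]  # hypothetical cause-level estimates
print(argmax_allocation(estimated_utilities))        # [1. 0. 0.]
print(softmax_allocation(estimated_utilities, 0.5))  # most to the first option, a chunk to the second
```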

The Nonlinear Library
LW - We can do better than argmax by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Oct 10, 2022 0:20


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We can do better than argmax, published by Jan Kulveit on October 10, 2022 on LessWrong. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
EA - Limits to Legibility by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Jun 29, 2022 7:58


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Limits to Legibility, published by Jan Kulveit on June 29, 2022 on The Effective Altruism Forum. From time to time, someone makes the case for why transparency in reasoning is important. The latest conceptualization is Epistemic Legibility by Elizabeth, but the core concept is similar to reasoning transparency used by OpenPhil, and also has some similarity to A Sketch of Good Communication by Ben Pace. I'd like to offer a gentle pushback. The tl;dr is in my comment on Ben's post, but it seems useful enough for a standalone post. “How odd I can have all this inside me and to you it's just words.” ― David Foster Wallace When and why reasoning legibility is hard Say you demand transparent reasoning from AlphaGo. The algorithm has roughly two parts: tree search and a neural network. Tree search reasoning is naturally legible: the "argument" is simply a sequence of board states. In contrast, the neural network is mostly illegible - its output is a figurative "feeling" about how promising a position is, but that feeling depends on the aggregate experience of a huge number of games, and it is extremely difficult to explain transparently how a particular feeling depends on particular past experiences. So AlphaGo would be able to present part of its reasoning to you, but not the most important part. Human reasoning uses both: cognition similar to tree search (where the steps can be described, written down, and explained to someone else) and processes not amenable to introspection (which function essentially as a black box that produces a "feeling"). People sometimes call these latter signals “intuition”, “implicit knowledge”, “taste”, “S1 reasoning” and the like. Explicit reasoning often rides on top of this. Extending the machine learning metaphor, the problem with human interpretability is that "mastery" in a field often consists precisely in having some well-trained black box neural network that performs fairly opaque background computations. Bad things can happen when you demand explanations from black boxes. The second thesis is that it often makes sense to assume the mind runs distinct computational processes: one that actually makes decisions and reaches conclusions, and another that produces justifications and rationalizations. In my experience, if you have good introspective access to your own reasoning, you may occasionally notice that a conclusion C depends mainly on some black box, but at the same time, you generated a plausible legible argument A for the same conclusion after you reached the conclusion C. If you try running, say, Double Crux over such situations, you'll notice that even if someone refutes the explicit reasoning A, you won't quite change the conclusion to ¬C. The legible argument A was not the real crux. It is quite often the case that (A) is essentially fake (or low-weight), whereas the black box is hiding a reality-tracking model. Stretching the AlphaGo metaphor a bit: AlphaGo could be easily modified to find a few specific game "rollouts" that turned out to "explain" the mysterious signal from the neural network. Using tree search, it would produce a few specific examples of how such a position may evolve, which would be selected to agree with the neural net prediction. If AlphaGo showed them to you, it might convince you! 
But you would get a completely superficial understanding of why it evaluates the situation the way it does, or why it makes certain moves. Risks from the legibility norm: When you make a strong norm pushing for too straightforward "epistemic legibility", you risk several bad things: First, you increase the pressure on the "justification generator" to mask various black boxes by generating arguments supporting their conclusions. Second, you make individual people dumber. Imagine asking a Go grandmaster to transparently justify his mov...
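A toy sketch of the tree-search-plus-black-box picture (my own, not from the post; value_net, legal_moves and play are hypothetical stand-ins supplied by the caller): the line of moves the search returns is the legible part of the reasoning, while the leaf evaluations coming from the learned value_net remain opaque.
```python
def search(position, depth, value_net, legal_moves, play):
    """Depth-limited search. Returns (score, line_of_moves).

    The returned line of moves is the 'legible' part of the reasoning; the
    leaf scores come from value_net, an opaque learned evaluation whose
    judgements we cannot easily explain. Terminal positions and other game
    details are ignored for simplicity - this is only a sketch.
    """
    if depth == 0:
        return value_net(position), []
    best_score, best_line = float("-inf"), []
    for move in legal_moves(position):
        score, line = search(play(position, move), depth - 1,
                             value_net, legal_moves, play)
        score = -score  # opponent's best outcome is our worst (negamax convention)
        if score > best_score:
            best_score, best_line = score, [move] + line
    return best_score, best_line
```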

The Nonlinear Library
EA - Ways money can make things worse by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Jun 21, 2022 14:04


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ways money can make things worse, published by Jan Kulveit on June 21, 2022 on The Effective Altruism Forum. This text is trying to describe the most significant ways in which an influx of money can have adverse effects, and make a few specific suggestions on how to mitigate these effects. You may be interested in it if you are in a position where your decisions about money change the incentive landscape in some significant way. It is mostly not concerned about optics and PR, which seems to be getting a lot of attention already. While I started drafting this text before several recent popular posts on funding, and there is some overlap, it covers different observations and considerations, and ultimately aims for a different perspective. Note in advance several things which are not the purpose here: advise less spending, advise more spending, advise frugality, or express general worries. The purpose also isn't to perform a cost-benefit analysis, but just map the "indirect costs of funding". If you wish, suggest some terms which may be part of an impact analysis. Primary audience which I hope can benefit from it is the fast growing group of EAs who are deciding about substantial amounts of money in various roles such as grant evaluators, regrantors or leads of well funded projects. Adverse effects of funding The incentive landscape is always there, and makes some things easier or harder to do. The default incentive landscape is not great: in many ways, all of EA is trying to respond to the fact that some problems will not be solved by default. Adding money into the EA ecosystem changes and will change the landscape. This is great in some ways: some unfortunate gaps can be covered, and important things made easier to do. However, note that due to the fact we don't know what all the important things are, and adding incentives in one place often changes the landscape in other places, it's unlikely the resulting incentive landscape "with money" makes all the important things easier. In practice it seems likely some important things can actually get harder, and some not very good things are easier. While overall it's great we have more money to spend on fixing problems, we should also be tracking that money will sometimes change the incentive landscape in ways that are harmful overall. Tracking this, it seems useful to have a list of common negative effects. These brief adverse effect descriptions are sometimes supplemented by "stylized examples". These examples attempt to make it easier to imagine how the general adverse effects manifest in practice. They do not attempt to describe specific people or existing situations, although they may resemble them. Individual epistemic distortion: Strong incentives can be dangerous to good epistemics. From a predictive processing point of view, which is my personal best guess for a simple model of what human brains are doing, cognition lacks clear separation between "beliefs" and "goals". Accordingly, instrumental goals like "getting a large sum of money" do impact beliefs, by default. Research on motivated reasoning supports this claim with tons of empirical evidence. Stylized example: Young philosopher applies to an EA outreach program that comes with a large bounty. As a part of the application process, she is required to read a persuasive essay by one of the thought leaders of the EA movement. 
Part of the young philosopher's mind is highly suspicious of one reasoning step in the essay, but some other part assumes expressing criticism may be at odds with getting the bounty. The critical consideration never really comes to her conscious attention. Stylized example: An assistant professor in AI wants to have several PhDs funded. Hearing about the abundance of funding for AI safety research, he drafts a grant proposal arguing why the researc...

The Nonlinear Library
EA - Continuity Assumptions by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Jun 15, 2022 7:47


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Continuity Assumptions, published by Jan Kulveit on June 13, 2022 on The Effective Altruism Forum. This post will try to explain what I mean by continuity assumptions (and discontinuity assumptions), and why differences in these are upstream of many disagreements about AI safety. None of this is particularly new, but seemed worth reiterating in the form of a short post. What I mean by continuity Start with a cliff: This is sometimes called the Heaviside step function. This function is actually discontinuous, and so it can represent categorical change. In contrast, this cliff is actually continuous: From a distance or in low resolution, it looks like a step function; it gets really steep. Yet it is fundamentally different. How is this relevant? Over the years, I've come to the view that intuitions about whether we live in a "continuous" or "discontinuous" world are one of the few top principal components underlying all disagreements about AI safety. This includes but goes beyond the classical continuous vs. discrete takeoff debates. A lot of models of what can or can't work in AI alignment depends on intuitions about whether to expect "true discontinuities" or just "steep bits". This holds not just in one, but many relevant variables (e.g. the generality of the AI's reasoning, the speed or differentiability of its takeoff). The discrete intuition usually leads to sharp categories, pairing what lies before the cliff with what lies after it: non-general systems, which lack the core of general reasoning that allows thought in domains far from training data, versus general systems, whose capabilities generalise far; weak systems that won't kill you, but also won't help you solve alignment, versus strong systems that would help solve alignment, but unfortunately will kill you by default, if unaligned; systems which may be misaligned, but aren't competently deceptive about it, versus systems which are actively modelling you at a level where the deception is beyond your ability to notice; and weak acts versus pivotal acts. In Discrete World, empirical trends, alignment techniques, etc usually don't generalise across the categorical boundary. The right is far from the training distribution on the left. Your solutions don't survive the generality cliff, there are no fire alarms - and so on. Note that while assumptions about continuity in different dimensions are in principle not necessarily related, and you could e.g. assume continuity in takeoff and discontinuity in generality - in practice, they seem strongly correlated. Deep cruxes Deep priors over continuity versus discontinuity seem to be a crux which is hard to resolve. My guess is intuitions about continuity/discreteness are actually quite deep-seated: based more on how people do maths, rather than specific observations about the world. In practice, for most researchers, the "intuition" is something like a deep net trained on a whole lifetime of STEM reasoning - they won't update much on individual datapoints, and if they are smart, they are often able to re-interpret observations to be in line with their continuity priors. (As an example, compare Paul Christiano's post on takeoff speeds from 2018, which is heavily about continuity, to the debate between Paul and Eliezer in late 2021. Despite the participants spending years in discussion, progress on bridging the continuous-discrete gap between them seems very limited.) 
How continuity helps: In basically every case, continuity implies the existence of systems "somewhere in between". Systems which are moderately strong: maybe weakly superhuman in some relevant domain and quite general, but at the same time maybe still bad with plans on how to kill everyone. Moderately general systems: maybe capable of general reasoning, but in a strongly bounded way. Proto-deceptive systems which are bad at deception. The existence of such systems helps us with t...
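As a small illustration of the step-versus-steep-cliff distinction (my own sketch; the steepness parameter k is an arbitrary choice): the Heaviside step jumps straight from 0 to 1, while a steep logistic curve looks like a step from a distance but always passes through intermediate values - the "systems somewhere in between".
```python
import numpy as np

def heaviside_step(x):
    # Genuinely discontinuous: jumps from 0 to 1 at x = 0, nothing in between.
    return np.where(x < 0, 0.0, 1.0)

def steep_cliff(x, k=50.0):
    # Continuous logistic curve: for large k it looks like a step from a
    # distance, but it always takes intermediate values near the cliff.
    return 1.0 / (1.0 + np.exp(-k * x))

xs = np.array([-1.0, -0.1, -0.01, 0.0, 0.01, 0.1, 1.0])
print(heaviside_step(xs))  # only 0s and 1s
print(steep_cliff(xs))     # values strictly between 0 and 1 near x = 0
```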

The Nonlinear Library
AF - Announcing the Alignment of Complex Systems Research Group by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Jun 4, 2022 9:23


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing the Alignment of Complex Systems Research Group, published by Jan Kulveit on June 4, 2022 on The AI Alignment Forum. tl;dr: We're a new alignment research group based at Charles University, Prague. If you're interested in conceptual work on agency and the intersection of complex systems and AI alignment, we want to hear from you. Ideal for those who prefer an academic setting in Europe. What we're working on Start with the idea of an "alignment interface": the boundary between two systems with different goals: As others have pointed out, there's a whole new alignment problem at each interface. Existing work often focuses on one interface, bracketing out the rest of the world (e.g. the AI governance bracket). The standard single-interface approach assumes that the problems at each alignment interface are uncoupled (or at most weakly coupled). All other interfaces are bracketed out. A typical example of this would be a line of work oriented toward aligning the first powerful AGI system with its creators, assuming the AGI will solve the other alignment problems (e.g. "politics") well. Against this, we put significant probability mass on the alignment problems interacting strongly. For instance, we would expect proposals such as Truthful AI: Developing and governing AI that does not lie to interact strongly with problems at multiple interfaces. Or: alignment of a narrow AI system which would be good at modelling humans and at persuasion would likely interact strongly with politics and geopolitics. Overall, when we take this frame, it often highlights different problems than the single-interface agendas, or leads to a different emphasis when thinking about similar problems. (The nearest neighbours of this approach are the "multi-multi" programme of Critch and Krueger, parts of Eric Drexler's CAIS, parts of John Wentworth's approach to understanding agency, and possibly this.) If you broadly agree with the above, you might ask "That's nice – but what do you work on, specifically?" In this short intro, we'll illustrate with three central examples. We're planning longer writeups in coming months. Hierarchical agency Many systems have several levels which are sensibly described as an agent. For instance, a company and its employees can usually be well-modelled as agents. Similarly with social movements and their followers, or countries and their politicians. Hierarchical agency: while the focus of e.g. game theory is on "horizontal" relations (violet), our focus is on "vertical" relations, between composite agents and their parts. So situations where agents are composed of other agents are ubiquitous. A large amount of math describes the relations between agents at the same level of analysis: this is almost all of game theory. Thanks to these, we can reason about cooperation, defection, threats, correlated equilibria, and many other concepts more clearly. Call this tradition "horizontal game theory". We don't have a similarly good formalism for the relationship between a composite agent and its parts (superagent and subagent). Of course we can think about these relationships informally: for example, if I say "this group is turning its followers into cultists", we can parse this as a superagent modifying and exploiting its constituents in a way which makes them less "agenty", and the composite agent "more agenty". 
Or we can talk about "vertical conflicts" between, for example, a specific team in a company and the company as a whole. Here, both structures are "superagents" with respect to individual humans, and one of the resources they fight over is the loyalty of individual humans. What we want is a formalism good for thinking about both upward and downward intentionality. Existing formalisms like social choice theory often focus on just one direction - for example, the...
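To make the contrast concrete, here is a minimal sketch (ours, not from the post) of what "horizontal game theory" already handles: two agents at the same level, a payoff matrix, and a check for pure-strategy Nash equilibria. The game and numbers are the standard Prisoner's Dilemma, used purely as illustration; nothing in this formalism lets one player be composed of the other, which is exactly the gap the post points at.

# Illustrative sketch only (not from the original post): "horizontal" game theory
# for two same-level agents, using the Prisoner's Dilemma payoff matrix.
import itertools

actions = ["cooperate", "defect"]
payoffs = {  # (row player's payoff, column player's payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}

def is_pure_nash(row_action, col_action):
    # A profile is a pure Nash equilibrium if neither player gains by deviating alone.
    row_payoff, col_payoff = payoffs[(row_action, col_action)]
    row_best = all(payoffs[(a, col_action)][0] <= row_payoff for a in actions)
    col_best = all(payoffs[(row_action, a)][1] <= col_payoff for a in actions)
    return row_best and col_best

equilibria = [p for p in itertools.product(actions, actions) if is_pure_nash(*p)]
print(equilibria)  # [('defect', 'defect')]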

The Nonlinear Library
EA - Different forms of capital by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Apr 25, 2022 2:45


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Different forms of capital, published by Jan Kulveit on April 25, 2022 on The Effective Altruism Forum. Currently, the EA movement tracks human capital and financial capital, using metrics like "number of engaged EAs" or "amount of money pledged toward EA causes". But other forms of capital seem as important, less understood, and less measured. Plausibly, part of the explanation for the neglect is measurability bias (a streetlight effect). Network capital: Network capital is the existence and strength of links between people in a social network. Links can have different forms; a very basic one is the ability to get another party's attention and time, or tacit permission to reach out to them. Other types can include trust, degree of ability to model the other party, and so on. It would be good to think about what kind of network capital the movement is lacking and what kind will be useful in future, but also how we can find the shortest paths through implicit networks that are not available as data. In some sense the standard EA focus on broad career capital, recruitment from elite schools, and elite expertise already builds a lot of network capital. What seems less clear is the ability to effectively use it to do good. On one occasion, a small group of EAs went through the list of people Dominic Cummings follows on Twitter, and found that we had met or worked with ⅓ of them. Example: the EA bet on the civil service. A main effect of EAs entering and climbing the civil service is reducing the rest of EA's distance from power centres. Each EA civil servant is then a network capital multiplier for the rest of us. A counterpoint is that this will tend to reduce the distance from x people to 3 people, but in catastrophes it is far better to go from x to 1. (That is, EA → minister.) Structural capital: Structural capital is the ability of the holder to absorb resources (e.g. people or money) and turn them into useful things. It takes various forms: functional and scalable processes, competent management, suitable legal status and backing, good operations support, well-designed spaces, well-written code. On this framing, it may make sense to ask questions like: How much of these forms of capital do we have? How are they distributed? When we are converting between different forms, or substituting one form of capital with another, what are the conversion rates? Are we using the different forms of capital efficiently? This is part of a series explaining my part in the EA response to COVID, my reasons for switching from AI alignment work to the COVID response for a full year, and some new ideas the experience gave me. It was co-written with Gavin Leech. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
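As a toy illustration of the "network distance" idea above (our sketch, not from the post), the contact graph below is entirely hypothetical; the point is only that distance to a decision-maker can be computed as a shortest path, and that each well-placed person shortens that path for everyone connected to them.

# Hypothetical contact graph; illustrates "network distance" as a shortest path.
from collections import deque

contacts = {
    "ea_researcher": ["civil_servant", "grantmaker"],
    "civil_servant": ["ea_researcher", "ministry_advisor"],
    "grantmaker": ["ea_researcher"],
    "ministry_advisor": ["civil_servant", "minister"],
    "minister": ["ministry_advisor"],
}

def network_distance(graph, start, target):
    # Breadth-first search: the number of introductions needed to reach `target`.
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # unreachable

print(network_distance(contacts, "ea_researcher", "minister"))  # 3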

The Nonlinear Library
EA - Case for emergency response teams by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Apr 5, 2022 10:11


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Case for emergency response teams, published by Jan Kulveit on April 5, 2022 on The Effective Altruism Forum. This project is seeking a paid coordinator. Please apply here. Motivation: See Hinges and crises for an expanded explanation. The hinge of history: So far, long-termist efforts to change the trajectory of the world focus on far-off events. This is on the assumption that we foresee some important problem and influence its outcome by working on the problem for longer. We thus start working on it sooner than others, we lay the groundwork for future research, we raise awareness, and so on. Many longtermists propose that we now live at the "hinge of history", usually understood on the timescale of critical centuries, or critical decades. But "hinginess" is likely not constant: some short periods will be significantly more eventful than others. It is also possible that these periods will present even more leveraged opportunities for changing the world's trajectory. These "maximally hingey" moments might be best influenced by sustained efforts long before them (as described above). But it seems plausible that in many cases, the best realistic chance to influence them is "while they are happening", via a concentrated effort at that moment. Some reasons for this: decreased problems with cluelessness, an increase in the resources spent on the problem, actual decisions being made by powerful actors, and so on. Crises as times of opportunity: Even if a specific crisis is not a "hinge of history", crises often bring opportunities to change the established order. For example, policies well outside the Overton window can suddenly become live options, and there can be rapid intellectual development in disciplines and technologies (think of the sudden intense focus on reproducibility in epidemiology). These effects often persist for decades after the initial event (think of taking off your shoes at the airport; think of face masks in post-SARS countries) - and so shaping the response is of interest to longtermists. How to impact hinges or crises: Acting effectively during hinges or crises may depend on factors such as "do we have a relevant policy proposal in the drawer?", "do we have a team of experts able to advise?" or "do we have a relevant network?" It is possible to prepare these! This document proposes the creation of an "emergency response team" as one sensible preparation. We tested the above principles during the COVID pandemic, by launching a good-sized research and policy effort, Epidemic Forecasting (EpiFor). We had some success, with associated members advising legislators and international bodies and the associated research getting into top journals and so reaching millions of people. Some takeaways: There are various paths to impact. COVID illustrated an opportunity for small teams: there are worlds where what's needed is being able to think clearly and do research really fast. Often our main bottleneck was project managers, particularly people with both research skills and PM skills. Also, EpiFor was only effective because core members already knew each other from FHI RSP or CZEA. And we could have been much more effective if the team had "trained" together before COVID, to sort out differences in management styles, communication, commitment, etc.
How to improve longtermists' emergency response capabilities: Is it possible to fund, train, and sustain a longtermist response team in the absence of a current emergency? Perhaps - but for the people we most want on board, the opportunity costs of being thus "benched" might be too high. A more viable alternative is a reserve, or standing army: a team of researchers and managers who are "on call" for a future emergency, undergoing annual wargaming or similar refreshers to maintain readiness. What the team ideally should have: existing e...

The Nonlinear Library
LW - Hinges and crises by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 29, 2022 5:52


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Hinges and crises, published by Jan Kulveit on March 29, 2022 on LessWrong. Crossposted from EA forum. The second post in the sequence covers the importance of crises, argues for crises as opportunities, and makes the claim that this community is currently better at acting with longer timescale OODA loops but lacks skills and capabilities to act with short OODA loops. We often talk about the hinge of history, a period of high influence over the whole future trajectory of life. If we grant that our century is such a hinge, it's unlikely that the "hinginess" is distributed uniformly across the century; instead, it seems much more likely it will be concentrated to some particular decades, years, and months, which will have much larger influence. It also seems likely that some of these "hingy" periods will look eventful and be understood as crises at the time. So understanding crises, and the ability to act during crises, may be particularly important for influencing the long-term future. The first post in this sequence mentioned my main reason to work on COVID: it let me test my models of the world, and so informed my longtermist work. This post presents some other reasons, related to the above argument about hinges. None of these reasons would have been sufficient for me personally on their own, but they still carry weight, and should be sufficient for others in the next crisis. An exemplar crisis with a timescale of months: COVID has commonalities with some existential risk scenarios. (See Krakovna.) Lessons from it could transfer to risks in which: the crisis unfolds over a similar timescale (weeks or years, rather than seconds or hours), governments have some role, the risk is at least partially visible, the general population is engaged in some way. This makes COVID a more useful comparison for versions of continuous AI takeoff where governments are struggling to understand an unfolding situation, but in which they have options to act and/or regulate. Similarly, it is a useful model for versions of any x-risk where a large fraction of academia suddenly focuses on a topic previously studied by a small group, and resources spent on the topic increase by many orders of magnitude. This emergency research push is likely in scenarios with a warning shot or sufficiently loud fire alarm that gets noticed by academia. On the other hand, lessons learned from COVID will be correspondingly less useful for cases where few of the above assumptions hold (e.g. "an AI in a box bursts out in an intelligence explosion on the timescale of hours"). Crisis and opportunity: Crises often bring opportunities to change the established order, and, for example, policy options that were outside the Overton window can suddenly become real. (This was noted pre-COVID by Anders Sandberg.) There can also be rapid developments in relevant disciplines and technologies. Some examples of Overton shifts during COVID include: total border closures (in the West), large-scale and prolonged stay-at-home orders, mask mandates, unconditional payouts to large fractions of the population, and automatic data-driven control policies. Technology developments include the familiar new vaccine platforms (mRNA, DNA) going to production, massive deployment of rapid tests, and the unprecedented use of digital contact tracing.
(Note that many other opportunities which opened up were not acted on.) Taking advantage of such opportunities may depend on factors such as "do we have a relevant policy proposal in the drawer?", "do we have a team of experts able to advise?" or "do we have a relevant network?". These can be prepared in advance. Default example for humanity thinking about large-scale risk: COVID will likely become the go-to example of a large-scale, seemingly low-probability risk we were unprepared for. The ability to...

The Nonlinear Library: LessWrong
LW - Hinges and crises by Jan Kulveit

The Nonlinear Library: LessWrong

Play Episode Listen Later Mar 29, 2022 5:52


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Hinges and crises, published by Jan Kulveit on March 29, 2022 on LessWrong. Crossposted from EA forum. The second post in the sequence covers the importance of crises, argues for crises as opportunities, and makes the claim that this community is currently better at acting with longer timescale OODA loops but lacks skills and capabilities to act with short OODA loops. We often talk about the hinge of history, a period of high influence over the whole future trajectory of life. If we grant that our century is such a hinge, it's unlikely that the "hinginess" is distributed uniformly across the century; instead, it seems much more likely it will be concentrated to some particular decades, years, and months, which will have much larger influence. It also seems likely that some of these "hingy" periods will look eventful and be understood as crises at the time. So understanding crises, and the ability to act during crises, may be particularly important for influencing the long-term future. The first post in this sequence mentioned my main reason to work on COVID: it let me test my models of the world, and so informed my longtermist work. This post presents some other reasons, related to the above argument about hinges. None of these reasons would have been sufficient for me personally on their own, but they still carry weight, and should be sufficient for others in the next crisis. An exemplar crisis with a timescale of months: COVID has commonalities with some existential risk scenarios. (See Krakovna.) Lessons from it could transfer to risks in which: the crisis unfolds over a similar timescale (weeks or years, rather than seconds or hours), governments have some role, the risk is at least partially visible, the general population is engaged in some way. This makes COVID a more useful comparison for versions of continuous AI takeoff where governments are struggling to understand an unfolding situation, but in which they have options to act and/or regulate. Similarly, it is a useful model for versions of any x-risk where a large fraction of academia suddenly focuses on a topic previously studied by a small group, and resources spent on the topic increase by many orders of magnitude. This emergency research push is likely in scenarios with a warning shot or sufficiently loud fire alarm that gets noticed by academia. On the other hand, lessons learned from COVID will be correspondingly less useful for cases where few of the above assumptions hold (e.g. "an AI in a box bursts out in an intelligence explosion on the timescale of hours"). Crisis and opportunity: Crises often bring opportunities to change the established order, and, for example, policy options that were outside the Overton window can suddenly become real. (This was noted pre-COVID by Anders Sandberg.) There can also be rapid developments in relevant disciplines and technologies. Some examples of Overton shifts during COVID include: total border closures (in the West), large-scale and prolonged stay-at-home orders, mask mandates, unconditional payouts to large fractions of the population, and automatic data-driven control policies. Technology developments include the familiar new vaccine platforms (mRNA, DNA) going to production, massive deployment of rapid tests, and the unprecedented use of digital contact tracing.
(Note that many other opportunities which opened up were not acted on.) Taking advantage of such opportunities may depend on factors such as "do we have a relevant policy proposal in the drawer?", "do we have a team of experts able to advise?" or "do we have a relevant network?". These can be prepared in advance. Default example for humanity thinking about large-scale risk: COVID will likely become the go-to example of a large-scale, seemingly low-probability risk we were unprepared for. The ability to...

The Nonlinear Library
LW - Experimental longtermism: theory needs data by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 25, 2022 6:54


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Experimental longtermism: theory needs data, published by Jan Kulveit on March 24, 2022 on LessWrong. The series explains my part in the response to COVID, my reasons for switching from AI alignment work to the COVID response for a full year, and some new ideas the experience gave me. While it is written from my (Jan Kulveit's) personal perspective, I co-wrote the text with Gavin Leech, with input from many others. The first post covers my main motivation: experimental longtermism. Feedback loop: Possibly the main problem with longtermism and x-risk reduction is the weak and slow feedback loop. (You work on AI safety; at some unknown time in the future, an existential catastrophe happens, or doesn't happen, as a result of your work, or not as a result of your work.) Most longtermists and existential risk people openly admit that the area doesn't have good feedback loops. Still, I think the community at large underappreciates how epistemically tricky our situation is. Disciplines that lack feedback from reality are exactly the ones that can easily go astray. But most longtermist work is based on models of how the world works - or doesn't work. These models try to explain why such large risks are neglected, the ways institutions like government or academia are inadequate, how various biases influence public perception and decision making, how governments work during crises, and so on. Based on these models, we take further steps (e.g. writing posts like this, uncovering true statements in decision theory, founding organisations, working at AI labs, going into policy, or organising conferences where we explain to others why we believe the long-term future is important and x-risk is real). Covid as opportunity. Claim: COVID presented an unusually clear opportunity to put some of our models and theory in touch with reality, thus getting more "experimental" data than is usually possible, while at the same time helping to deal with the pandemic. The impact of the actions I mentioned above is often unclear even after many years, whereas in the case of COVID the impact of similar actions was observable within weeks and months. For me personally, there was one more pull. My background is in physics, and in many ways, I still think like a physicist. Physics - in contrast to most of maths and philosophy - has the advantage of being able to put its models in touch with reality, and to use this signal as an important driver in finding out what's true. In modern maths, (basically) whatever is consistent is true, and a guiding principle for what's important to work on is a sense of beauty. To a large extent, the feedback signal in philosophy is what other philosophers think. (Except when a philosophy turns into a political movement - then the signal comes from outcomes such as greater happiness, improved governance, large death tolls, etc.) In both maths and philosophy, the core computation mostly happens "in" humans. Physics has the advantage that in its experiments, "reality itself" does the computation for us. I miss this feedback from reality in my x-risk work. Note that many of the concrete things longtermists do, like posting on the Alignment Forum or explaining things at conferences, actually do have feedback loops.
But these are usually more like maths or philosophy: they provide social feedback, including intuitions about what kinds of research are valuable. One may wonder about the problems with these feedback loops, and what kind of blind-spots or biases they entail. At the beginning of the COVID crisis, it seemed to me that some of our "longtermist" models were making fairly strong predictions about specific things that would fail - particularly about inadequate research support for executive decision-making. After some hesitation, I decided that if I trusted these models for x-risk...

The Nonlinear Library: LessWrong
LW - Experimental longtermism: theory needs data by Jan Kulveit

The Nonlinear Library: LessWrong

Play Episode Listen Later Mar 25, 2022 6:54


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Experimental longtermism: theory needs data, published by Jan Kulveit on March 24, 2022 on LessWrong. The series explains my part in the response to COVID, my reasons for switching from AI alignment work to the COVID response for a full year, and some new ideas the experience gave me. While it is written from my (Jan Kulveit's) personal perspective, I co-wrote the text with Gavin Leech, with input from many others. The first post covers my main motivation: experimental longtermism. Feedback loop: Possibly the main problem with longtermism and x-risk reduction is the weak and slow feedback loop. (You work on AI safety; at some unknown time in the future, an existential catastrophe happens, or doesn't happen, as a result of your work, or not as a result of your work.) Most longtermists and existential risk people openly admit that the area doesn't have good feedback loops. Still, I think the community at large underappreciates how epistemically tricky our situation is. Disciplines that lack feedback from reality are exactly the ones that can easily go astray. But most longtermist work is based on models of how the world works - or doesn't work. These models try to explain why such large risks are neglected, the ways institutions like government or academia are inadequate, how various biases influence public perception and decision making, how governments work during crises, and so on. Based on these models, we take further steps (e.g. writing posts like this, uncovering true statements in decision theory, founding organisations, working at AI labs, going into policy, or organising conferences where we explain to others why we believe the long-term future is important and x-risk is real). Covid as opportunity. Claim: COVID presented an unusually clear opportunity to put some of our models and theory in touch with reality, thus getting more "experimental" data than is usually possible, while at the same time helping to deal with the pandemic. The impact of the actions I mentioned above is often unclear even after many years, whereas in the case of COVID the impact of similar actions was observable within weeks and months. For me personally, there was one more pull. My background is in physics, and in many ways, I still think like a physicist. Physics - in contrast to most of maths and philosophy - has the advantage of being able to put its models in touch with reality, and to use this signal as an important driver in finding out what's true. In modern maths, (basically) whatever is consistent is true, and a guiding principle for what's important to work on is a sense of beauty. To a large extent, the feedback signal in philosophy is what other philosophers think. (Except when a philosophy turns into a political movement - then the signal comes from outcomes such as greater happiness, improved governance, large death tolls, etc.) In both maths and philosophy, the core computation mostly happens "in" humans. Physics has the advantage that in its experiments, "reality itself" does the computation for us. I miss this feedback from reality in my x-risk work. Note that many of the concrete things longtermists do, like posting on the Alignment Forum or explaining things at conferences, actually do have feedback loops.
But these are usually more like maths or philosophy: they provide social feedback, including intuitions about what kinds of research are valuable. One may wonder about the problems with these feedback loops, and what kind of blind-spots or biases they entail. At the beginning of the COVID crisis, it seemed to me that some of our "longtermist" models were making fairly strong predictions about specific things that would fail - particularly about inadequate research support for executive decision-making. After some hesitation, I decided that if I trusted these models for x-risk...

The Nonlinear Library
EA - How we failed by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 23, 2022 8:07


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How we failed, published by Jan Kulveit on March 23, 2022 on The Effective Altruism Forum. Here "we" means the broader EA and rationalist communities. Learning from failures is just as important as learning from successes. This post describes a few cases of mistakes and failures which seem interesting from the learning perspective. Failure to improve US policy in the first wave: Early in the pandemic, the rationality community was right about important things (e.g. masks), often very early on. We won the epistemic fight, and the personal instrumental fight. (Consider our masks discourse, or Microcovid use, or rapid testing to see our friends.) At the same time, the network distance from the core of the community to e.g. people in the CDC is not that large: OpenPhil is a major funder for the Johns Hopkins Center for Health Security, and the network distance from CHS to top public health institutions is presumably small. As such, we could say the US rationality community failed instrumentally, given their apparent short distance to influence. Contrast this with the influence of LessWrong on UK policy via a single reader, Dominic Cummings. Or our influence on Czech policy (see What we tried). Also, contrast this with the #masks4all movement. This movement was founded by the writer and speaker Petr Ludwig (pre-COVID, he was developing something like his own version of rationality and critical thinking, independent of the LessWrong cluster). After the success of grass-roots activity in DIY mask-making in Czechia, which led to the whole country starting to use masks within a week or so, he tried to export the "masks work" and "you can make them at home" meme globally. While counterfactuals are hard, this seems a major success, likely speeding up mask uptake in Western countries by weeks (consider the Marginal Revolution praise). Where was the "microcovid calculator for countries"? (See EpiFor funding and medium-range models.) Overreaction through 2021: Personal reactions within the community in February 2020 were sometimes exemplary; apparent overshoots (like copper tape on surfaces, postal package quarantines) were reasonable ex ante. But the community was slow to update its behaviour in response to improved estimates of the infection-fatality rate, long COVID, and knowledge of aerosol and droplet transmission. Anecdotally, the community was slow to return to relatively safe things like outdoor activities, even after full vaccination. While your risk tolerance is a given quantity in decision-making, my impression is that many people's behaviour did not match the risk tolerance implicit in their other daily habits. Inverse gullibility: Many large institutions had a bad track record during the pandemic, and the heuristic "do not update much from their announcements" served many people well. However, in places the community went beyond this, to non-updates from credible sources and anti-updates from institutions. Gullibility is believing things excessively: taking the claim "studies show" as certain. Inverse gullibility is disbelieving things excessively: failing to update at all when warranted, or even to invert the update. Example: the COVID research summaries of the FDA are often good; it's in the policy guidance section that the wheels come off. But I often see people write off all FDA material.
My broad guess is that people who run off simple models like "governments are evil and lie to you" are very unlikely to be able to model governments well, and very unlikely to understand parts of the solution space where governments can do important and useful things. Failing to persuade reviewers that important methods were valid: More locally to our research, we used skilled forecasters for many of our estimates, as an input for our hybrid of mathematical and judgmental prediction. ...
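One way to make "gullibility" and "inverse gullibility" quantitative (our framing, not the post's) is as mis-sized Bayesian updates on a report from a source of known, imperfect reliability. The numbers below are invented purely for illustration.

# Toy Bayesian framing of gullibility vs. inverse gullibility; numbers are invented.
def posterior(prior, p_report_if_true, p_report_if_false):
    # P(claim is true | the source reports it), by Bayes' rule.
    numerator = p_report_if_true * prior
    return numerator / (numerator + p_report_if_false * (1 - prior))

prior = 0.30  # prior credence in some claim before the institution weighs in

# A moderately reliable institution reports the claim as true.
proper_update = posterior(prior, p_report_if_true=0.8, p_report_if_false=0.3)

gullible = 1.0            # treating "studies show" as certainty
inverse_gullible = prior  # refusing to update at all (or updating the wrong way)

print(round(proper_update, 2), gullible, inverse_gullible)  # 0.53 1.0 0.3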

The Nonlinear Library
EA - What we tried (Covid response AMA) by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 22, 2022 16:21


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What we tried (Covid response AMA), published by Jan Kulveit on March 21, 2022 on The Effective Altruism Forum. This post is part of a series explaining my part in the EA response to COVID, my reasons for switching from AI alignment work to the COVID response for a full year, and some new ideas the experience gave me. Here "we" means Jan K, EpidemicForecasting, people who came together to advise the Czech government, plus a large cast of EA researchers. This was almost always a collective effort, and when I (Jan K) write from a first-person perspective I am not claiming all credit. This post gives a rough idea of the experiences and data I base the conclusions of the other posts on. It summarises the main pieces of work we did in response to COVID. The goal is not to give a comprehensive description of all we did, nor of all the object-level lessons learned. It's not that there isn't much to say: in my estimation the efforts described here represent something between a "large portion" and a "majority" of the impact that effective altruists had on the COVID crisis in its first 18 months (although some other efforts were more visible in the EA community). But a full object-level analysis would be extremely long and distract from the more important generalisable conclusions. Also, some things are limited by confidentiality agreements. This is often the price of giving advice at the highest level: if you want to participate in important conversations, you can't leak content from them. Our work can be roughly divided into three clusters: the EpidemicForecasting.org project, advising one European government, and public communications in one country. EpidemicForecasting.org project: EpiFor was a project founded in the expectation that epidemic modelling as a discipline would fail to support major executive decisions for a number of specific reasons. For example, I expected: Because of the "fog of war", information would not reach the right places. The epidemic modelling field had a culture of not sharing raw model results publicly, but models and estimates need to be public to allow iteration, criticism, and validation. Academic modelling would focus on advising high-prestige customers such as US CDC or UK SAGE/SPI-M. So for the majority of the world population (those in developing countries), decision-makers would be underprovided with forecasts. Academic epidemic modelling would neglect user interfaces optimised for decision making: no multi-model comparisons (i.e. showing the results of several different teams side by side), or at least no comparison usable by nonspecialists. It is very difficult to know what to believe when you only have one model, and your decisions become nonrobust. The data would be bad. In practice, modellers used the wrong input data - for example, "number of confirmed cases" instead of "estimate of actual infected cases". (This is besides the data being incredibly noisy.) By default, even many good models would not be presented as an input suitable for decision making. To make decisions, you need scenarios ("if we do x, then this will happen"). So predictions that already implicitly include an estimate of how decisions will be made in their prediction of the outcomes are hard to interpret. (This severely limited the usability of e.g. the Metaculus forecasts.) Many models wouldn't handle uncertainty well.
Uncertainty over key input parameters needs to be propagated into output predictions, and should be clearly presented. (In hindsight, this was all correct.) My guess was that these problems would lead to many deaths - certainly more than thousands, maybe millions - by failing to improve decisions which would by default be made in an ill-informed, slow and chaotic manner, on the basis of scarce and noisy data and justly untrusted, non-robust models. After about two weeks of trying...
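To illustrate two of the points above (scenario-style outputs, "if we do x, then this will happen", and propagating uncertainty over key input parameters into the outputs), here is a deliberately minimal sketch. It is not the EpidemicForecasting.org model: the SIR dynamics, parameter ranges, and scenario labels are invented for illustration, and the uncertainty is propagated by simple Monte Carlo sampling over R0.

# Minimal sketch only; NOT the EpidemicForecasting.org model. All parameters invented.
import random

def sir_peak_infected(r0, days=365, gamma=1/7, population=1_000_000, seed_cases=100):
    # Discrete-time SIR; returns the peak number of simultaneously infected people.
    beta = r0 * gamma
    s, i, r = population - seed_cases, seed_cases, 0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        peak = max(peak, i)
    return peak

def scenario(label, r0_low, r0_high, runs=1000):
    # "If we do x, then this will happen": propagate uncertainty in R0 by sampling it.
    peaks = sorted(sir_peak_infected(random.uniform(r0_low, r0_high)) for _ in range(runs))
    print(f"{label}: median peak {peaks[runs // 2]:,.0f}, "
          f"90% interval {peaks[runs // 20]:,.0f} to {peaks[-runs // 20]:,.0f}")

random.seed(0)
scenario("no intervention", r0_low=2.0, r0_high=3.0)
scenario("strong mitigation", r0_low=0.9, r0_high=1.4)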

The Nonlinear Library
EA - Crises are periods of increased hinginess by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 17, 2022 5:43


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Crises are periods of increased hinginess, published by Jan Kulveit on March 17, 2022 on The Effective Altruism Forum. The second post in the sequence covers a group of motivations related to crises. We often talk about the hinge of history, a period of high influence over the whole future trajectory of life. If we grant that our century is such a hinge, it's unlikely that the "hinginess" is distributed uniformly across the century; instead, it seems much more likely it will be concentrated to some particular decades, years, and months, which will have much larger influence. It also seems likely that some of these "hingy" periods will look eventful and be understood as crises at the time. So understanding crises, and the ability to act during crises, may be particularly important for influencing the long-term future. The first post in this sequence mentioned my main reason to work on COVID: it let me test my models of the world, and so informed my longtermist work. This post presents some other reasons, related to the above argument about hinges. None of these reasons would have been sufficient for me personally on their own, but they still carry weight, and should be sufficient for others in the next crisis. An exemplar crisis with a timescale of months: COVID has commonalities with some existential risk scenarios. (See Krakovna.) Lessons from it could transfer to risks in which: the crisis unfolds over a similar timescale (weeks or years, rather than seconds or hours), governments have some role, the risk is at least partially visible, the general population is engaged in some way. This makes COVID a more useful comparison for versions of continuous AI takeoff where governments are struggling to understand an unfolding situation, but in which they have options to act and/or regulate. Similarly, it is a useful model for versions of any x-risk where a large fraction of academia suddenly focuses on a topic previously studied by a small group, and resources spent on the topic increase by many orders of magnitude. This emergency research push is likely in scenarios with a warning shot or sufficiently loud fire alarm that gets noticed by academia. On the other hand, lessons learned from COVID will be correspondingly less useful for cases where few of the above assumptions hold (e.g. "an AI in a box bursts out in an intelligence explosion on the timescale of hours"). Crisis and opportunity: Crises often bring opportunities to change the established order, and, for example, policy options that were outside the Overton window can suddenly become real. (This was noted pre-COVID by Anders Sandberg.) There can also be rapid developments in relevant disciplines and technologies. Some examples of Overton shifts during COVID include: total border closures (in the West), large-scale and prolonged stay-at-home orders, mask mandates, unconditional payouts to large fractions of the population, and automatic data-driven control policies. Technology developments include the familiar new vaccine platforms (mRNA, DNA) going to production, massive deployment of rapid tests, and the unprecedented use of digital contact tracing. (Note that many other opportunities which opened up were not acted on.)
Taking advantage of such opportunities may depend on factors such as "do we have a relevant policy proposal in the drawer?", "do we have a team of experts able to advise?" or "do we have a relevant network?". These can be prepared in advance. Default example for humanity thinking about large-scale risk: COVID will likely become the go-to example of a large-scale, seemingly low-probability risk we were unprepared for. The ability to shape narratives and attention around COVID could be important for the broader problem of how humanity should deal with other such risks. While there is a clear philosoph...

The Nonlinear Library
EA - Experimental longtermism: theory needs data by Jan Kulveit

The Nonlinear Library

Play Episode Listen Later Mar 15, 2022 6:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Experimental longtermism: theory needs data, published by Jan Kulveit on March 15, 2022 on The Effective Altruism Forum. This series explains my part in the EA response to COVID, my reasons for switching from AI alignment work to the COVID response for a full year, and some new ideas the experience gave me. While it is written from my (Jan's) personal perspective, I co-wrote the text with Gavin Leech, with input from many others. The first post covers my main motivation: experimental longtermism. Feedback loop: Possibly the main problem with longtermism and x-risk reduction is the weak and slow feedback loop. (You work on AI safety; at some unknown time in the future, an existential catastrophe happens, or doesn't happen, as a result of your work, or not as a result of your work.) Most longtermists and existential risk people openly admit that the area doesn't have good feedback loops. Still, I think the community at large underappreciates how epistemically tricky our situation is. Disciplines that lack feedback from reality are exactly the ones that can easily go astray. But most longtermist work is based on models of how the world works - or doesn't work. These models try to explain why such large risks are neglected, the ways institutions like government or academia are inadequate, how various biases influence public perception and decision making, how governments work during crises, and so on. Based on these models, we take further steps (e.g. writing posts like this, uncovering true statements in decision theory, founding organisations, working at AI labs, going into policy, or organising conferences where we explain to others why we believe the long-term future is important and x-risk is real). Covid as opportunity. Claim: COVID presented an unusually clear opportunity to put some of our models and theory in touch with reality, thus getting more "experimental" data than is usually possible, while at the same time helping to deal with the pandemic. The impact of the actions I mentioned above is often unclear even after many years, whereas in the case of COVID the impact of similar actions was observable within weeks and months. For me personally, there was one more pull. My background is in physics, and in many ways, I still think like a physicist. Physics - in contrast to most of maths and philosophy - has the advantage of being able to put its models in touch with reality, and to use this signal as an important driver in finding out what's true. In modern maths, (basically) whatever is consistent is true, and a guiding principle for what's important to work on is a sense of beauty. To a large extent, the feedback signal in philosophy is what other philosophers think. (Except when a philosophy turns into a political movement - then the signal comes from outcomes such as greater happiness, improved governance, large death tolls, etc.) In both maths and philosophy, the core computation mostly happens "in" humans. Physics has the advantage that in its experiments, "reality itself" does the computation for us. I miss this feedback from reality in my x-risk work. Note that many of the concrete things longtermists do, like posting on the Alignment Forum or explaining things at conferences, actually do have feedback loops.
But these are usually more like maths or philosophy: they provide social feedback, including intuitions about what kinds of research are valuable. One may wonder about the problems with these feedback loops, and what kind of blind-spots or biases they entail. At the beginning of the COVID crisis, it seemed to me that some of our "longtermist" models were making fairly strong predictions about specific things that would fail - particularly about inadequate research support for executive decision-making. After some hesitation, I decided that if I trusted these mo...

Názory a argumenty
Petr Šabata: Europe worries about the Omicron variant while Fiala's government hesitates

Názory a argumenty

Play Episode Listen Later Dec 20, 2021 3:18


Now the citizens of Czechia can only hope that the new government of Petr Fiala (ODS) knows what it is doing when it refuses to extend the state of emergency and is not preparing new restrictions for the holidays. Jan Kulveit of the University of Oxford and a member of the MeSES group wrote that "signalling that 'the situation is improving' is insane", and added, with regard to the far more contagious Omicron variant, that on the contrary "the situation is highly risky; measures need to be strengthened, not relaxed and abolished".

The Nonlinear Library: EA Forum Top Posts
Donor Lottery Debrief by TimothyTelleenLawton

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 12, 2021 8:27


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Donor Lottery Debrief, published by TimothyTelleenLawton on the Effective Altruism Forum. Good news, I've finally allocated the rest of the donor lottery funds from the 2016-2017 Donor Lottery (the first one in our community)! It took over 3 years but I'm excited about the two projects I funded. It probably goes without saying, but this post is about an independent project and does not represent CFAR (where I work). This post contains several updates related to the donor lottery: my first donation ($25k to The Czech Association for Effective Altruism (CZEA) in 2017), my second and final donation ($23.5k to Epidemic Forecasting (EpiFor) in 2020), a look back on the Donor Lottery, and a call for more projects like these to fund. CZEA: My previous comments on the original donor lottery post share the basics of how the first $25k was used for CZEA (this was $5k more than I was originally planning to donate due to transfer efficiency considerations). Looking back now, I believe that donation likely had a strong impact on EA community building. My donation was the largest that CZEA had received (I think they had previously received one other large donation, about half the size) and it was enough for CZEA to transition from a purely volunteer organization into a partially-professional organization (1 FTE, plus volunteers). Based on conversations with Jan Kulveit, I believe it would have taken at least 8 more months for CZEA to professionalize otherwise. I believe that in the time they bought with the donation, they were able to more easily secure substantial funding from CEA and other funders, as well as scale up several compelling initiatives: co-organizing Human-aligned AI Summer School, AI Safety Research Program, and a Community Building Retreat (with CEA). I also have been glad to see a handful of people get involved with EA and Rationality through CZEA, and I think the movement is stronger with them. To pick an example familiar to me, several CZEA leaders were recently part of CFAR's Instructor Training Program: Daniel Hynk (Co-founder of CZEA), Jan Kulveit (Senior Research Scholar at FHI), Tomáš Gavenčiak (Independent Researcher who has been funded by EA Grants), and Irena Kotíková (President of CZEA). For more detail on CZEA's early history and the impact of the donor lottery funds (and other influences), see this detailed account. EpiFor: In late April 2020, I heard about Epidemic Forecasting, a project launched by people in the EA/Rationality community to inform decision makers by combining epidemic modeling with forecasting. I learned of the funding opportunity through my colleague and friend, Elizabeth Garrett. The pitch was immediately compelling to me as a 5-figure donor: a group of people I already believed to be impressive and trustworthy were launching a project to use forecasting to help powerful people make better decisions about the pandemic.
Even though it seemed likely that nothing would come of it, it seemed like an excellent gamble to make, based on the following possible outcomes: prevent illness, death, and economic damage by helping governments and other decision makers handle the pandemic better, especially governments that couldn't otherwise afford high-quality forecasting services; highlight the power of, and test novel applications of, an underutilized tool: forecasting (see the book Superforecasting for background on this); test and demonstrate the opportunity for individuals and institutions to do more good for important causes by thinking carefully (Rationality/EA) rather than relying on standard experts and authorities alone; engage members of our community in an effort to change the world for the better, in a way that will give them some quick feedback, thus leading to deeper/faster learning; cross-pollinate our community with professional fie...

The Nonlinear Library: EA Forum Top Posts
A Framework for Assessing the Potential of EA Development in Emerging Locations by jahying

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 11, 2021 33:02


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Framework for Assessing the Potential of EA Development in Emerging Locations, published by jahying on the AI Alignment Forum. I would like to thank Max Daniel, Jan Kulveit, Alex Barry, Ozzie Gooen, David Nash, Rose Hadshar, Harri Besceli, Emiel Riiko, Florent Berthet, Jaime Sevilla, Chi Nguyen and Aaron Gertler for reviewing this post. Special thanks to Vaidehi Agarwalla for her immense help with copyediting and research assistance. Also thank you to Wanyi Zeng who inspired my research project and has offered generous support since its inception. This framework evolved out of research conducted as part of the 2019 CEA Summer Research Fellowship. My research project looks at how EA should be developed and approached in Asia. My research mentors were Rose Hadshar and Jan Kulveit. Please note that this post is not endorsed by the FHI, CEA, Open Phil, or other individuals and organizations interviewed as part of the research project. If you would like to support my work, I am currently looking for funding, advisors and collaborators. You can reach me at jahying.chung@gmail.com. If you are short on time, the Summary, Background and Summary Table sections should provide a sufficient overview of the framework. Summary: Effective Altruism is growing globally. In Asia, for instance, the number of groups has doubled in the last 2 years [1]. Both group organizers and core EA decision-makers have voiced different views and concerns on how (or whether) this growth should happen. In order to avoid overlooking major risks and opportunities, improve communication, and prevent frustration across parties, how might we get everyone on the same page and have productive conversations about developing EA in an emerging location? This framework attempts to answer that question. It aims to provide a common basis on which different stakeholders can evaluate the potential of EA development in emerging locations. It arose out of expert interviews with core EAs who are actively thinking about community and movement building strategy, including staff at CEA and Open Phil, community managers within other EA organizations, and leading group organizers around the world. This post will first outline the reasons to work on this topic, the value of the framework, and its current status and limitations. Then it will present the framework in the form of a summary table before going in depth into each dimension. Finally, I outline my next steps in applying this framework to Asian locations. In short, the framework applies two types of analyses: group analysis and geographic analysis, and considers two perspectives: cause-generic and cause-specific. In the group analysis, the framework breaks down the question of "how promising is this group?" into three aspects: Group traction: what has the group accomplished so far? Capabilities: what resources do they have? Connections: how do they collaborate/coordinate with other EAs? How are resources transferred? Who do they most frequently interact with, and in what capacity? In the geographic analysis, the framework breaks down the question "how exciting would EA be in this location?" into three aspects: Existing Alignment: how much alignment already exists with EA ideas? Talent: what types of talent exist here, in quantity and quality? Business and Politics: how does power work here?
What influential institutions exist here? The analysis can be done from a cause-generic and a cause-specific [2] perspective. The full framework has not yet been applied to specific locations and I expect to make adjustments based on feedback from group organizers and core EAs as it is applied and evaluated. Background. Terminology: Throughout this post I will use the following terms which need some justification or clarification: EA development: instead of ...

The Nonlinear Library: EA Forum Top Posts
What to do with people? by Jan_Kulveit

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 11, 2021 16:07


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What to do with people?, published by Jan_Kulveit on the Effective Altruism Forum. I would like to offer one possible answer to the ongoing discussion in the effective altruism community, centered around the question of the scalable use of people ("Task Y"). The following part of the 80000h podcast with Nick Beckstead is a succinct introduction to the problem (as emphasized by alxjrl): Nick Beckstead: (...) I guess, the way I see it right now is this community doesn't have currently a scalable use of a lot of people. There's some groups that have found efficient scalable uses of a lot of people, and they're using them in different ways. For example, if you look at something like Teach for America, they identified an area where, "Man, we could really use tons and tons of talented people. We'll train them up in a specific problem, improving the US education system. Then, we'll get tons of them to do that. Various of them will keep working on that. Some of them will understand the problems the US education system faces, and fix some of its policy aspects." That's very much a scalable use of people. It's a very clear instruction, and a way that there's an obvious role for everyone. I think, the Effective Altruist Community doesn't have a scalable use of a lot of its highest value . There's not really a scalable way to accomplish a lot of these highest valued objectives that's standardised like that. The closest thing we have to that right now is you can earn to give and you can donate to any of the causes that are most favored by the Effective Altruist Community. I would feel like the mass movement version of it would be more compelling if we'd have in mind a really efficient and valuable scalable use of people, which I think is something we've figured out less. I guess what I would say is right now, I think we should figure out how to productively use all of the people who are interested in doing as much good as they can, and focus on filling a lot of higher value roles that we can think of that aren't always so standardised or something. We don't need 2000 people to be working on AI strategy, or should be working on technical AI safety exactly. I would focus more on figuring out how we can best use the people that we have right now. Relevant posts and discussions on the topic are under several posts on the forum: "Can the EA community copy Teach for America? (Looking for Task Y)" and "After one year of applying for EA jobs: It is really, really hard to get hired by an EA organisation". Hierarchical networked structure: The answer I'd like to offer is abstract, but general and scalable. The answer is: "build a hierarchical networked structure", for lack of a better name. It is best understood as a mild shift of attitude. A concept on a similar level of generality as "prioritization" or "crucial considerations". The hierarchical structure can be in physical space, functional space or research space. An example of a hierarchy in physical space could be the structure of local effective altruism groups: it is hard to coordinate an unstructured group of 10 thousand people. It is less hard, but still difficult, to coordinate a structure of 200 "local groups" with widely different sizes, cultures and memberships.
The optimal solution likely is to coordinate something like 5-25 "regional" coordinators or hub leaders, who then coordinate with the local groups. The underlying theoretical reasons for such a structure are simple considerations like "network distance" or "bandwidth constraints". A hierarchy in functional space could be, for example, a hierarchy of organizations and projects providing people career advice. It is difficult to give personalized career advice to tens of thousands of people as a small and lean organization. A scalable hierarchical version of career advice may look like...
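As a back-of-the-envelope illustration of the "network distance" and "bandwidth constraints" point (our sketch, with invented numbers), compare the number of direct coordination channels in a flat structure of 200 local groups with a two-level structure that routes through 15 regional hubs.

# Back-of-the-envelope comparison of coordination load; numbers are invented.
def flat_links(n):
    # Everyone potentially coordinates with everyone: n * (n - 1) / 2 pairwise channels.
    return n * (n - 1) // 2

def two_level_links(n_groups, hubs):
    # Each local group talks to one regional hub; hubs coordinate among themselves.
    return n_groups + hubs * (hubs - 1) // 2

print(flat_links(200))            # 19900 pairwise channels
print(two_level_links(200, 15))   # 200 + 105 = 305 channels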

Plus
Interview Plus: The Ministry of Health lacks the capacity to handle the pandemic. Changing only the head of the office won't fix that, a scientist warns

Plus

Play Episode Listen Later Nov 29, 2021 24:24


The presence of a new coronavirus variant known as Omicron, potentially more contagious than the currently widespread Delta, has been confirmed in Czechia. "Until a sufficiently large part of humanity is immune, the problem will probably persist," says Jan Kulveit of the Future of Humanity Institute at the University of Oxford.

Interview Plus
The Ministry of Health lacks the capacity to handle the pandemic. Changing only the head of the office won't fix that, a scientist warns

Interview Plus

Play Episode Listen Later Nov 29, 2021 24:24


The presence of a new coronavirus variant known as Omicron, potentially more contagious than the currently widespread Delta, has been confirmed in Czechia. "Until a sufficiently large part of humanity is immune, the problem will probably persist," says Jan Kulveit of the Future of Humanity Institute at the University of Oxford.

Interview Plus
Scientist Kulveit: Not getting vaccinated because the vaccine is experimental? The virus experiments on your body too

Interview Plus

Play Episode Listen Later Jun 21, 2021 26:15


Jan Kulveit, a researcher at the Future of Humanity Institute at the University of Oxford, points out that the Indian, or Delta, variant of the coronavirus already accounts for 99 percent of cases in Britain. Kulveit says this was to be expected: "It is always the case that when a variant has the advantage of being able to spread faster, it comes to dominate the population, just as happened earlier with the British Alpha variant."

DVTV
Whose shares are skyrocketing? Vaccination confusion and Czech on Instagram in DVTV Start

DVTV

Play Episode Listen Later Jan 28, 2021 4:33


The government will discuss a further tightening of measures today. The Reuters agency has taken note of the Czech confusion around vaccination. Expert Jan Kulveit points out what the government should be focusing on. Shares of a loss-making company are skyrocketing: what is behind the rise of GameStop stock, and is a stock bubble looming? Joe Biden has signed orders to fight climate change. How do forest fires threaten the planet? In Britain, people are being vaccinated even in a cathedral, to the sound of the organ. And since most schoolchildren cannot pick up their report cards in person today, we offer a tip on where to brush up your Czech on Instagram. DVTV Start on Thursday, January 28. Links: What would help in the fight against COVID? Jan Kulveit answers: http://bit.ly/36kqGo3 The rocketing rise of GameStop shares is attracting attention. Why? http://bit.ly/36ho8XK How forest fires are destroying the planet: http://bit.ly/36l6vq5 Vaccination in a cathedral: http://nyti.ms/3oqgJvx Czech on Instagram: https://bit.ly/3tbkOYn