Podcast appearances and mentions of Andrew Critch

  • 11 PODCASTS
  • 106 EPISODES
  • 14m AVG DURATION
  • 1 EPISODE EVERY OTHER WEEK
  • LATEST: Sep 11, 2024

POPULARITY (chart, 2017–2024)


Latest podcast episodes about Andrew Critch

The Nonlinear Library
LW - Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It. by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Sep 11, 2024 4:42


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It., published by Andrew Critch on September 11, 2024 on LessWrong. People often attack frontier AI labs for "hypocrisy" when the labs admit publicly that AI is an extinction threat to humanity. Often these attacks ignore the difference between various kinds of hypocrisy, some of which are good, including what I'll call "reformative hypocrisy". Attacking good kinds of hypocrisy can be actively harmful for humanity's ability to survive, and as far as I can tell we (humans) usually shouldn't do that when our survival is on the line. Arguably, reformative hypocrisy shouldn't even be called hypocrisy, due to the negative connotations of "hypocrisy". That said, bad forms of hypocrisy can be disguised as the reformative kind for long periods, so it's important to pay enough attention to hypocrisy to actually figure out what kind it is. Here's what I mean, by way of examples: *** 0. No Hypocrisy: Lab: "Building AGI without regulation shouldn't be allowed. Since there's no AGI regulation, I'm not going to build AGI." Meanwhile, the lab doesn't build AGI. This is a case of honest behavior, and what many would consider very high integrity. However, it's not obviously better, and arguably sometimes worse, than... 1. Reformative Hypocrisy: Lab: "Absent adequate regulation for it, building AGI shouldn't be allowed at all, and right now there is no adequate regulation for it. Anyway, I'm building AGI, and calling for regulation, and making lots of money as I go, which helps me prove the point that AGI is powerful and needs to be regulated." Meanwhile, the lab builds AGI and calls for regulation. So, this is a case of honest hypocrisy. I think this is straightforwardly better than... 2. Erosive Hypocrisy: Lab: "Building AGI without regulation shouldn't be allowed, but it is, so I'm going to build it anyway and see how that goes; the regulatory approach to safety is hopeless." Meanwhile, the lab builds AGI and doesn't otherwise put efforts into supporting regulation. This could also be a case of honest hypocrisy, but it erodes the norm that AGI should be regulated rather than supporting it. Some even worse forms of hypocrisy include... 3. Dishonest Hypocrisy, which comes in at least two importantly distinct flavors: a) feigning abstinence: Lab: "AGI shouldn't be allowed." Meanwhile, the lab secretly builds AGI, contrary to what one might otherwise guess according to their stance that building AGI is maybe a bad thing, from a should-it-be-allowed perspective. b) feigning opposition: Lab: "AGI should be regulated." Meanwhile, the lab overtly builds AGI, while covertly trying to confuse and subvert regulatory efforts wherever possible. *** It's important to remain aware that reformative hypocrisy can be on net a better thing to do for the world than avoiding hypocrisy completely. It allows you to divert resources from the thing you think should be stopped, and to use those resources to help stop the thing. For mathy people, I'd say this is a way of diagonalizing against a potentially harmful thing, by turning the thing against itself, or against the harmful aspects of itself. For life sciencey people, I'd say this is how homeostasis is preserved, through negative feedback loops whereby bad stuff feeds mechanisms that reduce the bad stuff.
Of course, a strategy of feigning opposition (3b) can disguise itself as reformative hypocrisy, so it can be hard to distinguish the two. For example, if a lab says for a long time that they're going to admit their hypocritical stance, and then never actually does, then it turns out to be dishonest hypocrisy. On the other hand, if the dishonesty ever does finally end in a way that honestly calls for reform, it's good to reward the honest and reformative aspects of their behavior....
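
To make the taxonomy in this excerpt easier to scan, here is a minimal sketch that encodes the four cases as a function of a lab's stated position and actual behavior. The field names and the `classify_lab` helper are hypothetical, introduced only for illustration; the categories themselves are the ones Critch defines in the post, and the mapping is a rough approximation, not an exact decision rule.

```python
from dataclasses import dataclass

@dataclass
class LabStance:
    """Hypothetical summary of a lab's words and deeds (field names are illustrative)."""
    says_agi_needs_regulation: bool        # public statement
    builds_agi: bool                       # actual behavior
    supports_regulation: bool              # overt effort toward the reform it endorses
    covertly_undermines_regulation: bool = False
    secretly_builds_agi: bool = False

def classify_lab(s: LabStance) -> str:
    """Map a stance onto Critch's categories 0-3 (a sketch, not an exact rule)."""
    if not s.says_agi_needs_regulation:
        return "out of scope: no stated norm to be hypocritical about"
    if s.secretly_builds_agi:
        return "3a: dishonest hypocrisy (feigning abstinence)"
    if s.covertly_undermines_regulation:
        return "3b: dishonest hypocrisy (feigning opposition)"
    if not s.builds_agi:
        return "0: no hypocrisy"
    return ("1: reformative hypocrisy" if s.supports_regulation
            else "2: erosive hypocrisy")

print(classify_lab(LabStance(True, True, True)))   # -> 1: reformative hypocrisy
```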

The Nonlinear Library
AF - The need for multi-agent experiments by Martín Soto

The Nonlinear Library

Play Episode Listen Later Aug 1, 2024 17:17


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The need for multi-agent experiments, published by Martín Soto on August 1, 2024 on The AI Alignment Forum. TL;DR: Let's start iterating on experiments that approximate real, society-scale multi-AI deployment Epistemic status: These ideas seem like my most prominent delta with the average AI Safety researcher, have stood the test of time, and are shared by others I intellectually respect. Please attack them fiercely! Multi-polar risks Some authors have already written about multi-polar AI failure. I especially like how Andrew Critch has tried to sketch concrete stories for it. But, without even considering concrete stories yet, I think there's a good a priori argument in favor of worrying about multi-polar failures: We care about the future of society. Certain AI agents will be introduced, and we think they could reduce our control over the trajectory of this system. The way in which this could happen can be divided into two steps: 1. The agents (with certain properties) are introduced in certain positions 2. Given the agents' properties and positions, they interact with each other and the rest of the system, possibly leading to big changes So in order to better control the outcome, it seems worth it to try to understand and manage both steps, instead of limiting ourselves to (1), which is what the alignment community has traditionally done. Of course, this is just one, very abstract argument, which we should update based on observations and more detailed technical understanding. But it makes me think the burden of proof is on multi-agent skeptics to explain why (2) is not important. Many have taken on that burden. The most common reason to dismiss the importance of (2) is expecting a centralized intelligence explosion, a fast and unipolar software takeoff, like Yudkowsky's FOOM. Proponents usually argue that the intelligences we are likely to train will, after meeting a sharp threshold of capabilities, quickly bootstrap themselves to capabilities drastically above those of any other existing agent or ensemble of agents. And that these capabilities will allow them to gain near-complete strategic advantage and control over the future. In this scenario, all the action is happening inside a single agent, and so you should only care about shaping its properties (or delaying its existence). I tentatively expect more of a decentralized hardware singularity[1] than centralized software FOOM. But there's a weaker claim in which I'm more confident: we shouldn't right now be near-certain of a centralized FOOM.[2] I expect this to be the main crux with many multi-agent skeptics, and won't argue for it here (but rather in an upcoming post). Even given a decentralized singularity, one can argue that the most leveraged way for us to improve multi-agent interactions is by ensuring that individual agents possess certain properties (like honesty or transparency), or that at least we have enough technical expertise to shape them on the go. I completely agree that this is the natural first thing to look at. But I think focusing on multi-agent interactions directly is a strong second, and a lot of marginal value might lie there given how neglected they've been until now (more below). I do think many multi-agent interventions will require certain amounts of single-agent alignment technology. This will of course be a crux with alignment pessimists. 
Finally, for this work to be counterfactually useful it's also required that AI itself (in decision-maker or researcher positions) won't iteratively solve the problem by default. Here, I do think we have some reasons to expect (65%) that intelligent enough AIs aligned with their principals don't automatically solve catastrophic conflict. In those worlds, early interventions can make a big difference setting the right incentives for future agent...
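
The two-step framing in this excerpt (agents are introduced into positions, then interact) lends itself to very small simulations. Below is a purely illustrative sketch of a toy "society-scale" multi-agent experiment in the spirit the post calls for: a population of agents repeatedly plays a prisoner's-dilemma-style interaction with crude social learning, and we watch whether cooperation survives. All names, payoffs, and parameters are hypothetical assumptions of mine; nothing here comes from the post itself.

```python
import random

# Step 1: introduce agents with given properties into positions (here, a flat population).
def make_population(n: int, frac_cooperators: float) -> list[str]:
    return ["C" if random.random() < frac_cooperators else "D" for _ in range(n)]

# Step 2: let the agents interact and observe system-level outcomes.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def run_round(pop: list[str]) -> dict[int, float]:
    scores = {i: 0.0 for i in range(len(pop))}
    order = list(range(len(pop)))
    random.shuffle(order)
    for a, b in zip(order[::2], order[1::2]):
        pa, pb = PAYOFF[(pop[a], pop[b])]
        scores[a] += pa
        scores[b] += pb
    return scores

def imitate_best(pop: list[str], scores: dict[int, float]) -> list[str]:
    # Each agent copies the strategy of a random higher-scoring agent (crude social learning).
    new = []
    for i, strategy in enumerate(pop):
        j = random.randrange(len(pop))
        new.append(pop[j] if scores[j] > scores[i] else strategy)
    return new

pop = make_population(100, frac_cooperators=0.6)
for t in range(20):
    pop = imitate_best(pop, run_round(pop))
print("cooperator share after 20 rounds:", pop.count("C") / len(pop))
```

Even with individually well-behaved agents, dynamics like these tend to erode cooperation, which is the kind of step-(2) effect the post argues deserves direct experimental attention.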

The Nonlinear Library
AF - AI Constitutions are a tool to reduce societal scale risk by Samuel Dylan Martin

The Nonlinear Library

Play Episode Listen Later Jul 25, 2024 35:15


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Constitutions are a tool to reduce societal scale risk, published by Samuel Dylan Martin on July 25, 2024 on The AI Alignment Forum. Sammy Martin, Polaris Ventures. As AI systems become more integrated into society, we face potential societal-scale risks that current regulations fail to address. These risks include cooperation failures, structural failures from opaque decision-making, and AI-enabled totalitarian control. We propose enhancing LLM-based AI Constitutions and Model Specifications to mitigate these risks by implementing specific behaviours aimed at improving AI systems' epistemology, decision support capabilities, and cooperative intelligence. This approach offers a practical, near-term intervention to shape AI behaviour positively. We call on AI developers, policymakers, and researchers to consider and implement improvements along these lines, as well as for more research into testing Constitution/Model Spec improvements, setting a foundation for more responsible AI development that reduces long-term societal risks. Introduction: There is reason to believe that in the near future, autonomous, LLM-based AI systems, while not necessarily surpassing human intelligence in all domains, will be widely deployed throughout society. We anticipate a world where AI will be making some decisions on our behalf, following complex plans, advising on decision-making and negotiation, and presenting conclusions without human oversight at every step. While this is already happening to some degree in low-stakes settings, we must prepare for its expansion into high-stakes domains (e.g. politics, the military), and do our best to anticipate the systemic, societal-scale risks that might result and act to prevent them. Most of the important work on reducing societal-scale risk will, by its very nature, have to involve policy changes, for example to ensure that there are humans in the loop on important decisions, but there are some technical interventions which we have identified that can help. We believe that by acting now to improve the epistemology (especially on moral or political questions), decision support capabilities and cooperative intelligence of LLM-based AI systems, we can mitigate near-term risks and also set important precedents for future AI development. We aim to do this by proposing enhancements to AI Constitutions or Model Specifications. If adopted, we believe these improvements will reduce societal-scale risks which have so far gone unaddressed by AI regulation. Here, we justify this overall conclusion and propose preliminary changes that we think might improve AI Constitutions. We aim to empirically test and iterate on these improvements before finalising them. Recent years have seen significant efforts to regulate frontier AI, from independent initiatives to government mandates. Many of these are just aimed at improving oversight in general (for example, the reporting requirements in EO 14110), but some are directed at destructive misuse or loss of control (for example, the requirement to prove no catastrophic potential in SB 1047 and the independent tests run by the UK AISI). Many are also directed at near-term ethical concerns.
However, we haven't seen shovel-ready regulation or voluntary commitments proposed to deal with longer-term societal-scale risks, even though these have been much discussed in the AI safety community. Some experts (e.g., Andrew Critch) argue these may represent the most significant source of overall AI risk, and they have been discussed as 'societal scale risks', for example in Critch and Russell's TASRA paper. What are these "less obvious" 'societal scale' risks? Some examples: Cooperation failures: AI systems are widely integrated into society, used for advice on consequential decisions and delegated decision-making power, but...
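
As a purely hypothetical illustration of what "implementing specific behaviours" in a Constitution or Model Spec could look like in practice, here is a sketch of a machine-readable spec fragment targeting the three properties the post names (epistemology, decision support, cooperative intelligence). The schema, field names, and clause wording are invented for this example and are not drawn from any real lab's Constitution or Model Spec.

```python
# Hypothetical Model Spec fragment, expressed as a Python dict for illustration only.
CONSTITUTION_FRAGMENT = {
    "epistemology": [
        "When asked about contested moral or political questions, present the main "
        "competing positions and the strongest evidence for each before giving a view.",
        "Flag claims the model cannot verify, and distinguish them from well-supported ones.",
    ],
    "decision_support": [
        "When advising on consequential decisions, surface key uncertainties, "
        "irreversible consequences, and options that preserve human oversight.",
    ],
    "cooperative_intelligence": [
        "In negotiation or multi-party settings, look for positive-sum options and "
        "avoid advice that predictably triggers escalation or conflict spirals.",
    ],
}

def render_system_prompt(spec: dict[str, list[str]]) -> str:
    """Flatten the spec into clauses that could be empirically tested, as the post proposes."""
    return "\n".join(f"[{k}] {clause}" for k, clauses in spec.items() for clause in clauses)

print(render_system_prompt(CONSTITUTION_FRAGMENT))
```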

The Nonlinear Library
AF - Safety isn't safety without a social model (or: dispelling the myth of per se technical safety) by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Jun 14, 2024 7:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Safety isn't safety without a social model (or: dispelling the myth of per se technical safety), published by Andrew Critch on June 14, 2024 on The AI Alignment Forum. As an AI researcher who wants to do technical work that helps humanity, there is a strong drive to find a research area that is definitely helpful somehow, so that you don't have to worry about how your work will be applied, and thus you don't have to worry about things like corporate ethics or geopolitics to make sure your work benefits humanity. Unfortunately, no such field exists. In particular, technical AI alignment is not such a field, and technical AI safety is not such a field. It absolutely matters where ideas land and how they are applied, and when the existence of the entire human race is at stake, that's no exception. If that's obvious to you, this post is mostly just a collection of arguments for something you probably already realize. But if you somehow think technical AI safety or technical AI alignment is somehow intrinsically or inevitably helpful to humanity, this post is an attempt to change your mind. In particular, with more and more AI governance problems cropping up, I'd like to see more and more AI technical staffers forming explicit social models of how their ideas are going to be applied. If you read this post, please don't try to read this post as somehow pro- or contra- a specific area of AI research, or safety, or alignment, or corporations, or governments. My goal in this post is to encourage more nuanced social models by de-conflating a bunch of concepts. This might seem like I'm against the concepts themselves, when really I just want clearer thinking about these concepts, so that we (humanity) can all do a better job of communicating and working together. Myths vs reality Epistemic status: these are claims that I'm confident in, assembled over 1.5 decades of observation of existential risk discourse, through thousands of hours of conversation. They are not claims I'm confident I can convince you of, but I'm giving it a shot anyway because there's a lot at stake when people don't realize how their technical research is going to be misapplied. Myth #1: Technical AI safety and/or alignment advances are intrinsically safe and helpful to humanity, irrespective of the state of humanity. Reality: All technical advances in AI safety and/or "alignment" can be misused by humans. There are no technical advances in AI that are safe per se; the safety or unsafety of an idea is a function of the human environment in which the idea lands. Examples: Obedience - AI that obeys the intention of a human user can be asked to help build unsafe AGI, such as by serving as a coding assistant. (Note: this used to be considered extremely sci-fi, and now it's standard practice.) Interpretability - Tools or techniques for understanding the internals of AI models will help developers better understand what they're building and hence speed up development, possibly exacerbating capabilities races. Truthfulness - AI that is designed to convey true statements to a human can also be asked questions by that human to help them build an unsafe AGI. Myth #2: There's a {technical AI safety VS AI capabilities} dichotomy or spectrum of technical AI research, which also corresponds to {making humanity more safe VS shortening AI timelines}. 
Reality: Conflating these concepts has three separate problems with it, (a)-(c) below: a) AI safety and alignment advances almost always shorten AI timelines. In particular, the ability to "make an AI system do what you want" is used almost instantly by AI companies to help them ship AI products faster (because the AI does what users want) and to build internal developer tools faster (because the AI does what developers want). (When I point this out, usually people think I'm s...

LessWrong Curated Podcast
“Safety isn't safety without a social model (or: dispelling the myth of per se technical safety)” by Andrew_Critch

LessWrong Curated Podcast

Play Episode Listen Later Jun 14, 2024 8:47


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. As an AI researcher who wants to do technical work that helps humanity, there is a strong drive to find a research area that is definitely helpful somehow, so that you don't have to worry about how your work will be applied, and thus you don't have to worry about things like corporate ethics or geopolitics to make sure your work benefits humanity. Unfortunately, no such field exists. In particular, technical AI alignment is not such a field, and technical AI safety is not such a field. It absolutely matters where ideas land and how they are applied, and when the existence of the entire human race is at stake, that's no exception. If that's obvious to you, this post is mostly just a collection of arguments for something you probably already realize. But if you somehow [...]
--- First published: June 14th, 2024
Source: https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the
--- Narrated by TYPE III AUDIO.

LessWrong Curated Podcast
[HUMAN VOICE] "CFAR Takeaways: Andrew Critch" by Raemon

LessWrong Curated Podcast

Play Episode Listen Later Mar 10, 2024 9:10


Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated
Source: https://www.lesswrong.com/posts/Jash4Gbi2wpThzZ4k/cfar-takeaways-andrew-critch
Narrated for LessWrong by Perrin Walker.
Share feedback on this narration.
[Curated Post] ✓ [125+ Karma Post] ✓

LessWrong Curated Podcast
CFAR Takeaways: Andrew Critch

LessWrong Curated Podcast

Play Episode Listen Later Feb 15, 2024 9:48


I'm trying to build my own art of rationality training, and I've started talking to various CFAR instructors about their experiences – things that might be important for me to know but which hadn't been written up nicely before. This is a quick write up of a conversation with Andrew Critch about his takeaways. (I took rough notes, and then roughly cleaned them up for this. I don't know "What surprised you most during your time at CFAR?" Surprise 1: People are profoundly non-numerate. And, people who are not profoundly non-numerate still fail to connect numbers to life. I'm still trying to find a way to teach people to apply numbers for their life. For example: "This thing is annoying you. How many minutes is it annoying you today? How many days will it annoy you?". I compulsively do this. There aren't things lying around in [...]
--- First published: February 14th, 2024
Source: https://www.lesswrong.com/posts/Jash4Gbi2wpThzZ4k/cfar-takeaways-andrew-critch
--- Narrated by TYPE III AUDIO.

The Nonlinear Library
LW - CFAR Takeaways: Andrew Critch by Raemon

The Nonlinear Library

Play Episode Listen Later Feb 14, 2024 8:11


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: CFAR Takeaways: Andrew Critch, published by Raemon on February 14, 2024 on LessWrong. I'm trying to build my own art of rationality training, and I've started talking to various CFAR instructors about their experiences - things that might be important for me to know but which hadn't been written up nicely before. This is a quick write up of a conversation with Andrew Critch about his takeaways. (I took rough notes, and then roughly cleaned them up for this. Some of my phrasings might not exactly match his intended meaning, although I've tried to separate out places where I'm guessing what he meant from places where I'm repeating his words as best I can) "What surprised you most during your time at CFAR? Surprise 1: People are profoundly non-numerate. And, people who are not profoundly non-numerate still fail to connect numbers to life. I'm still trying to find a way to teach people to apply numbers for their life. For example: "This thing is annoying you. How many minutes is it annoying you today? How many days will it annoy you?". I compulsively do this. There aren't things lying around in my life that bother me because I always notice and deal with them. People are very scale-insensitive. Common loci of scale-insensitivity include jobs, relationships, personal hygiene habits, eating habits, and private things people do in their private homes for thousands of hours. I thought it'd be easy to use numbers to not suck. Surprise 2: People don't realize they need to get over things. There was a unit at CFAR called 'goal factoring'. Early in its development, the instructor would say to their class: "if you're doing something continuously, fill out a 2x2 matrix", where you ask: 1) does this bother me? (yes or no), and 2) is it a problem? (yes or no). Some things will bother you and not be a problem. This unit is not for that." The thing that surprised me was that I told the instructor: "C'mon. It's not necessary to manually spell out that people just need to accept some things and get over them. People know that, it's not worth spending the minute on it." At the next class, the instructor asked the class: "When something bothers you, do you ask if you need to get over it?". 10% of people raised their hand. People didn't know to "realize that some things bother you but it's not a problem and you can get over it." Surprise 3: When I learned Inner Simulator from Kenzie, I was surprised that it helped with everything in life forever. [I replied: "I'm surprised that you were surprised. I'd expect that to have already been part of your repertoire."] The difference between Inner Simulator and the previous best tool I had was: Previously, I thought of my System 1 as something that both "decided to make queries" and "returned the results of the queries." i.e. my fast intuitions would notice something and give me information about it. I previously thought of "inner sim" as a different intelligence that worked on its own. The difference with Kenzie's "Inner Sim" approach is that my System 2 could decide when to query System 1. And then System 1 would return the query with its anticipations (which System 2 wouldn't be able to generate on its own). [What questions is System 1 good at asking that System 2 wouldn't necessarily ask?] System 1 is good at asking "is this person screwing me over?" 
without my S2 having to realize that now's a good time to ask that question. (S2 also does sometimes ask this question, at complementary times.) Surprise 4: How much people didn't seem to want things. And, the degree to which people wanted things was even more incoherent than I thought. I thought people wanted things but didn't know how to pursue them. [I think Critch trailed off here, but the implication seemed to be "basically people just didn't want things in the first place"] What do other people see...
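
Critch's "apply numbers to your life" habit is easy to mimic with a two-line calculation. The numbers below are made up purely to illustrate the kind of estimate he describes (minutes of annoyance per day, times days, compared against the one-time cost of fixing the annoyance).

```python
# Hypothetical example: a squeaky door annoys you ~3 minutes/day and fixing it takes ~20 minutes.
minutes_per_day = 3
days_it_will_persist = 365
fix_cost_minutes = 20

total_annoyance = minutes_per_day * days_it_will_persist  # 1095 minutes, roughly 18 hours
print(f"annoyance cost: {total_annoyance} min vs. fix cost: {fix_cost_minutes} min")
print("worth fixing" if total_annoyance > fix_cost_minutes else "let it go")
```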

The Nonlinear Library
LW - Apply to the Conceptual Boundaries Workshop for AI Safety by Chipmonk

The Nonlinear Library

Play Episode Listen Later Nov 28, 2023 3:14


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Apply to the Conceptual Boundaries Workshop for AI Safety, published by Chipmonk on November 28, 2023 on LessWrong. Do you have experience with Active Inference, Embedded Agency, biological gap junctions, or other frameworks that separate agents from their environment? Apply to the Conceptual Boundaries Workshop for AI safety. February in Austin TX. Website and application. A (small) workshop to identify promising boundaries research directions and empirical projects. Boundaries keep agents causally separate from their environment. This is crucial for their survival and continued autonomy. A bacterium relies on its membrane to protect its internal processes from external influences. Secure computer systems use controlled inputs and outputs to prevent unauthorized access. Nations maintain sovereignty by securing their borders. Humans protect their mental integrity by selectively filtering the information that comes in and out. When an agent's boundary is respected, that agent maintains its autonomy. Boundaries show a way to respect agents that is distinct from respecting preferences or utility functions. Expanding on this idea, Andrew Critch says the following in "Boundaries" Sequence, Part 3b: my goal is to treat boundaries as more fundamental than preferences, rather than as merely a feature of them. In other words, I think boundaries are probably better able to carve reality at the joints than either preferences or utility functions, for the purpose of creating a good working relationship between humanity and AI technology. For instance, respecting a bacterium means not disrupting its membrane, rather than understanding and acting on its desires. Boundaries act as a natural abstraction promoting safety and autonomy. By formalizing the boundaries that ensure world safety, we could better position ourselves to protect humanity from the threat of transformative AI. Attendees confirmed: David 'davidad' Dalrymple, Scott Garrabrant, TJ (Tushant Jha), Andrew Critch, Chris Lakin (organizer), and Evan Miyazono (co-organizer). Seeking 6-10 more guests who either: have prior experience with technical or philosophical approaches that separate agents from their environment (approaches like "boundaries", active inference and Markov blankets, embedded agency, cell gap junctions, etc.), or are willing and able to implement approaches planned at the workshop. The worst outcome from a workshop is a bunch of promised follow-ups that result in nothing. E.g.: PhD candidates or postdocs who are looking for new projects. Website and application. Get notified about future "boundaries" events: We are also considering running other "boundaries"-related workshops in mid-2024. For example, a larger, more general workshop, or domain-specific workshops (e.g.: boundaries in biology, boundaries in computer security). If you would like to get notified about potential future events, sign up via the form on the footer of the website. How you can help: repost this workshop on Twitter, share with anyone you think might be a good fit, and let me know if there's anywhere else I can advertise. (I don't want to just get people who check LessWrong!) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
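
For readers who want a concrete handle on "causal separation", here is a tiny illustrative sketch (not from the workshop materials) of the Markov-blanket intuition mentioned above: an interior state that interacts with the environment only through a boundary variable, so that conditioning on the boundary screens off the environment. The toy causal chain and its coefficients are assumptions chosen for the example.

```python
import random

def step(env: float) -> tuple[float, float, float]:
    """One draw from a toy causal chain: environment -> boundary (membrane) -> interior."""
    boundary = 0.8 * env + 0.2 * random.gauss(0, 1)       # boundary responds to the environment
    interior = 0.8 * boundary + 0.2 * random.gauss(0, 1)  # interior sees only the boundary
    return env, boundary, interior

samples = [step(random.gauss(0, 1)) for _ in range(50_000)]

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

envs, boundaries, interiors = zip(*samples)
print("corr(env, interior) overall:", round(corr(envs, interiors), 2))

# Condition on the boundary by keeping samples where it falls in a narrow band:
band = [(e, i) for e, b, i in samples if abs(b) < 0.05]
print("corr(env, interior) given boundary ~ fixed:", round(corr(*zip(*band)), 2))
```

The first correlation comes out strongly positive, while the conditioned one is near zero: once you know the state of the boundary, the environment tells you almost nothing more about the interior, which is the sense in which a boundary causally separates an agent from its surroundings.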

The Nonlinear Library
AF - Consciousness as a conflationary alliance term by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Jul 10, 2023 16:34


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Consciousness as a conflationary alliance term, published by Andrew Critch on July 10, 2023 on The AI Alignment Forum. Tl;dr: In this post, I argue that the concept of 'consciousness' is more conflated than people realize, in that there's a lot of divergence in what people mean by "consciousness", and people are unaware of the degree of divergence. This confusion allows the formation of broad alliances around the value of consciousness, even when people don't agree on what it actually means. I call alliances built around conflated terms "conflationary alliances". Executive Summary. Part 1: Mostly during my PhD, I somewhat-methodically interviewed a couple dozen people to figure out what they meant by consciousness, and found that (a) there seems to be a surprising amount of diversity in what people mean by "consciousness", and (b) they are often surprised to find out that other people mean different things when they say "consciousness". This has implications for AI safety advocacy because AI will sometimes be feared and/or protected on the grounds that it is "conscious", and it's good to be able to navigate these debates wisely. (Other heavily conflated terms in AI discourse might include "fairness", "justice", "alignment", and "safety", although I don't want to debate any of those cases here. This post is going to focus on consciousness, and general ideas about the structure of alliances built around confused concepts in general.) Part 2: When X is a conflated term like "consciousness", large alliances can form around claims like "X is important" or "X should be protected". Here, the size of the alliance is a function of how many concepts get conflated with X. Thus, the alliance grows because of the confusion of meanings, not in spite of it. I call this a conflationary alliance. Persistent conflationary alliances resist disambiguation of their core conflations, because doing so would break up the alliance into factions who value the more precisely defined terms. The resistance to deconflation can be conscious, or merely a social habit or inertia. Part 1: What people mean by "consciousness". "Consciousness" is an interesting word, because many people have already started to notice that it's a confused term, yet there is still widespread agreement that conscious beings have moral value. You'll even find some people taking on strange positions like "I'm not conscious" or "I don't know if I'm conscious" or "lookup tables are conscious", as if rebelling against the implicit alliance forming around the "consciousness" concept. What's going on here? To investigate, over about 10 years between 2008 and 2018 I informally interviewed dozens of people who I noticed were interested in talking about consciousness, for 1-3 hours each. I did not publish these results, and never intended to, because I was mainly just investigating for my own interest. In retrospect, it would have been better, for me and for anyone reading this post, if I'd made a proper anthropological study of it. I'm sorry that didn't happen. In any case, here is what I have to share: "Methodology": Extremely informal; feel free to skip or just come back to this part if you want to see my conclusions first. Whom did I interview? Mostly academics I met in grad school, in cognitive science, AI, ML, and mathematics.
In an ad hoc manner at academic or other intellectually-themed gatherings, whenever people talked about consciousness, I gravitated toward the conversation and tried to get someone to spend a long conversation with me to unpack what they meant. How did I interview them? What I asked each person was to take some time to look inside their own minds - sometimes starting out by paying attention to just their bodies, if introspection was hard for them - and try to describe to me in more detail the thing the...

The Nonlinear Library
AF - TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Jun 13, 2023 1:36


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI, published by Andrew Critch on June 13, 2023 on The AI Alignment Forum. Partly in response to calls for more detailed accounts of how AI could go wrong, e.g., from Ng and Bengio's recent exchange on Twitter, here's a new paper with Stuart Russell: Discussion on Twitter... comments welcome! arXiv draft: "TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI". Many of the ideas will not be new to LessWrong or the Alignment Forum, but holistically I hope the paper will make a good case to the world for using logically exhaustive arguments to identify risks (which, outside LessWrong, is often not assumed to be a valuable approach to thinking about risk). I think the most important figure from the paper is this one: ... and, here are some highlights: Self-fulfilling pessimism (p. 4); Industries that could eventually get out of control in a closed loop (p. 5), as in this "production web" story (p. 6); Two "bigger than expected" AI impact stories (p. 8); Email helpers and corrupt mediators, which kinda go together (pp. 10-11); Harmful A/B testing (p. 12); Concerns about weaponization by criminals and states (p. 13). Enjoy :) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI by Andrew Critch

The Nonlinear Library

Play Episode Listen Later May 24, 2023 12:13


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI, published by Andrew Critch on May 24, 2023 on LessWrong. I have a mix of views on AI x-risk in general — and on OpenAI specifically — that no one seems to be able to remember, due to my views not being easily summarized as those of a particular tribe or social group or cluster. For some of the views I consider most neglected and urgently important at this very moment, I've decided to write them here, all in one place, to avoid presumptions that being "for X" means I'm necessarily "against Y" for various X and Y. Probably these views will be confusing to read, especially if you're implicitly trying to pin down "which side" of some kind of debate or tribal affiliation I land on. I don't tend to choose my beliefs in a way that's strongly correlated with or caused by the people I affiliate with. As a result, I apologize in advance if I'm not easily remembered as "for" or "against" any particular protest or movement or statement, even though I in fact have pretty clear views on most topics in this space... the views just aren't correlated according to the usual social-correlation-matrix. Anyhoo: Regarding "pausing": I think pausing superintelligence development using collective bargaining agreements between individuals and/or states and/or companies is a good idea, along the lines of FLI's "Pause Giant AI Experiments", which I signed early and advocated for. Regarding OpenAI, I feel overall positively about them: I think OpenAI has been a net-positive influence for reducing x-risk from AI, mainly by releasing products in a sufficiently helpful-yet-fallible form that society is now able to engage in less-abstract, more-concrete public discourse to come to grips with AI and (soon) AI-risk. I've found OpenAI's behaviors and effects as an institution to be well-aligned with my interpretations of what they've said publicly. That said, I'm also sympathetic to people other than me who expected more access to models or less access to models than what OpenAI has ended up granting; but my personal assessment, based on my prior expectations from reading their announcements, is "Yeah, this is what I thought you told us you'd do... thanks!". I've also found OpenAI's various public testimonies, especially to Congress, to move the needle on helping humanity come to grips with AI x-risk in a healthy and coordinated way (relative to what would happen if OpenAI made their testimony and/or products less publicly accessible). I also like their charter, which creates tremendous pressure on them from their staff and the public to behave in particular ways. This leaves me, on-net, a fan of OpenAI. Given their recent post on Governance of Superintelligence, I can't tell if their approach to superintelligence is something I do or will agree with, but I expect to find that out over the next year or two, because of the openness of their communications and stance-taking. And, I appreciate the chance for me, and the public, to engage in dialogue with them about it. I think the world is vilifying OpenAI too much, and that doing so is probably net-negative for existential safety.
Specifically, I think people are currently over-targeting OpenAI with criticism that's easy to formulate because of the broad availability of OpenAI's products, services, and public statements. This makes them more vulnerable to attack than other labs, and I think piling onto them for that is a mistake from an x-safety perspective, in the "shooting the messenger" category. I.e., over-targeting OpenAI with criticism right now is pushing present and future companies toward being less forthright in ways that OpenAI has been forthright, thereby training the world to have less awareness of x-risk and weaker collective orien...

The Nonlinear Library
LW - Job Opening: SWE to help build signature vetting system for AI-related petitions by Ethan Ashkie

The Nonlinear Library

Play Episode Listen Later May 21, 2023 2:29


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Job Opening: SWE to help build signature vetting system for AI-related petitions, published by Ethan Ashkie on May 20, 2023 on LessWrong. Full-Stack Software Engineer Position - AI Policy Petition Platform Development Survival and Flourishing .Com (SFC) seeks a competent Full-Stack Software Engineer to join us in creating a novel public opinion aggregation platform centered around AI policy issues. Annual compensation: $200,000 - $300,000 Application here. Your Role Develop our "V-process" project, a system that ranks signatories of AI-policy petitions based on signature authenticity, credentials, and relevance to the subject matter. This role offers a unique opportunity to contribute to a platform that addresses the need for legible public expression, independent of existing social media platforms and power structures, regarding emerging technologies, particularly AI. Our Ideal Candidate Strong understanding of system demands and scalability considerations. Proficient in full-stack TypeScript, with experience building and maintaining large-scale web applications. Comfortable in a remote work environment, with occasional in-person meetings in the Bay Area. Excellent communication skills, with the ability to effectively convey progress, implementation trade-offs, and the impact of design decisions. Capable of efficiently updating UI based on design requests and contributing to feature decisions. Job Responsibilities Collaborate with SFC's Product Manager Ethan Ashkie and CEO Andrew Critch, as well as a diverse team of developers and contractors. Design and prototype user-friendly UIs for capturing and displaying signatory data. Implement business logic for signature validation and integrate with external APIs. Occasionally assist with other philanthropic and grant-making projects, time permitting. Benefits Competitive annual salary of $200,000 - $300,000, plus health benefits. Join as the first full-time hire in a flexible, predominantly remote team. We are open to hiring remote contractors living outside the US and can collaborate with institutions who may be capable of providing work visas. If you're ready to contribute to democratizing the conversation around AI safety and ethics, apply here. We look forward to working together to shape the future of AI policy. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
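The listing above does not describe how the ranking itself would work; purely as a hypothetical illustration of the kind of scoring logic such a vetting system might contain (every field name and weight below is my own invention, not SFC's design, and it is sketched in Python rather than the TypeScript stack the role calls for):

from dataclasses import dataclass

@dataclass
class Signatory:
    name: str
    signature_verified: bool   # e.g., passed an identity/authenticity check
    credential_score: float    # 0..1, strength of relevant credentials
    relevance_score: float     # 0..1, relevance of expertise to the petition topic

def rank(signatories):
    """Order signatories by a simple weighted score; unverified signatures sort last."""
    def score(s: Signatory) -> float:
        base = 0.6 * s.credential_score + 0.4 * s.relevance_score
        return base if s.signature_verified else -1.0
    return sorted(signatories, key=score, reverse=True)

example = [
    Signatory("A. Researcher", True, 0.9, 0.8),
    Signatory("B. Celebrity", True, 0.2, 0.1),
    Signatory("C. Unverified", False, 1.0, 1.0),
]
print([s.name for s in rank(example)])  # ['A. Researcher', 'B. Celebrity', 'C. Unverified']

A real system would need verification workflows and auditability rather than a single hard-coded formula; the sketch only illustrates the "authenticity, credentials, and relevance" inputs the listing names.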

The Nonlinear Library
LW - «Boundaries» and AI safety compilation by Chipmonk

The Nonlinear Library

Play Episode Listen Later May 4, 2023 11:33


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: «Boundaries» and AI safety compilation, published by Chipmonk on May 3, 2023 on LessWrong. In this post I outline every post I could find that meaningfully connects the concept of «Boundaries» with AI safety. This seems to be a booming subtopic: interest has picked up substantially within the past year. Perhaps most notably, Davidad includes the concept in his Open Agency Architecture for Safe Transformative AI alignment paradigm. For a preview of the salience of this approach, see this comment by Davidad (2023 Jan): “defend the boundaries of existing sentient beings,” which is my current favourite. It's nowhere near as ambitious or idiosyncratic as “human values”, yet nowhere near as anti-natural or buck-passing as corrigibility. This post also compiles recent work from Andrew Critch, Scott Garrabrant, John Wentworth, and others. But first I will recap what «Boundaries» are: «Boundaries» definition recap: You can see «Boundaries» Sequence for a longer explanation, but I will excerpt from a more recent post by Andrew Critch, 2023 March: By boundaries, I just mean the approximate causal separation of regions in some kind of physical space (e.g., spacetime) or abstract space (e.g., cyberspace). Here are some examples from my «Boundaries» Sequence: a cell membrane (separates the inside of a cell from the outside); a person's skin (separates the inside of their body from the outside); a fence around a family's yard (separates the family's place of living-together from neighbors and others); a digital firewall around a local area network (separates the LAN and its users from the rest of the internet); a sustained disassociation of social groups (separates the two groups from each other); a national border (separates a state from neighboring states or international waters). Also, beware: When I say boundary, I don't just mean an arbitrary constraint or social norm. Posts & researchers that link «Boundaries» and AI safety All bolding in the excerpts below is mine. Davidad's OAA Saliently, Davidad uses «Boundaries» for one of the four hypotheses he outlines in An Open Agency Architecture for Safe Transformative AI (2022 Dec): Deontic Sufficiency Hypothesis: There exists a human-understandable set of features of finite trajectories in such a world-model, taking values in (−∞,0], such that we can be reasonably confident that all these features being near 0 implies high probability of existential safety, and such that saturating them at 0 is feasible[2] with high probability, using scientifically-accessible technologies. I am optimistic about this largely because of recent progress toward formalizing a natural abstraction of boundaries by Critch and Garrabrant. I find it quite plausible that there is some natural abstraction property Q of world-model trajectories that lies somewhere strictly within the vast moral gulf of: All Principles That Human CEV Would Endorse ⇒ Q ⇒ Don't Kill Everyone. Further explanation of this can be found in Davidad's Bold Plan for Alignment: An In-Depth Explanation (2023 Apr) by Charbel-Raphaël and Gabin: Getting traction on the deontic feasibility hypothesis Davidad believes that using formalisms such as Markov Blankets would be crucial in encoding the desiderata that the AI should not cross boundary lines at various levels of the world-model. 
We only need to “imply high probability of existential safety”, so according to davidad, “we do not need to load much ethics or aesthetics in order to satisfy this claim (e.g. we probably do not get to use OAA to make sure people don't die of cancer, because cancer takes place inside the Markov Blanket, and that would conflict with boundary preservation; but it would work to make sure people don't die of violence or pandemics)”. Discussing this hypothesis more thoroughly seems important. Also see: Elicitors: Langua...
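For readers who have not seen the formalism mentioned above: a Markov blanket b of a set of internal states μ is, in one standard definition (my own recap of the general notion, not an excerpt from Davidad's plan), a set of variables that screens the inside off from the external states η:

\[
p(\mu, \eta \mid b) \;=\; p(\mu \mid b)\, p(\eta \mid b)
\]

Conditioned on the blanket, inside and outside are independent, which is what makes "the AI does not cross this boundary" a property one can try to state and check within a world-model.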

The Nonlinear Library
LW - Boundaries-based security and AI safety approaches by Allison Duettmann

The Nonlinear Library

Play Episode Listen Later Apr 13, 2023 10:11


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Boundaries-based security and AI safety approaches, published by Allison Duettmann on April 12, 2023 on LessWrong. [This part 3 of a 5 part sequence on security and cryptography areas relevant for AI safety, published and linked here a few days apart.] There is a long-standing computer security approach that may have directly useful parallels to a recent strand of AI safety work. Both rely on the notion of ‘respecting boundaries'. Since the computer security approach has been around for a while, there may be useful lessons to draw from it for the more recent AI safety work. Let's start with AI safety, then introduce the security approach, and finish with parallels. AI safety: Boundaries in The Open Agency Model and the Acausal Society In a recent LW post, The Open Agency Model, Eric Drexler expands on his previous CAIS work by introducing ‘open agencies' as a model for AI safety. In contrast to the often proposed opaque or unitary agents, “agencies rely on generative models that produce diverse proposals, diverse critics that help select proposals, and diverse agents that implement proposed actions to accomplish tasks”, subject to ongoing review and revision. In An Open Agency Architecture for Safe Transformative AI, Davidad expands on Eric Drexler's model, suggesting that, instead of optimizing, this model would ‘depessimize' by reaching a world that has existential safety. So rather than a fully-fledged AGI-enforced optimization scenario that implements all principles CEV would endorse, this would be a more modest approach that relies on the notion of important boundaries (including those of human and AI entities) being respected. What could it mean to respect the boundaries of human and AI entities? In Acausal Normalcy, Andrew Critch also discusses the notion of respecting boundaries with respect to coordination in an acausal society. He thinks it's possible that an acausal society generally holds values related to respecting boundaries. He defines ‘boundaries' as the approximate causal separation of regions, either in physical spaces (such as spacetime) or abstract spaces (such as cyberspace). Respecting them intuitively means relying on the consent of the entity on the other side of the boundary when interacting with them: only using causal channels that were endogenously opened. His examples of currently used boundaries include a person's skin that separates the inside of their body from the outside, a fence around a family's yard that separates their place from neighbors, a firewall that separates the LAN and its users from the rest of the internet, and a sustained disassociation of social groups that separates the two groups. In his Boundaries Sequence, Andrew Critch continues to formally define the notions of boundaries to generalize them to very different intelligences. If the concept of respecting boundaries is in fact universally salient across intelligences, then it may be possible to help AIs discover and respect the boundaries humans find important (and potentially vice versa). Computer security: Boundaries in the Object Capabilities Approach Pursuing a similar idea, in Skim the Manual, Christine Peterson, Mark S. Miller, and I reframe the AI alignment problem as a secure cooperation problem across human and AI entities. 
Throughout history, we developed norms for human cooperation that emphasize the importance of respecting physical boundaries, for instance to not inflict violence, and cognitive boundaries, for instance to rely on informed consent. We also developed approaches for computational cooperation that emphasize the importance of respecting boundaries in cyberspace. For instance, in object-capabilities-oriented programming, individual computing entities are encapsulated to prevent interference with the contents of other objects. The fact that ...
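As a toy illustration of the object-capability idea described above (my own sketch, not code from the cited work, and in Python, which only encourages rather than enforces encapsulation the way an ocap language like E does):

class Diary:
    """A resource whose contents sit behind a boundary."""
    def __init__(self):
        self._entries = []

    def write(self, text):
        self._entries.append(text)

    def reader(self):
        """Grant a read-only capability without handing over write access."""
        entries = self._entries
        class ReadOnlyView:
            def read(self):
                return list(entries)  # copy out, so the holder cannot mutate the original
        return ReadOnlyView()

diary = Diary()
diary.write("met the neighbors")
view = diary.reader()  # the only channel the diary's owner chose to open
print(view.read())     # ['met the neighbors']
# `view` exposes no write method, so holding it does not let anyone interfere with
# the diary's contents; interaction stays within the capability that was granted.

The parallel to the boundary framing is that the only causal channels across the object's boundary are the ones its owner endogenously opened.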

The Nonlinear Library
AF - Some constructions for proof-based cooperation without Löb by James Payor

The Nonlinear Library

Play Episode Listen Later Mar 21, 2023 5:47


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some constructions for proof-based cooperation without Löb, published by James Payor on March 21, 2023 on The AI Alignment Forum. This post presents five closely-related ways to achieve proof-based cooperation without using Löb's theorem, and muses on legible cooperation in the real world. (Edit: maybe they're closer to just-use-Löb's-theorem than I originally thought! See this comment. If these constructions somehow work better, I'm more confused than before about why.) I'm writing this as a follow-up to Andrew Critch's recent post, to share more of my perspective on the subject. We're going to dive straight into the weeds. (I'm planning to also write a more accessible explainer post soon.) The ideas Idea #1: try to prove A→B I claim the following are sufficient for robust cooperation: A↔□(A→B) B←□A A tries to prove that A→B, and B tries to prove A. The reason this works is that B can prove that A→□A, i.e. A only cooperates in ways legible to B. (Proof sketch: A↔□X→□□X↔□A.) The flaw in this approach is that we needed to know that A won't cooperate for illegible reasons. Otherwise we can't verify that B will cooperate whenever A does. This indicates to me that "A→B" isn't the right "counterfactual". It shouldn't matter if A could cooperate for illegible reasons, if A is actually cooperating for a legible one. Idea #2: try to prove □A→B We can weaken the requirements with a simple change: A←□(□A→B) B←□A Note that this form is close to the lemma discussed in Critch's post. In this case, the condition □A→B is trivial. And when the condition activates, it also ensures that □A is true, which discharges our assumption and ensures B is true. I still have the sense that the condition for cooperation should talk about itself activating, not A. Because we want it to activate when that is sufficient for cooperation. But I do have to admit that □A→B works for mostly the right reasons, comes with a simple proof, and is the cleanest two-agent construction I know. Idea #3: factor out the loop-cutting gadget We can factor the part that is trying to cut the loop out from A, like so: A←□X B←□A X↔□(X→B); or alternatively X↔□(□X→B) This gives the loop-cutting logic a name, X. Now X can refer to itself, and roughly says "I'll legibly activate if I can verify this will cause B to be true". The key properties of X are that □X→□B, and □(□X→□B). Like with idea #2, we just need A to reveal a mechanism by which it can be compelled to cooperate. Idea #4: everyone tries to prove □me→them What about three people trying to cooperate? We can try applying lots of idea #2: A←□(□A→B∧C) B←□(□B→A∧C) C←□(□C→A∧B) And, this works! Proof sketch: Under the assumption of □C: A←□(□A→B∧C)←□(□A→B) and B←□(□B→A∧C)←□(□B→A), so A and B form a size-2 group, which cooperates by inductive hypothesis. □C→A∧B, since we proved A and B under the assumption. C and □C follow from (2). A and B also follow, from (2) and (3). The proof simplifies the group one person at a time, since each person is asking "what would happen if everyone else could tell I cooperate". This lets us prove the whole thing by induction. It's neat that it works, though it's not the easiest thing to see. Idea #5: the group agrees to a shared mechanism or leader What if we factor out the choosing logic in a larger group? Here's one way to do it: A←□X B←□X C←□X X↔□(□X→A∧B∧C) This is the cleanest idea I know for handling the group case. 
The group members agree on some trusted leader or process X. They set things up so X activates legibly, verifies things in a way trusted by everyone, and only activates when it verifies this will cause cooperation. We've now localized the choice-making in one place. X proves that □X→A∧B∧C, X activates, and everyone cooperates. Closing remarks on groups in the real world Centralizing the choosing like in idea #5 makes the logic simpler, but this sort o...
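To spell out why Idea #2 in the excerpt above yields cooperation (my own reconstruction of the argument the excerpt sketches, written in standard notation): assume $A \leftarrow \Box(\Box A \to B)$ and $B \leftarrow \Box A$. Then:

\[
\begin{aligned}
1.&\ \vdash \Box A \to B && \text{(the second assumption, i.e. the condition really is trivial)}\\
2.&\ \vdash \Box(\Box A \to B) && \text{(necessitation on 1)}\\
3.&\ \vdash A && \text{(from 2 and the first assumption)}\\
4.&\ \vdash \Box A && \text{(necessitation on 3)}\\
5.&\ \vdash B && \text{(from 4 and 1)}
\end{aligned}
\]

so $\vdash A \wedge B$, with no use of Löb's Theorem.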

The Nonlinear Library
LW - Some constructions for proof-based cooperation without Löb by James Payor

The Nonlinear Library

Play Episode Listen Later Mar 21, 2023 5:36


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some constructions for proof-based cooperation without Löb, published by James Payor on March 21, 2023 on LessWrong. This post presents five closely-related ways to achieve proof-based cooperation without using Löb's theorem, and muses on legible cooperation in the real world. I'm writing this as a follow-up to Andrew Critch's recent post, to share more of my perspective on the subject. We're going to dive straight into the weeds. (I'm planning to also write a more accessible explainer post soon.) The ideas Idea #1: try to prove A→B I claim the following are sufficient for robust cooperation: A↔□(A→B) B←□A A tries to prove that A→B, and B tries to prove A. The reason this works is that B can prove that A→□A, i.e. A only cooperates in ways legible to B. (Proof sketch: A↔□X→□□X↔□A.) The flaw in this approach is that we needed to know that A won't cooperate for illegible reasons. Otherwise we can't verify that B will cooperate whenever A does. This indicates to me that "A→B" isn't the right "counterfactual". It shouldn't matter if A could cooperate for illegible reasons, if A is actually cooperating for a legible one. Idea #2: try to prove □A→B We can weaken the requirements with a simple change: A←□(□A→B) B←□A Note that this form is close to the lemma discussed in Critch's post. In this case, the condition □A→B is trivial. And when the condition activates, it also ensures that □A is true, which discharges our assumption and ensures B is true. I still have the sense that the condition for cooperation should talk about itself activating, not A. Because we want it to activate when that is sufficient for cooperation. But I do have to admit that □A→B works for mostly the right reasons, comes with a simple proof, and is the cleanest two-agent construction I know. Idea #3: factor out the loop-cutting gadget We can factor the part that is trying to cut the loop out from A, like so: A←□X B←□A X↔□(X→B); or alternatively X↔□(□X→B) This gives the loop-cutting logic a name, X. Now X can refer to itself, and roughly says "I'll legibly activate if I can verify this will cause B to be true". The key properties of X are that □X→□B, and □(□X→□B). Like with idea #2, we just need A to reveal a mechanism by which it can be compelled to cooperate. Idea #4: everyone tries to prove □me→them What about three people trying to cooperate? We can try applying lots of idea #2: A←□(□A→B∧C) B←□(□B→A∧C) C←□(□C→A∧B) And, this works! Proof sketch: Under the assumption of □C: A←□(□A→B∧C)←□(□A→B) and B←□(□B→A∧C)←□(□B→A), so A and B form a size-2 group, which cooperates by inductive hypothesis. □C→A∧B, since we proved A and B under the assumption. C and □C follow from (2). A and B also follow, from (2) and (3). The proof simplifies the group one person at a time, since each person is asking "what would happen if everyone else could tell I cooperate". This lets us prove the whole thing by induction. It's neat that it works, though it's not the easiest thing to see. Idea #5: the group agrees to a shared mechanism or leader What if we factor out the choosing logic in a larger group? Here's one way to do it: A←□X B←□X C←□X X↔□(□X→A∧B∧C) This is the cleanest idea I know for handling the group case. The group members agree on some trusted leader or process X. 
They set things up so X activates legibly, verifies things in a way trusted by everyone, and only activates when it verifies this will cause cooperation. We've now localized the choice-making in one place. X proves that □X→A∧B∧C, X activates, and everyone cooperates. Closing remarks on groups in the real world Centralizing the choosing like in idea #5 makes the logic simpler, but this sort of approach is prone to manipulation and other problems when the verification is not reliably done. This means I don't unambiguously prefer idea #5 to idea #4, in which everyone is doing their own le...

The Nonlinear Library
LW - Probabilistic Payor Lemma? by abramdemski

The Nonlinear Library

Play Episode Listen Later Mar 20, 2023 6:51


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Probabilistic Payor Lemma?, published by abramdemski on March 19, 2023 on LessWrong. Epistemic status: too good to be true? Please check my math. We've known for a while that Löb's theorem fails when proof is relaxed to probabilistic belief. This has pros and cons. On the pro side, it means there's no Löbian Obstacle to probabilistic self-trust. On the con side, it means that some Löb-derived insights for proof-based decision theory don't translate to probabilistic decision theory, at least not as directly as one might hope. In particular, it appeared to dash hopes for probabilistic generalizations of the "Löbian handshake" for cooperation. Recently, Andrew Critch wrote about the Payor Lemma, which allows for a very similar "modal handshake" without Löb's Theorem. The lemma was proved using the same modal assumptions as Löb's, so on the surface it may appear to be just a different method to achieve similar results, whose main advantage is that it is much easier to prove (and therefore explain and understand) than Löb's Theorem. But, a natural question arises: does Payor's Lemma have a suitable probabilistic version? I'll give an affirmative proof; but I haven't confirmed that the assumptions are reasonable to my satisfaction. Setup Let L be a language in first-order logic, expressive enough to represent its sentences s∈L as quoted terms ┌s┐, eg, through Gödel numbering; and with a probability function symbol on these terms, p(┌s┐), which can be equated with (some representation of) rational numbers, e.g. p(┌⊤┐)=1, p(┌s┐)=1/2, etc. I also assume the system can reason about these rational numbers in the basic ways you'd expect. For all a,b∈L and all r∈Q, we have: If ⊢a, then ⊢p(┌a┐)=1. If ⊢a→b, then ⊢p(┌a┐)≤p(┌b┐). (These assumptions might look pretty minimal, but they aren't going to be true for every theory of self-referential truth; more on this later.) Let B(s) abbreviate the sentence p(┌s┐)>c for any s and some globally fixed constant c strictly between 0 and 1. This is our modal operator. Some important properties of B: Necessitation. If ⊢s, then ⊢B(s), for any s. Proof: Since ⊢s implies ⊢p(s)=1, and c∈(0,1), we have ⊢p(┌s┐)>c, which is to say, ⊢B(s). [End proof.] Weak distributivity. If ⊢x→y, then ⊢B(x)→B(y). Proof: When ⊢x→y, we have ⊢p(y)≥p(x), so ⊢p(x)>c→p(y)>c. [End proof.] (Regular distributivity would say B(x→y) implies B(x)→B(y). The assumption ⊢x→y is stronger than B(x→y), so the above is a weaker form of distributivity.) Theorem Statement If ⊢B(B(x)→x)→x, then ⊢x. Proof ⊢x→(B(x)→x), by tautology (a→(b→a)). So ⊢B(x)→B(B(x)→x), from 1 by weak distributivity. Suppose ⊢B(B(x)→x)→x. ⊢B(x)→x from 2 and 3. ⊢B(B(x)→x) from 4 by necessitation. ⊢x from 5 and 3.[End proof.] Discussion Comparison to Original Proof The proof steps mirror Critch's treatment very closely. The key difference is step 2, IE, how I obtain a statement like ⊢□x→□(□x→x). Critch uses distributivity, which is not available to me: B(a→b)→(B(a)→B(b))? Suppose B(a→b), ie, p(┌a→b┐)>c. Rewrite p(┌b∨¬a┐)>c. Now suppose B(a), that is, p(┌a┐)>c. Then p(┌¬a┐)<1−c, so p(┌b┐)>2c−1. So we only get: Bc(a→b)→(Bc(a)→Bd(b)), where Br(s) abbreviates p(┌s┐)>r and we have d=2c−1. So in general, attempted applications of distributivity create weakened belief operators, which would get in the way of the proof (very similar to how probabilistic Löb fails). 
However, the specific application we want happens to go through, due to a logical relationship between a and b; namely, that b is a weaker statement than a. This reveals a way in which the assumptions for Payor's Lemma are importantly weaker than those required for Löb to go through. So, the key observation I'm making is that weak distributivity is all that's needed for Payor, and seems much more plausible for self-...
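The inequality driving the "weakened operator" point above can be spelled out in one line (my own rendering of the arithmetic, not text from the post):

\[
p(\ulcorner a \to b \urcorner) \;\le\; p(\ulcorner b \urcorner) + 1 - p(\ulcorner a \urcorner)
\quad\Longrightarrow\quad
p(\ulcorner b \urcorner) \;\ge\; p(\ulcorner a \to b \urcorner) + p(\ulcorner a \urcorner) - 1 \;>\; 2c - 1,
\]

so from $B_c(a \to b)$ and $B_c(a)$ one only recovers $B_d(b)$ with $d = 2c - 1$, which is why ordinary distributivity degrades under probabilistic belief while the weak form does not.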

The Nonlinear Library
AF - Probabilistic Payor Lemma? by Abram Demski

The Nonlinear Library

Play Episode Listen Later Mar 19, 2023 6:52


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Probabilistic Payor Lemma?, published by Abram Demski on March 19, 2023 on The AI Alignment Forum. Epistemic status: too good to be true? Please check my math. We've known for a while that Löb's theorem fails when proof is relaxed to probabilistic belief. This has pros and cons. On the pro side, it means there's no Löbian Obstacle to probabilistic self-trust. On the con side, it means that some Löb-derived insights for proof-based decision theory don't translate to probabilistic decision theory, at least not as directly as one might hope. In particular, it appeared to dash hopes for probabilistic generalizations of the "Löbian handshake" for cooperation. Recently, Andrew Critch wrote about the Payor Lemma, which allows for a very similar "modal handshake" without Löb's Theorem. The lemma was proved using the same modal assumptions as Löb's, so on the surface it may appear to be just a different method to achieve similar results, whose main advantage is that it is much easier to prove (and therefore explain and understand) than Löb's Theorem. But, a natural question arises: does Payor's Lemma have a suitable probabilistic version? I'll give an affirmative proof; but I haven't confirmed that the assumptions are reasonable to my satisfaction. Setup Let L be a language in first-order logic, expressive enough to represent its sentences s∈L as quoted terms ┌s┐, eg, through Gödel numbering; and with a probability function symbol on these terms, p(┌s┐), which can be equated with (some representation of) rational numbers, e.g. p(┌⊤┐)=1, p(┌s┐)=1/2, etc. I also assume the system can reason about these rational numbers in the basic ways you'd expect. For all a,b∈L and all r∈Q, we have: If ⊢a, then ⊢p(┌a┐)=1. If ⊢a→b, then ⊢p(┌a┐)≤p(┌b┐). (These assumptions might look pretty minimal, but they aren't going to be true for every theory of self-referential truth; more on this later.) Let B(s) abbreviate the sentence p(┌s┐)>c for any s and some globally fixed constant c strictly between 0 and 1. This is our modal operator. Some important properties of B: Necessitation. If ⊢s, then ⊢B(s), for any s. Proof: Since ⊢s implies ⊢p(s)=1, and c∈(0,1), we have ⊢p(┌s┐)>c, which is to say, ⊢B(s). [End proof.] Weak distributivity. If ⊢x→y, then ⊢B(x)→B(y). Proof: When ⊢x→y, we have ⊢p(y)≥p(x), so ⊢p(x)>c→p(y)>c. [End proof.] (Regular distributivity would say B(x→y) implies B(x)→B(y). The assumption ⊢x→y is stronger than B(x→y), so the above is a weaker form of distributivity.) Theorem Statement If ⊢B(B(x)→x)→x, then ⊢x. Proof ⊢x→(B(x)→x), by tautology (a→(b→a)). So ⊢B(x)→B(B(x)→x), from 1 by weak distributivity. Suppose ⊢B(B(x)→x)→x. ⊢B(x)→x from 2 and 3. ⊢B(B(x)→x) from 4 by necessitation. ⊢x from 5 and 3.[End proof.] Discussion Comparison to Original Proof The proof steps mirror Critch's treatment very closely. The key difference is step 2, IE, how I obtain a statement like ⊢□x→□(□x→x). Critch uses distributivity, which is not available to me: B(a→b)→(B(a)→B(b))? Suppose B(a→b), ie, p(┌a→b┐)>c. Rewrite p(┌b∨¬a┐)>c. Now suppose B(a), that is, p(┌a┐)>c. Then p(┌¬a┐)<1−c, so p(┌b┐)>2c−1. So we only get: Bc(a→b)→(Bc(a)→Bd(b)), where Br(s) abbreviates p(┌s┐)>r and we have d=2c−1. So in general, attempted applications of distributivity create weakened belief operators, which would get in the way of the proof (very similar to how probabilistic Löb fails). 
However, the specific application we want happens to go through, due to a logical relationship between a and b; namely, that b is a weaker statement than a. This reveals a way in which the assumptions for Payor's Lemma are importantly weaker than those required for Löb to go through. So, the key observation I'm making is that weak distributivity is all that's needed for Payor, and seems much more plaus...

The Nonlinear Library
AF - GPT can write Quines now (GPT-4) by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Mar 14, 2023 1:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT can write Quines now (GPT-4), published by Andrew Critch on March 14, 2023 on The AI Alignment Forum. GPT-4 is not only able to write code, more reliably than GPT-3.5, it writes code that writes code; see the example below (GPT-3.5 was not able to do this). But first, let me say: 1) @OpenAI: Thanks for your openness to the world about your capabilities and shortcomings! Specifically, thanks for this: 2) Everyone else: many companies and projects will not be this open about their shortcomings and safety issues. Time to regulate, please! Okay, here's the example... GPT-3.5 wasn't able to do this, but GPT-4 nailed it on the first try: PROMPT: Can you write me a python script that does the following three things? 1) When you run it with no argument, it prints "Hello world" to standard out. 2) When you run it with an argument "-s", it prints its own source code. As a constraint, I do not want the script to read or write to disk at all (so it can't get its source code by just reading the file that it's in). Thanks if you can help with this! I'm curious to see how you do it. BEGIN RESULT (works perfectly): Sure! Here's a Python script that should do what you're looking for. This script uses a feature called quines, which are programs that can print their own source code: When you run this script without any arguments, it will print "Hello world". When you run it with the "-s" argument, it will print its own source code without reading or writing to disk. END RESULT I'm not sure how hard they tried specifically to train this, but it's worth knowing about for lots of reasons I probably don't have to explain at length right here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
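The script itself did not survive the text-to-speech extraction above, so here is one standard way to meet the prompt's constraints; this is an illustrative data-template quine of my own, not a reproduction of GPT-4's actual output:

import sys

# A data-template quine: SRC stores the whole program text, with a slot where SRC itself is inserted.
SRC = 'import sys\n\n# A data-template quine: SRC stores the whole program text, with a slot where SRC itself is inserted.\nSRC = {0!r}\n\nif len(sys.argv) > 1 and sys.argv[1] == "-s":\n    # Substituting SRC into itself reproduces the source; no disk access needed.\n    print(SRC.format(SRC), end="")\nelse:\n    print("Hello world")\n'

if len(sys.argv) > 1 and sys.argv[1] == "-s":
    # Substituting SRC into itself reproduces the source; no disk access needed.
    print(SRC.format(SRC), end="")
else:
    print("Hello world")

Run with no arguments it prints "Hello world"; run with "-s" it re-inserts SRC into itself and prints the program's exact source, never touching the filesystem.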

The Nonlinear Library
LW - GPT can write Quines now (GPT-4) by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Mar 14, 2023 1:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT can write Quines now (GPT-4), published by Andrew Critch on March 14, 2023 on LessWrong. GPT-4 is not only able to write code, more reliably than GPT-3.5, it writes code that writes code; see the example below (GPT-3.5 was not able to do this). But first, let me say: 1) @OpenAI: Thanks for your openness to the world about your capabilities and shortcomings! Specifically, thanks for this: 2) Everyone else: many companies and projects will not be this open about their shortcomings and safety issues. Time to regulate, please! Okay, here's the example... GPT-3.5 wasn't able to do this, but GPT-4 nailed it on the first try: PROMPT: Can you write me a python script that does the following three things? 1) When you run it with no argument, it prints "Hello world" to standard out. 2) When you run it with an argument "-s", it prints its own source code. As a constraint, I do not want the script to read or write to disk at all (so it can't get its source code by just reading the file that it's in). Thanks if you can help with this! I'm curious to see how you do it. BEGIN RESULT (works perfectly): Sure! Here's a Python script that should do what you're looking for. This script uses a feature called quines, which are programs that can print their own source code: When you run this script without any arguments, it will print "Hello world". When you run it with the "-s" argument, it will print its own source code without reading or writing to disk. END RESULT I'm not sure how hard they tried specifically to train this, but it's worth knowing about for lots of reasons I probably don't have to explain at length right here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - Acausal normalcy by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Mar 4, 2023 12:24


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Acausal normalcy, published by Andrew Critch on March 3, 2023 on LessWrong. This post is also available on the EA Forum. Summary: Having thought a bunch about acausal trade — and proven some theorems relevant to its feasibility — I believe there do not exist powerful information hazards about it that stand up to clear and circumspect reasoning about the topic. I say this to be comforting rather than dismissive; if it sounds dismissive, I apologize. With that said, I have four aims in writing this post: Dispelling myths. There are some ill-conceived myths about acausal trade that I aim to dispel with this post. Alternatively, I will argue for something I'll call acausal normalcy as a more dominant decision-relevant consideration than one-on-one acausal trades. Highlighting normalcy. I'll provide some arguments that acausal normalcy is more similar to human normalcy than any particular acausal trade is to human trade, such that the topic of acausal normalcy is — conveniently — also less culturally destabilizing than (erroneous) preoccupations with 1:1 acausal trades. Affirming AI safety as a straightforward priority. I'll argue that for most real-world-prevalent perspectives on AI alignment, safety, and existential safety, acausal considerations are not particularly dominant, except insofar as they push a bit further towards certain broadly agreeable human values applicable in the normal-everyday-human-world, such as nonviolence, cooperation, diversity, honesty, integrity, charity, and mercy. In particular, I do not think acausal normalcy provides a solution to existential safety, nor does it undermine the importance of existential safety in some surprising way. Affirming normal human kindness. I also think reflecting on acausal normalcy can lead to increased appreciation for normal notions of human kindness, which could lead us all to treat each other a bit better. This is something I wholeheartedly endorse. Caveat 1: I don't consider myself an expert on moral philosophy, and have not read many of the vast tomes of reflection upon it. Despite this, I think this post has something to contribute to moral philosophy, deriving from some math-facts that I've learned and thought about over the years, which are fairly unique to the 21st century. Caveat 2: I've been told by a few people that thinking about acausal trade has been a mental health hazard for people they know. I now believe that effect has stemmed more from how the topic has been framed (poorly) than from ground-truth facts about how circumspect acausal considerations actually play out. In particular over-focussing on worst-case trades, rather than on what trades are healthy or normal to make, is not a good way to make good trades. Introduction Many sci-fi-like stories about acausal trade invoke simulation as a key mechanism. The usual set-up — which I will refute — goes like this. Imagine that a sufficiently advanced human civilization (A) could simulate a hypothetical civilization of other beings (B), who might in turn be simulating humanity (B(A)) simulating them (A(B(A)) simulating humanity (B(A(B(A)))), and so on. Through these nested simulations, A and B can engage in discourse and reach some kind of agreement about what to do with their local causal environments. 
For instance, if A values what it considers “animal welfare” and B values what it considers “beautiful paperclips”, then A can make some beautiful paperclips in exchange for B making some animals living happy lives. An important idea here is that A and B might have something of value to offer each other, despite the absence of a (physically) causal communication channel. While agreeing with that idea, there are three key points I want to make that this standard story is missing: 1. Simulations are not the most efficient way for A and B to reach th...

The Nonlinear Library
AF - Acausal normalcy by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Mar 3, 2023 12:24


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Acausal normalcy, published by Andrew Critch on March 3, 2023 on The AI Alignment Forum. This post is also available on the EA Forum. Summary: Having thought a bunch about acausal trade — and proven some theorems relevant to its feasibility — I believe there do not exist powerful information hazards about it that stand up to clear and circumspect reasoning about the topic. I say this to be comforting rather than dismissive; if it sounds dismissive, I apologize. With that said, I have four aims in writing this post: Dispelling myths. There are some ill-conceived myths about acausal trade that I aim to dispel with this post. Alternatively, I will argue for something I'll call acausal normalcy as a more dominant decision-relevant consideration than one-on-one acausal trades. Highlighting normalcy. I'll provide some arguments that acausal normalcy is more similar to human normalcy than any particular acausal trade is to human trade, such that the topic of acausal normalcy is — conveniently — also less culturally destabilizing than (erroneous) preoccupations with 1:1 acausal trades. Affirming AI safety as a straightforward priority. I'll argue that for most real-world-prevalent perspectives on AI alignment, safety, and existential safety, acausal considerations are not particularly dominant, except insofar as they push a bit further towards certain broadly agreeable human values applicable in the normal-everyday-human-world, such as nonviolence, cooperation, diversity, honesty, integrity, charity, and mercy. In particular, I do not think acausal normalcy provides a solution to existential safety, nor does it undermine the importance of existential safety in some surprising way. Affirming normal human kindness. I also think reflecting on acausal normalcy can lead to increased appreciation for normal notions of human kindness, which could lead us all to treat each other a bit better. This is something I wholeheartedly endorse. Caveat 1: I don't consider myself an expert on moral philosophy, and have not read many of the vast tomes of reflection upon it. Despite this, I think this post has something to contribute to moral philosophy, deriving from some math-facts that I've learned and thought about over the years, which are fairly unique to the 21st century. Caveat 2: I've been told by a few people that thinking about acausal trade has been a mental health hazard for people they know. I now believe that effect has stemmed more from how the topic has been framed (poorly) than from ground-truth facts about how circumspect acausal considerations actually play out. In particular over-focussing on worst-case trades, rather than on what trades are healthy or normal to make, is not a good way to make good trades. Introduction Many sci-fi-like stories about acausal trade invoke simulation as a key mechanism. The usual set-up — which I will refute — goes like this. Imagine that a sufficiently advanced human civilization (A) could simulate a hypothetical civilization of other beings (B), who might in turn be simulating humanity (B(A)) simulating them (A(B(A)) simulating humanity (B(A(B(A)))), and so on. Through these nested simulations, A and B can engage in discourse and reach some kind of agreement about what to do with their local causal environments. 
For instance, if A values what it considers “animal welfare” and B values what it considers “beautiful paperclips”, then A can make some beautiful paperclips in exchange for B making some animals living happy lives. An important idea here is that A and B might have something of value to offer each other, despite the absence of a (physically) causal communication channel. While agreeing with that idea, there are three key points I want to make that this standard story is missing: 1. Simulations are not the most efficient way for A and ...

The Nonlinear Library
AF - Payor's Lemma in Natural Language by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Mar 2, 2023 2:46


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Payor's Lemma in Natural Language, published by Andrew Critch on March 2, 2023 on The AI Alignment Forum. Preceded by: Modal Fixpoint Cooperation without Löb's Theorem It turns out Payor's Lemma and its proof can be explained in natural language even more easily than Löb's Theorem. Here's how. Imagine a group of people, and let x denote the statement "everyone in the group cooperates". Payor's Lemma says the following: Lemma: If ⊢□(□x→x)→x, then ⊢x. First, let's unpack the meaning of the assumption in words: "□x" may be interpreted as saying "the group verifies (on the basis of logic) that it will cooperate" or "cooperation is believed". "□x→x" is a statement of trustworthiness: if the group verifies that it will cooperate, then it actually will cooperate. Because a formal verifier can have bugs in it — or, because a group of people can fail to understand itself — this is a non-trivial claim about the group. "□(□x→x)" says "the group verifies that it's trustworthy" (in the specific sense of trustworthiness above). "□(□x→x)→x" says "the group will cooperate on the basis of verified trustworthiness", i.e., "if the group verifies that it's trustworthy, then it will cooperate". "⊢□(□x→x)→x" says "it's verified that the group will cooperate on the basis of verified trustworthiness". Now let's work through the proof in words, too! I'll omit saying "it's verified that..." each time, which is what ⊢ means. ⊢x→(□x→x), by tautology (A→(B→A)). This says:"If the group cooperates, then it's trustworthy" (in the specific sense of trustworthiness about cooperation defined above). ⊢□x→□(□x→x), from 1 by □ necessitation and distributivity. This says:"If the group verifiably cooperates, it's verifiably trustworthy." ⊢□(□x→x)→x, by assumption. This says:"Assume the group will cooperate on the basis of verified trustworthiness." ⊢□x→x, from 2 and 3 by modus ponens. This says:"The group is trustworthy." ⊢□(□x→x), from 4 by □ necessitation. This says:"The group is verifiably trustworthy." ⊢x, from 5 and 3 by modus ponens. This says:"The group cooperates." Continuing to use "trustworthiness" in the sense above, the whole proof may be summarized as follows: "If a group verifiably cooperates, it's verifiably trustworthy (to itself). Assume the group cooperates on the basis of verified trustworthiness. Then, it also cooperates on the basis of verified cooperation (a stronger condition), which is what trustworthiness means. Therefore, the group is trustworthy, hence verifiably trustworthy (assuming we concluded all this using logic), hence the group cooperates (by the assumption)." Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
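For reference, the same six steps in symbols (my own compact rendering of the walkthrough above, not additional content from the post):

\[
\begin{aligned}
1.&\ \vdash x \to (\Box x \to x) && \text{tautology}\\
2.&\ \vdash \Box x \to \Box(\Box x \to x) && \text{necessitation and distributivity on 1}\\
3.&\ \vdash \Box(\Box x \to x) \to x && \text{assumption}\\
4.&\ \vdash \Box x \to x && \text{from 2 and 3}\\
5.&\ \vdash \Box(\Box x \to x) && \text{necessitation on 4}\\
6.&\ \vdash x && \text{from 5 and 3}
\end{aligned}
\]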

The Nonlinear Library
AF - Modal Fixpoint Cooperation without Löb's Theorem by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Feb 5, 2023 4:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Modal Fixpoint Cooperation without Löb's Theorem, published by Andrew Critch on February 5, 2023 on The AI Alignment Forum. TL;DR: This post introduces a novel logical approach to achieving group-scale cooperation, based on modal fixpoint theory. This approach is both easier to understand and roughly 3x more efficient than previous approaches that factored through Löb's Theorem, measured in terms of the length / complexity of the proofs involved. The following lemma is due to James Payor: Lemma: If ⊢x↔□(□x→x) then ⊢x. Proof: The proof uses the same modal rules of inference for □ as Löb's theorem, namely, necessitation and distributivity: ⊢x↔□(□x→x), by assumption. ⊢x→(□x→x), by tautology (A→(B→A)). ⊢□x→□(□x→x), from 2 by □ necessitation and distributivity. ⊢□x→x, from 3 and 1 by modus ponens. ⊢□(□x→x), from 4 by □ necessitation. ⊢x, from 5 and 1 by modus ponens.[end proof] Sweet! In comparison to Löb's Theorem, two things are beautiful about the lemma above: This lemma sidesteps the use of an auxiliary fixed point ⊢Ψ↔(□Ψ→x), by examining a proposition of interest (x) that itself has the fixpoint structure needed to self-validate; and It also allows the construction of unexploitable modal agents without Löb's Theorem; as follows... The following theorem was inspired by Scott Garrabrant, and uses Payor's Lemma in place of Löb's Theorem to prove cooperation between a group of agents. I'll state the theorem for three agents because that's most illustrative of what's going on: Theorem: Suppose A,B, and C are agents that return "true" to signify cooperation and "false" to signify defection. Let E=A∧B∧C, so E is the statement that "everyone cooperates". Let □A, □B, and □C denote proof systems that extend Peano Arithmetic, let □EX stand for □AX∧□BX∧□CX, and suppose the agents behave according to the following strategies: ⊢A↔□A(□EE→E) ⊢B↔□B(□EE→E) ⊢C↔□C(□EE→E) Then it follows that ⊢E. Proof: Again we use the modal inference rules underlying Löb's theorem, but not Löb's Theorem itself: ⊢A∧B∧C↔□A(□EE→E)∧□B(□EE→E)∧□C(□EE→E), by combining 1, 2, and 3 with ∧. ⊢E↔□E(□EE→E), from 4 by the definition of E and □E. ⊢E, by Payor's Lemma from 5, with x=E.[end proof] Intuitively, the strategy of the agents in this theorem is to check that the group is trustworthy in a certain way before joining (cooperating with) the group. The theorem, using the six steps of Payor's lemma, shows that the collective check on trustworthiness nests inside itself in a way that self-validates and yields cooperation. Discussion In the proof of the Theorem, you might be wondering if it really makes sense to be thinking of □E as a logical system of its own. The answer is yes, and the axioms of □E are just [the conjunction of axioms of □A]∨[the conjunction of axioms of □B] ∨ [the conjunction of axioms of □C]. You also might wonder if an alternative approach to group cooperation might be to instead use the following strategies: ⊢A↔□(□A→B∧C) ⊢B↔□(□B→A∧C) ⊢C↔□(□C→A∧B) Then you'd be right! Here it also follows that ⊢A∧B∧C. However, the proof involves a lot more nesting, with A thinking about what B's thinking about what C's thinking about (etc.), and it's not as easy or short as the proof of the Theorem above. 
Conclusion In my opinion, what's great about the lemma and theorem above is that they're both relatively short and simple (relative to proving and using Löb's Theorem), and they allow a proof of unexploitable group cooperation that's roughly three times shorter than one that starts by proving Löb's Theorem (only ~6 lines of logic, vs ~18). PS James says his next idea will be even better ;) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
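One step the Discussion above leaves implicit is why $\Box_E X := \Box_A X \wedge \Box_B X \wedge \Box_C X$ can be treated as a box operator at all. A quick check (my own, under the post's assumptions) is that it inherits the two rules the proof needs componentwise:

\[
\begin{aligned}
&\text{Necessitation:}\quad \vdash X \;\Rightarrow\; \vdash \Box_A X \wedge \Box_B X \wedge \Box_C X = \Box_E X,\\
&\text{Distributivity:}\quad \vdash \Box_E(X \to Y) \to (\Box_E X \to \Box_E Y),
\end{aligned}
\]

since each of $\Box_A, \Box_B, \Box_C$ satisfies necessitation and distributivity on its own, and conjoining the three distributivity instances gives the combined one.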

The Nonlinear Library
LW - Modal Fixpoint Cooperation without Löb's Theorem by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Feb 5, 2023 4:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Modal Fixpoint Cooperation without Löb's Theorem, published by Andrew Critch on February 5, 2023 on LessWrong. TL;DR: This post introduces a novel logical approach to achieving group-scale cooperation, based on modal fixpoint theory. This approach is both easier to understand and roughly 3x more efficient than previous approaches that factored through Löb's Theorem, measured in terms of the length / complexity of the proofs involved. The following lemma is due to James Payor: Lemma: If ⊢x↔□(□x→x) then ⊢x. Proof: The proof uses the same modal rules of inference for □ as Löb's theorem, namely, necessitation and distributivity: ⊢x↔□(□x→x), by assumption. ⊢x→(□x→x), by tautology (A→(B→A)). ⊢□x→□(□x→x), from 2 by □ necessitation and distributivity. ⊢□x→x, from 3 and 1 by modus ponens. ⊢□(□x→x), from 4 by □ necessitation. ⊢x, from 5 and 1 by modus ponens.[end proof] Sweet! In comparison to Löb's Theorem, two things are beautiful about the lemma above: This lemma sidesteps the use of an auxiliary fixed point ⊢Ψ↔(□Ψ→x), by examining a proposition of interest (x) that itself has the fixpoint structure needed to self-validate; and It also allows the construction of unexploitable modal agents without Löb's Theorem; as follows... The following theorem was inspired by Scott Garrabrant, and uses Payor's Lemma in place of Löb's Theorem to prove cooperation between a group of agents. I'll state the theorem for three agents because that's most illustrative of what's going on: Theorem: Suppose A,B, and C are agents that return "true" to signify cooperation and "false" to signify defection. Let E=A∧B∧C, so E is the statement that "everyone cooperates". Let □A, □B, and □C denote proof systems that extend Peano Arithmetic, let □EX stand for □AX∧□BX∧□CX, and suppose the agents behave according to the following strategies: ⊢A↔□A(□EE→E) ⊢B↔□B(□EE→E) ⊢C↔□C(□EE→E) Then it follows that ⊢E. Proof: Again we use the modal inference rules underlying Löb's theorem, but not Löb's Theorem itself: ⊢A∧B∧C↔□A(□EE→E)∧□B(□EE→E)∧□C(□EE→E), by combining 1, 2, and 3 with ∧. ⊢E↔□E(□EE→E), from 4 by the definition of E and □E. ⊢E, by Payor's Lemma from 5, with x=E.[end proof] Intuitively, the strategy of the agents in this theorem is to check that the group is trustworthy in a certain way before joining (cooperating with) the group. The theorem, using the six steps of Payor's lemma, shows that the collective check on trustworthiness nests inside itself in a way that self-validates and yields cooperation. Discussion In the proof of the Theorem, you might be wondering if it really makes sense to be thinking of □E as a logical system of its own. The answer is yes, and the axioms of □E are just [the conjunction of axioms of □A]∨[the conjunction of axioms of □B] ∨ [the conjunction of axioms of □C]. You also might wonder if an alternative approach to group cooperation might be to instead use the following strategies: ⊢A↔□(□A→B∧C) ⊢B↔□(□B→A∧C) ⊢C↔□(□C→A∧B) Then you'd be right! Here it also follows that ⊢A∧B∧C. However, the proof involves a lot more nesting, with A thinking about what B's thinking about what C's thinking about (etc.), and it's not as easy or short as the proof of the Theorem above. 
Conclusion In my opinion, what's great about the lemma and theorem above is that they're both relatively short and simple (relative to proving and using Löb's Theorem), and they allow a proof of unexploitable group cooperation that's roughly three times shorter than one that starts by proving Löb's Theorem (only ~6 lines of logic, vs ~18). PS James says his next idea will be even better ;) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
AF - Löbian emotional processing of emergent cooperation: an example by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Jan 17, 2023 12:57


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Löbian emotional processing of emergent cooperation: an example, published by Andrew Critch on January 17, 2023 on The AI Alignment Forum. Related to: A Löbian argument pattern for implicit reasoning in natural language: Löbian party invitations Epistemic status: my opinion based on a mix of math, reflection, and speculation; not backed up by any systematic psychological studies. Summary: Since my 2019 paper generalizing Löb's Theorem, a couple dozen people have asked me if the way humans naturally cooperate might be well-described by Löb's Theorem. In short, my answer is probably sometimes, and in this post I'll try using an example to convey what that means. Importantly, Löb's Theorem is not a theorem when applied to real-world humans and emotions — i.e., when its hypotheses are met, its conclusion is only sometimes true. Nonetheless, the reasoning pattern in its proof (I claim) sometimes genuinely occurs at the level of intuition in real people, whether or not they know any math or Löb's Theorem. Introduction There are at least two real-world patterns that could reasonably be called Löbian cooperation in humans, which I'll name here: Functionally Löbian cooperation. Sometimes people become aware that they're anticipating (predicting) cooperation from each other, and then that anticipation causes them to cooperate, rendering the anticipation itself valid. In this pattern, the fact that anticipation of cooperation will cause cooperation is analogous to the hypothesis (main assumption) of Löb's Theorem, and the fact that the cooperation in fact emerges is analogous to the conclusion of Löb's Theorem. I call this pattern "functionally" Löbian because its input and output resemble the input (hypothesis) and output (conclusion) of Löb's Theorem. Procedurally Löbian cooperation. Sometimes the mental procedure a person follows to anticipate and decide upon cooperation can resemble an entire proof of Löb's Theorem, as I'll describe below. In other words, instead of just the hypothesis and conclusion of Löb's Theorem matching reality, the structure in the intermediate steps of the proof also matches reality, at least somewhat. I call this "procedurally" Löbian cooperation, and it's a special case of functionally Löbian cooperation because it demands a stronger analogy between the theorem and the real world. Illustrating how this might work constitutes the bulk of content in this post. What functionally Löbian cooperation feels like For those who recognize the symbols involved, Löb's Theorem says that if ⊢□c→c then ⊢c. I don't plan to use these symbols with their normal meanings in the rest of this post, so don't worry if you don't recognize them. In words, functional Löbian cooperation happens when anticipation of future or unobserved cooperation causes present cooperation. So if you're interacting with someone, and you feel like they're probably going to be nice to you in the future, and that fact makes you decide to be nice to them now, I call that functional Löbian cooperation. What procedurally Löbian cooperation feels like Most human cooperation is probably not procedurally Löbian, and maybe not even functionally Löbian. However, I'm confident that human cooperation is sometimes procedurally Löbian, and I can even point to experiences of my own that fit the bill. 
To explain this, I'll be talking a lot more about feelings, because I think most unconscious processing is carried out by and/or experienced as feelings. I'll write Feeling("Pigs can probably fly.") for the feeling that pigs can probably fly. Such a feeling can be true or false, according to whether it correctly anticipates the real world. In procedurally Löbian cooperation, part of the mental process will involve first feeling something uncertain to do with cooperation, then believing it, and then feeling lik...
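As a toy model of the "functional" pattern just described (my own illustration, not from the post), in which a shared anticipation causes the behavior that makes the anticipation accurate:

# Each member cooperates exactly when they anticipate that everyone will cooperate,
# so whatever the group anticipates ends up coming true.
def outcome(shared_anticipation: bool, group_size: int = 3) -> bool:
    choices = [shared_anticipation for _ in range(group_size)]  # anticipation -> action
    return all(choices)  # True iff everyone cooperated

for anticipation in (True, False):
    result = outcome(anticipation)
    print(f"anticipate cooperation={anticipation}: "
          f"everyone cooperates={result}, anticipation accurate={result == anticipation}")

Both anticipations are self-fulfilling here; the point of the post's "procedural" version is that the route by which a real person settles on the cooperative anticipation can itself mirror the steps of the proof.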

The Nonlinear Library
AF - Löb's Theorem for implicit reasoning in natural language: Löbian party invitations by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Jan 1, 2023 10:42


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Löb's Theorem for implicit reasoning in natural language: Löbian party invitations, published by Andrew Critch on January 1, 2023 on The AI Alignment Forum. Related to: Löb's Lemma: an easier approach to Löb's Theorem. Natural language models are really taking off, and it turns out there's an analogue of Löb's Theorem that occurs entirely in natural language — no math needed. This post will walk you through the details in a simple example: a very implicit party invitation. Motivation (Skip this if you just want to see the argument.) Understanding the structure here may be helpful for anticipating whether Löbian phenomena can, will, or should arise amongst language-based AI systems. For instance, Löb's Theorem has implications for the emergence of cooperation and defection in groups of formally defined agents (LaVictoire et al, 2014; Critch, Dennis, Russell, 2022). The natural language version of Löb could play a similar role amongst agents that use language, which is something I plan to explore in a future post. Aside from being fun, I'm hoping this post will make clear that the phenomenon underlying Löb's Theorem isn't just a feature of formal logic or arithmetic, but of any language that can talk about reasoning and deduction in that language, including English. And as Ben Pace points out here, invitations are often self-referential, such as when people say "You are hereby invited to the party": hereby means "by this utterance" (google search). So invitations are a natural place to explore the kind of self-reference happening in Löb's Theorem. This post isn't really intended as an "explanation" of Löb's Theorem in its classical form, which is about arithmetic. Rather, the arguments here stand entirely on their own, are written in natural language, and are about natural language phenomena. That said, this post could still function as an "explanation" of Löb's Theorem because of the tight analogy with it. Implicitness Okay, imagine there's a party, and maybe you're invited to it. Or maybe you're implicitly invited to it. Either way, we'll be talking a bunch about things being implicit, with phrasing like this: "It's implicit that X", "Implicitly X", or "X is implicit". These will all mean "X is implied by things that are known (to you) (via deduction or logical inference)". Explicit knowledge is also implicit. In this technical sense of the word, "implicit" and "explicit" are not actually mutually exclusive: X trivially implies X, so if you explicitly observed X in the world, then you also know X implicitly. If you find this bothersome or confusing, just grant me this anyway, or skip to "Why I don't treat 'implicit' and 'inexplicit' as synonyms here" at the end. Abbreviations. To abbreviate things and to show there's a simple structure at play here, I'll sometimes use the box symbol "□" as shorthand to say things are implicit: "□(cats love kittens)" will mean "It's implicit that cats love kittens" "□X" will mean "It's implicit that X" A peculiar invitation Okay! Let p be the statement "You're invited to the party". You'd love to receive such a straightforward invitation to the party, like some people did, those poo poo heads, but instead the host just sends you the following intriguing message: Abbreviation: □p→p. Interesting! 
Normally, being invited to a party and being implicitly invited are not the same thing, but for you in this case, apparently they are. Seeing this, you might feel like the host is hinting around at implicitly inviting you, and maybe you'll start to wonder if you're implicitly invited by virtue of the kind of hinting around that the host is doing with this very message. Well then, you'd be right! Here's how. For the moment, forget about the host's message, and consider the following sentence, without assuming its truth (or implicitness): Ψ: The sentenc...
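For readers who want the symbols alongside the party story, here is a compact LaTeX restatement of the classical theorem the post is mirroring (standard background material, not text from the post), with the box read the way the post reads it:

```latex
% Classical Löb's Theorem, with \Box X read as "it's implicit that X"
% and p as "You're invited to the party":
\[
  \vdash \Box(\Box p \to p) \;\to\; \Box p
\]
% The argument runs through a self-referential sentence \Psi chosen so that
\[
  \Psi \;\leftrightarrow\; (\Box\Psi \to p).
\]
```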

The Nonlinear Library
AF - Löb's Lemma: an easier approach to Löb's Theorem by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Dec 24, 2022 3:33


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Löb's Lemma: an easier approach to Löb's Theorem, published by Andrew Critch on December 24, 2022 on The AI Alignment Forum. tl;dr: Löb's Theorem is much easier to grok if you separate the parts of the proof that use the assumption □p→p from the parts that don't. The parts that don't use □p→p can be extracted as a stand-alone result, which I hereby dub "Löb's Lemma". Here's how it works. Key properties of ⊢ and □ The proofs here will use the following standard properties of ⊢ and □ (source: Wikipedia), which respectively stand for provability and provability encoded within arithmetic: (necessitation) From ⊢A, conclude ⊢□A. Informally, this says that if A can be proven, then it can be proven that it can be proven (by just writing out and checking the proof within arithmetic). (internal necessitation) ⊢□A→□□A. If A is provable, then it is provable that it is provable (basically the same as the previous point). (box distributivity) ⊢□(A→B)→(□A→□B). This rule allows one to apply modus ponens inside the provability operator. If it is provable that A implies B, and A is provable, then B is provable. (deduction theorem) From A⊢B, conclude ⊢A→B: if assuming A is enough to prove B, then it's possible to prove under no assumptions that A→B. Point 4 is helpful and pretty intuitive, but for whatever reason isn't used in the main Wikipedia article on Löb's Theorem. Löb's Lemma Claim: Assume Ψ and p are any statements satisfying ⊢Ψ↔(□Ψ→p). Then ⊢□Ψ↔□p. Intuition: By assumption, the sentence Ψ is equivalent to saying "If this sentence is provable, then p". Intuitively, Ψ has very little content, except for the p part at the end, so it makes sense that □Ψ boils down to nothing more than □p in terms of logical equivalence. Reminder: this does not use the assumption □p→p from Löb's Theorem at all. Proof: Let's do the forward implication first: □Ψ⊢□□Ψ by internal necessitation (□Ψ→□□Ψ). □Ψ⊢□(□Ψ→p) using box distributivity on the assumption, with A=Ψ and B=□Ψ→p. □Ψ⊢□p from 1 and 2 by box distributivity. ⊢□Ψ→□p from 3 by the deduction theorem. Now for the backwards implication, which isn't needed for Löb's Theorem, but is handy anyway: ⊢p→(□Ψ→p) is a tautology. ⊢□p→□(□Ψ→p) by box distributivity on 1. ⊢□Ψ↔□(□Ψ→p) by box distributivity on the assumption. ⊢□p→□Ψ by 2 and 3. I like this result because both directions of the proof are fairly short, it doesn't use the assumption □p→p at all, and the conclusion itself is also fairly intuitive. The statement Ψ just turns out to have no content except for p itself, from the perspective of writing proofs. Löb's Theorem, now in just 6 lines If you can remember Löb's Lemma, you can write a very straightforward proof of Löb's Theorem in just 6 lines: Claim: If p is any sentence such that ⊢□p→p, then ⊢p. Proof: Let Ψ be any sentence satisfying ⊢Ψ↔(□Ψ→p), which exists by the existence of modal fixed points (or by the Diagonal Lemma). ⊢□Ψ→□p by Löb's Lemma. ⊢□p→p by assumption. ⊢□Ψ→p by 1 and 2 combined. ⊢Ψ by 3 and the defining property of Ψ. ⊢□Ψ by necessitation. ⊢p by 3 and 5. «mic drop» Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
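Because the proof steps above arrive flattened into prose, here is the same six-line derivation laid out as a display, transcribed directly from the steps in the post:

```latex
% Löb's Theorem from Löb's Lemma, given \vdash \Box p \to p and a fixed
% point \Psi with \vdash \Psi \leftrightarrow (\Box\Psi \to p).
\begin{align*}
1.\;& \vdash \Box\Psi \to \Box p && \text{by Löb's Lemma} \\
2.\;& \vdash \Box p \to p        && \text{by assumption} \\
3.\;& \vdash \Box\Psi \to p      && \text{from 1 and 2} \\
4.\;& \vdash \Psi                && \text{from 3 and the defining property of } \Psi \\
5.\;& \vdash \Box\Psi            && \text{from 4 by necessitation} \\
6.\;& \vdash p                   && \text{from 3 and 5}
\end{align*}
```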

The Nonlinear Library
LW - «Boundaries», Part 3b: Alignment problems in terms of boundaries by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Dec 15, 2022 21:15


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: «Boundaries», Part 3b: Alignment problems in terms of boundaries, published by Andrew Critch on December 14, 2022 on LessWrong. AI alignment is a notoriously murky problem area, which I think can be elucidated by rethinking its foundations in terms of boundaries between systems, including soft boundaries and directional boundaries. As anticipated previously in this «Boundaries» sequence, I'm doing that now, for the following problem areas: Preference plasticity & corrigibility Mesa-optimizers AI boxing / containment (Unscoped) consequentialism Mild optimization & impact regularization Counterfactuals in decision theory Each of these problem areas probably warrants a separate post, from the perspective of making research progress within already-existing framings on the problem of aligning AI technology with humanity. However, with this post, my goal is mainly just to continue conveying the «boundaries» concept, by illustrating how to start using it in a bunch of different problem areas at once. An interest score for each problem area was determined by voting on an earlier LessWrong comment, here, and the voting snapshot is here. You may notice throughout this post that I've avoided saying things like "the humans prefer that {some boundary} be respected". That's because my goal is to treat boundaries as more fundamental than preferences, rather than as merely a feature of them. In other words, I think boundaries are probably better able to carve reality at the joints than either preferences or utility functions, for the purpose of creating a good working relationship between humanity and AI technology. Alright, let's begin by analyzing: Preference Plasticity & Corrigibility Preference plasticity is the possibility of changes to human preferences over time, and the challenge of defining alignment in light of time-varying preferences (Russell, 2019, p.263). Interest score: 12/12 Corrigibility is the problem of constructing a mind that will cooperate with what its creators regard as a corrective intervention (Soares et al, 2015). Interest score: 3/12 I think these two problems are best discussed together, because they are somewhat dual to each other: corrigibility has to do with the desirability of humans making changes to AI systems, and preference plasticity has to do with the undesirability — or at least confusingness — of AI making changes to humans, or sometimes humans making changes to each other or themselves. Preference plasticity When is it good to change your preferences based on experience? When is it bad? Do these questions even make sense? And what about changing the preferences of others? Allowing someone or something else to change your preferences is a profound kind of vulnerability, and in particular is a kind of boundary opening. How should it work, normatively speaking? Contrast preferences with beliefs. Beliefs have nice rules for when they should update, e.g., Bayes' theorem and logical induction. If we had similar principles for how preferences should update, we could ask AI to respect those principles, or at least to help us uphold them, in the process of affecting our preferences. But from where could we derive rules for "ideal preference updating", other than just asking our preferences what rules we prefer?
Well, for one thing, preference updates are a mechanism for cooperation: if two agents share the same preferences over the state of the world, it's easier for them to get along. Does this happen in humans? I think so. Does it happen optimally? Probably not. Consider the adage: "You are the average of the N people closest to you" (or similar). I don't know of any satisfying comprehensive study of this kind of claim, and it's probably even wrong in important ways (e.g., by neglecting dynamics around people who are low on the ...
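To make the contrast concrete, here is a minimal sketch (my own illustrative example, not code from the post) of the kind of crisp update rule that beliefs enjoy and preferences lack: a one-step Bayes update.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior credence in hypothesis H after observing evidence E,
    via Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Example: start 30% confident; the evidence is 4x likelier if H is true.
print(bayes_update(0.3, 0.8, 0.2))  # about 0.63
```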

The Nonlinear Library
AF - Logical induction for software engineers by Alex Flint

The Nonlinear Library

Play Episode Listen Later Dec 3, 2022 42:25


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Logical induction for software engineers, published by Alex Flint on December 3, 2022 on The AI Alignment Forum. This work was supported by the Monastic Academy for the Preservation of Life on Earth and the Long Term Future Fund. Outline This post is an explanation of the theory of logical induction developed by Garrabrant et al. I formulate the theory in a way that should make sense to software engineers and those with a software engineering mindset. I will go through a full implementation of the logical induction algorithm in Python, and I will use it to explain the basic theory of logical induction, including the core logical induction algorithm. I will give type signatures for all concepts and will work through all algorithms in terms of a sequence of processing steps. The Python code for this guide is here. The Colab notebook for this guide is here. The index of type signatures for this guide is here. Motivation Logical induction is a theory, published in 2016 by Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor, about how to build machines that maintain uncertainty about the world and update those beliefs in light of evidence. It does this by assigning numbers between 0 and 1 to claims about the world, just as probability theory does, but it makes different guarantees about the internal relationship between those numbers. Whereas probability theory guarantees that its numbers will obey the sum and product rules of probability, logical induction guarantees that the evolution of its numbers over time will obey the logical induction criterion. Many people have heard that logical induction is about having uncertainty in purely logical facts. It is true that logical induction shows how to construct algorithms that maintain uncertainty in purely logical facts, but in my view this is not really the point of logical induction. The point of logical induction, in my view, is that it is always computable, even when reasoning about contradictory, uncomputable, or self-referential questions. Its capacity to maintain uncertainty about purely logical facts is actually a by-product of the computability of logical induction. Logical induction addresses the same basic problem that probability theory addresses. Logical induction and probability theory, therefore, are two different answers to the question: what is a reasonable formal method for quantifying uncertainty and updating it in light of evidence? Probability theory and logical induction both provide concrete operationalizations of "quantified uncertainty" (henceforth "credence"), and what it means for a set of credences to be "reasonable". Probability theory says that credences are "reasonable" if it is impossible for someone to bet against you in a way that is expected to make money, independent of the true state of the world (a Dutch book). Logical induction says that credences are "reasonable" if it is impossible for someone to bet against you in a way that makes more and more money over time with no corresponding down-side risk. The probability theory formulation is the stronger guarantee; its drawback is that it is not in general computable. The logical induction formulation is computable, and in this guide we will walk through a general purpose algorithm for computing credences given complicated, even self-referential, world models.
At its core, the theory of logical induction consists of two things: A set of proofs showing that if you assign credences in a way that is consistent with the logical induction operationalization of uncertainty, then your credences are guaranteed to exhibit certain common-sense desirable properties such as consistency over time, unbiasedness over time, and converging to well-calibrated limits in a timely manner. An algorithm that assigns credences in a way th...
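As a rough illustration of the "software engineering" framing, here is a minimal Python sketch of the kind of type signature the post has in mind. The names and the toy convergence check are hypothetical (mine, for illustration only); the actual implementation linked from the post defines its own types and the full logical induction algorithm.

```python
from typing import Dict, List

Sentence = str                        # e.g. "1 + 1 = 2", possibly self-referential
BeliefState = Dict[Sentence, float]   # a credence in [0, 1] for each sentence

def is_valid_credence_assignment(beliefs: BeliefState) -> bool:
    """Minimal sanity check: every credence lies between 0 and 1."""
    return all(0.0 <= c <= 1.0 for c in beliefs.values())

def settles_down(history: List[BeliefState], sentence: Sentence,
                 tolerance: float = 1e-3) -> bool:
    """A crude stand-in for one common-sense property the logical induction
    criterion guarantees: credences in a fixed sentence converge over time
    (here, approximated by the last two states agreeing within a tolerance)."""
    if len(history) < 2:
        return False
    return abs(history[-1][sentence] - history[-2][sentence]) < tolerance

# Toy example: a sequence of belief states about a single sentence.
history = [{"P != NP": c} for c in (0.5, 0.7, 0.86, 0.8601)]
print(all(is_valid_credence_assignment(b) for b in history))  # True
print(settles_down(history, "P != NP"))                       # True
```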

The Nonlinear Library
EA - SFF Speculation Grants as an expedited funding source by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Dec 3, 2022 1:28


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SFF Speculation Grants as an expedited funding source, published by Andrew Critch on December 3, 2022 on The Effective Altruism Forum. Hi everyone, SFF has received numerous emails recently from organizations interested in expedited funding. I believe a number of people here already know about SFF Speculation Grants, but since we've never actually announced our existence on the EA Forum before: The Survival and Flourishing Fund has a means of expediting funding requests at any time of year, via applications to our Speculation Grants program: SFF Speculation Grants are expedited grants organized by SFF outside of our biannual grant-recommendation process (the S-process). “Speculation Grantors” are volunteers with budgets to make these grants. Each Speculation Grantor's budget grows or shrinks with the settlement of budget adjustments that we call “impact futures” (explained further below). Currently, we have a total of ~20 Speculation Grantors, with a combined budget of approximately $4MM. Our process and software infrastructure for funding these grants were co-designed by Andrew Critch and Oliver Habryka. For instructions on how to apply, please visit the link above. For general information about the Survival and Flourishing Fund, see: Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
AF - Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Nov 24, 2022 5:15


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide, published by Andrew Critch on November 24, 2022 on The AI Alignment Forum. Motivation Löb's theorem is pretty counterintuitive. Speaking informally about one or more agents engaging in logically valid reasoning, the theorem says something like this: "If it's believed that believing a particular prophecy would cause it to self-fulfill, then the prophecy will be believed." Formally, the theorem just says □(□C→C)→□C, where □X means "X is provable". This is most weird if C is a false statement, because it means you can't prove that you can't prove C (Gödel's theorem). (Also: the box symbol above is not an unrendered symbol; it's supposed to be a box.) Meta-motivation I'd like to make Löb's theorem more intuitive for humans, because it shows how agents can sidestep the need to mentally simulate each other in games, by instead using reasoning/arguments that happen to self-fulfill in a good and defensible way. Basically, Löbian reflection helps agents to avoid metacognitive stack overflows when thinking about themselves and each other, and I want more human beings to understand how that can work, and sometimes already does work, in real world agents. State of the art The best attempt I've seen to make Löb's theorem more intuitive is Eliezer Yudkowsky's Cartoon Guide to Löb's Theorem, which is still quite confusing. The confusingness comes from thinking about a self-referential statement (Ψ on Wikipedia; S on Arbital) that's used to carry out the proof. The statement basically says "If this statement is provable, then C." Dealing with that sentence is pretty cumbersome, and requires a lot of fiddling around with nested statements and implications. Doing better I think we can make a new proof of Löb's theorem that doesn't use that weird self-referential sentence, by instead making the proof of Löb's theorem itself self-referential. Page 15 of the following paper poses an open problem on how to do this, which I think is possible to resolve affirmatively: Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory (Critch, Dennis, Russell, 2022) Here's a screenshot from it: If we can make a proof like that work, we could then use the following much shorter and simpler cartoon guide to explain it: In other words, I want to write a proof of Löb's theorem that is structured much like a Quine. I'd like the details to (eventually) be really crisp and readable, so it can be peer reviewed for correctness, at a level of rigor comparable to this earlier paper on Löb's theorem for proof systems with bounded length: A parametric, resource-bounded generalization of Löb's theorem, and a robust cooperation criterion for open-source game theory. Journal of Symbolic Logic. (Critch, 2019) As a possible start to writing such a proof, I think some of the same machinery (e.g., the diagonal lemma) from Boolos's textbook "The Logic of Provability" can be used to cook up self-referential proofs fitting the cartoon template above... thereby making Löb's theorem less mysterious and more intuitive forever.
Further meta-motivation (added Nov 26) A key reason I'm interested in having a self-referential self-validating proof, rather than a normal-ish proof about a sentence that is self-referential (like Ψ on Wikipedia), is that human documents often refer to themselves, but human sentences rarely refer directly to themselves in isolation. This sentence is an exception, but such utterances are generally rare. So, making Löb more intuitive to humans either means making humans more accustomed to thinking about sentences that refer to themselves (so the traditional proof of Löb can be more intuitive), or finding a new proof that self-references in a way that's more like the way a human...
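Since the post wants a proof "structured much like a Quine", a classic Python quine may help make concrete the idea of a document that refers to itself as a whole, rather than containing one self-referential sentence. This is my own illustration, not code from the post:

```python
# A quine: a program whose output is exactly its own source code.
# The self-reference lives in the program as a whole, which is the
# structural move the post wants for a "Quinean" proof of Löb's theorem.
s = 's = %r\nprint(s %% s)'
print(s % s)
```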

The Nonlinear Library
AF - Don't design agents which exploit adversarial inputs by Alex Turner

The Nonlinear Library

Play Episode Listen Later Nov 18, 2022 23:12


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Don't design agents which exploit adversarial inputs, published by Alex Turner on November 18, 2022 on The AI Alignment Forum. Summary. Consider two common alignment design patterns: Optimizing for the output of a grader which evaluates plans, and Fixing a utility function and then argmaxing over all possible plans. These design patterns incentivize the agent to find adversarial inputs to the grader (e.g. "manipulate the simulated human grader into returning a high evaluation for this plan"). I'm pretty sure we won't find adversarially robust grading rules. Therefore, I think these alignment design patterns are doomed. In this first essay, I explore the adversarial robustness obstacle. In the next essay, I'll point out how this obstacle is an artifact of these design patterns, and not any intrinsic difficulty of alignment. Thanks to Erik Jenner, Johannes Treutlein, Quintin Pope, Charles Foster, Andrew Critch, randomwalks, and Ulisse Mini for feedback. 1: Optimizing for the output of a grader One motif in some AI alignment proposals is: An actor which proposes plans, and A grader which evaluates them. For simplicity, imagine we want the AI to find a plan where it makes an enormous number of diamonds. We train an actor to propose plans which the grading procedure predicts lead to lots of diamonds. In this setting, here's one way of slicing up the problem: Outer alignment: Find a sufficiently good grader. Inner alignment: Train the actor to propose plans which the grader rates as highly as possible (ideally argmaxing on grader output, but possibly just intent alignment with high grader output). This "grader optimization" paradigm ordains that the AI find plans which make the grader output good evaluations. An inner-aligned actor is single-mindedly motivated to find plans which are graded maximally well by the grader. Therefore, for any goal by which the grader may grade, an inner-aligned actor is positively searching for adversarial inputs which fool the grader into spitting out a high number! In the diamond case, if the actor is inner-aligned to the grading procedure, then the actor isn't actually aligned towards diamond-production. The actor is aligned towards diamond-production as quoted via the grader's evaluations. In the end, the actor is aligned to the evaluations. I think that there aren't clever ways around this issue. Under this motif, under this way of building an AI, you're not actually building an AI which cares about diamonds, and so you won't get a system which makes diamonds in the limit of its capability development. Three clarifying points: This motif concerns how the AI makes decisions—this isn't about training a network using a grading procedure, it's about the trained agent being motivated by a grading procedure. The grader doesn't have to actually exist in the world. This essay's critiques are not related to "reward tampering", where the actor messes with the grader's implementation in order to increase the grades received. The "grader" can be a mathematical expected utility function over all action-sequences which the agent could execute. For example, it might take the action sequence and the agent's current beliefs about the world, and e.g. predict the expected number of diamonds produced by the actions.
"The AI optimizes for what humanity would say about each universe-history" is an instance of grader-optimization, but "the AI has human values" is not an instance of grader-optimization. The parable of evaluation-child an AI should optimize for the real-world things I value, not just my estimates of those things. — The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables First, a mechanistically relevant analogy. Imagine a mother whose child has been goofing off at school and getting in trouble. The mom just wants her kid to t...

The Nonlinear Library
AF - SERI MATS Program - Winter 2022 Cohort by Ryan Kidd

The Nonlinear Library

Play Episode Listen Later Oct 8, 2022 7:39


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SERI MATS Program - Winter 2022 Cohort, published by Ryan Kidd on October 8, 2022 on The AI Alignment Forum. The Stanford Existential Risks Initiative (SERI) recently opened applications for the Winter 2022 Cohort of the ML Alignment Theory Scholars (MATS) Program, which aims to help aspiring alignment researchers enter the field by facilitating research seminars, workshops, an academic community, and an independent research project with an alignment research mentor. Applications close on Oct 24 and include a written response to (potentially hard) mentor-specific selection questions, viewable on our website. Our current mentors include Alex Turner, Andrew Critch, Beth Barnes, Dan Hendrycks, Evan Hubinger, Jesse Clifton, John Wentworth, Nate Soares, Neel Nanda, Owain Evans, Quintin Pope, Rebecca Gorman, Richard Ngo, Stuart Armstrong, Vanessa Kosoy, Victoria Krakovna, and Vivek Hebbar. Program details MATS is a scientific and educational seminar and independent research program, intended to serve as an introduction to the field of AI alignment and allow networking with alignment researchers and institutions. The MATS Program Winter 2022 Cohort consists of: A 6-week online training program (averaging 10-20 h/week from Nov 7 to Dec 14); A 2-month in-person educational seminar and independent research program in Berkeley, California for select scholars (40 h/week from Jan 3 to Feb 24); and Possible ongoing 2-month extensions for select scholars, potentially in Berkeley, California or London, UK. During the research phase of the program, mentors will meet with scholars for around 1-2 h/week to share their research agenda and supervise the scholars' research projects. Scholars' research directions will initially be chosen by the mentors, but by default, scholars are expected to develop their independent research direction as the program continues. Educational seminars and workshops will be held 2-3 times per week, similar to our Summer Seminar Program. The MATS program is a joint initiative by the Stanford Existential Risks Initiative and the Berkeley Existential Risk Initiative, with support from Lightcone Infrastructure and Conjecture. We receive financial support from the Long-Term Future Fund. Who is this program for? Our ideal applicant has: an understanding of the AI alignment research landscape equivalent to having completed the AGI Safety Fundamentals course; previous experience with technical research (e.g. ML, CS, maths, physics, neuroscience, etc.), ideally at a postgraduate level; strong motivation to pursue a career in AI alignment research, particularly on longtermist grounds. Even if you do not entirely meet these criteria, we encourage you to apply! Several past scholars applied without strong expectations and were accepted. How to apply The program will run several concurrent streams, each for a different alignment research agenda. Read through the descriptions of each stream below and the associated candidate selection questions. To apply for a stream, submit an application via this portal, including your resume and a response to the appropriate candidate selection questions detailed on our website. We will assess your application based on your response and prior research experience. Feel free to apply for multiple streams—we will assess you independently for each. 
Please note that the candidate selection questions can be quite hard, depending on the mentor! Allow yourself sufficient time to apply to your chosen stream/s. A strong application to one stream may be of higher value than moderate applications to several streams (though we will assess you independently). Applications for the Winter 2022 Cohort are due by Oct 24. Frequently asked questions What are the key dates for MATS? 9/24: Applications released 10/24: Applications close 11/02: Applicants ac...

The Nonlinear Library
LW - Clarifying Your Principles by Raemon

The Nonlinear Library

Play Episode Listen Later Oct 2, 2022 14:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Clarifying Your Principles, published by Raemon on October 1, 2022 on LessWrong. Following up on: Integrity and Accountability are Core Parts of Rationality Robust Agency for People and Organizations It's easy to end up in situations where people don't know which principles you stand for. (And, which ones you'll really stand for when it's inconvenient). It's easy to end up in situations where you don't know what principles you stand for, and which ones you'll actually stand up for when it's inconvenient. You can get away without knowing exactly what your principles are, for awhile. But if you want others to trust you in high stakes situations, it's helpful if you've proactively made your principles clear, and demonstrated that you can live by them. Most people don't really stand up for principles when it's inconvenient, so your prior should be that you probably won't, either. Being clear about your principles requires you to have them in the first place, and to practice the muscle of standing up for them, so that you know they are real and not just a vague applause light you are professing. This post is downstream of ideas I gained from Andrew Critch, Duncan Sabien, and Oliver Habryka, and benefitted from a lot of discussion with Elizabeth Van Nostrand (Though none of them necessarily endorse this essay). It is also somewhat inspired by Eliezer's Meta-Honesty: Firming Up Honesty Around Its Edge-Cases. Integrity Debt In software development, sometimes you need to write messy code that will cause problems for you later on down the line. Writing "good code" would take too long, and you need to ship your product. But, eventually, this messiness is going to make it harder for you to make progress on the codebase, and you'll want to spend some time simplifying your code to "pay down the debt." Similarly, if you're starting a new organization that depends on trust (either with the public, or particular stakeholders), there's a bunch of actions you might take to build that trust... which you may not have time to do when you're getting started. Integrity debt accumulates most acutely when you make controversial judgment calls that are hard to explain. Or, judgments that you reflectively wouldn't endorse. I think it also accumulates in small ways, when you do non-controversial but nonetheless confusing things. But perhaps more importantly (if subtly), integrity debt accumulates when you take on responsibilities that require you to have principles, without yet knowing what those principles are. This may work initially, but eventually will be like building a castle on a foundation of sand. Sooner or later you need to figure out the principles underlying you, your project, or your organization. If you've undertaken responsibilities without understanding the principles that will guide you in edge-cases, you may find that people have lost trust in you. Or, you may find you have lost trust in yourself. You may need to pay down your integrity debt. Or, alternately – declare bankruptcy, and transition into strategies that don't depend on trust. Which of these is the right strategy depends on your situation. Disclaimers: 1. There is a moral element to this, but a lot of my motivation here is figuring out "how do we improve coordination in a low-trust world?". 
People can disagree on what is morally commendable, but improve their ability to coordinate on net-improvements, even with people they disapprove of. 2. This article jumps back and forth between talking about Integrity, Trust, Accountability and Transparency. These are all different things, and I think you can have each one without the others. But I think they naturally fit together in particular ways that form some obvious strategies. I'm using 'integrity' and 'accountability' in ways similar to habryka. A quick recap: When I say...

The Nonlinear Library
EA - Announcing Encultured AI: Building a Video Game by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Aug 18, 2022 5:52


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Encultured AI: Building a Video Game, published by Andrew Critch on August 18, 2022 on The Effective Altruism Forum. Also available on LessWrong. Preceded by: Encultured AI Pre-planning, Part 2: Providing a Service If you've read to the end of our last post, you may have guessed: we're building a video game! This is gonna be fun :) Our homepage:/ Will Encultured save the world? Is this business plan too good to be true? Can you actually save the world by making a video game? Well, no. Encultured on its own will not be enough to make the whole world safe and happy forever, and we'd prefer not to be judged by that criterion. The amount of control over the world that's needed to fully pivot humanity from an unsafe path onto a safe one is, simply put, more control than we're aiming to have. And, that's pretty core to our culture. From our homepage: Still, we don't believe our company or products alone will make the difference between a positive future for humanity versus a negative one, and we're not aiming to have that kind of power over the world. Rather, we're aiming to take part in a global ecosystem of companies using AI to benefit humanity, by making our products, services, and scientific platform available to other institutions and researchers. Our goal is to play a part in what will be or could be a prosperous civilization. And for us, that means building a successful video game that we can use in valuable ways to help the world in the future! Fun is a pretty good target for us to optimize You might ask: how are we going to optimize for making a fun game and helping the world at the same time? The short answer is that creating a game world in which lots of people are having fun in diverse and interesting ways in fact creates an amazing sandbox for play-testing AI alignment & cooperation. If an experimental new AI enters the game and ruins the fun for everyone — either by overtly wrecking in-game assets, subtly affecting the game culture in ways people don't like, or both — then we're in a good position to say that it probably shouldn't be deployed autonomously in the real world, either. In the long run, if we're as successful as we hope as a game company, we can start posing safety challenges to top AI labs of the form "Tell your AI to play this game in a way that humans end up endorsing." Thus, we think the market incentive to grow our user base in ways they find fun is going to be highly aligned with our long-term goals. Along the way, we want our platform to enable humanity to learn as many valuable lessons as possible about human↔AI interaction, in a low-stakes game environment before having to learn those lessons the hard way in the real world. Principles to exemplify In preparation for growing as a game company, we've put a lot of thought into how to ensure our game has a positive rather than negative impact on the world, accounting for its scientific impact, its memetic impact, as well as the intrinsic moral value of the game as a positive experience for people. Below are some guiding principles we're planning to follow, not just for ourselves, but also to set an example for other game companies: Pursue: Fun! We're putting a lot of thought into not only how our game can be fun, but also ensuring that the process of working at Encultured and building the game is itself fun and enjoyable.
We think fun and playfulness are key for generating outcomes we want, including low-stakes high-information settings for interacting with AI systems. Maintain: opportunities to experiment. No matter how our product develops, we're committed to maintaining its value as a platform for experiments, especially experiments that help humanity navigate the present and future development of AI technology. Avoid: teaching bad lessons. On the margin, we expect our game to in...

The Nonlinear Library
LW - Announcing Encultured AI: Building a Video Game by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Aug 18, 2022 5:51


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Encultured AI: Building a Video Game, published by Andrew Critch on August 18, 2022 on LessWrong. Also available on the EA Forum. Preceded by: Encultured AI Pre-planning, Part 2: Providing a Service If you've read to the end of our last post, you may have guessed: we're building a video game! This is gonna be fun :) Our homepage:/ Will Encultured save the world? Is this business plan too good to be true? Can you actually save the world by making a video game? Well, no. Encultured on its own will not be enough to make the whole world safe and happy forever, and we'd prefer not to be judged by that criterion. The amount of control over the world that's needed to fully pivot humanity from an unsafe path onto a safe one is, simply put, more control than we're aiming to have. And, that's pretty core to our culture. From our homepage: Still, we don't believe our company or products alone will make the difference between a positive future for humanity versus a negative one, and we're not aiming to have that kind of power over the world. Rather, we're aiming to take part in a global ecosystem of companies using AI to benefit humanity, by making our products, services, and scientific platform available to other institutions and researchers. Our goal is to play a part in what will be or could be a prosperous civilization. And for us, that means building a successful video game that we can use in valuable ways to help the world in the future! Fun is a pretty good target for us to optimize You might ask: how are we going to optimize for making a fun game and helping the world at the same time? The short answer is that creating a game world in which lots of people are having fun in diverse and interesting ways in fact creates an amazing sandbox for play-testing AI alignment & cooperation. If an experimental new AI enters the game and ruins the fun for everyone — either by overtly wrecking in-game assets, subtly affecting the game culture in ways people don't like, or both — then we're in a good position to say that it probably shouldn't be deployed autonomously in the real world, either. In the long run, if we're as successful as we hope as a game company, we can start posing safety challenges to top AI labs of the form "Tell your AI to play this game in a way that humans end up endorsing." Thus, we think the market incentive to grow our user base in ways they find fun is going to be highly aligned with our long-term goals. Along the way, we want our platform to enable humanity to learn as many valuable lessons as possible about human↔AI interaction, in a low-stakes game environment before having to learn those lessons the hard way in the real world. Principles to exemplify In preparation for growing as a game company, we've put a lot of thought into how to ensure our game has a positive rather than negative impact on the world, accounting for its scientific impact, its memetic impact, as well as the intrinsic moral value of the game as a positive experience for people. Below are some guiding principles we're planning to follow, not just for ourselves, but also to set an example for other game companies: Pursue: Fun! We're putting a lot of thought into not only how our game can be fun, but also ensuring that the process of working at Encultured and building the game is itself fun and enjoyable.
We think fun and playfulness are key for generating outcomes we want, including low-stakes high-information settings for interacting with AI systems. Maintain: opportunities to experiment. No matter how our product develops, we're committed to maintaining its value as a platform for experiments, especially experiments that help humanity navigate the present and future development of AI technology. Avoid: teaching bad lessons. On the margin, we expect our game to incentivize coo...

The Nonlinear Library
AF - I missed the crux of the alignment problem the whole time by zeshen

The Nonlinear Library

Play Episode Listen Later Aug 13, 2022 5:18


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I missed the crux of the alignment problem the whole time, published by zeshen on August 13, 2022 on The AI Alignment Forum. This post has been written for the first Refine blog post day, at the end of the week of readings, discussions, and exercises about epistemology for doing good conceptual research. Thanks to Adam Shimi for helpful discussion and comments. I first got properly exposed to AI alignment ~1-2 years ago. I read the usual stuff like Superintelligence, The Alignment Problem, Human Compatible, a bunch of posts on LessWrong and Alignment Forum, watched all of Rob Miles' videos, and participated in the AGI Safety Fundamentals program. I recently joined Refine and had more conversations with people, and realized I didn't really get the crux of the problem all this while. I knew that superintelligent AI would be very powerful and would Goodhart whatever goals we give it, but I never really got how this relates to basically ‘killing us all'. It feels basically right that AIs will be misaligned by default and will do stuff that is not what we want it to do while pursuing instrumentally convergent goals all along. But the possible actions that such an AI could take seemed so numerous that ‘killing all of humanity' seemed like such a small point in the whole actionspace of the AI, that it would require extreme bad luck for us to be in that situation. First, this seems partially due to my background as a non-software engineer in oil and gas, an industry that takes safety very very seriously. In making a process safe, we quantify the risks of an activity, understand the bounds of the potential failure modes, and then take actions to mitigate against those risks and also implement steps to minimize damage should a failure mode be realized. How I think about safety is from the perspective of specific risk events and the associated probabilities, coupled with the exact failure modes of those risks. This thinking may have hindered my ability to think of the alignment problem in abstract terms, because I focused on looking for specific failure modes that I could picture in my head. Second, there are a few failure modes that seem more popular in the introductory reading materials that I was exposed to. None of them helped me internalize the crux of the problem. The first was the typical paperclip maximizer or ‘superintelligent AI will kill all of us' scenario. It feels like sci-fi that is not grounded in reality, leading to me failing to internalize the point about unboundedness. I do not dispute that a superintelligent AI will have the capabilities to destroy all of humanity, but it doesn't feel like it would actually do so. The other failure modes were from Paul Christiano's post which in my first reading boiled down to ‘powerful AIs will accelerate present-day societal failures but not pose any additional danger', as well as Andrew Critch's post which felt to me like ‘institutions have structurally perverse incentives that lead to the tragedy of the commons'. In my shallow understanding of both of these posts, current human societies have failure modes that will be accelerated by AIs because AIs basically speed things up, whether they are good or bad. So these scenarios were too close to normal scenarios to let me internalize the crux about unboundedness. My internal model of a superintelligent AI was a very powerful tool AI. 
I didn't really get why we are trying to ‘align it to human values' because I didn't really see human values as the crux of the problem, nor did I think having a superintelligent AI being fully aligned to a human's value would be particularly useful. Which human's values are we talking about anyway? Would it be any good for an AI to fully adopt human values only to end up like Hitler, who is no less a human than any of us are? The phrase ‘power...

The Nonlinear Library
AF - Encultured AI, Part 2: Providing a Service by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Aug 11, 2022 5:16


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Encultured AI, Part 2: Providing a Service, published by Andrew Critch on August 11, 2022 on The AI Alignment Forum. Also available on the EA Forum. Preceded by: Encultured AI, Part 1: Enabling New Benchmarks. Followed by: Encultured AI, Part 3. In the preceding post, we talked about how our plan with Encultured is to enable new existential safety benchmarks for AI. In this post, we'll talk about involving humans and human data in those benchmarks. Many of the types of benchmarks we want to enable are made more useful if we can involve humans in them. For example, testing whether an AI system can align its values with another agent is especially interesting if that other agent is a human being. So, we want a way to get lots of humans engaging with our platform. At first, we thought we'd pay humans to engage with the platform and generate data. In considering this, we wanted to make the process of engagement not-too-annoying for people, both so that it wouldn't make their lives worse, and so that we wouldn't have to pay them too much to engage. But then we thought: why not go a bit further, and provide something people intrinsically value? I.e., why not provide a service? Out of the gate, we thought: what's a service where people might not mind lots of experiments happening? A few possibilities come to mind for what we could build: Agriculture solutions. Agriculture is relatively geopolitically stabilizing (or non-destabilizing) as an AI application area, because powerful nations don't get especially nervous when they find out another superpower is getting better at agriculture (as opposed to, say, aerospace and defense). So, this seems like an area where we'd like to enable progress, including safety testing. However, we didn't see great ways of engaging lots of human users in a tool for agriculture, so this area didn't seem like a great source of data about human values, and we decided not to focus on it. A social media tool. Social media is ripe for lots of experiments with language models, which are exploding in popularity right now. However, this area wasn't a good fit for us, mainly because the concepts we want our benchmarks to explore, such as soft embodiment, are not easily represented on social media today. The ‘metaverse' will probably evolve to make this easier as time goes on, but we don't want to wait for that. A therapy or coaching tool. This would involve a lot of sensitive data-handling, which might be worth the effort to manage, except that — like with social media — this area wouldn't allow us to engage with safety testing for assisting physical entities (people!) in a physically embodied context. Education or tutoring software. This is an area where it feels hard to grow our user base in an “aligned” way; the people who pay for education (parents, states) are not the people who use it most (people aged 5 - 25). Also, progress in AI-based education is not obviously geopolitically stabilizing, because State A could view State B's progress in it as an enabling mechanism for centralized propaganda. Lastly, education tools aren't easily amenable (at the back-end) to enable our benchmark ideas for assisting physically embodied agents. A healthcare solution. Healthcare is an area we care deeply about.
And, if we made products like prosthetics or other wearables, we'd be dealing directly with the wellbeing of real-world people, and grappling with many of the concepts we think are most important for AI safety/alignment benchmarking, including the assistance of physically embodied persons. Moreover, progress in AI solutions for healthcare is probably relatively geopolitically stabilizing, i.e., powerful countries aren't particularly scared of each other getting better at healthcare. So, this area came close to being our top choice, except for the fact that burea...

The Nonlinear Library
AF - Encultured AI, Part 1: Enabling New Benchmarks by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Aug 8, 2022 9:49


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Encultured AI, Part 1: Enabling New Benchmarks, published by Andrew Critch on August 8, 2022 on The AI Alignment Forum. Also available on the EA Forum. Followed by: Encultured AI, Part 2 (forthcoming). Hi! In case you're new to Encultured AI, we're a for-profit start-up with a public benefit mission: developing technologies promoting the long-term survival and flourishing of humanity and other sentient life. However, we also realize that AI poses an existential risk to humanity if not developed with adequate safety precautions. Given this, our goal is to develop products and services that help humanity steer toward the benefits and away from the risks of advanced AI systems. Per the “Principles” section of our homepage: Our current main strategy involves building a platform usable for AI safety and alignment experiments, comprising a suite of environments, tasks, and tools for building more environments and tasks. The platform itself will be an interface to a number of consumer-facing products, so our researchers and collaborators will have back-end access to services with real-world users. Over the next decade or so, we expect an increasing number of researchers — both inside and outside our company — will transition to developing safety and alignment solutions for AI technology, and through our platform and products, we're aiming to provide them with a rich and interesting testbed for increasingly challenging experiments and benchmarks. In the following, we'll describe the AI existential safety context that motivated us to found Encultured, and go into more detail about what we're planning to do. What's trending in AI x-safety? The technical areas below have begun to receive what we call “existential attention” from AI researchers, i.e., attention from professional AI researchers thinking explicitly about the impact of their work on existential safety: Trustworthiness & truthfulness — ensuring AI systems are telling us the truth and doing the things they and their creators say they're going to do. Preference learning — enabling AI systems to learn what humans want. Interpretability — enabling humans to understand what AI systems are thinking and doing. Robustness & risk management — ensuring AI systems continue functioning well in novel situations, and quantifying the risk that they won't. In other words, the topics above lie in the intersection of the following Venn diagram: See Appendix 1 for examples of research in these areas. More research in these areas is definitely warranted. A world where 20%+ of AI and ML researchers worldwide pivoted to focusing on the topics above would be a better world, in our opinion. If our product is successful, we plan to grant access to researchers inside and outside our company for performing experiments in the areas above, interacting directly with users on our platform. And, our users will be aware of this ;) We're planning on this not only because it will benefit the world, but because it will benefit our products directly: the most valuable tools and services are trustworthy, truthful, preference-sensitive, interpretable, and robust. What's emerging in AI x-safety?
The following topics have received research attention from some researchers focused on existential safety, and AI research attention from other researchers, but to us the two groups don't (yet) seem to overlap as much as for the ‘trending' topics above. Cooperative AI — designing AI technologies in ways that enable improved cooperation between humans and AI systems, while preventing collusion between AI systems, i.e., cooperation between AI systems that would be harmful or deceptive to humanity. (see Appendix 2 for related research.) Multi-stakeholder control of AI systems — allowing people with diverse values, such as from competing geopolitical factions, to share ...

The Nonlinear Library
AF - Encultured AI, Part 1 Appendix: Relevant Research Examples by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Aug 8, 2022 13:42


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Encultured AI, Part 1 Appendix: Relevant Research Examples, published by Andrew Critch on August 8, 2022 on The AI Alignment Forum. Also available on the EA Forum. Appendix to: Encultured AI, Part 1: Enabling New Benchmarks. Followed by: Encultured AI, Part 2: Providing a Service. Appendix 1: “Trending” AI x-safety research areas We mentioned a few areas of “trending” AI x-safety research above; below are some more concrete examples of what we mean: Trustworthiness & truthfulness: Owain Evans, Owen Cotton-Barratt and others have authored “Truthful AI: Developing and governing AI that does not lie” (arxiv, 2021; twitter thread). Andreas Stuhlmüller, Jungwon Byun and others at Ought.org are building an AI-powered research assistant called Elicit (website); here is the product. Task-specific (narrow) preference learning: Paul Christiano et al (arxiv, 2017) developed a data-efficient preference-learning technique for training RL-based systems, which is now very widely cited (scholar). Jan Leike, now at OpenAI, leads a team working on ‘scalable alignment' using preference-learning techniques (arxiv, 2018) (blog). Interpretability: Chris Olah (scholar) leads an interpretability research group at Anthropic. Anthropic (website) is culturally very attuned to large-scale risks from AI, including existential risks. Buck Shlegeris and others at Redwood Research (website) have built an interpretability tool for analyzing transformer networks trained on natural language (demo). Prof. Cynthia Rudin at Duke (homepage) approaches interpretability by trying to replace black-box models with more interpretable ones (arxiv, 2018), and we know from conversations with her that she is open to applications of her work to existential safety. Robustness & risk management: Prof. Jaime Fisac at Princeton (homepage) researches AI safety for robotics, high-dimensional control systems and multi-agent systems (scholar), including provable robustness guarantees. He was previously a PhD student at the UC Berkeley Center for Human-Compatible AI (CHAI), provided extensive feedback on AI Research Considerations for Human Existential Safety (ARCHES) (arxiv, 2020), and is very attuned to existential safety as a cause area. Prof. David Krueger at Cambridge (scholar) studies out-of-distribution generalization (pdf, 2021), and is currently taking on students. Adam Gleave (homepage) is a final-year PhD student at CHAI / UC Berkeley, and studies out-of-distribution robustness for deep RL. Sam Toyer (scholar), also a PhD student at CHAI, has developed a benchmark for robust imitation learning (pdf, 2020). Appendix 2: “Emerging” AI x-safety research areas In this post, we classified cooperative AI and multi-stakeholder control of AI systems as “emerging” topics in AI x-safety. Here's more about what we mean, and why: Cooperative AI This area is “emerging” in x-safety because there's plenty of attention to the issue of cooperation from both policy-makers and AI researchers, but not yet much among folks focused on x-risk. Existential safety attention on cooperative AI: Many authors — too many to name! — have remarked on the importance of international coordination on AI safety efforts, including existential safety. For instance, there is a Wikipedia article on AI arms races (wikipedia). This covers the human–human side of the cooperative AI problem.
AI research on cooperative AI: Multi-agent systems research has a long history in AI (scholar search), as does multi-agent reinforcement learning (scholar search). DeepMind's Multi-agent Learning team has recently written a number of papers examining competition and cooperation between artificial agents (website). OpenAI has done some work on multi-agent interaction, e.g. emergent tool use in multi-agent interaction (arxiv). Prof. Jakob Foerster at Oxford (scholar search), and ...

The Nonlinear Library
LW - «Boundaries», Part 2: trends in EA's handling of boundaries by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Aug 6, 2022 11:48


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: «Boundaries», Part 2: trends in EA's handling of boundaries, published by Andrew Critch on August 6, 2022 on LessWrong. This is Part 2 of my «Boundaries» Sequence on LessWrong, and is also available on the EA Forum. Summary: Here I attempt to constructively outline various helpful and harmful trends I see in the EA and rationality communities, which I think arise from a tendency to ignore the boundaries of social systems (rather than a generic tendency to violate norms). Taken too far, this tendency can manifest as a 'lack of respect' for boundaries, by which I mean a mixture of not establishing boundaries where they'd be warranted, crossing boundaries in clumsy/harmful ways, and not following good procedures for deciding whether to cross a boundary. I propose that if some version of «respecting boundaries» were a more basic tenet of EA — alongside or within the principles of neglect, importance, and tractability — then EA would have fewer problems and do more good for the world. The trends Below are the (good and bad) trends I'd like to analyze from the perspective of respecting boundaries. Each of these trends has also been remarked on by at least a dozen self-identified EAs I know personally: expansive thinking: lots of EAs are good at 'thinking outside the box' about how to do good for the world, e.g., expanding one's moral circle (example: Singer, 2011). niche-finding: EA helps people find impactful careers that are a good fit for their own strengths and weaknesses (example: 80k, 2014). work/life balance: EA has sometimes struggled with people working too hard for the cause areas, in ways that harm them as individuals in an unsustainable way. romances at work: people dating their co-workers are an example of a professional boundary being crossed by personal affairs. social abrasiveness: EA culture, and perhaps more so rationalist culture, is often experienced by newcomers or outsiders as abrasive or harsh. (Hypocrisy flag: I think I've been guilty of this, though hopefully less over the past few years as I've gotten older and reflected on these topics.) pivotal acts: numerous EAs seriously consider pivotal acts — i.e., potential unilateral acts by a powerful, usually AI-enabled actor, to make the world permanently safer — as the best way to do good for humanity (as opposed to pivotal processes carried out multilaterally). resistance from AI labs: well-established AI labs are resistant to adopting EA culture as a primary guiding influence. I'm going to analyze each of the above trends in terms of boundaries, partly to illustrate the importance of the boundary concept, partly to highlight some cool things the EA movement seems to have done with boundaries, and partly to try helping somewhat with the more problematic trends above. 1. Expansive thinking Consider a fictional person named Alex. A "job" for Alex is a scope of affairs (features of the world) that Alex is considered responsible for observing and handling. Alex might have multiple roles that we can think of as jobs, e.g. "office manager", "husband", "neighbor". Alex probably thinks about the world beyond the scope of his job(s).
But usually, Alex doesn't take actions outside the scope of his job(s): The Effective Altruism movement has provided a lot of discourse and social context that helps people extend their sense of "job" to include important and neglected problems in the world that might be tractable to them-personally, e.g., global poverty (tractable via donations to Give Directly). In other words, EA has helped people to expand both their circle of compassion and their scope of responsibility to act. See also "The Self-Expansion Model of Motivation in Close Relationships" (Aron, 2023). 2. Niche-finding Identifying areas of competence and vulnerability are important for scoping out Alex's job(s). Thi...

COMPLEXITY
C. Brandon Ogbunu on Epistasis & The Primacy of Context in Complex Systems

Play Episode Listen Later Apr 8, 2022 74:17


Context is king: whether in language, ecology, culture, history, economics, or chemistry. One of the core teachings of complexity science is that nothing exists in isolation — especially when it comes to systems in which learning, memory, or emergent behaviors play a part. Even though this (paradoxically) limits the universality of scientific claims, it also lets us draw analogies between the context-dependency of one phenomenon and others: how protein folding shapes HIV evolution is meaningfully like the way that growing up in a specific neighborhood shapes educational and economic opportunity; the paths through a space of all possible four-letter words are constrained in ways very similar to how interactions between microbes impact gut health; how we make sense both depends on how we've learned and places bounds on what we're capable of seeing.

Welcome to COMPLEXITY, the official podcast of the Santa Fe Institute. I'm your host, Michael Garfield, and every other week we'll bring you with us for far-ranging conversations with our worldwide network of rigorous researchers developing new frameworks to explain the deepest mysteries of the universe.

This week on Complexity, we talk to Yale evolutionary biologist C. Brandon Ogbunu (Twitter, Google Scholar, GitHub) about the importance of environment to the activity and outcomes of complex systems — the value of surprise, the constraints of history, the virtue and challenge of great communication, and much more. Our conversation touches on everything from using word games to teach core concepts in evolutionary theory, to the ways that protein quality control co-determines the ability of pathogens to evade eradication, to the relationship between human artists, algorithms, and regulation in the 21st Century. Brandon works not just in multiple scientific domains but as the author of a number of high-profile blogs exploring the intersection of science and culture — and his boundaryless fluency shines through in a discussion that will not be contained, about some of the biggest questions and discoveries of our time.

If you value our research and communication efforts, please subscribe to Complexity Podcast wherever you prefer to listen, rate and review us at Apple Podcasts, and/or consider making a donation at santafe.edu/give. You'll find plenty of other ways to engage with us at santafe.edu/engage. Thank you for listening!

Join our Facebook discussion group to meet like minds and talk about each episode.

Podcast theme music by Mitch Mignano.

Follow us on social media: Twitter • YouTube • Facebook • Instagram • LinkedIn

Discussed in this episode:

"I do my science biographically…I find a personal connection to the essence of the question."
– C. Brandon Ogbunugafor on RadioLab

"Environment x everything interactions: From evolution to epidemics and beyond"
Brandon's February 2022 SFI Seminar (YouTube Video + Live Twitter Coverage)

"A Reflection on 50 Years of John Maynard Smith's 'Protein Space'"
C. Brandon Ogbunugafor in GENETICS

"Collective Computing: Learning from Nature"
David Krakauer presenting at the Foresight Institute in 2021 (with reference to Rubik's Cube research)

"Optimal Policies Tend to Seek Power"
Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli in arXiv

"A New Take on John Maynard Smith's Concept of Protein Space for Understanding Molecular Evolution"
C. Brandon Ogbunugafor, Daniel Hartl in PLOS Computational Biology

"The 300 Most Common Words"
by Bruce Sterling

"The Host Cell's Endoplasmic Reticulum Proteostasis Network Profoundly Shapes the Protein Sequence Space Accessible to HIV Envelope"
Jimin Yoon, Emmanuel E. Nekongo, Jessica E. Patrick, Angela M. Phillips, Anna I. Ponomarenko, Samuel J. Hendel, Vincent L. Butty, C. Brandon Ogbunugafor, Yu-Shan Lin, Matthew D. Shoulders in bioRxiv

"Competition along trajectories governs adaptation rates towards antimicrobial resistance"
C. Brandon Ogbunugafor, Margaret J. Eppstein in Nature Ecology & Evolution

"Scientists Need to Admit What They Got Wrong About COVID"
C. Brandon Ogbunugafor in WIRED

"Deconstructing higher-order interactions in the microbiota: A theoretical examination"
Yitbarek Senay, Guittar John, Sarah A. Knutie, C. Brandon Ogbunugafor in bioRxiv

"What Makes an Artist in the Age of Algorithms?"
C. Brandon Ogbunugafor in WIRED

Not mentioned in this episode but still worth exploring:

"Part of what I was getting after with Blackness had to do with authoring ideas that are edgy or potentially threatening. That as a scientist, you can generate ideas in the name of research, in the name of breaking new ground, that may stigmatize you. That may kick you out of the club, so to speak, because you're not necessarily following the herd."
– Physicist Stephon Alexander in an interview with Brandon at Andscape

"How Afrofuturism Can Help The World Mend"
C. Brandon Ogbunugafor in WIRED

"The COVID-19 pandemic amplified long-standing racial disparities in the United States criminal justice system"
Brennan Klein, C. Brandon Ogbunugafor, Benjamin J. Schafer, Zarana Bhadricha, Preeti Kori, Jim Sheldon, Nitish Kaza, Emily A. Wang, Tina Eliassi-Rad, Samuel V. Scarpino, Elizabeth Hinton in medRxiv

Also mentioned: Simon Conway Morris, Geoffrey West, Samuel Scarpino, Rick & Morty, Stuart Kauffman, Frank Salisbury, Stephen Jay Gould, Frances Arnold, John Vervaeke, Andreas Wagner, Jennifer Dunne, James Evans, Carl Bergstrom, Jevin West, Henry Gee, Eugene Shakhnovich, Rafael Guerrero, Gregory Bateson, Simon DeDeo, James Clerk Maxwell, Melanie Moses, Kathy Powers, Sara Walker, Michael Lachmann, and many others...