Podcast appearances and mentions of david krueger

  • 29PODCASTS
  • 129EPISODES
  • 33mAVG DURATION
  • ?INFREQUENT EPISODES
  • Oct 21, 2024LATEST
david krueger

POPULARITY

20172018201920202021202220232024


Best podcasts about david krueger

Latest podcast episodes about david krueger

Enduring Churches Podcast with Alan and Trent
Episode 296: Our Interview With David Krueger

Enduring Churches Podcast with Alan and Trent

Play Episode Listen Later Oct 21, 2024 38:37


The Nonlinear Library
AF - [Paper] Stress-testing capability elicitation by training password-locked models by Fabien Roger

The Nonlinear Library

Play Episode Listen Later Jun 4, 2024 22:23


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Paper] Stress-testing capability elicitation by training password-locked models, published by Fabien Roger on June 4, 2024 on The AI Alignment Forum. The paper is by Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov and David Krueger. This post was written by Fabien and Ryan, and may not reflect the views of Dmitrii and David. Scheming models might try to perform less capably than they are able to ( sandbag). They might do this to make evaluations underestimate danger, slow down promising safety work, or cause issues with debate or recursive oversight more generally. You might hope that you can avoid this issue by eliciting capabilities, for example by fine-tuning on demonstrations and then using RL. In principle, this elicitation might fail. It could fail due to having too few (or no) high-quality demonstrations (i.e. sample efficiency is too low) or because RL fails to explore good actions. We're uncertain how this goes in practice. It could be the case that simple countermeasures prevent sandbagging in most of the important cases such that it isn't a big problem for safety. But it could also turn out that even high-effort approaches can't rule out these issues. We released a paper studying this by examining how well supervised fine-tuning and RL can elicit capabilities from password-locked models (LLMs trained to produce low-quality outputs except when a particular password is provided). We find that: If you have some high-quality demonstrations, supervised fine-tuning usually recovers the hidden capabilities with very few samples. Elicitation on password-locked models generalizes well across domains (even when we made password-locked models using techniques that we thought might prevent generalization). Using RL on password-locked models recovers hidden capabilities, except when the weak model RL starts from is extremely weak. While we target cases where AIs intentionally produce low-quality outputs and they could easily produce higher quality outputs, we also think our results have some relevance to other cases where AIs produce low-quality outputs despite being able to produce better ones. Here are the main takeaways: We think our results are a moderate update against scheming models being able to sandbag effectively (given countermeasures) for the kind of tasks we studied here, though there are differences between password-locked models and actual scheming models. We think this should be a bigger update for people who are relatively more pessimistic than us about sandbagging (at least insofar as their views make pessimistic predictions about current architectures). (We think this is also a small update against scheming, since scheming models might still try to sandbag, and if elicitation fine-tunes sandbagging out, it may partially erode scheming.) People interested in studying sandbagging or similar elicitation failures should consider using a methodology similar to the password-locked models methodology we use here. More generally, they should consider using some sort of adversarial methodology in which a red team constructs a model organism. In this post: We explain why good capability elicitation matters for AI safety We describe the sandbagging concern we are trying to assess; We summarize the results of our paper, and discuss their implications; We discuss how our results relate to relevant AGI safety work; We describe future work we are excited by; We discuss concerns other than intentional sandbagging for which our results might be relevant. The big picture: why good capability elicitation matters for AI safety Currently, AI labs try reasonably hard to get as much performance as possible out of their expensive pretrained model. The process of making an AI perform a task competently (without a huge amount of additional training) is often called eli...

The Nonlinear Library
AF - Paper in Science: Managing extreme AI risks amid rapid progress by JanB

The Nonlinear Library

Play Episode Listen Later May 23, 2024 1:58


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paper in Science: Managing extreme AI risks amid rapid progress, published by JanB on May 23, 2024 on The AI Alignment Forum. https://www.science.org/doi/10.1126/science.adn0117 Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner*, Sören Mindermann* Abstract: Artificial intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness and barely address autonomous systems. Drawing on lessons learned from other safety-critical technologies, we outline a comprehensive plan that combines technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
AF - New report: Safety Cases for AI by Josh Clymer

The Nonlinear Library

Play Episode Listen Later Mar 20, 2024 1:33


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New report: Safety Cases for AI, published by Josh Clymer on March 20, 2024 on The AI Alignment Forum. ArXiv paper: https://arxiv.org/abs/2403.10462 The idea for this paper occurred to me when I saw Buck Shlegeris' MATS stream on "Safety Cases for AI." How would one justify the safety of advanced AI systems? This question is fundamental. It informs how RSPs should be designed and what technical research is useful to pursue. For a long time, researchers have (implicitly or explicitly) discussed ways to justify that AI systems are safe, but much of this content is scattered across different posts and papers, is not as concrete as I'd like, or does not clearly state their assumptions. I hope this report provides a helpful birds-eye view of safety arguments and moves the AI safety conversation forward by helping to identify assumptions they rest on (though there's much more work to do to clarify these arguments). Thanks to my coauthors: Nick Gabrieli, David Krueger, and Thomas Larsen -- and to everyone who gave feedback: Henry Sleight, Ashwin Acharya, Ryan Greenblatt, Stephen Casper, David Duvenaud, Rudolf Laine, Roger Grosse, Hjalmar Wijk, Eli Lifland, Oliver Habryka, Sim eon Campos, Aaron Scher, Lukas Berglund, and Nate Thomas. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - New report: Safety Cases for AI by joshc

The Nonlinear Library

Play Episode Listen Later Mar 20, 2024 1:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New report: Safety Cases for AI, published by joshc on March 20, 2024 on LessWrong. ArXiv paper: https://arxiv.org/abs/2403.10462 The idea for this paper occurred to me when I saw Buck Shlegeris' MATS stream on "Safety Cases for AI." How would one justify the safety of advanced AI systems? This question is fundamental. It informs how RSPs should be designed and what technical research is useful to pursue. For a long time, researchers have (implicitly or explicitly) discussed ways to justify that AI systems are safe, but much of this content is scattered across different posts and papers, is not as concrete as I'd like, or does not clearly state their assumptions. I hope this report provides a helpful birds-eye view of safety arguments and moves the AI safety conversation forward by helping to identify assumptions they rest on (though there's much more work to do to clarify these arguments). Thanks to my coauthors: Nick Gabrieli, David Krueger, and Thomas Larsen -- and to everyone who gave feedback: Henry Sleight, Ashwin Acharya, Ryan Greenblatt, Stephen Casper, David Duvenaud, Rudolf Laine, Roger Grosse, Hjalmar Wijk, Eli Lifland, Oliver Habryka, Sim eon Campos, Aaron Scher, Lukas Berglund, and Nate Thomas. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

The Nonlinear Library: LessWrong
LW - New report: Safety Cases for AI by joshc

The Nonlinear Library: LessWrong

Play Episode Listen Later Mar 20, 2024 1:32


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New report: Safety Cases for AI, published by joshc on March 20, 2024 on LessWrong. ArXiv paper: https://arxiv.org/abs/2403.10462 The idea for this paper occurred to me when I saw Buck Shlegeris' MATS stream on "Safety Cases for AI." How would one justify the safety of advanced AI systems? This question is fundamental. It informs how RSPs should be designed and what technical research is useful to pursue. For a long time, researchers have (implicitly or explicitly) discussed ways to justify that AI systems are safe, but much of this content is scattered across different posts and papers, is not as concrete as I'd like, or does not clearly state their assumptions. I hope this report provides a helpful birds-eye view of safety arguments and moves the AI safety conversation forward by helping to identify assumptions they rest on (though there's much more work to do to clarify these arguments). Thanks to my coauthors: Nick Gabrieli, David Krueger, and Thomas Larsen -- and to everyone who gave feedback: Henry Sleight, Ashwin Acharya, Ryan Greenblatt, Stephen Casper, David Duvenaud, Rudolf Laine, Roger Grosse, Hjalmar Wijk, Eli Lifland, Oliver Habryka, Sim eon Campos, Aaron Scher, Lukas Berglund, and Nate Thomas. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Mystery & Comedy Old Time Radio Podcast
Giving November part 1

Mystery & Comedy Old Time Radio Podcast

Play Episode Listen Later Nov 16, 2023 35:48


Hi guys and welcome back to the mystery and comedy all time radio podcast. I hope you guys are having a wonderful night as your shopping for the turkey getting ready for the big turkey day. Or just relaxing and thanking God for what he has truly blessed you with this Thanksgiving season. Whatever you are doing I hope you guys enjoy my presentation as I bring you part one of giving November. As we celebrate giving November we remember to be thankful to God everyday not just once or twice a year but every day is a blessing from him. Please join me as I bring to the show Mr David Krueger in the Hallmark playhouse presentation of Pete Thanksgiving. In this we find a man and a woman have came from their native land of Europe to come to America to fulfill their dream of owning a restaurant. As preparations are going for the restaurant Pete's wife gives him a letter from the Red Cross and Pete thinks it's some exciting news about how well his grandmother is doing. But Pete ended up ripping up the letter and not telling his wife what it's about. Later on in the episode we find Pete and his wife getting ready for there Thanksgiving grand opening and Pete goes to local businessmen who have a club and tells them the wonderful news that he's wanting to do for them filling up their stomachs for nothing on Thanksgiving Day. As a representation of how wonderful America is by giving an immigrant a dream and a place to call home. The men in the club start laughing after Pete leaves and saying how ludicrous is idea is. But later on we find out that Pete has given all his money and all his food to feed the poor community in the town that he lives in. The poor people are so deeply touched by his war in generosity that he goes to the men at the club a second time and tells them what he did and he tells them the sad off of truth about his grandmother passing away from starvation as she had nothing to eat but grass and some soap. So the midnight the plug end up having a change of heart and ask the couple is celebrating Thanksgiving and not comes on their door and it's the man that is the head of the club and he ends up giving them a lot of food that the whole town has raised as the warrant of this holiday season reminds us to be thankful. I hope you guys enjoy David Krueger performance on the show tonight and part one of giving November. Just remember guys to be thankful for every day I hope you guys enjoy the show please listen and subscribe now my podcast is available on YouTube music just type in mystery & comedy old time radio podcast. Im so thankful for everyone who has listened and subscribe I just wanted to remind you guys that Toys for tots fundraiser. And I want to thank everyone for your support. And always remember to enjoy the show guys thanks. --- Support this podcast: https://podcasters.spotify.com/pod/show/mysterycomedypod1942/support

The Nonlinear Library
LW - AI Regulation May Be More Important Than AI Alignment For Existential Safety by otto.barten

The Nonlinear Library

Play Episode Listen Later Aug 24, 2023 8:07


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Regulation May Be More Important Than AI Alignment For Existential Safety, published by otto.barten on August 24, 2023 on LessWrong. Summary: Aligning a single powerful AI is not enough: we're only safe if no-one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) which makes sure no unaligned AIs can ever get built, by anyone. However, the labs are currently apparently not aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what's needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation, rather than on AI Alignment. Epistemic status: I've been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most literature on the topic. I have also discussed the topic with a fair number of experts (who in some cases seemed to agree, and in other cases did not seem to agree). Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post. This post is mostly about AI x-risk caused by a take-over. It may or may not be valid for other types of AI x-risks. This post is mostly about the 'end game' of AI existential risk, not about intermediate states. AI existential risk is an evolutionary problem. As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, those are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: "if all you need is an object that doesn't do dangerous things, you could try a sponge". Even if a limited AI would be a safe AI, it would not reduce AI existential risk. This is because at some point, someone would create an AI with an unbounded goal (create as many paperclips as possible, predict the next word in the sentence with unlimited accuracy, etc.). This is the AI that would kill us, not the safe one. This is the evolutionary nature of the AI existential risk problem. It is described excellently by Anthony Berglas in his underrated book, and more recently also in Ben Hendrycks' paper. This evolutionary part is a fundamental and very important property of AI existential risk and a large part of why this problem is difficult. Yet, many in AI Alignment and industry seem to focus on only aligning a single AI, which I think is insufficient. Yudkowsky aimed to solve this evolutionary problem (the fact that no-one, ever, should build an unsafe AI) with the so-called pivotal act. An aligned superintelligence would not only not kill humanity, it would also perform a pivotal act, the toy example being to melt all GPUs globally, or, as he later put it, to subtly change all GPUs globally so that they can no longer be used to create an AGI. This would be the act that would actually save humanity from extinction, by making sure no unsafe superintelligences are created, ever, by anyone (it may be argued that melting all GPUs, and all other future hardware that could run AI, would need to be done indefinitely by the aligned superintelligence, else even a pivotal act may be insufficient). The concept of a pivotal act, however, seems to have gone thoroughly out of fashion. None of the leading labs, AI governance think tanks, governments, etc. are talking or, apparently, thinking much about it. Rather, they seem to be thinking about things like non-proliferati...

The Nonlinear Library: LessWrong
LW - AI Regulation May Be More Important Than AI Alignment For Existential Safety by otto.barten

The Nonlinear Library: LessWrong

Play Episode Listen Later Aug 24, 2023 8:07


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Regulation May Be More Important Than AI Alignment For Existential Safety, published by otto.barten on August 24, 2023 on LessWrong. Summary: Aligning a single powerful AI is not enough: we're only safe if no-one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) which makes sure no unaligned AIs can ever get built, by anyone. However, the labs are currently apparently not aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what's needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation, rather than on AI Alignment. Epistemic status: I've been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most literature on the topic. I have also discussed the topic with a fair number of experts (who in some cases seemed to agree, and in other cases did not seem to agree). Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post. This post is mostly about AI x-risk caused by a take-over. It may or may not be valid for other types of AI x-risks. This post is mostly about the 'end game' of AI existential risk, not about intermediate states. AI existential risk is an evolutionary problem. As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, those are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: "if all you need is an object that doesn't do dangerous things, you could try a sponge". Even if a limited AI would be a safe AI, it would not reduce AI existential risk. This is because at some point, someone would create an AI with an unbounded goal (create as many paperclips as possible, predict the next word in the sentence with unlimited accuracy, etc.). This is the AI that would kill us, not the safe one. This is the evolutionary nature of the AI existential risk problem. It is described excellently by Anthony Berglas in his underrated book, and more recently also in Ben Hendrycks' paper. This evolutionary part is a fundamental and very important property of AI existential risk and a large part of why this problem is difficult. Yet, many in AI Alignment and industry seem to focus on only aligning a single AI, which I think is insufficient. Yudkowsky aimed to solve this evolutionary problem (the fact that no-one, ever, should build an unsafe AI) with the so-called pivotal act. An aligned superintelligence would not only not kill humanity, it would also perform a pivotal act, the toy example being to melt all GPUs globally, or, as he later put it, to subtly change all GPUs globally so that they can no longer be used to create an AGI. This would be the act that would actually save humanity from extinction, by making sure no unsafe superintelligences are created, ever, by anyone (it may be argued that melting all GPUs, and all other future hardware that could run AI, would need to be done indefinitely by the aligned superintelligence, else even a pivotal act may be insufficient). The concept of a pivotal act, however, seems to have gone thoroughly out of fashion. None of the leading labs, AI governance think tanks, governments, etc. are talking or, apparently, thinking much about it. Rather, they seem to be thinking about things like non-proliferati...

The Nonlinear Library
AF - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

The Nonlinear Library

Play Episode Listen Later Jul 6, 2023 5:50


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jesse Hoogland on Developmental Interpretability and Singular Learning Theory, published by Michaël Trazzi on July 6, 2023 on The AI Alignment Forum. Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety who has recently been publishing on LessWrong about how to apply Singular Learning Theory to Alignment, and even organized some workshop in Berkeley last week around this. I thought it made sense to interview him to have some high-level overview of Singular Learning Theory (and other more general approaches like developmental interpretability). Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcast, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. Interpreting Neural Networks: The Phase Transition View Studying Phase Transitions Could Help Detect Deception "We want to be able to know when these dangerous capabilities are first acquired because it might be too late. They might become sort of stuck and crystallized and hard to get rid of. And so we want to understand how dangerous capabilities, how misaligned values develop over the course of training. Phase transitions seem particularly relevant for that because they represent kind of the most important structural changes, the qualitative changes in the shape of these models internals. Now, beyond that, another reason we're interested in phase transitions is that phase transitions in physics are understood to be a kind of point of contact between the microscopic world and the macroscopic world. So it's a point where you have more control over the behavior of a system than you normally do. That seems relevant to us from a safety engineering perspective. Why do you have more control in a physical system during phase transitions?" (context) A Concrete Example of Phase Transition In Physics and an analogous example inside of neural networks "Jesse: If you heat a magnet to a high enough temperature, then it's no longer a magnet. It no longer has an overall magnetization. And so if you bring another magnet to it, they won't stick. But if you cool it down, at some point it reaches this Curie temperature. If you push it lower, then it will become magnetized. So the entire thing will all of a sudden get a direction. It'll have a north pole and a south pole. So the thing is though, like, which direction will that north pole or south pole be? And so it turns out that you only need an infinitesimally small perturbation to that system in order to point it in a certain direction. And so that's the kind of sensitivity you see, where the microscopic structure becomes very sensitive to tiny external perturbations. Michaël: And so if we bring this back to neural networks, if the weights are slightly different, the overall model could be deceptive or not. Is it something similar? Jesse: This is speculative. There are more concrete examples. So there are these toy models of superposition studied by Anthropic. And that's a case where you can see that it's learning some embedding and unembeddings. So it's trying to compress data. You can see that the way it compresses data involves this kind of symmetry breaking, this sensitivity, where it selects one solution at a phase transition. So that's a very concrete example of this." (context) Developmental Interpretability "Suppose it's possible to understand what's going on inside of neural networks, largely understand them. First assumption. Well then, it's still going to be very difficult to do that at one specific moment in time. I think intractable. The only way you're actually going to build up an exhaustive idea of what structure the model has internally, is to look at how it forms over the course of training. You want to look at each moment, where...

The Nonlinear Library
LW - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

The Nonlinear Library

Play Episode Listen Later Jul 6, 2023 5:50


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jesse Hoogland on Developmental Interpretability and Singular Learning Theory, published by Michaël Trazzi on July 6, 2023 on LessWrong. Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety who has recently been publishing on LessWrong about how to apply Singular Learning Theory to Alignment, and even organized some workshop in Berkeley last week around this. I thought it made sense to interview him to have some high-level overview of Singular Learning Theory (and other more general approaches like developmental interpretability). Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcast, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. Interpreting Neural Networks: The Phase Transition View Studying Phase Transitions Could Help Detect Deception "We want to be able to know when these dangerous capabilities are first acquired because it might be too late. They might become sort of stuck and crystallized and hard to get rid of. And so we want to understand how dangerous capabilities, how misaligned values develop over the course of training. Phase transitions seem particularly relevant for that because they represent kind of the most important structural changes, the qualitative changes in the shape of these models internals. Now, beyond that, another reason we're interested in phase transitions is that phase transitions in physics are understood to be a kind of point of contact between the microscopic world and the macroscopic world. So it's a point where you have more control over the behavior of a system than you normally do. That seems relevant to us from a safety engineering perspective. Why do you have more control in a physical system during phase transitions?" (context) A Concrete Example of Phase Transition In Physics and an analogous example inside of neural networks "Jesse: If you heat a magnet to a high enough temperature, then it's no longer a magnet. It no longer has an overall magnetization. And so if you bring another magnet to it, they won't stick. But if you cool it down, at some point it reaches this Curie temperature. If you push it lower, then it will become magnetized. So the entire thing will all of a sudden get a direction. It'll have a north pole and a south pole. So the thing is though, like, which direction will that north pole or south pole be? And so it turns out that you only need an infinitesimally small perturbation to that system in order to point it in a certain direction. And so that's the kind of sensitivity you see, where the microscopic structure becomes very sensitive to tiny external perturbations. Michaël: And so if we bring this back to neural networks, if the weights are slightly different, the overall model could be deceptive or not. Is it something similar? Jesse: This is speculative. There are more concrete examples. So there are these toy models of superposition studied by Anthropic. And that's a case where you can see that it's learning some embedding and unembeddings. So it's trying to compress data. You can see that the way it compresses data involves this kind of symmetry breaking, this sensitivity, where it selects one solution at a phase transition. So that's a very concrete example of this." (context) Developmental Interpretability "Suppose it's possible to understand what's going on inside of neural networks, largely understand them. First assumption. Well then, it's still going to be very difficult to do that at one specific moment in time. I think intractable. The only way you're actually going to build up an exhaustive idea of what structure the model has internally, is to look at how it forms over the course of training. You want to look at each moment, where you learn sp...

The Inside View
Jesse Hoogland on Developmental Interpretability and Singular Learning Theory

The Inside View

Play Episode Listen Later Jul 6, 2023 43:11


Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety. More recently, Jesse has been thinking about Singular Learning Theory and Developmental Interpretability, which we discuss in this episode. Before he came to grips with existential risk from AI, he co-founded a health-tech startup automating bariatric surgery patient journeys. (00:00) Intro (03:57) Jesse's Story And Probability Of Doom (06:21) How Jesse Got Into Singular Learning Theory (08:50) Intuition behind SLT: the loss landscape (12:23) Does SLT actually predict anything? Phase Transitions (14:37) Why care about phase transition, grokking, etc (15:56) Detecting dangerous capabilities like deception in the (devel)opment (17:24) A concrete example: magnets (20:06) Why Jesse Is Bullish On Interpretability (23:57) Developmental Interpretability (28:06) What Happens Next? Jesse's Vision (31:56) Toy Models of Superposition (32:47) Singular Learning Theory Part 2 (36:22) Are Current Models Creative? Reasoning? (38:19) Building Bridges Between Alignment And Other Disciplines (41:08) Where To Learn More About Singular Learning Theory Make sure I upload regularly: https://patreon.com/theinsideview Youtube: https://youtu.be/713KyknwShA Transcript: https://theinsideview.ai/jesse Jesse: https://twitter.com/jesse_hoogland Host: https://twitter.com/MichaelTrazzi Patreon supporters: - Vincent Weisser - Gunnar Höglund - Ryan Coppolo - Edward Huff - Emil Wallner - Jesse Hoogland - William Freire - Cameron Holmes - Jacques Thibodeau - Max Chiswick - Jack Seroy - JJ Hepburn

The Nonlinear Library: LessWrong
LW - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

The Nonlinear Library: LessWrong

Play Episode Listen Later Jul 6, 2023 5:50


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jesse Hoogland on Developmental Interpretability and Singular Learning Theory, published by Michaël Trazzi on July 6, 2023 on LessWrong. Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety who has recently been publishing on LessWrong about how to apply Singular Learning Theory to Alignment, and even organized some workshop in Berkeley last week around this. I thought it made sense to interview him to have some high-level overview of Singular Learning Theory (and other more general approaches like developmental interpretability). Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcast, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. Interpreting Neural Networks: The Phase Transition View Studying Phase Transitions Could Help Detect Deception "We want to be able to know when these dangerous capabilities are first acquired because it might be too late. They might become sort of stuck and crystallized and hard to get rid of. And so we want to understand how dangerous capabilities, how misaligned values develop over the course of training. Phase transitions seem particularly relevant for that because they represent kind of the most important structural changes, the qualitative changes in the shape of these models internals. Now, beyond that, another reason we're interested in phase transitions is that phase transitions in physics are understood to be a kind of point of contact between the microscopic world and the macroscopic world. So it's a point where you have more control over the behavior of a system than you normally do. That seems relevant to us from a safety engineering perspective. Why do you have more control in a physical system during phase transitions?" (context) A Concrete Example of Phase Transition In Physics and an analogous example inside of neural networks "Jesse: If you heat a magnet to a high enough temperature, then it's no longer a magnet. It no longer has an overall magnetization. And so if you bring another magnet to it, they won't stick. But if you cool it down, at some point it reaches this Curie temperature. If you push it lower, then it will become magnetized. So the entire thing will all of a sudden get a direction. It'll have a north pole and a south pole. So the thing is though, like, which direction will that north pole or south pole be? And so it turns out that you only need an infinitesimally small perturbation to that system in order to point it in a certain direction. And so that's the kind of sensitivity you see, where the microscopic structure becomes very sensitive to tiny external perturbations. Michaël: And so if we bring this back to neural networks, if the weights are slightly different, the overall model could be deceptive or not. Is it something similar? Jesse: This is speculative. There are more concrete examples. So there are these toy models of superposition studied by Anthropic. And that's a case where you can see that it's learning some embedding and unembeddings. So it's trying to compress data. You can see that the way it compresses data involves this kind of symmetry breaking, this sensitivity, where it selects one solution at a phase transition. So that's a very concrete example of this." (context) Developmental Interpretability "Suppose it's possible to understand what's going on inside of neural networks, largely understand them. First assumption. Well then, it's still going to be very difficult to do that at one specific moment in time. I think intractable. The only way you're actually going to build up an exhaustive idea of what structure the model has internally, is to look at how it forms over the course of training. You want to look at each moment, where you learn sp...

The Inside View
Alan Chan And Max Kauffman on Model Evaluations, Coordination and AI Safety

The Inside View

Play Episode Listen Later May 6, 2023 73:15


Max Kaufmann⁠ and Alan Chan discuss the evaluation of large language models, AI Governance and more generally the impact of the deployment of foundational models. is currently a Research Assistant to Owain Evans, mainly thinking about (and fixing) issues that might arise as we scale up our current ML systems, but also interested in issues arising from multi-agent failures and situational awareness. Alan is PhD student at Mila advised by Nicolas Le Roux, with a strong interest in AI Safety, AI Governance and coordination. He has also recently been working with David Krueger and helped me with some of the interviews that have been published recently (ML Street talk and Christoph Schuhmann). Disclaimer: this discussion is much more casual than the rest of the conversations in this podcast. This was completely impromptu: I just thought it would be interesting to have Max and Alan discuss model evaluations (also called “evals” for short), since they are both interested in the topic. Transcript: https://heinsideview.ai/alan_and_max Youtube: https://youtu.be/BOLxeR_culU Outline (0:00:00) Introduction(0:01:16) LLMs Translating To Systems In The Future Is Confusing(0:03:23) Evaluations Should Measure Actions Instead of Asking Yes or No Questions(0:04:17) Identify Key Contexts for Dangerous Behavior to Write Concrete Evals(0:07:29) Implicit Optimization Process Affects Evals and Benchmarks(0:08:45) Passing Evals Doesn't Guarantee Safety(0:09:41) Balancing Technical Evals With Social Governance(0:11:00) Evaluations Must Be Convincing To Influence AI Development(0:12:04) Evals Might Convince The AI Safety Community But Not People in FAccT(0:13:21) Difficulty In Explaining AI Risk To Other Communities(0:14:19) Both Existential Safety And Fairness Are Important(0:15:14) Reasons Why People Don't Care About AI Existential Risk(0:16:10) The Association Between Sillicon Valley And People in FAccT(0:17:39) Timelines And RL Understanding Might Impact The Perception Existential Risk From AI(0:19:01) Agentic Models And Longtermism Hinder AI Safety Awareness(0:20:17) The Focus On Immediate AI Harms Might Be A Rejection Of Speculative Claims(0:21:50) Is AI Safety A Pascal Mugging(0:23:15) Believing In The Deployment Of Large Foundational Models Should Be Enough To Start Worrying(0:25:38) AI Capabilities Becomign More Evident to the Public Might Not Be Enough(0:27:27) Addressing Generalization and Reward Specification in AI(0:27:59) Evals as an Additional Layer of Security in AI Safety(0:28:41) A Portfolio Approach to AI Alignment and Safety(0:29:03) Imagine Alignment Is Solved In 2040, What Made It Happen?(0:33:04) AGI Timelines Are Uncertain And Anchored By Vibes(0:35:24) What Matters Is Agency, Strategical Awareness And Planning(0:37:15) Alignment Is A Public Good, Coordination Is Difficult(0:06:48) Dignity As AN Useful Heuristic In The Face Of Doom(0:42:28) What Will Society Look Like If We Actually Get Superintelligent Gods(0:45:41) Uncertainty About Societal Dynamics Affecting Long-Term Future With AGI(0:47:42) Biggest Frustration With The AI Safety Community(0:48:34) AI Safety Includes Addressing Negative Consequences of AI(0:50:41) Frustration: Lack of Bridge Building Between AI Safety and Fairness Communities(0:53:07) Building Bridges by Attending Conferences and Understanding Different Perspectives(0:56:02) AI Systems with Weird Instrumental Goals Pose Risks to Society(0:58:43) Advanced AI Systems Controlling Resources Could Magnify Suffering(1:00:24) Cooperation Is Crucial to Achieve Pareto Optimal Outcomes and Avoid Global Catastrophes(1:01:54) Alan's Origin Story(1:02:47) Alan's AI Safety Research Is Driven By Desire To Reduce Suffering And Improve Lives(1:04:52) Diverse Interests And Concern For Global Problems Led To AI Safety Research(1:08:46) The Realization Of The Potential Dangers Of AGI Motivated AI Safety Work(1:10:39) What is Alan Chan Working On At The Moment

The Inside View
David Krueger–Coordination, Alignment, Academia

The Inside View

Play Episode Listen Later Jan 7, 2023 165:19


David Krueger is an assistant professor at the University of Cambridge and got his PhD from Mila. His research group focuses on aligning deep learning systems, but he is also interested in governance and global coordination. He is famous in Cambridge for not having an AI alignment research agenda per se, and instead he tries to enable his seven PhD students to drive their own research. In this episode we discuss AI Takeoff scenarios, research going on at David's lab, Coordination, Governance, Causality, the public perception of AI Alignment research and how to change it. Youtube: https://youtu.be/bDMqo7BpNbk Transcript: https://theinsideview.ai/david OUTLINE (00:00) Highlights (01:06) Incentivized Behaviors and Takeoff Speeds (17:53) Building Models That Understand Causality (31:04) Agency, Acausal Trade And Causality in LLMs (40:44) Recursive Self Improvement, Bitter Lesson And Alignment (01:03:17) AI Governance And Coordination (01:13:26) David's AI Alignment Research Lab and the Existential Safety Community (01:24:13) On The Public Perception of AI Alignment (01:35:58) How To Get People In Academia To Work on Alignment (02:00:19) Decomposing Learning Curves, Latest Research From David Krueger's Lab (02:20:06) Safety-Performance Trade-Offs (02:30:20) Defining And Characterizing Reward Hacking (02:40:51) Playing Poker With Ethan Caballero, Timelines

The Nonlinear Library
LW - My Advice for Incoming SERI MATS Scholars by Johannes C. Mayer

The Nonlinear Library

Play Episode Listen Later Jan 4, 2023 6:30


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Advice for Incoming SERI MATS Scholars, published by Johannes C. Mayer on January 3, 2023 on LessWrong. I have participated in SERI MATS 2.0 in John's stream. Here is some advice based on my experience. Be Nice The AI alignment community is pretty small. If you are an ass, everybody will know that you are an ass. The same holds to a lesser extent for being nice. When I was visiting Edinburgh to attend a talk by David Krueger, there were several people there, that I had first met at Lightcone. When I was visiting Trajan House, the same thing happened. You never know when you might be talking to a grantmaker over dinner. Epistemic status: I did not actually behave like an ass. I expect this to be true, based on how many people I ran into that I've seen before, in different parts of the world. Use Lunch and Dinner at Lightcone During MATS 2.0 lunch and dinner were both served at Lightcone every day of the week. There were always many cool people around, and the conversations were unusually insightful. My favorite heuristic is to just join whatever conversation John is in. I am pretty sure that at least 15% of the value of SERI MATS came from eating lunch and dinner at Lightcone. Probably much more than that. Epistemic status: It feels like this was very useful, but it is hard to quantify. Take care of yourself At the beginning of SERI MATS, there were many social events (mostly just general Berkeley EA/Rationalist events). They were all happening pretty late. For some reason, I need to sleep 10:30 to 12:00 hours every day or I will be tired. My team was meeting at 10:00 every day. For the first 3 weeks, I was basically sleep-deprived almost every day. John's workshops are pretty great, and being sleep-deprived during them destroyed probably more than 20% of the value. That being said, at least one of the socials was high-value, and it was probably worth the cost. The worst thing was that I got used to being sleep-deprived. I sleep-deprived myself, even when there were no socials happening. I made similar mistakes with doing sports and eating healthily. Somehow it's hard to keep up all the good habits when you change your environment. Epistemic status: It's hard to evaluate the counterfactual where I was not sleep-deprived. I estimate I could have gotten 5-35% more value by not making the mistakes I listed. Learn to detach yourself from your ideas Check out this comment. Be Agentic If something doesn't fit right, try to fix it. Do you have a crazy idea about how to improve the office? Ask, or implement it yourself (after getting permission)! (The Lightcone ops team is very competent and cool. John had a loft bed in his office when I was there. I am not sure about the situation in the new SERI MATS offices.) Choose how you spend your time. If you are in a conversation, notice when you would rather do something else. I recommend that you act on this feeling. Get back to work, join that other discussion that seems more interesting, or do whatever else seems higher value. I think being able to do this is great. Building up this skill is probably easier when talking to rationalists. They won't punish you for this kind of initiative. In general, being agentic seems highly related to making sure that you thought all your high-value thoughts. I recommend sitting down for at least 5 minutes by the clock every day, and trying to come up with high-value directions to think in. The second step is then to always do what you think is best. Which is not easy. Think about AI alignment from scratch Reading somebodies work is different from discovering the underlying insights for yourself. Many details will be omitted in a write-up. Especially details on the research process. When I thought about AI alignment from scratch, I was thinking thoughts I had not thought of before. It seems l...

The Nonlinear Library: LessWrong
LW - My Advice for Incoming SERI MATS Scholars by Johannes C. Mayer

The Nonlinear Library: LessWrong

Play Episode Listen Later Jan 4, 2023 6:30


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Advice for Incoming SERI MATS Scholars, published by Johannes C. Mayer on January 3, 2023 on LessWrong. I have participated in SERI MATS 2.0 in John's stream. Here is some advice based on my experience. Be Nice The AI alignment community is pretty small. If you are an ass, everybody will know that you are an ass. The same holds to a lesser extent for being nice. When I was visiting Edinburgh to attend a talk by David Krueger, there were several people there, that I had first met at Lightcone. When I was visiting Trajan House, the same thing happened. You never know when you might be talking to a grantmaker over dinner. Epistemic status: I did not actually behave like an ass. I expect this to be true, based on how many people I ran into that I've seen before, in different parts of the world. Use Lunch and Dinner at Lightcone During MATS 2.0 lunch and dinner were both served at Lightcone every day of the week. There were always many cool people around, and the conversations were unusually insightful. My favorite heuristic is to just join whatever conversation John is in. I am pretty sure that at least 15% of the value of SERI MATS came from eating lunch and dinner at Lightcone. Probably much more than that. Epistemic status: It feels like this was very useful, but it is hard to quantify. Take care of yourself At the beginning of SERI MATS, there were many social events (mostly just general Berkeley EA/Rationalist events). They were all happening pretty late. For some reason, I need to sleep 10:30 to 12:00 hours every day or I will be tired. My team was meeting at 10:00 every day. For the first 3 weeks, I was basically sleep-deprived almost every day. John's workshops are pretty great, and being sleep-deprived during them destroyed probably more than 20% of the value. That being said, at least one of the socials was high-value, and it was probably worth the cost. The worst thing was that I got used to being sleep-deprived. I sleep-deprived myself, even when there were no socials happening. I made similar mistakes with doing sports and eating healthily. Somehow it's hard to keep up all the good habits when you change your environment. Epistemic status: It's hard to evaluate the counterfactual where I was not sleep-deprived. I estimate I could have gotten 5-35% more value by not making the mistakes I listed. Learn to detach yourself from your ideas Check out this comment. Be Agentic If something doesn't fit right, try to fix it. Do you have a crazy idea about how to improve the office? Ask, or implement it yourself (after getting permission)! (The Lightcone ops team is very competent and cool. John had a loft bed in his office when I was there. I am not sure about the situation in the new SERI MATS offices.) Choose how you spend your time. If you are in a conversation, notice when you would rather do something else. I recommend that you act on this feeling. Get back to work, join that other discussion that seems more interesting, or do whatever else seems higher value. I think being able to do this is great. Building up this skill is probably easier when talking to rationalists. They won't punish you for this kind of initiative. In general, being agentic seems highly related to making sure that you thought all your high-value thoughts. I recommend sitting down for at least 5 minutes by the clock every day, and trying to come up with high-value directions to think in. The second step is then to always do what you think is best. Which is not easy. Think about AI alignment from scratch Reading somebodies work is different from discovering the underlying insights for yourself. Many details will be omitted in a write-up. Especially details on the research process. When I thought about AI alignment from scratch, I was thinking thoughts I had not thought of before. It seems l...

The Nonlinear Library
LW - Probably good projects for the AI safety ecosystem by Ryan Kidd

The Nonlinear Library

Play Episode Listen Later Dec 5, 2022 4:19


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Probably good projects for the AI safety ecosystem, published by Ryan Kidd on December 5, 2022 on LessWrong. At EAGxBerkeley 2022, I was asked several times what new projects might benefit the AI safety and longtermist research ecosystem. I think that several existing useful-according-to-me projects (e.g., SERI MATS, REMIX, CAIS, etc.) could urgently absorb strong management and operations talent, but I think the following projects would also probably be useful to the AI safety/longtermist project. Criticisms are welcome. Projects I might be excited to see, in no particular order: A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes. Rolling admissions alternatives to MATS' cohort-based structure for mentors and scholars with different needs (e.g., to support alignment researchers who suddenly want to train/use research talent at irregular intervals but don't have the operational support to do this optimally). A combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research. A dedicated bi-yearly workshop for AI safety university group leaders that teaches them how to recognize talent, foster useful undergraduate research projects, and build a good talent development pipeline or “user journey” (including a model of alignment macrostrategy and where university groups fit in). An organization that does for the Open Philanthropy worldview investigations team what GCP did to supplement CEA's workshops and 80,000 Hours' career advising calls. Further programs like ARENA that aim to develop ML safety engineering talent at scale by leveraging good ML tutors and proven curricula like CAIS' Intro to ML Safety, Redwood Research's MLAB, and Jacob Hilton's DL curriculum for large language module alignment. More contests like ELK with well-operationalized research problems (i.e., clearly explain what builder/breaker steps look like), clear metrics of success, and have a well-considered target audience (who is being incentivized to apply and why?) and user journey (where do prize winners go next?). Possible contest seeds: Evan Hubinger's SERI MATS deceptive AI challenge problem; Vivek Hebbar's and Nate Soares' SERI MATS diamond maximizer selection problem; Alex Turner's and Quintin Pope's SERI MATS training stories selection problem. More "plug-and-play" curriculums for AI safety university groups, like AGI Safety Fundamentals, Alignment 201, Intro to ML Safety. A well-considered "precipism" university course template that critically analyzes Toby Ord's “The Precipice,” Holden Karnofsky's “The Most Important Century,” Will MacAskill's “What We Owe The Future,” some Open Philanthropy worldview investigations reports, some Global Priorities Institute ethics papers, etc. Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews (e.g., what Jacob Steinhardt's “ML Systems Will Have Weird Failure Modes” does for Hubinger et al.'s “Risks From Learned Optimization”). A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman's planned AI safety research group. More organizations like CAIS that aim to recruit established ML talent into alignment research with clear benchmarks, targeted hackathons/contests with prizes, and offers ...

The Nonlinear Library: LessWrong
LW - Probably good projects for the AI safety ecosystem by Ryan Kidd

The Nonlinear Library: LessWrong

Play Episode Listen Later Dec 5, 2022 4:19


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Probably good projects for the AI safety ecosystem, published by Ryan Kidd on December 5, 2022 on LessWrong. At EAGxBerkeley 2022, I was asked several times what new projects might benefit the AI safety and longtermist research ecosystem. I think that several existing useful-according-to-me projects (e.g., SERI MATS, REMIX, CAIS, etc.) could urgently absorb strong management and operations talent, but I think the following projects would also probably be useful to the AI safety/longtermist project. Criticisms are welcome. Projects I might be excited to see, in no particular order: A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes. Rolling admissions alternatives to MATS' cohort-based structure for mentors and scholars with different needs (e.g., to support alignment researchers who suddenly want to train/use research talent at irregular intervals but don't have the operational support to do this optimally). A combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research. A dedicated bi-yearly workshop for AI safety university group leaders that teaches them how to recognize talent, foster useful undergraduate research projects, and build a good talent development pipeline or “user journey” (including a model of alignment macrostrategy and where university groups fit in). An organization that does for the Open Philanthropy worldview investigations team what GCP did to supplement CEA's workshops and 80,000 Hours' career advising calls. Further programs like ARENA that aim to develop ML safety engineering talent at scale by leveraging good ML tutors and proven curricula like CAIS' Intro to ML Safety, Redwood Research's MLAB, and Jacob Hilton's DL curriculum for large language module alignment. More contests like ELK with well-operationalized research problems (i.e., clearly explain what builder/breaker steps look like), clear metrics of success, and have a well-considered target audience (who is being incentivized to apply and why?) and user journey (where do prize winners go next?). Possible contest seeds: Evan Hubinger's SERI MATS deceptive AI challenge problem; Vivek Hebbar's and Nate Soares' SERI MATS diamond maximizer selection problem; Alex Turner's and Quintin Pope's SERI MATS training stories selection problem. More "plug-and-play" curriculums for AI safety university groups, like AGI Safety Fundamentals, Alignment 201, Intro to ML Safety. A well-considered "precipism" university course template that critically analyzes Toby Ord's “The Precipice,” Holden Karnofsky's “The Most Important Century,” Will MacAskill's “What We Owe The Future,” some Open Philanthropy worldview investigations reports, some Global Priorities Institute ethics papers, etc. Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews (e.g., what Jacob Steinhardt's “ML Systems Will Have Weird Failure Modes” does for Hubinger et al.'s “Risks From Learned Optimization”). A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman's planned AI safety research group. More organizations like CAIS that aim to recruit established ML talent into alignment research with clear benchmarks, targeted hackathons/contests with prizes, and offers ...

The Mysterious Case of the Gentleman of Heligoland

We have two men in our sights as our search for the Gentleman of Heligoland continues. David Martin, the Australian millionaire who went missing in 1992 and David Krueger, the 6'5" German man missing from Canada - But how can we rule them in or out?

The Nonlinear Library
LW - AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022 by Sam Bowman

The Nonlinear Library

Play Episode Listen Later Sep 2, 2022 10:56


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022, published by Sam Bowman on September 1, 2022 on LessWrong. Getting into AI safety involves working with a mix of communities, subcultures, goals, and ideologies that you may not have encountered in the context of mainstream AI technical research. This document attempts to briefly map these out for newcomers. This is inevitably going to be biased by what sides of these communities I (Sam) have encountered, and it will quickly become dated. I expect it will still be a useful resource for some people anyhow, at least in the short term. AI Safety/AI Alignment/AGI Safety/AI Existential Safety/AI X-Risk The research project of ensuring that future AI progress doesn't yield civilization-endingly catastrophic results. Good intros: Carlsmith Report What misalignment looks like as capabilities scale Vox piece Why are people concerned about this? My rough summary: It's plausible that future AI systems could be much faster or more effective than us at real-world reasoning and planning. Probably not plain generative models, but possibly models derived from generative models in cheap ways Once you have a system with superhuman reasoning and planning abilities, it's easy to make it dangerous by accident. Most simple objective functions or goals become dangerous in the limit, usually because of secondary or instrumental subgoals that emerge along the way. Pursuing typical goals arbitrarily well requires a system to prevent itself from being turned off, by deception or force if needed. Pursuing typical goals arbitrarily well requires acquiring any power or resources that could increase the chances of success, by deception or force if needed. Toy example: Computing pi to an arbitrarily high precision eventually requires that you spend all the sun's energy output on computing. Knowledge and values are likely to be orthogonal: A model could know human values and norms well, but not have any reason to act on them. For agents built around generative models, this is the default outcome. Sufficiently powerful AI systems could look benign in pre-deployment training/research environments, because they would be capable of understanding that they're not yet in a position to accomplish their goals. Simple attempts to work around this (like the more abstract goal ‘do what your operators want') don't tend to have straightforward robust implementations. If such a system were single-mindedly pursuing a dangerous goal, we probably wouldn't be able to stop it. Superhuman reasoning and planning would give models with a sufficiently good understanding of the world many ways to effectively gain power with nothing more than an internet connection. (ex: Cyberattacks on banks.) Consensus within the field is that these risks could become concrete within ~4–25 years, and have a >10% chance of being leading to a global catastrophe (i.e., extinction or something comparably bad). If true, it's bad news. Given the above, we either need to stop all development toward AGI worldwide (plausibly undesirable or impossible), or else do three possible-but-very-difficult things: (i) build robust techniques to align AGI systems with the values and goals of their operators, (ii) ensure that those techniques are understood and used by any group that could plausibly build AGI, and (iii) ensure that we're able to govern the operators of AGI systems in a way that makes their actions broadly positive for humanity as a whole. Does this have anything to do with sentience or consciousness? No. Influential people and institutions: Present core community as I see it: Paul Christiano, Jacob Steinhardt, Ajeya Cotra, Jared Kaplan, Jan Leike, Beth Barnes, Geoffrey Irving, Buck Shlegeris, David Krueger, Chris Olah, Evan Hubinger, Richard Ngo, Rohin Shah; ARC, R...

The Nonlinear Library: LessWrong
LW - AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022 by Sam Bowman

The Nonlinear Library: LessWrong

Play Episode Listen Later Sep 2, 2022 10:56


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022, published by Sam Bowman on September 1, 2022 on LessWrong. Getting into AI safety involves working with a mix of communities, subcultures, goals, and ideologies that you may not have encountered in the context of mainstream AI technical research. This document attempts to briefly map these out for newcomers. This is inevitably going to be biased by what sides of these communities I (Sam) have encountered, and it will quickly become dated. I expect it will still be a useful resource for some people anyhow, at least in the short term. AI Safety/AI Alignment/AGI Safety/AI Existential Safety/AI X-Risk The research project of ensuring that future AI progress doesn't yield civilization-endingly catastrophic results. Good intros: Carlsmith Report What misalignment looks like as capabilities scale Vox piece Why are people concerned about this? My rough summary: It's plausible that future AI systems could be much faster or more effective than us at real-world reasoning and planning. Probably not plain generative models, but possibly models derived from generative models in cheap ways Once you have a system with superhuman reasoning and planning abilities, it's easy to make it dangerous by accident. Most simple objective functions or goals become dangerous in the limit, usually because of secondary or instrumental subgoals that emerge along the way. Pursuing typical goals arbitrarily well requires a system to prevent itself from being turned off, by deception or force if needed. Pursuing typical goals arbitrarily well requires acquiring any power or resources that could increase the chances of success, by deception or force if needed. Toy example: Computing pi to an arbitrarily high precision eventually requires that you spend all the sun's energy output on computing. Knowledge and values are likely to be orthogonal: A model could know human values and norms well, but not have any reason to act on them. For agents built around generative models, this is the default outcome. Sufficiently powerful AI systems could look benign in pre-deployment training/research environments, because they would be capable of understanding that they're not yet in a position to accomplish their goals. Simple attempts to work around this (like the more abstract goal ‘do what your operators want') don't tend to have straightforward robust implementations. If such a system were single-mindedly pursuing a dangerous goal, we probably wouldn't be able to stop it. Superhuman reasoning and planning would give models with a sufficiently good understanding of the world many ways to effectively gain power with nothing more than an internet connection. (ex: Cyberattacks on banks.) Consensus within the field is that these risks could become concrete within ~4–25 years, and have a >10% chance of being leading to a global catastrophe (i.e., extinction or something comparably bad). If true, it's bad news. Given the above, we either need to stop all development toward AGI worldwide (plausibly undesirable or impossible), or else do three possible-but-very-difficult things: (i) build robust techniques to align AGI systems with the values and goals of their operators, (ii) ensure that those techniques are understood and used by any group that could plausibly build AGI, and (iii) ensure that we're able to govern the operators of AGI systems in a way that makes their actions broadly positive for humanity as a whole. Does this have anything to do with sentience or consciousness? No. Influential people and institutions: Present core community as I see it: Paul Christiano, Jacob Steinhardt, Ajeya Cotra, Jared Kaplan, Jan Leike, Beth Barnes, Geoffrey Irving, Buck Shlegeris, David Krueger, Chris Olah, Evan Hubinger, Richard Ngo, Rohin Shah; ARC, R...

The Nonlinear Library
AF - Encultured AI, Part 1 Appendix: Relevant Research Examples by Andrew Critch

The Nonlinear Library

Play Episode Listen Later Aug 8, 2022 13:42


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Encultured AI, Part 1 Appendix: Relevant Research Examples, published by Andrew Critch on August 8, 2022 on The AI Alignment Forum. Also available on the EA Forum.Appendix to: Encultured AI, Part 1: Enabling New BenchmarksFollowed by: Encultured AI, Part 2: Providing a Service Appendix 1: “Trending” AI x-safety research areas We mentioned a few areas of “trending” AI x-safety research above; below are some more concrete examples of what we mean: Trustworthiness & truthfulness: Owain Evans, Owen Cotton-Barratt and others have authored “Truthful AI: Developing and governing AI that does not lie” (arxiv, 2021; twitter thread). Andreas Stuhlmüller, Jungwon Byun and others at Ought.org are building an AI-powered research assistant called Elicit (website); here is the product:. Task-specific (narrow) preference learning: Paul Christiano et al (arxiv, 2017) developed a data-efficient preference-learning technique for training RL-based systems, which is now very widely cited (scholar). Jan Leike, now at OpenAI, leads a team working on ‘scalable alignment' using preference-learning techniques (arxiv, 2018) (blog). Interpretability: Chris Olah (scholar) leads an interpretability research group at Anthropic. Anthropic (website) is culturally very attuned to large-scale risks from AI, including existential risks. Buck Shlegeris and others at Redwood Research (website) have built an interpretability tool for analyzing transformer networks trained on natural language (demo). Prof. Cynthia Rudin at Duke (homepage) approaches interpretability by trying to replace black-box models with more interpretable ones (arxiv, 2018), and we know from conversations with her that she is open to applications of her work to existential safety. Robustness & risk management: Prof. Jaime Fisac at Princeton (homepage) researches AI safety for robotics, high-dimensional control systems and multi-agent systems (scholar), including provable robustness guarantees. He was previously a PhD student at the UC Berkeley Center for Human-Compatible AI (CHAI), provided extensive feedback on AI Research Considerations for Human Existential Safety (ARCHES) (arxiv, 2020), and is very attuned to existential safety as a cause area. Prof. David Krueger at Cambridge (scholar) studies out-of-distribution generalization (pdf, 2021), and is currently taking on students. Adam Gleave (homepage) is a final-year PhD student at CHAI / UC Berkeley, and studies out-of-distribution robustness for deep RL. Sam Toyer (scholar), also a PhD student at CHAI, has developed a benchmark for robust imitation learning (pdf, 2020). Appendix 2: “Emerging” AI x-safety research areas In this post, we classified cooperative AI and multi-stakeholder control of AI systems as “emerging” topics in AI x-safety. Here's more about what we mean, and why: Cooperative AI This area is “emerging” in x-safety because there's plenty of attention to the issue of cooperation from both policy-makers and AI researchers, but not yet much among folks focused on x-risk. Existential safety attention on cooperative AI: Many authors — too many to name! — have remarked on the importance of international coordination on AI safety efforts, including existential safety. For instance, there is a Wikipedia article on AI arms races (wikipedia). This covers the human–human side of the cooperative AI problem. AI research on cooperative AI: Multi-agent systems research has a long history in AI (scholar search), as does multi-agent reinforcement learning (scholar search). DeepMind's Multi-agent Learning team has recently written number papers examining competition and cooperation between artificial agents (website). OpenAI has done some work on multi-agent interaction, e.g. emergent tool use in multi-agent interaction (arxiv). Prof. Jakob Foerster at Oxford (scholar search), and ...

The Nonlinear Library
AF - What's a good probability distribution family (e.g. "log-normal") to use for AGI timelines? by David Krueger

The Nonlinear Library

Play Episode Listen Later Apr 13, 2022 0:25


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What's a good probability distribution family (e.g. "log-normal") to use for AGI timelines?, published by David Krueger on April 13, 2022 on The AI Alignment Forum. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Academic Dean
Dr. David Krueger, Montana State University Northern

Academic Dean

Play Episode Listen Later Jun 7, 2021 46:38


Dr. David Krueger is a higher education professional with over 30 years extensive experience in collaborating and partnering with Career and Technical Education (CTE) professionals, university administrators, industry and community leaders.  Dr. Krueger began his career as a high school agriculture teacher and CTE professional in Idaho before moving to Michigan to earn a doctorate degree in Agriculture and Extension Education at Michigan State University and begin preparing the next generation of CTE professionals. Dr. Krueger has collaborated on many CTE curriculum projects around the country and started CTE programs to meet the needs of the ever-changing skilled workforce landscape. In the summer of 2018 Dr. Krueger accepted the position of Dean in the College of Technical Sciences at Montana State University – Northern. Dr. Krueger directs the college which offers a wide range of degrees and programs through a blend of applied, project-based curricula that educates a diverse group of students. Graduates of the College are equipped with strong analytical capabilities, the latest in cutting-edge technologies, and the skills for innovation, entrepreneurship, and leadership. They continue to fuel the economic and technological development of Montana and beyond. The College of Technical Sciences strives to become a leader nationwide in shaping new concepts of applied technical and apprenticeship education. 

The Alpha Babes Podcast
#62 How to Empower Yourself Through Self Language

The Alpha Babes Podcast

Play Episode Listen Later Oct 6, 2020 45:39


Empowering yourself starts with the story you tell yourself. Change your story, change your life. Reshaping our image of ourselves can be difficult, especially when we hold such negative beliefs about ourselves. Well we are living proof that stories can change and it starts by how you talk about yourself. In this episode we discuss Dr. David Krueger 7 step strategy for reshaping any disempowering mental story you may have of yourself. In case you're unable to write down the steps, here they are for you. 1. Recognise Authorship - If you're serious about improving your life you must recognise that YOU are the author of your story. 2. Be Accountable - Once you are aware of authorship you must become accountable. You have to own your story if your serious about making changes to it. 3. Assess Your Story - What is your story made up of? What are the recurring patterns of beliefs? What's the story line? Who are the characters? 4. Decide What to Change - What parts of the story do you want to change? And what parts of the story do you need to let go of? 5. Map your Changes - Everyone needs a road map to get from Point A to point B. What do you need to do to get there? 6. Start Authoring New Experiences - Create affirmations, visualizations, and real life experiences that validate your NEW STORY line. 7. Program a New Identity - Along with creating a new life story, it is CRUCIAL you create a new sense of self-image. On another note, stay tuned to our IG stories for how you can enter the Alpha Babes Podcast Giveaway. We will draw the winner LIVE on 13th October.

The Money Answers Show
Rewire Your Brain For Wealth!

The Money Answers Show

Play Episode Listen Later Aug 26, 2019 55:45


Could your brain be coming between you and prosperity? Is it possible to rewire your brain and have wealth be the result? What is the relationship between how you behave and respond and how well you do in pursuit of financial independence? What role does our personal money story play in our lives and can we change it? Jordan Goodman, host of The Money Answers Show, talks with David Krueger, M.D. about his newest book, Your New Money Story. Krueger and Goodman discuss the newest insights into behavior around money with practical applications people can bring to their every day life and about how understanding our relationship with money can allow us to make dramatic changes for the better.

The All-Star Leader Podcast
Episode 011 - Entrepreneur, Business and Life Coach Aaron Walker

The All-Star Leader Podcast

Play Episode Listen Later Oct 10, 2016 47:56


Aaron teaches leadership qualities for a life of significance. Bio: Aaron Walker is an entrepreneur who has started and sold several successful businesses. After retiring for the third time, a friend convinced him he needed to start coaching young, up and coming entrepreneurs, which led him to one of his current ventures, View From The Top. Aaron is a business leader, a mentor, a coach and a teacher. He is also a leader in his family which consists of his wife of 36 years Robin, two daughters and one son-in-law, and five grandchildren. Interview: I hit a few highlights about you in the introduction, but give us a little backstory and what your journey has looked like to get where you are today. Started at 18 with some business partners who had money He didn't have much at all. 600 sf home; parents didn't make much money Grew four companies over nine years and sold them; retired at 27 18 months later he was bored, and his wife Robin said you gotta do something! Launched a new business and grew it by 4x But then 8/1/01, car accident changed everything; Aaron hit a pedestrian with his car, and that person died from the injuries. Led to a five year hiatus from business Traveled Built a house Robin told him again to go back to work Partnered with a construction company and grew it to the best builder in Nashville Retired again at age 50 Guys in his mastermind (Dave Ramsey, Ken Abraham, Dan Miller) suggested that he coach Dave Ramsey EntreLeadership Mastery Dan Miller Elevate Challenged him to inspire young men to reach their potential So new business i. Mastermind groups ii. Coaching iii. View from the Top community (Follow up Q from Daniel - What is with Nashville and entrepreneurs?) Demos/location good Entrepreneur center 6 miles from 2 coasts 80,000/year moving to Nashville Talking about the growth of Nashville, Austin, Waco Robin and friends are coming to see the “Flip or Flop”!! (Fixer Upper / Magnolia / Chip and Joanna Gaines) What does it mean to grow from success to significance? Deals with guys ranging from massively successful financially to those just getting started Need those with same objective even if in different places and different ideas Looking to develop great husbands, fathers, business leaders What is success? Most people are reactive rather than proactive Just want more, bigger, shinier But what if that's not really what you want? Content v. complacent For him, it was Choosing his own schedule Financial freedom – important (not shying away from that) but don't make it your God, paramount or your only focus Have an engaging family Both of his daughters work with him Meaningful relationships and a clear conscience Content not complacent Be happy where you are now Choose happiness (not a trait; a choice) But keep the pedal down and keep growing and moving forward Faith is important – trust in higher power Legacy – there is a man with great wisdom Car accident changed his view on this, from just making a bunch of money to Meeting the needs of others and engaging with other people Do things for people anonymously so people can't repay or even acknowledge Gratitude day at restaurants, nursing homes, etc. Invest in others that might trickle down and impact the masses You have said that you fear missing an opportunity more than you fear failure. Fear is the biggest enemy we have Failure is in not trying; not in not succeeding If you don't attempt it, it's guaranteed you won't be successful You either succeed or you learn Just don't repeat the things that don't work, and do more of the things that do We can get paralyzed in the fear of failure Pushes us to average – just as close to the bottom as the top Inch wide and mile deep – Greg McKeown in Essentialism Do the best you can do with everything; don't have to be the best at everything Don't leave anything on the field Be exhausted at night! Gave all he had to his business Naysayers are everywhere, but you can't listen to them Carol Dweck – Mindset – either have growth or fixed mindset. Fixed mindset people say “well I can't do it” Mom's saying: Can't couldn't do it and could did it all Some confidence leads to some success which leads to more confidence and more success. Get the word “can't” out of your vocabulary The answer is always “no” if you don't ask John Lee Dumas / Entrepreneur on Fire example about asking The Big Leap by Gay Hendricks – overcoming upper limit challenges Can you tell us about a business you started that wasn't as successful, why you think that was the case, and what you learned from it? We need to be a constant student About 80% of business people read just one book per year Excuse is don't have time; Darren Hardy says invest 10% of personal revenue into personal development His mastermind groups read books every month Helps you mitigate against challenges Stay on the cutting edge of technology Lots of reading and lots of conferences to get exposed to what you do not know. Dr. David Krueger at Baylor was Aaron's personal coach – he hires lots of coaches Great companies and organizations rely on great people. What was and is your hiring process like and what do you prioritize in order to recruit the best people? Hire slow / fire fast Vanderbilt baseball won the world series recently; the coach is well respected Coach interviewed some of his players I can teach you the game of baseball, but I can't teach you character What are the mission, vision, moral standard, core values? Do they align? Make decisions based on that Don't just base decisions on your gut Dave Ramsey example Over a dozen interviews now Very strategic hiring Cody Foster – Advisors Excel in Topeka, KS (400 people; $5.5B) One of the largest mistakes he made was not firing people fast enough A toxic employee will destroy the chemistry of the team (Daniel follow up on values; how do you communicate them and not let them collect dust?) Chikfila example Friend was owner/operator No matter which location you go to, everything is the same. My Pleasure All based on intense training Must incorporate into training that is done on a regular basis If you don't use it, you lose it Must have systems and processes How can those of us who are in entry or mid-level roles in our organizations influence those above us? i.e. How can we lead up? Attitude is everything Only thing you are in total control over Adam Grant – Give and Take Be a giver and not a taker Bring value to the company How can I bring more value than what the company is paying for How can I make this a pleasant experience Bosses would pass out if an employee came to them and said they want to move up, they are willing to invest, here is my skillset, willing to do more. I'll help promote you and make you look good so you will benefit too Help bosses, clients, etc. achieve their results David Novak; CEO of YUM? Took his employees with him and recognize and honor them Pictures all over his office of his employees Focused outward How can those of us interested in broad-based success—at work, home, and in our faith and community, allocate our time, thoughts and resources? We're all faced with this It's not about time management; you can't manage it; it happens regardless But you can manage your priorities Procrastinate On Purpose – Rory Vaden Priority dilution – we've made priority plural when it's singular What is your priority? Faith, Spouse, Children, family, job We try to do too many things, and wind up an inch deep and mile wide What if we flipped that and became an inch wide but several miles deep? The job doesn't have a memory but your family does. It's always just one more email, one more call, etc. and you wind up working right through family time The only way to protect against this is to set up boundaries. Could always make more money by giving up everything else, but why? To what end? Resources – viewfromthetop.com/allstarleader (everything is free for our audience) Personal Assessment – deals with identity, ideals, relationships What Do I Want? – Helps you proactively live rather than just reacting Steps to a productive day – helps you get situated and you have methodical schedules Want to know more about masterminds and accountability groups? Listen to episode 006 with Dave Stachowiak or episode 010 with Tom Schwab. What habits have been key to your success? Morning routine is catalyst 5 days per week 1 or 1.5 hours in the morning – quiet time with scripture, prayer, meditation and worship music Helps to acclimate to the day Lots of books too Personal development Ted Talks Podcasts Positive things (rare for him to tell this) – turned the news off 14 years ago because of the negative energy Want to spend time on things he can impact and those around him How can people connect with you and also learn more about your coaching and mentoring programs? Viewfromthetop.com The Community is a great place to start and take a first step @vftcoach on Twitter View From The Top book in 1Q 2017 Thank Yous/Acknowledgements: Antioch Live/Clear Day Media Group – music More here. Jonathan Davis – production Clint Musslewhite – voice over