Podcast appearances and mentions of david krueger

31PODCASTS
131EPISODES
33mAVG DURATION
?INFREQUENT EPISODES
Sep 3, 2025LATEST

POPULARITY

20172018201920202021202220232024

Best podcasts about david krueger

The Nonlinear Library

12 episodes with david krueger

Gibbsville Reformed Church | GRC Audio

78 episodes with david krueger

Jacob's Well Church Sermon Audio, Green Bay WI

2 episodes with david krueger

The Inside View

3 episodes with david krueger

Future of Life Institute Podcast

2 episodes with david krueger

The Money Answers Show

2 episodes with david krueger

Vital Connections On Air

2 episodes with david krueger

The Nonlinear Library: LessWrong

6 episodes with david krueger

Latest podcast episodes about david krueger

BOTS ARE HIRING HUMANS

AI DAILY: Breaking News in AI

Play Episode Listen Later Sep 3, 2025 3:46

Plus Why GPT-5 Flopped AI's Next Hustle? Recruiting People to Train More AIForget robots hiring humans—AI's hiring trainers. Mercor built an AI recruiter that interviews candidates, and guess what? The real cash cow wasn't hiring—it was recruiting folks to teach more AI. That means your next job might be training our algorithm overlords, not working for them.This AI Startup Claims It Can “Eliminate Crime” But Has a Wild Surveillance EmpireFlock Safety runs an army of AI‑powered cameras—over 80,000 across the U.S.—and says it's out here trying to “eliminate crime.” But the reality's messier. It's not just about footage—it's a surveillance network blowing up with privacy alarms, legal headaches, and serious civil‑liberties chatter.Why GPT-5 Flopped—and What AI Needs to Do NextGPT-5 was hyped as almost AGI, but it stumbled on math, riddles, and reliability. Critics say the “bigger is better” scaling mantra has hit a wall. Instead of chasing size, experts argue AI should borrow from cognitive science—world models, core knowledge, neurosymbolic blends—to actually move past statistical mimicry.Vibe Coding Is Here—Engineers Aren't Replaced, They're Now Tech ConductorsAI isn't killing coding—it's evolving it. In "vibe coding," engineers describe what they want in plain English, and AI converts that into code. Suddenly, engineers are orchestras, not typists—spending more time designing systems, not debugging syntax. The grind's automated; creativity's upgraded.Why Teachers Want Student Effort, Not AI's “Perfect” ShortcutA San Francisco teacher caught a student with completely polished writing but zero understanding. Rather than perfection, she values effort—the grit of digging into a text, wrestling with structure, and building academic resilience. Because shortcuts may polish papers, but they dull the skills that actually let students think for themselves.Are Zuckerberg's AI Dreams Actually Risky? Experts Warn “End of Humanity” Might Not Be MetaphorZuck's pushing “personal superintelligence for everyone,” but some AI safety pros think it's more hype than harm reduction. David Krueger at the University of Montreal calls the term “coherent”? Literally branding. Experts say he's glossing over the real existential threats advanced AI could bring—if we're not careful.Your Strategic Plan Is Already Obsolete—AI Just Made It AncientForget year-long playbooks—AI crunches plans in minutes, while markets shift by the hour. Strategy isn't a deck anymore, it's a living thing: snapshot it, run scenarios, then tweak—weekly or even daily. The future belongs to orgs that ditch static planning and embrace always-on adaptability.

university ai english strategy humans hiring montreal bots critics replaced agi david krueger

Does innovation have to come at the expense of ethics?

Beyond the Hype

Play Episode Listen Later May 12, 2025 45:54

Oliver Cronk is joined by Scott Logic's Sophia Ashley and Alix Dunn from the ‘Computer Says Maybe' podcast for a wide-ranging discussion which poses ethical questions that are frequently ignored in the race for technological innovation. Together, they explore the intersection of technology and power, and examine whether society is unwittingly allowing technology to reshape our world without adequate scrutiny or safeguards. As technological innovations continue to promote the atomisation and polarisation of society and the erosion of human agency, our guests consider what's required to achieve the right balance of People, Planet and Profit. From the global to the local, the discussion rounds off by providing practical advice on how you can promote a more human-centric approach to technology in your life and your work. Useful links for this episode Computer Says Maybe podcast Defying Datafication with Dr Abeba Birhane – Computer Says Maybe The Human in the Loop: what's it like to work in the AI supply chain? – Computer Says Maybe Are we sleepwalking into tomorrow's AI challenges? – Oliver Cronk Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development – Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, David Duvenaud The Planet as a Stakeholder The Responsible Business Map The Green PO Doing the Doughnut Tech

ai innovation planet ethics profit loop expense david krueger

Episode 296: Our Interview With David Krueger

Enduring Churches Podcast with Alan and Trent

Play Episode Listen Later Oct 21, 2024 38:37

david krueger

AF - [Paper] Stress-testing capability elicitation by training password-locked models by Fabien Roger

The Nonlinear Library

Play Episode Listen Later Jun 4, 2024 22:23

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Paper] Stress-testing capability elicitation by training password-locked models, published by Fabien Roger on June 4, 2024 on The AI Alignment Forum. The paper is by Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov and David Krueger. This post was written by Fabien and Ryan, and may not reflect the views of Dmitrii and David. Scheming models might try to perform less capably than they are able to ( sandbag). They might do this to make evaluations underestimate danger, slow down promising safety work, or cause issues with debate or recursive oversight more generally. You might hope that you can avoid this issue by eliciting capabilities, for example by fine-tuning on demonstrations and then using RL. In principle, this elicitation might fail. It could fail due to having too few (or no) high-quality demonstrations (i.e. sample efficiency is too low) or because RL fails to explore good actions. We're uncertain how this goes in practice. It could be the case that simple countermeasures prevent sandbagging in most of the important cases such that it isn't a big problem for safety. But it could also turn out that even high-effort approaches can't rule out these issues. We released a paper studying this by examining how well supervised fine-tuning and RL can elicit capabilities from password-locked models (LLMs trained to produce low-quality outputs except when a particular password is provided). We find that: If you have some high-quality demonstrations, supervised fine-tuning usually recovers the hidden capabilities with very few samples. Elicitation on password-locked models generalizes well across domains (even when we made password-locked models using techniques that we thought might prevent generalization). Using RL on password-locked models recovers hidden capabilities, except when the weak model RL starts from is extremely weak. While we target cases where AIs intentionally produce low-quality outputs and they could easily produce higher quality outputs, we also think our results have some relevance to other cases where AIs produce low-quality outputs despite being able to produce better ones. Here are the main takeaways: We think our results are a moderate update against scheming models being able to sandbag effectively (given countermeasures) for the kind of tasks we studied here, though there are differences between password-locked models and actual scheming models. We think this should be a bigger update for people who are relatively more pessimistic than us about sandbagging (at least insofar as their views make pessimistic predictions about current architectures). (We think this is also a small update against scheming, since scheming models might still try to sandbag, and if elicitation fine-tunes sandbagging out, it may partially erode scheming.) People interested in studying sandbagging or similar elicitation failures should consider using a methodology similar to the password-locked models methodology we use here. More generally, they should consider using some sort of adversarial methodology in which a red team constructs a model organism. In this post: We explain why good capability elicitation matters for AI safety We describe the sandbagging concern we are trying to assess; We summarize the results of our paper, and discuss their implications; We discuss how our results relate to relevant AGI safety work; We describe future work we are excited by; We discuss concerns other than intentional sandbagging for which our results might be relevant. The big picture: why good capability elicitation matters for AI safety Currently, AI labs try reasonably hard to get as much performance as possible out of their expensive pretrained model. The process of making an AI perform a task competently (without a huge amount of additional training) is often called eli...

ai training speech models ea passwords agi capability fabien rl scheming stress testing rationalist elicitation david krueger

AF - Paper in Science: Managing extreme AI risks amid rapid progress by JanB

The Nonlinear Library

Play Episode Listen Later May 23, 2024 1:58

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paper in Science: Managing extreme AI risks amid rapid progress, published by JanB on May 23, 2024 on The AI Alignment Forum. https://www.science.org/doi/10.1126/science.adn0117 Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner*, Sören Mindermann* Abstract: Artificial intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness and barely address autonomous systems. Drawing on lessons learned from other safety-critical technologies, we outline a comprehensive plan that combines technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

ai science society managing drawing progress speech extreme amid rapid ea increases daniel kahneman yuval noah harari geoffrey hinton ai risks stuart russell rationalist pieter abbeel david krueger gillian hadfield jeff clune

AF - New report: Safety Cases for AI by Josh Clymer

The Nonlinear Library

Play Episode Listen Later Mar 20, 2024 1:33

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New report: Safety Cases for AI, published by Josh Clymer on March 20, 2024 on The AI Alignment Forum. ArXiv paper: https://arxiv.org/abs/2403.10462 The idea for this paper occurred to me when I saw Buck Shlegeris' MATS stream on "Safety Cases for AI." How would one justify the safety of advanced AI systems? This question is fundamental. It informs how RSPs should be designed and what technical research is useful to pursue. For a long time, researchers have (implicitly or explicitly) discussed ways to justify that AI systems are safe, but much of this content is scattered across different posts and papers, is not as concrete as I'd like, or does not clearly state their assumptions. I hope this report provides a helpful birds-eye view of safety arguments and moves the AI safety conversation forward by helping to identify assumptions they rest on (though there's much more work to do to clarify these arguments). Thanks to my coauthors: Nick Gabrieli, David Krueger, and Thomas Larsen -- and to everyone who gave feedback: Henry Sleight, Ashwin Acharya, Ryan Greenblatt, Stephen Casper, David Duvenaud, Rudolf Laine, Roger Grosse, Hjalmar Wijk, Eli Lifland, Oliver Habryka, Sim eon Campos, Aaron Scher, Lukas Berglund, and Nate Thomas. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

ai safety speech sim ea campos mats new report arxiv clymer rationalist rsps nate thomas thomas larsen david krueger buck shlegeris

LW - New report: Safety Cases for AI by joshc

The Nonlinear Library

Play Episode Listen Later Mar 20, 2024 1:32

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New report: Safety Cases for AI, published by joshc on March 20, 2024 on LessWrong. ArXiv paper: https://arxiv.org/abs/2403.10462 The idea for this paper occurred to me when I saw Buck Shlegeris' MATS stream on "Safety Cases for AI." How would one justify the safety of advanced AI systems? This question is fundamental. It informs how RSPs should be designed and what technical research is useful to pursue. For a long time, researchers have (implicitly or explicitly) discussed ways to justify that AI systems are safe, but much of this content is scattered across different posts and papers, is not as concrete as I'd like, or does not clearly state their assumptions. I hope this report provides a helpful birds-eye view of safety arguments and moves the AI safety conversation forward by helping to identify assumptions they rest on (though there's much more work to do to clarify these arguments). Thanks to my coauthors: Nick Gabrieli, David Krueger, and Thomas Larsen -- and to everyone who gave feedback: Henry Sleight, Ashwin Acharya, Ryan Greenblatt, Stephen Casper, David Duvenaud, Rudolf Laine, Roger Grosse, Hjalmar Wijk, Eli Lifland, Oliver Habryka, Sim eon Campos, Aaron Scher, Lukas Berglund, and Nate Thomas. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

ai safety speech sim ea campos mats new report arxiv rationalist rsps lesswrong nate thomas thomas larsen david krueger buck shlegeris

LW - New report: Safety Cases for AI by joshc

The Nonlinear Library: LessWrong

Play Episode Listen Later Mar 20, 2024 1:32

Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New report: Safety Cases for AI, published by joshc on March 20, 2024 on LessWrong. ArXiv paper: https://arxiv.org/abs/2403.10462 The idea for this paper occurred to me when I saw Buck Shlegeris' MATS stream on "Safety Cases for AI." How would one justify the safety of advanced AI systems? This question is fundamental. It informs how RSPs should be designed and what technical research is useful to pursue. For a long time, researchers have (implicitly or explicitly) discussed ways to justify that AI systems are safe, but much of this content is scattered across different posts and papers, is not as concrete as I'd like, or does not clearly state their assumptions. I hope this report provides a helpful birds-eye view of safety arguments and moves the AI safety conversation forward by helping to identify assumptions they rest on (though there's much more work to do to clarify these arguments). Thanks to my coauthors: Nick Gabrieli, David Krueger, and Thomas Larsen -- and to everyone who gave feedback: Henry Sleight, Ashwin Acharya, Ryan Greenblatt, Stephen Casper, David Duvenaud, Rudolf Laine, Roger Grosse, Hjalmar Wijk, Eli Lifland, Oliver Habryka, Sim eon Campos, Aaron Scher, Lukas Berglund, and Nate Thomas. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

ai safety speech sim ea campos mats new report arxiv rationalist rsps lesswrong nate thomas thomas larsen david krueger buck shlegeris

Giving November part 1

Mystery & Comedy Old Time Radio Podcast

Play Episode Listen Later Nov 16, 2023 35:48

Hi guys and welcome back to the mystery and comedy all time radio podcast. I hope you guys are having a wonderful night as your shopping for the turkey getting ready for the big turkey day. Or just relaxing and thanking God for what he has truly blessed you with this Thanksgiving season. Whatever you are doing I hope you guys enjoy my presentation as I bring you part one of giving November. As we celebrate giving November we remember to be thankful to God everyday not just once or twice a year but every day is a blessing from him. Please join me as I bring to the show Mr David Krueger in the Hallmark playhouse presentation of Pete Thanksgiving. In this we find a man and a woman have came from their native land of Europe to come to America to fulfill their dream of owning a restaurant. As preparations are going for the restaurant Pete's wife gives him a letter from the Red Cross and Pete thinks it's some exciting news about how well his grandmother is doing. But Pete ended up ripping up the letter and not telling his wife what it's about. Later on in the episode we find Pete and his wife getting ready for there Thanksgiving grand opening and Pete goes to local businessmen who have a club and tells them the wonderful news that he's wanting to do for them filling up their stomachs for nothing on Thanksgiving Day. As a representation of how wonderful America is by giving an immigrant a dream and a place to call home. The men in the club start laughing after Pete leaves and saying how ludicrous is idea is. But later on we find out that Pete has given all his money and all his food to feed the poor community in the town that he lives in. The poor people are so deeply touched by his war in generosity that he goes to the men at the club a second time and tells them what he did and he tells them the sad off of truth about his grandmother passing away from starvation as she had nothing to eat but grass and some soap. So the midnight the plug end up having a change of heart and ask the couple is celebrating Thanksgiving and not comes on their door and it's the man that is the head of the club and he ends up giving them a lot of food that the whole town has raised as the warrant of this holiday season reminds us to be thankful. I hope you guys enjoy David Krueger performance on the show tonight and part one of giving November. Just remember guys to be thankful for every day I hope you guys enjoy the show please listen and subscribe now my podcast is available on YouTube music just type in mystery & comedy old time radio podcast. Im so thankful for everyone who has listened and subscribe I just wanted to remind you guys that Toys for tots fundraiser. And I want to thank everyone for your support. And always remember to enjoy the show guys thanks. --- Support this podcast: https://podcasters.spotify.com/pod/show/mysterycomedypod1942/support

america god thanksgiving europe giving toys hallmark thanksgiving day red cross david krueger

LW - AI Regulation May Be More Important Than AI Alignment For Existential Safety by otto.barten

The Nonlinear Library

Play Episode Listen Later Aug 24, 2023 8:07

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Regulation May Be More Important Than AI Alignment For Existential Safety, published by otto.barten on August 24, 2023 on LessWrong. Summary: Aligning a single powerful AI is not enough: we're only safe if no-one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) which makes sure no unaligned AIs can ever get built, by anyone. However, the labs are currently apparently not aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what's needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation, rather than on AI Alignment. Epistemic status: I've been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most literature on the topic. I have also discussed the topic with a fair number of experts (who in some cases seemed to agree, and in other cases did not seem to agree). Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post. This post is mostly about AI x-risk caused by a take-over. It may or may not be valid for other types of AI x-risks. This post is mostly about the 'end game' of AI existential risk, not about intermediate states. AI existential risk is an evolutionary problem. As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, those are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: "if all you need is an object that doesn't do dangerous things, you could try a sponge". Even if a limited AI would be a safe AI, it would not reduce AI existential risk. This is because at some point, someone would create an AI with an unbounded goal (create as many paperclips as possible, predict the next word in the sentence with unlimited accuracy, etc.). This is the AI that would kill us, not the safe one. This is the evolutionary nature of the AI existential risk problem. It is described excellently by Anthony Berglas in his underrated book, and more recently also in Ben Hendrycks' paper. This evolutionary part is a fundamental and very important property of AI existential risk and a large part of why this problem is difficult. Yet, many in AI Alignment and industry seem to focus on only aligning a single AI, which I think is insufficient. Yudkowsky aimed to solve this evolutionary problem (the fact that no-one, ever, should build an unsafe AI) with the so-called pivotal act. An aligned superintelligence would not only not kill humanity, it would also perform a pivotal act, the toy example being to melt all GPUs globally, or, as he later put it, to subtly change all GPUs globally so that they can no longer be used to create an AGI. This would be the act that would actually save humanity from extinction, by making sure no unsafe superintelligences are created, ever, by anyone (it may be argued that melting all GPUs, and all other future hardware that could run AI, would need to be done indefinitely by the aligned superintelligence, else even a pivotal act may be insufficient). The concept of a pivotal act, however, seems to have gone thoroughly out of fashion. None of the leading labs, AI governance think tanks, governments, etc. are talking or, apparently, thinking much about it. Rather, they seem to be thinking about things like non-proliferati...

ai safety speech alignment regulation meer ea existential agi gpus ai regulation epistemic rationalist ai alignment yudkowsky roman yampolskiy lesswrong david krueger

LW - AI Regulation May Be More Important Than AI Alignment For Existential Safety by otto.barten

The Nonlinear Library: LessWrong

Play Episode Listen Later Aug 24, 2023 8:07

Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Regulation May Be More Important Than AI Alignment For Existential Safety, published by otto.barten on August 24, 2023 on LessWrong. Summary: Aligning a single powerful AI is not enough: we're only safe if no-one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) which makes sure no unaligned AIs can ever get built, by anyone. However, the labs are currently apparently not aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what's needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation, rather than on AI Alignment. Epistemic status: I've been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most literature on the topic. I have also discussed the topic with a fair number of experts (who in some cases seemed to agree, and in other cases did not seem to agree). Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post. This post is mostly about AI x-risk caused by a take-over. It may or may not be valid for other types of AI x-risks. This post is mostly about the 'end game' of AI existential risk, not about intermediate states. AI existential risk is an evolutionary problem. As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, those are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: "if all you need is an object that doesn't do dangerous things, you could try a sponge". Even if a limited AI would be a safe AI, it would not reduce AI existential risk. This is because at some point, someone would create an AI with an unbounded goal (create as many paperclips as possible, predict the next word in the sentence with unlimited accuracy, etc.). This is the AI that would kill us, not the safe one. This is the evolutionary nature of the AI existential risk problem. It is described excellently by Anthony Berglas in his underrated book, and more recently also in Ben Hendrycks' paper. This evolutionary part is a fundamental and very important property of AI existential risk and a large part of why this problem is difficult. Yet, many in AI Alignment and industry seem to focus on only aligning a single AI, which I think is insufficient. Yudkowsky aimed to solve this evolutionary problem (the fact that no-one, ever, should build an unsafe AI) with the so-called pivotal act. An aligned superintelligence would not only not kill humanity, it would also perform a pivotal act, the toy example being to melt all GPUs globally, or, as he later put it, to subtly change all GPUs globally so that they can no longer be used to create an AGI. This would be the act that would actually save humanity from extinction, by making sure no unsafe superintelligences are created, ever, by anyone (it may be argued that melting all GPUs, and all other future hardware that could run AI, would need to be done indefinitely by the aligned superintelligence, else even a pivotal act may be insufficient). The concept of a pivotal act, however, seems to have gone thoroughly out of fashion. None of the leading labs, AI governance think tanks, governments, etc. are talking or, apparently, thinking much about it. Rather, they seem to be thinking about things like non-proliferati...

ai safety speech alignment regulation meer ea existential agi gpus ai regulation epistemic rationalist ai alignment yudkowsky roman yampolskiy lesswrong david krueger

AF - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

The Nonlinear Library

Play Episode Listen Later Jul 6, 2023 5:50

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jesse Hoogland on Developmental Interpretability and Singular Learning Theory, published by Michaël Trazzi on July 6, 2023 on The AI Alignment Forum. Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety who has recently been publishing on LessWrong about how to apply Singular Learning Theory to Alignment, and even organized some workshop in Berkeley last week around this. I thought it made sense to interview him to have some high-level overview of Singular Learning Theory (and other more general approaches like developmental interpretability). Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcast, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. Interpreting Neural Networks: The Phase Transition View Studying Phase Transitions Could Help Detect Deception "We want to be able to know when these dangerous capabilities are first acquired because it might be too late. They might become sort of stuck and crystallized and hard to get rid of. And so we want to understand how dangerous capabilities, how misaligned values develop over the course of training. Phase transitions seem particularly relevant for that because they represent kind of the most important structural changes, the qualitative changes in the shape of these models internals. Now, beyond that, another reason we're interested in phase transitions is that phase transitions in physics are understood to be a kind of point of contact between the microscopic world and the macroscopic world. So it's a point where you have more control over the behavior of a system than you normally do. That seems relevant to us from a safety engineering perspective. Why do you have more control in a physical system during phase transitions?" (context) A Concrete Example of Phase Transition In Physics and an analogous example inside of neural networks "Jesse: If you heat a magnet to a high enough temperature, then it's no longer a magnet. It no longer has an overall magnetization. And so if you bring another magnet to it, they won't stick. But if you cool it down, at some point it reaches this Curie temperature. If you push it lower, then it will become magnetized. So the entire thing will all of a sudden get a direction. It'll have a north pole and a south pole. So the thing is though, like, which direction will that north pole or south pole be? And so it turns out that you only need an infinitesimally small perturbation to that system in order to point it in a certain direction. And so that's the kind of sensitivity you see, where the microscopic structure becomes very sensitive to tiny external perturbations. Michaël: And so if we bring this back to neural networks, if the weights are slightly different, the overall model could be deceptive or not. Is it something similar? Jesse: This is speculative. There are more concrete examples. So there are these toy models of superposition studied by Anthropic. And that's a case where you can see that it's learning some embedding and unembeddings. So it's trying to compress data. You can see that the way it compresses data involves this kind of symmetry breaking, this sensitivity, where it selects one solution at a phase transition. So that's a very concrete example of this." (context) Developmental Interpretability "Suppose it's possible to understand what's going on inside of neural networks, largely understand them. First assumption. Well then, it's still going to be very difficult to do that at one specific moment in time. I think intractable. The only way you're actually going to build up an exhaustive idea of what structure the model has internally, is to look at how it forms over the course of training. You want to look at each moment, where...

LW - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

The Nonlinear Library

Play Episode Listen Later Jul 6, 2023 5:50

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jesse Hoogland on Developmental Interpretability and Singular Learning Theory, published by Michaël Trazzi on July 6, 2023 on LessWrong. Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety who has recently been publishing on LessWrong about how to apply Singular Learning Theory to Alignment, and even organized some workshop in Berkeley last week around this. I thought it made sense to interview him to have some high-level overview of Singular Learning Theory (and other more general approaches like developmental interpretability). Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcast, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. Interpreting Neural Networks: The Phase Transition View Studying Phase Transitions Could Help Detect Deception "We want to be able to know when these dangerous capabilities are first acquired because it might be too late. They might become sort of stuck and crystallized and hard to get rid of. And so we want to understand how dangerous capabilities, how misaligned values develop over the course of training. Phase transitions seem particularly relevant for that because they represent kind of the most important structural changes, the qualitative changes in the shape of these models internals. Now, beyond that, another reason we're interested in phase transitions is that phase transitions in physics are understood to be a kind of point of contact between the microscopic world and the macroscopic world. So it's a point where you have more control over the behavior of a system than you normally do. That seems relevant to us from a safety engineering perspective. Why do you have more control in a physical system during phase transitions?" (context) A Concrete Example of Phase Transition In Physics and an analogous example inside of neural networks "Jesse: If you heat a magnet to a high enough temperature, then it's no longer a magnet. It no longer has an overall magnetization. And so if you bring another magnet to it, they won't stick. But if you cool it down, at some point it reaches this Curie temperature. If you push it lower, then it will become magnetized. So the entire thing will all of a sudden get a direction. It'll have a north pole and a south pole. So the thing is though, like, which direction will that north pole or south pole be? And so it turns out that you only need an infinitesimally small perturbation to that system in order to point it in a certain direction. And so that's the kind of sensitivity you see, where the microscopic structure becomes very sensitive to tiny external perturbations. Michaël: And so if we bring this back to neural networks, if the weights are slightly different, the overall model could be deceptive or not. Is it something similar? Jesse: This is speculative. There are more concrete examples. So there are these toy models of superposition studied by Anthropic. And that's a case where you can see that it's learning some embedding and unembeddings. So it's trying to compress data. You can see that the way it compresses data involves this kind of symmetry breaking, this sensitivity, where it selects one solution at a phase transition. So that's a very concrete example of this." (context) Developmental Interpretability "Suppose it's possible to understand what's going on inside of neural networks, largely understand them. First assumption. Well then, it's still going to be very difficult to do that at one specific moment in time. I think intractable. The only way you're actually going to build up an exhaustive idea of what structure the model has internally, is to look at how it forms over the course of training. You want to look at each moment, where you learn sp...

Jesse Hoogland on Developmental Interpretability and Singular Learning Theory

The Inside View

Play Episode Listen Later Jul 6, 2023 43:11

Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety. More recently, Jesse has been thinking about Singular Learning Theory and Developmental Interpretability, which we discuss in this episode. Before he came to grips with existential risk from AI, he co-founded a health-tech startup automating bariatric surgery patient journeys. (00:00) Intro (03:57) Jesse's Story And Probability Of Doom (06:21) How Jesse Got Into Singular Learning Theory (08:50) Intuition behind SLT: the loss landscape (12:23) Does SLT actually predict anything? Phase Transitions (14:37) Why care about phase transition, grokking, etc (15:56) Detecting dangerous capabilities like deception in the (devel)opment (17:24) A concrete example: magnets (20:06) Why Jesse Is Bullish On Interpretability (23:57) Developmental Interpretability (28:06) What Happens Next? Jesse's Vision (31:56) Toy Models of Superposition (32:47) Singular Learning Theory Part 2 (36:22) Are Current Models Creative? Reasoning? (38:19) Building Bridges Between Alignment And Other Disciplines (41:08) Where To Learn More About Singular Learning Theory Make sure I upload regularly: https://patreon.com/theinsideview Youtube: https://youtu.be/713KyknwShA Transcript: https://theinsideview.ai/jesse Jesse: https://twitter.com/jesse_hoogland Host: https://twitter.com/MichaelTrazzi Patreon supporters: - Vincent Weisser - Gunnar Höglund - Ryan Coppolo - Edward Huff - Emil Wallner - Jesse Hoogland - William Freire - Cameron Holmes - Jacques Thibodeau - Max Chiswick - Jack Seroy - JJ Hepburn

ai vision theory intuition cambridge reasoning developmental singular detecting what happens next slt superposition learning theories interpretability david krueger

LW - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

The Nonlinear Library: LessWrong

Play Episode Listen Later Jul 6, 2023 5:50

Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jesse Hoogland on Developmental Interpretability and Singular Learning Theory, published by Michaël Trazzi on July 6, 2023 on LessWrong. Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety who has recently been publishing on LessWrong about how to apply Singular Learning Theory to Alignment, and even organized some workshop in Berkeley last week around this. I thought it made sense to interview him to have some high-level overview of Singular Learning Theory (and other more general approaches like developmental interpretability). Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcast, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. Interpreting Neural Networks: The Phase Transition View Studying Phase Transitions Could Help Detect Deception "We want to be able to know when these dangerous capabilities are first acquired because it might be too late. They might become sort of stuck and crystallized and hard to get rid of. And so we want to understand how dangerous capabilities, how misaligned values develop over the course of training. Phase transitions seem particularly relevant for that because they represent kind of the most important structural changes, the qualitative changes in the shape of these models internals. Now, beyond that, another reason we're interested in phase transitions is that phase transitions in physics are understood to be a kind of point of contact between the microscopic world and the macroscopic world. So it's a point where you have more control over the behavior of a system than you normally do. That seems relevant to us from a safety engineering perspective. Why do you have more control in a physical system during phase transitions?" (context) A Concrete Example of Phase Transition In Physics and an analogous example inside of neural networks "Jesse: If you heat a magnet to a high enough temperature, then it's no longer a magnet. It no longer has an overall magnetization. And so if you bring another magnet to it, they won't stick. But if you cool it down, at some point it reaches this Curie temperature. If you push it lower, then it will become magnetized. So the entire thing will all of a sudden get a direction. It'll have a north pole and a south pole. So the thing is though, like, which direction will that north pole or south pole be? And so it turns out that you only need an infinitesimally small perturbation to that system in order to point it in a certain direction. And so that's the kind of sensitivity you see, where the microscopic structure becomes very sensitive to tiny external perturbations. Michaël: And so if we bring this back to neural networks, if the weights are slightly different, the overall model could be deceptive or not. Is it something similar? Jesse: This is speculative. There are more concrete examples. So there are these toy models of superposition studied by Anthropic. And that's a case where you can see that it's learning some embedding and unembeddings. So it's trying to compress data. You can see that the way it compresses data involves this kind of symmetry breaking, this sensitivity, where it selects one solution at a phase transition. So that's a very concrete example of this." (context) Developmental Interpretability "Suppose it's possible to understand what's going on inside of neural networks, largely understand them. First assumption. Well then, it's still going to be very difficult to do that at one specific moment in time. I think intractable. The only way you're actually going to build up an exhaustive idea of what structure the model has internally, is to look at how it forms over the course of training. You want to look at each moment, where you learn sp...

Alan Chan And Max Kauffman on Model Evaluations, Coordination and AI Safety

The Inside View

Play Episode Listen Later May 6, 2023 73:15

Max Kaufmann⁠ and Alan Chan discuss the evaluation of large language models, AI Governance and more generally the impact of the deployment of foundational models. is currently a Research Assistant to Owain Evans, mainly thinking about (and fixing) issues that might arise as we scale up our current ML systems, but also interested in issues arising from multi-agent failures and situational awareness. Alan is PhD student at Mila advised by Nicolas Le Roux, with a strong interest in AI Safety, AI Governance and coordination. He has also recently been working with David Krueger and helped me with some of the interviews that have been published recently (ML Street talk and Christoph Schuhmann). Disclaimer: this discussion is much more casual than the rest of the conversations in this podcast. This was completely impromptu: I just thought it would be interesting to have Max and Alan discuss model evaluations (also called “evals” for short), since they are both interested in the topic. Transcript: https://heinsideview.ai/alan_and_max Youtube: https://youtu.be/BOLxeR_culU Outline (0:00:00) Introduction(0:01:16) LLMs Translating To Systems In The Future Is Confusing(0:03:23) Evaluations Should Measure Actions Instead of Asking Yes or No Questions(0:04:17) Identify Key Contexts for Dangerous Behavior to Write Concrete Evals(0:07:29) Implicit Optimization Process Affects Evals and Benchmarks(0:08:45) Passing Evals Doesn't Guarantee Safety(0:09:41) Balancing Technical Evals With Social Governance(0:11:00) Evaluations Must Be Convincing To Influence AI Development(0:12:04) Evals Might Convince The AI Safety Community But Not People in FAccT(0:13:21) Difficulty In Explaining AI Risk To Other Communities(0:14:19) Both Existential Safety And Fairness Are Important(0:15:14) Reasons Why People Don't Care About AI Existential Risk(0:16:10) The Association Between Sillicon Valley And People in FAccT(0:17:39) Timelines And RL Understanding Might Impact The Perception Existential Risk From AI(0:19:01) Agentic Models And Longtermism Hinder AI Safety Awareness(0:20:17) The Focus On Immediate AI Harms Might Be A Rejection Of Speculative Claims(0:21:50) Is AI Safety A Pascal Mugging(0:23:15) Believing In The Deployment Of Large Foundational Models Should Be Enough To Start Worrying(0:25:38) AI Capabilities Becomign More Evident to the Public Might Not Be Enough(0:27:27) Addressing Generalization and Reward Specification in AI(0:27:59) Evals as an Additional Layer of Security in AI Safety(0:28:41) A Portfolio Approach to AI Alignment and Safety(0:29:03) Imagine Alignment Is Solved In 2040, What Made It Happen?(0:33:04) AGI Timelines Are Uncertain And Anchored By Vibes(0:35:24) What Matters Is Agency, Strategical Awareness And Planning(0:37:15) Alignment Is A Public Good, Coordination Is Difficult(0:06:48) Dignity As AN Useful Heuristic In The Face Of Doom(0:42:28) What Will Society Look Like If We Actually Get Superintelligent Gods(0:45:41) Uncertainty About Societal Dynamics Affecting Long-Term Future With AGI(0:47:42) Biggest Frustration With The AI Safety Community(0:48:34) AI Safety Includes Addressing Negative Consequences of AI(0:50:41) Frustration: Lack of Bridge Building Between AI Safety and Fairness Communities(0:53:07) Building Bridges by Attending Conferences and Understanding Different Perspectives(0:56:02) AI Systems with Weird Instrumental Goals Pose Risks to Society(0:58:43) Advanced AI Systems Controlling Resources Could Magnify Suffering(1:00:24) Cooperation Is Crucial to Achieve Pareto Optimal Outcomes and Avoid Global Catastrophes(1:01:54) Alan's Origin Story(1:02:47) Alan's AI Safety Research Is Driven By Desire To Reduce Suffering And Improve Lives(1:04:52) Diverse Interests And Concern For Global Problems Led To AI Safety Research(1:08:46) The Realization Of The Potential Dangers Of AGI Motivated AI Safety Work(1:10:39) What is Alan Chan Working On At The Moment

ai phd society safety model security origin stories ml building bridges coordination evaluations benchmarks kauffman research assistant ai systems ai alignment david krueger alan chan owain evans

David Krueger–Coordination, Alignment, Academia

The Inside View

Play Episode Listen Later Jan 7, 2023 165:19

David Krueger is an assistant professor at the University of Cambridge and got his PhD from Mila. His research group focuses on aligning deep learning systems, but he is also interested in governance and global coordination. He is famous in Cambridge for not having an AI alignment research agenda per se, and instead he tries to enable his seven PhD students to drive their own research. In this episode we discuss AI Takeoff scenarios, research going on at David's lab, Coordination, Governance, Causality, the public perception of AI Alignment research and how to change it. Youtube: https://youtu.be/bDMqo7BpNbk Transcript: https://theinsideview.ai/david OUTLINE (00:00) Highlights (01:06) Incentivized Behaviors and Takeoff Speeds (17:53) Building Models That Understand Causality (31:04) Agency, Acausal Trade And Causality in LLMs (40:44) Recursive Self Improvement, Bitter Lesson And Alignment (01:03:17) AI Governance And Coordination (01:13:26) David's AI Alignment Research Lab and the Existential Safety Community (01:24:13) On The Public Perception of AI Alignment (01:35:58) How To Get People In Academia To Work on Alignment (02:00:19) Decomposing Learning Curves, Latest Research From David Krueger's Lab (02:20:06) Safety-Performance Trade-Offs (02:30:20) Defining And Characterizing Reward Hacking (02:40:51) Playing Poker With Ethan Caballero, Timelines

university ai phd agency cambridge alignment academia governance lab timelines coordination causality ai alignment david krueger

LW - My Advice for Incoming SERI MATS Scholars by Johannes C. Mayer

The Nonlinear Library

Play Episode Listen Later Jan 4, 2023 6:30

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Advice for Incoming SERI MATS Scholars, published by Johannes C. Mayer on January 3, 2023 on LessWrong. I have participated in SERI MATS 2.0 in John's stream. Here is some advice based on my experience. Be Nice The AI alignment community is pretty small. If you are an ass, everybody will know that you are an ass. The same holds to a lesser extent for being nice. When I was visiting Edinburgh to attend a talk by David Krueger, there were several people there, that I had first met at Lightcone. When I was visiting Trajan House, the same thing happened. You never know when you might be talking to a grantmaker over dinner. Epistemic status: I did not actually behave like an ass. I expect this to be true, based on how many people I ran into that I've seen before, in different parts of the world. Use Lunch and Dinner at Lightcone During MATS 2.0 lunch and dinner were both served at Lightcone every day of the week. There were always many cool people around, and the conversations were unusually insightful. My favorite heuristic is to just join whatever conversation John is in. I am pretty sure that at least 15% of the value of SERI MATS came from eating lunch and dinner at Lightcone. Probably much more than that. Epistemic status: It feels like this was very useful, but it is hard to quantify. Take care of yourself At the beginning of SERI MATS, there were many social events (mostly just general Berkeley EA/Rationalist events). They were all happening pretty late. For some reason, I need to sleep 10:30 to 12:00 hours every day or I will be tired. My team was meeting at 10:00 every day. For the first 3 weeks, I was basically sleep-deprived almost every day. John's workshops are pretty great, and being sleep-deprived during them destroyed probably more than 20% of the value. That being said, at least one of the socials was high-value, and it was probably worth the cost. The worst thing was that I got used to being sleep-deprived. I sleep-deprived myself, even when there were no socials happening. I made similar mistakes with doing sports and eating healthily. Somehow it's hard to keep up all the good habits when you change your environment. Epistemic status: It's hard to evaluate the counterfactual where I was not sleep-deprived. I estimate I could have gotten 5-35% more value by not making the mistakes I listed. Learn to detach yourself from your ideas Check out this comment. Be Agentic If something doesn't fit right, try to fix it. Do you have a crazy idea about how to improve the office? Ask, or implement it yourself (after getting permission)! (The Lightcone ops team is very competent and cool. John had a loft bed in his office when I was there. I am not sure about the situation in the new SERI MATS offices.) Choose how you spend your time. If you are in a conversation, notice when you would rather do something else. I recommend that you act on this feeling. Get back to work, join that other discussion that seems more interesting, or do whatever else seems higher value. I think being able to do this is great. Building up this skill is probably easier when talking to rationalists. They won't punish you for this kind of initiative. In general, being agentic seems highly related to making sure that you thought all your high-value thoughts. I recommend sitting down for at least 5 minutes by the clock every day, and trying to come up with high-value directions to think in. The second step is then to always do what you think is best. Which is not easy. Think about AI alignment from scratch Reading somebodies work is different from discovering the underlying insights for yourself. Many details will be omitted in a write-up. Especially details on the research process. When I thought about AI alignment from scratch, I was thinking thoughts I had not thought of before. It seems l...

ai advice building speech dinner edinburgh scotland ea incoming mayer scholars mats seri epistemic rationalist my advice lesswrong david krueger johannes c

LW - My Advice for Incoming SERI MATS Scholars by Johannes C. Mayer

The Nonlinear Library: LessWrong

Play Episode Listen Later Jan 4, 2023 6:30

Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Advice for Incoming SERI MATS Scholars, published by Johannes C. Mayer on January 3, 2023 on LessWrong. I have participated in SERI MATS 2.0 in John's stream. Here is some advice based on my experience. Be Nice The AI alignment community is pretty small. If you are an ass, everybody will know that you are an ass. The same holds to a lesser extent for being nice. When I was visiting Edinburgh to attend a talk by David Krueger, there were several people there, that I had first met at Lightcone. When I was visiting Trajan House, the same thing happened. You never know when you might be talking to a grantmaker over dinner. Epistemic status: I did not actually behave like an ass. I expect this to be true, based on how many people I ran into that I've seen before, in different parts of the world. Use Lunch and Dinner at Lightcone During MATS 2.0 lunch and dinner were both served at Lightcone every day of the week. There were always many cool people around, and the conversations were unusually insightful. My favorite heuristic is to just join whatever conversation John is in. I am pretty sure that at least 15% of the value of SERI MATS came from eating lunch and dinner at Lightcone. Probably much more than that. Epistemic status: It feels like this was very useful, but it is hard to quantify. Take care of yourself At the beginning of SERI MATS, there were many social events (mostly just general Berkeley EA/Rationalist events). They were all happening pretty late. For some reason, I need to sleep 10:30 to 12:00 hours every day or I will be tired. My team was meeting at 10:00 every day. For the first 3 weeks, I was basically sleep-deprived almost every day. John's workshops are pretty great, and being sleep-deprived during them destroyed probably more than 20% of the value. That being said, at least one of the socials was high-value, and it was probably worth the cost. The worst thing was that I got used to being sleep-deprived. I sleep-deprived myself, even when there were no socials happening. I made similar mistakes with doing sports and eating healthily. Somehow it's hard to keep up all the good habits when you change your environment. Epistemic status: It's hard to evaluate the counterfactual where I was not sleep-deprived. I estimate I could have gotten 5-35% more value by not making the mistakes I listed. Learn to detach yourself from your ideas Check out this comment. Be Agentic If something doesn't fit right, try to fix it. Do you have a crazy idea about how to improve the office? Ask, or implement it yourself (after getting permission)! (The Lightcone ops team is very competent and cool. John had a loft bed in his office when I was there. I am not sure about the situation in the new SERI MATS offices.) Choose how you spend your time. If you are in a conversation, notice when you would rather do something else. I recommend that you act on this feeling. Get back to work, join that other discussion that seems more interesting, or do whatever else seems higher value. I think being able to do this is great. Building up this skill is probably easier when talking to rationalists. They won't punish you for this kind of initiative. In general, being agentic seems highly related to making sure that you thought all your high-value thoughts. I recommend sitting down for at least 5 minutes by the clock every day, and trying to come up with high-value directions to think in. The second step is then to always do what you think is best. Which is not easy. Think about AI alignment from scratch Reading somebodies work is different from discovering the underlying insights for yourself. Many details will be omitted in a write-up. Especially details on the research process. When I thought about AI alignment from scratch, I was thinking thoughts I had not thought of before. It seems l...

ai advice building speech dinner edinburgh scotland ea incoming mayer scholars mats seri epistemic rationalist my advice lesswrong david krueger johannes c

LW - Probably good projects for the AI safety ecosystem by Ryan Kidd

The Nonlinear Library

Play Episode Listen Later Dec 5, 2022 4:19

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Probably good projects for the AI safety ecosystem, published by Ryan Kidd on December 5, 2022 on LessWrong. At EAGxBerkeley 2022, I was asked several times what new projects might benefit the AI safety and longtermist research ecosystem. I think that several existing useful-according-to-me projects (e.g., SERI MATS, REMIX, CAIS, etc.) could urgently absorb strong management and operations talent, but I think the following projects would also probably be useful to the AI safety/longtermist project. Criticisms are welcome. Projects I might be excited to see, in no particular order: A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes. Rolling admissions alternatives to MATS' cohort-based structure for mentors and scholars with different needs (e.g., to support alignment researchers who suddenly want to train/use research talent at irregular intervals but don't have the operational support to do this optimally). A combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research. A dedicated bi-yearly workshop for AI safety university group leaders that teaches them how to recognize talent, foster useful undergraduate research projects, and build a good talent development pipeline or “user journey” (including a model of alignment macrostrategy and where university groups fit in). An organization that does for the Open Philanthropy worldview investigations team what GCP did to supplement CEA's workshops and 80,000 Hours' career advising calls. Further programs like ARENA that aim to develop ML safety engineering talent at scale by leveraging good ML tutors and proven curricula like CAIS' Intro to ML Safety, Redwood Research's MLAB, and Jacob Hilton's DL curriculum for large language module alignment. More contests like ELK with well-operationalized research problems (i.e., clearly explain what builder/breaker steps look like), clear metrics of success, and have a well-considered target audience (who is being incentivized to apply and why?) and user journey (where do prize winners go next?). Possible contest seeds: Evan Hubinger's SERI MATS deceptive AI challenge problem; Vivek Hebbar's and Nate Soares' SERI MATS diamond maximizer selection problem; Alex Turner's and Quintin Pope's SERI MATS training stories selection problem. More "plug-and-play" curriculums for AI safety university groups, like AGI Safety Fundamentals, Alignment 201, Intro to ML Safety. A well-considered "precipism" university course template that critically analyzes Toby Ord's “The Precipice,” Holden Karnofsky's “The Most Important Century,” Will MacAskill's “What We Owe The Future,” some Open Philanthropy worldview investigations reports, some Global Priorities Institute ethics papers, etc. Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews (e.g., what Jacob Steinhardt's “ML Systems Will Have Weird Failure Modes” does for Hubinger et al.'s “Risks From Learned Optimization”). A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman's planned AI safety research group. More organizations like CAIS that aim to recruit established ML talent into alignment research with clear benchmarks, targeted hackathons/contests with prizes, and offers ...

LW - Probably good projects for the AI safety ecosystem by Ryan Kidd

The Nonlinear Library: LessWrong

Play Episode Listen Later Dec 5, 2022 4:19

Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Probably good projects for the AI safety ecosystem, published by Ryan Kidd on December 5, 2022 on LessWrong. At EAGxBerkeley 2022, I was asked several times what new projects might benefit the AI safety and longtermist research ecosystem. I think that several existing useful-according-to-me projects (e.g., SERI MATS, REMIX, CAIS, etc.) could urgently absorb strong management and operations talent, but I think the following projects would also probably be useful to the AI safety/longtermist project. Criticisms are welcome. Projects I might be excited to see, in no particular order: A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes. Rolling admissions alternatives to MATS' cohort-based structure for mentors and scholars with different needs (e.g., to support alignment researchers who suddenly want to train/use research talent at irregular intervals but don't have the operational support to do this optimally). A combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research. A dedicated bi-yearly workshop for AI safety university group leaders that teaches them how to recognize talent, foster useful undergraduate research projects, and build a good talent development pipeline or “user journey” (including a model of alignment macrostrategy and where university groups fit in). An organization that does for the Open Philanthropy worldview investigations team what GCP did to supplement CEA's workshops and 80,000 Hours' career advising calls. Further programs like ARENA that aim to develop ML safety engineering talent at scale by leveraging good ML tutors and proven curricula like CAIS' Intro to ML Safety, Redwood Research's MLAB, and Jacob Hilton's DL curriculum for large language module alignment. More contests like ELK with well-operationalized research problems (i.e., clearly explain what builder/breaker steps look like), clear metrics of success, and have a well-considered target audience (who is being incentivized to apply and why?) and user journey (where do prize winners go next?). Possible contest seeds: Evan Hubinger's SERI MATS deceptive AI challenge problem; Vivek Hebbar's and Nate Soares' SERI MATS diamond maximizer selection problem; Alex Turner's and Quintin Pope's SERI MATS training stories selection problem. More "plug-and-play" curriculums for AI safety university groups, like AGI Safety Fundamentals, Alignment 201, Intro to ML Safety. A well-considered "precipism" university course template that critically analyzes Toby Ord's “The Precipice,” Holden Karnofsky's “The Most Important Century,” Will MacAskill's “What We Owe The Future,” some Open Philanthropy worldview investigations reports, some Global Priorities Institute ethics papers, etc. Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews (e.g., what Jacob Steinhardt's “ML Systems Will Have Weird Failure Modes” does for Hubinger et al.'s “Risks From Learned Optimization”). A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman's planned AI safety research group. More organizations like CAIS that aim to recruit established ML talent into alignment research with clear benchmarks, targeted hackathons/contests with prizes, and offers ...

Podcast appearances and mentions of david krueger

Best podcasts about david krueger

The Nonlinear Library

Gibbsville Reformed Church | GRC Audio

Jacob's Well Church Sermon Audio, Green Bay WI

The Inside View

Future of Life Institute Podcast

The Money Answers Show

Vital Connections On Air

The Nonlinear Library: LessWrong

Latest news about david krueger

Latest podcast episodes about david krueger

BOTS ARE HIRING HUMANS

Does innovation have to come at the expense of ethics?

Episode 296: Our Interview With David Krueger

AF - [Paper] Stress-testing capability elicitation by training password-locked models by Fabien Roger

AF - Paper in Science: Managing extreme AI risks amid rapid progress by JanB

AF - New report: Safety Cases for AI by Josh Clymer

LW - New report: Safety Cases for AI by joshc

LW - New report: Safety Cases for AI by joshc

Giving November part 1

LW - AI Regulation May Be More Important Than AI Alignment For Existential Safety by otto.barten

LW - AI Regulation May Be More Important Than AI Alignment For Existential Safety by otto.barten

AF - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

LW - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

Jesse Hoogland on Developmental Interpretability and Singular Learning Theory

LW - Jesse Hoogland on Developmental Interpretability and Singular Learning Theory by Michaël Trazzi

Alan Chan And Max Kauffman on Model Evaluations, Coordination and AI Safety

David Krueger–Coordination, Alignment, Academia

LW - My Advice for Incoming SERI MATS Scholars by Johannes C. Mayer

LW - My Advice for Incoming SERI MATS Scholars by Johannes C. Mayer

LW - Probably good projects for the AI safety ecosystem by Ryan Kidd

LW - Probably good projects for the AI safety ecosystem by Ryan Kidd

Sunday, December 4, 2022

Sunday, November 27, 2022

Sunday, November 6, 2022

Sunday, October 30, 2022

Sunday, October 16, 2022

Sunday, October 9, 2022

Episode 13 - The Two Davids

Sunday, Sept 25, 2022

Sunday, Sept 18, 2022

Sunday, Sept 11, 2022

LW - AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022 by Sam Bowman

LW - AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022 by Sam Bowman

Sunday, August 21, 2022

Sunday, August 14, 2022

AF - Encultured AI, Part 1 Appendix: Relevant Research Examples by Andrew Critch

Sunday, August 7, 2022

Sunday, May 15, 2022

Sunday, May 1, 2022

Sunday, April 24, 2022

Easter Sunday, April 17, 2022

AF - What's a good probability distribution family (e.g. "log-normal") to use for AGI timelines? by David Krueger

Sunday, April 10, 2022

Sunday, April 3, 2022

Sunday, Mar. 27, 2022

Sunday, Mar. 20, 2022

Dr. David Krueger, Montana State University Northern

#62 How to Empower Yourself Through Self Language

Rewire Your Brain For Wealth!

Episode 011 - Entrepreneur, Business and Life Coach Aaron Walker