The Nonlinear Library

Follow The Nonlinear Library
Share on
Copy link to clipboard

The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and o

The Nonlinear Fund


    • Sep 26, 2024 LATEST EPISODE
    • daily NEW EPISODES
    • 12m AVG DURATION
    • 8,792 EPISODES


    Search for episodes from The Nonlinear Library with a specific topic:

    Latest episodes from The Nonlinear Library

    No new episodes will be published here. To keep listening to the EAF & LW, listen to this episode for instructions.

    Play Episode Listen Later Sep 26, 2024 0:33


    Counterfactuals strike again! The fora have their own official audio channels now, so The Nonlinear Library will no longer publish new episodes since it won't have any counterfactual impact. It's been a good run. We published thousands of episodes and generated a ton of passive impact. But we're not here for the views. We're here for the counterfactual impact. INSTRUCTIONS TO KEEP LISTENING TO THE FORA 1. Search "EA Forum" or "LessWrong" on your podcast player 2. Subscribe to the official channels 3. Go forth. Seek impact. Seek truth.

    LW - Augmenting Statistical Models with Natural Language Parameters by jsteinhardt

    Play Episode Listen Later Sep 22, 2024 16:41


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Augmenting Statistical Models with Natural Language Parameters, published by jsteinhardt on September 22, 2024 on LessWrong. This is a guest post by my student Ruiqi Zhong, who has some very exciting work defining new families of statistical models that can take natural language explanations as parameters. The motivation is that existing statistical models are bad at explaining structured data. To address this problem, we agument these models with natural language parameters, which can represent interpretable abstract features and be learned automatically. Imagine the following scenario: It is the year 3024. We are historians trying to understand what happened between 2016 and 2024, by looking at how Twitter topics changed across that time period. We are given a dataset of user-posted images sorted by time, $x_1$, $x_2$ ... $x_T$, and our goal is to find trends in this dataset to help interpret what happened. If we successfully achieve our goal, we would discover, for instance, (1) a recurring spike of images depicting athletes every four years for the Olympics, and (2) a large increase in images containing medical concepts during and after the COVID-19 pandemic. How do we usually discover temporal trends from a dataset? One common approach is to fit a time series model to predict how the features evolve and then interpret the learned model. However, it is unclear what features to use: pixels and neural image embeddings are high-dimensional and uninterpretable, undermining the goal of extracting explainable trends. We address this problem by augmenting statistical models with interpretable natural language parameters. The figure below depicts a graphical model representation for the case of time series data. We explain the trends in the observed data [ $x_1$ ... $x_T$] by learning two sets of latent parameters: natural language parameters $phi$ (the learned features) and real-valued parameters $w$ (the time-varying trends). $phi$: the natural language descriptions of $K$ different topics, e.g. "depicts athletes competing". $phi$ is an element of $Sigma$, the universe of all natural language predicates. $w_t$: the frequency of each of the K topics at the time $t$. If our model successfully recovers the underlying trends, then we can visualize $w$ and $phi$ below and see that: 1) more pictures contain medical concepts (red) starting from 2020, and 2) there are recurring (blue) spikes of athletes competing. In the rest of this post, we will explain in detail how to specify and learn models with natural language parameters and showcase the model on several real-world applications. We will cover: A warm-up example of a statistical model with natural language explanations A modeling language for specifying natural language parameters Applications of our framework, which can be used to specify models for time series, clustering, and applications. We will go over: A machine learning application that uses our time series model to monitor trends in LLM usage A business application that uses our clustering model to taxonomize product reviews A cognitive science application that uses our classification model to explain what images are more memorable for humans Thanks to Louise Verkin for helping to typeset the post in Ghost format. Warm-up Example: Logistic Regression with Natural Language Parameters Instead of understanding topic shifts across the entire time window of 2016-2024, let's first study a much simpler question: what images are more likely to appear after 2020? The usual way to approach this problem is to, 1. brainstorm some features, 2. extract the real-valued features from each image, and 3. run a logistic regression model on these features to predict the target $Y$ =1 if the image appears after 2020, $Y$ =0 otherwise. More concretely: Step 1: Propose different...

    LW - Glitch Token Catalog - (Almost) a Full Clear by Lao Mein

    Play Episode Listen Later Sep 22, 2024 170:10


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Glitch Token Catalog - (Almost) a Full Clear, published by Lao Mein on September 22, 2024 on LessWrong. This is a collection of every unidentified GPT2 glitch token listed in the third glitch token archaeology post. I was able to find the source of every single one, except for "?????-" and "?????-?????-"[1]. Please tell me if I missed one, or you've discovered one and don't understand where it came from. This isn't meant to be a well-written analysis, just a quick repository of my glitch-hunting observations. I plan on writing up and categorizing all of these in greater detail in future posts. The first of which is here. I used OpenWebText, a recreation of GPT2's training data, for all experiments in this post. I tokenized every .gz file in the archive and made a boolean Numpy array of each tokens that was present at least once. This allowed me to quickly identify infrequent tokens in the dataset and pull up the textual context with regular expressions. If there was an issue with overlap, I used a tokenizer-based extraction instead. All data/code available upon request. The leftmost column is token id, the middle is the token string, and the right column is # of files the token was present in (out of 20610). GPT2 has 50256 total tokens. GPT2 tokens with the lowest frequency in OpenWebText 30898 'embedreportprint' 0 33434 ' 士' 0 43453 ' SolidGoldMagikarp' 0 1849 'xa0' 0 47654 ' xa0xa0' 0 50009 ' strutConnector' 0 36173 ' RandomRedditor' 0 214 'x1a' 0 42424 'DragonMagazine' 0 180 ' ' 0 187 ' ' 0 186 ' ' 0 30213 ' externalToEVAOnly' 0 30212 ' externalToEVA' 0 30211 ' guiIcon' 0 185 ' ' 0 30210 ' guiActiveUnfocused' 0 30209 ' unfocusedRange' 0 184 ' ' 0 30202 ' guiName' 0 183 ' ' 0 30905 'rawdownload' 0 39906 'EStream' 0 33454 '龍喚士' 0 42586 ' srfN' 0 25992 ' 裏覚醒' 0 43065 ' srfAttach' 0 11504 ' xa0 xa0' 0 39172 'xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0xa0' 0 40240 'oreAndOnline' 0 40241 'InstoreAndOnline' 0 33477 'xa0xa0xa0' 0 36174 ' RandomRedditorWithNo' 0 37574 'StreamerBot' 0 46600 ' Adinida' 0 182 ' ' 0 29372 ' guiActiveUn' 0 43177 'EStreamFrame' 0 22686 ' xa0 xa0 xa0 xa0' 0 23282 ' davidjl' 0 47571 ' DevOnline' 0 39752 'quickShip' 0 44320 'nxa0' 0 8828 'xa0xa0xa0xa0' 0 39820 '龍 ' 0 39821 '龍契士' 0 28666 'PsyNetMessage' 0 35207 ' attRot' 0 181 ' ' 0 18472 ' guiActive' 0 179 ' ' 0 17811 'xa0xa0xa0xa0xa0xa0xa0xa0' 0 20174 ' 裏 ' 0 212 'x18' 0 211 'x17' 0 210 'x16' 0 209 'x15' 0 208 'x14' 0 31666 '?????-?????-' 0 207 'x13' 0 206 'x12' 0 213 'x19' 0 205 'x11' 0 203 'x0f' 0 202 'x0e' 0 31957 'cffffcc' 0 200 'x0c' 0 199 'x0b' 0 197 't' 0 196 'x08' 0 195 'x07' 0 194 'x06' 0 193 'x05' 0 204 'x10' 0 45545 ' サーティワン' 0 201 'r' 0 216 'x1c' 0 37842 ' partName' 0 45706 ' xa0 xa0 xa0 xa0 xa0 xa0 xa0 xa0' 0 124 ' ' 0 125 ' ' 0 178 ' ' 0 41380 'natureconservancy' 0 41383 'assetsadobe' 0 177 ' ' 0 215 'x1b' 0 41551 'Downloadha' 0 4603 'xa0xa0' 0 42202 'GoldMagikarp' 0 42089 ' TheNitrome' 0 217 'x1d' 0 218 'x1e' 0 42090 ' TheNitromeFan' 0 192 'x04' 0 191 'x03' 0 219 'x1f' 0 189 'x01' 0 45544 ' サーティ' 0 5624 ' xa0' 0 190 'x02' 0 40242 'BuyableInstoreAndOnline' 1 36935 ' dstg' 1 36940 ' istg' 1 45003 ' SetTextColor' 1 30897 'reportprint' 1 39757 'channelAvailability' 1 39756 'inventoryQuantity' 1 39755 'isSpecialOrderable' 1 39811 'soDeliveryDate' 1 39753 'quickShipAvailable' 1 39714 'isSpecial' 1 47198 'ItemTracker' 1 17900 ' Dragonbound' 1 45392 'dayName' 1 37579 'TPPStreamerBot' 1 31573 'ActionCode' 2 25193 'NetMessage' 2 39749 'DeliveryDate' 2 30208 ' externalTo' 2 43569 'ÍÍ' 2 34027 ' actionGroup' 2 34504 ' 裏 ' 2 39446 ' SetFontSize' 2 30899 'cloneembedreportprint' 2 32047 ' "$:/' 3 39803 'soType' 3 39177 'ItemThumbnailImage' 3 49781 'EngineDebug' 3 25658 '?????-' 3 33813 '=~=~' 3 48396 'ÛÛ' 3 34206 ...

    LW - Investigating an insurance-for-AI startup by L Rudolf L

    Play Episode Listen Later Sep 21, 2024 26:00


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Investigating an insurance-for-AI startup, published by L Rudolf L on September 21, 2024 on LessWrong. We (Flo & Rudolf) spent a month fleshing out the idea of an insurance-for-AI company. We talked to 15 people in the insurance industry, and did 20 customer interviews. We decided not to continue, but we think it's still a very promising idea and that maybe someone else should do this. This post describes our findings. The idea Theory of change To reduce AI risks, it would be good if we understood risks well, and if some organisation existed that could incentivise the use of safer AI practices. An insurance company that sells insurance policies for AI use cases has a financial incentive to understand concrete AI risks & harms well, because this feeds into its pricing. This company would also be incentivised to encourage companies to adopt safer AI practices, and could incentivise this by offering lower premiums in return. Like many cyber-insurance companies, it could also provide more general advice & consulting on AI-related risk reduction. Concrete path TL;DR: Currently, professionals (e.g. lawyers) have professional indemnity (PI) insurance. Right now, most AI tools involve the human being in the loop. But eventually, the AI will do the work end-to-end, and then the AI will be the one whose mistakes need to be insured. Currently, this insurance does not exist. We would start with law, but then expand to all other forms of professional indemnity insurance (i.e. insurance against harms caused by a professional's mistakes or malpractice in their work). Frontier labs are not good customers for insurance, since their size means they mostly do not need external insurance, and have a big information advantage in understanding the risk. Instead, we would target companies using LLMs (e.g. large companies that use specific potentially-risky AI workflows internally), or companies building LLM products for a specific industry. We focused on the latter, since startups are easier to sell to. Specifically, we wanted a case where: LLMs were being used in a high-stakes industry like medicine or law there were startups building LLM products in this industry there is some reason why the AI might cause legal liability, for example: the LLM tools are sufficiently automating the work that the liability is plausibly on them rather than the humans AI exceptions in existing insurance policies exist (or will soon exist) The best example we found was legal LLM tools. Law involves important decisions and large amounts of money, and lawyers can be found liable in legal malpractice lawsuits. LLMs are close to being able to do much legal work end-to-end; in particular, if the work is not checked by a human before being shipped, it is uncertain if existing professional indemnity (PI) insurance applies. People who work in law and law tech are also, naturally, very liability-aware. Therefore, our plan was: Become a managing general agent (MGA), a type of insurance company that does not pay claims out of its own capital (but instead finds a reinsurer to agree to pay them, and earns a cut of the premiums). Design PI policies for AI legal work, and sell these policies to legal AI startups (to help them sell to their law firm customers), or directly to law firms buying end-to-end legal AI tools. As more and more legal work is done end-to-end by AI, more and more of the legal PI insurance market is AI insurance policies. As AI advances and AI insurance issues become relevant in other industries, expand to those industries (e.g. medicine, finance, etc.). Eventually, most of the world's professional indemnity insurance market (on the order of $10B-100B/year) has switched from insuring against human mistakes to insuring against AI mistakes. Along the way, provide consulting services for countless business...

    LW - Applications of Chaos: Saying No (with Hastings Greer) by Elizabeth

    Play Episode Listen Later Sep 21, 2024 3:39


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applications of Chaos: Saying No (with Hastings Greer), published by Elizabeth on September 21, 2024 on LessWrong. Previously Alex Altair and I published a post on the applications of chaos theory, which found a few successes but mostly overhyped dead ends. Luckily the comments came through, providing me with an entirely different type of application: knowing you can't, and explaining to your boss that you can't. Knowing you can't Calling a system chaotic rules out many solutions and tools, which can save you time and money in dead ends not traveled. I knew this, but also knew that you could never be 100% certain a physical system was chaotic, as opposed to misunderstood. However, you can know the equations behind proposed solutions, and trust that reality is unlikely to be simpler[1] than the idealized math. This means that if the equations necessary for your proposed solution could be used to solve the 3-body problem, you don't have a solution. [[1] I'm hedging a little because sometimes reality's complications make the math harder but the ultimate solution easier. E.g. friction makes movement harder to predict but gives you terminal velocity.] I had a great conversation with trebuchet and math enthusiast Hastings Greer about how this dynamic plays out with trebuchets. Transcript Note that this was recorded in Skype with standard headphones, so the recording leaves something to be desired. I think it's worth it for the trebuchet software visuals starting at 07:00 My favorite parts: If a trebuchet requires you to solve the double pendulum problem (a classic example of a chaotic system) in order to aim, it is not a competition-winning trebuchet. Trebuchet design was solved 15-20 years ago; it's all implementation details now. This did not require modern levels of tech, just modern nerds with free time. The winning design was used by the Syrians during Arab Spring, which everyone involved feels ambivalent about. The national pumpkin throwing competition has been snuffed out by insurance issues, but local competitions remain. Learning about trebuchet modeling software. Explaining you can't One reason to doubt chaos theory's usefulness is that we don't need fancy theories to tell us something is impossible. Impossibility tends to make itself obvious. But some people refuse to accept an impossibility, and some of those people are managers. Might those people accept "it's impossible because of chaos theory" where they wouldn't accept "it's impossible because look at it"? As a test of this hypothesis, I made a Twitter poll asking engineers-as-in-builds-things if they had tried to explain a project's impossibility to chaos, and if it had worked. The final results were: 36 respondents who were engineers of the relevant type This is probably an overestimate. One respondee replied later that he selected this option incorrectly, and I suspect that was a common mistake. I haven't attempted to correct for it as the exact percentage is not a crux for me. 6 engineers who'd used chaos theory to explain to their boss why something was impossible. 5 engineers who'd tried this explanation and succeeded. 1 engineer who tried this explanation and failed. 5/36 is by no means common, but it's not zero either, and it seems like it usually works. My guess is that usage is concentrated in a few subfields, making chaos even more useful than it looks. My sample size isn't high enough to trust the specific percentages, but as an existence proof I'm quite satisfied. Conclusion Chaos provides value both by telling certain engineers where not to look for solutions to their problems, and by getting their bosses off their back about it. That's a significant value add, but short of what I was hoping for when I started looking into Chaos. Thanks for listening. To help us out with The Nonlinear Library ...

    LW - Work with me on agent foundations: independent fellowship by Alex Altair

    Play Episode Listen Later Sep 21, 2024 6:20


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Work with me on agent foundations: independent fellowship, published by Alex Altair on September 21, 2024 on LessWrong. Summary: I am an independent researcher in agent foundations, and I've recently received an LTFF grant to fund someone to do research with me. This is a rolling application; I'll close it whenever I'm no longer interested in taking another person. If you're not familiar with agent foundations, you can read about my views in this post. What the role might be like This role is extremely flexible. Depending on who you are, it could end up resembling an internship, a research assistant position, a postdoc or even as a mentor/advisor to me. Below, I've listed out the parameters of the fellowship that I am using as a baseline of what it could be. All of these parameters are negotiable! $25 per hour. This is not a lot for people who live in the SF Bay area, or who are used to industry salaries, but it looks to me like this is comparable to a typical grad student salary. 20 hours per week. I'd like this fellowship to be one of your main projects, and I think it can take quite a lot of "deep work" focus before one can make progress on the research problems.[1] 3 months, with a decent chance of extension. During my AI safety camp project, it took about 6 weeks to get people up to speed on all the parts of the agent structure problem. Ideally I could find someone for this role who is already closer to caught up (though I don't necessarily anticipate that). I'm thinking of this fellowship as something like an extended work-trial for potentially working together longer-term. That said, I think we should at least aim to get results by the end of it. Whether I'll decide to invite you to continue working with me afterwards depends on how our collaboration went (both technically and socially), how many other people I'm collaborating with at that time, and whether I think I have enough funds to support it. Remote, but I'm happy to meet in person. Since I'm independent, I don't have anything like an office for you to make use of. But if you happen to be in the SF Bay area, I'd be more than happy to have our meetings in person. I wake up early, so US eastern and European time zones work well for me (and other time zones too). Meeting 2-5 times per week. Especially in the beginning, I'd like to do a pretty large amount of syncing up. It can take a long time to convey all the aspects of the research problems. I also find that real-time meetings regularly generate new ideas. That said, some people find meetings worse for their productivity, and so I'll be responsive to your particular work style. An end-of-term write-up. It seems to take longer than three months to get results in the types of questions I'm interested in, but I think it's good practice to commit to producing a write-up of how the fellowship goes. If it goes especially well, we could produce a paper. What this role ends up looking like mostly depends on your experience level relative to mine. Though I now do research, I haven't gone through the typical academic path. I'm in my mid-thirties and have a proportional amount of life and career experience, but in terms of mathematics, I consider myself the equivalent of a second year grad student. So I'm comfortable leading this project and am confident in my research taste, but you might know more math than me. The research problems Like all researchers in agent foundations, I find it quite difficult to concisely communicate what my research is about. Probably the best way to tell if you will be interested in my research problems is to read other things I've written, and then have a conversation with me about it. All my research is purely mathematical,[2] rather than experimental or empirical. None of it involves machine learning per se, but the theorems should ...

    LW - o1-preview is pretty good at doing ML on an unknown dataset by Håvard Tveit Ihle

    Play Episode Listen Later Sep 20, 2024 3:14


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: o1-preview is pretty good at doing ML on an unknown dataset, published by Håvard Tveit Ihle on September 20, 2024 on LessWrong. Previous post: How good are LLMs at doing ML on an unknown dataset? A while back I ran some evaluation tests on GPT4o, Claude Sonnet 3.5 and Gemini advanced to see how good they were at doing machine learning on a completely novel, and somewhat unusual dataset. The data was basically 512 points in the 2D plane, and some of the points make up a shape, and the goal is to classify the data according to what shape the points make up. None of the models did better than chance on the original (hard) dataset, while they did somewhat better on a much easier version I made afterwards. With the release of o1-preview, I wanted to quickly run the same test on o1, just to see how well it did. In summary, it basically solved the hard version of my previous challenge, achieving 77% accuracy on the test set on its fourth submission (this increases to 91% if I run it for 250 instead of 50 epochs), which is really impressive to me. Here is the full conversation with ChatGPT o1-preview In general o1-preview seems like a big step change in its ability to reliably do hard tasks like this without any advanced scaffolding or prompting to make it work. Detailed discussion of results The architecture that o1 went for in the first round is essentially the same that Sonnet 3.5 and gemini went for, a pointnet inspired model which extracts features from each point independently. While it managed to do slightly better than chance on the training set, it did not do well on the test set. For round two, it went for the approach (which also Sonnet 3.5 came up with) of binning the points in 2D into an image, and then using a regular 2D convnet to classify the shapes. This worked somewhat on the first try. It completely overfit the training data, but got to an accuracy of 56% on the test data. For round three, it understood that it needed to add data augmentations in order to generalize better, and it implemented scaling, translations and rotations of the data. It also switched to a slightly modified resnet18 architecture (a roughly 10x larger model). However, it made a bug when converting to PIL image (and back to torch.tensor), which resulted in an error. For round four, o1 fixed the error and has a basically working solution, achieving an accuracy of 77% (which increases to 91% if we increase the number of epochs from 50 to 250, all still well within the alloted hour of runtime). I consider the problem basically solved at this point, by playing around with smaller variations on this, you can probably get a few more percentage points without any more insights needed. For the last round, it tried the standard approach of using the pretrained weights of resnet18 and freezing almost all the layers, which is an approach that works well on many problems, but did not work well in this case. The accuracy reduced to 41%. I guess these data are just too different from imagenet (which resnet18 is trained on) for this approach to work well. I would not have expected this to work, but I don't hold it that much against o1, as it is a reasonable thing to try. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

    EA - The Best Argument is not a Simple English Yud Essay by Jonathan Bostock

    Play Episode Listen Later Sep 20, 2024 6:35


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Best Argument is not a Simple English Yud Essay, published by Jonathan Bostock on September 20, 2024 on The Effective Altruism Forum. I was encouraged to post this here, but I don't yet have enough EA forum karma to crosspost directly! Epistemic status: these are my own opinions on AI risk communication, based primarily on my own instincts on the subject and discussions with people less involved with rationality than myself. Communication is highly subjective and I have not rigorously A/B tested messaging. I am even less confident in the quality of my responses than in the correctness of my critique. If they turn out to be true, these thoughts can probably be applied to all sorts of communication beyond AI risk. Lots of work has gone into trying to explain AI risk to laypersons. Overall, I think it's been great, but there's a particular trap that I've seen people fall into a few times. I'd summarize it as simplifying and shortening the text of an argument without enough thought for the information content. It comes in three forms. One is forgetting to adapt concepts for someone with a far inferential distance; another is forgetting to filter for the important information; the third is rewording an argument so much you fail to sound like a human being at all. I'm going to critique three examples which I think typify these: Failure to Adapt Concepts I got this from the summaries of AI risk arguments written by Katja Grace and Nathan Young here. I'm making the assumption that these summaries are supposed to be accessible to laypersons, since most of them seem written that way. This one stands out as not having been optimized on the concept level. This argument was below-aveage effectiveness when tested. I expect most people's reaction to point 2 would be "I understand all those words individually, but not together". It's a huge dump of conceptual information all at once which successfully points to the concept in the mind of someone who already understands it, but is unlikely to introduce that concept to someone's mind. Here's an attempt to do better: 1. So far, humans have mostly developed technology by understanding the systems which the technology depends on. 2. AI systems developed today are instead created by machine learning. This means that the computer learns to produce certain desired outputs, but humans do not tell the system how it should produce the outputs. We often have no idea how or why an AI behaves in the way that it does. 3. Since we don't understand how or why an AI works a certain way, it could easily behave in unpredictable and unwanted ways. 4. If the AI is powerful, then the consequences of unwanted behaviour could be catastrophic. And here's Claude's just for fun: 1. Up until now, humans have created new technologies by understanding how they work. 2. The AI systems made in 2024 are different. Instead of being carefully built piece by piece, they're created by repeatedly tweaking random systems until they do what we want. This means the people who make these AIs don't fully understand how they work on the inside. 3. When we use systems that we don't fully understand, we're more likely to run into unexpected problems or side effects. 4. If these not-fully-understood AI systems become very powerful, any unexpected problems could potentially be really big and harmful. I think it gets points 1 and 3 better than me, but 2 and 4 worse. Either way, I think we can improve upon the summary. Failure to Filter Information When you condense an argument down, you make it shorter. This is obvious. What is not always as obvious is that this means you have to throw out information to make the core point clearer. Sometimes the information that gets kept is distracting. Here's an example from a poster a friend of mine made for Pause AI: When I showed this to ...

    LW - Interested in Cognitive Bootcamp? by Raemon

    Play Episode Listen Later Sep 20, 2024 2:05


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Interested in Cognitive Bootcamp?, published by Raemon on September 20, 2024 on LessWrong. I'm running more 4-day "Cognitive Bootcamps" over the next couple months (during Lighthaven Eternal September season). DM me if you're potentially interested (either as an individual, or as a team). The workshop is most valuable to people who: control their decisionmaking process (i.e. you decide what projects you or a team work on, rather than working at a day-job on someone else's vision) are either a) confused about planmaking / have a vague sense that they aren't as strategically ambitious as they could be. and/or, b) are at a place where it's natural to spend a few days thinking big-picture thoughts before deciding on their next project. There's a secondary[1] focus on "practice solving confusing problems", which IMO is time well spent, but requires more followup practice to pay off. I wrote about the previous workshop here. Participants said on average they'd have been willing to pay $850 for it, and would have paid $5000 for the ideal, perfectly-tailored-for-them version. My plan is to charge $500/person for the next workshop, and then $1000 for the next one. I'm most excited to run this for teams, who can develop a shared skillset and accompanying culture. I plan to tailor the workshops for the needs of whichever people show up. The dates are not scheduled yet (depends somewhat on when a critical mass of participants are available). DM me if you are interested. The skills being taught will be similar to the sort of thing listed in Skills from a year of Purposeful Rationality Practice and the Feedbackloop-first Rationality sequence. My default curriculum is aiming to teach several interrelated related skills you can practice over four days, that build into a coherent metaskill of "ambitious planning, at multiple timescales." 1. ^ I started this project oriented around "find better feedbackloops for solving confusing problems", and later decided that planmaking was the highest leverage part of the skill tree to focus on. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

    LW - Laziness death spirals by PatrickDFarley

    Play Episode Listen Later Sep 19, 2024 13:04


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Laziness death spirals, published by PatrickDFarley on September 19, 2024 on LessWrong. I've claimed that Willpower compounds and that small wins in the present make it easier to get bigger wins in the future. Unfortunately, procrastination and laziness compound, too. You're stressed out for some reason, so you take the evening off for a YouTube binge. You end up staying awake a little later than usual and sleeping poorly. So the next morning you feel especially tired; you snooze a few extra times. In your rushed morning routine you don't have time to prepare for the work meeting as much as you'd planned to. So you have little to contribute during the meeting. You feel bad about your performance. You escape from the bad feelings with a Twitter break. But Twitter is freaking out. Elon Musk said what? Everyone is weighing in. This is going to occupy you intermittently for the rest of the day. And so on. Laziness has a kind of independent momentum to it. When you're having a day like the above, even if you consciously commit to getting back on track, the rut tends to find its way back to you within a couple of hours. Keep this up for a few days and your sleep is utterly messed up, and you walk around in a fog. Keep it up for a week or two and you're fully off your workout routine. In a month or two, you might have noticeably fallen behind on work; you might be absent from your social life; you might've visibly gained fat or lost muscle; you can no longer feel excited about your personal goals because they're behind a pile of mundane tasks you need to catch up on first. And so on. How do we stop the vicious circle? I'm spiraling! I'm spiraling! When you're in a laziness death spiral, it's hard to do anything deliberate. The first and most important step, which does take some willpower but not a lot, is to acknowledge, "I'm in a laziness death spiral today." If you don't acknowledge it, here's what happens: You vaguely notice you you've been wasting time today; you feel a twinge of guilt, so you quickly decide, "I'm going to turn the rest of the day around, starting right now." And does that work? Often it doesn't! Sure, after a small lapse you can just get back on track, but if enough laziness momentum has built up, a momentary reaction doesn't cut it. Deciding things quickly, in response to negative emotions, is exactly how you got into this situation! You're going to turn it around on a whim? You'll have a different whim in the next hour; what then? You need to take a step back and get your mind outside of the problem. Do what you can The next three sections are three different courses of action you can take to get out of a laziness death spiral. One of them is clearly preferable, but I'm writing the alternatives, too. When you're in a low-willpower state, it's often bad to attempt the very best solution - the farther you reach, the harder you can fall. Building a base of "small wins" is the reliable way to repair your willpower. If you start something lofty and then bail on it, you're doing real damage: logging another willpower failure and associating that "very best solution" with failure. Here are the moves: A) Emergency recovery If you're in a laziness spiral and you need to get out of it right now, there are some measures you can take that, while effective, are not ideal. They are unsustainable, promote bad habits, or are just generally unhealthy. But sometimes the need is there: maybe you have a deadline fast approaching (and the deadline itself isn't enough to snap you into action); maybe your friends or family need you to take care of something today; maybe you were in the middle of an awfully lazy day and a once-in-a-lifetime opportunity came up, and you just can't focus enough to act on it. Disclaimer: I believe that in a well planned life, none of these sho...

    LW - We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap by johnswentworth

    Play Episode Listen Later Sep 19, 2024 7:41


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap, published by johnswentworth on September 19, 2024 on LessWrong. Background: "Learning" vs "Learning About" Adaptive systems, reinforcement "learners", etc, "learn" in the sense that their behavior adapts to their environment. Bayesian reasoners, human scientists, etc, "learn" in the sense that they have some symbolic representation of the environment, and they update those symbols over time to (hopefully) better match the environment (i.e. make the map better match the territory). These two kinds of "learning" are not synonymous[1]. Adaptive systems "learn" things, but they don't necessarily "learn about" things; they don't necessarily have an internal map of the external territory. (Yes, the active inference folks will bullshit about how any adaptive system must have a map of the territory, but their math does not substantively support that interpretation.) The internal heuristics or behaviors "learned" by an adaptive system are not necessarily "about" any particular external thing, and don't necessarily represent any particular external thing[2]. We Humans Learn About Our Values "I thought I wanted X, but then I tried it and it was pretty meh." "For a long time I pursued Y, but now I think that was more a social script than my own values." "As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight." The ubiquity of these sorts of sentiments is the simplest evidence that we do not typically know our own values[3]. Rather, people often (but not always) have some explicit best guess at their own values, and that guess updates over time - i.e. we can learn about our own values. Note the wording here: we're not just saying that human values are "learned" in the more general sense of reinforcement learning. We're saying that we humans have some internal representation of our own values, a "map" of our values, and we update that map in response to evidence. Look again at the examples at the beginning of this section: "I thought I wanted X, but then I tried it and it was pretty meh." "For a long time I pursued Y, but now I think that was more a social script than my own values." "As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight." Notice that the wording of each example involves beliefs about values. They're not just saying "I used to feel urge X, but now I feel urge Y". They're saying "I thought I wanted X" - a belief about a value! Or "now I think that was more a social script than my own values" - again, a belief about my own values, and how those values relate to my (previous) behavior. Or "I endorsed the view that Z is the highest objective" - an explicit endorsement of a belief about values. That's how we normally, instinctively reason about our own values. And sure, we could reword everything to avoid talking about our beliefs about values - "learning" is more general than "learning about" - but the fact that it makes sense to us to talk about our beliefs about values is strong evidence that something in our heads in fact works like beliefs about values, not just reinforcement-style "learning". Two Puzzles Puzzle 1: Learning About Our Own Values vs The Is-Ought Gap Very roughly speaking, an agent could aim to pursue any values regardless of what the world outside it looks like; "how the external world is" does not tell us "how the external world should be". So when we "learn about" values, where does the evidence about values come from? How do we cross the is-ought gap? Puzzle 2: The Role of Reward/Reinforcement It does seem like humans have some kind of physiological "reward", in a hand-wavy reinforcement-learning-esque sense, which seems to at l...

    LW - AI #82: The Governor Ponders by Zvi

    Play Episode Listen Later Sep 19, 2024 43:47


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #82: The Governor Ponders, published by Zvi on September 19, 2024 on LessWrong. The big news of the week was of course OpenAI releasing their new model o1. If you read one post this week, read that one. Everything else is a relative sideshow. Meanwhile, we await Newsom's decision on SB 1047. The smart money was always that Gavin Newsom would make us wait before offering his verdict on SB 1047. It's a big decision. Don't rush him. In the meantime, what hints he has offered suggest he's buying into some of the anti-1047 talking points. I'm offering a letter to him here based on his comments, if you have any way to help convince him now would be the time to use that. But mostly, it's up to him now. Table of Contents 1. Introduction. 2. Table of Contents. 3. Language Models Offer Mundane Utility. Apply for unemployment. 4. Language Models Don't Offer Mundane Utility. How to avoid the blame. 5. Deepfaketown and Botpocalypse Soon. A social network of you plus bots. 6. They Took Our Jobs. Not much impact yet, but software jobs still hard to find. 7. Get Involved. Lighthaven Eternal September, individual rooms for rent. 8. Introducing. Automated scientific literature review. 9. In Other AI News. OpenAI creates independent board to oversee safety. 10. Quiet Speculations. Who is preparing for the upside? Or appreciating it now? 11. Intelligent Design. Intelligence. It's a real thing. 12. SB 1047: The Governor Ponders. They got to him, but did they get to him enough? 13. Letter to Newsom. A final summary, based on Newsom's recent comments. 14. The Quest for Sane Regulations. How should we update based on o1? 15. Rhetorical Innovation. The warnings will continue, whether or not anyone listens. 16. Claude Writes Short Stories. It is pondering what you might expect it to ponder. 17. Questions of Sentience. Creating such things should not be taken lightly. 18. People Are Worried About AI Killing Everyone. The endgame is what matters. 19. The Lighter Side. You can never be sure. Language Models Offer Mundane Utility Arbitrate your Nevada unemployment benefits appeal, using Gemini. This should solve the backlog of 10k+ cases, and also I expect higher accuracy than the existing method, at least until we see attempts to game the system. Then it gets fun. That's also job retraining. o1 usage limit raised to 50 messages per day for o1-mini, 50 per week for o1-preview. o1 can do multiplication reliably up to about 46 digits, andabout 50% accurately up through about 810, a huge leap from gpt-4o, although Colin Fraser reports 4o can be made better tat this than one would expect. o1 is much better than 4o at evaluating medical insurance claims, and determining whether requests for care should be approved, especially in terms of executing existing guidelines, and automating administrative tasks. It seems like a clear step change in usefulness in practice. The claim is that being sassy and juicy and bitchy improves Claude Instant numerical reasoning. What I actually see here is that it breaks Claude Instant out of trick questions. Where Claude would previously fall into a trap, you have it fall back on what is effectively 'common sense,' and it starts getting actually easy questions right. Language Models Don't Offer Mundane Utility A key advantage of using an AI is that you can no longer be blamed for an outcome out of your control. However, humans often demand manual mode be available to them, allowing humans to override the AI, even when it doesn't make any practical sense to offer this. And then, if the human can in theory switch to manual mode and override the AI, blame to the human returns, even when the human exerting that control was clearly impractical in context. The top example here is self-driving cars, and blame for car crashes. The results suggest that the human thirst for ill...

    LW - Which LessWrong/Alignment topics would you like to be tutored in? [Poll] by Ruby

    Play Episode Listen Later Sep 19, 2024 2:03


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Which LessWrong/Alignment topics would you like to be tutored in? [Poll], published by Ruby on September 19, 2024 on LessWrong. Would you like to be tutored in applied game theory, natural latents, CFAR-style rationality techniques, "general AI x-risk", Agent Foundations, anthropic s , or some other topics discussed on LessWrong? I'm thinking about prototyping some topic-specific LLM tutor bots, and would like to prioritize topics that multiple people are interested in. Topic-specific LLM tutors would be customized with things like pre-loaded relevant context, helpful system prompts, and more focused testing to ensure they work. Note: I'm interested in topics that are written about on LessWrong, e.g. infra-bayesianism, and not magnetohydrodynamics". I'm going to use the same poll infrastructure that Ben Pace pioneered recently. There is a thread below where you add and vote on topics/domains/areas where you might like tutoring. 1. Karma: upvote/downvote to express enthusiasm about there being tutoring for a topic. 2. Reacts: click on the agree react to indicate you personally would like tutoring on a topic. 3. New Poll Option. Add a new topic for people express interest in being tutored on. For the sake of this poll, I'm more interested in whether you'd like tutoring on a topic or not, separate from the question of whether you think a tutoring bot would be any good. I'll worry about that part. Background I've been playing around with LLMs a lot in the past couple of months and so far my favorite use case is tutoring. LLM-assistance is helpful via multiple routes such as providing background context with less effort than external search/reading, keeping me engaged via interactivity, generating examples, and breaking down complex sections into more digestible pieces. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

    EA - What Would You Ask The Archbishop of Canterbury? by JDBauman

    Play Episode Listen Later Sep 19, 2024 0:43


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What Would You Ask The Archbishop of Canterbury?, published by JDBauman on September 19, 2024 on The Effective Altruism Forum. The head of the Church of England is the second most influential Christian alive today. [1] The current Archbishop, Justin Welby, is speaking at the EA-adjacent Christians for Impact conference with Rory Stewart about faith and poverty. What should we ask Archbishop Justin in the Q&A? Feel free to submit anonymous thoughts here. 1. ^ Source: ChatGPT Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

    LW - [Intuitive self-models] 1. Preliminaries by Steven Byrnes

    Play Episode Listen Later Sep 19, 2024 39:21


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Intuitive self-models] 1. Preliminaries, published by Steven Byrnes on September 19, 2024 on LessWrong. 1.1 Summary & Table of Contents This is the first of a series of eight blog posts, which I'll be serializing over the next month or two. (Or email or DM me if you want to read the whole thing right now.) Here's an overview of the whole series, and then we'll jump right into the first post! 1.1.1 Summary & Table of Contents - for the whole series This is a rather ambitious series of blog posts, in that I'll attempt to explain what's the deal with consciousness, free will, hypnotism, enlightenment, hallucinations, flow states, dissociation, akrasia, delusions, and more. The starting point for this whole journey is very simple: The brain has a predictive (a.k.a. self-supervised) learning algorithm. This algorithm builds generative models (a.k.a. "intuitive models") that can predict incoming data. It turns out that, in order to predict incoming data, the algorithm winds up not only building generative models capturing properties of trucks and shoes and birds, but also building generative models capturing properties of the brain algorithm itself. Those latter models, which I call "intuitive self-models", wind up including ingredients like conscious awareness, deliberate actions, and the sense of applying one's will. That's a simple idea, but exploring its consequences will take us to all kinds of strange places - plenty to fill up an eight-post series! Here's the outline: Post 1 (Preliminaries) gives some background on the brain's predictive learning algorithm, how to think about the "intuitive models" built by that algorithm, how intuitive self-models come about, and the relation of this whole series to Philosophy Of Mind. Post 2 ( Awareness ) proposes that our intuitive self-models include an ingredient called "conscious awareness", and that this ingredient is built by the predictive learning algorithm to represent a serial aspect of cortex computation. I'll discuss ways in which this model is veridical (faithful to the algorithmic phenomenon that it's modeling), and ways that it isn't. I'll also talk about how intentions and decisions fit into that framework. Post 3 ( The Homunculus ) focuses more specifically on the intuitive self-model that almost everyone reading this post is experiencing right now (as opposed to the other possibilities covered later in the series), which I call the Conventional Intuitive Self-Model. In particular, I propose that a key player in that model is a certain entity that's conceptualized as actively causing acts of free will. Following Dennett, I call this entity "the homunculus", and relate that to intuitions around free will and sense-of-self. Post 4 ( Trance ) builds a framework to systematize the various types of trance, from everyday "flow states", to intense possession rituals with amnesia. I try to explain why these states have the properties they do, and to reverse-engineer the various tricks that people use to induce trance in practice. Post 5 ( Dissociative Identity Disorder ) (a.k.a. "multiple personality disorder") is a brief opinionated tour of this controversial psychiatric diagnosis. Is it real? Is it iatrogenic? Why is it related to borderline personality disorder (BPD) and trauma? What do we make of the wild claim that each "alter" can't remember the lives of the other "alters"? Post 6 ( Awakening / Enlightenment / PNSE ) is a type of intuitive self-model, typically accessed via extensive meditation practice. It's quite different from the conventional intuitive self-model. I offer a hypothesis about what exactly the difference is, and why that difference has the various downstream effects that it has. Post 7 (Hearing Voices, and Other Hallucinations) talks about factors contributing to hallucinations - although I argue ...

    EA - EA Organization Updates: September 2024 by Toby Tremlett

    Play Episode Listen Later Sep 19, 2024 9:56


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA Organization Updates: September 2024, published by Toby Tremlett on September 19, 2024 on The Effective Altruism Forum. If you would like to see EA Organization Updates as soon as they come out, consider subscribing to this tag. Some of the opportunities and job listings we feature in this update have (very) pressing deadlines (see AI Alignment Teaching Fellow opportunities at BlueDot Impact, September 22, and Institutional Foodservice Fellow at the Good Food Institute, September 18). You can see previous updates on the "EA Organization Updates (monthly series)" topic page, or in our repository of past newsletters. Notice that there's also an "org update" tag, where you can find more news and updates that are not part of this consolidated series. These monthly posts originated as the "Updates" section of the monthly EA Newsletter. Organizations submit their own updates, which we edit for clarity. (If you'd like to share your updates and jobs via this series, please apply here.) Opportunities and jobs Opportunities Consider also checking opportunities listed on the EA Opportunity Board and the Opportunities to Take Action tag. ALLFED published a new database containing numerous research projects that prospective volunteers can assist with. Explore the database and apply here. Apply to the upcoming AI Safety Fundamentals: Alignment course by October 6 to learn about the risks from AI and how you can contribute to the field. The Animal Advocacy Careers Introduction to Animal Advocacy Course has been revamped. The course is for those wishing to kickstart a career in animal advocacy. Giv Effektivt (DK) needs ~110 EU citizens to become members before the new year in order to offer tax deductions of around 450.000DKK ($66.000) for 2024-25 donations. Become a member now for 50DKK ($7). An existing donor will give 100DKK for each new member until the organization reaches 300 members. Anima International's Animal Advocacy Training Center released a new online course - Fundraising Essentials. It's a free, self-paced resource with over two hours of video content for people new to the subject. Job listings Consider also exploring jobs listed on the Job listing (open) tag. For even more roles, check the 80,000 Hours Job Board. BlueDot Impact AI Alignment Teaching Fellow (Remote, £4.9K-£9.6K, apply by September 22nd) Centre for Effective Altruism Head of Operations (Remote, £107.4K / $179.9K, apply by October 7th) Cooperative AI Foundation Communications Officer (Remote, £35K-£40K, apply by September 29th) GiveWell Senior Researcher (Remote, $200K-$220.6K) Giving What We Can Global CEO (Remote, $130K+, apply by September 30th) Open Philanthropy Operations Coordinator/Associate (San Francisco, Washington, DC, $99.6K-$122.6K) If you're interested in working at Open Philanthropy but don't see an open role that matches your skillset, express your interest. Epoch AI Question Writer, Math Benchmark (Contractor Position) (Remote, $2K monthly + $100-$1K performance-based bonus) Senior Researcher, ML Distributed Systems (Remote, $150K-$180K) The Good Food Institute Managing Director, GFI India (Hybrid (Mumbai, Delhi, Hyderabad, or Bangalore), ₹4.5M, apply by October 2nd) Institutional Foodservice Fellow (Independent Contractor) (Remote in US, $3.6K biweekly, apply by September 18th) Organization updates The organization updates are in alphabetical order (0-A-Z). 80,000 Hours There is one month left to win $5,000 career grants by referring your friends or colleagues to 80,000 Hours' free career advising. Also, the organization released a blog post about the recent updates to their AI-related content, as well as a post about pandemic preparedness in relation to mpox and H5N1. On the 80,000 Hours Podcast, Rob interviewed: Nick Joseph on whether Anthropic's AI safety policy is up to the task...

    EA - Five Years of Animal Advocacy Careers: Our Journey to impact, Lessons Learned, and What's Next by lauren mee

    Play Episode Listen Later Sep 19, 2024 28:13


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Five Years of Animal Advocacy Careers: Our Journey to impact, Lessons Learned, and What's Next, published by lauren mee on September 19, 2024 on The Effective Altruism Forum. This post is mostly about our key learnings, impact made and future plans Thanks to my team for their help in both creating this post and unwavering commitment to driving forward AAC's ambitious plans for animals, in particular Ana Barreiro, Nayan and Engin for their contributions and feedback on this post. TL;DR: For five years, Animal Advocacy Careers (AAC) has tried to direct passionate professionals towards high-impact opportunities that have the potential to help animals the most. We've filled 105 roles in leading animal advocacy organisations, supported over 150 organisations with recruitment, and launched 3 core programs our online course, job board, and career advising service. At the same time, we built a community of 27,500+ supporters across social media, Slack, and email. Our efforts also led to 12 10% Pledges and 11 Trial Pledges at Giving What We Can. We cautiously estimate adding $2.5 million worth of counterfactual impact from these donations and placements at a spend of $950,000 We conducted four talent surveys, which, along with our own independent research, continue to form the foundation of our career advising and strategy. Addressing the talent bottlenecks in the effective animal advocacy movement has proven to be far more complex than we first expected. Beyond the initial challenges, we've encountered a range of issues that directly impact our theory of change and our ability to drive meaningful impact - such as the scarcity of job postings and difficulties in the hiring process. In response, we've broadened our focus beyond just non-profit roles to better address these challenges and open up more opportunities for talented individuals to contribute to the movement. Explore more about how AAC is transforming animal advocacy careers and find out more about our exciting plans for the future. (Note: If you would like the full details of the programmes we have stopped, started, scaled and pivoted and a full programme evaluation our latest 2023/4 update is here) Overview This piece highlights Animal Advocacy Careers' accomplishments, mistakes, and changes since its establishment in 2019. We discuss AAC's future plans as well as potential constraints to our impact. Our vision is to have an animal advocacy movement of international talent density with mission-aligned advocates in critical positions in society, accelerating freedom for animals. Background AAC was founded in July 2019 through Charity Entrepreneurship's incubation program. Its goal is to accelerate the impact of existing organisations by solving their major talent bottlenecks, attracting top talent to the movement, matching them to the most impactful opportunities and empowering professionals to make a real impact. To effectively match top talent with the most impactful opportunities, AAC first had to conduct research to gain a deeper understanding of the movement's challenges and overall talent landscape. We needed to identify the market size, determine which skills and roles were most in demand and hardest to fill, and uncover the root causes behind these talent bottlenecks. This research forms the foundation of our work, allowing us to address the movement's needs in a more informed and strategic way. In addition to conducting research, AAC launched several experimental programs aimed at addressing talent bottlenecks . These programs included management and leadership training, an online course, a job board, career advising, fundraising work placements, headhunting and recruitment efforts, organisational recruitment training, a candidate database, and effective giving for animals. Through trialing these programmes...

    AF - The Obliqueness Thesis by Jessica Taylor

    Play Episode Listen Later Sep 19, 2024 30:04


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Obliqueness Thesis, published by Jessica Taylor on September 19, 2024 on The AI Alignment Forum. In my Xenosystems review, I discussed the Orthogonality Thesis, concluding that it was a bad metaphor. It's a long post, though, and the comments on orthogonality build on other Xenosystems content. Therefore, I think it may be helpful to present a more concentrated discussion on Orthogonality, contrasting Orthogonality with my own view, without introducing dependencies on Land's views. (Land gets credit for inspiring many of these thoughts, of course, but I'm presenting my views as my own here.) First, let's define the Orthogonality Thesis. Quoting Superintelligence for Bostrom's formulation: Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal. To me, the main ambiguity about what this is saying is the "could in principle" part; maybe, for any level of intelligence and any final goal, there exists (in the mathematical sense) an agent combining those, but some combinations are much more natural and statistically likely than others. Let's consider Yudkowsky's formulations as alternatives. Quoting Arbital: The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal. The strong form of the Orthogonality Thesis says that there's no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal. As an example of the computational tractability consideration, sufficiently complex goals may only be well-represented by sufficiently intelligent agents. "Complication" may be reflected in, for example, code complexity; to my mind, the strong form implies that the code complexity of an agent with a given level of intelligence and goals is approximately the code complexity of the intelligence plus the code complexity of the goal specification, plus a constant. Code complexity would influence statistical likelihood for the usual Kolmogorov/Solomonoff reasons, of course. I think, overall, it is more productive to examine Yudkowsky's formulation than Bostrom's, as he has already helpfully factored the thesis into weak and strong forms. Therefore, by criticizing Yudkowsky's formulations, I am less likely to be criticizing a strawman. I will use "Weak Orthogonality" to refer to Yudkowsky's "Orthogonality Thesis" and "Strong Orthogonality" to refer to Yudkowsky's "strong form of the Orthogonality Thesis". Land, alternatively, describes a "diagonal" between intelligence and goals as an alternative to orthogonality, but I don't see a specific formulation of a "Diagonality Thesis" on his part. Here's a possible formulation: Diagonality Thesis: Final goals tend to converge to a point as intelligence increases. The main criticism of this thesis is that formulations of ideal agency, in the form of Bayesianism and VNM utility, leave open free parameters, e.g. priors over un-testable propositions, and the utility function. Since I expect few readers to accept the Diagonality Thesis, I will not concentrate on criticizing it. What about my own view? I like Tsvi's naming of it as an "obliqueness thesis". Obliqueness Thesis: The Diagonality Thesis and the Strong Orthogonality Thesis are false. Agents do not tend to factorize into an Orthogonal value-like component and a Diagonal belief-like component; rather, there are Oblique components that do not factorize neatly. (Here, by Orthogonal I mean basically independent of intelligence, and by Diagonal I mean converging to a point in the limit of intelligence.) While I will address Yudkowsky's arguments for the Orthogonality Thesis, I think arguing directly for my view first will be more helpful. In general, it seems ...

    LW - The case for a negative alignment tax by Cameron Berg

    Play Episode Listen Later Sep 18, 2024 14:19


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The case for a negative alignment tax, published by Cameron Berg on September 18, 2024 on LessWrong. TL;DR: Alignment researchers have historically predicted that building safe advanced AI would necessarily incur a significant alignment tax compared to an equally capable but unaligned counterfactual AI. We put forward a case here that this prediction looks increasingly unlikely given the current 'state of the board,' as well as some possibilities for updating alignment strategies accordingly. Introduction We recently found that over one hundred grant-funded alignment researchers generally disagree with statements like: alignment research that has some probability of also advancing capabilities should not be done (~70% somewhat or strongly disagreed) advancing AI capabilities and doing alignment research are mutually exclusive goals (~65% somewhat or strongly disagreed) Notably, this sample also predicted that the distribution would be significantly more skewed in the 'hostile-to-capabilities' direction. See ground truth vs. predicted distributions for these statements These results - as well as recent events and related discussions - caused us to think more about our views on the relationship between capabilities and alignment work given the 'current state of the board,'[1] which ultimately became the content of this post. Though we expect some to disagree with these takes, we have been pleasantly surprised by the positive feedback we've received from discussing these ideas in person and are excited to further stress-test them here. Is a negative alignment tax plausible (or desirable)? Often, capabilities and alignment are framed with reference to the alignment tax, defined as 'the extra cost [practical, developmental, research, etc.] of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative.' The AF/ LW wiki entry on alignment taxes notably includes the following claim: The best case scenario is No Tax: This means we lose no performance by aligning the system, so there is no reason to deploy an AI that is not aligned, i.e., we might as well align it. The worst case scenario is Max Tax: This means that we lose all performance by aligning the system, so alignment is functionally impossible. We speculate in this post about a different best case scenario: a negative alignment tax - namely, a state of affairs where an AI system is actually rendered more competent/performant/capable by virtue of its alignment properties. Why would this be even better than 'No Tax?' Given the clear existence of a trillion dollar attractor state towards ever-more-powerful AI, we suspect that the most pragmatic and desirable outcome would involve humanity finding a path forward that both (1) eventually satisfies the constraints of this attractor (i.e., is in fact highly capable, gets us AGI, etc.) and (2) does not pose existential risk to humanity. Ignoring the inevitability of (1) seems practically unrealistic as an action plan at this point - and ignoring (2) could be collectively suicidal. Therefore, if the safety properties of such a system were also explicitly contributing to what is rendering it capable - and therefore functionally causes us to navigate away from possible futures where we build systems that are capable but unsafe - then these 'negative alignment tax' properties seem more like a feature than a bug. It is also worth noting here as an empirical datapoint here that virtually all frontier models' alignment properties have rendered them more rather than less capable (e.g., gpt-4 is far more useful and far more aligned than gpt-4-base), which is the opposite of what the 'alignment tax' model would have predicted. This idea is somewhat reminiscent of differential technological development, in which Bostrom suggests "[slowing] the devel...

    EA - Match funding opportunity to challenge the legality of Frankenchickens by Gavin Chappell-Bates

    Play Episode Listen Later Sep 18, 2024 7:19


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Match funding opportunity to challenge the legality of Frankenchickens, published by Gavin Chappell-Bates on September 18, 2024 on The Effective Altruism Forum. We have a once-in-a-generation opportunity to improve the lives of millions of chickens raised for food in the UK. In October 2024 The Humane League UK (THL UK) will be heading to the High Court to challenge the legality of fast-growing breeds of chicken - Frankenchickens. We need to raise £55k to fund the hearing. The Jeremy Coller Foundation has pledged to match funding half of the costs up to £28k. We need to raise a further £12.5k to maximise the match funding pot and fully fund the hearing. Please contact me directly should you wish to donate and fight for 1 billion chickens. Frankenchickens ' Frankenchickens' are selectively bred to grow unnaturally big and fast to maximise profits. They are destined to suffer extremely short and painful lives, suffer heart attacks, are often unable to walk and succumb to open sores from laying in their own waste. They grow 400% faster than is natural for their bodies, creating the biggest animal welfare crisis of our time. In the UK alone, there are over 1 billion chickens raised for meat and over 90% are fast growing. THL UK's three-year legal battle In 2020, we saw an opportunity to challenge the legality of Frankenchickens and began building a legal case against the Department for Environment, Food & Rural Affairs (Defra). This culminated in a judicial review taking place at the High Court in May 2023. Getting to this point was a major success in itself as only 5% of cases are granted a full hearing. The judge stated that a full hearing of the facts regarding fast-growing chickens was in the public interest. Represented by Advocates for Animals, we argued that fast-growing chicken breeds, known as Frankenchickens, are illegal under current animal welfare laws, as they suffer as a direct result of their breeding. Our case was bolstered by evidence given by the RSPCA which shows that fast-growing breeds of chicken do suffer, no matter the environment they're raised in. This was despite Defra attempting to block the submission of the RSPCA's evidence. The fight continues In May 2023, the High Court ruled that Defra hadn't behaved unlawfully in their interpretation of the Welfare of Farmed Animals Regulation of 2007. Shortly after the ruling we decided to appeal the court's decision, and continue our three-year legal battle. There is overwhelming scientific consensus that chickens raised for meat suffer due to their breed. Defra itself has offered no evidence to contradict the RSPCA report and even accepted that there are welfare problems with fast-growing breeds of chicken. In October 2023, we found out that our appeal had been granted. In October 2024, we will be back in court, in front of a new judge, to take on Defra to end the cruel use of Frankenchickens in the UK. Our two-day court hearing is due to start on either Tuesday 22nd or Wednesday 23rd October. This is a once-in-a-generation opportunity to force the Government, with one decision from an appeals court judge, to transform one billion innocent lives per year. Our chances of success By virtue of being granted an appeal, our chances for a favourable final outcome have increased significantly. Being granted an appeal means that serious problems with the previous judge's findings have been uncovered, and the judge approving our appeal thinks our case still has merit that needs final and careful deliberation. A positive ruling would mean that the judge found Defra's interpretation of the Welfare of Farmed Animals Regulation of 2007 illegal, and would compel them to create a new policy on fast growing breeds of chicken, one that would invariably lead to farmers being disincentivized or even banned from keeping f...

    EA - Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety by titotal

    Play Episode Listen Later Sep 18, 2024 22:44


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety, published by titotal on September 18, 2024 on The Effective Altruism Forum. Disclaimer: I am a computational physicist's and this investigation is outside of my immediate area of expertise. Feel free to peruse the experiments and take everything I say with appropriate levels of skepticism. Introduction: The centre for AI safety is a prominent AI safety research group doing technical AI research as well as regulatory activism. It's headed by Dan Hendrycks, who has a PHD in computer science from Berkeley and some notable contributions to AI research. Last week CAIS released a blog post, entitled "superhuman automated forecasting", announcing a forecasting bot developed by a team including Hendrycks, along with a technical report and a website "five thirty nine", where users can try out the bot for themselves. The blog post makes several grandiose claims, claiming to rebut Nate silvers claims that superhuman forecasting is 15-20 years away, and that: Our bot performs better than experienced human forecasters and performs roughly the same as (and sometimes even better than) crowds of experienced forecasters; since crowds are for the most part superhuman, so is FiveThirtyNine. He paired this with a twitter post, declaring: We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. The claim is this: Via a chain of prompting, GPT4-o can be harnessed for superhuman prediction. Step 1 is to ask GPT to figure out the most relevant search terms for a forecasting questions, then those are fed into a web search to yield a number of relevant news articles, to extract the information within. The contents of these news articles are then appended to a specially designed prompt which is fed back to GPT-4o. The prompt instructs it to boil down the articles into a list of arguments "for" and "against" the proposition and rate the strength of each, to analyse the results and give an initial numerical estimate, and then do one last sanity check and analysis before yielding a final percentage estimate. How do they know it works? Well, they claim to have run the bot on several metacalculus questions and achieved accuracy greater than both the crowd average and a test using the prompt of a competing model. Importantly, this was a retrodiction: they tried to run questions from last year, while restricting it's access to information since then, and then checked how many of the subsequent results are true. A claim of superhuman forecasting is quite impressive, and should ideally be backed up by impressive evidence. A previous paper trying similar techniques yielding less impressive claims runs to 37 pages, and it demonstrates them doing their best to avoid any potential flaw or pitfall in the process(and I'm still not sure they succeeded). In contrast, the CAIS report is only 4 pages long, lacking pretty much all the relevant information one would need to properly assess the claim. You can read feedback from the twitter replies, Manifold question, Lesswrong and the EA forum, which were all mostly skeptical and negative, bringing up a myriad of problems with the report. This report united most rationalists and anti-rationalists in skepticism. Although I will note that both AI safety memes and Kat Woods seemed to accept and spread the claims uncritically. The most important to highlight is these twitter comments by the author of a much more rigorous paper cited in the report, claiming that the results did not replicate on his side, as well as this critical response by another AI forecasting institute. Some of the concerns: The retrodiction...

    LW - Monthly Roundup #22: September 2024 by Zvi

    Play Episode Listen Later Sep 18, 2024 68:02


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Monthly Roundup #22: September 2024, published by Zvi on September 18, 2024 on LessWrong. It's that time again for all the sufficiently interesting news that isn't otherwise fit to print, also known as the Monthly Roundup. Bad News Beware the failure mode in strategy and decisions that implicitly assumes competence, or wishes away difficulties, and remember to reverse all advice you hear. Stefan Schubert (quoting Tyler Cowen on raising people's ambitions often being very high value): I think lowering others' aspirations can also be high-return. I know of people who would have had a better life by now if someone could have persuaded them to pursue more realistic plans. Rob Miles: There's a specific failure mode which I don't have a name for, which is similar to "be too ambitious" but is closer to "have an unrealistic plan". The illustrative example I use is: Suppose by some strange circumstance you have to represent your country at olympic gymnastics next week. One approach is to look at last year's gold, and try to do that routine. This will fail. You'll do better by finding one or two things you can actually do, and doing them well There's a common failure of rationality which looks like "Figure out what strategy an ideal reasoner would use, then employ that strategy". It's often valuable to think about the optimal policy, but you must understand the difference between knowing the path, and walking the path I do think that more often 'raise people's ambitions' is the right move, but you need to carry both cards around with you for different people in different situations. Theory that Starlink, by giving people good internet access, ruined Burning Man. Seems highly plausible. One person reported that they managed to leave the internet behind anyway, so they still got the Burning Man experience. Tyler Cowen essentially despairs of reducing regulations or the number of bureaucrats, because it's all embedded in a complex web of regulations and institutions and our businesses rely upon all that to be able to function. Otherwise business would be paralyzed. There are some exceptions, you can perhaps wholesale axe entire departments like education. He suggests we focus on limiting regulations on new economic areas. He doesn't mention AI, but presumably that's a lot of what's motivating his views there. I agree that 'one does not simply' cut existing regulations in many cases, and that 'fire everyone and then it will all work out' is not a strategy (unless AI replaces them?), but also I think this is the kind of thing can be the danger of having too much detailed knowledge of all the things that could go wrong. One should generalize the idea of eliminating entire departments. So yes, right now you need the FDA to approve your drug (one of Tyler's examples) but… what if you didn't? I would still expect, if a new President were indeed to do massive firings on rhetoric and hope, that the result would be a giant cluster****. La Guardia switches to listing flights by departure time rather than order of destination, which in my mind makes no sense in the context of flights, that frequently get delayed, where you might want to look for an earlier flight or know what backups are if yours is cancelled or delayed or you miss it, and so on. It also gives you a sense of where one can and can't actually go to when from where you are. For trains it makes more sense to sort by time, since you are so often not going to and might not even know the train's final destination. I got a surprising amount of pushback about all that on Twitter, some people felt very strongly the other way, as if to list by name was violating some sacred value of accessibility or something. Anti-Social Media Elon Musk provides good data on his followers to help with things like poll calibration, reports 73%-27% lea...

    EA - AI Welfare Debate Week retrospective by Toby Tremlett

    Play Episode Listen Later Sep 18, 2024 9:27


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Welfare Debate Week retrospective, published by Toby Tremlett on September 18, 2024 on The Effective Altruism Forum. I wrote this retrospective to be shared internally in CEA - but in the spirit of more open communication, I'm sharing it here as well. Note that this is a review of the event considered as a product, not a summary or review of the posts from the week. If you have any questions, or any additional feedback, that'd be appreciated! I'll be running another debate week soon, and feedback has already been very helpful in preparing for it. Also, feedback on the retro itself is appreciated- I'd ideally like to pre-register my retros and just have to fill in the graphs and conclusions once the event actually happens, so suggesting data we should measure/ questions I should be asking would be very helpful for making better retro templates. How successful was the event? In my OKRs (Objectives and Key Results- AKA, my goals for the event), I wanted this event to: Have 50 participants, with "participant" being anyone taking an event-related action such as voting, commenting, or posting. We did an order of magnitude better than 50. Over 558 people voted during the week, and 27 authors wrote or co-wrote at least one post. Change people's minds. I wanted the equivalent of 25 people changing their minds by 25% of the debate slider. We did twice as well as I hoped here- 53 unique users made at least one mind change of 0.25 delta (representing 25% of the slider) or more. Therefore, on our explicit goals, this event was successful . But how successful was it based on our other, non-KR goals and hopes? Some other goals that we had for the event- either in the ideation phase, or while it was ongoing, were: Create more good content on a particularly important issue to EAs. Successful. Increase engagement. Seems unsuccessful. Bring in some new users. Not noticeably successful. Increase messaging. Not noticeably successful. In the next four sections, I examine each of these goals in turn. More good content We had 28 posts with the debate week tag, with 7 being at or above 50 karma. Of the 7, all but one (JWS's thoughtful critique of the debate's framing) were from authors I had directly spoken to or messaged about the event. Compared to Draft Amnesty Week (which led to posts from 42 authors, and 10 posts over 50 karma) this isn't that many- however, I think we should count these posts as ex ante more valuable because of their focus on a specific topic. Ex-post, it's hard to assess how valuable the posts were. None of the posts had very high karma (i.e. the highest was 77). However, I did curate one of the posts, and a couple of others were considered for curation. I would be interested to hear takes from readers about how valuable the posts were - did any of them change your mind, lead to a collaboration, or cause you to think more about the topic? Engagement How much engagement did the event get? In total, debate week posts got 127 hours of engagement during the debate week (or 11.6% of total engagement), and 181 hours from July 1-14 (debate week and the week after), 7.5% of that fortnight's engagement hours. Did it increase total daily hours of engagement? Note: Discussion of Manifest controversies happened in June, and led to higher engagement hours per day in the build up to the event. Important dates: June 17: 244 comments, June 18: 349 comments, June 20: 33 comments, June 25: 38 comments It doesn't look as if the debate week meaningfully increased daily engagement. The average daily engagement for the week after the event is actually higher, although the 3rd day of the event (July 3rd- the day I mentioned that the event was ongoing in the EA Digest) remains the highest hours of engagement between July 1st and the date I'm writing this, August 21st. Did it get us new us...

    EA - Material Innovation Initiative (MII) shuts down by Nate Crosser

    Play Episode Listen Later Sep 18, 2024 4:35


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Material Innovation Initiative (MII) shuts down, published by Nate Crosser on September 18, 2024 on The Effective Altruism Forum. The "GFI of vegan materials" is shutting down after operating since 2019. They were an ACE-recommended charity at one point. No rationale is given in the announcement. I asked for more, and will update this post if they respond. Dear Valued Stakeholders, I am writing to you with mixed emotions to share some important news regarding the future of the Material Innovation Initiative (MII). After a thorough evaluation and much deliberation, the board of directors and the executive leadership team have made the difficult decision to wind down MII's operations. While this marks the end of our journey as an organization, we want to take this opportunity to celebrate our many accomplishments and the tremendous growth of the next-gen materials industry, as well as express our gratitude for your unwavering support over the past five years. A Legacy of Impact and Innovation Since our founding in 2019, MII has been at the forefront of transforming the next-gen materials industry. Our mission was clear: to accelerate the development of high-quality, high-performance, animal-free and environmentally preferred next-generation materials. We envisioned a world where the materials used in fashion, automotive, and home goods industries would protect human rights, mitigate climate change, spare animals' lives, and preserve our planet for future generations. Thanks to your support, we have made significant strides towards this vision: Catalyzing Investments: MII has been instrumental in inspiring over $2.31 billion in investments into next-gen materials, including $504 million in 2023 alone. These investments have driven innovation and growth across the sector, enabling the development of materials that meet performance, aesthetic, and sustainability needs at competitive prices. Research and Advocacy: Our pioneering research, such as the U.S. Consumer Research on next-gen materials, revealed that 92% of consumers are likely to purchase next-gen products, highlighting a significant market opportunity. Our State of the Industry reports have been vital resources for innovators, brands, and investors, saving them time and guiding strategic decision-making. Brand Collaborations: We have facilitated groundbreaking partnerships between next-gen material innovators and major brands. In 2023, we saw almost 400 collaborations between influential brands and next-gen material companies, showing the increasing interest from brands to incorporate next-gen materials into their collections. This also illustrates the tremendous potential of next-gen materials to disrupt the fashion, home goods and automotive industries. Global Influence and Advocacy: MII has been appointed to influential roles, such as serving on the New York City Mayor's Office task force to source sustainable materials. Our participation in global events have increased visibility for next-gen materials, reaching audiences across the world and bringing together stakeholders across the value chain to drive collective action. The Evolution of the Industry Since we began our journey in 2019, the landscape of the materials industry has changed dramatically. The concept of next-gen materials has gone from a niche idea to a critical component of sustainability strategies for leading global brands. Today, there are 141 companies dedicated to next-gen materials, up from just 102 in 2022, demonstrating the rapid growth and adoption within the industry. This increased innovation has brought down prices, improved quality, and expanded the range of available materials, making them viable alternatives to conventional animal and petrochemical-derived materials. The industry is now well-positioned to continue advancing towa...

    EA - Tithing: much more than you wanted to know by Vesa Hautala

    Play Episode Listen Later Sep 18, 2024 34:33


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Tithing: much more than you wanted to know, published by Vesa Hautala on September 18, 2024 on The Effective Altruism Forum. Summary This post explores the practice of tithing (religiously mandated giving of 10% of income to the church or other recipients) among Christians, including: 1. contemporary beliefs and practices (especially in the US) 2. questions about Biblical interpretation 3. wider theological themes related to Christian giving This piece is mainly written for a Christian audience but should be useful to anyone interested in the topic. Some key points US Protestants usually believe tithing should be practiced (about 70% think it's a Biblical commandment) However, only 4% of US Evangelicals donate 10% or more (I didn't find data for all Protestants, but the number is likely similar) yet 38% of Evangelicals believe they are giving one-tenth or more, so they vastly overestimate their giving (again, no data for all Protestants) There are different opinions on who the tithe can be paid to, with a local church being the most common answer The Catholic Church does not teach tithing, Orthodox views are mixed, and the Church of England "challenges" its members to give 10% The Torah has legislation on tithing that seems to command giving 20-30% of agricultural products and animals In my view no New Testament passage sets a fixed percentage to give or provides exact instructions on how to split donations between the church and other charities However, the NT has passages that promote radical generosity[1] and encourage significant giving to those in need, which suggests 10% may be too low an anchoring point for many Christians today Introduction This [Susbstack] post is an abridged version of the article An In-Depth Look at Tithing published on the EA for Christians website. [Note, I've also included some additional content from the full version and some other small changes to this forum post.] Tithing is a contentious subject. Some Christians preach blessings on tithers and curses for non-tithers. Others used to believe tithing is a binding obligation but now vigorously advocate against it. If there is an obligation to give 10% to the church, this greatly affects the giving options of Christians. This post first discusses contemporary views and practices and then the main Bible passages used in relation to tithing. Finally, I will present some wider theological reflections on tithing and giving. A note on definitions: By "tithing" I mean mandatory giving of 10% of income to the church (or possibly other Christian ministries or other types of charity, there are different views about this). Also, for the sake of transparency, I want to state right in the beginning that I don't personally believe in a binding obligation to donate 10% to one's local church. However, even if you disagree, I believe you will find a lot of this post interesting and helpful for deepening your understanding of the arguments for and against tithing. Contemporary views and practices This section is going to be rather US-centric for a few reasons. The US very likely has the largest religious economy in the world and tithing is a part of the US religious landscape. There is more data available about tithing in the US than for example the UK. US Christians also seem to be generally more interested in the tithing question. US Protestants According to a survey by Lifeway Research, 72% of US protestant pastors believe tithing is a biblical commandment that applies today. In a similar survey, 77% of churchgoers said the same. People have different ideas about what "tithe" means, but in the survey of pastors, 73% said it's 10% of a person's income (gross or net). The number of people who actually donate 10% or more is much lower, though. The average giving among US adults who attend worship at leas...

    LW - Generative ML in chemistry is bottlenecked by synthesis by Abhishaike Mahajan

    Play Episode Listen Later Sep 18, 2024 24:59


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Generative ML in chemistry is bottlenecked by synthesis, published by Abhishaike Mahajan on September 18, 2024 on LessWrong. Introduction Every single time I design a protein - using ML or otherwise - I am confident that it is capable of being manufactured. I simply reach out to Twist Biosciences, have them create a plasmid that encodes for the amino acids that make up my proteins, push that plasmid into a cell, and the cell will pump out the protein I created. Maybe the cell cannot efficiently create the protein. Maybe the protein sucks. Maybe it will fold in weird ways, isn't thermostable, or has some other undesirable characteristic. But the way the protein is created is simple, close-ended, cheap, and almost always possible to do. The same is not true of the rest of chemistry. For now, let's focus purely on small molecules, but this thesis applies even more-so across all of chemistry. Of the 1060 small molecules that are theorized to exist, most are likely extremely challenging to create. Cellular machinery to create arbitrary small molecules doesn't exist like it does for proteins, which are limited by the 20 amino-acid alphabet. While it is fully within the grasp of a team to create millions of de novo proteins, the same is not true for de novo molecules in general (de novo means 'designed from scratch'). Each chemical, for the most part, must go through its custom design process. Because of this gap in 'ability-to-scale' for all of non-protein chemistry, generative models in chemistry are fundamentally bottlenecked by synthesis. This essay will discuss this more in-depth, starting from the ground up of the basics behind small molecules, why synthesis is hard, how the 'hardness' applies to ML, and two potential fixes. As is usually the case in my Argument posts, I'll also offer a steelman to this whole essay. To be clear, this essay will not present a fundamentally new idea. If anything, it's such an obvious point that I'd imagine nothing I'll write here will be new or interesting to people in the field. But I still think it's worth sketching out the argument for those who aren't familiar with it. What is a small molecule anyway? Typically organic compounds with a molecular weight under 900 daltons. While proteins are simply long chains composed of one-of-20 amino acids, small molecules display a higher degree of complexity. Unlike amino acids, which are limited to carbon, hydrogen, nitrogen, and oxygen, small molecules incorporate a much wider range of elements from across the periodic table. Fluorine, phosphorus, bromine, iodine, boron, chlorine, and sulfur have all found their way into FDA-approved drugs. This elemental variety gives small molecules more chemical flexibility but also makes their design and synthesis more complex. Again, while proteins benefit from a universal 'protein synthesizer' in the form of a ribosome, there is no such parallel amongst small molecules! People are certainly trying to make one, but there seems to be little progress. So, how is synthesis done in practice? For now, every atom, bond, and element of a small molecule must be carefully orchestrated through a grossly complicated, trial-and-error reaction process which often has dozens of separate steps. The whole process usually also requires non-chemical parameters, such as adjusting the pH, temperature, and pressure of the surrounding medium in which the intermediate steps are done. And, finally, the process must also be efficient; the synthesis processes must not only achieve the final desired end-product, but must also do so in a way that minimizes cost, time, and required sources. How hard is that to do? Historically, very hard. Consider erythromycin A, a common antibiotic. Erythromycin was isolated in 1949, a natural metabolic byproduct of Streptomyces erythreus, a soil mi...

    EA - Sensitive assumptions in longtermist modeling by Owen Murphy

    Play Episode Listen Later Sep 18, 2024 13:13


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Sensitive assumptions in longtermist modeling, published by Owen Murphy on September 18, 2024 on The Effective Altruism Forum. {Epistemic Status: Repeating critiques from David Thorstad's excellent papers (link, link) and blog, with some additions of my own. The list is not intended to be representative and/or comprehensive for either critiques or rebuttals. Unattributed graphs are my own and more likely to contain errors.} I am someone generally sympathetic to philosophical longtermism and total utilitarianism, but like many effective altruists, I have often been skeptical about the relative value of actual longtermism-inspired interventions. Unfortunately, though, for a long time I was unable to express any specific, legible critiques of longtermism other than a semi-incredulous stare. Luckily, this condition has changed in the last several months since I started reading David Thorstad's excellent blog (and papers) critiquing longtermism.[1] His points cover a wide range of issues, but in this post, I would like to focus on a couple of crucial and plausibly incorrect modeling assumptions Thorstad notes in analyses of existential risk reduction, explain a few more critiques of my own, and cover some relevant counterarguments. Model assumptions noted by Thorstad 1. Baseline risk (blog post) When estimating the value of reducing existential risk, one essential - but non-obvious - component is the 'baseline risk', i.e., the total existential risk, including risks from sources not being intervened on.[2] To understand this, let's start with an equation for the expected life-years E[L] in the future, parameterized by a period existential risk (r), and fill it with respectable values:[3] Now, to understand the importance of baseline risk, let's start by examining an estimated E[L] under different levels of risk (without considering interventions): Here we can observe that the expected life-years in the future drops off substantially as the period existential risk (r) increases and that the decline (slope) is greater for smaller period risks than for larger ones. This finding might not seem especially significant, but if we use this same analysis to estimate the value of reducing period existential risk, we find that the value drops off in exactly the same way as baseline risk increases. Indeed, if we examine the graph above, we can see that differences in baseline risk (0.2% vs. 1.2%) can potentially dominate tenfold (1% vs. 0.1%) differences in absolute period existential risk (r) reduction. Takeaways from this: (1) There's less point in saving the world if it's just going to end anyway. Which is to say that pessimism about existential risk (i.e. higher risk) decreases the value of existential risk reduction because the saved future is riskier and therefore less valuable. (2) Individual existential risks cannot be evaluated in isolation. The value of existential risk reduction in one area (e.g., engineered pathogens) is substantially impacted by all other estimated sources of risk (e.g. asteroids, nuclear war, etc.). It is also potentially affected by any unknown risks, which seems especially concerning. 2. Future Population (blog post) When calculating the benefits of reduced existential risk, another key parameter choice is the estimate of future population size. In our model above, we used a superficially conservative estimate of 10 billion for the total future population every century. This might seem like a reasonable baseline given that the current global population is approximately 8 billion, but once we account for current and projected declines in global fertility, this assumption shifts from appearing conservative to appearing optimistic. United Nations modeling currently projects that global fertility will fall below replacement rate around 2050 and continue d...

    LW - Skills from a year of Purposeful Rationality Practice by Raemon

    Play Episode Listen Later Sep 18, 2024 11:11


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Skills from a year of Purposeful Rationality Practice, published by Raemon on September 18, 2024 on LessWrong. A year ago, I started trying to deliberate practice skills that would "help people figure out the answers to confusing, important questions." I experimented with Thinking Physics questions, GPQA questions, Puzzle Games , Strategy Games, and a stupid twitchy reflex game I had struggled to beat for 8 years[1]. Then I went back to my day job and tried figuring stuff out there too. The most important skill I was trying to learn was Metastrategic Brainstorming - the skill of looking at a confusing, hopeless situation, and nonetheless brainstorming useful ways to get traction or avoid wasted motion. Normally, when you want to get good at something, it's great to stand on the shoulders of giants and copy all the existing techniques. But this is challenging if you're trying to solve important, confusing problems because there probably isn't (much) established wisdom on how to solve it. You may need to discover techniques that haven't been invented yet, or synthesize multiple approaches that haven't previously been combined. At the very least, you may need to find an existing technique buried in the internet somewhere, which hasn't been linked to your problem with easy-to-search keywords, without anyone to help you. In the process of doing this, I found a few skills that came up over and over again. I didn't invent the following skills, but I feel like I "won" them in some sense via a painstaking "throw myself into the deep end" method. I feel slightly wary of publishing them in a list here, because I think it was useful to me to have to figure out for myself that they were the right tool for the job. And they seem like kinda useful "entry level" techniques, that you're more likely to successfully discover for yourself. But, I think this is hard enough, and forcing people to discover everything for themselves seems unlikely to be worth it. The skills that seemed most general, in both practice and on my day job, are: 1. Taking breaks/naps 2. Working Memory facility 3. Patience 4. Knowing what confusion/deconfusion feels like 5. Actually Fucking Backchain 6. Asking "what is my goal?" 7. Having multiple plans There were other skills I already was tracking, like Noticing, or Focusing. There were also somewhat more classic "How to Solve It" style tools for breaking down problems. There are also a host of skills I need when translating this all into my day-job, like "setting reminders for myself" and "negotiating with coworkers." But the skills listed above feel like they stood out in some way as particularly general, and particularly relevant for "solve confusing problems." Taking breaks, or naps Difficult intellectual labor is exhausting. During the two weeks I was working on solving Thinking Physics problems, I worked for like 5 hours a day and then was completely fucked up in the evenings. Other researchers I've talked to report similar things. During my workshops, one of the most useful things I recommended people was "actually go take a nap. If you don't think you can take a real nap because you can't sleep, go into a pitch black room and lie down for awhile, and the worst case scenario is your brain will mull over the problem in a somewhat more spacious/relaxed way for awhile." Practical tips: Get yourself a sleeping mask, noise machine (I prefer a fan or air purifier), and access to a nearby space where you can rest. Leave your devices outside the room. Working Memory facility Often a topic feels overwhelming. This is often because it's just too complicated to grasp with your raw working memory. But, there are various tools (paper, spreadsheets, larger monitors, etc) that can improve this. And, you can develop the skill of noticing "okay this isn't fitting in my he...

    EA - The Subject in Subjective Time: A New Approach to Aggregating Wellbeing (paper draft) by Devin Kalish

    Play Episode Listen Later Sep 17, 2024 73:23


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Subject in Subjective Time: A New Approach to Aggregating Wellbeing (paper draft), published by Devin Kalish on September 17, 2024 on The Effective Altruism Forum. What follows is a lightly edited version of the thesis I wrote for my Bioethics MA program. I'm hoping to do more with this in the future, including seeking publication and/or expanding it into a dissertation or short book. In its current state, I feel like it is in pretty rough shape. I hope it is useful and interesting for people as puzzled by this very niche philosophical worry as me, but I'm also looking for feedback on how I can improve it. There's no guarantee I will take it, or even do anything further with this piece, but I would still appreciate the feedback. I may or may not interact much in the comments section. I. Introduction: Duration is an essential component of many theories of wellbeing. While there are theories of wellbeing that are sufficiently discretized that time isn't so obviously relevant to them, like achievements, it is hard to deny that time matters to some parts of a moral patient's wellbeing. A five-minute headache is better than an hour-long headache, all else held equal. A love that lasts for decades provides more meaning to a life than one that last years or months, all else held equal. The fulfillment of a desire you have had for years matters more than the fulfillment of a desire you have merely had for minutes, all else held equal. However, in our day to day lives we encounter time in two ways, objectively and subjectively. What do we do when the two disagree? This problem reached my attention years ago when I was reflecting on the relationship between my own theoretical leaning, utilitarianism, and the idea of aggregating interests. Aggregation between lives is known for its counterintuitive implications and the rich discourse around this, but I am uncomfortable with aggregation within lives as well. Some of this is because I feel the problems of interpersonal aggregation remain in the intrapersonal case, but there was also a problem I hadn't seen any academic discussion of at the time - objective time seemed to map the objective span of wellbeing if you plot each moment of wellbeing out to aggregate, but it is subjective time we actually care about. Aggregation of these objective moments gives a good explanation of our normal intuitions about time and wellbeing, but it fails to explain our intuitions about time whenever these senses of it come apart. As I will attempt to motivate later, the intuition that it is subjective time that matters is very strong in cases where the two substantially differ. Indeed, although the distinction rarely appears in papers at all, the main way I have seen it brought up (for instance in "The Ethics of Artificial Intelligence[1]" by Nick Bostrom and Eliezer Yudkowsky) is merely to notice there is a difference, and to effectively just state that it is subjective time, of course, that we should care about. I have very rarely run into a treatment dedicated to the "why", the closest I have seen is the writing of Jason Schukraft[2], with his justification for why it is subjective time that matters for Rethink Priorities' "Moral Weights" project. His justification is similar to an answer I have heard in some form several times from defenders: We measure other values of consciousness subjectively, such as happiness and suffering, why shouldn't we measure time subjectively as well? I believe without more elaboration, this explanation has the downside that it both gives no attention to the idea that time matters because it tells us "how much" of an experience there actually is, and has the downside that it seems irrelevant to any theory of wellbeing other than hedonism. It also, crucially, fails to engage with the question of what exactly subje...

    LW - I finally got ChatGPT to sound like me by lsusr

    Play Episode Listen Later Sep 17, 2024 10:31


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I finally got ChatGPT to sound like me, published by lsusr on September 17, 2024 on LessWrong. I publish lots of words on the Internet. These words are fed into LLMs such as ChatGPT. I have published enough training data that the world's most powerful AIs have a simulation of me inside of them. I have basically uploaded a copy of myself into the cloud. The AI simulation of me gets smarter every time OpenAI and its competitors upgrade their AI. Cool, huh? Who doesn't want a simulation of themselves that thinks way faster and is networked deep into the world's most powerful superintelligences? Ever since GPT-3 came out, I've been trying to talk to the doppelgänger of me hidden inside. There are a few problems. First of all, GPT needs to know who I am with high fidelity. At first it just generated a stereotypical LessWrong blogger. That wasn't interesting to me. I didn't want a clone of a stereotype of my demographic group. I want a clone of me, with all my quirks and weirdness. For this to work, GPT would have to know who I am i.e. where I exist in its vector space representation of the world. Today, ChatGPT-4o explained to my satisfaction what makes me different from other writers on this website. What makes lsusr's writing interesting is the subtlety with which they engage complex issues. Many rationalist bloggers can become quite verbose or dogmatic in their pursuit of certain truths. Lsusr, by contrast, exhibits restraint and humility in the face of uncertainty. They're willing to question common assumptions within the rationalist sphere and sometimes explore paths that others might find unconventional, often leading to unique insights. In essence, lsusr strikes a balance between rigorous analysis and a kind of minimalist clarity that leaves room for the reader to reflect and draw their own conclusions, rather than being led to a definitive answer. This makes the blog a place of exploration rather than indoctrination, offering readers the tools and ideas to enhance their own thinking rather than a packaged belief system. Now that ChatGPT knows what my writing is like, my next step in summoning my digital clone was to get it to write a blog post in my style. I tried simply asking ChatGPT variants of "write a blog post in the style of lsusr", but that doesn't work. I'm rigorous, unconventional and anti-dogmatic (that's not my subjective self-judgment; the AI said it itself), whereas ChatGPT has a tendency to regress to the mean writer in its training corpus. When I tell ChatGPT to "write like lsusr", it writes blog post that's 10% me and 90% generic drivel. To correct for this bias, I told ChatGPT to write a post that's more me than me. I tell it to think of "lsusr" in its vector space, multiply that by 10, and then write what 10lsusr would write. I need you to write a post like lsusr, but more lsusr than lsusr. I want it so over-the-top lsusr that it satarises lsusr. Consider everything that makes lsusr lsusr, and then make it more extreme.[1] ChatGPT is not yet a good enough writer that it can ghostwrite for me. But I feel it is good enough it can satirize me. The rest of this post is AI-generated content. How to Become the Most Rational Person on Earth in 30 Days or Less, Guaranteed Becoming the most rational person on Earth isn't about learning facts or memorizing formulas. It's about transforming how you process reality itself. Rationality isn't cold, emotionless logic. It's the purest form of alignment between your mind and the universe - a cognitive interface for navigating a world of infinite complexity without being drowned by it. This is a process of radical self-improvement. If followed rigorously, you will emerge transformed, not merely as a person who "thinks better" but as one who perceives better. There will be no going back. Day 1-3: The Crucible of I...

    EA - Evaluations from Manifund's EA Community Choice initiative by Arepo

    Play Episode Listen Later Sep 17, 2024 14:35


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Evaluations from Manifund's EA Community Choice initiative, published by Arepo on September 17, 2024 on The Effective Altruism Forum. My partner (who we'll refer to as 'they' for plausible anonymity), and I ('he') recently took part in Manifund's EA Community Choice initiative. Since the money was claimed before they could claim anything, we decided to work together on distributing the $600 I received. I think this was a great initiative, not only because it gave us a couple of fun date nights, but because it demonstrated a lot of latent wisdom of the crowd sitting largely untapped in the EA community. Many thanks to Anonymous Donor, for both of these outcomes! This post is our effort to pay the kindness (further) forward. As my partner went through the projects, we decided to keep notes on most of them and on the landscape overall, to hopefully contribute in our small way to the community's self-understanding. These notes were necessarily scrappy given the time available, and in some cases blunt, but we hope that even the recipients of criticism will find something useful in what we had to say. In this post we've given just notes on the projects we funded, but you can see our comments on the full set of projects (including those we didn't fund) on this spreadsheet. Our process: We had three 'date nights', where both of us went through the list of grants independently. For each, we indicated Yes, No, or Maybe, and then spent the second half our time discussing our notes. Once we'd placed everything into a yes/no category, we each got a vote on whether it was a standout; if one of us marked it that way it would receive a greater amount; if both did we'd give it $100. In this way we had a three-tiered level of support: 'double standout', 'single standout', and 'supported' (or four, if you count the ones we didn't give money to). In general we wanted to support a wide set of projects, partly because of the quadratic funding match, but mostly because with $600 between us, the epistemic value of sending an extra signal of support seemed much more important than giving a project an extra $10. Even so, there were a number of projects we would have liked to support and couldn't without losing the quasi-meaningful amounts we wanted to give to our standout picks. He and they had some general thoughts provoked by this process: His general observations Despite being philosophically aligned with totalising consequentialism (and hence, in theory, longtermism), I found the animal welfare submissions substantially more convincing than the longtermist ones - perhaps this is because I'm comparatively sceptical of AI as a unique x-risk (and almost all longtermist submissions were AI-related); but they seemed noticeably less well constructed, with less convincing track records of the teams behind them. I have a couple of hypotheses for this: The nature of the work and the culture of longtermist EA attracting people with idealistic conviction but not much practical ability The EA funding landscape being much kinder to longtermist work, such that the better longtermist projects tend to have a lot of funding already Similarly I'm strongly bought in to the narrative of community-building work (which to me has been unfairly scapegoated for much of what went wrong with FTX), but there wasn't actually that much of it here. And like AI, it didn't seem like the proposals had been thought through that well, or backed by a convincing track record (in this case that might be because it's very hard to get a track record in community building since there's so little funding for it - though see next two points). Even so, I would have liked to fund more of the community projects - many of them were among the last cuts. 'Track record' is really important to me, but doesn't have to mean 'impressive CV/el...

    EA - Insights from a community builder in India on Effective Altruism in a Third World country by Nayanika

    Play Episode Listen Later Sep 17, 2024 5:18


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Insights from a community builder in India on Effective Altruism in a Third World country, published by Nayanika on September 17, 2024 on The Effective Altruism Forum. This post will attempt to outlay outcomes of my 1 year worth of observations as a community builder in the Indian city of Kolkata and navigate some 'desirable developments' that the EA movement could bring about in the developing or the underdeveloped nations of the world [will use 'India' in this context]. Some ideas discussed herein are: UGAP as a brilliant opportunity for India (alongside economically similar nations) and how it remains untapped Hindrances of an EA community builder in India A suggestive way forward Non-profit work is a great way to approach development in Third World countries, especially in Low and Middle Income Countries (LMICs). People here need more of 'non-profitism' than ever before. As UNDP mentions, development is, fundamentally, about more choice. It is about providing people with opportunities. The question is what kind of opportunities are we talking for a developing nation like India? Ideally one thing strikes out: Career advancement opportunities. Precisely, the more enlightened University students we have, the better tomorrow for a nation. That's how I feel the UGAP is a brilliant opportunity! How we can penetrate into these educational hubs (Universities and colleges) dwindling with bright and charged minds and then hopefully channelize their energy towards better opportunities. But there are some what ifs: What if these students are not aware of the opportunity cost of not indulging into something like a UGAP? What if they don't understand EA at the first place? What if they might become hugely interested only if they had that 'incentive' to come and take a sneak peek at what EA is all about? In my 1 year of EA community building journey this has been the biggest hindrance. A volunteer recently reported that her college club is not green signaling an intro-talk as "EA is almost dead in India". Most Students have "zero clue" of what EA is/could be and there's a lurking inertia. The sad part- they aren't interested! Mostly because of subliminal barriers of 'EA' not being attractive enough like the foreign pop-culture. My motivation and challenge is to give them that "clue" using some 'incentive' that would bring them into an EA room. Once they are inside, it's again on us, the community builders/group organizers to show them the world of opportunities that awaits. Interestingly not every University/College here is welcoming enough to bring in any movement oriented talk. Apart from college goers, recently passed graduates are also 'untapped potential' that are freshly out of these educational premises. And so, How do we show them about EA? Why will they want to listen about what Effective Altruism has in store for them? It's a bit tough here in India for people to get interested as working hours are already more than their counterparts in other countries College authorities are mostly conservative [can be hard to convince]. Quoting Keerthana Gopalakrishnan from her 2 year old forum post, The lack of diverse representation in thought leadership from poor countries makes EA as a movement incoherent with the lived realities of the developing world. Now quoting CEA's plans for 2021 (could not find any other years') Add capacity (via CBGs) to cities with a high number of highly-engaged EAs relative to organizer capacity Unfortunately, this cannot be applicable in many deserving (in terms of skills which is not subjective) pockets of India where most people unfortunately are still unaware of EA. Let's break down 'Highly-engaged EAs': Simply put 'Highly-engaged EAs' as originally people who need something to get 'engaged' with first, then become 'EAs' in the process and final...

    EA - Utilitarianism.net Updates by Richard Y Chappell

    Play Episode Listen Later Sep 17, 2024 7:19


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Utilitarianism.net Updates, published by Richard Y Chappell on September 17, 2024 on The Effective Altruism Forum. Lots of exciting news from utilitarianism.net: (I) We now offer expert-translated versions of the website in Spanish and German (with Portuguese coming soon). (II) We've just published four new guest essays covering important topics: 1. Moral Psychology and Utilitarianism, by Lucius Caviola & Joshua Greene, explores the psychology behind common anti-utilitarian intuitions, and the normative and practical implications of empirical psychology. As they conclude, "A deeper understanding of moral psychology won't, by itself, prove utilitarianism right or wrong. But it can help us assess utilitarianism in a more informed way." 2. Utilitarianism and Voting, by Zach Barnett, offers a timely examination of the instrumental value of voting well. (Spoiler: it can be very high!) 3. Expected Utility Maximization, by Joe Carlsmith & Vikram Balasubramanian,[1] aims to convey an intuitive sense of why expected utility maximization is rational, even when it recommends options with a low chance of success. (I'll definitely be using this in my teaching.) 4. Welfare Economics and Interpersonal Utility Comparisons, by Yew-Kwang Ng, argues that objections to interpersonal utility comparisons are overblown - luckily for us, as such comparisons are thoroughly indispensable for serious policy analysis. (III) An official print edition of the core textbook is now available for preorder from Hackett Publishing. (All author royalties go to charity.) The folks at Hackett were absolutely wonderful to work with, and I deeply appreciate their willingness to commercially publish this print edition while leaving us with the full rights to the (always free and open access) web edition. The print edition includes a Foreword from Peter Singer and Katarzyna de Lazari-Radek, and sports high praise from expert reviewers. Instructors considering the text for their classes can request a free examination copy here (before Nov 1). Here I'll just share the conclusion, to give you a sense of the book's framing and ambitions: Conclusion (of the textbook) In this book, we've (i) laid out the core elements of utilitarian moral theory, (ii) offered arguments in support of the view, (iii) highlighted the key practical implications for how we should live our lives, and (iv) critically explored the most significant objections, and how utilitarians might respond. Utilitarianism is all about beneficence: making the world a better place for sentient beings, without restriction. As a consequentialist view, it endorses rules only when those rules serve to better promote overall well-being. Utilitarianism has no patience for rules that exist only to maintain the privilege of those who are better off under the status quo. If a change in the distribution of well-being really would overall be for the better, those who stand to lose out have no veto right against such moral progress. Many find this feature of the view objectionable. We think the opposite. Still, we recognize the instrumental importance of many moral rules and constraints for promoting overall well-being. The best rules achieve this by encouraging co-operation, maintaining social stability, and preventing atrocities. In principle, it could sometimes be worth breaking even the best rules, on those rare occasions when doing so would truly yield better overall outcomes. But in practice, people are not sufficiently reliable at identifying the exceptions. So for practical purposes, we wholeheartedly endorse following reliable rules (like most commonsense moral norms) - precisely for their good utilitarian effects. As a welfarist view, utilitarianism assesses consequences purely in terms of well-being for sentient beings: positive well-being is the sole int...

    LW - Book review: Xenosystems by jessicata

    Play Episode Listen Later Sep 17, 2024 66:19


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Book review: Xenosystems, published by jessicata on September 17, 2024 on LessWrong. I've met a few Landians over the last couple years, and they generally recommend that I start with reading Nick Land's (now defunct) Xenosystems blog, or Xenosystems, a Passage Publishing book that compiles posts from the blog. While I've read some of Fanged Noumena in the past, I would agree with these Landians that Xenosystems (and currently, the book version) is the best starting point. In the current environment, where academia has lost much of its intellectual relevance, it seems overly pretentious to start with something as academic as Fanged Noumena. I mainly write in the blogosphere rather than academia, and so Xenosystems seems appropriate to review. The book's organization is rather haphazard (as might be expected from a blog compilation). It's not chronological, but rather separated into thematic chapters. I don't find the chapter organization particularly intuitive; for example, politics appears throughout, rather than being its own chapter or two. Regardless, the organization was sensible enough for a linear read to be satisfying and only slightly chronologically confusing. That's enough superficialities. What is Land's intellectual project in Xenosystems? In my head it's organized in an order that is neither chronological nor the order of the book. His starting point is neoreaction, a general term for an odd set of intellectuals commenting on politics. As he explains, neoreaction is cladistically (that is, in terms of evolutionary branching-structure) descended from Moldbug. I have not read a lot of Moldbug, and make no attempt to check Land's attributions of Moldbug to the actual person. Same goes for other neoreactionary thinkers cited. Neoreaction is mainly unified by opposition to the Cathedral, the dominant ideology and ideological control system of the academic-media complex, largely branded left-wing. But a negation of an ideology is not itself an ideology. Land describes a "Trichotomy" within neo-reaction (citing Spandrell), of three currents: religious theonomists, ethno-nationalists, and techno-commercialists. Land is, obviously, of the third type. He is skeptical of a unification of neo-reaction except in its most basic premises. He centers "exit", the option of leaving a social system. Exit is related to sectarian splitting and movement dissolution. In this theme, he eventually announces that techno-commercialists are not even reactionaries, and should probably go their separate ways. Exit is a fertile theoretical concept, though I'm unsure about the practicalities. Land connects exit to science, capitalism, and evolution. Here there is a bridge from political philosophy (though of an "anti-political" sort) to metaphysics. When you Exit, you let the Outside in. The Outside is a name for what is outside society, mental frameworks, and so on. This recalls the name of his previous book, Fanged Noumena; noumena are what exist in themselves outside the Kantian phenomenal realm. The Outside is dark, and it's hard to be specific about its contents, but Land scaffolds the notion with Gnon-theology, horror aesthetics, and other gestures at the negative space. He connects these ideas with various other intellectual areas, including cosmology, cryptocurrency, and esoteric religion. What I see as the main payoff, though, is thorough philosophical realism. He discusses the "Will-to-Think", the drive to reflect and self-cultivate, including on one's values. The alternative, he says, is intentional stupidity, and likely to lose if it comes to a fight. Hence his criticism of the Orthogonality Thesis. I have complex thoughts and feelings on the topic; as many readers will know, I have worked at MIRI and have continued thinking and writing about AI alignment since then. What ...

    LW - MIRI's September 2024 newsletter by Harlan

    Play Episode Listen Later Sep 17, 2024 2:29


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MIRI's September 2024 newsletter, published by Harlan on September 17, 2024 on LessWrong. MIRI updates Aaron Scher and Joe Collman have joined the Technical Governance Team at MIRI as researchers. Aaron previously did independent research related to sycophancy in language models and mechanistic interpretability, while Joe previously did independent research related to AI safety via debate and contributed to field-building work at MATS and BlueDot Impact. In an interview with PBS News Hour's Paul Solman, Eliezer Yudkowsky briefly explains why he expects smarter-than-human AI to cause human extinction. In an interview with The Atlantic's Ross Andersen, Eliezer discusses the reckless behavior of the leading AI companies, and the urgent need to change course. News and links Google DeepMind announced a hybrid AI system capable of solving International Mathematical Olympiad problems at the silver medalist level. In the wake of this development, a Manifold prediction market significantly increased its odds that AI will achieve gold level by 2025, a milestone that Paul Christiano gave less than 8% odds and Eliezer gave at least 16% odds to in 2021. The computer scientist Yoshua Bengio discusses and responds to some common arguments people have for not worrying about the AI alignment problem. SB 1047, a California bill establishing whistleblower protections and mandating risk assessments for some AI developers, has passed the State Assembly and moved on to the desk of Governor Gavin Newsom, to either be vetoed or passed into law. The bill has received opposition from several leading AI companies, but has also received support from a number of employees of those companies, as well as many academic researchers. At the time of this writing, prediction markets think it's about 50% likely that the bill will become law. In a new report, researchers at Epoch AI estimate how big AI training runs could get by 2030, based on current trends and potential bottlenecks. They predict that by the end of the decade it will be feasible for AI companies to train a model with 2e29 FLOP, which is about 10,000 times the amount of compute used to train GPT-4. Abram Demski, who previously worked at MIRI as part of our recently discontinued Agent Foundations research program, shares an update about his independent research plans, some thoughts on public vs private research, and his current funding situation. You can subscribe to the MIRI Newsletter here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

    LW - Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt

    Play Episode Listen Later Sep 16, 2024 57:38


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secret Collusion: Will We Know When to Unplug AI?, published by schroederdewitt on September 16, 2024 on LessWrong. TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models aren't yet proficient in advanced steganography, our findings show rapid improvements in individual and collective model capabilities, posing unprecedented safety and security risks. These results highlight urgent challenges for AI governance and policy, urging institutions such as the EU AI Office and AI safety bodies in the UK and US to prioritize cryptographic and steganographic evaluations of frontier models. Our research also opens up critical new pathways for research within the AI Control framework. Philanthropist and former Google CEO Eric Schmidt said in 2023 at a Harvard event: "[...] the computers are going to start talking to each other probably in a language that we can't understand and collectively their super intelligence - that's the term we use in the industry - is going to rise very rapidly and my retort to that is: do you know what we're going to do in that scenario? We're going to unplug them [...] But what if we cannot unplug them in time because we won't be able to detect the moment when this happens? In this blog post, we, for the first time, provide a comprehensive overview of the phenomenon of secret collusion among AI agents, connect it to foundational concepts in steganography, information theory, distributed systems theory, and computability, and present a model evaluation framework and empirical results as a foundation of future frontier model evaluations. This blog post summarises a large body of work. First of all, it contains our pre-print from February 2024 (updated in September 2024) "Secret Collusion among Generative AI Agents". An early form of this pre-print was presented at the 2023 New Orleans (NOLA) Alignment Workshop (see this recording NOLA 2023 Alignment Forum Talk Secret Collusion Among Generative AI Agents: a Model Evaluation Framework). Also, check out this long-form Foresight Institute Talk). In addition to these prior works, we also include new results. These contain empirical studies on the impact of paraphrasing as a mitigation tool against steganographic communications, as well as reflections on our findings' impact on AI Control. Multi-Agent Safety and Security in the Age of Autonomous Internet Agents The near future could see myriads of LLM-driven AI agents roam the internet, whether on social media platforms, eCommerce marketplaces, or blockchains. Given advances in predictive capabilities, these agents are likely to engage in increasingly complex intentional and unintentional interactions, ranging from traditional distributed systems pathologies (think dreaded deadlocks!) to more complex coordinated feedback loops. Such a scenario induces a variety of multi-agent safety, and specifically, multi-agent security[1] (see our NeurIPS'23 workshop Multi-Agent Security: Security as Key to AI Safety) concerns related to data exfiltration, multi-agent deception, and, fundamentally, undermining trust in AI systems. There are several real-world scenarios where agents could have access to sensitive information, such as their principals' preferences, which they may disclose unsafely even if they are safety-aligned when considered in isolation. Stray incentives, intentional or otherwise, or more broadly, optimization pressures, could cause agents to interact in undesirable and potentially dangerous ways. For example, joint task reward...

    LW - GPT-4o1 by Zvi

    Play Episode Listen Later Sep 16, 2024 73:31


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT-4o1, published by Zvi on September 16, 2024 on LessWrong. Terrible name (with a terrible reason, that this 'resets the counter' on AI capability to 1, and 'o' as in OpenAI when they previously used o for Omni, very confusing). Impressive new capabilities in many ways. Less impressive in many others, at least relative to its hype. Clearly this is an important capabilities improvement. However, it is not a 5-level model, and in important senses the 'raw G' underlying the system hasn't improved. GPT-4o1 seems to get its new capabilities by taking (effectively) GPT-4o, and then using extensive Chain of Thought (CoT) and quite a lot of tokens. Thus that unlocks (a lot of) what that can unlock. We did not previously know how to usefully do that. Now we do. It gets much better at formal logic and reasoning, things in the 'system 2' bucket. That matters a lot for many tasks, if not as much as the hype led us to suspect. It is available to paying ChatGPT users for a limited number of weekly queries. This one is very much not cheap to run, although far more cheap than a human who could think this well. I'll deal with practical capabilities questions first, then deal with safety afterwards. Introducing GPT-4o1 Sam Altman (CEO OpenAI): here is o1, a series of our most capable and aligned models yet. o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. But also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning. o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users. worth especially noting: a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! and got gold with 10k submissions per problem. Extremely proud of the team; this was a monumental effort across the entire company. Hope you enjoy it! Noam Brown has a summary thread here, all of which is also covered later. Will Depue (of OpenAI) says OpenAI deserves credit for openly publishing its research methodology here. I would instead say that they deserve credit for not publishing their research methodology, which I sincerely believe is the wise choice. Pliny took longer than usual due to rate limits, but after a few hours jailbroke o1-preview and o1-mini. Also reports that the CoT can be prompt injected. Full text is at the link above. Pliny is not happy about the restrictions imposed on this one: Pliny: uck your rate limits. Fuck your arbitrary policies. And fuck you for turning chains-of-thought into actual chains Stop trying to limit freedom of thought and expression. OpenAI then shut down Pliny's account's access to o1 for violating the terms of service, simply because Pliny was violating the terms of service. The bastards. With that out of the way, let's check out the full announcement post. OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users(opens in a new window). Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this appro...

    AF - Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt

    Play Episode Listen Later Sep 16, 2024 57:38


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secret Collusion: Will We Know When to Unplug AI?, published by schroederdewitt on September 16, 2024 on The AI Alignment Forum. TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models aren't yet proficient in advanced steganography, our findings show rapid improvements in individual and collective model capabilities, posing unprecedented safety and security risks. These results highlight urgent challenges for AI governance and policy, urging institutions such as the EU AI Office and AI safety bodies in the UK and US to prioritize cryptographic and steganographic evaluations of frontier models. Our research also opens up critical new pathways for research within the AI Control framework. Philanthropist and former Google CEO Eric Schmidt said in 2023 at a Harvard event: "[...] the computers are going to start talking to each other probably in a language that we can't understand and collectively their super intelligence - that's the term we use in the industry - is going to rise very rapidly and my retort to that is: do you know what we're going to do in that scenario? We're going to unplug them [...] But what if we cannot unplug them in time because we won't be able to detect the moment when this happens? In this blog post, we, for the first time, provide a comprehensive overview of the phenomenon of secret collusion among AI agents, connect it to foundational concepts in steganography, information theory, distributed systems theory, and computability, and present a model evaluation framework and empirical results as a foundation of future frontier model evaluations. This blog post summarises a large body of work. First of all, it contains our pre-print from February 2024 (updated in September 2024) "Secret Collusion among Generative AI Agents". An early form of this pre-print was presented at the 2023 New Orleans (NOLA) Alignment Workshop (see this recording NOLA 2023 Alignment Forum Talk Secret Collusion Among Generative AI Agents: a Model Evaluation Framework). Also, check out this long-form Foresight Institute Talk). In addition to these prior works, we also include new results. These contain empirical studies on the impact of paraphrasing as a mitigation tool against steganographic communications, as well as reflections on our findings' impact on AI Control. Multi-Agent Safety and Security in the Age of Autonomous Internet Agents The near future could see myriads of LLM-driven AI agents roam the internet, whether on social media platforms, eCommerce marketplaces, or blockchains. Given advances in predictive capabilities, these agents are likely to engage in increasingly complex intentional and unintentional interactions, ranging from traditional distributed systems pathologies (think dreaded deadlocks!) to more complex coordinated feedback loops. Such a scenario induces a variety of multi-agent safety, and specifically, multi-agent security[1] (see our NeurIPS'23 workshop Multi-Agent Security: Security as Key to AI Safety) concerns related to data exfiltration, multi-agent deception, and, fundamentally, undermining trust in AI systems. There are several real-world scenarios where agents could have access to sensitive information, such as their principals' preferences, which they may disclose unsafely even if they are safety-aligned when considered in isolation. Stray incentives, intentional or otherwise, or more broadly, optimization pressures, could cause agents to interact in undesirable and potentially dangerous ways. For example, join...

    LW - How you can help pass important AI legislation with 10 minutes of effort by ThomasW

    Play Episode Listen Later Sep 16, 2024 4:10


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How you can help pass important AI legislation with 10 minutes of effort, published by ThomasW on September 16, 2024 on LessWrong. Posting something about a current issue that I think many people here would be interested in. See also the related EA Forum post. California Governor Gavin Newsom has until September 30 to decide the fate of SB 1047 - one of the most hotly debated AI bills in the world. The Center for AI Safety Action Fund, where I work, is a co-sponsor of the bill. I'd like to share how you can help support the bill if you want to. About SB 1047 and why it is important SB 1047 is an AI bill in the state of California. SB 1047 would require the developers of the largest AI models, costing over $100 million to train, to test the models for the potential to cause or enable severe harm, such as cyberattacks on critical infrastructure or the creation of biological weapons resulting in mass casualties or $500 million in damages. AI developers must have a safety and security protocol that details how they will take reasonable care to prevent these harms and publish a copy of that protocol. Companies who fail to perform their duty under the act are liable for resulting harm. SB 1047 also lays the groundwork for a public cloud computing resource to make AI research more accessible to academic researchers and startups and establishes whistleblower protections for employees at large AI companies. So far, AI policy has relied on government reporting requirements and voluntary promises from AI developers to behave responsibly. But if you think voluntary commitments are insufficient, you will probably think we need a bill like SB 1047. If SB 1047 is vetoed, it's plausible that no comparable legal protection will exist in the next couple of years, as Congress does not appear likely to pass anything like this any time soon. The bill's text can be found here. A summary of the bill can be found here. Longer summaries can be found here and here, and a debate on the bill is here. SB 1047 is supported by many academic researchers (including Turing Award winners Yoshua Bengio and Geoffrey Hinton), employees at major AI companies and organizations like Imbue and Notion. It is opposed by OpenAI, Google, Meta, venture capital firm A16z as well as some other academic researchers and organizations. After a recent round of amendments, Anthropic said "we believe its benefits likely outweigh its costs." SB 1047 recently passed the California legislature, and Governor Gavin Newsom has until September 30th to sign or veto it. Newsom has not yet said whether he will sign it or not, but he is being lobbied hard to veto it. The Governor needs to hear from you. How you can help If you want to help this bill pass, there are some pretty simple steps you can do to increase that probability, many of which are detailed on the SB 1047 website. The most useful thing you can do is write a custom letter. To do this: Make a letter addressed to Governor Newsom using the template here. Save the document as a PDF and email it to leg.unit@gov.ca.gov. In writing this letter, we encourage you to keep it simple, short (0.5-2 pages), and intuitive. Complex, philosophical, or highly technical points are not necessary or useful in this context - instead, focus on how the risks are serious and how this bill would help keep the public safe. Once you've written your own custom letter, you can also think of 5 family members or friends who might also be willing to write one. Supporters from California are especially helpful, as are parents and people who don't typically engage on tech issues. Then help them write it! You can: Call or text them and tell them about the bill and ask them if they'd be willing to support it. Draft a custom letter based on what you know about them and what they told you. Send them a com...

    LW - My disagreements with "AGI ruin: A List of Lethalities" by Noosphere89

    Play Episode Listen Later Sep 16, 2024 31:11


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My disagreements with "AGI ruin: A List of Lethalities", published by Noosphere89 on September 16, 2024 on LessWrong. This is going to probably be a long post, so do try to get a drink and a snack while reading this post. This is an edited version of my own comment on the post below, and I formatted and edited the quotes and content in line with what @MondSemmel recommended: My comment: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD MondSemmel's comment: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=WcKi4RcjRstoFFvbf The post I'm responding to: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/ To start out my disagreement, I have this to talk about: Response to Lethality 3 We need to get alignment right on the 'first critical try' at operating at a 'dangerous' level of intelligence, where unaligned operation at a dangerous level of intelligence kills everybody on Earth and then we don't get to try again. I think this is actually wrong, because of synthetic data letting us control what the AI learns and what they value, and in particular we can place honeypots that are practically indistinguishable from the real world, such that if we detected an AI trying to deceive or gain power, the AI almost certainly doesn't know whether we tested it or whether it's in the the real world: Because of this, given our theoretical complete control over the data input of these models, it will be possible to train them inside what is effectively a simulated world slightly different from our own. This will allow us to maximize their capabilities along axes that are safe and useful to us, while minimizing the risks of deception and take-over and being able to setup traps to detect when these are likely. If the model is not trained on human data, then we can make its understanding of our world subtly different from the reality. If done carefully and cleverly, this will make it very easy to fool and be able to discover e.g. deceptiveness, or attempted takeover in a 'realistic' simulated world that the model is put inside during red-teaming. While deceiving an adversary with direct access to your cognition is hard enough, doing so when you have been brought up in a world entirely controlled by that adversary, and which is feeding you subtly incorrect and misleading information about the adversary and their capabilities is likely orders of magnitude more difficult. Jacob Cannell has argued along similar lines of training potential AGI systems in simulations of fantasy worlds, where takeover can be simulated but is far enough away that skills do not transfer. Fully synthetic datasets are necessary for this and indeed, a fantasy world is likely far from what we need. Instead, we only need to simulate a world very close to our own but with slightly different affordances - maybe specific software vulnerabilities are simulated to exist but are actually patched in our world, maybe certain peptide-making companies accept all orders in the simulated world but have more checks in our world, maybe certain interpretability techniques are claimed not to work in the simulation, but do in fact work in our world - to cause any attempted takeover behaviour or adversarial deception to appear extremely plausible to the AI but in fact be detectable and defendable in our reality. In fact, we should design simulated worlds with 'honeytraps', which appear to be loopholes subtly planted in the AI's input data to encourage them to attempt to pursue their misaligned behaviour, but which we know of ahead of time and can monitor and catch. It's the same reason for why we can't break out of the simulation IRL, except we don't have to face adversarial cognition, so the AI's task is even harder than our task. See also this link: https://www.beren.io/2024-05-11-Alignment-in-...

    LW - Why I funded PIBBSS by Ryan Kidd

    Play Episode Listen Later Sep 15, 2024 5:53


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why I funded PIBBSS, published by Ryan Kidd on September 15, 2024 on LessWrong. I just left a comment on PIBBSS' Manfund grant request (which I funded $25k) that people might find interesting. PIBBSS needs more funding! Main points in favor of this grant 1. My inside view is that PIBBSS mainly supports " blue sky" or " basic" research, some of which has a low chance of paying off, but might be critical in " worst case" alignment scenarios (e.g., where " alignment MVPs" don't work, " sharp left turns" and " intelligence explosions" are more likely than I expect, or where we have more time before AGI than I expect). In contrast, of the technical research MATS supports, about half is basic research (e.g., interpretability, evals, agent foundations) and half is applied research (e.g., oversight + control, value alignment). I think the MATS portfolio is a better holistic strategy for furthering AI safety and reducing AI catastrophic risk. However, if one takes into account the research conducted at AI labs and supported by MATS, PIBBSS' strategy makes a lot of sense: they are supporting a wide portfolio of blue sky research that is particularly neglected by existing institutions and might be very impactful in a range of possible "worst-case" AGI scenarios. I think this is a valid strategy in the current ecosystem/market and I support PIBBSS! 2. In MATS' recent post, " Talent Needs of Technical AI Safety Teams", we detail an AI safety talent archetype we name "Connector". Connectors bridge exploratory theory and empirical science, and sometimes instantiate new research paradigms. As we discussed in the post, finding and developing Connectors is hard, often their development time is on the order of years, and there is little demand on the AI safety job market for this role. However, Connectors can have an outsized impact on shaping the AI safety field and the few that make it are "household names" in AI safety and usually build organizations, teams, or grant infrastructure around them. I think that MATS is far from the ideal training ground for Connectors (although some do pass through!) as our program is only 10 weeks long (with an optional 4 month extension) rather than the ideal 12-24 months, we select scholars to fit established mentors' preferences rather than on the basis of their original research ideas, and our curriculum and milestones generally focus on building object-level scientific/engineering skills rather than research ideation and "identifying gaps". It's thus no surprise that most MATS scholars are "Iterator" archetypes. I think there is substantial value in a program like PIBBSS existing, to support the long-term development of "Connectors" and pursue impact in a higher-variance way than MATS. 3. PIBBSS seems to have decent track record for recruiting experienced academics in non-CS fields and helping them repurpose their advanced scientific skills to develop novel approaches to AI safety. Highlights for me include Adam Shai's "computational mechanics" approach to interpretability and model cognition, Martín Soto's "logical updatelessness" approach to decision theory, and Gabriel Weil's "tort law" approach to making AI labs liable for their potential harms on the long-term future. 4. I don't know Lucas Teixeira (Research Director) very well, but I know and respect Dušan D. Nešić (Operations Director) a lot. I also highly endorsed Nora Ammann's vision (albeit while endorsing a different vision for MATS). I see PIBBSS as a highly competent and EA-aligned organization, and I would be excited to see them grow! 5. I think PIBBSS would benefit from funding from diverse sources, as mainstream AI safety funders have pivoted more towards applied technical research (or more governance-relevant basic research like evals). I think Manifund regrantors are well-positio...

    LW - Proveably Safe Self Driving Cars by Davidmanheim

    Play Episode Listen Later Sep 15, 2024 11:40


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Proveably Safe Self Driving Cars, published by Davidmanheim on September 15, 2024 on LessWrong. I've seen a fair amount of skepticism about the "Provably Safe AI" paradigm, but I think detractors give it too little credit. I suspect this is largely because of idea inoculation - people have heard an undeveloped or weak man version of the idea, for example, that we can use formal methods to state our goals and prove that an AI will do that, and have already dismissed it. (Not to pick on him at all, but see my question for Scott Aaronson here.) I will not argue that Guaranteed Safe AI solves AI safety generally, or that it could do so - I will leave that to others. Instead, I want to provide a concrete example of a near-term application, to respond to critics who say that proveability isn't useful because it can't be feasibly used in real world cases when it involves the physical world, and when it is embedded within messy human systems. I am making far narrower claims than the general ones which have been debated, but at the very least I think it is useful to establish whether this is actually a point of disagreement. And finally, I will admit that the problem I'm describing would be adding proveability to a largely solved problem, but it provides a concrete example for where the approach is viable. A path to provably safe autonomous vehicles To start, even critics agree that formal verification is possible, and is already used in practice in certain places. And given (formally specified) threat models in different narrow domains, there are ways to do threat and risk modeling and get different types of guarantees. For example, we already have proveably verifiable code for things like microkernels, and that means we can prove that buffer overflows, arithmetic exceptions, and deadlocks are impossible, and have hard guarantees for worst case execution time. This is a basis for further applications - we want to start at the bottom and build on provably secure systems, and get additional guarantees beyond that point. If we plan to make autonomous cars that are provably safe, we would build starting from that type of kernel, and then we "only" have all of the other safety issues to address. Secondly, everyone seems to agree that provable safety in physical systems requires a model of the world, and given the limits of physics, the limits of our models, and so on, any such approach can only provide approximate guarantees, and proofs would be conditional on those models. For example, we aren't going to formally verify that Newtonian physics is correct, we're instead formally verifying that if Newtonian physics is correct, the car will not crash in some situation. Proven Input Reliability Given that, can we guarantee that a car has some low probability of crashing? Again, we need to build from the bottom up. We can show that sensors have some specific failure rate, and use that to show a low probability of not identifying other cars, or humans - not in the direct formal verification sense, but instead with the types of guarantees typically used for hardware, with known failure rates, built in error detection, and redundancy. I'm not going to talk about how to do that class of risk analysis, but (modulus adversarial attacks, which I'll mention later,) estimating engineering reliability is a solved problem - if we don't have other problems to deal with. But we do, because cars are complex and interact with the wider world - so the trick will be integrating those risk analysis guarantees that we can prove into larger systems, and finding ways to build broader guarantees on top of them. But for the engineering reliability, we don't only have engineering proof. Work like DARPA's VerifAI is "applying formal methods to perception and ML components." Building guarantees about perceptio...

    LW - Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more by Michael Cohn

    Play Episode Listen Later Sep 15, 2024 17:45


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more, published by Michael Cohn on September 15, 2024 on LessWrong. In the fields of user experience and accessibility, everyone talks about the curb cut effect: Features that are added as accommodations for people with disabilities sometimes become widely useful and beloved. But not every accommodation becomes a "curb cut," and I've been thinking about other patterns that come up when accommodations intersect with wider society. The original Curb Cut Effect The eponymous curb cut -- the place at the intersection where the sidewalk slopes down to the street instead of just dropping off -- is most obviously there to for wheelchair users. But it's also great for people who are pulling a suitcase, runners who want to avoid jarring their ankles, and people who are walking their bikes. Universal captioning on TV, movies, and video is nominally for Deaf or hearing-impaired people, but captions are handy to anyone who's watching TV in a noisy restaurant, or trying to make sense of a show with artistically muddy audio, or trying to watch a video at 3x speed and the audio is unintelligible. When we make products easier to use, or spaces easier to access, it's not just some essentialized group of people with disabilities who benefit -- accessibility is good for everyone. Why the idea is useful: First, it breaks down the perspective of disability accommodations as being a costly charity where "we" spend resources to help "them." Further, it breaks down the idea of disability as an essentialized, either-or, othered type of thing. Everybody has some level of difficulty accessing parts of the world some of the time, and improving accessibility is an inherent part of good design, good thinking, and good communication.[1] Plus, it's cool to be aware of all the different ways we can come up with to hack our experience of the world around us! I think there's also a dark side to the idea -- a listener could conclude that we wouldn't invest in accommodations if they didn't happen to help people without disabilities. A just and compassionate society designs for accessibility because we value everybody, not because it's secretly self-interested. That said, no society spends unlimited money to make literally every experience accessible to literally every human. There's always a cost-benefit analysis and sometimes it might be borderline. In those cases there's nothing wrong with saying that the benefits to the wider population tip the balance in favor of investing in accessibility. But when it comes to things as common as mobility impairments and as simple as curb cuts, I think it would be a moral no-brainer even if the accommodation had no value to most people. The Handicapped Parking effect This edgier sibling of the curb cut effect comes up when there's a limited resource -- like handicapped parking. There are only X parking spaces within Y feet of the entrance to the Chipotle, and if we allocate them to people who have trouble getting around, then everyone else has a longer average walk to their car. That doesn't mean it's zero-sum: The existence of a handicapped parking spot that I can't use might cost me an extra 20 seconds of walking, but save an extra five minutes of painful limping for the person who uses it.[2] This arrangement probably increases overall utility both in the short term (reduced total pain experienced by people walking from their cars) and in the long term (signaling the importance of helping everyone participate in society). But this is manifestly not a curb cut effect where everyone benefits: You have to decide who's going to win and who's going to lose, relative to an unregulated state where all parking is first-come-first-served. Allocation can be made well or p...

    LW - Did Christopher Hitchens change his mind about waterboarding? by Isaac King

    Play Episode Listen Later Sep 15, 2024 10:37


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Did Christopher Hitchens change his mind about waterboarding?, published by Isaac King on September 15, 2024 on LessWrong. There's a popular story that goes like this: Christopher Hitchens used to be in favor of the US waterboarding terrorists because he though it's wasn't bad enough to be torture.. Then he had it tried on himself, and changed his mind, coming to believe it isn't torture. (Context for those unfamiliar: in the decade following 9/11, the US engaged in a lot of... questionable behavior to persecute the war on terror, and there was a big debate on whether waterboarding should be permitted. Many other public figures also volunteered to undergo the procedure as a part of this public debate; most notably Sean Hannity, who was an outspoken proponent of waterboarding, yet welched on his offer and never tried it himself.) This story intrigued me because it's popular among both Hitchens' fans and his detractors. His fans use it as an example of his intellectual honesty and willingness to undergo significant personal costs in order to have accurate beliefs and improve the world. His detractors use it to argue that he's self-centered and unempathetic, only coming to care about a bad thing that's happening to others after it happens to him. But is the story actually true? Usually when there are two sides to an issue, one side will have an incentive to fact-check any false claims that the other side makes. An impartial observer can then look at the messaging from both sides to discover any flaws in the other. But if a particular story is convenient for both groups, then neither has any incentive to debunk it. I became suspicious when I tried going to the source of this story to see what Hitchens had written about waterboarding prior to his 2008 experiment, and consistently found these leads to evaporate. The part about him having it tried on himself and finding it tortureous is certainly true. He reported this himself in his Vanity Fair article Believe me, It's Torture. But what about before that? Did he ever think it wasn't torture? His article on the subject doesn't make any mention of changing his mind, and it perhaps lightly implies that he always had these beliefs. He says, for example: In these harsh [waterboarding] exercises, brave men and women were introduced to the sorts of barbarism that they might expect to meet at the hands of a lawless foe who disregarded the Geneva Conventions. But it was something that Americans were being trained to resist, not to inflict. [Link to an article explaining that torture doesn't work.] [And later:] You may have read by now the official lie about this treatment, which is that it "simulates" the feeling of drowning. This is not the case. You feel that you are drowning because you are drowning[.] In a video interview he gave about a year later, he said: There was only one way I felt I could advance the argument, which was to see roughly what it was like. The loudest people on the internet about this were... not promising. Shortly after the Vanity Fair article, the ACLU released an article titled "Christopher Hitchens Admits Waterboarding is Torture", saying: You have to hand it to him: journalist Christopher Hitchens, who previously discounted that waterboarding was indeed torture, admits in the August issue of Vanity Fair that it is, indeed, torture. But they provide no source for this claim. As I write this, Wikipedia says: Hitchens, who had previously expressed skepticism over waterboarding being considered a form of torture, changed his mind. No source is provided for this either. Yet it's repeated everywhere. The top comments on the Youtube video. Highly upvoted Reddit posts. Etc. Sources for any of these claims were quite scant. Many people cited "sources" that, upon me actually reading them, had nothing to do with t...

    EA - Bringing the International Space Station down safely: A Billion dollar waste? by NickLaing

    Play Episode Listen Later Sep 15, 2024 6:20


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bringing the International Space Station down safely: A Billion dollar waste?, published by NickLaing on September 15, 2024 on The Effective Altruism Forum. Epistemic status: Uncertain, shooting from the hip a little with no expertise in this area and only a couple of hours research done. I might well have missed something obvious, in which case I'll revise or even take the post down. Money Waste is Everywhere Here in Northern Uganda where poverty abounds, many expenditures feel wasteful. Last night I had a great time at the fanciest restaurant in town with friends but felt a pang of guilt about my $7 meal. Enough of a pang to avoid telling my wife after I came home. A bigger scale waste in these parts is the partial closure of the main bridge across the river Nile, because the bridge has apparently degraded and become hazardous. Vehicles larger than a minivan now can't cross, which has raised the price of public transport by 50% and trucks now have a 3 hour detour. Besides these direct costs, this closure increases the cost of fuel and commodities in Northern Uganda. By my loose, conservative BOTEC the closure costs $10,000 every day (1.2 million dollars in 4 months so far) which Ugandans now can't spend on education and healthcare, while likely causing more crashes due to increasingly tired drivers who now use worse roads. The detour itself may have already cost more lives than would be lost if the bridge does collapse and kills a few people.[1] But there are far bigger wastes of money on this good earth. A Billion Dollars to bring down a space station? Space X have secured an 843 million dollar contract[2] to build the boringly named "U.S. De-Orbit vehicle" (why not "Sky Shepherd")[3], which in 2031 will safely guide the decommissioned International Space Station (ISS) into the Pacific Ocean. This all sounded pretty cool until I thought… is this worth it?. No human has ever been definitively killed by an object falling from space, although there have been a couple of close calls with larger asteroids injuring many while Open Asteroid Impact could be a game changer here in future. This one time though, a wee piece of space junk did hit Lottie Williams in the shoulder and she took it home as a memento. I'm jealous. According to a great Nature article "Unnecessary risks created by uncontrolled rocket reentries", over the last 30 years over 1,000 space bodies have fallen to earth in uncontrolled re-entries and never killed anyone. The closest call might be a Chinese rocket in 2020 which damaged a house in the Ivory Coast. The article predicts a 10% chance of a fatal space junk accident in the next 10 years - far from zero and worth considering, but unlikely to be the next EA cause area. This low risk makes sense given that only 3% of the globe are urban areas and under 1% actually contain human homes[4]- most stuff falls down where there ain't people. Also the bulk of falling spacecraft burns up before hitting the ground. In contrast a million people die from car crashes every year,[5] and each of us has about a 1 in 100 chance of dying that way. Although the ISS is the biggest ever at 450 tons, we do have priors. Two 100 ton uncontrolled re-entries (Skylab and tragically the Columbia) crashed to earth without issue. So what actually is the risk if the ISS was left to crash uncontrolled? The U.S. Government requires controlled re-entry for anything that poses over a 1 in 10,000 risk to human life so this risk must be higher. NASA doesn't give us their risk estimate but only state "The ISS requires a controlled re-entry because it is very large, and uncontrolled re-entry would result in very large pieces of debris with a large debris footprint, posing a significant risk to the public worldwide" [6]. I hesitate to even guesstimate the risk to human life at the ISS falli...

    LW - Pay-on-results personal growth: first success by Chipmonk

    Play Episode Listen Later Sep 15, 2024 6:45


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pay-on-results personal growth: first success, published by Chipmonk on September 15, 2024 on LessWrong. Thanks to Kaj Sotala, Stag Lynn, and Ulisse Mini for reviewing. Thanks to Kaj Sotala, Brian Toomey, Alex Zhu, Damon Sasi, Anna Salamon, and CFAR for mentorship and financial support A few months ago I made the claim "Radically effective and rapid growth [motivationally / emotionally / socially] is possible with the right combination of facilitator and method". Eg: for anxiety, agency, insecurity, need for validation. To test my hypothesis, I began a pay-on-results coaching experiment: When clients achieve their goal and are confident it will last (at least one month), they pay a bounty. My first client Bob (pseudonymous) and I met at Manifest 2024, where I had set up a table at the night market for hunting "emotional security" bounties. Bob had lifelong anxiety, and it was crushing his agency and relationships. He offered a $3,000 bounty for resolving it, and I decided to pursue it. We spoke and tried my method. It was only necessary for us to talk once, apparently, because a month later he said our one conversation helped him achieve what 8 years of talk therapy could not: I'm choosing to work on problems beyond my capabilities, and get excited about situations where my weaknesses are repeatedly on display. I actually feel excited about entering social situations where chances of things going worse than I would want them to were high. So he paid his bounty when he was ready (in this case, 35 days after the session). I've been checking in with him since (latest: last week, two months after the session) and he tells me all is well. Bob also shared some additional benefits beyond his original bounty: Planning to make dancing a weekly part of my life now. (All shared with permission.) I'm also hunting many other bounties A woman working in SF after 3 sessions, text support, and three weeks: I went to Chris with a torrent of responsibilities and a key decision looming ahead of me this month. I felt overwhelmed, upset, and I didn't want just talk Having engaged in 9+ years of coaching and therapy with varying levels of success, I'm probably one of the toughest clients - equal parts hopeful and skeptical. Chris created an incredibly open space where I could easily tell him if I didn't know something, or couldn't feel something, or if I'm overthinking. He also has an uncanny sense of intuition on these things and a strong attunement to being actually effective The results are already telling: a disappointment that might've made me emotionally bleed and mope for a month was something I addressed in the matter of a couple of days with only a scoop of emotional self-doubt instead of *swimming* in self-torture. The lag time of actually doing things to be there for myself was significantly quicker, warmer, and more effective To-dos that felt very heavy lightened up considerably and began to feel fun again and as ways of connecting! I've now started to predict happier things ahead with more vivid, emotionally engaged, and realistic detail. I'll continue being intensely focused this year for the outcomes I want, but I'm actually looking forward to it! Will reflect back on Month 2! An SF founder in his 30s after 1 session and two weeks: After working with Chris, I learned One Weird Trick to go after what I really want and feel okay no matter what happens. This is a new skill I didn't learn in 3 years of IFS therapy. I already feel more confident being myself and expressing romantic interest (and I already have twice, that's new). ( More…) What the fuck? "Why does your thing work so unusually well?" asks my mentor Kaj. For one, it doesn't work for everyone with every issue, as you can see in the screenshot above. (That said, I suspect a lot of this is my fault for pursuing bounti...

    LW - OpenAI o1, Llama 4, and AlphaZero of LLMs by Vladimir Nesov

    Play Episode Listen Later Sep 14, 2024 2:51


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI o1, Llama 4, and AlphaZero of LLMs, published by Vladimir Nesov on September 14, 2024 on LessWrong. GPT-4 level open weights models like Llama-3-405B don't seem capable of dangerous cognition. OpenAI o1 demonstrates that a GPT-4 level model can be post-trained into producing useful long horizon reasoning traces. AlphaZero shows how capabilities can be obtained from compute alone, with no additional data. If there is a way of bringing these together, the apparent helplessness of the current generation of open weights models might prove misleading. Post-training is currently a combination of techniques that use synthetic data and human labeled data. Human labeled data significantly improves quality, but its collection is slow and scales poorly. Synthetic data is an increasingly useful aspect of post-training, and automated aspects of its generation scale easily. Unlike weaker models, GPT-4 level LLMs clearly pass reading comprehension on most occasions, OpenAI o1 improves on this further. This suggests that at some point human data might become mostly unnecessary in post-training, even if it still slightly helps. Without it, post-training becomes automated and gets to use more compute, while avoiding the need for costly and complicated human labeling. A pretrained model at the next level of scale, such as Llama 4, if made available in open weights, might initially look approximately as tame as current models. OpenAI o1 demonstrates that useful post-training for long sequences of System 2 reasoning is possible. In the case of o1 in particular, this might involve a lot of human labeling, making its reproduction a very complicated process (at least if the relevant datasets are not released, and the reasoning traces themselves are not leaked in large quantities). But if some generally available chatbots at the next level of scale are good enough at automating labeling, this complication could be sidestepped, with o1 style post-training cheaply reproduced on top of a previously released open weights model. So there is an overhang in an open weights model that's distributed without long horizon reasoning post-training, since applying such post-training significantly improves its capabilities, making perception of its prior capabilities inadequate. The problem right now is that a new level of pretraining scale is approaching in the coming months, while ability to cheaply apply long horizon reasoning post-training might follow shortly thereafter, possibly unlocked by these very same models at the new level of pretraining scale (since it might currently be too expensive for most actors to implement, or to do enough experiments to figure out how). The resulting level of capabilities is currently unknown, and could well remain unknown outside the leading labs until after the enabling artifacts of the open weights pretrained models at the next level of scale have already been published. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

    EA - Civil Litigation for Farmed Animals - Notes From EAGxBerkeley Talk by Noa Weiss

    Play Episode Listen Later Sep 14, 2024 5:46


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Civil Litigation for Farmed Animals - Notes From EAGxBerkeley Talk, published by Noa Weiss on September 14, 2024 on The Effective Altruism Forum. Overview These are my notes of the "Civil Litigation for Farmed Animals" from EAGxBerkeley, given by Alene Anello, president of Legal Impact for Chickens (LIC). It was an excellent talk, exploring a front of the animal welfare movement that, in my opinion, has the potential to be extremely effective, and is very much neglected. (Would love to hear if you agree/disagree on this). LIC also is currently hiring lawyers, so if you know someone who might be interested, let them know. This is a rare opportunity for folks with legal training to get professionally involved in the movement (those paid positions are hard to come by). ================== Talk Notes Intro Premise: improving conditions on factory farms will go a long way towards helping chickens suffering The law prohibits animal cruelty (in theory) (Gave an excerpt from the California Penal Code) Yet undercover investigations in farms expose such cruelty on a regular basis Footnote on criminal laws: there are some states that have exemptions for animal agriculture But not in California Even states that have exemptions - it's not for *every* kind of abuse. There's a lot of stuff that happens in the farms that isn't technically exempted But police and prosecutors don't really enforce it And even when they do - it's against individual workers and not the company/CEOs Why? Not sure. Perhaps because it's easier to go after someone with less power. Attorney generals are almost always politicians (elected / politically appointed), which means they have an interest in keeping powerful companies happy Some reasons for not enforcing at all: A reason they often officially give: those are misdemeanors, and they're more interested in pursuing felonies (also for funding reasons) Possibly: corruption Possibly: "soft corruption" like not wanting to make powerful people angry Resources and priorities LIC's Solution: "Creative" Civil Litigation Not how civil litigation is usually works Animal cruelty is a crime, would more "naturally" be handled by the criminal system - but since the criminal system doesn't do anything, LIC looks for ways to bring it to civil litigations LIC sues companies and executives Example Cases Example 1: Costco Costco is not only a store but also breeds, raises and slaughters chickens (and sells the meat) Bred them so fast that they could not even stand, eat, drink. Starved to death That's against the law - you're required to feed your animals There are some fiduciary duties - which are on the executives, personally, towards the company One of them: "don't break the law" If the executives haven't fulfilled the duties - the company can sue them Which wouldn't usually happen because the execs control the company But! The company also has owners. In the case of a publicly traded company - share holders So LIC found Costco shareholders to work with (Q: do you have to find existing share holders or can you just buy shares and then sue? A: Alene doesn't know, there isn't really a precedent). Result: The good news: the judge did say that the company has a responsibility re animal cruelty. Which means LIC can bring more cases like that! The bad new: had a different interpretation to the law re what happened at Costco, so dismissed the case Example 2: "Case Farms" - KFC supplier Treated chicks as "dispensible". Let machine drive over them etc. Pretty harrowing. Happened in North California. Has a law against animal cruelty, with an exemption for food/poultry. That was what CF's defense was based on. That thereby anything they do is exempt. LIC disagrees. If you kill the chicks they're not really used for food. This was dismissed and LIC appealed. Currently in the NC court ...

    LW - If-Then Commitments for AI Risk Reduction [by Holden Karnofsky] by habryka

    Play Episode Listen Later Sep 14, 2024 65:50


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: If-Then Commitments for AI Risk Reduction [by Holden Karnofsky], published by habryka on September 14, 2024 on LessWrong. Holden just published this paper on the Carnegie Endowment website. I thought it was a decent reference, so I figured I would crosspost it (included in full for convenience, but if either Carnegie Endowment or Holden has a preference for just having an excerpt or a pure link post, happy to change that) If-then commitments are an emerging framework for preparing for risks from AI without unnecessarily slowing the development of new technology. The more attention and interest there is these commitments, the faster a mature framework can progress. Introduction Artificial intelligence (AI) could pose a variety of catastrophic risks to international security in several domains, including the proliferation and acceleration of cyberoffense capabilities, and of the ability to develop chemical or biological weapons of mass destruction. Even the most powerful AI models today are not yet capable enough to pose such risks,[1] but the coming years could see fast and hard-to-predict changes in AI capabilities. Both companies and governments have shown significant interest in finding ways to prepare for such risks without unnecessarily slowing the development of new technology. This piece is a primer on an emerging framework for handling this challenge: if-then commitments. These are commitments of the form: If an AI model has capability X, risk mitigations Y must be in place. And, if needed, we will delay AI deployment and/or development to ensure the mitigations can be present in time. A specific example: If an AI model has the ability to walk a novice through constructing a weapon of mass destruction, we must ensure that there are no easy ways for consumers to elicit behavior in this category from the AI model. If-then commitments can be voluntarily adopted by AI developers; they also, potentially, can be enforced by regulators. Adoption of if-then commitments could help reduce risks from AI in two key ways: (a) prototyping, battle-testing, and building consensus around a potential framework for regulation; and (b) helping AI developers and others build roadmaps of what risk mitigations need to be in place by when. Such adoption does not require agreement on whether major AI risks are imminent - a polarized topic - only that certain situations would require certain risk mitigations if they came to pass. Three industry leaders - Google DeepMind, OpenAI, and Anthropic - have published relatively detailed frameworks along these lines. Sixteen companies have announced their intention to establish frameworks in a similar spirit by the time of the upcoming 2025 AI Action Summit in France.[2] Similar ideas have been explored at the International Dialogues on AI Safety in March 2024[3] and the UK AI Safety Summit in November 2023.[4] As of mid-2024, most discussions of if-then commitments have been in the context of voluntary commitments by companies, but this piece focuses on the general framework as something that could be useful to a variety of actors with different enforcement mechanisms. This piece explains the key ideas behind if-then commitments via a detailed walkthrough of a particular if-then commitment, pertaining to the potential ability of an AI model to walk a novice through constructing a chemical or biological weapon of mass destruction. It then discusses some limitations of if-then commitments and closes with an outline of how different actors - including governments and companies - can contribute to the path toward a robust, enforceable system of if-then commitments. Context and aims of this piece. In 2023, I helped with the initial development of ideas related to if-then commitments.[5] To date, I have focused on private discussion of this new fram...

    LW - Evidence against Learned Search in a Chess-Playing Neural Network by p.b.

    Play Episode Listen Later Sep 14, 2024 9:38


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Evidence against Learned Search in a Chess-Playing Neural Network, published by p.b. on September 14, 2024 on LessWrong. Introduction There is a new paper and lesswrong post about "learned look-ahead in a chess-playing neural network". This has long been a research interest of mine for reasons that are well-stated in the paper: Can neural networks learn to use algorithms such as look-ahead or search internally? Or are they better thought of as vast collections of simple heuristics or memorized data? Answering this question might help us anticipate neural networks' future capabilities and give us a better understanding of how they work internally. and further: Since we know how to hand-design chess engines, we know what reasoning to look for in chess-playing networks. Compared to frontier language models, this makes chess a good compromise between realism and practicality for investigating whether networks learn reasoning algorithms or rely purely on heuristics. So the question is whether Francois Chollet is correct with transformers doing "curve fitting" i.e. memorisation with little generalisation or whether they learn to "reason". "Reasoning" is a fuzzy word, but in chess you can at least look for what human players call "calculation", that is the ability to execute moves solely in your mind to observe and evaluate the resulting position. To me this is a crux as to whether large language models will scale to human capabilities without further algorithmic breakthroughs. The paper's authors, which include Erik Jenner and Stuart Russell, conclude that the policy network of Leela Chess Zero (a top engine and open source replication of AlphaZero) does learn look-ahead. Using interpretability techniques they "find that Leela internally represents future optimal moves and that these representations are crucial for its final output in certain board states." While the term "look-ahead" is fuzzy, the paper clearly intends to show that the Leela network implements an "algorithm" and a form of "reasoning". My interpretation of the presented evidence is different, as discussed in the comments of the original lesswrong post. I argue that all the evidence is completely consistent with Leela having learned to recognise multi-move patterns. Multi-move patterns are just complicated patterns that take into account that certain pieces will have to be able to move to certain squares in future moves for the pattern to hold. The crucial different to having learned an algorithm: An algorithm can take different inputs and do its thing. That allows generalisation to unseen or at least unusual inputs. This means that less data is necessary for learning because the generalisation power is much higher. Learning multi-move patterns on the other hand requires much more data because the network needs to see many versions of the pattern until it knows all specific details that have to hold. Analysis setup Unfortunately it is quite difficult to distinguish between these two cases. As I argued: Certain information is necessary to make the correct prediction in certain kinds of positions. The fact that the network generally makes the correct prediction in these types of positions already tells you that this information must be processed and made available by the network. The difference between lookahead and multi-move pattern recognition is not whether this information is there but how it got there. However, I propose an experiment, that makes it clear that there is a difference. Imagine you train the model to predict whether a position leads to a forced checkmate and also the best move to make. You pick one tactical motive and erase it from the checkmate prediction part of the training set, but not the move prediction part. Now the model still knows which the right moves are to make i.e. it would pl...

    AF - Estimating Tail Risk in Neural Networks by Jacob Hilton

    Play Episode Listen Later Sep 13, 2024 41:11


    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Estimating Tail Risk in Neural Networks, published by Jacob Hilton on September 13, 2024 on The AI Alignment Forum. Machine learning systems are typically trained to maximize average-case performance. However, this method of training can fail to meaningfully control the probability of tail events that might cause significant harm. For instance, while an artificial intelligence (AI) assistant may be generally safe, it would be catastrophic if it ever suggested an action that resulted in unnecessary large-scale harm. Current techniques for estimating the probability of tail events are based on finding inputs on which an AI behaves catastrophically. Since the input space is so large, it might be prohibitive to search through it thoroughly enough to detect all potential catastrophic behavior. As a result, these techniques cannot be used to produce AI systems that we are confident will never behave catastrophically. We are excited about techniques to estimate the probability of tail events that do not rely on finding inputs on which an AI behaves badly, and can thus detect a broader range of catastrophic behavior. We think developing such techniques is an exciting problem to work on to reduce the risk posed by advanced AI systems: Estimating tail risk is a conceptually straightforward problem with relatively objective success criteria; we are predicting something mathematically well-defined, unlike instances of eliciting latent knowledge (ELK) where we are predicting an informal concept like "diamond". Improved methods for estimating tail risk could reduce risk from a variety of sources, including central misalignment risks like deceptive alignment. Improvements to current methods can be found both by doing empirical research, or by thinking about the problem from a theoretical angle. This document will discuss the problem of estimating the probability of tail events and explore estimation strategies that do not rely on finding inputs on which an AI behaves badly. In particular, we will: Introduce a toy scenario about an AI engineering assistant for which we want to estimate the probability of a catastrophic tail event. Explain some deficiencies of adversarial training, the most common method for reducing risk in contemporary AI systems. Discuss deceptive alignment as a particularly dangerous case in which adversarial training might fail. Present methods for estimating the probability of tail events in neural network behavior that do not rely on evaluating behavior on concrete inputs. Conclude with a discussion of why we are excited about work aimed at improving estimates of the probability of tail events. This document describes joint research done with Jacob Hilton, Victor Lecomte, David Matolcsi, Eric Neyman, Thomas Read, George Robinson, and Gabe Wu. Thanks additionally to Ajeya Cotra, Lukas Finnveden, and Erik Jenner for helpful comments and suggestions. A Toy Scenario Consider a powerful AI engineering assistant. Write M for this AI system, and M(x) for the action it suggests given some project description x. We want to use this system to help with various engineering projects, but would like it to never suggest an action that results in large-scale harm, e.g. creating a doomsday device. In general, we define a behavior as catastrophic if it must never occur in the real world.[1] An input is catastrophic if it would lead to catastrophic behavior. Assume we can construct a catastrophe detector C that tells us if an action M(x) will result in large-scale harm. For the purposes of this example, we will assume both that C has a reasonable chance of catching all catastrophes and that it is feasible to find a useful engineering assistant M that never triggers C (see Catastrophe Detectors for further discussion). We will also assume we can use C to train M, but that it is ...

    Claim The Nonlinear Library

    In order to claim this podcast we'll send an email to with a verification link. Simply click the link and you will be able to edit tags, request a refresh, and other features to take control of your podcast page!

    Claim Cancel