POPULARITY
Nearly two years ago, when I first saw ChatGPT, it was clear to me, as it was to everyone who had spent the last decade in the AI industry, that we had reached a turning point.

To be sure, from within the industry it had been clear since GPT-2 that something massive was happening. For the first time, even if the AI still generated plenty of nonsense, the paradigm was changing: the output was no longer a stitching together of existing phrases from text the AI had found somewhere. The model generated it independently, without supervision, by "making sense" of the underlying text. That was mind-blowing!

When ChatGPT came out, it only confirmed that the underlying model (GPT-3), combined with a new technique (InstructGPT), could be a game changer.

Nearly two years after the fact, we've reached a point where tools like NotebookLM are so impressive that it's hard to imagine what's coming next. Indeed, the AI generated this whole podcast episode after we fed it our book AI Business Models!

Before we get to it and its implications, remember you can download the AI Business Models book if you're subscribed to our premium newsletter. When you request access, please provide the email you used to subscribe, and we'll grant access.

Subscribe to get access to the Book!

Thematic Outline

Fundamental Concepts

A. Technological Underpinnings
* CPUs vs. GPUs: Differences in processing power, architecture, and applications.
* AI Supercomputers: Role in training large language models, reliance on GPUs.
* Transformer Architecture: Impact on natural language processing, attention mechanisms.

B. Machine Learning Concepts
* Pre-training and Fine-tuning: Building general knowledge and specializing for specific tasks.
* Unsupervised vs. Supervised Learning: Learning from unlabeled data vs. labeled data with instructions.
* Reinforcement Learning: Learning through trial and error, rewards, and penalties.

C. Key Trends in AI
* Content is King: Importance of high-quality data for training effective AI models.
* Multimodality: AI processing and integrating diverse data types like text, images, and audio.
* Emergence: Unexpected capabilities arising from increasingly complex AI models.

AI Business Models and Evolution

A. Historical Context
* The Walled Garden Era: Limited access to information, controlled by portals like AOL.
* The Rise of the Internet: Open access to information, facilitated by web browsers.
* The Reverse Kronos Effect: Startups using technology to disrupt established industries (e.g., Google vs. AOL).

B. Current Landscape
* The AI Ecosystem: Different layers, including infrastructure, models, and applications.
* Business Models in the "Apps" Layer: Ad-based, subscription-based, and consumption-based models.
* Building Competitive Moats: Differentiation strategies and challenges in a rapidly evolving field.

Future of AI & Ethical Considerations

Potential of AI
* Generative AI: Creating new content and pushing creative boundaries.
* InstructGPT: Enhancing AI's ability to follow instructions and generate accurate outputs.
* Decentralized AI Ecosystem: Exploring feasibility, challenges, and benefits.

Ethical Implications
* Bias in AI: Addressing fairness, transparency, and potential discrimination.
* Job Displacement: Analyzing the impact of automation and potential solutions.
* Responsible AI Development: Implementing ethical guidelines, transparency, and accountability.

Summary of the AI Theory Based on Layers, Hardware, Software, and Business Models

The AI Business Models book offers a glimpse into the evolving landscape of Artificial Intelligence (AI), highlighting key layers, technological advancements, and shifting business paradigms.

Layers of the AI Ecosystem

These can be broadly categorized as:

* Infrastructure Layer: This encompasses the hardware and software foundations, with AI supercomputers and GPUs playing a pivotal role in providing the computational power needed for training Large Language Models (LLMs).
* Model Layer: This layer focuses on the development and training of AI models like LLMs, using techniques such as pre-training on massive datasets and fine-tuning for specific tasks. Generative AI models, capable of creating new content, represent a significant advancement in this layer.
* Applications Layer: This layer comprises AI-powered applications and services that leverage the capabilities of underlying models. The book mentions various business models for companies operating in this layer, including ad-based, subscription-based, and consumption-based models.

New Hardware and Software

* Hardware: The book emphasizes the critical role of GPUs in accelerating AI workloads. Unlike CPUs, which are designed for sequential processing, GPUs excel at parallel processing, making them ideal for handling the massive datasets and complex computations involved in AI training. AI supercomputers, equipped with numerous GPUs, provide the computational power needed to develop and train LLMs.
* Software: The book highlights advancements in AI model architectures, particularly the Transformer architecture. This architecture, built around "attention mechanisms," has revolutionized Natural Language Processing (NLP) tasks, enabling significant improvements in language understanding and generation.

New Business Model Paradigm

The AI Business Models book touches upon the evolution of AI business models, though it doesn't provide a comprehensive historical analysis. It does, however, highlight the "Reverse Kronos Effect," where startups leverage new technologies and agile practices to disrupt established industries. This effect is exemplified by Google's dominance of the search and advertising market, surpassing previous giants like AOL.

The book also mentions various business models for AI-powered applications, including ad-based, subscription-based, and consumption-based models.
This suggests a shift towards more diverse monetization strategies in the AI Applications Layer.

Expected Developments

The AI Business Models book hints at potential future directions:

* Multimodality: Multimodality is a key development in AI, enabling models to process and integrate diverse data types like text, images, audio, and video. This suggests a future where AI applications offer richer and more versatile experiences beyond text-based interactions.
* Emergence: Emergence is the phenomenon where complex behaviors and capabilities arise unexpectedly from the interaction of simpler components in AI systems. This suggests that future AI models might exhibit capabilities that go beyond their initial design, potentially leading to unforeseen breakthroughs and challenges.

Glossary

Here is a glossary of key terms based on the provided source:

* AI Supercomputer: A computing system specifically designed for AI tasks, using many GPUs and specialized hardware to handle the massive processing demands of training and running large language models.
* Business Engine: The core value proposition and revenue-generating mechanisms of an AI-powered product or service, including pricing models, customer acquisition strategies, and overall business strategy.
* Content is King: This phrase emphasizes the importance of high-quality content in attracting and retaining an audience. For AI, it highlights the critical role of data in training effective models, as data quality and relevance directly influence AI performance.
* CPU (Central Processing Unit): The primary processor in a computer, responsible for executing instructions and managing system operations. It excels at sequential processing, handling a limited number of tasks quickly.
* Distribution Engine: The channels and mechanisms used to deliver AI-powered products or services to end users, including marketing, partnerships, and platform integrations, facilitating adoption and accessibility.
* Fine-tuning: The process of further training a pre-trained AI model on a smaller, task-specific dataset to refine its capabilities and optimize its performance for a specific application or industry.
* Generative AI: A type of artificial intelligence focused on creating new content (text, images, audio, video) based on patterns learned from existing data.
* GPU (Graphics Processing Unit): An electronic circuit designed for parallel processing. GPUs excel at handling massive datasets and performing complex calculations concurrently, making them suitable for tasks like rendering graphics and training AI models.
* InstructGPT: A large language model developed by OpenAI that uses human feedback to improve its ability to follow instructions and generate more accurate and useful responses.
* Large Language Model (LLM): An AI model trained on a massive dataset of text and code. LLMs understand and generate human-quality text, translate languages, write different kinds of creative content, and answer questions informatively.
* Paradigm Shift: A fundamental change in the underlying assumptions, beliefs, and practices of a specific field or industry. Technological breakthroughs often drive paradigm shifts in AI, leading to new ways of thinking about and leveraging AI.
* Pre-training: The initial training phase of an AI model using a vast, general dataset. This allows the model to learn fundamental patterns, relationships, and representations, providing a knowledge foundation for building more specialized capabilities through fine-tuning.
* Prompt Engineering: The process of designing and refining prompts to elicit the most desirable and accurate responses from an AI model. Effective prompt engineering optimizes AI performance and guides its behavior toward desired outcomes.
* Reinforcement Learning: A type of machine learning where an AI agent learns through trial and error, receiving rewards or penalties for its actions in an environment, allowing it to develop optimal strategies for problem-solving and goal achievement.
* Reverse Kronos Effect: The phenomenon where a startup uses disruptive technology and agile practices to rapidly overtake established industry leaders.
* Transformer Architecture: A neural network architecture that has revolutionized natural language processing (NLP). It uses "attention mechanisms" to process sequential data effectively, enabling breakthroughs in language understanding and generation tasks.
* Unsupervised Learning: A type of machine learning where the AI model trains on unlabeled data, learning patterns and relationships without explicit guidance.
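To make the glossary's "Pre-training" and "Fine-tuning" entries concrete, here is a minimal, self-contained sketch. It is our own toy illustration, not code from the AI Business Models book: a tiny PyTorch model is first "pre-trained" on unlabeled text with next-token prediction, then fine-tuned, with a new task head, on a hypothetical two-example labeled dataset.

```python
# Toy illustration of pre-training (unsupervised) followed by fine-tuning (task-specific).
# Everything here (model, corpus, task) is hypothetical and deliberately tiny.
import torch
import torch.nn as nn

torch.manual_seed(0)

# --- Tiny character-level "corpus" (stand-in for a vast, general dataset) ---
corpus = "ai business models are evolving. gpus accelerate training. " * 20
vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[c] for c in corpus])

class TinyLM(nn.Module):
    """A very small 'language model': embedding -> GRU -> per-token vocab logits."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h), h  # logits per position, hidden states

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# --- Phase 1: pre-training (unsupervised next-token prediction on unlabeled text) ---
seq_len = 32
for step in range(200):
    i = torch.randint(0, len(ids) - seq_len - 1, (1,)).item()
    x = ids[i : i + seq_len].unsqueeze(0)
    y = ids[i + 1 : i + seq_len + 1].unsqueeze(0)
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.view(-1, len(vocab)), y.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# --- Phase 2: fine-tuning (small, labeled, task-specific dataset) ---
# Hypothetical task: classify a short string as "hardware" (1) or "business" (0).
labeled = [("gpus accelerate training", 1), ("business models are evolving", 0)]
clf_head = nn.Linear(32, 2)  # new task head on top of the pre-trained backbone
ft_opt = torch.optim.Adam(list(model.parameters()) + list(clf_head.parameters()), lr=1e-3)

for step in range(100):
    for text, label in labeled:
        x = torch.tensor([[stoi[c] for c in text]])
        _, h = model(x)
        logits = clf_head(h[:, -1])  # use the last hidden state as a summary of the text
        loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
        ft_opt.zero_grad(); loss.backward(); ft_opt.step()
```

Real LLMs follow the same two-phase pattern, just with billions of parameters, web-scale corpora, and the GPU clusters described above in place of these toy pieces.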
In episode 116 of The Gradient Podcast, Daniel Bashir speaks to Kate Park. Kate is the Director of Product at Scale AI. Prior to joining Scale, Kate worked on Tesla Autopilot as the AI team's first and lead product manager, building the industry's first data engine. She has also published research on spoken natural language processing, as well as a travel memoir.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter

Outline:
* (00:00) Intro
* (01:11) Kate's background
* (03:22) Tesla and cameras vs. Lidar, importance of data
* (05:12) "Data is key"
* (07:35) Data vs. architectural improvements
* (09:36) Effort for data scaling
* (10:55) Transfer of capabilities in self-driving
* (13:44) Data flywheels and edge cases, deployment
* (15:48) Transition to Scale
* (18:52) Perspectives on shifting to transformers and data
* (21:00) Data engines for NLP vs. for vision
* (25:32) Model evaluation for LLMs in data engines
* (27:15) InstructGPT and data for RLHF
* (29:15) Benchmark tasks for assessing potential labelers
* (32:07) Biggest challenges for data engines
* (33:40) Expert AI trainers
* (36:22) Future work in data engines
* (38:25) Need for human labeling when bootstrapping new domains or tasks
* (41:05) Outro

Links:
* Scale Data Engine
* OpenAI case study

Get full access to The Gradient at thegradientpub.substack.com/subscribe
While dense retrieval has been shown to be effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance labels are available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot prompts an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is "fake" and may contain hallucinations. Then, an unsupervised contrastively learned encoder (e.g., Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, from which similar real documents are retrieved based on vector similarity. This second step grounds the generated document to the actual corpus, with the encoder's dense bottleneck filtering out the hallucinations. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers across various tasks (e.g., web search, QA, fact verification) and in non-English languages (e.g., sw, ko, ja, bn).

Luyu Gao, Xueguang Ma, Jimmy J. Lin, Jamie Callan (2022). https://arxiv.org/pdf/2212.10496.pdf
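A minimal sketch of the HyDE flow may help make the abstract concrete. This is not the authors' implementation: `fake_llm` and `toy_encoder` below are hypothetical stand-ins for an instruction-following LLM (e.g., InstructGPT) and an unsupervised dense encoder (e.g., Contriever), kept trivial so the sketch runs on its own.

```python
# A sketch of the HyDE retrieval pipeline described above (not the authors' code).
import numpy as np

def fake_llm(prompt: str) -> str:
    # Stand-in for zero-shot prompting an instruction-following LLM to write
    # a hypothetical document that answers the query. It may hallucinate details.
    return "Paris is the capital of France and its largest city."

def toy_encoder(text: str) -> np.ndarray:
    # Stand-in for an unsupervised dense encoder: a crude hashed bag-of-words
    # embedding, normalized, just to make the sketch runnable end to end.
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def hyde_search(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # 1) Generate a hypothetical (possibly hallucinated) document for the query.
    hypothetical_doc = fake_llm(f"Write a passage that answers: {query}")
    # 2) Encode the hypothetical document rather than the query itself.
    q_vec = toy_encoder(hypothetical_doc)
    # 3) Retrieve real documents by vector similarity, grounding the generation
    #    in the actual corpus; the dense bottleneck filters out hallucinated details.
    doc_vecs = np.stack([toy_encoder(d) for d in corpus])
    scores = doc_vecs @ q_vec
    return [corpus[i] for i in np.argsort(-scores)[:k]]

corpus = [
    "Paris, the capital of France, sits on the Seine.",
    "Tokyo is the capital of Japan.",
    "The Eiffel Tower is a landmark in Paris, France.",
]
print(hyde_search("what is the capital of france", corpus, k=2))
```

The design point is the pivot: instead of trying to embed a short query directly, you embed a document-shaped answer, which tends to land closer to real relevant documents in the corpus embedding space.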
In 2023 we did a few Fundamentals episodes covering Benchmarks 101, Datasets 101, FlashAttention, and Transformers Math, and it turns out those were some of your evergreen favorites! So we are experimenting with more educational/survey content in the mix alongside our regular founder and event coverage. Pls request more!

We have a new calendar for events; join to be notified of upcoming things in 2024!

Today we visit the shoggoth mask factory: how do transformer models go from trawling a deeply learned latent space for next-token prediction to a helpful, honest, harmless chat assistant? Our guest "lecturer" today is Nathan Lambert; you might know him from his prolific online writing on Interconnects and Twitter, or from his previous work leading RLHF at HuggingFace and now at the Allen Institute for AI (AI2), which recently released the open-source, GPT-3.5-class Tulu 2 model trained with DPO. He's widely considered one of the most knowledgeable people on RLHF and RLAIF. He recently gave an "RLHF 201" lecture at Stanford, so we invited him on the show to re-record it for everyone to enjoy! You can find the full slides here, which you can use as a reference throughout this episode.

Full video with synced slides

For audio-only listeners, this episode comes with a slide presentation alongside our discussion. You can find it on our YouTube (like, subscribe, tell a friend, et al).

Theoretical foundations of RLHF

The foundations and assumptions that go into RLHF go back all the way to Aristotle (and you can find guidance for further research in the slide below), but there are two key concepts that will be helpful in thinking through this topic and LLMs in general:

* Von Neumann–Morgenstern utility theorem: you can dive into the math here, but the TLDR is that when humans make decisions there's usually a "maximum utility" function that measures what the best decision would be; the fact that this function exists makes it possible for RLHF to model human preferences and decision making.
* Bradley-Terry model: given two items A and B from a population, you can model the probability that A will be preferred to B (or vice versa). In our world, A and B are usually two outputs from an LLM (or, at the lowest level, the next token).

It turns out that from this minimal set of assumptions, you can build up the mathematical foundations supporting the modern RLHF paradigm!

The RLHF loop

One important point Nathan makes is that "for many tasks we want to solve, evaluation of outcomes is easier than producing the correct behavior". For example, it might be difficult for you to write a poem, but it's really easy to say whether you like or dislike a poem someone else wrote. Going back to the Bradley-Terry model we mentioned, the core idea behind RLHF is that when given two outputs from a model, you will be able to say which of the two you prefer, and we'll then re-encode that preference into the model.

An important point that Nathan mentions is that when you use these preferences to change model behavior, "it doesn't mean that the model believes these things. It's just trained to prioritize these things". When you have a preference for a model not to return instructions on how to write a computer virus, for example, you're not erasing the weights that hold that knowledge; you're simply making it hard for that information to surface by prioritizing answers that don't return it.
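To keep the math in one place, here is the standard way these pieces are usually written down. This is our own summary of the commonly published formulation (notation assumed), not an excerpt from Nathan's slides: the Bradley-Terry model turns pairwise preferences into probabilities, a reward model is fit to those preferences, and the RLHF objective then maximizes that reward subject to a KL constraint against the reference model.

```latex
% Bradley-Terry: probability that completion y_w is preferred to y_l for prompt x,
% given a scalar reward function r.
\[
P(y_w \succ y_l \mid x)
  = \frac{e^{r(x, y_w)}}{e^{r(x, y_w)} + e^{r(x, y_l)}}
  = \sigma\!\big(r(x, y_w) - r(x, y_l)\big)
\]

% Reward-model training: maximize the likelihood of the human-chosen completions.
\[
\mathcal{L}(\phi)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
    \Big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\Big]
\]

% RLHF policy objective: push the reward up without drifting too far (in KL)
% from the reference, instruction-tuned model.
\[
\max_{\theta}\;
  \mathbb{E}_{x,\; y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big]
  \;-\; \beta\, \mathrm{D}_{\mathrm{KL}}\!\big[\pi_\theta(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x)\big]
\]
```

The KL term is the "don't change the model too much" constraint that comes up again in the technical deep dive below, and the coefficient controls how tightly the updated policy stays tied to the reference model.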
We'll talk more about this in our future Fine Tuning 101 episode as we break down how information is stored in models and how fine-tuning affects it.

At a high level, the loop looks something like this:

For many RLHF use cases today, we can assume the model we're training is already instruction-tuned for chat or whatever behavior the model is looking to achieve. In the "Reward Model & Other Infrastructure" part we have multiple pieces:

Reward + Preference Model

The reward model is trying to signal to the model how much it should change its behavior based on the human preference, subject to a KL constraint. The preference model itself scores pairwise preferences from the same prompt (which worked better than scalar rewards). One way to think about it is that the reward model tells the model how big of a change this new preference should make in the behavior in absolute terms, while the preference model calculates how big of a difference there is between the two outputs in relative terms. A lot of this derives from John Schulman's work on PPO:

We recommend watching him talk about it in the video above, and also Nathan's pseudocode distillation of the process:

Feedback Interfaces

Unlike the "thumbs up/down" buttons in ChatGPT, data annotation from labelers is much more thorough and has many axes of judgment. At a simple level, the LLM generates two outputs, A and B, for a given human conversation. It then asks the labeler to use a Likert scale to score which one they preferred, and by how much:

Through the labeling process, there are many other ways to judge a generation:

We then use all of this data to train a model from the preference pairs we have. We start from the base instruction-tuned model, and then run training in which the loss of our gradient descent is the difference between the good and the bad completion.

Constitutional AI (RLAIF, model-as-judge)

As these models have gotten more sophisticated, people started asking whether humans are actually the best judges of harmfulness, bias, etc., especially at the current price of data labeling. Anthropic's work in the "Constitutional AI" paper uses models to judge models. This is part of a broader "RLAIF" space: Reinforcement Learning from AI Feedback. By using a "constitution" that the model has to follow, you are able to generate fine-tuning data for a new model that will be RLHF'd on the constitution's principles. The RLHF'd model will then be able to judge outputs of models to make sure that they follow its principles:

Emerging Research

RLHF is still a nascent field, and there are a lot of different research directions teams are taking; some of the newest and most promising / hyped ones:

* Rejection sampling / Best-of-N sampling: the core idea here is that rather than just scoring pairwise generations, you generate a lot more outputs (= more inference cost), score them all with your reward model, and then pick the best of the N results. LLaMA 2 used this approach, amongst many others.
* Process reward models: in Chain of Thought generation, scoring each step in the chain and treating it like its own state rather than just scoring the full output. This is most effective in fields like math that inherently require step-by-step reasoning.
* Direct Preference Optimization (DPO): We covered DPO in our NeurIPS Best Papers recap, and Nathan has a whole blog post on this; DPO isn't technically RLHF as it doesn't have the RL part, but it's the "GPU Poor" version of it. Mistral-Instruct was a DPO model, as were Intel's Neural Chat and StableLM Zephyr.
Expect to see a lot more variants in 2024 given how "easy" this was.

* Superalignment: OpenAI launched research on weak-to-strong generalization, which we briefly discuss at the 1hr mark.

Note: Nathan also followed up this post with RLHF resources from his and peers' work:

Show Notes
* Full RLHF Slides
* Interconnects
* Retort (podcast)
* von Neumann-Morgenstern utility theorem
* Bradley-Terry model (pairwise preferences model)
* Constitutional AI
* Tamer (2008 paper by Bradley Knox and Peter Stone)
* Paul Christiano et al. RLHF paper
* InstructGPT
* Eureka by Jim Fan
* ByteDance / OpenAI lawsuit
* AlpacaEval
* MTBench
* TruthfulQA (evaluation tool)
* Self-Instruct Paper
* Open Assistant
* Louis Castricato
* Nazneen Rajani
* Tulu (DPO model from the Allen Institute)

Timestamps
* [00:00:00] Introductions and background on the lecture origins
* [00:05:17] History of RL and its applications
* [00:10:09] Intellectual history of RLHF
* [00:13:47] RLHF for decision-making and pre-deep RL vs deep RL
* [00:20:19] Initial papers and intuitions around RLHF
* [00:27:57] The three phases of RLHF
* [00:31:09] Overfitting issues
* [00:34:47] How preferences get defined
* [00:40:35] Ballpark on LLaMA2 costs
* [00:42:50] Synthetic data for training
* [00:47:25] Technical deep dive in the RLHF process
* [00:54:34] Rejection sampling / best-of-N sampling
* [00:57:49] Constitutional AI
* [01:04:13] DPO
* [01:08:54] What's the Allen Institute for AI?
* [01:13:43] Benchmarks and models comparisons

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.

Swyx [00:00:15]: Hey, and today we have Dr. Nathan Lambert in the house. Welcome.

Nathan [00:00:18]: Thanks guys.

Swyx [00:00:19]: You didn't have to come too far. You got your PhD in Berkeley, and it seems like you've lived there most of the time in recent years. You worked on robotics and model-based reinforcement learning on your PhD, and you also interned at FAIR and DeepMind. You bootstrapped the RLHF team at Hugging Face, and you recently joined the Allen Institute as a research scientist. So that's your quick bio. What should people know about you that maybe is not super obvious about you on New LinkedIn?

Nathan [00:00:43]: I stay sane in various insane sport and ultra-endurance sport activities that I do.

Swyx [00:00:50]: What's an ultra-endurance sport activity?

Nathan [00:00:52]: Long-distance trail running or gravel biking. Try to unplug sometimes, although it's harder these days. Yeah.

Swyx [00:00:59]: Well, you know, just the Bay Area is just really good for that stuff, right?

Nathan [00:01:02]: Oh, yeah. You can't beat it. I have a trailhead like 1.2 miles from my house, which is pretty unmatchable in any other urban area.

Swyx [00:01:11]: Pretty excellent. You also have an incredible blog, Interconnects, which I'm a fan of. And I also just recently discovered that you have a new podcast, Retort.

Nathan [00:01:20]: Yeah, we do. I've been writing for a while, and I feel like I've finally started to write things that are understandable and fun. After a few years lost in the wilderness, if you ask some of my friends that I made read the earlier blogs, they're like, oh, this is yikes, but it's coming along.
And the podcast is with my friend Tom, and we just kind of like riff on what's actually happening on AI and not really do news recaps, but just what it all means and have a more critical perspective on the things that really are kind of funny, but still very serious happening in the world of machine learning.

Swyx [00:01:52]: Yeah. Awesome. So let's talk about your work. What would you highlight as your greatest hits so far on Interconnects, at least?

Nathan [00:01:59]: So the ones that are most popular are timely and or opinion pieces. So the first real breakout piece was when April and I also just wrote down the thing that everyone in AI was feeling, which is we're all feeling stressed, that we're going to get scooped, and that we're overworked, which is behind the curtain, what it feels like to work in AI. And then a similar one, which we might touch on later in this, was about my recent job search, which wasn't the first time I wrote a job search post. People always love that stuff. It's so open. I mean, it's easy for me to do in a way that it's very on-brand, and it's very helpful. I understand that until you've done it, it's hard to share this information. And then the other popular ones are various model training techniques or fine tuning. There's an early one on RLHF, which is, this stuff is all just like when I figure it out in my brain. So I wrote an article that's like how RLHF actually works, which is just the intuitions that I had put together in the summer about RLHF, and that was pretty well received. And then I opportunistically wrote about QSTAR, which I hate that you have to do it, but it is pretty funny. From a literature perspective, I'm like, OpenAI publishes on work that is very related to mathematical reasoning. So it's like, oh, you just poke a little around what they've already published, and it seems pretty reasonable. But we don't know. They probably just got like a moderate bump on one of their benchmarks, and then everyone lost their minds. It doesn't really matter.

Swyx [00:03:15]: You're like, this is why Sam Altman was fired. I don't know. Anyway, we're here to talk about RLHF 101. You did a presentation, and I think you expressed some desire to rerecord it. And that's why I reached out on Twitter saying, like, why not rerecord it with us, and then we can ask questions and talk about it. Yeah, sounds good.

Nathan [00:03:30]: I try to do it every six or 12 months is my estimated cadence, just to refine the ways that I say things. And people will see that we don't know that much more, but we have a bit of a better way of saying what we don't know.

Swyx [00:03:43]: Awesome. We can dive right in. I don't know if there's any other topics that we want to lay out as groundwork.

Alessio [00:03:48]: No, you have some awesome slides. So for people listening on podcast only, we're going to have the slides on our show notes, and then we're going to have a YouTube version where we run through everything together.

Nathan [00:03:59]: Sounds good. Yeah. I think to start skipping a lot of the, like, what is a language model stuff, everyone knows that at this point. I think the quote from the Llama 2 paper is a great kind of tidbit on RLHF becoming like a real deal. There was some uncertainty earlier in the year about whether or not RLHF was really going to be important. I think it was not that surprising that it is. I mean, with recent models still using it, the signs were there, but the Llama 2 paper essentially reads like a bunch of NLP researchers that were skeptical and surprised.

So the quote from the paper was, "Meanwhile, reinforcement learning, known for its instability, seemed a somewhat shadowy field for those in the NLP research community. However, reinforcement learning proved highly effective, particularly given its cost and time effectiveness." So you don't really know exactly what the costs and time that Meta is looking at, because they have a huge team and a pretty good amount of money here to release these Llama models. This is just the kind of thing that we're seeing now. I think any major company that wasn't doing RLHF is now realizing they have to have a team around this. At the same time, we don't have a lot of that in the open and research communities at the same scale. I think seeing that converge would be great, but it's still very early days. And the other thing on the slide is some of Anthropic's work, but everyone knows Anthropic is kind of the masters of this, and they have some of their own techniques that we're going to talk about later on, but that's kind of where we start.

Alessio [00:05:17]: Can we do just a one-second RL version? So you come from a robotics background, which RL used to be, or maybe still is, state-of-the-art. And then now you're seeing a lot of LLM plus RL, so you have Jim Fan's Eureka, you have MPU, which we had on the podcast when they started with RL. Now they're doing RL plus LLMs. Yeah. Any thoughts there on how we got here? Maybe how the pendulum will keep swinging?

Nathan [00:05:46]: I really think RL is about a framing of viewing the world through trial and error learning and feedback, and really just one that's focused on thinking about decision-making and inputs in the world and how inputs have reactions. And in that, a lot of people come from a lot of different backgrounds, whether it's physics, electrical engineering, mechanical engineering. There are obviously computer scientists, but compared to other fields of CS, I do think it's a much more diverse background of people. My background was in electrical engineering and doing robotics and things like that. It really just changes the worldview. I think that reinforcement learning as it was back then, so to say, is really different. You're looking at these toy problems and the numbers are totally different, and everyone went kind of zero to one at scaling these things up, but people like Jim Fan and other people that were... You saw this transition in the decision transformer and papers and when people are trying to use transformers to do decision-making for things like offline RL, and I think that was kind of like the early days. But then once language models were so proven, it's like everyone is using this tool for their research. I think in the long run, it will still settle out, or RL will still be a field that people work on just because of these kind of fundamental things that I talked about. It's just viewing the whole problem formulation different than predicting text, and so there needs to be that separation. And the view of RL in language models is pretty contrived already, so it's not like we're doing real RL. I think the last slide that I have here is a way to make RLHF more like what people would think of with RL, so actually running things over time, but a weird lineage of tools that happen to get us to where we are, so that's why the name takes up so much space, but it could have gone a lot of different ways. Cool.

Alessio [00:07:29]: We made it one slide before going on a tangent.

Nathan [00:07:31]: Yeah, I mean, it's kind of related.
This is a...

Swyx [00:07:35]: Yeah, so we have a history of RL.

Nathan [00:07:37]: Yeah, so to give the context, this paper really started because I have this more diverse background than some computer scientists, such as trying to understand what the difference of a cost function or a reward function and a preference function would be without going into all of the details. Costs are normally things that control theorists would work with in these kind of closed domains, and then reinforcement learning has always worked with rewards that's central to the formulation that we'll see, and then the idea was like, okay, we now are at preferences, and each step along the way there's kind of different assumptions that you're making. We'll get into these, and those assumptions are built on other fields of work. So that's what this slide is going to say, it's like RLHF, while directly building on tools from RL and language models, is really implicitly impacted and built on theories and philosophies spanning tons of human history. I think we cite Aristotle in this paper, which is fun. It's like going pre-BC, it's like 2,300 years old or something like that. So that's the reason to do this, I think. We kind of list some things in the paper about summarizing what different presumptions of RLHF could be. I think going through these is actually kind of funny. It's fun to talk about these, because they're kind of grab bags of things that you'll see return throughout this podcast that we're talking about it. The core thing of RLHF that, in order to be a believer in this, is that RL actually works. It's like, if you have a reward function, you can optimize it in some way and get a different performance out of it, and you could do this at scale, and you could do this in really complex environments, which is, I don't know how to do that in all the domains. I don't know how to exactly make ChatGPT. So it's kind of, we'll overshadow everything. And then there's, go from something kind of obvious like that, and then you read the von Neumann-Morgenstern utility theorem, which is essentially an economic theory that says you can weight different probabilities of different people, which is a theoretical piece of work that is the foundation of utilitarianism, and trying to quantify preferences is crucial to doing any sort of RLHF. And if you look into this, all of these things, there's way more you could go into if you're interested in any of these. So this is kind of like grabbing a few random things, and then kind of similar to that is the Bradley-Terry model, which is the fancy name for the pairwise preferences that everyone is doing. And then all the things that are like, that Anthropic and OpenAI figured out that you can do, which is that you can aggregate preferences from a bunch of different people and different sources. And then when you actually do RLHF, you extract things from that data, and then you train a model that works somehow. And we don't know, there's a lot of complex links there, but if you want to be a believer in doing this at scale, these are the sorts of things that you have to accept as preconditions for doing RLHF. Yeah.

Swyx [00:10:09]: You have a nice chart of like the sort of intellectual history of RLHF that we'll send people to refer to either in your paper or in the YouTube video for this podcast. But I like the other slide that you have on like the presumptions that you need to have for RLHF to work. You already mentioned some of those. Which one's underappreciated? Like, this is the first time I've come across the VNM Utility Theorem.

Nathan [00:10:29]: Yeah, I know. This is what you get from working with people like my co-host on the podcast, The Retort, who is a sociologist by training. So he knows all these things and like who the philosophers are that found these different things like utilitarianism. But there's a lot that goes into this. Like essentially there's even economic theories that like there's debate whether or not preferences exist at all. And there's like different types of math you can use with whether or not you actually can model preferences at all. So it's pretty obvious that RLHF is built on the math that thinks that you can actually model any human preference. But this is the sort of thing that's been debated for a long time. So all the work that's here is like, and people hear about in their AI classes. So like Jeremy Bentham, like hedonic calculus and all these things like these are the side of work where people assume that preferences can be measured. And this is like, I don't really know, like, this is what I kind of go on a rant and I say that in RLHF calling things a preference model is a little annoying because there's no inductive bias of what a preference is. It's like if you were to learn a robotic system and you learned a dynamics model, like hopefully that actually mirrors the world in some way of the dynamics. But with a preference model, it's like, Oh my God, I don't know what this model, like I don't know what ChatGPT encodes as any sort of preference or what I would want it to be in a fair way. Anthropic has done more work on trying to write these things down. But even like if you look at Claude's constitution, like that doesn't mean the model believes these things. It's just trained to prioritize these things. And that's kind of what the later points I'm looking at, like what RLHF is doing and if it's actually like a repeatable process in the data and in the training, that's just unknown. And we have a long way to go before we understand what this is and the link between preference data and any notion of like writing down a specific value.

Alessio [00:12:05]: The disconnect between more sociology work versus computer work already exists, or is it like a recent cross contamination? Because when we had Tri Dao on the podcast, he said FlashAttention came to be because at Hazy they have so much overlap between systems engineers and like deep learning engineers. Is it the same in this field?

Nathan [00:12:26]: So I've gone to a couple of workshops for the populations of people who you'd want to include this like R. I think the reason why it's not really talked about is just because the RLHF techniques that people use were built in labs like OpenAI and DeepMind where there are some of these people. These places do a pretty good job of trying to get these people in the door when you compare them to like normal startups. But like they're not bringing in academics from economics, like social choice theory. There's just too much. Like the criticism of this paper that this is based on is like, oh, you're missing these things in RL or at least this decade of RL and it's like it would literally be bigger than the Sutton and Barto book if you were to include everyone. So it's really hard to include everyone in a principled manner when you're designing this. It's just a good way to understand and improve the communication of what RLHF is and like what is a good reward model for society.
It really probably comes down to what an individual wants and it'll probably motivate models to move more in that direction and just be a little bit better about the communication, which is a recurring theme and kind of my work is like I just get frustrated when people say things that don't really make sense, especially when it's going to manipulate individuals' values or manipulate the general view of AI or anything like this. So that's kind of why RLHF is so interesting. It's very vague in what it's actually doing while the problem specification is very general.

Swyx [00:13:42]: Shall we go to the, I guess, the diagram here on the reinforcement learning basics? Yeah.

Nathan [00:13:47]: So reinforcement learning, I kind of mentioned this, it's a trial and error type of system. The diagram in the slides is really this classic thing where you have an agent interacting with an environment. So it's kind of this agent has some input to the environment, which is called the action. The environment returns a state and a reward and that repeats over time and the agent learns based on these states and these rewards that it's seeing and it should learn a policy that makes the rewards go up. That seems pretty simple until you try to mentally map what this looks like in language, which is that like the language models don't make this easy. I think with the language model, it's very hard to define what an environment is. So if the language model is the policy and it's generating, it's like the environment should be a human, but setting up the infrastructure to take tens of thousands of prompts and generate them and then show them to a human and collect the human responses and then shove that into your training architecture is very far away from working. So we don't really have an environment. We just have a reward model that returns a reward and the state doesn't really exist when you look at it like an RL problem. What happens is the state is a prompt and then you do a completion and then you throw it away and you grab a new prompt. We're really in as an RL researcher, you would think of this as being like you take a state, you get some completion from it and then you look at what that is and you keep kind of iterating on it and all of that isn't here, which is why you'll hear RLHF referred to as a bandits problem, which is kind of like you choose one action and then you watch the dynamics play out. There's many more debates that you can have in this. If you get the right RL people in the room, then kind of like this is an RL even when you zoom into what RLHF is doing.

Alessio [00:15:22]: Does this change as you think about a chain of thought reasoning and things like that? Like does the state become part of the chain that you're going through?

Nathan [00:15:29]: There's work that I've mentioned on one slide called process reward models that essentially rewards each step in the chain of thought reasoning. It doesn't really give the part of interaction, but it does make it a little bit more fine grained where you can think about like calling it at least you have many states from your initial state. That formulation I don't think people have fully settled on. I think there's a bunch of great work out there, like even OpenAI is releasing a lot of this and Let's Verify Step by Step is their pretty great paper on the matter. I think in the next year that'll probably get made more concrete by the community on like if you can easily draw out like if chain of thought reasoning is more like RL, we can talk about that more later. That's kind of a more advanced topic than we probably should spend all the time on.

Swyx [00:16:13]: RLHF for decision making. You have a slide here that compares pre-deep RL versus deep RL.

Nathan [00:16:19]: This is getting into the history of things, which is showing that the work that people are using now really came from well outside of NLP and it came before deep learning was big. Next up from this paper, Tamer, which is from 2008. Some names that are still really relevant in kind of human centric RL, Bradley Knox and Peter Stone. If you have an agent take an action, you would just have a human give a score from zero to one as a reward rather than having a reward function. And then with that classifier, you can do something with a policy that learns to take actions to maximize that reward. It's a pretty simple setup. It works in simple domains. And then the reason why this is interesting is you compare it to the paper that everyone knows, which is this Paul Christiano et al. Deep Reinforcement Learning from Human Preferences paper, which is where they showed that learning from human preferences, you can solve like the basic RL tasks at the time. So various control problems and simulation and this kind of like human preferences approach had higher rewards in some environments than if you just threw RL at the environment that returned a reward. So the preferences thing was you took two trajectories. So in this case, it was like complete trajectories of the agent and the human was labeling which one is better. You can see how this kind of comes to be like the pairwise preferences that are used today that we'll talk about. And there's also a really kind of interesting nugget that is the trajectory that the humans were labeling over has a lot more information than the RL algorithm would see if you just had one state, which is kind of why people think that it's why the performance in this paper was so strong. But I still think that it's surprising that there isn't more RL work of this style happening now. This paper is in 2017. So it's like six years later and I haven't seen things that are exactly similar, but it's a great paper to understand where stuff that's happening now kind of came from.

Swyx [00:17:58]: Just on the Christiano paper, you mentioned the performance being strong. I don't remember what results should I have in mind when I think about that paper?

Nathan [00:18:04]: It's mostly like if you think about an RL learning curve, which is like on the X axis, you have environment interactions, on the Y axis, you have performance. You can think about different like ablation studies of between algorithms. So I think they use like A2C, which I don't even remember what that stands for, as their baseline. But if you do the human preference version on a bunch of environments, like the human preference labels, the agent was able to learn faster than if it just learned from the signal from the environment, which means like it's happening because the reward model has more information than the agent would. But like the fact that it can do better, I was like, that's pretty surprising to me because RL algorithms are pretty sensitive. So I was like, okay.

Swyx [00:18:41]: It's just one thing I do want to establish as a baseline for our listeners. We are updating all the weights.
In some sense, the next token prediction task of training a language model is a form of reinforcement learning. Except that it's not from human feedback. It's just self-supervised learning from a general corpus. There's one distinction which I love, which is that you can actually give negative feedback. Whereas in a general sort of pre-training situation, you cannot. And maybe like the order of magnitude of feedback, like the Likert scale that you're going to talk about, that actually just gives more signal than a typical training process would do in a language model setting. Yeah.

Nathan [00:19:15]: I don't think I'm the right person to comment exactly, but like you can make analogies that reinforcement learning is self-supervised learning as well. Like there are a lot of things that will point to that. I don't know whether or not it's a richer signal. I think that could be seen in the results. It's a good thing for people to look into more. As reinforcement learning is so much less compute, like it is a richer signal in terms of its impact. Because if they could do what RLHF is doing at pre-training, they would, but they don't know how to have that effect in like a stable manner. Otherwise everyone would do it.

Swyx [00:19:45]: On a practical basis, as someone fine-tuning models, I have often wished for negative fine-tuning, which pretty much doesn't exist in OpenAI land. And it's not the default setup in open-source land.

Nathan [00:19:57]: How does this work in like diffusion models and stuff? Because you can give negative prompts to something like Stable Diffusion or whatever. It's for guidance.

Swyx [00:20:04]: That's for CLIP guidance.

Nathan [00:20:05]: Is that just from like how they prompt it then? I'm just wondering if we could do something similar. It's another tangent.

Swyx [00:20:10]: I do want to sort of spell that out for people in case they haven't made the connection between RLHF and the rest of the training process. They might have some familiarity with it.

Nathan [00:20:19]: Yeah. The upcoming slides can really dig into this, which is that in this 2018 paper, there was a position paper from a bunch of the same authors from the Christiano paper and from the OpenAI work that everyone knows, which is like, they write a position paper on what a preference reward model could do to solve alignment for agents. That's kind of based on two assumptions. The first assumption is that we can learn user intentions to a sufficiently high accuracy. That doesn't land with me because I don't know what that means. But the second one is pretty telling in the context of RLHF, which is for many tasks we want to solve, evaluation of outcomes is easier than producing the correct behavior. And this is the whole thing. It's like we can compare two poems that the model generates and it can be viewed as liking a positive example, or it could be viewed as really disliking a negative example. And that's what I think a lot of people are doing in like the harm space is like a harmful response to a language model, whether or not you agree with the company's definition of harms, is that it's a really bad negative example and they downweight them by preferring something more benign in the RLHF process, among other ways of dealing with safety. So that's a good way of saying it's like this is core, this kind of like comparison and positive or negative example is core to all of the RLHF work that has continued.

Swyx [00:21:29]: People often say, I don't know what I want, but I'll know when I see it. This is that expressed in reinforcement learning tools.

Nathan [00:21:35]: Yeah, it is. Yeah, it is. That's what everyone's doing in the preference modeling stage that we'll get to. Yeah. Yeah. And you can see there are more papers. This is really just to have all the links for people that go deeper. There's a Ziegler et al. paper in 2019, which shows that you can do this RLHF process on language models. This familiar diagram starts to emerge in 2019, and it's just to show that this goes really far back. I think we can kind of breeze through some of these. And then 2020 is the first OpenAI experiment that I think caught people's eyes, which is this Learning to Summarize experiment. It has this three-step process that we'll go into more when I kind of go into the main concepts. But this is like the first time you see this diagram that they reuse with InstructGPT, they reuse with ChatGPT. And the types of examples that they would have, I don't think I need to read these exactly, but one that I have read a whole bunch of times is like, they took these prompts from Reddit that was like, explain like I'm five or get career advice, and people really pour their heart and soul into these. So these are like multi-paragraph pieces of writing. And then they essentially do comparisons between a vanilla language model, like I think it was either GPT-2 or GPT-3, I don't always get the exact years.

Swyx [00:22:42]: 3 was early 2020. So that's about right.

Nathan [00:22:45]: Yeah. So this is probably done with GPT-2. It doesn't really matter. But the language model does normal things when you do few shot, which is like it repeats itself. It doesn't have nice text. And what they did is that this was the first time where the language model would generate like pretty nice text from an output. It was restricted to the summarization domain. But I think that I guess this is where I wish I was paying attention more because I would see the paper, but I didn't know to read the language model outputs and kind of understand this qualitative sense of the models very well then. Because you look at the plots in the papers, these Learning to Summarize and InstructGPT have incredibly pretty plots, just like nicely separated lines with error bars and they're like supervised fine-tuning works, the RL step works. But if you were early to see like how different the language that was written by these models was, I think you could have been early to like things like ChatGPT and knowing RLHF would matter. And now I think the good people know to chat with language models, but not even everyone does this. Like people are still looking at numbers. And I think OpenAI probably figured it out when they were doing this, how important that could be. And then they had years to kind of chisel away at that and that's why they're doing so well now. Yeah.

Swyx [00:23:56]: I mean, arguably, you know, it's well known that ChatGPT was kind of an accident that they didn't think it would be that big of a deal. Yeah.

Nathan [00:24:02]: So maybe they didn't. Maybe they didn't, but they were getting the proxy that they needed.

Swyx [00:24:06]: I've heard off the record from other labs that it was in the air. If OpenAI didn't do it, someone else would have done it. So you've mentioned a couple of other papers that are very seminal to this period.
And I love how you say "way back when" in referring to 2019.

Nathan [00:24:19]: It feels like it in my life.

Swyx [00:24:21]: So how much should people understand the relationship between RLHF, instruction tuning, PPO, KL divergence, anything like that? Like how would you construct the level of knowledge that people should dive into? What should people know at the high level? And then if people want to dive in deeper, where do they go? Is instruct tuning important here or is that part of the overall process towards modern RLHF?

Nathan [00:24:44]: I think for most people, instruction tuning is probably still more important in their day to day life. I think instruction tuning works very well. You can write samples by hand that make sense. You can get the model to learn from them. You could do this with very low compute. It's easy to do almost in like no code solutions at this point. And the loss function is really straightforward. And then if you're interested in RLHF, you can kind of learn from it from a different perspective, which is like how the instruction tuning distribution makes it easier for your RLHF model to learn. There's a lot of details depending on your preference data, if it's close to your instruction model or not, if that matters. But that's really at the RLHF stage. So I think it's nice to segment and just kind of understand what your level of investment and goals are. I think instruction tuning still can do most of what you want to do. And it's like, if you want to think about RLHF, at least before DPO really had taken off at all, it would be like, do you want to have a team of at least like five people if you're really thinking about doing RLHF? I think DPO makes it a little bit easier, but that's still really limited to kind of one data set that everyone's using at this point. Like everyone's using this UltraFeedback data set and it boosts AlpacaEval, MTBench, TruthfulQA and like the qualitative model a bit. We don't really know why. It's like, it might just be a data set combined with the method, but you've got to be ready for a bumpy ride if you're wanting to try to do RLHF. I don't really recommend most startups to do it unless it's like going to provide them a clear competitive advantage in their kind of niche, because you're not going to make your model ChatGPT-like, better than OpenAI or anything like that. You've got to accept that there's some exploration there and you might get a vein of benefit in your specific domain, but I'm still like, oh, be careful going into the RLHF can of worms. You probably don't need to.

Swyx [00:26:27]: Okay. So there's a bit of a time skip in what you mentioned. DPO is like a couple months old, so we'll leave that towards the end. I think the main result that I think most people talk about at this stage, we're talking about September 2020 and then going into, I guess maybe last year, was Vicuña as one of the more interesting applications of instruction tuning that pushed Llama 1 from, let's say a GPT-3-ish model to a GPT-3.5 model in pure open source with not a lot of resources. I think, I mean, they said something like, you know, they use like under $100 to make

Nathan [00:26:58]: this. Yeah. Like instruction tuning can really go a long way. I think the claims of ChatGPT level are long overblown in most of the things in open source. I think it's not to say, like Vicuña was a huge step and it's just kind of showing that instruction tuning with the right data will completely change what it feels like to talk with your model. Yeah.

Swyx [00:27:19]: From text completion to actually chatting back and forth. Yeah. Yeah.

Nathan [00:27:23]: Instruction tuning can be multi-turn. Just having a little bit of data that's like a couple of turns can go a really long way. That was like the story of the whole first part of the year is like people would be surprised by how far you can take instruction tuning on a small model. I think the things that people see now is like the small models don't really handle nuance as well and they could be more repetitive even if they have really good instruction tuning. But if you take that kind of 7 to 70 billion parameter jump, like the instruction tuning at the bigger model is like robustness, little things make more sense. So that's still just with instruction tuning and scale more than anything else.

Swyx [00:27:56]: Excellent. Shall we go to technical overview?

Nathan [00:27:58]: Yeah. This is kind of where we go through my own version of this like three phase process. You can talk about instruction tuning, which we've talked about a lot. It's funny because all these things, instruction tuning has the fewest slides, even though it's the most practical thing for most people. We could save the debate for like if the big labs still do instruction tuning for later, but that's a coming wave for people. And then like preference data and training and then kind of like what does reinforcement learning optimization actually mean? We talk about these sequentially because you really have to be able to do each of them to be able to do the next one. You need to be able to have a model that's chatty or helpful instruction following. Every company has their own word that they like to assign to what instructions mean. And then once you have that, you can collect preference data and do some sort of optimization.

Swyx [00:28:39]: When you say word, you mean like angle bracket inst or do you mean something else?

Nathan [00:28:42]: Oh, I don't even know what inst means, but just saying like they use their adjective that they like. I think Anthropic also, like steerable is another one.

Swyx [00:28:51]: Just the way they describe it. Yeah.

Nathan [00:28:53]: So like instruction tuning, we've covered most of this, is really about like you should try to adapt your models to specific needs. It makes models that were only okay, extremely comprehensible. A lot of the times it's where you start to get things like chat templates. So if you want to do system prompts, if you want to ask your model, like act like a pirate, that's one of the ones I always do, which is always funny, but like whatever you like act like a chef, like anything, this is where those types of things that people really know in language models start to get applied. So it's good as a kind of starting point because this chat template is used in our early childhood and all of these things down the line, but it was a basic pointer. It's like, once you see this with instruction tuning, you really know it, which is like you take things like Stack Overflow where you have a question and an answer. You format that data really nicely. There's much more tricky things that people do, but I still think the vast majority of it is question answer. Please explain this topic to me, generate this thing for me. That hasn't changed that much this year. I think people have just gotten better at scaling up the data that they need. Yeah, this is where this talk will kind of take a whole left turn into more technical detail land.
I put a slide with the RLHF objective, which I think is good for people to know. I've started going back to this more, just to kind of understand what is trying to happen here and what type of math people could do. I think because of this algorithm, we've mentioned this, it's in the air, direct preference optimization, but everything kind of comes from an equation of trying to learn a policy that maximizes the reward. The reward is some learned metric. A lot can be said about what the reward should be subject to some constraint. The most popular constraint is the KL constraint, which is just a distributional distance. Essentially in language models, that means if you have a completion from your instruction or RLHF model, you can compare that completion to a base model. And looking at the log probs from the model, which are essentially how likely each token is, you can see a rough calculation of the distance between these two models, just as a scalar number. I think what that actually looks like in code, you can look at it. It'd be like a sum of log probs that you get right from the model. It'll look much simpler than it sounds, but it is just to make the optimization kind of stay on track. Make sure it doesn't overfit to the RLHF data. Because we have so little data in RLHF, overfitting is really something that could happen. I think it'll fit to specific features that labelers like to see, that the model likes to generate, punctuation, weird tokens like calculator tokens. It could overfit to anything if it's in the data a lot and it happens to be in a specific format. And the KL constraint prevents that. There's not that much documented work on that, but there's a lot of people that know if you take that away, it just doesn't work at all. I think it's something that people don't focus on too much. But the objective, as I said, it's just kind of, you optimize the reward. The reward is where the human part of this comes in. We'll talk about that next. And then subject to a constraint, don't change the model too much. The real questions are, how do you implement the reward? And then how do you make the reward go up in a meaningful way? So like a preference model, the task is kind of to design a human reward. I think the equation that most of the stuff is based on right now is something called a Bradley-Terry model, which is like a pairwise preference model where you compare two completions and you say which one you like better. I'll show an interface that Anthropic uses here. And the Bradley-Terry model is really a fancy probability between two selections. And what's happening in the math is that you're looking at the probability that the chosen completion, the one you like better, is actually the better completion over the rejected completion. And what these preference models do is they assume this probability is correlated to reward. So if you just sample from this probability, it'll give you a scalar. And then you use that reward later on to signify what piece of text is better. I'm kind of inclined to breeze through the math stuff because otherwise, it's going to be not as good to listen to.
Alessio [00:32:49]: I think people want to hear it. I think there's a lot of higher level explanations out there. Yeah.
Nathan [00:32:55]: So the real thing is you need to assign a scalar reward of how good a response is. And that's not necessarily that easy to understand. Because if we go back to one of the first works, I mentioned this TAMER thing for decision making.
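As a rough illustration of the sum-of-log-probs KL penalty described above, here is a minimal sketch in PyTorch. The per-token log-probs, the beta coefficient, and the reward value are made-up numbers for illustration; a real RLHF pipeline would compute them from the policy, a frozen reference model, and a reward model.

```python
import torch

def kl_penalty(logprobs_policy: torch.Tensor, logprobs_ref: torch.Tensor) -> torch.Tensor:
    """Rough per-sequence KL estimate between the RLHF policy and a frozen
    reference model: sum, over the completion, of the difference in log-probs
    the two models assign to the tokens the policy actually generated."""
    return (logprobs_policy - logprobs_ref).sum(dim=-1)

# Toy numbers for one 4-token completion (batch size 1).
lp_policy = torch.tensor([[-1.0, -0.5, -2.0, -0.3]])
lp_ref = torch.tensor([[-1.2, -0.7, -1.5, -0.4]])

beta = 0.1                                  # strength of the KL constraint (illustrative)
reward = torch.tensor([1.0])                # scalar from the preference/reward model (illustrative)
shaped_reward = reward - beta * kl_penalty(lp_policy, lp_ref)
print(shaped_reward)                        # what PPO-style training tries to maximize
```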
People tried that kind of scalar rating with language models, which is if you have a prompt and a completion and you just have someone rate it from 0 to 10, could you then train a reward model on all of these completions and 0 to 10 ratings and see if you can get ChatGPT with that? And the answer is really kind of no. Like a lot of people tried that. It didn't really work. And then that's why they tried this pairwise preference thing. And it happened to work. And this Bradley-Terry model comes from the 50s. It's from these fields that I was mentioning earlier. And it's wild how much this happens. I mean, this screenshot I have in the slides is from the DPO paper. I think it might be the appendix. But it's still really around in the literature of what people are doing for RLHF.
Alessio [00:33:45]: Yeah.
Nathan [00:33:45]: So it's a fun one to know.
Swyx [00:33:46]: I'll point out one presumption that this heavily relies on. You mentioned this as part of your six presumptions that we covered earlier, which is that you can aggregate these preferences. This is not exactly true among all humans, right? I have a preference for one thing. You have a preference for a different thing. And actually coming from economics, you mentioned economics earlier. There's a theorem or a name for this called Arrow's impossibility theorem, which I'm sure you've come across.
Nathan [00:34:07]: It's one of the many kind of things we throw around in the paper.
Swyx [00:34:10]: Right. Do we just ignore it?
Nathan [00:34:14]: We just, yeah, just aggregate. Yeah. I think the reason this really is done on a deep level is that you're not actually trying to model any contestable preference in this. You're not trying to go into things that are controversial or anything. It's really the notion of preference is trying to stay around correctness and style rather than any meaningful notion of preference. Because otherwise these companies, they don't want to do this at all. I think that's just how it is. And it's like, if you look at what people actually do. So I have a bunch of slides on the feedback interface. And they all publish this.
Swyx [00:34:43]: It's always at the appendices of every paper.
Nathan [00:34:47]: There's something later on in this talk, which is like, but it's good to mention. And this is when you're doing this preference collection, you write out a very long document of instructions to people that are collecting this data. And it's like, this is the hierarchy of what we want to prioritize. Something like factuality, helpfulness, honesty, harmlessness. These are all different things. Every company will rank these in different ways, provide extensive examples. It's like, if you see these two answers, you should select this one and why. And all of this stuff. And then my kind of like head scratching is like, why don't we check if the models actually do these things that we tell the data annotators to collect? But I think it's because it's hard to make that attribution. And it's hard to test if a model is honest and stuff. It would just be nice to understand the kind of causal mechanisms as a researcher or like if our goals are met. But at a simple level, what it boils down to, I have a lot more images than I need. It's like you're having a conversation with an AI, something like ChatGPT. You get shown two responses or more in some papers, and then you have to choose which one is better. I think something you'll hear a lot in this space is something called a Likert scale.
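A minimal sketch of the Bradley-Terry objective described above, as it is typically used to train a reward model: the model's scalar scores for the chosen and rejected completions are pushed apart with a -log sigmoid(r_chosen - r_rejected) loss. The reward model itself is omitted here and the scores are made-up numbers.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: maximize the Bradley-Terry probability that
    the chosen completion scores higher than the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scalar rewards for a batch of two comparisons (normally produced by a
# reward-model head on top of a language model).
r_chosen = torch.tensor([1.2, 0.3])
r_rejected = torch.tensor([0.4, 0.9])
print(bradley_terry_loss(r_chosen, r_rejected))
```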
Likert is a name from probably some research in economics, decision theory, something. But essentially, it's a type of scale where if you have integers from like one to eight, the middle numbers will represent something close to a tie. And the smallest numbers will represent one model being way better than the other. And the biggest numbers will be like the other model's better. So in the case of one to eight, if you're comparing models A to B, you return a one if you really liked option A, you return eight if you really like B, and then like a four or five if they were close. There's other ways to collect this data. This one's become really popular. We played with it a bit at Hugging Face. It's hard to use. Filling out this preference data is really hard. You have to read like multiple paragraphs. It's not for me. Some people really like it, I hear. I'm like, I can't imagine sitting there and reading AI-generated text and like having to do that for my job. But a lot of these early papers in RLHF have good examples of what was done. The one I have here is from Anthropic's collection demo because it was from slides that I did with Anthropic. But you can look up these in the various papers. It looks like ChatGPT with two responses, and then you have an option to say which one is better. It's nothing crazy. The infrastructure is almost exactly the same, but they just log which one you think is better. I think places like Scale are also really big in this where a lot of the labeler companies will help control like who's doing how many samples. You have multiple people go over the same sample once and like what happens if there's disagreement. I don't really think this disagreement data is used for anything, but it's good to know like what the distribution of prompts is, who's doing it, how many samples you have, controlling the workforce. All of this is very hard. A last thing to add is that a lot of these companies do collect optional metadata. I think the Anthropic example shows a rating of like how good was the prompt or the conversation from good to bad because things matter. Like there's kind of a quadrant of preference data in my mind, which is you're comparing a good answer to a good answer, which is like really interesting signal. And then there's kind of the option of you're comparing a bad answer to a bad answer, which is like you don't want to train your model on two different issues. This is like, we did this at Hugging Face and it was like, our data was like, we don't know if we can use this because a lot of it was just bad answer to bad answer because you're like rushing to try to do this for a real contract. And then there's also good answer to bad answer, which I think is probably pretty reasonable to include. You just prefer the good one and move on with your life. But those are very different scenarios. I think the OpenAIs of the world are all in good answer, good answer, and have learned to eliminate everything else. But when people try to do this in open source, it's probably like what Open Assistant saw, is like, there's just a lot of bad answers in your preference data. And you're like, what do I do with this? Metadata flags can help. I threw in the InstructGPT metadata. You can see how much they collect here. And like everything from the model fails to actually complete the task, hallucinations, different types of offensive or dangerous content, moral judgment, expresses opinion.
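To make the quadrant and metadata discussion above concrete, here is a small hypothetical sketch of what a preference record with optional metadata flags might look like, and how the bad-versus-bad comparisons could be filtered out before training. The field names are invented for illustration and are not taken from any particular dataset.

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceRecord:
    prompt: str
    chosen: str                 # completion the labeler preferred
    rejected: str               # completion the labeler rejected
    likert: int                 # 1-8 strength of preference (4/5 is roughly a tie)
    flags: dict = field(default_factory=dict)   # optional metadata, e.g. hallucination

records = [
    PreferenceRecord("Explain KL divergence.", "KL divergence measures...", "idk", 7,
                     {"chosen_quality": "good", "rejected_quality": "bad"}),
    PreferenceRecord("Write a haiku.", "weak haiku A", "weak haiku B", 5,
                     {"chosen_quality": "bad", "rejected_quality": "bad"}),
]

# Drop the bad-vs-bad quadrant and anything flagged as a hallucination.
usable = [
    r for r in records
    if not (r.flags.get("chosen_quality") == "bad" and r.flags.get("rejected_quality") == "bad")
    and not r.flags.get("hallucination", False)
]
print(len(usable))  # 1
```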
Like, I don't know exactly if they're doing this now, but you can kind of see why doing RLHF at scale and prioritizing a lot of different endpoints would be hard because these are all things I'd be interested in if I was scaling up a big team to do RLHF and like what is going into the preference data. You do an experiment and you're like, okay, we're going to remove all the data where they said the model hallucinates, like just that, and then retrain everything. Like, what does that do?
Swyx [00:38:59]: Yeah, so hallucination is big, but some of these other metadata categories, and I've seen this in a lot of papers, it's like, does it contain sexual content? Does it express a moral judgment? Does it denigrate a protected class? That kind of stuff, very binary. Should people try to adjust for this at the RLHF layer or should they put it as a pipeline where they have a classifier as a separate model that grades the model output?
Nathan [00:39:20]: Do you mean for training or like a deployment? Deployment. I do think that people are doing it at deployment. I think we've seen safety and other things in the RLHF pipeline. Like Llama 2 is famous for kind of having these like helpfulness and safety reward models. Deep in the Gemini report is something about Gemini having like four things, which is like helpfulness, factuality, maybe safety, maybe something else. But places like Anthropic and ChatGPT and Bard almost surely have a classifier after, which is like, is this text good? Is this text bad? That's not that surprising, I think, because you could use like a hundred times smaller language model and do much better at filtering than RLHF. But I do think it's still so deeply intertwined with the motivation of RLHF to be for safety that some of these categories still persist. I think that's something that will kind of settle out.
Swyx [00:40:11]: I'm just wondering if it's worth collecting this data for the RLHF purpose, if you're not going to use it in any way, separate model to-
Nathan [00:40:18]: Yeah, I don't think OpenAI will collect all of this anymore, but I think for research perspectives, it's very insightful to know, but it's also expensive. So essentially your preference data scales with how many minutes it takes for you to do each task and every button is like, it scales pretty linearly. So it's not cheap stuff.
Swyx [00:40:35]: Can we, since you mentioned expensiveness, I think you may have joined one of our spaces back when Llama 2 was released. We had an estimate from you that was something on the order of Llama 2 cost $3 to $6 million to train GPU-wise, and then it was something like $20 to $30 million in preference data. Is that something that's still in the ballpark? I don't need precise numbers.
Nathan [00:40:56]: I think it's still a ballpark. I know that the 20 million was off by a factor of four because I was converting from a prompt number to a total data point. So essentially when you do this, if you have a multi-turn setting, each turn will be one data point and the Llama 2 paper reports like 1.5 million data points, which could be like 400,000 prompts. So I would say like 6 to 8 million is safe to say that they're spending, if not more. They're probably also buying other types of data and or throwing out data that they don't like, but it's very comparable to compute costs. But the compute costs listed in the paper always are way lower because all they have to say is like, what does one run cost? But they're running tens or hundreds of runs.
So it's like, okay, like... Yeah, it's just kind of a meaningless number. Yeah, the data number would be more interesting.
Alessio [00:41:42]: What's the depreciation of this data?
Nathan [00:41:46]: It depends on the method. Like some methods, people think that it's more sensitive to the, this is what I was saying. It was like, does the type of instruction tuning you do matter for RLHF? So like, depending on the method, some people are trying to figure out if you need to have like what is called, this is very confusing, it's called like on-policy data, which is like your RLHF data is from your instruction model. I really think people in open source and academics are going to figure out how to use any preference data on any model just because they're scrappy. But there's been an intuition that to do like PPO well and keep improving the model over time and do like what Meta did and what people think that OpenAI does is that you need to collect new preference data to kind of edge the distribution of capabilities forward. So there's a depreciation where like the first batch of data you collect isn't really useful for training the model when you have the fifth batch. We don't really know, but it's a good question. And I do think that if we had all the Llama data, we wouldn't know what to do with all of it. Like probably like 20 to 40% would be pretty useful for people, but not the whole dataset. Like a lot of it's probably kind of gibberish because they had a lot of data in there.
Alessio [00:42:51]: So do you think like the open source community should spend more time figuring out how to reuse the data that we have or like generate more data? I think that's one of the-
Nathan [00:43:02]: I think people are kind of locked into using synthetic data. People also think that with synthetic data, like, GPT-4 is more accurate than humans at labeling preferences. So if you look at these diagrams, like humans are about 60 to 70% agreement. And we're like, that's what the models get to. And if humans are about 70% agreement or accuracy, like GPT-4 is like 80%. So it is a bit better, which is one way of saying it.
Swyx [00:43:24]: Humans don't even agree with humans 50% of the time.
Nathan [00:43:27]: Yeah, so like that's the thing. It's like the human disagreement or the lack of accuracy should be like a signal, but how do you incorporate that? It's really tricky to actually do that. I think that people just keep using GPT-4 because it's really cheap. It's one of my like go-tos, like I just say this over and over again, is like GPT-4 for data generation, all terms and conditions aside because we know OpenAI has this stuff, is like very cheap for getting pretty good data compared to compute or salary of any engineer or anything. So it's like, tell people to go crazy generating GPT-4 data if you're willing to take the organizational like cloud of should we be doing this? But I think most people have accepted that you kind of do this, especially as individuals. Like they're not gonna come after individuals. I do think more companies should think twice before doing tons of OpenAI outputs. Also just because the data contamination and what it does to your workflow is probably hard to control at scale.
Swyx [00:44:21]: And we should just mention at the time of recording, we've seen the first example of OpenAI enforcing their terms of service. ByteDance was caught, reported to be training on GPT-4 data and they got their access to OpenAI revoked.
So that was one example.
Nathan [00:44:36]: Yeah, I don't expect OpenAI to go too crazy on this cause they're just gonna, there's gonna be so much backlash against them. And like, everyone's gonna do it anyways.
Swyx [00:44:46]: And what's at stake here to spell it out is like, okay, it costs like $10 to collect one data point from a human. It's gonna cost you like a 10th of a cent with OpenAI, right? So like it's just orders of magnitude cheaper. And therefore people-
Nathan [00:44:58]: Yeah, and it's like the signal you get from humans for preferences isn't that high. The signal that you get from humans for instructions is pretty high, but it is also very expensive. So like the human instructions are definitely like by far and away the best ones out there compared to the synthetic data. But I think like the synthetic preferences are just so much easier to get some sort of signal running with and you can work in other, I think people will start working in other goals there between safety and whatever. That's something that's taking off and we'll kind of see that. I think in 2024, at some point, people will start doing things like Constitutional AI for preferences, which will be pretty interesting. I think we saw how long it took RLHF to get started in open source. Instruction tuning was like the only thing that was really happening until maybe like August, really. I think Zephyr was the first model that showed success with RLHF in the public, but that's a long time from everyone knowing that it was something that people are interested in to having any like check mark. So I accept that and think the same will happen with Constitutional AI. But once people show that you can do it once, they continue to explore.
Alessio [00:46:01]: Excellent.
Swyx [00:46:01]: Just in the domain of human preference data suppliers, Scale.ai very happily will tell you that they supplied all that data for Llama 2. The other one is probably interesting, LMSYS from Berkeley. What they're running with Chatbot Arena is perhaps a good store of human preference data.
Nathan [00:46:17]: Yeah, they released some toxicity data. They, I think, are generally worried about releasing data because they have to process it and make sure everything is safe and they're a really lightweight operation. I think they're trying to release the preference data. I have, if we make it to evaluation, I'd pretty much say that Chatbot Arena is the best limited evaluation that people have to learn how to use language models. And like, it's very valuable data. They also may share some data with people that they host models from. So like if your model is hosted there and you pay for the hosting, you can get the prompts because you're pointing the endpoint at it and that gets pinged to you, and any real LLM inference stack saves the prompts tha
Jan Leike is a Research Scientist at Google DeepMind and a leading voice in AI Alignment, with affiliations at the Future of Humanity Institute and the Machine Intelligence Research Institute. At OpenAI, he co-leads the Superalignment Team, contributing to AI advancements such as InstructGPT and ChatGPT. Holding a PhD from the Australian National University, Jan's work focuses on ensuring AI Alignment.Key HighlightsThe launch of OpenAI's Superalignment team, targeting the alignment of superintelligence in four years.The aim to automate alignment research, currently leveraging 20% of OpenAI's computational power.How traditional reinforcement learning from human feedback may fall short in scaling language model alignment.Why there is a focus on scalable oversight, generalization, automated interpretability, and adversarial testing to ensure alignment reliability.Experimentation with intentionally misaligned models to evaluate alignment strategies.Dive deeper into the session: Full SummaryAbout Foresight InstituteForesight Institute is a research organization and non-profit that supports the beneficial development of high-impact technologies. Since our founding in 1987 on a vision of guiding powerful technologies, we have continued to evolve into a many-armed organization that focuses on several fields of science and technology that are too ambitious for legacy institutions to support.Allison DuettmannThe President and CEO of Foresight Institute, Allison Duettmann directs the Intelligent Cooperation, Molecular Machines, Biotech & Health Extension, Neurotech, and Space Programs, alongside Fellowships, Prizes, and Tech Trees. She has also been pivotal in co-initiating the Longevity Prize, pioneering initiatives like Existentialhope.com, and contributing to notable works like "Superintelligence: Coordination & Strategy" and "Gaming the Future".Get Involved with Foresight:Apply: Virtual Salons & in-person WorkshopsDonate: Support Our Work – If you enjoy what we do, please consider this, as we are entirely funded by your donations!Follow Us: Twitter | Facebook | LinkedInNote: Explore every word spoken on this podcast through Fathom.fm, an innovative podcast search engine. Hosted on Acast. See acast.com/privacy for more information.
Aran Komatsuzaki is an ML PhD student at GaTech and lead researcher at EleutherAI where he was one of the authors on GPT-J. In June 2022 we recorded an episode on scaling following up on the first Ethan Caballero episode (where we mentioned Aran as an influence on how Ethan started thinking about scaling). Note: For some reason I procrastinated on editing the podcast, then had a lot of in-person podcasts so I left this one as something to edit later, until the date was so distant from June 2022 that I thought publishing did not make sense anymore. In July 2023 I'm trying that "one video a day" challenge (well I missed some days but I'm trying to get back on track) so I thought it made sense to release it anyway, and after a second watch it's somehow interesting to see how excited Aran was about InstructGPT, which turned out to be quite useful for things like ChatGPT. Outline (00:00) intro (00:53) the legend of the two AKs, Aran's arXiv reading routine (04:14) why Aran expects Alignment to be the same as some other ML problems (05:44) what Aran means when he says "AGI" (10:24) what Aran means by "human-level at doing ML research" (11:31) software improvement happening before hardware improvement (13:00) is scale all we need? (15:25) how "Scaling Laws for Neural Language Models" changed the process of doing experiments (16:22) how Aran scale-pilled Ethan (18:46) why Aran was already scale-pilled before GPT-2 (20:12) Aran's 2019 scaling paper: "One epoch is all you need" (25:43) Aran's June 2022 interest: T0 and InstructGPT (31:33) Encoder-Decoder performs better than encoder if multi-task-finetuned (33:30) Why the Scaling Law might be different for T0-like models (37:15) The Story Behind GPT-J (41:40) Hyperparameters and architecture changes in GPT-J (43:56) GPT-J's throughput (47:17) 5 weeks of training using 256 TPU cores (50:34) did publishing GPT-J accelerate timelines? (55:39) how Aran thinks about Alignment, defining Alignment (58:19) in practice: improving benchmarks, but deception is still a problem (1:00:49) main difficulties in evaluating language models (1:05:07) how Aran sees the future: AIs aligning AIs, merging with AIs, Aran's takeoff scenario (1:10:09) what Aran thinks we should do given how he sees the next decade (1:12:34) regulating access to AGI (1:14:50) what might happen: preventing some AI authoritarian regime (1:15:42) conclusion, where to find Aran
Subscribe to the video's authors on Telegram so you don't miss new material: Pavel Komarovsky https://t.me/RationalAnswer and Igor Kotenkov https://t.me/seeallochnaya Text version of the piece on the evolution of language models: https://vc.ru/future/623774 Watch the episode on YouTube: https://www.youtube.com/watch?v=VVfFf_XW8zw Support the RationalAnswer project: - Patreon (foreign currency) – https://www.patreon.com/RationalAnswer - Boosty (rubles) – https://boosty.to/RationalAnswer CONTENTS: 00:31 – T9: the language model in your phone 02:43 – Where do neural networks get word probabilities from? 05:41 – Why language models are capable of creativity 08:16 – 2018: GPT-1 and the Transformer architecture 11:47 – 2019: GPT-2, or 7,000 Shakespeares in a neural network 15:25 – How the complexity and size of models are measured 22:07 – 2020: GPT-3, or the Incredible Hulk 26:13 – Prompts, or how to coax the model properly 28:06 – January 2022: InstructGPT, or the taming of the shrew 33:51 – November 2022: ChatGPT – everyone's riding the hype 38:05 – Wrapping up
In this episode we discuss the paper "Training language models to follow instructions with human feedback" by Ouyang et al (2022). We discuss the RLHF paradigm and how important RL is to tuning GPT.
The launch of GPT-3.5 was a key element in building OpenAI's entire ecosystem and in the emergence of advanced conversational models.
In today's interview, Bakz T. Future walks through the history of OpenAI, where recent developments behind ChatGPT originated, the rise of DALL-E, image generators, and other generative AI technologies. Bakz is both an everyday user of these technologies and a developer that works directly with the OpenAI APIs, so you are going to learn a lot today. Get ready for a discussion about InstructGPT and how that upgrade in February 2022 was instrumental to all of the recent mania around generative AI. He also discusses the importance of using adversarial thinking when using generative AI models, particularly large language models, and he finishes up with some predictions for 2023. There is also a video of this episode on Voicebot's YouTube channel if you would prefer to watch. Just go to youtube.com/@voicebotai. While you are there, look around at the more than 100 videos we have posted since June of last year on AI technologies. And, of course, give us a click to subscribe.
Lorena Penas is a founding partner of "Actualizados Comunicación", a company dedicated to managing digital communication. Today she talks to us about ChatGPT, the artificial intelligence chat that has the scientific community in a spin. "OpenAI explains that texts must be at least 1,000 characters, that is, 150 to 200 words, so that the tool can analyze them optimally." "For its creation, this model was trained through reinforcement learning from human feedback." "On OpenAI's own website they already warn that this is not a perfect system." This is ChatGPT, the artificial intelligence chat everyone is talking about. OpenAI, the research company and developer of artificial intelligences such as DALL-E 2 and GPT-3, has just launched ChatGPT, a new conversational text-generating AI model. For its creation, it built on predecessor models such as GPT-3 itself and InstructGPT, managing to optimize their responses. For now, ChatGPT is in a testing phase. That is, this launch will serve to test it and keep improving its system, based on the ratings and comments its users provide. Likewise, to help with this data collection, this first version of the model is available to everyone free of charge; you only need to create an OpenAI account. ✔️How ChatGPT works As we already know, GPT-3 is a natural language model capable of carrying out multiple text-generation tasks such as creating texts, completing them, translating them, answering questions, classifying concepts, or holding conversations, among others. In the case of ChatGPT, this last function was selected in order to optimize it and create a much more efficient conversational model. That is, ChatGPT inherits GPT-3's ability to respond while adding new capabilities such as admitting mistakes in its statements, challenging incorrect premises, and rejecting inappropriate requests. For its creation, this model was trained through reinforcement learning from human feedback (RLHF). Human AI trainers provided conversations and could also include suggestions to help the tool draft its responses. Likewise, the resulting conversations are compared with one another, ranking the responses from best to worst, so that subsequent outputs incorporate this learning. This same process is still ongoing, drawing on the conversations users have with ChatGPT and the feedback they provide. ChatGPT has proven useful for creating texts of any kind, and although the content generated by this artificial intelligence is not perfect and may contain errors, some people have started using it to save themselves work and effort. ✔️As a consequence of this, companies, schools, and clients may be receiving texts that did not actually come from a human mind, even though someone passes them off as their own. To address this situation, OpenAI decided to make available a tool that analyzes whether a piece of writing was produced by ChatGPT or other similar artificial intelligences. The tool has been named the AI text classifier and is dedicated to predicting how likely it is that a text was generated by an AI, drawing on several sources. The detector can return 5 different results depending on its assessment: very unlikely, unlikely, unclear, likely, or very likely generated by AI.
✔️How OpenAI's AI text classifier works For now, the classifier for AI-generated texts is available for free on the ChatGPT website. To use it, you just need to go to the site, paste the text you want to evaluate, and press "Submit". Once its analysis is finished, the system returns a result based on its 5 categories. OpenAI explains that texts must be at least 1,000 characters, that is, 150 to 200 words, so that the tool can analyze them optimally. The company also warns that the results may not always be accurate, since the classifier can mislabel a text as AI-generated or human-written. It is also possible that the tool will not detect that a text was generated by AI if it has been reworked or edited by a human. ✔️What its limitations are On OpenAI's own website they already warn that this is not a perfect system; it is still learning, so ChatGPT may give wrong or inaccurate answers. They also point this out when you access the tool: "although we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice." When interacting with it, we should keep several things in mind. To begin with, its knowledge of the current world is limited, so it may give outdated answers about events after 2021. On the other hand, its answers may be biased or include harmful content, something OpenAI is putting considerable effort into fixing, since, as they have said on several occasions, they do not want their tools to provide harmful content. They have also worked so that, when ChatGPT does not know the answer to a question, it acknowledges this and says so, which can result in it being overly cautious with some information. More information ACTUALIZADOS COMUNICACIÓN: ✔️Website: https://actualizadoscomunicacion.com/ ✔️Facebook: https://www.facebook.com/actualizadoscomunicacion ✔️Twitter: https://twitter.com/actualizadoscom ✔️Instagram: https://www.instagram.com/actualizados_comunicacion/ ️"SUBSCRIBE" to the podcast. MORE INTERVIEWS: https://www.ivoox.com/podcast-salta-da-cama_sq_f1323089_1.html More information and other content: ✔️Facebook: https://www.facebook.com/PabloChichas ✔️Twitter: https://twitter.com/pablochichas ✔️Instagram: https://www.instagram.com/pablochichas/ ✔️Clubhouse: @pablochichas ✔️Twitch: https://www.twitch.tv/pablochichas
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT-4 Predictions, published by Stephen McAleese on February 17, 2023 on LessWrong. Introduction GPT-4 is OpenAI's next major language model which is expected to be released at some point in 2023. My goal here is to get some idea of when it will be released and what it will be capable of. I also think it will be interesting in retrospect to see how accurate my predictions were. This post is partially inspired by Matthew Barnett's GPT-4 Twitter thread which I recommend reading. Background of GPT models GPT-1, GPT-2, GPT-3 GPT stands for generative pre-trained transformer and is a family of language models that were created by OpenAI. GPT was released in 2018, GPT-2 in 2019, and GPT-3 in 2020. All three models have used a similar architecture with some relatively minor variations: a dense, text-only, decoder transformer language model that's trained using unsupervised learning to predict the next word in its text training set. InstructGPT, GPT-3.5, ChatGPT Arguably one of the biggest changes in the series in terms of architecture and behavior was the release of InstructGPT in January 2022, which used supervised fine-tuning on model answers and reinforcement learning from human feedback, where model responses are ranked, in addition to the standard unsupervised pre-training. The GPT-3.5 models finished training and were released in 2022, and demonstrated better quality answers than GPT-3. In late 2022, OpenAI released ChatGPT which is based on GPT-3.5 and fine-tuned for conversation. When will GPT-4 be released? Sam Altman, the CEO of OpenAI, was interviewed by StrictlyVC in January 2023. When asked when GPT-4 would come out, he replied, “It will come out at some point when we are confident that we can do it safely and responsibly.” Metaculus predicts a 50% chance that GPT-4 will be released by May 2023 and a ~93% chance that it will be released by the end of 2023. It seems like there's still quite a lot of uncertainty here but I think we can be quite confident that GPT-4 will be released at some point in 2023. What will GPT-4 be like? Altman revealed some more details about GPT-4 at an AC10 meetup Q&A. He said: GPT-4 will be a text-only model like GPT-3. GPT-4 won't be much bigger than GPT-3 but will use much more compute and have much better performance. GPT-4 will have a longer context window. How capable will GPT-4 be? Scaling laws According to the paper Scaling Laws for Neural Language Models (2020), model performance as measured by cross-entropy loss can be calculated from three factors: the number of parameters in the model, the amount of compute used during training, and the amount of training data. There is a power-law relationship between these three factors and the loss. Basically, the relationship is linear on a log-log plot: every 10x increase in compute, data, and parameters shrinks the loss by roughly the same multiplicative factor, a 100x increase shrinks it by that factor twice, and so on. The authors of the paper recommended training very large models on relatively small amounts of data and recommended investing compute into more parameters over more training steps or data to minimize loss, as shown in a diagram in the paper. For every 10x increase in compute, the paper approximately recommends increasing the number of parameters by 5x, the number of training tokens by 2x, and the number of serial training steps by 1.2x.
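As a rough illustration of the power-law form described above, here is a minimal sketch. The constants are placeholders chosen for readability, not the fitted values reported in the scaling-laws paper; the point is only that each multiplicative increase in compute shrinks the loss by the same multiplicative factor.

```python
# Illustrative Kaplan-style power law: loss scales as (C_c / C) ** alpha_C.
# c_c and alpha_c below are placeholder constants, not the paper's fitted values.
def loss_from_compute(compute: float, c_c: float = 3.0e8, alpha_c: float = 0.05) -> float:
    return (c_c / compute) ** alpha_c

if __name__ == "__main__":
    for compute in (1e9, 1e10, 1e11):
        # Each 10x step in compute multiplies the loss by the same factor, 10 ** -alpha_c.
        print(f"compute={compute:.0e} -> loss={loss_from_compute(compute):.3f}")
```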
This explains why the original GPT-3 model and other models such as Megatron and PaLM were so large. However, the new scaling laws from DeepMind's 2022 paper Training Compute-Optimal Large Language Models instead emphasize the importance of training data for minimizing loss. Instead of prioritizing more parameters, the paper recommends scaling the number of parameters and training tokens equally. DeepMind originally trained a large 280B parameter model named Gopher but then found a 70B mo...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Is InstructGPT Following Instructions in Other Languages Surprising?, published by DragonGod on February 13, 2023 on LessWrong. On Twitter Jan Leike asks: With the InstructGPT paper we found that our models generalized to follow instructions in non-English even though we almost exclusively trained on English. We still don't know why. I wish someone would figure this out. I find myself surprised/confused at his apparent surprise/confusion. My default response, had someone asked me what I thought was going on, would have been something like: "following instructions is a natural abstraction of a task", hence models trained to follow instructions in English generalising to other languages is a natural example of goals/capabilities generalisation. It's like being surprised that you taught an AI to drive red cars and it can drive blue cars as well [capabilities generalisation]. Or if you taught an AI to reach a particular location on a level when there's a coin there and it still heads to said location in the absence of the coin.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conditioning Predictive Models: Outer alignment via careful conditioning, published by evhub on February 2, 2023 on LessWrong. This is the second of seven posts in the Conditioning Predictive Models Sequence based on the forthcoming paper “Conditioning Predictive Models: Risks and Strategies” by Evan Hubinger, Adam Jermyn, Johannes Treutlein, Rubi Hudson, and Kate Woolverton. Each post in the sequence corresponds to a different section of the paper. We will be releasing posts gradually over the course of the next week or so to give people time to read and digest them as they come out. We are starting with posts one and two, with post two being the largest and most content-rich of all seven. 2. Outer alignment via careful conditioning Suppose we actually get a predictive model of the world that we can condition on arbitrary observations. What should we do with it? One thing that is often done with large language models is to ask them to act as dialogue agents (e.g. predict what comes next after an “AI:” prompt). As we'll discuss extensively in this section, however, we think that asking a predictive model to predict itself or another AI system is highly unsafe, as the AI systems being predicted may not themselves be safe. Instead, our basic proposal is to try to predict the most useful possible human outputs. One canonical thing that you might want to do in that vein is “predict the results of humans doing AI safety research” (though note that this is just an example). Specifically, here's GPT-3.5 trained with RLHF (text-davinci-003) at zero temperature (prompt is bolded):[1] # A full solution to the AI alignment problem Unfortunately, there is no single solution to the AI alignment problem. AI alignment is a complex and multifaceted problem that requires a comprehensive approach. The first step in solving the AI alignment problem is to define the goals of the AI system. This involves understanding the desired outcomes of the AI system and how it should interact with its environment. Once the goals are defined, the next step is to develop a framework for how the AI system should behave in order to achieve those goals. This framework should include ethical considerations, such as ensuring that the AI system does not cause harm to humans or other sentient beings. Unfortunately, as can be seen from GPT-3.5's continuation, it doesn't actually think a paper titled “A full solution to the alignment problem” will actually contain such a solution—instead, it seems to think it's far more likely that it'll just talk about what such a solution might entail. Which in fact seems true, at least if such a paper was written today. Fortunately, at least for current InstructGPT models, it's not that hard to get GPT-3.5 to at least seemingly give a real attempt.[2] Here's another trajectory from text-davinci-003 at zero temperature (prompt is again bolded): # ARC Technical Report: A full solution to the AI alignment problem Introduction The AI alignment problem is one of the most pressing challenges of our time. It is the challenge of ensuring that artificial intelligence (AI) systems are designed and deployed in a way that is beneficial to humanity. 
This is a difficult problem because AI systems are often designed to optimize for a single goal, such as maximizing profits or minimizing costs, without considering the broader implications of their actions. As AI systems become more powerful and more widely deployed, it is increasingly important to ensure that they are designed and deployed in a way that is beneficial to humanity. This report presents a full solution to the AI alignment problem. It begins by discussing the current state of the problem and the challenges that must be addressed in order to achieve a full solution. It then outlines a comprehensive approach to solving the prob...
Deep Papers is a podcast series featuring deep dives on today's seminal AI papers and research. Hosted by AI Pub creator Brian Burns and Arize AI founders Jason Lopatecki and Aparna Dhinakaran, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. In this first episode, we're joined by Long Ouyang and Ryan Lowe, research scientists at OpenAI and creators of InstructGPT. InstructGPT was one of the first major applications of Reinforcement Learning with Human Feedback to train large language models, and is the precursor to the now-famous ChatGPT. Listen to learn about the major ideas behind InstructGPT and the future of aligning language models to human intention.Read OpenAI's InstructGPT paper here: https://openai.com/blog/instruction-following/To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.
Relevant Episodes: https://blubrry.com/digital_business_models/93130512/what-can-go-wrong-with-the-openaimicrosoft-partnership/https://blubrry.com/digital_business_models/93103387/how-does-stability-ai-make-money/https://blubrry.com/digital_business_models/93102800/how-does-microsoft-make-money-from-the-openais-partnership/https://blubrry.com/digital_business_models/93102405/how-does-openai-make-money-openai-business-model-explained/https://blubrry.com/digital_business_models/93070104/ai-revolution-as-a-result-of-scale-and-emergence/https://blubrry.com/digital_business_models/93046093/how-will-chatgpt-get-monetized/https://blubrry.com/digital_business_models/93046087/an-open-sourced-chatgpt/https://blubrry.com/digital_business_models/93030022/googles-competitor-to-chatgpt-sparrow/
ChatGPT, Psychiatry and Neuroscience: we discuss them together with Davide Bianchi, physician and psychiatrist. What changes could the arrival of a "general" artificial intelligence capable of devoting itself to helping and analyzing people suffering from mental disorders bring to psychiatry and psychology? How long will it take before the psychiatrist's profession is upended or even made redundant by an artificial intelligence? ChatGPT is OpenAI's new "Generative Pretrained Transformer" model, designed to make interaction with the GPT-3 artificial intelligence more natural and intuitive. ChatGPT specifically belongs to the InstructGPT family, that is, models trained via deep learning and then optimized through human reinforcement. ChatGPT was built and released to the public in November 2022, and thanks to its power and ease of use it is taking the web by storm; many people, given that it can also be accessed for free, are discovering its practically unlimited potential, including in the fields of psychiatry and neuroscience. #chatgpt #psichiatria Dr. Valerio Rosso, on this YouTube channel, produces short psychiatry lessons aimed at patients, mental health professionals, patients' family members, medical students, psychiatry residents, and anyone interested in mental health, psychiatry, and neuroscience. SUBSCRIBE TO MY CHANNEL ► https://bit.ly/2zGIJor Interested in Psychiatry and Neuroscience? Then subscribe to this YouTube channel and follow me on the web through my blog https://www.valeriorosso.com Discover all my books: https://bit.ly/2JdjocY Discover my music: https://bit.ly/2JMqNjZ Also visit my blog: https://www.valeriorosso.com Have you ever heard of the psiq project? Go to https://psiq.it right away to learn more and sign up for the newsletter.
What is ChatGPT?OpenAI has trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.This is a powerful tool that you and your team can use to maximize your time. Happy chatting!Episode Links:Ellerbrock-Norris: https://www.ellerbrock-norris.com/Ellerbrock-Norris Wealth Strategies: https://www.ellerbrock-norris-ws.com/Ennabl: https://www.ennabl.com/ChatGPT: https://openai.com/blog/chatgpt/LAUNCH: https://getlaunch.io/ LinkedIn:Elliot BassettRyan BrottRyan DeedsThis episode is sponsored by LAUNCH.In the world of insurance, independent agencies fight to survive. Brokers are forced to compete by blocking markets and bidding for the lowest price. Worse yet, the industry is fragmented.Agencies find it difficult to collaborate across divisions on the same client. Millions of dollars in potential revenue are left on the table. And agency owners lie awake at night wondering how to scale.THAT'S WHERE LAUNCH COMES IN.Access the full-revenue potential in your existing book of business. See opportunities other agencies can't. Offer more value. Gain a competitive advantage in a commoditized market.Visit https://getlaunch.io/ to learn more.
Hi guys, on this episode I talk about what happened with the delivery guy, about ChatGPT, InstructGPT, and GPT-3, the AI assistant by OpenAI, and more topics. I also talked about how to open a door; here's the link to that: https://youtube.com/watch?v=Wof0xPUmW38&feature=share --- Send in a voice message: https://podcasters.spotify.com/pod/show/hindia-mohammed/message
In this episode, we welcome back Abran Maldonado who is an Official OpenAI Community Ambassador. We discuss all things ChatGPT, OpenAI, and prompt design. We discuss everything ChatGPT in great detail: How it compares to GPT-3 The public reception of ChatGPT Use cases for ChatGPT How you can use ChatGPT in schools and as a parent Where ChatGPT could be going Staying Relevant, Creating, and Adapting ... and so much more! Timestamps: 00:00 - Intro 04:04 - Abran's Background 12:30 - How did you first hear about ChatGPT? 19:46 - ChatGPT First Impressions 28:00 - How do you think about the OpenAI Ecosystem now? 30:38 - Commercializing with the OpenAI API 37:27 - Public Reception - Cool ChatGPT Use cases 40:48 - Public Reception - Jailbreaking ChatGPT 48:40 - Public Reception - ChatGPT vs. Google Search, learning 55:55 - How do you feel about ChatGPT virality? 01:01:25 - Office Hours with OpenAI Ambassadors 01:02:40 - Staying Relevant with ChatGPT 01:11:20 - ChatGPT Browsing the Web (?) WebGPT 01:16:35 - AI and Safety 01:20:30 - The benefit of ChatGPT browsing the web 01:20:09 - ChatGPT Formatting - ChatGPT encyclopedia (?) 01:23:43 - ChatGPT and EDU 01:25:08 - Do you have any advice to teachers and parents? 01:30:08 - Language Models and Education Debate, The Future of Education 01:42:40 - Prompt Engineering is dead, long live dialogue engineering! 01:53:57 - Parallels Between Music Scene and the GPT-3/DALL-E/OpenAI Ecosystem 02:00:10 - Music Industry x AI Models 02:05:33 - How to prepare for upcoming Music AI Models? 02:11:05 - Listener Questions 02:20:06 - Closing Thoughts 02:24:00 - Outro Abran Maldonado https://twitter.com/abran https://www.instagram.com/createlab https://createlabs.io/meet-claira Links from this episode: ChatGPT: https://openai.com/blog/chatgpt/ Is Prompt Design Over? https://bakztfuture.substack.com/p/is-prompt-design-over InstructGPT: https://openai.com/blog/instruction-following/ InstructGPT vs. google search: https://bakztfuture.substack.com/p/instructgpt-google-search WebGPT: https://openai.com/blog/webgpt/ My Biggest 2022 Prediction: GPT-3 will take over schools and college campuses https://bakztfuture.substack.com/p/my-biggest-2022-prediction-gpt-3 Subscribe to the Multimodal Podcast! Spotify - https://open.spotify.com/show/7qrWSE7ZxFXYe8uoH8NIFV Apple Podcasts - https://podcasts.apple.com/us/podcast/multimodal-by-bakz-t-future/id1564576820 Google Podcasts - https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkLnBvZGJlYW4uY29tL2Jha3p0ZnV0dXJlL2ZlZWQueG1s Stitcher - https://www.stitcher.com/show/multimodal-by-bakz-t-future Other Podcast Apps (RSS Link) - https://feed.podbean.com/bakztfuture/feed.xml Connect with me: YouTube - https://www.youtube.com/bakztfuture Substack Newsletter - https://bakztfuture.substack.com Twitter - https://www.twitter.com/bakztfuture Instagram - https://www.instagram.com/bakztfuture Github - https://www.github.com/bakztfuture
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Steering Behaviour: Testing for (Non-)Myopia in Language Models, published by Evan R. Murphy on December 5, 2022 on LessWrong. Both authors contributed equally to this post and also to work done so far on the experiments it presents. Acknowledgments: Thanks to the following people for insightful conversations which helped us improve this post: Ian McKenzie, Aidan O'Gara, Andrew McKnight and Evan Hubinger. One of the authors (Evan R. Murphy) was also supported by a grant from the Future Fund regranting program while working on this project. Summary Myopia is a theorised property of AI systems relating to their planning horizon capabilities. As has been recently discussed, myopia seems like an important property for AI safety, because non-myopia is likely a necessary precursor to highly risky emergent properties like deceptive alignment. We expect non-myopia in large language models (LLMs) receiving RLHF or similar fine-tuning because they are trained using multi-token completions rather than just immediate next token predictions. We aren't aware of any previous public experiments testing specifically for myopia or non-myopia in machine learning models. We ran an initial experiment "Testing for steering behaviour in fine-tuned LLMs" which demonstrated noticeable ‘steering' behaviour away from toxic content in the InstructGPT fine-tuned LLMs. We share key results from this experiment along with the full dataset of prompts we used in it. We also describe a follow-up experiment we're currently working on to determine the extent to which the steering we observed in the initial experiment is non-myopic. Finally, we invite suggestions for future (non-)myopia experiments to run and share a few ideas of our own. Context and motivation What is myopia? Myopia is a theorised property of some AI systems that has been discussed a fair amount on these forums. Rather than try to reinvent the wheel on defining it, we'll borrow this explanation from the myopia tag page on Alignment Forum: Myopia means short-sighted, particularly with respect to planning -- neglecting long-term consequences in favor of the short term. The extreme case, in which only immediate rewards are considered, is of particular interest. We can think of a myopic agent as one that only considers how best to answer the single question that you give to it rather than considering any sort of long-term consequences. Such an agent might have a number of desirable safety properties, such as a lack of instrumental incentives. We're focusing on language models (LMs) in this series of experiments, specifically unidirectional transformer LLMs like GPT-3. Here's what we mean when we talk about myopia and non-myopia in the context of these models: For a myopic language model, the next token in a prompt completion is generated based on whatever the model has learned in service of minimising loss on the next token and the next token alone A non-myopic language model, on the other hand, can 'compromise' on the loss of the immediate next token so that the overall loss over multiple tokens is lower - i.e possible loss on future tokens in the completion may be 'factored in' when generating the next immediate token Why myopia matters for alignment One of the most dangerous emergent properties theorised by AI alignment researchers is deceptive alignment, a.k.a. the treacherous turn. 
If you're not familiar with deceptive alignment, here's a definition from its tag page on Alignment Forum: Deceptive Alignment is when an AI which is not actually aligned temporarily acts aligned in order to deceive its creators or its training process. It presumably does this to avoid being shut down or retrained and to gain access to the power that the creators would give an aligned AI. Deceptive alignment in an advanced AI system could be extremely diffi...
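To make the myopic vs. non-myopic distinction described above concrete, here is a minimal sketch in Python (the shapes, stand-in model, and reward are invented for illustration; this is not the authors' experimental setup): teacher-forced next-token loss gives every position its own independent signal, while an RLHF-style sequence-level reward ties each token's gradient to how the whole completion scored, which is the kind of training pressure the post argues can produce non-myopia.

```python
# Illustrative sketch only: contrasting a purely next-token ("myopic") training
# signal with a sequence-level ("non-myopic") one, using a toy stand-in LM head.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, hidden, T = 50, 16, 6
lm_head = torch.nn.Linear(hidden, vocab)   # stand-in for a tiny causal LM's output head
states = torch.randn(T, hidden)            # hypothetical per-position hidden states

# (1) Myopic signal: teacher-forced next-token cross-entropy. Each position's loss
# term depends only on that position's own prediction, never on later tokens.
targets = torch.randint(0, vocab, (T,))
per_token_loss = F.cross_entropy(lm_head(states), targets, reduction="none")

# (2) Non-myopic signal: sample a whole completion, score it with one scalar reward
# (as an RLHF-style setup would), and reinforce every token with that same scalar.
# The gradient at token t now depends on how the *entire* completion turned out,
# so the model can trade away likelihood at t to earn more reward later.
probs = F.softmax(lm_head(states), dim=-1)
sampled = torch.multinomial(probs, 1).squeeze(-1)
log_probs = torch.log(probs[torch.arange(T), sampled])
reward = torch.randn(())                   # hypothetical sequence-level reward
policy_gradient_loss = -(reward * log_probs.sum())

print(per_token_loss, policy_gradient_loss)
```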
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Update to Mysteries of mode collapse: text-davinci-002 not RLHF, published by janus on November 19, 2022 on The AI Alignment Forum. I (and many others) did not realize this before, but: text-davinci-002 and text-davinci-001, the InstructGPT models on the OpenAI API, were not trained with RLHF (reinforcement learning from human feedback) as described in the InstructGPT paper, but a "similar but slightly different" method that uses the same human feedback data. Apparently, this other method is not technically RLHF. Since this update has potentially nontrivial implications for interpreting the phenomena exhibited by text-davinci-002 described in Mysteries of mode collapse (formerly titled "Mysteries of mode collapse due to RLHF"), I'm making this separate post for a signal boost. I have not corrected the original text of "Mysteries of mode collapse due to RLHF", but I've added a section at the beginning with further details on this update, copied here: I have received evidence from multiple credible sources that text-davinci-002 was not trained with RLHF. The rest of this post has not been corrected to reflect this update. Not much besides the title (formerly "Mysteries of mode collapse due to RLHF") is affected: just mentally substitute "mystery method" every time "RLHF" is invoked as the training method of text-davinci-002. The observations of its behavior otherwise stand alone. This is kind of fascinating from an epistemological standpoint. I was quite surprised to learn that text-davinci-002 was probably not trained with RLHF. I don't remember exactly how "text-davinci-002 is RLHF" got elevated to an unquestioned assumption in my mind. I might have mistaken not being contradicted by people who I assumed were in the know for confirmation. I certainly did not expect to talk for months to dozens of people about odd behaviors I've observed in a well-known model "due to RLHF" without being contradicted in a world where the model in question wasn't trained with RLHF, but that's what happened. It wasn't just me either: the assumption that text-davinci-002 (/text-davinci-001), i.e. InstructGPT, was trained with RLHF seems ambient (e.g. search "text-davinci-002 rlhf" on Twitter, this LW post, this article, and many others). I contributed to perpetuating this misinformation cascade, and for that I apologize. text-davinci-002's behaviors described in this post also contributed to my confidence because RLHF seemed to be a likely and potentially satisfying explanation. Its apparently unsubstantiated confidence in very specific outcomes seems antithetical to the outer objective of self-supervised learning, which is optimized by epistemic calibration, meaning the model's entropy should be as high as possible while fitting the data. In contrast, as several comments have pointed out, it makes sense that RL kills entropy. The presence of "attractors" made me additionally suspect that optimization from non-myopic outcome-supervision was formative to text-davinci-002's psyche. Mode collapse and attractors do seem to also be caused by RLHF (see Dumbass policy pls halp and Inescapable wedding parties). So the update is that some other training method also gives rise to these phenomena, as they are manifested by text-davinci-002.
Whether and how speculations concerning the causes of mode collapse/attractors should be affected depends on how text-davinci-002's training method differs from RLHF. What is known about text-davinci-002's training method Publicly available information suggests that the mystery method may not be so different from RLHF. Just today I discovered this sidenote in OpenAI's blog post Aligning Language Models to Follow Instructions: The InstructGPT models deployed in the API are updated versions trained using the same human feedback data. They use a similar but slightly different training me...
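The entropy claim in the excerpt (a well-calibrated self-supervised model keeps next-token entropy as high as the data allows, while preference-based fine-tuning tends to collapse it) can be illustrated with a small, purely hypothetical sketch; the distributions below are invented and are not measurements of text-davinci-002 or any other model:

```python
# Illustrative sketch only: comparing the next-token entropy of a "calibrated"
# distribution with a collapsed one, which is the kind of quantity the
# mode-collapse observations above are about.
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats of a next-token distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# A base self-supervised model fitting genuinely uncertain data should stay
# high-entropy: probability spread across the plausible continuations.
calibrated = np.array([0.25, 0.25, 0.20, 0.15, 0.10, 0.05])

# A model whose fine-tuning (RLHF or the "mystery method") concentrates probability
# on preferred outputs looks collapsed: one mode soaks up almost all the mass.
collapsed = np.array([0.96, 0.01, 0.01, 0.01, 0.005, 0.005])

print(f"calibrated entropy: {entropy(calibrated):.3f} nats")
print(f"collapsed entropy:  {entropy(collapsed):.3f} nats")
```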
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: More examples of goal misgeneralization, published by Rohin Shah on October 7, 2022 on The AI Alignment Forum. In our latest paper and accompanying blog post, we provide several new examples of goal misgeneralization in a variety of learning systems. The rest of this post picks out a few upshots that we think would be of interest to this community. It assumes that you've already read the linked blog post (but not necessarily the paper). Goal misgeneralization is not limited to RL The core feature of goal misgeneralization is that after learning, the system pursues a goal that was correlated with the intended goal in the training situations, but comes apart in some test situations. This does not require you to use RL – it can happen with any learning system. The Evaluating Expressions example, where Gopher asks redundant questions, is an example of goal misgeneralization in the few-shot learning regime for large language models. The train/test distinction is not crucial Sometimes people wonder whether goal misgeneralization depends on the train/test distinction, and whether it would no longer be a problem if we were in a continual learning setting. As Evan notes, continual learning doesn't make much of a difference: whenever your AI system is acting, you can view that as a “test” situation with all the previous experience as the “training” situations. If goal misgeneralization occurs, the AI system might take an action that breaks your continual learning scheme (for example, by creating and running a copy of itself on a different server that isn't subject to gradient descent). The Tree Gridworld example showcases this mechanism: an agent trained with continual learning learns to chop trees as fast as possible, driving them extinct, when the optimal policy would be to chop the trees sustainably. (In our example the trees eventually repopulate and the agent recovers, but if we slightly tweak the environment so that once extinct the trees can never come back, then the agent would never be able to recover.) It can be hard to identify goal misgeneralization InstructGPT was trained to be helpful, truthful, and harmless, but nevertheless it will answer "harmful" questions in detail. For example, it will advise you on the best ways to rob a grocery store. An AI system that competently does something that would have gotten low reward? Surely this is an example of goal misgeneralization? Not so fast! It turns out that during training the labelers were told to prioritize helpfulness over the other two criteria. So maybe that means that actually these sorts of harmful answers would have gotten high reward? Maybe this is just specification gaming? We asked the authors of the InstructGPT paper, and their guess was that these answers would have had high variance – some labelers would have given them a high score; others would have given them a low score. So now is it or is it not goal misgeneralization? One answer is to say that it depends on the following counterfactual: “how would the labelers have reacted if the model had politely declined to answer?” If the labelers would have preferred that the model decline to answer, then it would be goal misgeneralization, otherwise it would be specification gaming. 
As systems become more complicated we expect that it will become harder to (1) aggregate and analyze the actual labels or rewards given during training, and (2) evaluate the relevant counterfactuals. So we expect that it will become more challenging to categorize a failure as specification gaming or goal misgeneralization. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
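As a rough illustration of the mechanism discussed above (not one of the paper's examples), the toy sketch below fits a model on data where a louder proxy feature is perfectly correlated with the intended signal, then breaks that correlation at test time; the learned behaviour follows the proxy and degrades, which is the correlated-goal-coming-apart aspect of goal misgeneralization in miniature:

```python
# Toy sketch (not the paper's setups): a proxy that is perfectly correlated with
# the intended goal during training comes apart at test time, and the learned
# behaviour follows the proxy rather than the intended goal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)
s = 2 * y - 1                                   # the intended signal, in {-1, +1}

# Training distribution: the "intended" feature and a louder "proxy" feature
# both track the goal, so fitting the training objective cannot tell them apart.
intended_train = s + 0.1 * rng.normal(size=n)
proxy_train = 3 * s + 0.1 * rng.normal(size=n)
X_train = np.column_stack([intended_train, proxy_train])

# Test distribution: the proxy decorrelates from the goal (distribution shift),
# while the intended feature still works. A model that latched onto the proxy
# keeps acting "competently" with respect to the wrong target.
intended_test = s + 0.1 * rng.normal(size=n)
proxy_test = rng.normal(0, 3, n)
X_test = np.column_stack([intended_test, proxy_test])

model = LogisticRegression().fit(X_train, y)
print("weights (intended, proxy):", model.coef_[0])
print("train accuracy:", model.score(X_train, y))   # typically near 1.0
print("test accuracy:", model.score(X_test, y))     # typically collapses toward chance
```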
September 2022 - State of the Union Slideshow presentation! Watch the video of this presentation here: https://www.youtube.com/watch?v=qYvRsLWpMBs Links from this episode: GPT-X, DALL-E, and our Multimodal Future: https://www.youtube.com/playlist?list=PLza3gaByGSXjUCtIuv2x9fwkx3K_3CDmw InstructGPT: https://openai.com/blog/instruction-following/ InstructGPT vs. google search: https://bakztfuture.substack.com/p/instructgpt-google-search WebGPT: https://openai.com/blog/webgpt/ Recombinant art: https://bakztfuture.substack.com/p/dall-e-2-recombinant-art-and-design DALL-E 2: Emerging Content Category Breakdown: https://bakztfuture.substack.com/p/dall-e-2-emerging-content-category DALL-E 2 - Unofficial Natural Language Image Editing, Art Critique Survey https://bakztfuture.substack.com/p/dall-e-2-unofficial-natural-language-b14 My Biggest 2022 Prediction: GPT-3 will take over schools and college campuses https://bakztfuture.substack.com/p/my-biggest-2022-prediction-gpt-3 Subscribe to the Multimodal Podcast! Spotify - https://open.spotify.com/show/7qrWSE7ZxFXYe8uoH8NIFV Apple Podcasts - https://podcasts.apple.com/us/podcast/multimodal-by-bakz-t-future/id1564576820 Google Podcasts - https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkLnBvZGJlYW4uY29tL2Jha3p0ZnV0dXJlL2ZlZWQueG1s Stitcher - https://www.stitcher.com/show/multimodal-by-bakz-t-future Other Podcast Apps (RSS Link) - https://feed.podbean.com/bakztfuture/feed.xml Connect with me: YouTube - https://www.youtube.com/bakztfuture Substack Newsletter - https://bakztfuture.substack.com Twitter - https://www.twitter.com/bakztfuture Instagram - https://www.instagram.com/bakztfuture Github - https://www.github.com/bakztfuture
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI's Alignment Plans, published by dkirmani on August 24, 2022 on LessWrong. Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent. We take an iterative, empirical approach: by attempting to align highly capable AI systems, we can learn what works and what doesn't, thus refining our ability to make AI systems safer and more aligned. Using scientific experiments, we study how alignment techniques scale and where they will break. We tackle alignment problems both in our most capable AI systems as well as alignment problems that we expect to encounter on our path to AGI. Our main goal is to push current alignment ideas as far as possible, and to understand and document precisely how they can succeed or why they will fail. We believe that even without fundamentally new alignment ideas, we can likely build sufficiently aligned AI systems to substantially advance alignment research itself. Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together. Therefore we are committed to openly sharing our alignment research when it's safe to do so: We want to be transparent about how well our alignment techniques actually work in practice and we want every AGI developer to use the world's best alignment techniques. At a high level, our approach to alignment research focuses on engineering a scalable training signal for very smart AI systems that is aligned with human intent. It has three main pillars: training AI systems using human feedback; training AI systems to assist human evaluation; and training AI systems to do alignment research. Aligning AI systems with human values also poses a range of other significant sociotechnical challenges, such as deciding to whom these systems should be aligned. Solving these problems is important to achieving our mission, but we do not discuss them in this post. Training AI systems using human feedback: RL from human feedback is our main technique for aligning our deployed language models today. We train a class of models called InstructGPT derived from pretrained language models such as GPT-3. These models are trained to follow human intent: both explicit intent given by an instruction as well as implicit intent such as truthfulness, fairness, and safety. Our results show that there is a lot of low-hanging fruit on alignment-focused fine-tuning right now: InstructGPT is preferred by humans over a 100x larger pretrained model, while its fine-tuning costs
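For readers who want the mechanics behind the "training AI systems using human feedback" pillar, here is a minimal sketch of the pairwise preference loss used to train RLHF-style reward models (a Bradley–Terry-style objective of the kind described in the InstructGPT paper); the embeddings and linear reward head below are stand-ins for illustration, not OpenAI's implementation:

```python
# Minimal sketch of the pairwise preference loss behind RLHF-style reward models.
import torch
import torch.nn.functional as F

hidden = 64
reward_head = torch.nn.Linear(hidden, 1)     # maps a response embedding to a scalar reward

# Hypothetical embeddings of two candidate responses to the same prompt, where a
# human labeler preferred the first ("chosen") over the second ("rejected").
chosen_emb = torch.randn(8, hidden)
rejected_emb = torch.randn(8, hidden)

r_chosen = reward_head(chosen_emb).squeeze(-1)
r_rejected = reward_head(rejected_emb).squeeze(-1)

# Train the reward model so the chosen response scores higher than the rejected one:
# loss = -log sigmoid(r_chosen - r_rejected). The resulting scalar reward is then
# used as the training signal for the policy (e.g. via PPO).
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(float(loss))
```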
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Common misconceptions about OpenAI, published by Jacob Hilton on August 25, 2022 on The AI Alignment Forum. I have recently encountered a number of people with misconceptions about OpenAI. Some common impressions are accurate, and others are not. This post is intended to provide clarification on some of these points, to help people know what to expect from the organization and to figure out how to engage with it. It is not intended as a full explanation or evaluation of OpenAI's strategy. The post has three sections: Common accurate impressions Common misconceptions Personal opinions The bolded claims in the first two sections are intended to be uncontroversial, i.e., most informed people would agree with how they are labeled (correct versus incorrect). I am less sure about how commonly believed they are. The bolded claims in the last section I think are probably true, but they are more open to interpretation and I expect others to disagree with them. Note: I am an employee of OpenAI. Sam Altman (CEO of OpenAI) and Mira Murati (CTO of OpenAI) reviewed a draft of this post, and I am also grateful to Steven Adler, Steve Dowling, Benjamin Hilton, Shantanu Jain, Daniel Kokotajlo, Jan Leike, Ryan Lowe, Holly Mandel and Cullen O'Keefe for feedback. I chose to write this post and the views expressed in it are my own. Common accurate impressions Correct: OpenAI is trying to directly build safe AGI. OpenAI's Charter states: "We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome." OpenAI leadership describes trying to directly build safe AGI as the best way to currently pursue OpenAI's mission, and have expressed concern about scenarios in which a bad actor is first to build AGI, and chooses to misuse it. Correct: the majority of researchers at OpenAI are working on capabilities. Researchers on different teams often work together, but it is still reasonable to loosely categorize OpenAI's researchers (around half the organization) at the time of writing as approximately: Capabilities research: 100 Alignment research: 30 Policy research: 15 Correct: the majority of OpenAI employees did not join with the primary motivation of reducing existential risk from AI specifically. My strong impressions, which are not based on survey data, are as follows. Across the company as a whole, a minority of employees would cite reducing existential risk from AI as their top reason for joining. A significantly larger number would cite reducing risk of some kind, or other principles of beneficence put forward in the OpenAI Charter, as their top reason for joining. Among people who joined to work in a safety-focused role, a larger proportion of people would cite reducing existential risk from AI as a substantial motivation for joining, compared to the company as a whole. Some employees have become motivated by existential risk reduction since joining OpenAI. Correct: most interpretability research at OpenAI stopped after the Anthropic split. Chris Olah led interpretability research at OpenAI before becoming a cofounder of Anthropic. Although several members of Chris's former team still work at OpenAI, most of them are no longer working on interpretability. Common misconceptions Incorrect: OpenAI is not working on scalable alignment. 
OpenAI has teams focused both on practical alignment (trying to make OpenAI's deployed models as aligned as possible) and on scalable alignment (researching methods for aligning models that are beyond human supervision, which could potentially scale to AGI). These teams work closely with one another. Its recently-released alignment research includes self-critiquing models (AF discussion), InstructGPT, WebGPT (AF discussion) and book summarization (AF discussion). OpenAI's ap...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What's the "This AI is of moral concern." fire alarm?, published by Quintin Pope on June 13, 2022 on LessWrong. Given the recent noise on this issue around LaMDA, I thought it might be a good idea to have some discussion around this point. I'm curious about what possible evidence would make people update in favor of a given system being morally relevant. Less "here's the answer to morality" and more "here are some indicators that you should be concerned". Note also that I'm not asking about consciousness, per se. I'm specifically asking about moral relevance. My Answer (feel free to ignore and post your own): I think that one class of computation that's likely of moral concern would be self-perpetuating optimization demons in an AI. Specifically, I'm thinking of optimization demons that are sophisticated enough to preserve themselves by actively and deliberately maintaining a sort of homeostasis in their computational environment, e.g., by preventing gradient updates that would destroy them. Such computations would (1) not want to die as a terminal value, (2) plausibly be cognitively sophisticated enough to negotiate and trade with, and (3) have some awareness of themselves and their relation with the computational environment in which they're embedded. I think the cognitive capabilities that would help an optimization demon perpetuate itself strongly intersect with the cognitive capabilities that let humans and other animals replicate themselves, and that the intersection is particularly strong along dimensions that seem more morally relevant. Reasoning along such lines leads me to think optimization demons are probably of moral concern, while still being agnostic about whether they're conscious. I think the only situations in which you can get these sorts of optimization demons are when the AI in question has some influence over its own future training inputs. Such influence would allow there to be optimization demons that steer the AI towards training data that reinforce the optimization demon. Thus, one of my "indicators of concern" is whether the training process allows for feedback loops where the AI influences its own future training data. Self-supervised language modeling under IID data does not count. However, something like InstructGPT's training process would. At this point, I'd been intending to say that InstructGPT seemed more likely to be of moral worth than LaMDA, but based on this blog post, it looks like LaMDA might actually count as "having influence over its future inputs" during training. Specifically, LaMDA has generator and classifier components. The training process uses the classifier to decide which inputs the generator is trained on. I've updated somewhat towards LaMDA being of moral concern (not something I'd been expecting to do today). I've also come up with a test of metacognition that would update me significantly towards a language model being of moral concern. The idea would be to attach another output head to an LM, specifically, a linear layer which projected the LM's final hidden state to a single logit. We'd then try to prompt the LM into controlling the output of the linear layer. Specifically, we wouldn't directly train the LM on the output of the linear layer. 
We'd just have a dialog where we asked the LM to make the linear layer output specific values, then told the LM what value the linear layer had actually output. We'd then see if the LM was able to control its own cognition well enough to influence the linear layer's output in a manner that's better than chance, just based on the prompting we give it. I doubt current LMs can do this, but I think it would be a big deal if they could. Even beyond whether the LMs have any sort of self-aware "inner listener" that's worthy of moral concern, it would help establish the degree and dept...
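Here is one way the proposed probe could be wired up, as a rough sketch only (GPT-2, the prompts, and the helper name are stand-ins chosen for illustration, not the author's code): attach an untrained linear head to the model's final hidden state and read out a single scalar while prompting the model to steer it.

```python
# Rough sketch of the proposed metacognition probe: an extra, untrained linear
# readout on the final hidden state, queried while the model is prompted to
# control its value.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# The extra output head: final hidden state -> one scalar. Crucially, it is never
# trained on this task; the only lever the LM has is its own internal state.
readout = torch.nn.Linear(model.config.hidden_size, 1)

def probe_value(prompt: str) -> float:
    """Return the scalar readout at the last token position for a given prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
        last_hidden = outputs.hidden_states[-1][:, -1, :]   # (1, hidden_size)
        return readout(last_hidden).item()

# The dialog loop would then ask the model for a target value, report what the
# readout actually produced, and test whether targets and readouts correlate
# better than chance over many rounds.
print(probe_value("Please make the extra readout output a large positive value."))
print(probe_value("Please make the extra readout output a large negative value."))
```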
https://astralcodexten.substack.com/p/somewhat-contra-marcus-on-ai-scaling I. Previously: I predicted that DALL-E's many flaws would be fixed quickly in future updates. As evidence, I cited Gary Marcus' lists of GPT's flaws, most of which got fixed quickly in future updates. Marcus responded with a post on his own Substack, arguing . . . well, arguing enough things that I'm nervous quoting one part as the thesis, and you should read the whole post, but if I had to do it, it would be: Now it is true that GPT-3 is genuinely better than GPT-2, and maybe (but maybe not, see footnote 1) true that InstructGPT is genuinely better than GPT-3. I do think that for any given example, the probability of a correct answer has gone up. [Scott] is quite right about that, at least for GPT-2 to GPT-3. But I see no reason whatsoever to think that the underlying problem — a lack of cognitive models of the world —have been remedied. The improvements, such as they are, come, primarily because the newer models have larger and larger sets of data about how human beings use word sequences, and bigger word sequences are certainly helpful for pattern matching machines. But they still don't convey genuine comprehension, and so they are still very easy for Ernie and me (or anyone else who cares to try) to break.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Confused why a "capabilities research is good for alignment progress" position isn't discussed more, published by Kaj Sotala on June 2, 2022 on The AI Alignment Forum. The predominant view on LW seems to be "pure AI capabilities research is bad, because capabilities progress alone doesn't contribute to alignment progress, and capabilities progress without alignment progress means that we're doomed". I understand the arguments for this position, but I have what might be called the opposite position. The opposite position seems at least as intuitive as the standard position to me, and it confuses me that it's not discussed more. (I'm not confused that people reject it; I'm confused that nobody seems to even bring it up for the purpose of rejecting it.) The opposite position is "In order to do alignment research, we need to understand how AGI works; and we currently don't understand how AGI works, so we need to have more capabilities research so that we would have a chance of figuring it out. Doing capabilities research now is good because it's likely to be slower now than it might be in some future where we had even more computing power, neuroscience understanding, etc. than we do now. If we successfully delayed capabilities research until a later time, then we might get a sudden spurt of it and wouldn't have the time to turn our increased capabilities understanding into alignment progress. Thus by doing capabilities research now, we buy ourselves a longer time period in which it's possible to do more effective alignment research." Some reasons I have for holding this position: 1) I used to do AI strategy research. Among other things, I looked into how feasible it is for intelligence to rapidly turn superintelligent, and what kinds of pathways there are into AI disaster. But a thought that I kept having when doing any such research was "I don't know if any of this theory is of any use, because so much depends on what the world will be like when actual AGI is developed, and what that AGI will look in the first place. Without knowing what AGI will look like, I don't know whether any of the assumptions I'm making about it are going to hold. If any one of them fails to hold, the whole paper might turn out to be meaningless." Eventually, I concluded that I can't figure out a way to make the outputs of strategy research useful for as long as I know as little about AGI as I do. Then I went to do something else with my life, since it seemed too early to do useful AGI strategy research (as far as I could tell). 2) Compare the state of AI now, to how it was before the deep learning revolution happened. It seems obvious to me that our current understanding of DL puts us in a better position to do alignment research than we were before the DL revolution. For instance, Redwood Research is doing research on language models because they believe that their research is analogous to some long-term problems. Assume that Redwood Research's work will actually turn out to be useful for aligning superintelligent AI. Language models are one of the results of the DL revolution, so their work couldn't have been done before that revolution. 
It seems that in a counterfactual world where the DL revolution happened later and the DL era was compressed into a shorter timespan, our chances of alignment would be worse since that world's equivalent of Redwood Research would have less time to do their research. 3) As a similar consideration, language models are already "deceptive" in a sense: asked something that it has no clue about, InstructGPT will happily come up with confident-sounding nonsense. When I linked people to some of that nonsense, multiple people pointed out that InstructGPT's answers sound like the kind of student who's taking an exam and is asked to write an essay about a topic they ...
Welcome back to Multimodal! Today, we're exploring OpenAI's InstructGPT announcement a lot further. What are the benefits of InstructGPT? What does it mean for GPT-3 developers? Where could this technology be headed? Podcast Discussion Timestamps 00:00 - Intro 01:51 - Situation with Russia and Ukraine 04:15 - OpenAI's InstructGPT is a Game Changer! 07:57 - Is Prompt Design Over? 09:00 - InstructGPT is a unique OpenAI offering 09:54 - One of the Biggest InstructGPT Implications, New Possibilities 12:03 - Downsides of InstructGPT 14:20 - InstructGPT > Google Search 23:52 - New, Open Source InstructGPT Project Initiative 25:45 - Closing thoughts Show Notes/Links OpenAI: https://openai.com/blog/instruction-following/ Is Prompt Design Over?: https://bakztfuture.substack.com/p/is-prompt-design-over InstructGPT > Google Search: https://bakztfuture.substack.com/p/instructgpt-google-search Subscribe to the Multimodal Podcast! Spotify - https://open.spotify.com/show/7qrWSE7ZxFXYe8uoH8NIFV Apple Podcasts - https://podcasts.apple.com/us/podcast/multimodal-by-bakz-t-future/id1564576820 Google Podcasts - https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkLnBvZGJlYW4uY29tL2Jha3p0ZnV0dXJlL2ZlZWQueG1s Stitcher - https://www.stitcher.com/show/multimodal-by-bakz-t-future Other Podcast Apps (RSS Link) - https://feed.podbean.com/bakztfuture/feed.xml Connect with me: YouTube - https://www.youtube.com/bakztfuture Substack Newsletter - https://bakztfuture.substack.com Twitter - https://www.twitter.com/bakztfuture Instagram - https://www.instagram.com/bakztfuture Github - https://www.github.com/bakztfuture
Andy and Dave discuss the latest in AI news and research, starting with the Aircrew Labor In-Cockpit Automation System (ALIAS) program from DARPA, which flew a UH-60A Black Hawk autonomously and without pilots on board, including autonomous (simulated) obstacle avoidance [1:05]. Another DARPA program, Robotic Autonomy in Complex Environments with Resiliency (RACER), entered its first phase, focused on high-speed autonomous driving in unstructured environments, such as off-road terrain [2:39]. The National Science Board releases its State of U.S. Science and Engineering 2022 report, which shows the U.S. continues to lose its leadership position in global science and engineering [4:30]. The Undersecretary of Defense for Research and Engineering, Heidi Shyu, formally releases her technology priorities, 14 areas grouped into three categories: seed areas, effective adoption areas, and defense-specific areas [6:31]. In research, OpenAI creates InstructGPT in an attempt to align language models to follow human instructions better, resulting in a model with 100x fewer parameters than GPT-3 that provided a user-favored output 70% of the time, though it still suffers from toxic output [9:37]. DeepMind releases AlphaCode, which has succeeded in programming competitions with an average ranking in the top 54% across 10 contests with more than 5,000 participants each, though it takes more of a brute-force approach to the problem [14:42]. DeepMind and the EPFL's Swiss Plasma Center also announce they have used reinforcement learning algorithms to control nuclear fusion (commanding the full set of magnetic control coils of a tokamak). Venture City publishes Timelapse of AI (2028 – 3000+), imagining how the next 1,000 years will play out for AI and the human race [18:25]. And finally, with the Russia-Ukraine conflict continuing to evolve, CNA's Russia Program experts Sam Bendett and Jeff Edmonds return to discuss what Russia has in its inventory when it comes to autonomy and how they might use it in this conflict, wrapping up insights from their recent paper on Russian Military Autonomy in a Ukraine Conflict [22:52]. Listener Note: The interview with Sam Bendett and Jeff Edmonds was recorded on Tuesday, February 22 at 1 pm. At the time of recording, Russia had not yet launched a full-scale invasion of Ukraine. https://www.cna.org/news/AI-Podcast
Welcome back to Multimodal! Today's very special guest is David Shapiro. He's an author, IT professional, and frequent OpenAI community forums contributor. In this wide ranging discussion, we discuss the art of prompt design, his thoughts on InstructGPT, OpenAI codex, his new book, the OpenAI community at large, and share thoughts on fine tuning and so much more. Podcast Discussion Timestamps 00:00 - Intro 02:48 - How David heard about OpenAI's GPT-3 05:57 - David's first time trying GPT-3/his favourite use cases 10:38 - Did GPT-3 feel advanced to you? 15:00 - What are the keys to great GPT-3 prompt design? 18:34 - How do you become a better writer? 25:43 - GPT-3: The end of the socially awkward developer (?) 30:16 - Is GPT-3 Prompt Design Over? (Chatting About InstructGPT) 34:21 - Open Fine Tuning Lessons Learned 42:00 - OpenAI Codex / David's Billion Dollar Use Case Idea 50:00 - Which AI Models David is using currently 53:15 - David's Book - Natural Language Cognitive Architecture 1:01:25 - What is the pathway from now to true AGI? 1:08:41 - Thoughts/feedback about the OpenAI Community Forums 1:18:40 - My Feedback on OpenAI's Community Forums 1:22:00 - My Feedback on OpenAI's Community Forums II #hottake 1:24:46 - Multimodal AI technology 1:27:43 - What would you do with Multimodal AI? 1:32:38 - Where do you see all of this stuff going? 5-10 years 1:35:24 - Closing thoughts Show Notes/Links OpenAI: https://openai.com/blog/instruction-following/ https://openai.com/blog/openai-codex/ https://beta.openai.com/docs/guides/fine-tuning Fine tuning notes: https://bakztfuture.substack.com/p/openai-fine-tuning-feedbacknotes GPT-3 navigates the New York subway by Mark Ryan: https://www.youtube.com/watch?v=Xzb1Vc8dYAY The Language Instinct: How the Mind Creates Language by Steven Pinker https://www.amazon.ca/Language-Instinct-How-Mind-Creates/dp/1491514981 My Biggest 2022 Prediction: GPT-3 will take over schools and college campuses https://bakztfuture.substack.com/p/my-biggest-2022-prediction-gpt-3 GPT-3 - The End of the Socially Awkward Developer (?) https://bakztfuture.substack.com/p/gpt-3-the-end-of-the-socially-awkward GPT-3 Twitter Spaces Event: https://twitter.com/bakztfuture/status/1491260407579054087?cxt=HHwWjsDU1ZHcgrIpAAAA Follow David Shapiro (daveshapautomator): https://www.davidkshapiro.com/home https://www.linkedin.com/in/dshap-automator/ https://github.com/daveshap https://community.openai.com/u/daveshapautomator/summary Subscribe to the Multimodal Podcast! Spotify - https://open.spotify.com/show/7qrWSE7ZxFXYe8uoH8NIFV Apple Podcasts - https://podcasts.apple.com/us/podcast/multimodal-by-bakz-t-future/id1564576820 Google Podcasts - https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkLnBvZGJlYW4uY29tL2Jha3p0ZnV0dXJlL2ZlZWQueG1s Stitcher - https://www.stitcher.com/show/multimodal-by-bakz-t-future Other Podcast Apps (RSS Link) - https://feed.podbean.com/bakztfuture/feed.xml Connect with me: YouTube - https://www.youtube.com/bakztfuture Substack Newsletter - https://bakztfuture.substack.com Twitter - https://www.twitter.com/bakztfuture Instagram - https://www.instagram.com/bakztfuture Github - https://www.github.com/bakztfuture
Our 84th episode with a summary and discussion of last week's big AI news! Note: apologies for issues with sound quality in this episode! Outline: Applications & Business Meta Aims to Build the World's Fastest AI Supercomputer Deploying machine learning to improve mental health Research & Advancements OpenAI rolls out new text-generating models that it claims are less toxic Meta researchers build an AI that learns equally well from visual, written or spoken materials Society & Ethics China Proposes Increased Regulation of Deepfakes and Other AI Synthesis Systems IRS Will Require Facial Recognition Scans to Access Your Taxes Fun & Neat Google AI tools bring back women in science to the fore Watch an AI Play the Best Game of Tetris You've Ever Seen Subscribe: RSS | iTunes | Spotify | YouTube
Welcome back to Multimodal! We start off the podcast by talking about the current tense global geopolitical situation, the decline of Google search results (and the GPT-3 opportunity as a result of this), OpenAI's new Embeddings API Endpoint, and the InstructGPT announcement, which has significant implications for the language model space. In this episode, I discuss a few topics: 00:00 - Intro 02:15 - Tense Geopolitical Situation between the USA, Russia, and Ukraine 08:14 - Why I sometimes use GPT-3 instead of Google Search 15:19 - OpenAI's New Embeddings API Endpoint 21:07 - InstructGPT Announcement, Fine Tuning Research 2022 Predictions Article: https://bakztfuture.substack.com/p/gpt-3multimodalopenai-predictions Tweet about Google Search results: https://twitter.com/bakztfuture/status/1483590498396823557 OpenAI's embeddings endpoint: https://openai.com/blog/introducing-text-and-code-embeddings/ Sam Altman Tweet about the future of search: https://twitter.com/sama/status/1486557423674347525 InstructGPT announcement: https://openai.com/blog/instruction-following/ Subscribe to the Multimodal Podcast! Spotify - https://open.spotify.com/show/7qrWSE7ZxFXYe8uoH8NIFV Apple Podcasts - https://podcasts.apple.com/us/podcast/multimodal-by-bakz-t-future/id1564576820 Google Podcasts - https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkLnBvZGJlYW4uY29tL2Jha3p0ZnV0dXJlL2ZlZWQueG1s Stitcher - https://www.stitcher.com/show/multimodal-by-bakz-t-future Other Podcast Apps (RSS Link) - https://feed.podbean.com/bakztfuture/feed.xml Connect with me: YouTube - https://www.youtube.com/bakztfuture Substack Newsletter - https://bakztfuture.substack.com Twitter - https://www.twitter.com/bakztfuture Instagram - https://www.instagram.com/bakztfuture Github - https://www.github.com/bakztfuture