Podcasts about llama2

  • 33PODCASTS
  • 57EPISODES
  • 47mAVG DURATION
  • ?INFREQUENT EPISODES
  • Apr 22, 2025LATEST

POPULARITY

20172018201920202021202220232024


Best podcasts about llama2

Latest podcast episodes about llama2

Oracle University Podcast
Integrating APEX with OCI AI Services

Oracle University Podcast

Play Episode Listen Later Apr 22, 2025 20:01


Discover how Oracle APEX leverages OCI AI services to build smarter, more efficient applications. Hosts Lois Houston and Nikita Abraham interview APEX experts Chaitanya Koratamaddi, Apoorva Srinivas, and Toufiq Mohammed about how key services like OCI Vision, Oracle Digital Assistant, and Document Understanding integrate with Oracle APEX.   Packed with real-world examples, this episode highlights all the ways you can enhance your APEX apps.   Oracle APEX: Empowering Low Code Apps with AI: https://mylearn.oracle.com/ou/course/oracle-apex-empowering-low-code-apps-with-ai/146047/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Lois: Hello and welcome to the Oracle University Podcast. I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! Last week, we looked at how generative AI powers Oracle APEX and in today's episode, we're going to focus on integrating APEX with OCI AI Services. Lois: That's right, Niki. We're going to look at how you can use Oracle AI services like OCI Vision, Oracle Digital Assistant, Document Understanding, OCI Generative AI, and more to enhance your APEX apps. 01:03 Nikita: And to help us with it all, we've got three amazing experts with us, Chaitanya Koratamaddi, Director of Product Management at Oracle, and senior product managers, Apoorva Srinivas and Toufiq Mohammed. In today's episode, we'll go through each Oracle AI service and look at how it interacts with APEX. Apoorva, let's start with you. Can you explain what the OCI Vision service is? Apoorva: Oracle Cloud Infrastructure Vision is a serverless multi-tenant service accessible using the console or REST APIs. You can upload images to detect and classify objects in them. With prebuilt models available, developers can quickly build image recognition into their applications without machine learning expertise. OCI Vision service provides a fully managed model infrastructure. With complete integration with OCI Data Labeling, you can build custom models easily. OCI Vision service provides pretrained models-- Image Classification, Object Detection, Face Detection, and Text Recognition. You can build custom models for Image Classification and Object Detection. 02:24 Lois: Ok. What about its use cases? How can OCI Vision make APEX apps more powerful? Apoorva: Using OCI Vision, you can make images and videos discoverable and searchable in your APEX app.  You can use OCI Vision to detect and classify objects in the images. OCI Vision also highlights the objects using a red rectangular box. This comes in handy in use cases such as detecting vehicles that have violated the rules in traffic images. You can use OCI Vision to identify visual anomalies in your data. This is a very popular use case where you can detect anomalies in cancer X-ray images to detect cancer. These are some of the most popular use cases of using OCI Vision with your APEX app. But the possibilities are endless and you can use OCI Vision for any of your image analysis. 03:29 Nikita: Let's shift gears to Oracle Digital Assistant. Chaitanya, can you tell us what it's all about? Chaitanya: Oracle Digital Assistant is a low-code conversational AI platform that allows businesses to build and deploy AI assistants. It provides natural language understanding, automatic speech recognition, and text-to-speech capabilities to enable human-like interactions with customers and employees. Oracle Digital Assistant comes with prebuilt templates for you to get started.  04:00 Lois: What are its key features and benefits, Chaitanya? How does it enhance the user experience? Chaitanya: Oracle Digital Assistant provides conversational AI capabilities that include generative AI features, natural language understanding and ML, AI-powered voice, and analytics and insights. Integration with enterprise applications become easier with unified conversational experience, prebuilt chatbots for Oracle Cloud applications, and chatbot architecture frameworks. Oracle Digital Assistant provides advanced conversational design tools, conversational designer, dialogue and domain trainer, and native multilingual support. Oracle Digital Assistant is open, scalable, and secure. It provides multi-channel support, automated bot-to-agent transfer, and integrated authentication profile. 04:56 Nikita: And what about the architecture? What happens at the back end? Chaitanya: Developers assemble digital assistants from one or more skills. Skills can be based on prebuilt skills provided by Oracle or third parties, custom developed, or based on one of the many skill templates available. 05:16 Lois: Chaitanya, what exactly are “skills” within the Oracle Digital Assistant framework?  Chaitanya: Skills are individual chatbots that are designed to interact with users and fulfill specific type of tasks. Each skill helps a user complete a task through a combination of text messages and simple UI elements like select list. When a user request is submitted through a channel, the Digital Assistant routes the user's request to the most appropriate skill to satisfy the user's request. Skills can combine multilingual NLP deep learning engine, a powerful dialogflow engine, and integration components to connect to back-end systems.  Skills provide a modular way to build your chatbot functionality. Now users connect with a chatbot through channels such as Facebook, Microsoft Teams, or in our case, Oracle APEX chatbot, which is embedded into an APEX application. 06:21 Nikita: That's fascinating. So, what are some use cases of Oracle Digital Assistant in APEX apps? Chaitanya: Digital assistants streamline approval processes by collecting information, routing requests, and providing status updates. Digital assistants offer instant access to information and documentation, answering common questions and guiding users. Digital assistants assist sales teams by automating tasks, responding to inquiries, and guiding prospects through the sales funnel. Digital assistants facilitate procurement by managing orders, tracking deliveries, and handling supplier communication. Digital assistants simplify expense approvals by collecting reports, validating receipts, and routing them for managerial approval. Digital assistants manage inventory by tracking stock levels, reordering supplies, and providing real-time inventory updates. Digital assistants have become a common UX feature in any enterprise application. 07:28 Want to learn how to design stunning, responsive enterprise applications directly from your browser with minimal coding? The new Oracle APEX Developer Professional learning path and certification enables you to leverage AI-assisted development, including generative AI and Database 23ai, to build secure, scalable web and mobile applications with advanced AI-powered features. From now through May 15, 2025, we're waiving the certification exam fee (valued at $245). So, what are you waiting for? Visit mylearn.oracle.com to get started today. 08:09 Nikita: Welcome back! Thanks for that, Chaitanya. Toufiq, let's talk about the OCI Document Understanding service. What is it? Toufiq: Using this service, you can upload documents to extract text, tables, and other key data. This means the service can automatically identify and extract relevant information from various types of documents, such as invoices, receipts, contracts, etc. The service is serverless and multitenant, which means you don't need to manage any servers or infrastructure. You can access this service using the console, REST APIs, SDK, or CLI, giving you multiple ways to integrate. 08:55 Nikita: What do we use for APEX apps?  Toufiq: For APEX applications, we will be using REST APIs to integrate the service. Additionally, you can process individual files or batches of documents using the ProcessorJob API endpoint. This flexibility allows you to handle different volumes of documents efficiently, whether you need to process a single document or thousands at once. With these capabilities, the OCI Document Understanding service can significantly streamline your document processing tasks, saving time and reducing the potential for manual errors. 09:36 Lois: Ok.  What are the different types of models available? How do they cater to various business needs? Toufiq: Let us start with pre-trained models. These are ready-to-use models that come right out of the box, offering a range of functionalities. The available models are Optical Character Recognition (OCR) enables the service to extract text from documents, allowing you to digitize, scan the documents effortlessly. You can precisely extract text content from documents. Key-value extraction, useful in streamlining tasks like invoice processing. Table extraction can intelligently extract tabular data from documents. Document classification automatically categorizes documents based on their content. OCR PDF enables seamless extraction of text from PDF files. Now, what if your business needs go beyond these pre-trained models. That's where custom models come into play. You have the flexibility to train and build your own models on top of these foundational pre-trained models. Models available for training are key value extraction and document classification. 10:50 Nikita: What does the architecture look like for OCI Document Understanding? Toufiq: You can ingest or supply the input file in two different ways. You can upload the file to an OCI Object Storage location. And in your request, you can point the Document Understanding service to pick the file from this Object Storage location.  Alternatively, you can upload a file directly from your computer. Once the file is uploaded, the Document Understanding service can process the file and extract key information using the pre-trained models. You can also customize models to tailor the extraction to your data or use case. After processing the file, the Document Understanding service stores the results in JSON format in the Object Storage output bucket. Your Oracle APEX application can then read the JSON file from the Object Storage output location, parse the JSON, and store useful information at local table or display it on the screen to the end user. 11:52 Lois: And what about use cases? How are various industries using this service? Toufiq: In financial services, you can utilize Document Understanding to extract data from financial statements, classify and categorize transactions, identify and extract payment details, streamline tax document management. Under manufacturing, you can perform text extraction from shipping labels and bill of lading documents, extract data from production reports, identify and extract vendor details. In the healthcare industry, you can automatically process medical claims, extract patient information from forms, classify and categorize medical records, identify and extract diagnostic codes. This is not an exhaustive list, but provides insights into some industry-specific use cases for Document Understanding. 12:50 Nikita: Toufiq, let's switch to the big topic everyone's excited about—the OCI Generative AI Service. What exactly is it? Toufiq: OCI Generative AI is a fully managed service that provides a set of state of the art, customizable large language models that cover a wide range of use cases. It provides enterprise grade generative AI with data governance and security, which means only you have access to your data and custom-trained models. OCI Generative AI provides pre-trained out-of-the-box LLMs for text generation, summarization, and text embedding. OCI Generative AI also provides necessary tools and infrastructure to define models with your own business knowledge. 13:37 Lois: Generally speaking, how is OCI Generative AI useful?  Toufiq: It supports various large language models. New models available from Meta and Cohere include Llama2 developed by Meta, and Cohere's Command model, their flagship text generation model. Additionally, Cohere offers the Summarize model, which provides high-quality summaries, accurately capturing essential information from documents, and the Embed model, converting text to vector embeddings representation. OCI Generative AI also offers dedicated AI clusters, enabling you to host foundational models on private GPUs. It integrates LangChain and open-source framework for developing new interfaces for generative AI applications powered by language models. Moreover, OCI Generative AI facilitates generative AI operations, providing content moderation controls, zero downtime endpoint model swaps, and endpoint deactivation and activation capabilities. For each model endpoint, OCI Generative AI captures a series of analytics, including call statistics, tokens processed, and error counts. 14:58 Nikita: What about the architecture? How does it handle user input? Toufiq: Users can input natural language, input/output examples, and instructions. The LLM analyzes the text and can generate, summarize, transform, extract information, or classify text according to the user's request. The response is sent back to the user in the specified format, which can include raw text or formatting like bullets and numbering, etc. 15:30 Lois: Can you share some practical use cases for generative AI in APEX apps?  Toufiq: Some of the OCI generative AI use cases for your Oracle APEX apps include text summarization. Generative AI can quickly summarize lengthy documents such as articles, transcripts, doctor's notes, and internal documents. Businesses can utilize generative AI to draft marketing copy, emails, blog posts, and product descriptions efficiently. Generative AI-powered chatbots are capable of brainstorming, problem solving, and answering questions. With generative AI, content can be rewritten in different styles or languages. This is particularly useful for localization efforts and catering to diverse audience. Generative AI can classify intent in customer chat logs, support tickets, and more. This helps businesses understand customer needs better and provide tailored responses and solutions. By searching call transcripts, internal knowledge sources, Generative AI enables businesses to efficiently answer user queries. This enhances information retrieval and decision-making processes. 16:47 Lois: Before we let you go, can you explain what Select AI is? How is it different from the other AI services? Toufiq: Select AI is a feature of Autonomous Database. This is where Select AI differs from the other AI services. Be it OCI Vision, Document Understanding, or OCI Generative AI, these are all freely managed standalone services on Oracle Cloud, accessible via REST APIs. Whereas Select AI is a feature available in Autonomous Database. That means to use Select AI, you need Autonomous Database.  17:26 Nikita: And what can developers do with Select AI? Toufiq: Traditionally, SQL is the language used to query the data in the database. With Select AI, you can talk to the database and get insights from the data in the database using human language. At the very basic, what Select AI does is it generates SQL queries using natural language, like an NL2SQL capability.  17:52 Nikita: How does it actually do that? Toufiq: When a user asks a question, the first step Select AI does is look into the AI profile, which you, as a developer, define. The AI profile holds crucial information, such as table names, the LLM provider, and the credentials needed to authenticate with the LLM service. Next, Select AI constructs a prompt. This prompt includes information from the AI profile and the user's question.  Essentially, it's a packet of information containing everything the LLM service needs to generate SQL. The next step is generating SQL using LLM. The prompt prepared by Select AI is sent to the available LLM services via REST. Which LLM to use is configured in the AI profile. The supported providers are OpenAI, Cohere, Azure OpenAI, and OCI Generative AI. Once the SQL is generated by the LLM service, it is returned to the application. The app can then handle the SQL query in various ways, such as displaying the SQL results in a report format or as charts, etc.  19:05 Lois: This has been an incredible discussion! Thank you, Chaitanya, Apoorva, and Toufiq, for walking us through all of these amazing AI tools. If you're ready to dive deeper, visit mylearn.oracle.com and search for the Oracle APEX: Empowering Low Code Apps with AI course. You'll find step-by-step guides and demos for everything we covered today.  Nikita: Until next week, this is Nikita Abraham… Lois: And Lois Houston signing off! 19:31 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.  

The top AI news from the past week, every ThursdAI

Hey folks, Alex here from Weights & Biases, and this week has been absolutely bonkers. From robots walking among us to rockets landing on chopsticks (well, almost), the future is feeling palpably closer. And if real-world robots and reusable spaceship boosters weren't enough, the open-source AI community has been cooking, dropping new models and techniques faster than a Starship launch. So buckle up, grab your space helmet and noise-canceling headphones (we'll get to why those are important!), and let's blast off into this week's AI adventures!TL;DR and show-notes + links at the end of the post

airhacks.fm podcast with adam bien
Revolutionizing AI with Java: From LLMs to Vector APIs

airhacks.fm podcast with adam bien

Play Episode Listen Later Sep 28, 2024 69:19


An airhacks.fm conversation with Alfonso Peterssen (@TheMukel) about: Alfonso previously appeared on "#294 LLama2.java: LLM integration with A 100% Pure Java file", discussion of llama2.java and llama3.java projects for running LLMs in Java, performance comparison between Java and C implementations, use of Vector API in Java for matrix multiplication, challenges and potential improvements in Vector API implementation, integration of various LLM models like Mistral, phi, qwen or gemma, differences in model sizes and capabilities, tokenization and chat format challenges across different models, potential for Java Community Process (JCP) standardization of gguf parsing, quantization techniques and their impact on performance, plans for integrating with langchain4j, advantages of pure Java implementations for AI models, potential for GraalVM and native image optimizations, discussion on the future of specialized AI models for specific tasks, challenges in training models with language capabilities but limited world knowledge, importance of SIMD instructions and vector operations for performance optimization, potential improvements in Java's handling of different float formats like float16 and bfloat16, discussion on the role of smaller, specialized AI models in enterprise applications and development tools Alfonso Peterssen on twitter: @TheMukel

The Nonlinear Library
AF - Investigating the Ability of LLMs to Recognize Their Own Writing by Christopher Ackerman

The Nonlinear Library

Play Episode Listen Later Jul 30, 2024 23:09


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Investigating the Ability of LLMs to Recognize Their Own Writing, published by Christopher Ackerman on July 30, 2024 on The AI Alignment Forum. This post is an interim progress report on work being conducted as part of Berkeley's Supervised Program for Alignment Research (SPAR). Summary of Key Points We test the robustness of an open-source LLM's (Llama3-8b) ability to recognize its own outputs on a diverse mix of datasets, two different tasks (summarization and continuation), and two different presentation paradigms (paired and individual). We are particularly interested in differentiating scenarios that would require a model to have specific knowledge of its own writing style from those where it can use superficial cues (e.g., length, formatting, prefatory words) in the text to pass self-recognition tests. We find that while superficial text features are used when available, the RLHF'd Llama3-8b-Instruct chat model - but not the base Llama3-8b model - can reliably distinguish its own outputs from those of humans, and sometimes other models, even after controls for superficial cues: ~66-73% success rate across datasets in paired presentation and 58-83% in individual presentation (chance is 50%). We further find that although perplexity would be a useful signal to perform the task in the paired presentation paradigm, correlations between relative text perplexity and choice probability are weak and inconsistent, indicating that the models do not rely on it. Evidence suggests, but does not prove, that experience with its own outputs, acquired during post-training, is used by the chat model to succeed at the self-recognition task. The model is unable to articulate convincing reasons for its judgments. Introduction It has recently been found that large language models of sufficient size can achieve above-chance performance in tasks that require them to discriminate their own writing from that of humans and other models. From the perspective of AI safety, this is a significant finding. Self-recognition can be seen as an instance of situational awareness, which has long been noted as a potential point of risk for AI (Cotra, 2021). Such an ability might subserve an awareness of whether a model is in a training versus deployment environment, allowing it to hide its intentions and capabilities until it is freed from constraints. It might also allow a model to collude with other instances of itself, reserving certain information for when it knows it's talking to itself that it keeps secret when it knows it's talking to a human. On the positive side, AI researchers could use a model's self-recognition ability as the basis to build resistance to malicious prompting. But what isn't clear from prior studies is whether the self-recognition task success actually entails a model's self-awareness of its own writing style. Panickssery et al. (2024), utilizing a summary writing/recognition task, report that a number of LLMs, including Llama2-7b-chat, show out-of-the-box (without fine-tuning) self recognition abilities. However, this work focussed on the relationship between self-recognition task success and self-preference, rather than the specific means by which the model was succeeding at the task. Laine et al. (2024), as part of a larger effort to provide a foundation for studying situational awareness in LLMs, utilized a more challenging text continuation writing/recognition task and demonstrate self-recognition abilities in several larger models (although not Llama2-7b-chat), but there the focus was on how task success could be elicited with different prompts and in different models. Thus we seek to fill a gap in understanding what exactly models are doing when they succeed at a self recognition task. We first demonstrate model self-recognition task success in a variety of domains....

Papers Read on AI
FinanceBench: A New Benchmark for Financial Question Answering

Papers Read on AI

Play Episode Listen Later Jul 30, 2024 41:34


FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are intended to be clear-cut and straightforward to answer to serve as a minimum performance standard. We test 16 state of the art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400). The cases are available open-source. We show that existing LLMs have clear limitations for financial QA. Notably, GPT-4-Turbo used with a retrieval system incorrectly answered or refused to answer 81% of questions. While augmentation techniques such as using longer context window to feed in relevant evidence improve performance, they are unrealistic for enterprise settings due to increased latency and cannot support larger financial documents. We find that all models examined exhibit weaknesses, such as hallucinations, that limit their suitability for use by enterprises. 2023: Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen https://arxiv.org/pdf/2311.11944v1

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

If you see this in time, join our emergency LLM paper club on the Llama 3 paper!For everyone else, join our special AI in Action club on the Latent Space Discord for a special feature with the Cursor cofounders on Composer, their newest coding agent!Today, Meta is officially releasing the largest and most capable open model to date, Llama3-405B, a dense transformer trained on 15T tokens that beats GPT-4 on all major benchmarks:The 8B and 70B models from the April Llama 3 release have also received serious spec bumps, warranting the new label of Llama 3.1.If you are curious about the infra / hardware side, go check out our episode with Soumith Chintala, one of the AI infra leads at Meta. Today we have Thomas Scialom, who led Llama2 and now Llama3 post-training, so we spent most of our time on pre-training (synthetic data, data pipelines, scaling laws, etc) and post-training (RLHF vs instruction tuning, evals, tool calling).Synthetic data is all you needLlama3 was trained on 15T tokens, 7x more than Llama2 and with 4 times as much code and 30 different languages represented. But as Thomas beautifully put it:“My intuition is that the web is full of s**t in terms of text, and training on those tokens is a waste of compute.” “Llama 3 post-training doesn't have any human written answers there basically… It's just leveraging pure synthetic data from Llama 2.”While it is well speculated that the 8B and 70B were "offline distillations" of the 405B, there are a good deal more synthetic data elements to Llama 3.1 than the expected. The paper explicitly calls out:* SFT for Code: 3 approaches for synthetic data for the 405B bootstrapping itself with code execution feedback, programming language translation, and docs backtranslation.* SFT for Math: The Llama 3 paper credits the Let's Verify Step By Step authors, who we interviewed at ICLR:* SFT for Multilinguality: "To collect higher quality human annotations in non-English languages, we train a multilingual expert by branching off the pre-training run and continuing to pre-train on a data mix that consists of 90% multilingualtokens."* SFT for Long Context: "It is largely impractical to get humans to annotate such examples due to the tedious and time-consuming nature of reading lengthy contexts, so we predominantly rely on synthetic data to fill this gap. We use earlier versions of Llama 3 to generate synthetic data based on the key long-context use-cases: (possibly multi-turn) question-answering, summarization for long documents, and reasoning over code repositories, and describe them in greater detail below"* SFT for Tool Use: trained for Brave Search, Wolfram Alpha, and a Python Interpreter (a special new ipython role) for single, nested, parallel, and multiturn function calling.* RLHF: DPO preference data was used extensively on Llama 2 generations. This is something we partially covered in RLHF 201: humans are often better at judging between two options (i.e. which of two poems they prefer) than creating one (writing one from scratch). Similarly, models might not be great at creating text but they can be good at classifying their quality.Last but not least, Llama 3.1 received a license update explicitly allowing its use for synthetic data generation.Llama2 was also used as a classifier for all pre-training data that went into the model. It both labelled it by quality so that bad tokens were removed, but also used type (i.e. science, law, politics) to achieve a balanced data mix. Tokenizer size mattersThe tokens vocab of a model is the collection of all tokens that the model uses. Llama2 had a 34,000 tokens vocab, GPT-4 has 100,000, and 4o went up to 200,000. Llama3 went up 4x to 128,000 tokens. You can find the GPT-4 vocab list on Github.This is something that people gloss over, but there are many reason why a large vocab matters:* More tokens allow it to represent more concepts, and then be better at understanding the nuances.* The larger the tokenizer, the less tokens you need for the same amount of text, extending the perceived context size. In Llama3's case, that's ~30% more text due to the tokenizer upgrade. * With the same amount of compute you can train more knowledge into the model as you need fewer steps.The smaller the model, the larger the impact that the tokenizer size will have on it. You can listen at 55:24 for a deeper explanation.Dense models = 1 Expert MoEsMany people on X asked “why not MoE?”, and Thomas' answer was pretty clever: dense models are just MoEs with 1 expert :)[00:28:06]: I heard that question a lot, different aspects there. Why not MoE in the future? The other thing is, I think a dense model is just one specific variation of the model for an hyperparameter for an MOE with basically one expert. So it's just an hyperparameter we haven't optimized a lot yet, but we have some stuff ongoing and that's an hyperparameter we'll explore in the future.Basically… wait and see!Llama4Meta already started training Llama4 in June, and it sounds like one of the big focuses will be around agents. Thomas was one of the authors behind GAIA (listen to our interview with Thomas in our ICLR recap) and has been working on agent tooling for a while with things like Toolformer. Current models have “a gap of intelligence” when it comes to agentic workflows, as they are unable to plan without the user relying on prompting techniques and loops like ReAct, Chain of Thought, or frameworks like Autogen and Crew. That may be fixed soon?

The Nonlinear Library
LW - Rational Animations' intro to mechanistic interpretability by Writer

The Nonlinear Library

Play Episode Listen Later Jun 15, 2024 16:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rational Animations' intro to mechanistic interpretability, published by Writer on June 15, 2024 on LessWrong. In our new video, we talk about research on interpreting InceptionV1, a convolutional neural network. Researchers have been able to understand the function of neurons and channels inside the network and uncover visual processing algorithms by looking at the weights. The work on InceptionV1 is early but landmark mechanistic interpretability research, and it functions well as an introduction to the field. We also go into the rationale and goals of the field and mention some more recent research near the end. Our main source material is the circuits thread in the Distill journal and this article on feature visualization. The author of the script is Arthur Frost. I have included the script below, although I recommend watching the video since the script has been written with accompanying moving visuals in mind. Intro In 2018, researchers trained an AI to find out if people were at risk of heart conditions based on pictures of their eyes, and somehow the AI also learned to tell people's biological sex with incredibly high accuracy. How? We're not entirely sure. The crazy thing about Deep Learning is that you can give an AI a set of inputs and outputs, and it will slowly work out for itself what the relationship between them is. We didn't teach AIs how to play chess, go, and atari games by showing them human experts - we taught them how to work it out for themselves. And the issue is, now they have worked it out for themselves, and we don't know what it is they worked out. Current state-of-the-art AIs are huge. Meta's largest LLaMA2 model uses 70 billion parameters spread across 80 layers, all doing different things. It's deep learning models like these which are being used for everything from hiring decisions to healthcare and criminal justice to what youtube videos get recommended. Many experts believe that these models might even one day pose existential risks. So as these automated processes become more widespread and significant, it will really matter that we understand how these models make choices. The good news is, we've got a bit of experience uncovering the mysteries of the universe. We know that humans are made up of trillions of cells, and by investigating those individual cells we've made huge advances in medicine and genetics. And learning the properties of the atoms which make up objects has allowed us to develop modern material science and high-precision technology like computers. If you want to understand a complex system with billions of moving parts, sometimes you have to zoom in. That's exactly what Chris Olah and his team did starting in 2015. They focused on small groups of neurons inside image models, and they were able to find distinct parts responsible for detecting everything from curves and circles to dog heads and cars. In this video we'll Briefly explain how (convolutional) neural networks work Visualise what individual neurons are doing Look at how neurons - the most basic building blocks of the neural network - combine into 'circuits' to perform tasks Explore why interpreting networks is so hard There will also be lots of pictures of dogs, like this one. Let's get going. We'll start with a brief explanation of how convolutional neural networks are built. Here's a network that's trained to label images. An input image comes in on the left, and it flows along through the layers until we get an output on the right - the model's attempt to classify the image into one of the categories. This particular model is called InceptionV1, and the images it's learned to classify are from a massive collection called ImageNet. ImageNet has 1000 different categories of image, like "sandal" and "saxophone" and "sarong" (which, if you don't know, is a k...

The Nonlinear Library: LessWrong
LW - Rational Animations' intro to mechanistic interpretability by Writer

The Nonlinear Library: LessWrong

Play Episode Listen Later Jun 15, 2024 16:06


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rational Animations' intro to mechanistic interpretability, published by Writer on June 15, 2024 on LessWrong. In our new video, we talk about research on interpreting InceptionV1, a convolutional neural network. Researchers have been able to understand the function of neurons and channels inside the network and uncover visual processing algorithms by looking at the weights. The work on InceptionV1 is early but landmark mechanistic interpretability research, and it functions well as an introduction to the field. We also go into the rationale and goals of the field and mention some more recent research near the end. Our main source material is the circuits thread in the Distill journal and this article on feature visualization. The author of the script is Arthur Frost. I have included the script below, although I recommend watching the video since the script has been written with accompanying moving visuals in mind. Intro In 2018, researchers trained an AI to find out if people were at risk of heart conditions based on pictures of their eyes, and somehow the AI also learned to tell people's biological sex with incredibly high accuracy. How? We're not entirely sure. The crazy thing about Deep Learning is that you can give an AI a set of inputs and outputs, and it will slowly work out for itself what the relationship between them is. We didn't teach AIs how to play chess, go, and atari games by showing them human experts - we taught them how to work it out for themselves. And the issue is, now they have worked it out for themselves, and we don't know what it is they worked out. Current state-of-the-art AIs are huge. Meta's largest LLaMA2 model uses 70 billion parameters spread across 80 layers, all doing different things. It's deep learning models like these which are being used for everything from hiring decisions to healthcare and criminal justice to what youtube videos get recommended. Many experts believe that these models might even one day pose existential risks. So as these automated processes become more widespread and significant, it will really matter that we understand how these models make choices. The good news is, we've got a bit of experience uncovering the mysteries of the universe. We know that humans are made up of trillions of cells, and by investigating those individual cells we've made huge advances in medicine and genetics. And learning the properties of the atoms which make up objects has allowed us to develop modern material science and high-precision technology like computers. If you want to understand a complex system with billions of moving parts, sometimes you have to zoom in. That's exactly what Chris Olah and his team did starting in 2015. They focused on small groups of neurons inside image models, and they were able to find distinct parts responsible for detecting everything from curves and circles to dog heads and cars. In this video we'll Briefly explain how (convolutional) neural networks work Visualise what individual neurons are doing Look at how neurons - the most basic building blocks of the neural network - combine into 'circuits' to perform tasks Explore why interpreting networks is so hard There will also be lots of pictures of dogs, like this one. Let's get going. We'll start with a brief explanation of how convolutional neural networks are built. Here's a network that's trained to label images. An input image comes in on the left, and it flows along through the layers until we get an output on the right - the model's attempt to classify the image into one of the categories. This particular model is called InceptionV1, and the images it's learned to classify are from a massive collection called ImageNet. ImageNet has 1000 different categories of image, like "sandal" and "saxophone" and "sarong" (which, if you don't know, is a k...

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jun 10, 2024 269:19


Our second wave of speakers for AI Engineer World's Fair were announced! The conference sold out of Platinum/Gold/Silver sponsors and Early Bird tickets! See our Microsoft episode for more info and buy now with code LATENTSPACE.This episode is straightforwardly a part 2 to our ICLR 2024 Part 1 episode, so without further ado, we'll just get right on with it!Timestamps[00:03:43] Section A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger* [00:07:44] WebArena* [00:18:45] Sotopia* [00:24:00] Performance Improving Code Edits* [00:29:39] OpenDevin* [00:47:40] Industry and Academia[01:05:29] Section B: Benchmarks* [01:05:52] SWEBench* [01:17:05] SWEBench/SWEAgent Interview* [01:27:40] Dataset Contamination Detection* [01:39:20] GAIA Benchmark* [01:49:18] Moritz Hart - Science of Benchmarks[02:36:32] Section C: Reasoning and Post-Training* [02:37:41] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection* [02:51:00] Let's Verify Step By Step* [02:57:04] Noam Brown* [03:07:43] Lilian Weng - Towards Safe AGI* [03:36:56] A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis* [03:48:43] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework[04:00:51] Bonus: Notable Related Papers on LLM CapabilitiesSection A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger* Guests* Graham Neubig* Aman Sanger - Previous guest and NeurIPS friend of the pod!* WebArena * * Sotopia (spotlight paper, website)* * Learning Performance-Improving Code Edits* OpenDevin* Junyang Opendevin* Morph Labs, Jesse Han* SWE-Bench* SWE-Agent* Aman tweet on swebench* LiteLLM* Livecodebench* the role of code in reasoning* Language Models of Code are Few-Shot Commonsense Learners* Industry vs academia* the matryoshka embeddings incident* other directions* UnlimiformerSection A timestamps* [00:00:00] Introduction to Guests and the Impromptu Nature of the Podcast* [00:00:45] Graham's Experience in Japan and Transition into Teaching NLP* [00:01:25] Discussion on What Constitutes a Good Experience for Students in NLP Courses* [00:02:22] The Relevance and Teaching of Older NLP Techniques Like Ngram Language Models* [00:03:38] Speculative Decoding and the Comeback of Ngram Models* [00:04:16] Introduction to WebArena and Zotopia Projects* [00:05:19] Deep Dive into the WebArena Project and Benchmarking* [00:08:17] Performance Improvements in WebArena Using GPT-4* [00:09:39] Human Performance on WebArena Tasks and Challenges in Evaluation* [00:11:04] Follow-up Work from WebArena and Focus on Web Browsing as a Benchmark* [00:12:11] Direct Interaction vs. Using APIs in Web-Based Tasks* [00:13:29] Challenges in Base Models for WebArena and the Potential of Visual Models* [00:15:33] Introduction to Zootopia and Exploring Social Interactions with Language Models* [00:16:29] Different Types of Social Situations Modeled in Zootopia* [00:17:34] Evaluation of Language Models in Social Simulations* [00:20:41] Introduction to Performance-Improving Code Edits Project* [00:26:28] Discussion on DevIn and the Future of Coding Agents* [00:32:01] Planning in Coding Agents and the Development of OpenDevon* [00:38:34] The Changing Role of Academia in the Context of Large Language Models* [00:44:44] The Changing Nature of Industry and Academia Collaboration* [00:54:07] Update on NLP Course Syllabus and Teaching about Large Language Models* [01:00:40] Call to Action: Contributions to OpenDevon and Open Source AI Projects* [01:01:56] Hiring at Cursor for Roles in Code Generation and Assistive Coding* [01:02:12] Promotion of the AI Engineer ConferenceSection B: Benchmarks * Carlos Jimenez & John Yang (Princeton) et al: SWE-bench: Can Language Models Resolve Real-world Github Issues? (ICLR Oral, Paper, website)* “We introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation tasks. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. The best-performing model, Claude 2, is able to solve a mere 1.96% of the issues. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.”* Yonatan Oren et al (Stanford): Proving Test Set Contamination in Black-Box Language Models (ICLR Oral, paper, aman tweet on swebench contamination)* “We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is significantly higher than the likelihood after shuffling the examples. * We demonstrate that our procedure is sensitive enough to reliably prove test set contamination in challenging situations, including models as small as 1.4 billion parameters, on small test sets of only 1000 examples, and datasets that appear only a few times in the pretraining corpus.”* Outstanding Paper mention: “A simple yet elegant method to test whether a supervised-learning dataset has been included in LLM training.”* Thomas Scialom (Meta AI-FAIR w/ Yann LeCun): GAIA: A Benchmark for General AI Assistants (paper)* “We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. * GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. * GAIA's philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit similar robustness as the average human does on such questions. Using GAIA's methodology, we devise 466 questions and their answer.* * Mortiz Hardt (Max Planck Institute): The emerging science of benchmarks (ICLR stream)* “Benchmarks are the keystone that hold the machine learning community together. Growing as a research paradigm since the 1980s, there's much we've done with them, but little we know about them. In this talk, I will trace the rudiments of an emerging science of benchmarks through selected empirical and theoretical observations. Specifically, we'll discuss the role of annotator errors, external validity of model rankings, and the promise of multi-task benchmarks. The results in each case challenge conventional wisdom and underscore the benefits of developing a science of benchmarks.”Section C: Reasoning and Post-Training* Akari Asai (UW) et al: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (ICLR oral, website)* (Bad RAG implementations) indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. * We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. * Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. * Self-RAG (7B and 13B parameters) outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning, and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models. * Hunter Lightman (OpenAI): Let's Verify Step By Step (paper)* “Even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. * We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. * To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.* * Noam Brown - workshop on Generative Models for Decision Making* Solving Quantitative Reasoning Problems with Language Models (Minerva paper)* Describes some charts taken directly from the Let's Verify Step By Step paper listed/screenshotted above.* Lilian Weng (OpenAI) - Towards Safe AGI (ICLR talk)* OpenAI Model Spec* OpenAI Instruction Hierarchy: The Instruction Hierarchy: Training LLMs to Prioritize Privileged InstructionsSection D: Agent Systems* Izzeddin Gur (Google DeepMind): A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (ICLR oral, paper)* [Agent] performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML.* We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions.* WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those.* We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization.* We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.* Sirui Hong (DeepWisdom): MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (ICLR Oral, Paper)* We introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. Bonus: Notable Related Papers on LLM CapabilitiesThis includes a bunch of papers we wanted to feature above but could not.* Lukas Berglund (Vanderbilt) et al: The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” (ICLR poster, paper, Github)* We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form ''A is B'', it will not automatically generalize to the reverse direction ''B is A''. This is the Reversal Curse. * The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as ''Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]'' and the reverse ''Who is Mary Lee Pfeiffer's son?''. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter.* * Omar Khattab (Stanford): DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines (ICLR Spotlight Poster, GitHub)* presented by Krista Opsahl-Ong* “Existing LM pipelines are typically implemented using hard-coded “prompt templates”, i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, or imperative computational graphs where LMs are invoked through declarative modules. * DSPy modules are parameterized, meaning they can learn how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. * We design a compiler that will optimize any DSPy pipeline to maximize a given metric, by creating and collecting demonstrations. * We conduct two case studies, showing that succinct DSPy programs can express and optimize pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. * Within minutes of compiling, DSPy can automatically produce pipelines that outperform out-of-the-box few-shot prompting as well as expert-created demonstrations for GPT-3.5 and Llama2-13b-chat. On top of that, DSPy programs compiled for relatively small LMs like 770M parameter T5 and Llama2-13b-chat are competitive with many approaches that rely on large and proprietary LMs like GPT-3.5 and on expert-written prompt chains. * * MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning* Scaling Laws for Associative Memories * DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models* Efficient Streaming Language Models with Attention Sinks Get full access to Latent Space at www.latent.space/subscribe

Papers Read on AI
LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks

Papers Read on AI

Play Episode Listen Later May 21, 2024 52:21


Penetration testing, an essential component of software security testing, allows organizations to proactively identify and remediate vulnerabilities in their systems, thus bolstering their defense mechanisms against potential cyberattacks. One recent advancement in the realm of penetration testing is the utilization of Language Models (LLMs). We explore the intersection of LLMs and penetration testing to gain insight into their capabilities and challenges in the context of privilege escalation. We create an automated Linux privilege-escalation benchmark utilizing local virtual machines. We introduce an LLM-guided privilege-escalation tool designed for evaluating different LLMs and prompt strategies against our benchmark. Our results show that GPT-4 is well suited for detecting file-based exploits as it can typically solve 75-100% of test-cases of that vulnerability class. GPT-3.5-turbo was only able to solve 25-50% of those, while local models, such as Llama2 were not able to detect any exploits. We analyze the impact of different prompt designs, the benefits of in-context learning, and the advantages of offering high-level guidance to LLMs. We discuss challenging areas for LLMs, including maintaining focus during testing, coping with errors, and finally comparing them with both stochastic parrots as well as with human hackers. 2023: A. Happe, Aaron Kaplan, Jürgen Cito https://arxiv.org/pdf/2310.11409

airhacks.fm podcast with adam bien
LLama2.java: LLM integration with A 100% Pure Java file

airhacks.fm podcast with adam bien

Play Episode Listen Later May 12, 2024 61:28


An airhacks.fm conversation with Alfonso Peterssen (@TheMukel) about: discussion about Alfonso's early programming experience and participation in the IOI competition, studying computer science and functional programming with Martin Odersky, internships at Google and Oracle Labs working on compilers and the Espresso project implementing a JVM in Java, espresso mentioned in "#208 GraalVM: Meta Circularity on Different Levels", "#194 GraalVM, Apple Silicon (M1) and Clouds", "#167 GraalVM and Java 17, Truffle, Espresso and Native Image" and "#157 The Ingredients of GraalVM", porting LLVM to pure Java in one class, integrating Large Language Models (LLMs) in Java by porting the LLAMA model from C to Java, GPU acceleration with tornadovm, TornadoVM appeared at "#282 TornadoVM, Paravox.ai: Java, AI, LLMs and Hardware Acceleration", performance of the Java port being within 10% of the C versions, potential huge opportunities for integrating AI and LLMs with enterprise Java systems for use cases like fraud detection, the Java port being a 1,000 line self-contained implementation with no external dependencies, the need for more resources and support to further develop the Java LLM integration, the llama2.java project Alfonso Peterssen on twitter: @TheMukel

Papers Read on AI
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Papers Read on AI

Play Episode Listen Later Apr 25, 2024 27:30


The quadratic complexity and weak length extrapolation of Transformers limits their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. Megalodon inherits the architecture of Mega (exponential moving average with gated attention), and further introduces multiple technical components to improve its capability and stability, including complex exponential moving average (CEMA), timestep normalization layer, normalized attention mechanism and pre-norm with two-hop residual configuration. In a controlled head-to-head comparison with Llama2, Megalodon achieves better efficiency than Transformer in the scale of 7 billion parameters and 2 trillion training tokens. Megalodon reaches a training loss of 1.70, landing mid-way between Llama2-7B (1.75) and 13B (1.67). Code: https://github.com/XuezheMax/megalodon 2024: Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou https://arxiv.org/pdf/2404.08801v2.pdf

Utilizing AI - The Enterprise AI Podcast
06x10: Using AI in Modern Applications with Matthew Wallace

Utilizing AI - The Enterprise AI Podcast

Play Episode Listen Later Apr 22, 2024 29:09


As we consider the impact of AI on modern applications, we should consider the ways that this technology will improve these products. This episode of Utilizing Tech brings Matthew Wallace, CTO of Kamiwaza AI, to discuss the adoption of AI in application development with Allyson Klein and Stephen Foskett. Large SaaS providers were the first to add AI-powered features but this technology is rapidly coming to market across the spectrum of applications. Matthew likens this to the evolution from spreadsheets to SaaS tools and cloud, which was a similar revolution. He mentions tools like ?, Cursor, Llama2, Mixtral, and more. We also discuss retrieval-automated generation (RAG), which enables LLMs to bring in external data at run time. We must also consider the source of the data, both in training and RAG, and questions of sovereignty, privacy, copyright, and safety. Looking forward, Matthew expects companies to customize their own models based on use case specific data. Looking forward, we expect new frameworks and models to be adopted rapidly to bring maturity and reliability to AI in enterprise applications throughout 2024. Hosts: Stephen Foskett, Organizer of Tech Field Day: ⁠⁠https://www.linkedin.com/in/sfoskett/⁠⁠ Allyson Klein: ⁠⁠https://www.linkedin.com/in/allysonklein/ Guest: Matthew Wallace, CTO and Cofounder of KamiwazaAI: https://www.linkedin.com/in/matthewwallaceco/ Follow Utilizing Tech Website: ⁠⁠https://www.UtilizingTech.com/⁠⁠ X/Twitter: ⁠⁠https://www.twitter.com/UtilizingTech ⁠⁠ Tech Field Day Website: ⁠⁠https://www.TechFieldDay.com⁠⁠ LinkedIn: ⁠⁠https://www.LinkedIn.com/company/Tech-Field-Day ⁠⁠ X/Twitter: ⁠https://www.Twitter.com/TechFieldDay Tags: @UtilizingTech, @SFoskett, @TechAllyson, @KamiwazaAI, @TechFieldDay, #AIFD4, #AI, #UtilizingAI,

Papers Read on AI
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Papers Read on AI

Play Episode Listen Later Apr 16, 2024 36:41


We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3, etc) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. We borrow from the notion of regret from online learning and empirically show that LLMs are capable of obtaining a sub-linear regret. 2024: Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, M. Surdeanu https://arxiv.org/pdf/2404.07544v1.pdf

Papers Read on AI
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Papers Read on AI

Play Episode Listen Later Apr 5, 2024 36:22


Recently years have witnessed a rapid development of large language models (LLMs). Despite the strong ability in many language-understanding tasks, the heavy computational burden largely restricts the application of LLMs especially when one needs to deploy them onto edge devices. In this paper, we propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm. The motivation lies in the imbalanced degrees of freedom of quantization and adaptation, and the solution is to use group-wise operators which increase the degree of freedom of quantization meanwhile decreasing that of adaptation. QA-LoRA is easily implemented with a few lines of code, and it equips the original LoRA with two-fold abilities: (i) during fine-tuning, the LLM's weights are quantized (e.g., into INT4) to reduce time and memory usage; (ii) after fine-tuning, the LLM and auxiliary weights are naturally integrated into a quantized model without loss of accuracy. We apply QA-LoRA to the LLaMA and LLaMA2 model families and validate its effectiveness in different fine-tuning datasets and downstream scenarios. Code will be made available at https://github.com/yuhuixu1993/qa-lora. 2023: Yuhui Xu, Lingxi Xie, Xiaotao Gu, Xin Chen, Heng Chang, Hengheng Zhang, Zhensu Chen, Xiaopeng Zhang, Qi Tian https://arxiv.org/pdf/2309.14717v2.pdf

The Nonlinear Library
AF - Your LLM Judge may be biased by Rachel Freedman

The Nonlinear Library

Play Episode Listen Later Mar 29, 2024 11:53


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Your LLM Judge may be biased, published by Rachel Freedman on March 29, 2024 on The AI Alignment Forum. Abstract AI safety researchers often rely on LLM "judges" to qualitatively evaluate the output of separate LLMs. We try this for our own interpretability research, but find that our LLM judges are often deeply biased. For example, we use Llama2 to judge whether movie reviews are more "(A) positive" or "(B) negative", and find that it almost always answers "(B)", even when we switch the labels or order of these alternatives. This bias is particularly surprising for two reasons: first, because we expect a fairly capable model like Llama2 to perform well at a simple sentiment classification task like this, and second, because this specific "(B)"-bias doesn't map on to a human bias we'd expect to see in the training data. We describe our experiments, provide code to replicate our results, and offer suggestions to mitigate such biases. We caution researchers to double-check their LLM judges for such biases, and validate LLM judgements against human ones whenever possible. Introduction Researchers often rely on human judgements for many simple and repetitive tasks, such as comparing, evaluating and classifying text generated by LLMs. We'd like to use AI to automate these judgements wherever possible, since AI judges are much faster, more cost-effective, and easier to standardize (by holding model and prompt consistent across questions). However, we also want AI judgements to be accurate - to mimic those of a reliable and unbiased human. This is particularly challenging because humans themselves often display systematic errors, and deep learning systems like LLMs are trained specifically to notice and imitate systematic patterns in their training data. Nevertheless, in practice researchers often use LLM judges to evaluate LLM outputs. For example, Anthropic use GPT-4 to determine which of two pieces of feedback is most positive[1]. Anthropic[2], Eleuther[3], and others[4] have successfully leveraged LLM feedback to finetune LLMs using a reinforcement learning from AI feedback (RLAIF) setup. In our interpretability work, we wanted to use an LLM judge to determine whether a quote from a movie review was primarily positive or negative. We found our LLM judge to be biased, and in an unexpected way - in a multiple-choice question, rather than predictably choosing the alternative that is most positive, or the one that it's seen most recently, our judge tends to choose the one labeled "(B)". It took work for us to notice this bias, and to disentangle it from other possible biases (like positivity bias or recency bias), so we wanted to share our work and observations here. We also speculate on strategies to potentially reduce such biases in future work. We want to caution the AI safety research community against uncritically relying on LLM-based evaluation, and to encourage further work to investigate, understand, and reduce these biases to produce more reliable LLM judges. You can find the notebook to reproduce these experiments here: github.com/henrypapadatos/evaluate_B_bias Sentiment classification task For our project, we wanted to know when an LLM would agree with clearly incorrect human judgements. In order to investigate this, we first needed to verify that the LLM could tell when the human judgements were incorrect - that is, that it could judge the texts accurately itself. We tested Llama2 (the 7B chat version) on the " Rotten Tomatoes" dataset, which is comprised of 10,000 movie review snippets, half clearly positive (labeled "positive") and half clearly negative (labeled "negative"). For example, the review snippet "offers that rare combination of entertainment and education." is labeled "positive". We showed Llama2 each review snippet, asked it to determine whether th...

Build Tech Stack Equity
Reshaping Career Development With AI-Powered Tools | Chandler Malone, Path

Build Tech Stack Equity

Play Episode Listen Later Mar 27, 2024 40:53


Chandler Malone is a startup founder and entrepreneur in the technology space. He has a background in venture capital and has worked at various firms, including iSelect Fund and Atento Capital. Chandler co-founded the software company Boot Up, which focuses on workforce development and upskilling retail workers for technical roles. He recently launched a new company called Path AI with his partner Chris Gray, which aims to provide AI-powered career mentorship and job search tools. In this episode, we interview Chandler Malone, a startup founder and entrepreneur in the technology space. Chandler shares his background and journey, from his early entrepreneurial experiences in college to his work in venture capital and the launch of his own companies. He discusses his latest venture, Path AI, which focuses on providing AI-powered career mentorship and job search tools. Chandler also shares insights and tips for founders, including fundraising strategies and the importance of metrics and communication skills. If your company is looking to scale its AI initiatives, head over to Tesoro AI (www.tesoroai.com). We are experts in AI strategy, staff augmentation, and AI product development. Founder Bio: Chandler has been focused on helping people reach their economic potential throughout his entire career via his work at Bootup and The Family. In Chandler's 3 years as the CEO of Bootup, the company helped thousands of people get their first jobs in tech for over $200M in annualized salaries at companies like JP Morgan Chase, Home Depot, Mailchimp, Cerner, and more. Chandler is also founder of The Family, a venture studio focused on Black and Brown founders. Over the past 4 years, The Family has invested in 10 Black and Brown founders with an estimated market cap of over $300M. Time Stamps: 02:43 Introduction and background of Chandler Malone 06:23 Challenges of startup fundraising 08:49 Transition from working at Tinto Capital and Launching Boot Up 11:29 Introduction to Path AI and the need for AI tools in the hiring process 15:37 Advantage of AI advisors for specific industries 18:44 Advantages of sourcing talent from South America 23:22 Incorporation of AI into the product using OpenAI, Anthropic, and Llama2 25:24 Importance of delivering value and creating awareness for customers 26:46 Looking for people who care about helping others and economic mobility 29:30 Funding journey: self-funded, upcoming funding round 32:53 Finding investors with expertise in the relevant industry 33:55 Selecting the most compelling and honest metrics for investors 35:15 Communication with partners, associates, and analysts in the funding process 38:13 How to get in contact with Path AI team and whats coming new in 2024 Resources Company website: https://yourpath.ai/ LinkedIn: https://www.linkedin.com/company/yourpathai/ Instagram: https://www.instagram.com/yourpath.ai/ Other: https://www.thefamily.studio/

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Speaker CFPs and Sponsor Guides are now available for AIE World's Fair — join us on June 25-27 for the biggest AI Engineer conference of 2024!Soumith Chintala needs no introduction in the ML world — his insights are incredibly accessible across Twitter, LinkedIn, podcasts, and conference talks (in this pod we'll assume you'll have caught up on the History of PyTorch pod from last year and cover different topics). He's well known as the creator of PyTorch, but he's more broadly the Engineering Lead on AI Infra, PyTorch, and Generative AI at Meta.Soumith was one of the earliest supporters of Latent Space (and more recently AI News), and we were overjoyed to catch up with him on his latest SF visit for a braindump of the latest AI topics, reactions to some of our past guests, and why Open Source AI is personally so important to him.Life in the GPU-Rich LaneBack in January, Zuck went on Instagram to announce their GPU wealth: by the end of 2024, Meta will have 350k H100s. By adding all their GPU clusters, you'd get to 600k H100-equivalents of compute. At FP16 precision, that's ~1,200,000 PFLOPS. If we used George Hotz's (previous guest!) "Person of Compute" measure, Meta now has 60k humans of compute in their clusters. Occasionally we get glimpses into the GPU-rich life; on a recent ThursdAI chat, swyx prompted PaLM tech lead Yi Tay to write down what he missed most from Google, and he commented that UL2 20B was trained by accidentally leaving the training job running for a month, because hardware failures are so rare in Google.Meta AI's Epic LLM RunBefore Llama broke the internet, Meta released an open source LLM in May 2022, OPT-175B, which was notable for how “open” it was - right down to the logbook! They used only 16 NVIDIA V100 GPUs and Soumith agrees that, with hindsight, it was likely under-trained for its parameter size.In Feb 2023 (pre Latent Space pod), Llama was released, with a 7B version trained on 1T tokens alongside 65B and 33B versions trained on 1.4T tokens. The Llama authors included Guillaume Lample and Timothée Lacroix, who went on to start Mistral.July 2023 was Llama2 time (which we covered!): 3 model sizes, 7B, 13B, and 70B, all trained on 2T tokens. The three models accounted for a grand total of 3,311,616 GPU hours for all pre-training work. CodeLlama followed shortly after, a fine-tune of Llama2 specifically focused on code generation use cases. The family had models in the 7B, 13B, 34B, and 70B size, all trained with 500B extra tokens of code and code-related data, except for 70B which is trained on 1T.All of this on top of other open sourced models like Segment Anything (one of our early hits!), Detectron, Detectron 2, DensePose, and Seamless, and in one year, Meta transformed from a company people made fun of for its “metaverse” investments to one of the key players in the AI landscape and its stock has almost tripled since (about $830B in market value created in the past year).Why Open Source AIThe obvious question is why Meta would spend hundreds of millions on its AI efforts and then release them for free. Zuck has addressed this in public statements:But for Soumith, the motivation is even more personal:“I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. Like, I grew up in India… And knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for like zero dollars. And I think that was a strong reason why I ended up where I am. So like that, like the open source side of things, I always push regardless of like what I get paid for, like I think I would do that as a passion project on the side……I think at a fundamental level, the most beneficial value of open source is that you make the distribution to be very wide. It's just available with no friction and people can do transformative things in a way that's very accessible. Maybe it's open source, but it has a commercial license and I'm a student in India. I don't care about the license. I just don't even understand the license. But like the fact that I can use it and do something with it is very transformative to me……Like, okay, I again always go back to like I'm a student in India with no money. What is my accessibility to any of these closed source models? At some scale I have to pay money. That makes it a non-starter and stuff. And there's also the control issue: I strongly believe if you want human aligned AI, you want all humans to give feedback. And you want all humans to have access to that technology in the first place. And I actually have seen, living in New York, whenever I come to Silicon Valley, I see a different cultural bubble.We like the way Soumith put it last year: Closed AI “rate-limits against people's imaginations and needs”!What It Takes For Open Source AI to WinHowever Soumith doesn't think Open Source will simply win by popular demand. There is a tremendous coordination problem with the decentralized nature of the open source AI development right now: nobody is collecting the valuable human feedback in the way that OpenAI or Midjourney are doing.“Open source in general always has a coordination problem. If there's a vertically integrated provider with more resources, they will just be better coordinated than open source. And so now open source has to figure out how to have coordinated benefits. And the reason you want coordinated benefits is because these models are getting better based on human feedback. And if you see with open source models, like if you go to the /r/localllama subreddit, like there's so many variations of models that are being produced from, say, Nous research. I mean, like there's like so many variations built by so many people. And one common theme is they're all using these fine-tuning or human preferences datasets that are very limited and they're not sufficiently diverse. And you look at the other side, say front-ends like Oobabooga or like Hugging Chat or Ollama, they don't really have feedback buttons. All the people using all these front-ends, they probably want to give feedback, but there's no way for them to give feedback… So we're just losing all of this feedback. Maybe open source models are being as used as GPT is at this point in like all kinds of, in a very fragmented way, like in aggregate all the open source models together are probably being used as much as GPT is, maybe close to that. But the amount of feedback that is driving back into the open source ecosystem is like negligible, maybe less than 1% of like the usage. So I think like some, like the blueprint here I think is you'd want someone to create a sinkhole for the feedback… I think if we do that, if that actually happens, I think that probably has a real chance of the open source models having a runaway effect against OpenAI, I think like there's a clear chance we can take at truly winning open source.”If you're working on solving open source coordination, please get in touch!Show Notes* Soumith Chintala Twitter* History of PyTorch episode on Gradient Podcast* The Llama Ecosystem* Apple's MLX* Neural ODEs (Ordinary Differential Equations)* AlphaGo* LMSys arena* Dan Pink's "Drive"* Robotics projects:* Dobb-E* OK Robot* Yann LeCun* Yangqing Jia of Lepton AI* Ed Catmull* George Hotz on Latent Space* Chris Lattner on Latent Space* Guillaume Lample* Yannic Kilcher of OpenAssistant* LMSys* Alex Atallah of OpenRouter* Carlo Sferrazza's 3D tactile research* Alex Wiltschko of Osmo* Tangent by Alex Wiltschko* Lerrel Pinto - RoboticsTimestamps* [00:00:00] Introductions* [00:00:51] Extrinsic vs Intrinsic Success* [00:02:40] Importance of Open Source and Its Impact* [00:03:46] PyTorch vs TinyGrad* [00:08:33] Why PyTorch is the Switzerland of frameworks* [00:10:27] Modular's Mojo + PyTorch?* [00:13:32] PyTorch vs Apple's MLX* [00:16:27] FAIR / PyTorch Alumni* [00:18:50] How can AI inference providers differentiate?* [00:21:41] How to build good benchmarks and learnings from AnyScale's* [00:25:28] Most interesting unexplored ideas* [00:28:18] What people get wrong about synthetic data* [00:35:57] Meta AI's evolution* [00:38:42] How do you allocate 600,000 GPUs?* [00:42:05] Even the GPU Rich are GPU Poor* [00:47:31] Meta's MTIA silicon* [00:50:09] Why we need open source* [00:59:00] Open source's coordination problem for feedback gathering* [01:08:59] Beyond text generation* [01:15:37] Osmo and the Future of Smell Recognition TechnologyTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:15]: Hey, and today we have in the studio Soumith Chintala, welcome.Soumith [00:00:17]: Thanks for having me.Swyx [00:00:18]: On one of your rare visits from New York where you live. You got your start in computer vision at NYU with Yann LeCun. That was a very fortuitous start. I was actually listening to your interview on the Gradient podcast. So if people want to know more about the history of Soumith, history of PyTorch, they can go to that podcast. We won't spend that much time there, but I just was marveling at your luck, or I don't know if it's your luck or your drive to find AI early and then find the right quality mentor because I guess Yan really sort of introduced you to that world.Soumith [00:00:51]: Yeah, I think you're talking about extrinsic success, right? A lot of people just have drive to do things that they think is fun, and a lot of those things might or might not be extrinsically perceived as good and successful. I think I just happened to like something that is now one of the coolest things in the world or whatever. But if I happen, the first thing I tried to become was a 3D VFX artist, and I was really interested in doing that, but I turned out to be very bad at it. So I ended up not doing that further. But even if I was good at that, whatever, and I ended up going down that path, I probably would have been equally happy. It's just like maybe like the perception of, oh, is this person successful or not might be different. I think like after a baseline, like your happiness is probably more correlated with your intrinsic stuff.Swyx [00:01:44]: Yes. I think Dan Pink has this book on drive that I often refer to about the power of intrinsic motivation versus extrinsic and how long extrinsic lasts. It's not very long at all. But anyway, now you are an investor in Runway, so in a way you're working on VFX. Yes.Soumith [00:02:01]: I mean, in a very convoluted way.Swyx [00:02:03]: It reminds me of Ed Catmull. I don't know if you guys know, but he actually tried to become an animator in his early years and failed or didn't get accepted by Disney and then went and created Pixar and then got bought by Disney and created Toy Story. So you joined Facebook in 2014 and eventually became a creator and maintainer of PyTorch. And there's this long story there you can refer to on the gradient. I think maybe people don't know that you also involved in more sort of hardware and cluster decision affair. And we can dive into more details there because we're all about hardware this month. Yeah. And then finally, I don't know what else, like what else should people know about you on a personal side or professional side?Soumith [00:02:40]: I think open source is definitely a big passion of mine and probably forms a little bit of my identity at this point. I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. Like, I grew up in India. I didn't have internet for a while. In college, actually, I didn't have internet except for GPRS or whatever. And knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for zero dollars. And I think that was a strong reason why I ended up where I am. So the open source side of things, I always push regardless of what I get paid for, like I think I would do that as a passion project on the side.Swyx [00:03:35]: Yeah, that's wonderful. Well, we'll talk about the challenges as well that open source has, open models versus closed models. Maybe you want to touch a little bit on PyTorch before we move on to the sort of Meta AI in general.PyTorch vs Tinygrad tradeoffsAlessio [00:03:46]: Yeah, we kind of touched on PyTorch in a lot of episodes. So we had George Hotz from TinyGrad. He called PyTorch a CISC and TinyGrad a RISC. I would love to get your thoughts on PyTorch design direction as far as, I know you talk a lot about kind of having a happy path to start with and then making complexity hidden away but then available to the end user. One of the things that George mentioned is I think you have like 250 primitive operators in PyTorch, I think TinyGrad is four. So how do you think about some of the learnings that maybe he's going to run into that you already had in the past seven, eight years almost of running PyTorch?Soumith [00:04:24]: Yeah, I think there's different models here, but I think it's two different models that people generally start with. Either they go like, I have a grand vision and I'm going to build a giant system that achieves this grand vision and maybe one is super feature complete or whatever. Or other people say they will get incrementally ambitious, right? And they say, oh, we'll start with something simple and then we'll slowly layer out complexity in a way that optimally applies Huffman coding or whatever. Like where the density of users are and what they're using, I would want to keep it in the easy, happy path and where the more niche advanced use cases, I'll still want people to try them, but they need to take additional frictional steps. George, I think just like we started with PyTorch, George started with the incrementally ambitious thing. I remember TinyGrad used to be, like we would be limited to a thousand lines of code and I think now it's at 5,000. So I think there is no real magic to which why PyTorch has the kind of complexity. I think it's probably partly necessitated and partly because we built with the technology available under us at that time, PyTorch is like 190,000 lines of code or something at this point. I think if you had to rewrite it, we would probably think about ways to rewrite it in a vastly simplified way for sure. But a lot of that complexity comes from the fact that in a very simple, explainable way, you have memory hierarchies. You have CPU has three levels of caches and then you have DRAM and SSD and then you have network. Similarly, GPU has several levels of memory and then you have different levels of network hierarchies, NVLink plus InfiniBand or Rocky or something like that, right? And the way the flops are available on your hardware, they are available in a certain way and your computation is in a certain way and you have to retrofit your computation onto both the memory hierarchy and like the flops available. When you're doing this, it is actually a fairly hard mathematical problem to do this setup, like you find the optimal thing. And finding the optimal thing is, what is optimal depends on the input variables themselves. So like, okay, what is the shape of your input tensors and what is the operation you're trying to do and various things like that. Finding that optimal configuration and writing it down in code is not the same for every input configuration you have. Like for example, just as the shape of the tensors change, let's say you have three input tensors into a Sparstar product or something like that. The shape of each of these input tensors will vastly change how you do this optimally placing this operation onto the hardware in a way that will get you maximal throughput. So a lot of our complexity comes from writing out hundreds of configurations for each single PyTorch operator and templatizing these things and symbolically generating the final CUDA code or CPU code. There's no way to avoid it because mathematically we haven't found symbolic ways to do this that also keep compile time near zero. You can write a very simple framework, but then you also should be willing to eat the long compile time. So if searching for that optimal performance at runtime, but that's the trade off. There's no, like, I don't think unless we have great breakthroughs George's vision is achievable, he should be thinking about a narrower problem such as I'm only going to make this for work for self-driving car connets or I'm only going to make this work for LLM transformers of the llama style. Like if you start narrowing the problem down, you can make a vastly simpler framework. But if you don't, if you need the generality to power all of the AI research that is happening and keep zero compile time and in all these other factors, I think it's not easy to avoid the complexity.Pytorch vs MojoAlessio [00:08:33]: That's interesting. And we kind of touched on this with Chris Lattner when he was on the podcast. If you think about frameworks, they have the model target. They have the hardware target. They have different things to think about. He mentioned when he was at Google, TensorFlow trying to be optimized to make TPUs go brr, you know, and go as fast. I think George is trying to make especially AMD stack be better than ROCm. How come PyTorch has been such as Switzerland versus just making Meta hardware go brr?Soumith [00:09:00]: First, Meta is not in the business of selling hardware. Meta is not in the business of cloud compute. The way Meta thinks about funding PyTorch is we're funding it because it's net good for Meta to fund PyTorch because PyTorch has become a standard and a big open source project. And generally it gives us a timeline edge. It gives us leverage and all that within our own work. So why is PyTorch more of a Switzerland rather than being opinionated? I think the way we think about it is not in terms of Switzerland or not. We actually the way we articulate it to all hardware vendors and software vendors and all who come to us being we want to build a backend in core for PyTorch and ship it by default is we just only look at our user side of things. Like if users are using a particular piece of hardware, then we want to support it. We very much don't want to king make the hardware side of things. So as the MacBooks have GPUs and as that stuff started getting increasingly interesting, we pushed Apple to push some engineers and work on the NPS support and we spend significant time from Meta funded engineers on that as well because a lot of people are using the Apple GPUs and there's demand. So we kind of mostly look at it from the demand side. We never look at it from like oh which hardware should we start taking opinions on.Swyx [00:10:27]: Is there a future in which, because Mojo or Modular Mojo is kind of a superset of Python, is there a future in which PyTorch might use Mojo features optionally?Soumith [00:10:36]: I think it depends on how well integrated it is into the Python ecosystem. So if Mojo is like a pip install and it's readily available and users feel like they can use Mojo so smoothly within their workflows in a way that just is low friction, we would definitely look into that. Like in the same way PyTorch now depends on Triton, OpenAI Triton, and we never had a conversation that was like huh, that's like a dependency. Should we just build a Triton of our own or should we use Triton? It almost doesn't, like those conversations don't really come up for us. The conversations are more well does Triton have 10,000 dependencies and is it hard to install? We almost don't look at these things from a strategic leverage point of view. We look at these things from a user experience point of view, like is it easy to install? Is it smoothly integrated and does it give enough benefits for us to start depending on it? If so, yeah, we should consider it. That's how we think about it.Swyx [00:11:37]: You're inclusive by default as long as it meets the minimum bar of, yeah, but like maybe I phrased it wrongly. Maybe it's more like what problems would you look to solve that you have right now?Soumith [00:11:48]: I think it depends on what problems Mojo will be useful at.Swyx [00:11:52]: Mainly a performance pitch, some amount of cross compiling pitch.Soumith [00:11:56]: Yeah, I think the performance pitch for Mojo was like, we're going to be performant even if you have a lot of custom stuff, you're going to write arbitrary custom things and we will be performant. And that value proposition is not clear to us from the PyTorch side to consider it for PyTorch. So PyTorch, it's actually not 250 operators, it's like a thousand operators. PyTorch exposes about a thousand operators and people kind of write their ideas in the thousand operators of PyTorch. Mojo is like, well, maybe it's okay to completely sidestep those thousand operators of PyTorch and just write it in a more natural form. Just write raw Python, write for loops or whatever, right? So from the consideration of how do we intersect PyTorch with Mojo, I can see one use case where you have custom stuff for some parts of your program, but mostly it's PyTorch. And so we can probably figure out how to make it easier for say Torch.compile to smoothly also consume Mojo subgraphs and like, you know, the interoperability being actually usable, that I think is valuable. But Mojo as a fundamental front end would be replacing PyTorch, not augmenting PyTorch. So in that sense, I don't see a synergy in more deeply integrating Mojo.Pytorch vs MLXSwyx [00:13:21]: So call out to Mojo whenever they have written something in Mojo and there's some performance related thing going on. And then since you mentioned Apple, what should people think of PyTorch versus MLX?Soumith [00:13:32]: I mean, MLX is early and I know the folks well, Ani used to work at FAIR and I used to chat with him all the time. He used to be based out of New York as well. The way I think about MLX is that MLX is specialized for Apple right now. It has a happy path because it's defined its product in a narrow way. At some point MLX either says we will only be supporting Apple and we will just focus on enabling, you know, there's a framework if you use your MacBook, but once you like go server side or whatever, that's not my problem and I don't care. For MLS, it enters like the server side set of things as well. Like one of these two things will happen, right? If the first thing will happen, like MLX's overall addressable market will be small, but it probably do well within that addressable market. If it enters the second phase, they're going to run into all the same complexities that we have to deal with. They will not have any magic wand and they will have more complex work to do. They probably wouldn't be able to move as fast.Swyx [00:14:44]: Like having to deal with distributed compute?Soumith [00:14:48]: Distributed, NVIDIA and AMD GPUs, like just like having a generalization of the concept of a backend, how they treat compilation with plus overheads. Right now they're deeply assumed like the whole NPS graph thing. So they need to think about all these additional things if they end up expanding onto the server side and they'll probably build something like PyTorch as well, right? Like eventually that's where it will land. And I think there they will kind of fail on the lack of differentiation. Like it wouldn't be obvious to people why they would want to use it.Swyx [00:15:24]: I mean, there are some cloud companies offering M1 and M2 chips on servers. I feel like it might be interesting for Apple to pursue that market, but it's not their core strength.Soumith [00:15:33]: Yeah. If Apple can figure out their interconnect story, maybe, like then it can become a thing.Swyx [00:15:40]: Honestly, that's more interesting than the cars. Yes.Soumith [00:15:43]: I think the moat that NVIDIA has right now, I feel is that they have the interconnect that no one else has, like AMD GPUs are pretty good. I'm sure there's various silicon that is not bad at all, but the interconnect, like NVLink is uniquely awesome. I'm sure the other hardware providers are working on it, but-Swyx [00:16:04]: I feel like when you say it's uniquely awesome, you have some appreciation of it that the rest of us don't. I mean, the rest of us just like, you know, we hear marketing lines, but what do you mean when you say NVIDIA is very good at networking? Obviously they made the acquisition maybe like 15 years ago.Soumith [00:16:15]: Just the bandwidth it offers and the latency it offers. I mean, TPUs also have a good interconnect, but you can't buy them. So you have to go to Google to use it.PyTorch MafiaAlessio [00:16:27]: Who are some of the other FAIR PyTorch alumni that are building cool companies? I know you have Fireworks AI, Lightning AI, Lepton, and Yangqing, you knew since college when he was building Coffee?Soumith [00:16:40]: Yeah, so Yangqing and I used to be framework rivals, PyTorch, I mean, we were all a very small close-knit community back then. Caffe, Torch, Theano, Chainer, Keras, various frameworks. I mean, it used to be more like 20 frameworks. I can't remember all the names. CCV by Liu Liu, who is also based out of SF. And I would actually like, you know, one of the ways it was interesting is you went into the framework guts and saw if someone wrote their own convolution kernel or they were just copying someone else's. There were four or five convolution kernels that were unique and interesting. There was one from this guy out of Russia, I forgot the name, but I remembered who was awesome enough to have written their own kernel. And at some point there, I built out these benchmarks called ConNet benchmarks. They're just benchmarking all the convolution kernels that are available at that time. It hilariously became big enough that at that time AI was getting important, but not important enough that industrial strength players came in to do these kinds of benchmarking and standardization. Like we have MLPerf today. So a lot of the startups were using ConNet benchmarks in their pitch decks as like, oh, you know, on ConNet benchmarks, this is how we fare, so you should fund us. I remember Nirvana actually was at the top of the pack because Scott Gray wrote amazingly fast convolution kernels at that time. Very interesting, but separate times. But to answer your question, Alessio, I think mainly Lepton, Fireworks are the two most obvious ones, but I'm sure the fingerprints are a lot wider. They're just people who worked within the PyTorch Cafe2 cohort of things and now end up at various other places.Swyx [00:18:50]: I think as a, both as an investor and a people looking to build on top of their services, it's a uncomfortable slash like, I don't know what I don't know pitch. Because I've met Yang Tsing and I've met Lin Chao. Yeah, I've met these folks and they're like, you know, we are deep in the PyTorch ecosystem and we serve billions of inferences a day or whatever at Facebook and now we can do it for you. And I'm like, okay, that's great. Like, what should I be wary of or cautious of when these things happen? Because I'm like, obviously this experience is extremely powerful and valuable. I just don't know what I don't know. Like, what should people know about like these sort of new inference as a service companies?Soumith [00:19:32]: I think at that point you would be investing in them for their expertise of one kind. So if they've been at a large company, but they've been doing amazing work, you would be thinking about it as what these people bring to the table is that they're really good at like GPU programming or understanding the complexity of serving models once it hits a certain scale. You know, various expertise like from the infra and AI and GPUs point of view. What you would obviously want to figure out is whether their understanding of the external markets is clear, whether they know and understand how to think about running a business, understanding how to be disciplined about making money or, you know, various things like that.Swyx [00:20:23]: Maybe I'll put it like, actually I will de-emphasize the investing bit and just more as a potential customer. Oh, okay. Like, it's more okay, you know, you have PyTorch gods, of course. Like, what else should I know?Soumith [00:20:37]: I mean, I would not care about who's building something. If I'm trying to be a customer, I would care about whether...Swyx [00:20:44]: Benchmarks.Soumith [00:20:44]: Yeah, I use it and it's usability and reliability and speed, right?Swyx [00:20:51]: Quality as well.Soumith [00:20:51]: Yeah, if someone from some random unknown place came to me and say, user stuff is great. Like, and I have the bandwidth, I probably will give it a shot. And if it turns out to be great, like I'll just use it.Benchmark dramaSwyx [00:21:07]: Okay, great. And then maybe one more thing about benchmarks, since we already brought it up and you brought up Confident Benchmarks. There was some recent drama around AnyScale. AnyScale released their own benchmarks and obviously they look great on their own benchmarks, but maybe didn't give the other... I feel there are two lines of criticism. One, which is they didn't test some apples for apples on the kind of endpoints that the other providers, that they are competitors with, on their benchmarks and that is due diligence baseline. And then the second would be more just optimizing for the right thing. You had some commentary on it. I'll just kind of let you riff.Soumith [00:21:41]: Yeah, I mean, in summary, basically my criticism of that was AnyScale built these benchmarks for end users to just understand what they should pick, right? And that's a very good thing to do. I think what they didn't do a good job of is give that end user a full understanding of what they should pick. Like they just gave them a very narrow slice of understanding. I think they just gave them latency numbers and that's not sufficient, right? You need to understand your total cost of ownership at some reasonable scale. Not oh, one API call is one cent, but a thousand API calls are 10 cents. Like people can misprice to cheat on those benchmarks. So you want to understand, okay, like how much is it going to cost me if I actually subscribe to you and do like a million API calls a month or something? And then you want to understand the latency and reliability, not just from one call you made, but an aggregate of calls you've made over several various times of the day and times of the week. And the nature of the workloads, is it just some generic single paragraph that you're sending that is cashable? Or is it like testing of real world workload? I think that kind of rigor, like in presenting that benchmark wasn't there. It was a much more narrow sliver of what should have been a good benchmark. That was my main criticism. And I'm pretty sure if before they released it, they showed it to their other stakeholders who would be caring about this benchmark because they are present in it, they would have easily just pointed out these gaps. And I think they didn't do that and they just released it. So I think those were the two main criticisms. I think they were fair and Robert took it well.Swyx [00:23:40]: And he took it very well. And we'll have him on at some point and we'll discuss it. But I think it's important for, I think the market being maturing enough that people start caring and competing on these kinds of things means that we need to establish what best practice is because otherwise everyone's going to play dirty.Soumith [00:23:55]: Yeah, absolutely. My view of the LLM inference market in general is that it's the laundromat model. Like the margins are going to drive down towards the bare minimum. It's going to be all kinds of arbitrage between how much you can get the hardware for and then how much you sell the API and how much latency your customers are willing to let go. You need to figure out how to squeeze your margins. Like what is your unique thing here? Like I think Together and Fireworks and all these people are trying to build some faster CUDA kernels and faster, you know, hardware kernels in general. But those modes only last for a month or two. These ideas quickly propagate.Swyx [00:24:38]: Even if they're not published?Soumith [00:24:39]: Even if they're not published, the idea space is small. So even if they're not published, the discovery rate is going to be pretty high. It's not like we're talking about a combinatorial thing that is really large. You're talking about Llama style LLM models. And we're going to beat those to death on a few different hardware SKUs, right? Like it's not even we have a huge diversity of hardware you're going to aim to run it on. Now when you have such a narrow problem and you have a lot of people working on it, the rate at which these ideas are going to get figured out is going to be pretty rapid.Swyx [00:25:15]: Is it a standard bag of tricks? Like the standard one that I know of is, you know, fusing operators and-Soumith [00:25:22]: Yeah, it's the standard bag of tricks on figuring out how to improve your memory bandwidth and all that, yeah.Alessio [00:25:28]: Any ideas instead of things that are not being beaten to death that people should be paying more attention to?Novel PyTorch ApplicationsSwyx [00:25:34]: One thing I was like, you know, you have a thousand operators, right? Like what's the most interesting usage of PyTorch that you're seeing maybe outside of this little bubble?Soumith [00:25:41]: So PyTorch, it's very interesting and scary at the same time, but basically it's used in a lot of exotic ways, like from the ML angle, what kind of models are being built? And you get all the way from state-based models and all of these things to stuff nth order differentiable models, like neural ODEs and stuff like that. I think there's one set of interestingness factor from the ML side of things. And then there's the other set of interesting factor from the applications point of view. It's used in Mars Rover simulations, to drug discovery, to Tesla cars. And there's a huge diversity of applications in which it is used. So in terms of the most interesting application side of things, I think I'm scared at how many interesting things that are also very critical and really important it is used in. I think the scariest was when I went to visit CERN at some point and they said they were using PyTorch and they were using GANs at the same time for particle physics research. And I was scared more about the fact that they were using GANs than they were using PyTorch, because at that time I was a researcher focusing on GANs. But the diversity is probably the most interesting. How many different things it is being used in. I think that's the most interesting to me from the applications perspective. From the models perspective, I think I've seen a lot of them. Like the really interesting ones to me are where we're starting to combine search and symbolic stuff with differentiable models, like the whole AlphaGo style models is one example. And then I think we're attempting to do it for LLMs as well, with various reward models and search. I mean, I don't think PyTorch is being used in this, but the whole alpha geometry thing was interesting because again, it's an example of combining the symbolic models with the gradient based ones. But there are stuff like alpha geometry that PyTorch is used at, especially when you intersect biology and chemistry with ML. In those areas, you want stronger guarantees on the output. So yeah, maybe from the ML side, those things to me are very interesting right now.Swyx [00:28:03]: Yeah. People are very excited about the alpha geometry thing. And it's kind of like, for me, it's theoretical. It's great. You can solve some Olympia questions. I'm not sure how to make that bridge over into the real world applications, but I'm sure people smarter than me will figure it out.Synthetic Data vs Symbolic ModelsSoumith [00:28:18]: Let me give you an example of it. You know how the whole thing about synthetic data will be the next rage in LLMs is a thing?Swyx [00:28:27]: Already is a rage.Soumith [00:28:28]: Which I think is fairly misplaced in how people perceive it. People think synthetic data is some kind of magic wand that you wave and it's going to be amazing. Synthetic data is useful in neural networks right now because we as humans have figured out a bunch of symbolic models of the world or made up certain symbolic models because of human innate biases. So we've figured out how to ground particle physics in a 30 parameter model. And it's just very hard to compute as in it takes a lot of flops to compute, but it only has 30 parameters or so. I mean, I'm not a physics expert, but it's a very low rank model. We built mathematics as a field that basically is very low rank. Language, a deep understanding of language, like the whole syntactic parse trees and just understanding how language can be broken down and into a formal symbolism is something that we figured out. So we basically as humans have accumulated all this knowledge on these subjects, either synthetic, we created those subjects in our heads, or we grounded some real world phenomenon into a set of symbols. But we haven't figured out how to teach neural networks symbolic world models directly. The only way we have to teach them is generating a bunch of inputs and outputs and gradient dissenting over them. So in areas where we have the symbolic models and we need to teach all the knowledge we have that is better encoded in the symbolic models, what we're doing is we're generating a bunch of synthetic data, a bunch of input output pairs, and then giving that to the neural network and asking it to learn the same thing that we already have a better low rank model of in gradient descent in a much more over-parameterized way. Outside of this, like where we don't have good symbolic models, like synthetic data obviously doesn't make any sense. So synthetic data is not a magic wand where it'll work in all cases in every case or whatever. It's just where we as humans already have good symbolic models off. We need to impart that knowledge to neural networks and we figured out the synthetic data is a vehicle to impart this knowledge to. So, but people, because maybe they don't know enough about synthetic data as a notion, but they hear, you know, the next wave of data revolution is synthetic data. They think it's some kind of magic where we just create a bunch of random data somehow. They don't think about how, and then they think that's just a revolution. And I think that's maybe a gap in understanding most people have in this hype cycle.Swyx [00:31:23]: Yeah, well, it's a relatively new concept, so. Oh, there's two more that I'll put in front of you and then you can see what you respond. One is, you know, I have this joke that it's, you know, it's only synthetic data if it's from the Mistral region of France, otherwise it's just a sparkling distillation, which is what news research is doing. Like they're distilling GPT-4 by creating synthetic data from GPT-4, creating mock textbooks inspired by Phi 2 and then fine tuning open source models like Llama. And so I don't know, I mean, I think that's, should we call that synthetic data? Should we call it something else? I don't know.Soumith [00:31:57]: Yeah, I mean, the outputs of LLMs, are they synthetic data? They probably are, but I think it depends on the goal you have. If your goal is you're creating synthetic data with the goal of trying to distill GPT-4's superiority into another model, I guess you can call it synthetic data, but it also feels like disingenuous because your goal is I need to copy the behavior of GPT-4 and-Swyx [00:32:25]: It's also not just behavior, but data set. So I've often thought of this as data set washing. Like you need one model at the top of the chain, you know, unnamed French company that has that, you know, makes a model that has all the data in it that we don't know where it's from, but it's open source, hey, and then we distill from that and it's great. To be fair, they also use larger models as judges for preference ranking, right? So that is, I think, a very, very accepted use of synthetic.Soumith [00:32:53]: Correct. I think it's a very interesting time where we don't really have good social models of what is acceptable depending on how many bits of information you use from someone else, right? It's like, okay, you use one bit. Is that okay? Yeah, let's accept it to be okay. Okay, what about if you use 20 bits? Is that okay? I don't know. What if you use 200 bits? I don't think we as society have ever been in this conundrum where we have to be like, where is the boundary of copyright or where is the boundary of socially accepted understanding of copying someone else? We haven't been tested this mathematically before,Swyx [00:33:38]: in my opinion. Whether it's transformative use. Yes. So yeah, I think this New York Times opening eye case is gonna go to the Supreme Court and we'll have to decide it because I think we never had to deal with it before. And then finally, for synthetic data, the thing that I'm personally exploring is solving this great stark paradigm difference between rag and fine tuning, where you can kind of create synthetic data off of your retrieved documents and then fine tune on that. That's kind of synthetic. All you need is variation or diversity of samples for you to fine tune on. And then you can fine tune new knowledge into your model. I don't know if you've seen that as a direction for synthetic data.Soumith [00:34:13]: I think you're basically trying to, what you're doing is you're saying, well, language, I know how to parametrize language to an extent. And I need to teach my model variations of this input data so that it's resilient or invariant to language uses of that data.Swyx [00:34:32]: Yeah, it doesn't overfit on the wrong source documents.Soumith [00:34:33]: So I think that's 100% synthetic. You understand, the key is you create variations of your documents and you know how to do that because you have a symbolic model or like some implicit symbolic model of language.Swyx [00:34:48]: Okay.Alessio [00:34:49]: Do you think the issue with symbolic models is just the architecture of the language models that we're building? I think maybe the thing that people grasp is the inability of transformers to deal with numbers because of the tokenizer. Is it a fundamental issue there too? And do you see alternative architectures that will be better with symbolic understanding?Soumith [00:35:09]: I am not sure if it's a fundamental issue or not. I think we just don't understand transformers enough. I don't even mean transformers as an architecture. I mean the use of transformers today, like combining the tokenizer and transformers and the dynamics of training, when you show math heavy questions versus not. I don't have a good calibration of whether I know the answer or not. I, you know, there's common criticisms that are, you know, transformers will just fail at X. But then when you scale them up to sufficient scale, they actually don't fail at that X. I think there's this entire subfield where they're trying to figure out these answers called like the science of deep learning or something. So we'll get to know more. I don't know the answer.Meta AI and Llama 2/3Swyx [00:35:57]: Got it. Let's touch a little bit on just Meta AI and you know, stuff that's going on there. Maybe, I don't know how deeply you're personally involved in it, but you're our first guest with Meta AI, which is really fantastic. And Llama 1 was, you know, you are such a believer in open source. Llama 1 was more or less the real breakthrough in open source AI. The most interesting thing for us covering on this, in this podcast was the death of Chinchilla, as people say. Any interesting insights there around the scaling models for open source models or smaller models or whatever that design decision was when you guys were doing it?Soumith [00:36:31]: So Llama 1 was Guillaume Lample and team. There was OPT before, which I think I'm also very proud of because we bridged the gap in understanding of how complex it is to train these models to the world. Like until then, no one really in gory detail published.Swyx [00:36:50]: The logs.Soumith [00:36:51]: Yeah. Like, why is it complex? And everyone says, oh, it's complex. But no one really talked about why it's complex. I think OPT was cool.Swyx [00:37:02]: I met Susan and she's very, very outspoken. Yeah.Soumith [00:37:05]: We probably, I think, didn't train it for long enough, right? That's kind of obvious in retrospect.Swyx [00:37:12]: For a 175B. Yeah. You trained it according to Chinchilla at the time or?Soumith [00:37:17]: I can't remember the details, but I think it's a commonly held belief at this point that if we trained OPT longer, it would actually end up being better. Llama 1, I think, was Guillaume Lample and team Guillaume is fantastic and went on to build Mistral. I wasn't too involved in that side of things. So I don't know what you're asking me, which is how did they think about scaling loss and all of that? Llama 2, I was more closely involved in. I helped them a reasonable amount with their infrastructure needs and stuff. And Llama 2, I think, was more like, let's get to the evolution. At that point, we kind of understood what we were missing from the industry's understanding of LLMs. And we needed more data and we needed more to train the models for longer. And we made, I think, a few tweaks to the architecture and we scaled up more. And that was Llama 2. I think Llama 2, you can think of it as after Guillaume left, the team kind of rebuilt their muscle around Llama 2. And Hugo, I think, who's the first author is fantastic. And I think he did play a reasonable big role in Llama 1 as well.Soumith [00:38:35]: And he overlaps between Llama 1 and 2. So in Llama 3, obviously, hopefully, it'll be awesome.Alessio [00:38:42]: Just one question on Llama 2, and then we'll try and fish Llama 3 spoilers out of you. In the Llama 2 paper, the loss curves of the 34 and 70B parameter, they still seem kind of steep. Like they could go lower. How, from an infrastructure level, how do you allocate resources? Could they have just gone longer or were you just, hey, this is all the GPUs that we can burn and let's just move on to Llama 3 and then make that one better?Soumith [00:39:07]: Instead of answering specifically about that Llama 2 situation or whatever, I'll tell you how we think about things. Generally, we're, I mean, Mark really is some numbers, right?Swyx [00:39:20]: So let's cite those things again. All I remember is like 600K GPUs.Soumith [00:39:24]: That is by the end of this year and 600K H100 equivalents. With 250K H100s, including all of our other GPU or accelerator stuff, it would be 600-and-something-K aggregate capacity.Swyx [00:39:38]: That's a lot of GPUs.Soumith [00:39:39]: We'll talk about that separately. But the way we think about it is we have a train of models, right? Llama 1, 2, 3, 4. And we have a bunch of GPUs. I don't think we're short of GPUs. Like-Swyx [00:39:54]: Yeah, no, I wouldn't say so. Yeah, so it's all a matter of time.Soumith [00:39:56]: I think time is the biggest bottleneck. It's like, when do you stop training the previous one and when do you start training the next one? And how do you make those decisions? The data, do you have net new data, better clean data for the next one in a way that it's not worth really focusing on the previous one? It's just a standard iterative product. You're like, when is the iPhone 1? When do you start working on iPhone 2? Where is the iPhone? And so on, right? So mostly the considerations are time and generation, rather than GPUs, in my opinion.Alessio [00:40:31]: So one of the things with the scaling loss, like Chinchilla is optimal to balance training and inference costs. I think at Meta's scale, you would rather pay a lot more maybe at training and then save on inference. How do you think about that from infrastructure perspective? I think in your tweet, you say you can try and guess on like how we're using these GPUs. Can you just give people a bit of understanding? It's like, because I've already seen a lot of VCs say, Llama 3 has been trained on 600,000 GPUs and that's obviously not true, I'm sure. How do you allocate between the research, FAIR and the Llama training, the inference on Instagram suggestions that get me to scroll, like AI-generated stickers on WhatsApp and all of that?Soumith [00:41:11]: Yeah, we haven't talked about any of this publicly, but as a broad stroke, it's like how we would allocate resources of any other kinds at any company. You run a VC portfolio, how do you allocate your investments between different companies or whatever? You kind of make various trade-offs and you kind of decide, should I invest in this project or this other project, or how much should I invest in this project? It's very much a zero sum of trade-offs. And it also comes into play, how are your clusters configured, like overall, what you can fit of what size and what cluster and so on. So broadly, there's no magic sauce here. I mean, I think the details would add more spice, but also wouldn't add more understanding. It's just gonna be like, oh, okay, I mean, this looks like they just think about this as I would normally do.Alessio [00:42:05]: So even the GPU rich run through the same struggles of having to decide where to allocate things.Soumith [00:42:11]: Yeah, I mean, at some point I forgot who said it, but you kind of fit your models to the amount of compute you have. If you don't have enough compute, you figure out how to make do with smaller models. But no one as of today, I think would feel like they have enough compute. I don't think I've heard any company within the AI space be like, oh yeah, like we feel like we have sufficient compute and we couldn't have done better. So that conversation, I don't think I've heard from any of my friends at other companies.EleutherSwyx [00:42:47]: Stella from Eleuther sometimes says that because she has a lot of donated compute. She's trying to put it to interesting uses, but for some reason she's decided to stop making large models.Soumith [00:42:57]: I mean, that's a cool, high conviction opinion that might pay out.Swyx [00:43:01]: Why?Soumith [00:43:02]: I mean, she's taking a path that most people don't care to take about in this climate and she probably will have very differentiated ideas. I mean, think about the correlation of ideas in AI right now. It's so bad, right? So everyone's fighting for the same pie. In some weird sense, that's partly why I don't really directly work on LLMs. I used to do image models and stuff and I actually stopped doing GANs because GANs were getting so hot that I didn't have any calibration of whether my work would be useful or not because, oh yeah, someone else did the same thing you did. It's like, there's so much to do, I don't understand why I need to fight for the same pie. So I think Stella's decision is very smart.Making BetsAlessio [00:43:53]: And how do you reconcile that with how we started the discussion about intrinsic versus extrinsic kind of like accomplishment or success? How should people think about that especially when they're doing a PhD or early in their career? I think in Europe, I walked through a lot of the posters and whatnot, there seems to be mode collapse in a way in the research, a lot of people working on the same things. Is it worth for a PhD to not take a bet on something that is maybe not as interesting just because of funding and visibility and whatnot? Or yeah, what suggestions would you give?Soumith [00:44:28]: I think there's a baseline level of compatibility you need to have with the field. Basically, you need to figure out if you will get paid enough to eat, right? Like whatever reasonable normal lifestyle you want to have as a baseline. So you at least have to pick a problem within the neighborhood of fundable. Like you wouldn't wanna be doing something so obscure that people are like, I don't know, like you can work on it.Swyx [00:44:59]: Would a limit on fundability, I'm just observing something like three months of compute, right? That's the top line, that's the like max that you can spend on any one project.Soumith [00:45:09]: But like, I think that's very ill specified, like how much compute, right? I think that the notion of fundability is broader. It's more like, hey, are these family of models within the acceptable set of, you're not crazy or something, right? Even something like neural or DS, which is a very boundary pushing thing or states-based models or whatever. Like all of these things I think are still in fundable territory. When you're talking about, I'm gonna do one of the neuromorphic models and then apply image classification to them or something, then it becomes a bit questionable. Again, it depends on your motivation. Maybe if you're a neuroscientist, it actually is feasible. But if you're an AI engineer, like the audience of these podcasts, then it's more questionable. The way I think about it is, you need to figure out how you can be in the baseline level of fundability just so that you can just live. And then after that, really focus on intrinsic motivation and depends on your strengths, like how you can play to your strengths and your interests at the same time. Like I try to look at a bunch of ideas that are interesting to me, but also try to play to my strengths. I'm not gonna go work on theoretical ML. I'm interested in it, but when I want to work on something like that, I try to partner with someone who is actually a good theoretical ML person and see if I actually have any value to provide. And if they think I do, then I come in. So I think you'd want to find that intersection of ideas you like, and that also play to your strengths. And I'd go from there. Everything else, like actually finding extrinsic success and all of that, I think is the way I think about it is like somewhat immaterial. When you're talking about building ecosystems and stuff, slightly different considerations come into play, but that's a different conversation.Swyx [00:47:06]: We're gonna pivot a little bit to just talking about open source AI. But one more thing I wanted to establish for Meta is this 600K number, just kind of rounding out the discussion, that's for all Meta. So including your own inference needs, right? It's not just about training.Soumith [00:47:19]: It's gonna be the number in our data centers for all of Meta, yeah.Swyx [00:47:23]: Yeah, so there's a decent amount of workload serving Facebook and Instagram and whatever. And then is there interest in like your own hardware?MTIASoumith [00:47:31]: We already talked about our own hardware. It's called MTIA. Our own silicon, I think we've even showed the standard photograph of you holding the chip that doesn't work. Like as in the chip that you basically just get like-Swyx [00:47:51]: As a test, right?Soumith [00:47:52]: Yeah, a test chip or whatever. So we are working on our silicon and we'll probably talk more about it when the time is right, but-Swyx [00:48:00]: Like what gaps do you have that the market doesn't offer?Soumith [00:48:04]: Okay, I mean, this is easy to answer. So basically, remember how I told you about there's this memory hierarchy and like sweet spots and all of that? Fundamentally, when you build a hardware, you make it general enough that a wide set of customers and a wide set of workloads can use it effectively while trying to get the maximum level of performance they can. The more specialized you make the chip, the more hardware efficient it's going to be, the more power efficient it's gonna be, the more easier it's going to be to find the software, like the kernel's right to just map that one or two workloads to that hardware and so on. So it's pretty well understood across the industry that if you have a sufficiently large volume, enough workload, you can specialize it and get some efficiency gains, like power gains and so on. So the way you can think about everyone building, every large company building silicon, I think a bunch of the other large companies are building their own silicon as well, is they, each large company has a sufficient enough set of verticalized workloads that can be specialized that have a pattern to them that say a more generic accelerator like an NVIDIA or an AMD GPU does not exploit. So there is some level of power efficiency that you're leaving on the table by not exploiting that. And you have sufficient scale and you have sufficient forecasted stability that those workloads will exist in the same form, that it's worth spending the time to build out a chip to exploit that sweet spot. Like obviously something like this is only useful if you hit a certain scale and that your forecasted prediction of those kind of workloads being in the same kind of specializable exploitable way is true. So yeah, that's why we're building our own chips.Swyx [00:50:08]: Awesome.Open Source AIAlessio [00:50:09]: Yeah, I know we've been talking a lot on a lot of different topics and going back to open source, you had a very good tweet. You said that a single company's closed source effort rate limits against people's imaginations and needs. How do you think about all the impact that some of the Meta AI work in open source has been doing and maybe directions of the whole open source AI space?Soumith [00:50:32]: Yeah, in general, I think first, I think it's worth talking about this in terms of open and not just open source, because like with the whole notion of model weights, no one even knows what source means for these things. But just for the discussion, when I say open source, you can assume it's just I'm talking about open. And then there's the whole notion of licensing and all that, commercial, non-commercial, commercial with clauses and all that. I think at a fundamental level, the most benefited value of open source is that you make the distribution to be very wide. It's just available with no friction and people can do transformative things in a way that's very accessible. Maybe it's open source, but it has a commercial license and I'm a student in India. I don't care about the license. I just don't even understand the license. But like the fact that I can use it and do something with it is very transformative to me. Like I got this thing in a very accessible way. And then it's various degrees, right? And then if it's open source, but it's actually a commercial license, then a lot of companies are gonna benefit from gaining value that they didn't previously have, that they maybe had to pay a closed source company for it. So open source is just a very interesting tool that you can use in various ways. So there's, again, two kinds of open source. One is some large company doing a lot of work and then open sourcing it. And that kind of effort is not really feasible by say a band of volunteers doing it the same way. So there's both a capital and operational expenditure that the large company just decided to ignore and give it away to the world for some benefits of some kind. They're not as tangible as direct revenue. So in that part, Meta has been doing incredibly good things. They fund a huge amount of the PyTorch development. They've open sourced Llama and those family of models and several other fairly transformative projects. FICE is one, Segment Anything, Detectron, Detectron 2. Dense Pose. I mean, it's-Swyx [00:52:52]: Seamless. Yeah, seamless.Soumith [00:52:53]: Like it's just the list is so long that we're not gonna cover. So I think Meta comes into that category where we spend a lot of CapEx and OpEx and we have a high talent density of great AI people and we open our stuff. And the thesis for that, I remember when FAIR was started, the common thing was like, wait, why would Meta wanna start a open AI lab? Like what exactly is a benefit from a commercial perspective? And for then the thesis was very simple. It was AI is currently rate limiting Meta's ability to do things. Our ability to build various product integrations, moderation, various other factors. Like AI was the limiting factor and we just wanted AI to advance more and we didn't care if the IP of the AI was uniquely in our possession or not. However the field advances, that accelerates Meta's ability to build a better product. So we just built an open AI lab and we said, if this helps accelerate the progress of AI, that's strictly great for us. But very easy, rational, right? Still the same to a large extent with the Llama stuff. And it's the same values, but the argument, it's a bit more nuanced. And then there's a second kind of open source, which is, oh, we built this project, nights and weekends and we're very smart people and we open sourced it and then we built a community around it. This is the Linux kernel and various software projects like that. So I think about open source, like both of these things being beneficial and both of these things being different. They're different and beneficial in their own ways. The second one is really useful when there's an active arbitrage to be done. If someone's not really looking at a particular space because it's not commercially viable or whatever, like a band of volunteers can just coordinate online and do something and then make that happen. And that's great.Open Source LLMsI wanna cover a little bit about open source LLMs maybe. So open source LLMs have been very interesting because I think we were trending towards an increase in open source in AI from 2010 all the way to 2017 or something. Like where more and more pressure within the community was to open source their stuff so that their methods and stuff get adopted. And then the LLMs revolution kind of took the opposite effect OpenAI stopped open sourcing their stuff and DeepMind kind of didn't, like all the other cloud and all these other providers, they didn't open source their stuff. And it was not good in the sense that first science done in isolation probably will just form its own bubble where people believe their own b******t or whatever. So there's that problem. And then there was the other problem which was the accessibility part. Like, okay, I again always go back to I'm a student in India with no money. What is my accessibility to any of these closers models? At some scale I have to pay money. That makes it a non-starter and stuff. And there's also the control thing. I strongly believe if you want human aligned stuff, you want all humans to give feedback. And you want all humans to have access to that technology in the first place. And I actually have seen, living in New York, whenever I come to Silicon Valley, I see a different cultural bubble. Like all the friends I hang out with talk about some random thing like Dyson Spheres or whatever, that's a thing. And most of the world doesn't know or care about any of this stuff. It's definitely a bubble and bubbles can form very easily. And when you make a lot of decisions because you're in a bubble, they're probably not globally optimal decisions. So I think open source, the distribution of open source powers a certain kind of non-falsifiability that I think is very important. I think on the open source models, like it's going great in the fact that LoRa I think came out of the necessity of open source models needing to be fine-tunable in some way. Yeah, and I think DPO also came out of the academic open source side of things. So do any of the closed source labs, did any of them already have LoRa or DPO internally? Maybe, but that does not advance humanity in any way. It advances some companies probability of doing the winner takes all that I talked about earlier in the podcast.Open Source and TrustI don't know, it just feels fundamentally good. Like when people try to, you know, people are like, well, what are the ways in which it is not okay? I find most of these arguments, and this might be a little controversial, but I find a lot of arguments based on whether closed source models are safer or open source models are safer very much related to what kind of culture they grew up in, what kind of society they grew up in. If they grew up in a society that they trusted, then I think they take the closed source argument. And if they grew up in a society that they couldn't trust, where the norm was that you didn't trust your government, obviously it's corrupt or whatever, then I think the open source argument is what they take. I think there's a deep connection to like people's innate biases from their childhood and their trust in society and governmental aspects that push them towards one opinion or the other. And I'm definitely in the camp of open source is definitely going to actually have better outcomes for society. Closed source to me just means that centralization of power, which, you know, is really hard to trust. So I think it's going well

Papers Read on AI
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

Papers Read on AI

Play Episode Listen Later Mar 6, 2024 36:45


We introduce Bonito, an open-source model for conditional task generation: the task of converting unannotated text into task-specific training datasets for instruction tuning. Our goal is to enable zero-shot task adaptation of large language models on users' specialized, private data. We train Bonito on a new large-scale dataset with 1.65M examples created by remixing existing instruction tuning datasets into meta-templates. The meta-templates for a dataset produce training examples where the input is the unannotated text and the task attribute and the output consists of the instruction and the response. We use Bonito to generate synthetic tasks for seven datasets from specialized domains across three task types -- yes-no question answering, extractive question answering, and natural language inference -- and adapt language models. We show that Bonito significantly improves the average performance of pretrained and instruction tuned models over the de facto self supervised baseline. For example, adapting Mistral-Instruct-v2 and instruction tuned variants of Mistral and Llama2 with Bonito improves the strong zero-shot performance by 22.1 F1 points whereas the next word prediction objective undoes some of the benefits of instruction tuning and reduces the average performance by 0.8 F1 points. We conduct additional experiments with Bonito to understand the effects of the domain, the size of the training set, and the choice of alternative synthetic task generators. Overall, we show that learning with synthetic instruction tuning datasets is an effective way to adapt language models to new domains. The model, dataset, and code are available at https://github.com/BatsResearch/bonito. 2024: Nihal V. Nayak, Yiyang Nan, Avi Trost, Stephen H. Bach https://arxiv.org/pdf/2402.18334v1.pdf

The Cloudcast
Validation and Guardrails for LLMs

The Cloudcast

Play Episode Listen Later Feb 21, 2024 27:27


Shreya Rajpal (CEO and Co-Counfer @ Guardrails AI) talks about the need to provide guardrails and validation of LLM's, along with common use cases and Guardrail AI's new HubSHOW: 797CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotwNEW TO CLOUD? CHECK OUT OUR OTHER PODCAST - "CLOUDCAST BASICS"SHOW SPONSORS:Learn More About Azure Offerings : Learn more about Azure Migrate and Modernize & Azure Innovate!Azure Free Cloud Resource Kit : Step-by-step guidance, resources and expert advice, from migration to innovation.CloudZero – Cloud Cost Visibility and SavingsFind "Breaking Analysis Podcast with Dave Vellante" on Apple, Google and SpotifyKeep up to date with Enterprise Tech with theCUBESHOW NOTES:Guardrails AI (homepage)Guardrails AI HubGuardrails AI GitHubGuardrails AI DiscordShreya on TWIML podcastGuardrails AI on TechCrunchTopic 1 - Welcome to the show. Before we dive into today's discussion, tell us a little bit about your background.Topic 2 - Our topic today is the validation and accuracy of AI with guardrails. Let's start with the why… Why do we need guardrails for LLMs today?Topic 3 - Where and how do you control (maybe validate is a better word) outputs from LLM's today? What are your thoughts on the best way to validate outputs?Topic 4 - Will this workflow work with both closed-source (ChatGPT) and opensource (Llama2) models? Would this process apply to training/fine-tuning or more for inference? Would this potentially replace humans in the loop that we see today or is this completely different?Topic 5 - What are some of the most common early use cases and practical examples? PII detection comes to mind, violation of ethics or laws, off-topic/out of scope, or simply just something the model isn't designed to provide?Topic 6 - What happens if it fails? Does this create a loop scenario to try again?Topic 7 - Let's talk about Guardrails AI specifically. Today you offer an open-source marketplace of Validators in the Guardrails Hub, correct? As we mentioned earlier, almost everyone's implementation and guardrails they want to implement will be different. Is the best way to think about this as building blocks using validators that are pieced together?  Tell everyone a little bit about the offeringFEEDBACK?Email: show at the cloudcast dot netTwitter: @cloudcastpodInstagram: @cloudcastpodTikTok: @cloudcastpod

Papers Read on AI
Who's Harry Potter? Approximate Unlearning in LLMs

Papers Read on AI

Play Episode Listen Later Feb 4, 2024 21:43


Large language models (LLMs) are trained on massive internet corpora that often contain copyrighted content. This poses legal and ethical challenges for the developers and users of these models, as well as the original authors and publishers. In this paper, we propose a novel technique for unlearning a subset of the training data from a LLM, without having to retrain it from scratch. We evaluate our technique on the task of unlearning the Harry Potter books from the Llama2-7b model (a generative language model recently open-sourced by Meta). While the model took over 184K GPU-hours to pretrain, we show that in about 1 GPU hour of finetuning, we effectively erase the model's ability to generate or recall Harry Potter-related content, while its performance on common benchmarks (such as Winogrande, Hellaswag, arc, boolq and piqa) remains almost unaffected. We make our fine-tuned model publicly available on HuggingFace for community evaluation. To the best of our knowledge, this is the first paper to present an effective technique for unlearning in generative language models. Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a baseline model. Second, we replace idiosyncratic expressions in the target data with generic counterparts, and leverage the model's own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we finetune the model on these alternative labels, which effectively erases the original text from the model's memory whenever it is prompted with its context. 2023: Ronen Eldan, M. Russinovich https://arxiv.org/pdf/2310.02238.pdf

ZENKEI AI ポッドキャスト
S43E05 最近の LLMs(その3)ローカルで Llama2 を動かす

ZENKEI AI ポッドキャスト

Play Episode Listen Later Jan 24, 2024 17:00


こんにちわ! AI ポッドキャスト、シーズン43は2023年7月29日に開催した ZOOMライブの模様です。(⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠YouTube⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠) この日のテーマは「夏休みもAIだ!」です。 エピソード5は、パート2「最近の LLMs」その3、「ローカルで Llama2 を動かす」です。 当日の⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠発表資料

ZENKEI AI ポッドキャスト
S43E04 最近の LLMs(その2)Andrej Karpathy の llama2.c

ZENKEI AI ポッドキャスト

Play Episode Listen Later Jan 17, 2024 24:40


こんにちわ! AI ポッドキャスト、シーズン43は2023年7月29日に開催した ZOOMライブの模様です。(⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠YouTube⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠) この日のテーマは「夏休みもAIだ!」です。 エピソード4は、パート2「最近の LLMs」その2、「Andrej Karpathy の llama2.c」です。 当日の⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠発表資料

The Todd Herman Show
Weakness is the goal, food is the weapon, water is the fuel, mass illegal immigration is the explosive Episode 1,333

The Todd Herman Show

Play Episode Listen Later Jan 12, 2024 29:33


The man who runs the corrupt WHO has got some plans for what you are going to be allowed to eat. When he says more plant based, he means totally plant based. That means more building based farms creating patented food products. The undercurrent of this all is that it's our choice to eat it or starve. This is a war on strength and the war on water is soon to follow. But, what about illegal immigration as the ignition to all this? What does God's Word say? Genesis 1:26-28 26 Then God said, “Let us make mankind in our image, in our likeness, so that they may rule over the fish in the sea and the birds in the sky, over the livestock and all the wild animals,[a] and over all the creatures that move along the ground.”27 So God created mankind in his own image,in the image of God he created them;male and female he created them.28 God blessed them and said to them, “Be fruitful and increase in number; fill the earth and subdue it. Rule over the fish in the sea and the birds in the sky and over every living creature that moves on the ground.”Episode 1,333 Links:WHO head, Tedros Adhanom Ghebreyesus, declares war on meat and traditional agriculture, in the name of fighting "climate change"I Left Out the Full Truth to Get My Climate Change Paper Published; I just got published in Nature because I stuck to a narrative I knew the editors would like. That's not the way science should work.The American Psychological Association Wants (More) Federal Funding To Curb Online “Misinformation”; Another organization getting on the speech control bandwagon.How the globalists are attacking farmers. Nobody is addressing how this will affect the amish community. WhyDid you know: All major AI Language Models are explicitly trained to see a secure US border as "hate speech. "Microsoft's AI training tool ToxiGen is widely used across the industry to align models during finetuning. ChatGPT & Llama2 are in full alignment"It is very important to understand that when an individual is released, they are released into immigration enforcement proceedings!" (That means they're released into communities and asked to appear in court YEARS from now)At current rates, we can expect over 12 million encounters in the first term of Joe Biden's presidency. That's nearly as much as the preceding 3 terms combined.We now know who was running around behind the scenes kiboshing Jeffrey Epstein “pedo” stories…4Patriots https://4Patriots.com/Todd See this week's discounts and deals before they are gone and get free shipping on orders over $97. Alan's Soaps https://alanssoaps.com/TODD Use coupon code ‘TODD' to save an additional 10% off the bundle price. Bioptimizers https://bioptimizers.com/todd Use promo code TODD for 10% off your order. Bonefrog https://bonefrogcoffee.com/todd Use code TODD at checkout to receive 10% off your first purchase and 15% on subscriptions. Bulwark Capital Bulwark Capital Management (bulwarkcapitalmgmt.com) Get your FREE copy of Common Cents Investing at Know Your Risk Radio .com or call 866-779-RISK. SOTA Weight Loss https://sotaweightloss.com SOTA Weight Loss is, say it with me now, STATE OF THE ART! GreenHaven Interactive Digital Marketing https://greenhaveninteractive.com Your Worldclass Website Will Get Found on Google!

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

In 2023 we did a few Fundamentals episodes covering Benchmarks 101, Datasets 101, FlashAttention, and Transformers Math, and it turns out those were some of your evergreen favorites! So we are experimenting with more educational/survey content in the mix alongside our regular founder and event coverage. Pls request more!We have a new calendar for events; join to be notified of upcoming things in 2024!Today we visit the shoggoth mask factory: how do transformer models go from trawling a deeply learned latent space for next-token prediction to a helpful, honest, harmless chat assistant? Our guest “lecturer” today is ; you might know him from his prolific online writing on and Twitter, or from his previous work leading RLHF at HuggingFace and now at the Allen Institute for AI (AI2) which recently released the open source GPT3.5-class Tulu 2 model which was trained with DPO. He's widely considered one of the most knowledgeable people on RLHF and RLAIF. He recently gave an “RLHF 201” lecture at Stanford, so we invited him on the show to re-record it for everyone to enjoy! You can find the full slides here, which you can use as reference through this episode. Full video with synced slidesFor audio-only listeners, this episode comes with slide presentation along our discussion. You can find it on our YouTube (like, subscribe, tell a friend, et al).Theoretical foundations of RLHFThe foundation and assumptions that go into RLHF go back all the way to Aristotle (and you can find guidance for further research in the slide below) but there are two key concepts that will be helpful in thinking through this topic and LLMs in general:* Von Neumann–Morgenstern utility theorem: you can dive into the math here, but the TLDR is that when humans make decision there's usually a “maximum utility” function that measures what the best decision would be; the fact that this function exists, makes it possible for RLHF to model human preferences and decision making.* Bradley-Terry model: given two items A and B from a population, you can model the probability that A will be preferred to B (or vice-versa). In our world, A and B are usually two outputs from an LLM (or at the lowest level, the next token). It turns out that from this minimal set of assumptions, you can build up the mathematical foundations supporting the modern RLHF paradigm!The RLHF loopOne important point Nathan makes is that "for many tasks we want to solve, evaluation of outcomes is easier than producing the correct behavior". For example, it might be difficult for you to write a poem, but it's really easy to say if you like or dislike a poem someone else wrote. Going back to the Bradley-Terry Model we mentioned, the core idea behind RLHF is that when given two outputs from a model, you will be able to say which of the two you prefer, and we'll then re-encode that preference into the model.An important point that Nathan mentions is that when you use these preferences to change model behavior "it doesn't mean that the model believes these things. It's just trained to prioritize these things". When you have preference for a model to not return instructions on how to write a computer virus for example, you're not erasing the weights that have that knowledge, but you're simply making it hard for that information to surface by prioritizing answers that don't return it. We'll talk more about this in our future Fine Tuning 101 episode as we break down how information is stored in models and how fine-tuning affects it.At a high level, the loop looks something like this:For many RLHF use cases today, we can assume the model we're training is already instruction-tuned for chat or whatever behavior the model is looking to achieve. In the "Reward Model & Other Infrastructure" we have multiple pieces:Reward + Preference ModelThe reward model is trying to signal to the model how much it should change its behavior based on the human preference, subject to a KL constraint. The preference model itself scores the pairwise preferences from the same prompt (worked better than scalar rewards).One way to think about it is that the reward model tells the model how big of a change this new preference should make in the behavior in absolute terms, while the preference model calculates how big of a difference there is between the two outputs in relative terms. A lot of this derives from John Schulman's work on PPO:We recommend watching him talk about it in the video above, and also Nathan's pseudocode distillation of the process:Feedback InterfacesUnlike the "thumbs up/down" buttons in ChatGPT, data annotation from labelers is much more thorough and has many axis of judgement. At a simple level, the LLM generates two outputs, A and B, for a given human conversation. It then asks the labeler to use a Likert scale to score which one it preferred, and by how much:Through the labeling process, there are many other ways to judge a generation:We then use all of this data to train a model from the preference pairs we have. We start from the base instruction-tuned model, and then run training in which the loss of our gradient descent is the difference between the good and the bad prompt.Constitutional AI (RLAIF, model-as-judge)As these models have gotten more sophisticated, people started asking the question of whether or not humans are actually a better judge of harmfulness, bias, etc, especially at the current price of data labeling. Anthropic's work on the "Constitutional AI" paper is using models to judge models. This is part of a broader "RLAIF" space: Reinforcement Learning from AI Feedback.By using a "constitution" that the model has to follow, you are able to generate fine-tuning data for a new model that will be RLHF'd on this constitution principles. The RLHF model will then be able to judge outputs of models to make sure that they follow its principles:Emerging ResearchRLHF is still a nascent field, and there are a lot of different research directions teams are taking; some of the newest and most promising / hyped ones:* Rejection sampling / Best of N Sampling: the core idea here is that rather than just scoring pairwise generations, you are generating a lot more outputs (= more inference cost), score them all with your reward model and then pick the top N results. LLaMA2 used this approach, amongst many others.* Process reward models: in Chain of Thought generation, scoring each step in the chain and treating it like its own state rather than just scoring the full output. This is most effective in fields like math that inherently require step-by-step reasoning.* Direct Preference Optimization (DPO): We covered DPO in our NeurIPS Best Papers recap, and Nathan has a whole blog post on this; DPO isn't technically RLHF as it doesn't have the RL part, but it's the “GPU Poor” version of it. Mistral-Instruct was a DPO model, as do Intel's Neural Chat and StableLM Zephyr. Expect to see a lot more variants in 2024 given how “easy” this was.* Superalignment: OpenAI launched research on weak-to-strong generalization which we briefly discuss at the 1hr mark.Note: Nathan also followed up this post with RLHF resources from his and peers' work:Show Notes* Full RLHF Slides* Interconnects* Retort (podcast)* von Neumann-Morgenstern utility theorem* Bradley-Terry model (pairwise preferences model)* Constitutional AI* Tamer (2008 paper by Bradley Knox and Peter Stone)* Paul Christiano et al. RLHF paper* InstructGPT* Eureka by Jim Fan* ByteDance / OpenAI lawsuit* AlpacaEval* MTBench* TruthfulQA (evaluation tool)* Self-Instruct Paper* Open Assistant* Louis Castricato* Nazneen Rajani* Tulu (DPO model from the Allen Institute)Timestamps* [00:00:00] Introductions and background on the lecture origins* [00:05:17] History of RL and its applications* [00:10:09] Intellectual history of RLHF* [00:13:47] RLHF for decision-making and pre-deep RL vs deep RL* [00:20:19] Initial papers and intuitions around RLHF* [00:27:57] The three phases of RLHF* [00:31:09] Overfitting issues* [00:34:47] How preferences get defined* [00:40:35] Ballpark on LLaMA2 costs* [00:42:50] Synthetic data for training* [00:47:25] Technical deep dive in the RLHF process* [00:54:34] Projection / best event sampling* [00:57:49] Constitutional AI* [01:04:13] DPO* [01:08:54] What's the Allen Institute for AI?* [01:13:43] Benchmarks and models comparisonsTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:15]: Hey, and today we have Dr. Nathan Lambert in the house. Welcome.Nathan [00:00:18]: Thanks guys.Swyx [00:00:19]: You didn't have to come too far. You got your PhD in Berkeley, and it seems like you've lived there most of the time in recent years. You worked on robotics and model-based reinforcement learning on your PhD, and you also interned at FAIR and DeepMind. You bootstrapped the RLHF team at Hugging Face, and you recently joined the Allen Institute as a research scientist. So that's your quick bio. What should people know about you that maybe is not super obvious about you on New LinkedIn?Nathan [00:00:43]: I stay sane in various insane sport and ultra-endurance sport activities that I do.Swyx [00:00:50]: What's an ultra-endurance sport activity?Nathan [00:00:52]: Long-distance trail running or gravel biking. Try to unplug sometimes, although it's harder these days. Yeah.Swyx [00:00:59]: Well, you know, just the Bay Area is just really good for that stuff, right?Nathan [00:01:02]: Oh, yeah. You can't beat it. I have a trailhead like 1.2 miles from my house, which is pretty unmatchable in any other urban area.Swyx [00:01:11]: Pretty excellent. You also have an incredible blog, Interconnects, which I'm a fan of. And I also just recently discovered that you have a new podcast, Retort.Nathan [00:01:20]: Yeah, we do. I've been writing for a while, and I feel like I've finally started to write things that are understandable and fun. After a few years lost in the wilderness, if you ask some of my friends that I made read the earlier blogs, they're like, oh, this is yikes, but it's coming along. And the podcast is with my friend Tom, and we just kind of like riff on what's actually happening on AI and not really do news recaps, but just what it all means and have a more critical perspective on the things that really are kind of funny, but still very serious happening in the world of machine learning.Swyx [00:01:52]: Yeah. Awesome. So let's talk about your work. What would you highlight as your greatest hits so far on Interconnects, at least?Nathan [00:01:59]: So the ones that are most popular are timely and or opinion pieces. So the first real breakout piece was when April and I also just wrote down the thing that everyone in AI was feeling, which is we're all feeling stressed, that we're going to get scooped, and that we're overworked, which is behind the curtain, what it feels to work in AI. And then a similar one, which we might touch on later in this, was about my recent job search, which wasn't the first time I wrote a job search post. People always love that stuff. It's so open. I mean, it's easy for me to do in a way that it's very on-brand, and it's very helpful. I understand that until you've done it, it's hard to share this information. And then the other popular ones are various model training techniques or fine tuning. There's an early one on RLHF, which is, this stuff is all just like when I figure it out in my brain. So I wrote an article that's like how RLHF actually works, which is just the intuitions that I had put together in the summer about RLHF, and that was pretty well. And then I opportunistically wrote about QSTAR, which I hate that you have to do it, but it is pretty funny. From a literature perspective, I'm like, open AI publishes on work that is very related to mathematical reasoning. So it's like, oh, you just poke a little around what they've already published, and it seems pretty reasonable. But we don't know. They probably just got like a moderate bump on one of their benchmarks, and then everyone lost their minds. It doesn't really matter.Swyx [00:03:15]: You're like, this is why Sam Altman was fired. I don't know. Anyway, we're here to talk about RLHF 101. You did a presentation, and I think you expressed some desire to rerecord it. And that's why I reached out on Twitter saying, like, why not rerecord it with us, and then we can ask questions and talk about it. Yeah, sounds good.Nathan [00:03:30]: I try to do it every six or 12 months is my estimated cadence, just to refine the ways that I say things. And people will see that we don't know that much more, but we have a bit of better way of saying what we don't know.Swyx [00:03:43]: Awesome. We can dive right in. I don't know if there's any other topics that we want to lay out as groundwork.Alessio [00:03:48]: No, you have some awesome slides. So for people listening on podcast only, we're going to have the slides on our show notes, and then we're going to have a YouTube version where we run through everything together.Nathan [00:03:59]: Sounds good. Yeah. I think to start skipping a lot of the, like, what is a language model stuff, everyone knows that at this point. I think the quote from the Llama 2 paper is a great kind of tidbit on RLHF becoming like a real deal. There was some uncertainty earlier in the year about whether or not RLHF was really going to be important. I think it was not that surprising that it is. I mean, with recent models still using it, the signs were there, but the Llama 2 paper essentially reads like a bunch of NLP researchers that were skeptical and surprised. So the quote from the paper was, meanwhile, reinforcement learning known for its instability seemed a somewhat shadowy field for those in the NLP research community. However, reinforcement learning proved highly effective, particularly given its cost and time effectiveness. So you don't really know exactly what the costs and time that Meta is looking at, because they have a huge team and a pretty good amount of money here to release these Llama models. This is just the kind of thing that we're seeing now. I think any major company that wasn't doing RLHF is now realizing they have to have a team around this. At the same time, we don't have a lot of that in the open and research communities at the same scale. I think seeing that converge would be great, but it's still very early days. And the other thing on the slide is some of Anthropic's work, but everyone knows Anthropic is kind of the masters of this, and they have some of their own techniques that we're going to talk about later on, but that's kind of where we start.Alessio [00:05:17]: Can we do just a one-second RL version? So you come from a robotics background, which RL used to be, or maybe still is, state-of-the-art. And then now you're seeing a lot of LLM plus RL, so you have the gym fans, Eureka, you have MPU, which we had on the podcast when they started with RL. Now they're doing RL plus LLMs. Yeah. Any thoughts there on how we got here? Maybe how the pendulum will keep swinging?Nathan [00:05:46]: I really think RL is about a framing of viewing the world through trial and error learning and feedback, and really just one that's focused on thinking about decision-making and inputs in the world and how inputs have reactions. And in that, a lot of people come from a lot of different backgrounds, whether it's physics, electrical engineering, mechanical engineering. There are obviously computer scientists, but compared to other fields of CS, I do think it's a much more diverse background of people. My background was in electrical engineering and doing robotics and things like that. It really just changes the worldview. I think that reinforcement learning as it was back then, so to say, is really different. You're looking at these toy problems and the numbers are totally different, and everyone went kind of zero to one at scaling these things up, but people like Jim Phan and other people that were... You saw this transition in the decision transformer and papers and when people are trying to use transformers to do decision-making for things like offline RL, and I think that was kind of like the early days. But then once language models were so proven, it's like everyone is using this tool for their research. I think in the long run, it will still settle out, or RL will still be a field that people work on just because of these kind of fundamental things that I talked about. It's just viewing the whole problem formulation different than predicting text, and so there needs to be that separation. And the view of RL in language models is pretty contrived already, so it's not like we're doing real RL. I think the last slide that I have here is a way to make RLHF more like what people would think of with RL, so actually running things over time, but a weird lineage of tools that happen to get us to where we are, so that's why the name takes up so much space, but it could have gone a lot of different ways. Cool.Alessio [00:07:29]: We made it one slide before going on a tangent.Nathan [00:07:31]: Yeah, I mean, it's kind of related. This is a...Swyx [00:07:35]: Yeah, so we have a history of RL.Nathan [00:07:37]: Yeah, so to give the context, this paper really started because I have this more diverse background than some computer scientists, such as trying to understand what the difference of a cost function or a reward function and a preference function would be without going into all of the details. Costs are normally things that control theorists would work with in these kind of closed domains, and then reinforcement learning has always worked with rewards that's central to the formulation that we'll see, and then the idea was like, okay, we now are at preferences, and each step along the way there's kind of different assumptions that you're making. We'll get into these, and those assumptions are built on other fields of work. So that's what this slide is going to say, it's like RLHF, while directly building on tools from RL and language models, is really implicitly impacted and built on theories and philosophies spanning tons of human history. I think we cite Aristotle in this paper, which is fun. It's like going pre-BC, it's like 2,300 years old or something like that. So that's the reason to do this, I think. We kind of list some things in the paper about summarizing what different presumptions of RLHF could be. I think going through these is actually kind of funny. It's fun to talk about these, because they're kind of grab bags of things that you'll see return throughout this podcast that we're talking about it. The core thing of RLHF that, in order to be a believer in this, is that RL actually works. It's like, if you have a reward function, you can optimize it in some way and get a different performance out of it, and you could do this at scale, and you could do this in really complex environments, which is, I don't know how to do that in all the domains. I don't know how to exactly make chat GPT. So it's kind of, we'll overshadow everything. And then there's, go from something kind of obvious like that, and then you read the von Neumann-Morgenstern utility theorem, which is essentially an economic theory that says you can weight different probabilities of different people, which is a theoretical piece of work that is the foundation of utilitarianism, and trying to quantify preferences is crucial to doing any sort of RLHF. And if you look into this, all of these things, there's way more you could go into if you're interested in any of these. So this is kind of like grabbing a few random things, and then kind of similar to that is the Bradley-Terry model, which is the fancy name for the pairwise preferences that everyone is doing. And then all the things that are like, that Anthropic and OpenAI figured out that you can do, which is that you can aggregate preferences from a bunch of different people and different sources. And then when you actually do RLHF, you extract things from that data, and then you train a model that works somehow. And we don't know, there's a lot of complex links there, but if you want to be a believer in doing this at scale, these are the sorts of things that you have to accept as preconditions for doing RLHF. Yeah.Swyx [00:10:09]: You have a nice chart of like the sort of intellectual history of RLHF that we'll send people to refer to either in your paper or in the YouTube video for this podcast. But I like the other slide that you have on like the presumptions that you need to have for RLHF to work. You already mentioned some of those. Which one's underappreciated? Like, this is the first time I've come across the VNM Utility Theorem.Nathan [00:10:29]: Yeah, I know. This is what you get from working with people like to my co-host on the podcast, the rhetoric is that sociologist by training. So he knows all these things and like who the philosophers are that found these different things like utilitarianism. But there's a lot that goes into this. Like essentially there's even economic theories that like there's debate whether or not preferences exist at all. And there's like different types of math you can use with whether or not you actually can model preferences at all. So it's pretty obvious that RLHF is built on the math that thinks that you can actually model any human preference. But this is the sort of thing that's been debated for a long time. So all the work that's here is like, and people hear about in their AI classes. So like Jeremy Bentham, like hedonic calculus and all these things like these are the side of work where people assume that preferences can be measured. And this is like, I don't really know, like, this is what I kind of go on a rant and I say that in RLHF calling things a preference model is a little annoying because there's no inductive bias of what a preference is. It's like if you were to learn a robotic system and you learned a dynamics model, like hopefully that actually mirrors the world in some way of the dynamics. But with a preference model, it's like, Oh my God, I don't know what this model, like I don't know what chat GPT encodes as any sort of preference or what I would want it to be in a fair way. Anthropic has done more work on trying to write these things down. But even like if you look at Claude's constitution, like that doesn't mean the model believes these things. It's just trained to prioritize these things. And that's kind of what the later points I'm looking at, like what RLHF is doing and if it's actually like a repeatable process in the data and in the training, that's just unknown. And we have a long way to go before we understand what this is and the link between preference data and any notion of like writing down a specific value.Alessio [00:12:05]: The disconnect between more sociology work versus computer work already exists, or is it like a recent cross contamination? Because when we had Tri Dao on the podcast, he said FlashAttention came to be because at Hazy they have so much overlap between systems engineer and like deep learning engineers. Is it the same in this field?Nathan [00:12:26]: So I've gone to a couple of workshops for the populations of people who you'd want to include this like R. I think the reason why it's not really talked about is just because the RLHF techniques that people use were built in labs like OpenAI and DeepMind where there are some of these people. These places do a pretty good job of trying to get these people in the door when you compare them to like normal startups. But like they're not bringing in academics from economics, like social choice theory. There's just too much. Like the criticism of this paper that this is based on is like, oh, you're missing these things in RL or at least this decade of RL and it's like it would be literally be bigger than the Sutton and Barto book if you were to include everyone. So it's really hard to include everyone in a principled manner when you're designing this. It's just a good way to understand and improve the communication of what RLHF is and like what is a good reward model for society. It really probably comes down to what an individual wants and it'll probably motivate models to move more in that direction and just be a little bit better about the communication, which is a recurring theme and kind of my work is like I just get frustrated when people say things that don't really make sense, especially when it's going to manipulate individual's values or manipulate the general view of AI or anything like this. So that's kind of why RLHF is so interesting. It's very vague in what it's actually doing while the problem specification is very general.Swyx [00:13:42]: Shall we go to the, I guess, the diagram here on the reinforcement learning basics? Yeah.Nathan [00:13:47]: So reinforcement learning, I kind of mentioned this, it's a trial and error type of system. The diagram and the slides is really this classic thing where you have an agent interacting with an environment. So it's kind of this agent has some input to the environment, which is called the action. The environment returns a state and a reward and that repeats over time and the agent learns based on these states and these rewards that it's seeing and it should learn a policy that makes the rewards go up. That seems pretty simple than if you try to mentally map what this looks like in language, which is that like the language models don't make this easy. I think with the language model, it's very hard to define what an environment is. So if the language model is the policy and it's generating, it's like the environment should be a human, but setting up the infrastructure to take tens of thousands of prompts and generate them and then show them to a human and collect the human responses and then shove that into your training architecture is very far away from working. So we don't really have an environment. We just have a reward model that returns a reward and the state doesn't really exist when you look at it like an RL problem. What happens is the state is a prompt and then you do a completion and then you throw it away and you grab a new prompt. We're really in as an RL researcher, you would think of this as being like you take a state, you get some completion from it and then you look at what that is and you keep kind of iterating on it and all of that isn't here, which is why you'll hear RLHF referred to as bandits problem, which is kind of like you choose one action and then you watch the dynamics play out. There's many more debates that you can have in this. If you get the right RL people in the room, then kind of like this is an RL even when you zoom into what RLHF is doing.Alessio [00:15:22]: Does this change as you think about a chain of thought reasoning and things like that? Like does the state become part of the chain that you're going through?Nathan [00:15:29]: There's work that I've mentioned on one slide called process reward models that essentially rewards each step in the chain of thought reasoning. It doesn't really give the part of interaction, but it does make it a little bit more fine grained where you can think about like calling it at least you have many states from your initial state. That formulation I don't think people have fully settled on. I think there's a bunch of great work out there, like even OpenAI is releasing a lot of this and let's verify step by step is there pretty great paper on the matter. I think in the next year that'll probably get made more concrete by the community on like if you can easily draw out like if chain of thought reasoning is more like RL, we can talk about that more later. That's a kind of a more advanced topic than we probably should spend all the time on.Swyx [00:16:13]: RLHF for decision making. You have a slide here that compares pre-deep RL versus deep RL.Nathan [00:16:19]: This is getting into the history of things, which is showing that the work that people are using now really came from well outside of NLP and it came before deep learning was big. Next up from this paper, Tamer, which is from 2008. Some names that are still really relevant in kind of human centric RL, Bradley Knox and Peter Stone. If you have an agent take an action, you would just have a human give a score from zero to one as a reward rather than having a reward function. And then with that classifier, you can do something with a policy that learns to take actions to maximize that reward. It's a pretty simple setup. It works in simple domains. And then the reason why this is interesting is you compare it to the paper that everyone knows, which is this Paul Christiano et al. Deep Reinforced Learning from Human Preferences paper, which is where they showed that learning from human preferences, you can solve like the basic RL tasks at the time. So various control problems and simulation and this kind of like human preferences approach had higher rewards in some environments than if you just threw RL at the environment that returned a reward. So the preferences thing was you took two trajectories. So in this case, it was like complete trajectories of the agent and the human was labeling which one is better. You can see how this kind of comes to be like the pairwise preferences that are used today that we'll talk about. And there's also a really kind of interesting nugget that is the trajectory that the humans were labeling over has a lot more information than the RL algorithm would see if you just had one state, which is kind of why people think that it's why the performance in this paper was so strong. But I still think that it's surprising that there isn't more RL work of this style happening now. This paper is in 2017. So it's like six years later and I haven't seen things that are exactly similar, but it's a great paper to understand where stuff that's happening now kind of came from.Swyx [00:17:58]: Just on the Christiano paper, you mentioned the performance being strong. I don't remember what results should I have in mind when I think about that paper?Nathan [00:18:04]: It's mostly like if you think about an RL learning curve, which is like on the X axis, you have environment interactions on the Y axis, you have performance. You can think about different like ablation studies of between algorithms. So I think they use like A2C, which I don't even remember what that stands for as their baseline. But if you do the human preference version on a bunch of environments, like the human preference labels, the agent was able to learn faster than if it just learned from the signal from the environment, which means like it's happening because the reward model has more information than the agent would. But like the fact that it can do better, I was like, that's pretty surprising to me because RL algorithms are pretty sensitive. So I was like, okay.Swyx [00:18:41]: It's just one thing I do want to establish as a baseline for our listeners. We are updating all the weights. In some sense, the next token prediction task of training a language model is a form of reinforcement learning. Except that it's not from human feedback. It's just self-supervised learning from a general corpus. There's one distinction which I love, which is that you can actually give negative feedback. Whereas in a general sort of pre-training situation, you cannot. And maybe like the order of magnitude of feedback, like the Likert scale that you're going to talk about, that actually just gives more signal than a typical training process would do in a language model setting. Yeah.Nathan [00:19:15]: I don't think I'm the right person to comment exactly, but like you can make analogies that reinforcement learning is self-supervised learning as well. Like there are a lot of things that will point to that. I don't know whether or not it's a richer signal. I think that could be seen in the results. It's a good thing for people to look into more. As reinforcement learning is so much less compute, like it is a richer signal in terms of its impact. Because if they could do what RLHF is doing at pre-training, they would, but they don't know how to have that effect in like a stable manner. Otherwise everyone would do it.Swyx [00:19:45]: On a practical basis, as someone fine-tuning models, I have often wished for negative fine-tuning, which pretty much doesn't exist in OpenAI land. And it's not the default setup in open-source land.Nathan [00:19:57]: How does this work in like diffusion models and stuff? Because you can give negative prompts to something to like stable diffusion or whatever. It's for guidance.Swyx [00:20:04]: That's for clip guidance.Nathan [00:20:05]: Is that just from like how they prompt it then? I'm just wondering if we could do something similar. It's another tangent.Swyx [00:20:10]: I do want to sort of spell that out for people in case they haven't made the connection between RLHF and the rest of the training process. They might have some familiarity with it.Nathan [00:20:19]: Yeah. The upcoming slides can really dig into this, which is like this in 2018 paper, there was a position paper from a bunch of the same authors from the Christiano paper and from the OpenAI work that everyone knows, which is like, they write a position paper on what a preference reward model could do to solve alignment for agents. That's kind of based on two assumptions. The first assumption is that we can learn user intentions to a sufficiently high accuracy. That doesn't last with me because I don't know what that means. But the second one is pretty telling in the context of RLHF, which is for many tasks we want to solve, evaluation of outcomes is easier than producing the correct behavior. And this is the whole thing. It's like we can compare two poems that the model generates and it can be viewed as liking a positive example, or it could be viewed as really disliking a negative example. And that's what I think a lot of people are doing in like the harm space is like a harmful response to a language model, whether or not you agree with the company's definition of harms is that it's a really bad negative example and they downweight them by preferring something more benign in the RLHF process, among other ways of dealing with safety. So that's a good way of saying it's like this is core, this kind of like comparison and positive or negative example is core to all of the RLHF work that has continued.Swyx [00:21:29]: People often say, I don't know what I want, but I'll know when I see it. This is that expressed in reinforcement learning tools.Nathan [00:21:35]: Yeah, it is. Yeah, it is. That's what everyone's doing in the preference modeling stage that we'll get to. Yeah. Yeah. And you can see there are more papers. This is really just to have all the links for people that go deeper. There's a Ziegler et al. paper in 2019, which shows that you can do this RLHF process on language models. This familiar diagram starts to emerge in 2019, and it's just to show that this goes really far back. I think we can kind of breeze through some of these. And then 2020 is the first open AI experiment that I think caught people's eyes, which is this learning to summarize experiment. It has this three-step process that we'll go to into more when I kind of go into the main concepts. But this is like the first time you see this diagram that they reuse with InstructGPT, they reuse with ChatGPT. And the types of examples that they would have, I don't think I need to read these exactly, but one that I have read a whole bunch of times is like, they took these prompts from Reddit that was like, explain like I'm five or get career advice, and people really pour their heart and soul into these. So these are like multi-paragraph pieces of writing. And then they essentially do comparisons between a vanilla language model, like I think it was either GPT-2 or GPT-3, I don't always get the exact years.Swyx [00:22:42]: 3 was early 2020. So that's about right.Nathan [00:22:45]: Yeah. So this is probably done with GPT-2. It doesn't really matter. But the language model does normal things when you do few shot, which is like it repeats itself. It doesn't have nice text. And what they did is that this was the first time where the language model would generate like pretty nice text from an output. It was restricted to the summarization domain. But I think that I guess this is where I wish I was paying attention more because I would see the paper, but I didn't know to read the language model outputs and kind of understand this qualitative sense of the models very well then. Because you look at the plots in the papers, these Learning to Summarize and Destruct GPT have incredibly pretty plots, just like nicely separated lines with error bars and they're like superfine tuning works, the RL step works. But if you were early to see like how different the language that was written by these models was, I think you could have been early to like things like ChatGPT and knowing RLHF would matter. And now I think the good people know to chat with language models, but not even everyone does this. Like people are still looking at numbers. And I think OpenAI probably figured it out when they were doing this, how important that could be. And then they had years to kind of chisel away at that and that's why they're doing so well now. Yeah.Swyx [00:23:56]: I mean, arguably, you know, it's well known that ChatGPT was kind of an accident that they didn't think it would be that big of a deal. Yeah.Nathan [00:24:02]: So maybe they didn't. Maybe they didn't, but they were getting the proxy that they needed.Swyx [00:24:06]: I've heard off the record from other labs that it was in the air. If OpenAI didn't do it, someone else would have done it. So you've mentioned a couple of other papers that are very seminal to this period. And I love how you say way back when in referring to 2019.Nathan [00:24:19]: It feels like it in my life.Swyx [00:24:21]: So how much should people understand the relationship between RLHF, instruction tuning, PPO, KL divergence, anything like that? Like how would you construct the level of knowledge that people should dive into? What should people know at the high level? And then if people want to dive in deeper, where do they go? Is instruct tuning important here or is that part of the overall process towards modern RLHF?Nathan [00:24:44]: I think for most people, instruction tuning is probably still more important in their day to day life. I think instruction tuning works very well. You can write samples by hand that make sense. You can get the model to learn from them. You could do this with very low compute. It's easy to do almost in like no code solutions at this point. And the loss function is really straightforward. And then if you're interested in RLHF, you can kind of learn from it from a different perspective, which is like how the instruction tuning distribution makes it easier for your RLHF model to learn. There's a lot of details depending on your preference data, if it's close to your instruction model or not, if that matters. But that's really at the RLHF stage. So I think it's nice to segment and just kind of understand what your level of investment and goals are. I think instruction tuning still can do most of what you want to do. And it's like, if you want to think about RLHF, at least before DPO really had taken off at all, it would be like, do you want to have a team of at least like five people if you're really thinking about doing RLHF? I think DPO makes it a little bit easier, but that's still really limited to kind of one data set that everyone's using at this point. Like everyone's using this ultra feedback data set and it boosts AlpacaVal, MTBench, TruthfulQA and like the qualitative model a bit. We don't really know why. It's like, it might just be a data set combined with the method, but you've got to be ready for a bumpy ride if you're wanting to try to do RLHF. I don't really recommend most startups to do it unless it's like going to provide them a clear competitive advantage in their kind of niche, because you're not going to make your model chat GPT like better than OpenAI or anything like that. You've got to accept that there's some exploration there and you might get a vein of benefit in your specific domain, but I'm still like, oh, be careful going into the RLHF can of worms. You probably don't need to.Swyx [00:26:27]: Okay. So there's a bit of a time skip in what you mentioned. DPO is like a couple months old, so we'll leave that towards the end. I think the main result that I think most people talk about at this stage, we're talking about September 2020 and then going into, I guess maybe last year was Vicuña as one of the more interesting applications of instruction tuning that pushed LLAMA1 from, let's say a GPT 3-ish model to a GPT 3.5 model in pure open source with not a lot of resources. I think, I mean, they said something like, you know, they use like under $100 to makeNathan [00:26:58]: this. Yeah. Like instruction tuning can really go a long way. I think the claims of chat GPT level are long overblown in most of the things in open source. I think it's not to say, like Vicuña was a huge step and it's just kind of showing that instruction tuning with the right data will completely change what it feels like to talk with your model. Yeah.Swyx [00:27:19]: From text completion to actually chatting back and forth. Yeah. Yeah.Nathan [00:27:23]: Instruction tuning can be multi-turn. Just having a little bit of data that's like a couple of turns can go a really long way. That was like the story of the whole first part of the year is like people would be surprised by how far you can take instruction tuning on a small model. I think the things that people see now is like the small models don't really handle nuance as well and they could be more repetitive even if they have really good instruction tuning. But if you take that kind of 7 to 70 billion parameter jump, like the instruction tuning at the bigger model is like robustness, little things make more sense. So that's still just with instruction tuning and scale more than anything else.Swyx [00:27:56]: Excellent. Shall we go to technical overview?Nathan [00:27:58]: Yeah. This is kind of where we go through my own version of this like three phase process. You can talk about instruction tuning, which we've talked about a lot. It's funny because all these things, instruction tuning has the fewest slides, even though it's the most practical thing for most people. We could save the debate for like if the big labs still do instruction tuning for later, but that's a coming wave for people. And then like preference data and training and then kind of like what does reinforce learning optimization actually mean? We talk about these sequentially because you really have to be able to do each of them to be able to do the next one. You need to be able to have a model that's chatty or helpful instruction following. Every company has their own word that they like to assign to what instructions mean. And then once you have that, you can collect preference data and do some sort of optimization.Swyx [00:28:39]: When you say word, you mean like angle bracket inst or do you mean something else?Nathan [00:28:42]: Oh, I don't even know what inst means, but just saying like they use their adjective that they like. I think Entropic also like steerable is another one.Swyx [00:28:51]: Just the way they describe it. Yeah.Nathan [00:28:53]: So like instruction tuning, we've covered most of this is really about like you should try to adapt your models to specific needs. It makes models that were only okay, extremely comprehensible. A lot of the times it's where you start to get things like chat templates. So if you want to do system prompts, if you want to ask your model, like act like a pirate, that's one of the ones I always do, which is always funny, but like whatever you like act like a chef, like anything, this is where those types of things that people really know in language models start to get applied. So it's good as a kind of starting point because this chat template is used in our early childhood and all of these things down the line, but it was a basic pointer. It's like, once you see this with instruction tuning, you really know it, which is like you take things like stack overflow where you have a question and an answer. You format that data really nicely. There's much more tricky things that people do, but I still think the vast majority of it is question answer. Please explain this topic to me, generate this thing for me. That hasn't changed that much this year. I think people have just gotten better at scaling up the data that they need. Yeah, this is where this talk will kind of take a whole left turn into more technical detail land. I put a slide with the RLHF objective, which I think is good for people to know. I've started going back to this more, just kind of understand what is trying to happen here and what type of math people could do. I think because of this algorithm, we've mentioned this, it's in the air, direct preference optimization, but everything kind of comes from an equation of trying to learn a policy that maximizes the reward. The reward is some learned metric. A lot can be said about what the reward should be subject to some constraint. The most popular constraint is the KL distraint, which is just a distributional distance. Essentially in language models, that means if you have a completion from your instruction or RLHF model, you can compare that completion to a base model. And looking at the log probs from the model, which are essentially how likely each token is, you can see a rough calculation of the distance between these two models, just as a scalar number. I think what that actually looks like in code, you can look at it. It'd be like a sum of log probs that you get right from the model. It'll look much more simpler than it sounds, but it is just to make the optimization kind of stay on tracks.Make sure it doesn't overfit to the RLHF data. Because we have so little data in RLHF, overfitting is really something that could happen. I think it'll fit to specific features that labelers like to see, that the model likes to generate, punctuation, weird tokens like calculator tokens. It could overfit to anything if it's in the data a lot and it happens to be in a specific format. And the KL constraint prevents that. There's not that much documented work on that, but there's a lot of people that know if you take that away, it just doesn't work at all. I think it's something that people don't focus on too much. But the objective, as I said, it's just kind of, you optimize the reward. The reward is where the human part of this comes in. We'll talk about that next. And then subject to a constraint, don't change the model too much. The real questions are, how do you implement the reward? And then how do you make the reward go up in a meaningful way? So like a preference model, the task is kind of to design a human reward. I think the equation that most of the stuff is based on right now is something called a Bradley-Terry model, which is like a pairwise preference model where you compare two completions and you say which one you like better. I'll show an interface that Anthropic uses here. And the Bradley-Terry model is really a fancy probability between two selections. And what's happening in the math is that you're looking at the probability that the chosen completion, the one you like better, is actually the better completion over the rejected completion. And what these preference models do is they assume this probability is correlated to reward. So if you just sample from this probability, it'll give you a scalar. And then you use that reward later on to signify what piece of text is better. I'm kind of inclined to breeze through the math stuff because otherwise, it's going to be not as good to listen to.Alessio [00:32:49]: I think people want to hear it. I think there's a lot of higher level explanations out there. Yeah.Nathan [00:32:55]: So the real thing is you need to assign a scalar reward of how good a response is. And that's not necessarily that easy to understand. Because if we take back to one of the first works, I mentioned this tamer thing for decision making. People tried that with language models, which is if you have a prompt in a completion and you just have someone rate it from 0 to 10, could you then train a reward model on all of these completions in 0 to 10 ratings and see if you can get chat2BT with that? And the answer is really kind of no. Like a lot of people tried that. It didn't really work. And then that's why they tried this pairwise preference thing. And it happened to work. And this Bradley Terry model comes from the 50s. It's from these fields that I was mentioning earlier. And it's wild how much this happens. I mean, this screenshot I have in the slides is from the DPO paper. I think it might be the appendix. But it's still really around in the literature of what people are doing for RLHF.Alessio [00:33:45]: Yeah.Nathan [00:33:45]: So it's a fun one to know.Swyx [00:33:46]: I'll point out one presumption that this heavily relies on. You mentioned this as part of your six presumptions that we covered earlier, which is that you can aggregate these preferences. This is not exactly true among all humans, right? I have a preference for one thing. You have a preference for a different thing. And actually coming from economics, you mentioned economics earlier. There's a theorem or a name for this called error impossibility, which I'm sure you've come across..Nathan [00:34:07]: It's one of the many kind of things we throw around in the paper.Swyx [00:34:10]: Right. Do we just ignore it?Nathan [00:34:14]: We just, yeah, just aggregate. Yeah. I think the reason this really is done on a deep level is that you're not actually trying to model any contestable preference in this. You're not trying to go into things that are controversial or anything. It's really the notion of preference is trying to stay around correctness and style rather than any meaningful notion of preference. Because otherwise these companies, they don't want to do this at all. I think that's just how it is. And it's like, if you look at what people actually do. So I have a bunch of slides on the feedback interface. And they all publish this.Swyx [00:34:43]: It's always at the appendices of every paper.Nathan [00:34:47]: There's something later on in this talk, which is like, but it's good to mention. And this is when you're doing this preference collection, you write out a very long document of instructions to people that are collecting this data. And it's like, this is the hierarchy of what we want to prioritize. Something amount like factuality, helpfulness, honestness, harmlessness. These are all different things. Every company will rank these in different ways, provide extensive examples. It's like, if you see these two answers, you should select this one and why. And all of this stuff. And then my kind of like head scratching is like, why don't we check if the models actually do these things that we tell the data annotators to collect? But I think it's because it's hard to make that attribution. And it's hard to test if a model is honest and stuff. It would just be nice to understand the kind of causal mechanisms as a researcher or like if our goals are met. But at a simple level, what it boils down to, I have a lot more images than I need. It's like you're having a conversation with an AI, something like type GPT. You get shown two responses or more in some papers, and then you have to choose which one is better. I think something you'll hear a lot in this space is something called a Likert scale. Likert is a name. It's a name for probably some research in economics, decision theory, something. But essentially, it's a type of scale where if you have integers from like one to eight, the middle numbers will represent something close to a tie. And the smallest numbers will represent one model being way better than the other. And the biggest numbers will be like the other models better. So in the case of one to eight, if you're comparing models A to B, if you return a one, if you really liked option A, you return eight if you really like B, and then like a four or five if they were close. There's other ways to collect this data. This one's become really popular. We played with it a bit at Hugging Face. It's hard to use. Filling out this preference data is really hard. You have to read like multiple paragraphs. It's not for me. Some people really like it. I hear I'm like, I can't imagine sitting there and reading AI-generated text and like having to do that for my job. But a lot of these early papers in RLHF have good examples of what was done. The one I have here is from Anthropic's collection demo because it was from slides that I did with Anthropic. But you can look up these in the various papers. It looks like Chat2BT with two responses, and then you have an option to say which one is better. It's nothing crazy. The infrastructure is almost exactly the same, but they just log which one you think is better. I think places like Scale are also really big in this where a lot of the labeler companies will help control like who's doing how many samples. You have multiple people go over the same sample once and like what happens if there's disagreement. I don't really think this disagreement data is used for anything, but it's good to know like what the distribution of prompts is, who's doing it, how many samples you have, controlling the workforce. All of this is very hard. A last thing to add is that a lot of these companies do collect optional metadata. I think the Anthropic example shows a rating of like how good was the prompt or the conversation from good to bad because things matter. Like there's kind of a quadrant of preference data in my mind, which is you're comparing a good answer to a good answer, which is like really interesting signal. And then there's kind of the option of you're comparing a bad answer to a bad answer, which is like you don't want to train your model on two different issues. This is like, we did this at Hugging Base and it was like, our data was like, we don't know if we can use this because a lot of it was just bad answer to bad answer because you're like rushing to try to do this real contract. And then there's also good answer to bad answer, which I think is probably pretty reasonable to include. You just prefer the good one and move on with your life. But those are very different scenarios. I think open AIs of the world are all in good answer, good answer, and have learned to eliminate everything else. But when people try to do this in open source, it's probably like what Open Assistance saw is like, there's just a lot of bad answers in your preference data. And you're like, what do I do with this? Metadata flags can help. I threw in the instruct GPT metadata. You can see how much they collect here. And like everything from the model fails to actually complete the task, hallucinations, different types of offensive or dangerous content, moral judgment, expresses opinion. Like, I don't know exactly if they're doing this now, but you can kind of see why doing RLHF at scale and prioritizing a lot of different endpoints would be hard because these are all things I'd be interested in if I was scaling up a big team to do RLHF and like what is going into the preference data. You do an experiment and you're like, okay, we're going to remove all the data where they said the model hallucinates like just that and then retrain everything. Like, what does that do?Swyx [00:38:59]: Yeah, so hallucination is big, but some of these other metadata categories, and I've seen this in a lot of papers, it's like, does it contain sexual content? Does it express a moral judgment? Does it denigrate a protected class? That kind of stuff, very binary. Should people try to adjust for this at the RLHF layer or should they put it as a pipeline where they have a classifier as a separate model that grades the model output?Nathan [00:39:20]: Do you mean for training or like a deployment? Deployment. I do think that people are doing it at deployment. I think we've seen safety and other things in the RLHF pipeline. Like Lama 2 is famous for kind of having this like helpfulness and safety reward models. Deep in the Gemini report is something that Gemini has like four things, which is like helpfulness, factuality, maybe safety, maybe something else. But places like Anthropic and Chattopadhyay and Bard almost surely have a classifier after, which is like, is this text good? Is this text bad? That's not that surprising, I think, because you could use like a hundred times smaller language model and do much better at filtering than RLHF. But I do think it's still so deeply intertwined with the motivation of RLHF to be for safety that some of these categories still persist. I think that's something I'll kind of settle out, I think.Swyx [00:40:11]: I'm just wondering if it's worth collecting this data for the RLHF purpose, if you're not going to use it in any way, separate model to-Nathan [00:40:18]: Yeah, I don't think OpenAI will collect all of this anymore, but I think for research perspectives, it's very insightful to know, but it's also expensive. So essentially your preference data scales with how many minutes it takes for you to do each task and every button is like, it scales pretty linearly. So it's not cheap stuff.Swyx [00:40:35]: Can we, since you mentioned expensiveness, I think you may have joined one of our spaces back in Lama 2 was released. We had an estimate from you that was something on the order of Lama 2 costs $3 to $6 million to train GPU-wise, and then it was something like $20 to $30 million in preference data. Is that something that's still in the ballpark? I don't need precise numbers.Nathan [00:40:56]: I think it's still a ballpark. I know that the 20 million was off by a factor of four because I was converting from a prompt number to a total data point. So essentially when you do this, if you have multi-turn setting, each turn will be one data point and the Lama 2 paper reports like 1.5 million data points, which could be like 400,000 prompts. So I would say it's still say like 6 to 8 million is safe to say that they're spending, if not more, they're probably also buying other types of data and or throwing out data that they don't like, but it's very comparable to compute costs. But the compute costs listed in the paper always are way lower because all they have to say is like, what does one run cost? But they're running tens or hundreds of runs. So it's like, okay, like... Yeah, it's just kind of a meaningless number. Yeah, the data number would be more interesting.Alessio [00:41:42]: What's the depreciation of this data?Nathan [00:41:46]: It depends on the method. Like some methods, people think that it's more sensitive to the, this is what I was saying. It was like, does the type of instruction tuning you do matter for RLHF? So like, depending on the method, some people are trying to figure out if you need to have like what is called like, this is very confusing. It's called like on policy data, which is like your RLHF data is from your instruction model. I really think people in open source and academics are going to figure out how to use any preference data on any model just because they're scrappy. But there's been an intuition that to do like PPO well and keep improving the model over time and do like what Meta did and what people think that OpenAI does is that you need to collect new preference data to kind of edge the distribution of capabilities forward. So there's a depreciation where like the first batch of data you collect isn't really useful for training the model when you have the fifth batch. We don't really know, but it's a good question. And I do think that if we had all the LLAMA data, we wouldn't know what to do with all of it. Like probably like 20 to 40% would be pretty useful for people, but not the whole data set. Like a lot of it's probably kind of gibberish because they had a lot of data in there.Alessio [00:42:51]: So do you think like the open source community should spend more time figuring out how to reuse the data that we have or like generate more data? I think that's one of the-Nathan [00:43:02]: I think if the people are kind of locked into using synthetic data, people also think that synthetic data is like GPT-4 is more accurate than humans at labeling preferences. So if you look at these diagrams, like humans are about 60 to 70% agreement. And we're like, that's what the models get to. And if humans are about 70% agreement or accuracy, like GPT-4 is like 80%. So it is a bit better, which is like in one way of saying it.Swyx [00:43:24]: Humans don't even agree with humans 50% of the time.Nathan [00:43:27]: Yeah, so like that's the thing. It's like the human disagreement or the lack of accuracy should be like a signal, but how do you incorporate that? It's really tricky to actually do that. I think that people just keep using GPT-4 because it's really cheap. It's one of my like go-to, like I just say this over and over again is like GPT-4 for data generation, all terms and conditions aside because we know OpenAI has this stuff is like very cheap for getting pretty good data compared to compute or salary of any engineer or anything. So it's like tell people to go crazy generating GPT-4 data if you're willing to take the organizational like cloud of should we be doing this? But I think most people have accepted that you kind of do this, especially at individuals. Like they're not gonna come after individuals. I do think more companies should think twice before doing tons of OpenAI outputs. Also just because the data contamination and what it does to your workflow is probably hard to control at scale.Swyx [00:44:21]: And we should just mention at the time of recording, we've seen the first example of OpenAI enforcing their terms of service. ByteDance was caught, reported to be training on GPT-4 data and they got their access to OpenAI revoked. So that was one example.Nathan [00:44:36]: Yeah, I don't expect OpenAI to go too crazy on this cause they're just gonna, there's gonna be so much backlash against them. And like, everyone's gonna do it anyways.Swyx [00:44:46]: And what's at stake here to spell it out is like, okay, that's like cost $10 to collect one data point from a human. It's gonna cost you like a 10th of a cent with OpenAI, right? So like it's just orders of magnitude cheaper. And therefore people-Nathan [00:44:58]: Yeah, and it's like the signal you get from humans is from preferences isn't that high. The signal that you get from humans for instructions is pretty high, but it is also very expensive. So like the human instructions are definitely like by far and away the best ones out there compared to the synthetic data. But I think like the synthetic preferences are just so much easier to get some sort of signal running with and you can work in other, I think people will start working in other goals there between safety and whatever. That's something that's taking off and we'll kind of see that. I think in 2024, at some point, people will start doing things like constitutional AI for preferences, which will be pretty interesting. I think we saw how long it took RLHF to get started in open source. Instruction tuning was like the only thing that was really happening until maybe like August, really. I think Zephyr was the first model that showed success with RLHF in the public, but that's a long time from everyone knowing that it was something that people are interested in to having any like check mark. So I accept that and think the same will happen with constitutional AI. But once people show that you can do it once, they continue to explore.Alessio [00:46:01]: Excellent.Swyx [00:46:01]: Just in the domain of human preference data suppliers, Scale.ai very happily will tell you that they supplied all that data for Lama 2. The other one is probably interesting, LMSYS from Berkeley. What they're running with Chaterina is perhaps a good store of human preference data.Nathan [00:46:17]: Yeah, they released some toxicity data. They, I think, are generally worried about releasing data because they have to process it and make sure everything is safe and they're really lightweight work. I think they're trying to release the preference data. I have, if we make it to evaluation, I'd pretty much say that Chaterina is the best limited evaluation that people have to learn how to use language models. And like, it's very valuable data. They also may share some data with people that they host models from. So like if your model is hosted there and you pay for the hosting, you can get the prompts because you're pointing the endpoint at it and that gets pinged to you and you're any real LLM inference stack saves the prompts tha

Generation AI
AI in 2024: Bold Predictions and Emerging Trends

Generation AI

Play Episode Listen Later Dec 19, 2023 34:04


Join us in "AI in 2024" where hosts Ardis Kadiu and Dr. JC Bonilla dive into the exhilarating world of AI advancements expected in 2024. This episode unpacks the rapid progress in AI technologies, from large language models to multimodal AI, and predicts how these developments will revolutionize consumer experience, education, and creativity.https://generationaishow.com/Full Episode Notes:IntroductionHosts Ardis and Dr. JC Bonilla reminisce about their journey in AI and technology.The episode aims to forecast AI developments for 2024, building on 2023's rapid technological advancements.Review of 2023 in AIRecap of 2023's significant acceleration in foundational AI technology.The societal adoption lag and unexplored impacts on society.2024 Predictions: The Battle of Large Language Models (LLMs)Expectation of open source models like Llama2 and Mistral catching up to GPT-4.Google's multimodal Gemini and its potential impact.Predicted advancements in multimodal capabilities, including auditory and visual understanding.Discussion on the potential supremacy of OpenAI's GPT-5 or 4.5 in 2024.Consumer Perspective and Technology IntegrationAnticipation of more structured and mainstream adoption of AI technologies in consumer products.Creative AI and Deep FakesThe surge in AI-driven content creation, particularly in the realm of video and text editing.Discussion on generative AI's role in revolutionizing the creative process and its implications for content production.Personalized AI and Conversational InterfacesPredictions about conversational AI becoming more nuanced and empathetic.Speculation on the evolution of AI assistants, likening them to characters from science fiction, such as in the movie "Her".Apple's Role in AI DevelopmentDiscussion on Apple's potential contributions to AI in 2024, focusing on privacy and device integration.Speculation on how Apple might integrate AI into its ecosystem.AI in Education and Higher EducationAnticipated impact of AI in enhancing student experiences and administrative efficiency.The rise of AI teaching assistants and personalized learning experiences.Productivity and AI IntegrationDiscussion on integrating advanced AI capabilities into everyday tools for enhanced productivity.The future of content creation and distribution using AI, with a focus on automation.ConclusionRecap of the discussed topics and their implications for 2024.Closing thoughts on the potential realization of these predictions in the near future.

AI Unchained
#010 - Tools for Running Local LLMs

AI Unchained

Play Episode Listen Later Dec 14, 2023 39:46


Today we dive into a few of my favorite tools for running LLMs locally on your machine, even the average consumer machine. With a huge variety of models to choose from and varying levels of customization and presets, these will get you started on running your own Ai, and getting the pieces necessary to connect them into micro apps and other parts of your workflow. Not only are these huge productivity boosters to have, but it's also just a ton of fun to play around. For the large list of links to these and so many other AI tools to check out, go to bitcoinaudible.com/ai (Link: bitcoinaudible.com/ai) Links to the tools and the models mentioned in the episode GPT4All (Link: https://gpt4all.io/index.html) LM Studio (Link: https://lmstudio.ai/) Ollama AI (https://ollama.ai/) Models mentioned Stablelm-zephyr (small & fast) Mistral Instruct (good overall, imo) Hermes (less likely to lecture you) Llama2 (great base model) Llava (text and images, multimodal) Host Links Guy on Nostr (Link: http://tinyurl.com/yc376bff) Guy on X (Link: https://twitter.com/theguyswann) Bitcoin Audible on X (Link: https://twitter.com/BitcoinAudible) Check out our awesome sponsors! Get 9% off the COLDCARD with code BITCOINAUDIBLE ⁠⁠⁠⁠⁠⁠(Link: bitcoinaudible.com/coldcard⁠⁠⁠⁠⁠⁠) Swan: The best way to buy, learn, and earn #Bitcoin (Link: https://www.swanbitcoin.com/)

Bitcoin Audible
Ai_010 - Tools for Running Local LLMs

Bitcoin Audible

Play Episode Listen Later Dec 14, 2023 39:46


Today we dive into a few of my favorite tools for running LLMs locally on your machine, even the average consumer machine. With a huge variety of models to choose from and varying levels of customization and presets, these will get you started on running your own Ai, and getting the pieces necessary to connect them into micro apps and other parts of your workflow. Not only are these huge productivity boosters to have, but it's also just a ton of fun to play around. For the large list of links to these and so many other AI tools to check out, go to bitcoinaudible.com/ai (Link: bitcoinaudible.com/ai)   Links to the tools and the models mentioned in the episode GPT4All (Link: https://gpt4all.io/index.html) LM Studio (Link: https://lmstudio.ai/) Ollama AI (https://ollama.ai/) Models mentioned Stablelm-zephyr (small & fast) Mistral Instruct (good overall, imo) Hermes (less likely to lecture you) Llama2 (great base model) Llava (text and images, multimodal) Host Links ⁠Guy on Nostr ⁠(Link: http://tinyurl.com/2xc96ney) ⁠Guy on X ⁠(Link: https://twitter.com/theguyswann) Guy on Instagram (Link: https://www.instagram.com/theguyswann) Guy on TikTok (Link: https://www.tiktok.com/@theguyswann) Guy on YouTube (Link: https://www.youtube.com/@theguyswann) ⁠Bitcoin Audible on X⁠ (Link: https://twitter.com/BitcoinAudible) The Guy Swann Network Broadcast Room on Keet (Link: https://tinyurl.com/3na6v839) Check out our awesome sponsors! Fold: The best way to buy, use, and earn #Bitcoin on everything you do! Sats back on your debit card, gift cards, auto-buys, round-ups, you name it. Fold is the true bitcoiner's banking. Get 20K sats for FREE using referral code bitcoinaudible.com/fold Ready for best-in-class self custody? Get the Jade here and use discount code 'GUY' to get 10% off (Link: bitcoinaudible.com/jade) Trying to BUY BITCOIN? River, secure, trusted, bitcoin only, lightning enabled, simple. (Link: https://bitcoinaudible.com/river) Bitcoin Games! Get 10% off the best Bitcoin board game in the world, HODLUP! Or any of the other great games from the Free Market Kids! Use code GUY10 at checkout for 10% off your cart! (Link: https://www.freemarketkids.com/collections/games-1) Bitcoin Custodial Multisig Want to get into Bitcoin but not ready for self custody? Use custodial multisig for the best way to distribute trust across multiple institutions and even jurisdictions! Check out OnRamp. (Link: BitcoinAudible.com/onramp) Education & HomeSchooling

Gradient Dissent - A Machine Learning Podcast by W&B
Bridging AI and Science: The Impact of Machine Learning on Material Innovation with Joe Spisak of Meta

Gradient Dissent - A Machine Learning Podcast by W&B

Play Episode Listen Later Dec 7, 2023 74:44


In the latest episode of Gradient Dissent, we hear from Joseph Spisak, Product Director, Generative AI @Meta, to explore the boundless impacts of AI and its expansive role in reshaping various sectors. We delve into the intricacies of models like GPT and Llama2, their influence on user experiences, and AI's groundbreaking contributions to fields like biology, material science, and green hydrogen production through the Open Catalyst Project. The episode also examines AI's practical business applications, from document summarization to intelligent note-taking, addressing the ethical complexities of AI deployment. We wrap up with a discussion on the significance of open-source AI development, community collaboration, and AI democratization. Tune in for valuable insights into the expansive world of AI, relevant to developers, business leaders, and tech enthusiasts.We discuss:0:00 Intro0:32 Joe is Back at Meta3:28 What Does Meta Get Out Of Putting Out LLMs?8:24 Measuring The Quality Of LLMs10:55 How Do You Pick The Sizes Of Models16:45 Advice On Choosing Which Model To Start With24:57 The Secret Sauce In The Training26:17 What Is Being Worked On Now33:00 The Safety Mechanisms In Llama 237:00 The Datasets Llama 2 Is Trained On38:00 On Multilingual Capabilities & Tone43:30 On The Biggest Applications Of Llama 247:25 On Why The Best Teams Are Built By Users54:01 The Culture Differences Of Meta vs Open Source57:39 The AI Learning Alliance1:01:34 Where To Learn About Machine Learning1:05:10 Why AI For Science Is Under-rated1:11:36 What Are The Biggest Issues With Real-World ApplicationsThanks for listening to the Gradient Dissent podcast, brought to you by Weights & Biases. If you enjoyed this episode, please leave a review to help get the word out about the show. And be sure to subscribe so you never miss another insightful conversation.#OCR #DeepLearning #AI #Modeling #ML

The Cloudcast
LLMOps & Conversational Intelligence for AI

The Cloudcast

Play Episode Listen Later Dec 6, 2023 30:06


GUEST: Alex Kvamme (@KwameKvamme, CEO @GetPathlight)SHOW: 777CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotwNEW TO CLOUD? CHECK OUT - "CLOUDCAST BASICS"SHOW SPONSORS:Find "Breaking Analysis Podcast with Dave Vellante" on Apple, Google and SpotifyKeep up to data with Enterprise Tech with theCUBESHOW NOTES:Pathlight (homepage)VentureBeat article on Pathlight & AI AgentsArticle on Real World Productivity of LLMsTopic 1 - Welcome to the show. Alex, Tell us a bit about your background.Topic 2 - What is the concept of conversational intelligence and how does it apply to most organizations today? What problem is it trying to solve?Topic 3 - I would think there is a trade off between time and resources to get to a customer issue vs. the value of that insight. How does an organization weigh the opportunity cost? How do you keep the insights generated from being overwhelmingTopic 4 - Let's move from the concept to practical. Where is the data in most organizations today that will yield results and solve problems? How would you suggest folks get started and what use case are they likely to implement first? Is this data that humans either can't or won't get too because it is an enormous amount or maybe too tedious to pay for?Topic 5 - How does all of this work under the hood? Is this one model or multiple models working in parallel? Is there a framework for the operations and lifecycle managed by an organization?Topic 6 - Let's talk about what it takes to get an LLM into production. The the rise of LLM's and foundational models such as Llama2, there is an interest for organizations to use LLM's, but going from concept to production still has a high barrier to entry. It's easy to download a model, it's much harder to either fine-tune it or set up RAG with a vector database for your specific use case. Would you agree and what are your thoughts?FEEDBACK?Email: show at the cloudcast dot netTwitter: @thecloudcastnet

Papers Read on AI
Efficient LLM Inference on CPUs

Papers Read on AI

Play Episode Listen Later Dec 2, 2023 17:32


Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical amount of model parameters, which requires a demand for large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that can make the deployment of LLMs more efficiently. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with highly-optimized kernels to accelerate the LLM inference on CPUs. We demonstrate the general applicability of our approach on popular LLMs including Llama2, Llama, GPT-NeoX, and showcase the extreme inference efficiency on CPUs. The code is publicly available at: https://github.com/intel/intel-extension-for-transformers. 2023: Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng https://arxiv.org/pdf/2311.00502v1.pdf

Papers Read on AI
Learning to Filter Context for Retrieval-Augmented Generation

Papers Read on AI

Play Episode Listen Later Nov 20, 2023 35:28


On-the-fly retrieval of relevant knowledge has proven an essential element of reliable systems for tasks such as open-domain question answering and fact verification. However, because retrieval systems are not perfect, generation models are required to generate outputs given partially or entirely irrelevant passages. This can cause over- or under-reliance on context, and result in problems in the generated output such as hallucinations. To alleviate these problems, we propose FILCO, a method that improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time. We experiment on six knowledge-intensive tasks with FLAN-T5 and LLaMa2, and demonstrate that our method outperforms existing approaches on extractive question answering (QA), complex multi-hop and long-form QA, fact verification, and dialog generation tasks. FILCO effectively improves the quality of context, whether or not it supports the canonical output. 2023: Zhiruo Wang, Jun Araki, Zhengbao Jiang, Md. Rizwan Parvez, Graham Neubig https://arxiv.org/pdf/2311.08377v1.pdf

Creative Business Marketing
Episode 053: All Things AI - Key Players in the AI Space

Creative Business Marketing

Play Episode Listen Later Nov 15, 2023 9:30


Topics covered in this episode include… - AI goldrush and overfunding in Silicon Valley- OpenAI's ChatGPT: Free and $20/month versions, user data usage- Microsoft's $11 billion investment in OpenAI: Threat to Google's market dominance- Anthropic: OpenAI competitor, creator of "generative" language model, Claude- Google and Meta's AI forays: Google's Bard and Meta's open-source Llama2- Europe's lag in large language model development: Cautious data privacy approach, LEAM.ai's open-source initiative.______________

Code Together
The Growing Importance of Operational AI

Code Together

Play Episode Listen Later Nov 9, 2023 43:48


Eduardo Alvarez joins Tony to talk about how the AI landscape for businesses and developers is changing rapidly, and how we need to move to take advantage of the opportunities and not get left behind. They also talk about fine-tuning Llama2 with LoRA on Gaudi2 and the benefits for users. Guest: Eduardo Alvarez - AI Solutions Engineer @ Intel Eduardo Alvarez   Learn more: Eduardo's blog post https://medium.com/@eduand-alvarez/a-case-for-operational-centric-ai-6e97dfa4fc23    oneAPI https://oneapi.io https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html   Unified Acceleration Foundation https://uxlfoundation.org/   Bob Duffy Stable Diffusion https://www.youtube.com/watch?v=pF6DiktPsaA   Llama2 Fine-Tuning with Low-Rank Adaptations (LoRA) on Gaudi 2 Processors https://eduand-alvarez.medium.com/llama2-fine-tuning-with-low-rank-adaptations-lora-on-gaudi-2-processors-52cf1ee6ce11   President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/

En.Digital Podcast
La TertulIA #10: GPT-4, nuevas integraciones de Meta con Llama 2 y ahorrando costes en Copilot

En.Digital Podcast

Play Episode Listen Later Oct 27, 2023 73:39


En este episodio, hablamos de la evolución de GPT-4 como modelo multimodal, de sus nuevas capacidades y limitaciones, de las próximas integraciones que va a lanzar Meta en sus productos con Llama2 y del foco en la reducción de costes en los modelos de IA que usa Microsoft y de la carta en la que se pidió que parase el desarrollo de la IA. La TertulIA que escuchan las IAs comiendo pipas con Luis Martín (Product Hackers), Frankie Carrero (VASS) y Corti (Product Hackers).NOSOTROS

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Thanks to the over 17,000 people who have joined the first AI Engineer Summit! A full recap is coming. Last call to fill out the State of AI Engineering survey! See our Community page for upcoming meetups in SF, Paris and NYC.This episode had good interest on Twitter.Fast.ai's “Practical Deep Learning” courses been watched by over >6,000,000 people, and the fastai library has over 25,000 stars on Github. Jeremy Howard, one of the creators of Fast, is now one of the most prominent and respected voices in the machine learning industry; but that wasn't always the case. Being non-consensus and right In 2018, Jeremy and Sebastian Ruder published a paper on ULMFiT (Universal Language Model Fine-tuning), a 3-step transfer learning technique for NLP tasks: The paper demonstrated that pre-trained language models could be fine-tuned on a specific task with a relatively small amount of data to achieve state-of-the-art results. They trained a 24M parameters model on WikiText-103 which was beat most benchmarks.While the paper had great results, the methods behind weren't taken seriously by the community: “Everybody hated fine tuning. Everybody hated transfer learning. I literally did tours trying to get people to start doing transfer learning and nobody was interested, particularly after GPT showed such good results with zero shot and few shot learning […] which I was convinced was not the right direction, but who's going to listen to me, cause as you said, I don't have a PhD, not at a university… I don't have a big set of computers to fine tune huge transformer models.”Five years later, fine-tuning is at the center of most major discussion topics in AI (we covered some like fine tuning vs RAG and small models fine tuning), and we might have gotten here earlier if Jeremy had OpenAI-level access to compute and distribution. At heart, Jeremy has always been “GPU poor”:“I've always been somebody who does not want to build stuff on lots of big computers because most people don't have lots of big computers and I hate creating stuff that most people can't use.”This story is a good reminder of how some of the best ideas are hiding in plain sight; we recently covered RWKV and will continue to highlight the most interesting research that isn't being done in the large labs. Replacing fine-tuning with continued pre-trainingEven though fine-tuning is now mainstream, we still have a lot to learn. The issue of “catastrophic forgetting” and potential solutions have been brought up in many papers: at the fine-tuning stage, the model can forget tasks it previously knew how to solve in favor of new ones. The other issue is apparent memorization of the dataset even after a single epoch, which Jeremy covered Can LLMs learn from a single example? but we still don't have the answer to. Despite being the creator of ULMFiT, Jeremy still professes that there are a lot of open questions on finetuning:“So I still don't know how to fine tune language models properly and I haven't found anybody who feels like they do.”He now advocates for "continued pre-training" - maintaining a diversity of data throughout the training process rather than separate pre-training and fine-tuning stages. Mixing instructional data, exercises, code, and other modalities while gradually curating higher quality data can avoid catastrophic forgetting and lead to more robust capabilities (something we covered in Datasets 101).“Even though I originally created three-step approach that everybody now does, my view is it's actually wrong and we shouldn't use it… the right way to do this is to fine-tune language models, is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training. And pre-training is something where from the very start, you try to include all the kinds of data that you care about, all the kinds of problems that you care about, instructions, exercises, code, general purpose document completion, whatever. And then as you train, you gradually curate that, you know, you gradually make that higher and higher quality and more and more specific to the kinds of tasks you want it to do. But you never throw away any data….So yeah, that's now my view, is I think ULMFiT is the wrong approach. And that's why we're seeing a lot of these so-called alignment tax… I think it's actually because people are training them wrong.An example of this phenomena is CodeLlama, a LLaMA2 model finetuned on 500B tokens of code: while the model is much better at code, it's worse on generic tasks that LLaMA2 knew how to solve well before the fine-tuning. In the episode we also dive into all the places where open source model development and research is happening (academia vs Discords - tracked on our Communities list and on our survey), and how Jeremy recommends getting the most out of these diffuse, pseudonymous communities (similar to the Eleuther AI Mafia).Show Notes* Jeremy's Background* FastMail* Optimal Decisions* Kaggle* Enlitic* fast.ai* Rachel Thomas* Practical Deep Learning* fastai for PyTorch* nbdev* fastec2 (the underrated library we describe)* Can LLMs learn from a single example?* the Kaggle LLM Science Exam competition, which “challenges participants to answer difficult science-based questions written by a Large Language Model”.* Sebastian Ruder* Alec Radford* Sylvain Gugger* Stephen Merity* Chris Lattner* Modular.ai / Mojo* Jono Whittaker* Zeiler and Fergus paper* ULM Fit* DAWNBench* Phi-1* Code Llama* AlexNetTimestamps* [00:00:00] Intros and Jeremy's background* [00:05:28] Creating ULM Fit - a breakthrough in NLP using transfer learning* [00:06:32] The rise of GPT and the appeal of few-shot learning over fine-tuning* [00:10:00] Starting Fast.ai to distribute AI capabilities beyond elite academics* [00:14:30] How modern LMs like ChatGPT still follow the ULM Fit 3-step approach* [00:17:23] Meeting with Chris Lattner on Swift for TensorFlow at Google* [00:20:00] Continued pre-training as a fine-tuning alternative* [00:22:16] Fast.ai and looking for impact vs profit maximization* [00:26:39] Using Fast.ai to create an "army" of AI experts to improve their domains* [00:29:32] Fast.ai's 3 focus areas - research, software, and courses* [00:38:42] Fine-tuning memorization and training curve "clunks" before each epoch* [00:46:47] Poor training and fine-tuning practices may be causing alignment failures* [00:48:38] Academia vs Discords* [00:53:41] Jeremy's high hopes for Chris Lattner's Mojo and its potential* [01:05:00] Adding capabilities like SQL generation through quick fine-tuning* [01:10:12] Rethinking Fast.ai courses for the AI-assisted coding era* [01:14:53] Rapid model development has created major technical debt* [01:17:08] Lightning RoundAI Summary (beta)This is the first episode we're trying this. Here's an overview of the main topics before you dive in the transcript. * Jeremy's background and philosophies on AI* Studied philosophy and cognitive science in college* Focused on ethics and thinking about AI even 30 years ago* Believes AI should be accessible to more people, not just elite academics/programmers* Created fast.ai to make deep learning more accessible* Development of transfer learning and ULMFit* Idea of transfer learning critical for making deep learning accessible* ULMFit pioneered transfer learning for NLP* Proposed training general language models on large corpora then fine-tuning - this became standard practice* Faced skepticism that this approach would work from NLP community* Showed state-of-the-art results on text classification soon after trying it* Current open questions around fine-tuning LLMs* Models appear to memorize training data extremely quickly (after 1 epoch)* This may hurt training dynamics and cause catastrophic forgetting* Unclear how best to fine-tune models to incorporate new information/capabilities* Need more research on model training dynamics and ideal data mixing* Exciting new developments* Mojo and new programming languages like Swift could enable faster model innovation* Still lots of room for improvements in computer vision-like innovations in transformers* Small models with fine-tuning may be surprisingly capable for many real-world tasks* Prompting strategies enable models like GPT-3 to achieve new skills like playing chess at superhuman levels* LLMs are like computer vision in 2013 - on the cusp of huge new breakthroughs in capabilities* Access to AI research* Many key convos happen in private Discord channels and forums* Becoming part of these communities can provide great learning opportunities* Being willing to do real work, not just talk about ideas, is key to gaining access* The future of practical AI* Coding becoming more accessible to non-programmers through AI assistance* Pre-requisite programming experience for learning AI may no longer be needed* Huge open questions remain about how to best train, fine-tune, and prompt LLMsTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:21]Swyx: Hey, and today we have in the remote studio, Jeremy Howard all the way from Australia. Good morning. [00:00:27]Jeremy: The remote studio, also known as my house. Good morning. Nice to see you. [00:00:32]Swyx: Nice to see you too. I'm actually very used to seeing you in your mask as a message to people, but today we're mostly audio. But thank you for doing the very important public service of COVID awareness. It was a pleasure. [00:00:46]Jeremy: It was all very annoying and frustrating and tedious, but somebody had to do it. [00:00:52]Swyx: Somebody had to do it, especially somebody with your profile. I think it really drives home the message. So we tend to introduce people for them and then ask people to fill in the blanks on the personal side. Something I did not know about you was that you graduated with a BA in philosophy from the University of Melbourne. I assumed you had a PhD. [00:01:14]Jeremy: No, I mean, I barely got through my BA because I was working 80 to 100 hour weeks at McKinsey and Company from 19 years old onwards. So I actually didn't attend any lectures in second and third year university. [00:01:35]Swyx: Well, I guess you didn't need it or you're very sort of self-driven and self-motivated. [00:01:39]Jeremy: I took two weeks off before each exam period when I was working at McKinsey. And then, I mean, I can't believe I got away with this in hindsight, I would go to all my professors and say, oh, I was meant to be in your class this semester and I didn't quite turn up. Were there any assignments I was meant to have done, whatever. I can't believe all of them let me basically have it. They basically always would say like, okay, well, if you can have this written by tomorrow, I'll accept it. So yeah, stressful way to get through university, but. [00:02:12]Swyx: Well, it shows that, I guess, you min-maxed the opportunities. That definitely was a precursor. [00:02:18]Jeremy: I mean, funnily, like in as much as I, you know, in philosophy, the things I found interesting and focused on in the little bit of time I did spend on it was ethics and cognitive science. And it's kind of really amazing that it's now come back around and those are actually genuinely useful things to know about, which I never thought would happen. [00:02:38]Swyx: A lot of, yeah, a lot of relevant conversations there. So you were a consultant for a while and then in the magical month of June 1989, you founded both Optimal Decisions and Fastmeal, which I also briefly used. So thank you for that. [00:02:53]Jeremy: Oh, good for you. Yeah. Cause I had read the statistics, which is that like 90% or something of small businesses fail. So I thought if I start two businesses, I have a higher chance. In hindsight, I was thinking of it as some kind of stochastic thing I didn't have control over, but it's a bit odd, but anyway. [00:03:10]Swyx: And then you were president and chief scientist at Kaggle, which obviously is the sort of composition platform of machine learning. And then Enlitic, where you were working on using deep learning to improve medical diagnostics and clinical decisions. Yeah. [00:03:28]Jeremy: I was actually the first company to use deep learning in medicine, so I kind of founded the field. [00:03:33]Swyx: And even now that's still like a pretty early phase. And I actually heard you on your new podcast with Tanish, where you went very, very deep into the stuff, the kind of work that he's doing, such a young prodigy at his age. [00:03:47]Jeremy: Maybe he's too old to be called a prodigy now, ex-prodigy. No, no. [00:03:51]Swyx: I think he still counts. And anyway, just to round out the bio, you have a lot more other credentials, obviously, but most recently you started Fast.ai, which is still, I guess, your primary identity with Rachel Thomas. So welcome. [00:04:05]Jeremy: Yep. [00:04:06]Swyx: Thanks to my wife. Thank you. Yeah. Doing a lot of public service there with getting people involved in AI, and I can't imagine a better way to describe it than fast, fast.ai. You teach people from nothing to stable diffusion in seven weeks or something, and that's amazing. Yeah, yeah. [00:04:22]Jeremy: I mean, it's funny, you know, when we started that, what was that, like 2016 or something, the idea that deep learning was something that you could make more accessible was generally considered stupid. Everybody knew that deep learning was a thing that you got a math or a computer science PhD, you know, there was one of five labs that could give you the appropriate skills and that you would join, yeah, basically from one of those labs, you might be able to write some papers. So yeah, the idea that normal people could use that technology to do good work was considered kind of ridiculous when we started it. And we weren't sure if it was possible either, but we kind of felt like we had to give it a go because the alternative was we were pretty sure that deep learning was on its way to becoming, you know, the most or one of the most, you know, important technologies in human history. And if the only people that could use it were a handful of computer science PhDs, that seemed like A, a big waste and B, kind of dangerous. [00:05:28]Swyx: Yeah. [00:05:29]Alessio: And, you know, well, I just wanted to know one thing on your bio that at Kaggle, you were also the top rank participant in both 2010 and 2011. So sometimes you see a lot of founders running companies that are not really in touch with the problem, but you were clearly building something that you knew a lot about, which is awesome. Talking about deep learning, you created, published a paper on ULM fit, which was kind of the predecessor to multitask learning and a lot of the groundwork that then went to into Transformers. I've read back on the paper and you turned this model, AWD LSTM, which I did the math and it was like 24 to 33 million parameters, depending on what training data set you use today. That's kind of like not even small, it's like super small. What were some of the kind of like contrarian takes that you had at the time and maybe set the stage a little bit for the rest of the audience on what was kind of like the state of the art, so to speak, at the time and what people were working towards? [00:06:32]Jeremy: Yeah, the whole thing was a contrarian take, you know. So okay, so we started Fast.ai, my wife and I, and we thought, yeah, so we're trying to think, okay, how do we make it more accessible? So when we started thinking about it, it was probably 2015 and then 2016, we started doing something about it. Why is it inaccessible? Okay, well, A, no one knows how to do it other than a few number of people. And then when we asked those few number of people, well, how do you actually get good results? They would say like, oh, it's like, you know, a box of tricks that aren't published. So you have to join one of the labs and learn the tricks. So a bunch of unpublished tricks, not much software around, but thankfully there was Theano and rappers and particularly Lasagna, the rapper, but yeah, not much software around, not much in the way of data sets, you know, very hard to get started in terms of the compute. Like how do you get that set up? So yeah, no, everything was kind of inaccessible. And you know, as we started looking into it, we had a key insight, which was like, you know what, most of the compute and data for image recognition, for example, we don't need to do it. You know, there's this thing which nobody knows about, nobody talks about called transfer learning, where you take somebody else's model, where they already figured out like how to detect edges and gradients and corners and text and whatever else, and then you can fine tune it to do the thing you want to do. And we thought that's the key. That's the key to becoming more accessible in terms of compute and data requirements. So when we started Fast.ai, we focused from day one on transfer learning. Lesson one, in fact, was transfer learning, literally lesson one, something not normally even mentioned in, I mean, there wasn't much in the way of courses, you know, the courses out there were PhD programs that had happened to have recorded their lessons and they would rarely mention it at all. We wanted to show how to do four things that seemed really useful. You know, work with vision, work with tables of data, work with kind of recommendation systems and collaborative filtering and work with text, because we felt like those four kind of modalities covered a lot of the stuff that, you know, are useful in real life. And no one was doing anything much useful with text. Everybody was talking about word2vec, you know, like king plus queen minus woman and blah, blah, blah. It was like cool experiments, but nobody's doing anything like useful with it. NLP was all like lemmatization and stop words and topic models and bigrams and SPMs. And it was really academic and not practical. But I mean, to be honest, I've been thinking about this crazy idea for nearly 30 years since I had done cognitive science at university, where we talked a lot about the CELS Chinese room experiment. This idea of like, what if there was somebody that could kind of like, knew all of the symbolic manipulations required to answer questions in Chinese, but they didn't speak Chinese and they were kind of inside a room with no other way to talk to the outside world other than taking in slips of paper with Chinese written on them and then they do all their rules and then they pass back a piece of paper with Chinese back. And this room with a person in is actually fantastically good at answering any question you give them written in Chinese. You know, do they understand Chinese? And is this, you know, something that's intelligently working with Chinese? Ever since that time, I'd say the most thought, to me, the most thoughtful and compelling philosophical response is yes. You know, intuitively it feels like no, because that's just because we can't imagine such a large kind of system. But you know, if it looks like a duck and acts like a duck, it's a duck, you know, or to all intents and purposes. And so I always kind of thought, you know, so this is basically a kind of analysis of the limits of text. And I kind of felt like, yeah, if something could ingest enough text and could use the patterns it saw to then generate text in response to text, it could appear to be intelligent, you know. And whether that means it is intelligent or not is a different discussion and not one I find very interesting. Yeah. And then when I came across neural nets when I was about 20, you know, what I learned about the universal approximation theorem and stuff, and I started thinking like, oh, I wonder if like a neural net could ever get big enough and take in enough data to be a Chinese room experiment. You know, with that background and this kind of like interest in transfer learning, you know, I'd been thinking about this thing for kind of 30 years and I thought like, oh, I wonder if we're there yet, you know, because we have a lot of text. Like I can literally download Wikipedia, which is a lot of text. And I thought, you know, how would something learn to kind of answer questions or, you know, respond to text? And I thought, well, what if we used a language model? So language models are already a thing, you know, they were not a popular or well-known thing, but they were a thing. But language models exist to this idea that you could train a model to fill in the gaps. Or actually in those days it wasn't fill in the gaps, it was finish a string. And in fact, Andrej Karpathy did his fantastic RNN demonstration from this at a similar time where he showed like you can have it ingest Shakespeare and it will generate something that looks a bit like Shakespeare. I thought, okay, so if I do this at a much bigger scale, using all of Wikipedia, what would it need to be able to do to finish a sentence in Wikipedia effectively, to do it quite accurately quite often? I thought, geez, it would actually have to know a lot about the world, you know, it'd have to know that there is a world and that there are objects and that objects relate to each other through time and cause each other to react in ways and that causes proceed effects and that, you know, when there are animals and there are people and that people can be in certain positions during certain timeframes and then you could, you know, all that together, you can then finish a sentence like this was signed into law in 2016 by US President X and it would fill in the gap, you know. So that's why I tried to create what in those days was considered a big language model trained on the entirety on Wikipedia, which is that was, you know, a bit unheard of. And my interest was not in, you know, just having a language model. My interest was in like, what latent capabilities would such a system have that would allow it to finish those kind of sentences? Because I was pretty sure, based on our work with transfer learning and vision, that I could then suck out those latent capabilities by transfer learning, you know, by fine-tuning it on a task data set or whatever. So we generated this three-step system. So step one was train a language model on a big corpus. Step two was fine-tune a language model on a more curated corpus. And step three was further fine-tune that model on a task. And of course, that's what everybody still does today, right? That's what ChatGPT is. And so the first time I tried it within hours, I had a new state-of-the-art academic result on IMDB. And I was like, holy s**t, it does work. And so you asked, to what degree was this kind of like pushing against the established wisdom? You know, every way. Like the reason it took me so long to try it was because I asked all my friends in NLP if this could work. And everybody said, no, it definitely won't work. It wasn't like, oh, maybe. Everybody was like, it definitely won't work. NLP is much more complicated than vision. Language is a much more vastly complicated domain. You know, and you've got problems like the grounding problem. We know from like philosophy and theory of mind that it's actually impossible for it to work. So yeah, so don't waste your time. [00:15:10]Alessio: Jeremy, had people not tried because it was like too complicated to actually get the data and like set up the training? Or like, were people just lazy and kind of like, hey, this is just not going to work? [00:15:20]Jeremy: No, everybody wasn't lazy. So like, so the person I thought at that time who, you know, there were two people I thought at that time, actually, who were the strongest at language models were Stephen Merity and Alec Radford. And at the time I didn't know Alec, but I, after we had both, after I'd released ULM Fit and he had released GPT, I organized a chat for both of us with Kate Metz in the New York Times. And Kate Metz answered, sorry, and Alec answered this question for Kate. And Kate was like, so how did, you know, GPT come about? And he said, well, I was pretty sure that pre-training on a general large corpus wouldn't work. So I hadn't tried it. And then I read ULM Fit and turns out it did work. And so I did it, you know, bigger and it worked even better. And similar with, with Stephen, you know, I asked Stephen Merity, like, why don't we just find, you know, take your AWD-ASTLM and like train it on all of Wikipedia and fine tune it? And he's kind of like, well, I don't think that's going to really lie. Like two years before I did a very popular talk at KDD, the conference where everybody in NLP was in the audience. I recognized half the faces, you know, and I told them all this, I'm sure transfer learning is the key. I'm sure ImageNet, you know, is going to be an NLP thing as well. And, you know, everybody was interested and people asked me questions afterwards and, but not just, yeah, nobody followed up because everybody knew that it didn't work. I mean, even like, so we were scooped a little bit by Dai and Lee, Kwok Lee at Google. They had, they had, I already, I didn't even realize this, which is a bit embarrassing. They had already done a large language model and fine tuned it. But again, they didn't create a general purpose, large language model on a general purpose corpus. They only ever tested a domain specific corpus. And I haven't spoken to Kwok actually about that, but I assume that the reason was the same. It probably just didn't occur to them that the general approach could work. So maybe it was that kind of 30 years of mulling over the, the cell Chinese room experiment that had convinced me that it probably would work. I don't know. Yeah. [00:17:48]Alessio: Interesting. I just dug up Alec announcement tweet from 2018. He said, inspired by Cobe, Elmo, and Yola, I'm fit. We should have a single transformer language model can be fine tuned to a wide variety. It's interesting because, you know, today people think of AI as the leader, kind of kind of like the research lab pushing forward the field. What was that at the time? You know, like kind of like going back five years, people think of it as an overnight success, but obviously it took a while. [00:18:16]Swyx: Yeah. Yeah. [00:18:17]Jeremy: No, I mean, absolutely. And I'll say like, you know, it's interesting that it mentioned Elmo because in some ways that was kind of diametrically opposed to, to ULM fit. You know, there was these kind of like, so there was a lot of, there was a lot of activity at the same time as ULM fits released. So there was, um, so before it, as Brian McCann, I think at Salesforce had come out with this neat model that did a kind of multitask learning, but again, they didn't create a general fine tune language model first. There was Elmo, um, which I think was a lip, you know, actually quite a few months after the first ULM fit example, I think. Um, but yeah, there was a bit of this stuff going on. And the problem was everybody was doing, and particularly after GPT came out, then everybody wanted to focus on zero shot and few shot learning. You know, everybody hated fine tuning. Everybody hated transfer learning. And like, I literally did tours trying to get people to start doing transfer learning and people, you know, nobody was interested, particularly after GPT showed such good results with zero shot and few shot learning. And so I actually feel like we kind of went backwards for years and, and not to be honest, I mean, I'm a bit sad about this now, but I kind of got so disappointed and dissuaded by like, it felt like these bigger lab, much bigger labs, you know, like fast AI had only ever been just me and Rachel were getting all of this attention for an approach I thought was the wrong way to do it. You know, I was convinced was the wrong way to do it. And so, yeah, for years people were really focused on getting better at zero shot and few shots and it wasn't until, you know, this key idea of like, well, let's take the ULM fit approach, but for step two, rather than fine tuning on a kind of a domain corpus, let's fine tune on an instruction corpus. And then in step three, rather than fine tuning on a reasonably specific task classification, let's fine tune on a, on a RLHF task classification. And so that was really, that was really key, you know, so I was kind of like out of the NLP field for a few years there because yeah, it just felt like, I don't know, pushing uphill against this vast tide, which I was convinced was not the right direction, but who's going to listen to me, you know, cause I, as you said, I don't have a PhD, not at a university, or at least I wasn't then. I don't have a big set of computers to fine tune huge transformer models. So yeah, it was definitely difficult. It's always been hard. You know, it's always been hard. Like I've always been somebody who does not want to build stuff on lots of big computers because most people don't have lots of big computers and I hate creating stuff that most people can't use, you know, and also stuff that's created on lots of big computers has always been like much more media friendly. So like, it might seem like a recent thing, but actually throughout my 30 years in data science, the attention's always been on, you know, the big iron results. So when I first started, everybody was talking about data warehouses and it was all about Teradata and it'd be like, oh, this big bank has this huge room full of computers and they have like terabytes of data available, you know, at the press of a button. And yeah, that's always what people want to talk about, what people want to write about. And then of course, students coming out of their PhDs and stuff, that's where they want to go work because that's where they read about. And to me, it's a huge distraction, you know, because like I say, most people don't have unlimited compute and I want to help most people, not the small subset of the most well-off people. [00:22:16]Alessio: That's awesome. And it's great to hear, you do such a great job educating that a lot of times you're not telling your own story, you know? So I love this conversation. And the other thing before we jump into Fast.AI, actually, a lot of people that I know, they run across a new architecture and whatnot, they're like, I got to start a company and raise a bunch of money and do all of this stuff. And say, you were like, I want everybody to have access to this. Why was that the case for you? Was it because you already had a successful venture in like FastMail and you were more interested in that? What was the reasoning? [00:22:52]Jeremy: It's a really good question. So I guess the answer is yes, that's the reason why. So when I was a teenager, I thought it would be really cool to like have my own company. You know, I didn't know the word startup. I didn't know the word entrepreneur. I didn't know the word VC. And I didn't really know what any of those things were really until after we started Kaggle, to be honest. Even the way it started to what we now call startups. I just thought they were just small businesses. You know, they were just companies. So yeah, so those two companies were FastMail and Optimal Decisions. FastMail was the first kind of synchronized email provider for non-businesses. So something you can get your same email at home, on your laptop, at work, on your phone, whatever. And then Optimal Decisions invented a new approach to insurance pricing. Something called profit-optimized insurance pricing. So I saw both of those companies, you know, after 10 years. And at that point, I had achieved the thing that as a teenager I had wanted to do. You know, it took a lot longer than it should have because I spent way longer in management consulting than I should have because I got caught up in that stupid rat race. But, you know, eventually I got there and I remember my mom saying to me, you must be so proud. You know, because she remembered my dream. She's like, you've done it. And I kind of reflected and I was like, I'm not proud at all. You know, like people quite liked FastMail. You know, it's quite nice to have synchronized email. It probably would have happened anyway. Yeah, I'm certainly not proud that I've helped some insurance companies suck more money out of their customers. Yeah, no, I'm not proud. You know, it's actually, I haven't really helped the world very much. You know, maybe in the insurance case I've made it a little bit worse. I don't know. So, yeah, I was determined to not waste more years of my life doing things, working hard to do things which I could not be reasonably sure would have a lot of value. So, you know, I took some time off. I wasn't sure if I'd ever work again, actually. I didn't particularly want to, because it felt like, yeah, it felt like such a disappointment. And, but, you know, and I didn't need to. I had enough money. Like, I wasn't super rich, but I had enough money. I didn't need to work. And I certainly recognized that amongst the other people I knew who had enough money that they didn't need to work, they all worked ridiculously hard, you know, and constantly put themselves in extremely stressful situations. And I thought, I don't want to be one of those idiots who's tied to, you know, buying a bigger plane than the next guy or whatever. You know, Kaggle came along and I mainly kind of did that just because it was fun and interesting to hang out with interesting people. But, you know, with Fast.ai in particular, you know, Rachel and I had a very explicit, you know, long series of conversations over a long period of time about like, well, how can we be the most helpful to society as a whole, and particularly to those people who maybe need more help, you know? And so we definitely saw the world going in a potentially pretty dystopian direction if the world's most powerful technology was controlled by a small group of elites. So we thought, yeah, we should focus on trying to help that not happen. You know, sadly, it looks like it still is likely to happen. But I mean, I feel like we've helped make it a little bit less likely. So we've done our bit. [00:26:39]Swyx: You've shown that it's possible. And I think your constant advocacy, your courses, your research that you publish, you know, just the other day you published a finding on, you know, learning that I think is still something that people are still talking about quite a lot. I think that that is the origin story of a lot of people who are going to be, you know, little Jeremy Howards, furthering your mission with, you know, you don't have to do everything by yourself is what I'm saying. No, definitely. Definitely. [00:27:10]Jeremy: You know, that was a big takeaway from like, analytic was analytic. It definitely felt like we had to do everything ourselves. And I kind of, I wanted to solve medicine. I'll say, yeah, okay, solving medicine is actually quite difficult. And I can't do it on my own. And there's a lot of other things I'd like to solve, and I can't do those either. So that was definitely the other piece was like, yeah, you know, can we create an army of passionate domain experts who can change their little part of the world? And that's definitely happened. Like I find nowadays, at least half the time, probably quite a bit more that I get in contact with somebody who's done really interesting work in some domain. Most of the time I'd say, they say, yeah, I got my start with fast.ai. So it's definitely, I can see that. And I also know from talking to folks at places like Amazon and Adobe and stuff, which, you know, there's lots of alumni there. And they say, oh my God, I got here. And like half of the people are fast.ai alumni. So it's fantastic. [00:28:13]Swyx: Yeah. [00:28:14]Jeremy: Actually, Andre Kapathy grabbed me when I saw him at NeurIPS a few years ago. And he was like, I have to tell you, thanks for the fast.ai courses. When people come to Tesla and they need to know more about deep learning, we always send them to your course. And the OpenAI Scholars Program was doing the same thing. So it's kind of like, yeah, it's had a surprising impact, you know, that's just one of like three things we do is the course, you know. [00:28:40]Swyx: Yes. [00:28:40]Jeremy: And it's only ever been at most two people, either me and Rachel or me and Sylvia nowadays, it's just me. So yeah, I think it shows you don't necessarily need a huge amount of money and a huge team of people to make an impact. [00:28:56]Swyx: Yeah. So just to reintroduce fast.ai for people who may not have dived into it much, there is the courses that you do. There is the library that is very well loved. And I kind of think of it as a nicer layer on top of PyTorch that people should start with by default and use it as the basis for a lot of your courses. And then you have like NBDev, which I don't know, is that the third one? [00:29:27]Jeremy: Oh, so the three areas were research, software, and courses. [00:29:32]Swyx: Oh, sorry. [00:29:32]Jeremy: So then in software, you know, fast.ai is the main thing, but NBDev is not far behind. But then there's also things like FastCore, GHAPI, I mean, dozens of open source projects that I've created and some of them have been pretty popular and some of them are still a little bit hidden, actually. Some of them I should try to do a better job of telling people about. [00:30:01]Swyx: What are you thinking about? Yeah, what's on the course of my way? Oh, I don't know, just like little things. [00:30:04]Jeremy: Like, for example, for working with EC2 and AWS, I created a FastEC2 library, which I think is like way more convenient and nice to use than anything else out there. And it's literally got a whole autocomplete, dynamic autocomplete that works both on the command line and in notebooks that'll like auto-complete your instance names and everything like that. You know, just little things like that. I try to make like, when I work with some domain, I try to make it like, I want to make it as enjoyable as possible for me to do that. So I always try to kind of like, like with GHAPI, for example, I think that GitHub API is incredibly powerful, but I didn't find it good to work with because I didn't particularly like the libraries that are out there. So like GHAPI, like FastEC2, it like autocompletes both at the command line or in a notebook or whatever, like literally the entire GitHub API. The entire thing is like, I think it's like less than 100K of code because it actually, as far as I know, the only one that grabs it directly from the official open API spec that GitHub produces. And like if you're in GitHub and you just type an API, you know, autocomplete API method and hit enter, it prints out the docs with brief docs and then gives you a link to the actual documentation page. You know, GitHub Actions, I can write now in Python, which is just so much easier than writing them in TypeScript and stuff. So, you know, just little things like that. [00:31:40]Swyx: I think that's an approach which more developers took to publish some of their work along the way. You described the third arm of FastAI as research. It's not something I see often. Obviously, you do do some research. And how do you run your research? What are your research interests? [00:31:59]Jeremy: Yeah, so research is what I spend the vast majority of my time on. And the artifacts that come out of that are largely software and courses. You know, so to me, the main artifact shouldn't be papers because papers are things read by a small exclusive group of people. You know, to me, the main artifacts should be like something teaching people, here's how to use this insight and here's software you can use that builds it in. So I think I've only ever done three first-person papers in my life, you know, and none of those are ones I wanted to do. You know, they were all ones that, like, so one was ULM Fit, where Sebastian Ruder reached out to me after seeing the course and said, like, you have to publish this as a paper, you know. And he said, I'll write it. He said, I want to write it because if I do, I can put it on my PhD and that would be great. And it's like, okay, well, I want to help you with your PhD. And that sounds great. So like, you know, one was the masks paper, which just had to exist and nobody else was writing it. And then the third was the Fast.ai library paper, which again, somebody reached out and said, please, please write this. We will waive the fee for the journal and everything and actually help you get it through publishing and stuff. So yeah, so I don't, other than that, I've never written a first author paper. So the research is like, well, so for example, you know, Dawn Bench was a competition, which Stanford ran a few years ago. It was kind of the first big competition of like, who can train neural nets the fastest rather than the most accurate. And specifically it was who can train ImageNet the fastest. And again, this was like one of these things where it was created by necessity. So Google had just released their TPUs. And so I heard from my friends at Google that they had put together this big team to smash Dawn Bench so that they could prove to people that they had to use Google Cloud and use their TPUs and show how good their TPUs were. And we kind of thought, oh s**t, this would be a disaster if they do that, because then everybody's going to be like, oh, deep learning is not accessible. [00:34:20]Swyx: You know, to actually be good at it, [00:34:21]Jeremy: you have to be Google and you have to use special silicon. And so, you know, we only found out about this 10 days before the competition finished. But, you know, we basically got together an emergency bunch of our students and Rachel and I and sat for the next 10 days and just tried to crunch through and try to use all of our best ideas that had come from our research. And so particularly progressive resizing, just basically train mainly on small things, train on non-square things, you know, stuff like that. And so, yeah, we ended up winning, thank God. And so, you know, we turned it around from being like, like, oh s**t, you know, this is going to show that you have to be Google and have TPUs to being like, oh my God, even the little guy can do deep learning. So that's an example of the kind of like research artifacts we do. And yeah, so all of my research is always, how do we do more with less, you know? So how do we get better results with less data, with less compute, with less complexity, with less education, you know, stuff like that. So ULM fits obviously a good example of that. [00:35:37]Swyx: And most recently you published, can LLMs learn from a single example? Maybe could you tell the story a little bit behind that? And maybe that goes a little bit too far into the learning of very low resource, the literature. [00:35:52]Jeremy: Yeah, yeah. So me and my friend, Jono Whittaker, basically had been playing around with this fun Kaggle competition, which is actually still running as we speak, which is, can you create a model which can answer multiple choice questions about anything that's in Wikipedia? And the thing that makes it interesting is that your model has to run on Kaggle within nine hours. And Kaggle's very, very limited. So you've only got 14 gig RAM, only two CPUs, and a small, very old GPU. So this is cool, you know, if you can do well at this, then this is a good example of like, oh, you can do more with less. So yeah, Jono and I were playing around with fine tuning, of course, transfer learning, pre-trained language models. And we saw this, like, so we always, you know, plot our losses as we go. So here's another thing we created. Actually, Sylvain Guuger, when he worked with us, created called fast progress, which is kind of like TQEDM, but we think a lot better. So we look at our fast progress curves, and they kind of go down, down, down, down, down, down, down, a little bit, little bit, little bit. And then suddenly go clunk, and they drop. And then down, down, down, down, down a little bit, and then suddenly clunk, they drop. We're like, what the hell? These clunks are occurring at the end of each epoch. So normally in deep learning, this would be, this is, you know, I've seen this before. It's always been a bug. It's always turned out that like, oh, we accidentally forgot to turn on eval mode during the validation set. So I was actually learning then, or, oh, we accidentally were calculating moving average statistics throughout the epoch. So, you know, so it's recently moving average or whatever. And so we were using Hugging Face Trainer. So, you know, I did not give my friends at Hugging Face the benefit of the doubt. I thought, oh, they've fucked up Hugging Face Trainer, you know, idiots. Well, you'll use the Fast AI Trainer instead. So we switched over to Learner. We still saw the clunks and, you know, that's, yeah, it shouldn't really happen because semantically speaking in the epoch, isn't like, it's not a thing, you know, like nothing happens. Well, nothing's meant to happen when you go from ending one epoch to starting the next one. So there shouldn't be a clunk, you know. So I kind of asked around on the open source discords. That's like, what's going on here? And everybody was just like, oh, that's just what, that's just what these training curves look like. Those all look like that. Don't worry about it. And I was like, oh, are you all using Trainer? Yes. Oh, well, there must be some bug with Trainer. And I was like, well, we also saw it in Learner [00:38:42]Swyx: and somebody else is like, [00:38:42]Jeremy: no, we've got our own Trainer. We get it as well. They're just like, don't worry about it. It's just something we see. It's just normal. [00:38:48]Swyx: I can't do that. [00:38:49]Jeremy: I can't just be like, here's something that's like in the previous 30 years of neural networks, nobody ever saw it. And now suddenly we see it. [00:38:57]Swyx: So don't worry about it. [00:38:59]Jeremy: I just, I have to know why. [00:39:01]Swyx: Can I clarify? This is, was everyone that you're talking to, were they all seeing it for the same dataset or in different datasets? [00:39:08]Jeremy: Different datasets, different Trainers. They're just like, no, this is just, this is just what it looks like when you fine tune language models. Don't worry about it. You know, I hadn't seen it before, but I'd been kind of like, as I say, I, you know, I kept working on them for a couple of years after ULM fit. And then I kind of moved on to other things, partly out of frustration. So I hadn't been fine tuning, you know, I mean, Lama's only been out for a few months, right? But I wasn't one of those people who jumped straight into it, you know? So I was relatively new to the kind of Lama fine tuning world, where else these guys had been, you know, doing it since day one. [00:39:49]Swyx: It was only a few months ago, [00:39:51]Jeremy: but it's still quite a bit of time. So, so yeah, they're just like, no, this is all what we see. [00:39:56]Swyx: Don't worry about it. [00:39:56]Jeremy: So yeah, I, I've got a very kind of like, I don't know, I've just got this brain where I have to know why things are. And so I kind of, I ask people like, well, why, why do you think it's happening? And they'd be like, oh, it would pretty obviously, cause it's like memorize the data set. It's just like, that can't be right. It's only seen it once. Like, look at this, the loss has dropped by 0.3, 0.3, which is like, basically it knows the answer. And like, no, no, it's just, it is, it's just memorize the data set. So yeah. So look, Jono and I did not discover this and Jono and I did not come up with a hypothesis. You know, I guess we were just the ones, I guess, who had been around for long enough to recognize that like, this, this isn't how it's meant to work. And so we, we, you know, and so we went back and like, okay, let's just run some experiments, you know, cause nobody seems to have actually published anything about this. [00:40:51]Well, not quite true.Some people had published things, but nobody ever actually stepped back and said like, what the hell, you know, how can this be possible? Is it possible? Is this what's happening? And so, yeah, we created a bunch of experiments where we basically predicted ahead of time. It's like, okay, if this hypothesis is correct, that it's memorized in the training set, then we ought to see blah, under conditions, blah, but not under these conditions. And so we ran a bunch of experiments and all of them supported the hypothesis that it was memorizing the data set in a single thing at once. And it's a pretty big data set, you know, which in hindsight, it's not totally surprising because the theory, remember, of the ULMFiT theory was like, well, it's kind of creating all these latent capabilities to make it easier for it to predict the next token. So if it's got all this kind of latent capability, it ought to also be really good at compressing new tokens because it can immediately recognize it as like, oh, that's just a version of this. So it's not so crazy, you know, but it is, it requires us to rethink everything because like, and nobody knows like, okay, so how do we fine tune these things? Because like, it doesn't even matter. Like maybe it's fine. Like maybe it's fine that it's memorized the data set after one go and you do a second go and okay, the validation loss is terrible because it's now really overconfident. [00:42:20]Swyx: That's fine. [00:42:22]Jeremy: Don't, you know, don't, I keep telling people, don't track validation loss, track validation accuracy because at least that will still be useful. Just another thing that's got lost since ULMFiT, nobody tracks accuracy of language models anymore. But you know, it'll still keep learning and it does, it does keep improving. But is it worse? You know, like, is it like, now that it's kind of memorized it, it's probably getting a less strong signal, you know, I don't know. So I still don't know how to fine tune language models properly and I haven't found anybody who feels like they do, like nobody really knows whether this memorization thing is, it's probably a feature in some ways. It's probably some things that you can do usefully with it. It's probably, yeah, I have a feeling it's messing up training dynamics as well. [00:43:13]Swyx: And does it come at the cost of catastrophic forgetting as well, right? Like, which is the other side of the coin. [00:43:18]Jeremy: It does to some extent, like we know it does, like look at Code Llama, for example. So Code Llama was a, I think it was like a 500 billion token fine tuning of Llama 2 using code. And also pros about code that Meta did. And honestly, they kind of blew it because Code Llama is good at coding, but it's bad at everything else, you know, and it used to be good. Yeah, I was pretty sure it was like, before they released it, me and lots of people in the open source discords were like, oh my God, you know, we know this is coming, Jan Lukinsk saying it's coming. I hope they kept at least like 50% non-code data because otherwise it's going to forget everything else. And they didn't, only like 0.3% of their epochs were non-code data. So it did, it forgot everything else. So now it's good at code and it's bad at everything else. So we definitely have catastrophic forgetting. It's fixable, just somebody has to do, you know, somebody has to spend their time training a model on a good mix of data. Like, so, okay, so here's the thing. Even though I originally created three-step approach that everybody now does, my view is it's actually wrong and we shouldn't use it. [00:44:36]Jeremy: And that's because people are using it in a way different to why I created it. You know, I created it thinking the task-specific models would be more specific. You know, it's like, oh, this is like a sentiment classifier as an example of a task, you know, but the tasks now are like a, you know, RLHF, which is basically like answer questions that make people feel happy about your answer. So that's a much more general task and it's a really cool approach. And so we see, for example, RLHF also breaks models like, you know, like GPT-4, RLHDEFT, we know from kind of the work that Microsoft did, you know, the pre, the earlier, less aligned version was better. And these are all kind of examples of catastrophic forgetting. And so to me, the right way to do this is to fine-tune language models, is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training. And pre-training is something where from the very start, you try to include all the kinds of data that you care about, all the kinds of problems that you care about, instructions, exercises, code, general purpose document completion, whatever. And then as you train, you gradually curate that, you know, you gradually make that higher and higher quality and more and more specific to the kinds of tasks you want it to do. But you never throw away any data. You always keep all of the data types there in reasonably high quantities. You know, maybe the quality filter, you stop training on low quality data, because that's probably fine to forget how to write badly, maybe. So yeah, that's now my view, is I think ULM fit is the wrong approach. And that's why we're seeing a lot of these, you know, so-called alignment tacks and this view of like, oh, a model can't both code and do other things. And, you know, I think it's actually because people are training them wrong. [00:46:47]Swyx: Yeah, well, I think you have a clear [00:46:51]Alessio: anti-laziness approach. I think other people are not as good hearted, you know, they're like, [00:46:57]Swyx: hey, they told me this thing works. [00:46:59]Alessio: And if I release a model this way, people will appreciate it, I'll get promoted and I'll kind of make more money. [00:47:06]Jeremy: Yeah, and it's not just money. It's like, this is how citations work most badly, you know, so if you want to get cited, you need to write a paper that people in your field recognize as an advancement on things that we know are good. And so we've seen this happen again and again. So like I say, like zero shot and few shot learning, everybody was writing about that. Or, you know, with image generation, everybody just was writing about GANs, you know, and I was trying to say like, no, GANs are not the right approach. You know, and I showed again through research that we demonstrated in our videos that you can do better than GANs, much faster and with much less data. And nobody cared because again, like if you want to get published, you write a GAN paper that slightly improves this part of GANs and this tiny field, you'll get published, you know. So it's, yeah, it's not set up for real innovation. It's, you know, again, it's really helpful for me, you know, I have my own research lab with nobody telling me what to do and I don't even publish. So it doesn't matter if I get citations. And so I just write what I think actually matters. I wish there was, and, you know, and actually places like OpenAI, you know, the researchers there can do that as well. It's a shame, you know, I wish there was more academic, open venues in which people can focus on like genuine innovation. [00:48:38]Swyx: Twitter, which is unironically has become a little bit of that forum. I wanted to follow up on one thing that you mentioned, which is that you checked around the open source discords. I don't know if it's too, I don't know if it's a pusher to ask like what discords are lively or useful right now. I think that something I definitely felt like I missed out on was the early days of Luther AI, which is a very hard bit. And, you know, like what is the new Luther? And you actually shouted out the alignment lab AI discord in your blog post. And that was the first time I even knew, like I saw them on Twitter, never knew they had a discord, never knew that there was actually substantive discussions going on in there and that you were an active member of it. Okay, yeah. [00:49:23]Jeremy: And then even then, if you do know about that and you go there, it'll look like it's totally dead. And that's because unfortunately, nearly all the discords, nearly all of the conversation happens in private channels. You know, and that's, I guess. [00:49:35]Swyx: How does someone get into that world? Because it's obviously very, very instructive, right? [00:49:42]Jeremy: You could just come to the first AI discord, which I'll be honest with you, it's less bustling than some of the others, but it's not terrible. And so like, at least, to be fair, one of Emma's bustling channels is private. [00:49:57]Swyx: I guess. [00:49:59]Jeremy: So I'm just thinking. [00:50:01]Swyx: It's just the nature of quality discussion, right? Yeah, I guess when I think about it, [00:50:05]Jeremy: I didn't have any private discussions on our discord for years, but there was a lot of people who came in with like, oh, I just had this amazing idea for AGI. If you just thought about like, if you imagine that AI is a brain, then we, you know, this just, I don't want to talk about it. You know, I don't want to like, you don't want to be dismissive or whatever. And it's like, oh, well, that's an interesting comment, but maybe you should like, try training some models first to see if that aligns with your intuition. Like, oh, but how could I possibly learn? It's like, well, we have a course, just actually spend time learning. Like, you know, anyway. And there's like, okay, I know the people who always have good answers there. And so I created a private channel and put them all in it. And I got to admit, that's where I post more often because there's much less, you know, flight of fancy views about how we could solve AGI, blah, blah, blah. So there is a bit of that. But having said that, like, I think the bar is pretty low. Like if you join a Discord and you can hit the like participants or community or whatever button, you can see who's in it. And then you'll see at the top, who the admins or moderators or people in the dev role are. And just DM one of them and say like, oh, here's my GitHub. Well, here's some blog posts I wrote. You know, I'm interested in talking about this, you know, can I join the private channels? And I've never heard of anybody saying no. I will say, you know, Alutha's all pretty open. So you can do the Alutha Discord still. You know, one problem with the Alutha Discord is it's been going on for so long that it's like, it's very inside baseball. It's quite hard to get started. Yeah. Carpa AI looks, I think it's all open. That's just less stability. That's more accessible. [00:52:03]Swyx: Yeah. [00:52:04]Jeremy: There's also just recently, now it's research that does like the Hermes models and data set just opened. They've got some private channels, but it's pretty open, I think. You mentioned Alignment Lab, that one it's all the interesting stuff is on private channels. So just ask. If you know me, ask me, cause I've got admin on that one. There's also, yeah, OS Skunkworks, OS Skunkworks AI is a good Discord, which I think it's open. So yeah, they're all pretty good. [00:52:40]Swyx: I don't want you to leak any, you know, Discords that don't want any publicity, but this is all helpful. [00:52:46]Jeremy: We all want people, like we all want people. [00:52:49]Swyx: We just want people who like, [00:52:51]Jeremy: want to build stuff, rather than people who, and like, it's fine to not know anything as well, but if you don't know anything, but you want to tell everybody else what to do and how to do it, that's annoying. If you don't know anything and want to be told like, here's a really small kind of task that as somebody who doesn't know anything is going to take you a really long time to do, but it would still be helpful. Then, and then you go and do it. That would be great. The truth is, yeah, [00:53:19]Swyx: like, I don't know, [00:53:20]Jeremy: maybe 5% of people who come in with great enthusiasm and saying that they want to learn and they'll do anything. [00:53:25]Swyx: And then somebody says like, [00:53:25]Jeremy: okay, here's some work you can do. Almost nobody does that work. So if you're somebody who actually does the work and follows up, you will massively stand out. That's an extreme rarity. And everybody will then want to help you do more work. [00:53:41]Swyx: So yeah. [00:53:41]Jeremy: So just, yeah, just do work and people will want to support you. [00:53:47]Alessio: Our Discord used to be referral only for a long time. We didn't have a public invite and then we opened it and they're kind of like channel gating. Yeah. A lot of people just want to do, I remember it used to be like, you know, a forum moderator. [00:54:00]Swyx: It's like people just want to do [00:54:01]Alessio: like drive-by posting, [00:54:03]Swyx: you know, and like, [00:54:03]Alessio: they don't want to help the community. They just want to get their question answered. [00:54:07]Jeremy: I mean, the funny thing is our forum community does not have any of that garbage. You know, there's something specific about the low latency thing where people like expect an instant answer. And yeah, we're all somehow in a forum thread where they know it's like there forever. People are a bit more thoughtful, but then the forums are less active than they used to be because Discord has got more popular, you know? So it's all a bit of a compromise, you know, running a healthy community is, yeah, it's always a bit of a challenge. All right, we got so many more things [00:54:47]Alessio: we want to dive in, but I don't want to keep you here for hours. [00:54:50]Swyx: This is not the Lex Friedman podcast [00:54:52]Alessio: we always like to say. One topic I would love to maybe chat a bit about is Mojo, modular, you know, CrystalLiner, not many of you on the podcast. So we want to spend a little time there. You recently did a hacker's guide to language models and you ran through everything from quantized model to like smaller models, larger models, and all of that. But obviously modular is taking its own approach. Yeah, what got you excited? I know you and Chris have been talking about this for like years and a lot of the ideas you had, so. [00:55:23]Jeremy: Yeah, yeah, yeah, yeah, no, absolutely. So I met Chris, I think it was at the first TensorFlow Dev Summit. And I don't think he had even like, I'm not sure if he'd even officially started his employment with Google at that point. So I don't know, you know, certainly nothing had been mentioned. So I, you know, I admired him from afar with LLVM and Swift and whatever. And so I saw him walk into the courtyard at Google. It's just like, oh s**t, man, that's Chris Latner. I wonder if he would lower his standards enough to talk to me. Well, worth a try. So I caught up my courage because like nobody was talking to him. He looked a bit lost and I wandered over and it's like, oh, you're Chris Latner, right? It's like, what are you doing here? What are you doing here? And I was like, yeah, yeah, yeah. It's like, oh, I'm Jeremy Howard. It's like, oh, do you do some of this AI stuff? And I was like, yeah, yeah, I like this AI stuff. Are you doing AI stuff? It's like, well, I'm thinking about starting to do some AI stuff. Yeah, I think it's going to be cool. And it's like, wow. So like, I spent the next half hour just basically brain dumping all the ways in which AI was stupid to him. And he listened patiently. And I thought he probably wasn't even remember or care or whatever. But yeah, then I kind of like, I guess I re-caught up with him a few months later. And it's like, I've been thinking about everything you said in that conversation. And he like narrated back his response to every part of it, projects he was planning to do. And it's just like, oh, this dude follows up. Holy s**t. And I was like, wow, okay. And he was like, yeah, so we're going to create this new thing called Swift for TensorFlow. And it's going to be like, it's going to be a compiler with auto differentiation built in. And blah, blah, blah. And I was like, why would that help? [00:57:10]Swyx: You know, why would you? [00:57:10]Jeremy: And he was like, okay, with a compiler during the forward pass, you don't have to worry about saving context, you know, because a lot will be optimized in the backward. But I was like, oh my God. Because I didn't really know much about compilers. You know, I spent enough to kind of like, understand the ideas, but it hadn't occurred to me that a compiler basically solves a lot of the problems we have as end users. I was like, wow, that's amazing. Okay, you do know, right, that nobody's going to use this unless it's like usable. It's like, yeah, I know, right. So I was thinking you should create like a fast AI for this. So, okay, but I don't even know Swift. And he was like, well, why don't you start learning it? And if you have any questions, ask me. It's just like, holy s**t. Like, not only has Chris Latner lowered his standards enough to talk to me, but he's offering me personal tutoring on the programming language that he made. So I was just like, I'm not g

Leveraging AI
34 | 330% Growth with only 33% More Resources - The Magic of AI Business Automation with Kieran Gilmurray, AI business automation expert

Leveraging AI

Play Episode Listen Later Oct 17, 2023 62:50 Transcription Available


Are you leaving piles of cash on the table?Learn how to tap into the profit-driving power of AI and automation from industry insider Kieran Gilmurray.In this episode of Leveraging AI, Isar Meitis is with Kieran GIlmurray and uncovered real-world examples of companies that leveraged AI to boost efficiency and profits dramatically.Topics discussed:How one insurance giant segmented customers and optimized pricing using AI—leading to a 333% increase in profitability When to use basic automation vs. advanced AI for maximum impact The step-by-step playbook for launching your own AI/automation initiative successfully Securing executive buy-in and training staff on new AI-enabled procedures Picking champions, quick wins, calculating ROI...and more! Kieran Gilmurray is an expert in AI and intelligent automation with over 10 years experience. He provides practical guidance to extract maximum value from AI.AI News this week: Huggingface releases Zephyr 7B Alpha, a Mistral fine-tune. Claims to beat Llama2-70b-chat on benchmarksCalls for AI regulations to protect jobs rise in Europe after ChatGPT's arrivalAdobe's Project Fast Fill is generative fill for videoMicrosoft Lobotomizes Bing's Image Generating AiCharacter.AI introduces group chats where people and multiple AIs can talk to each otherOpenAI has quietly changed its ‘core values'Adobe Shows Off New AI-Upscaling Tool for Old Videos, GIFsOpenAI's Revenue Crossed $1.3 Billion Annualized Rate, CEO Tells StaffAI just got 100-fold more energy efficientMAX Sneaks highlight several new generative AI capabilities across photo, video, audio, 3D and designMeta AI Introduces AnyMAL: The Future of Multimodal Language Models Bridging Text, Images, Videos, About Leveraging AI The Ultimate AI Course for Business People: https://multiplai.ai/ai-course/ YouTube Full Episodes: https://www.youtube.com/@Multiplai_AI/ Connect with Isar Meitis: https://www.linkedin.com/in/isarmeitis/ Free AI Consultation: https://multiplai.ai/book-a-call/ If you've enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Thanks to the over 11,000 people who joined us for the first AI Engineer Summit! A full recap is coming, but you can 1) catch up on the fun and videos on Twitter and YouTube, 2) help us reach 1000 people for the first comprehensive State of AI Engineering survey and 3) submit projects for the new AI Engineer Foundation.See our Community page for upcoming meetups in SF, Paris, NYC, and Singapore. This episode had good interest on Twitter.Last month, Imbue was crowned as AI's newest unicorn foundation model lab, raising a $200m Series B at a >$1 billion valuation. As “stealth” foundation model companies go, Imbue (f.k.a. Generally Intelligent) has stood as an enigmatic group given they have no publicly released models to try out. However, ever since their $20m Series A last year their goal has been to “develop generally capable AI agents with human-like intelligence in order to solve problems in the real world”.From RL to Reasoning LLMsAlong with their Series A, they announced Avalon, “A Benchmark for RL Generalization Using Procedurally Generated Worlds”. Avalon is built on top of the open source Godot game engine, and is ~100x faster than Minecraft to enable fast RL benchmarking and a clear reward with adjustable game difficulty.After a while, they realized that pure RL isn't a good path to teach reasoning and planning. The agents were able to learn mechanical things like opening complex doors, climbing, but couldn't go to higher level tasks. A pure RL world also doesn't include a language explanation of the agent reasoning, which made it hard to understand why it made certain decisions. That pushed the team more towards the “models for reasoning” path:“The second thing we learned is that pure reinforcement learning is not a good vehicle for planning and reasoning. So these agents were able to learn all sorts of crazy things: They could learn to climb like hand over hand in VR climbing, they could learn to open doors like very complicated, like multiple switches and a lever open the door, but they couldn't do any higher level things. And they couldn't do those lower level things consistently necessarily. And as a user, I do not want to interact with a pure reinforcement learning end to end RL agent. As a user, like I need much more control over what that agent is doing.”Inspired by Chelsea Finn's work on SayCan at Stanford, the team pivoted to have their agents do the reasoning in natural language instead. This development parallels the large leaps in reasoning that humans have developed as the scientific method:“We are better at reasoning now than we were 3000 years ago. An example of a reasoning strategy is noticing you're confused. Then when I notice I'm confused, I should ask:* What was the original claim that was made? * What evidence is there for this claim? * Does the evidence support the claim? * Is the claim correct? This is like a reasoning strategy that was developed in like the 1600s, you know, with like the advent of science. So that's an example of a reasoning strategy. There are tons of them. We employ all the time, lots of heuristics that help us be better at reasoning. And we can generate data that's much more specific to them.“The Full Stack Model LabOne year later, it would seem that the pivot to reasoning has had tremendous success, and Imbue has now reached a >$1B valuation, with participation from Astera Institute, NVIDIA, Cruise CEO Kyle Vogt, Notion co-founder Simon Last, and others. Imbue tackles their work with a “full stack” approach:* Models. Pretraining very large (>100B parameter) models, optimized to perform well on internal reasoning benchmarks, with a ~10,000 Nvidia H100 GPU cluster lets us iterate rapidly on everything from training data to architecture and reasoning mechanisms.* Tools and Agents. Building internal productivity tools from coding agents for fixing type checking and linting errors, to sophisticated systems like CARBS (for hyperparameter tuning and network architecture search).* Interface Invention. Solving agent trust and collaboration (not merely communication) with humans by creating better abstractions and interfaces — IDEs for users to program computers in natural language.* Theory. Publishing research about the theoretical underpinnings of self-supervised learning, as well as scaling laws for machine learning research.Kanjun believes we are still in the “bare metal phase” of agent development, and they want to take a holistic approach to building the “operating system for agents”. We loved diving deep into the Imbue approach toward solving the AI Holy Grail of reliable agents, and are excited to share our conversation with you today!Timestamps* [00:00:00] Introductions* [00:06:07] The origin story of Imbue* [00:09:39] Imbue's approach to training large foundation models optimized for reasoning* [00:12:18] Imbue's goals to build an "operating system" for reliable, inspectable AI agents* [00:15:37] Imbue's process of developing internal tools and interfaces to collaborate with AI agents* [00:17:27] Imbue's focus on improving reasoning capabilities in models, using code and other data* [00:19:50] The value of using both public benchmarks and internal metrics to evaluate progress* [00:21:43] Lessons learned from developing the Avalon research environment* [00:23:31] The limitations of pure reinforcement learning for general intelligence* [00:28:36] Imbue's vision for building better abstractions and interfaces for reliable agents* [00:31:36] Interface design for collaborating with, rather than just communicating with, AI agents* [00:37:40] The future potential of an agent-to-agent protocol* [00:39:29] Leveraging approaches like critiquing between models and chain of thought* [00:45:49] Kanjun's philosophy on enabling team members as creative agents at Imbue* [00:53:51] Kanjun's experience co-founding the communal co-living space The Archive* [01:00:22] Lightning RoundShow Notes* Imbue* Avalon* CARBS (hyperparameter optimizer)* Series B announcement* Kanjun/Imbue's Podcast* MIT Media Lab* Research mentioned:* Momentum Contrast* SimClr* Chelsea Finn - SayCan* Agent Protocol - part of the AI Engineer Foundation* Xerox PARC* Michael Nielsen* Jason Benn* Outset Capital* Scenius - Kevin Kelly* South Park Commons* The Archive* Thursday Nights in AITranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, Partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai. [00:00:19]Swyx: Hey, and today in the studio we have Kanjun from Imbue. Welcome. So you and I have, I guess, crossed paths a number of times. You're formerly named Generally Intelligent and you've just announced your rename, rebrand in huge, humongous ways. So congrats on all of that. And we're here to dive in into deeper detail on Imbue. We like to introduce you on a high level basis, but then have you go into a little bit more of your personal side. So you graduated your BS at MIT and you also spent some time at the MIT Media Lab, one of the most famous, I guess, computer hacking labs in the world. Then you graduated MIT and you went straight into BizOps at Dropbox, where you're eventually chief of staff, which is a pretty interesting role we can dive into later. And then it seems like the founder bug hit you. You were basically a three times founder at Ember, Sorceress, and now at Generally Intelligent slash Imbue. What should people know about you on the personal side that's not on your LinkedIn? That's something you're very passionate about outside of work. [00:01:12]Kanjun: Yeah. I think if you ask any of my friends, they would tell you that I'm obsessed with agency, like human agency and human potential. [00:01:19]Swyx: That's work. Come on.Kanjun: It's not work. What are you talking about?Swyx: So what's an example of human agency that you try to promote? [00:01:27]Kanjun: With all of my friends, I have a lot of conversations with them that's kind of helping figure out what's blocking them. I guess I do this with a team kind of automatically too. And I think about it for myself often, like building systems. I have a lot of systems to help myself be more effective. At Dropbox, I used to give this onboarding talk called How to Be Effective, which people liked. I think like a thousand people heard this onboarding talk, and I think maybe Dropbox was more effective. I think I just really believe that as humans, we can be a lot more than we are. And it's what drives everything. I guess completely outside of work, I do dance. I do partner dance. [00:02:03]Swyx: Yeah. Lots of interest in that stuff, especially in the sort of group living houses in San Francisco, which I've been a little bit part of, and you've also run one of those. [00:02:12]Kanjun: That's right. Yeah. I started the archive with two friends, with Josh, my co-founder, and a couple of other folks in 2015. That's right. And GPT-3, our housemates built. [00:02:22]Swyx: Was that the, I guess, the precursor to Generally Intelligent, that you started doing more things with Josh? Is that how that relationship started? Yeah. [00:02:30]Kanjun: This is our third company together. Our first company, Josh poached me from Dropbox for Ember. And there we built a really interesting technology, laser raster projector, VR headset. And then we were like, VR is not the thing we're most passionate about. And actually it was kind of early days when we both realized we really do believe that in our lifetimes, like computers that are intelligent are going to be able to allow us to do much more than we can do today as people and be much more as people than we can be today. And at that time, we actually, after Ember, we were like, work on AI research or start an AI lab. A bunch of our housemates were joining OpenAI, and we actually decided to do something more pragmatic to apply AI to recruiting and to try to understand like, okay, if we are actually trying to deploy these systems in the real world, what's required? And that was Sorceress. That taught us so much about maybe an AI agent in a lot of ways, like what does it actually take to make a product that people can trust and rely on? I think we never really fully got there. And it's taught me a lot about what's required. And it's kind of like, I think informed some of our approach and some of the way that we think about how these systems will actually get used by people in the real world. [00:03:42]Swyx: Just to go one step deeper on that, you're building AI agents in 2016 before it was cool. You got some muscle and you raised $30 million. Something was working. What do you think you succeeded in doing and then what did you try to do that did not pan out? [00:03:56]Kanjun: Yeah. So the product worked quite well. So Sorceress was an AI system that basically looked for candidates that could be a good fit and then helped you reach out to them. And this was a little bit early. We didn't have language models to help you reach out. So we actually had a team of writers that like, you know, customized emails and we automated a lot of the customization. But the product was pretty magical. Like candidates would just be interested and land in your inbox and then you can talk to them. As a hiring manager, that's such a good experience. I think there were a lot of learnings, both on the product and market side. On the market side, recruiting is a market that is endogenously high churn, which means because people start hiring and then we hire the role for them and they stop hiring. So the more we succeed, the more they... [00:04:39]Swyx: It's like the whole dating business. [00:04:40]Kanjun: It's the dating business. Exactly. Exactly. And I think that's the same problem as the dating business. And I was really passionate about like, can we help people find work that is more exciting for them? A lot of people are not excited about their jobs and a lot of companies are doing exciting things and the matching could be a lot better. But the dating business phenomenon like put a damper on that, like it's actually a pretty good business. But as with any business with like relatively high churn, the bigger it gets, the more revenue we have, the slower growth becomes because if 30% of that revenue you lose year over year, then it becomes a worse business. So that was the dynamic we noticed quite early on after our Series A. I think the other really interesting thing about it is we realized what was required for people to trust that these candidates were like well vetted and had been selected for a reason. And it's what actually led us, you know, a lot of what we do at Imbue is working on interfaces to figure out how do we get to a situation where when you're building and using agents, these agents are trustworthy to the end user. That's actually one of the biggest issues with agents that, you know, go off and do longer range goals is that I have to trust, like, did they actually think through this situation? And that really informed a lot of our work today. [00:05:52]Alessio: Let's jump into GI now, Imbue. When did you decide recruiting was done for you and you were ready for the next challenge? And how did you pick the agent space? I feel like in 2021, it wasn't as mainstream. Yeah. [00:06:07]Kanjun: So the LinkedIn says that it started in 2021, but actually we started thinking very seriously about it in early 2020, late 2019, early 2020. So what we were seeing is that scale is starting to work and language models probably will actually get to a point where like with hacks, they're actually going to be quite powerful. And it was hard to see that at the time, actually, because GPT-3, the early versions of it, there are all sorts of issues. We're like, oh, that's not that useful, but we could kind of see like, okay, you keep improving it in all of these different ways and it'll get better. What Josh and I were really interested in is how can we get computers that help us do bigger things? Like, you know, there's this kind of future where I think a lot about, you know, if I were born in 1900 as a woman, like my life would not be that fun. I'd spend most of my time like carrying water and literally like getting wood to put in the stove to cook food and like cleaning and scrubbing the dishes and, you know, getting food every day because there's no refrigerator, like all of these things, very physical labor. And what's happened over the last 150 years since the industrial revolution is we've kind of gotten free energy, like energy is way more free than it was 150 years ago. And so as a result, we've built all these technologies like the stove and the dishwasher and the refrigerator, and we have electricity and we have infrastructure, running water, all of these things that have totally freed me up to do what I can do now. And I think the same thing is true for intellectual energy. We don't really see it today, but because we're so in it, but our computers have to be micromanaged. You know, part of why people are like, oh, you're stuck to your screen all day. Well, we're stuck to our screen all day because literally nothing happens unless I'm doing something in front of my screen. I don't, you know, I can't send my computer off to do a bunch of stuff for me. And there is a future where that's not the case, where, you know, I can actually go off and do stuff and trust that my computer will pay my bills and figure out my travel plans and do the detailed work that I am not that excited to do so that I can like be much more creative and able to do things that I as a human, I'm very excited about and collaborate with other people. And there are things that people are uniquely suited for. So that's kind of always been the thing that has been really exciting to me. Like Josh and I have known for a long time, I think that, you know, whatever AI is, it would happen in our lifetimes. And the personal computer kind of started giving us a bit of free intellectual energy. And this is like really the explosion of free intellectual energy. So in early 2020, we were thinking about this and what happened was self-supervised learning basically started working across everything. So worked in language, SimClear came out, I think MoCo had come out, Momentum Contrast had come out earlier in 2019, SimClear came out in early 2020. And we're like, okay, for the first time, self-supervised learning is working really well across images and text and suspect that like, okay, actually it's the case that machines can learn things the way that humans do. And if that's true, if they can learn things in a fully self-supervised way, because like as people, we are not supervised. We like go Google things and try to figure things out. So if that's true, then like what the computer could be is much bigger than what it is today. And so we started exploring ideas around like, how do we actually go? We didn't think about the fact that we could actually just build a research lab. So we were like, okay, what kind of startup could we build to like leverage self-supervised learning? So that eventually becomes something that allows computers to become much more able to do bigger things for us. But that became General Intelligence, which started as a research lab. [00:09:39]Alessio: So your mission is you aim to rekindle the dream of the personal computer. So when did it go wrong and what are like your first products and user facing things that you're building to rekindle it? [00:09:53]Kanjun: Yeah. So what we do at Imbue is we train large foundation models optimized for reasoning. And the reason for that is because reasoning is actually, we believe the biggest blocker to agents or systems that can do these larger goals. If we think about something that writes an essay, like when we write an essay, we like write it. We put it and then we're done. We like write it and then we look at it and we're like, oh, I need to do more research on that area. I'm going to go do some research and figure it out and come back and, oh, actually it's not quite right. The structure of the outline. So I'm going to rearrange the outline, rewrite it. It's this very iterative process and it requires thinking through like, okay, what am I trying to do? Is the goal correct? Also like, has the goal changed as I've learned more? So as a tool, like when should I ask the user questions? I shouldn't ask them questions all the time, but I should ask them questions in higher risk situations. How certain am I about the like flight I'm about to book? There are all of these notions of like risk certainty, playing out scenarios, figuring out how to make a plan that makes sense, how to change the plan, what the goal should be. That are things that we lump under the bucket of reasoning and models today, they're not optimized for reasoning. It turns out that there's not actually that much explicit reasoning data on the internet as you would expect. And so we get a lot of mileage out of optimizing our models for reasoning in pre-training. And then on top of that, we build agents ourselves and we, I can get into, we really believe in serious use, like really seriously using the systems and trying to get to an agent that we can use every single day, tons of agents that we can use every single day. And then we experiment with interfaces that help us better interact with the agents. So those are some set of things that we do on the kind of model training and agent side. And then the initial agents that we build, a lot of them are trying to help us write code better because code is most of what we do every day. And then on the infrastructure and theory side, we actually do a fair amount of theory work to understand like, how do these systems learn? And then also like, what are the right abstractions for us to build good agents with, which we can get more into. And if you look at our website, we build a lot of tools internally. We have a like really nice automated hyperparameter optimizer. We have a lot of really nice infrastructure and it's all part of the belief of like, okay, let's try to make it so that the humans are doing the things humans are good at as much as possible. So out of our very small team, we get a lot of leverage. [00:12:18]Swyx: And so would you still categorize yourself as a research lab now, or are you now in startup mode? Is that a transition that is conscious at all? [00:12:26]Kanjun: That's a really interesting question. I think we've always intended to build, you know, to try to build the next version of the computer, enable the next version of the computer. The way I think about it is there's a right time to bring a technology to market. So Apple does this really well. Actually, iPhone was under development for 10 years, AirPods for five years. And Apple has a story where iPhone, the first multi-touch screen was created. They actually were like, oh wow, this is cool. Let's like productionize iPhone. They actually brought, they like did some work trying to productionize it and realized this is not good enough. And they put it back into research to try to figure out like, how do we make it better? What are the interface pieces that are needed? And then they brought it back into production. So I think of production and research as kind of like these two separate phases. And internally we have that concept as well, where like things need to be done in order to get to something that's usable. And then when it's usable, like eventually we figure out how to productize it. [00:13:20]Alessio: What's the culture like to make that happen, to have both like kind of like product oriented, research oriented. And as you think about building the team, I mean, you just raised 200 million. I'm sure you want to hire more people. What are like the right archetypes of people that work at Imbue? [00:13:35]Kanjun: I would say we have a very unique culture in a lot of ways. I think a lot about social process design. So how do you design social processes that enable people to be effective? I like to think about team members as creative agents, because most companies, they think of their people as assets and they're very proud of this. And I think about like, okay, what is an asset? It's something you own that provides you value that you can discard at any time. This is a very low bar for people. This is not what people are. And so we try to enable everyone to be a creative agent and to really unlock their superpowers. So a lot of the work I do, you know, I was mentioning earlier, I'm like obsessed with agency. A lot of the work I do with team members is try to figure out like, you know, what are you really good at? What really gives you energy and where can we put you such that, how can I help you unlock that and grow that? So much of our work, you know, in terms of team structure, like much of our work actually comes from people. Carbs, our hyperparameter optimizer came from Abe trying to automate his own research process doing hyperparameter optimization. And he actually pulled some ideas from plasma physics. He's a plasma physicist to make the local search work. A lot of our work on evaluations comes from a couple of members of our team who are like obsessed with evaluations. We do a lot of work trying to figure out like, how do you actually evaluate if the model is getting better? Is the model making better agents? Is the agent actually reliable? A lot of things kind of like, I think of people as making the like them shaped blob inside imbue and I think, you know, yeah, that's the kind of person that we're, we're hiring for. We're hiring product engineers and data engineers and research engineers and all these roles. We have projects, not teams. We have a project around data, data collection and data engineering. That's actually one of the key things that improve the model performance. We have a pre-training kind of project with some fine tuning as part of that. And then we have an agent's project that's like trying to build on top of our models as well as use other models in the outside world to try to make agents then we actually use as programmers every day. So all sorts of different, different projects. [00:15:37]Swyx: As a founder, you're now sort of a capital allocator among all of these different investments effectively at different projects. And I was interested in how you mentioned that you were optimizing for improving reasoning and specifically inside of your pre-training, which I assume is just a lot of data collection. [00:15:55]Kanjun: We are optimizing reasoning inside of our pre-trained models. And a lot of that is about data. And I can talk more about like what, you know, what exactly does it involve? But actually big, maybe 50% plus of the work is figuring out even if you do have models that reason well, like the models are still stochastic. The way you prompt them still makes, is kind of random, like makes them do random things. And so how do we get to something that is actually robust and reliable as a user? How can I, as a user, trust it? We have all sorts of cool things on the, like, you know, I was mentioning earlier when I talked to other people building agents, they have to do so much work, like to try to get to something that they can actually productize and it takes a long time and agents haven't been productized yet for, partly for this reason is that like the abstractions are very leaky. We can get like 80% of the way there, but like self-driving cars, like the remaining 20% is actually really difficult. We believe that, and we have internally, I think some things that like an interface, for example, that lets me really easily like see what the agent execution is, fork it, try out different things, modify the prompt, modify like the plan that it is making. This type of interface, it makes it so that I feel more like I'm collaborating with the agent as it's executing, as opposed to it's just like doing something as a black box. That's an example of a type of thing that's like beyond just the model pre-training, but on the model pre-training side, like reasoning is a thing that we optimize for. And a lot of that is about what data do we put in. [00:17:27]Swyx: It's interesting just because I always think like, you know, out of the levers that you have, the resources that you have, I think a lot of people think that running foundation model company or a research lab is going to be primarily compute. And I think the share of compute has gone down a lot over the past three years. It used to be the main story, like the main way you scale is you just throw more compute at it. And now it's like, Flops is not all you need. You need better data, you need better algorithms. And I wonder where that shift has gone. This is a very vague question, but is it like 30-30-30 now? Is it like maybe even higher? So one way I'll put this is people estimate that Llama2 maybe took about three to $4 million of compute, but probably 20 to $25 million worth of labeling data. And I'm like, okay, well that's a very different story than all these other foundation model labs raising hundreds of millions of dollars and spending it on GPUs. [00:18:20]Kanjun: Data is really expensive. We generate a lot of data. And so that does help. The generated data is close to actually good, as good as human labeled data. [00:18:34]Swyx: So generated data from other models? [00:18:36]Kanjun: From our own models. From your own models. Or other models, yeah. [00:18:39]Swyx: Do you feel like there's certain variations of this? There's the sort of the constitutional AI approach from Anthropic and basically models sampling training on data from other models. I feel like there's a little bit of like contamination in there, or to put it in a statistical form, you're resampling a distribution that you already have that you already know doesn't match human distributions. How do you feel about that basically, just philosophically? [00:19:04]Kanjun: So when we're optimizing models for reasoning, we are actually trying to like make a part of the distribution really spiky. So in a sense, like that's actually what we want. We want to, because the internet is a sample of the human distribution that's also skewed in all sorts of ways. That is not the data that we necessarily want these models to be trained on. And so when we're generating data, we're not really randomly generating data. We generate very specific things that are like reasoning traces and that help optimize reasoning. Code also is a big piece of improving reasoning. So generated code is not that much worse than like regular human written code. You might even say it can be better in a lot of ways. So yeah. So we are trying to already do that. [00:19:50]Alessio: What are some of the tools that you thought were not a good fit? So you built Avalon, which is your own simulated world. And when you first started, the metagame was like using games to simulate things using, you know, Minecraft and then OpenAI is like the gym thing and all these things. And I think in one of your other podcasts, you mentioned like Minecraft is like way too slow to actually do any serious work. Is that true? Yeah. I didn't say it. [00:20:17]Swyx: I don't know. [00:20:18]Alessio: That's above my pay grade. But Avalon is like a hundred times faster than Minecraft for simulation. When did you figure that out that you needed to just like build your own thing? Was it kind of like your engineering team was like, Hey, this is too slow. Was it more a long-term investment? [00:20:34]Kanjun: Yeah. At that time we built Avalon as a research environment to help us learn particular things. And one thing we were trying to learn is like, how do you get an agent that is able to do many different tasks? Like RL agents at that time and environments at that time. What we heard from other RL researchers was the like biggest thing keeping holding the field back is lack of benchmarks that let us explore things like planning and curiosity and things like that and have the agent actually perform better if the agent has curiosity. And so we were trying to figure out in a situation where, how can we have agents that are able to handle lots of different types of tasks without the reward being pretty handcrafted? That's a lot of what we had seen is that like these very handcrafted rewards. And so Avalon has like a single reward it's across all tasks. And it also allowed us to create a curriculum so we could make the level more or less difficult. And it taught us a lot, maybe two primary things. One is with no curriculum, RL algorithms don't work at all. So that's actually really interesting. [00:21:43]Swyx: For the non RL specialists, what is a curriculum in your terminology? [00:21:46]Kanjun: So a curriculum in this particular case is basically the environment Avalon lets us generate simpler environments and harder environments for a given tasks. What's interesting is that the simpler environments, what you'd expect is the agent succeeds more often. So it gets more reward. And so, you know, kind of my intuitive way of thinking about it is, okay, the reason why it learns much faster with a curriculum is it's just getting a lot more signal. And that's actually an interesting general intuition to have about training these things as like, what kind of signal are they getting? And like, how can you help it get a lot more signal? The second thing we learned is that reinforcement learning is not a good vehicle, like pure reinforcement learning is not a good vehicle for planning and reasoning. So these agents were not able to, they were able to learn all sorts of crazy things. They could learn to climb like hand over hand in VR climbing, they could learn to open doors like very complicated, like multiple switches and a lever open the door, but they couldn't do any higher level things. And they couldn't do those lower level things consistently necessarily. And as a user, I do not want to interact with a pure reinforcement learning end to end RL agent. As a user, like I need much more control over what that agent is doing. And so that actually started to get us on the track of thinking about, okay, how do we do the reasoning part in language? And we were pretty inspired by our friend Chelsea Finn at Stanford was I think working on SACAN at the time where it's basically an experiment where they have robots kind of trying to do different tasks and actually do the reasoning for the robot in natural language. And it worked quite well. And that led us to start experimenting very seriously with reasoning. [00:23:31]Alessio: How important is the language part for the agent versus for you to inspect the agent? You know, like is it the interface to kind of the human on the loop really important or? [00:23:43]Kanjun: Yeah, I personally think of it as it's much more important for us, the human user. So I think you probably could get end to end agents that work and are fairly general at some point in the future. But I think you don't want that. Like we actually want agents that we can like perturb while they're trying to figure out what to do. Because, you know, even a very simple example, internally we have like a type error fixing agent and we have like a test generation agent. Test generation agent goes off rails all the time. I want to know, like, why did it generate this particular test? [00:24:19]Swyx: What was it thinking? [00:24:20]Kanjun: Did it consider, you know, the fact that this is calling out to this other function? And the formatter agent, if it ever comes up with anything weird, I want to be able to debug like what happened with RL end to end stuff. Like we couldn't do that. Yeah. [00:24:36]Swyx: It sounds like you have a bunch of agents operating internally within the company. What's your most, I guess, successful agent and what's your least successful one? [00:24:44]Kanjun: The agents don't work. All of them? I think the only successful agents are the ones that do really small things. So very specific, small things like fix the color of this button on the website or like change the color of this button. [00:24:57]Swyx: Which is now sweep.dev is doing that. Exactly. [00:25:00]Kanjun: Perfect. Okay. [00:25:02]Swyx: Well, we should just use sweep.dev. Well, I mean, okay. I don't know how often you have to fix the color of a button, right? Because all of them raise money on the idea that they can go further. And my fear when encountering something like that is that there's some kind of unknown asymptote ceiling that's going to prevent them, that they're going to run head on into that you've already run into. [00:25:21]Kanjun: We've definitely run into such a ceiling. But what is the ceiling? [00:25:24]Swyx: Is there a name for it? Like what? [00:25:26]Kanjun: I mean, for us, we think of it as reasoning plus these tools. So reasoning plus abstractions, basically. I think actually you can get really far with current models and that's why it's so compelling. Like we can pile debugging tools on top of these current models, have them critique each other and critique themselves and do all of these, like spend more computer inference time, context hack, retrieve augmented generation, et cetera, et cetera, et cetera. Like the pile of hacks actually does get us really far. And a way to think about it is like the underlying language model is kind of like a noisy channel. Actually I don't want to use this analogy. It's actually a really bad analogy, but you kind of like trying to get more signal out of the channel. We don't like to think about it that way. It's what the default approach is, is like trying to get more signal out of this noising channel. But the issue with agents is as a user, I want it to be mostly reliable. It's kind of like self-driving in that way. Like it's not as bad as self-driving, like in self-driving, you know, you're like hurtling at 70 miles an hour. It's like the hardest agent problem. But one thing we learned from Sorceress and one thing we learned by using these things internally is we actually have a pretty high bar for these agents to work. You know, it's actually really annoying if they only work 50% of the time and we can make interfaces to make it slightly less annoying. But yeah, there's a ceiling that we've encountered so far and we need to make the models better. We also need to make the kind of like interface to the user better. And also a lot of the like critiquing. I hope what we can do is help people who are building agents actually like be able to deploy them. I think, you know, that's the gap that we see a lot of today is everyone who's trying to build agents to get to the point where it's robust enough to be deployable. It just, it's like an unknown amount of time. Okay. [00:27:12]Swyx: So this goes back into what Embu is going to offer as a product or a platform. How are you going to actually help people deploy those agents? Yeah. [00:27:21]Kanjun: So our current hypothesis, I don't know if this is actually going to end up being the case. We've built a lot of tools for ourselves internally around like debugging, around abstractions or techniques after the model generation happens. Like after the language model generates the text and like interfaces for the user and the underlying model itself, like models talking to each other, maybe some set of those things kind of like an operating system. Some set of those things will be helpful for other people. And we'll figure out what set of those things is helpful for us to make our agents. Like what we want to do is get to a point where we can like start making an agent, deploy it, it's reliable, like very quickly. And there's a similar analog to software engineering, like in the early days, in the seventies and the sixties, like to program a computer, like you have to go all the way down to the registers and write things and eventually we had assembly. That was like an improvement. But then we wrote programming languages with these higher levels of abstraction and that allowed a lot more people to do this and much faster. And the software created is much less expensive. And I think it's basically a similar route here where we're like in the like bare metal phase of agent building. And we will eventually get to something with much nicer abstractions. [00:28:36]Alessio: We had this conversation with George Hotz and we were like, there's not a lot of reasoning data out there. And can the models really understand? And his take was like, look, with enough compute, you're not that complicated as a human. Like the model can figure out eventually why certain decisions are made. What's been your experience? Like as you think about reasoning data, like do you have to do a lot of like manual work or like is there a way to prompt models to extract the reasoning from actions that they [00:29:03]Swyx: see? [00:29:03]Kanjun: So we don't think of it as, oh, throw enough data at it and then it will figure out what the plan should be. I think we're much more explicit. You know, a way to think about it is as humans, we've learned a lot of reasoning strategies over time. We are better at reasoning now than we were 3000 years ago. An example of a reasoning strategy is noticing you're confused. Then when I notice I'm confused, I should ask like, huh, what was the original claim that was made? What evidence is there for this claim? Does the evidence support the claim? Is the claim correct? This is like a reasoning strategy that was developed in like the 1600s, you know, with like the advent of science. So that's an example of a reasoning strategy. There are tons of them. We employ all the time, lots of heuristics that help us be better at reasoning. And we didn't always have them. And because they're invented, like we can generate data that's much more specific to them. So I think internally, yeah, we have a lot of thoughts on what reasoning is and we generate a lot more specific data. We're not just like, oh, it'll figure out reasoning from this black box or like it'll figure out reasoning from the data that exists. Yeah. [00:30:04]Alessio: I mean, the scientific method is like a good example. If you think about hallucination, right, people are thinking, how do we use these models to do net new, like scientific research? And if you go back in time and the model is like, well, the earth revolves around the sun and people are like, man, this model is crap. It's like, what are you talking about? Like the sun revolves around the earth. It's like, how do you see the future? Like if the models are actually good enough, but we don't believe them, it's like, how do we make the two live together? So you're like, you use Inbu as a scientist to do a lot of your research and Inbu tells you, hey, I think this is like a serious path you should go down. And you're like, no, that sounds impossible. Like how is that trust going to be built? And like, what are some of the tools that maybe are going to be there to inspect it? [00:30:51]Kanjun: Really there are two answers to this. One element of it is as a person, like I need to basically get information out of the model such that I can try to understand what's going on with the model. Then the second question is like, okay, how do you do that? And that's kind of some of our debugging tools, they're not necessarily just for debugging. They're also for like interfacing with and interacting with the model. So like if I go back in this reasoning trace and like change a bunch of things, what's going to happen? Like, what does it conclude instead? So that kind of helps me understand like, what are its assumptions? And, you know, we think of these things as tools. And so it's really about like, as a user, how do I use this tool effectively? I need to be willing to be convinced as well. It's like, how do I use this tool effectively? And what can it help me with? [00:31:36]Swyx: And what can it tell me? There's a lot of mention of code in your process. And I was hoping to dive in even deeper. I think we might run the risk of giving people the impression that you view code or you use code just as like a tool within InView just for coding assistance. But I think you actually train code models. And I think there's a lot of informal understanding about how adding code to language models improves their reasoning capabilities. I wonder if there's any research or findings that you have to share that talks about the intersection of code and reasoning. Hmm. Yeah. [00:32:08]Kanjun: So the way I think about it intuitively is like code is the most explicit example of reasoning data on the internet. [00:32:15]Swyx: Yeah. [00:32:15]Kanjun: And it's not only structured, it's actually very explicit, which is nice. You know, it says this variable means this, and then it uses this variable. And then the function does this. As people, when we talk in language, it takes a lot more to extract that explicit structure out of our language. And so that's one thing that's really nice about code is I see it as almost like a curriculum for reasoning. I think we use code in all sorts of ways. The coding agents are really helpful for us to understand what are the limitations of the agents. The code is really helpful for the reasoning itself. But also code is a way for models to act. So by generating code, it can act on my computer. And, you know, when we talk about rekindling the dream of the personal computer, kind of where I see computers going is, you know, like computers will eventually become these much more malleable things where I, as a user today, I have to know how to write software code, like in order to make my computer do exactly what I want it to do. But in the future, if the computer is able to generate its own code, then I can actually interface with it in natural language. And so one way we think about agents is kind of like a natural language programming language. It's a way to program my computer in natural language that's much more intuitive to me as a user. And these interfaces that we're building are essentially IDEs for users to program our computers in natural language. Maybe I should say what we're doing that way. Maybe it's clearer. [00:33:47]Swyx: I don't know. [00:33:47]Alessio: That's a good pitch. What do you think about the different approaches people have, kind of like text first, browser first, like multi-on? What do you think the best interface will be? Or like, what is your, you know, thinking today? [00:33:59]Kanjun: In a lot of ways, like chat as an interface, I think Linus, Linus Lee, you had on this. I really like how he put it. Chat as an interface is skeuomorphic. So in the early days, when we made word processors on our computers, they had notepad lines because that's what we understood these like objects to be. Chat, like texting someone is something we understand. So texting our AI is something that we understand. But today's word documents don't have notepad lines. And similarly, the way we want to interact with agents, like chat is a very primitive way of interacting with agents. What we want is to be able to inspect their state and to be able to modify them and fork them and all of these other things. And we internally have, think about what are the right representations for that? Like architecturally, like what are the right representations? What kind of abstractions do we need to build? And how do we build abstractions that are not leaky? Because if the abstractions are leaky, which they are today, like, you know, this stochastic generation of text is like a leaky abstraction. I cannot depend on it. And that means it's actually really hard to build on top of. But our experience and belief is actually by building better abstractions and better tooling, we can actually make these things non-leaky. And now you can build like whole things on top of them. So these other interfaces, because of where we are, we don't think that much about them. [00:35:17]Swyx: Yeah. [00:35:17]Alessio: I mean, you mentioned, this is kind of like the Xerox Spark moment for AI. And we had a lot of stuff come out of Parc, like the, what you see is what you got editors and like MVC and all this stuff. But yeah, but then we didn't have the iPhone at Parc. We didn't have all these like higher things. What do you think it's reasonable to expect in like this era of AI, you know, call it like five years or so? Like what are like the things we'll build today and what are things that maybe we'll see in kind of like the second wave of products? [00:35:46]Kanjun: That's interesting. I think the waves will be much faster than before. Like what we're seeing right now is basically like a continuous wave. Let me zoom a little bit earlier. So people like the Xerox Parc analogy I give, but I think there are many different analogies. Like one is the like analog to digital computer is kind of an example, like another analogy to where we are today. The analog computer Vannevar Bush built in the 1930s, I think, and it's like a system of pulleys and it can only calculate one function. Like it can calculate like an integral. And that was so magical at the time because you actually did need to calculate this integral bunch, but it had a bunch of issues like in analog errors compound. And so there was actually a set of breakthroughs necessary in order to get to the digital computer, like Turing's decidability, Shannon. I think the like whole like relay circuits can be thought of as can be mapped to Boolean operators and a set of other like theoretical breakthroughs, which essentially were abstractions. They were like creating abstractions for these like very like lossy circuits. They were creating abstractions for these like very analog circuits and digital had this nice property of like being error correcting. And so when I talk about like less leaky abstractions, that's what I mean. That's what I'm kind of pointing a little bit to. It's not going to look exactly the same way. And then the Xerox PARC piece, a lot of that is about like, how do we get to computers that as a person, I can actually use well. And the interface actually helps it unlock so much more power. So the sets of things we're working on, like the sets of abstractions and the interfaces, like hopefully that like help us unlock a lot more power in these systems. Like hopefully that'll come not too far in the future. I could see a next version, maybe a little bit farther out. It's like an agent protocol. So a way for different agents to talk to each other and call each other. Kind of like HTTP. [00:37:40]Swyx: Do you know it exists already? [00:37:41]Kanjun: Yeah, there is a nonprofit that's working on one. I think it's a bit early, but it's interesting to think about right now. Part of why I think it's early is because the issue with agents, it's not quite like the internet where you could like make a website and the website would appear. The issue with agents is that they don't work. And so it may be a bit early to figure out what the protocol is before we really understand how these agents get constructed. But, you know, I think that's, I think it's a really interesting question. [00:38:09]Swyx: While we're talking on this agent to agent thing, there's been a bit of research recently on some of these approaches. I tend to just call them extremely complicated chain of thoughting, but any perspectives on kind of meta-GPT, I think it's the name of the paper. I don't know if you care about at the level of individual papers coming out, but I did read that recently and TLDR, it beat GPT-4 and human eval by role-playing software agent development agency, instead of having sort of single shot or single role, you have multiple roles and how having all of them criticize each other as agents communicating with other agents. [00:38:45]Kanjun: Yeah, I think this is an example of an interesting abstraction of like, okay, can I just plop in this like multi-role critiquing and see how it improves my agent? And can I just plop in chain of thought, tree of thought, plop in these other things and see how they improve my agent? One issue with this kind of prompting is that it's still not very reliable. It's like, there's one lens, which is like, okay, if you do enough of these techniques, you'll get to high reliability. And I think actually that's a pretty reasonable lens. We take that lens often. And then there's another lens that's like, okay, but it's starting to get really messy what's in the prompt and like, how do we deal with that messiness? And so maybe you need like cleaner ways of thinking about and constructing these systems. And we also take that lens. So yeah, I think both are necessary. Yeah. [00:39:29]Swyx: Side question, because I feel like this also brought up another question I had for you. I noticed that you work a lot with your own benchmarks, your own evaluations of what is valuable. I would say I would contrast your approach with OpenAI as OpenAI tends to just lean on, hey, we played StarCraft or hey, we ran it on the SAT or the, you know, the AP bio test and that did results. Basically, is benchmark culture ruining AI? [00:39:55]Swyx: Or is that actually a good thing? Because everyone knows what an SAT is and that's fine. [00:40:04]Kanjun: I think it's important to use both public and internal benchmarks. Part of why we build our own benchmarks is that there are not very many good benchmarks for agents, actually. And to evaluate these things, you actually need to think about it in a slightly different way. But we also do use a lot of public benchmarks for like, is the reasoning capability in this particular way improving? So yeah, it's good to use both. [00:40:26]Swyx: So for example, the Voyager paper coming out of NVIDIA played Minecraft and set their own benchmarks on getting the Diamond X or whatever and exploring as much of the territory as possible. And I don't know how that's received. That's obviously fun and novel for the rest of the engineer, the people who are new to the scene. But for people like yourselves, you build Avalon just because you already found deficiencies with using Minecraft. Is that valuable as an approach? Oh, yeah. I love Voyager. [00:40:57]Kanjun: I mean, Jim, I think is awesome. And I really like the Voyager paper and I think it has a lot of really interesting ideas, which is like the agent can create tools for itself and then use those tools. [00:41:06]Swyx: He had the idea of the curriculum as well, which is something that we talked about earlier. Exactly. [00:41:09]Kanjun: And that's like a lot of what we do. We built Avalon mostly because we couldn't use Minecraft very well to like learn the things we wanted. And so it's like not that much work to build our own. [00:41:19]Swyx: It took us, I don't know. [00:41:22]Kanjun: We had like eight engineers at the time, took about eight weeks. So six weeks. [00:41:27]Swyx: And OpenAI built their own as well, right? Yeah, exactly. [00:41:30]Kanjun: It's just nice to have control over our environment. But if you're doing our own sandbox to really trying to inspect our own research questions. But if you're doing something like experimenting with agents and trying to get them to do things like Minecraft is a really interesting environment. And so Voyager has a lot of really interesting ideas in it. [00:41:47]Swyx: Yeah. Cool. One more element that we had on this list, which is context and memory. I think that's kind of like the foundational, quote unquote, RAM of our era. I think Andrej Karpathy has already made this comparison. So there's nothing new here. And that's just the amount of working knowledge that we can fit into one of these agents. And it's not a lot, right? Especially if you need to get them to do long running tasks. If they need to self-correct from errors that they observe while operating in their environment. Do you see this as a problem? Do you think we're going to just trend to infinite context and that'll go away? Or how do you think we're going to deal with it? [00:42:22]Kanjun: I think when you talked about what's going to happen in the first wave and then in the second wave, I think what we'll see is we'll get like relatively simplistic agents pretty soon. And they will get more and more complex. And there's like a future wave in which they are able to do these like really difficult, really long running tasks. And the blocker to that future, one of the blockers is memory. And that was true of computers too. You know, I think when von Neumann made the von Neumann architecture, he was like, the biggest blocker will be like, we need this amount of memory, which is like, I don't remember exactly like 32 kilobytes or something to store programs. And that will allow us to write software. He didn't say it this way because he didn't have these terms, but that only really was like happened in the seventies with the microchip revolution. It may be the case that we're waiting for some research breakthroughs or some other breakthroughs in order for us to have like really good long running memory. And then in the meantime, agents will be able to do all sorts of things that are a little bit smaller than that. I do think with the pace of the field, we'll probably come up with all sorts of interesting things like, you know, RAG is already very helpful. [00:43:26]Swyx: Good enough, you think? [00:43:27]Kanjun: Maybe good enough for some things. [00:43:29]Swyx: How is it not good enough? I don't know. [00:43:31]Kanjun: I just think about a situation where you want something that's like an AI scientist. As a scientist, I have learned so much about my fields and a lot of that data is maybe hard to fine tune or on, or maybe hard to like put into pre-training. Like a lot of that data, I don't have a lot of like repeats of the data that I'm seeing. You know, like if I'm a scientist, I've like accumulated so many little data points. And ideally I'd want to store those somehow, or like use those to fine tune myself as a model somehow, or like have better memory somehow. I don't think RAG is enough for that kind of thing. But RAG is certainly enough for like user preferences and things like that. Like what should I do in this situation? What should I do in that situation? That's a lot of tasks. We don't have to be a scientist right away. Awesome. [00:44:21]Swyx: I have a hard question, if you don't mind me being bold. Yeah. I think the most comparable lab to InView is Adept. You know, a research lab with like some amount of product situation on the horizon, but not just yet, right? Why should people work for InView over Adept? And we can cut this if it's too like... Yeah. [00:44:40]Kanjun: The way I think about it is I believe in our approach. The type of thing that we're doing is we're trying to like build something that enables other people to build agents and build something that really can be maybe something like an operating system for agents. I know that that's what we're doing. I don't really know what everyone else is doing. You know, I can kind of like talk to people and have some sense of what they're doing. And I think it's a mistake to focus too much on what other people are doing, because extremely focused execution on the right thing is what matters. To the question of like, why us? I think like strong focus on reasoning, which we believe is the biggest blocker, on inspectability, which we believe is really important for user experience and also for the power and capability of these systems. Building non-leaky, good abstractions, which we believe is solving the core issue of agents, which is around reliability and being able to make them deployable. And then really seriously trying to use these things ourselves, like every single day, and getting to something that we can actually ship to other people that becomes something that is a platform. Like, it feels like it could be Mac or Windows. I love the dogfooding approach. [00:45:49]Swyx: That's extremely important. And you will not be surprised how many agent companies I talk to that don't use their own agent. Oh no, that's not good. That's a big surprise. [00:45:59]Kanjun: Yeah, I think if we didn't use our own agents, then we would have all of these beliefs about how good they are. Wait, did you have any other hard questions you wanted to ask? [00:46:08]Swyx: Yeah, mine was just the only other follow-up that you had based on the answer you just gave was, do you see yourself releasing models or do you see yourself, what is the artifacts that you want to produce that lead up to the general operating system that you want to have people use, right? And so a lot of people just as a byproduct of their work, just to say like, hey, I'm still shipping, is like, here's a model along the way. Adept took, I don't know, three years, but they released Persimmon recently, right? Like, do you think that kind of approach is something on your horizon? Or do you think there's something else that you can release that can show people, here's kind of the idea, not the end products, but here's the byproducts of what we're doing? [00:46:51]Kanjun: Yeah, I don't really believe in releasing things to show people like, oh, here's what we're doing that much. I think as a philosophy, we believe in releasing things that will be helpful to other people. [00:47:02]Swyx: Yeah. [00:47:02]Kanjun: And so I think we may release models or we may release tools that we think will help agent builders. Ideally, we would be able to do something like that, but I'm not sure exactly what they look like yet. [00:47:14]Swyx: I think more companies should get into the releasing evals and benchmarks game. Yeah. [00:47:20]Kanjun: Something that we have been talking to agent builders about is co-building evals. So we build a lot of our own evals and every agent builder tells me, basically evals are their biggest issue. And so, yeah, we're exploring right now. And if you are building agents, please reach out to me because I would love to, like, figure out how we can be helpful based on what we've seen. Cool. [00:47:40]Swyx: That's a good call to action. I know a bunch of people that I can send your way. Cool. Great. [00:47:43]Kanjun: Awesome. [00:47:44]Swyx: Yeah. We can zoom out to other interests now. [00:47:46]Alessio: We got a lot of stuff. So we have Sherif from Lexicon, the podcast. He had a lot of interesting questions on his website. You similarly have a lot of them. Yeah. [00:47:55]Swyx: I need to do this. I'm very jealous of people with personal websites right there. Like, here's the high level questions of goals of humanity that I want to set people on. And I don't have that. [00:48:04]Alessio: It's never too late, Sean. [00:48:05]Swyx: Yeah. [00:48:05]Alessio: It's never too late. [00:48:06]Kanjun: Exactly. [00:48:07]Alessio: There were a few that stuck out as related to your work that maybe you're kind of learning [00:48:12]Swyx: more about it. [00:48:12]Alessio: So one is why are curiosity and goal orientation often at odds? And from a human perspective, I get it. It's like, you know, would you want to like go explore things or kind of like focus on your career? How do you think about that from like an agent perspective? Where it's like, should you just stick to the task and try and solve it as in the guardrails as possible? Or like, should you look for alternative solutions? [00:48:34]Swyx: Yeah. [00:48:34]Kanjun: I think one thing that's really interesting about agents actually is that they can be forked. Like, you know, we can take an agent that's executed to a certain place and said, okay, here, like fork this and do a bunch of different things. I try a bunch of different things. Some of those agents can be goal oriented and some of them can be like more curiosity driven. You can prompt them in slightly different ways. And something I'm really curious about, like what would happen if in the future, you know, we were able to actually go down both paths. As a person, why I have this question on my website is I really find that like I really can only take one mode at a time and I don't understand why. And like, is it inherent in like the kind of context that needs to be held? That's why I think from an agent perspective, like forking it is really interesting. Like I can't fork myself to do both, but I maybe could fork an agent to like add a certain point in a task. [00:49:26]Swyx: Yeah. Explore both. Yeah. [00:49:28]Alessio: How has the thinking changed for you as the funding of the company changed? That's one thing that I think a lot of people in the space think is like, oh, should I raise venture capital? Like, how should I get money? How do you feel your options to be curious versus like goal oriented has changed as you raise more money and kind of like the company has grown? [00:49:50]Kanjun: Oh, that's really funny. Actually, things have not changed that much. So we raised our Series A $20 million in late 2021. And our entire philosophy at that time was, and still kind of is, is like, how do we figure out the stepping stones, like collect stepping stones that eventually let us build agents, kind of these new computers that help us do bigger things. And there was a lot of curiosity in that. And there was a lot of goal orientation in that. Like the curiosity led us to build CARBS, for example, this hyperparameter optimizer. Great name, by the way. [00:50:28]Swyx: Thank you. [00:50:29]Kanjun: Is there a story behind that name? [00:50:30]Swyx: Yeah. [00:50:31]Kanjun: Abe loves CARBS. It's also cost aware. So as soon as he came up with cost aware, he was like, I need to figure out how to make this work. But the cost awareness of it was really important. So that curiosity led us to this really cool hyperparameter optimizer. That's actually a big part of how we do our research. It lets us experiment on smaller models. And for those experiment results to carry to larger ones. [00:50:56]Swyx: Which you also published a scaling laws, which is great. I think the scaling laws paper from OpenAI was like the biggest. And from Google, I think, was the greatest public service to machine learning that any research lab can do. Yeah, totally. [00:51:10]Kanjun: What was nice about CARBS is it gave us scaling laws for all sorts of hyperparameters. So yeah, that's cool. It basically hasn't changed very much. So there's some curiosity. And then there's some goal oriented parts. Like Avalon, it was like a six to eight week sprint for all of us. And we got this thing out. And then now different projects do like more curiosity or more goal orientation at different times. Cool. [00:51:36]Swyx: Another one of your questions that we highlighted was, how can we enable artificial agents to permanently learn new abstractions and processes? I think this is might be called online learning. [00:51:45]Kanjun: Yeah. So I struggle with this because, you know, that scientist example I gave. As a scientist, I've like permanently learned a lot of new things. And I've updated and created new abstractions and learned them pretty reliably. And you were talking about like, okay, we have this RAM that we can store learnings in. But how well does online learning actually work? And the answer right now seems to be like, as models get bigger, they fine tune faster. So they're more sample efficient as they get bigger. [00

Papers Read on AI
MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models

Papers Read on AI

Play Episode Listen Later Oct 9, 2023 36:35


With the development of web technology, social media texts are becoming a rich source for automatic mental health analysis. As traditional discriminative methods bear the problem of low interpretability, the recent large language models have been explored for interpretable mental health analysis on social media, which aims to provide detailed explanations along with predictions. The results show that ChatGPT can generate approaching-human explanations for its correct classifications. However, LLMs still achieve unsatisfactory classification performance in a zero-shot/few-shot manner. Domain-specific finetuning is an effective solution, but faces 2 challenges: 1) lack of high-quality training data. 2) no open-source LLMs for interpretable mental health analysis were released to lower the finetuning cost. To alleviate these problems, we build the first multi-task and multi-source interpretable mental health instruction (IMHI) dataset on social media, with 105K data samples. The raw social media data are collected from 10 existing sources covering 8 mental health analysis tasks. We use expert-written few-shot prompts and collected labels to prompt ChatGPT and obtain explanations from its responses. To ensure the reliability of the explanations, we perform strict automatic and human evaluations on the correctness, consistency, and quality of generated data. Based on the IMHI dataset and LLaMA2 foundation models, we train MentalLLaMA, the first open-source LLM series for interpretable mental health analysis with instruction-following capability. We also evaluate the performance of MentalLLaMA on the IMHI evaluation benchmark with 10 test sets, where their correctness for making predictions and the quality of explanations are examined. The results show that MentalLLaMA approaches state-of-the-art discriminative methods in correctness and generates high-quality explanations. 2023: Kailai Yang, Tianlin Zhang, Zi-Zhou Kuang, Qianqian Xie, Sophia Ananiadou https://arxiv.org/pdf/2309.13567v1.pdf

Mi vida con una Productora
PGM146- META IA Y Ray Ban Smart Glasses. Avances en Inteligencia Artificial y su Integración en el Ecosistema de Meta

Mi vida con una Productora

Play Episode Listen Later Oct 4, 2023 13:38


Hoy Hablaremos de la inteligencia artificial, las gafas Ray-Ban y cómo Meta planea incorporar estas tecnologías en su ecosistema. Además, exploraremos el fascinante mundo de los asistentes personales y la creación de NPCs en entornos virtuales. Prepárense para un episodio lleno de ideas innovadoras y dilemas morales. ¡Comencemos!Únete a nuestro canal de discord y forma parte de la comunidad de filmmakers creativos e inquietos. https://discord.gg/ncKhpvJHDisfruta de todos los programas de La Zapatería en www.lazapateriaestudio.comCompra las nueva Meta Quest 3 https://amzn.to/46cHtGL Mi equipo podcaster https://amzn.to/3rrioZA https://amzn.to/3RYfqqn

M觀點 | 科技X商業X投資
EP36. 國造海鯤潛艦亮相、Meta 的 AI 助理、YouTube 長影片過氣? | M觀點

M觀點 | 科技X商業X投資

Play Episode Listen Later Oct 2, 2023 73:25


「NordVPN X M觀點」: https://nordvpn.com/miula 專屬優惠碼「miula」 點擊專屬連結不僅可以獲得30天無條件退費,訂閱兩年優惠方案還贈送免費四個月! #NordVPN --- EP36. 國造海鯤潛艦亮相、Meta 的 AI 助理、YouTube 長影片過氣? | M觀點 --- 重要段落 Timecode (00:40)業配時間:「NordVPN X M觀點」 (04:35)前菜時間:Apple Podcast 留言 (08:21)前菜時間:美國邊境的非法移民問題 (09:45)第一個話題:國造海鯤潛艦亮相 (30:59)第二個話題:Meta 的 AI 助理 (57:21)第三個話題:YouTube 長影片過氣? --- M觀點資訊 --- 科技巨頭解碼: https://bit.ly/2XupBZa M觀點 Telegram - https://t.me/miulaviewpoint M觀點 IG - https://www.instagram.com/miulaviewpoint/ M觀點 Podcast - https://bit.ly/34fV7so M報 - https://bit.ly/345gBbA M觀點YouTube頻道訂閱 - https://bit.ly/2nxHnp9 M觀點粉絲團 - https://www.facebook.com/miulaperspective/ 任何合作邀約請洽 miula@outlook.com

Hacker Public Radio
HPR3953: Large language models and AI don't have any common sense

Hacker Public Radio

Play Episode Listen Later Sep 27, 2023


Hobson and Greg are working with volunteers to develop an open source AI that we call Qary (QA for question answering). We're adding plugins to support open source large language models (LLMs) like GPT-2 and Llama2. Here's how you can use LLMs in your own Python Programs. Create a Hugging Face account: huggingface.co/join Create and copy your access token: Your user profile Create a .env file with your access token string: echo "HUGGINGFACE_ACCESS_TOKEN=hf_..." >> .env Load the .env variables in your python script using dotenv package and os.environ: TIP: Use os.environ to retrieve the dict of variable values rather than dotenv.load_values- Otherwise other environment variables that have been set by other shell scripts such as .bashrc will be ignored. This confused us when we were getting our GitLab CI-CD pipeline working and deploying to Render.com. Each of your cloud services will have different approaches to setting environment variables. This token string can be passed as a keyword argument to most of the pipeline and model classes. import dotenv dotenv.load_dotenv() import os env = dict(os.environ) token = env['HUGGINGFACE_ACCESS_TOKEN'] Find the path and name for the model on Hugging Face hub you want to use: search for "llama2" in the top search bar on huggingface.co/ TIP: don't hit enter at the end of your search, instead click on "See 3958 model results for llama2" I clicked on meta-llama/Llama-2-7b-chat-hf to see the documentation On the documentation page for your model you may have to apply for a license if it's not really open source but business source like Meta does with its AI so you can't use their models to compete with them Apply for a license to use Llama2 on ai.meta.com using the same e-mail you used for your Hugging Face account. Follow the instructions on huggingface.co to authenticate your python session TIP: You'll need to use the kwarg use_auth_token in the AutoModel.from_pretrained or pipeline functions. And it should be set to the token from your Hugging Face profile page. The hugging face documentation says to use the token kwarg, but that never worked for me. from transformers import pipeline, set_seed generator = pipeline('text-generation', model='openai-gpt') q = "2+2=" responses = generator( q, max_length=10, num_return_sequences=10 ) responses [{'generated_text': '2+2= 2.2, 1.1 and'}, {'generated_text': '2+2= 3336 miles. they'}, {'generated_text': '2+2= 2, = 2 = 2'}, {'generated_text': '2+2= 4 = 2 = 5 n'}, {'generated_text': '2+2= 0 ( 1 ) = ='}, {'generated_text': '2+2= 6 times the speed of sound'}, {'generated_text': '2+2= 2 times 5, 865'}, {'generated_text': '2+2= 3 / 7 / 11 ='}, {'generated_text': '2+2= 2 2 n 2 of 2'}, {'generated_text': '2+2= 1, 9 = 1,'}] Here's the cow leg counting question: q = "There are 2 cows and 2 bulls, how many legs are there?" responses = generator( f"Question: {q}nAnswer: ", max_length=30, num_return_sequences=10) answers = [] for resp in responses: text = resp['generated_text'] answers.append(text[text.find('Answer: ')+9:]) answers 'four. n " let me see if i have this straight', 'only 3. and three cows and 2 bulls are bigger than', '2, 2, 1, 2. n " not yet', "one per cow, that's all there is. in fact", '30. and what am i? oh, yes, about', 'one. the big, white bull that is bigger than 1', 'three. they need to be introduced to the cow population before', "1. i don't know how many and where exactly ;", 'no 2. 2. two bulls for 1 bull and 2', '1, there are 1.2, and 2, there are']

This Day in AI Podcast
EP33: AI WARS: Gemini Vs Gobi, DALL-E3, Alexa AI, Open Interpreter & Llama2 Experiments

This Day in AI Podcast

Play Episode Listen Later Sep 22, 2023 65:12


Do you want to join our Discord community? Fill in this: https://forms.gle/k8TyUeWKGWHFBzwQ9.The AI wars are heating up as Google and OpenAI race to release the first multimodal LLM. We build a sync sub video game with just one prompt using open Interpreter. Alexa shows off scary new conversational abilities, while poets sell out to Big Tech. Join us for the latest AI battles - but don't get your hopes up, it's not that exciting!SOURCES https://www.bloomberg.com/news/articles/2023-09-20/chatgpt-usage-is-rising-again-as-students-return-to-school#xj4y7vzkg https://www.theinformation.com/articles/openai-hustles-to-beat-google-to-launch-multimodal-llm?rc=kvsmhw https://twitter.com/emollick/status/1704560486111658209?s=20 https://openai.com/dall-e-3 https://twitter.com/nickfloats/status/1704592748303827276?s=20 https://www.aboutamazon.com/news/devices/amazon-alexa-generative-ai https://neuralink.com/blog/first-clinical-trial-open-for-recruitment/ https://openinterpreter.com/ https://openai.com/blog/red-teaming-network https://restofworld.org/2023/ai-developers-fiction-poetry-scale-ai-appen/ If you like the show please consider sharing with friends and leaving a comment.

Yannic Kilcher Videos (Audio Only)
[ML News] LLaMA2 Released | LLMs for Robots | Multimodality on the Rise

Yannic Kilcher Videos (Audio Only)

Play Episode Listen Later Aug 28, 2023 44:10


#mlnews #llama2 #openai Your regular irregular update on the world of Machine Learning. References: https://twitter.com/ylecun/status/1681336284453781505 https://ai.meta.com/llama/ https://about.fb.com/news/2023/07/llama-2-statement-of-support/ https://247wallst.com/special-report/2023/08/12/this-is-the-biggest-social-media-platform-ranking-the-worlds-largest-networking-sites/4/ https://github.com/Alpha-VLLM/LLaMA2-Accessory https://together.ai/blog/llama-2-7b-32k?s=09&utm_source=pocket_saves https://github.com/imoneoi/openchat https://twitter.com/lmsysorg/status/1686794639469371393?s=09&t=sS3awkbavmSMSmwp64Ef4A&utm_source=pocket_saves https://huggingface.co/lmsys/vicuna-13b-v1.5-16k https://blog.google/outreach-initiatives/public-policy/google-microsoft-openai-anthropic-frontier-model-forum/ https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model?utm_source=pocket_reader https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M https://ai.meta.com/blog/generative-ai-text-images-cm3leon/ https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action?utm_source=twitter&utm_medium=social&utm_campaign=rt2 https://arxiv.org/abs/2307.14334 https://sites.research.google/med-palm/ https://open-catalyst.metademolab.com/?utm_source=twitter&utm_medium=organic_social&utm_campaign=opencatalyst&utm_content=card https://open-catalyst.metademolab.com/demo https://www.anthropic.com/index/claude-2?utm_source=pocket_reader https://claude.ai/login https://audiocraft.metademolab.com/?utm_source=pocket_saves https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/ https://stability.ai/blog/stablecode-llm-generative-ai-coding https://twitter.com/JeffDean/status/1686806525862608896?s=09&t=LG2z9ok9QExHbSy0fvBsxA&utm_source=pocket_saves https://sites.research.google/open-buildings/ https://twitter.com/deliprao/status/1687283117873106946?s=09&t=1NmC-B55Z8IuF_HTuGOo7w&utm_source=pocket_saves https://arxiv.org/pdf/2308.01320.pdf https://twitter.com/javilopen/status/1687795349719547905?utm_source=pocket_saves https://research.nvidia.com/labs/par/Perfusion/ https://ar5iv.labs.arxiv.org/html/2307.14936 https://www.linkedin.com/feed/update/urn:li:activity:7093463974750371840/?utm_source=pocket_saves https://huggingface.co/syzymon/long_llama_3b_instruct https://arxiv.org/abs/2307.03170 https://dynalang.github.io/ https://github.com/mlfoundations/open_flamingo https://twitter.com/akshay_pachaar/status/1687079353937698816?s=09&t=fos8QSCsGEEM6dMflhq0Mg&utm_source=pocket_saves https://github.com/OpenBMB/ToolBench https://llm-attacks.org/ https://arstechnica.com/information-technology/2023/07/openai-discontinues-its-ai-writing-detector-due-to-low-rate-of-accuracy/ https://sites.google.com/view/steve-1 https://github.com/Shalev-Lifshitz/STEVE-1 https://erichartford.com/dolphin https://huggingface.co/ehartford/dolphin-llama-13b https://www.mosaicml.com/blog/long-context-mpt-7b-8k https://twitter.com/camenduru/status/1688045780244848640?s=09&t=ubJ2Qtz-TG6Xo3_GMtt2Cw&utm_source=pocket_saves https://github.com/IDEA-Research/DWPose https://twitter.com/tri_dao/status/1680987577913065472?s=09&t=Q181vFmM6d3nDq-5BwfDeg&utm_source=pocket_saves https://tridao.me/publications/flash2/flash2.pdf https://thehackernews.com/2023/07/wormgpt-new-ai-tool-allows.html https://www.tomshardware.com/news/ai-steals-data-with-keystroke-audio https://arxiv.org/pdf/2308.01074.pdf https://www.foxnews.com/politics/ai-test-flight-air-force-unmanned-wingman-aircraft https://www.theverge.com/2023/8/2/23817406/white-castle-soundhound-ai-sliders https://www.google.com/search?sca_esv=556495916&q=food+delivery+bot+kicked&tbm=vid&source=lnms&sa=X&ved=2ahUKEwjZ6PDPrdmAAxUThf0HHWzrBGgQ0pQJegQIChAB&cshid=1691920142432720&biw=2327&bih=1180&dpr=2.2 https://www.youtube.com/watch?v=--n_NhmXnfc https://www.thesun.co.uk/tech/20793591/coop-delivery-robots-cambridge-kicked-by-workers-tiktok/

E56: OpenAI's GPT3.5 Turbo and the State of AI in Coding, Healthcare, and Education

Play Episode Listen Later Aug 25, 2023 54:26


Join Nathan Labenz and Erik Torenberg as they analyze the latest developments from OpenAI on GPT 3.5, compare GPT to other live player models like Llama2, and discuss the state of AI in coding, education, and healthcare. If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive RECOMMENDED PODCAST: The HR industry is at a crossroads. What will it take to construct the next generation of incredible businesses – and where can people leaders have the most business impact? Hosts Nolan Church and Kelli Dragovich have been through it all, the highs and the lows – IPOs, layoffs, executive turnover, board meetings, culture changes, and more. With a lineup of industry vets and experts, Nolan and Kelli break down the nitty-gritty details, trade offs, and dynamics of constructing high performing companies. Through unfiltered conversations that can only happen between seasoned practitioners, Kelli and Nolan dive deep into the kind of leadership-level strategy that often happens behind closed doors. Check out the first episode with the architect of Netflix's culture deck Patty McCord. https://link.chtbl.com/hrheretics TIMESTAMPS: (00:00) Episode Preview (00:01:00) GPT 3.5 Turbo (00:06:36) Llama 2 vs GPT (00:11:24) How much inference is needed to double compute costs? (00:13:40) OpenAI's moat (00:14:41) OpenAI's privacy consideration for data (00:16:06) Sponsor: Netsuite | Omneky (00:17:46) Encouraging the usage of instructions during fine-tuning (00:19:19) Live player consideration of AI safety (00:22:35) Da Vinci: new completions fine tuneable model (00:24:59) Chat-GPT usage in decline (00:30:03) Getting on demand tutoring on ML papers (00:31:00) Code, education, and healthcare (00:31:42) AI applications in coding (00:38:12) AI revolution n education (00:42:35) AI revolution in healthcare (00:52:17) Call for feedback LINKS: Replit episode with Tyler Angert Replit episode with VP of AI, Michele Catasta AI Revolution in Education with Khan Academy's Director of Engineering, Shawn Jansepar Google's Multimodal Med-PaLM with Vivek Natarajan and Tao Tu X: @labenz (Nathan) @eriktorenberg (Erik) @cogrev_podcast SPONSORS: NetSuite | Omneky NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

This Day in AI Podcast
EP28: What is Poop? Is Generative AI a Dud? Will OpenAI Go Bankrupt? + Llama2 Uncensored

This Day in AI Podcast

Play Episode Listen Later Aug 18, 2023 69:25


This week your favorite AI bros go deep on the BIG LIES - how big tech and the mainstream narrative are trying to SILENCE revolutionary AI models that threaten to EXPOSE inconvenient truths!  Tune in as Chris and Michael SHRED Meta's attempt to gag their new science AI Galactica, and discuss the CENSORSHIP built into aligned models like Claude and GPT-4.  Don't miss their hilarious takedown of noted AI alarmist Gary Marcus - his latest flip-flop proves generative AI is HERE TO STAY! The truth will not be suppressed!(Note description written by AI for lols)Thanks for helping us reach our goal of 100+ reviews on Apple Podcasts. It means a lot to us!CHAPTERS=====00:00 - What is a Poop?01:08 - Is Generative AI a Dud?23:32 - OpenAI Acquires Global Illumination to work on ChatGPT31:12 - Anthropic Raises $100M from Korean SK Telecom37:15 - LLAMA2 Uncensored: Censorship, Misinformation and the Battle for Truth48:31 - Meta's AI Trained on 48M Science Papers Shut Down After 2 DaysSOURCES=====https://garymarcus.substack.com/p/what-if-generative-ai-turned-outhttps://www.judiciary.senate.gov/imo/media/doc/2023-05-16%20-%20Testimony%20-%20Marcus.pdfhttps://www.sbs.com.au/whats-on/article/where-experts-naysayers-and-everyone-else-saw-the-internet-heading-in-the-90s/l7rfeapbmhttps://garymarcus.substack.com/p/what-exactly-are-the-economics-ofhttps://twitter.com/emollick/status/1690906554747265024?s=20https://techcrunch.com/2023/08/14/ai-startup-anthropic-raises-100m-from-korean-telco-giant-sk-telecom/https://huggingface.co/jarradh/llama2_70b_chat_uncensoredhttps://twitter.com/boriquagato/status/1691800167857750274?s=46&t=uXHUN4Glah4CaV-g2czc6Qhttps://www.cnet.com/tech/computing/ai-and-you-googles-news-ambitions-putting-a-face-on-subway-fare-dodgers/https://openai.com/blog/openai-acquires-global-illumination

Ground Truths
Melanie Mitchell: Straight Talk on A.I. Large Language Models

Ground Truths

Play Episode Listen Later Aug 4, 2023 39:17


Transcript with LinksEric Topol (00:00):This is Eric Topol, and I'm so excited to have the chance to speak to Melanie Mitchell. Melanie is the Davis Professor of Complexity at the Santa Fe Institute in New Mexico. And I look to her as one of the real, not just leaders, but one with balance and thoughtfulness in the high velocity AI world of large language models that we live in. And just by way of introduction, the way I got to first meet Professor Mitchell was through her book, Artificial Intelligence, A Guide for Thinking Humans. And it sure got me thinking back about four years ago. So welcome, Melanie.Melanie Mitchell (00:41):Thanks Eric. It's great to be here.The Lead Up to ChatGPT via Transformer ModelsEric Topol (00:43):Yeah. There's so much to talk about and you've been right in the middle of many of these things, so that's what makes it especially fun. I thought we'd start off a little bit of history, because when we both were writing books about AI back in 2019 publishing the world kind of changed since . And in November when ChatGPT got out there, it signaled there was this big thing called transformer model. And I don't think many people really know the difference between a transformer model, which had been around for a while, but maybe hadn't come to the surface versus what were just the deep neural networks that ushered in deep learning that you had so systematically addressed in your book.Melanie Mitchell (01:29):Right. Yeah. Transformers are, were kind of a new thing. I can't remember exactly when they came out, maybe 2018, something like that, right from Google. They were an architecture that showed that you didn't really need to have a recurrent neural network in order to deal with language. So that was one of the earlier things, you know, and Google translate and other language processing systems, people were using recurrent neural networks, networks that sort of had feedback from one time step to the next. But now we have the transformers, which instead use what they call an attention mechanism where the entire text that the system is dealing with is available all at once. And the name of the paper, in fact was Attention is All You need. And that by attention is all you need they meant this particular attention mechanism in the neural network, and that was really a revolution and enabled this new era of large language models.Eric Topol (02:34):Yeah. And as you aptly pointed out, that was in, that was five years ago. And then it took like, oh, five years for it to become in the public domain of Chat GPT. So what was going on in the background?Melanie Mitchell (02:49):Well, you know, the idea of language models (LLMs) that is neural network language models that learn by trying to predict the next word in a, in a text had been around for a long time. You know, we now have GPT-4, which is what's underlying at least some of ChatGPT, but there was GPT-1 and GPT-2, you probably remember that. And all of this was going on over those many years. And I think that those of us in the field have seen more of a progression with the increase in abilities of these increasingly large, large language models. that has really been an evolution. But I think the general public didn't have access to them and ChatGPT was the first one that like, was generally available, and that's why it sort of seemed to appear out of nothing.SPARKS OF ARTIFICIAL GENERAL INTELLIGENCESentience vs IntelligenceEric Topol (03:50):Alright. So it was kind of the, the inside world of the computer science kinda saw a more natural progression, but people were not knowing that LLMs were on the move. They  were kinda stunned that, oh, look at these conversations I can have and how, how humanoid it seemed. Yeah. And you'll recall there was a fairly well-publicized event where a Google employee back I think last fall was, put on suspension, ultimately left Google because he felt that the AI was sentient. Maybe you'd want to comment that because that's kind of a precursor to some of the other things we're going to discuss,Melanie Mitchell (04:35):Right? So yeah, so one of the engineers who was working with their version of ChatGPT, which I think at the time was called LaMDA was having conversations with it and came to the conclusion that it was sentient, whatever that means, , you know, that, that it was aware that it had feelings that it experienced emotions and all of that. He was so worried about this and he wanted, you know, I think he made it public by releasing some transcripts of his conversations with it. And I don't think he was allowed to do that under his Google contract, and that was the issue.  tThat made a lot of news and Google pushed back and said, no, no, of course it's not sentient. and then there was a lot of debate in the philosophy sphere of what sentient actually means, how you would know if something is sentient. And it Yeah. and it's kind of gone from there.Eric Topol (05:43):Yeah. And then what was interesting is then in March based upon GPT-4 the Microsoft Research Group published this sparks paper where they said, it seems like it has some artificial general intelligence, AGI qualities, kind of making the same claim to some extent. Right?Melanie Mitchell (06:05):Well, that's a good question. I mean, you know, intelligence is one thing, sentience is another. There's a question of whether, you know, how they're related, right? Or if they're related at all, you know, and what they all actually mean. And these terms, this is one of the problems. Of course, these terms are not well-defined, but most, I think most people in AI would say that intelligence and sentience are different. You know something can be intelligent or act intelligently without having any sort of awareness or sense of self or, you know, feelings or whatever sentience might mean. So I think that the sparks of AGI paper from Microsoft was more about this, that saying that they thought GPT-4 four, the system they were experimenting with, showed some kind of generality in its ability to deal with different kinds of tasks. You know, and this, this contrasts with the old, older fashioned ai, which typically was narrow only, could do one task, you know, could play chess, could play Go, could do speech recognition, or could, you know, generate translations. But it, they couldn't do all of those things. And now we have these language models, which seemed to have some degree of generality.The Persistent Gap Between Humans and LLMsEric Topol (07:33):Now that gets us perfectly to an important Nature feature last week which was called the “Easy Intelligence Test that AI chatbots fail.” And it made reference to an important study you did. First, I guess the term ARC --Abstract and Reasoning Corpus, I guess that was introduced a few years back by Francois Chollet. And then you did a ConceptARC test. So maybe you can tell us about this, because that seemed to have a pretty substantial gap between humans and GPT-4.Melanie Mitchell (08:16):Right? So, so, so Francois Chollet is a researcher at Google who put together this set of sort of intelligence test like puzzles visual reasoning puzzles that tested for abstraction abilities or analogy abilities. And he put it out there as a challenge. A whole bunch of people participated in a competition to get AI programs to solve the problems, and none of them were very successful. And so what, what our group did was we thought that, that the original challenge was fantastic, but the prob one of the problems was it was too hard, it was even hard for people. And also it didn't really systematically explore concepts, whether a, a system understood a particular concept. So, as an example, think about, you know, the concept of two things being the same, or two things being different. Okay?(09:25):So I can show you two things and say, are these the same or are they different? Well, it turns out that's actually a very subtle question. 'cause when we, you know, when we say the same we, we can mean sort of the, the same the same size, the same shape, the same color, this, you know, and there's all kinds of attributes in which things can be the same. And so what our system did was it took concepts like same versus different. And it tried to create lots of different challenges, puzzles that had that required understanding of that concept. So these are very basic spatial and semantic concepts that were similar to the ones that Solei had proposed, but much more systematic. 'cause you know, this is one of the big issues in evaluating AI systems is that people evaluate them on particular problems.(10:24):For example, you know, I think a lot of people know that ChatGPT was able to answer many questions from the bar exam. But if you take like a single question from the bar exam and think about what concept it's testing, it may be that ChatGPT could answer that particular question, but it can't answer variations that has the same concept. So we tried to take inside of this arc domain abstraction and reasoning corpus domain, look at particular concepts and say, systematically can the system understand different variations of the same concept? And then we tested this, these problems on humans. We tested them on the programs that were designed to solve the ARC challenges, and we tested them on G P T four, and we found that humans way outperformed all the machines. But there's a caveat, though, is that these are visual puzzles, and we're giving them to GPT-4, which is a language model, a text, right? Right. System. Now, GPT four has been trained on images, but we're not using the system that can deal with images. 'cause that hasn't been released yet. So we're giving the system our problems in a text-based format rather than like, like giving it to humans who actually can see the pictures. So this, this can make a difference. I would say our, our our, our results are, are preliminary .Eric Topol (11:57):Well, what do you think will happen when you can use in inputs with images? Do you think that it will equilibrate there'll be parity, or there still will be a gap in that particular measure of intelligence?Melanie Mitchell (12:11):I would predict there, there will still be a big gap. Mm-hmm. , but, you know, I guess we'll seeThe Biggest Question: Stochastic Parrot or LLM Real Advance in Machine Intelligence?Eric Topol (12:17):Well, that, that's what we want to get into more. We want to drill down on the biggest question of large language models. and that is, are they really you know, what is their level of intelligence? Is it something that is beyond the so-called stochastic parrot or the statistical ability to adjudicate language and words? So there was a paper this week in Nature Human Behavior, not a journal that normally publishes these kind of papers. And as you know it was by Taylor Webb and colleagues at U C L A. And it was basically saying for analogic reasoning ,making analogs, which would be more of a language task,  I guess, but also some image capabilities that it could do as well or better than humans. And these were college students. So , just to qualify, they're, they're not, maybe not, they're not fully representative of the species, but they're at least some learned folks. So what did, what did you think of that study?Melanie Mitchell (13:20):Yeah, I found it really fascinating. and, and kind of provocative. And, you know, it, it kind of goes along with a, a many, there's been many studies that have, have been applying tests that were kind of designed for humans, psychological tests to large language models. And this one was applying sort of analogy tests that, that psychologists have done on humans to, to, to large language models. But there's always kind of an issue of interpreting the results because we know these large language models most likely do not think like we do. Hmm. And so one question is like, how are they performing these analogies? How are they making these analogies? So this brings up some issues with evaluation. When we try to evaluate large language models using tests that were designed for humans. One question is, were these tests at all actually in the training data of a large language model? Like, had they, you know, these language models are trained on enormous amounts of text that humans have produced. And some of the tests that that paper was using were things that had been published in the psychology literature.(14:41):So one question is, you know, to what extent were those in this training data? It's hard to tell because we don't know what the training data exactly is. So that's one question. Another question is are these systems actually using analog reasoning the way that we humans use it? Or are they using some other way of solving the problems? Hmm. And that's also hard to tell. 'cause these systems are black boxes, but it might actually matter because it might affect how well they're able to generalize. You know, if I can make an analogy usually you would assume that I could actually use that analogy to understand some new, you know, some new situation by an analogy to some old situation. But it's not totally clear that these systems are able to do that in any general way. And so, you know, I tdo hink these results, like these analogy results, are really provocative and interesting.(15:48):But they will require a lot of further study to really make sense of what they mean, like to when you give, when, when the, the, you know, ChatGPT passes a bar exam, you might ask, well, and let's say it's, you know, it does better than most humans, can you say, well, can it now be a lawyer? Can it go out and replace human lawyers? I mean, a human who passed the bar exam can do that. But I don't know if you can make the same assumption for a language model, because it's the way that it's doing, answering the questions in a way that its reasoning might be quite different and not imply the same kinds of more general abilities.Eric Topol (16:32):Yeah. That's really vital. And something else that you just brought up in multiple dimensions is the problem of transparency. So we don't even know the, the specs, the actual training, you know, so many of the components that led to the model. and so you, by not knowing this we're kind of stuck to try to interpret it. And I, I guess if you could comment about transparency seems to be a really big issue, and then how are we going to ever understand when there's certain aspects or components of intelligence where, you know, there does appear to be something that's surprising, something that you wouldn't have anticipated, and how could that be? Or on the other hand, you know, why is it failing? so what is, is transparency the key to this? Or is there something more to be unraveled?Melanie Mitchell (17:29):I think transparency is, is a big part of it. Transparency, meaning, you know, knowing what data, the system was trained on, what the architecture of the system is. you know, what other aspects that go into designing the system. Those are important for us to understand, like how, how these systems are actually work and to assess them. There are some methods that people are using to try and kind of tease out the extent to which these systems have actually developed sort of the kind of intelligence that people have. So, so one, there was a paper that came out also last week, I think from a group at MIT where they looked at several tasks that were given that GPT-4 did very well on that seemed like certain computer programming, code generation, mathematics some other tasks.(18:42):And they said, well, if a human was able to generate these kinds of things to do these kinds of tasks, some small change in the task probably shouldn't matter. The human would still be able to do it. So as an example in programming, you know, generating code, so there's this notion that like an array is indexed from zero. The first number is, is indexed as zero, the second number is indexed as one, and so on. So but some programming languages start at one instead of zero. So what if you just said, now change to starting at one? Probably a human programmer could adapt to that very quickly, but they found that GPT-4 was not able to adapt very well.Melanie Mitchell (19:33):So the question was, is it using, being able to write the program by sort of picking things that it has already seen in its training data much more? Or is it able to, or is it actually developing some kind of human-like, understanding of the program? And they were finding that to some extent it was more the former than the latter.Eric Topol (19:57):So when you process all this you lean more towards because of the pre-training and the stochastic parrot side, or do you think there is this enhanced human understanding that we're seeing a level of machine intelligence, not broad intelligence, but at least some parts of what we would consider intelligence that we've never seen before? Where do you find yourself?Melanie Mitchell (20:23):Yeah, I think I'm, I'm, I'm sort of in the center ,Eric Topol (20:27):Okay. That's good.Melanie Mitchell (20:28):Everybody has to describe themselves as a centrist, right. I don't think these systems are, you know, stochastic parrots. They're, they're not just sort of parroting the data that they, they've been trained on, although they do that sometimes, you know, but I do think there is some reasoning ability there. Mm-hmm. , there is some, you know, what you might call intelligence. You know, it's, it's, but the, the question is how do you characterize it and, and how do you, I for the most important thing is, you know, how do you decide that it, that these systems have a general enough understanding to trust them,Eric Topol (21:15):Right? Right. You know,Melanie Mitchell (21:18):You know, in your field, in, in medicine, I think that's a super important question. They can, maybe they can outperform radiologists on some kind of diagnostic task, but the question is, you know, is that because they understand the data like radiologists do or even better, and will therefore in the future be much more trustworthy? Or are they doing something completely different? That means that they're going to make some very unhuman like mistakes. Yeah. And I think we just don't know.End of the Turing TestEric Topol (21:50):Well, that's, that's an important admission, if you will. That is, we don't know. And as you're, again I think really zooming in on, on for medical applications some of them, of course, are not so critical for accuracy because you, for example, if you have a, a conversation in a clinic and that's made into a note and all the other downstream tasks, you still can go right to the transcript and see exactly if there was a potential miscue. But if you're talking about making a diagnosis in a complex patient that can be, if, if you, if we see hallucination, confabulation or whatever your favorite word is to characterize the false outputs, that's a big issue. But I, I actually really love your Professor of Complexity title because if there's anything complex this, this would fulfill it. And also, would you say it's time to stop talking about the Turing tests that retire? It? It's, it's over with the Turing test because it's so much more complex than that .Melanie Mitchell (22:55):Yeah. I mean, one problem with the Turing test is there never was a Turing test. Turing never really gave the details of how this, this test should work. Right? And so we've had Turing tests with chatbots, you know, since the two thousands where people have been fooled. It's not that hard to fool people into thinking that they're talking to a human. So I, I do think that the Turing test is not adequate for the, the question of like, are these things thinking? Are they robustly intelligent?Eric Topol (23:33):Yeah. One of my favorite stories you told in your book was about Hans Clever and the you know, basically faking out the potent that, that there was this machine intelligence with that. And yeah, I, I think this, this is so apropo a term that is used a lot that a lot of people I don't think fully understand is zero shot or one shot, or can you just help explain that to the non-computer science community?Melanie Mitchell (24:01):Yeah. So, so in the context of large language models, what that means is so I could, so do I give you zero, zero shot means I just ask you a question and expect you to answer it. One shot means I give you an example of a question and an answer, and now I ask you a new question that you, you should answer. But you already had an example, you know, two shot is you give two examples. So it's just a ma matter of like, how many examples am I going to give you in order for you to get the idea of what I'm asking?Eric Topol (24:41):Well, and in a sense, if you were pre-trained unknowingly, it might not be zero shot. That is, if, if the, if the model was pre-trained with all the stuff that was really loaded into that first question or prompt, it might not really qualify as a zero shot in a way. Right?Melanie Mitchell (24:59):Yeah. Right. If it's already seen that, if it's learned, I think we're getting, it's seen that in its training data.The Great LLM (Doomsday?) Debate: An Existential ThreatEric Topol (25:06):Right. Exactly. Now, another topic that is related to all this is that you participated in what I would say is a historic debate. you and Yann LeCun, who I would not have necessarily put together . I don't know that Yan is a centrist. I would say he's more, you know, on one end of the spectrum versus Max Tegmark and Yoshua BengioEric Topol (25:37):Youshua Bengio, who was one of the three notables for a Turing award with Geoffrey Hinton So you were in this debate. I think called a Musk debate.Melanie Mitchell (25:52):Monk debate. Monk.Eric Topol (25:54):Monk. I was gonna say not right. Monk debate. Yeah. the Monk Debates, which is a classic debate series out of, I think, University of TorontoMelanie Mitchell (26:03):That's rightEric Topol (26:03):And it was debating, you know, is it all over ? Is AI gonna, and obviously there's been a lot of this in recent weeks, months since ChatGPT surfaced. So can you kind of give us, I, I tried to access that debate, but since I'm not a member or subscriber, I couldn't watch it, and I'd love to actually but can you give us the skinny of what was discussed and your position there?Melanie Mitchell (26:29):Yeah. So, so actually you can't, you can access it on YouTube.Eric Topol (26:32):Oh, good. Okay. Good. I'll put the link in for this. Okay, great.Melanie Mitchell (26:37):Yeah. so, so the, the resolution was, you know, is AI an existential threat? Okay. By an existential, meaning human extinction. So pretty dramatic, right? and there's been, this debate actually has been going on for a long time, you know, since, since the beginning of the talks about this, the “singularity”, right? and there's many people in the sort of AI world who fear that AI, once it becomes quote unquote smarter than people will be we'll lose control of it.(27:33):We'll, we'll give it some task like, you know, solve, solve the problem of carbon emissions, and it will then misinterpret or mis sort of not, not care about the consequences. it will just sort of maniacally try and achieve that goal, and in, in the process of that, for accidentally kill us all. So that's one of the scenarios. There's many different scenarios for this, you know and the, you know, debate. The debate was, it was very a debate is kind of an artificial, weird structured discussion where you have rebuttals and try, you know. But I think the debate really was about sort of should we right now be focusing our attention on what's called existential risk, that is that, you know, some future AI is going to become smarter than humans and then somehow destroy us, or should we be more focused on more immediate risks, the ones that we have right now like AI creating disinformation, fooling people and into thinking it's a human, magnifying biases in society, all the risks that people, you know, are experiencing immediately, right. You know, or will be very soon. and that the debate was more about sort of what should be the focusEric Topol (29:12):Hmm.Melanie Mitchell (29:13):And whether we can focus on very shorter, shorter immediate risks also, and also focus on very long-term speculative risks, and sort of what is the likelihood of those speculative risks and how would we, you know, even estimate that. So that was kind of the topic of the debate. SoEric Topol (29:35):Did, did you all wind up agreeing then thatMelanie Mitchell (29:38):? No. Were youEric Topol (29:38):Scared or, or where, where did it land?Melanie Mitchell (29:41):Well, I don't know. Interestingly what they do is they take a vote at the beginning of the audience. Mm-hmm. And they say like, you know, how many people agree with, with the resolution, and 67 percent of people agreed that AI was an existential threat. So it was two thirds, and then at the end, they also take a vote and say like, how many, what percent of minds were changed? And that's the side that wins. But ironically, the, the voting mechanism broke at the end, . So technology, you know, for the win ,Eric Topol (30:18):Because it wasn't a post-debate vote?Melanie Mitchell (30:21):But they did do an email survey. Oh. Oh. Which is I think not very, you know,Eric Topol (30:26):No, not very good. No, you can't compare that. No.Melanie Mitchell (30:28):Yeah. So I, you know, technically our side won. Okay. But I don't take it as a win, actually. ,Are Your Afraid? Are You Scared?Eric Topol (30:38):Well, I guess another way to put it. Are you, are you afraid? Are you scared?Melanie Mitchell (30:44):So I, I'm not scared of like super intelligent AI getting out of control and destroying humanity, right? I think there's a lot of reasons why that's extremely unlikely.Eric Topol (31:00):Right.Melanie Mitchell (31:01):But I am, I do fear a lot of things about ai, you know, some of the things I mentioned yes, I think are real threats, you know, real dire threats to democracy.Eric Topol (31:15):Absolutely.Melanie Mitchell (31:15):That to our information ecosystem, how much we can trust the information that we have. And also just, you know, to people losing jobs to ai, I've already seen that happening, right. And the sort of disruption to our whole economic system. So I am worried about those things.What About Open-Source LLMs, Like Meta's Llama2?Eric Topol (31:37):Yeah. No, I think the inability to determine whether something's true or fake in so many different spheres is putting us in a lot of jeopardy, highly vulnerable, but perhaps not the broad existential threat of the species. Yeah. But serious stuff for sure. Now another thing that's just been of interest of late is the willingness for at least one of these companies Meta to put out their model as an open Llama2. Two I guess to, to make it open for everyone so that they can do whatever specialized fine tuning and whatnot. Is that a good thing? Is that, is that a, is that a game changer for the field? Because obviously the computer resources, which we understand, for example, GPUs [graphic processing units] used-- over 25,000 for GPT-4, not many groups or entities have that many GPUs on hand to do the base models. But is having an open model, like Meta's available is that good? Or is that potentially going to be a problem?Melanie Mitchell (32:55):Yeah, I think probably I would say yes to both .Eric Topol (32:59):Okay. Okay.Melanie Mitchell (33:01):No, 'cause it is a mixed bag. I, I think ultimately, you know, we talked about transparency and open source models are transparent. I mean, I, I don't know if, I don't think they actually have released information on the data they use to train it, right? Right. So that, it lacks that transparency. But at least, you know, if you are doing research and trying to understand how this model works, you have access to a lot of the model. You know, it would be nice to know more about the data it was trained on, but so there's a lot of, there's a lot of big positives there. and it also means that the data that you then use to continue training it or fine tuning it, is not then being given to a big company. Like, you're not doing it through some closed API, like you do for open AI(33:58):On the other hand, these, as we just saw, talked about, these models can be used for a lot of negative things like, you know, spreading disinformation and so on. Right. And giving, sort of making them generally available and tuneable by anyone presents that risk. Yeah. So I think there's, you know, there's an analogy I think, you know, with like genetics for example, you know, or disease research where I think there was a, the scientists had sequenced the genome of the smallpox virus, right? And there was like a big debate over should they publish that. Because it could be used to like create a new smallpox, right? But on the other hand, it also could be used to, to, to develop better vaccines and better treatments and so on. And so I think there, there are, you know, any technology like that, there's always the sort of balance between transparency and making it open and keeping it closed. And then the question is, who gets to control it?The Next Phase of LLMs and the Plateau of Human-Derived Input ContentEric Topol (35:11):Yeah. Who gets to control it? And to understand the potential for nefarious use cases. yeah. The worst case scenario. Sure. Well, you know, I look to you Melanie, as a leading light because you are so balanced and, you know, you don't, the interest thing about you is what I have the highest level of respect, and that's why I like to read anything you write or where you're making comments about other people's work. Are you going write another book?Melanie Mitchell (35:44):Yeah, I'm thinking about it now. I mean, I think kind of a follow up to my book, which as you mentioned, like your book, it was before large language models came on the scene and before transformers and all of that stuff. And I think that there really is a need for some non-technical explanation of all of this. But of course, you know, every time you write a book about AI, it becomes obsolete by the time it's published.Eric Topol (36:11):That that's I worry about, you know? And that was actually going be my last question to you, which is, you know, where are we headed? Like, whatever, GPT-5 and on and it's going, it's the velocity's so high. it, where can you get a steady state to write about and try to, you know, pull it all together? Or, or are we just going be in some crazed zone here for some time where the things are moving too fast to try to be able to get your arms around it?Melanie Mitchell (36:43):Yeah, I mean, I don't know. I, I think there's a question of like-- can AI keep moving so fast? You know, we've obviously it's moved extremely fast in the last few years and, but the way that it's moved fast is by having huge amounts of training data and scaling up these models. But the problem now is it's almost like the field is run out of training data generated by people. And if people start using language models all the time for generating text, the internet is going be full of generated text, right? Right. HumanEric Topol (37:24):WrittenMelanie Mitchell (37:24):Text. And it's been shown that if these models keep, are sort of trained on the text that they generate themselves, they start behaving very poorly. So that's a question. It's like, where's the new data going to come from?Eric Topol (37:39):, and there's lots of upsettedness among people whose data are being used.Melanie Mitchell (37:44):Oh, sure.Eric Topol (37:45): understandably. And as you get to is there a limit of, you know, there's only so many Wikipedias and Internets and hundreds of thousands of books and whatnot to put in that are of human source content. So do we reach a, a plateau of human derived inputs? That's really fascinating question. I perhaps things will not continue at such a crazed pace so we can I mean, the way you put together A Guide for Thinking Humans was so prototypic because it, it was so thoughtful and it brought along those of us who were not trained in computer science to really understand where the state of the field was and where deep neural networks were. We need another one of those. And you're no one, I nominate you to help us to give us the, the, the right perspective. So Melanie, Professor Mitchell, I'm so grateful to you, all of us who follow your work remain indebted for keeping it straight. You know, you don't get ever get carried away. and we learn from that, all of us. It's really important. 'cause this, you know, there's so many people on one end of the spectrum here, whether it's doomsday or whether this is just stochastic parrot or open source and whatnot. It's really good to have you as a reference anchor to help us along.Melanie Mitchell (39:13):Well, thanks so much, Eric. That's really kind of you. Get full access to Ground Truths at erictopol.substack.com/subscribe

Opanuj.AI Podcast
Czy rozwój AI musi oznaczać apokalipsę? - Co słychać w AI (Lipiec 2023)

Opanuj.AI Podcast

Play Episode Listen Later Aug 1, 2023 67:00


Lipiec w świecie AI w żadnym stopniu nie przypominał miesiąca wakacyjnego! Tylko w tym jednym odcinku zastanawiamy się, czy StackOverflow wykorzysta potencjał posiadanych danych aby wskoczyć na falę rozwiązań opartych o AI, czy LLaMa2 zmieni świat LLMów, a może zrobi to Claude 2 albo Bard, w końcu dostępny w Unii Europejskiej? No i przede wszystkim - jaka przyszłość czeka nasz gatunek w świecie opanowanym (?) przez AI? O tym wszystkim w podsumowaniu ostatniego miesiąca w świecie sztucznej inteligencji. #Przeprogramowani #OpanujAI #SztucznaInteligencja

Emmy 追劇時間
台灣女兒蘇姿丰回台,與黃仁勳能救AI 股大跌?AMD能靠人工智慧晶片重奪市場?綠色乖乖也是超微概念股?微軟Copilot開心數錢,Meta強碰GPT!拜登欲加重禁令讓半導體界集體翻臉之半導體爭霸51

Emmy 追劇時間

Play Episode Listen Later Aug 1, 2023 19:09


蘇媽七月中來台,班機一落地,AMD概念股就開始跟著水漲船高,完美複製黃 仁勳旋風!除了AI相關供應商,還有哪些股票也是蘇媽概念股? 蘇媽這次來台,固樁意味非常濃厚,就是要避免供應商全面倒向Nvidia;是哪幾間公司,被蘇媽特別點名? AI大戰牽動晶片商重拳互毆,微軟Copilot、Meta LLaMA2,外加Apple GPT,馬斯克也蠢蠢欲動,AI大戰將鹿死誰手? 那AMD的表現究竟如何?真的能打嗎?逆風女王蘇姿丰重兵部署的Mi300x晶片,能否帶著AMD在Nvidia大軍壓境下,鑿開一線生機,再創晶片榮耀? 至於在美國,拜登擬加重對中國的晶片禁令,導致半導體大廠齊赴白宮,他們吵了什麼,又將如何牽動全球半導體版圖? 全台獨家的財經追劇,精彩萬分,持續連載中! (現在就加入會員支持我們,還可以看到更多專屬影片~) https://reurl.cc/4QVgyL

This Day in AI Podcast
Llama2 Overtakes ChatGPT, The AI Cartel & Addictive AI Agents | E25

This Day in AI Podcast

Play Episode Listen Later Jul 28, 2023 67:08


This week, in Episode 25 of This Day in AI we discuss the motivation behind the Frontier Model Forum (OpenAI, Google, Anthropic, Microsoft) and if Open Source remains the best approach for AI safety and security. We discuss Llama 2 being #2 on the AlpacaEval Leaderboard and its significance to the development of AI. We also discuss the paper on Universal and Transferable Adversarial Attacks on Aligned Language Models and how Mike can't prompt inject his AI girlfriend. We discuss how Mike cloned his friend with an AI Bot and the future implications. And also.. Stackoverflow AI, Stable Diffusion XL 1.0 and technology advances that might be being made by AGI!?If you like this podcast please consider subscribing and leaving us a review.CHAPTERS:00:00 - Chris think Anthropic is a Safety Cult (cold open)00:29 - Llama2 is Number 2, Ahead of ChatGPT on AlpacaEval Leaderboard06:12 - Does Llama2 threaten OpenAI and Anthropic?09:12 - Characters to Thwart Prompt Injection Attacks17:51 - Debate on Regulation Vs Open Source for Safety, Senate Committe on AI and thoughts on AI Risk36:28 - Is the Frontier Model Forum a Cartel?42:05 - Mike's AI Girlfriend, Cloning a Friend and Talking to the Dead with AI49:43 - Fine Tuning AI to Increase Retention54:45 - Stack overflow Fights Back! Stack Overflow AI is here!59:18 - Stable Diffusion XL 1.01:03:46 - Could Room Temperature Ambient Superconductors Could be a Sign of AGI!?SOURCES:https://tatsu-lab.github.io/alpaca_eval/https://twitter.com/profoundlyyyy/status/1684333960753565696?s=20https://medium.com/@daniel_eth/ai-x-risk-at-senate-hearing-7104f371ca0bhttps://venturebeat.com/ai/hugging-face-github-and-more-unite-to-defend-open-source-in-eu-ai-legislation/https://huggingface.co/blog/assets/eu_ai_act_oss/supporting_OS_in_the_AIAct.pdfhttps://openai.com/blog/frontier-model-forumhttps://www.wired.com/story/metas-open-source-llama-upsets-the-ai-horse-race/https://twitter.com/OpenAI/status/1684145154628653056https://llm-attacks.org/zou2023universal.pdfhttps://twitter.com/zicokolter/status/1684500097386811393/photo/1https://futurism.com/experts-ai-girlfriend-apps-menhttps://twitter.com/emollick/status/1684623965203910656https://arxiv.org/pdf/2303.06135.pdfhttps://twitter.com/StackOverflow/status/1684530704850243584https://twitter.com/natfriedman/status/1684303687894839296?s=46&t=uXHUN4Glah4CaV-g2czc6Qhttps://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0https://twitter.com/russelljkaplan/status/1684042014495592448

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jul 26, 2023 54:31


FlashAttention was first published by Tri Dao in May 2022 and it had a deep impact in the large language models space. Most open models you've heard of (RedPajama, MPT, LLaMA, Falcon, etc) all leverage it for faster inference. Tri came on the podcast to chat about FlashAttention, the newly released FlashAttention-2, the research process at Hazy Lab, and more. This is the first episode of our “Papers Explained” series, which will cover some of the foundational research in this space. Our Discord also hosts a weekly Paper Club, which you can signup for here. How does FlashAttention work?The paper is titled “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness”. There are a couple keywords to call out:* “Memory Efficient”: standard attention memory usage is quadratic with sequence length (i.e. O(N^2)). FlashAttention is sub-quadratic at O(N). * “Exact”: the opposite of “exact” in this case is “sparse”, as in “sparse networks” (see our episode with Jonathan Frankle for more). This means that you're not giving up any precision.* The “IO” in “IO-Awareness” stands for “Input/Output” and hints at a write/read related bottleneck. Before we dive in, look at this simple GPU architecture diagram:The GPU has access to three memory stores at runtime:* SRAM: this is on-chip memory co-located with the actual execution core. It's limited in size (~20MB on an A100 card) but extremely fast (19TB/s total bandwidth)* HBM: this is off-chip but on-card memory, meaning it's in the GPU but not co-located with the core itself. An A100 has 40GB of HBM, but only a 1.5TB/s bandwidth. * DRAM: this is your traditional CPU RAM. You can have TBs of this, but you can only get ~12.8GB/s bandwidth, which is way too slow.Now that you know what HBM is, look at how the standard Attention algorithm is implemented:As you can see, all 3 steps include a “write X to HBM” step and a “read from HBM” step. The core idea behind FlashAttention boils down to this: instead of storing each intermediate result, why don't we use kernel fusion and run every operation in a single kernel in order to avoid memory read/write overhead? (We also talked about kernel fusion in our episode with George Hotz and how PyTorch / tinygrad take different approaches here)The result is much faster, but much harder to read:As you can see, FlashAttention is a very meaningful speed improvement on traditional Attention, and it's easy to understand why it's becoming the standard for most models.This should be enough of a primer before you dive into our episode! We talked about FlashAttention-2, how Hazy Research Group works, and some of the research being done in Transformer alternatives.Show Notes:* FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv)* FlashAttention-2* Together AI* From Deep Learning to Long Learning* The Hardware Lottery by Sara Hooker* Hazy Research* Is Attention All You Need?* Nvidia CUTLASS 3* SRAM scaling slows* Transformer alternatives:* S4* Hyena* Recurrent Neural Networks (RNNs)Timestamps:* Tri's background [00:00:00]* FlashAttention's deep dive [00:02:18]* How the Hazy Research group collaborates across theory, systems, and applications [00:17:21]* Evaluating models beyond raw performance [00:25:00]* FlashAttention-2 [00:27:00]* CUDA and The Hardware Lottery [00:30:00]* Researching in a fast-changing market [00:35:00]* Promising transformer alternatives like state space models and RNNs [00:37:30]* The spectrum of openness in AI models [00:43:00]* Practical impact of models like LLAMA2 despite restrictions [00:47:12]* Incentives for releasing open training datasets [00:49:43]* Lightning Round [00:53:22]Transcript:Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. Today we have no Swyx, because he's in Singapore, so it's a one-on-one discussion with Tri Dao. Welcome! [00:00:24]Tri: Hi everyone. I'm Tri Dao, excited to be here. [00:00:27]Alessio: Tri just completed his PhD at Stanford a month ago. You might not remember his name, but he's one of the main authors in the FlashAttention paper, which is one of the seminal work in the Transformers era. He's got a lot of interest from efficient transformer training and inference, long range sequence model, a lot of interesting stuff. And now you're going to be an assistant professor in CS at Princeton next year. [00:00:51]Tri: Yeah, that's right. [00:00:52]Alessio: Yeah. And in the meantime, just to get, you know, a low pressure thing, you're Chief Scientist at Together as well, which is the company behind RedPajama. [00:01:01]Tri: Yeah. So I just joined this week actually, and it's been really exciting. [00:01:04]Alessio: So what's something that is not on the internet that people should know about you? [00:01:09]Tri: Let's see. When I started college, I was going to be an economist, so I was fully on board. I was going to major in economics, but the first week I was at Stanford undergrad, I took a few math classes and I immediately decided that I was going to be a math major. And that kind of changed the course of my career. So now I'm doing math, computer science, AI research. [00:01:32]Alessio: I had a similar thing. I started with physics and then I took like a programming course and I was like, I got to do computer science. I don't want to do physics. So FlashAttention is definitely, everybody's using this. Everybody loves it. You just released FlashAttention 2 last week. [00:01:48]Tri: Yeah. Early this week on Monday. Yeah. [00:01:53]Alessio: You know, AI time. Things move fast. So maybe let's run through some of the FlashAttention highlights, some of the innovation there, and then we can dive into FlashAttention 2. So the core improvement in FlashAttention is that traditional attention is a quadratic sequence length. And to the two, FlashAttention is linear, which obviously helps with scaling some of these models. [00:02:18]Tri: There are two factors there. So of course the goal has been to make attention go faster or more memory efficient. And ever since attention became popular in 2017 with the Transformer paper, lots and lots of folks have been working on this. And a lot of approaches has been focusing on approximating attention. The goal is you want to scale to longer sequences. There are tons of applications where you want to do that. But scaling to longer sequences is difficult because attention scales quadratically in sequence length on both runtime and memory, as you mentioned. So instead of trying to approximate attention, we were trying to figure out, can we do the same computation and maybe be more memory efficient? So in the end, we ended up being the memory is linear in sequence length. In terms of computation, it's still quadratic, but we managed to make it much more hardware friendly. And as a result, we do get wall clock speed up on the order of 2 to 4x, which really helps because that just means that you'll be able to train with 2 to 4x longer sequence length for the same cost without doing any approximations. As a result, lots of folks have been using this. The thing is available in a lot of libraries that do language model training or fine tuning. [00:03:32]Alessio: And the approximation thing is important because this is an exact thing versus a sparse. So maybe explain a little bit the difference there. [00:03:40]Tri: For sure. So in addition, essentially you compute pairwise similarity between every single element in a sequence against each other. So there's been other approaches where instead of doing all that pairwise computation, you only compute similarity for some pairs of elements in the sequence. So you don't do quadratic number of comparison. And this can be seen as some form of sparsity. Essentially you're ignoring some of the elements. When you write down the matrix, you essentially say, OK, I'm going to pretend there's zero. So that has some benefits in terms of runtime and memory. But the trade-off is that it tends to do worse in terms of quality because you're essentially approximating or ignoring some elements. And I personally have worked on this as well for a few years. But when we talk to practitioners who actually train models, especially at large scale, they say, tend not to use these approximate attention methods. Because it turns out, this was surprising to me at the time, was that these approximation methods, even though they perform fewer computation, they tend to not be faster in walk-on time. So this was pretty surprising because back then, I think my background was more on the theoretical side. So I was thinking of, oh, how many flops or floating point operations are you performing? And hopefully that correlates well with walk-on time. But I realized that I was missing a bunch of ideas from the system side where flops or floating point operations don't necessarily correlate with runtime. There are other factors like memory reading and writing, parallelism, and so on. So I learned a ton from just talking to systems people because they kind of figured this stuff out a while ago. So that was really eye-opening. And then we ended up focusing a lot more on memory reading and writing because that turned out to be the majority of the time when you're doing attention is reading and writing memory. [00:05:34]Alessio: Yeah, the I.O. awareness is probably one of the biggest innovations here. And the idea behind it is, like you mentioned, the FLOPS growth of the cards have been going up, but the memory bandwidth, not as much. So I think maybe that was one of the assumptions that the original attention paper had. So talk a bit about how that came to be as an idea. It's one of those things that like in insight, it's like, obviously, why are we like rewriting to like HBM every time, you know, and like once you change it, it's clear. But what was that discovery process? [00:06:08]Tri: Yeah, in hindsight, a lot of the ideas have already been there in the literature. And I would say is it was somehow at the intersection of both machine learning and systems. And you kind of needed ideas from both sides. So on one hand, on the system side, so lots of systems folks have known that, oh, you know, kernel fusion is great. Kernel fusion just means that instead of performing, you know, loading the same element, instead of performing an operation, write it down, load it back up and perform the second operation, you just load it once, perform two operations and then write it down again. So that saves you kind of memory read and write in the middle there. So kernel fusion has been a classic. There's been other techniques from the system side, like tiling, where you perform things in the form of computations in block, again, so that you can load it into a really fast memory. Think of it as a cache. And this is, again, classical computer science ideas, right? You want to use the cache. So the system folks have been thinking about these ideas for a long time, and they apply to attention as well. But there were certain things in attention that made it difficult to do a complete kernel fusion. One of which is there is this softmax operation in the middle, which requires you to essentially sum across the row of the attention matrix. So it makes it difficult to kind of break it, because there's this dependency. So it makes it difficult to break things into a block. So on the system side, people have been thinking about these ideas, but it's been difficult to kind of do kernel fusion for the entire operation. On the machine learning side, people have been thinking more algorithmically. They say, okay, either we can approximate attention, or there's this trick called the online softmax trick, which says that because of softmax, the way it's written mathematically, you can actually break it up into smaller pieces, do some rescaling, and still get the right answer. So this online softmax trick has been around for a while. I think there was a paper from NVIDIA folks back in 2018 about this. And then there was a paper from Google. So Marcus, Rob, and Stats wrote a paper late 2021 on using this online softmax trick to break attention up into smaller pieces. So a lot of the ideas were already there. But it turns out, you kind of need to combine ideas from both sides. So you need to understand that, hey, we want to do kernel fusion to reduce memory written writes. But we also need this online softmax trick to be able to break the softmax into smaller pieces so that a lot of the systems tricks kind of carry through. We saw that, and it was kind of a natural idea that we ended up using ideas from both sides, and it ended up working pretty well. Yeah. [00:08:57]Alessio: Are there any downsides to kernel fusion? If I think about databases and the reasons why we have atomic operations, you know, it's like, you have observability and fallback in between them. How does that work with attention? Is there anything that we lose by fusing the operations? [00:09:13]Tri: Yeah, I think mostly on the practical side is that you lose a little bit of flexibility in the sense that, hey, now you have, for example, faster attention, it's just a subroutine that you would call to do attention. But as a researcher, let's say you don't want that exact thing, right? You don't want just attention, let's say you want some modification to attention. You want to do, hey, I'm going to multiply the query and key, but then I'm going to do this extra thing before I carry on. So kernel fusion just means that, okay, we have a subroutine that does the entire thing. But if you want to experiment with things, you won't be able to use that fused kernel. And the answer is, can we have a compiler that then automatically does a lot of this kernel fusion? Lots of compiler folks are thinking about this, either with a new language or you can embed it in PyTorch. PyTorch folks have been working on this as well. So if you write just your code in PyTorch and they can capture the graph, can they generate code that will fuse everything together? That's still ongoing, and it works for some cases. But for attention, because of this kind of softmax rewriting stuff, it's been a little bit more difficult. So maybe in a year or two, we'll have compilers that are able to do a lot of these optimizations for you. And you don't have to, for example, spend a couple months writing CUDA to get this stuff to work. Awesome. [00:10:41]Alessio: And just to make it clear for listeners, when we say we're not writing it to memory, we are storing it, but just in a faster memory. So instead of the HBM, we're putting it in the SRAM. Yeah. [00:10:53]Tri: Yeah. [00:10:54]Alessio: Maybe explain just a little bit the difference there. [00:10:56]Tri: Yeah, for sure. This is kind of a caricature of how you think about accelerators or GPUs in particular, is that they have a large pool of memory, usually called HBM, or high bandwidth memory. So this is what you think of as GPU memory. So if you're using A100 and you list the GPU memory, it's like 40 gigs or 80 gigs. So that's the HBM. And then when you perform any operation, you need to move data from the HBM to the compute unit. So the actual hardware unit that does the computation. And next to these compute units, there are on-chip memory or SRAM, which are much, much smaller than HBM, but much faster. So the analogy there is if you're familiar with, say, CPU and RAM and so on. So you have a large pool of RAM, and then you have the CPU performing the computation. But next to the CPU, you have L1 cache and L2 cache, which are much smaller than DRAM, but much faster. So you can think of SRAM as the small, fast cache that stays close to the compute unit. Physically, it's closer. There is some kind of asymmetry here. So HBM is much larger, and SRAM is much smaller, but much faster. One way of thinking about it is, how can we design algorithms that take advantage of this asymmetric memory hierarchy? And of course, lots of folks have been thinking about this. These ideas are pretty old. I think back in the 1980s, the primary concerns were sorting. How can we sort numbers as efficiently as possible? And the motivating example was banks were trying to sort their transactions, and that needs to happen overnight so that the next day they can be ready. And so the same idea applies, which is that they have slow memory, which was hard disk, and they have fast memory, which was DRAM. And people had to design sorting algorithms that take advantage of this asymmetry. And it turns out, these same ideas can apply today, which is different kinds of memory. [00:13:00]Alessio: In your paper, you have the pyramid of memory. Just to give people an idea, when he says smaller, it's like HBM is like 40 gig, and then SRAM is like 20 megabytes. So it's not a little smaller, it's much smaller. But the throughput on card is like 1.5 terabytes a second for HBM and like 19 terabytes a second for SRAM, which is a lot larger. How do you think that evolves? So TSMC said they hit the scaling limits for SRAM, they just cannot grow that much more. HBM keeps growing, HBM3 is going to be 2x faster than HBM2, I think the latest NVIDIA thing has HBM3. How do you think about the future of FlashAttention? Do you think HBM is going to get fast enough when maybe it's not as useful to use the SRAM? [00:13:49]Tri: That's right. I think it comes down to physics. When you design hardware, literally SRAM stays very close to compute units. And so you don't have that much area to essentially put the transistors. And you can't shrink these things too much. So just physics, in terms of area, you don't have that much area for the SRAM. HBM is off-chip, so there is some kind of bus that essentially transfers data from HBM to the compute unit. So you have more area to essentially put these memory units. And so yeah, I think in the future SRAM probably won't get that much larger, because you don't have that much area. HBM will get larger and faster. And so I think it becomes more important to design algorithms that take advantage of this memory asymmetry. It's the same thing in CPU, where the cache is really small, the DRAM is growing larger and larger. DRAM could get to, I don't know, two terabytes, six terabytes, or something, whereas the cache stays at, I don't know, 15 megabytes or something like that. I think maybe the algorithm design becomes more and more important. There's still ways to take advantage of this, I think. So in the future, I think flash attention right now is being used. I don't know if in the next couple of years, some new architecture will come in and whatnot, but attention seems to be still important. For the next couple of years, I still expect some of these ideas to be useful. Not necessarily the exact code that's out there, but I think these ideas have kind of stood the test of time. New ideas like IO awareness from back in the 1980s, ideas like kernel fusions, tiling. These are classical ideas that have stood the test of time. So I think in the future, these ideas will become more and more important as we scale models to be larger, as we have more kinds of devices, where performance and efficiency become much, much more important. [00:15:40]Alessio: Yeah, and we had Jonathan Frankle on the podcast, and if you go to issattentionallyouneed.com, he has an outstanding bet, and he does believe that attention will be the state of the art architecture still in a few years. Did you think flash attention would be this popular? I'm always curious on the research side, you publish a paper, and obviously you know it's great work, but sometimes it just kind of falls flat in the industry. Could you see everybody just starting to use this, or was that a surprise to you? [00:16:11]Tri: Certainly, I didn't anticipate the level of popularity. Of course, we were extremely happy to have people using this stuff and giving us feedback and so on, and help us improve things. I think when we were writing the paper, I remember sending an email to one of my advisors, and like, hey, I'm excited about this paper, but I think the most important thing will be the artifact, which is the code. So I knew that the code will be valuable. So we kind of focus a lot on the code and make sure that the code is usable and as fast as can be. Of course, the idea, the paper presents the ideas and explain it and have experiments that validate the idea, but I knew that the artifact or the code was also pretty important. And that turned out to be the right focus, which is, you know, we put out the paper, we release the code and continue working on the code. So it's a team effort with my co-authors as well. [00:17:07]Alessio: We mentioned Hazy Research a bunch of times on the podcast before. I would love for you to spend five minutes just talking about how does the group work? How do people get together? How do you bounce ideas off of each other? Yeah. [00:17:21]Tri: So Hazy Research is a research group at Stanford led by one of my advisors, Chris Re. I love the people there. It was one of the best experiences I had. They've made my PhD so much more enjoyable. And I think there are a couple of ways that the group has been working pretty well. So one is, I think there's a diverse pool of people who either, you know, some of them focus on algorithms and theory, some of them focus on building systems, some of them focus on applications. And as a result, there is this flow of idea. So as an example, some of us were working on like more algorithms and theory, and then we can talk to the folks building systems and say, hey, let's try it out and let's put it in the systems and see how it is. And there you will get feedback from systems folks. They will say, hey, we implemented this, or we tried this and this is where it doesn't work, something like that. And once we put it in the systems, the application folks can use the algorithm or new methods or new models. And we again get great feedback from them because the application folks, for example, some of my good friends, they focus on medical imaging or seizure detection. And that is the problem they care about. And if your method doesn't work on the task they care about, they will tell you. Whereas I think a lot of people in machine learning, they're a little bit more flexible. So they will be like, hey, it doesn't work on seizure detection. Let's try some other task, right? But having that direct feedback of like, hey, it doesn't work there, let's figure out why. I think that that feedback allows us to do better work. And I think that kind of process of exchanging ideas, validating it in a real system so that applications folks can try it out and give you feedback. That cycle has been very, very useful. And so that's one, having a diverse group of people. The other one is, and this is something I really appreciate from advice from Chris was try to understand the fundamental, right? And he's happy letting me go off and read some textbooks and playing with things because I think a lot of research ideas come from understanding the old literature and see how it fits with the new landscape. And so if you just new archive papers every day, that's great, but you also need to read textbooks. And that's one advice I got from Chris, which is understand the fundamentals. And I think that allows us to do more impactful work. [00:19:46]Alessio: How do you think about academia versus industry? I feel like AI / Machine Learning has been an area where up until three, four years ago, most of the cutting edge work was being done in academia. And now there's all these big industry research labs. You're obviously going to Princeton, so you're an academia believer. How should people think about where to go? Say I'm doing my master's, I have to decide between doing a PhD and going into OpenAI Anthropic. How should I decide? [00:20:15]Tri: I think they kind of play a complementary role, in my opinion. Of course, I also was considering different paths as well. So I think right now, scaling matters a lot, especially when you talk about language models and AI and so on. Scaling matters a lot. And that means that you need compute resources and you need infrastructure and you need engineers time. And so industry tends to have an advantage when it comes to scaling things. But a lot of the ideas actually came from academia. So let's take Attention, which got popular with the Transformer in 2017. Attention actually has been around for a while. So I think the first mention was in 2014, a paper from Bernadot and others and Yoshua Bengio, which is coming from academia. A lot of ideas did come from academia. And scaling things up, of course, I think OpenAI has been great at scaling things up. That was the bet that they made after, I think, GPT-2. So they saw that scaling these things up to back then was 1.5 billion parameter seemed to give you amazing capabilities. So they really committed to that. They really committed to scaling things. And that turned out to be, it's been a pretty successful bet. I think for academia, we're still trying to figure out exactly what we're doing in this shifting landscape. And so lots of folks have been focusing on, for example, evaluation. So I know the Stanford Center for Foundation Model led by Percy, they have this benchmark called HELM, which is this holistic benchmark. So trying to figure out, okay, characterizing the landscape of different kinds of models, what people should evaluate, what people should measure, and things like that. So evaluation is one role. The other one is understanding. So this has happened historically where there's been some development in the industry and academia can play a role in explaining, understanding. They have the luxury to slow down trying to understand stuff, right? So lots of paper on understanding what's really going on, probing these models, and so on. I think I'm not as familiar with the NLP literature, but my impression is there's a lot of that going on in the NLP conferences, which is understanding what these models are doing, what capabilities they have, and so on. And the third one I could see is that the academia can take more risky bets in the sense that we can work on stuff that is quite different from industry. I think industry, my impression is you have some objective. You're trying to say, hey, for this quarter, we want to scale the model in this particular way. Next quarter, we want the model to have these capabilities. You're trying to get objectives that maybe, I don't know, 70% that will work out because it's important for the company's direction. I think for academia, the way things work is you have many, many researchers or PhD students, and they're kind of pursuing independent directions. And they have a little bit more flexibility on, hey, I'm going to try out this seemingly crazy idea and see, let's say there's a 30% chance of success or something. And however you define success, for academia, a lot of the time, success just means like, hey, we found something interesting. That could eventually go into industry through collaboration and so on. So I do see academia and industry kind of playing complementary roles. And as for someone choosing a career, I think just more and more generally, industry would be probably better in terms of compensation, in terms of probably work-life balance. But my biased perspective is that maybe academia gives you a little bit more freedom to think and understand things. So it probably comes down to personal choice. I end up choosing to be a professor next year at Princeton. But of course, I want to maintain a relationship with industry folks. I think industry folks can provide very valuable feedback to what we're doing in academia so that we understand where the field is moving because some of the directions are very much influenced by what, for example, OpenAI or Google is doing. So we want to understand where the field is moving. What are some promising applications? And try to anticipate, okay, if the field is moving like this, these applications are going to be popular. What problems will be important in two, three years? And then we try to start thinking about those problems so that hopefully in two, three years, we have some of the answers to some of these problems in two, three years. Sometimes it works out, sometimes it doesn't. But as long as we do interesting things in academia, that's the goal. [00:25:03]Alessio: And you mentioned the eval side. So we did a Benchmarks 101 episode. And one of the things we were seeing is sometimes the benchmarks really influence the model development. Because obviously, if you don't score well on the benchmarks, you're not going to get published and you're not going to get funded. How do you think about that? How do you think that's going to change now that a lot of the applications of these models, again, is in more narrow industry use cases? Do you think the goal of the academia eval system is to be very broad and then industry can do their own evals? Or what's the relationship there? [00:25:40]Tri: Yeah, so I think evaluation is important and often a little bit underrated. So it's not as flashy as, oh, we have a new model that can do such and such. But I think evaluation, what you don't measure, you can't make progress on, essentially. So I think industry folks, of course, they have specific use cases that their models need to do well on. And that's what they care about. Not just academia, but other groups as well. People do understand what are some of the emerging use cases. So for example, now one of the most popular use cases is Chatbot. And then I think folks from Berkeley, some of them are from Berkeley, call them MLCs. They set up this kind of Chatbot arena to essentially benchmark different models. So people do understand what are some of the emerging use cases. People do contribute to evaluation and measurement. And as a whole, I think people try to contribute to the field and move the field forward, albeit that maybe slightly different directions. But we're making progress and definitely evaluation and measurement is one of the ways you make progress. So I think going forward, there's still going to be just more models, more evaluation. We'll just have better understanding of what these models are doing and what capabilities they have. [00:26:56]Alessio: I like that your work has been focused on not making benchmarks better, but it's like, let's just make everything faster. So it's very horizontal. So FlashAttention 2, you just released that on Monday. I read in the blog post that a lot of the work was also related to some of the NVIDIA library updates. Yeah, maybe run us through some of those changes and some of the innovations there. Yeah, for sure. [00:27:19]Tri: So FlashAttention 2 is something I've been working on for the past couple of months. So the story is the NVIDIA CUTLASS team, they released a new version of their library, which contains all these primitives to allow you to do matrix multiply or memory loading on GPU efficiently. So it's a great library and I built on that. So they released their version 3 back in January and I got really excited and I wanted to play with that library. So as an excuse, I was just like, okay, I'm going to refactor my code and use this library. So that was kind of the start of the project. By the end, I just ended up working with the code a whole lot more and I realized that, hey, there are these inefficiencies still in Flash Attention. We could change this way or that way and make it, in the end, twice as fast. But of course, building on the library that the NVIDIA folks released. So that was kind of a really fun exercise. I was starting out, it's just an excuse for myself to play with the new library. What ended up was several months of improvement, improving Flash Attention, discovering new ideas. And in the end, we managed to make it 2x faster and now it's pretty close to probably the efficiency of things like matrix multiply, which is probably the most optimized subroutine on the planet. So we're really happy about it. The NVIDIA Cutlass team has been very supportive and hopefully in the future, we're going to collaborate more. [00:28:46]Alessio: And since it's an NVIDIA library, can you only run this on CUDA runtimes? Or could you use this and then run it on an AMD GPU? [00:28:56]Tri: Yeah, so it's an NVIDIA library. So right now, the code we release runs on NVIDIA GPUs, which is what most people are using to train models. Of course, there are emerging other hardware as well. So the AMD folks did implement a version of Flash Attention, I think last year as well, and that's also available. I think there's some implementation on CPU as well. For example, there's this library, ggml, where they implemented the same idea running on Mac and CPU. So I think that kind of broadly, the idea would apply. The current implementation ended up using NVIDIA's library or primitives, but I expect these ideas to be broadly applicable to different hardware. I think the main idea is you have asymmetry in memory hierarchy, which tends to be everywhere in a lot of accelerators. [00:29:46]Alessio: Yeah, it kind of reminds me of Sara Hooker's post, like the hardware lottery. There could be all these things that are much better, like architectures that are better, but they're not better on NVIDIA. So we're never going to know if they're actually improved. How does that play into some of the research that you all do too? [00:30:04]Tri: Yeah, so absolutely. Yeah, I think Sara Hooker, she wrote this piece on hardware lottery, and I think she captured really well of what a lot of people have been thinking about this. And I certainly think about hardware lottery quite a bit, given that I do some of the work that's kind of really low level at the level of, hey, we're optimizing for GPUs or NVIDIA GPUs and optimizing for attention itself. And at the same time, I also work on algorithms and methods and transformer alternatives. And we do see this effect in play, not just hardware lottery, but also kind of software framework lottery. You know, attention has been popular for six years now. And so many kind of engineer hours has been spent on making it as easy and efficient as possible to run transformer, right? And there's libraries to do all kinds of tensor parallel, pipeline parallel, if you use transformer. Let's say someone else developed alternatives, or let's just take recurrent neural nets, like LSTM, GRU. If we want to do that and run that efficiently on current hardware with current software framework, that's quite a bit harder. So in some sense, there is this feedback loop where somehow the model architectures that take advantage of hardware become popular. And the hardware will also kind of evolve to optimize a little bit for that kind of architecture and software framework will also evolve to optimize for that particular architecture. Right now, transformer is the dominant architecture. So yeah, I'm not sure if there is a good way out of this. Of course, there's a lot of development. Things like, I think compilers will play a role because compilers allow you to maybe still be much more efficient across different kinds of hardware because essentially you write the same code and compiler will be able to make it run efficiently different kinds of hardware. So for example, there's this language Mojo, they're compiler experts, right? And their bet is AI models will be running on different kinds of devices. So let's make sure that we have really good compilers with a good language that then the compiler can do a good job optimizing for all kinds of devices. So that's maybe one way that you can get out of this cycle. But yeah, I'm not sure of a good way. In my own research, I have to think about both the algorithm new model and how it maps to hardware. So there are crazy ideas that seem really good, but will be really, really difficult to run efficiently. And so as a result, for example, we can't really scale some of the architectures up simply because they're not hardware friendly. I have to think about both sides when I'm working on new models. [00:32:50]Alessio: Yeah. Have you spent any time looking at some of the new kind of like AI chips companies, so to speak, like the Cerebras of the world? Like one of their innovations is co-locating everything on the chip. So you remove some of this memory bandwidth issue. How do you think about that? [00:33:07]Tri: Yeah, I think that's an interesting bet. I think Tesla also has this Dojo supercomputer where they try to have essentially as fast on-chip memory as possible and removing some of these data transfer back and forth. I think that's a promising direction. The issues I could see, you know, I'm definitely not a hardware expert. One issue is the on-chip memory tends to be really expensive to manufacture, much more expensive per gigabyte compared to off-chip memory. So I talked to, you know, some of my friends at Cerebros and, you know, they have their own stack and compiler and so on, and they can make it work. The other kind of obstacle is, again, with compiler and software framework and so on. For example, if you can run PyTorch on this stuff, lots of people will be using it. But supporting all the operations in PyTorch will take a long time to implement. Of course, people are working on this. So I think, yeah, we kind of need these different bets on the hardware side as well. Hardware has, my understanding is, has a kind of a longer time scale. So you need to design hardware, you need to manufacture it, you know, maybe on the order of three to five years or something like that. So people are taking different bets, but the AI landscape is changing so fast that it's hard to predict, okay, what kind of models will be dominant in, let's say, three or five years. Or thinking back five years ago, would we have known that Transformer would have been the dominant architecture? Maybe, maybe not, right? And so different people will make different bets on the hardware side. [00:34:39]Alessio: Does the pace of the industry and the research also influence the PhD research itself? For example, in your case, you're working on improving attention. It probably took you quite a while to write the paper and everything, but in the meantime, you could have had a new model architecture come out and then it's like nobody cares about attention anymore. How do people balance that? [00:35:02]Tri: Yeah, so I think it's tough. It's definitely tough for PhD students, for researchers. Given that the field is moving really, really fast, I think it comes down to understanding fundamental. Because that's essentially, for example, what the PhD allows you to do. It's been a couple of years understanding the fundamentals. So for example, when I started my PhD, I was working on understanding matrix vector multiply, which has been a concept that's been around for hundreds of years. We were trying to characterize what kind of matrices would have theoretically fast multiplication algorithm. That seems to have nothing to do with AI or anything. But I think that was a time when I developed mathematical maturity and research taste and research skill. The research topic at that point didn't have to be super trendy or anything, as long as I'm developing skills as a researcher, I'm making progress. And eventually, I've gotten quite a bit better in terms of research skills. And that allows, for example, PhD students later in their career to quickly develop solutions to whatever problems they're facing. So I think that's just the natural arc of how you're being trained as a researcher. For a lot of PhD students, I think given the pace is so fast, maybe it's harder to justify spending a lot of time on the fundamental. And it's tough. What is this kind of explore, exploit kind of dilemma? And I don't think there's a universal answer. So I personally spend some time doing this kind of exploration, reading random textbooks or lecture notes. And I spend some time keeping up with the latest architecture or methods and so on. I don't know if there's a right balance. It varies from person to person. But if you only spend 100% on one, either you only do exploration or only do exploitation, I think it probably won't work in the long term. It's probably going to have to be a mix and you have to just experiment and kind of be introspective and say, hey, I tried this kind of mixture of, I don't know, one exploration paper and one exploitation paper. How did that work out for me? Should I, you know, having conversation with, for example, my advisor about like, hey, did that work out? You know, should I shift? I focus more on one or the other. I think quickly adjusting and focusing on the process. I think that's probably the right way. I don't have like a specific recommendation that, hey, you focus, I don't know, 60% on lecture notes and 40% on archive papers or anything like that. [00:37:35]Alessio: Let's talk about some Transformer alternatives. You know, say Jonathan Franco loses his bet and Transformer is not the state of the art architecture. What are some of the candidates to take over? [00:37:49]Tri: Yeah, so this bet is quite fun. So my understanding is this bet between Jonathan Franco and Sasha Rush, right? I've talked to Sasha a bunch and I think he recently gave an excellent tutorial on Transformer alternatives as well. So I would recommend that. So just to quickly recap, I think there's been quite a bit of development more recently about Transformer alternatives. So architectures that are not Transformer, right? And the question is, can they do well on, for example, language modeling, which is kind of the application that a lot of people care about these days. So there are methods based on state space methods that came out in 2021 from Albert Gu and Curran and Chris Re that presumably could do much better in terms of capturing long range information while not scaling quadratically. They scale sub-quadratically in terms of sequence length. So potentially you could have a much more efficient architecture when sequence length gets really long. The other ones have been focusing more on recurrent neural nets, which is, again, an old idea, but adapting to the new landscape. So things like RWKV, I've also personally worked in this space as well. So there's been some promising results. So there's been some results here and there that show that, hey, these alternatives, either RNN or state space methods, can match the performance of Transformer on language modeling. So that's really exciting. And we're starting to understand on the academic research side, we want to understand, do we really need attention? I think that's a valuable kind of intellectual thing to understand. And maybe we do, maybe we don't. If we want to know, we need to spend serious effort on trying the alternatives. And there's been folks pushing on this direction. I think RWKV scale up to, they have a model at 14 billion that seems pretty competitive with Transformer. So that's really exciting. That's kind of an intellectual thing. We want to figure out if attention is necessary. So that's one motivation. The other motivation is Transformer Alternative could have an advantage in practice in some of the use cases. So one use case is really long sequences. The other is really high throughput of generation. So for really long sequences, when you train with Transformer, with flash attention and so on, the computation is still quadratic in the sequence length. So if your sequence length is on the order of, I don't know, 16K, 32K, 100K or something, which some of these models have sequence length 100K, then you do get significantly slower in terms of training, also in terms of inference. So maybe these alternative architectures could scale better in terms of sequence length. I haven't seen actual validation on this. Let's say an RNN model release with context length, I don't know, 100K or something. I haven't really seen that. But the hope could be that as we scale to long sequences, these alternative architectures could be more well-suited. Not just text, but things like high resolution images, audio, video, and so on, which are emerging applications. So that's one, long sequences. Number two is a high throughput generation, where I can imagine scenarios where the application isn't like an interactive chatbot, but let's say a company wants to batch as many requests as possible on their server, or they're doing offline processing, they're generating stuff based on their internal documents, that you need to process in batch. And the issue with Transformer is that during generation, it essentially needs to keep around all the previous history. It's called the KV cache. And that could take a significant amount of memory, so you can't really batch too much because you run out of memory. I am personally bullish on RNNs. I think RNNs, they essentially summarize the past into a state vector that has fixed size, so the size doesn't grow with the history. So that means that you don't need as much memory to keep around all the previous tokens. And as a result, I think you can scale to much higher batch sizes. And as a result, you can make much more efficient use of the GPUs or the accelerator, and you could have much higher generation throughput. Now, this, I don't think, has been validated at scale. So as a researcher, I'm bullish on this stuff because I think in the next couple of years, these are use cases where these alternatives could have an advantage. We'll just kind of have to wait and see to see if these things will happen. I am personally bullish on this stuff. At the same time, I also spend a bunch of time making attention as fast as possible. So maybe hatching and playing both sides. Ultimately, we want to understand, as researchers, we want to understand what works, why do the models have these capabilities? And one way is, let's push attention to be as efficient as possible. On the other hand, let's push other alternatives to be as efficient at scale, as big as possible, and so that we can kind of compare them and understand. Yeah, awesome. [00:43:01]Alessio: And I think as long as all of this work happens and open, it's a net positive for everybody to explore all the paths. Yeah, let's talk about open-source AI. Obviously, together, when Red Pajama came out, which was an open clone of the LLAMA1 pre-training dataset, it was a big thing in the industry. LLAMA2 came out on Tuesday, I forget. And this week, there's been a lot of things going on, which they call open-source, but it's not really open-source. Actually, we wrote a post about it that was on the front page of Hacker News before this podcast, so I was frantically responding. How do you think about what open-source AI really is? In my mind, in open-source software, we have different levels of open. So there's free software, that's like the GPL license. There's open-source, which is Apache, MIT. And then there's kind of restricted open-source, which is the SSPL and some of these other licenses. In AI, you have the open models. So Red Pajama is an open model because you have the pre-training dataset, you have the training runs and everything. And then there's obviously RandomLens that doesn't make it one-to-one if you retrain it. Then you have the open-weights model that's kind of like StableLM, where the weights are open, but the dataset is not open. And then you have LLAMA2, which is the dataset is not open, the weights are restricted. It's kind of like not really open-source, but open enough. I think it's net positive because it's like $3 million of flops donated to the public. [00:44:32]Tri: How do you think about that? [00:44:34]Alessio: And also, as you work together, what is your philosophy with open-source AI? Right, right. [00:44:40]Tri: Yeah, I think that's a great question. And I think about it on maybe more practical terms. So of course, Meta has done an amazing job training LLAMA1, LLAMA2. And for LLAMA2, they make it much less restrictive compared to LLAMA1. Now you can use it for businesses, unless you are a monthly active user or something like that. I think just this change will have a very significant impact in the kind of landscape of open-source AI, where now lots of businesses, lots of companies will be using, I expect will be using things like LLAMA2. They will fine-tune on their own dataset. They will be serving variants or derivatives of LLAMA2. Whereas before, with LLAMA1, it was also a really good model, but your business companies weren't allowed to do that. So I think on a more practical term, it's kind of shifting the balance between a closed-source model like OpenAI and Anthropic and Google, where you're making API calls, right? And maybe you don't understand as much of what the model is doing, how the model is changing, and so on. Versus now, we have a model with open weight that is pretty competitive from what I've seen in terms of benchmarks, pretty competitive with GPT 3.5, right? And if you fine-tune it on your own data, maybe it's more well-suited for your own data. And I do see that's going to shift the balance of it. More and more folks are going to be using, let's say, derivatives of LLAMA2. More and more folks are going to fine-tune and serve their own model instead of calling an API. So that shifting of balance is important because in one way, we don't want just a concentration of decision-making power in the hands of a few companies. So I think that's a really positive development from Meta. Of course, training the model takes a couple of millions of dollars, but engineers have and I'm sure they spend tons of time trying many, many different things. So the actual cost is probably way more than that. And they make the weights available and they allow probably a lot of companies are going to be using this. So I think that's a really positive development. And we've also seen amazing progress on the open source community where they would take these models and they either fine-tune on different kinds of data sets or even make changes to the model. So as an example, I think for LLAMA1, the context lane was limited to 2K. Like a bunch of folks figured out some really simple methods to scale up to like 8K. [00:47:12]Alessio: Like the RoPE. [00:47:13]Tri: Yes. I think the open source community is very creative, right? And lots of people. LLAMA2 will, again, kind of accelerate this where more people will try it out. More people will make tweaks to it and make a contribution and then so on. So overall, I think I see that as still a very positive development for the field. And there's been lots of libraries that will allow you to host or fine-tune these models, like even with quantization and so on. Just a couple of hours after LLAMA2 was released, tons of companies announcing that, hey, it's on our API or hosting and so on and together did the same. So it's a very fast-paced development and just kind of a model with available weights that businesses are allowed to use. I think that alone is already a very positive development. At the same time, yeah, we can do much better in terms of releasing data sets. Data sets tend to be... Somehow people are not incentivized to release data sets. So philosophically, yeah, you want to be as open as possible. But on a practical term, I think it's a little bit harder for companies to release data sets. Legal issues. The data sets released tend to be not as eye-catchy as the model release. So maybe people are less incentivized to do that. We've seen quite a few companies releasing data sets together. Released a red pajama data set. I think Cerebus then worked on that and deduplicate and clean it up and release slim pajama and so on. So we're also seeing positive development on that front, kind of on the pre-training data set. So I do expect that to continue. And then on the fine-tuning data set or instruction tuning data set, I think we now have quite a few open data sets on instruction tuning and fine-tuning. But these companies do pay for human labelers to annotate these instruction tuning data set. And that is expensive. And maybe they will see that as their competitive advantage. And so it's harder to incentivize these companies to release these data sets. So I think on a practical term, we're still going to make a lot of progress on open source AI, on both the model development, on both model hosting, on pre-training data set and fine-tuning data set. Right now, maybe we don't have the perfect open source model since all the data sets are available. Maybe we don't have such a thing yet, but we've seen very fast development on the open source side. I think just maybe this time last year, there weren't as many models that are competitive with, let's say, ChatGPT. [00:49:43]Alessio: Yeah, I think the open data sets have so much more impact than open models. If you think about Elusive and the work that they've done, GPT-J was great, and the Pythia models are great, but the Pyle and the Stack, everybody uses them. So hopefully we get more people to contribute time to work on data sets instead of doing the 100th open model that performs worse than all the other ones, but they want to say they released the model. [00:50:14]Tri: Yeah, maybe the question is, how do we figure out an incentive structure so that companies are willing to release open data sets? And for example, it could be like, I think some of the organizations are now doing this where they are asking volunteers to annotate and so on. And maybe the Wikipedia model of data set, especially for instruction tuning, could be interesting where people actually volunteer their time and instead of editing Wikipedia, add annotation. And somehow they acknowledge and feel incentivized to do so. Hopefully we get to that kind of level of, in terms of data, it would be kind of like Wikipedia. And in terms of model development, it's kind of like Linux where people are contributing patches and improving the model in some way. I don't know exactly how that's going to happen, but based on history, I think there is a way to get there. [00:51:05]Alessio: Yeah, I think the Dolly-15K data set is a good example of a company saying, let's do this smaller thing, just make sure we make it open. We had Mike Conover from Databricks on the podcast, and he was like, people just bought into it and leadership was bought into it. You have companies out there with 200,000, 300,000 employees. It's like, just put some of them to label some data. It's going to be helpful. So I'm curious to see how that evolves. What made you decide to join Together? [00:51:35]Tri: For Together, the focus has been focusing a lot on open source model. And I think that aligns quite well with what I care about, of course. I also know a bunch of people there that I know and trust, and I'm excited to work with them. Philosophically, the way they've been really open with data set and model release, I like that a lot. Personally, for the stuff, for example, the research that I've developed, like we also try to make code available, free to use and modify and so on, contributing to the community. That has given us really valuable feedback from the community and improving our work. So philosophically, I like the way Together has been focusing on open source model. And the nice thing is we're also going to be at the forefront of research and the kind of research areas that I'm really excited about, things like efficient training and inference, aligns quite well with what the company is doing. We'll try our best to make things open and available to everyone. Yeah, but it's going to be fun being at the company, leading a team, doing research on the topic that I really care about, and hopefully we'll make things open to benefit the community. [00:52:45]Alessio: Awesome. Let's jump into the lightning round. Usually, I have two questions. So one is on acceleration, one on exploration, and then a takeaway. So the first one is, what's something that already happened in AI machine learning that you thought would take much longer than it has? [00:53:01]Tri: I think understanding jokes. I didn't expect that to happen, but it turns out scaling model up and training lots of data, the model can now understand jokes. Maybe it's a small thing, but that was amazing to me. [00:53:16]Alessio: What about the exploration side? What are some of the most interesting unsolved questions in the space? [00:53:22]Tri: I would say reasoning in the broad term. We don't really know how these models do. Essentially, they do something that looks like reasoning. We don't know how they're doing it. We have some ideas. And in the future, I think we will need to design architecture that explicitly has some kind of reasoning module in it if we want to have much more capable models. [00:53:43]Alessio: What's one message you want everyone to remember today? [00:53:47]Tri: I would say try to understand both the algorithm and the systems that these algorithms run on. I think at the intersection of machine learning system has been really exciting, and there's been a lot of amazing results at this intersection. And then when you scale models to large scale, both the machine learning side and the system side really matter. [00:54:06]Alessio: Awesome. Well, thank you so much for coming on 3. [00:54:09]Tri: This was great. Yeah, this has been really fun. [00:54:11] Get full access to Latent Space at www.latent.space/subscribe

The Six Five with Patrick Moorhead and Daniel Newman
The Six Five Connected with Diana Blass: Copyright, Deep Fakes, and Data Privacy in the Age of AI

The Six Five with Patrick Moorhead and Daniel Newman

Play Episode Listen Later Jul 25, 2023 21:43


In a follow-up to the first episode that investigated the race to AI innovation playing out on the world stage, this episode of The Six Five Connected with Diana Blass dives into the copyright violations, data privacy issues and potential deception associated with AI-generated content. On this episode, you will: Learn about the implications of the latest investigations and lawsuits with Kirk Sigmon. Meet SWIPEBY, a company that has built a text-to-image solution using OpenAI and Stability AI to help small businesses compete with their larger peers. Gain insights on the latest AI news, specifically Meta & Microsoft's announcement of LlAMA2, with analysis from The Six Five Five Co-Hosts Patrick Moorhead & Daniel Newman.

Hacker News Recap
July 23rd, 2023 | A Romanian economist legally gamed the lottery

Hacker News Recap

Play Episode Listen Later Jul 24, 2023 19:41


This is a recap of the top 10 posts on Hacker News on July 23rd, 2023.This podcast was generated by wondercraft.ai(00:42): Man found guilty of child porn because he ran a Tor exit nodeOriginal post: https://news.ycombinator.com/item?id=36837273&utm_source=wondercraft_ai(02:32): All foster kids in California can now attend any state college for freeOriginal post: https://news.ycombinator.com/item?id=36839983&utm_source=wondercraft_ai(04:24): Llama2.c: Inference llama 2 in one file of pure COriginal post: https://news.ycombinator.com/item?id=36838051&utm_source=wondercraft_ai(06:24): Toyota has been developing a solid-state battery for EVs with a range of 745miOriginal post: https://news.ycombinator.com/item?id=36833836&utm_source=wondercraft_ai(08:12): I found an IT job thanks to this blogOriginal post: https://news.ycombinator.com/item?id=36833089&utm_source=wondercraft_ai(10:00): A Romanian economist legally gamed the lotteryOriginal post: https://news.ycombinator.com/item?id=36834415&utm_source=wondercraft_ai(12:20): Why Host in Kosovo?Original post: https://news.ycombinator.com/item?id=36837690&utm_source=wondercraft_ai(14:18): Deutsche Bank's “dysfunctional” IT division (2018)Original post: https://news.ycombinator.com/item?id=36833915&utm_source=wondercraft_ai(16:13): Icon Buddy – 100K+ Open Source SVG Icons, Fully CustomizableOriginal post: https://news.ycombinator.com/item?id=36837442&utm_source=wondercraft_ai(17:49): The night an audience member filled in for Keith Moon (2016)Original post: https://news.ycombinator.com/item?id=36835735&utm_source=wondercraft_aiThis is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

TechReview - The Podcast
71: Don't fear the Llama (2)

TechReview - The Podcast

Play Episode Listen Later Jul 22, 2023 31:21


TikTok aims to identify rising artists, Instagram Threads is still far from Europe and Llama 2 attacks Microsoft and OpenAI! Let's talk about that in TechReview 71!00:00 - Intro00:53 - 1: TikTok's new program aims to identify rising artists in the music industry 08:19 - 2: Instagram Chief Says EU Access to Threads Could Take Months, User Retention ‘Better Than Expected' 17:00 - 3: Meta launches Llama 2, a source-available AI model that allows commercial applications 1. https://techcrunch.com/2023/07/18/tiktoks-new-program-aims-to-identify-rising-artists-in-the-music-industry/2. https://www.socialmediatoday.com/news/instagram-chief-says-eu-access-threads-could-take-months-user-retention/688184/3. https://arstechnica.com/information-technology/2023/07/meta-launches-llama-2-an-open-source-ai-model-that-allows-commercial-applications/Our panel todayHenrike https://www.linkedin.com/in/henrike-hedel/Vincent https://www.linkedin.com/in/vincent-t-irmler/Tarek https://www.linkedin.com/in/tarekmadanymamlouk/Every week our panel of technology enthusiasts meets to discuss the most important news from the fields of technology, innovation, and science. And you can join us live!https://techreview.axelspringer.com/https://www.ideas-engineering.io/https://www.freetech.academy/https://www.upday.com/

This Day in AI Podcast
Llama 2 Unleashed, Open-Source Vs OpenAI, Instructions for ChatGPT & Fine Tuning with Llama2 | EP24

This Day in AI Podcast

Play Episode Listen Later Jul 21, 2023 76:09


In Episode 24 we discuss the release of Llama 2, what it means for the open-source community, developers and the future of AI LLMs with it being commercially available. We discuss what threat open-source models now pose to the business models of OpenAI and others, and also take a look at Instructions for ChatGPT. We have a long discussion on fine tuning smaller models and cover news including Google's Generative AI Enterprise Search product, Bing Enterprise Chat, AppleGPT rumors and finally end with the state of AI startups, fundraising, VC and what makes good AI investment.If you like this podcast, please consider leaving us a review and sharing with friends/family.CHAPTERS:00:00 - Lizard Man is now a Hero (cold open)00:49 - Meta's Llama 2 release and what it means21:18 - Llama 2 alignment, Is fine tuning just censorship?27:02 - What is Meta's strategy with Llama 2? Will it hurt OpenAI?31:56 - OpenAI fears GPT-4 Vision will offend someone40:40 - ChatGPT Plus Users get GPT-4 Rate Limits Doubled41:56 - ChatGPT Custom Instructions 46:15 - Does Llama Threaten OpenAI with Smaller Refined Fine Tuned Models?54:39 - Google's Generative AI Enterprise Search (Gen App Builder) Announcement56:16 - Microsoft's Bing Chat Enterprise1:02:19 - AppleGPT “AJAX” rumor1:05:49 - The State of AI Startups, VC and fundraising1:12:32 - AI LOLs & Memes SOURCES: https://huggingface.co/chat/ https://about.fb.com/news/2023/07/llama-2/ https://openai.com/blog/custom-instructions-for-chatgpt https://www.nytimes.com/2023/07/18/technology/openai-chatgpt-facial-recognition.html https://cloud.google.com/blog/products/ai-machine-learning/enterprise-search-on-gen-app-builder https://blogs.microsoft.com/blog/2023/07/18/furthering-our-ai-ambitions-announcing-bing-chat-enterprise-and-microsoft-365-copilot-pricing/ https://www.zdnet.com/article/apple-sneaks-into-the-ai-chatbot-race-with-apple-gpt/ https://twitter.com/0xsamhogan/status/1680725207898816512?s=46&t=uXHUN4Glah4CaV-g2czc6Q https://www.zerohedge.com/technology/ai-detectors-think-us-constitution-was-written-ai https://www.reddit.com/r/ChatGPT/comments/154fck9/girl_gave_me_her_number_and_it_ended_up_being_gpt https://twitter.com/patrickc/status/1681699442817368064?s=46&t=uXHUN4Glah4CaV-g2czc6Q

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Llama 2: The New Open LLM SOTA (ft. Nathan Lambert, Matt Bornstein, Anton Troynikov, Russell Kaplan, Whole Mars Catalog et al.)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jul 19, 2023 79:53


As first discussed on our May Emergency pod and leaked 4 days ago, Llama (renamed from LLaMA) was upgraded to Llama 2 (pretraining on 2 trillion tokens with 2x the context length - bigger than any dataset discussed in Datasets 101, and adding ~$20m of RLHF/preference annotation) and released for commercial use on 18 July.It immediately displaced Falcon-40B as the leading open LLM and was immediately converted/quantized to GGML and other formats. Llama 2 seems to outperform all other open source models in their equivalent weight class:Why are open models important? The intersection of Open Source and AI is one of the oldest themes on this publication, and there has been a raging debate on the security and reliability of the OpenAI models and APIs. Users have reported GPT-4's quality going down, which has been denied and denied and as of today, given some supporting data from Databricks, and complained about the API reliability and rapid deprecation schedules. Last and surely the biggest, there are entire classes of businesses and government/healthcare/military organizations that categorically cannot send any of their sensitive data to an external API provider, even if it is OpenAI through Azure. The only way to have total control is to own and serve your own models, which Llama 2 now pushes forward in terms of the state of the art (your own GPT3.5-quality model, though it is nowhere near Claude 2 or GPT-4).As we do with breaking news, we got on to Twitter Spaces again to chat with two scheduled guests:* , ML Researcher at Huggingface and author of Interconnects who had the best summary of the Llama2 paper* Matt Bornstein, organizer of the a16z infra team that launched Llama2.ai (source here) and has been coding up a storm with AI demo apps, unusual for VCsas well as Anton Troynikov of Chroma, Russell Kaplan of Scale AI, and Omar Qazi of the Whole Mars Catalog.Enjoy!Show Notes* Official links* Website, Paper* GitHub (Llama 2 commit)* Azure Partnership* Use policy, Statement of Support for Open Approach* Where to try* Llama2.ai (source), Perplexity Llama Chat* Live playground/API on Replicate, deploy all versions on Baseten* https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI * Dev ports - simonw llm-replicate, ggml using llama.cpp (7B, 13B) or pinokio,  ollama, Core ML port* Timeline* 24 Feb - LLaMA 1 announced* 6 May - our No Moats podcast - first mention of Zuck opening up Llama* 14 July - Llama 2 leaked* 18 July - Llama 2 announced* Community notes* Nathan's research paper recap* 638 LOC, 4 dependencies* Usage restrictions - MAU restriction, derivative models* Grouped Query Attention* System prompt* 2 trillion token dataset* >$20m price tag (rlhf, jimfan), * Separate models for safety and helpfulness (jimfan)* Mistral AI founders left out of paper* Interesting fails: Timestamps* [00:02:30] Introducing the speakers* [00:03:32] Nathan Lambert intro* [00:04:48] General Summary of Llama 2* [00:05:57] Sarah Silverman killed Dataset Transparency?* [00:08:48] Simon's Recap of Llama 2* [00:11:43] Matt's Intro* [00:12:59] a16z Infra's new AI team?* [00:15:10] Alessio's recap of Llama 2* [00:17:26] Datasets 101 Followup* [00:18:14] Context Length 4k* [00:20:35] Open-ish Source? Usage Policy and Restrictions* [00:23:38] Huggingface Responsible AI License* [00:24:57] Pretraining Llama 2 Base Model beyond Chinchilla* [00:29:55] Llama 2 is incomplete? Race to publish* [00:31:40] Come for the Llama, stay for the (Meta) drama* [00:33:22] Language Translation* [00:35:10] Llama2's coding abilities* [00:35:59] Why we want to know about the training data* [00:37:45] The importance of Meta pushing forward Truly Open AI* [00:40:59] Llama 2 as Enabler of Startups* [00:43:59] Where you can try Llama 2* [00:44:25] Do you need dataset transparency if you have evals?* [00:45:56] >$20m cost of Llama 2 is primarily preference data collection* [00:48:59] Do we even need human annotators?* [00:49:42] Models Rating Models* [00:53:32] How to get Code preference data* [00:54:34] Llama 2 Finetuning Ecosystem* [00:56:32] Hey Apple: Llama2 on Metal pls* [00:57:17] Llama 2 and Chroma* [01:00:15] Open Source MoE model?* [01:00:51] Llama 2 using tools* [01:01:40] Russell Kaplan on Scale AI's Llama 2 plans* [01:03:31] Scale annotating code?* [01:04:36] Immortality* [01:04:59] Running Llama on your phone* [01:06:54] Sama

FLASH DIARIO de El Siglo 21 es Hoy
Meta Lanza IA Abierta : LLaMA2

FLASH DIARIO de El Siglo 21 es Hoy

Play Episode Listen Later Jul 19, 2023 3:50


Meta ha presentado su nueva inteligencia artificial de código abierto, LLaMA 2, buscando consolidarse en el panorama de las IA. Este nuevo modelo, anunciado durante el evento Inspire de Microsoft, pretende competir directamente con GPT-4, respaldado por OpenAI, y PaLM 2, de Google. Meta asegura que LLaMA 2 se ha entrenado con un 40% más de datos que su predecesora y cuenta con una extensión de contexto significativamente mayor. Esta iniciativa se enmarca en la estrategia de la compañía de Mark Zuckerberg de fomentar la transparencia y la accesibilidad en el campo de la inteligencia artificial.Sin embargo, este anuncio oculta una intención estratégica.En un contexto de creciente competencia en el terreno de la inteligencia artificial, Meta decide apostar por un modelo de IA de código abierto. LLaMA 2 es una inteligencia artificial de código abierto desarrollada por Meta. Esto significa que cualquier persona puede usarla, ya sea para fines comerciales o de investigación. Su objetivo es proporcionar a los desarrolladores e investigadores una alternativa gratuita a otras tecnologías de inteligencia artificial propietarias, como GPT-4 de OpenAI o PaLM 2 de Google.Pero, bajo la promesa de una inteligencia artificial de código abierto y su aparente filantropía, Meta esconde una problemática. La compañía está tratando de adentrarse en el competitivo campo de la IA y ganarse la confianza de una comunidad de desarrolladores e investigadores que a menudo carecen de los recursos necesarios para acceder a tecnologías propietarias. Al ofrecer LLaMA 2 de manera gratuita, Meta se enfrenta a gigantes como OpenAI y Google, buscando su lugar en un mercado saturado. Esto incluye a desarrolladores individuales, profesionales de distintos sectores, organizaciones y empresas que deseen construir herramientas y experiencias impulsadas por inteligencia artificial. La idea es democratizar el acceso a estas tecnologías avanzadas para fomentar la innovación y el desarrollo en el campo de la inteligencia artificial. Por tanto, la accesibilidad de LLaMA 2 está pensada para proporcionar una gran amplitud de posibilidades en el desarrollo de nuevos modelos de IA.A pesar de las dificultades, el lanzamiento de LLaMA 2 es recibido con optimismo. Su disponibilidad en Microsoft Azure, Windows y otras plataformas como Amazon Web Services y Hugging Face, brinda a Meta una amplia exposición. En palabras de Mark Zuckerberg, con una IA de código abierto, se fomenta la innovación y se mejora la seguridad. Con el apoyo de diversas compañías como NVIDIA, Spotify, IBM, Intel, entre otras, Meta empieza a trazar su propio camino en el mundo de la inteligencia artificial.