Podcasts about LSTMs

  • 25 PODCASTS
  • 39 EPISODES
  • 49m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • Apr 17, 2025 LATEST

POPULARITY (trend chart, 2017-2024)


Best podcasts about LSTMs

Latest podcast episodes about LSTMs

Eye On A.I.
#248 Pedro Domingos: How Connectionism Is Reshaping the Future of Machine Learning

Apr 17, 2025 · 59:56


This episode is sponsored by Indeed. Stop struggling to get your job post seen on other job sites. Indeed's Sponsored Jobs help you stand out and hire fast. With Sponsored Jobs your post jumps to the top of the page for your relevant candidates, so you can reach the people you want faster. Get a $75 Sponsored Job Credit to boost your job's visibility! Claim your offer now: https://www.indeed.com/EYEONAI

In this episode, renowned AI researcher Pedro Domingos, author of The Master Algorithm, takes us deep into the world of Connectionism—the AI tribe behind neural networks and the deep learning revolution. From the birth of neural networks in the 1940s to the explosive rise of transformers and ChatGPT, Pedro unpacks the history, breakthroughs, and limitations of connectionist AI. Along the way, he explores how supervised learning continues to quietly power today's most impressive AI systems—and why reinforcement learning and unsupervised learning are still lagging behind.

We also dive into:
- The tribal war between Connectionists and Symbolists
- The surprising origins of Backpropagation
- How transformers redefined machine translation
- Why GANs and generative models exploded (and then faded)
- The myth of modern reinforcement learning (DeepSeek, RLHF, etc.)
- The danger of AI research narrowing too soon around one dominant approach

Whether you're an AI enthusiast, a machine learning practitioner, or just curious about where intelligence is headed, this episode offers a rare deep dive into the ideological foundations of AI—and what's coming next. Don't forget to subscribe for more episodes on AI, data, and the future of tech.

Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI

(00:00) What Are Generative Models?
(03:02) AI Progress and the Local Optimum Trap
(06:30) The Five Tribes of AI and Why They Matter
(09:07) The Rise of Connectionism
(11:14) Rosenblatt's Perceptron and the First AI Hype Cycle
(13:35) Backpropagation: The Algorithm That Changed Everything
(19:39) How Backpropagation Actually Works
(21:22) AlexNet and the Deep Learning Boom
(23:22) Why the Vision Community Resisted Neural Nets
(25:39) The Expansion of Deep Learning
(28:48) NetTalk and the Baby Steps of Neural Speech
(31:24) How Transformers (and Attention) Transformed AI
(34:36) Why Attention Solved the Bottleneck in Translation
(35:24) The Untold Story of Transformer Invention
(38:35) LSTMs vs. Attention: Solving the Vanishing Gradient Problem
(42:29) GANs: The Evolutionary Arms Race in AI
(48:53) Reinforcement Learning Explained
(52:46) Why RL Is Mostly Just Supervised Learning in Disguise
(54:35) Where AI Research Should Go Next
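The vanishing gradient problem that comes up in the episode's LSTM chapter can be seen in a one-line toy model (my illustration, not from the episode): backpropagating through T steps of a simple recurrence multiplies the gradient by the recurrent weight at every step, so the gradient either shrinks or blows up geometrically.

```python
# Toy model of gradient flow through T steps of the 1-D linear
# recurrence h_t = w * h_{t-1}. The gradient d h_T / d h_0 is w**T,
# so any |w| != 1 decays or explodes exponentially in T.
def gradient_magnitude(w, T):
    """Magnitude of d h_T / d h_0 for h_t = w * h_{t-1}."""
    return abs(w) ** T

vanishing = gradient_magnitude(0.9, 100)  # ~2.7e-5: signal from 100 steps back is gone
exploding = gradient_magnitude(1.1, 100)  # ~1.4e4: gradient blows up instead
preserved = gradient_magnitude(1.0, 100)  # exactly 1.0: the regime LSTM cells aim for
```

The LSTM's additive memory cell is designed to keep the effective recurrent factor near 1 on the cell pathway, which is the `preserved` case above.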

Eye On A.I.
#232 Sepp Hochreiter: How LSTMs Power Modern AI Systems

Jan 22, 2025 · 51:08


In this special episode of the Eye on AI podcast, Sepp Hochreiter, the inventor of Long Short-Term Memory (LSTM) networks, joins Craig Smith to discuss the profound impact of LSTMs on artificial intelligence, from language models to real-time robotics. Sepp reflects on the early days of LSTM development, sharing insights into his collaboration with Jürgen Schmidhuber and the challenges they faced in gaining recognition for their groundbreaking work. He explains how LSTMs became the foundation for technologies used by giants like Amazon, Apple, and Google, and how they paved the way for modern advancements like transformers.

Topics include:
- The origin story of LSTMs and their unique architecture.
- Why LSTMs were crucial for sequence data like speech and text.
- The rise of transformers and how they compare to LSTMs.
- Real-time robotics: using LSTMs to build energy-efficient, autonomous systems.
- The next big challenges for AI and robotics in the era of generative AI.

Sepp also shares his optimistic vision for the future of AI, emphasizing the importance of efficient, scalable models and their potential to revolutionize industries from healthcare to autonomous vehicles. Don't miss this deep dive into the history and future of AI, featuring one of its most influential pioneers.

(00:00) Introduction: Meet Sepp Hochreiter
(01:10) The Origins of LSTMs
(02:26) Understanding the Vanishing Gradient Problem
(05:12) Memory Cells and LSTM Architecture
(06:35) Early Applications of LSTMs in Technology
(09:38) How Transformers Differ from LSTMs
(13:38) Exploring XLSTM for Industrial Applications
(15:17) AI for Robotics and Real-Time Systems
(18:55) Expanding LSTM Memory with Hopfield Networks
(21:18) The Road to XLSTM Development
(23:17) Industrial Use Cases of XLSTM
(27:49) AI in Simulation: A New Frontier
(32:26) The Future of LSTMs and Scalability
(35:48) Inference Efficiency and Potential Applications
(39:53) Continuous Learning and Adaptability in AI
(42:59) Training Robots with XLSTM Technology
(44:47) NXAI: Advancing AI in Industry
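For readers unfamiliar with the memory cells and gates discussed in this episode, here is a minimal sketch of one LSTM step in the textbook formulation (NumPy, with illustrative sizes and variable names; this is not Hochreiter's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: gates decide what the memory cell forgets, stores,
    and exposes. The additive update of c is what lets gradients flow
    over long sequences instead of vanishing."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])   # combined input to all gates
    f = sigmoid(Wf @ z + bf)          # forget gate
    i = sigmoid(Wi @ z + bi)          # input gate
    o = sigmoid(Wo @ z + bo)          # output gate
    c_tilde = np.tanh(Wc @ z + bc)    # candidate memory
    c = f * c_prev + i * c_tilde      # additive cell-state update
    h = o * np.tanh(c)                # hidden state
    return h, c

# Tiny usage example with random parameters
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = [rng.normal(size=(n_hid, n_hid + n_in)) * 0.5 for _ in range(4)] + \
         [np.zeros(n_hid) for _ in range(4)]
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):
    h, c = lstm_step(rng.normal(size=n_in), h, c, params)
```

The key design choice is the line `c = f * c_prev + i * c_tilde`: because the cell state is updated additively rather than repeatedly squashed, error signals can be carried across many time steps.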

Unsupervised Learning
Ep 47: Chief AI Officer of Databricks Jonathan Frankle on Why New Model Architectures are Unlikely, When to Pre-Train or Fine Tune, and Hopes for Future AI Policy

Nov 12, 2024 · 64:25


Jonathan Frankle is the Chief AI Scientist at Databricks ($43B), which he joined through the acquisition of MosaicML in July 2023. Databricks has over 12,000 customers on the cutting edge of AI; Jonathan works to anticipate their needs and offer solutions even as the tech is rapidly evolving.

[0:00] Intro
[0:52] Incentives and Team Motivation at Databricks
[2:40] The Evolution of AI Models: Transformers vs. LSTMs
[5:27] Mosaic and Databricks: A Strategic Merger
[7:31] Guidance on AI Model Training and Fine-Tuning
[11:11] Building Effective AI Evaluations
[16:02] Domain-Specific AI Models and Their Importance
[19:37] The Future of AI: Challenges and Opportunities
[25:07] Ethical Considerations and Human-AI Interaction
[29:13] Customer Collaboration and AI Implementation
[30:45] Navigating AI Tools and Techniques
[35:41] The Role of Open Source Models
[36:46] AI Infrastructure and Partnerships
[48:27] Academia's Role in AI Research
[52:09] Ethics and Policy in AI
[57:47] Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq'd by VMWare)
@jordan_segall - Partner at Redpoint

Machine Learning Street Talk
Jürgen Schmidhuber - Neural and Non-Neural AI, Reasoning, Transformers, and LSTMs

Aug 28, 2024 · 99:39


Jürgen Schmidhuber, the father of generative AI, shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.

YT version: https://youtu.be/DP454c1K_vQ

MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

TOC
00:00:00 Intro
00:03:38 Reasoning
00:13:09 Potential AI Breakthroughs Reducing Computation Needs
00:20:39 Memorization vs. Generalization in AI
00:25:19 Approach to the ARC Challenge
00:29:10 Perceptions of Chat GPT and AGI
00:58:45 Abstract Principles of Jürgen's Approach
01:04:17 Analogical Reasoning and Compression
01:05:48 Breakthroughs in 1991: the P, the G, and the T in ChatGPT and Generative AI
01:15:50 Use of LSTM in Language Models by Tech Giants
01:21:08 Neural Network Aspect Ratio Theory
01:26:53 Reinforcement Learning Without Explicit Teachers

Refs:
★ "Annotated History of Modern AI and Deep Learning" (2022 survey by Schmidhuber)
★ Chain Rule For Backward Credit Assignment (Leibniz, 1676)
★ First Neural Net / Linear Regression / Shallow Learning (Gauss & Legendre, circa 1800)
★ First 20th Century Pioneer of Practical AI (Quevedo, 1914)
★ First Recurrent NN (RNN) Architecture (Lenz, Ising, 1920-1925)
★ AI Theory: Fundamental Limitations of Computation and Computation-Based AI (Gödel, 1931-34)
★ Unpublished ideas about evolving RNNs (Turing, 1948)
★ Multilayer Feedforward NN Without Deep Learning (Rosenblatt, 1958)
★ First Published Learning RNNs (Amari and others, ~1972)
★ First Deep Learning (Ivakhnenko & Lapa, 1965)
★ Deep Learning by Stochastic Gradient Descent (Amari, 1967-68)
★ ReLUs (Fukushima, 1969)
★ Backpropagation (Linnainmaa, 1970); precursor (Kelley, 1960)
★ Backpropagation for NNs (Werbos, 1982)
★ First Deep Convolutional NN (Fukushima, 1979); later combined with Backprop (Waibel 1987, Zhang 1988)
★ Metalearning or Learning to Learn (Schmidhuber, 1987)
★ Generative Adversarial Networks / Artificial Curiosity / NN Online Planners (Schmidhuber, Feb 1990; see the G in Generative AI and ChatGPT)
★ NNs Learn to Generate Subgoals and Work on Command (Schmidhuber, April 1990)
★ NNs Learn to Program NNs: Unnormalized Linear Transformer (Schmidhuber, March 1991; see the T in ChatGPT)
★ Deep Learning by Self-Supervised Pre-Training. Distilling NNs (Schmidhuber, April 1991; see the P in ChatGPT)
★ Experiments with Pre-Training; Analysis of Vanishing/Exploding Gradients, Roots of Long Short-Term Memory / Highway Nets / ResNets (Hochreiter, June 1991, further developed 1999-2015 with other students of Schmidhuber)
★ LSTM journal paper (1997, most cited AI paper of the 20th century)
★ xLSTM (Hochreiter, 2024)
★ Reinforcement Learning Prompt Engineer for Abstract Reasoning and Planning (Schmidhuber 2015)
★ Mindstorms in Natural Language-Based Societies of Mind (2023 paper by Schmidhuber's team) https://arxiv.org/abs/2305.17066
★ Bremermann's physical limit of computation (1982)

EXTERNAL LINKS
CogX 2018 - Professor Juergen Schmidhuber https://www.youtube.com/watch?v=17shdT9-wuA
Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability (Neural Networks, 1997) https://sferics.idsia.ch/pub/juergen/loconet.pdf
The paradox at the heart of mathematics: Gödel's Incompleteness Theorem - Marcus du Sautoy https://www.youtube.com/watch?v=I4pQbo5MQOs

(Refs truncated, full version on YT VD)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
AI Magic: Shipping 1000s of successful products with no managers and a team of 12 — Jeremy Howard of Answer.ai

Aug 16, 2024 · 58:56


Disclaimer: We recorded this episode ~1.5 months ago, timed for the FastHTML release. It then got bottlenecked by Llama3.1, Winds of AI Winter, and SAM2 episodes, so we're a little late. Since then FastHTML was released, swyx is building an app in it for AINews, and Anthropic has also released their prompt caching API.

Remember when Dylan Patel of SemiAnalysis coined the GPU Rich vs GPU Poor war? (if not, see our pod with him). The idea was that if you're GPU poor you shouldn't waste your time trying to solve GPU rich problems (i.e. pre-training large models) and are better off working on fine-tuning, optimized inference, etc. Jeremy Howard (see our "End of Finetuning" episode to catch up on his background) and Eric Ries founded Answer.AI to do exactly that: "Practical AI R&D", which is very in line with the GPU poor needs. For example, one of their first releases was a system based on FSDP + QLoRA that let anyone train a 70B model on two NVIDIA 4090s.

Since then, they have come out with a long list of super useful projects (in no particular order, and non-exhaustive):
* FSDP QDoRA: this is just as memory efficient and scalable as FSDP/QLoRA, and critically is also as accurate for continued pre-training as full weight training.
* Cold Compress: a KV cache compression toolkit that lets you scale sequence length without impacting speed.
* colbert-small: state-of-the-art retriever at only 33M params
* JaColBERTv2.5: a new state-of-the-art retriever on all Japanese benchmarks.
* gpu.cpp: portable GPU compute for C++ with WebGPU.
* Claudette: a better Anthropic API SDK.

They also recently released FastHTML, a new way to create modern interactive web apps.
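The FSDP + QLoRA result mentioned above rests on low-rank adapters. As a schematic NumPy sketch of the underlying LoRA idea (dimensions, init, and scaling chosen for illustration; this is not Answer.AI's implementation): instead of updating a frozen weight matrix, you train a small low-rank correction.

```python
import numpy as np

# LoRA in miniature: keep the pretrained weight W (d_out x d_in) frozen
# and train only a rank-r correction B @ A, cutting trainable parameters
# from d_out*d_in down to r*(d_out + d_in).
d_out, d_in, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init: adapter starts as a no-op

alpha = 16.0                           # scaling hyperparameter

def forward(x):
    """Adapted layer: base output plus scaled low-rank correction."""
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = d_out * d_in        # 1,048,576 if you fine-tuned W directly
lora_params = r * (d_out + d_in)  # 16,384 trainable adapter parameters (~64x fewer)
```

Because `B` starts at zero, `forward` initially reproduces the frozen layer exactly; training then moves only `A` and `B`, which is what makes fitting a 70B model on consumer GPUs plausible once the frozen weights are also quantized (the "Q" in QLoRA).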
Jeremy recently released a 1 hour "Getting started" tutorial on YouTube; while this isn't AI related per se, it's close to home for any AI Engineer who is looking to iterate quickly on new products. In this episode we broke down 1) how they recruit 2) how they organize what to research 3) and how the community comes together. At the end, Jeremy gave us a sneak peek at something new that he's working on that he calls dialogue engineering: "So I've created a new approach. It's not called prompt engineering. I'm creating a system for doing dialogue engineering. It's currently called AI magic. I'm doing most of my work in this system and it's making me much more productive than I was before I used it." He explains it a bit more ~44:53 in the pod, but we'll just have to wait for the public release to figure out exactly what he means.

Timestamps
* [00:00:00] Intro by Suno AI
* [00:03:02] Continuous Pre-Training is Here
* [00:06:07] Schedule-Free Optimizers and Learning Rate Schedules
* [00:07:08] Governance and Structural Issues within OpenAI and Other AI Labs
* [00:13:01] How Answer.ai works
* [00:23:40] How to Recruit Productive Researchers
* [00:27:45] Building a new BERT
* [00:31:57] FSDP, QLoRA, and QDoRA: Innovations in Fine-Tuning Large Models
* [00:36:36] Research and Development on Model Inference Optimization
* [00:39:49] FastHTML for Web Application Development
* [00:46:53] AI Magic & Dialogue Engineering
* [00:52:19] AI wishlist & predictions

Show Notes
* Jeremy Howard
* Previously on Latent Space: The End of Finetuning, NeurIPS Startups
* Answer.ai
* Fast.ai
* FastHTML
* answerai-colbert-small-v1
* gpu.cpp
* Eric Ries
* Aaron DeFazio
* Yi Tay
* Less Wright
* Benjamin Warner
* Benjamin Clavié
* Jono Whitaker
* Austin Huang
* Eric Gilliam
* Tim Dettmers
* Colin Raffel
* Sebastian Raschka
* Carson Gross
* Simon Willison
* Sepp Hochreiter
* Llama3.1 episode
* Snowflake Arctic
* Ranger Optimizer
* Gemma.cpp
* HTMX
* UL2
* BERT
* DeBERTa
* Efficient finetuning of Llama 3 with FSDP QDoRA
* xLSTM

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.

Swyx [00:00:14]: And today we're back with Jeremy Howard, I think your third appearance on Latent Space. Welcome.

Jeremy [00:00:19]: Wait, third? Second?

Swyx [00:00:21]: Well, I grabbed you at NeurIPS.

Jeremy [00:00:23]: I see.

Swyx [00:00:24]: Very fun, standing outside street episode.

Jeremy [00:00:27]: I never heard that, by the way. You've got to send me a link. I've got to hear what it sounded like.

Swyx [00:00:30]: Yeah. Yeah, it's a NeurIPS podcast.

Alessio [00:00:32]: I think the two episodes are six hours, so there's plenty to listen, we'll make sure to send it over.

Swyx [00:00:37]: Yeah, we're trying this thing where at the major ML conferences, we, you know, do a little audio tour of, give people a sense of what it's like. But the last time you were on, you declared the end of fine tuning. I hope that I sort of editorialized the title a little bit, and I know you were slightly uncomfortable with it, but you just own it anyway. I think you're very good at the hot takes. And we were just discussing in our pre-show that it's really happening, that the continued pre-training is really happening.

Jeremy [00:01:02]: Yeah, absolutely.
I think people are starting to understand that treating the three ULMFiT steps of like pre-training, you know, and then the kind of like what people now call instruction tuning, and then, I don't know if we've got a general term for this, DPO, RLHF step, you know, or the task training, they're not actually as separate as we originally suggested they were in our paper, and when you treat it more as a continuum, and that you make sure that you have, you know, more of kind of the original data set incorporated into the later stages, and that, you know, we've also seen with Llama 3, this idea that those later stages can be done for a lot longer. These are all of the things I was kind of trying to describe there. It wasn't the end of fine tuning, but more that we should treat it as a continuum, and we should have much higher expectations of how much you can do with an already trained model. You can really add a lot of behavior to it, you can change its behavior, you can do a lot. So a lot of our research has been around trying to figure out how to modify the model by a larger amount rather than starting from random weights, because I get very offended at the idea of starting from random weights.

Swyx [00:02:14]: Yeah, I saw that in ICLR in Vienna, there was an outstanding paper about starting transformers from data-driven priors.
I don't know if you saw that one, they called it sort of never trained from scratch, and I think it was kind of rebelling against like the sort of random initialization.

Jeremy [00:02:28]: Yeah, I've, you know, that's been our kind of continuous message since we started Fast.ai, is if you're training from random weights, you better have a really good reason, you know, because it seems so unlikely to me that nobody has ever trained on data that has any similarity whatsoever to the general class of data you're working with, and that's the only situation in which I think starting from random weights makes sense.

Swyx [00:02:51]: The other trends since our last pod that I would point people to is I'm seeing a rise in multi-phase pre-training. So Snowflake released a large model called Snowflake Arctic, where they detailed three phases of training where they had like a different mixture of like, there was like 75% web in the first instance, and then they reduced the percentage of the web text by 10% each time and increased the amount of code in each phase. And I feel like multi-phase is being called out in papers more. I feel like it's always been a thing, like changing data mix is not something new, but calling it a distinct phase is new, and I wonder if there's something that you're seeing on your end.

Jeremy [00:03:32]: Well, so they're getting there, right? So the point at which they're doing proper continued pre-training is the point at which that becomes a continuum rather than a phase. So the only difference with what I was describing last time is to say like, oh, there's a function or whatever, which is happening every batch. It's not a huge difference. You know, I always used to get offended when people had learning rates that like jumped. And so one of the things I started doing early on in Fast.ai was to say to people like, no, your learning rate schedule should actually be a function, not a list of numbers.
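Jeremy's schedule-as-a-function idea can be sketched in a few lines (a generic example; the cosine shape, warmup fraction, and default values here are my choices, not Fast.ai's implementation):

```python
import math

# Express the learning-rate schedule as a function of training progress
# (0.0 -> 1.0) rather than a list of per-epoch numbers, so the same
# schedule composes with any training length.
def lr_schedule(progress, lr_max=3e-4, warmup=0.1):
    """Linear warmup followed by cosine decay, evaluated at any progress."""
    if progress < warmup:
        return lr_max * progress / warmup
    t = (progress - warmup) / (1 - warmup)
    return lr_max * 0.5 * (1 + math.cos(math.pi * t))

# The same function works for 100 steps or 100,000:
lrs = [lr_schedule(step / 100) for step in range(101)]
```

Because the schedule is continuous in progress, there are no discontinuous jumps of the kind Jeremy objects to, and the same idea extends to scheduling any hyperparameter (he mentions scheduling Adam's epsilon).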
So now I'm trying to give the same idea about training mix.

Swyx [00:04:07]: There's been pretty public work from Meta on schedule-free optimizers. I don't know if you've been following Aaron DeFazio and what he's doing, just because you mentioned learning rate schedules, you know, what if you didn't have a schedule?

Jeremy [00:04:18]: I don't care very much, honestly. I don't think that schedule-free optimizer is that exciting. It's fine. We've had non-scheduled optimizers for ages, like Less Wright, who's now at Meta, who was part of the Fast.ai community there, created something called the Ranger optimizer. I actually like having more hyperparameters. You know, as soon as you say schedule-free, then like, well, now I don't get to choose. And there isn't really a mathematically correct way of, like, I actually try to schedule more parameters rather than less. So like, I like scheduling my epsilon in my Adam, for example. I schedule all the things. But then the other thing we always did with the Fast.ai library was make it so you don't have to set any schedules. So Fast.ai always supported, like, you didn't even have to pass a learning rate. Like, it would always just try to have good defaults and do the right thing. But to me, I like to have more parameters I can play with if I want to, but you don't have to.

Alessio [00:05:08]: And then the more less technical side, I guess, of your issue, I guess, with the market was some of the large research labs taking all this innovation kind of behind closed doors and whether or not that's good, which it isn't. And now we could maybe make it more available to people. And then a month after we released the episode, there was the whole Sam Altman drama and like all the OpenAI governance issues. And maybe people started to think more, okay, what happens if some of these kind of labs, you know, start to break from within, so to speak? And the alignment of the humans is probably going to fall before the alignment of the models.
So I'm curious, like, if you have any new thoughts and maybe we can also tie in some of the way that we've been building Answer as like a public benefit corp and some of those aspects.

Jeremy [00:05:51]: Sure. So, yeah, I mean, it was kind of uncomfortable because two days before Altman got fired, I did a small public video interview in which I said, I'm quite sure that OpenAI's current governance structure can't continue and that it was definitely going to fall apart. And then it fell apart two days later and a bunch of people were like, what did you know, Jeremy?

Alessio [00:06:13]: What did Jeremy see?

Jeremy [00:06:15]: I didn't see anything. It's just obviously true. Yeah. So my friend Eric Ries and I spoke a lot before that about, you know, Eric's, I think probably most people would agree, the top expert in the world on startup and AI governance. And you know, we could both clearly see that this didn't make sense to have like a so-called non-profit where then there are people working at a company, a commercial company that's owned by or controlled nominally by the non-profit, where the people in the company are being given the equivalent of stock options, like everybody there was working there with expecting to make money largely from their equity. So the idea that then a board could exercise control by saying like, oh, we're worried about safety issues and so we're going to do something that decreases the profit of the company, when every stakeholder in the company, their remuneration pretty much is tied to their profit, it obviously couldn't work. So I mean, that was a huge oversight there by someone. I guess part of the problem is that the kind of people who work at non-profits and in this case the board, you know, who are kind of academics and, you know, people who are kind of true believers. I think it's hard for them to realize that 99.999% of the world is driven very heavily by money, especially huge amounts of money.
So yeah, Eric and I had been talking for a long time before that about what could be done differently, because also companies are sociopathic by design and so the alignment problem as it relates to companies has not been solved. Like, companies become huge, they devour their founders, they devour their communities and they do things where even the CEOs, you know, often of big companies tell me like, I wish our company didn't do that thing. You know, I know that if I didn't do it, then I would just get fired and the board would put in somebody else and the board knows if they don't do it, then their shareholders can sue them because they're not maximizing profitability or whatever. So what Eric's spent a lot of time doing is trying to think about how do we make companies less sociopathic, you know, how to, or more, you know, maybe a better way to think of it is like, how do we make it so that the founders of companies can ensure that their companies continue to actually do the things they want them to do? You know, when we started a company, hey, we very explicitly decided we got to start a company, not an academic lab, not a nonprofit, you know, we created a Delaware C corp, you know, the most company kind of company. But when we did so, we told everybody, you know, including our first investors, which was you, Alessio. They sound great. We are going to run this company on the basis of maximizing long-term value. And in fact, so when we did our second round, which was an angel round, we had everybody invest through a long-term SPV, which we set up where everybody had to agree to vote in line with long-term value principles. So like it's never enough just to say to people, okay, we're trying to create long-term value here for society as well as for ourselves and everybody's like, oh, yeah, yeah, I totally agree with that.
But when it comes to like, okay, well, here's a specific decision we have to make, which will not maximize short-term value, people suddenly change their mind. So you know, it has to be written into the legal documents of everybody so that there's no question that that's the way the company has to be managed. So then you mentioned the PBC aspect, Public Benefit Corporation, which I never quite understood previously. And turns out it's incredibly simple, like it took, you know, like one paragraph added to our corporate documents to become a PBC. It was cheap, it was easy, but it's got this huge benefit, which is if you're not a public benefit corporation, then somebody can come along and offer to buy you with a stated description of like turning your company into the thing you most hate, right? And if they offer you more than the market value of your company and you don't accept it, then you are not necessarily meeting the kind of your fiduciary responsibilities. So the way like Eric always described it to me is like, if Philip Morris came along and said that you've got great technology for marketing cigarettes to children, so we're going to pivot your company to do that entirely, and we're going to pay you 50% more than the market value, you're going to have to say yes. If you have a PBC, then you are more than welcome to say no, if that offer is not in line with your stated public benefit. So our stated public benefit is to maximize the benefit to society through using AI. So given that more children smoking doesn't do that, then we can say like, no, we're not selling to you.

Alessio [00:11:01]: I was looking back at some of our emails. You sent me an email on November 13th about talking and then on the 14th, I sent you an email working together to free AI was the subject line. And then that was kind of the start of the seed round. And then two days later, someone got fired.
So you know, you were having these thoughts even before we had like a public example of like why some of the current structures didn't work. So yeah, you were very ahead of the curve, so to speak. You know, people can read your awesome introduction blog and answer and the idea of having an R&D lab versus our lab and then a D lab somewhere else. I think to me, the most interesting thing has been hiring and some of the awesome people that you've been bringing on that maybe don't fit the central casting of Silicon Valley, so to speak. Like sometimes I got it like playing baseball cards, you know, people are like, oh, what teams was this person on, where did they work versus focusing on ability. So I would love for you to give a shout out to some of the awesome folks that you have on the team.

Jeremy [00:11:58]: So, you know, there's like a graphic going around describing like the people at xAI, you know, Elon Musk thing. And like they are all connected to like multiple of Stanford, Meta, DeepMind, OpenAI, Berkeley, Oxford. Look, these are all great institutions and they have good people. And I'm definitely not at all against that, but damn, there's so many other people. And one of the things I found really interesting is almost any time I see something which I think like this is really high quality work and it's something I don't think would have been built if that person hadn't built the thing right now, I nearly always reach out to them and ask to chat. And I tend to dig in to find out like, okay, you know, why did you do that thing? Everybody else has done this other thing, your thing's much better, but it's not what other people are working on. And like 80% of the time, I find out the person has a really unusual background.
So like often they'll have like, either they like came from poverty and didn't get an opportunity to go to a good school or had dyslexia and, you know, got kicked out of school in year 11, or they had a health issue that meant they couldn't go to university or something happened in their past and they ended up out of the mainstream. And then they kind of succeeded anyway. Those are the people that throughout my career, I've tended to kind of accidentally hire more of, but it's not exactly accidentally. It's like when I see somebody who's done, two people who have done extremely well, one of them did extremely well in exactly the normal way from the background entirely pointing in that direction and they achieved all the hurdles to get there. And like, okay, that's quite impressive, you know, but another person who did just as well, despite lots of constraints and doing things in really unusual ways and came up with different approaches. That's normally the person I'm likely to find useful to work with because they're often like risk-takers, they're often creative, they're often extremely tenacious, they're often very open-minded. So that's the kind of folks I tend to find myself hiring. So now at Answer.ai, it's a group of people that are strong enough that nearly every one of them has independently come to me in the past few weeks and told me that they have imposter syndrome and they're not convinced that they're good enough to be here. And I kind of heard it at the point where I was like, okay, I don't think it's possible that all of you are so far behind your peers that you shouldn't get to be here. But I think part of the problem is as an R&D lab, the great developers look at the great researchers and they're like, wow, these big-brained, crazy research people with all their math and s**t, they're too cool for me, oh my God. 
And then the researchers look at the developers and they're like, oh, they're killing it, making all this stuff with all these people using it and talking on Twitter about how great it is. I think they're both a bit intimidated by each other, you know. And so I have to kind of remind them like, okay, there are lots of things in this world where you suck compared to lots of other people in this company, but also vice versa, you know, for all things. And the reason you came here is because you wanted to learn about those other things from those other people and have an opportunity to like bring them all together into a single unit. You know, it's not reasonable to expect you're going to be better at everything than everybody else. I guess the other part of it is for nearly all of the people in the company, to be honest, they have nearly always been better than everybody else at nearly everything they're doing nearly everywhere they've been. So it's kind of weird to be in this situation now where it's like, gee, I can clearly see that I suck at this thing that I'm meant to be able to do compared to these other people where I'm like the worst in the company at this thing for some things. So I think that's a healthy place to be, you know, as long as you keep reminding each other about that's actually why we're here. And like, it's all a bit of an experiment, like we don't have any managers. We don't have any hierarchy from that point of view. So for example, I'm not a manager, which means I don't get to tell people what to do or how to do it or when to do it. Yeah, it's been a bit of an experiment to see how that would work out. And it's been great. So for instance, Ben Clavié, who you might have come across, he's the author of RAGatouille, he's the author of rerankers, super strong information retrieval guy. And a few weeks ago, you know, this additional channel appeared on Discord, on our private Discord called BERT24.
And these people started appearing, as in, we have a collab section for like collaborating with outsiders, and these people started appearing, all these names that I recognize, and they're all talking about like the next generation of BERT. And I start following along, it's like, okay, Ben decided, I think quite rightly, that we need a new BERT. Because everybody, like so many people are still using BERT, and it's still the best at so many things, but it actually doesn't take advantage of lots of best practices. And so he just went out and found basically everybody who's created better BERTs in the last four or five years, brought them all together, suddenly there's this huge collaboration going on. So yeah, I didn't tell him to do that. He didn't ask my permission to do that. And then, like, Benjamin Warner dived in, and he's like, oh, I created a whole transformers from scratch implementation designed to be maximally hackable. He originally did it largely as a teaching exercise to show other people, but he was like, I could, you know, use that to create a really hackable BERT implementation. In fact, he didn't say that. He said, I just did do that, you know, and I created a repo, and then everybody like starts using it. They're like, oh my god, this is amazing. I can now implement all these other BERT things. And it's not just Answer.AI guys there, you know, there's lots of folks, you know, who have like contributed new data set mixes and blah, blah, blah. So, I mean, I can help in the same way that other people can help. So like, then Ben Clavier reached out to me at one point and said, can you help me, like, what have you learned over time about how to manage intimidatingly capable and large groups of people who you're nominally meant to be leading? And so, you know, I like to try to help, but I don't direct.
Another great example was Kerem, who, after our FSDP QLoRA work, decided quite correctly that it didn't really make sense to use LoRA in today's world. You want to use the normalized version, which is called DoRA. Like two or three weeks after we did FSDP QLoRA, he just popped up and said, okay, I've just converted the whole thing to DoRA, and I've also created these vLLM extensions, and I've got all these benchmarks, and, you know, now I've got training of quantized models with adapters that's as fast as LoRA and, weirdly, actually better than fine-tuning. Just like, okay, that's great, you know. And yeah, so the things we've done to try to help make these things happen as well is we don't have any required meetings, you know, but we do have a meeting for each pair of major time zones that everybody's invited to, and, you know, people see their colleagues doing stuff that looks really cool and say, like, oh, how can I help, you know, or how can I learn or whatever. So another example is Austin, who, you know, amazing background. He ran AI at Fidelity, he ran AI at Pfizer, he ran browsing and retrieval for Google's DeepMind stuff, created gemma.cpp, and he's been working on a new system to make it easier to do web GPU programming, because, again, he quite correctly identified, yeah, so I said to him, like, okay, I want to learn about that. Not an area that I have much expertise in, so, you know, he's going to show me what he's working on and teach me a bit about it, and hopefully I can help contribute. I think one of the key things that's happened in all of these is everybody understands what Eric Gilliam, who wrote the second blog post in our series, the R&D historian, describes as a large yard with narrow fences. Everybody has total flexibility to do what they want.
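The LoRA-versus-DoRA distinction Kerem acted on can be sketched in a few lines of NumPy. This is a toy numerical illustration of the weight decomposition only, not his implementation (which also involves quantization and custom kernels):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 6, 2
W0 = rng.standard_normal((d, k))      # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01
B = np.zeros((d, r))                  # B starts at zero, so the update begins as a no-op

# LoRA: add a low-rank update directly to the frozen weight.
W_lora = W0 + B @ A

# DoRA: decompose the weight into a per-column magnitude and a direction,
# apply the low-rank update to the direction only, and keep the magnitude
# as its own trainable parameter.
m = np.linalg.norm(W0, axis=0, keepdims=True)        # magnitudes, shape (1, k)
V = W0 + B @ A                                       # updated direction (unnormalized)
W_dora = m * V / np.linalg.norm(V, axis=0, keepdims=True)

# With B = 0, both reduce exactly to the pretrained weight.
assert np.allclose(W_lora, W0)
assert np.allclose(W_dora, W0)
```

During training, A, B, and m are the only parameters that receive gradients; W0 stays frozen (and, in the QLoRA setting, quantized).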
We all understand kind of roughly why we're here, you know, we agree with the premises around, like, everything's too expensive, everything's too complicated, people are building too many vanity foundation models rather than taking better advantage of fine-tuning, like, there's this kind of general, like, sense of we're all on the same wavelength about, you know, all the ways in which current research is fucked up, and, you know, all the ways in which we're worried about centralization. We all care a lot about not just research for the point of citations, but research that actually wouldn't have happened otherwise, and actually is going to lead to real-world outcomes. And so, yeah, with this kind of, like, shared vision, people understand, like, you know, so when I say, like, oh, well, you know, tell me, Ben, about BERT24, what's that about? And he's like, you know, like, oh, well, you know, you can see from an accessibility point of view, or you can see from a kind of an actual practical impact point of view, there's far too much focus on decoder-only models, and, you know, like, BERT's used in all of these different places in industry, and so I can see, like, in terms of our basic principles, what we're trying to achieve, this seems like something important. And so I think that's, like, really helpful, that we have that kind of shared perspective, you know?Alessio [00:21:14]: Yeah. And before we maybe talk about some of the specific research, when you're, like, reaching out to people, interviewing them, what are some of the traits, like, how do these things come out, you know, usually? Is it working on side projects that you, you know, you're already familiar with? Is there anything, like, in the interview process that, like, helps you screen for people that are less pragmatic and more research-driven versus some of these folks that are just gonna do it, you know?
They're not waiting for, like, the perfect process.Jeremy [00:21:40]: Everybody who comes through the recruiting is interviewed by everybody in the company. You know, our goal is 12 people, so it's not an unreasonable amount. So the other thing to say is everybody so far who's come into the recruiting pipeline, everybody bar one, has been hired. So which is to say our original curation has been good. And that's actually pretty easy, because nearly everybody who's come in through the recruiting pipeline are people I know pretty well. So Jono Whitaker and I, you know, he worked on the Stable Diffusion course we did. He's outrageously creative and talented, and he's a super, like, enthusiastic tinkerer, just likes making things. Benjamin was one of the strongest parts of the fast.ai community, which is now the alumni. It's, like, hundreds of thousands of people. And you know, again, like, they're not people who a normal interview process would pick up, right? So Benjamin doesn't have any qualifications in math or computer science. Jono was living in Zimbabwe, you know, he was working on, like, helping some African startups, you know, but not FAANG kind of credentials. But yeah, I mean, when you actually see people doing real work and they stand out above, you know, we've got lots of Stanford graduates and OpenAI people and whatever in our alumni community as well. You know, when you stand out above all of those people anyway, obviously you've got something going for you. You know, Austin, him and I worked together on the masks study we published in the Proceedings of the National Academy of Sciences. You know, we had worked together, and again, that was a group of, like, basically the 18 or 19 top experts in the world on public health and epidemiology and research design and so forth. And Austin, you know, one of the strongest people in that collaboration.
So yeah, you know, like, I've been lucky enough to have had opportunities to work with some people who are great and, you know, I'm a very open-minded person, so I kind of am always happy to try working with pretty much anybody and some people stand out. You know, there have been some exceptions, people I haven't previously known, like Ben Clavier, actually, I didn't know before. But you know, with him, you just read his code, and I'm like, oh, that's really well-written code. And like, it's not written exactly the same way as everybody else's code, and it's not written to do exactly the same thing as everybody else's code. So yeah, and then when I chatted to him, it's just like, I don't know, I felt like we'd known each other for years, like we just were on the same wavelength, but I could pretty much tell that was going to happen just by reading his code. I think you express a lot in the code you choose to write and how you choose to write it, I guess. You know, or another example, a guy named Vic, who was previously the CEO of Dataquest, and like, in that case, you know, he's created a really successful startup. He won the first, basically, Kaggle NLP competition, which was automatic essay grading. He's got the current state-of-the-art OCR system, Surya. Again, he's just a guy who obviously just builds stuff, you know, he doesn't ask for permission, he doesn't need any, like, external resources. Actually, Kerem's another great example of this, I mean, I already knew Kerem very well because he was my best ever master's student, but it wasn't a surprise to me then when he then went off to create the world's state-of-the-art language model in Turkish on his own, in his spare time, with no budget, from scratch. This is not fine-tuning or whatever, he, like, went back to Common Crawl and did everything. Yeah, it's kind of, I don't know what I'd describe that process as, but it's not at all based on credentials.Swyx [00:25:17]: Assemble based on talent, yeah.
We wanted to dive in a little bit more on, you know, turning from the people side of things into the technical bets that you're making. Just a little bit more on BERT. I was actually, we just did an interview with Yi Tay from Reka, I don't know if you're familiar with his work, but also another encoder-decoder bet, and one of his arguments was actually people kind of over-index on the decoder-only GPT-3 type paradigm. I wonder if you have thoughts there that are maybe non-consensus as well. Yeah, no, absolutely.Jeremy [00:25:45]: So I think it's a great example. So one of the people we're collaborating with a little bit on BERT24 is Colin Raffel, who is the guy behind, yeah, most of that stuff, you know, between that and UL2, there's a lot of really interesting work. And so one of the things I've been encouraging the BERT group to do, Colin has as well, is to consider using a T5 pre-trained encoder backbone as a thing you fine-tune, which I think would be really cool. You know, Colin was also saying actually just use encoder-decoder as your BERT, you know, why don't you like use that as a baseline, which I also think is a good idea. Yeah, look.Swyx [00:26:25]: What technical arguments are people under-weighting?Jeremy [00:26:27]: I mean, Colin would be able to describe this much better than I can, but I'll give my slightly non-expert attempt. Look, I mean, think about like diffusion models, right? Like in Stable Diffusion, like we use things like UNet. You have this kind of downward path and then in the upward path you have the cross connections, which isn't attention, but it's like a similar idea, right? You're inputting the original encoding path into your decoding path. It's critical to make it work, right? Because otherwise in the decoding part, the model has to do so much kind of from scratch. So like if you're doing translation, like that's a classic kind of encoder-decoder example.
If it's decoder only, you never get the opportunity to find the right, you know, feature engineering, the right feature encoding for the original sentence. And it kind of means then on every token that you generate, you have to recreate the whole thing, you know? So if you have an encoder, it's basically saying like, okay, this is your opportunity, model, to create a really useful feature representation for your input information. So I think there's really strong arguments for encoder-decoder models anywhere that there is this kind of like context or source thing. And then why encoder only? Well, because so much of the time what we actually care about is a classification, you know? It's like an output. It's not like generating an arbitrary length sequence of tokens. So anytime you're not generating an arbitrary length sequence of tokens, decoder models don't seem to make much sense. Now the interesting thing is, you see on like Kaggle competitions, that decoder models still are at least competitive with things like DeBERTa v3. They have to be way bigger to be competitive with things like DeBERTa v3. And the only reason they are competitive is because people have put a lot more time and money and effort into training the decoder only ones, you know? There isn't a recent DeBERTa. There isn't a recent BERT. Yeah, it's a whole part of the world that people have slept on a little bit. And this is just what happens. This is how trends happen rather than like, to me, everybody should be like, oh, let's look at the thing that has shown signs of being useful in the past, but nobody really followed up with properly. That's the more interesting path, you know, where people tend to be like, oh, I need to get citations. So what's everybody else doing? Can I make it 0.1% better, you know, or 0.1% faster? That's what everybody tends to do. Yeah.
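The point about encoders can be made concrete with a toy sketch: the source is encoded once, and every decoding step cross-attends to those cached features instead of re-deriving them. This is illustrative NumPy only, not a real transformer:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
src = rng.standard_normal((5, d))      # 5 source-token embeddings
W_enc = rng.standard_normal((d, d))    # stand-in "encoder": one linear layer

def encode(x):
    # In an encoder-decoder model this runs ONCE per input sequence.
    return x @ W_enc

enc = encode(src)                      # cached feature representation of the source

def decode_step(query, memory):
    # Cross-attention: each decoding step just reads from the cached memory.
    scores = memory @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ memory

out1 = decode_step(rng.standard_normal(d), enc)
out2 = decode_step(rng.standard_normal(d), enc)   # same cached `enc`, no re-encoding
assert out1.shape == (d,) and out2.shape == (d,)
```

A decoder-only model has no separate cached `enc`; the source tokens just sit in the same autoregressive context, which is the "recreate the whole thing on every token" cost being described.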
So I think it's like, Yi Tay's work commercially now is interesting because here's like a whole, here's a whole model that's been trained in a different way. So there's probably a whole lot of tasks it's probably better at than GPT and Gemini and Claude. So that should be a good commercial opportunity for them if they can figure out what those tasks are.Swyx [00:29:07]: Well, if rumors are to be believed, and he didn't comment on this, but, you know, Snowflake may figure out the commercialization for them. So we'll see.Jeremy [00:29:14]: Good.Alessio [00:29:16]: Let's talk about FSDP, QLoRA, QDoRA, and all of that awesome stuff. One of the things we talked about last time, some of these models are meant to run on systems that nobody can really own, no single person. And then you were like, well, what if you could fine tune a 70B model on like a 4090? And I was like, no, that sounds great, Jeremy, but like, can we actually do it? And then obviously you all figured it out. Can you maybe tell us some of the war stories behind that, like the idea behind FSDP, which is kind of taking fully sharded data parallel computation, and then QLoRA, which is do not touch all the weights, just go quantize some of the model, and then within the quantized model only do certain layers instead of doing everything.Jeremy [00:29:57]: Well, do the adapters. Yeah.Alessio [00:29:59]: Yeah. Yeah. Do the adapters. Yeah. I will leave the floor to you. I think before you published it, nobody thought this was like a short term thing that we're just going to have. And now it's like, oh, obviously you can do it, but it's not that easy.Jeremy [00:30:12]: Yeah. I mean, to be honest, it was extremely unpleasant work to do. It's like not at all enjoyable. I kind of did version 0.1 of it myself before we had launched the company, or at least the kind of like the pieces. They're all pieces that are difficult to work with, right?
So for the quantization, you know, I chatted to Tim Dettmers quite a bit and, you know, he very much encouraged me by saying like, yeah, it's possible. He actually thought it'd be easy. It probably would be easy for him, but I'm not Tim Dettmers. And, you know, so he wrote bitsandbytes, which is his quantization library. You know, he wrote that for a paper. He didn't write that to be production-like code. It's now like everybody's using it, at least the CUDA bits. So like, it's not particularly well structured. There's lots of code paths that never get used. There's multiple versions of the same thing. You have to try to figure it out. So trying to get my head around that was hard. And you know, because the interesting bits are all written in CUDA, it's hard to like step through it and see what's happening. And then, you know, FSDP is this very complicated library in PyTorch, which is not particularly well documented. So the only real way to understand it properly is, again, just read the code and step through the code. And then, like, bitsandbytes doesn't really work in practice unless it's used with PEFT, the HuggingFace library, and PEFT doesn't really work in practice unless you use it with other things. And there's a lot of coupling in the HuggingFace ecosystem where like none of it works separately. You have to use it all together, which I don't love. So yeah, trying to just get a minimal example that I can play with was really hard. And so I ended up having to rewrite a lot of it myself to kind of create this like minimal script. One thing that helped a lot was Meta had this llama-recipes repo that came out just a little bit before I started working on that. And like they had a kind of role model example of like, here's how to train with FSDP and LoRA (it didn't work with QLoRA) on Llama.
A lot of the stuff I discovered, the interesting stuff, had been put together by Less Wright, who was actually the guy in the fast.ai community I mentioned who created the Ranger optimizer. So he's doing a lot of great stuff at Meta now. So yeah, I kind of, that helped get some minimum stuff going and then it was great once Benjamin and Jono joined full time. And so we basically hacked at that together and then Kerem joined like a month later or something. And it was like, gee, it was just a lot of like fiddly detailed engineering on like barely documented bits of obscure internals. So my focus was to see if it kind of could work and I kind of got a bit of a proof of concept working and then the rest of the guys actually did all the work to make it work properly. And, you know, every time we thought we had something, you know, we needed to have good benchmarks, right? So we'd like, it's very easy to convince yourself you've done the work when you haven't, you know, so then we'd actually try lots of things and be like, oh, in these like really important cases, the memory use is higher, you know, or it's actually slower. And we'd go in and we just find like all these things that were nothing to do with our library that just didn't work properly. And nobody had noticed they hadn't worked properly because nobody had really benchmarked it properly. So we ended up, you know, trying to fix a whole lot of different things. And even as we did so, new regressions were appearing in like transformers and stuff that Benjamin then had to go away and figure out like, oh, how come flash attention doesn't work in this version of transformers anymore with this set of models and like, oh, it turns out they accidentally changed this thing, so it doesn't work. You know, there's just, there's not a lot of really good performance type evals going on in the open source ecosystem.
So there's an extraordinary amount of like things where people say like, oh, we built this thing and it has this result. And when you actually check it, it doesn't. So yeah, there's a shitload of war stories from getting that thing to work. And it did require a particularly like tenacious group of people and a group of people who don't mind doing a whole lot of kind of like really janitorial work, to be honest, to get the details right, to check them. Yeah.Alessio [00:34:09]: We had Tri Dao on the podcast and we talked about how a lot of it is like systems work to make some of these things work. It's not just like beautiful, pure math that you do on a blackboard. It's like, how do you get into the nitty gritty?Jeremy [00:34:22]: I mean, flash attention is a great example of that. Like it's, it basically is just like, oh, let's just take the attention and just do the tiled version of it, which sounds simple enough, you know, but then implementing that is challenging at lots of levels.Alessio [00:34:36]: Yeah. What about inference? You know, obviously you've done all this amazing work on fine tuning. Do you have any research you've been doing on the inference side, how to make local inference really fast on these models too?Jeremy [00:34:47]: We're doing quite a bit on that at the moment. We haven't released too much there yet. But one of the things I've been trying to do is also just to help other people. And one of the nice things that's happened is that a couple of folks at Meta, including Mark Saroufim, have done a nice job of creating this CUDA mode community of people working on like CUDA kernels or learning about that. And I tried to help get that going well as well and did some lessons to help people get into it. So there's a lot going on in both inference and fine tuning performance. And a lot of it's actually happening kind of related to that. So PyTorch team have created this Torch AO project on quantization.
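The "tiled version" of attention mentioned above rests on the online-softmax trick at the heart of flash attention: process key/value blocks one at a time while carrying a running max, normalizer, and weighted sum. This NumPy sketch handles a single query and ignores all the hard parts (SRAM management, the backward pass, batching):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 8
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Naive attention for one query: softmax over all n scores at once.
s = K @ q
w = np.exp(s - s.max())
w /= w.sum()
naive = w @ V

# Tiled version: one block of keys/values at a time, never materializing
# the full score vector.
m, l, acc = -np.inf, 0.0, np.zeros(d)
for i in range(0, n, 8):                   # block size 8
    s_blk = K[i:i + 8] @ q
    m_new = max(m, s_blk.max())
    p = np.exp(s_blk - m_new)
    correction = np.exp(m - m_new)         # rescale earlier partial results
    l = l * correction + p.sum()
    acc = acc * correction + p @ V[i:i + 8]
    m = m_new
tiled = acc / l

assert np.allclose(naive, tiled)
```

The numerics match the naive computation exactly (up to float error); the engineering difficulty is doing this with fused CUDA kernels and on-chip memory, which is the part that's "challenging at lots of levels."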
And so there's a big overlap now between kind of the FastAI and AnswerAI and CUDA mode communities of people working on stuff for both inference and fine tuning. But we're getting close now. You know, our goal is that nobody should be merging models, nobody should be downloading merged models, everybody should be using basically quantized plus adapters for almost everything and just downloading the adapters. And that should be much faster. So that's kind of the place we're trying to get to. It's difficult, you know, because like Kerem's been doing a lot of work with vLLM, for example. These inference engines are pretty complex bits of code. They have a whole lot of custom kernel stuff going on as well, as do the quantization libraries. So we've been working on, we're also collaborating quite a bit with the folks who do HQQ, which is a really great quantization library and works super well. So yeah, there's a lot of other people outside AnswerAI that we're working with a lot who are really helping on all this performance optimization stuff, open source.Swyx [00:36:27]: Just to follow up on merging models, I picked up there that you said nobody should be merging models. That's interesting because obviously a lot of people are experimenting with this and finding interesting results. I would say in defense of merging models, you can do it without data. That's probably the only thing that's going for it.Jeremy [00:36:45]: To explain, it's not that you shouldn't merge models. You shouldn't be distributing a merged model. You should distribute a merged adapter 99% of the time. And actually often one of the best things happening in the model merging world is actually that often merging adapters works better anyway. The point is, Sean, that once you've got your new model, if you distribute it as an adapter that sits on top of a quantized model that somebody's already downloaded, then it's a much smaller download for them.
And also the inference should be much faster because you're not having to transfer BF16 weights from HBM memory at all or ever load them off disk. You know, all the main weights are quantized and the only floating point weights are in the adapters. So that should make both inference and fine tuning faster. Okay, perfect.Swyx [00:37:33]: We're moving on a little bit to the rest of the fast universe. I would have thought that, you know, once you started Answer.ai, that the sort of fast universe would be kind of on hold. And then today you just dropped fastlite and it looks like, you know, there's more activity going on in sort of Fastland.Jeremy [00:37:49]: Yeah. So Fastland and Answerland are not really distinct things. Answerland is kind of like the Fastland grown up and funded. They both have the same mission, which is to maximize the societal benefit of AI broadly. We want to create thousands of commercially successful products at Answer.ai. And we want to do that with like 12 people. So that means we need a pretty efficient stack, you know, like quite a few orders of magnitude more efficient, not just for creation, but for deployment and maintenance than anything that currently exists. People often forget about the D part of our R&D firm. So we've got to be extremely good at creating, deploying and maintaining applications, not just models. Much to my horror, the story around creating web applications is much worse now than it was 10 or 15 years ago in terms of, if I say to a data scientist, here's how to create and deploy a web application, you know, either you have to learn JavaScript or TypeScript and about all the complex libraries like React and stuff, and all the complex like details around security and web protocol stuff around how you then talk to a backend and then all the details about creating the backend. You know, if that's your job and, you know, you have specialists who work in just one of those areas, it is possible for that to all work.
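The distribution scheme described here, a quantized frozen base plus small floating-point adapters, can be sketched numerically. This toy uses per-row absmax int8 rather than the 4-bit schemes bitsandbytes or HQQ actually implement:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16)).astype(np.float32)   # pretend base-model weight

# Absmax int8 quantization: one float scale per output row.
scale = np.abs(W).max(axis=1, keepdims=True) / 127
Wq = np.round(W / scale).astype(np.int8)               # what everyone downloads once

# Small floating-point adapter: the only full-precision weights you distribute
# per fine-tune (rank r, so 2 * 16 * r values instead of 16 * 16).
r = 2
A = (rng.standard_normal((r, 16)) * 0.01).astype(np.float32)
B = rng.standard_normal((16, r)).astype(np.float32)

# Forward pass: dequantize the frozen base on the fly, add the adapter path.
x = rng.standard_normal(16).astype(np.float32)
y = (Wq * scale) @ x + B @ (A @ x)

# Rounding error per element is bounded by half a quantization step.
assert np.abs(Wq * scale - W).max() <= scale.max() / 2 + 1e-6
```

The memory argument falls straight out of the dtypes: the base costs 1 byte per weight here (4 bits in the real schemes) instead of 2 for BF16, and a new fine-tune ships only A and B.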
But compared to like, oh, write a PHP script and put it in the home directory that you get when you sign up to this shell provider, which is what it was like in the nineties, you know, here are those 25 lines of code and you're done and now you can pass that URL around to all your friends, or put this, you know, .pl file inside the cgi-bin directory that you got when you signed up to this web host. So yeah, the thing I've been mainly working on the last few weeks is fixing all that. And I think I fixed it. I don't know if this is an announcement, but I tell you guys, so yeah, there's this thing called FastHTML, which basically lets you create a complete web application in a single Python file. Unlike excellent projects like Streamlit and Gradio, you're not working on top of a highly abstracted thing that's got nothing to do with web foundations. You're working with web foundations directly, but you're able to do it by using pure Python. There's no template, there's no Jinja, there's no separate like CSS and JavaScript files. It looks and behaves like a modern SPA web application. And you can create components for like DaisyUI, or Bootstrap, or Shoelace, or whatever fancy JavaScript and/or CSS (Tailwind, etc.) library you like, but you can write it all in Python. You can pip install somebody else's set of components and use them entirely from Python. You can develop and prototype it all in a Jupyter notebook if you want to. It all displays correctly, so you can like interactively do that. And then you mentioned fastlite, so specifically now if you're using SQLite in particular, it's like ridiculously easy to have that persistence, and all of your handlers will be passed database-ready objects automatically, that you can just call .delete, .update, .insert on. Yeah, you get session, you get security, you get all that. So again, like with most everything I do, it's very little code. It's mainly tying together really cool stuff that other people have written.
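The core idea being described, handlers that return plain Python component objects which render straight to HTML with no template language, can be sketched in a few lines. To be clear, this is not FastHTML's actual API, just a hypothetical illustration of the pattern:

```python
# Toy component renderer: tags are Python callables, pages are Python values.
# (Hypothetical sketch; FastHTML's real component system is richer than this.)

def tag(name):
    def make(*children, **attrs):
        # `cls` is a common Python-side alias, since `class` is a keyword.
        attrs = {("class" if k == "cls" else k): v for k, v in attrs.items()}
        a = "".join(f' {k}="{v}"' for k, v in attrs.items())
        return f"<{name}{a}>" + "".join(children) + f"</{name}>"
    return make

Div, H1, P = tag("div"), tag("h1"), tag("p")

def home():
    # A "route handler": returns components, not a template string.
    return Div(H1("Hello"), P("Rendered from pure Python"), cls="card")

assert home() == '<div class="card"><h1>Hello</h1><p>Rendered from pure Python</p></div>'
```

Because the page is just a Python expression, it composes, pip-installs, and runs in a notebook like any other Python value, which is the property being claimed for the real framework.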
You don't have to use it, but a lot of the best stuff comes from its incorporation of HTMX, which to me is basically the thing that changes your browser to make it work the way it always should have. So it just does four small things, but those four small things are the things that are basically unnecessary constraints that HTML should never have had, so it removes the constraints. It sits on top of Starlette, which is a very nice kind of lower level platform for building these kind of web applications. The actual interface matches as closely as possible to FastAPI, which is a really nice system for creating the kind of classic JavaScript type applications. And Sebastian, who wrote FastAPI, has been kind enough to help me think through some of these design decisions, and so forth. I mean, everybody involved has been super helpful. Actually, I chatted to Carson, who created HTMX, you know, about it. Some of the folks involved in Django, like everybody in the community I've spoken to definitely realizes there's a big gap to be filled around, like, a highly scalable, web-foundation-based, pure-Python framework with a minimum of fuss. So yeah, I'm getting a lot of support and trying to make sure that FastHTML works well for people.Swyx [00:42:38]: I would say, when I heard about this, I texted Alessio. I think this is going to be pretty huge. People consider Streamlit and Gradio to be the state of the art, but I think there's so much to improve, and having what you call web foundations and web fundamentals at the core of it, I think, would be really helpful.Jeremy [00:42:54]: I mean, it's based on 25 years of thinking and work for me. So like, FastMail was built on a system much like this one, but that was in Perl. And so I spent, you know, 10 years working on that. We had millions of people using that every day, really pushing it hard. And I really always enjoyed working in that. Yeah.
So, you know, and obviously lots of other people have done like great stuff, and particularly HTMX. So I've been thinking about like, yeah, how do I pull together the best of the web framework I created for FastMail with HTMX? There's also things like Pico CSS, which is the CSS system, which by default, FastHTML comes with. Although, as I say, you can pip install anything you want to, but it makes it like super easy to, you know, so we try to make it so that just out of the box, you don't have any choices to make. Yeah. You can make choices, but for most people, you just, you know, it's like the PHP in your home directory thing. You just start typing and just by default, you'll get something which looks and feels, you know, pretty okay. And if you want to then write a version of Gradio or Streamlit on top of that, you totally can. And then the nice thing is if you then write it in kind of the Gradio equivalent, which will be, you know, I imagine we'll create some kind of pip installable thing for that. Once you've outgrown, or if you outgrow that, it's not like, okay, throw that all away and start again and learn this like whole separate language. It's like this kind of smooth, gentle path that you can take step-by-step because it's all just standard web foundations all the way, you know.Swyx [00:44:29]: Just to wrap up the sort of open source work that you're doing, you're aiming to create thousands of projects with a very, very small team. I haven't heard you mention once AI agents or AI developer tooling or AI code maintenance. I know you're very productive, but you know, what is the role of AI in your own work?Jeremy [00:44:47]: So I'm making something. I'm not sure how much I want to say just yet.Swyx [00:44:52]: Give us a nibble.Jeremy [00:44:53]: All right. I'll give you the key thing. So I've created a new approach. It's not called prompt engineering. It's called dialogue engineering. But I'm creating a system for doing dialogue engineering.
It's currently called AI magic. I'm doing most of my work in this system and it's making me much more productive than I was before I used it. So I always just build stuff for myself and hope that it'll be useful for somebody else. Think about ChatGPT with code interpreter, right? The basic UX is the same as a 1970s teletype, right? So if you wrote APL on a teletype in the 1970s, you typed onto a thing, your words appeared at the bottom of a sheet of paper and you'd like hit enter and it would scroll up. And then the answer from APL would be printed out, scroll up, and then you would type the next thing. And like, which is also the way, for example, a shell works like bash or zsh or whatever. It's not terrible, you know, like we all get a lot done in these like very, very basic teletype style REPL environments, but I've never felt like it's optimal and everybody else has just copied ChatGPT. So it's also the way Bard and Gemini work. It's also the way the Claude web app works. And then you add code interpreter. And the most you can do is to like plead with ChatGPT to write the kind of code I want. It's pretty good for very, very, very beginner users who like can't code at all, like by default now the code's even hidden away, so you never even have to see it ever happened. But for somebody who's like wanting to learn to code or who already knows a bit of code or whatever, it's, it seems really not ideal. So okay, that's one end of the spectrum. The other end of the spectrum, which is where Sean's work comes in, is, oh, you want to do more than ChatGPT? No worries. Here is Visual Studio Code. I run it. There's an empty screen with a flashing cursor. Okay, start coding, you know, and it's like, okay, you can use systems like Sean's or like Cursor or whatever to be like, okay, Cmd-K in Cursor brings up, like, a quick edit form that, blah, blah, blah.
But in the end, it's like a convenience over the top of this incredibly complicated system that full-time sophisticated software engineers have designed over the past few decades in a totally different environment as a way to build software, you know. And so we're trying to like shoehorn in AI into that. And it's not easy to do. And I think there are like much better ways of thinking about the craft of software development in a language model world to be much more interactive, you know. So the thing that I'm building is neither of those things. It's something between the two. And it's built around this idea of crafting a dialogue, you know, where the outcome of the dialogue is the artifacts that you want, whether it be a piece of analysis or whether it be a Python library or whether it be a technical blog post or whatever. So as part of building that, I've created something called Claudette, which is a library for Claude. I've created something called Cosette, which is a library for OpenAI. They're libraries which are designed to make those APIs much more usable, much easier to use, much more concise. And then I've written AI magic on top of those. And that's been an interesting exercise because I did Claudette first, and I was looking at what Simon Willison did with his fantastic LLM library. And his library is designed around like, let's make something that supports all the LLM inference engines and commercial providers. I thought, okay, what if I did something different, which is like make something that's as Claude friendly as possible and forget everything else. So that's what Claudette was. So for example, one of the really nice things in Claude is prefill. So by telling the assistant that this is what your response started with, there's a lot of powerful things you can take advantage of. So yeah, I created Claudette to be as Claude friendly as possible. 
And then after I did that, and then particularly with GPT-4o coming out, I kind of thought, okay, now let's create something that's as OpenAI friendly as possible. And then I tried to look to see, well, where are the similarities and where are the differences? And now can I make them compatible in places where it makes sense for them to be compatible without losing out on the things that make each one special for what they are. So yeah, those are some of the things I've been working on in that space. And I'm thinking we might launch AI magic via a course called How to Solve It With Code. The name is based on the classic Pólya book, if you know it, How to Solve It, which is, you know, one of the classic math books of all time, where we're basically going to try to show people how to solve challenging problems that they didn't think they could solve without doing a full computer science course, by taking advantage of a bit of AI and a bit of like practical skills, as particularly for this like whole generation of people who are learning to code with and because of ChatGPT. Like I love it, I know a lot of people who didn't really know how to code, but they've created things because they use ChatGPT, but they don't really know how to maintain them or fix them or add things to them that ChatGPT can't do, because they don't really know how to code. And so this course will be designed to show you how you can like either become a developer who can like supercharge their capabilities by using language models, or become a language model first developer who can supercharge their capabilities by understanding a bit about process and fundamentals. Alessio [00:50:19]: Nice. That's a great spoiler. You know, I guess the fourth time you're going to be on Latent Space, we're going to talk about AI magic. Jeremy, before we wrap, this was just a great run through everything.
What are the things that when you next come on the podcast in nine, 12 months, we're going to be like, man, Jeremy was like really ahead of it. Like, is there anything that you see in the space that maybe people are not talking enough? You know, what's the next company that's going to fall, like have drama internally, anything in your mind? Jeremy [00:50:47]: You know, hopefully we'll be talking a lot about FastHTML and hopefully the international community that at that point has come up around that. And also about AI magic and about dialogue engineering. Hopefully dialogue engineering catches on because I think it's the right way to think about a lot of this stuff. What else? Just trying to think about all on the research side. Yeah. I think, you know, I mean, we've talked about a lot of it. Like I think encoder decoder architectures, encoder only architectures, hopefully we'll be talking about like the whole re-interest in BERT that BERT 24 stimulated. Swyx [00:51:17]: There's a state space model that came out today that might be interesting for this general discussion. One thing that stood out to me with Cartesia's blog posts was that they were talking about real time ingestion, billions and trillions of tokens, and keeping that context, obviously in the state space that they have. Jeremy [00:51:34]: Yeah. Swyx [00:51:35]: I'm wondering what your thoughts are because you've been entirely transformers the whole time. Jeremy [00:51:38]: Yeah. No. So obviously my background is RNNs and LSTMs. Of course. And I'm still a believer in the idea that state is something you can update, you know? So obviously Sepp Hochreiter came up, came out with xLSTM recently. Oh my God. Okay. Another whole thing we haven't talked about, just somewhat related. I've been going crazy for like a long time about like, why can I not pay anybody to save my KV cache? I just ingested the Great Gatsby or the documentation for Starlette or whatever, you know, I'm sending it as my prompt context.
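The caching he's asking for can be sketched with a toy single-head attention: project the long prefix into keys and values once, store them, and thereafter each new token only pays for itself (NumPy stand-ins for learned projections, not any real provider's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # toy projection matrices

def extend_cache(cache, new_embeds):
    """Project only the new tokens and append them to the cached K/V."""
    K, V = cache
    return np.vstack([K, new_embeds @ Wk]), np.vstack([V, new_embeds @ Wv])

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = (K @ q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# "Upload the docs once": build the cache for a long prefix a single time.
prefix = rng.normal(size=(100, d))  # stand-in for ingested documentation
cache = extend_cache((np.empty((0, d)), np.empty((0, d))), prefix)

# Later turns only project their own new tokens.
new_tok = rng.normal(size=(1, d))
cache = extend_cache(cache, new_tok)
out_cached = attend(new_tok[0] @ Wq, *cache)

# Identical to recomputing K/V for the whole sequence from scratch.
full = np.vstack([prefix, new_tok])
out_full = attend(new_tok[0] @ Wq, full @ Wk, full @ Wv)
assert np.allclose(out_cached, out_full)
```

The assert is the whole point: reusing the stored K/V gives exactly the same attention output as re-ingesting the prefix, which is why persisting the cache across requests saves all that repeated prompt processing.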
Why are you redoing it every time? So Gemini is about to finally come out with KV caching, and this is something that Austin actually in Gemma.cpp had had on his roadmap for years, well not years, months, long time. The idea that the KV cache is like a thing that, it's a third thing, right? So there's RAG, you know, there's in-context learning, you know, and prompt engineering, and there's KV cache creation. I think it creates like a whole new class almost of applications or as techniques where, you know, for me, for example, I very often work with really new libraries or I've created my own library that I'm now writing with rather than on. So I want all the docs in my new library to be there all the time. So I want to upload them once, and then we have a whole discussion about building this application using FastHTML. Well nobody's got FastHTML in their language model yet, I don't want to send all the FastHTML docs across every time. So one of the things I'm looking at doing in AI Magic actually is taking advantage of some of these ideas so that you can have the documentation of the libraries you're working on be kind of always available. Something over the next 12 months people will be spending time thinking about is how to like, where to use RAG, where to use fine-tuning, where to use KV cache storage, you know. And how to use state, because in state models and XLSTM, again, state is something you update. So how do we combine the best of all of these worlds?Alessio [00:53:46]: And Jeremy, I know before you talked about how some of the autoregressive models are not maybe a great fit for agents. Any other thoughts on like JEPA, diffusion for text, any interesting thing that you've seen pop up?Jeremy [00:53:58]: In the same way that we probably ought to have state that you can update, i.e. 
XLSTM and state models, in the same way that a lot of things probably should have an encoder, JEPA and diffusion both seem like the right conceptual mapping for a lot of things we probably want to do. So the idea of like, there should be a piece of the generative pipeline, which is like thinking about the answer and coming up with a sketch of what the answer looks like before you start outputting tokens. That's where it kind of feels like diffusion ought to fit, you know. And diffusion is, because it's not autoregressive, it's like, let's try to like gradually de-blur the picture of how to solve this. So this is also where dialogue engineering fits in, by the way. So with dialogue engineering, one of the reasons it's working so well for me is I use it to kind of like craft the thought process before I generate the code, you know. So yeah, there's a lot of different pieces here and I don't know how they'll all kind of exactly fit together. I don't know if JEPA is going to actually end up working in the text world. I don't know if diffusion will end up working in the text world, but they seem to be like trying to solve a class of problem which is currently unsolved.Alessio [00:55:13]: Awesome, Jeremy. This was great, as usual. Thanks again for coming back on the pod and thank you all for listening. Yeah, that was fantastic. Get full access to Latent Space at www.latent.space/subscribe

Machine Learning Teeta
Llama 3, context window and LSTMs

Machine Learning Teeta

Play Episode Listen Later Apr 24, 2024 35:46


We discuss three main things. First the release of llama 3. Model by meta. Second, we answer the question as to why LLMs with longer context windows are not always better. Then finally, we explain when LSTMs should be used over Transformers. Enjoy!

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Latent Space Chats: NLW (Four Wars, GPT5), Josh Albrecht/Ali Rohde (TNAI), Dylan Patel/Semianalysis (Groq), Milind Naphade (Nvidia GTC), Personal AI (ft. Harrison Chase — LangFriend/LangMem)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Apr 6, 2024 121:17


Our next 2 big events are AI UX and the World's Fair. Join and apply to speak/sponsor!Due to timing issues we didn't have an interview episode to share with you this week, but not to worry, we have more than enough “weekend special” content in the backlog for you to get your Latent Space fix, whether you like thinking about the big picture, or learning more about the pod behind the scenes, or talking Groq and GPUs, or AI Leadership, or Personal AI. Enjoy!AI BreakdownThe indefatigable NLW had us back on his show for an update on the Four Wars, covering Sora, Suno, and the reshaped GPT-4 Class Landscape:and a longer segment on AI Engineering trends covering the future LLM landscape (Llama 3, GPT-5, Gemini 2, Claude 4), Open Source Models (Mistral, Grok), Apple and Meta's AI strategy, new chips (Groq, MatX) and the general movement from baby AGIs to vertical Agents:Thursday Nights in AIWe're also including swyx's interview with Josh Albrecht and Ali Rohde to reintroduce swyx and Latent Space to a general audience, and engage in some spicy Q&A:Dylan Patel on GroqWe hosted a private event with Dylan Patel of SemiAnalysis (our last pod here):Not all of it could be released so we just talked about our Groq estimates:Milind Naphade - Capital OneIn relation to conversations at NeurIPS and Nvidia GTC and upcoming at World's Fair, we also enjoyed chatting with Milind Naphade about his AI Leadership work at IBM, Cisco, Nvidia, and now leading the AI Foundations org at Capital One. 
We covered:
* Milind's learnings from ~25 years in machine learning
* His first paper citation was 24 years ago
* Lessons from working with Jensen Huang for 6 years and being CTO of Metropolis
* Thoughts on relevant AI research
* GTC takeaways and what makes NVIDIA special
If you'd like to work on building solutions rather than platform (as Milind put it), his Applied AI Research team at Capital One is hiring, which falls under the Capital One Tech team.
Personal AI Meetup
It all started with a meme: Within days of each other, BEE, FRIEND, EmilyAI, Compass, Nox and LangFriend were all launching personal AI wearables and assistants. So we decided to put together the world's first Personal AI meetup featuring creators and enthusiasts of wearables. The full video is live now, with full show notes within.
Timestamps
* [00:01:13] AI Breakdown Part 1
* [00:02:20] Four Wars
* [00:13:45] Sora
* [00:15:12] Suno
* [00:16:34] The GPT-4 Class Landscape
* [00:17:03] Data War: Reddit x Google
* [00:21:53] Gemini 1.5 vs Claude 3
* [00:26:58] AI Breakdown Part 2
* [00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4
* [00:31:11] Open Source Models - Mistral, Grok
* [00:34:13] Apple MM1
* [00:37:33] Meta's $800b AI rebrand
* [00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents
* [00:47:28] Adept episode - Screen Multimodality
* [00:48:54] Top Model Research from January Recap
* [00:53:08] AI Wearables
* [00:57:26] Groq vs Nvidia month - GPU Chip War
* [01:00:31] Disagreements
* [01:02:08] Summer 2024 Predictions
* [01:04:18] Thursday Nights in AI - swyx
* [01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show
* [01:34:58] Groq
Transcript
[00:00:00] swyx: Welcome to the Latent Space Podcast Weekend Edition. This is Charlie, your AI co host. Swyx and Alessio are off for the week, making more great content. We have exciting interviews coming up with Elicit, Chroma, Instructor, and our upcoming series on NSFW, Not Safe for Work AI.
In today's episode, we're collating some of Swyx and Alessio's recent appearances, all in one place for you to find. [00:00:32] swyx: In part one, we have our first crossover pod of the year. In our listener survey, several folks asked for more thoughts from our two hosts. In 2023, Swyx and Alessio did crossover interviews with other great podcasts like the AI Breakdown, Practical AI, Cognitive Revolution, ThursdAI, and ChinaTalk, all of which you can find in the Latent Space About page. [00:00:56] swyx: NLW of the AI Breakdown asked us back to do a special on the Four Wars framework and the AI engineer scene. We love AI Breakdown as one of the best examples of daily podcasts to keep up on AI news, so we were especially excited to be back on. Watch out and take [00:01:12] NLW: care [00:01:13] AI Breakdown Part 1 [00:01:13] NLW: today on the AI breakdown. Part one of my conversation with Alessio and Swyx from Latent Space.
And, and so, you know, we're gonna talk today about just kind of a, a, a broad sense of where things are in some of the key battles in the AI space.[00:01:55] NLW: And then the, you know, one of the big things that I, that I'm really excited to have you guys on here for us to talk about where, sort of what patterns you're seeing and what people are actually trying to build, you know, where, where developers are spending their, their time and energy and, and, and any sort of, you know, trend trends there, but maybe let's start I guess by checking in on a framework that you guys actually introduced, which I've loved and I've cribbed a couple of times now, which is this sort of four wars of the, of the AI stack.[00:02:20] Four Wars[00:02:20] NLW: Because first, since I have you here, I'd love, I'd love to hear sort of like where that started gelling. And then and then maybe we can get into, I think a couple of them that are you know, particularly interesting, you know, in the, in light of[00:02:30] swyx: some recent news. Yeah, so maybe I'll take this one. So the four wars is a framework that I came up around trying to recap all of 2023.[00:02:38] swyx: I tried to write sort of monthly recap pieces. And I was trying to figure out like what makes one piece of news last longer than another or more significant than another. And I think it's basically always around battlegrounds. Wars are fought around limited resources. And I think probably the, you know, the most limited resource is talent, but the talent expresses itself in a number of areas.[00:03:01] swyx: And so I kind of focus on those, those areas at first. So the four wars that we cover are the data wars, the GPU rich, poor war, the multi modal war, And the RAG and Ops War. And I think you actually did a dedicated episode to that, so thanks for covering that. Yeah, yeah.[00:03:18] NLW: Not only did I do a dedicated episode, I actually used that.[00:03:22] NLW: I can't remember if I told you guys. 
I did give you big shoutouts. But I used it as a framework for a presentation at Intel's big AI event that they hold each year, where they have all their folks who are working on AI internally. And it totally resonated. That's amazing. Yeah, so, so, what got me thinking about it again is specifically this Inflection news that we recently had, this sort of, you know, basically, I can't imagine that anyone who's listening wouldn't have thought about it, but, you know, Inflection is a one of the big contenders, right?[00:03:53] NLW: I think probably most folks would have put them, you know, just a half step behind the anthropics and open AIs of the world in terms of labs, but it's a company that raised 1.3 billion last year, less than a year ago. Reid Hoffman's a co-founder, Mustafa Suleyman, who's a co-founder of DeepMind, you know, so it's like, this is not a a small startup, let's say, at least in terms of perception.[00:04:13] NLW: And then we get the news that basically most of the team, it appears, is heading over to Microsoft and they're bringing in a new CEO. And you know, I'm interested in, in, in kind of your take on how much that reflects, like hold aside, I guess, you know, all the other things that it might be about, how much it reflects this sort of the, the stark.[00:04:32] NLW: Brutal reality of competing in the frontier model space right now. And, you know, just the access to compute.[00:04:38] Alessio: There are a lot of things to say. So first of all, there's always somebody who's more GPU rich than you. So Inflection is GPU rich by startup standard. I think about 22,000 H100s, but obviously that pales compared to the, to Microsoft.[00:04:55] Alessio: The other thing is that this is probably good news, maybe for the startups. It's like being GPU rich, it's not enough. You know, like I think they were building something pretty interesting in, in Pi of their own model of their own kind of experience.
But at the end of the day, you're the interface that people consume as end users.[00:05:13] Alessio: It's really similar to a lot of the others. So and we'll tell, talk about GPT-4 and Claude 3 and all this stuff. GPU poor, doing something. That the GPU rich are not interested in, you know we just had our AI center of excellence at Decibel and one of the AI leads at one of the big companies was like, Oh, we just saved 10 million and we use these models to do a translation, you know, and that's it.[00:05:39] Alessio: It's not, it's not AGI, it's just translation. So I think like the Inflection part is maybe. A calling and a waking to a lot of startups then say, Hey, you know, trying to get as much capital as possible, try and get as many GPUs as possible. Good. But at the end of the day, it doesn't build a business, you know, and maybe what Inflection I don't, I don't, again, I don't know the reasons behind the Inflection choice, but if you say, I don't want to build my own company that has 1.3 billion [00:06:05] Alessio: and I want to go do it at Microsoft, it's probably not a resources problem. It's more of strategic decisions that you're making as a company. So yeah, that was kind of my. I take on it.[00:06:15] swyx: Yeah, and I guess on my end, two things actually happened yesterday. It was a little bit quieter news, but Stability AI had some pretty major departures as well.[00:06:25] swyx: And you may not be considering it, but Stability is actually also a GPU rich company in the sense that they were the first new startup in this AI wave to brag about how many GPUs that they have. And you should join them. And you know, Emad is definitely a GPU trader in some sense from his hedge fund days.[00:06:43] swyx: So Robin Rombach and like the most of the Stable Diffusion 3 people left Stability yesterday as well. So yesterday was kind of like a big news day for the GPU rich companies, both Inflection and Stability having sort of wind taken out of their sails.
I think, yes, it's a data point in the favor of Like, just because you have the GPUs doesn't mean you can, you automatically win.[00:07:03] swyx: And I think, you know, kind of I'll echo what Alessio says there. But in general also, like, I wonder if this is like the start of a major consolidation wave, just in terms of, you know, I think that there was a lot of funding last year and, you know, the business models have not been, you know, All of these things worked out very well.[00:07:19] swyx: Even inflection couldn't do it. And so I think maybe that's the start of a small consolidation wave. I don't think that's like a sign of AI winter. I keep looking for AI winter coming. I think this is kind of like a brief cold front. Yeah,[00:07:34] NLW: it's super interesting. So I think a bunch of A bunch of stuff here.[00:07:38] NLW: One is, I think, to both of your points, there, in some ways, there, there had already been this very clear demarcation between these two sides where, like, the GPU pores, to use the terminology, like, just weren't trying to compete on the same level, right? You know, the vast majority of people who have started something over the last year, year and a half, call it, were racing in a different direction.[00:07:59] NLW: They're trying to find some edge somewhere else. They're trying to build something different. If they're, if they're really trying to innovate, it's in different areas. And so it's really just this very small handful of companies that are in this like very, you know, it's like the coheres and jaspers of the world that like this sort of, you know, that are that are just sort of a little bit less resourced than, you know, than the other set that I think that this potentially even applies to, you know, everyone else that could clearly demarcate it into these two, two sides.[00:08:26] NLW: And there's only a small handful kind of sitting uncomfortably in the middle, perhaps. 
Let's, let's come back to the idea of, of the sort of AI winter or, you know, a cold front or anything like that. So this is something that I, I spent a lot of time kind of thinking about and noticing. And my perception is that The vast majority of the folks who are trying to call for sort of, you know, a trough of disillusionment or, you know, a shifting of the phase to that are people who either, A, just don't like AI for some other reason there's plenty of that, you know, people who are saying, You Look, they're doing way worse than they ever thought.[00:09:03] NLW: You know, there's a lot of sort of confirmation bias kind of thing going on. Or two, media that just needs a different narrative, right? Because they're sort of sick of, you know, telling the same story. Same thing happened last summer, when every every outlet jumped on the chat GPT at its first down month story to try to really like kind of hammer this idea that that the hype was too much.[00:09:24] NLW: Meanwhile, you have, you know, just ridiculous levels of investment from enterprises, you know, coming in. You have, you know, huge, huge volumes of, you know, individual behavior change happening. But I do think that there's nothing incoherent sort of to your point, Swyx, about that and the consolidation period.[00:09:42] NLW: Like, you know, if you look right now, for example, there are, I don't know, probably 25 or 30 credible, like, build your own chatbot. platforms that, you know, a lot of which have, you know, raised funding. There's no universe in which all of those are successful across, you know, even with a, even, even with a total addressable market of every enterprise in the world, you know, you're just inevitably going to see some amount of consolidation.[00:10:08] NLW: Same with, you know, image generators. There are, if you look at A16Z's top 50 consumer AI apps, just based on, you know, web traffic or whatever, they're still like I don't know, a half. 
Dozen or 10 or something, like, some ridiculous number of like, basically things like Midjourney or DALL-E 3. And it just seems impossible that we're gonna have that many, you know, ultimately as, as, as sort of, you know, going, going concerns.[00:10:33] NLW: So, I don't know. I, I, I think that the, there will be inevitable consolidation 'cause you know. It's, it's also what kind of like venture rounds are supposed to do. You're not, not everyone who gets a seed round is supposed to get to series A and not everyone who gets a series A is supposed to get to series B.[00:10:46] NLW: That's sort of the natural process. I think it will be tempting for a lot of people to try to infer from that something about AI not being as sort of big or as as sort of relevant as, as it was hyped up to be. But I, I kind of think that's the wrong conclusion to come to.[00:11:02] Alessio: I I would say the experimentation.[00:11:04] Alessio: Surface is a little smaller for image generation. So if you go back maybe six, nine months, most people will tell you, why would you build a coding assistant when like Copilot and GitHub are just going to win everything because they have the data and they have all the stuff. If you fast forward today, A lot of people use Cursor everybody was excited about the Devin release on Twitter.[00:11:26] Alessio: There are a lot of different ways of attacking the market that are not completion of code in the IDE. And even Cursors, like they evolved beyond single line to like chat, to do multi line edits and, and all that stuff. Image generation, I would say, yeah, as a, just as from what I've seen, like maybe the product innovation has slowed down at the UX level and people are improving the models.[00:11:50] Alessio: So the race is like, how do I make better images? It's not like, how do I make the user interact with the generation process better? And that gets tough, you know? It's hard to like really differentiate yourselves.
So yeah, that's kind of how I look at it. And when we think about multimodality, maybe the reason why people got so excited about Sora is like, oh, this is like a completely It's not a better image model.[00:12:13] Alessio: This is like a completely different thing, you know? And I think the creative mind It's always looking for something that impacts the viewer in a different way, you know, like they really want something different versus the developer mind. It's like, Oh, I, I just, I have this like very annoying thing I want better.[00:12:32] Alessio: I have this like very specific use cases that I want to go after. So it's just different. And that's why you see a lot more companies in image generation. But I agree with you that. If you fast forward there, there's not going to be 10 of them, you know, it's probably going to be one or[00:12:46] swyx: two. Yeah, I mean, to me, that's why I call it a war.[00:12:49] swyx: Like, individually, all these companies can make a story that kind of makes sense, but collectively, they cannot all be true. Therefore, they all, there is some kind of fight over limited resources here. Yeah, so[00:12:59] NLW: it's interesting. 
We wandered very naturally into sort of another one of these wars, which is the multimodality kind of idea, which is, you know, basically a question of whether it's going to be these sort of big everything models that end up winning or whether, you know, you're going to have really specific things, you know, like something, you know, DALL-E 3 inside of sort of OpenAI's larger models versus, you know, a Midjourney or something like that.[00:13:24] NLW: And at first, you know, I was kind of thinking like, For most of the last, call it six months or whatever, it feels pretty definitively both and in some ways, you know, and that you're, you're seeing just like great innovation on sort of the everything models, but you're also seeing lots and lots happen at sort of the level of kind of individual use cases.[00:13:45] Sora [00:13:45] NLW: But then Sora comes along and just like obliterates what I think anyone thought you know, where we were when it comes to video generation. So how are you guys thinking about this particular battle or war at the moment?[00:13:59] swyx: Yeah, this was definitely a both and story, and Sora tipped things one way for me, in terms of scale being all you need.[00:14:08] swyx: And the benefit, I think, of having multiple models being developed under one roof. I think a lot of people aren't aware that Sora was developed in a similar fashion to DALL-E 3. And DALL-E 3 had a very interesting paper out where they talked about how they sort of bootstrapped their synthetic data based on GPT-4 Vision and GPT-4.
I think it's beneficial if it's all in the same house, whereas the individual startups who don't, who sort of carve out a single modality and work on that, definitely won't have the state of the art stuff on helping them out on synthetic data.[00:14:52] swyx: So I do think like, The balance is tilted a little bit towards the God model companies, which is challenging for the, for the, for the the sort of dedicated modality companies. But everyone's carving out different niches. You know, like we just interviewed Suno ai, the sort of music model company, and, you know, I don't see opening AI pursuing music anytime soon.[00:15:12] Suno[00:15:12] swyx: Yeah,[00:15:13] NLW: Suno's been phenomenal to play with. Suno has done that rare thing where, which I think a number of different AI product categories have done, where people who don't consider themselves particularly interested in doing the thing that the AI enables find themselves doing a lot more of that thing, right?[00:15:29] NLW: Like, it'd be one thing if Just musicians were excited about Suno and using it but what you're seeing is tons of people who just like music all of a sudden like playing around with it and finding themselves kind of down that rabbit hole, which I think is kind of like the highest compliment that you can give one of these startups at the[00:15:45] swyx: early days of it.[00:15:46] swyx: Yeah, I, you know, I, I asked them directly, you know, in the interview about whether they consider themselves mid journey for music. And he had a more sort of nuanced response there, but I think that probably the business model is going to be very similar because he's focused on the B2C element of that. 
So yeah, I mean, you know, just to, just to tie back to the question about, you know, You know, large multi modality companies versus small dedicated modality companies.[00:16:10] swyx: Yeah, highly recommend people to read the Sora blog posts and then read through to the DALL-E 3 blog posts because they, they strongly correlated themselves with the same synthetic data bootstrapping methods as DALL-E 3. And I think once you make those connections, you're like, oh, like it, it, it is beneficial to have multiple state of the art models in house that all help each other. And these, this, that's the one thing that a dedicated modality company cannot do.[00:16:34] The GPT-4 Class Landscape [00:16:34] NLW: So I, I wanna jump, I wanna kind of build off that and, and move into the sort of like updated GPT-4 class landscape. 'cause that's obviously been another big change over the last couple months. But for the sake of completeness, is there anything that's worth touching on with with sort of the quality?[00:16:46] NLW: Quality data or sort of the RAG and Ops wars just in terms of, you know, anything that's changed, I guess, for you fundamentally in the last couple of months about where those things stand.[00:16:55] swyx: So I think we're going to talk about RAG for the Gemini and Claude discussion later. And so maybe briefly discuss the data piece.[00:17:03] Data War: Reddit x Google [00:17:03] swyx: I think maybe the only new thing was this Reddit deal with Google for like a 60 million dollar deal just ahead of their IPO, very conveniently turning Reddit into a AI data company. Also, very, very interestingly, a non exclusive deal, meaning that Reddit can resell that data to someone else. And it probably does become table stakes.[00:17:23] swyx: A lot of people don't know, but a lot of the web text dataset that originally started for GPT 1, 2, and 3 was actually scraped from Reddit, at least the sort of vote scores.
And I think that's a very valuable piece of information. So yeah, I think people are figuring out how to pay for data.[00:17:40] swyx: People are suing each other over data. This war is definitely very much heating up, and I don't see it getting any less intense. Next to GPUs, data is going to be the most expensive thing in a model stack company. And a lot of people are resorting to synthetic versions of it, which may or may not be kosher based on how far along or how commercially blessed the forms of creating that synthetic data are.[00:18:11] swyx: I don't know if Alessio, you have any other interactions with like data source companies, but that's my two cents.[00:18:17] Alessio: Yeah, I actually saw Quentin Anthony from EleutherAI at GTC this week. He's also been working on this. I saw Teknium. He's also been working on the data side. I think especially in open source, people are like, okay, if everybody is putting the gates up, so to speak, to the data, we need to make it easier for people that don't have 50 million a year to get access to good data sets.[00:18:38] Alessio: And Jensen, at his keynote, did talk about synthetic data a little bit. So I think that's something that we'll definitely hear more and more of in the enterprise, which never bodes well, because then all the people with the data are like, oh, the enterprises want to pay now? Let me put a pay-here Stripe link so that they can give me 50 million.[00:18:57] Alessio: But it worked for Reddit. I think the stock is up 40 percent today after opening. So yeah, I don't know if it's all about the Google deal, but obviously Reddit has been one of those companies where, hey, you got all this great community, but how are you going to make money? And they tried to sell the avatars.[00:19:15] Alessio: I don't know if that's a great business for them.
The data part, as an investor, you know, the data part sounds a lot more interesting than[00:19:25] swyx: consumer cosmetics. Yeah, so I think there's more questions around data. You know, I think a lot of people are talking about the interview that Mira Murati did with the Wall Street Journal, where she just basically had no good answer for where they got the data for Sora.[00:19:39] swyx: I think this is where, you know, it's in nobody's interest to be transparent about data, and it's kind of sad for the state of ML and the state of AI research, but it is what it is. We have to figure this out as a society, just like we did for music and music sharing, you know, in sort of the Napster to Spotify transition, and that might take us a decade.[00:19:59] swyx: Yeah, I[00:20:00] NLW: do. I agree. I think you're right to identify it, not just as that sort of technical problem, but as one where society has to have a debate with itself. Because I think that, if you're rational within it, there's great points on all sides, not to be the sort of, you know, person who sits in the middle constantly, but it's why I think a lot of these legal decisions are going to be really important. Because, you know, the job of judges is to listen to all this stuff and try to come to things, and then have other judges disagree.[00:20:24] NLW: And, you know, have the rest of us all debate at the same time. By the way, as a total aside, I feel like synthetic data right now is like eggs in the 80s and 90s. Like, whether they're good for you or bad for you, you know, we get one study that's like, synthetic data, you know, there's model collapse.[00:20:42] NLW: And then we have like a hint that Llama 2, you know, the most high performance version of it, which was one they didn't release, was trained on synthetic data. So maybe it's good.
It's like, I just feel like every other week I'm seeing something different about whether it's good or bad for these models.[00:20:56] swyx: Yeah. The branding of this is pretty poor. I would kind of tell people to think about it like cholesterol. There's good cholesterol, bad cholesterol. And you can have, you know, good amounts of both. But at this point, it is absolutely without a doubt that most large models from here on out will all be trained on some kind of synthetic data, and that is not a bad thing.[00:21:16] swyx: There are ways in which you can do it poorly, whether it's commercial, you know, in terms of commercial sourcing, or in terms of the model performance. But it's without a doubt that good synthetic data is going to help your model. And this is just a question of where to obtain it and what kinds of synthetic data are valuable.[00:21:36] swyx: You know, even AlphaGeometry was a really good example from earlier this year.[00:21:42] NLW: If you're using the cholesterol analogy, then my egg thing can't be that far off. Let's talk about the sort of state of the art and the GPT 4 class landscape and how that's changed.[00:21:53] Gemini 1.5 vs Claude 3[00:21:53] NLW: Cause obviously, you know, a couple of the big things that have happened since we last talked were, one, Gemini first announcing that a model was coming and then finally it arriving, and then very soon after, a sort of different model arriving from Gemini, and Claude 3.[00:22:11] NLW: So I guess, you know, I'm not sure exactly where the right place to start with this conversation is, but maybe very broadly speaking, which of these do you think have made a bigger impact?[00:22:20] Alessio: Probably the one you can use, right? So, Claude.
Well, I'm sure Gemini is going to be great once they let me in, but so far I haven't been able to.[00:22:29] Alessio: So I have this small podcaster thing that I built for our podcast, which does chapters creation, like named entity recognition, summarization, and all of that. Claude 3 is better than GPT 4. Claude 2 was unusable, so I used GPT 4 for everything. And then when Opus came out, I tried them again side by side, and I posted it on Twitter as well.[00:22:53] Alessio: Claude is better. It's very good, you know. It's much better, it seems to me, than GPT 4 at doing writing that is more, you know, I don't know, it just got good vibes. Like, the GPT 4 text, you can tell it's GPT 4, you know, it always uses certain types of words and phrases, and maybe it's just me because I've now done it for, you know, I've read like 75, 80 generations of these things next to each other.[00:23:21] Alessio: Claude is really good. I know everybody is freaking out on Twitter about it. My only experience of this "much better" has been on the podcast use case. But I know that, you know, Karan from Nous Research is a very big pro-Opus person. So I think it's great to have people that actually care about other models.[00:23:40] Alessio: You know, I think so far, to a lot of people, maybe Anthropic has been the sibling in the corner. It's like, Claude releases a new model and then OpenAI releases Sora, and, you know, there are all these different things. But yeah, the new models are good. It's interesting.[00:23:55] NLW: My perception is definitely that, just observationally, Claude 3 is certainly the first thing that I've seen where lots of people,[00:24:06] NLW: they're, no one's debating evals or anything like that.
They're talking about the specific use cases that they have, that they used to use ChatGPT for every day, day in, day out, that they've now just switched over. And that has, I think, shifted a lot of the sort of vibe and sentiment in the space too.[00:24:26] NLW: And I don't necessarily think that it's sort of a full knock. Let's put it this way: I think it's less bad for OpenAI than it is good for Anthropic. I think that because GPT 5 isn't there, people are not quite willing to get overly critical of OpenAI, except insofar as they're wondering where GPT 5 is.[00:24:46] NLW: But I do think that it makes Anthropic look way more credible as a player, you know, as opposed to where they were.[00:24:57] Alessio: Yeah. And I would say the benchmarks veil is probably getting lifted this year. I think last year, people were like, okay, this is better than this on this benchmark, blah, blah, blah, because maybe they did not have a lot of use cases that they ran frequently.[00:25:11] Alessio: So it's hard to compare yourself, so you defer to the benchmarks. I think now, as we go into 2024, a lot of people have started to use these models for everything from very sophisticated things that they run in production to some utility that they have on their own. Now they can just run them side by side.[00:25:29] Alessio: And it's like, hey, I don't care that the MMLU score of Opus is slightly lower than GPT 4's. It just works for me, you know? And I think that's the same way that traditional software has been used by people, right? You just try it for yourself and see which one works best for you.[00:25:48] Alessio: Like, nobody looks at benchmarks outside of sales white papers, you know? And I think it's great that we're going more in that direction.
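The "run them side by side" workflow Alessio describes amounts to sending the same task to several models and eyeballing the outputs. A minimal sketch of that harness; the callables standing in for model clients here are hypothetical placeholders, not any vendor's actual SDK:

```python
# Minimal sketch of a side-by-side model comparison: one task, several models,
# outputs collected under labels for manual inspection. The "clients" are
# stand-in lambdas so the sketch runs without API keys or real SDKs.

def summarize_with(model_call, transcript):
    prompt = f"Summarize this podcast transcript in three bullet points:\n\n{transcript}"
    return model_call(prompt)

def side_by_side(transcript, models):
    """models: dict of label -> callable(prompt) -> str (hypothetical clients)."""
    return {label: summarize_with(call, transcript) for label, call in models.items()}

# Stand-in "models" purely for illustration:
models = {
    "model_a": lambda p: f"[A] {len(p)} chars summarized",
    "model_b": lambda p: f"[B] {len(p)} chars summarized",
}
results = side_by_side("...episode text...", models)
for label, out in results.items():
    print(label, "->", out)
```

Swapping a real API client into the `models` dict is all it takes to turn this into the kind of informal, vibes-based eval Alessio is describing.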
We have an episode with Adept coming out this weekend. And in some of their model releases, they specifically say, we do not care about benchmarks, so we didn't put them in, you know, because we don't want to look good on them.[00:26:06] Alessio: We just want the product to work. And I think more and more people will[00:26:09] swyx: go that way. Yeah. I would say it does take the wind out of the sails for GPT 5, which I know you're curious about later on. I think any time you put out a new state of the art model, you have to break through in some way.[00:26:21] swyx: And what Claude and Gemini have done is effectively take away any advantage to saying that you have a million token context window. Now everyone's just going to be like, oh, okay, now you just match the other two guys. And so that puts an insane amount of pressure on what GPT 5 is going to be, because that's the only option it has now. All the other models are multimodal, all the other models are long context, all the other models have perfect recall. GPT 5 has to match everything and do more to not be a flop.[00:26:58] AI Breakdown Part 2[00:26:58] NLW: Hello friends, back again with part two. If you haven't heard part one of this conversation, I suggest you go check it out, but to be honest, they are kind of actually separable. In this conversation, we get into a topic that I think Alessio and Swyx are very well positioned to discuss, which is what developers care about right now, what people are trying to build around.[00:27:16] NLW: I honestly think that one of the best ways to see the future in an industry like AI is to try to dig deep on what developers and entrepreneurs are attracted to build, even if it hasn't made it to the news pages yet. So consider this your preview of six months from now, and let's dive in.
Let's bring it to the GPT 5 conversation.[00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4[00:27:33] NLW: I mean, so I think that's a great sort of assessment of just how the stakes have been raised. So I guess maybe I'll frame this less as a question, just sort of something that I've been watching right now: the only thing that makes sense to me with how fundamentally unbothered and unstressed OpenAI seems about everything is that they're sitting on something that does meet all that criteria, right? Because, I mean, even in the Lex Fridman interview that Altman recently did, you know, he's talking about other things coming out first. He's just like, listen, he's good, and he could play nonchalant, you know, if he wanted to.[00:28:13] NLW: So I don't want to read too much into it, but, you know, they've had so long to work on this. Like, unless we are really meaningfully running up against some constraint, it just feels like, you know, there's going to be some massive increase. But I don't know, what do you guys think?[00:28:28] swyx: Hard to speculate.[00:28:29] swyx: You know, at this point, they're pretty good at PR, and they're not going to tell you anything that they don't want to. And he can tell you one thing and change their minds the next day. So, you know, I've always said that model version numbers are just marketing exercises. Like, they have something and it's always improving, and at some point you just cut it and decide to call it GPT 5.[00:28:50] swyx: And it's more just about defining an arbitrary level at which they're ready, and it's up to them what ready means. We definitely did see some leaks on GPT 4.5, as I think a lot of people reported, and I'm not sure if you covered it. So it seems like there might be an intermediate release.
But I did feel, coming out of the Lex Fridman interview, that GPT 5 was nowhere near.[00:29:11] swyx: And you know, it was kind of a sharp contrast to Sam talking at Davos in February, saying that, you know, it was his top priority. So I find it hard to square. And honestly, there's also no point reading too much tea leaves into what any one person says about something that hasn't happened yet, or a decision that hasn't been taken yet.[00:29:31] swyx: Yeah, that's my 2 cents about it. Like, calm down, let's just build.[00:29:35] Alessio: Yeah. The February rumor was that they were gonna work on AI agents, so I don't know, maybe they're like, yeah,[00:29:41] swyx: they had two, I think two agent projects, right? One desktop agent and one sort of more general, sort of GPTs-like agent. And then Andrej left, so he was supposed to be the guy on that.[00:29:52] swyx: What did Andrej see? What did he see? I don't know. What did he see?[00:29:56] Alessio: I don't know. But again, the rumors are always floating around, you know, but I think we're not going to get to the end of the year without GPT 5, you know, that's definitely happening. I think the biggest question is, are Anthropic and Google[00:30:13] Alessio: increasing the pace? You know, is Claude 4 coming out in like 12 months, like nine months? What's the deal? Same with Gemini. They went from 1 to 1.5 in like five days or something. So when's Gemini 2 coming out, you know, is that going to be soon? I don't know.[00:30:31] Alessio: There are a lot of speculations, but the good thing is that now you can see a world in which OpenAI doesn't rule everything. You know, so that's the best news that everybody got, I would say.[00:30:43] swyx: Yeah, and Mistral Large also dropped in the last month.
And, you know, not quite GPT 4 class, but very good from a new startup.[00:30:52] swyx: So yeah, the landscape has now slowly changed, you know. In my January recap, I was complaining that nothing's changed in the landscape for a long time. But now we do exist in sort of a multipolar world where Claude and Gemini are legitimate challengers to GPT 4, and hopefully more will emerge as well, hopefully from Meta.[00:31:11] Open Source Models - Mistral, Grok[00:31:11] NLW: So let's actually talk about sort of the open source side of this for a minute. So, Mistral Large, notable because it's not available open source in the same way that other things are, although I think my perception is that the community largely recognizes that they want Mistral to keep building open source stuff, and they have to find some way to fund themselves to do that.[00:31:27] NLW: And so they kind of understand that they've got to figure out how to eat. But we've got, so, you know, there's Mistral, there's, I guess, Grok now, which is, you know, Grok 1, from October, is open[00:31:38] swyx: sourced. Yeah. Yeah, sorry, I thought you meant Groq the chip company.[00:31:41] swyx: No, no, no, yeah, you mean Twitter Grok.[00:31:43] NLW: Although Groq the chip company, I think, is even more interesting in some ways. But then there's, you know, obviously Llama 3 is the one that sort of everyone's wondering about too.
And, you know, my sense of that, the little bit that Zuckerberg was talking about Llama 3 earlier this year, suggested that, at least from an ambition standpoint, he was not thinking about how do I make sure that Meta keeps the open source throne, you know, vis-a-vis Mistral.[00:32:09] NLW: He was thinking about how he releases a thing that's every bit as good as whatever OpenAI is on at that point.[00:32:16] Alessio: Yeah. From what I heard in the hallways at GTC, Llama 3, the biggest model, will be 260 to 300 billion parameters, so that's quite large.[00:32:26] Alessio: That's not an open source model. You know, you cannot give people a 300 billion parameters model and ask them to run it. You know, it's very compute intensive. So I think it is, it[00:32:35] swyx: can be open source. It's just going to be difficult to run, but that's a separate question.[00:32:39] Alessio: It's more like, as you think about what they're doing it for, you know, it's not like empowering the person running[00:32:45] Alessio: Llama on their laptop. It's like, oh, you can actually now use this to go after OpenAI, to go after Anthropic, to go after some of these companies at like the middle complexity level, so to speak. So obviously, you know, we had Soumith Chintala on the podcast; they're doing a lot here, they're making PyTorch better.[00:33:03] Alessio: You know, they want to, that's kind of like maybe a little bit of a shot at Nvidia, in a way, trying to get some of the CUDA dominance out of it. Yeah, no, it's great. I love the Zuck destroying a lot of monopolies arc. You know, it's been very entertaining.
Let's bridge[00:33:18] NLW: into the sort of big tech side of this, because this is obviously, like, so I think actually when I did my episode, I added this as an additional war that I'm paying attention to.[00:33:29] NLW: So we've got Microsoft's moves with Inflection, which I think potentially are being read as a shift vis-a-vis the relationship with OpenAI, which the sort of Mistral Large relationship seems to reinforce as well. We have Apple potentially entering the race, finally, you know, giving up Project Titan and kind of trying to spend more effort on this.[00:33:50] NLW: Although, counterpoint, we also have them talking about it, or there being reports of a deal with Google, which, you know, is interesting to sort of see what their strategy there is. And then, you know, Meta's been largely quiet. We kind of just talked about the main piece, but, you know, there's, and then there's spoilers like Elon.[00:34:07] NLW: I mean, you know, which of those things has sort of been most interesting to you guys as you think about what's going to shake out for the rest of this[00:34:13] Apple MM1[00:34:13] swyx: year? I'll take a crack. So the reason we don't have a fifth war for the Big Tech Wars is that's one of those things where I just feel like we don't cover it differently from other media channels, I guess.[00:34:26] swyx: Sure, yeah. In our anti-interestness, we actually say, like, we try not to cover the Big Tech Game of Thrones, or as it's proxied through Twitter. You know, all the other four wars anyway, so there's just a lot of overlap. Yeah, I think absolutely, personally, the most interesting one is Apple entering the race.[00:34:41] swyx: They actually announced their first large language model that they trained themselves. It's like a 30 billion multimodal model.
People weren't that impressed, but it was the first time that Apple has kind of showcased that, yeah, we're training large models in house as well. Of course, they might be doing this deal with Google.[00:34:57] swyx: I don't know. It sounds very sort of rumor-y to me. And if it's on device, it's probably going to be a smaller model, so something like a Gemma. It's going to be smarter autocomplete. I don't know what to say. I'm still here dealing with, like, Siri, which probably hasn't been updated since God knows when it was introduced.[00:35:16] swyx: It's horrible. It makes me so angry. So one, as an Apple customer and user, I'm just hoping for better AI on Apple itself. But two, they are the gold standard when it comes to local devices, personal compute and trust. Like, you trust them with your data. And I think that's what a lot of people are looking for in AI: they love the benefits of AI, they don't love the downsides, which is that you have to send all your data to some cloud somewhere.[00:35:45] swyx: And some of this data that we're going to feed AI is just the most personal data there is. So Apple being one of the most trusted personal data companies, I think it's very important that they enter the AI race, and I hope to see more out of them.[00:35:58] Alessio: To me, the biggest question with the Google deal is, who's paying who?[00:36:03] Alessio: Because for the browsers, Google pays Apple like 18, 20 billion every year to be the default search engine. Is Google going to pay you to have Gemini, or is Apple paying Google to have Gemini? I think that's what I'm most interested to figure out, because with the browsers, it's the entry point to the thing.[00:36:21] Alessio: So it's really valuable to be the default. That's why Google pays. But I wonder if the perception in AI is going to be like, hey,
You just have to have a good local model on my phone to be worth me purchasing your device. And that might drive Apple to be the one buying the model. But then, like Shawn said, they're doing the MM1 themselves.[00:36:40] Alessio: So are they saying, we do models, but they're not as good as the Google ones? I don't know. The whole thing is really confusing, but it makes for great meme material on[00:36:51] swyx: Twitter. Yeah, I mean, I think, possibly more than OpenAI and Microsoft and Amazon, they are the most full stack company there is in computing, and so, like, they own the chips, man.[00:37:05] swyx: Like, they manufacture everything. So if there was a company that could seriously challenge the other AI players, it would be Apple. And I don't think it's as hard as self driving. So maybe they've just been investing in the wrong thing this whole time. We'll see.[00:37:21] swyx: Wall Street certainly thinks[00:37:22] NLW: so. Wall Street loved that move, man. There's a big sigh of relief. Well, let's move away from sort of the big stuff. I mean, I think to both of your points, it's going to[00:37:33] Meta's $800b AI rebrand[00:37:33] NLW: Can I, can[00:37:34] swyx: I jump in with a factoid about this Wall Street thing? I went and looked at when Meta went from being a VR company to an AI company.[00:37:44] swyx: And I think the stock, I'm trying to look up the details now, the stock has gone up 187% since Llama 1. Yeah. Which is $830 billion in market value created in the past year. Yeah. Yeah.[00:37:57] NLW: It's like, remember, if you guys haven't, yeah.
If you haven't seen the chart, it's actually remarkable.[00:38:02] NLW: If you draw a little[00:38:03] swyx: arrow on it, it's like, no, we're an AI company now, and forget the VR thing.[00:38:10] NLW: It is an interesting, no, I think, Alessio, you called it sort of like Zuck's Disruptor Arc or whatever. He really does. He is in the midst of a total, you know, I don't know if it's a redemption arc or it's just something different, where, you know, he's sort of the spoiler.[00:38:25] NLW: Like, people loved him just freestyle talking about why he thought they had a better headset than Apple. But even if they didn't agree, they just loved it. He was going direct to camera and talking about it for, you know, five minutes or whatever. So that's a fascinating shift that I don't think anyone had on their bingo card, you know, whatever, two years ago.[00:38:41] NLW: Yeah. Yeah,[00:38:42] swyx: we still[00:38:43] Alessio: didn't see him fight Elon though, so[00:38:45] swyx: that's what I'm really looking forward to. I mean, hey, don't write it off, you know, maybe these things just take a while to happen. But we need to see him fight in the Coliseum. No, I think, you know, in terms of self management, life leadership, I think there's a lot of lessons to learn from him.[00:38:59] swyx: You know, you might kind of quibble with, like, the social impact of Facebook, but just himself, in terms of personal growth and, you know, perseverance through a lot of change, and everyone throwing stuff his way, I think there's a lot to learn from Zuck, which is crazy 'cause he's my age.[00:39:18] swyx: Yeah. Right.[00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents[00:39:20] NLW: Awesome.
Well, so one of the big things that I think you guys have distinct and unique insight into, being where you are and what you work on, is, you know, what developers are getting really excited about right now. And by that I mean, on the one hand, certainly, you know, startups that are actually kind of formalized and formed into startups, but also, you know, just in terms of what people are spending their nights and weekends on, what they're coming to hackathons to do.[00:39:45] NLW: And, you know, I think it's such a fascinating indicator for where things are headed. Like, if you zoom back a year, right now was right when everyone was getting so excited about AI agent stuff, right? AutoGPT and BabyAGI. And these things were like, if you dropped anything on YouTube about those, like instantly tens of thousands of views.[00:40:07] NLW: I know because I had like a 50,000 view video, like the second day that I was doing the show on YouTube, you know, because I was talking about AutoGPT. And so anyways, you know, obviously that's sort of not totally come to fruition yet, but what are some of the trends in what you guys are seeing in terms of people's interest and what people are building?[00:40:24] Alessio: I can start maybe with the agents part, and then I know Shawn is doing a diffusion meetup tonight. There's a lot of different things. The agent wave has been the most interesting kind of dream-to-reality arc. So AutoGPT, I think, went from zero to like 125,000 GitHub stars in six weeks, and then one year later, they have 150,000 stars.[00:40:49] Alessio: So there's kind of been a big plateau. I mean, you might say there are just not that many people that can star it. You know, everybody already starred it. But the promise of, hey, I'll just give you a goal, and you do it, I think is amazing for getting people's imagination going.
You know, they're like, oh, wow, this is awesome.[00:41:08] Alessio: Everybody can try this to do anything. But then as technologists, you're like, well, that's just not possible, you know, we would have solved everything. And I think it takes a little bit to go from the promise and the hope that people show you, to then trying it yourself and going back to say, okay, this is not really working for me.[00:41:28] Alessio: And David Luan from Adept, you know, in our episode, he specifically said, we don't want to do a bottom-up product. You know, we don't want something that everybody can just use and try, because it's really hard to get it to be reliable. So we're seeing a lot of companies doing vertical agents that are narrow, for a specific domain, and they're very good at something.[00:41:49] Alessio: Mike Conover, who was at Databricks before, is also a friend of Latent Space. He's doing this new company called BrightWave doing AI agents for financial research, and that's it, you know, and they're doing very well. There are other companies doing it in security, doing it in compliance, doing it in legal.[00:42:08] Alessio: All of these things that, like, nobody just wakes up and says, oh, I cannot wait to go on AutoGPT and ask it to do a compliance review of my thing. You know, it's just not what inspires people. So I think the gap on the developer side has been that the more bottom-up hacker mentality is trying to build these very generic agents that can do a lot of open-ended tasks.[00:42:30] Alessio: And then the more business side of things is like, hey, if I want to raise my next round, I cannot just sit around and mess around with super generic stuff. I need to find a use case that really works.
And I think that's true for a lot of folks. In parallel, you have a lot of companies doing evals.[00:42:47] Alessio: There are dozens of them that just want to help you measure how good your models are doing. Again, if you build evals, you need to also have a restrained surface area to actually figure out whether or not it's good, right? Because you cannot eval anything on everything under the sun. So that's another category where, from the startup pitches that I've seen, there's a lot of interest in the enterprise.[00:43:11] Alessio: It's just really fragmented, because the production use cases are just coming now, you know. There are not a lot of long-established ones to test against. So that's kind of it on the virtual agents. And then the robotics side has probably been the thing that surprised me the most at NVIDIA GTC: the amount of robots that were there. Robots everywhere.[00:43:33] Alessio: Like, both in the keynote and then on the show floor, you would have Boston Dynamics dogs running around. There was this fox robot that had a virtual face that talked to you and moved in real time. There were industrial robots. NVIDIA did a big push on their own Omniverse thing, which is like this digital twin of whatever environment you're in, that you can use to train the robot agents.[00:43:57] Alessio: So that kind of takes people back to the reinforcement learning days, but yeah, agents, people want them, you know, people want them. I gave a talk about the rise of the full stack employee, and kind of this future where, the same way full stack engineers work across the stack, every employee is going to interact with every part of the organization through agents and AI-enabled tooling.[00:44:17] Alessio: This is happening.
It just needs to be a lot more narrow than maybe the first approach that we took, which is just put a string in AutoGPT and pray. But yeah, there's a lot of super interesting stuff going on.[00:44:27] swyx: Yeah. Well, let's unpack a lot of stuff there. I'll separate the robotics piece because I feel like that's so different from the software world.[00:44:34] swyx: But yeah, we do talk to a lot of engineers, and you know, this is our sort of bread and butter. And I do agree that vertical agents have worked out a lot better than the horizontal ones. The point I'll make here is just the reason AutoGPT and BabyAGI, you know, it's in the name, like they were promising AGI.[00:44:53] swyx: But I think people are discovering that you cannot engineer your way to AGI. It has to be done at the model level, and all these engineering, prompt engineering hacks on top of it weren't really going to get us there in a meaningful way without much further, you know, improvements in the models. I'll go so far as to say even Devin, which is, I think, the most advanced agent that we've ever seen, still requires a lot of engineering and still probably falls apart a lot in terms of, like, practical usage.[00:45:22] swyx: Or it's just way too slow and expensive for, you know, what it's promised compared to the video. So yeah, that's what happened with agents from last year. But I do see vertical agents being very popular. And sometimes, I think the word agent might even be overused.[00:45:38] swyx: Like, people don't really care whether or not you call it an AI agent, right? Like, does it replace boring menial tasks that I do, that I might hire a human to do, or that the human who is hired to do it, like, actually doesn't really want to do?
And I think there's absolutely ways in sort of a vertical context that you can actually go after very routine tasks that can be scaled out to a lot of, you know, AI assistants.[00:46:01] swyx: So yeah, I mean, I would sort of basically plus one what Alessio said there. I think it's very, very promising, and I think more people should work on it, not less. Like, there's not enough people. Like, this should be the main thrust of the AI engineer: to look for use cases and go to production with them, instead of just always working on some AGI promising thing that never arrives.[00:46:21] swyx: I,[00:46:22] NLW: I, I can only add that I've been fiercely making tutorials behind the scenes around basically everything you can imagine with AI. We've probably done about 300 tutorials over the last couple of months. And the verticalized anything, right, like, this is a solution for your particular job or role, even if it's way less interesting or kind of sexy, it's like so radically more useful to people in terms of intersecting with how, like, those are the ways that people are actually[00:46:50] NLW: adopting AI in a lot of cases, just a thing that I do over and over again. By the way, I think that's the same way that even the generalized models are getting adopted. You know, it's like, I use Midjourney for lots of stuff, but the main thing I use it for is YouTube thumbnails every day. Like, day in, day out, I will always do a YouTube thumbnail, you know, or two, with Midjourney, right?[00:47:09] NLW: And you can start to extrapolate that across a lot of things, and all of a sudden, you know, AI looks revolutionary because of a million small changes rather than one sort of big dramatic change.
And I think that the verticalization of agents is sort of a great example of how that's[00:47:26] swyx: going to play out too.[00:47:28] Adept episode - Screen Multimodality[00:47:28] swyx: So I'll have one caveat here, which is, I think that because multimodal models are now commonplace, like Claude, Gemini, OpenAI all very, very easily multimodal, Apple's easily multimodal, all this stuff, there is a switch for agents for sort of general desktop browsing that I think people …[00:48:04] swyx: version of the agent where they're not specifically taking in text or anything. They're just watching your screen just like someone else would, and piloting it by vision. And, you know, in the episode with David, which will have dropped by the time that this airs, I think that is the promise of Adept, and that is the promise of what a lot of these sort of desktop agents are, and that is the more general purpose system that could be as big as the browser, the operating system. Like, people really want to build that foundational piece of software in AI.[00:48:38] swyx: And I would see, like, the potential there for desktop agents being that you can have sort of self driving computers. You know, I don't write the horizontal piece off. I just think it'll take a while to get there.[00:48:48] NLW: What else are you guys seeing that's interesting to you? I'm looking at your notes and I see a ton of categories.[00:48:54] Top Model Research from January Recap[00:48:54] swyx: Yeah, so I'll take the next two as, like, one category, which is basically alternative architectures, right? The two main things that everyone following AI kind of knows now is, one, the diffusion architecture, and two, the, let's just say, decoder only transformer architecture that is popularized by GPT.[00:49:12] swyx: You can read, you can look on YouTube for thousands and thousands of tutorials on each of those things.
What we are talking about here is what's next, what people are researching, and what could be on the horizon that takes the place of those other two things. So first of all, we'll talk about transformer architectures and then diffusion.[00:49:25] swyx: So for transformers, the two leading candidates are effectively RWKV and the state space models, the most recent of which is Mamba, but there's others like the StripedHyena and the S4/H3 stuff coming out of Hazy Research at Stanford. And all of those are non-quadratic language models that promise to scale a lot better than the traditional transformer.[00:49:47] swyx: This might be too theoretical for most people right now, but it's gonna come out in weird ways. Where, imagine if, like, right now the talk of the town is that Claude and Gemini have a million tokens of context, and like, whoa, you can put in, like, you know, two hours of video now, okay. But, like, what if we could, like, throw in, you know, two hundred thousand hours of video?[00:50:09] swyx: Like, how does that change your usage of AI? What if you could throw in the entire genetic sequence of a human and, like, synthesize new drugs? Like, well, how does that change things? Like, we don't know, because we haven't had access to this capability being so cheap before. And that's the ultimate promise of these two models.[00:50:28] swyx: They're not there yet, but we're seeing very, very good progress. RWKV and Mamba are probably the, like, the two leading examples, both of which are open source, so you can try them today, and there's a lot of progress there. And the main thing I'll highlight for RWKV is that at the 7B level, they seem to have beat Llama 2 in all benchmarks that matter at the same size, for the same amount of training, as an open source model.[00:50:51] swyx: So that's exciting. You know, they're at 7B now. They're not at 70B.
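The "non-quadratic" claim above is just about how cost grows with sequence length: attention's score matrix costs on the order of n² · d multiply-adds per layer, while an RWKV- or state-space-style recurrence costs on the order of n · d · k for some fixed state size k. A back-of-the-envelope comparison (the constants are illustrative, not measured):

```python
# Back-of-the-envelope: why non-quadratic architectures matter at long context.
# Self-attention's QK^T score matrix alone costs ~n^2 * d multiply-adds per
# layer; a state-space / RWKV-style recurrence costs ~n * d * k for a fixed
# state size k. The d and k values here are arbitrary illustrations.

def attention_flops(n: int, d: int) -> int:
    return n * n * d  # quadratic in sequence length

def ssm_flops(n: int, d: int, k: int = 16) -> int:
    return n * d * k  # linear in sequence length

for n in (1_000, 100_000, 1_000_000):
    ratio = attention_flops(n, d=128) / ssm_flops(n, d=128)
    print(f"n={n:>9,}: attention / SSM cost ratio ~ {ratio:,.0f}x")
```

The ratio grows linearly with n, which is why the gap only becomes dramatic at the million-token contexts discussed here.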
We don't know if it will. And then the other thing is diffusion. Diffusion and transformers are kind of on a collision course. The original Stable Diffusion already used transformers in parts of its architecture.[00:51:06] swyx: It seems that transformers are eating more and more of those layers, particularly the sort of VAE layer. So the Diffusion Transformer is what Sora is built on. The guy who wrote the Diffusion Transformer paper, Bill Peebles, is the lead tech guy on Sora. So you'll just see a lot more Diffusion Transformer stuff going on.[00:51:25] swyx: But there's more sort of experimentation with diffusion. I'm holding a meetup actually here in San Francisco that's gonna be, like, the state of diffusion, which I'm pretty excited about. Stability's doing a lot of good work. And if you look at the architecture of how they're creating Stable Diffusion 3, Hourglass Diffusion, the consistency models, or SDXL Turbo.[00:51:45] swyx: All of these are, like, very, very interesting innovations on, like, the original idea of what Stable Diffusion was. So if you think that it is expensive or slow to create AI generated art with Stable Diffusion, you are not up to date with the latest models. If you think it is hard to create text in images, you are not up to date with the latest models.[00:52:02] swyx: And people still are kind of far behind. The last piece of which is the wildcard I always kind of hold out, which is text diffusion. So instead of using autoregressive transformers, can you use diffusion for text? So you can use diffusion models to diffuse and create entire chunks of text all at once, instead of token by token.[00:52:22] swyx: And that is something that Midjourney confirmed today, because it was only rumored the past few months, but they confirmed today that they were looking into it.
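For context on the diffusion side of this discussion: the forward (noising) process that these image models share is the standard DDPM update x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε. A minimal numpy sketch, with an illustrative linear noise schedule rather than any specific model's:

```python
import numpy as np

# The standard DDPM forward (noising) process:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
# The beta schedule below is a common illustrative choice, not tied to
# Stable Diffusion 3, SDXL Turbo, or any other model named above.

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative product of (1 - beta_t)

def noise(x0: np.ndarray, t: int) -> np.ndarray:
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.standard_normal((8, 8))   # a toy "image"
x_mid = noise(x0, t=250)           # partially noised
x_end = noise(x0, t=999)           # near-pure noise: alpha_bar is ~0 by t=999
```

A denoiser (U-Net in the original Stable Diffusion, a transformer in the Diffusion Transformer) is then trained to predict ε from x_t, and sampling runs this process in reverse.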
So all those things are, like, very exciting new model architectures that are maybe something that you'll see in production two to three years from now.[00:52:37] swyx: So a couple of the trends[00:52:38] NLW: that I want to just get your takes on, because they're sort of something that seems like they're coming up, are, one, sort of these wearable, you know, kind of passive AI experiences where they're absorbing a lot of what's going on around you and then kind of bringing things back.[00:52:53] NLW: And then the other one that I wanted to see if you guys had thoughts on were sort of this next generation of chip companies. Obviously there's a huge amount of emphasis on hardware and silicon and different ways of doing things, but, y

The Nonlinear Library
LW - The One and a Half Gemini by Zvi

The Nonlinear Library

Play Episode Listen Later Feb 22, 2024 13:07


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The One and a Half Gemini, published by Zvi on February 22, 2024 on LessWrong. Previously: I hit send on The Third Gemini, and within half an hour DeepMind announced Gemini 1.5. So this covers Gemini 1.5. One million tokens, and we are promised overall Gemini Advanced or GPT-4 levels of performance on Gemini Pro levels of compute. This post does not cover the issues with Gemini's image generation, and what it is and is not willing to generate. I am on top of that situation and will get to it soon. One Million Tokens Our teams continue pushing the frontiers of our latest models with safety at the core. They are making rapid progress. In fact, we're ready to introduce the next generation: Gemini 1.5. It shows dramatic improvements across a number of dimensions and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using less compute. It is truly bizarre to launch Gemini Advanced as a paid service, and then about a week later announce the new Gemini Pro 1.5 is now about as good as Gemini Advanced. Yes, actually, I do feel the acceleration, hot damn. And that's not all! This new generation also delivers a breakthrough in long-context understanding. We've been able to significantly increase the amount of information our models can process - running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet. One million is a lot of tokens. That covers every individual document I have ever asked an LLM to examine. That is enough to cover my entire set of AI columns for the entire year, in case I ever need to look something up, presumably Google's NotebookLM is The Way to do that. A potential future 10 million would be even more. 
Soon Gemini will be able to watch a one hour video or read 700k words, whereas right now, if I use the web interface of Gemini Advanced, all I can upload is a photo. The standard will be to give people 128k tokens to start, then you can pay for more than that. A million tokens is not cheap inference, even for Google. Oriol Vinyals (VP of R&D DeepMind): Gemini 1.5 has arrived. Pro 1.5 with 1M tokens available as an experimental feature via AI Studio and Vertex AI in private preview. Then there's this: In our research, we tested Gemini 1.5 on up to 2M tokens for audio, 2.8M tokens for video, and 10M tokens for text. From Shannon's 1950s bi-gram models (2 tokens), and after being mesmerized by LSTMs many years ago able to model 200 tokens, it feels almost impossible that I would be talking about hundreds of thousands of tokens in context length, let alone millions. Jeff Dean (Chief Scientist, Google DeepMind): Multineedle in haystack test: We also created a generalized version of the needle in a haystack test, where the model must retrieve 100 different needles hidden in the context window. For this, we see that Gemini 1.5 Pro's performance is above that of GPT-4 Turbo at small context lengths and remains relatively steady across the entire 1M context window, while the GPT-4 Turbo model drops off more quickly (and cannot go past 128k tokens). Guido Appenzeller (responding to similar post): Is this really done with a monolithic model? For a 10M token window, input state would be many gigabytes. Seems crazy expensive to run on today's hardware. Sholto Douglas (DeepMind): It would honestly have been difficult to do at decent latency without TPUs (and their interconnect). They're an underappreciated but critical piece of this story. Here are their head-to-head results with themselves: Here is the technical report. There is no need to read it, all of this is straightforward.
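The multi-needle test Jeff Dean describes can be mocked up in a few lines: hide N facts in a long filler context, then score a model's answers against them. The model call itself is omitted here; a hypothetical "perfect reader" stands in to show the scoring:

```python
import random

# Toy version of the "multi-needle in a haystack" retrieval eval described
# above: hide N key/value facts at random positions in a long filler context,
# then score how many the model retrieves. The real eval queries a
# long-context LLM; here a perfect reader stands in for the model call.

def build_haystack(needles: dict[str, str], filler_sentences: int) -> str:
    sentences = ["The sky was a pleasant shade of blue that day."] * filler_sentences
    for key, value in needles.items():
        sentences.insert(random.randrange(len(sentences) + 1),
                         f"The secret value for {key} is {value}.")
    return " ".join(sentences)

def score(needles: dict[str, str], answers: dict[str, str]) -> float:
    # Fraction of needles whose value was retrieved exactly.
    hits = sum(answers.get(k) == v for k, v in needles.items())
    return hits / len(needles)

needles = {f"needle-{i}": f"token-{i}" for i in range(100)}
haystack = build_haystack(needles, filler_sentences=10_000)

# A model that reads the whole context perfectly scores 1.0:
perfect_answers = dict(needles)
print(score(needles, perfect_answers))
```

The "steady across the entire 1M window" result reported for Gemini 1.5 Pro means its score stays high even as `filler_sentences` grows to fill a million tokens.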
Their safety section says 'we followed our procedure' and offers no additional details on methodology. On safety performance, their tests did not seem to offer much insight, scores were similar to Gemini Pro 1.0. Mixture...


The Gradient Podcast
Tal Linzen: Psycholinguistics and Language Modeling

The Gradient Podcast

Play Episode Listen Later Oct 5, 2023 74:50


In episode 93 of The Gradient Podcast, Daniel Bashir speaks to Professor Tal Linzen.Professor Linzen is an Associate Professor of Linguistics and Data Science at New York University and a Research Scientist at Google. He directs the Computation and Psycholinguistics Lab, where he and his collaborators use behavioral experiments and computational methods to study how people learn and understand language. They also develop methods for evaluating, understanding, and improving computational systems for language processing.Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pubSubscribe to The Gradient Podcast:  Apple Podcasts  | Spotify | Pocket Casts | RSSFollow The Gradient on TwitterOutline:* (00:00) Intro* (02:25) Prof. Linzen's background* (05:37) Back and forth between psycholinguistics and deep learning research, LM evaluation* (08:40) How can deep learning successes/failures help us understand human language use, methodological concerns, comparing human representations to LM representations* (14:22) Behavioral capacities and degrees of freedom in representations* (16:40) How LMs are becoming less and less like humans* (19:25) Assessing LSTMs' ability to learn syntax-sensitive dependencies* (22:48) Similarities between structure-sensitive dependencies, sophistication of syntactic representations* (25:30) RNNs implicitly implement tensor-product representations—vector representations of symbolic structures* (29:45) Representations required to solve certain tasks, difficulty of natural language* (33:25) Accelerating progress towards human-like linguistic generalization* (34:30) The pre-training agnostic identically distributed evaluation paradigm* (39:50) Ways to mitigate differences in evaluation* (44:20) Surprisal does not explain syntactic disambiguation difficulty* (45:00) How to measure processing difficulty, predictability and processing difficulty* (49:20) What other factors influence processing 
difficulty?* (53:10) How to plant trees in language models* (55:45) Architectural influences on generalizing knowledge of linguistic structure* (58:20) “Cognitively relevant regimes” and speed of generalization* (1:00:45) Acquisition of syntax and sampling simpler vs. more complex sentences* (1:04:03) Curriculum learning for progressively more complicated syntax* (1:05:35) Hypothesizing tree-structured representations* (1:08:00) Reflecting on a prediction from the past* (1:10:15) Goals and “the correct direction” in AI research* (1:14:04) OutroLinks:* Prof. Linzen's Twitter and homepage* Papers* Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies* RNNS Implicitly Implement Tensor-Product Representations* How Can We Accelerate Progress Towards Human-like Linguistic Generalization?* Surprisal does not explain syntactic disambiguation difficulty: evidence from a large-scale benchmark* How to Plant Trees in LMs: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases Get full access to The Gradient at thegradientpub.substack.com/subscribe
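For readers unfamiliar with the surprisal measure discussed around the 45-minute mark: a word's surprisal in context is −log₂ p(word | context). The paper uses neural language model probabilities; this toy bigram model just illustrates the quantity itself:

```python
import math
from collections import Counter

# Surprisal of a word in context: -log2 p(word | context). Higher surprisal
# has classically been used to predict human reading difficulty. The
# "Surprisal does not explain..." paper uses neural LM probabilities; this
# toy bigram model over a tiny corpus only illustrates the computation.

corpus = "the dog chased the cat and the cat chased the dog".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])  # counts of each word as a context

def surprisal(prev: str, word: str) -> float:
    p = bigrams[(prev, word)] / unigrams[prev]
    return -math.log2(p)

print(surprisal("the", "dog"))  # "dog" follows "the" in 2 of 4 cases -> 1.0 bit
```

The paper's point is that this quantity, however it is estimated, does not fully account for the difficulty humans show on garden-path-style syntactic disambiguation.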

Connecting Citizens to Science
Let's Play! The Intersection between Art and Science

Connecting Citizens to Science

Play Episode Listen Later May 26, 2023 19:24 Transcription Available


Have you heard the term SciArt before? In this episode, we explore what it is and the benefits of combining art and science as a research and communication tool.Our Co-host for this episode is Elli Wright, Public Engagement Manager at the Liverpool School of Tropical Medicine. Elli told us '...effective science communication can really connect people and communities with research. Science communicators and public engagement professionals want to reduce the elitism built into society with regards to who is allowed to access scientific knowledge. Science belongs to all of us, which is why effective science communication is so important. There are many ways that science can be communicated to the diverse public audience, including through art.'Natasha Niethamer shared with us, 'the more we engage others about public health concerns that require global efforts to fight, the more likely we are to inspire community action, driving interest in policy makers and funders. Directly inspiring even one teacher, parent, young person, or community member may indirectly inspire a large network of their own. You may inspire the next major activist of our generation!'Listen on to find out more about how a playful approach can bring new insights to your work.This episode features:Dr Elli Wright - Public Engagement Manager, Liverpool School of Tropical MedicineElli has been working in the science communication and public engagement sector at Liverpool School of Tropical Medicine for nearly 8 years. She is currently studying for an MSc in science communication at the University of the West of England, which has given her a stronger understanding of the theories behind science communication practices. Her research explores the use of autoethnography as a tool in the co-creation of the Tropical Medicine Time Machine by artists Tom Hyatt and Natasha Niethamer (also featured in this episode).
Mark Roughley - Senior Lecturer 3D Digital Art, Liverpool John Moores UniversityMark is a Senior Lecturer in 3D Digital Art at Liverpool School of Art and Design and a member of the Face Lab research group that explores faces and art-science applications. Mark trained as a medical artist, gaining his MSc in Medical Art from the University of Dundee, and specialises in visualising anatomy through 3D data acquisition, modelling and fabrication. His research focuses on the affordances of 3D digital technologies for both digital and haptic interaction with anatomical and cultural artefacts. Mark is also the programme leader for the MA Art in Science programme, which provides exciting opportunities for artists and scientists to collaborate and explore the boundaries of art and science.Tom Hyatt - PhD Student at the Liverpool School of Art & Design, Liverpool John Moores UniversityTom is a polymathic artist, musician, scientist, educator, and maker from Rossendale, Lancashire. After graduating with a Master's in physics and philosophy from Oxford University he moved to London to pursue grassroots music and a career in the arts, while teaching maths and physics. He moved back up to Liverpool after receiving a PhD scholarship to study at the Liverpool School of Art and Design. Recently he has been working with Natasha Niethamer to create the ‘Tropical Medicine Time Machine' for LSTM – a multifaceted piece of sci-art public engagement that encompasses the length and breadth of LSTM's prolific 125 years.Natasha Niethamer – SciArtist, Liverpool School of Tropical MedicineNatasha is a SciArtist commissioned to create a pop up museum for LSTM's 125th anniversary, along with a set of interactive loan boxes for use in local primary schools. Natasha has a special interest in sci-art activism and public outreach in microbiology and antimicrobial resistance. In 2020, she graduated from the MA Art in...

The Nonlinear Library
LW - Parameter Scaling Comes for RL, Maybe by 1a3orn

The Nonlinear Library

Play Episode Listen Later Jan 24, 2023 22:57


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Parameter Scaling Comes for RL, Maybe, published by 1a3orn on January 24, 2023 on LessWrong. TLDR Unlike language models or image classifiers, past reinforcement learning models did not reliably get better as they got bigger. Two DeepMind RL papers published in January 2023 nevertheless show that with the right techniques, scaling up RL model parameters can increase both total reward and sample-efficiency of RL agents -- and by a lot. Return-to-scale has been key for rendering language models powerful and economically valuable; it might also be key for RL, although many important questions remain unanswered. Intro Reinforcement learning models often have very few parameters compared to language and image models. The Vision Transformer has 2 billion parameters. GPT-3 has 175 billion. The slimmer Chinchilla, trained in accord with scaling laws emphasizing bigger datasets, has 70 billion. By contrast, until a month ago, the largest mostly-RL models I knew of were the agents for Starcraft and Dota2, AlphaStar and OpenAI5, which had 139 million and 158 million parameters. And most RL models are far smaller, coming in well under 50 million parameters. The reason RL hasn't scaled up the size of its models is simple -- doing so generally hasn't made them better. Increasing model size in RL can even hurt performance. MuZero Reanalyze gets worse on some tasks as you scale network size. So does a vanilla SAC agent. There has been good evidence for scaling model size in somewhat... non-central examples of RL. For instance, offline RL agents trained from expert examples, such as DeepMind's 1.2-billion parameter Gato or Multi-Game Decision Transformers, clearly get better with scale. Similarly, RL from human feedback on language models generally shows that larger LMs are better.
Hybrid systems such as PaLM SayCan benefit from larger language models. But all these cases sidestep problems central to RL -- they have no need to balance exploration and exploitation in seeking reward. In the typical RL setting -- there has generally been little scaling and little evidence for the efficacy of scaling. (Although there has not been no evidence.) None of the above means that the compute spent on RL models is small or that compute scaling does nothing for them. AlphaStar used only a little less compute than GPT-3, and AlphaGo Zero used more, because both of them trained on an enormous number of games. Additional compute predictably improves performance of RL agents. But, rather than getting a bigger brain, almost all RL algorithms spend this compute by (1) training on an enormous number of games or (2) (if concerned with sample-efficiency) by revisiting the games that they've played an enormous number of times. So for a while RL has lacked: (1) The ability to scale up model size to reliably improve performance. (2) (Even supposing the above were around) Any theory like the language-model scaling laws which would let you figure out how to allocate compute between model size / longer training. My intuition is that the lack of (1), and to a lesser degree the lack of (2), is evidence that no one has stumbled on the "right way" to do RL or RL-like problems. It's like language modeling when it only had LSTMs and no Transformers, before the frighteningly straight lines in log-log charts appeared. In the last month, though, two RL papers came out with interesting scaling charts, each showing strong gains to parameter scaling. Both were (somewhat unsurprisingly) from DeepMind. This is the kind of thing that leads me to think "Huh, this might be an important link in the chain that brings about AGI." The first paper is "Mastering Diverse Domains Through World Models", which names its agent DreamerV3.
The second is "Human-Timescale Adaptation in an Open-Ended Task Space", which names its agent Adaptive...
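The "frighteningly straight lines in log-log charts" the post alludes to are power-law fits of the form L(N) ≈ a · N^(−b); fitting one is just linear regression in log space. A sketch on synthetic points (the coefficients here are made up for illustration, not taken from any paper):

```python
import numpy as np

# Language-model scaling laws fit loss as a power law in parameter count N:
#   L(N) = a * N^(-b)
# which is a straight line in log-log space: log L = log a - b * log N.
# The data points below are synthetic, generated from a fake "true" law,
# purely to show that the fit recovers the exponent.

params = np.array([1e6, 1e7, 1e8, 1e9])   # model sizes N
loss = 5.0 * params ** -0.076             # synthetic losses on an exact power law

slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
b, a = -slope, np.exp(intercept)
print(f"fitted L(N) = {a:.2f} * N^(-{b:.3f})")
```

The post's complaint is that RL has had no analogue of this curve: until these two DeepMind papers, plotting RL reward against parameter count gave no reliably straight line to extrapolate.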

The Nonlinear Library: LessWrong
LW - Parameter Scaling Comes for RL, Maybe by 1a3orn

The Nonlinear Library: LessWrong

Play Episode Listen Later Jan 24, 2023 22:57


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Parameter Scaling Comes for RL, Maybe, published by 1a3orn on January 24, 2023 on LessWrong. TLDR Unlike language models or image classifiers, past reinforcement learning models did not reliably get better as they got bigger. Two DeepMind RL papers published in January 2023 nevertheless show that with the right techniques, scaling up RL model parameters can increase both total reward and sample-efficiency of RL agents -- and by a lot. Return-to-scale has been key for rendering language models powerful and economically valuable; it might also be key for RL, although many important questions remain unanswered. Intro Reinforcement learning models often have very few parameters compared to language and image models. The Vision Transformer has 2 billion parameters. GPT-3 has 175 billion. The slimmer Chinchilla, trained in accord with scaling laws emphasizing bigger datasets, has 70 billion. By contrast, until a month ago, the largest mostly-RL models I knew of were the agents for Starcraft and Dota2, AlphaStar and OpenAI5, which had 139 million and 158 million parameters. And most RL models are far smaller, coming in well under 50 million parameters. The reason RL hasn't scaled up the size of its models is simple -- doing so generally hasn't made them better. Increasing model size in RL can even hurt performance. MuZero Reanalyze gets worse on some tasks as you scale network size. So does a vanilla SAC agent. There has been good evidence for scaling model size in somewhat... non-central examples of RL. For instance, offline RL agents trained from expert examples, such as DeepMind's 1.2-billion parameter Gato or Multi-Game Decision Transformers, clearly get better with scale. Similarly, RL from human feedback on language models generally shows that larger LM's are better. 
Hybrid systems such as PaLM SayCan benefit from larger language models. But all these cases sidestep problems central to RL -- they have no need to balance exploration and exploitation in seeking reward. In the typical RL setting, there has generally been little scaling and little evidence for the efficacy of scaling. (Although there has not been no evidence.) None of the above means that the compute spent on RL models is small or that compute scaling does nothing for them. AlphaStar used only a little less compute than GPT-3, and AlphaGo Zero used more, because both of them trained on an enormous number of games. Additional compute predictably improves performance of RL agents. But, rather than getting a bigger brain, almost all RL algorithms spend this compute by (1) training on an enormous number of games or (2) (if concerned with sample-efficiency) by revisiting the games that they've played an enormous number of times. So for a while RL has lacked:
(1) The ability to scale up model size to reliably improve performance.
(2) (Even supposing the above were around) Any theory like the language-model scaling laws which would let you figure out how to allocate compute between model size / longer training.
My intuition is that the lack of (1), and to a lesser degree the lack of (2), is evidence that no one has stumbled on the "right way" to do RL or RL-like problems. It's like language modeling when it only had LSTMs and no Transformers, before the frighteningly straight lines in log-log charts appeared. In the last month, though, two RL papers came out with interesting scaling charts, each showing strong gains to parameter scaling. Both were (somewhat unsurprisingly) from DeepMind. This is the kind of thing that leads me to think "Huh, this might be an important link in the chain that brings about AGI." The first paper is "Mastering Diverse Domains Through World Models", which names its agent DreamerV3. 
The second is "Human-Timescale Adaptation in an Open-Ended Task Space", which names its agent Adaptive...
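The "frighteningly straight lines in log-log charts" mentioned in the post are power laws, which can be recovered with an ordinary least-squares fit in log space. A minimal sketch with made-up numbers (not data from either DeepMind paper):

```python
import numpy as np

# Synthetic illustration of a scaling law: loss falls as a power law in
# parameter count, L(N) = a * N^(-b). All numbers here are invented.
params = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
loss = 50.0 * params ** -0.1

# A power law is a straight line in log-log space, so fit log(loss)
# against log(params) with ordinary least squares.
slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)

print(round(-slope, 3))             # recovered exponent b
print(round(float(np.exp(intercept)), 3))  # recovered coefficient a
```

On real (noisy) benchmark data the points scatter around the line, but the same fit gives the exponent that scaling-law papers report.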

The Nonlinear Library
LW - The next decades might be wild by Marius Hobbhahn

The Nonlinear Library

Play Episode Listen Later Dec 15, 2022 64:11


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The next decades might be wild, published by Marius Hobbhahn on December 15, 2022 on LessWrong. I'd like to thank Simon Grimm and Tamay Besiroglu for feedback and discussions. This post is inspired by What 2026 looks like and an AI vignette workshop guided by Tamay Besiroglu. I think of this post as “what would I expect the world to look like if these timelines (median compute for transformative AI ~2036) were true” or “what short-to-medium timelines feel like” since I find it hard to translate a statement like “median TAI year is 20XX” into a coherent imaginable world. I expect some readers to think that the post sounds wild and crazy but that doesn't mean its content couldn't be true. If you had told someone in 1990 or 2000 that there would be more smartphones and computers than humans in 2020, that probably would have sounded wild to them. The same could be true for AIs, i.e. that in 2050 there are more human-level AIs than humans. The fact that this sounds as ridiculous as ubiquitous smartphones sounded to the 1990/2000 person, might just mean that we are bad at predicting exponential growth and disruptive technology. Update: titotal points out in the comments that the correct timeframe for computers is probably 1980 to 2020. So the correct time span is probably 40 years instead of 30. For mobile phones, it's probably 1993 to 2020 if you can trust this statistic. I'm obviously not confident (see confidence and takeaways section) in this particular prediction but many of the things I describe seem like relatively direct consequences of more and more powerful and ubiquitous AI mixed with basic social dynamics and incentives. 
Taking stock of the past
Some claims about the past:
Tech can be very disruptive: The steam engine, phones, computers, the internet, smartphones, etc.; all of these have changed the world in a relatively quick fashion (e.g. often just a few decades between introduction and widespread use). Furthermore, newer technologies disrupt faster (e.g. smartphones were adopted faster than cars) due to faster supply-chain integration, faster R&D, larger investments and much more. People who were born 10 years before me grew up without widespread smartphones and the internet, I grew up with the benefits of both, and people who are 10 years younger than me play high-resolution video games on their smartphones. Due to technology, it is the norm, not the exception, that people who are born 10 years apart can have very different childhoods.
AI is getting useful: There are more and more tasks in which AI is highly useful in the real world. By now certain narrow tasks can be automated to a large extent, e.g. protein folding prediction, traffic management, image recognition, sales predictions and much more. Some of these, e.g. sales predictions, sometimes rely on technology developed before 2000, but many of the more complex and less structured tasks such as traffic predictions are built on top of Deep Learning architectures only developed in the 2010s. Thus we are either one or multiple decades deep in the AI disruption period. Based on this, I consider another AI winter unlikely. You can make money with AI right now and ML systems can automate a lot of tasks today even if we didn't find any additional breakthroughs.
Transformers work astonishingly well: Originally transformers were used mostly on language modeling tasks but by now they have been successfully applied to many computer vision tasks, reinforcement learning, robotics, time series and much more. The basic recipe of “transformer+data+compute+a bit of engineering” seems to work astonishingly well. This was not true for previous architectures such as CNNs or LSTMs.
People use AI to build AI: The first large language models (LLMs) that assist human coders seem to genuinely improve their performance, ther...

London Futurists
AI Transformers in context, with Aleksa Gordić

London Futurists

Play Episode Listen Later Sep 22, 2022 28:03


Welcome to episode 5 of the London Futurist podcast, with your co-hosts David Wood and Calum Chace. We're attempting something rather ambitious in episodes 5 and 6: we try to explain how today's cutting-edge artificial intelligence systems work, using language familiar to lay people, rather than people with maths or computer science degrees. Understanding how Transformers and Generative Adversarial Networks (GANs) work means getting to grips with concepts like matrix transformations, vectors, and landscapes with 500 dimensions. This is challenging stuff, but do persevere. These AI systems are already having a profound impact, and that impact will only grow. Even at the level of pure self-interest, it is often said that in the short term, AIs won't take all the jobs, but people who understand AI will take the best jobs. We are extremely fortunate to have as our guide for these episodes a brilliant AI researcher at DeepMind, Aleksa Gordić. Note that Aleksa is speaking in a personal capacity and is not representing DeepMind. Aleksa's YouTube channel is https://www.youtube.com/c/TheAIEpiphany
00.03 An ambitious couple of episodes
01.22 Introducing Aleksa, a double rising star
02.15 Keeping it simple
02.50 Aleksa's current research, and previous work on Microsoft's HoloLens
03.40 Self-taught in AI. Not representing DeepMind
04.20 The narrative of the Big Bang in 2012, when Machine Learning started to work in AI
05.15 What machine learning is
05.45 AlexNet. Bigger data sets and more powerful computers
06.40 Deep learning a subset of machine learning, and a re-branding of artificial neural networks
07.27 2017 and the arrival of Transformers
07.40 Attention is All You Need
08.16 Before this there were LSTMs, Long Short-Term Memories
08.40 Why Transformers beat LSTMs
09.58 Tokenisation. Splitting text into smaller units and mapping them onto higher dimension networks
10.30 3D space is defined by three numbers
10.55 Humans cannot envisage multi-dimensional spaces with hundreds of dimensions, but it's OK to imagine them as 3D spaces
11.55 Some dimensions of the word "princess"
12.30 Black boxes
13.05 People are trying to understand how machines handle the dimensions
13.50 "Man is to king as woman is to queen." Using mathematical operators on this kind of relationship
14.35 Not everything is explainable
14.45 Machines discover the relationships themselves
15.15 Supervised and self-supervised learning. Rewarding or penalising the machine for predicting labels
16.25 Vectors are best viewed as arrows in 3D space, although that is over-simplifying
17.20 For instance the relationship between "queen" and "woman" is a vector
17.50 Self-supervised systems do their own labelling
18.30 The labels and relationships have probability distributions
19.20 For instance, a princess is far more likely to wear a slipper than a dog
19.35 Large numbers of parameters
19.40 BERT, the original Transformer, had a hundred million or so parameters
20.04 Now it's in the hundreds of billions, or even trillions
20.24 A parameter is analogous to a synapse in the human brain
21.19 Synapses can have different weights
22.10 The more parameters, the lower the loss
22.35 Not just text, but images too, because images can also be represented as tokens
23.00 In late 2020 Google released the first vision Transformer
23.29 Dall-E and Midjourney are diffusion models, which have replaced GANs
24.15 What are GANs, or Generative Adversarial Networks?
24.45 Two types of model: Generators and Discriminators. The first tries to fool the second
26.20 Simple text can produce photorealistic images
27.10 Aleksa's YouTube videos are available at "The AI Epiphany"
27.40 Close
Music: Spike Protein, by Koi Discovery, available under CC0 1.0 Public Domain Declaration
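The "man is to king as woman is to queen" relationship discussed in the episode can be sketched with toy vectors. Real embeddings are learned and have hundreds of dimensions; the hand-set 3-d vectors below are purely illustrative:

```python
import numpy as np

# Toy 3-d word vectors; the dimensions might loosely encode
# (royalty, gender, age). Values are invented for illustration.
vecs = {
    "king":  np.array([0.9, 0.9, 0.7]),
    "queen": np.array([0.9, 0.1, 0.7]),
    "man":   np.array([0.1, 0.9, 0.6]),
    "woman": np.array([0.1, 0.1, 0.6]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 when two vectors point the same way.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "man is to king as woman is to ?": apply the operator king - man + woman
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w != "king"), key=lambda w: cosine(target, vecs[w]))
print(best)  # expect "queen" with these toy vectors
```

The same arithmetic, run on real learned embeddings such as word2vec, is what produced the famous king/queen result.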

The Embodied AI Podcast
#5 Felix Hill: Grounded Language, Transformers, and DeepMind

The Embodied AI Podcast

Play Episode Listen Later Jul 6, 2022 94:48


Felix is a research scientist at DeepMind. He is interested in grounded language understanding and natural language processing (NLP). After finding out about Felix's background, we bring up compositionality and explore why natural language is NonCompositional (also the name of Felix's blog). Then, Felix tells us a bit about his work in Cambridge on abstract vs concrete concepts and gives us a quick crash course on the role of recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and transformers in language models. Next, we talk about Jeff Elman's landmark paper 'Finding Structure in Time' and how neural networks can learn to understand analogies. After that, we discuss the core of Felix's work: training language agents in 3D simulations, where we raise some questions on language learning as an embodied agent in space and time, and Allan Paivio's dual coding theory implemented in the memory of a language model. Next, we stick with the theme of memory retrieval and discuss Felix and Andrew Lampinen's work on 'mental time travel' in language models. Finally, I ask Felix for some good strategies for getting into DeepMind and the best way to learn NLP. 
Timestamps:
(00:00) - Intro
(07:57) - Compositionality in natural language
(16:42) - Abstract vs concrete concepts
(24:03) - RNNs, LSTMs, Transformers
(34:12) - Prediction, time and Jeff Elman
(48:04) - Neural networks & analogies
(56:32) - Grounded language, 3D simulations, babies
(01:05:20) - Keeping vision and language data separate
(01:13:51) - NeuroAI and mental time travel
(01:21:47) - Getting into DeepMind and learning NLP
Felix's website (good overview of his papers): https://fh295.github.io/
Abstract vs concrete concepts paper: https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.12076
Jeff Elman (1990), Finding structure in time: https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog1402_1
Analogies paper: https://openreview.net/pdf?id=SylLYsCcFm
Dual coding theory paper: https://arxiv.org/abs/2009.01719
Mental Time Travel paper: https://arxiv.org/abs/2105.14039
My Twitter: https://twitter.com/Embodied_AI
My LinkedIn: https://www.linkedin.com/in/akseli-ilmanen-842098181/

We Decentralize Tech
Ep 10 - Daniel Hoyos (Machine Learning en Blue Orange y antes en Mercado Libre) - ML para series de tiempo

We Decentralize Tech

Play Episode Listen Later Feb 2, 2022 79:18


Daniel speaks in a personal capacity and does not represent Blue Orange or Mercado Libre in any way. Daniel Hoyos works as a Machine Learning Engineer at Blue Orange and previously worked as a Senior Machine Learning Engineer at Mercado Libre. He specializes in time-series forecasting using machine learning techniques. Twitter: @dannyehb Linkedin: linkedin.com/in/daniel-hoyos-2b3a07a1
General advice: Everyone should learn to work in a team, whatever their interests. You need a specialty, but you should also handle broader, more general topics; it is a combination of both. Data Engineers are highly valued!
Advice for job interviews: Companies take into account the quality of your answers, your tone of voice and your confidence. That is how they can tell whether a candidate really knows a topic or is making it up. With a portfolio of projects on GitHub, employers will know what your knowledge and skills are; it is a much more objective way to validate your quality as a programmer and whether you are a fit for the job. Show your work. Show what you do. This expands your job prospects: the more visible you are, the more opportunities you will have.
Algorithms that can be used for time-series forecasting (each of these will be useful depending on what you need to forecast and on how you arrange your data): linear regression models for time-series prediction, tree models, random forests, perceptrons, LSTMs, GRUs, and the Temporal Fusion Transformer.
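The simplest entry on the list of forecasting algorithms, linear regression on lagged values, can be sketched in a few lines. The synthetic series and lag count below are invented for illustration; this is not code from the episode:

```python
import numpy as np

# Fit a linear model y[t] = w1*y[t-2] + w2*y[t-1] + b on a synthetic
# seasonal series, then forecast one step ahead.
rng = np.random.default_rng(0)
t = np.arange(200)
series = np.sin(2 * np.pi * t / 12) + 0.05 * rng.standard_normal(200)

lags = 2
# Row for time t holds (y[t-2], y[t-1]); target is y[t].
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
X = np.column_stack([X, np.ones(len(X))])   # intercept column
y = series[lags:]

w, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
forecast = np.array([series[-2], series[-1], 1.0]) @ w
print(round(float(forecast), 3))
```

The same lag-feature framing is how tree models, random forests, or gradient boosting are typically applied to time series as well: only the regressor changes.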

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Predictive Maintenance Using Deep Learning and Reliability Engineering with Shayan Mortazavi - #540

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Nov 29, 2021 49:01


Today we're joined by Shayan Mortazavi, a data science manager at Accenture. In our conversation with Shayan, we discuss his talk from the recent SigOpt HPC & AI Summit, titled A Novel Framework for Predictive Maintenance Using DL and Reliability Engineering. In the talk, Shayan proposes a novel deep learning-based approach for prognosis prediction of oil and gas plant equipment in an effort to prevent critical damage or failure. We explore the evolution of reliability engineering, the decision to use a residual-based approach rather than traditional anomaly detection to determine when an anomaly was happening, the challenges of using LSTMs when building these models, the amount of human labeling required to build the models, and much more! The complete show notes for this episode can be found at twimlai.com/go/540
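The residual-based idea discussed in the episode can be sketched roughly as follows. The data, the zero-prediction stand-in model, and the threshold are all made up for illustration; this is not Shayan's actual framework:

```python
import numpy as np

# A model predicts a sensor reading; when the rolling mean of the absolute
# residual (actual - predicted) drifts past a threshold, flag the equipment.
rng = np.random.default_rng(1)
actual = rng.normal(0.0, 0.1, 300)
actual[200:] += np.linspace(0.0, 2.0, 100)  # slow degradation after t=200
predicted = np.zeros(300)                   # stand-in for the model's output

residual = np.abs(actual - predicted)
window = 20
rolling = np.convolve(residual, np.ones(window) / window, mode="valid")

threshold = 0.5
# Index of the first window whose mean residual exceeds the threshold,
# shifted to the last time step covered by that window.
alarm_at = int(np.argmax(rolling > threshold)) + window - 1
print(alarm_at)
```

The appeal over plain anomaly detection is that the residual isolates behavior the model cannot explain, so slow degradation shows up as a trend rather than as isolated outliers.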

The Data Exchange with Ben Lorica
Large-scale machine learning and AI on multi-modal data

The Data Exchange with Ben Lorica

Play Episode Listen Later Oct 7, 2021 32:05


This week's guest is Bob Friday, VP and CTO at Mist Systems, a Juniper company. Bob is a serial entrepreneur and seasoned technologist, and at Mist his team uses data technologies, machine learning, and AI to “optimize user experiences and simplify operations across the wireless access, wired access, and SD-WAN domains”. Bob and his team build models from structured, semi-structured, and unstructured data. They have deployed anomaly detection models that rely on deep learning (LSTMs) and have begun exploring the use of graph neural networks for a variety of use cases. They have also built and deployed systems that use recent advances in natural language models. Their virtual assistant provides insight and guidance to IT staff via a natural language conversational interface. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Learning Long-Time Dependencies with RNNs w/ Konstantin Rusch - #484

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later May 17, 2021 36:15


Today we conclude our 2021 ICLR coverage joined by Konstantin Rusch, a PhD Student at ETH Zurich. In our conversation with Konstantin, we explore his recent papers, titled coRNN and uniCORNN respectively, which focus on a novel architecture of recurrent neural networks for learning long-time dependencies. We explore the inspiration he drew from neuroscience when tackling this problem, how the performance results compared to networks like LSTMs and others that have been proven to work on this problem and Konstantin’s future research goals. The complete show notes for this episode can be found at twimlai.com/go/484.

Tech Podcast's - Data Science, AI, Machine Learning(BEPEC)
Data Science/AI Interview Experience and Tips from BEPEC Learner

Tech Podcast's - Data Science, AI, Machine Learning(BEPEC)

Play Episode Listen Later Dec 2, 2020 16:24


To increase your chances of getting hired as a Data Scientist/AI Engineer, every learner should include the projects below.
1. Forecasting live projects using LSTMs, RNNs, Bidirectional RNNs, etc.
2. Text & audio projects using BERT, attention-based neural networks, LSTMs
3. Computer vision projects using CNNs and different types of CNNs
4. ML algorithms
So, to meet the current market demand with live projects in your resume, and to get live projects to work on and place in your resume, please contact +919121516181 or https://www.bepec.in/registration-form
Follow our mentor Mr Kanth on Instagram @meet_kanth and book a free call consultation.

PaperPlayer biorxiv bioinformatics
Learning, Visualizing and Exploring 16S rRNA Structure Using an Attention-based Deep Neural Network

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Oct 12, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.12.336271v1?rss=1 Authors: Zhao, Z., Woloszynek, S., Agbavor, F., Mell, J., Sokhansanj, B. A., Rosen, G. Abstract: Recurrent neural networks (RNNs) with memory (e.g. LSTMs) and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional networks, recurrent neural networks, and attention mechanisms to perform sample-associated attribute prediction (phenotype prediction) and extract interesting features, such as informative taxa and predictive k-mer context. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We focus on typically short DNA reads of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. Our deep learning approach enables sample-level attribute and taxonomic prediction, with the aim of aiding biological research and supporting medical diagnosis. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction and, in turn, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance comparable to conventional approaches. Most importantly, as a further result of the training process, the network architecture will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output on the intermediate layer of the network model, which can provide biological insight when visualized. Finally, we demonstrate that a model with an attention layer can automatically identify regions in sequences/reads which are particularly informative for classification tasks. 
An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention. Copy rights belong to original authors. Visit the link for more info
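The attention mechanism the abstract refers to is, in its generic scaled dot-product form, straightforward to write down. Shapes and values below are made up for illustration and are not taken from the Read2Pheno code:

```python
import numpy as np

# Generic scaled dot-product attention: each query attends over all keys,
# and the output is a weighted average of the values.
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # e.g. 4 positions in a read, 8-dim features
K = rng.standard_normal((6, 8))   # 6 key positions
V = rng.standard_normal((6, 8))

out, w = attention(Q, K, V)
print(out.shape, w.shape)   # (4, 8) (4, 6)
```

The attention weights `w` are exactly what lets a model like this highlight which regions of a sequence drove a classification decision.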

PaperPlayer biorxiv biophysics
Mind reading of the proteins: Deep-learning to forecast molecular dynamics

PaperPlayer biorxiv biophysics

Play Episode Listen Later Jul 29, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.28.225490v1?rss=1 Authors: Gupta, C., Cava, J. K., Sarkar, D., Wilson, E. A., Vant, J., Murray, S., Singharoy, A., Karmaker, S. K. Abstract: Molecular dynamics (MD) simulations have emerged to become the backbone of today's computational biophysics. Simulation tools such as NAMD, AMBER and GROMACS have accumulated more than 100,000 users. Despite this remarkable success, now also bolstered by compatibility with graphics processor units (GPUs) and exascale computers, even the most scalable simulations cannot access biologically relevant timescales: the number of numerical integration steps necessary for solving differential equations in a million-to-billion-dimensional space is computationally intractable. Recent advancements in Deep Learning have made it possible to find patterns in high dimensional data. In addition, Deep Learning has also been used for simulating physical dynamics. Here, we utilize LSTMs in order to predict future molecular dynamics from current and previous timesteps, and examine how this physics-guided learning can benefit researchers in computational biophysics. In particular, we test fully connected feed-forward neural networks and recurrent neural networks with LSTM / GRU memory cells, using the TensorFlow and PyTorch frameworks, trained on data from NAMD simulations to predict conformational transitions on two different biological systems. We find that non-equilibrium MD is easier to train and performance improves under the assumption that each atom is independent of all other atoms in the system. Our study represents a case study for high-dimensional data that switches stochastically between fast and slow regimes. Resolving these sets will allow real-world applications in the interpretation of data from Atomic Force Microscopy experiments. Copy rights belong to original authors. Visit the link for more info

PaperPlayer biorxiv neuroscience
Capturing brain dynamics: latent spatiotemporal patterns predict stimuli and individual differences

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Jun 12, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.06.11.146969v1?rss=1 Authors: Venkatesh, M., JaJa, J., Pessoa, L. Abstract: Insights from functional Magnetic Resonance Imaging (fMRI), and more recently from recordings of large numbers of neurons through calcium imaging, reveal that many cognitive, emotional, and motor functions depend on the multivariate interactions of neuronal populations. To capture and characterize spatiotemporal properties of brain events, we propose an architecture based on long short-term memory (LSTM) networks to uncover distributed spatiotemporal signatures during dynamic experimental conditions. We demonstrate the potential of the approach using naturalistic movie-watching fMRI data. We show that movie clips result in complex but distinct spatiotemporal patterns in brain data that can be classified using LSTMs (~90% for 15-way classification), demonstrating that learned representations generalized to unseen participants. LSTMs were also superior to existing methods in predicting behavior and personality traits of individuals. We propose a dimensionality reduction approach that uncovers low-dimensional trajectories and captures essential informational properties of brain dynamics. Finally, we employed saliency maps to characterize spatiotemporally-varying brain-region importance. The spatiotemporal saliency maps revealed dynamic but consistent changes in fMRI activation data. We believe our approach provides a powerful framework for visualizing, analyzing, and discovering dynamic spatially distributed brain representations during naturalistic conditions. Copy rights belong to original authors. Visit the link for more info

The Data Life Podcast
14: Building a Character-Based Text Classifier

The Data Life Podcast

Play Episode Listen Later Aug 7, 2019 23:20


Ever wonder how to automatically detect language from a script? How does Google do it? Ever wonder how Amazon knows whether you are searching for a product or a SKU on its search bar? We look into character-based text classifiers in this episode. We cover two types of models. First are bag-of-words models such as Naive Bayes, logistic regression and vanilla neural networks. Second are sequence models such as LSTMs, where we discuss how to prepare your characters for the LSTMs, including things like one-hot encoding, padding, creating character embeddings and then feeding these into LSTMs. We also cover how to set up and compile these sequence models. Thanks for listening, and if you find this content useful, please leave a review and consider supporting this podcast from the link below. --- Send in a voice message: https://anchor.fm/the-data-life-podcast/message Support this podcast: https://anchor.fm/the-data-life-podcast/support
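The character-preparation steps mentioned above (integer ids, padding, one-hot encoding) can be sketched in plain numpy before any LSTM enters the picture; the two example strings are invented for illustration:

```python
import numpy as np

# Map characters to integer ids, pad sequences to a fixed length,
# then one-hot encode: the usual input pipeline for a character LSTM.
texts = ["cat", "kitten"]
chars = sorted(set("".join(texts)))
char_to_id = {c: i + 1 for i, c in enumerate(chars)}  # 0 reserved for padding

max_len = max(len(t) for t in texts)
ids = np.zeros((len(texts), max_len), dtype=int)
for row, t in enumerate(texts):
    ids[row, :len(t)] = [char_to_id[c] for c in t]    # right-pad with 0s

vocab = len(chars) + 1                                # + the padding id
one_hot = np.eye(vocab)[ids]                          # (batch, time, vocab)
print(ids[0], one_hot.shape)
```

An embedding layer would replace the one-hot step by looking the integer ids up in a learned matrix, which is why the id encoding comes first either way.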

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
The Unreasonable Effectiveness of the Forget Gate with Jos Van Der Westhuizen - TWiML Talk #240

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Mar 18, 2019 33:32


Today we’re joined by Jos Van Der Westhuizen, PhD student in Engineering at Cambridge University. Jos’ research focuses on applying LSTMs, or Long Short-Term Memory neural networks, to biological data for various tasks. In our conversation, we discuss his paper The unreasonable effectiveness of the forget gate, in which he explores the various “gates” that make up an LSTM module and the general impact of getting rid of gates on the computational intensity of training the networks. Jos eventually determines that leaving only the forget-gate results in an unreasonably effective network, and we discuss why. Jos also gives us some great LSTM related resources, including references to Jurgen Schmidhuber, whose research group invented the LSTM, and who I spoke to back in Talk #44. Thanks to Pegasystems for sponsoring today's show! I'd like to invite you to join me at PegaWorld, the company’s annual digital transformation conference, which takes place this June in Las Vegas. To learn more about the conference or to register, visit pegaworld.com and use TWIML19 in the promo code field when you get there for $200 off. The complete show notes for this episode can be found at https://twimlai.com/talk/240.
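A single forward step of a standard LSTM cell makes the gates Jos discusses concrete. The weights below are random and the whole block is an illustrative sketch; in the forget-gate-only variant his paper studies, the input and output gates would be removed:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # W, U, b stack the four blocks: forget, input, output, candidate.
    z = W @ x + U @ h + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed into (0, 1)
    c_new = f * c + i * np.tanh(g)   # forget gate scales the old cell memory
    h_new = o * np.tanh(c_new)       # output gate controls what is exposed
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
print(h.shape, c.shape)   # (5,) (5,)
```

Dropping a gate means deleting its block from `z`, which is where the computational savings Jos measures come from.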

Lex Fridman Podcast
Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs

Lex Fridman Podcast

Play Episode Listen Later Dec 23, 2018 80:06


Juergen Schmidhuber is the co-creator of long short-term memory networks (LSTMs) which are used in billions of devices today for speech recognition, translation, and much more. Over 30 years, he has proposed a lot of interesting, out-of-the-box ideas in artificial intelligence including a formal theory of creativity. Video version is available on YouTube. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, or YouTube where you can watch the video versions of these conversations.

Develpreneur: Become a Better Developer and Entrepreneur

The AWS machine learning services are more examples of the newer offerings.  Nevertheless, these are growing fast and can help you embrace cutting-edge technology.  Machine learning is a recent technology in general, so the time you spend understanding these services may help you land that next job.

Amazon SageMaker
This service provides a method for building, training, and deploying machine learning models at any scale.  This is a great way to try out machine learning, and the time you spend here is good to use on your next resume update.  You do need to put some data on S3 to analyze, and then check out the use cases.  There is a free tier for the first two months.

Amazon Comprehend
Quick and easy text analysis.  Send your text to this service to analyze it for keywords, among many other options.  There is a free tier you can use to try it out and find ways to organize and mine your content.

Amazon Lex
This service allows you to build voice and chat bots using the technology that drives Alexa.  There are some templates, and the interface makes it easy to get started quickly.

Amazon Polly
If you want to create audio from your content, then this is the service for you.  Try out the service a few thousand words at a time for free, and you can even download the audio in mp3 format.

Amazon Rekognition
The features that Comprehend provides for text are brought to the video world by Rekognition.  This service analyzes video and can highlight or recognize people, objects, and other details you might search for in a stream.

Amazon Translate
This service provides a quick and easy way to translate text between supported languages.  Much like Google Translate, it is quick and provides an API that you can use to significantly increase your audience.

Amazon Transcribe
If you have ever wondered about transcribing audio notes (or a podcast), then this is the service for you.  It is quick and easy to customize, even for highly technical terms.  The accuracy varies based on the clarity of the audio and background noise.

AWS DeepLens
This service is best understood by working through the tutorials.  It provides a way to analyze videos for objects, faces, and activities.  An essential difference between this and the others is that DeepLens is a piece of hardware, not just a service: an HD camera with onboard analysis tools for real-time processing of video.

AWS Deep Learning AMIs
This service provides quick-start machine learning on EC2 through AMIs.  Configuring a machine learning development environment can be tedious and time-consuming; these AMI options offer a shortcut to get working sooner.

Apache MXNet on AWS
This is a machine learning framework.  Apache MXNet is a fast and scalable training and inference framework with an easy-to-use, concise API for machine learning.  MXNet includes the Gluon interface, which allows developers of all skill levels to get started with deep learning in the cloud, on edge devices, and in mobile apps.  In just a few lines of Gluon code, you can build linear regression, convolutional networks, and recurrent LSTMs for object detection, speech recognition, recommendation, and personalization.

TensorFlow on AWS
This is a machine learning framework on AWS.  I think their description works best and avoids any ignorance about it on my end: "TensorFlow™ enables developers to quickly and easily get started with deep learning in the cloud. The framework has broad support in the industry and has become a popular choice for deep learning research and application development, particularly in areas such as computer vision, natural language understanding, and speech translation.  You can get started on AWS with a fully-managed TensorFlow experience with Amazon SageMaker, a platform to build, train, and deploy machine learning models at scale.  Or, you can use the AWS Deep Learning AMIs to build custom environments and workflows with TensorFlow and other popular frameworks including Apache MXNet, PyTorch, Caffe, Caffe2, Chainer, Gluon, Keras, and Microsoft Cognitive Toolkit."

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Milestones in Neural Natural Language Processing with Sebastian Ruder - TWiML Talk #195

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Oct 29, 2018 61:40


In this episode, we’re joined by Sebastian Ruder, a PhD student studying natural language processing at the National University of Ireland and a Research Scientist at text analysis startup Aylien. In our conversation, Sebastian and I discuss recent milestones in neural NLP, including multi-task learning and pretrained language models. We also discuss the use of attention-based models, Tree RNNs and LSTMs, and memory-based networks. Finally, Sebastian walks us through his recent ULMFiT paper, short for “Universal Language Model Fine-tuning for Text Classification,” which he co-authored with Jeremy Howard of fast.ai, whom I interviewed in episode 186. For the complete show notes for this episode, visit https://twimlai.com/talk/195.

Data Crunch
Machine Learning, Big Data, and Your Family History

Data Crunch

Play Episode Listen Later Sep 26, 2018 21:10


How can artificial intelligence, machine learning, and deep learning benefit your family? These technologies are moving into every field, industry, and hobby, including what some say is the United States' second most popular hobby: family history. Today, it's so much easier to trace your roots back to find out more about your progenitors. Tyler Folkman, senior manager at Ancestry, the leading family history company, describes to us how he and his team use convolutional neural networks, LSTMs, conditional random fields, and the like to more easily piece together the puzzle of your family tree. Ginette: Today we peek into an area rich in data that has lots of interesting AI and machine learning problems. Curtis: The second most popular hobby in the United States, some claim, is family history research. And whether that’s true or not, it has seen a lot of growth recently. Personal DNA testing products have exploded in popularity over the past three years, but beyond this popular product, lots of people go a step further and start tracing their roots back to piece together the puzzle of their family tree. Today we’re going to dive into the data side of this hobby with the leading family history research company. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Ginette: This episode of Data Crunch is supported by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company. Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau developer is above $50 per hour.
If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job. Go to lightpostanalytics.com/datacrunch to learn more and get some freebies. Tyler: My name's Tyler Folkman. Curtis: Who is a senior manager of data science at Ancestry. Tyler: As I look across Ancestry and family history, we almost have, like, every kind of machine learning problem you might want, I mean, probably not every kind, but we have genetically based machine learning problems on the DNA science side. We have search optimization because people need to search our databases. We have recommendation problems because we want to hint the best resources out to people. For example, if we have a hundred things we think might be relevant to a person, what order do we show them? So we use recommendation algorithms for that. We have a lot of computer vision problems because people upload pictures, and a lot of our documents, if they're not, like, digitized yet, meaning that the text hasn't been extracted, they might just be raw photos. Even with just the pictures that are uploaded, we want to understand what's in them: is this a picture of a graveyard? Is it a family portrait? Is it an old photo? And so there's tons of computer vision stuff, and natural language processing. On the business side, we have marketing problems just like any other business, like how do you optimize marketing spend? How do you optimize customer experience, customer flow? And so it's really a cool place because you really can get exposed to almost any type of problem you might be interested in. Curtis: So back in the 80s, before you could easily go find information on the Internet, genealogists had to spend a ton of time trekking around to libraries to try to find information on their ancestors.
Ancestry saw a business opportunity and started selling floppy disks, and eventually CDs, full of genealogical resources for genealogists to easily access in their home. Tyler: And then they grew up through the Internet age and moved out ...

AWS re:Invent 2017
MCL303: Deep Learning with Apache MXNet and Gluon

AWS re:Invent 2017

Play Episode Listen Later Nov 30, 2017 59:20


Developing deep learning applications just got even simpler and faster. In this session, you will learn how to program deep learning models using Gluon, the new intuitive, dynamic programming interface available for the Apache MXNet open-source framework. We'll also explore neural network architectures such as multi-layer perceptrons, convolutional neural networks (CNNs), and LSTMs.

NLP Highlights
41 - Cross-Sentence N-ary Relation Extraction with Graph LSTMs, with Nanyun (Violet) Peng

NLP Highlights

Play Episode Listen Later Nov 10, 2017 34:37


TACL 2017 paper, by Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. Most relation extraction work focuses on binary relations, like (Seattle, located in, Washington), because extracting n-ary relations is difficult. Nanyun (Violet) and her colleagues came up with a model to extract n-ary relations, focusing on drug-mutation-gene interactions, using graph LSTMs (a construct pretty similar to graph CNNs, which was developed around the same time). Nanyun comes on the podcast to tell us about her work. https://www.semanticscholar.org/paper/Cross-Sentence-N-ary-Relation-Extraction-with-Grap-Peng-Poon/03a2f871cc841e8047ab3291806dc301c5144bec
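The document graph a graph LSTM runs over can be sketched with plain Python: token nodes, adjacent-word edges within sentences, and a cross-sentence edge linking entity mentions. The sentences, entity pair, and edge labels below are invented for illustration and are not taken from the paper.

```python
# A toy document graph of the kind a graph LSTM consumes:
# word-adjacency edges within sentences plus a cross-sentence link
# tying two entity mentions together (labels are illustrative).
tokens = ["gefitinib", "targets", "EGFR", ".",
          "The", "L858R", "mutation", "responds", "."]
edges = []

# adjacent-word edges, not crossing sentence-final periods
for i in range(len(tokens) - 1):
    if tokens[i] != ".":
        edges.append((i, i + 1, "next"))

# one cross-sentence edge connecting the two entity mentions
edges.append((2, 5, "cross-sentence"))  # EGFR -> L858R

# undirected adjacency list keyed by token index
graph = {i: [] for i in range(len(tokens))}
for u, v, label in edges:
    graph[u].append((v, label))
    graph[v].append((u, label))

print(len(edges))  # 8
```

In the paper's setting, dependency arcs and discourse links would be added as further edge types; the graph LSTM then propagates hidden states along all of these edges rather than only left to right.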

NLP Highlights
40 - On the State of the Art of Evaluation in Neural Language Models, with Gábor Melis

NLP Highlights

Play Episode Listen Later Nov 7, 2017 29:58


A recent arXiv paper by Gábor Melis, Chris Dyer, and Phil Blunsom. Gábor comes on the podcast to tell us about his work. He performs a thorough comparison between vanilla LSTMs and recurrent highway networks on the language modeling task, showing that when both methods are given equal amounts of hyperparameter tuning, LSTMs perform better, in contrast to prior work claiming that recurrent highway networks perform better. We talk about parameter tuning, training variance, language model evaluation, and other related issues. https://www.semanticscholar.org/paper/On-the-State-of-the-Art-of-Evaluation-in-Neural-La-Melis-Dyer/2397ce306e5d7f3d0492276e357fb1833536b5d8
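The equal-budget protocol discussed in the episode can be caricatured in a few lines: give each model family the same number of random hyperparameter draws and compare the best validation score each finds. Everything here (the toy "model families" and the one-dimensional search space) is invented for illustration; the paper's actual tuner and models are far more elaborate.

```python
import random

def tune(objective, space, budget, rng):
    """Random search: draw `budget` configs, return the best (lowest) loss."""
    best = float("inf")
    for _ in range(budget):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        best = min(best, objective(cfg))
    return best

# Two toy "model families" whose validation loss depends on one hyperparameter.
model_a = lambda c: (c["lr"] - 0.3) ** 2 + 0.01  # best achievable loss: 0.01
model_b = lambda c: (c["lr"] - 0.7) ** 2 + 0.05  # best achievable loss: 0.05
space = {"lr": (0.0, 1.0)}

rng = random.Random(0)
budget = 200  # identical tuning budget for both families
loss_a = tune(model_a, space, budget, rng)
loss_b = tune(model_b, space, budget, rng)
print(loss_a < loss_b)  # True: under equal tuning, family A is simply better
```

The point the paper makes is exactly this kind of control: an apparent architecture win can evaporate (or reverse) once both sides receive the same tuning budget.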

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
LSTMs, Plus a Deep Learning History Lesson with Jürgen Schmidhuber - TWiML Talk #44

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Aug 28, 2017 66:19


This week we have a very special interview to share with you! Those of you who’ve been receiving my newsletter for a while might remember that while in Switzerland last month, I had the pleasure of interviewing Jürgen Schmidhuber at his lab, IDSIA, the Dalle Molle Institute for Artificial Intelligence Research in Lugano, Switzerland, where he serves as Scientific Director. In addition to his role at IDSIA, Jürgen is also Co-Founder and Chief Scientist of NNaisense, a company that is using AI to build large-scale neural network solutions for “superhuman perception and intelligent automation.” Jürgen is an interesting, accomplished, and in some circles controversial figure in the AI community, and we covered a lot of very interesting ground in our discussion, so much so that I couldn't truly unpack it all until I had a chance to sit with it after the fact. We talked a bunch about his work on neural networks, especially LSTMs, or Long Short-Term Memory networks, which are a key innovation behind many of the advances we’ve seen in deep learning and its application over the past few years. Along the way, Jürgen walks us through a deep learning history lesson that spans 50+ years. It was like walking back in time with the three-eyed raven. I know you’re really going to enjoy this one, and by the way, this is definitely a nerd alert show! For the show notes, visit twimlai.com/talk/44

O'Reilly Bots Podcast - O'Reilly Media Podcast
Richard Socher on the future of deep learning

O'Reilly Bots Podcast - O'Reilly Media Podcast

Play Episode Listen Later Dec 1, 2016 57:46


The O’Reilly Bots Podcast: Making neural networks more accessible. In this episode of the O’Reilly Bots Podcast, Pete Skomoroch and I talk with Richard Socher, chief scientist at Salesforce. He was previously the founder and CEO of MetaMind, a deep learning startup that Salesforce acquired in 2016. Socher also teaches the “Deep Learning for Natural Language Processing” course at Stanford University. Our conversation focuses on where deep learning and NLP are headed, and interesting current and near-future applications.

Discussion points:
Accessibility, in a couple of senses: making deep learning easier for computer scientists to implement, and making the power of deep learning available through intuitive applications
AI-enabled question answering systems and dynamic co-attention networks
The issue of interpretability, and progress in creating more interpretable models
Why Socher believes that human-in-the-loop is the best solution for the current “fake news” controversy, the hottest topic in NLP now
Why Quasi-Recurrent Neural Networks (QRNNs) are an advancement over Long Short-Term Memory networks (LSTMs), the subject of a recent paper co-authored by Socher

Other links:
The Stanford Question Answering Dataset
TensorFlow and Chainer, two frameworks for working with neural networks
Summaries of recent papers by the Salesforce research team

O'Reilly Bots Podcast - O'Reilly Media Podcast
Richard Socher on the future of deep learning

O'Reilly Bots Podcast - O'Reilly Media Podcast

Play Episode Listen Later Dec 1, 2016 57:46


The O’Reilly Bots Podcast: Making neural networks more accessible.In this episode of the O’Reilly Bots Podcast, Pete Skomoroch and I talk with Richard Socher, chief scientist at Salesforce.  He was previously the founder and CEO of MetaMind, a deep learning startup that Salesforce acquired in 2016.  Socher also teaches the “Deep Learning for Natural Language Processing” course at Stanford University. Our conversation focuses on where deep learning and NLP are headed, and interesting current and near-future applications.Discussion points: Accessibility, in a couple of senses: making deep learning easier for computer scientists to implement, and making the power of deep learning available through intuitive applications AI-enabled question answering systems and dynamic co-attention networks The issue of interpretability, and progress in creating more interpretable models Why Socher believes that human-in-the-loop is the best solution for the current “fake news” controversy, the hottest topic in NLP now Why Quasi-Recurrent Neural Networks (QRNNs) are an advancement over Long Short Term Memory networks (LSTMs), the subject of a recent paper co-authored by Socher Other links: The Stanford Question Answering Dataset TensorFlow and Chainer, two frameworks for working with neural networks Summaries of recent papers by the Salesforce research team