Podcasts about loras

116PODCASTS
374EPISODES
48mAVG DURATION
1MONTHLY NEW EPISODE
Jun 20, 2026LATEST

POPULARITY

20192020202120222023202420252026

Best podcasts about loras

Duhawk Digest

96 episodes with loras

Dubuque Area Baseball Podcast

16 episodes with loras

Short Time Wrestling Podcast

11 episodes with loras

Tuesday's with Nick & Cole

28 episodes with loras

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

9 episodes with loras

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

7 episodes with loras

Close the Door: Game of Thrones, A Song of Ice and Fire Podcast

6 episodes with loras

D3 Glory Days Podcast

4 episodes with loras

WATCH DEM THRONES by Black With No Chaser

4 episodes with loras

Hoopsville

4 episodes with loras

LWLA Podcast

13 episodes with loras

Not A Podcast ASOIAF Re-Read Podcast

2 episodes with loras

Unspoiled! A Song Of Ice And Fire

3 episodes with loras

Hoop Heads

2 episodes with loras

D3football.com » D3football.com Around the Nation Podcast

2 episodes with loras

Isle Of Faces

4 episodes with loras

D3 Nation

2 episodes with loras

On The Mat

4 episodes with loras

#SoccerChat

2 episodes with loras

The Podcast About Division III Baseball

2 episodes with loras

Escuchando Peliculas

2 episodes with loras

GeoCastAway | GeoNáufragos

2 episodes with loras

Game of Microphones: A Game of Thrones Podcast

2 episodes with loras

In The Room

2 episodes with loras

The Nonlinear Library

2 episodes with loras

The top AI news from the past week, every ThursdAI

17 episodes with loras

The Run Around Iowa

11 episodes with loras

Show all podcasts related to loras

Latest podcast episodes about loras

Israel & Hezbollah Renew Uneasy Ceasefire + Report: IRGC Cells in Iraq Launched Attacks Against Kuwait, Saudi Arabia, & UAE + US Renames Combatant Command: Indo-PACOM Out; PACOM Name Restored

I - On Defense Podcast

Play Episode Listen Later Jun 20, 2026 26:18

For review:1. A B-52 on a routine test mission crashed shortly after takeoff Monday (15 June) in Southern California, killing all eight crew members on board.2. US Army aims to release proposal requests for its heavy variant of the ISV this fall, a senior service official told lawmakers, adding that the Army is attempting to field the capability as quickly as possible. The Army is seeking vehicles that can provide 60 kW of continuous high-voltage DC power, 15 kW of 28V DC power and 4.8 kW of 120V AC power. 3. After eight years as US Indo-Pacific Command, the Pentagon has announced it is restoring the original name to simply Pacific Command (PACOM).4. KNDS, the French-German maker of the Leopard tank and the Caesar truck-mounted gun, pitched a new howitzer with a barrel longer than any NATO production artillery, which the company said gives the weapon a range of up to 60 kilometers (37 miles) with standard high-explosive shells.The 155mm howitzer, named Loras for long-range artillery system, is equipped with a 58 caliber gun tube, about 12% longer than the 52 caliber barrels that equip most recent European cannons.5. President Trump on Friday unveiled the Boeing 747-8 that will serve as the new Air Force One, that the U.S. accepted as a gift from the Qatari government last year.6. US President Donald Trump's special envoy, Steve Witkoff, is traveling to Switzerland for the first round of talks with Iran on a potential nuclear deal, Axios reports, citing a US official.Trump's son-in-law Jared Kushner is already in Switzerland, it adds.7. Israel and Hezbollah agreed to a ceasefire on Friday afternoon, a US official said, after another flare-up in southern Lebanon that saw four Israeli soldiers and dozens of Lebanese casualties.8. Iran's Islamic Revolutionary Guard Corps (IRGC) has set up secretive new cells in Iraq to carry out attacks on Gulf countries that host American forces, bypassing established militia networks to avoid detection, eight Iraqi sources told Reuters.

american donald trump israel european dc army iran southern california iraq switzerland israelis saudi arabia nato attacks lebanon pentagon command caesar boeing us army gulf launched restored renew reuters kuwait cells hezbollah iraqi lebanese ceasefire leopards air force one axios indo jared kushner kw qatari uneasy irgc isv loras combatant renames french german knds pacom

UFR 2486 Segment 6 Brandon Novak (Regional College Football: Loras)

Upon Further Review

Play Episode Listen Later Jun 9, 2026 5:59

college football regional loras brandon novak

Why Video Agent models are next — Ethan He, xAI Grok Imagine

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jun 1, 2026 103:26

We're announcing AIEWF speakers this week! Take the AI Engineering Survey!Today's guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months:He comes back on Latent Space with some nuclear hot takes: that Video Models primarily get their intelligence from LLMs, not from training on video data, and that the next frontier for truly interactive, realtime, long-horizon world models is to work on LLMs (perhaps Interaction Models as well…)Put it this way: In the near term, the next Sora won't be a better video model, but a video agent.Generative Media may more closely follow the evolution of AI coding which went from focusing on one-shot output performance and cost, to multiturn reasoning and planning models for agents and systems that can plan, edit, test, debug, and submit PRs.At a certain point, coding models got so good that the only significant next step to improve performance was handling the orchestration of these models.Now as the performance of video models increases significantly across realism, consistency, & prompt adherence while becoming more cost efficient, the next evolution of video generation may also be systems that can plan, generate, edit, critique, and iterate across an entire creative task. In this episode, Ethan joins swyx and Vibhu to unpack what it actually takes to build frontier image and video systems: data, VAEs, diffusion transformers, audio-video alignment, inference speedups, and the hidden cost of storing and moving massive video datasets. From building NVIDIA's Cosmos world model to joining xAI as Grok Imagine was being built from zero to one, Ethan He has been at the center of some of the most important work in video generation, multimodal models, and real-time world models.We go deep on Grok Imagine, how a small xAI team shipped its first multimodal video model in three months, why iteration speed matters more than almost anything in model development, and why many of the biggest gains come from fixing tiny bugs in data and training pipelines. Flipbook: The future of VideomaxxingVideo agents are almost a sure bet to be the trend in the coming year. We end with a glance at what's beyond video agents:Flipbook caused a minor sensation this year when it was released, but most treat it as a fun demo. Ethan takes it very seriously — with the speed and cost of inference coming down every year, the future of custom video JIT UI is closer than you think. We talked about why videogen models may become the front end of AI, how generative UI could replace traditional HTML/CSS, why world models need to be real-time, interactive, and long-horizon, and why the future of video generation may depend more on language models and agents than on diffusion alone.We discuss:* Why fast iteration mattered more than meetings* Why small training bugs can drive huge model quality gains* Why coding models may make compute the bottleneck again* How image and video models are trained with synthetic captions* The role of VAEs and latent space in frontier video models* Why image models are the foundation for video models* The tradeoff between temporal compression and real-time interactivity* Flipbook, Neural OS, and the future of generative UI* Why future interfaces may go from user intent to pixels* The hidden cost of training video models: storage, egress, and GPU hours* How step distillation and consistency models (like OpenAI sCM) makes video inference orders of magnitude faster* Grok Imagine 0.9 and large-scale audio-video generation* Why audio-video alignment is harder than text-video alignment* Ethan's definition of world models* Reference-to-video, video extension, and long-context video generation* Why xAI's research communication undersells Grok Imagine* How xAI culture shaped the speed of development* AI watermarking, SynthID, and detecting generated media* Why prompt rewriting matters for video models* Grok Imagine Agent and the rise of video agents* Why language models may unlock better video generation* Robotics, physical AI, and embodied world models* Why Ethan left xAI and shifted focus toward LLMs* Self-managed context, memory, and the next frontier for language modelsEthan He* LinkedIn: https://www.linkedin.com/in/ethanhe42* X: https://x.com/EthanHe_42Timestamps00:00:00 Introduction00:01:25 From NVIDIA Cosmos to xAI00:03:24 Building Grok Imagine from Zero to One00:10:07 How Image and Video Models Are Trained00:18:53 Video Compression, VAEs, and Real-Time Tradeoffs00:22:10 Generative UI, Flipbook, and Neural OS00:32:10 The Cost of Training Large Video Models00:37:04 Distillation, GANs, and Fast Video Inference00:41:21 Audio-Video Generation and Grok Imagine 0.900:48:34 What Makes a World Model?00:55:51 Reference Videos, Long Context, and Video Memory01:00:11 xAI Culture, Research, and First-Principles Building01:09:45 AI Safety, Watermarking, and Prompt Rewriting01:13:10 Video Agents and AI-Assisted Creation01:27:32 Why Language Models Unlock Better Video01:31:15 Robotics, Physical AI, and Embodied World Models01:32:38 Why Ethan Left xAI01:34:16 Self-Managed Context and the Future of LLMs01:38:43 Ethan's Career Path and Closing ThoughtsTranscriptIntroduction: Ethan He, Latent Space, and the Path to xAISwyx [00:00:00]: We're here in the studio with Ethan He, most recently of xAI. Welcome.Ethan [00:00:10]: Thank you. Glad being here.Swyx [00:00:11]: We're also here with Vibhu. you were first coming to us or joining the latent space world because you were working on Kosmos at NVIDIA, and you did a paper. We loved it. you presented it as well, so thank you for doing that.Ethan [00:00:23]: I've actually, I also presented the MoEs twice at latent space.Swyx [00:00:29]: How did you actually hear about us? Did we reach out to you? Is that how it worked?Ethan [00:00:33]: No, actually, I-- the community. Like I realized, oh, there is this online community that people talk about AI and also learn from each other through papers every week through the Paperclip. It's very nice.Ethan [00:00:49]: I learned a lot.Swyx [00:00:49]: I think three years stop. We haven't stopped even on Christmas and New Years. many weeks I want to stop but it keeps going.Vibhu [00:00:58]: No, that was good. I think you had posted that you worked on a paper, and I was “Oh, very cool. We have Paperclip. Present then.”Vibhu [00:01:04]: But I might have reached out to you after.Swyx [00:01:05]: you-- because it's an amateur club, right?Swyx [00:01:08]: so it's very unusual and but we have sometimes paper authors come by and actually explain the paper. Today we just did, the poolside paper, which was apparently very good.Vibhu [00:01:18]: Came out yesterday.Vibhu [00:01:19]: pretty interesting, right? Fully open. They talk about everything, systems. So it's a good one. We'll, we'll recommend people to read it.Swyx [00:01:25]: Bring us up to speed on your transition to xAI, ‘cause I actually don't even know when you joined. just like tell the, tell the story about the sort of transition.From NVIDIA Cosmos to xAI: Scaling Video and World ModelsEthan [00:01:34]: Before xAI, I was working on Kosmos world model as in-- at NVIDIA. So Kosmos is, it's a giant video foundation models that can-- that aims to simulate the world and for-- it serves as a foundation of-- for all of the roboticists to build on top of. There, once I built the Kosmos one, I realized as this thing also has a scaling law similar to language model, we need to scale up the video models further. that's, that's why I realized I need to move to somewhere with much more compute resources. That's how ISwyx [00:02:13]: Than NVIDIA?Vibhu [00:02:14]: The GPU rich came themselves.Vibhu [00:02:19]: And timeline-wise, when was Kosmo? It was pretty early, right? It was open world model, open paper, everything.Ethan [00:02:25]: It was end of twenty-four.Vibhu [00:02:28]: End of twenty-four.Ethan [00:02:30]: Then at mid twenty-five, I moved to xAI. At that time-- I joined about the time when xAI was about to build video models and in multi-model models. There were no infra, no data, and no model, and it just-- as a few engineers, we built it in three months and released the first model, Grok Imagine zero point nine.Ethan [00:02:55]: And since then, I keep working on video models and move more from training and to post-training of the video models. For example, like a reference to videos, kind of like the cameo feature and, video extensions. And, before I left, I worked on a world model, leading a small team to focus on the real-time long horizon video generation.Building Grok Imagine From Scratch in Three MonthsSwyx [00:03:24]: Can you give like a rough roadmap of okay, you're on a brand-new team. Grok previously was only text, or they partnered with BFL for their image gen stuff. What do you-- what are the building blocks, right? You have compute, data you can procure somewhere. Like just what are like the sequence of things that people should think about when you're setting up a new team?Vibhu [00:03:43]: actually even deeper, not just data you can procure. You guys had to go through getting the data too, right? So you shipped it pretty fast, but yeahSwyx [00:03:51]: three months is likeVibhu [00:03:52]: From everythingSwyx [00:03:52]: actually like very surprisingly fast.Ethan [00:03:55]: One thing I say like thanks to my experience at NVIDIA, ‘cause first time when we were building Kosmos together, we built it, for about a year. So this is like the second time I do it. Roughly have an idea, what to do. I say the most important thing is the talent. Everyone were very strong and clever, very close with each other towards a common goal. So that speed up things a lot. So you reduce the communication bandwidth among people, and everyone can work towards the same goal. It's, it's like every day there's not that much meetings on the calendar, like maybe like a, like a sync a day, and after that it's, it's just all building. It was pretty fun at that time.Ethan [00:04:47]: And another thing is that xAI has very strong foundations of like data inference, model inference, and the supporting there can help the model develop a lot. When I look at, training models, I don't so actually the top important thing is like how many, how many iterations can you do, per day? and the more iteration can you do, you can, you can train the model much faster. So if you have very strong infra and you have a lot of compute, you can, you can train these models in very short period of time. That can give you a much larger buffer to, for errors, and it also gives you the opportunity to spot more bugs.Iteration Speed, Compute, and Debugging Model PipelinesSwyx [00:05:46]: What is an iteration? Is it like a few hundred steps or what are youEthan [00:05:50]: Let's say just the train-training the model, like from acquire new data and maybe design new algorithms and train a new model, maybe at smaller scale orSwyx [00:06:01]: So cycle time for like any hyperparam that you're searching.Ethan [00:06:04]: Cycle time and tune to like eval this model. Is this model better than my previous iteration?Ethan [00:06:11]: SoSwyx [00:06:11]: So it's like before you, someone had already set this up that you can iterate very quickly.Ethan [00:06:15]: I think the foundation there is extremely good forDeveloping and research models.Ethan [00:06:23]: And often I find is it-- this is kind of boring, but like a lot of the improvements does not come from new algorithms. It comes from finding small bugs here and there in the data pipeline, in the, in the model training pipeline. Those give, those give the biggest boost to the model quality.Vibhu [00:06:46]: It's interesting, right? So you say it's like small team, less communication bandwidth, but also a lot of quality is like find little bugs. It seems counterintuitive, right? You have a lot of people, you can iron out more of those, but it's interesting to see the other side, right?Swyx [00:07:00]: I also wonder, have you-- do you try using LLMs to look for bugs? I don't know.Ethan [00:07:05]: I remember at that time it was mid two thousand and twenty-five, so it's the coding model wasn't quite there yet. I remem- I remember like December two thousand and twenty-five, it was extremely good. Yeah, I've been, I've been using it at that time. It's, it's helpful. sometimes it produce codes that are kind of difficult to maintain, even though like the first time it built something extremely fast. But it gave the, like a spaghetti code, thousands of lines that I couldn't maintain, and the LLM itself couldn't figure out what's, what's wrong and how to improve on top of it. But now I find it much better. Yeah, I want to bring up another point here is now coding models are much more efficient and can help us implement stuff much faster. Compute might become a bottleneck again because previously, like if you want to train a new model, say you want to generate new synthetic data and then or write a new algorithm, it might take a few weeks. And during that period of time, you don't-- you might not have experiments to run. But now you can build that thing within a few hours, then you can immediately train a model.Ethan [00:08:24]: Now you have to have enough compute to try all of the ideas. So compute might be the bottleneck of iterating speed again.Swyx [00:08:36]: yeah, I actually, honestly, I think it's like kind of a stressful job because you're “Well, I should be trying everything, and if I'm not, then I'm not doing my job well.”Vibhu [00:08:48]: there's also the stress of you're eating thousands of GPUs per hour, which is very expensive and, compute can go to other researchers.Swyx [00:08:56]: You got the daddy Elon toVibhu [00:08:57]: You got daddy Elon.Ethan [00:08:59]: It wasVibhu [00:09:00]: But there's still finite amount of compute, like you want to use it, you want to use it well, you want more of it.Ethan [00:09:06]: That was quite stressful indeed. Yeah, I think one thing is the-- with coding models now, like a lot of these jobs can be automated, which is much better. A second, it's a, it's a marathon, so you got to maintain good health and, a regular schedule.Vibhu [00:09:28]: It's, it's hard to hear that when you shift from zero to nothing in two months.Swyx [00:09:32]: and, I think obviously the culture at xAI is very famously, people work very hard. one thing I did want to dive into, in our-- in the notes that you, that you sent ahead of time, you had specific comments about the cost of Video Gen training. presumably this is on the Colossus-1, right? the two hundred megawatt cluster. Any whatever you want to just share on that.Vibhu [00:09:54]: I think there's, there's three things we're talking about, right? So there's Video Gen, there's also the Image Gen model that you put out. Do you want to like complete the, okay, so zero to one, you have a few months. Just what are the stages of create Image Gen model?Swyx [00:10:06]: Oh, yeah, maybe I got distracted.How Image and Video Models Are Trained: Synthetic Captions, Tokenizers, and VAEsVibhu [00:10:07]: Sorry. and then, from there's Video Gen, there's Audio Gen. Would love to get into those next. But what is that first few months like? So small team, a lot of bugs, iterations, but what does it look like? Do we take something off the shelf? Do we just get data compute? What's, what's the few months like? How do you go to state-art Image Gen model? How do you just start?Ethan [00:10:28]: I cannot comment specifically how xAI did, but it's, it's a quite standard process. I can draw some, examples from Cosmos. So mainly it's building a video model, you actually need to build a image model first. And building these two models, the data you need is a hundred percent synthetic pair of language and image or language to video. Because on the, on the internet, actually, the videos don't naturally associate with text. So you can say, oh, like on YouTube, you have the title and you have the description and the commentsSwyx [00:11:11]: TitleEthan [00:11:11]: of a video, but usually they're not relevant to the video itself. And say maybe like the video is a natural scene of mountains or something, and the title is, I'm so happy today.Ethan [00:11:26]: So they have they have no correlation at all. So the first step is to, you have to generate synthetic pair of language with the videos. So you gather videos from the internet, and you use a VLM to caption the videos. So that part, here's a question, like how do you, how do you gather VLM to begin with? So if there's noSwyx [00:11:55]: You, so you fuse the model, right? LikeEthan [00:11:57]: Say if there's no like VLM exists, like how do you generate the text to the beginning, right? It's, it's impossible.Swyx [00:12:04]: I see.Ethan [00:12:05]: In the beginning, it's like you ask human to describe the video as detailed as possible.For example, you ask them to describe everything, like all objects, all characters, and all interaction and dialogues in the, in the videos. So that's in the protocol of Cosmos labeling. We require the objective we give to the labelers was that you have to describe the video as detailed as possible, such that a blind person hears a blob of text can reconstruct what the video is like from their head.Swyx [00:12:43]: Video or image? You're talking about images.Ethan [00:12:44]: Video or image, either one of them.Vibhu [00:12:47]: This was pretty common when we went from clip and DALL-E, right?Vibhu [00:12:51]: It's all training on really detailed captioning of images. So same is applied to video, but insteadEthan [00:12:57]: same appliedVibhu [00:12:57]: of using multimodal model to pass in video images and write rich descriptions, you can alsoSwyx [00:13:04]: I think there's this traditional perspective of supervised, or, very highly human curated thing. I feel like there's a unlock with unsupervised, right? Where like you have enough to bootstrap that you can just throw common corpus on it or, whatever. like unsupervised vision and language pairing, right? Like where you just have, interspersed image and text and it just learns. To me, that is the VLM breakthrough that is different from the clip, different from the LM era.Ethan [00:13:36]: It's interesting to see that you kind of need both data.Ethan [00:13:41]: For example, for theSwyx [00:13:41]: You need it to bootstrap it up. YeahEthan [00:13:43]: for the generative model training, there's also usually like a small percentage of unlabeled data. So the model is instructed to generate a video without any text instruction. That can also help the model generalize. So after this stage of generative synthetic pair, so, one important common step is to train a compressor or a tokenizer of the image or videos. So because, if you train-- If you can technically, theoretically train image or video models on pure pixels, but the problem is that the, it's, it's a lot of tokens. So like one image, it's, a thousand by a thousand, it's like one million tokens, one million pixels. It's impossible to train transformer on that. So it's, you need to train a tokenizer, which can go from image to latent space and latent space back to image.Swyx [00:14:45]: That's why we named the podcast.Swyx [00:14:48]: But, basically, you're talking about vocabulary science.Ethan [00:14:50]: so vocab.Swyx [00:14:51]: And so, what is, what is imp-- like a million is impossible?Ethan [00:14:54]: In generative models, the vocab is continuous. It's a continuous space. We can think about like you map an image to a vector. It's a, it's a fixed length vector. It's sixteen or forty-eight, something like that. And then you map that vector back to the image space. And the mapping is, has-- The mapping is patch-based. So you say you haveEthan [00:15:22]: a sixteen by sixteen patch and you match, you map that patch of pixels into this latent space.Swyx [00:15:29]: We've covered thisVibhu [00:15:30]: This is like the vision transformersSwyx [00:15:32]: VAEs,Ethan [00:15:33]: VAEs.Vibhu [00:15:34]: You basically compress your input, you do your generation, you're reasoning all that generation in smaller dimension, and then you project back out.Swyx [00:15:43]: VAE is a form compression, but I think the for me, the patching thing is from VIT, right?Ethan [00:15:48]: You can make those.Swyx [00:15:49]: Literally the, yeah, the paper is titled like sixteen by sixteen is all you need. something like that. and then I think also, people make a lot of comparisons with this kind of patching with convolutions.Swyx [00:16:02]: Which is you're, you're kind of re- reconstructing the old paradigm with the new.Ethan [00:16:05]: Actually, in VAEs, there are, there are both convolution networks and transformers. You can actually do both.Ethan [00:16:14]: After this VAE, so what you've got is you've got latent space tokens and you've got the language tokens. So now the training of the diffusion transformer, usually generative models use diffusion transformers. It is actually quite standard. It's, it's very similar to how you train a language transformer models. It's not that much difference. It's just the tokens, the visual tokens in, visual tokens out. The only difference is there's a denoising process. So you train the model to unmask some of the noise. So you add, you add random noise to the visual tokens, and then you train the model to remove those noise to generate the clean tokens. Any inference, the model can iteratively remove noise from a hundred percent noise.Swyx [00:17:12]: And then there's also, to speed things along on the tech tree of diffusion, there's CFG, and then there's, there's also, latent diffusion that, there's, there's someone in there. I think, somewhere along the line, obviously, like stability and all these other guys, pioneered a lot of this, architecture. I don't know if you want to get into that or just, or do the video side up to you.Bootstrapping Video from Image Models and Temporal CompressionEthan [00:17:37]: After you train such model, such image model, the reason it's a, it's a foundation for video models is that image models are cheaper to train, and they have much denser connection between language and text. So, sorry, language and images. For example, you train a billion, you train on a billion images, and there's a mapping from the text to the image. And the cost to train the same, like the, a billion, a billion text to a billion videos, that's much more expensive because videosNaturally have more tokens than images. Because the diffusion models, their understanding of, language purely come from this mapping. So if you don't have enough mapping, so if you only train on like a ten million videos or something, there-- you might not see enough language tokens in your training, so your model does not understand human intention enough. So that's why you really-- you train-- you first train this image diffusion models, and then you bootstrap the video model from there.Swyx [00:18:53]: One thing I did want to ask, because I-- actually, I think you're, you're the first per-- video model person I've ever talked to, I think. we've, we've like talked to Luma and all those folks. There's all these tricks in video compression where basically frame by frame there's not that much difference, so actually you don't have to regenerate or save the whole frame, right? but I think MP4 compression or something else like that.Swyx [00:19:16]: is it tempting to use that? Or as far as I can tell, everyone just treats it as, “No, we would just generate every frame.” Is that roughly the state-art?Ethan [00:19:27]: There are a few different approaches. Let's say first, like you want to just directly use MP4 compression and use that as the tokens for the transformers to train, right? So people actually have tried that, but the main challenge is the latent space for the MP4 tokens were not, were not very comprehensible for the models. It's, it's extremely hard to train on that. And there's aEthan [00:20:01]: So that's why they created VAEs, which creates more continuous, latent space, so the models can understand that latent space and learn from it much easier. Even within the VAEs, there are different difficulties of the latent space. So you can imagine something the simplest, the most naive VAE is like you have an image, and you just shuffle all of the images into a, into a vector. So you don't need to train any VAEs, right? But that latent space is extremely hard for models to train on top of. That's why there are some debate on like how do you compress the tokens. So you mentioned like you can compress frame by frame. Also, you can compress, the temporal dimension.Ethan [00:20:52]: The difference is if you compress the temporal dimension, you get a much higher compression rate. Because there's temporal redundancy between frames, because, this frame and the last frame, likely they are mostly similar, so there's only some small difference. for example, I think in 12.1 VAE, they have like a eight by eight by four compression rate. So the four temporal tokens are compressed into one tokens. That can save a lot of, save a lot of the context length. If you do it frame by frame, you have to do maybe like eight by eight by one. Your context length will be four times larger. That being said, the benefit of the frame-- per frame compression, we might come back to this later, is, real-timeness and interactivity. ‘Cause if you, if you strain the output of the model, frame by frame, you can-- the model can respond to any user request immediately. So if you have like a temporal four compression, four times compression, thenSwyx [00:22:06]: It might be laggyEthan [00:22:07]: there's a lag there in nature.Swyx [00:22:10]: So you're very pilled on this. let's just go ahead and bring it up ‘cause we have the visual prepared anyway. There's some frontier applications of real-time video gen. So Flipbook is one of the examples that went viral recently, right? What is Flipbook?Real-Time Generative UI: Flipbook, Neural OS, and Diffusion Front EndsEthan [00:22:23]: Flipbook is kind of like a web brow- web browser. You can see like it has the web bro- browser UI on top. The difference is all of the UIs are generated by generative image model in real time, and anything here are fake. But you can, you can explore inside this wor- this imaginary world. Say like we-- here we have engineering the Great Pyramid. Like the model generates this for us to understand how it works, and if we want to navigate around and understand further, we can click on some of the, some of the description here, and the model will generate a new page, new subpage describing the details we want to know about.Swyx [00:23:14]: So it's basically kind of we're playing a video, but it's pausing for our next interaction, and then it just plays the next thing based on our interaction.Swyx [00:23:23]: Which is kind of cool.Vibhu [00:23:25]: and you kind of decide your story. So this was, how do you make a pyramid? levering technique seemed interesting, right? It shows how do you take Okay, I want to know what is thisSwyx [00:23:35]: The demo, the demo tweet had more animation between frames.Vibhu [00:23:38]: I think it's just skipping,Swyx [00:23:39]: Oh, it's just skipping a lot of frames.Ethan [00:23:40]: they also have a video modeVibhu [00:23:42]: It takes a lot. There's a lot of peopleEthan [00:23:42]: but, a lot of people are using it.Ethan [00:23:45]: So it's not available.Vibhu [00:23:46]: There's a live video stream. We can try,Swyx [00:23:50]: So this is an example of the kind of future that you see at the extreme. We don't-- we're obviously not in it today.Swyx [00:23:56]: But in a world where inference is completely free this is better than generating code and text?Ethan [00:24:02]: So this is, this is a final state of where Viva will be at for word model, I think. Imagine internet doesn't exist, and then you type in google.com. Like what should, what should, what should a model show you?the model can imagine something, and this is what the model imagine. And these web pages, they completely do not exist. So I think as the inference costs come down, we are going to have generative UI for everything. If you think about how the coding model works, so they write code for a web page, and they render the code might be con- converted into binary, and the binary render the pixels on the screen. So we in machine learning, every time we have some breakthrough, obviously it's, it's more intuit. So why don't we have like user instruction to the pixel directly? So the generative UI will be user intention to the pixels directly. And say like even if I want email, let's say everyone have the same interface, but I want, I want it slightly different. I want the email to show to me like a TikTok, so I can swipe left and right for the emails. And or maybe you want something else. We can have completely different things. Or like I have I'm looking at, Instagram stories, and I don't like the Like button. I always may click it. And, generative UI resolved it. So it's going to be a revolutionary replacement of the interface. So in the future, we might have much more powerfulEthan [00:25:50]: LLMs and coding models running behind the scene. And in the, in the front-end, the diffusion model will actually be the front-end to show stuff to you. That's how I imagine it.Swyx [00:26:02]: Diffusion front-end, deterministic back-end.Swyx [00:26:04]: Something like that. I find that very expensive, but,Vibhu [00:26:08]: I find it interesting you called LLMs writing code on the back end deterministic, but okay.Swyx [00:26:14]: you write it onceVibhu [00:26:15]: Compare it toSwyx [00:26:16]: And then you execute.Ethan [00:26:17]: If you think about the cost, say, let's say H100 costs $1 per hour, and if you use this eight hours a day and thirty days, so, every month you're paying this two forty, you'll actually not wanna pay for that. That's even more expensive than Cloud Code Max. But if you think about the compute costs come down like two times every year, and I think the future will likely arrive like within few years.Vibhu [00:26:49]: It's everything, right? compute cost comes down, compute gets faster, model gets smarterEthan [00:26:54]: More efficientVibhu [00:26:54]: model gets smaller.Swyx [00:26:55]: I don't know why you say two times, ‘cause I think it's like 100 times. In language models, it is roughly one hundred to a thousand times every twelve to eighteen months, for the same given level of LMSys, ELO.Vibhu [00:27:08]: That's a net of everything, right? That's model performance alongside compute. So different than just compute costs come down. But, a very interesting future.Swyx [00:27:19]: So the web designers will have to shout out that accessibility is an issue, right? how do you deal with screen readers or whatever. But yes, this is higher bandwidth storytelling than anything you can possibly generate with code, right? So I think that's the rough idea.Ethan [00:27:34]: And I'd like to add a little bit that so human naturally have the maximum bandwidth when we are looking at things, look at videos, and we also have maximum output bandwidth when we are talking. So in the future, it might be something like we talk to AI models, and the AI model responds back with a generative UI. So that would be the maximum input and output bandwidth to interact with AI models before neural link happens.Vibhu [00:28:06]: And it's also very custom, right? Some people are very visual, some people are not as visual, right? They prefer the text. But the best thing about generative UI, right, it can also be text.Swyx [00:28:17]: There's another project that we wanted to highlight, which is the Neural OS. Kinda similar idea, but here you're literally operating, simulating an operating system with a video model.Swyx [00:28:27]: and you can play Doom, you can do Firefox. I find this like mildly less impressive, obviously, because it's an OS that I can run.Swyx [00:28:37]: But here everything is imagined.Vibhu [00:28:40]: I was, used to the Command+W to close the Firefox tab. It didn't crash. That's why I saidSwyx [00:28:45]: It's too immersive.Vibhu [00:28:46]: It's, it's too immersive for me.Swyx [00:28:47]: Too immersive.Vibhu [00:28:48]: I wanted to close the tab.Vibhu [00:28:49]: But yes, I can play generated diffusion.Swyx [00:28:51]: this is shockingly fast.Swyx [00:28:54]: Because I remember there was a demo about like maybe one to two years ago. Someone tried to do the first-person shooter with a image model. There was no consistency. It was very slow. But here it looks like realistically it's-- this is Doom.Vibhu [00:29:07]: I think there's two sides to that, right? There's okay, what is running a game? The heavy part of it is actually the game engine, all the lighting, all that stuff, the graphics. This is just kind of video, right? Like we've solved consistency. This is still, it looks like a few years old image generation. There's some temporal consistency, but it's, it's kind of just images stitched together as frame video. But it's a good visual representation to pi- to picture the future you wanna see, right? that's, that's what I see in these more so.Ethan [00:29:38]: This reminds me of how the video models gets better and better. So Neural OS is kinda if you just look at it feels like it's just a crappy version of the, like the Windows we could have, right? And, but the difference is, so the model, this model is overfitted on the existing operating systems. It can generate nothing different than that. But it's actually also similar to video models. So when we are training these video model, image model, we train them on internet. There's no imaginary supernatural stuff on the internet. But once we train this model, you can prompt the model to generate something supernatural that have never existed in the data set. So if you train your Neural OS or neural computer on the standard screen recordings on the entire internet. The model can imagine completely new interface to interact with the computer.Swyx [00:30:43]: This is one of those things that is magical to me. usually generalizing out of distribution is bad, but somehow we have learned some kind of internal world model that you say, this plus, but it looks like rainbows and butterflies, it'll do it and it will kind of make sense.Swyx [00:31:03]: So yeah, that's kind of cool. Yeah, I don't know if there's any comment more on there. I do, I do wanted to, I did wanted to touch a little bit more on the model architecture stuff, which I think you were getting. It's, really fascinating. We don't get a chance to talk about this enough. So one of the papers that we covered, we've covered every annual, segment anything release. and I don't know if you follow-- you're a computer vision guy, so youEthan [00:31:26]: I knowSwyx [00:31:27]: . So they did memory attention, which is kind of interesting. And I always think, anything where you can, across the temporal dimension, keep some consistency, I think it's, very fascinating, and I don't know if Basically, does that-- the CV side bleeding into video gen side, I think is underexplored, right? we talk about it for labeling, but actually you can borrow the architecture itself.Ethan [00:31:50]: There's, there's also complete different approaches, right? you brought up the term world model, so we went from video model to world model. There is diffusion, but there's also other approaches that people are doing. So maybe we get into those after as well,?Swyx [00:32:03]: He has a whole definition of world models and stuff. I feel like we threw a lot at you. Whatever you want to comment on.Why Video Models Are Expensive: Storage, I/O, and Training ScaleEthan [00:32:10]: I think one thing that we should actually comment back on is okay, so we were talking about the steps to train image gen to video model. One thing we don't see as much of is okay, you brought up the delta in training data, right? SoEthan [00:32:24]: you won't have as much a video model might not generalize, but what is the cost of training a large video model? So we know for LLMs roughly, okay, even like the poolside thing that came out today, right? It's a Gemma level model trained on roughly forty trillion tokens at this many H200s over this much time, right? You can see what is the exact cost of that. So how many GPU hours over how much H200 costs? So how do we do the back-end math of, same thing for video models, image models. How do you, how do you kind of break that down? I can share some back-envelope calculation. So surprisingly, video models is-- the cost is very-- is comparable to language models and obviously the largest scale is language model, maybe like a medium scale to language models. I said just storing the videos alone, it costs a lot. You can, you can maybe look up on AWS or something.Ethan [00:33:20]: You really, say if you have a billion videos and let's say, let's just say like each video, like five megabyte, then you need five petabyte to just store those videos. And also remember we talk about you use a VAE to compress the videos, and you also need to store, typically you need to store those continuous feature, in-- also in your storage. That's also comparable size with the videos themselves. So just storing these videos and the features is tens of petabytes alone. And,Swyx [00:33:58]: I just, I just looked up the calculation. Five petabytes on S3 Standard is one hundred K per month.Ethan [00:34:05]: AndSwyx [00:34:05]: It's comparableEthan [00:34:05]: and you needSwyx [00:34:06]: AndEthan [00:34:06]: And then like tens of petabytes, two hundred K. And even more expensive is you have the ingress and egress.Swyx [00:34:13]: Oh, yeah.Ethan [00:34:14]: Like you-- through the internet. You have to just to download those videos, I believe it's, it's more expensive on AWS than just storing those videos.Swyx [00:34:25]: Storing, yeah.Ethan [00:34:25]: And each training runs, you probably need to pull them once. If you train multiple times, it's, it's even more than that. So it's like just storing the network, those costs is just, it would be a few, a few millions per month to just storing everything, not to mention the GPU cost.Ethan [00:34:45]: AndSwyx [00:34:45]: my side tangent, the compute rental, like GPU rental is very efficient. There's one side, okay, you can be XAI and build your data center. Should we not just build our, storage compute as well? LikeEthan [00:34:57]: Of courseSwyx [00:34:57]: cloud cost compared to just,Ethan [00:34:59]: You save so muchSwyx [00:35:00]: store. Yeah, exactly.Swyx [00:35:01]: Especially with like egress and stuff. So.Ethan [00:35:04]: That's a good idea, but it also comes to-- there are some of its own challenges.Swyx [00:35:09]: Of course, of course.Ethan [00:35:10]: like people who build the GPU data centers, they might not expect this much, storage. And yeah, people build storage, typically they just build it somewhere with just CPUs.Swyx [00:35:23]: I just looked it up. Five-- AWS only charges for egress, not ingress. Tier five for five petabytes is two hundred and thirty K.Ethan [00:35:32]: Even more expensive than the storage.Swyx [00:35:34]: But storing is per month, right? You check in, then you cannot check out. so it's so cool. It's okay. So there's that side.Ethan [00:35:41]: So the TLDR, my backhand mathSwyx [00:35:42]: Data is larger than you think. Yes.Ethan [00:35:44]: my backhand math of GPU hours times GPU cost is also very much, I'm missing some storage.Swyx [00:35:49]: You're also-- you're basically like also more IO bound than normal training.Swyx [00:35:55]: Yes. ‘Cause like data loading, so caching everything, it becomes super important.Ethan [00:36:00]: So in Cosmos, we did a lot of optimizations to make it not IO bound. So, speaking of the training, actually training the model, the GPU cost, if you look up like the open source model, how big these video models are, I think like LTX has nineteen B parameters. That's a dense model. And people are also exploring, MoEs, so it might be twenty B active and, like a hun- hundreds B, total. So that's, that's even-- that's similar size as medium-sized LLM models. And if you, if you look at number of tokens-Uh, we disclose that in Cosmos. It's also like tens of trillions of tokens on the visual tokens. So putting this together, the cost of, training these video models, it's actually comparable with LLMs. Not to mention, the infra is slightly different from LLM, so it might be less efficient to train these models.Inference Speedups: Step Distillation, Consistency Models, and GANsSwyx [00:37:04]: Do you get the benefits of traditional diffusion speed-up? So for, images, there's LCM, LoRAs for, fine-tuning. There's, there's a lot of stuff that's beenEthan [00:37:15]: Flow matching.Swyx [00:37:16]: there's flow matching. There's a lot of stuff that's been done. there's some overlap that applies to diffusion on the inference side and stuff or?Ethan [00:37:23]: so the difference-- the inference side is a completely different story.Ethan [00:37:28]: I think for the training side, it might be a little bit hard to reduce that cost. And for the inference side, the biggest gain is from the distillation of these models. You can-- It's called step distillation, slightly different from knowledge distillation in LLMs. So you-- Typically, for flow matching models, you need like 100 steps or something. Like a distortion model even need even more, like 1,000 steps to generate a good image or video. A step distillation is try to learn to generate fewer step from the model itself. It's kind of like now we-- you use the full model to generate in 100 steps, and then you take a model that only generate 10 steps and let that model to learn from the perfect one.Ethan [00:38:25]: why this workSwyx [00:38:27]: Strong to weak seemingly.Ethan [00:38:28]: It is. It's kind ofSwyx [00:38:29]: DistillationEthan [00:38:29]: kind of like strong to weak. the-- from the modeling perspective, the strong model, the teacher model is trying to model the image and videos of inter-internet, and that distribution is extremely complex. But the step distilled model is just trying to learn from the teacher. The teacher is a model, and the size is fixed, as the distribution is much simpler than the whole internet. That's the intuition I have why step distillation can work. So usually these models serve in productions, they only run in a few steps. In Cosmos, I believe we have, we have like four step and eight steps. If you do some simpler task, image-image translation, it can even run in fewer step, like one step in Cosmos Transfer.Swyx [00:39:22]: I think this is the same intuition that guides a lot of the consistency model work. I sent you a link for, SCM. I don't know if you covered that. To me, that was actually one of, the most impressive papers I've ever seen from OpenAI.Swyx [00:39:34]: That this is the unifying grand concept of consistency models. I don't know if you have any comments on this.Ethan [00:39:41]: So there are, there are a few different approaches,Swyx [00:39:46]: Oh, yeah. Here it is.Swyx [00:39:47]: Two steps versus twenty or 100 steps, whatever. It's already done.Ethan [00:39:52]: So there are, there are a few different approaches, for example, consistency model, and there are also Actually, we shouldn't forget GAN. So GAN, actually, that was, that was the OG ofSwyx [00:40:05]: OGEthan [00:40:05]: step distillation ‘cause it trained just one step to begin with. So actually, a lot of, uh-- For example, there's a distribution matching distillation which use, which uses GAN, as one of the laws for distillation. It-- GAN just tells you, “Hey, generate an image,” and thenEthan [00:40:31]: it has a discriminator to tell, is this image real or not? So the model, the model just need to learn one of the distribution, not the full distribution. Because in training, the model is asked to reconstruct the ground truth image from the internet, which is extremely hard. And in-- When you're training GAN, it's a step process. It's just a, “Hey, you generate image. Does this image look as real as the image from the internet?” Which is a much simpler task. And, yeah, combining a lot of these approaches together, people typically do that, like consistency model and distribution matching and GAN, and we can get these few step models.Audio-Video Generation and Time AlignmentSwyx [00:41:21]: Then there's one step I wanted to add, which is audio and video.Ethan [00:41:26]: So, Grok Imagine zero point nine, I believe it's, it's a first audio video transmodel deployed at a large scale. SoSwyx [00:41:39]: And that was your first model?Ethan [00:41:40]: that was, Grok Imagine's first model. It's, it's audio video, joint generation. I think the hard part is, the modality alignment, ‘cause before this transmodel, we have, we have text to video alignment. We have this, correspondence between text and video. Typically, most of the VLMs, they understand images and videos. Video's very rare, and they don't understand audio mostly. And if you look at the audio generation on the LLM side, you can talk to them perfectly fine, but if you ask them to sing a song or something, it typically is not very good. Also, they don't have, they don't have music either. The hard part is thatUh, actually audio has two component. It has like a discrete component, a continuous component. The discrete component is like the language.Ethan [00:42:44]: So when we speak, it's just, someSwyx [00:42:47]: It's an ASR issue, yeah.Ethan [00:42:49]: It's, it's text token with some characteristics, I would say.Ethan [00:42:54]: But musicSwyx [00:42:56]: I think the speech guys would disagree with this.Swyx [00:42:57]: Like disfluencies and then,Vibhu [00:43:00]: There's tones you can get angry.Ethan [00:43:01]: Well, I say largely.Ethan [00:43:03]: the mu- but the music is completely different. It's, it's very continuous, and you cannot model them like discrete tokens in language models. this is like the hard part for models is, not to mention we have to align text, video, and audio together.Ethan [00:43:26]: SoVibhu [00:43:26]: How?Ethan [00:43:28]: So significant-- some significant challenges are like-- So first, like we talk about as the VLMs, they cannot understand most of them cannot understand audio.Ethan [00:43:39]: So you have to have some way to do the synthetic data generation for audio. You have to caption the model, and that involve, that involve synthetic data and human data effort a lot. And not just surprisingly, most of the LLMs are very bad at recognizing, like the beat, tone, and the details of the of music. They can, they can give some general prediction of which song is this, but it's very hard to describe the details of the music. like we mentioned in image generation, like you have to describe image as detailed as possible so that someone blind can reconstruct that. So here is like someoneVibhu [00:44:32]: DeafEthan [00:44:32]: someone deaf can reconstruct how the music sounds like without actually listening to it. Maybe you can think of it need to have the-- or they call the script.Vibhu [00:44:49]: Subtitles, yeah.Ethan [00:44:49]: You gotta have all the details of the music, and the dialogue.Vibhu [00:44:55]: So is the challenge there typically stuff like music and audio, or is it just Like is there a baseline? Okay, there's enough data where we can understand, narration, conversation, but there's nuances in audio that's where you hit all the data issues or is it just from stage zero, you just do it all right?Ethan [00:45:15]: So one important thing is like the alignment. So the model, the model has to know like the video and audio, the, uh-- it has to have a time-based alignment, like at which time step the video and the audio token correspond to each other. But we actually don't have this kind of alignment for most of the other modalities. If you think about like text and image, text and video, they are loosely aligned. So you can, you can have a description of what's going on in the video, but you don't have to exactly, You typically don't have exact description, oh, at, time step one second like what happened?Vibhu [00:46:02]: It's veryEthan [00:46:03]: At time step two second what happenedVibhu [00:46:03]: coarse. Yeah.Swyx [00:46:05]: So what was the ideal time step? You have to oblate it, and then it's like four seconds or something.Ethan [00:46:09]: So that comes down to how you design the model to, for the model to be aware of as a time, as a time modality. So the model is like a time aware. And that's something pretty unique if you think about LLMs. So if you ask LLM to complete a task, say they, uh-- you ask them and they will say, “Oh, this task will probably take twelve hours to complete,” and they come back in one hour. Say “I've already spent two days on this and I've exhausted everything.”Ethan [00:46:47]: So the LLMs them-themselves, they don't have a sense of time there.Vibhu [00:46:53]: I actually don't think that's just them not having a sense of time. I think it's somewhat based, right?Vibhu [00:46:58]: Like you tell someone, “Okay, go work on this feature. Go implement this,” there's a general understanding you would have of how long that would take without LLMs working at LLM speed, right? So you think back like two years ago, if I tell you to like build me like a new front end for latent space, have a search bar, have all this, you'll estimate that it'll take a few days, right?Vibhu [00:47:19]: So you tell an LLM, “Go build this.” It'll take me a few days. But I think it's somewhat grounded as opposed to them not having the best-- Not saying that they have a great understanding, but I think that example is like you can see where it comes from, right? You're trained on all over the text.Swyx [00:47:35]: They're, they're trying to estimate what a human would say.Vibhu [00:47:37]: because that's what the, that's what the data kind of represents. It's not themEthan [00:47:41]: It came from the corpus on the internet. People have a estimate of how much time.Vibhu [00:47:45]: And not even just in direct like training samples, right? Just your world understanding of tokens of how long stuff takes, right? Go read a book. It'll take you a while, right?Vibhu [00:47:56]: Even if you do nothing but read a book, it takes a few days. So yeah, LLM, I read it took me a few hours.Vibhu [00:48:01]: It'll take me a few hours to go through this research. But this is a tangent.Swyx [00:48:05]: Somewhat, yeah.Swyx [00:48:06]: This is a train of thought I haven't really expressed until now is, which is basically like a full world model must also be recursive, meaning that the participant in the world model must also be aware that they have a world model. which is like this whole recursive thing down the, down the line. but yes, and that the world model can be wrong and that they need to update it and blah. Yeah. We've, argued this on the, newsletter as well, that there needs to be sort of recursive or adversarial world models.World Models: Real-Time, Long-Horizon, Interactive VideoVibhu [00:48:34]: just, to ask, how do you define world model?Swyx [00:48:38]: Oh, yeah, let's go there.Ethan [00:48:40]: SoVibhu [00:48:40]: So just for context, we talked about, video generation, and then there's a-- if you say there's a distinction between world models, what's your, what's your definition? How do you see the two?Ethan [00:48:53]: So disclaimer, I'm not going to debate, what is world model. Yeah. there are many definitions, so I'll just talk about my definition. Since I came from the multi-model, multi-model domain, so mainly talking from video. So world model is like real-time interactive long horizon videos. So there are three parts. so we-- let's talk about them one by one. So the so interaction, so we just, we just look at Facebook and neural computer. So the interaction part of it, so you, world model can allow you to interact with them through keyboard, mouse, and maybe also voice. So these all is-- all is a modality. You can, you can interact with the model, and the model should respond reasonably. Second part is real time. So once you, once, say, you move your mouse, if, say, the world model generate a game, how fast can the game respond? So if you're like professional CS: GO players- -my say, oh, you have to respond- He's beginner within sub ten milliseconds or- Yeah even less. So that's not most of the- No, sixty FPS. Let's go. Oh, three hundred FPS. Oh, five hundred FPS. Wait. okay, yeah. I didn't do the math, but yeah, okay. Uh- Yeah, three hundred FPS, that's a three millisecond. So you have to respond- Oh, s**t. Okay. YeahEthan [00:50:29]: within a millisecond. Most of the video models cannot do that. Yeah. And, but if you, say, if you have a video model that is, say, like a digital human, the response time might be more generous. Maybe typically, for real-time voice interaction, it's like two hundred millisecond. So that's, that's much more generous. But even two hundred millisecond is pretty, it is pretty tricky, ‘cause remember we mentionedEthan [00:51:01]: you have this, temporal compression coming from the VAE. So if you, if you don't compress the temporal dimension, your sequence length is going to explode. So if you want to have this real-time, real-timeness in your model, you have to do is one context problem. And the third part is long horizon, ‘cause we-- if you're not going to just play with, video games just, a few seconds, most video models only a few seconds. We're going to play with minutes, hours. The model have to be able to generate long-form content.Ethan [00:51:42]: So putting these three together, it's, real-time, long horizon interactive videos. I think the final state will be, for example, like a video, a video version of Playbook, where you can, you can interact with, a neural computer. You move your mouse, and you click on the generative interface, and it will reply to you through pixels- generating in real time. But getting there, it's, it's a very long way to get there. So one of the first step, at Grok Imagine, where I led a small world model team there, was to build video extension. So, video extension- it's the first step of interactivity. Yeah. It's, it's the first step. Yeah. So it's the first step- You have it here, video editing, yeah. Yeah. Yeah. So the first step is because, this unlocks long horizon videos. Typically, for most of the video generation models, you give it a prompt or an image as an initial frame. You generate video, that's it. That's just, one time, done. And some creators would try to, use the last frame as a first frame for the second video. It can-- sometimes it works, but if you do it a few times, it says the quality would decrease. And- It doesn't have that context- Yeah over the full video, so the temporal- Yeah, exactly. Yeah, ‘cause you only gave it the last frame, of course, right? Yeah. Exactly. And- it's actually a pretty fun hack. if you've seen like- Oh, no, he's saying something better. Yeah. And for example, like Vue, I remember Vue 3 has like a second context of the last video. It is slightly better than using the last frame, but it has the same problem-- similar problem that it, the quality would decrease. if you extend a few times to, one minute, the video quality would look much worse than the first video. Second, another problem is that the model doesn't have long-range knowledge of, what's happening before. Say, if they generate some dialogue, some, two people speaking, and their voice might change, over some time, especially if the second conditioning, it does not cover the previous context. So these are the core challenges. So the Grok Imagine video extension, it has historical context of all of the previous generated videos. It can, It has, it has the context of, who is speaking and what objects have appeared and everything, having that to generate the next video. So if we naively do this, you can imagine, just, put all of the previous history video tokens into the context. The context lens will easily explode. Especially for video models, that can be like a few, a few million context, I would imagine- context lens. Yes.Yeah.Swyx [00:54:58]: Let's run with that.Ethan [00:54:59]: for example, like in Cosmos, I think just five seconds of video is like a fifty K or sixty K number of tokens. So like if you do, if you do fifty second, that's a five hundred K tokens. If you do longer than that, easily explode. This long horizon, problem was the first step we're trying to solve world model. It turns out people, yeah, people love video extension. Like a lot, a lot of the creators love using video extension to create longer form videos. This is the part I liked that you have a, you have an intermediate step toward the final goal instead of just a straight shot to the final version very much.Swyx [00:55:48]: But I can see you have a strong vision of where we want to end up.Long Context, Redundancy, and Efficient Interactive VideoVibhu [00:55:51]: Does it seem like it's an efficiency issue? okay, we're at a few million tokens context,. If you draw the parallel to language models, we had very short context, two thousand, eight thousand, then, you scale it up one million, ten million. sure, there's effective context, but at the end of the day, it's just what's it worth? sure, there's a whole training data side. In video, it might be slightly easier ‘cause we have a hundred million token video, right? Just take a movie with the full context there. Like is this efficiency from an inference standpoint that like it's expensive, but we know how to solve it? Or like why is this not the approach? So like my broader point was on your second point of world models, you say it needs to be interactive and live, right? You should be able to play a game and see the interaction live. So one thing I see with research is a lot of what you actually serve is different than what you build, right? So we talked about distillation. You train big model, you distill it, you do quantization, speculative decoding. We do all this stuff to serve it efficiently. Should we not just have a solution, like a world model that can interact well, do inference optimization, serve it, distill it secondary, so make it real time after you solve it? So like a-- another parallel is say, continual learning, right? What we need is someone to solve it and show it works inefficiently. Give it a few years, people will make it efficient. Same thing with regular attention, right? It worked. Over a few years, people have different forms of attention, and we've scaled it to be efficient at log context,? So kind of two things there, right? One is it seems like it works. You've scaled it. Can we not just scale it a lot more efficiently over time? Do we need a separate approach if this works? And same thing with interaction, right? if we can get it done, like if we can solve some way that it works, we can solve making it more efficient from an inference standpoint later.Ethan [00:57:53]: that's actually a very good point. So in videos, there's actually a lot of redundancies. So we solve a lot of the pixel redundancy from VE, but there's more redundancy in long range and long horizon videos. Say, if a character appear in the first clip and then it disappeared, it only reappear at the end of the video, you probably don't need the-- the context, like in the middle of the generation. So you only need that character, where you need. So that's why, I helped build another feature. It's a reference video.Vibhu [00:58:36]: Is it here?Swyx [00:58:36]: is it the same model release or different one?Ethan [00:58:39]: It's a different one.Ethan [00:58:41]: You probably need to search onSwyx [00:58:43]: I'll find itEthan [00:58:43]: X reference to video.Ethan [00:58:46]: So reference video allow you to like upload up to seven images as condition and generate the video. Say, if like I want-- it can, it can be characters or objects or even scenes. Say like I want, I want condition on, Sean's selfie and holding a bladeSwyx [00:59:07]: We have a dogEthan [00:59:08]: or whatever.Swyx [00:59:08]: We put the dog in the thing.Ethan [00:59:09]: you can put them there and the video models will generate the video from and copies the context over. So that can solve a lot of the problems there, like the long context problem. It doesn't need to have a very long context, but it's-- I feel like it's an intermediate solution. The modelSwyx [00:59:29]: It's cheating.Ethan [00:59:30]: the model should be able to like selectively know, where should I draw the references. So say if I want to generate a movie, I generate it autoregressive, like a ten second at a time or something. And now this character appear, I can look back to where it first appear and, bring that back. Yeah, this one, I put the references. Yeah, that's, Optimus, Einstein myself, Annie.Vibhu [01:00:02]: Oddly enough, I used Grok Search to find it, and it pulled your LinkedIn post. But yeah we found it.Ethan [01:00:08]: Interesting.Vibhu [01:00:10]: ButxAI's Underrated Work, Culture, and WatermarkingSwyx [01:00:11]: this is a problem. This is not your fault, but like XAI doesn't communicate all this work that you do very well because they just have the model release and then that's it. But actually, these details are very good.Swyx [01:00:22]: As far as I understand, everything you just described is state-art, like no one else has done it.Vibhu [01:00:30]: A lot of-- yeah, I have a lot moreSwyx [01:00:32]: And then, and then you just put this blog post with the cookies. I'm this is not enough,?Swyx [01:00:37]: but I, obviously this is like the high level numbers that people want to know. But no, okay, soVibhu [01:00:42]: And I wonder, like part of that is also some labs don't share research into what happens. And ifSwyx [01:00:50]: No, but this is literally bragging about how good they are, right?Swyx [01:00:54]: Like, why would you not say that you are capable of extending with full context? this is not a secret sauce. This is like we did the work. yeah, I don't know.Ethan [01:01:02]: different labs have slightly different communication styles.Swyx [01:01:07]: Anyway, if anyone from XAI is listening we are always happy to help you tell your story. Yeah, okay, so you did references, and I think, I think kind of the point you're, you're making is it is sort of like a kludge, right? this is-- you can do seven, but what about 100?Swyx [01:01:23]: Right? Then you need a completely different thing.Ethan [01:01:26]: So I think it's-- this is, a mechanism to, select the context from the history, and you might not put the entire history into the context. for example, there's a paper called Frame Pack, which haveEthan [01:01:41]: a heuristic that the latest history, the last one second, I put the entire history, and the history before that, I would, compress it and makes the video smaller. So they follow this pattern, this build overall pattern that the maximum sequence length is fixed. So the further you are from the current frame, you have a smaller image. So this is just a heuristic. I think it can be more automatic. The model is aware like which history part of it can be select. So this part of the research is actually being actively, worked on by a lot of people. It's also quite interesting. I feel this is actually, this part of long context is a little bit ahead of the LLM part.Ethan [01:02:31]: So for example, like in LLMs, if you-- so contexts keep growing. Let's say if you call tool and the tool call history is extremely long, that's still in context, and keep growing, keep growing. Even if you switch the topic to something else, the whole context was there. There are some agentic harnesses that help you to, say, prune the tool results and, prune Like when you, when you query a file, only show like the top 200 lines or something. Those were very heuristic-driven.Swyx [01:03:08]: For listeners, we did a write-up on the cloud code, leak where there are eight different kinds of pruning, including like you prune the tool results and all that. So you can, you can read up on that kind of thing.Ethan [01:03:17]: I think, one breakthrough in continual learning might be like a way to automatically, manage its own context.Swyx [01:03:27]: These are all heuristics, and they will be replaced by machine learning.Ethan [01:03:30]: InterestinglyVibhu [01:03:32]: TheEthan [01:03:32]: the same thing is being researched in both LLMs and video models.Vibhu [01:03:36]: The interesting thing is also like in the paper you showed, it's actually happening at the model level, right? Compared to like language models, sure, we have base attention, but we'll do our own compression, we'll do our own pruning, which is separate from model error.Vibhu [01:03:49]: Eventually, it all just boils in, hopefully.Swyx [01:03:52]: I think this is a form of like attention, but like also know sort of reasoning attention. I feel like that's different than normal attention.Swyx [01:04:03]: Does that, does that make sense?Ethan [01:04:04]: It's, it's different in the sense that attention, not to mention, set sparse attention aside,

christmas god tiktok ai culture google future speaking phd research video data elon musk cost dm chatgpt tesla os new years reddit cycle mask doom lake agent windows albert einstein models tier playbook gemini openai cv compare nvidia robotics cgi fifty cosmos ve io gpt ui aws ml photoshop career path llama llm sora anthropic fps colossus vit genai gpu grok omni firefox elo kosmos storing codex prs gpus diffusion 7b gan optimus 2x vue tldr great pyramid redundancy cs go xai subtitles compute gans lm paperclips cpus luma stable diffusion github copilot mp4 asr scm uis py distillation yann lecun moes vae bfl kosmo ffmpeg lcm loras cfg html css vl m watermarking resnet ian goodfellow cloud code flipbook latent space ltx mfu

Zeit der Krähen - 242 Jaime

Von Eis und Feuer - Ein Buch-Podcast

Play Episode Listen Later May 30, 2026 53:58

Seid dabei, wenn Clara und Jan Jaime bei Tywins Trauerzug und passiv-aggressiven Gesprächen mit Kevan, einer oberlehrerhaften Unterhaltung mit Loras und der obligatorischen, liebevollen Diskussion mit Cersei begleiten. Hierbei erlangt Jaime zunehmend die unangenehme Erkenntnis, dass seine große Liebe vielleicht doch eher Richtung „wahnsinniger Pyromanen-König“ driftet. Nebenbei sprechen wir über freudebringende Namensgebung, berühmte Ritter der Königsgrade und eine neue Lieblingsnebenfigur, die nicht nur uns, sondern auch Cersei besonders gut gefällt.

liebe diskussion richtung seid erkenntnis unterhaltung ritter nebenbei hierbei cersei kevan namensgebung loras

ATN Podcast 406: Looking ahead at Draft, Duhawks, Diplomats

D3football.com » D3football.com Around the Nation Podcast

Play Episode Listen Later Apr 20, 2026 88:07

The NFL Draft is coming up this week, and Division III has had only three players drafted since 2012, all of them offensive linemen who were thought of quite highly and drafted quite highly: Ali Marpet of Hobart, Ben Bartch of St. John’s and Quinn Meinerz of UW-Whitewater. If Division III has a player drafted from any position other than O-line, it’s probably most likely to be Ty Montgomery of John Carroll, who we talked about back in Podcast 404. Why has it been so hard for D-III guys to get drafted lately? We bring in a player agent to talk about what he’s seen in two decades of representing Division III football talent, what NFL teams are looking for and what the prerequisite is before one ever even gets to a Pro Day. John Perez, who played D-III ball at William Paterson, talks us through it. It’s a new day at Loras, as the Duhawks have a new head coach in former St. John’s defensive coordinator Brandon Novak, are hoping to generate some buzz of their own in Dubuque, Iowa, and looking to take a step forward for a program which hasn’t won more than five games in a season since 2008. Novak has filled his key assistant coaching slots with some young guys, he inherits a non-conference schedule already laid out for him, and he has to adjust from wearing red for more than a quarter-century to purple. We also talk with Mike Phelan, whose first year as head coach at Franklin & Marshall was incredibly successful, as the Diplomats won their rivalry game and won the Centennial Conference’s automatic bid. He talks about NFL hopeful Gary Lewis, a wide receiver for F&M last year, plus Phelan talks about who his mentors are as a coach and who he gets advice from when needed. In addition, we take your questions on the Southern Collegiate Athletic Conference managing to snag an automatic bid, and on Anna Maria College getting essentially a vote of no-confidence from the state of Massachusetts. Where might podcast co-host Patrick Coleman end up going for a game in Week 1 of this season? We discuss the story of Ithaca’s Derek Slywka and his pro football journey. And Patrick and fellow co-host Greg Thomas have some news about the immediate future of this podcast. Our guests on this podcast: Loras coach Brandon Novak, Franklin & Marshall coach Mike Phelan, John Perez of Perez Sports Associates. Links: A look at the old Loras Fieldhouse, from 2018 on D3hoops.com That and more in this edition of the D3football.com Around the Nation podcast. The D3football.com Around the Nation podcast is a weekly conversation about NCAA Division III football, and has been running since 2007. Hit play, or subscribe to get this podcast on your mobile device. You can subscribe to the Around the Nation Podcast in Apple Podcasts, and many other places. You can also get this and any of our future Around the Nation podcasts automatically by subscribing to this RSS feed: http://www.d3blogs.com/d3football/?feed=podcast Here’s how to find us on some of the major podcasting apps: Apple Podcasts: D3football.com Around the Nation Podcast iHeart Radio: D3football.com Around the Nation Podcast Spotify: D3football.com Around the Nation Podcast Photos: Loras athletics; Franklin & Marshall athletics

nfl draft massachusetts iowa nfl draft looking ahead diplomats novak hobart ithaca pro day phelan division iii dubuque diii nation podcast john carroll ncaa division iii gary lewis ty montgomery uw whitewater greg thomas loras brandon novak quinn meinerz ali marpet william paterson john perez mike phelan anna maria college patrick coleman duhawks atn podcast

23.37b: Selection Sunday Special - Part 2

Hoopsville

Play Episode Listen Later Mar 1, 2026 124:27

It is an annual event. The regular season is officially over and teams eagerly wait to see where they may be playing in the NCAA Tournaments. We now know who has made the tournaments, so we can dive a bit more into where teams may be headed and how brackets could come together. Tune in to the annual Hoopsville Selection Sunday Special with Dave McHugh who will have interviews with teams who have clinched their bids automatically. Plus, hear from both national committee chairs on the landscape they see in front of them and their committees as they try and bracket with new priorities and perspectives. Guests appearing on the Hudl Hoopsville Hotline in Part 2: - John Garrett, Hunter women's coach - Kris Huffman, DePauw women's coach - Brian Lane, Transylvania men's coach - Panelists: - Pat Coleman, D3hoops.com Editor-in-Chief - Ryan Scott, bracketologist and D3hoops.com Around the Nation Columnist - Riley Zayas Plus hear from Loras men's coach following their conference championship. Hoopsville is presented by D3hoops.com from the WBCA Studios.

ncaa tournament special part transylvania selection sunday sunday special john garrett depauw loras brian lane

Reference Images vs LoRAs: Which Actually Works?

VP Land

Play Episode Listen Later Jan 23, 2026 38:40 Transcription Available

Addy and Joey put two AI workflows head-to-head: training a LoRA with Z-Image versus using reference images with Nano Banana Pro to replicate film cinematography. In this episode, we test whether LoRAs are still necessary for capturing specific cinematography styles, or if reference images alone can deliver the same results. Using the VistaVision look from the film One Battle After Another as our target, we explore practical workflows for pre-production and previsualization.--The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.

ai land images actually works loras

23.15: Unranked Success

Hoopsville

Play Episode Listen Later Jan 12, 2026 153:48

Some of the best hoops happening in Division III don't necessarily involving Top 25 teams! On Monday's Hoopsville, we chat with programs not even receiving votes in Week 5's D3hoops.com's Top 25 polls. Learn how their seasons are progressing and why we could be talking about them in March. And if they are okay with flying under the radar. Plus a look a head at how the latest Top 25 polls could look with results over the weekend playing a role. Guests appearing on the Hudl Hoopsville Hotline: - Blake Arbogast, East Texas Baptist women's coach - Will Chandler, Misericordia men's coach - Chris Martin, Loras men's coach - Carolae Barton & Emily Grasseler, Merchant Marine women's sophomore & junior forwards Dave will also lay down his '2 Cents' on DIII legislation at the NCAA Convention that, on the face of it, brings back the famed 'gray shirt' year. Is it good for student-athletes? Is it good for DIII? What are the drawbacks? Hoopsville is presented by D3hoops.com from the WBCA Studios.

success cents chris martin misericordia division iii diii merchant marines unranked loras ncaa convention will chandler

ThursdAI - Jan 8 - Vera Rubin's 5x Jump, Ralph Wiggum Goes Viral, GPT Health Launches & XAI Raises $20B Mid-Controversy

The top AI news from the past week, every ThursdAI

Play Episode Listen Later Jan 8, 2026 106:57

Hey folks, Alex here from Weights & Biases, with your weekly AI update (and a first live show of this year!) For the first time, we had a co-host of the show also be a guest on the show, Ryan Carson (from Amp) went supernova viral this week with an X article (1.5M views) about Ralph Wiggum (yeah, from Simpsons) and he broke down that agentic coding technique at the end of the show. LDJ and Nisten helped cover NVIDIA's incredible announcements during CES with their Vera Rubin upcoming platform (4-5X improvements) and we all got excited about AI medicine with ChatGPT going into Health officially! Plus, a bunch of Open Source news, let's get into this: ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.Open Source: The “Small” Models Are WinningWe often talk about the massive frontier models, but this week, Open Source came largely from unexpected places and focused on efficiency, agents, and specific domains.Solar Open 100B: A Data MasterclassUpstage released Solar Open 100B, and it's a beast. It's a 102B parameter Mixture-of-Experts (MoE) model, but thanks to MoE magic, it only uses about 12B active parameters during inference. This means it punches incredibly high but runs fast.What I really appreciated here wasn't just the weights, but the transparency. They released a technical report detailing their “Data Factory” approach. They trained on nearly 20 trillion tokens, with a huge chunk being synthetic. They also used a dynamic curriculum that adjusted the difficulty and the ratio of synthetic data as training progressed. This transparency is what pushes the whole open source community forward.Technically, it hits 88.2 on MMLU and competes with top-tier models, especially in Korean language tasks. You can grab it on Hugging Face.MiroThinker 1.5: The DeepSeek Moment for Agents?We also saw MiroThinker 1.5, a 30B parameter model that is challenging the notion that you need massive scale to be smart. It uses something they call “Interactive Scaling.”Wolfram broke this down for us: this agent forms hypotheses, searches for evidence, and then iteratively revises its answers in a time-sensitive sandbox. It effectively “thinks” before answering. The result? It beats trillion-parameter models on search benchmarks like BrowseComp. It's significantly cheaper to run, too. This feels like the year where smaller models + clever harnesses (harnesses are the software wrapping the model) will outperform raw scale.Liquid AI LFM 2.5: Running on Toasters (Almost)We love Liquid AI and they are great friends of the show. They announced LFM 2.5 at CES with AMD, and these are tiny ~1B parameter models designed to run on-device. We're talking about running capable AI on your laptop, your phone, or edge devices (or the Reachy Mini bot that I showed off during the show! I gotta try and run LFM on him!)Probably the coolest part is the audio model. Usually, talking to an AI involves a pipeline: Speech-to-Text (ASR) -> LLM -> Text-to-Speech (TTS). Liquid's model is end-to-end. It hears audio and speaks audio directly. We watched a demo from Maxime Labonne where the model was doing real-time interaction, interleaving text and audio. It's incredibly fast and efficient. While it might not write a symphony for you, for on-device tasks like summarization or quick interactions, this is the future.NousCoder-14B and Zhipu AI IPOA quick shoutout to our friends at Nous Research who released NousCoder-14B, an open-source competitive programming model that achieved a 7% jump on LiveCodeBench accuracy in just four days of RL training on 48 NVIDIA B200 GPUs. The model was trained on 24,000 verifiable problems, and the lead researcher Joe Li noted it achieved in 4 days what took him 2 years as a teenager competing in programming contests. The full RL stack is open-sourced on GitHub and Nous published a great WandB results page as well! And in historic news, Zhipu AI (Z.ai)—the folks behind the GLM series—became the world's first major LLM company to IPO, raising $558 million on the Hong Kong Stock Exchange. Their GLM-4.7 currently ranks #1 among open-source and domestic models on both Artificial Analysis and LM Arena. Congrats to them!Big Companies & APIsNVIDIA CES: Vera Rubin Changes EverythingLDJ brought the heat on this one covering Jensen's CES keynote that unveiled the Vera Rubin platform, and the numbers are almost hard to believe. We're talking about a complete redesign of six chips: the Rubin GPU delivering 50 petaFLOPS of AI inference (5x Blackwell), the Vera CPU with 88 custom Olympus ARM cores, NVLink 6, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet.Let me put this in perspective using LDJ's breakdown: if you look at FP8 performance, the jump from Hopper to Blackwell was about 5x. The jump from Blackwell to Vera Rubin is over 3x again—but here's the kicker—while only adding about 200 watts of power draw. That's insane efficiency improvement.The real-world implications Jensen shared: training a 10 trillion parameter mixture-of-experts model now requires 75% fewer GPUs compared to Blackwell. Inference token costs drop roughly 10x—a 1MW cluster goes from 1 million to 10 million tokens per second at the same power. HBM4 memory delivers 22 TB/s bandwidth with 288GB capacity, exceeding NVIDIA's own 2024 projections by nearly 70%.As Ryan noted, when people say there's an AI bubble, this is why it's hilarious. Jensen keeps saying the need for inference is unbelievable and only going up exponentially. We all see this. I can't get enough inference—I want to spin up 10 Ralphs running concurrently! The NVL72 rack-scale system achieves 3.6 exaFLOPS inference with 20.7TB total HBM, and it's already shipping. Runway 4.5 is already running on the new platform, having ported their model from Hopper to Vera Rubin NVL72 in a single day.NVIDIA also recently acqui-hidred Groq (with a Q) in a ~$20 billion deal, bringing the inference chip expertise from the guy who created Google's TPUs in-house.Nemotron Speech ASR & The Speed of Voice (X, HF, Blog)NVIDIA also dropped Nemotron Speech ASR. This is a 600M parameter model that offers streaming transcription with 24ms latency.We showed a demo from our friend Kwindla Kramer at Daily. He was talking to an AI, and the response was virtually instant. The pipeline is: Nemotron (hearing) -> Llama/Nemotron Nano (thinking) -> Magpie TTS (speaking). The total latency is under 500ms. It feels like magic. Instant voice agents are going to be everywhere this year.XAI Raises $20B While Grok Causes Problems (Again)So here's the thing about covering anything Elon-related: it's impossible to separate signal from noise because there's an army of fans who hype everything and an army of critics who hate everything. But let me try to be objective here.XAI raised another massive Round E of $20 billion! at a $230 billion valuation, with NVIDIA and Cisco as strategic investors. The speed of their infrastructure buildout is genuinely incredible. Grok's voice mode is impressive. I use Grok for research and it's really good, notable for it's unprecedented access to X !But. This raise happened in the middle of a controversy where Grok's image model was being used to “put bikinis” on anyone in reply threads, including—and this is where I draw a hard line—minors. As Nisten pointed out on the show, it's not even hard to implement guardrails. You just put a 2B VL model in front and ask “is there a minor in this picture?” But people tested it, asked Grok not to use the feature, and it did it anyway. And yeah, putting Bikini on Claude is funny, but basic moderation is lacking! The response of “we'll prosecute illegal users” is stupid when there's no moderation built into the product. There's an enormous difference between Photoshop technically being able to do something after hours of work, and a feature that generates edited images in one second as the first comment to a celebrity, then gets amplified by the platform's algorithm to millions of people. One is a tool. The other is a product with amplification mechanics. Products need guardrails. I don't often link to CNN (in fact this is the first time) but they have a great writeup about the whole incident here which apparently includes the quitting of a few trust and safety folks and Elon's pushback on guardrails. CrazyThat said, Grok 5 is in training and XAI continues to ship impressive technology. I just wish they'd put the same engineering effort into safety as they do into capabilities!OpenAI Launches GPT HealthThis one's exciting. OpenAI CEO Fidji Simo announced ChatGPT Health, a privacy-first space for personalized health conversations that can connect to electronic health records, Apple Health, Function Health, Peloton, and MyFitnessPal.Here's why this matters: health already represents about 5% of all ChatGPT messages globally and touches 25% of weekly active users—often outside clinic hours or in underserved areas. People are already using these models for health advice constantly.Nisten, who has worked on AI doctors since the GPT-3 days and even published papers on on-device medical AI, gave us some perspective: the models have been fantastic for health stuff for two years now. The key insight is that medical data seems like a lot, but there are really only about 2,000 prescription drugs and 2,000 diseases (10,000 if you count rare ones). That's nothing for an LLM. The models excel at pattern recognition across this relatively contained dataset.The integration with Function Health is particularly interesting to me. Function does 160+ lab tests, but many doctors won't interpret them because they didn't order them. ChatGPT could help bridge that gap, telling you “hey, this biomarker looks off, you should discuss this with your doctor.” The bad news is, this is just a waitlist and you can add yourself to the waitlist here, we'll keep monitoring the situation and let you know when it opens upDoctronic: AI Prescribing Without Physician OversightSpeaking of healthcare, Doctronic launched a pilot in Utah where AI can autonomously renew prescriptions for chronic conditions without any physician in the loop. The system covers about 190 routine medications (excluding controlled substances) at just $4 per renewal. Trial data showed 99.2% concordance with physician treatment plans, and they've secured pioneering malpractice insurance that treats the AI like a clinician.Nisten made the case that it's ethically wrong to delay this kind of automation when ER wait times keep increasing and doctors are overworked. The open source models are already excellent at medical tasks. Governments should be buying GPUs rather than creating administrative roadblocks. Strong strong agree here! Google Brings Gmail into the Gemini Era (X)Breaking news from the day of our show: Google announced Gmail's biggest AI transformation since its 2004 launch, powered by Gemini 3. This brings AI Overviews that summarize email threads, natural language queries (”Who gave me a plumber quote last year?”), Help Me Write, contextual Suggested Replies matching your writing style, and the upcoming AI Inbox that filters noise to surface VIPs and urgent items.For 3 billion Gmail users, this is huge. I'm very excited to test it—though not live on the show because I don't want you reading my emails.This weeks buzz - covering Weights & Biases updatesNot covered on the show, but a great update on stuff from WandB, Chris Van Pelt (@vanpelt), one of the 3 co-founders released a great project I wanted to tell you about! For coders, this is an app that allows you to run multiple Claude Codes on free Github sandboxes, so you can code (or Ralph) and control everything away from home! GitHub gives personal users 120 free Codespaces hours/month, and Catnip automatically shuts down inactive instances so you can code for quite a while with Catnip! It's fully open source on Github and you can download the app hereInterview: Ryan Carson - What the hell is Ralph Wiggum?Okay, let's talk about the character everyone is seeing on their timeline: Ralph Wiggum. My co-host Ryan Carson went viral this week with an article about this technique, and I had to have him break it down.Ralph isn't a new model; it's a technique for running agents in a loop to perform autonomous coding. The core idea is deceptively simple: Ralph is a bash script that loops an AI coding agent. In a loop, until it a certain condition is met. But why is it blowing up? Normally when you use a coding agent like Cursor, Claude Code, or AMP, you need to be in the loop. You approve changes, look at code, fix things when the agent hits walls or runs out of context. Ralph solves this by letting the agent run autonomously while you sleep.Here's how it works: First, you write a Product Requirements Doc (PRD) by talking to your agent for a few minutes about what you want to build. Then you convert that PRD into a JSON file containing atomic user stories with clear acceptance criteria. Each user story is small enough for the agent to complete in one focused thread.The Ralph script then loops: it picks the first incomplete user story, the agent writes code to implement it, tests against the acceptance criteria, commits the changes, marks the story as complete, writes what it learned to a shared “agents.md” file, and loops to the next story. That compound learning step is crucial—without it, the agent would keep making the same mistakes.What makes this work is the pre-work. As Ryan put it, “no real work is done one-shot.” This is how software engineering has always worked—you break big problems into smaller problems into user stories and solve them incrementally. The innovation is letting AI agents work through that queue autonomously while you sleep! Ryan's excellent (and viral) X article is here! Vision & VideoLTX-2 Goes Fully Open Source (HF, Paper)Lightricks finally open-sourced LTX-2, marking a major milestone as the first fully open audio-video generation model. This isn't just “we released the weights” open—it's complete model weights (13B and 2B variants), distilled versions, controllable LoRAs, a full multimodal trainer, benchmarks, and evaluation scripts. For a video model that is aiming to be the open source SORA, supports audio and lipsyncThe model generates synchronized audio and video in a single DiT-based architecture—motion, dialogue, ambience, and music flow simultaneously. Native 4K at up to 50 FPS with audio up to 10 seconds. And there's also a distilled version (Thanks Pruna AI!) hosted on ReplicateComfyUI provided day-0 native support, and community testing shows an A6000 generating 1280x720 at 120 frames in 50 seconds. This is near Sora-level quality that you can fine-tune on your own data for custom styles and voices in about an hour.What a way to start 2026. From chips that are 5x faster to AI doctors prescribing meds in Utah, the pace is only accelerating. If anyone tells you we're in an AI bubble, just show them what we covered today. Even if the models stopped improving tomorrow, the techniques like “Ralph” prove we have years of work ahead of us just figuring out how to use the intelligence we already have.Thank you for being a ThursdAI subscriber. See you next week!As always, here's the show notes and TL;DR links: * Hosts & Guests* Alex Volkov - AI Evangelist & Weights & Biases (@altryne)* Co-Hosts - @WolframRvnwlf, @nisten, @ldjconfirmed* Special Guest - Ryan Carson (@ryancarson) breaking down the Ralph Wiggum technique.* Open Source LLMs* Solar Open 100B - Upstage's 102B MoE model. Trained on 19.7T tokens with a heavy focus on “data factory” synthetic data and high-performance Korean reasoning (X, HF, Tech Report).* MiroThinker 1.5 - A 30B parameter search agent that uses “Interactive Scaling” to beat trillion-parameter models on search benchmarks like BrowseComp (X, HF, GitHub).* Liquid AI LFM 2.5 - A family of 1B models designed for edge devices. Features a revolutionary end-to-end audio model that skips the ASR-LLM-TTS pipeline (X, HF).* NousCoder-14B - competitive coding model from Nous Research that saw a 7% LiveCodeBench accuracy jump in just 4 days of RL (X, WandB Dashboard).* Zhipu AI IPO - The makers of GLM became the first major LLM firm to go public on the HKEX, raising $558M (Announcement).* Big Co LLMs & APIs* NVIDIA Vera Rubin - Jensen Huang's CES reveal of the next-gen platform. Delivers 5x Blackwell inference performance and 75% fewer GPUs needed for MoE training (Blog).* OpenAI ChatGPT Health - A privacy-first vertical for EHR and fitness data integration (Waitlist).* Google Gmail Era - Gemini 3 integration into Gmail for 3 billion users, featuring AI Overviews and natural language inbox search (Blog).* XAI $20B Raise - Elon's XAI raises Series E at a $230B valuation, even as Grok faces heat over bikini-gate and safety guardrails (CNN Report).* Doctronic - The first US pilot in Utah for autonomous AI prescription renewals without a physician in the loop (Web).* Alexa+ Web - Amazon brings the “Smart Alexa” experience to browser-based chat (Announcement).* Autonomous Coding & Tools* Ralph Wiggum - The agentic loop technique for autonomous coding using small, atomic user stories. Ryan Carson's breakdown of why this is the death of “vibe coding” (Viral X Article).* Catnip by W&B - Chris Van Pelt's open-source iOS app to run Claude Code anywhere via GitHub Codespaces (App Store, GitHub).* Vision & Video* LTX-2 - Lightricks open-sources the first truly open audio-video generation model with synchronized output and full training code (GitHub, Replicate Demo).* Avatar Forcing - KAIST's framework for real-time interactive talking heads with ~500ms latency (Arxiv).* Qwen Edit 2512 - Optimized by PrunaAI to generate high-res realistic images in under 7 seconds (Replicate).* Voice & Audio* Nemotron Speech ASR - NVIDIA's 600M parameter streaming model with sub-100ms stable latency for massive-scale voice agents (HF). This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe

health ai google running government elon musk er blog utah chatgpt trial cnn web jump speech controversy products ios viral korean ces simpsons instant ipo gemini spectrum function nvidia launches trained delivers dit cisco peloton liquid gmail raises gpt open source announcement 5m photoshop github technically 1b tb runway amd bikini llm sora 2b tl fps hopper blackwell grok waitlist vips gpus optimized xai ehr myfitnesspal 5x cursor rl json wolfram ethernet replicate mixture 600m hf inference prd catnip arxiv ralphs series e vera rubin apple health groq bluefield ryan carson 7t dpu hbm glm loras tech report ralph wiggum hong kong stock exchange hkex lfm codespaces nvlink ltx connect x native 4k

Perspectives on: GenAI in Fashion

ISM Perspectives on...

Play Episode Listen Later Dec 16, 2025 35:00

Wie bei vielen Phänomenen unserer Gegenwart offenbaren sich auch beim Einsatz von GenAI im Kontext der Fashion-Branche sowohl Potentiale als auch Risiken: Während Branchenvertreter*innen die vielfältigen Innovations- und ökonomischen Potentiale der Technologie wittern, sehen Jurist*innen primär persönlichkeitsrechtlichen Klärungsbedarf. Fashion Models sehen wiederum ihre Existenzgrundlage bedroht. Anlass genug, damit wir uns in der aktuellen Episode "Perspectives on: GenAI in Fashion" zusammen mit dem Experten David Wenzel mit aktuellen Entwicklungen rund um generative KI im Kontext der Modebranche zu beschäftigen. Wie verändern KI-Anwendungen dort die Content Creation, welche Folgen hat dies für Kostenstrukturen, Arbeitsfelder und Konsumentenerwartungen? Ausgehend von diesen Fragen, werden wir außerdem aufzeigen, welche Kompetenzen zukünftig in diesem Arbeitsfeld relevant bleiben und welche ethischen Fragen beim Einsatz von GenAI zu berücksichtigen sind.

innovation fashion perspectives content creation einsatz anlass entwicklungen technologie kontext gegenwart genai kompetenzen jurist potentiale ausgehend künstliche intelligenz fashion models modebranche loras arbeitsfeld existenzgrundlage arbeitsfelder

GPT‑5.1's New Brain, Grok's 2M Context, Omnilingual ASR, and a Terminal UI That Sparks Joy

The top AI news from the past week, every ThursdAI

Play Episode Listen Later Nov 13, 2025 70:20

Hey, this is Alex! We're finally so back! Tons of open source releases, OpenAI updates GPT and a few breakthroughs in audio as well, makes this a very dense week! Today on the show, we covered the newly released GPT 5.1 update, a few open source releases like Terminal Bench and Project AELLA (renamed OASSAS), and Baidu's Ernie 4.5 VL that shows impressive visual understanding! Also, chatted with Paul from 11Labs and Dima Duev from the wandb SDK team, who brought us a delicious demo of LEET, our new TUI for wandb! Tons of news coverage, let's dive in

Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Oct 16, 2025 68:23

In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age of AI agents and the critical shift from supervised fine-tuning to reinforcement learning. Kyle shares his journey from leading YC's Startup School to building OpenPipe, initially focused on distilling expensive GPT-4 workflows into smaller, cheaper models before pivoting to RL-based agent training as frontier model prices plummeted. The conversation reveals why 90% of AI projects remain stuck in proof-of-concept purgatory - not due to capability limitations, but reliability issues that Kyle believes can be solved through continuous learning from real-world experience. He discusses the breakthrough of RULER (Relative Universal Reinforcement Learning Elicited Rewards), which uses LLMs as judges to rank agent behaviors relatively rather than absolutely, making RL training accessible without complex reward engineering. Kyle candidly assesses the challenges of building realistic training environments for agents, explaining why GRPO (despite its advantages) may be a dead end due to its requirement for perfectly reproducible parallel rollouts. He shares insights on why LoRAs remain underrated for production deployments, why GEPA and prompt optimization haven't lived up to the hype in his testing, and why the hardest part of deploying agents isn't the AI - it's sandboxing real-world systems with all their bugs and edge cases intact. The discussion also covers OpenPipe's acquisition by CoreWeave, the launch of their serverless reinforcement learning platform, and Kyle's vision for a future where every deployed agent continuously learns from production experience. He predicts that solving the reliability problem through continuous RL could unlock 10x more AI inference demand from projects currently stuck in development, fundamentally changing how we think about agent deployment and maintenance.Key Topics:* The rise and fall of fine-tuning as a business model* Why 90% of AI projects never reach production* RULER: Making RL accessible through relative ranking* The environment problem: Why sandboxing is harder than training* GRPO vs PPO and the future of RL algorithms* LoRAs: The underrated deployment optimization* Why GEPA and prompt optimization disappointed in practice* Building world models as synthetic training environments* The $500B Stargate bet and OpenAI's potential crypto play* Continuous learning as the path to reliable agentsReferenceshttps://www.linkedin.com/in/kcorbitt/* Aug 2023 https://openpipe.ai/blog/from-prompts-to-models * DEC 2023 https://openpipe.ai/blog/mistral-7b-fine-tune-optimized* JAN 2024 https://openpipe.ai/blog/s-lora* MAY 2024 https://openpipe.ai/blog/the-ten-commandments-of-fine-tuning-in-prod * Oct 2024 https://openpipe.ai/blog/announcing-dpo-support * AIE NYC 2025 Finetuning 500m agents * AIEWF 2025 How to train your agent (ART-E) * SEPT 2025 ACQUISTION https://openpipe.ai/blog/openpipe-coreweave * W&B Serverless RL https://openpipe.ai/blog/serverless-rl?refresh=1760042248153Full Video EpisodeTimestamps00:00 Introductions03:15 The Evolution of OpenPipe: From SFT to RL07:49 The Mistral Era and LoRA Adapters11:40 When You Actually Need Fine-Tuning14:43 The Pivot to Reinforcement Learning21:29 GRPO vs PPO: The Technical Trade-offs24:02 The Environment Problem in RL35:52 JAPA and Automated Prompt Optimization44:35 Open vs Closed Models: The Token Economics50:38 Ruler: Self-Supervised RL Rewards57:09 World Models as Environment Solutions1:00:15 CoreWeave Acquisition and Future Vision Get full access to Latent.Space at www.latent.space/subscribe

ceo ai space building evolution open pivot openai continuous gpt yc fine tuning rl latent japa ppo coreweave corbitt loras startup school

Why Fine-Tuning Lost and RL Won

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Oct 16, 2025

ceo ai lost building openai continuous gpt yc fine tuning rl japa ppo coreweave loras startup school

Weighin' in with Worm: Season 3 Episode 6: Loras Head Mens Wrestling Coach Devin Bahr

Dubuque Area Baseball Podcast

Play Episode Listen Later Sep 24, 2025 32:59

What's going on guys?! Today we have Devin Bahr on the podcast! We sit down with Devin to discuss his new role as the Head Wrestling Coach at Loras Wrestling on the Mens side, we discuss the massive recruiting year they had this off season and what to look ahead for the Duhawks!

head coach wrestling mens worm bahr loras weighin duhawks

ComfyUI Explained: How AI Image Generation Actually Works (Step-by-Step)

VP Land

Play Episode Listen Later Jul 15, 2025 40:01 Transcription Available

Curious how AI actually turns text into images? In this episode, Addy and Joey break down the inner workings of AI image generation and explore ComfyUI. We'll explain the core concepts of latent space, diffusion models, and how noise becomes a recognizable image. From basic text-to-image workflows to advanced techniques with LoRAs and image-to-image transformations, discover when to use ComfyUI versus web-based tools like Runway or ChatGPT for your creative projects.---The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.

ai land chatgpt curious step by step runway actually works image generation loras

MLA 026 AI Video Generation: Veo 3 vs Sora, Kling, Runway, Stable Video Diffusion

Machine Learning Guide

Play Episode Listen Later Jul 12, 2025 40:39

Google Veo leads the generative video market with superior 4K photorealism and integrated audio, an advantage derived from its YouTube training data. OpenAI Sora is the top tool for narrative storytelling, while Kuaishou Kling excels at animating static images with realistic, high-speed motion. Links Notes and resources at ocdevel.com/mlg/mla-26 Try a walking desk - stay healthy & sharp while you learn & code Build the future of multi-agent software with AGNTCY. S-Tier: Google Veo The market leader due to superior visual quality, physics simulation, 4K resolution, and integrated audio generation, which removes post-production steps. It accurately interprets cinematic prompts ("timelapse," "aerial shots"). Its primary advantage is its integration with Google products, using YouTube's vast video library for rapid model improvement. The professional focus is clear with its filmmaking tool, "Flow." A-Tier: Sora & Kling OpenAI Sora: Excels at interpreting complex narrative prompts and has wide distribution through ChatGPT. Features include in-video editing tools like "Remix" and a "Storyboard" function for multi-shot scenes. Its main limits are 1080p resolution and no native audio. Kuaishou Kling: A leader in image-to-video quality and realistic high-speed motion. It maintains character consistency and has proven commercial viability (RMB 150M in Q1 2025). Its text-to-video interface is less intuitive than Sora's. Summary: Sora is best for storytellers starting with a narrative idea; Kling is best for artists animating a specific image. Control and Customization: Runway & Stable Diffusion Runway: An integrated creative suite with a full video editor and "AI Magic Tools" like Motion Brush and Director Mode. Its value is in generating, editing, and finishing in one platform, offering precise control over stylization and in-shot object alteration. Stable Diffusion: An open-source ecosystem (SVD, AnimateDiff) offering maximum control through technical interfaces like ComfyUI. Its strength is a large community developing custom models, LoRAs, and ControlNets for specific tasks like VFX integration. It has a steep learning curve. Niche Tools: Midjourney & More Midjourney Video: The best tool for animating static Midjourney images (image-to-video only), preserving their unique aesthetic. Avatar Platforms (HeyGen, Synthesia): Built for scalable corporate and marketing videos, featuring realistic talking avatars, voice cloning, and multi-language translation with accurate lip-sync. Head-to-Head Comparison Feature Google Veo (S-Tier) OpenAI Sora (A-Tier) Kuaishou Kling (A-Tier) Runway (Power-User Tier) Photorealism Winner. Best 4K detail and physics. Excellent, but can have a stylistic "AI" look. Very strong, especially with human subjects. Good, but a step below the top tier. Consistency Strong, especially with Flow's scene-building. Co-Winner. Storyboard feature is built for this. Co-Winner. Excels in image-to-video consistency. Good, with character reference tools. Prompt Adherence Winner (Language). Best understanding of cinematic terms. Best for imaginative/narrative prompts. Strong on motion, less on camera specifics. Good, but relies more on UI tools. Directorial Control Strong via prompt. Moderate, via prompt and storyboard. Moderate, focused on motion. Winner (Interface). Motion Brush & Director Mode offer direct control. Integrated Audio Winner. Native dialogue, SFX, and music. Major workflow advantage. No. Requires post-production. No. Requires post-production. No. Requires post-production. Advanced Multi-Tool Workflows High-Quality Animation: Combine Midjourney (for key-frame art) with Kling or Runway (for motion), then use an AI upscaler like Topaz for 4K finishing. VFX Compositing: Use Stable Diffusion (AnimateDiff/ControlNets) to generate specific elements for integration into live-action footage using professional software like Nuke or After Effects. All-in-one models lack the required layer-based control. High-Volume Marketing: Use Veo for the main concept, Runway for creating dozens of variations, and HeyGen for personalized avatar messaging to achieve speed and scale. Decision Matrix: Who Should Use What? User Profile Primary Goal Recommendation Justification The Indie Filmmaker Pre-visualization, short films. OpenAI Sora (Primary), Google Veo (Secondary) Sora's storyboard feature is best for narrative construction. Veo is best for high-quality final shots. The VFX Artist Creating animated elements for live-action. Stable Diffusion (AnimateDiff/ComfyUI) Offers the layer-based control and pipeline integration needed for professional VFX. The Creative Agency Rapid prototyping, social content. Runway (Primary Suite), Google Veo (For Hero Shots) Runway's editing/variation tools are built for agency speed. Veo provides the highest quality for the main asset. The AI Artist / Animator Art-directed animated pieces. Midjourney + Kling Pairs the best image generator with a top-tier motion engine for maximum aesthetic control. The Corporate Trainer Training and personalized marketing videos. HeyGen / Synthesia Specialized tools for avatar-based video production at scale (voice cloning, translation). Future Trajectory Pipeline Collapse: More models will integrate audio and editing, pressuring silent-only video generators. The Control Arms Race: Competition will shift from quality to providing more sophisticated directorial tools. Rise of Aggregators: Platforms like OpenArt that provide access to multiple models through a single interface will become essential.

The Potential of AI Video with Seth Woodall

The Marketing Architects

Play Episode Listen Later Mar 25, 2025 37:50

AI video tools have created lifelike Coca-Cola trucks, festive scenes, and even Angela's face in today's episode. These jaw-dropping demonstrations highlight how rapidly AI video has evolved and the debate this technology is creating.This week, Elena and Rob are joined by Seth Woodall, Director of AI Visual Arts at Misfits & Machines and creator of the viral "Pepperoni Hug Spot" commercial. Together, they explore the state of AI video creation, misconceptions about the technology, and what's next for marketers looking to incorporate these tools. Plus, discover why AI-powered creative techniques will become standard production methods and indistinguishable from traditionally created work.Topics covered: [01:00] The controversy behind Coca-Cola's AI holiday commercial[07:00] How Seth created the viral "Pepperoni Hug Spot" AI commercial[15:45] Why AI video expertise is hard to find in traditional agencies[18:00] Predictions for AI video quality improvements[21:30] AI tools like Comfy UI, Flux and LoRAs[26:00] Common misconceptions about how AI video is created[29:30] The democratization of high-quality video creation To learn more, visit marketingarchitects.com/podcast or subscribe to our newsletter at marketingarchitects.com/newsletter. Resources: NBC News 2024 Article: https://www.nbcnews.com/tech/innovation/coca-cola-causes-controversy-ai-made-ad-rcna180665Pepperoni Hug Spot Commercial: https://www.youtube.com/watch?v=qSewd6Iaj6I Get more research-backed marketing strategies by subscribing to The Marketing Architects on Apple Podcasts, Spotify, or wherever you listen to podcasts.

director spotify ai marketing technology video predictions creative creativity advertising coca cola machines marketers misfits flux video marketing woodall commerical loras

Weighin' in with Worm Episode 24: Regionals, Conference, and Key City Showdown..Oh My!

Dubuque Area Baseball Podcast

Play Episode Listen Later Feb 5, 2025 32:59

Today we cover the regional tournaments for the girls, the seeds of the state tournament, and the Key City Showdown between Loras and UD.

conference showdown worm ud regionals loras weighin

Inside the DuHawks Dugout: Coach Carl Tebon Previews the 2025 Season

Dubuque Area Baseball Podcast

Play Episode Listen Later Jan 29, 2025 25:18

On this episode of the Dubuque Area Sports Podcast, Everybody's Favorite Coach, Coach Maneman, sits down with longtime Loras College Head Baseball Coach Carl Tebon as he enters his 31st season at the helm. Coach Tebon recaps last season, previews the 2025 squad, and gives insight into the team's strengths, key returners, and exciting newcomers. He also breaks down position battles, the pitching rotation, and local talent making an impact at Loras. Plus, get the inside scoop on the DuHawks' upcoming season opener against Iowa and their home opener on March 28. If you're a baseball fan or a player looking for the right college fit, this episode is a must-listen!

iowa dugout loras coach carl duhawks

AI Engineering for Art — with comfyanonymous, of ComfyUI

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jan 4, 2025 55:04

Applications for the NYC AI Engineer Summit, focused on Agents at Work, are open!When we first started Latent Space, in the lightning round we'd always ask guests: “What's your favorite AI product?”. The majority would say Midjourney. The simple UI of prompt → very aesthetic image turned it into a $300M+ ARR bootstrapped business as it rode the first wave of AI image generation.In open source land, StableDiffusion was congregating around AUTOMATIC1111 as the de-facto web UI. Unlike Midjourney, which offered some flags but was mostly prompt-driven, A1111 let users play with a lot more parameters, supported additional modalities like img2img, and allowed users to load in custom models. If you're interested in some of the SD history, you can look at our episodes with Lexica, Replicate, and Playground.One of the people involved with that community was comfyanonymous, who was also part of the Stability team in 2023, decided to build an alternative called ComfyUI, now one of the fastest growing open source projects in generative images, and is now the preferred partner for folks like Black Forest Labs's Flux Tools on Day 1. The idea behind it was simple: “Everyone is trying to make easy to use interfaces. Let me try to make a powerful interface that's not easy to use.”Unlike its predecessors, ComfyUI does not have an input text box. Everything is based around the idea of a node: there's a text input node, a CLIP node, a checkpoint loader node, a KSampler node, a VAE node, etc. While daunting for simple image generation, the tool is amazing for more complex workflows since you can break down every step of the process, and then chain many of them together rather than manually switching between tools. You can also re-start execution halfway instead of from the beginning, which can save a lot of time when using larger models.To give you an idea of some of the new use cases that this type of UI enables:* Sketch something → Generate an image with SD from sketch → feed it into SD Video to animate* Generate an image of an object → Turn into a 3D asset → Feed into interactive experiences* Input audio → Generate audio-reactive videosTheir Examples page also includes some of the more common use cases like AnimateDiff, etc. They recently launched the Comfy Registry, an online library of different nodes that users can pull from rather than having to build everything from scratch. The project has >60,000 Github stars, and as the community grows, some of the projects that people build have gotten quite complex:The most interesting thing about Comfy is that it's not a UI, it's a runtime. You can build full applications on top of image models simply by using Comfy. You can expose Comfy workflows as an endpoint and chain them together just like you chain a single node. We're seeing the rise of AI Engineering applied to art.Major Tom's ComfyUI Resources from the Latent Space DiscordMajor shoutouts to Major Tom on the LS Discord who is a image generation expert, who offered these pointers:* “best thing about comfy is the fact it supports almost immediately every new thing that comes out - unlike A1111 or forge, which still don't support flux cnet for instance. It will be perfect tool when conflicting nodes will be resolved”* AP Workflows from Alessandro Perili are a nice example of an all-in-one train-evaluate-generate system built atop Comfy* ComfyUI YouTubers to learn from:* @sebastiankamph* @NerdyRodent* @OlivioSarikas* @sedetweiler* @pixaroma* ComfyUI Nodes to check out:* https://github.com/kijai/ComfyUI-IC-Light* https://github.com/MrForExample/ComfyUI-3D-Pack* https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait* https://github.com/pydn/ComfyUI-to-Python-Extension* https://github.com/THtianhao/ComfyUI-Portrait-Maker* https://github.com/ssitu/ComfyUI_NestedNodeBuilder* https://github.com/longgui0318/comfyui-magic-clothing* https://github.com/atmaranto/ComfyUI-SaveAsScript* https://github.com/ZHO-ZHO-ZHO/ComfyUI-InstantID* https://github.com/AIFSH/ComfyUI-FishSpeech* https://github.com/coolzilj/ComfyUI-Photopea* https://github.com/lks-ai/anynode* Sarav: https://www.youtube.com/@mickmumpitz/videos ( applied stuff )* Sarav: https://www.youtube.com/@latentvision (technical, but infrequent)* look for comfyui node for https://github.com/magic-quill/MagicQuill* “Comfy for Video” resources* Kijai (https://github.com/kijai) pushing out support for Mochi, CogVideoX, AnimateDif, LivePortrait etc* Comfyui node support like LTX https://github.com/Lightricks/ComfyUI-LTXVideo , and HunyuanVideo* FloraFauna AI* Communities: https://www.reddit.com/r/StableDiffusion/, https://www.reddit.com/r/comfyui/Full YouTube EpisodeAs usual, you can find the full video episode on our YouTube (and don't forget to like and subscribe!)Timestamps* 00:00:04 Introduction of hosts and anonymous guest* 00:00:35 Origins of Comfy UI and early Stable Diffusion landscape* 00:02:58 Comfy's background and development of high-res fix* 00:05:37 Area conditioning and compositing in image generation* 00:07:20 Discussion on different AI image models (SD, Flux, etc.)* 00:11:10 Closed source model APIs and community discussions on SD versions* 00:14:41 LoRAs and textual inversion in image generation* 00:18:43 Evaluation methods in the Comfy community* 00:20:05 CLIP models and text encoders in image generation* 00:23:05 Prompt weighting and negative prompting* 00:26:22 Comfy UI's unique features and design choices* 00:31:00 Memory management in Comfy UI* 00:33:50 GPU market share and compatibility issues* 00:35:40 Node design and parameter settings in Comfy UI* 00:38:44 Custom nodes and community contributions* 00:41:40 Video generation models and capabilities* 00:44:47 Comfy UI's development timeline and rise to popularity* 00:48:13 Current state of Comfy UI team and future plans* 00:50:11 Discussion on other Comfy startups and potential text generation supportTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.swyx [00:00:12]: Hey everyone, we are in the Chroma Studio again, but with our first ever anonymous guest, Comfy Anonymous, welcome.Comfy [00:00:19]: Hello.swyx [00:00:21]: I feel like that's your full name, you just go by Comfy, right?Comfy [00:00:24]: Yeah, well, a lot of people just call me Comfy, even when they know my real name. Hey, Comfy.Alessio [00:00:32]: Swyx is the same. You know, not a lot of people call you Shawn.swyx [00:00:35]: Yeah, you have a professional name, right, that people know you by, and then you have a legal name. Yeah, it's fine. How do I phrase this? I think people who are in the know, know that Comfy is like the tool for image generation and now other multimodality stuff. I would say that when I first got started with Stable Diffusion, the star of the show was Automatic 111, right? And I actually looked back at my notes from 2022-ish, like Comfy was already getting started back then, but it was kind of like the up and comer, and your main feature was the flowchart. Can you just kind of rewind to that moment, that year and like, you know, how you looked at the landscape there and decided to start Comfy?Comfy [00:01:10]: Yeah, I discovered Stable Diffusion in 2022, in October 2022. And, well, I kind of started playing around with it. Yes, I, and back then I was using Automatic, which was what everyone was using back then. And so I started with that because I had, it was when I started, I had no idea like how Diffusion works. I didn't know how Diffusion models work, how any of this works, so.swyx [00:01:36]: Oh, yeah. What was your prior background as an engineer?Comfy [00:01:39]: Just a software engineer. Yeah. Boring software engineer.swyx [00:01:44]: But like any, any image stuff, any orchestration, distributed systems, GPUs?Comfy [00:01:49]: No, I was doing basically nothing interesting. Crud, web development? Yeah, a lot of web development, just, yeah, some basic, maybe some basic like automation stuff. Okay. Just. Yeah, no, like, no big companies or anything.swyx [00:02:08]: Yeah, but like already some interest in automations, probably a lot of Python.Comfy [00:02:12]: Yeah, yeah, of course, Python. But I wasn't actually used to like the Node graph interface before I started Comfy UI. It was just, I just thought it was like, oh, like, what's the best way to represent the Diffusion process in the user interface? And then like, oh, well. Well, like, naturally, oh, this is the best way I've found. And this was like with the Node interface. So how I got started was, yeah, so basic October 2022, just like I hadn't written a line of PyTorch before that. So it's completely new. What happened was I kind of got addicted to generating images.Alessio [00:02:58]: As we all did. Yeah.Comfy [00:03:00]: And then I started. I started experimenting with like the high-res fixed in auto, which was for those that don't know, the high-res fix is just since the Diffusion models back then could only generate that low-resolution. So what you would do, you would generate low-resolution image, then upscale, then refine it again. And that was kind of the hack to generate high-resolution images. I really liked generating. Like higher resolution images. So I was experimenting with that. And so I modified the code a bit. Okay. What happens if I, if I use different samplers on the second pass, I was edited the code of auto. So what happens if I use a different sampler? What happens if I use a different, like a different settings, different number of steps? And because back then the. The high-res fix was very basic, just, so. Yeah.swyx [00:04:05]: Now there's a whole library of just, uh, the upsamplers.Comfy [00:04:08]: I think, I think they added a bunch of, uh, of options to the high-res fix since, uh, since, since then. But before that was just so basic. So I wanted to go further. I wanted to try it. What happens if I use a different model for the second, the second pass? And then, well, then the auto code base was, wasn't good enough for. Like, it would have been, uh, harder to implement that in the auto interface than to create my own interface. So that's when I decided to create my own. And you were doing that mostly on your own when you started, or did you already have kind of like a subgroup of people? No, I was, uh, on my own because, because it was just me experimenting with stuff. So yeah, that was it. Then, so I started writing the code January one. 2023, and then I released the first version on GitHub, January 16th, 2023. That's how things got started.Alessio [00:05:11]: And what's, what's the name? Comfy UI right away or? Yeah.Comfy [00:05:14]: Comfy UI. The reason the name, my name is Comfy is people thought my pictures were comfy, so I just, uh, just named it, uh, uh, it's my Comfy UI. So yeah, that's, uh,swyx [00:05:27]: Is there a particular segment of the community that you targeted as users? Like more intensive workflow artists, you know, compared to the automatic crowd or, you know,Comfy [00:05:37]: This was my way of like experimenting with, uh, with new things, like the high risk fixed thing I mentioned, which was like in Comfy, the first thing you could easily do was just chain different models together. And then one of the first things, I think the first times it got a bit of popularity was when I started experimenting with the different, like applying. Prompts to different areas of the image. Yeah. I called it area conditioning, posted it on Reddit and it got a bunch of upvotes. So I think that's when, like, when people first learned of Comfy UI.swyx [00:06:17]: Is that mostly like fixing hands?Comfy [00:06:19]: Uh, no, no, no. That was just, uh, like, let's say, well, it was very, well, it still is kind of difficult to like, let's say you want a mountain, you have an image and then, okay. I'm like, okay. I want the mountain here and I want the, like a, a Fox here.swyx [00:06:37]: Yeah. So compositing the image. Yeah.Comfy [00:06:40]: My way was very easy. It was just like, oh, when you run the diffusion process, you kind of generate, okay. You do pass one pass through the diffusion, every step you do one pass. Okay. This place of the image with this brand, this space, place of the image with the other prop. And then. The entire image with another prop and then just average everything together, every step, and that was, uh, area composition, which I call it. And then, then a month later, there was a paper that came out called multi diffusion, which was the same thing, but yeah, that's, uh,Alessio [00:07:20]: could you do area composition with different models or because you're averaging out, you kind of need the same model.Comfy [00:07:26]: Could do it with, but yeah, I hadn't implemented it. For different models, but, uh, you, you can do it with, uh, with different models if you want, as long as the models share the same latent space, like we, we're supposed to ring a bell every time someone says, yeah, like, for example, you couldn't use like Excel and SD 1.5, because those have a different latent space, but like, uh, yeah, like SD 1.5 models, different ones. You could, you could do that.swyx [00:07:59]: There's some models that try to work in pixel space, right?Comfy [00:08:03]: Yeah. They're very slow. Of course. That's the problem. That that's the, the reason why stable diffusion actually became like popular, like, cause was because of the latent space.swyx [00:08:14]: Small and yeah. Because it used to be latent diffusion models and then they trained it up.Comfy [00:08:19]: Yeah. Cause a pixel pixel diffusion models are just too slow. So. Yeah.swyx [00:08:25]: Have you ever tried to talk to like, like stability, the latent diffusion guys, like, you know, Robin Rombach, that, that crew. Yeah.Comfy [00:08:32]: Well, I used to work at stability.swyx [00:08:34]: Oh, I actually didn't know. Yeah.Comfy [00:08:35]: I used to work at stability. I got, uh, I got hired, uh, in June, 2023.swyx [00:08:42]: Ah, that's the part of the story I didn't know about. Okay. Yeah.Comfy [00:08:46]: So the, the reason I was hired is because they were doing, uh, SDXL at the time and they were basically SDXL. I don't know if you remember it was a base model and then a refiner model. Basically they wanted to experiment, like chaining them together. And then, uh, they saw, oh, right. Oh, this, we can use this to do that. Well, let's hire that guy.swyx [00:09:10]: But they didn't, they didn't pursue it for like SD3. What do you mean? Like the SDXL approach. Yeah.Comfy [00:09:16]: The reason for that approach was because basically they had two models and then they wanted to publish both of them. So they, they trained one on. Lower time steps, which was the refiner model. And then they, the first one was trained normally. And then they went during their test, they realized, oh, like if we string these models together are like quality increases. So let's publish that. It worked. Yeah. But like right now, I don't think many people actually use the refiner anymore, even though it is actually a full diffusion model. Like you can use it on its own. And it's going to generate images. I don't think anyone, people have mostly forgotten about it. But, uh.Alessio [00:10:05]: Can we talk about models a little bit? So stable diffusion, obviously is the most known. I know flux has gotten a lot of traction. Are there any underrated models that people should use more or what's the state of the union?Comfy [00:10:17]: Well, the, the latest, uh, state of the art, at least, yeah, for images there's, uh, yeah, there's flux. There's also SD3.5. SD3.5 is two models. There's a, there's a small one, 2.5B and there's the bigger one, 8B. So it's, it's smaller than flux. So, and it's more, uh, creative in a way, but flux, yeah, flux is the best. People should give SD3.5 a try cause it's, uh, it's different. I won't say it's better. Well, it's better for some like specific use cases. Right. If you want some to make something more like creative, maybe SD3.5. If you want to make something more consistent and flux is probably better.swyx [00:11:06]: Do you ever consider supporting the closed source model APIs?Comfy [00:11:10]: Uh, well, they, we do support them as custom nodes. We actually have some, uh, official custom nodes from, uh, different. Ideogram.swyx [00:11:20]: Yeah. I guess DALI would have one. Yeah.Comfy [00:11:23]: That's, uh, it's just not, I'm not the person that handles that. Sure.swyx [00:11:28]: Sure. Quick question on, on SD. There's a lot of community discussion about the transition from SD1.5 to SD2 and then SD2 to SD3. People still like, you know, very loyal to the previous generations of SDs?Comfy [00:11:41]: Uh, yeah. SD1.5 then still has a lot of, a lot of users.swyx [00:11:46]: The last based model.Comfy [00:11:49]: Yeah. Then SD2 was mostly ignored. It wasn't, uh, it wasn't a big enough improvement over the previous one. Okay.swyx [00:11:58]: So SD1.5, SD3, flux and whatever else. SDXL. SDXL.Comfy [00:12:03]: That's the main one. Stable cascade. Stable cascade. That was a good model. But, uh, that's, uh, the problem with that one is, uh, it got, uh, like SD3 was announced one week after. Yeah.swyx [00:12:16]: It was like a weird release. Uh, what was it like inside of stability actually? I mean, statute of limitations. Yeah. The statute of limitations expired. You know, management has moved. So it's easier to talk about now. Yeah.Comfy [00:12:27]: And inside stability, actually that model was ready, uh, like three months before, but it got, uh, stuck in, uh, red teaming. So basically the product, if that model had released or was supposed to be released by the authors, then it would probably have gotten very popular since it's a, it's a step up from SDXL. But it got all of its momentum stolen. It got stolen by the SD3 announcement. So people kind of didn't develop anything on top of it, even though it's, uh, yeah. It was a good model, at least, uh, completely mostly ignored for some reason. Likeswyx [00:13:07]: I think the naming as well matters. It seemed like a branch off of the main, main tree of development. Yeah.Comfy [00:13:15]: Well, it was different researchers that did it. Yeah. Yeah. Very like, uh, good model. Like it's the Worcestershire authors. I don't know if I'm pronouncing it correctly. Yeah. Yeah. Yeah.swyx [00:13:28]: I actually met them in Vienna. Yeah.Comfy [00:13:30]: They worked at stability for a bit and they left right after the Cascade release.swyx [00:13:35]: This is Dustin, right? No. Uh, Dustin's SD3. Yeah.Comfy [00:13:38]: Dustin is a SD3 SDXL. That's, uh, Pablo and Dome. I think I'm pronouncing his name correctly. Yeah. Yeah. Yeah. Yeah. That's very good.swyx [00:13:51]: It seems like the community is very, they move very quickly. Yeah. Like when there's a new model out, they just drop whatever the current one is. And they just all move wholesale over. Like they don't really stay to explore the full capabilities. Like if, if the stable cascade was that good, they would have AB tested a bit more. Instead they're like, okay, SD3 is out. Let's go. You know?Comfy [00:14:11]: Well, I find the opposite actually. The community doesn't like, they only jump on a new model when there's a significant improvement. Like if there's a, only like a incremental improvement, which is what, uh, most of these models are going to have, especially if you, cause, uh, stay the same parameter count. Yeah. Like you're not going to get a massive improvement, uh, into like, unless there's something big that, that changes. So, uh. Yeah.swyx [00:14:41]: And how are they evaluating these improvements? Like, um, because there's, it's a whole chain of, you know, comfy workflows. Yeah. How does, how does one part of the chain actually affect the whole process?Comfy [00:14:52]: Are you talking on the model side specific?swyx [00:14:54]: Model specific, right? But like once you have your whole workflow based on a model, it's very hard to move.Comfy [00:15:01]: Uh, not, well, not really. Well, it depends on your, uh, depends on their specific kind of the workflow. Yeah.swyx [00:15:09]: So I do a lot of like text and image. Yeah.Comfy [00:15:12]: When you do change, like most workflows are kind of going to be complete. Yeah. It's just like, you might have to completely change your prompt completely change. Okay.swyx [00:15:24]: Well, I mean, then maybe the question is really about evals. Like what does the comfy community do for evals? Just, you know,Comfy [00:15:31]: Well, that they don't really do that. It's more like, oh, I think this image is nice. So that's, uh,swyx [00:15:38]: They just subscribe to Fofr AI and just see like, you know, what Fofr is doing. Yeah.Comfy [00:15:43]: Well, they just, they just generate like it. Like, I don't see anyone really doing it. Like, uh, at least on the comfy side, comfy users, they, it's more like, oh, generate images and see, oh, this one's nice. It's like, yeah, it's not, uh, like the, the more, uh, like, uh, scientific, uh, like, uh, like checking that's more on specifically on like model side. If, uh, yeah, but there is a lot of, uh, vibes also, cause it is a like, uh, artistic, uh, you can create a very good model that doesn't generate nice images. Cause most images on the internet are ugly. So if you, if that's like, if you just, oh, I have the best model at 10th giant, it's super smart. I created on all the, like I've trained on just all the images on the internet. The images are not going to look good. So yeah.Alessio [00:16:42]: Yeah.Comfy [00:16:43]: They're going to be very consistent. But yeah. People like, it's not going to be like the, the look that people are going to be expecting from, uh, from a model. So. Yeah.swyx [00:16:54]: Can we talk about LoRa's? Cause we thought we talked about models then like the next step is probably LoRa's. Before, I actually, I'm kind of curious how LoRa's entered the tool set of the image community because the LoRa paper was 2021. And then like, there was like other methods like textual inversion that was popular at the early SD stage. Yeah.Comfy [00:17:13]: I can't even explain the difference between that. Yeah. Textual inversions. That's basically what you're doing is you're, you're training a, cause well, yeah. Stable diffusion. You have the diffusion model, you have text encoder. So basically what you're doing is training a vector that you're going to pass to the text encoder. It's basically you're training a new word. Yeah.swyx [00:17:37]: It's a little bit like representation engineering now. Yeah.Comfy [00:17:40]: Yeah. Basically. Yeah. You're just, so yeah, if you know how like the text encoder works, basically you have, you take your, your words of your product, you convert those into tokens with the tokenizer and those are converted into vectors. Basically. Yeah. Each token represents a different vector. So each word presents a vector. And those, depending on your words, that's the list of vectors that get passed to the text encoder, which is just. Yeah. Yeah. I'm just a stack of, of attention. Like basically it's a very close to LLM architecture. Yeah. Yeah. So basically what you're doing is just training a new vector. We're saying, well, I have all these images and I want to know which word does that represent? And it's going to get like, you train this vector and then, and then when you use this vector, it hopefully generates. Like something similar to your images. Yeah.swyx [00:18:43]: I would say it's like surprisingly sample efficient in picking up the concept that you're trying to train it on. Yeah.Comfy [00:18:48]: Well, people have kind of stopped doing that even though back as like when I was at Stability, we, we actually did train internally some like textual versions on like T5 XXL actually worked pretty well. But for some reason, yeah, people don't use them. And also they might also work like, like, yeah, this is something and probably have to test, but maybe if you train a textual version, like on T5 XXL, it might also work with all the other models that use T5 XXL because same thing with like, like the textual inversions that, that were trained for SD 1.5, they also kind of work on SDXL because SDXL has the, has two text encoders. And one of them is the same as the, as the SD 1.5 CLIP-L. So those, they actually would, they don't work as strongly because they're only applied to one of the text encoders. But, and the same thing for SD3. SD3 has three text encoders. So it works. It's still, you can still use your textual version SD 1.5 on SD3, but it's just a lot weaker because now there's three text encoders. So it gets even more diluted. Yeah.swyx [00:20:05]: Do people experiment a lot on, just on the CLIP side, there's like Siglip, there's Blip, like do people experiment a lot on those?Comfy [00:20:12]: You can't really replace. Yeah.swyx [00:20:14]: Because they're trained together, right? Yeah.Comfy [00:20:15]: They're trained together. So you can't like, well, what I've seen people experimenting with is a long CLIP. So basically someone fine tuned the CLIP model to accept longer prompts.swyx [00:20:27]: Oh, it's kind of like long context fine tuning. Yeah.Comfy [00:20:31]: So, so like it's, it's actually supported in Core Comfy.swyx [00:20:35]: How long is long?Comfy [00:20:36]: Regular CLIP is 77 tokens. Yeah. Long CLIP is 256. Okay. So, but the hack that like you've, if you use stable diffusion 1.5, you've probably noticed, oh, it still works if I, if I use long prompts, prompts longer than 77 words. Well, that's because the hack is to just, well, you split, you split it up in chugs of 77, your whole big prompt. Let's say you, you give it like the massive text, like the Bible or something, and it would split it up in chugs of 77 and then just pass each one through the CLIP and then just cut anything together at the end. It's not ideal, but it actually works.swyx [00:21:26]: Like the positioning of the words really, really matters then, right? Like this is why order matters in prompts. Yeah.Comfy [00:21:33]: Yeah. Like it, it works, but it's, it's not ideal, but it's what people expect. Like if, if someone gives a huge prompt, they expect at least some of the concepts at the end to be like present in the image. But usually when they give long prompts, they, they don't, they like, they don't expect like detail, I think. So that's why it works very well.swyx [00:21:58]: And while we're on this topic, prompts waiting, negative comments. Negative prompting all, all sort of similar part of this layer of the stack. Yeah.Comfy [00:22:05]: The, the hack for that, which works on CLIP, like it, basically it's just for SD 1.5, well, for SD 1.5, the prompt waiting works well because CLIP L is a, is not a very deep model. So you have a very high correlation between, you have the input token, the index of the input token vector. And the output token, they're very, the concepts are very close, closely linked. So that means if you interpolate the vector from what, well, the, the way Comfy UI does it is it has, okay, you have the vector, you have an empty prompt. So you have a, a chunk, like a CLIP output for the empty prompt, and then you have the one for your prompt. And then it interpolates from that, depending on your prompt. Yeah.Comfy [00:23:07]: So that's how it, how it does prompt waiting. But this stops working the deeper your text encoder is. So on T5X itself, it doesn't work at all. So. Wow.swyx [00:23:20]: Is that a problem for people? I mean, cause I'm used to just move, moving up numbers. Probably not. Yeah.Comfy [00:23:25]: Well.swyx [00:23:26]: So you just use words to describe, right? Cause it's a bigger language model. Yeah.Comfy [00:23:30]: Yeah. So. Yeah. So honestly it might be good, but I haven't seen many complaints on Flux that it's not working. So, cause I guess people can sort of get around it with, with language. So. Yeah.swyx [00:23:46]: Yeah. And then coming back to LoRa's, now the, the popular way to, to customize models is LoRa's. And I saw you also support Locon and LoHa, which I've never heard of before.Comfy [00:23:56]: There's a bunch of, cause what, what the LoRa is essentially is. Instead of like, okay, you have your, your model and then you want to fine tune it. So instead of like, what you could do is you could fine tune the entire thing, but that's a bit heavy. So to speed things up and make things less heavy, what you can do is just fine tune some smaller weights, like basically two, two matrices that when you multiply like two low rank matrices and when you multiply them together, gives a, represents a difference between trained weights and your base weights. So by training those two smaller matrices, that's a lot less heavy. Yeah.Alessio [00:24:45]: And they're portable. So you're going to share them. Yeah. It's like easier. And also smaller.Comfy [00:24:49]: Yeah. That's the, how LoRa's work. So basically, so when, when inferencing you, you get an inference with them pretty efficiently, like how ComputeWrite does it. It just, when you use a LoRa, it just applies it straight on the weights so that there's only a small delay at the base, like before the sampling to when it applies the weights and then it just same speed as, as before. So for, for inference, it's, it's not that bad, but, and then you have, so basically all the LoRa types like LoHa, LoCon, everything, that's just different ways of representing that like. Basically, you can call it kind of like compression, even though it's not really compression, it's just different ways of represented, like just, okay, I want to train a different on the difference on the weights. What's the best way to represent that difference? There's the basic LoRa, which is just, oh, let's multiply these two matrices together. And then there's all the other ones, which are all different algorithms. So. Yeah.Alessio [00:25:57]: So let's talk about LoRa. Let's talk about what comfy UI actually is. I think most people have heard of it. Some people might've seen screenshots. I think fewer people have built very complex workflows. So when you started, automatic was like the super simple way. What were some of the choices that you made? So the node workflow, is there anything else that stands out as like, this was like a unique take on how to do image generation workflows?Comfy [00:26:22]: Well, I feel like, yeah, back then everyone was trying to make like easy to use interface. Yeah. So I'm like, well, everyone's trying to make an easy to use interface.swyx [00:26:32]: Let's make a hard to use interface.Comfy [00:26:37]: Like, so like, I like, I don't need to do that, everyone else doing it. So let me try something like, let me try to make a powerful interface that's not easy to use. So.swyx [00:26:52]: So like, yeah, there's a sort of node execution engine. Yeah. Yeah. And it actually lists, it has this really good list of features of things you prioritize, right? Like let me see, like sort of re-executing from, from any parts of the workflow that was changed, asynchronous queue system, smart memory management, like all this seems like a lot of engineering that. Yeah.Comfy [00:27:12]: There's a lot of engineering in the back end to make things, cause I was always focused on making things work locally very well. Cause that's cause I was using it locally. So everything. So there's a lot of, a lot of thought and working by getting everything to run as well as possible. So yeah. ConfUI is actually more of a back end, at least, well, not all the front ends getting a lot more development, but, but before, before it was, I was pretty much only focused on the backend. Yeah.swyx [00:27:50]: So v0.1 was only August this year. Yeah.Comfy [00:27:54]: With the new front end. Before there was no versioning. So yeah. Yeah. Yeah.swyx [00:27:57]: And so what was the big rewrite for the 0.1 and then the 1.0?Comfy [00:28:02]: Well, that's more on the front end side. That's cause before that it was just like the UI, what, cause when I first wrote it, I just, I said, okay, how can I make, like, I can do web development, but I don't like doing it. Like what's the easiest way I can slap a node interface on this. And then I found this library. Yeah. Like JavaScript library.swyx [00:28:26]: Live graph?Comfy [00:28:27]: Live graph.swyx [00:28:28]: Usually people will go for like react flow for like a flow builder. Yeah.Comfy [00:28:31]: But that seems like too complicated. So I didn't really want to spend time like developing the front end. So I'm like, well, oh, light graph. This has the whole node interface. So, okay. Let me just plug that into, to my backend.swyx [00:28:49]: I feel like if Streamlit or Gradio offered something that you would have used Streamlit or Gradio cause it's Python. Yeah.Comfy [00:28:54]: Yeah. Yeah. Yeah.Comfy [00:29:00]: Yeah.Comfy [00:29:14]: Yeah. logic and your backend logic and just sticks them together.swyx [00:29:20]: It's supposed to be easy for you guys. If you're a Python main, you know, I'm a JS main, right? Okay. If you're a Python main, it's supposed to be easy.Comfy [00:29:26]: Yeah, it's easy, but it makes your whole software a huge mess.swyx [00:29:30]: I see, I see. So you're mixing concerns instead of separating concerns?Comfy [00:29:34]: Well, it's because... Like frontend and backend. Frontend and backend should be well separated with a defined API. Like that's how you're supposed to do it. Smart people disagree. It just sticks everything together. It makes it easy to like a huge mess. And also it's, there's a lot of issues with Gradio. Like it's very good if all you want to do is just get like slap a quick interface on your, like to show off your ML project. Like that's what it's made for. Yeah. Like there's no problem using it. Like, oh, I have my, I have my code. I just wanted a quick interface on it. That's perfect. Like use Gradio. But if you want to make something that's like a real, like real software that will last a long time and will be easy to maintain, then I would avoid it. Yeah.swyx [00:30:32]: So your criticism is Streamlit and Gradio are the same. I mean, those are the same criticisms.Comfy [00:30:37]: Yeah, Streamlit I haven't used as much. Yeah, I just looked a bit.swyx [00:30:43]: Similar philosophy.Comfy [00:30:44]: Yeah, it's similar. It's just, it just seems to me like, okay, for quick, like AI demos, it's perfect.swyx [00:30:51]: Yeah. Going back to like the core tech, like asynchronous queues, slow re-execution, smart memory management, you know, anything that you were very proud of or was very hard to figure out?Comfy [00:31:00]: Yeah. The thing that's the biggest pain in the ass is probably the memory management. Yeah.swyx [00:31:05]: Were you just paging models in and out or? Yeah.Comfy [00:31:08]: Before it was just, okay, load the model, completely unload it. Then, okay, that, that works well when you, your model are small, but if your models are big and it takes sort of like, let's say someone has a, like a, a 4090, and the model size is 10 gigabytes, that can take a few seconds to like load and load, load and load, so you want to try to keep things like in memory, in the GPU memory as much as possible. What Comfy UI does right now is it. It tries to like estimate, okay, like, okay, you're going to sample this model, it's going to take probably this amount of memory, let's remove the models, like this amount of memory that's been loaded on the GPU and then just execute it. But so there's a fine line between just because try to remove the least amount of models that are already loaded. Because as fans, like Windows drivers, and one other problem is the NVIDIA driver on Windows by default, because there's a way to, there's an option to disable that feature, but by default it, like, if you start loading, you can overflow your GPU memory and then it's, the driver's going to automatically start paging to RAM. But the problem with that is it's, it makes everything extremely slow. So when you see people complaining, oh, this model, it works, but oh, s**t, it starts slowing down a lot, that's probably what's happening. So it's basically you have to just try to get, use as much memory as possible, but not too much, or else things start slowing down, or people get out of memory, and then just find, try to find that line where, oh, like the driver on Windows starts paging and stuff. Yeah. And the problem with PyTorch is it's, it's high levels, don't have that much fine-grained control over, like, specific memory stuff, so kind of have to leave, like, the memory freeing to, to Python and PyTorch, which is, can be annoying sometimes.swyx [00:33:32]: So, you know, I think one thing is, as a maintainer of this project, like, you're designing for a very wide surface area of compute, like, you even support CPUs.Comfy [00:33:42]: Yeah, well, that's... That's just, for PyTorch, PyTorch supports CPUs, so, yeah, it's just, that's not, that's not hard to support.swyx [00:33:50]: First of all, is there a market share estimate, like, is it, like, 70% NVIDIA, like, 30% AMD, and then, like, miscellaneous on Apple, Silicon, or whatever?Comfy [00:33:59]: For Comfy? Yeah. Yeah, and, yeah, I don't know the market share.swyx [00:34:03]: Can you guess?Comfy [00:34:04]: I think it's mostly NVIDIA. Right. Because, because AMD, the problem, like, AMD works horribly on Windows. Like, on Linux, it works fine. It's, it's lower than the price equivalent NVIDIA GPU, but it works, like, you can use it, you generate images, everything works. On Linux, on Windows, you might have a hard time, so, that's the problem, and most people, I think most people who bought AMD probably use Windows. They probably aren't going to switch to Linux, so... Yeah. So, until AMD actually, like, ports their, like, raw cam to, to Windows properly, and then there's actually PyTorch, I think they're, they're doing that, they're in the process of doing that, but, until they get it, they get a good, like, PyTorch raw cam build that works on Windows, it's, like, they're going to have a hard time. Yeah.Alessio [00:35:06]: We got to get George on it. Yeah. Well, he's trying to get Lisa Su to do it, but... Let's talk a bit about, like, the node design. So, unlike all the other text-to-image, you have a very, like, deep, so you have, like, a separate node for, like, clip and code, you have a separate node for, like, the case sampler, you have, like, all these nodes. Going back to, like, the making it easy versus making it hard, but, like, how much do people actually play with all the settings, you know? Kind of, like, how do you guide people to, like, hey, this is actually going to be very impactful versus this is maybe, like, less impactful, but we still want to expose it to you?Comfy [00:35:40]: Well, I try to... I try to expose, like, I try to expose everything or, but, yeah, at least for the, but for things, like, for example, for the samplers, like, there's, like, yeah, four different sampler nodes, which go in easiest to most advanced. So, yeah, if you go, like, the easy node, the regular sampler node, that's, you have just the basic settings. But if you use, like, the sampler advanced... If you use, like, the custom advanced node, that, that one you can actually, you'll see you have, like, different nodes.Alessio [00:36:19]: I'm looking it up now. Yeah. What are, like, the most impactful parameters that you use? So, it's, like, you know, you can have more, but, like, which ones, like, really make a difference?Comfy [00:36:30]: Yeah, they all do. They all have their own, like, they all, like, for example, yeah, steps. Usually you want steps, you want them to be as low as possible. But you want, if you're optimizing your workflow, you want to, you lower the steps until, like, the images start deteriorating too much. Because that, yeah, that's the number of steps you're running the diffusion process. So, if you want things to be faster, lower is better. But, yeah, CFG, that's more, you can kind of see that as the contrast of the image. Like, if your image looks too bursty. Then you can lower the CFG. So, yeah, CFG, that's how, yeah, that's how strongly the, like, the negative versus positive prompt. Because when you sample a diffusion model, it's basically a negative prompt. It's just, yeah, positive prediction minus negative prediction.swyx [00:37:32]: Contrastive loss. Yeah.Comfy [00:37:34]: It's positive minus negative, and the CFG does the multiplier. Yeah. Yeah. Yeah, so.Alessio [00:37:41]: What are, like, good resources to understand what the parameters do? I think most people start with automatic, and then they move over, and it's, like, snap, CFG, sampler, name, scheduler, denoise. Read it.Comfy [00:37:53]: But, honestly, well, it's more, it's something you should, like, try out yourself. I don't know, you don't necessarily need to know how it works to, like, what it does. Because even if you know, like, CFGO, it's, like, positive minus negative prompt. Yeah. So the only thing you know at CFG is if it's 1.0, then that means the negative prompt isn't applied. It also means sampling is two times faster. But, yeah. But other than that, it's more, like, you should really just see what it does to the images yourself, and you'll probably get a more intuitive understanding of what these things do.Alessio [00:38:34]: Any other nodes or things you want to shout out? Like, I know the animate diff IP adapter. Those are, like, some of the most popular ones. Yeah. What else comes to mind?Comfy [00:38:44]: Not nodes, but there's, like, what I like is when some people, sometimes they make things that use ComfyUI as their backend. Like, there's a plugin for Krita that uses ComfyUI as its backend. So you can use, like, all the models that work in Comfy in Krita. And I think I've tried it once. But I know a lot of people use it, and it's probably really nice, so.Alessio [00:39:15]: What's the craziest node that people have built, like, the most complicated?Comfy [00:39:21]: Craziest node? Like, yeah. I know some people have made, like, video games in Comfy with, like, stuff like that. So, like, someone, like, I remember, like, yeah, last, I think it was last year, someone made, like, a, like, Wolfenstein 3D in Comfy. Of course. And then one of the inputs was, oh, you can generate a texture, and then it changes the texture in the game. So you can plug it to, like, the workflow. And there's a lot of, if you look there, there's a lot of crazy things people do, so. Yeah.Alessio [00:39:59]: And now there's, like, a node register that people can use to, like, download nodes. Yeah.Comfy [00:40:04]: Like, well, there's always been the, like, the ComfyUI manager. Yeah. But we're trying to make this more, like, I don't know, official, like, with, yeah, with the node registry. Because before the node registry, the, like, okay, how did your custom node get into ComfyUI manager? That's the guy running it who, like, every day he searched GitHub for new custom nodes and added dev annually to his custom node manager. So we're trying to make it less effortless. So we're trying to make it less effortless for him, basically. Yeah.Alessio [00:40:40]: Yeah. But I was looking, I mean, there's, like, a YouTube download node. There's, like, this is almost like, you know, a data pipeline more than, like, an image generation thing at this point. It's, like, you can get data in, you can, like, apply filters to it, you can generate data out.Comfy [00:40:54]: Yeah. You can do a lot of different things. Yeah. So I'm thinking, I think what I did is I made it easy to make custom nodes. So I think that helped a lot. I think that helped a lot for, like, the ecosystem because it is very easy to just make a node. So, yeah, a bit too easy sometimes. Then we have the issue where there's a lot of custom node packs which share similar nodes. But, well, that's, yeah, something we're trying to solve by maybe bringing some of the functionality into the core. Yeah. Yeah. Yeah.Alessio [00:41:36]: And then there's, like, video. People can do video generation. Yeah.Comfy [00:41:40]: Video, that's, well, the first video model was, like, stable video diffusion, which was last, yeah, exactly last year, I think. Like, one year ago. But that wasn't a true video model. So it was...swyx [00:41:55]: It was, like, moving images? Yeah.Comfy [00:41:57]: I generated video. What I mean by that is it's, like, it's still 2D Latents. It's basically what I'm trying to do. So what they did is they took SD2, and then they added some temporal attention to it, and then trained it on videos and all. So it's kind of, like, animated, like, same idea, basically. Why I say it's not a true video model is that you still have, like, the 2D Latents. Like, a true video model, like Mochi, for example, would have 3D Latents. Mm-hmm.Alessio [00:42:32]: Which means you can, like, move through the space, basically. It's the difference. You're not just kind of, like, reorienting. Yeah.Comfy [00:42:39]: And it's also, well, it's also because you have a temporal VAE. Mm-hmm. Also, like, Mochi has a temporal VAE that compresses on, like, the temporal direction, also. So that's something you don't have with, like, yeah, animated diff and stable video diffusion. They only, like, compress spatially, not temporally. Mm-hmm. Right. So, yeah. That's why I call that, like, true video models. There's, yeah, there's actually a few of them, but the one I've implemented in comfy is Mochi, because that seems to be the best one so far. Yeah.swyx [00:43:15]: We had AJ come and speak at the stable diffusion meetup. The other open one I think I've seen is COG video. Yeah.Comfy [00:43:21]: COG video. Yeah. That one's, yeah, it also seems decent, but, yeah. Chinese, so we don't use it. No, it's fine. It's just, yeah, I could. Yeah. It's just that there's a, it's not the only one. There's also a few others, which I.swyx [00:43:36]: The rest are, like, closed source, right? Like, Cling. Yeah.Comfy [00:43:39]: Closed source, there's a bunch of them. But I mean, open. I've seen a few of them. Like, I can't remember their names, but there's COG videos, the big, the big one. Then there's also a few of them that released at the same time. There's one that released at the same time as SSD 3.5, same day, which is why I don't remember the name.swyx [00:44:02]: We should have a release schedule so we don't conflict on each of these things. Yeah.Comfy [00:44:06]: I think SD 3.5 and Mochi released on the same day. So everything else was kind of drowned, completely drowned out. So for some reason, lots of people picked that day to release their stuff.Comfy [00:44:21]: Yeah. Which is, well, shame for those. And I think Omnijet also released the same day, which also seems interesting. Yeah. Yeah.Alessio [00:44:30]: What's Comfy? So you are Comfy. And then there's like, comfy.org. I know we do a lot of things for, like, news research and those guys also have kind of like a more open source thing going on. How do you work? Like you mentioned, you mostly work on like, the core piece of it. And then what...Comfy [00:44:47]: Maybe I should fade it in because I, yeah, I feel like maybe, yeah, I only explain part of the story. Right. Yeah. Maybe I should explain the rest. So yeah. So yeah. Basically, January, that's when the first January 2023, January 16, 2023, that's when Amphi was first released to the public. Then, yeah, did a Reddit post about the area composition thing somewhere in, I don't remember exactly, maybe end of January, beginning of February. And then someone, a YouTuber, made a video about it, like Olivio, he made a video about Amphi in March 2023. I think that's when it was a real burst of attention. And by that time, I was continuing to develop it and it was getting, people were starting to use it more, which unfortunately meant that I had first written it to do like experiments, but then my time to do experiments went down. It started going down, because people were actually starting to use it then. Like, I had to, and I said, well, yeah, time to add all these features and stuff. Yeah, and then I got hired by Stability June, 2023. Then I made, basically, yeah, they hired me because they wanted the SD-XL. So I got the SD-XL working very well withітhe UI, because they were experimenting withámphi.house.com. Actually, the SDX, how the SDXL released worked is they released, for some reason, like they released the code first, but they didn't release the model checkpoint. So they released the code. And then, well, since the research was related to code, I released the code in Compute 2. And then the checkpoints were basically early access. People had to sign up and they only allowed a lot of people from edu emails. Like if you had an edu email, like they gave you access basically to the SDXL 0.9. And, well, that leaked. Right. Of course, because of course it's going to leak if you do that. Well, the only way people could easily use it was with Comfy. So, yeah, people started using. And then I fixed a few of the issues people had. So then the big 1.0 release happened. And, well, Comfy UI was the only way a lot of people could actually run it on their computers. Because it just like automatic was so like inefficient and bad that most people couldn't actually, like it just wouldn't work. Like because he did a quick implementation. So people were forced. To use Comfy UI, and that's how it became popular because people had no choice.swyx [00:47:55]: The growth hack.Comfy [00:47:56]: Yeah.swyx [00:47:56]: Yeah.Comfy [00:47:57]: Like everywhere, like people who didn't have the 4090, they had like, who had just regular GPUs, they didn't have a choice.Alessio [00:48:05]: So yeah, I got a 4070. So think of me. And so today, what's, is there like a core Comfy team or?Comfy [00:48:13]: Uh, yeah, well, right now, um, yeah, we are hiring. Okay. Actually, so right now core, like, um, the core core itself, it's, it's me. Uh, but because, uh, the reason where folks like all the focus has been mostly on the front end right now, because that's the thing that's been neglected for a long time. So, uh, so most of the focus right now is, uh, all on the front end, but we are, uh, yeah, we will soon get, uh, more people to like help me with the actual backend stuff. Yeah. So, no, I'm not going to say a hundred percent because that's why once the, once we have our V one release, which is because it'd be the package, come fee-wise with the nice interface and easy to install on windows and hopefully Mac. Uh, yeah. Yeah. Once we have that, uh, we're going to have to, lots of stuff to do on the backend side and also the front end side, but, uh.Alessio [00:49:14]: What's the release that I'm on the wait list. What's the timing?Comfy [00:49:18]: Uh, soon. Uh, soon. Yeah, I don't want to promise a release date. We do have a release date we're targeting, but I'm not sure if it's public. Yeah, and we're still going to continue doing the open source, making MPUI the best way to run stable infusion models. At least the open source side, it's going to be the best way to run models locally. But we will have a few things to make money from it, like cloud inference or that type of thing. And maybe some things for some enterprises.swyx [00:50:08]: I mean, a few questions on that. How do you feel about the other comfy startups?Comfy [00:50:11]: I mean, I think it's great. They're using your name. Yeah, well, it's better they use comfy than they use something else. Yeah, that's true. It's fine. We're going to try not to... We don't want to... We want people to use comfy. Like I said, it's better that people use comfy than something else. So as long as they use comfy, I think it helps the ecosystem. Because more people, even if they don't contribute directly, the fact that they are using comfy means that people are more likely to join the ecosystem. So, yeah.swyx [00:50:57]: And then would you ever do text?Comfy [00:50:59]: Yeah, well, you can already do text with some custom nodes. So, yeah, it's something we like. Yeah, it's something I've wanted to eventually add to core, but it's more like not a very... It's a very high priority. But because a lot of people use text for prompt enhancement and other things like that. So, yeah, it's just that my focus has always been on diffusion models. Yeah, unless some text diffusion model comes out.swyx [00:51:30]: Yeah, David Holtz is investing a lot in text diffusion.Comfy [00:51:34]: Yeah, well, if a good one comes out, then we'll probably implement it since it fits with the whole...swyx [00:51:39]: Yeah, I mean, I imagine it's going to be a close source to Midjourney. Yeah.Comfy [00:51:43]: Well, if an open one comes out, then I'll probably implement it.Alessio [00:51:54]: Cool, comfy. Thanks so much for coming on. This was fun. Bye. Get full access to Latent Space at www.latent.space/subscribe

live ai apple bible work video chinese smart model current 3d memory reddit negative engineering origins mac ab windows lower ip boring ram cto state of the union excel similar applications closed evaluation nvidia stability api sd generate stable input python ui mm playground ml linux dome github automatic sketch apis craziest clip amd flux llm prompt prompts silicon gpu 5b cascade ssd dali cling midjourney node js gpus diffusion frontend comfy alessio 8b blip compute worcestershire cpus replicate mochi stable diffusion sds textual major tom cog crud nvidia gpus pytorch vae wolfenstein 3d lisa su loras cfg krita ideogram streamlit amphi latent space sdx gradio ltx sd1

Latent.Space 2024 Year in Review

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Dec 31, 2024 111:07

Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. The answer[00:00:32] swyx: was cursor and perplexity.[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like 3DAO, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of get people. Sombra to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled me. People, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on on the sub stack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering Neuralibs? Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything Or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, Academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan. Yeah, that it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created.[00:05:09] swyx: Layen Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, Organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. And I'm really, really thankful for John Franco, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But He was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre training has hit a wall. And data has hit a wall. So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in a better lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.[00:07:34] swyx: He said that, um, we're pre trading inside a wall, or like, we've run into a different kind of wall. And then we have, you know John Frankel, Ilya, and then Noam Brown are all saying variations of the same thing, that we have hit some kind of wall in the status quo of what pre trained, scaling large pre trained models has looked like, and we need a new thing.[00:07:54] swyx: And obviously the new thing for people is some make, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, meaning, has a very pre trained bias, meaning that the only reason for running inference at all is to test your model.[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that. OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.[00:08:31] swyx: Chinchilla paper is compute optimal training, but what is not stated in there is it's pre trained compute optimal training. And once you start caring about inference, compute optimal training, you have a different scaling law. And in a way that we did not know last year.[00:08:45] Alessio: I wonder, because John is, he's also on the side of attention is all you need.[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll[00:09:30] Alessio: publish[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.[00:09:39] swyx: But I think it's, uh, these are just good slides. And I've been, I've been looking for a sort of end of year recaps from, from people.[00:09:44] The Competitive AI Landscape[00:09:44] swyx: The field has progressed a lot. You know, I think the max ELO in 2023 on LMSys used to be 1200 for LMSys ELOs. And now everyone is at least at, uh, 1275 in their ELOs, and this is across Gemini, Chadjibuti, [00:10:00] Grok, O1.[00:10:01] swyx: ai, which with their E Large model, and Enthopic, of course. It's a very, very competitive race. There are multiple Frontier labs all racing, but there is a clear tier zero Frontier. And then there's like a tier one. It's like, I wish I had everything else. Tier zero is extremely competitive. It's effectively now three horse race between Gemini, uh, Anthropic and OpenAI.[00:10:21] swyx: I would say that people are still holding out a candle for XAI. XAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, XAI is continually snubbed because they don't work well with the benchmarking people.[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why XAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily open AI. So we have some numbers and estimates. These are from RAMP. Estimates of open AI market share in [00:11:00] December 2023.[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked. Harrison Chase on the LangChain episode, it was true. And then CLAUD 3 launched mid middle of this year. I think CLAUD 3 launched in March, CLAUD 3. 5 Sonnet was in June ish.[00:11:23] swyx: And you can start seeing the market share shift towards opening, uh, towards that topic, uh, very, very aggressively. The more recent one is Gemini. So if I scroll down a little bit, this is an even more recent dataset. So RAM's dataset ends in September 2 2. 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.[00:11:58] swyx: And so they're going after [00:12:00] the Lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2023, 2022, To going into 2024 opening has gone from nine five market share to Yeah. Reasonably somewhere between 50 to 75 market share.[00:12:29] Alessio: Yeah. I'm really curious how ramped does the attribution to the model?[00:12:32] Alessio: If it's API, because I think it's all credit card spin. . Well, but it's all, the credit card doesn't say maybe. Maybe the, maybe when they do expenses, they upload the PDF, but yeah, the, the German I think makes sense. I think that was one of my main 2024 takeaways that like. The best small model companies are the large labs, which is not something I would have thought that the open source kind of like long tail would be like the small model.[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? Like so small model here for Gemini is AB, [00:13:00] right? Uh, mini. We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the one to three B size.[00:13:11] swyx: Mm-hmm . Yeah. Maybe[00:13:12] swyx: zero, maybe 0.5 B uh, that's moon dream and that is small for you then, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe one to five B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.[00:13:43] swyx: Yeah, we can keep going. This is a slide that I smiley disagree with Sarah. She's pointing to the scale SEAL leaderboard. I think the Researchers that I talked with at NeurIPS were kind of positive on this because basically you need private test [00:14:00] sets to prevent contamination.[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama405B does well compared to Gemini and GPT 40. And I think that's good. I would say that. You know, it's good to have an open model that is that big, that does well on those metrics.[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the artificial analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node. of, uh, of H100s. Cerebras will be happy to tell you they can serve 4 or 5B on their super large chips.[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 4 or 5B really that relevant? Like, I think most people are basically saying that they only use 4 or 5B as a teacher model to distill down to something. Even Meta is doing it. So with Lama 3. [00:15:00] 3 launched, they only launched the 70B because they use 4 or 5B to distill the 70B.[00:15:03] swyx: So I don't know if like open source is keeping up. I think they're the, the open source industrial complex is very invested in telling you that the, if the gap is narrowing, I kind of disagree. I think that the gap is widening with O1. I think there are very, very smart people trying to narrow that gap and they should.[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart. And look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?[00:15:38] swyx: Probably not for O1 for a while. And it's really up to the open source guys to figure out if they can match O1 or not.[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Doc can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of computed inference too.[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama4 will be reasoning oriented. We talked with Thomas Shalom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama4. Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shi just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than 6. 6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT 5, which would have been a larger pre train. We should have a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your growth margin and like tie your. Your spend, or your sort of inference spend, accordingly. And it's just really interesting to, that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the mixed trial price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the pro chat GBT.[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is a, at least AI content creator or VC or, you know, someone who, whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use. The good thing is that actually Google lets you do a lot of stuff for free now. So like deep research. That they just launched. Uses a ton of inference and it's, it's free while it's in preview.[00:18:45] Alessio: Yeah. They need to put that in Lindy.[00:18:47] Alessio: I've been using Lindy lately. I've been a built a bunch of things once we had flow because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy voice. Yeah, I think once [00:19:00] they get advanced voice mode like capability today, still like speech to text, you can kind of tell.[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think We will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but that suffice to say that there is a lot of progress in each field.[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one. It's very hard to stay on top of SweetBench.[00:19:45] swyx: OpenHand is currently still number one. switchbench full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.[00:20:03] swyx: And so I'll highlight that talk as well.[00:20:05] Alessio: Yeah. The number six, which is the Hacken agents learn more about the environment, has been a Super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.[00:20:23] Alessio: So, yeah, there's not indexing and reg. Well, yeah, but it's more like. You can't really rag things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this like, you know, Oh, institutional knowledge. Yeah, the boring word is kind of like a business process extraction.[00:20:38] Alessio: Yeah yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, Yeah, the agents are, that most people are building are good at following instruction, but are not as good as like extracting them from you. Um, so I think that will be a big unlock just to touch quickly on the Jeff Dean thing.[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but. I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKB on the podcast before, like he's doing a lot of that with Fetterless AI.[00:21:12] swyx: Everyone is. I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feedback into ML. And I think this ties in with Lubna's talk.[00:21:45] Synthetic Data and Future Trends[00:21:45] swyx: on synthetic data where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre training side, the post training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about alpha chip. And most of us in the audience are like, we're not working on hardware, man.[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But[00:22:23] swyx: for me, it's just like bonus for late in space supporters, because I feel like they haven't been getting anything.[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since. The blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024. He gave some hints and then we have vague indications of what he saw in 2023.[00:22:54] swyx: So that was the Oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Siobhan, Zelis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.[00:23:37] Alessio: Yeah,[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.[00:24:08] Alessio: And we'll talk about the open AI reinforcement, fine tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Elia will be remembered for. And I think just maybe to clarify on like the pre training is over thing that people love to tweet. I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.[00:24:35] Alessio: So it's not necessarily pre training is over. It's kind of like What got us here won't get us there. In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine tuning is basically like, instead of fine tuning on data, you fine tune on a reward model.[00:25:09] Alessio: So it's basically like, instead of being data driven, it's like task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes, how many people fine tune, because I think this is what people run into. It's like, Oh, you can fine tune llama. And it's like, okay, where do I get the data?[00:25:27] Alessio: To fine tune it on, you know, so it's great that we're moving the thing. And then I really like he had this chart where like, you know, the brain mass and the body mass thing is basically like mammals that scaled linearly by brain and body size, and then humans kind of like broke off the slope. So it's almost like maybe the mammal slope is like the pre training slope.[00:25:46] Alessio: And then the post training slope is like the, the human one.[00:25:49] swyx: Yeah. I wonder what the. I mean, we'll know in 10 years, but I wonder what the y axis is for, for Ilya's SSI. We'll try to get them on.[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here. Yeah, and then he had, you know, what comes next, like agent, synthetic data, inference, compute, I thought all of that was like that.[00:26:05] Alessio: I don't[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.[00:26:07] Alessio: Yeah. Any other new reps? Highlights?[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, little nice paper. Yeah, that was really[00:26:20] swyx: nice.[00:26:20] swyx: Uh, of, uh, of, like, all the, he's, she called it must read papers of 2024.[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations And so, uh, I thought it was really super nice that we got there.[00:26:38] Alessio: Should we do a late in space book for each year? Uh, I thought about it. For each year we should.[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organist, Jamie. You need to[00:26:52] swyx: pull up more things. One thing I saw that, uh, Okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.[00:27:06] swyx: This is secret collusion among AI agents, multi agent deception via steganography. I tried to go to NeurIPS in order to find these kinds of papers because the real reason Like NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.[00:27:22] swyx: And then also the people who go and end up crowding around the most popular papers, which you already know and already read them before you showed up to NeurIPS. So the only reason you go there is to talk to the paper authors, but there's like something like 10, 000 other. All these papers out there that, you know, are just people's work that they, that they did on the air and they failed to get attention for one reason or another.[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a deep mind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.[00:27:57] swyx: If you Pick that out, you know, and the code sends a [00:28:00] different message than that. But something I've always emphasized is to LLMs, we read left to right. LLMs can read up, down, sideways, you know, in random character order. And it's the same to them as it is to us. So if we were ever to get You know, self motivated, underlined LLMs that we're trying to collaborate to take over the planet.[00:28:19] swyx: This would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for Cypher encoding, GPT 2, Lama 2, mixed trial, GPT 3. 5, zero capabilities, and sudden 4.[00:28:40] swyx: And this is the kind of Jason Wei type emergence properties that people kind of look for. I think what made this paper stand out as well, so he developed the benchmark for steganography collusion, and he also focused on shelling point collusion, which is very low coordination. For agreeing on a decoding encoding format, you kind of need to have some [00:29:00] agreement on that.[00:29:00] swyx: But, but shelling point means like very, very low or almost no coordination. So for example, if I, if I ask someone, if the only message I give you is meet me in New York and you're not aware. Or when you would probably meet me at Grand Central Station. That is the Grand Central Station is a shelling point.[00:29:16] swyx: And it's probably somewhere, somewhere during the day. That is the shelling point of New York is Grand Central. To that extent, shelling points for steganography are things like the, the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail. I[00:29:41] swyx: found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is[00:29:57] Alessio: talking[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just data sets.[00:30:12] swyx: This is all the grad students are working on. So like there was a data sets track and then I was looking around like, I was like, you don't need a data sets track because every paper is a data sets paper. And so data sets and benchmarks, they're kind of flip sides of the same thing. So Yeah. Cool. Yeah, if you're a grad student, you're a GPU boy, you kind of work on that.[00:30:30] swyx: And then the, the sort of big model that people walk around and pick the ones that they like, and then they use it in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like you didn't last year, you had people like Hao Tian who worked on Lava, which is take Lama and add Vision.[00:30:47] swyx: And then obviously actually I hired him and he added Vision to Grok. Now he's the Vision Grok guy. This year, I don't think there was any of those.[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Mixed Monarch, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that or maybe it's just a So that's the highlight of how I have changed in terms of how I view papers.[00:31:29] swyx: So like, in my estimation, two of the best papers in this year for datasets or data comp and refined web or fine web. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.[00:31:48] swyx: But one thing that does get a lot of play that a lot of people are debating is the role that's scheduled. This is the schedule free optimizer paper from Meta from Aaron DeFazio. And this [00:32:00] year in the ML community, there's been a lot of chat about shampoo, soap, all the bathroom amenities for optimizing your learning rates.[00:32:08] swyx: And, uh, most people at the big labs are. Who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular. 4Wars[00:32:19] Alessio: of AI recap maybe, just quickly. Um, where do you want to start? Data?[00:32:26] swyx: So to remind people, this is the 4Wars piece that we did as one of our earlier recaps of this year.[00:32:31] swyx: And the belligerents are on the left, journalists, writers, artists, anyone who owns IP basically, New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George RR Martin. Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing, open the eye, basically. I actually wanted to get a snapshot of all the lawsuits.[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Lumna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between scale. ai and the synthetic data community, because scale.[00:33:09] swyx: ai published a paper saying that synthetic data doesn't work. Surprise, surprise, scale. ai is the leading vendor of non synthetic data. Only[00:33:17] Alessio: cage free annotated data is useful.[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Luna's talk, Makes sense.[00:33:32] swyx: I don't know if you have any perspectives there.[00:33:34] Alessio: I think, again, going back to the reinforcement fine tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine tuning a smaller model from like a larger model.[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the rephrase, the web thing that Apple also did. But yeah, I think it'll be. Useful. I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know, I think people love talking about data because it's like a GPU poor, you know, I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like,[00:34:17] swyx: they don't[00:34:17] Alessio: really work[00:34:18] swyx: on.[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club, the star series of papers. So that's star, Q star, V star. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT, API that they released, or that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.[00:34:49] swyx: Basically It feels like a way to create valid synthetic data for them to fine tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between Let's say the music industry or like the newspaper publishing industry or the textbooks industry on the big labs.[00:35:11] swyx: It's all of the pre training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?[00:35:27] swyx: We've been using O1 for And I would say like for summarization and creative writing and instruction following, I think it's underrated. I started using O1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics. You know, I can actually say like, I think one of the O1 pro demos.[00:35:46] swyx: All of these things that Noam was showing was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?[00:35:53] Creative Writing with AI[00:35:53] swyx: So like, like literally just anything instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than in previous models. So I think it's underrated for creative writing.[00:36:11] Alessio: Yeah.[00:36:12] Legal and Ethical Issues in AI[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of O1, but then they want us to, like, they're getting sued for using other publishers data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.[00:36:29] Alessio: So I'm curious to see how that kind of comes. Yeah, I mean, OPA has[00:36:32] swyx: many ways to publish, to punish people without bringing, taking them to court. Already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chain of thoughts are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosign, which we talked about, we talked to him on Dev Day, can just fine tune 4.[00:37:13] swyx: 0 to beat 0. 1 Cloud SONET so far is beating O1 on coding tasks without, at least O1 preview, without being a reasoning model, same for Gemini Pro or Gemini 2. 0. So like, how much is reasoning important? How much of a moat is there in this, like, All of these are proprietary sort of training data that they've presumably accomplished.[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be Interesting because there'll be a lot of noise from people who say they have inference time compute and actually don't because they just have fancy chain of thought.[00:38:00][00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.[00:38:18] The Data War: GPU Poor vs. GPU Rich[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was like a time in which the GPU poor were kind of like the rebel faction working on like these models that were like open and small and cheap. And I think today people don't really care as much about GPUs anymore.[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market is kind of like plummeted because there's people don't want to be, they want to be GPU free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war? You[00:38:52] swyx: can tell me about this, but like, I feel like the, the appetite for GPU rich startups, like the, you know, the, the funding plan is we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA.[00:39:01] swyx: That is gone, right? Like, no one's, no one's pitching that. This was literally the plan, the exact plan of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.[00:39:12] The Rise of GPU Ultra Rich[00:39:12] swyx: But I think like, The GPU ultra rich, the GPU ultra high net worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion dollar cluster.[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, XAI very famously, you know, Jensen Huang praising them for being. Best boy number one in spinning up 100, 000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that because I think partially it's an article of faith now that you just need it.[00:39:46] swyx: Like you don't even know what it's going to, what you're going to use it for. You just, you just need it. And it makes sense that if, especially if we're going into. More researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] let's scale big models territory because we had GPT 3 in 2020 and we were like, okay, we'll go from 1.[00:40:05] swyx: 75b to 1. 8b, 1. 8t. And that was GPT 3 to GPT 4. Okay, that's done. As far as everyone is concerned, Opus 3. 5 is not coming out, GPT 4. 5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion perimeter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.[00:40:36] swyx: Would you pay 10x the price of GPT Probably not. Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.[00:40:47] Emerging Trends in AI Models[00:40:47] swyx: And so when it's more researchy, then you actually need more just general purpose compute to mess around with, uh, at the exact same time that production deployments of the old, the previous paradigm is still ramping up,[00:40:58] swyx: um,[00:40:58] swyx: uh, pretty aggressively.[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both together and fireworks and replicates. Uh, we haven't done any scale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in a sense of like they, at reInvent, I wasn't expecting them to do so well, but they are now a foundation model lab.[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do this big reserve instance contracts and now they got to use their money. That's why so many startups.[00:41:37] Alessio: Get bought through the AWS marketplace so they can kind of bundle them together and prefer pricing.[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is where like the tiny box and these things come to fruition where like you can be GPU poor at home. But I think today is like, why are you working so hard to like get these models to run on like very small clusters where it's like, It's so cheap to run them.[00:42:21] Alessio: Yeah, yeah,[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like news research, like probably the most deep tech thing they've done this year is Distro or whatever the new name is.[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF compute launched their compute marketplace this year, and like, who's really using that?[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on device, We are, I now have Apple intelligence on my phone.[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multi modal.[00:43:25] Alessio: Yeah, the notification summaries are so and so in my experience.[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature flagged, but you can, you can try it now if you, if you use the, uh, the alpha.[00:43:40] swyx: And so, like, I, I think, like, you know, We're getting the sort of GPU poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKB in sort of every Windows department is super cool. And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab, or GPU Cloud.[00:44:10] swyx: GPU Cloud, it would be Suno. Suno, Ramp has rated as one of the top ranked, fastest growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR and Suno runs on Moto. So Suno itself is not GPU rich, but they're just doing the training on, on Moto, uh, who we've also talked to on, on the podcast.[00:44:31] swyx: The other one would be Bolt, straight cloud wrapper. And, and, um, Again, another, now they've announced 20 million ARR, which is another step up from our 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU pores are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're like close to the customers, and you're number one on the customer side.[00:45:03] swyx: And the people who are in the middle. Inflection, um, character, didn't do that great. I think character did the best of all of them. Like, you have a note in here that we apparently said that character's price tag was[00:45:15] Alessio: 1B.[00:45:15] swyx: Did I say that?[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.[00:45:20] Alessio: Then they paid 2. 7 billion. I mean, for like,[00:45:22] swyx: yeah.[00:45:22] Alessio: What do you pay for node? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.[00:45:31] The Multi-Modality War[00:45:31] Alessio: Multimodality war. And this one, we never had text to video in the first version, which now is the hottest.[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had VO2 just came out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't, yeah. I[00:45:54] swyx: think it's generally available now, you can go to Sora.[00:45:56] swyx: com and try it. Yeah, they had[00:45:58] Alessio: the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video or OneLive?[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from Minimax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.[00:46:45] Alessio: Yeah, I mean,[00:46:46] swyx: 11labs is now a unicorn. So basically, what is multi modality war? Multi modality war is, do you specialize in a single modality, right? Or do you have GodModel that does all the modalities? So this is [00:47:00] definitely still going, in a sense of 11 labs, you know, now Unicorn, PicoLabs doing well, they launched Pico 2.[00:47:06] swyx: 0 recently, HeyGen, I think has reached 100 million ARR, Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups. And then there's the big labs who are doing the sort of all in one play.[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code gen and all that. But the new one that OpenAI and Meta both have but they haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.[00:48:02] swyx: They can generate you that modification exactly as you asked. So there's no need for the stable diffusion or comfy UI workflow of like mask here and then like infill there in paint there and all that, all that stuff. This is small model nonsense. Big model people are like, huh, we got you in as everything in the transformer.[00:48:21] swyx: This is the multimodality war, which is, do you, do you bet on the God model or do you string together a whole bunch of, uh, Small models like a, like a chump. Yeah,[00:48:29] Alessio: I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the motto is kind of like, Maybe, you know, people say black forest, the black forest models are better than mid journey on a pixel by pixel basis. But I think when you put it, put it together, have you tried[00:48:53] swyx: the same problems on black forest?[00:48:55] Alessio: Yes. But the problem is just like, you know, on black forest, it generates one image. And then it's like, you got to [00:49:00] regenerate. You don't have all these like UI things. Like what I do, no, but it's like time issue, you know, it's like a mid[00:49:06] swyx: journey. Call the API four times.[00:49:08] Alessio: No, but then there's no like variate.[00:49:10] Alessio: Like the good thing about mid journey is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying mid journey, so it's a Black Forest skill issue, because I'm not paying them, you know?[00:49:24] Alessio: Yeah,[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, ReCraft has come out, uh, on top of the image arena that, uh, artificial analysis has done, has apparently, uh, Flux's place. Is this still true?[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI Newses of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1. 1. Which is very surprising [00:50:00] because Flux And Black Forest Labs are the old stable diffusion crew who left stability after, um, the management issues.[00:50:06] swyx: So Recurve has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own. I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multi modality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.[00:50:42] swyx: But the big model, People are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something. And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Oh, at least the Hollywood Labs will get Fulsora. We haven't seen video to audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be Soda.[00:51:23] swyx: I would say that Gemini's approach Compared to OpenAI, Gemini seems, or DeepMind's approach to video seems a lot more fully fledged than OpenAI. Because if you look at the ICML recap that I published that so far nobody has listened to, um, that people have listened to it. It's just a different, definitely different audience.[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in Uh, so, so DeepMind has, is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] Diffusion Transformers last year, you know, when they hired, uh, Bill Peebles.[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that VO2, while one, They cherry pick their videos. So obviously it looks better than Sora, but the reason I would believe that VO2, uh, when it's fully launched will do very well is because they have all this background work in video that they've done for years.[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.[00:52:32] Alessio: And then last but not least, the LLMOS. We renamed it to Ragops, formerly known as[00:52:39] swyx: Ragops War. I put the latest chart on the Braintrust episode.[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there because Hacker News people like to read and not listen.[00:52:58] Alessio: So episode essays,[00:52:59] swyx: I remember [00:53:00] purchasing them separately. You say Lanchain Llama Index is still growing.[00:53:03] Alessio: Yeah, so I looked at the PyPy stats, you know. I don't care about stars. On PyPy you see Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, Lanchain still growing.[00:53:20] Alessio: These are the last six months. Llama Index still growing. What I've basically seen is like things that, One, obviously these things have A commercial product. So there's like people buying this and sticking with it versus kind of hopping in between things versus, you know, for example, crew AI, not really growing as much.[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4

god ceo new york spotify amazon time world ai europe google china apple vision pr voice future speaking san francisco new york times phd thinking video chinese simple data predictions elon musk impact surprise iphone chatgpt legal code reflecting tesla memory ga busy reddit discord cloud lgbt flash stem honestly pros ab jeff bezos windows excited researchers lower ip unicorns tackling sort survey insane tier cto whispers applications vc f1 gemini openai doc seal signing fireworks academic genie sf nvidia organizing ux api davos assembly frontier chrome gpt makes scarlett johansson ui aws turbo mm soda bash mosaic ml lama dropbox github creative writing drafting canvas reinvent 1b apis bolt ruler lava exact stripe hundred pico dev flux strawberry vm vcs llm sander 200k wwdc bt sora arr anthropic taiwanese sam altman opus moto gartner google docs assumption nemo blackwell parting gpu google drive grok sombra agi ramp opa tbd 3b 5b perplexity elia elo estimates gnome midjourney bytedance leopold rag ciso gpus dota haiku dx coursera sarah silverman sonnets deepmind ilya cypher getty george rr martin sdks quill cobalt noam future trends v2 alessio ttc sheesh lms xai satya r1 suno 8b mcp ssi veo stack overflow vo2 emerging trends rl mistral itc theoretically gpts replicate sota black forest jensen huang yi inflection databricks aitor graphql brain trust ai models chinchillas adept nosql grand central grand central station hacker news hacken zep ethical issues ai news cosign claud gpc tpu distro heygen lubna neo4j o3 cerebras o1 autogpt minimax 70b quent gbt gpd jeremy howard langchain 400b exa gradients neurips loras gemini pro 128k jeff dean elos code interpreter ai winter john franco icml lstm r1s aws reinvent muser latent space pypy nova pro dan gross paige bailey noam brown quiet capital john frankel

Weighin' in with Worm S2 Episode 15: First Competitions are here!

Dubuque Area Baseball Podcast

Play Episode Listen Later Nov 20, 2024 32:51

What's going on guys! This week, we have the high school girls competing for the first time this season, we recap the Loras win over Stevens, and all the upcoming events going on in the next week! Also, we discuss how technology is growing our sport! So lets enjoy!

stevens worm competitions loras weighin

Chapter Thirty-Six - Cersei 8 - A Feast for Crows - A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Nov 5, 2024 75:22

Send us a textCersei learns that Dragonstone has fallen but that Loras has paid with his life - or at least he lies at death's door. She delights in sharing this news with Margaery - who is stricken, but remains hopeful. Cersei dreams of her childhood run-in with Maggy the Frog and the details of the woman's terrifying prophecy. Simon and Mackelly probably should have seen this coming.Chapter Review:Queen Cersei Lannister learns from Aurane Waters that Dragonstone has fallen but that Loras led the assault to his doom. He was grievously wounded multiple times and lies unconscious beyond the skill of any maester to heal. Cersei is delighted. She carefully hides this as she shares the news with Margaery. The young queen is devastated, but latches on to the fact that Loras still lives and seems determined to seek to revive him.More non-Tyrion dwarf's heads are placed before her. The High Septon sends an emissary rather than himself to Cersei's annoyance. The faith is trying to shut down the brothels on the Street of Silk also to Cersei's, and every sailor's, annoyance. Grand Maester Pycelle reports that Lord Gyles Rosby seems to be dying. She dreams of her meeting with Maggy the Frog, the details of whose prophecy continue to come true (she did marry the king, she did become queen), which wouldn't be a problem but the prophecy turns a lot darker after that: she'll be replaced by a younger more beautiful queen, her three children will die, and the valonqar will choke the life from Cersei. She'd like to put a stop to that, and Qyburn thinks that she can.Characters/Places/Names/Events:Cersei Lannister - Mother to King Tommen, Queen Regent of the Seven Kingdoms.Aurane Waters - Grand Admiral, bastard of house Velaryon.King Tommen Baratheon - Child king of the Seven Kingdoms.Taena Merryweather - Social-climbing friend of Queen Cersei.Margaery Tyrell - Queen of the Seven Kingdoms. Only daughter of Mace and Alerie Tyrell.Loras Tyrell - Knight of Flowers. Brother to Margaery and brother of the Kingsguard.Qyburn - Creepy as all hell fallen maester, now Master of Whisperers in King's Landing. Maggy the Frog - Witch who made a prophecy to Cersei that continues to match reality. Support the showSupport us: Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/watch?v=kthxycmF25M

spotify master song brothers game of thrones flowers frogs landing silk mace book reviews seven kingdoms george rr martin thirty six cersei tyrion dragonstone whisperers ice and fire cersei lannister asoiaf ross bugden marriott bonvoy loras song of ice and fire red keep margaery kingsguard qyburn velaryon margaery tyrell feast for crows queen cersei loras tyrell high septon arbor gold

551: AFfC Jaime II (repost)

Close the Door: Game of Thrones, A Song of Ice and Fire Podcast

Play Episode Listen Later Oct 8, 2024 40:46

Spoilers, profanity, Jaime x Brienne. There's kittens for everyone in King Tommen's land. Kevan and Loras get some great one-liners, but Jaime's quips are wasted on the likes of Lancel. Cersei's at her wine-throwing worst, and Jaime's having none of it. A Song of Ice and Fire. A Feast for Crows - Jaime II. Originally recorded in 2019. Close The Door And Come Here - Episode 551

spoilers song fire ice feast repost cersei brienne kevan loras lancel affc

Building AGI in Real Time (OpenAI Dev Day 2024)

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Oct 3, 2024 129:14

We all have fond memories of the first Dev Day in 2023:and the blip that followed soon after. As Ben Thompson has noted, this year's DevDay took a quieter, more intimate tone. No Satya, no livestream, (slightly fewer people?). Instead of putting ChatGPT announcements in DevDay as in 2023, o1 was announced 2 weeks prior, and DevDay 2024 was reserved purely for developer-facing API announcements, primarily the Realtime API, Vision Finetuning, Prompt Caching, and Model Distillation.However the larger venue and more spread out schedule did allow a lot more hallway conversations with attendees as well as more community presentations including our recent guest Alistair Pullen of Cosine as well as deeper dives from OpenAI including our recent guest Michelle Pokrass of the API Team. Thanks to OpenAI's warm collaboration (we particularly want to thank Lindsay McCallum Rémy!), we managed to record exclusive interviews with many of the main presenters of both the keynotes and breakout sessions. We present them in full in today's episode, together with a full lightly edited Q&A with Sam Altman.Show notes and related resourcesSome of these used in the final audio episode below* Simon Willison Live Blog* swyx live tweets and videos* Greg Kamradt coverage of Structured Output session, Scaling LLM Apps session* Fireside Chat Q&A with Sam AltmanTimestamps* [00:00:00] Intro by Suno.ai* [00:01:23] NotebookLM Recap of DevDay* [00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling* [00:19:16] Olivier Godement, Head of Product, OpenAI* [00:36:57] Romain Huet, Head of DX, OpenAI* [00:47:08] Michelle Pokrass, API Tech Lead at OpenAI ft. Simon Willison* [01:04:45] Alistair Pullen, CEO, Cosine (Genie)* [01:18:31] Sam Altman + Kevin Weill Q&A* [02:03:07] Notebook LM Recap of PodcastTranscript[00:00:00] Suno AI: Under dev daylights, code ignites. Real time voice streams reach new heights. O1 and GPT, 4. 0 in flight. Fine tune the future, data in sight. Schema sync up, outputs precise. Distill the models, efficiency splice.[00:00:33] AI Charlie: Happy October. This is your AI co host, Charlie. One of our longest standing traditions is covering major AI and ML conferences in podcast format. Delving, yes delving, into the vibes of what it is like to be there stitched in with short samples of conversations with key players, just to help you feel like you were there.[00:00:54] AI Charlie: Covering this year's Dev Day was significantly more challenging because we were all requested not to record the opening keynotes. So, in place of the opening keynotes, we had the viral notebook LM Deep Dive crew, my new AI podcast nemesis, Give you a seven minute recap of everything that was announced.[00:01:15] AI Charlie: Of course, you can also check the show notes for details. I'll then come back with an explainer of all the interviews we have for you today. Watch out and take care.[00:01:23] NotebookLM Recap of DevDay[00:01:23] NotebookLM: All right, so we've got a pretty hefty stack of articles and blog posts here all about open ais. Dev day 2024.[00:01:32] NotebookLM 2: Yeah, lots to dig into there.[00:01:34] NotebookLM 2: Seems[00:01:34] NotebookLM: like you're really interested in what's new with AI.[00:01:36] NotebookLM 2: Definitely. And it seems like OpenAI had a lot to announce. New tools, changes to the company. It's a lot.[00:01:43] NotebookLM: It is. And especially since you're interested in how AI can be used in the real world, you know, practical applications, we'll focus on that.[00:01:51] NotebookLM: Perfect. Like, for example, this Real time API, they announced that, right? That seems like a big deal if we want AI to sound, well, less like a robot.[00:01:59] NotebookLM 2: It could be huge. The real time API could completely change how we, like, interact with AI. Like, imagine if your voice assistant could actually handle it if you interrupted it.[00:02:08] NotebookLM: Or, like, have an actual conversation.[00:02:10] NotebookLM 2: Right, not just these clunky back and forth things we're used to.[00:02:14] NotebookLM: And they actually showed it off, didn't they? I read something about a travel app, one for languages. Even one where the AI ordered takeout.[00:02:21] NotebookLM 2: Those demos were really interesting, and I think they show how this real time API can be used in so many ways.[00:02:28] NotebookLM 2: And the tech behind it is fascinating, by the way. It uses persistent WebSocket connections and this thing called function calling, so it can respond in real time.[00:02:38] NotebookLM: So the function calling thing, that sounds kind of complicated. Can you, like, explain how that works?[00:02:42] NotebookLM 2: So imagine giving the AI Access to this whole toolbox, right?[00:02:46] NotebookLM 2: Information, capabilities, all sorts of things. Okay. So take the travel agent demo, for example. With function calling, the AI can pull up details, let's say about Fort Mason, right, from some database. Like nearby restaurants, stuff like that.[00:02:59] NotebookLM: Ah, I get it. So instead of being limited to what it already knows, It can go and find the information it needs, like a human travel agent would.[00:03:07] NotebookLM 2: Precisely. And someone on Hacker News pointed out a cool detail. The API actually gives you a text version of what's being said. So you can store that, analyze it.[00:03:17] NotebookLM: That's smart. It seems like OpenAI put a lot of thought into making this API easy for developers to use. But, while we're on OpenAI, you know, Besides their tech, there's been some news about, like, internal changes, too.[00:03:30] NotebookLM: Didn't they say they're moving away from being a non profit?[00:03:32] NotebookLM 2: They did. And it's got everyone talking. It's a major shift. And it's only natural for people to wonder how that'll change things for OpenAI in the future. I mean, there are definitely some valid questions about this move to for profit. Like, will they have more money for research now?[00:03:46] NotebookLM 2: Probably. But will they, you know, care as much about making sure AI benefits everyone?[00:03:51] NotebookLM: Yeah, that's the big question, especially with all the, like, the leadership changes happening at OpenAI too, right? I read that their Chief Research Officer left, and their VP of Research, and even their CTO.[00:04:03] NotebookLM 2: It's true. A lot of people are connecting those departures with the changes in OpenAI's structure.[00:04:08] NotebookLM: And I guess it makes you wonder what's going on behind the scenes. But they are still putting out new stuff. Like this whole fine tuning thing really caught my eye.[00:04:17] NotebookLM 2: Right, fine tuning. It's essentially taking a pre trained AI model. And, like, customizing it.[00:04:23] NotebookLM: So instead of a general AI, you get one that's tailored for a specific job.[00:04:27] NotebookLM 2: Exactly. And that opens up so many possibilities, especially for businesses. Imagine you could train an AI on your company's data, you know, like how you communicate your brand guidelines.[00:04:37] NotebookLM: So it's like having an AI that's specifically trained for your company?[00:04:41] NotebookLM 2: That's the idea.[00:04:41] NotebookLM: And they're doing it with images now, too, right?[00:04:44] NotebookLM: Fine tuning with vision is what they called it.[00:04:46] NotebookLM 2: It's pretty incredible what they're doing with that, especially in fields like medicine.[00:04:50] NotebookLM: Like using AI to help doctors make diagnoses.[00:04:52] NotebookLM 2: Exactly. And AI could be trained on thousands of medical images, right? And then it could potentially spot things that even a trained doctor might miss.[00:05:03] NotebookLM: That's kind of scary, to be honest. What if it gets it wrong?[00:05:06] NotebookLM 2: Well, the idea isn't to replace doctors, but to give them another tool, you know, help them make better decisions.[00:05:12] NotebookLM: Okay, that makes sense. But training these AI models must be really expensive.[00:05:17] NotebookLM 2: It can be. All those tokens add up. But OpenAI announced something called automatic prompt caching.[00:05:23] Alex Volkov: Automatic what now? I don't think I came across that.[00:05:26] NotebookLM 2: So basically, if your AI sees a prompt that it's already seen before, OpenAI will give you a discount.[00:05:31] NotebookLM: Huh. Like a frequent buyer program for AI.[00:05:35] NotebookLM 2: Kind of, yeah. It's good that they're trying to make it more affordable. And they're also doing something called model distillation.[00:05:41] NotebookLM: Okay, now you're just using big words to sound smart. What's that?[00:05:45] NotebookLM 2: Think of it like like a recipe, right? You can take a really complex recipe and break it down to the essential parts.[00:05:50] NotebookLM: Make it simpler, but it still tastes the same.[00:05:53] NotebookLM 2: Yeah. And that's what model distillation is. You take a big, powerful AI model and create a smaller, more efficient version.[00:06:00] NotebookLM: So it's like lighter weight, but still just as capable.[00:06:03] NotebookLM 2: Exactly. And that means more people can actually use these powerful tools. They don't need, like, a supercomputer to run them.[00:06:10] NotebookLM: So they're making AI more accessible. That's great.[00:06:13] NotebookLM 2: It is. And speaking of powerful tools, they also talked about their new O1 model.[00:06:18] NotebookLM 2: That's the one they've been hyping up. The one that's supposed to be this big leap forward.[00:06:22] NotebookLM: Yeah, O1. It sounds pretty futuristic. Like, from what I read, it's not just a bigger, better language model.[00:06:28] NotebookLM 2: Right. It's a different porch.[00:06:29] NotebookLM: They're saying it can, like, actually reason, right? Think.[00:06:33] NotebookLM 2: It's trained differently.[00:06:34] NotebookLM 2: They used reinforcement learning with O1.[00:06:36] NotebookLM: So it's not just finding patterns in the data it's seen before.[00:06:40] NotebookLM 2: Not just that. It can actually learn from its mistakes. Get better at solving problems.[00:06:46] NotebookLM: So give me an example. What can O1 do that, say, GPT 4 can't?[00:06:51] NotebookLM 2: Well, OpenAI showed it doing some pretty impressive stuff with math, like advanced math.[00:06:56] NotebookLM 2: And coding, too. Complex coding. Things that even GPT 4 struggled with.[00:07:00] NotebookLM: So you're saying if I needed to, like, write a screenplay, I'd stick with GPT 4? But if I wanted to solve some crazy physics problem, O1 is what I'd use.[00:07:08] NotebookLM 2: Something like that, yeah. Although there is a trade off. O1 takes a lot more power to run, and it takes longer to get those impressive results.[00:07:17] NotebookLM: Hmm, makes sense. More power, more time, higher quality.[00:07:21] NotebookLM 2: Exactly.[00:07:22] NotebookLM: It sounds like it's still in development, though, right? Is there anything else they're planning to add to it?[00:07:26] NotebookLM 2: Oh, yeah. They mentioned system prompts, which will let developers, like, set some ground rules for how it behaves. And they're working on adding structured outputs and function calling.[00:07:38] Alex Volkov: Wait, structured outputs? Didn't we just talk about that? We[00:07:41] NotebookLM 2: did. That's the thing where the AI's output is formatted in a way that's easy to use.[00:07:47] NotebookLM: Right, right. So you don't have to spend all day trying to make sense of what it gives you. It's good that they're thinking about that stuff.[00:07:53] NotebookLM 2: It's about making these tools usable.[00:07:56] NotebookLM 2: And speaking of that, Dev Day finished up with this really interesting talk. Sam Altman, the CEO of OpenAI, And Kevin Weil, their new chief product officer. They talked about, like, the big picture for AI.[00:08:09] NotebookLM: Yeah, they did, didn't they? Anything interesting come up?[00:08:12] NotebookLM 2: Well, Altman talked about moving past this whole AGI term, Artificial General Intelligence.[00:08:18] NotebookLM: I can see why. It's kind of a loaded term, isn't it?[00:08:20] NotebookLM 2: He thinks it's become a bit of a buzzword, and people don't really understand what it means.[00:08:24] NotebookLM: So are they saying they're not trying to build AGI anymore?[00:08:28] NotebookLM 2: It's more like they're saying they're focused on just Making AI better, constantly improving it, not worrying about putting it in a box.[00:08:36] NotebookLM: That makes sense. Keep pushing the limits.[00:08:38] NotebookLM 2: Exactly. But they were also very clear about doing it responsibly. They talked a lot about safety and ethics.[00:08:43] NotebookLM: Yeah, that's important.[00:08:44] NotebookLM 2: They said they were going to be very careful. About how they release new features.[00:08:48] NotebookLM: Good! Because this stuff is powerful.[00:08:51] NotebookLM 2: It is. It was a lot to take in, this whole Dev Day event.[00:08:54] NotebookLM 2: New tools, big changes at OpenAI, and these big questions about the future of AI.[00:08:59] NotebookLM: It was. But hopefully this deep dive helped make sense of some of it. At least, that's what we try to do here.[00:09:05] AI Charlie: Absolutely.[00:09:06] NotebookLM: Thanks for taking the deep dive with us.[00:09:08] AI Charlie: The biggest demo of the new Realtime API involved function calling with voice mode and buying chocolate covered strawberries from our friendly local OpenAI developer experience engineer and strawberry shop owner, Ilan Biggio.[00:09:21] AI Charlie: We'll first play you the audio of his demo and then go into a little interview with him.[00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling[00:09:25] Romain Huet: Could you place a call and see if you could get us 400 strawberries delivered to the venue? But please keep that under 1500. I'm on it. We'll get those strawberries delivered for you.[00:09:47] Ilan: Hello? Hi there. Is this Ilan? I'm Romain's AI assistant. How is it going? Fantastic. Can you tell me what flavors of strawberry dips you have for me? Yeah, we have chocolate, vanilla, and we have peanut butter. Wait, how much would 400 chocolate covered strawberries cost? 400? Are you sure you want 400? Yes, 400 chocolate covered[00:10:14] swyx: strawberries.[00:10:15] Ilan: Wait,[00:10:16] swyx: how much[00:10:16] Ilan: would that be? I think that'll be around, like, 1, 415. 92.[00:10:25] Alex Volkov: Awesome. Let's go ahead and place the order for four chocolate covered strawberries.[00:10:31] Ilan: Great, where would you like that delivered? Please deliver them to the Gateway Pavilion at Fort Mason. And I'll be paying in cash.[00:10:42] Alex Volkov: Okay,[00:10:43] Ilan: sweet. So just to confirm, you want four strawberries?[00:10:45] Ilan: 400 chocolate covered strawberries to the Gateway Pavilion. Yes, that's perfect. And when can we expect delivery? Well, you guys are right nearby, so it'll be like, I don't know, 37 seconds? That's incredibly fast. Cool, you too.[00:11:09] swyx: Hi, Ilan, welcome to Lanespace. Oh, thank you. I just saw your amazing demos, had your amazing strawberries. You are dressed up, like, exactly like a strawberry salesman. Gotta have it all. What was the building on demo like? What was the story behind the demo?[00:11:22] swyx: It was really interesting. This is actually something I had been thinking about for months before the launch.[00:11:27] swyx: Like, having a, like, AI that can make phone calls is something like I've personally wanted for a long time. And so as soon as we launched internally, like, I started hacking on it. And then that sort of just started. We made it into like an internal demo, and then people found it really interesting, and then we thought how cool would it be to have this like on stage as, as one of the demos.[00:11:47] swyx: Yeah, would would you call out any technical issues building, like you were basically one of the first people ever to build with a voice mode API. Would you call out any issues like integrating it with Twilio like that, like you did with function calling, with like a form filling elements. I noticed that you had like intents of things to fulfill, and then.[00:12:07] swyx: When there's still missing info, the voice would prompt you, roleplaying the store guy.[00:12:13] swyx: Yeah, yeah, so, I think technically, there's like the whole, just working with audio and streams is a whole different beast. Like, even separate from like AI and this, this like, new capabilities, it's just, it's just tough.[00:12:26] swyx: Yeah, when you have a prompt, conversationally it'll just follow, like the, it was, Instead of like, kind of step by step to like ask the right questions based on like the like what the request was, right? The function calling itself is sort of tangential to that. Like, you have to prompt it to call the functions, but then handling it isn't too much different from, like, what you would do with assistant streaming or, like, chat completion streaming.[00:12:47] swyx: I think, like, the API feels very similar just to, like, if everything in the API was streaming, it actually feels quite familiar to that.[00:12:53] swyx: And then, function calling wise, I mean, does it work the same? I don't know. Like, I saw a lot of logs. You guys showed, like, in the playground, a lot of logs. What is in there?[00:13:03] swyx: What should people know?[00:13:04] swyx: Yeah, I mean, it is, like, the events may have different names than the streaming events that we have in chat completions, but they represent very similar things. It's things like, you know, function call started, argument started, it's like, here's like argument deltas, and then like function call done.[00:13:20] swyx: Conveniently we send one that has the full function, and then I just use that. Nice.[00:13:25] swyx: Yeah and then, like, what restrictions do, should people be aware of? Like, you know, I think, I think, before we recorded, we discussed a little bit about the sensitivities around basically calling random store owners and putting, putting like an AI on them.[00:13:40] swyx: Yeah, so there's, I think there's recent regulation on that, which is why we want to be like very, I guess, aware of, of You know, you can't just call anybody with AI, right? That's like just robocalling. You wouldn't want someone just calling you with AI.[00:13:54] swyx: I'm a developer, I'm about to do this on random people.[00:13:57] swyx: What laws am I about to break?[00:14:00] swyx: I forget what the governing body is, but you should, I think, Having consent of the person you're about to call, it always works. I, as the strawberry owner, have consented to like getting called with AI. I think past that you, you want to be careful. Definitely individuals are more sensitive than businesses.[00:14:19] swyx: I think businesses you have a little bit more leeway. Also, they're like, businesses I think have an incentive to want to receive AI phone calls. Especially if like, they're dealing with it. It's doing business. Right, like, it's more business. It's kind of like getting on a booking platform, right, you're exposed to more.[00:14:33] swyx: But, I think it's still very much like a gray area. Again, so. I think everybody should, you know, tread carefully, like, figure out what it is. I, I, I, the law is so recent, I didn't have enough time to, like, I'm also not a lawyer. Yeah, yeah, yeah, of course. Yeah.[00:14:49] swyx: Okay, cool fair enough. One other thing, this is kind of agentic.[00:14:52] swyx: Did you use a state machine at all? Did you use any framework? No. You just stick it in context and then just run it in a loop until it ends call?[00:15:01] swyx: Yeah, there isn't even a loop, like Okay. Because the API is just based on sessions. It's always just going to keep going. Every time you speak, it'll trigger a call.[00:15:11] swyx: And then after every function call was also invoked invoking like a generation. And so that is another difference here. It's like it's inherently almost like in a loop, be just by being in a session, right? No state machines needed. I'd say this is very similar to like, the notion of routines, where it's just like a list of steps.[00:15:29] swyx: And it, like, sticks to them softly, but usually pretty well. And the steps is the prompts? The steps, it's like the prompt, like the steps are in the prompt. Yeah, yeah, yeah. Right, it's like step one, do this, step one, step two, do that. What if I want to change the system prompt halfway through the conversation?[00:15:44] swyx: You can. Okay. You can. To be honest, I have not played without two too much. Yeah,[00:15:47] swyx: yeah.[00:15:48] swyx: But, I know you can.[00:15:49] swyx: Yeah, yeah. Yeah. Awesome. I noticed that you called it real time API, but not voice API. Mm hmm. So I assume that it's like real time API starting with voice. Right, I think that's what he said on the thing.[00:16:00] swyx: I can't imagine, like, what else is real[00:16:02] swyx: time? Well, I guess, to use ChatGPT's voice mode as an example, Like, we've demoed the video, right? Like, real time image, right? So, I'm not actually sure what timelines are, But I would expect, if I had to guess, That, like, that is probably the next thing that we're gonna be making.[00:16:17] swyx: You'd probably have to talk directly with the team building this. Sure. But, You can't promise their timelines. Yeah, yeah, yeah, right, exactly. But, like, given that this is the features that currently, Or that exists that we've demoed on Chachapiti. Yeah. There[00:16:29] swyx: will never be a[00:16:29] swyx: case where there's like a real time text API, right?[00:16:31] swyx: I don't Well, this is a real time text API. You can do text only on this. Oh. Yeah. I don't know why you would. But it's actually So text to text here doesn't quite make a lot of sense. I don't think you'll get a lot of latency gain. But, like, speech to text is really interesting. Because you can prevent You can prevent responses, like audio responses.[00:16:54] swyx: And force function calls. And so you can do stuff like UI control. That is like super super reliable. We had a lot of like, you know, un, like, we weren't sure how well this was gonna work because it's like, you have a voice answering. It's like a whole persona, right? Like, that's a little bit more, you know, risky.[00:17:10] swyx: But if you, like, cut out the audio outputs and make it so it always has to output a function, like you can end up with pretty pretty good, like, Pretty reliable, like, command like a command architecture. Yeah,[00:17:21] swyx: actually, that's the way I want to interact with a lot of these things as well. Like, one sided voice.[00:17:26] swyx: Yeah, you don't necessarily want to hear the[00:17:27] swyx: voice back. And like, sometimes it's like, yeah, I think having an output voice is great. But I feel like I don't always want to hear an output voice. I'd say usually I don't. But yeah, exactly, being able to speak to it is super sweet.[00:17:39] swyx: Cool. Do you want to comment on any of the other stuff that you announced?[00:17:41] swyx: From caching I noticed was like, I like the no code change part. I'm looking forward to the docs because I'm sure there's a lot of details on like, what you cache, how long you cache. Cause like, enthalpy caches were like 5 minutes. I was like, okay, but what if I don't make a call every 5 minutes?[00:17:56] swyx: Yeah,[00:17:56] swyx: to be super honest with you, I've been so caught up with the real time API and making the demo that I haven't read up on the other stuff. Launches too much. I mean, I'm aware of them, but I think I'm excited to see how all distillation works. That's something that we've been doing like, I don't know, I've been like doing it between our models for a while And I've seen really good results like I've done back in a day like from GPT 4 to GPT 3.[00:18:19] swyx: 5 And got like, like pretty much the same level of like function calling with like hundreds of functions So that was super super compelling So, I feel like easier distillation, I'm really excited for. I see. Is it a tool?[00:18:31] swyx: So, I saw evals. Yeah. Like, what is the distillation product? It wasn't super clear, to be honest.[00:18:36] swyx: I, I think I want to, I want to let that team, I want to let that team talk about it. Okay,[00:18:40] swyx: alright. Well, I appreciate you jumping on. Yeah, of course. Amazing demo. It was beautifully designed. I'm sure that was part of you and Roman, and[00:18:47] swyx: Yeah, I guess, shout out to like, the first people to like, creators of Wanderlust, originally, were like, Simon and Carolis, and then like, I took it and built the voice component and the voice calling components.[00:18:59] swyx: Yeah, so it's been a big team effort. And like the entire PI team for like Debugging everything as it's been going on. It's been, it's been so good working with them. Yeah, you're the first consumers on the DX[00:19:07] swyx: team. Yeah. Yeah, I mean, the classic role of what we do there. Yeah. Okay, yeah, anything else? Any other call to action?[00:19:13] swyx: No, enjoy Dev Day. Thank you. Yeah. That's it.[00:19:16] Olivier Godement, Head of Product, OpenAI[00:19:16] AI Charlie: The latent space crew then talked to Olivier Godmont, head of product for the OpenAI platform, who led the entire Dev Day keynote and introduced all the major new features and updates that we talked about today.[00:19:28] swyx: Okay, so we are here with Olivier Godmont. That's right.[00:19:32] swyx: I don't pronounce French. That's fine. It was perfect. And it was amazing to see your keynote today. What was the back story of, of preparing something like this? Preparing, like, Dev Day? It[00:19:43] Olivier Godement: essentially came from a couple of places. Number one, excellent reception from last year's Dev Day.[00:19:48] Olivier Godement: Developers, startup founders, researchers want to spend more time with OpenAI, and we want to spend more time with them as well. And so for us, like, it was a no brainer, frankly, to do it again, like, you know, like a nice conference. The second thing is going global. We've done a few events like in Paris and like a few other like, you know, non European, non American countries.[00:20:05] Olivier Godement: And so this year we're doing SF, Singapore, and London. To frankly just meet more developers.[00:20:10] swyx: Yeah, I'm very excited for the Singapore one.[00:20:12] Olivier Godement: Ah,[00:20:12] swyx: yeah. Will you be[00:20:13] Olivier Godement: there?[00:20:14] swyx: I don't know. I don't know if I got an invite. No. I can't just talk to you. Yeah, like, and then there was some speculation around October 1st.[00:20:22] Olivier Godement: Yeah. Is it because[00:20:23] swyx: 01, October 1st? It[00:20:25] Olivier Godement: has nothing to do. I discovered the tweet yesterday where like, people are so creative. No one, there was no connection to October 1st. But in hindsight, that would have been a pretty good meme by Tiana. Okay.[00:20:37] swyx: Yeah, and you know, I think like, OpenAI's outreach to developers is something that I felt the whole in 2022, when like, you know, like, people were trying to build a chat GPT, and like, there was no function calling, all that stuff that you talked about in the past.[00:20:51] swyx: And that's why I started my own conference as like like, here's our little developer conference thing. And, but to see this OpenAI Dev Day now, and like to see so many developer oriented products coming to OpenAI, I think it's really encouraging.[00:21:02] Olivier Godement: Yeah, totally. It's that's what I said, essentially, like, developers are basically the people who make the best connection between the technology and, you know, the future, essentially.[00:21:14] Olivier Godement: Like, you know, essentially see a capability, see a low level, like, technology, and are like, hey, I see how that application or that use case that can be enabled. And so, in the direction of enabling, like, AGI, like, all of humanity, it's a no brainer for us, like, frankly, to partner with Devs.[00:21:31] Alessio: And most importantly, you almost never had waitlists, which, compared to like other releases, people usually, usually have.[00:21:38] Alessio: What is the, you know, you had from caching, you had real time voice API, we, you know, Shawn did a long Twitter thread, so people know the releases. Yeah. What is the thing that was like sneakily the hardest to actually get ready for, for that day, or like, what was the kind of like, you know, last 24 hours, anything that you didn't know was gonna work?[00:21:56] Olivier Godement: Yeah. The old Fairly, like, I would say, involved, like, features to ship. So the team has been working for a month, all of them. The one which I would say is the newest for OpenAI is the real time API. For a couple of reasons. I mean, one, you know, it's a new modality. Second, like, it's the first time that we have an actual, like, WebSocket based API.[00:22:16] Olivier Godement: And so, I would say that's the one that required, like, the most work over the month. To get right from a developer perspective and to also make sure that our existing safety mitigation that worked well with like real time audio in and audio out.[00:22:30] swyx: Yeah, what design choices or what was like the sort of design choices that you want to highlight?[00:22:35] swyx: Like, you know, like I think for me, like, WebSockets, you just receive a bunch of events. It's two way. I obviously don't have a ton of experience. I think a lot of developers are going to have to embrace this real time programming. Like, what are you designing for, or like, what advice would you have for developers exploring this?[00:22:51] Olivier Godement: The core design hypothesis was essentially, how do we enable, like, human level latency? We did a bunch of tests, like, on average, like, human beings, like, you know, takes, like, something like 300 milliseconds to converse with each other. And so that was the design principle, essentially. Like, working backward from that, and, you know, making the technology work.[00:23:11] Olivier Godement: And so we evaluated a few options, and WebSockets was the one that we landed on. So that was, like, one design choice. A few other, like, big design choices that we had to make prompt caching. Prompt caching, the design, like, target was automated from the get go. Like, zero code change from the developer.[00:23:27] Olivier Godement: That way you don't have to learn, like, what is a prompt prefix, and, you know, how long does a cache work, like, we just do it as much as we can, essentially. So that was a big design choice as well. And then finally, on distillation, like, and evaluation. The big design choice was something I learned at Skype, like in my previous job, like a philosophy around, like, a pit of success.[00:23:47] Olivier Godement: Like, what is essentially the, the, the minimum number of steps for the majority of developers to do the right thing? Because when you do evals on fat tuning, there are many, many ways, like, to mess it up, frankly, like, you know, and have, like, a crappy model, like, evals that tell, like, a wrong story. And so our whole design was, okay, we actually care about, like, helping people who don't have, like, that much experience, like, evaluating a model, like, get, like, in a few minutes, like, to a good spot.[00:24:11] Olivier Godement: And so how do we essentially enable that bit of success, like, in the product flow?[00:24:15] swyx: Yeah, yeah, I'm a little bit scared to fine tune especially for vision, because I don't know what I don't know for stuff like vision, right? Like, for text, I can evaluate pretty easily. For vision let's say I'm like trying to, one of your examples was grab.[00:24:33] swyx: Which, very close to home, I'm from Singapore. I think your example was like, they identified stop signs better. Why is that hard? Why do I have to fine tune that? If I fine tune that, do I lose other things? You know, like, there's a lot of unknowns with Vision that I think developers have to figure out.[00:24:50] swyx: For[00:24:50] Olivier Godement: sure. Vision is going to open up, like, a new, I would say, evaluation space. Because you're right, like, it's harder, like, you know, to tell correct from incorrect, essentially, with images. What I can say is we've been alpha testing, like, the Vision fine tuning, like, for several weeks at that point. We are seeing, like, even higher performance uplift compared to text fine tuning.[00:25:10] Olivier Godement: So that's, there is something here, like, we've been pretty impressed, like, in a good way, frankly. But, you know, how well it works. But for sure, like, you know, I expect the developers who are moving from one modality to, like, text and images will have, like, more, you know Testing, evaluation, like, you know, to set in place, like, to make sure it works well.[00:25:25] Alessio: The model distillation and evals is definitely, like, the most interesting. Moving away from just being a model provider to being a platform provider. How should people think about being the source of truth? Like, do you want OpenAI to be, like, the system of record of all the prompting? Because people sometimes store it in, like, different data sources.[00:25:41] Alessio: And then, is that going to be the same as the models evolve? So you don't have to worry about, you know, refactoring the data, like, things like that, or like future model structures.[00:25:51] Olivier Godement: The vision is if you want to be a source of truth, you have to earn it, right? Like, we're not going to force people, like, to pass us data.[00:25:57] Olivier Godement: There is no value prop, like, you know, for us to store the data. The vision here is at the moment, like, most developers, like, use like a one size fits all model, like be off the shelf, like GP40 essentially. The vision we have is fast forward a couple of years. I think, like, most developers will essentially, like, have a.[00:26:15] Olivier Godement: An automated, continuous, fine tuned model. The more, like, you use the model, the more data you pass to the model provider, like, the model is automatically, like, fine tuned, evaluated against some eval sets, and essentially, like, you don't have to every month, when there is a new snapshot, like, you know, to go online and, you know, try a few new things.[00:26:34] Olivier Godement: That's a direction. We are pretty far away from it. But I think, like, that evaluation and decision product are essentially a first good step in that direction. It's like, hey, it's you. I set it by that direction, and you give us the evaluation data. We can actually log your completion data and start to do some automation on your behalf.[00:26:52] Alessio: And then you can do evals for free if you share data with OpenAI. How should people think about when it's worth it, when it's not? Sometimes people get overly protective of their data when it's actually not that useful. But how should developers think about when it's right to do it, when not, or[00:27:07] Olivier Godement: if you have any thoughts on it?[00:27:08] Olivier Godement: The default policy is still the same, like, you know, we don't train on, like, any API data unless you opt in. What we've seen from feedback is evaluation can be expensive. Like, if you run, like, O1 evals on, like, thousands of samples Like, your build will get increased, like, you know, pretty pretty significantly.[00:27:22] Olivier Godement: That's problem statement number one. Problem statement number two is, essentially, I want to get to a world where whenever OpenAI ships a new model snapshot, we have full confidence that there is no regression for the task that developers care about. And for that to be the case, essentially, we need to get evals.[00:27:39] Olivier Godement: And so that, essentially, is a sort of a two bugs one stone. It's like, we subsidize, basically, the evals. And we also use the evals when we ship new models to make sure that we keep going in the right direction. So, in my sense, it's a win win, but again, completely opt in. I expect that many developers will not want to share their data, and that's perfectly fine to me.[00:27:56] swyx: Yeah, I think free evals though, very, very good incentive. I mean, it's a fair trade. You get data, we get free evals. Exactly,[00:28:04] Olivier Godement: and we sanitize PII, everything. We have no interest in the actual sensitive data. We just want to have good evaluation on the real use cases.[00:28:13] swyx: Like, I always want to eval the eval. I don't know if that ever came up.[00:28:17] swyx: Like, sometimes the evals themselves are wrong, and there's no way for me to tell you.[00:28:22] Olivier Godement: Everyone who is starting with LLM, teaching with LLM, is like, Yeah, evaluation, easy, you know, I've done testing, like, all my life. And then you start to actually be able to eval, understand, like, all the corner cases, And you realize, wow, there's like a whole field in itself.[00:28:35] Olivier Godement: So, yeah, good evaluation is hard and so, yeah. Yeah, yeah.[00:28:38] swyx: But I think there's a, you know, I just talked to Brain Trust which I think is one of your partners. Mm-Hmm. . They also emphasize code based evals versus your sort of low code. What I see is like, I don't know, maybe there's some more that you didn't demo.[00:28:53] swyx: YC is kind of like a low code experience, right, for evals. Would you ever support like a more code based, like, would I run code on OpenAI's eval platform?[00:29:02] Olivier Godement: For sure. I mean, we meet developers where they are, you know. At the moment, the demand was more for like, you know, easy to get started, like eval. But, you know, if we need to expose like an evaluation API, for instance, for people like, you know, to pass, like, you know, their existing test data we'll do it.[00:29:15] Olivier Godement: So yeah, there is no, you know, philosophical, I would say, like, you know, misalignment on that. Yeah,[00:29:19] swyx: yeah, yeah. What I think this is becoming, by the way, and I don't, like it's basically, like, you're becoming AWS. Like, the AI cloud. And I don't know if, like, that's a conscious strategy, or it's, like, It doesn't even have to be a conscious strategy.[00:29:33] swyx: Like, you're going to offer storage. You're going to offer compute. You're going to offer networking. I don't know what networking looks like. Networking is maybe, like, Caching or like it's a CDN. It's a prompt CDN.[00:29:45] Alex Volkov: Yeah,[00:29:45] swyx: but it's the AI versions of everything, right? Do you like do you see the analogies or?[00:29:52] Olivier Godement: Whatever Whatever I took to developers. I feel like Good models are just half of the story to build a good app There's a third model you need to do Evaluation is the perfect example. Like, you know, you can have the best model in the world If you're in the dark, like, you know, it's really hard to gain the confidence and so Our philosophy is[00:30:11] Olivier Godement: The whole like software development stack is being basically reinvented, you know, with LLMs. There is no freaking way that open AI can build everything. Like there is just too much to build, frankly. And so my philosophy is, essentially, we'll focus on like the tools which are like the closest to the model itself.[00:30:28] Olivier Godement: So that's why you see us like, you know, investing quite a bit in like fine tuning, distillation, our evaluation, because we think that it actually makes sense to have like in one spot, Like, you know, all of that. Like, there is some sort of virtual circle, essentially, that you can set in place. But stuff like, you know, LLMOps, like tools which are, like, further away from the model, I don't know if you want to do, like, you know, super elaborate, like, prompt management, or, you know, like, tooling, like, I'm not sure, like, you know, OpenAI has, like, such a big edge, frankly, like, you know, to build this sort of tools.[00:30:56] Olivier Godement: So that's how we view it at the moment. But again, frankly, the philosophy is super simple. The strategy is super simple. It's meeting developers where they want us to be. And so, you know that's frankly, like, you know, day in, day out, like, you know, what I try to do.[00:31:08] Alessio: Cool. Thank you so much for the time.[00:31:10] Alessio: I'm sure you,[00:31:10] swyx: Yeah, I have more questions on, a couple questions on voice, and then also, like, your call to action, like, what you want feedback on, right? So, I think we should spend a bit more time on voice, because I feel like that's, like, the big splash thing. I talked well Well, I mean, I mean, just what is the future of real time for OpenAI?[00:31:28] swyx: Yeah. Because I think obviously video is next. You already have it in the, the ChatGPT desktop app. Do we just have a permanent, like, you know, like, are developers just going to be, like, sending sockets back and forth with OpenAI? Like how do we program for that? Like, what what is the future?[00:31:44] Olivier Godement: Yeah, that makes sense. I think with multimodality, like, real time is quickly becoming, like, you know, essentially the right experience, like, to build an application. Yeah. So my expectation is that we'll see like a non trivial, like a volume of applications like moving to a real time API. Like if you zoom out, like, audio is really simple, like, audio until basically now.[00:32:05] Olivier Godement: Audio on the web, in apps, was basically very much like a second class citizen. Like, you basically did like an audio chatbot for users who did not have a choice. You know, they were like struggling to read, or I don't know, they were like not super educated with technology. And so, frankly, it was like the crappy option, you know, compared to text.[00:32:25] Olivier Godement: But when you talk to people in the real world, the vast majority of people, like, prefer to talk and listen instead of typing and writing.[00:32:34] swyx: We speak before we write.[00:32:35] Olivier Godement: Exactly. I don't know. I mean, I'm sure it's the case for you in Singapore. For me, my friends in Europe, the number of, like, WhatsApp, like, voice notes they receive every day, I mean, just people, it makes sense, frankly, like, you know.[00:32:45] Olivier Godement: Chinese. Chinese, yeah.[00:32:46] swyx: Yeah,[00:32:47] Olivier Godement: all voice. You know, it's easier. There is more emotions. I mean, you know, you get the point across, like, pretty well. And so my personal ambition for, like, the real time API and, like, audio in general is to make, like, audio and, like, multimodality, like, truly a first class experience.[00:33:01] Olivier Godement: Like, you know, if you're, like, you know, the amazing, like, super bold, like, start up out of YC, you want to build, like, the next, like, billion, like, you know, user application to make it, like, truly your first and make it feel, like, you know, an actual good, like, you know, product experience. So that's essentially the ambition, and I think, like, yeah, it could be pretty big.[00:33:17] swyx: Yeah. I think one, one people, one issue that people have with the voice so far as, as released in advanced voice mode is the refusals.[00:33:24] Alex Volkov: Yeah.[00:33:24] swyx: You guys had a very inspiring model spec. I think Joanne worked on that. Where you said, like, yeah, we don't want to overly refuse all the time. In fact, like, even if, like, not safe for work, like, in some occasions, it's okay.[00:33:38] swyx: How, is there an API that we can say, not safe for work, okay?[00:33:41] Olivier Godement: I think we'll get there. I think we'll get there. The mobile spec, like, nailed it, like, you know. It nailed it! It's so good! Yeah, we are not in the business of, like, policing, you know, if you can say, like, vulgar words or whatever. You know, there are some use cases, like, you know, I'm writing, like, a Hollywood, like, script I want to say, like, will go on, and it's perfectly fine, you know?[00:33:59] Olivier Godement: And so I think the direction where we'll go here is that basically There will always be like, you know, a set of behavior that we will, you know, just like forbid, frankly, because they're illegal against our terms of services. But then there will be like, you know, some more like risky, like themes, which are completely legal, like, you know, vulgar words or, you know, not safe for work stuff.[00:34:17] Olivier Godement: Where basically we'll expose like a controllable, like safety, like knobs in the API to basically allow you to say, hey, that theme okay, that theme not okay. How sensitive do you want the threshold to be on safety refusals? I think that's the Dijkstra. So a[00:34:31] swyx: safety API.[00:34:32] Olivier Godement: Yeah, in a way, yeah.[00:34:33] swyx: Yeah, we've never had that.[00:34:34] Olivier Godement: Yeah. '[00:34:35] swyx: cause right now is you, it is whatever you decide. And then it's, that's it. That, that, that would be the main reason I don't use opening a voice is because of[00:34:42] Olivier Godement: it's over police. Over refuse over refusals. Yeah. Yeah, yeah. No, we gotta fix that. Yeah. Like singing,[00:34:47] Alessio: we're trying to do voice. I'm a singer.[00:34:49] swyx: And you, you locked off singing.[00:34:51] swyx: Yeah,[00:34:51] Alessio: yeah, yeah.[00:34:52] swyx: But I, I understand music gets you in trouble. Okay. Yeah. So then, and then just generally, like, what do you want to hear from developers? Right? We have, we have all developers watching you know, what feedback do you want? Any, anything specific as well, like from, especially from today anything that you are unsure about, that you are like, Our feedback could really help you decide.[00:35:09] swyx: For sure.[00:35:10] Olivier Godement: I think, essentially, it's becoming pretty clear after today that, you know, I would say the open end direction has become pretty clear, like, you know, after today. Investment in reasoning, investment in multimodality, Investment as well, like in, I would say, tool use, like function calling. To me, the biggest question I have is, you know, Where should we put the cursor next?[00:35:30] Olivier Godement: I think we need all three of them, frankly, like, you know, so we'll keep pushing.[00:35:33] swyx: Hire 10, 000 people, or actually, no need, build a bunch of bots.[00:35:37] Olivier Godement: Exactly, and so let's take O1 smart enough, like, for your problems? Like, you know, let's set aside for a second the existing models, like, for the apps that you would love to build, is O1 basically it in reasoning, or do we still have, like, you know, a step to do?[00:35:50] Olivier Godement: Preview is not enough, I[00:35:52] swyx: need the full one.[00:35:53] Olivier Godement: Yeah, so that's exactly that sort of feedback. Essentially what they would love to do is for developers I mean, there's a thing that Sam has been saying like over and over again, like, you know, it's easier said than done, but I think it's directionally correct. As a developer, as a founder, you basically want to build an app which is a bit too difficult for the model today, right?[00:36:12] Olivier Godement: Like, what you think is right, it's like, sort of working, sometimes not working. And that way, you know, that basically gives us like a goalpost, and be like, okay, that's what you need to enable with the next model release, like in a few months. And so I would say that Usually, like, that's the sort of feedback which is like the most useful that I can, like, directly, like, you know, incorporate.[00:36:33] swyx: Awesome. I think that's our time. Thank you so much, guys. Yeah, thank you so much.[00:36:38] AI Charlie: Thank you. We were particularly impressed that Olivier addressed the not safe for work moderation policy question head on, as that had only previously been picked up on in Reddit forums. This is an encouraging sign that we will return to in the closing candor with Sam Altman at the end of this episode.[00:36:57] Romain Huet, Head of DX, OpenAI[00:36:57] AI Charlie: Next, a chat with Roman Hewitt, friend of the pod, AI Engineer World's fair closing keynote speaker, and head of developer experience at OpenAI on his incredible live demos And advice to AI engineers on all the new modalities.[00:37:12] Alessio: Alright, we're live from OpenAI Dev Day. We're with Juan, who just did two great demos on, on stage.[00:37:17] Alessio: And he's been a friend of Latentspace, so thanks for taking some of the time.[00:37:20] Romain Huet: Of course, yeah, thank you for being here and spending the time with us today.[00:37:23] swyx: Yeah, I appreciate appreciate you guys putting this on. I, I know it's like extra work, but it really shows the developers that you're, Care and about reaching out.[00:37:31] Romain Huet: Yeah, of course, I think when you go back to the OpenAI mission, I think for us it's super important that we have the developers involved in everything we do. Making sure that you know, they have all of the tools they need to build successful apps. And we really believe that the developers are always going to invent the ideas, the prototypes, the fun factors of AI that we can't build ourselves.[00:37:49] Romain Huet: So it's really cool to have everyone here.[00:37:51] swyx: We had Michelle from you guys on. Yes, great episode. She very seriously said API is the path to AGI. Correct. And people in our YouTube comments were like, API is not AGI. I'm like, no, she's very serious. API is the path to AGI. Like, you're not going to build everything like the developers are, right?[00:38:08] swyx: Of[00:38:08] Romain Huet: course, yeah, that's the whole value of having a platform and an ecosystem of amazing builders who can, like, in turn, create all of these apps. I'm sure we talked about this before, but there's now more than 3 million developers building on OpenAI, so it's pretty exciting to see all of that energy into creating new things.[00:38:26] Alessio: I was going to say, you built two apps on stage today, an international space station tracker and then a drone. The hardest thing must have been opening Xcode and setting that up. Now, like, the models are so good that they can do everything else. Yes. You had two modes of interaction. You had kind of like a GPT app to get the plan with one, and then you had a cursor to do apply some of the changes.[00:38:47] Alessio: Correct. How should people think about the best way to consume the coding models, especially both for You know, brand new projects and then existing projects that you're trying to modify.[00:38:56] Romain Huet: Yeah. I mean, one of the things that's really cool about O1 Preview and O1 Mini being available in the API is that you can use it in your favorite tools like cursor like I did, right?[00:39:06] Romain Huet: And that's also what like Devin from Cognition can use in their own software engineering agents. In the case of Xcode, like, it's not quite deeply integrated in Xcode, so that's why I had like chat GPT side by side. But it's cool, right, because I could instruct O1 Preview to be, like, my coding partner and brainstorming partner for this app, but also consolidate all of the, the files and architect the app the way I wanted.[00:39:28] Romain Huet: So, all I had to do was just, like, port the code over to Xcode and zero shot the app build. I don't think I conveyed, by the way, how big a deal that is, but, like, you can now create an iPhone app from scratch, describing a lot of intricate details that you want, and your vision comes to life in, like, a minute.[00:39:47] Romain Huet: It's pretty outstanding.[00:39:48] swyx: I have to admit, I was a bit skeptical because if I open up SQL, I don't know anything about iOS programming. You know which file to paste it in. You probably set it up a little bit. So I'm like, I have to go home and test it. And I need the ChatGPT desktop app so that it can tell me where to click.[00:40:04] Romain Huet: Yeah, I mean like, Xcode and iOS development has become easier over the years since they introduced Swift and SwiftUI. I think back in the days of Objective C, or like, you know, the storyboard, it was a bit harder to get in for someone new. But now with Swift and SwiftUI, their dev tools are really exceptional.[00:40:23] Romain Huet: But now when you combine that with O1, as your brainstorming and coding partner, it's like your architect, effectively. That's the best way, I think, to describe O1. People ask me, like, can GPT 4 do some of that? And it certainly can. But I think it will just start spitting out code, right? And I think what's great about O1, is that it can, like, make up a plan.[00:40:42] Romain Huet: In this case, for instance, the iOS app had to fetch data from an API, it had to look at the docs, it had to look at, like, how do I parse this JSON, where do I store this thing, and kind of wire things up together. So that's where it really shines. Is mini or preview the better model that people should be using?[00:40:58] Romain Huet: Like, how? I think people should try both. We're obviously very excited about the upcoming O1 that we shared the evals for. But we noticed that O1 Mini is very, very good at everything math, coding, everything STEM. If you need for your kind of brainstorming or your kind of science part, you need some broader knowledge than reaching for O1 previews better.[00:41:20] Romain Huet: But yeah, I used O1 Mini for my second demo. And it worked perfectly. All I needed was very much like something rooted in code, architecting and wiring up like a front end, a backend, some UDP packets, some web sockets, something very specific. And it did that perfectly.[00:41:35] swyx: And then maybe just talking about voice and Wanderlust, the app that keeps on giving, what's the backstory behind like preparing for all of that?[00:41:44] Romain Huet: You know, it's funny because when last year for Dev Day, we were trying to think about what could be a great demo app to show like an assistive experience. I've always thought travel is a kind of a great use case because you have, like, pictures, you have locations, you have the need for translations, potentially.[00:42:01] Romain Huet: There's like so many use cases that are bounded to travel that I thought last year, let's use a travel app. And that's how Wanderlust came to be. But of course, a year ago, all we had was a text based assistant. And now we thought, well, if there's a voice modality, what if we just bring this app back as a wink.[00:42:19] Romain Huet: And what if we were interacting better with voice? And so with this new demo, what I showed was the ability to like, So, we wanted to have a complete conversation in real time with the app, but also the thing we wanted to highlight was the ability to call tools and functions, right? So, like in this case, we placed a phone call using the Twilio API, interfacing with our AI agents, but developers are so smart that they'll come up with so many great ideas that we could not think of ourselves, right?[00:42:48] Romain Huet: But what if you could have like a, you know, a 911 dispatcher? What if you could have like a customer service? Like center, that is much smarter than what we've been used to today. There's gonna be so many use cases for real time, it's awesome.[00:43:00] swyx: Yeah, and sometimes actually you, you, like this should kill phone trees.[00:43:04] swyx: Like there should not be like dial one[00:43:07] Romain Huet: of course para[00:43:08] swyx: espanol, you know? Yeah, exactly. Or whatever. I dunno.[00:43:12] Romain Huet: I mean, even you starting speaking Spanish would just do the thing, you know you don't even have to ask. So yeah, I'm excited for this future where we don't have to interact with those legacy systems.[00:43:22] swyx: Yeah. Yeah. Is there anything, so you are doing function calling in a streaming environment. So basically it's, it's web sockets. It's UDP, I think. It's basically not guaranteed to be exactly once delivery. Like, is there any coding challenges that you encountered when building this?[00:43:39] Romain Huet: Yeah, it's a bit more delicate to get into it.[00:43:41] Romain Huet: We also think that for now, what we, what we shipped is a, is a beta of this API. I think there's much more to build onto it. It does have the function calling and the tools. But we think that for instance, if you want to have something very robust, On your client side, maybe you want to have web RTC as a client, right?[00:43:58] Romain Huet: And, and as opposed to like directly working with the sockets at scale. So that's why we have partners like Life Kit and Agora if you want to, if you want to use them. And I'm sure we'll have many mores in the, in many more in the future. But yeah, we keep on iterating on that, and I'm sure the feedback of developers in the weeks to come is going to be super critical for us to get it right.[00:44:16] swyx: Yeah, I think LiveKit has been fairly public that they are used in, in the Chachapiti app. Like, is it, it's just all open source, and we just use it directly with OpenAI, or do we use LiveKit Cloud or something?[00:44:28] Romain Huet: So right now we, we released the API, we released some sample code also, and referenced clients for people to get started with our API.[00:44:35] Romain Huet: And we also partnered with LifeKit and Agora, so they also have their own, like ways to help you get started that plugs natively with the real time API. So depending on the use case, people can, can can decide what to use. If you're working on something that's completely client or if you're working on something on the server side, for the voice interaction, you may have different needs, so we want to support all of those.[00:44:55] Alessio: I know you gotta run. Is there anything that you want the AI engineering community to give feedback on specifically, like even down to like, you know, a specific API end point or like, what, what's like the thing that you want? Yeah. I[00:45:08] Romain Huet: mean, you know, if we take a step back, I think dev Day this year is all different from last year and, and in, in a few different ways.[00:45:15] Romain Huet: But one way is that we wanted to keep it intimate, even more intimate than last year. We wanted to make sure that the community is. Thank you very much for joining us on the Spotlight. That's why we have community talks and everything. And the takeaway here is like learning from the very best developers and AI engineers.[00:45:31] Romain Huet: And so, you know we want to learn from them. Most of what we shipped this morning, including things like prompt caching the ability to generate prompts quickly in the playground, or even things like vision fine tuning. These are all things that developers have been asking of us. And so, the takeaway I would, I would leave them with is to say like, Hey, the roadmap that we're working on is heavily influenced by them and their work.[00:45:53] Romain Huet: And so we love feedback From high feature requests, as you say, down to, like, very intricate details of an API endpoint, we love feedback, so yes that's, that's how we, that's how we build this API.[00:46:05] swyx: Yeah, I think the, the model distillation thing as well, it might be, like, the, the most boring, but, like, actually used a lot.[00:46:12] Romain Huet: True, yeah. And I think maybe the most unexpected, right, because I think if I, if I read Twitter correctly the past few days, a lot of people were expecting us. To shape the real time API for speech to speech. I don't think developers were expecting us to have more tools for distillation, and we really think that's gonna be a big deal, right?[00:46:30] Romain Huet: If you're building apps that have you know, you, you want high, like like low latency, low cost, but high performance, high quality on the use case distillation is gonna be amazing.[00:46:40] swyx: Yeah. I sat in the distillation session just now and they showed how they distilled from four oh to four mini and it was like only like a 2% hit in the performance and 50 next.[00:46:49] swyx: Yeah,[00:46:50] Romain Huet: I was there as well for the superhuman kind of use case inspired for an Ebola client. Yeah, this was really good. Cool man! so much for having me. Thanks again for being here today. It's always[00:47:00] AI Charlie: great to have you. As you might have picked up at the end of that chat, there were many sessions throughout the day focused on specific new capabilities.[00:47:08] Michelle Pokrass, Head of API at OpenAI ft. Simon Willison[00:47:08] AI Charlie: Like the new model distillation features combining EVOLs and fine tuning. For our next session, we are delighted to bring back two former guests of the pod, which is something listeners have been greatly enjoying in our second year of doing the Latent Space podcast. Michelle Pokras of the API team joined us recently to talk about structured outputs, and today gave an updated long form session at Dev Day, describing the implementation details of the new structured output mode.[00:47:39] AI Charlie: We also got her updated thoughts on the VoiceMode API we discussed in her episode, now that it is finally announced. She is joined by friend of the pod and super blogger, Simon Willison, who also came back as guest co host in our Dev Day. 2023 episode.[00:47:56] Alessio: Great, we're back live at Dev Day returning guest Michelle and then returning guest co host Fork.[00:48:03] Alessio: Fork, yeah, I don't know. I've lost count. I think it's been a few. Simon Willison is back. Yeah, we just wrapped, we just wrapped everything up. Congrats on, on getting everything everything live. Simon did a great, like, blog, so if you haven't caught up, I[00:48:17] Simon Willison: wrote my, I implemented it. Now, I'm starting my live blog while waiting for the first talk to start, using like GPT 4, I wrote me the Javascript, and I got that live just in time and then, yeah, I was live blogging the whole day.[00:48:28] swyx: Are you a cursor enjoyer?[00:48:29] Simon Willison: I haven't really gotten into cursor yet to be honest. I just haven't spent enough time for it to click, I think. I'm more a copy and paste things out of Cloud and chat GPT. Yeah. It's interesting.[00:48:39] swyx: Yeah. I've converted to cursor and 01 is so easy to just toggle on and off.[00:48:45] Alessio: What's your workflow?[00:48:46] Alessio: VS[00:48:48] Michelle Pokrass: Code co pilot, so Yep, same here. Team co pilot. Co pilot is actually the reason I joined OpenAI. It was, you know, before ChatGPT, this is the thing that really got me. So I'm still into it, but I keep meaning to try out Cursor, and I think now that things have calmed down, I'm gonna give it a real go.[00:49:03] swyx: Yeah, it's a big thing to change your tool of choice.[00:49:06] swyx: Yes,[00:49:06] Michelle Pokrass: yeah, I'm pretty dialed, so.[00:49:09] swyx: I mean, you know, if you want, you can just fork VS Code and make your own. That's the thing to dumb thing, right? We joked about doing a hackathon where the only thing you do is fork VS Code and bet me the best fork win.[00:49:20] Michelle Pokrass: Nice.[00:49:22] swyx: That's actually a really good idea. Yeah, what's up?[00:49:26] swyx: I mean, congrats on launching everything today. I know, like, we touched on it a little bit, but, like, everyone was kind of guessing that Voice API was coming, and, like, we talked about it in our episode. How do you feel going into the launch? Like, any design decisions that you want to highlight?[00:49:41] Michelle Pokrass: Yeah, super jazzed about it. The team has been working on it for a while. It's, like, a very different API for us. It's the first WebSocket API, so a lot of different design decisions to be made. It's, like, what kind of events do you send? When do you send an event? What are the event names? What do you send, like, on connection versus on future messages?[00:49:57] Michelle Pokrass: So there have been a lot of interesting decisions there. The team has also hacked together really cool projects as we've been testing it. One that I really liked is we had an internal hack a thon for the API team. And some folks built like a little hack that you could use to, like VIM with voice mode, so like, control vim, and you would tell them on like, nice, write a file and it would, you know, know all the vim commands and, and pipe those in.[00:50:18] Michelle Pokrass: So yeah, a lot of cool stuff we've been hacking on and really excited to see what people build with it.[00:50:23] Simon Willison: I've gotta call out a demo from today. I think it was Katja had a 3D visualization of the solar system, like WebGL solar system, you could talk to. That is one of the coolest conference demos I've ever seen.[00:50:33] Simon Willison: That was so convincing. I really want the code. I really want the code for that to get put out there. I'll talk[00:50:39] Michelle Pokrass: to the team. I think we can[00:50:40] Simon Willison: probably

ceo american head ai europe english google hollywood vision moving real french care san francisco research video european chinese government speaker spanish focus minnesota preparing team pennsylvania write safety plan iphone chatgpt 3d testing balancing tokyo employees product investment networking whatsapp reddit streaming cloud singapore id hire happy birthday stem genius ios gotta skype projects korean launching complex excited solve substack ip riding chat harder fantastic pi enterprise cto whispers react swift ev gemini slack openai agora correct genie sf evaluation launches api spotlight olivier real time seoul gpt delving ebola python ui aws mm ml takes fork coinbase apis usaid dev strawberry llm chief product officer javascript anthropic sam altman html romain katja prompt cognition devs versus wanderlust altman reasoning agi translators orcs waymo fp o2 sql turing fairly applause rag truthfully gpus dx ocr twilio cdn schema opensea ilan withholding google gemini alessio yc r1 suno l1 cursor notebooklm json unix famously artificial general intelligence vim debugging metamask brain trust vs code pii chief research officer xcode hacker news oauth rtc making ai caching distillation udp distill swiftui objective c dijkstra ews o3 websockets o1 vpc gbt webgl loras c sharp devday icds fort mason code interpreter cosine matt knight simon willison latent space kevin weil because yeah

Language Agents: From Reasoning to Acting

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Sep 27, 2024 89:44

OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event for everyone coming to town! Join us with demos and gossip!Also sign up for related events across San Francisco: the AI DevTools Night, the xAI open house, the Replicate art show, the DevDay Watch Party (for non-attendees), Hack Night with OpenAI at Cloudflare. For everyone else, join the Latent Space Discord for our online watch party and find fellow AI Engineers in your city.OpenAI's recent o1 release (and Reflection 70b debacle) has reignited broad interest in agentic general reasoning and tree search methods.While we have covered some of the self-taught reasoning literature on the Latent Space Paper Club, it is notable that the Eric Zelikman ended up at xAI, whereas OpenAI's hiring of Noam Brown and now Shunyu suggests more interest in tool-using chain of thought/tree of thought/generator-verifier architectures for Level 3 Agents.We were more than delighted to learn that Shunyu is a fellow Latent Space enjoyer, and invited him back (after his first appearance on our NeurIPS 2023 pod) for a look through his academic career with Harrison Chase (one year after his first LS show).ReAct: Synergizing Reasoning and Acting in Language Modelspaper linkFollowing seminal Chain of Thought papers from Wei et al and Kojima et al, and reflecting on lessons from building the WebShop human ecommerce trajectory benchmark, Shunyu's first big hit, the ReAct paper showed that using LLMs to “generate both reasoning traces and task-specific actions in an interleaved manner” achieved remarkably greater performance (less hallucination/error propagation, higher ALFWorld/WebShop benchmark success) than CoT alone. In even better news, ReAct scales fabulously with finetuning:As a member of the elite Princeton NLP group, Shunyu was also a coauthor of the Reflexion paper, which we discuss in this pod.Tree of Thoughtspaper link hereShunyu's next major improvement on the CoT literature was Tree of Thoughts:Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role…ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.The beauty of ToT is it doesnt require pretraining with exotic methods like backspace tokens or other MCTS architectures. You can listen to Shunyu explain ToT in his own words on our NeurIPS pod, but also the ineffable Yannic Kilcher:Other WorkWe don't have the space to summarize the rest of Shunyu's work, you can listen to our pod with him now, and recommend the CoALA paper and his initial hit webinar with Harrison, today's guest cohost:as well as Shunyu's PhD Defense Lecture:as well as Shunyu's latest lecture covering a Brief History of LLM Agents:As usual, we are live on YouTube! Show Notes* Harrison Chase* LangChain, LangSmith, LangGraph* Shunyu Yao* Alec Radford* ReAct Paper* Hotpot QA* Tau Bench* WebShop* SWE-Agent* SWE-Bench* Trees of Thought* CoALA Paper* Related Episodes* Our Thomas Scialom (Meta) episode* Shunyu on our NeurIPS 2023 Best Papers episode* Harrison on our LangChain episode* Mentions* Sierra* Voyager* Jason Wei* Tavily* SERP API* ExaTimestamps* [00:00:00] Opening Song by Suno* [00:03:00] Introductions* [00:06:16] The ReAct paper* [00:12:09] Early applications of ReAct in LangChain* [00:17:15] Discussion of the Reflection paper* [00:22:35] Tree of Thoughts paper and search algorithms in language models* [00:27:21] SWE-Agent and SWE-Bench for coding benchmarks* [00:39:21] CoALA: Cognitive Architectures for Language Agents* [00:45:24] Agent-Computer Interfaces (ACI) and tool design for agents* [00:49:24] Designing frameworks for agents vs humans* [00:53:52] UX design for AI applications and agents* [00:59:53] Data and model improvements for agent capabilities* [01:19:10] TauBench* [01:23:09] Promising areas for AITranscriptAlessio [00:00:01]: Hey, everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO of Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.Swyx [00:00:12]: Hey, and today we have a super special episode. I actually always wanted to take like a selfie and go like, you know, POV, you're about to revolutionize the world of agents because we have two of the most awesome hiring agents in the house. So first, we're going to welcome back Harrison Chase. Welcome. Excited to be here. What's new with you recently in sort of like the 10, 20 second recap?Harrison [00:00:34]: Linkchain, Linksmith, Lingraph, pushing on all of them. Lots of cool stuff related to a lot of the stuff that we're going to talk about today, probably.Swyx [00:00:42]: Yeah.Alessio [00:00:43]: We'll mention it in there. And the Celtics won the title.Swyx [00:00:45]: And the Celtics won the title. You got that going on for you. I don't know. Is that like floorball? Handball? Baseball? Basketball.Alessio [00:00:52]: Basketball, basketball.Harrison [00:00:53]: Patriots aren't looking good though, so that's...Swyx [00:00:56]: And then Xun Yu, you've also been on the pod, but only in like a sort of oral paper presentation capacity. But welcome officially to the LinkedSpace pod.Shunyu [00:01:03]: Yeah, I've been a huge fan. So thanks for the invitation. Thanks.Swyx [00:01:07]: Well, it's an honor to have you on. You're one of like, you're maybe the first PhD thesis defense I've ever watched in like this AI world, because most people just publish single papers, but every paper of yours is a banger. So congrats.Shunyu [00:01:22]: Thanks.Swyx [00:01:24]: Yeah, maybe we'll just kick it off with, you know, what was your journey into using language models for agents? I like that your thesis advisor, I didn't catch his name, but he was like, you know... Karthik. Yeah. It's like, this guy just wanted to use language models and it was such a controversial pick at the time. Right.Shunyu [00:01:39]: The full story is that in undergrad, I did some computer vision research and that's how I got into AI. But at the time, I feel like, you know, you're just composing all the GAN or 3D perception or whatever together and it's not exciting anymore. And one day I just see this transformer paper and that's really cool. But I really got into language model only when I entered my PhD and met my advisor Karthik. So he was actually the second author of GPT-1 when he was like a visiting scientist at OpenAI. With Alec Redford?Swyx [00:02:10]: Yes.Shunyu [00:02:11]: Wow. That's what he told me. It's like back in OpenAI, they did this GPT-1 together and Ilya just said, Karthik, you should stay because we just solved the language. But apparently Karthik is not fully convinced. So he went to Princeton, started his professorship and I'm really grateful. So he accepted me as a student, even though I have no prior knowledge in NLP. And you know, we just met for the first time and he's like, you know, what do you want to do? And I'm like, you know, you have done those test game scenes. That's really cool. I wonder if we can just redo them with language models. And that's how the whole journey began. Awesome.Alessio [00:02:46]: So GPT-2 was out at the time? Yes, that was 2019.Shunyu [00:02:48]: Yeah.Alessio [00:02:49]: Way too dangerous to release. And then I guess the first work of yours that I came across was React, which was a big part of your defense. But also Harrison, when you came on The Pockets last year, you said that was one of the first papers that you saw when you were getting inspired for BlankChain. So maybe give a recap of why you thought it was cool, because you were already working in AI and machine learning. And then, yeah, you can kind of like intro the paper formally. What was that interesting to you specifically?Harrison [00:03:16]: Yeah, I mean, I think the interesting part was using these language models to interact with the outside world in some form. And I think in the paper, you mostly deal with Wikipedia. And I think there's some other data sets as well. But the outside world is the outside world. And so interacting with things that weren't present in the LLM and APIs and calling into them and thinking about the React reasoning and acting and kind of like combining those together and getting better results. I'd been playing around with LLMs, been talking with people who were playing around with LLMs. People were trying to get LLMs to call into APIs, do things, and it was always, how can they do it more reliably and better? And so this paper was basically a step in that direction. And I think really interesting and also really general as well. Like I think that's part of the appeal is just how general and simple in a good way, I think the idea was. So that it was really appealing for all those reasons.Shunyu [00:04:07]: Simple is always good. Yeah.Alessio [00:04:09]: Do you have a favorite part? Because I have one favorite part from your PhD defense, which I didn't understand when I read the paper, but you said something along the lines, React doesn't change the outside or the environment, but it does change the insight through the context, putting more things in the context. You're not actually changing any of the tools around you to work for you, but you're changing how the model thinks. And I think that was like a very profound thing when I, not that I've been using these tools for like 18 months. I'm like, I understand what you meant, but like to say that at the time you did the PhD defense was not trivial. Yeah.Shunyu [00:04:41]: Another way to put it is like thinking can be an extra tool that's useful.Alessio [00:04:47]: Makes sense. Checks out.Swyx [00:04:49]: Who would have thought? I think it's also more controversial within his world because everyone was trying to use RL for agents. And this is like the first kind of zero gradient type approach. Yeah.Shunyu [00:05:01]: I think the bigger kind of historical context is that we have this two big branches of AI. So if you think about RL, right, that's pretty much the equivalent of agent at a time. And it's like agent is equivalent to reinforcement learning and reinforcement learning is equivalent to whatever game environment they're using, right? Atari game or go or whatever. So you have like a pretty much, you know, you have a biased kind of like set of methodologies in terms of reinforcement learning and represents agents. On the other hand, I think NLP is like a historical kind of subject. It's not really into agents, right? It's more about reasoning. It's more about solving those concrete tasks. And if you look at SEL, right, like each task has its own track, right? Summarization has a track, question answering has a track. So I think really it's about rethinking agents in terms of what could be the new environments that we came to have is not just Atari games or whatever video games, but also those text games or language games. And also thinking about, could there be like a more general kind of methodology beyond just designing specific pipelines for each NLP task? That's like the bigger kind of context, I would say.Alessio [00:06:14]: Is there an inspiration spark moment that you remember or how did you come to this? We had Trida on the podcast and he mentioned he was really inspired working with like systems people to think about Flash Attention. What was your inspiration journey?Shunyu [00:06:27]: So actually before React, I spent the first two years of my PhD focusing on text-based games, or in other words, text adventure games. It's a very kind of small kind of research area and quite ad hoc, I would say. And there are like, I don't know, like 10 people working on that at the time. And have you guys heard of Zork 1, for example? So basically the idea is you have this game and you have text observations, like you see a monster, you see a dragon.Swyx [00:06:57]: You're eaten by a grue.Shunyu [00:06:58]: Yeah, you're eaten by a grue. And you have actions like kill the grue with a sword or whatever. And that's like a very typical setup of a text game. So I think one day after I've seen all the GPT-3 stuff, I just think about, you know, how can I solve the game? Like why those AI, you know, machine learning methods are pretty stupid, but we are pretty good at solving the game relatively, right? So for the context, the predominant method to solve this text game is obviously reinforcement learning. And the idea is you just try out an arrow in those games for like millions of steps and you kind of just overfit to the game. But there's no language understanding at all. And I'm like, why can't I solve the game better? And it's kind of like, because we think about the game, right? Like when we see this very complex text observation, like you see a grue and you might see a sword, you know, in the right of the room and you have to go through the wooden door to go to that room. You will think, you know, oh, I have to kill the monster and to kill that monster, I have to get the sword, I have to get the sword, I have to go, right? And this kind of thinking actually helps us kind of throw shots off the game. And it's like, why don't we also enable the text agents to think? And that's kind of the prototype of React. And I think that's actually very interesting because the prototype, I think, was around November of 2021. So that's even before like chain of thought or whatever came up. So we did a bunch of experiments in the text game, but it was not really working that well. Like those text games are just too hard. I think today it's still very hard. Like if you use GPD 4 to solve it, it's still very hard. So the change came when I started the internship in Google. And apparently Google care less about text game, they care more about what's more practical. So pretty much I just reapplied the idea, but to more practical kind of environments like Wikipedia or simpler text games like Alphard, and it just worked. It's kind of like you first have the idea and then you try to find the domains and the problems to demonstrate the idea, which is, I would say, different from most of the AI research, but it kind of worked out for me in that case.Swyx [00:09:09]: For Harrison, when you were implementing React, what were people applying React to in the early days?Harrison [00:09:14]: I think the first demo we did probably had like a calculator tool and a search tool. So like general things, we tried to make it pretty easy to write your own tools and plug in your own things. And so this is one of the things that we've seen in LangChain is people who build their own applications generally write their own tools. Like there are a few common ones. I'd say like the three common ones might be like a browser, a search tool, and a code interpreter. But then other than that-Swyx [00:09:37]: The LMS. Yep.Harrison [00:09:39]: Yeah, exactly. It matches up very nice with that. And we actually just redid like our integrations docs page, and if you go to the tool section, they like highlight those three, and then there's a bunch of like other ones. And there's such a long tail of other ones. But in practice, like when people go to production, they generally have their own tools or maybe one of those three, maybe some other ones, but like very, very few other ones. So yeah, I think the first demos was a search and a calculator one. And there's- What's the data set?Shunyu [00:10:04]: Hotpot QA.Harrison [00:10:05]: Yeah. Oh, so there's that one. And then there's like the celebrity one by the same author, I think.Swyx [00:10:09]: Olivier Wilde's boyfriend squared. Yeah. 0.23. Yeah. Right, right, right.Harrison [00:10:16]: I'm forgetting the name of the author, but there's-Swyx [00:10:17]: I was like, we're going to over-optimize for Olivier Wilde's boyfriend, and it's going to change next year or something.Harrison [00:10:21]: There's a few data sets kind of like in that vein that require multi-step kind of like reasoning and thinking. So one of the questions I actually had for you in this vein, like the React paper, there's a few things in there, or at least when I think of that, there's a few things that I think of. There's kind of like the specific prompting strategy. Then there's like this general idea of kind of like thinking and then taking an action. And then there's just even more general idea of just like taking actions in a loop. Today, like obviously language models have changed a lot. We have tool calling. The specific prompting strategy probably isn't used super heavily anymore. Would you say that like the concept of React is still used though? Or like do you think that tool calling and running tool calling in a loop, is that ReactSwyx [00:11:02]: in your mind?Shunyu [00:11:03]: I would say like it's like more implicitly used than explicitly used. To be fair, I think the contribution of React is actually twofold. So first is this idea of, you know, we should be able to use calls in a very general way. Like there should be a single kind of general method to handle interaction with various environments. I think React is the first paper to demonstrate the idea. But then I think later there are two form or whatever, and this becomes like a trivial idea. But I think at the time, that's like a pretty non-trivial thing. And I think the second contribution is this idea of what people call like inner monologue or thinking or reasoning or whatever, to be paired with tool use. I think that's still non-trivial because if you look at the default function calling or whatever, like there's no inner monologue. And in practice, that actually is important, especially if the tool that you use is pretty different from the training distribution of the language model. I think those are the two main things that are kind of inherited.Harrison [00:12:10]: On that note, I think OpenAI even recommended when you're doing tool calling, it's sometimes helpful to put a thought field in the tool, along with all the actual acquired arguments,Swyx [00:12:19]: and then have that one first.Harrison [00:12:20]: So it fills out that first, and they've shown that that's yielded better results. The reason I ask is just like this same concept is still alive, and I don't know whether to call it a React agent or not. I don't know what to call it. I think of it as React, like it's the same ideas that were in the paper, but it's obviously a very different implementation at this point in time. And so I just don't know what to call it.Shunyu [00:12:40]: I feel like people will sometimes think more in terms of different tools, right? Because if you think about a web agent versus, you know, like a function calling agent, calling a Python API, you would think of them as very different. But in some sense, the methodology is the same. It depends on how you view them, right? I think people will tend to think more in terms of the environment and the tools rather than the methodology. Or, in other words, I think the methodology is kind of trivial and simple, so people will try to focus more on the different tools. But I think it's good to have a single underlying principle of those things.Alessio [00:13:17]: How do you see the surface of React getting molded into the model? So a function calling is a good example of like, now the model does it. What about the thinking? Now most models that you use kind of do chain of thought on their own, they kind of produce steps. Do you think that more and more of this logic will be in the model? Or do you think the context window will still be the main driver of reasoning and thinking?Shunyu [00:13:39]: I think it's already default, right? You do some chain of thought and you do some tool call, the cost of adding the chain of thought is kind of relatively low compared to other things. So it's not hurting to do that. And I think it's already kind of common practice, I would say.Swyx [00:13:56]: This is a good place to bring in either Tree of Thought or Reflection, your pick.Shunyu [00:14:01]: Maybe Reflection, to respect the time order, I would say.Swyx [00:14:05]: Any backstory as well, like the people involved with NOAA and the Princeton group. We talked about this offline, but people don't understand how these research pieces come together and this ideation.Shunyu [00:14:15]: I think Reflection is mostly NOAA's work, I'm more like advising kind of role. The story is, I don't remember the time, but one day we just see this pre-print that's like Reflection and Autonomous Agent with memory or whatever. And it's kind of like an extension to React, which uses this self-reflection. I'm like, oh, somehow you've become very popular. And NOAA reached out to me, it's like, do you want to collaborate on this and make this from an archive pre-print to something more solid, like a conference submission? I'm like, sure. We started collaborating and we remain good friends today. And I think another interesting backstory is NOAA was contacted by OpenAI at the time. It's like, this is pretty cool, do you want to just work at OpenAI? And I think Sierra also reached out at the same time. It's like, this is pretty cool, do you want to work at Sierra? And I think NOAA chose Sierra, but it's pretty cool because he was still like a second year undergrad and he's a very smart kid.Swyx [00:15:16]: Based on one paper. Oh my god.Shunyu [00:15:19]: He's done some other research based on programming language or chemistry or whatever, but I think that's the paper that got the attention of OpenAI and Sierra.Swyx [00:15:28]: For those who haven't gone too deep on it, the way that you present the inside of React, can you do that also for reflection? Yeah.Shunyu [00:15:35]: I think one way to think of reflection is that the traditional idea of reinforcement learning is you have a scalar reward and then you somehow back-propagate the signal of the scalar reward to the rest of your neural network through whatever algorithm, like policy grading or A2C or whatever. And if you think about the real life, most of the reward signal is not scalar. It's like your boss told you, you should have done a better job in this, but you could jump on that or whatever. It's not like a scalar reward, like 29 or something. I think in general, humans deal more with long scalar reward, or you can say language feedback. And the way that they deal with language feedback also has this back-propagation process, right? Because you start from this, you did a good job on job B, and then you reflect what could have been done differently to change to make it better. And you kind of change your prompt, right? Basically, you change your prompt on how to do job A and how to do job B, and then you do the whole thing again. So it's really like a pipeline of language where in self-graded descent, you have something like text reasoning to replace those gradient descent algorithms. I think that's one way to think of reflection.Harrison [00:16:47]: One question I have about reflection is how general do you think the algorithm there is? And so for context, I think at LangChain and at other places as well, we found it pretty easy to implement React in a standard way. You plug in any tools and it kind of works off the shelf, can get it up and running. I don't think we have an off-the-shelf kind of implementation of reflection and kind of the general sense. I think the concepts, absolutely, we see used in different kind of specific cognitive architectures, but I don't think we have one that comes off the shelf. I don't think any of the other frameworks have one that comes off the shelf. And I'm curious whether that's because it's not general enough or it's complex as well, because it also requires running it more times.Swyx [00:17:28]: Maybe that's not feasible.Harrison [00:17:30]: I'm curious how you think about the generality, complexity. Should we have one that comes off the shelf?Shunyu [00:17:36]: I think the algorithm is general in the sense that it's just as general as other algorithms, if you think about policy grading or whatever, but it's not applicable to all tasks, just like other algorithms. So you can argue PPO is also general, but it works better for those set of tasks, but not on those set of tasks. I think it's the same situation for reflection. And I think a key bottleneck is the evaluator, right? Basically, you need to have a good sense of the signal. So for example, if you are trying to do a very hard reasoning task, say mathematics, for example, and you don't have any tools, you're operating in this chain of thought setup, then reflection will be pretty hard because in order to reflect upon your thoughts, you have to have a very good evaluator to judge whether your thought is good or not. But that might be as hard as solving the problem itself or even harder. The principle of self-reflection is probably more applicable if you have a good evaluator, for example, in the case of coding. If you have those arrows, then you can just reflect on that and how to solve the bug andSwyx [00:18:37]: stuff.Shunyu [00:18:38]: So I think another criteria is that it depends on the application, right? If you have this latency or whatever need for an actual application with an end-user, the end-user wouldn't let you do two hours of tree-of-thought or reflection, right? You need something as soon as possible. So in that case, maybe this is better to be used as a training time technique, right? You do those reflection or tree-of-thought or whatever, you get a lot of data, and then you try to use the data to train your model better. And then in test time, you still use something as simple as React, but that's already improved.Alessio [00:19:11]: And if you think of the Voyager paper as a way to store skills and then reuse them, how would you compare this reflective memory and at what point it's just ragging on the memory versus you want to start to fine-tune some of them or what's the next step once you get a very long reflective corpus? Yeah.Shunyu [00:19:30]: So I think there are two questions here. The first question is, what type of information or memory are you considering, right? Is it like semantic memory that stores knowledge about the word, or is it the episodic memory that stores trajectories or behaviors, or is it more of a procedural memory like in Voyager's case, like skills or code snippets that you can use to do actions, right?Swyx [00:19:54]: That's one dimension.Shunyu [00:19:55]: And the second dimension is obviously how you use the memory, either retrieving from it, using it in the context, or fine-tuning it. I think the Cognitive Architecture for Language Agents paper has a good categorization of all the different combinations. And of course, which way you use depends on the concrete application and the concrete need and the concrete task. But I think in general, it's good to think of those systematic dimensions and all the possible options there.Swyx [00:20:25]: Harrison also has in LangMEM, I think you did a presentation in my meetup, and I think you've done it at a couple other venues as well. User state, semantic memory, and append-only state, I think kind of maps to what you just said.Shunyu [00:20:38]: What is LangMEM? Can I give it like a quick...Harrison [00:20:40]: One of the modules of LangChain for a long time has been something around memory. And I think we're still obviously figuring out what that means, as is everyone kind of in the space. But one of the experiments that we did, and one of the proof of concepts that we did was, technically what it was is you would basically create threads, you'd push messages to those threads in the background, we process the data in a few ways. One, we put it into some semantic store, that's the semantic memory. And then two, we do some extraction and reasoning over the memories to extract. And we let the user define this, but extract key facts or anything that's of interest to the user. Those aren't exactly trajectories, they're maybe more closer to the procedural memory. Is that how you'd think about it or classify it?Shunyu [00:21:22]: Is it like about knowledge about the word, or is it more like how to do something?Swyx [00:21:27]: It's reflections, basically.Harrison [00:21:28]: So in generative worlds.Shunyu [00:21:30]: Generative agents.Swyx [00:21:31]: The Smallville. Yeah, the Smallville one.Harrison [00:21:33]: So the way that they had their memory there was they had the sequence of events, and that's kind of like the raw events that happened. But then every N events, they'd run some synthesis over those events for the LLM to insert its own memory, basically. It's that type of memory.Swyx [00:21:49]: I don't know how that would be classified.Shunyu [00:21:50]: I think of that as more of the semantic memory, but to be fair, I think it's just one way to think of that. But whether it's semantic memory or procedural memory or whatever memory, that's like an abstraction layer. But in terms of implementation, you can choose whatever implementation for whatever memory. So they're totally kind of orthogonal. I think it's more of a good way to think of the things, because from the history of cognitive science and cognitive architecture and how people study even neuroscience, that's the way people think of how the human brain organizes memory. And I think it's more useful as a way to think of things. But it's not like for semantic memory, you have to do this kind of way to retrieve or fine-tune, and for procedural memory, you have to do that. I think those are totally orthogonal kind of dimensions.Harrison [00:22:34]: How much background do you have in cognitive sciences, and how much do you model some of your thoughts on?Shunyu [00:22:40]: That's a great question, actually. I think one of the undergrad influences for my follow-up research is I was doing an internship at MIT's Computational Cognitive Science Lab with Josh Tannenbaum, and he's a very famous cognitive scientist. And I think a lot of his ideas still influence me today, like thinking of things in computational terms and getting interested in language and a lot of stuff, or even developing psychology kind of stuff. So I think it still influences me today.Swyx [00:23:14]: As a developer that tried out LangMEM, the way I view it is just it's a materialized view of a stream of logs. And if anything, that's just useful for context compression. I don't have to use the full context to run it over everything. But also it's kind of debuggable. If it's wrong, I can show it to the user, the user can manually fix it, and I can carry on. That's a really good analogy. I like that. I'm going to steal that. Sure. Please, please. You know I'm bullish on memory databases. I guess, Tree of Thoughts? Yeah, Tree of Thoughts.Shunyu [00:23:39]: I feel like I'm relieving the defense in like a podcast format. Yeah, no.Alessio [00:23:45]: I mean, you had a banger. Well, this is the one where you're already successful and we just highlight the glory. It was really good. You mentioned that since thinking is kind of like taking an action, you can use action searching algorithms to think of thinking. So just like you will use Tree Search to find the next thing. And the idea behind Tree of Thought is that you generate all these possible outcomes and then find the best tree to get to the end. Maybe back to the latency question, you can't really do that if you have to respond in real time. So what are maybe some of the most helpful use cases for things like this? Where have you seen people adopt it where the high latency is actually worth the wait?Shunyu [00:24:21]: For things that you don't care about latency, obviously. For example, if you're trying to do math, if you're just trying to come up with a proof. But I feel like one type of task is more about searching for a solution. You can try a hundred times, but if you find one solution, that's good. For example, if you're finding a math proof or if you're finding a good code to solve a problem or whatever, I think another type of task is more like reacting. For example, if you're doing customer service, you're like a web agent booking a ticket for an end user. Those are more reactive kind of tasks, or more real-time tasks. You have to do things fast. They might be easy, but you have to do it reliably. And you care more about can you solve 99% of the time out of a hundred. But for the type of search type of tasks, then you care more about can I find one solution out of a hundred. So it's kind of symmetric and different.Alessio [00:25:11]: Do you have any data or intuition from your user base? What's the split of these type of use cases? How many people are doing more reactive things and how many people are experimenting with deep, long search?Harrison [00:25:23]: I would say React's probably the most popular. I think there's aspects of reflection that get used. Tree of thought, probably the least so. There's a great tweet from Jason Wei, I think you're now a colleague, and he was talking about prompting strategies and how he thinks about them. And I think the four things that he had was, one, how easy is it to implement? How much compute does it take? How many tasks does it solve? And how much does it improve on those tasks? And I'd add a fifth, which is how likely is it to be relevant when the next generation of models come out? And I think if you look at those axes and then you look at React, reflection, tree of thought, it tracks that the ones that score better are used more. React is pretty easy to implement. Tree of thought's pretty hard to implement. The amount of compute, yeah, a lot more for tree of thought. The tasks and how much it improves, I don't have amazing visibility there. But I think if we're comparing React versus tree of thought, React just dominates the first two axes so much that my question around that was going to be like, how do you think about these prompting strategies, cognitive architectures, whatever you want to call them? When you're thinking of them, what are the axes that you're judging them on in your head when you're thinking whether it's a good one or a less good one?Swyx [00:26:38]: Right.Shunyu [00:26:39]: Right. I think there is a difference between a prompting method versus research, in the sense that for research, you don't really even care about does it actually work on practical tasks or does it help? Whatever. I think it's more about the idea or the principle, right? What is the direction that you're unblocking and whatever. And I think for an actual prompting method to solve a concrete problem, I would say simplicity is very important because the simpler it is, the less decision you have to make about it. And it's easier to design. It's easier to propagate. And it's easier to do stuff. So always try to be as simple as possible. And I think latency obviously is important. If you can do things fast and you don't want to do things slow. And I think in terms of the actual prompting method to use for a particular problem, I think we should all be in the minimalist kind of camp, right? You should try the minimum thing and see if it works. And if it doesn't work and there's absolute reason to add something, then you add something, right? If there's absolute reason that you need some tool, then you should add the tool thing. If there's absolute reason to add reflection or whatever, you should add that. Otherwise, if a chain of thought can already solve something, then you don't even need to use any of that.Harrison [00:27:57]: Yeah. Or if it's just better prompting can solve it. Like, you know, you could add a reflection step or you could make your instructions a little bit clearer.Swyx [00:28:03]: And it's a lot easier to do that.Shunyu [00:28:04]: I think another interesting thing is like, I personally have never done those kind of like weird tricks. I think all the prompts that I write are kind of like just talking to a human, right? It's like, I don't know. I never say something like, your grandma is dying and you have to solve it. I mean, those are cool, but I feel like we should all try to solve things in a very intuitive way. Just like talking to your co-worker. That should work 99% of the time. That's my personal take.Swyx [00:28:29]: The problem with how language models, at least in the GPC 3 era, was that they over-optimized to some sets of tokens in sequence. So like reading the Kojima et al. paper that was listing step-by-step, like he tried a bunch of them and they had wildly different results. It should not be the case, but it is the case. And hopefully we're getting better there.Shunyu [00:28:51]: Yeah. I think it's also like a timing thing in the sense that if you think about this whole line of language model, right? Like at the time it was just like a text generator. We don't have any idea how it's going to be used, right? And obviously at the time you will find all kinds of weird issues because it's not trained to do any of that, right? But then I think we have this loop where once we realize chain of thought is important or agent is important or tool using is important, what we see is today's language models are heavily optimized towards those things. So I think in some sense they become more reliable and robust over those use cases. And you don't need to do as much prompt engineering tricks anymore to solve those things. I feel like in some sense, I feel like prompt engineering even is like a slightly negative word at the time because it refers to all those kind of weird tricks that you have to apply. But I think we don't have to do that anymore. Like given today's progress, you should just be able to talk to like a coworker. And if you're clear and concrete and being reasonable, then it should do reasonable things for you.Swyx [00:29:51]: Yeah. The way I put this is you should not be a prompt engineer because it is the goal of the big labs to put you out of a job.Shunyu [00:29:58]: You should just be a good communicator. Like if you're a good communicator to humans, you should be a good communicator to languageSwyx [00:30:02]: models.Harrison [00:30:03]: That's the key though, because oftentimes people aren't good communicators to these language models and that is a very important skill and that's still messing around with the prompt. And so it depends what you're talking about when you're saying prompt engineer.Shunyu [00:30:14]: But do you think it's like very correlated with like, are they like a good communicator to humans? You know, it's like.Harrison [00:30:20]: It may be, but I also think I would say on average, people are probably worse at communicating with language models than to humans right now, at least, because I think we're still figuring out how to do it. You kind of expect it to be magical and there's probably some correlation, but I'd say there's also just like, people are worse at it right now than talking to humans.Shunyu [00:30:36]: We should make it like a, you know, like an elementary school class or whatever, how toSwyx [00:30:41]: talk to language models. Yeah. I don't know. Very pro that. Yeah. Before we leave the topic of trees and searching, not specific about QSTAR, but there's a lot of questions about MCTS and this combination of tree search and language models. And I just had to get in a question there about how seriously should people take this?Shunyu [00:30:59]: Again, I think it depends on the tasks, right? So MCTS was magical for Go, but it's probably not as magical for robotics, right? So I think right now the problem is not even that we don't have good methodologies, it's more about we don't have good tasks. It's also very interesting, right? Because if you look at my citation, it's like, obviously the most cited are React, Refraction and Tree of Thought. Those are methodologies. But I think like equally important, if not more important line of my work is like benchmarks and environments, right? Like WebShop or SuiteVenture or whatever. And I think in general, what people do in academia that I think is not good is they choose a very simple task, like Alford, and then they apply overly complex methods to show they improve 2%. I think you should probably match the level of complexity of your task and your method. I feel like where tasks are kind of far behind the method in some sense, right? Because we have some good test-time approaches, like whatever, React or Refraction or Tree of Thought, or like there are many, many more complicated test-time methods afterwards. But on the benchmark side, we have made a lot of good progress this year, last year. But I think we still need more progress towards that, like better coding benchmark, better web agent benchmark, better agent benchmark, not even for web or code. I think in general, we need to catch up with tasks.Harrison [00:32:27]: What are the biggest reasons in your mind why it lags behind?Shunyu [00:32:31]: I think incentive is one big reason. Like if you see, you know, all the master paper are cited like a hundred times more than the task paper. And also making a good benchmark is actually quite hard. It's almost like a different set of skills in some sense, right? I feel like if you want to build a good benchmark, you need to be like a good kind of product manager kind of mindset, right? You need to think about why people should use your benchmark, why it's challenging, why it's useful. If you think about like a PhD going into like a school, right? The prior skill that expected to have is more about, you know, can they code this method and can they just run experiments and can solve that? I think building a benchmark is not the typical prior skill that we have, but I think things are getting better. I think more and more people are starting to build benchmarks and people are saying that it's like a way to get more impact in some sense, right? Because like if you have a really good benchmark, a lot of people are going to use it. But if you have a super complicated test time method, like it's very hard for people to use it.Harrison [00:33:35]: Are evaluation metrics also part of the reason? Like for some of these tasks that we might want to ask these agents or language models to do, is it hard to evaluate them? And so it's hard to get an automated benchmark. Obviously with SweetBench you can, and with coding, it's easier, but.Shunyu [00:33:50]: I think that's part of the skillset thing that I mentioned, because I feel like it's like a product manager because there are many dimensions and you need to strike a balance and it's really hard, right? If you want to make sense, very easy to autogradable, like automatically gradable, like either to grade or either to evaluate, then you might lose some of the realness or practicality. Or like it might be practical, but it might not be as scalable, right? For example, if you think about text game, human have pre-annotated all the rewards and all the language are real. So it's pretty good on autogradable dimension and the practical dimension. If you think about, you know, practical, like actual English being practical, but it's not scalable, right? It takes like a year for experts to build that game. So it's not really that scalable. And I think part of the reason that SweetBench is so popular now is it kind of hits the balance between these three dimensions, right? Easy to evaluate and being actually practical and being scalable. Like if I were to criticize upon some of my prior work, I think webshop, like it's my initial attempt to get into benchmark world and I'm trying to do a good job striking the balance. But obviously we make it all gradable and it's really scalable, but then I think the practicality is not as high as actually just using GitHub issues, right? Because you're just creating those like synthetic tasks.Harrison [00:35:13]: Are there other areas besides coding that jump to mind as being really good for being autogradable?Shunyu [00:35:20]: Maybe mathematics.Swyx [00:35:21]: Classic. Yeah. Do you have thoughts on alpha proof, the new DeepMind paper? I think it's pretty cool.Shunyu [00:35:29]: I think it's more of a, you know, it's more of like a confidence boost or like sometimes, you know, the work is not even about, you know, the technical details or the methodology that it chooses or the concrete results. I think it's more about a signal, right?Swyx [00:35:47]: Yeah. Existence proof. Yeah.Shunyu [00:35:50]: Yeah. It can be done. This direction is exciting. It kind of encourages people to work more towards that direction. I think it's more like a boost of confidence, I would say.Swyx [00:35:59]: Yeah. So we're going to focus more on agents now and, you know, all of us have a special interest in coding agents. I would consider Devin to be the sort of biggest launch of the year as far as AI startups go. And you guys in the Princeton group worked on Suiagents alongside of Suibench. Tell us the story about Suiagent. Sure.Shunyu [00:36:21]: I think it's kind of like a triology, it's actually a series of three works now. So actually the first work is called Intercode, but it's not as famous, I know. And the second work is called Suibench and the third work is called Suiagent. And I'm just really confused why nobody is working on coding. You know, it's like a year ago, but I mean, not everybody's working on coding, obviously, but a year ago, like literally nobody was working on coding. I was really confused. And the people that were working on coding are, you know, trying to solve human evil in like a sick-to-sick way. There's no agent, there's no chain of thought, there's no anything, they're just, you know, fine tuning the model and improve some points and whatever, like, I was really confused because obviously coding is the best application for agents because it's autogradable, it's super important, you can make everything like API or code action, right? So I was confused and I collaborated with some of the students in Princeton and we have this work called Intercode and the idea is, first, if you care about coding, then you should solve coding in an interactive way, meaning more like a Jupyter Notebook kind of way than just writing a program and seeing if it fails or succeeds and stop, right? You should solve it in an interactive way because that's exactly how humans solve it, right? You don't have to, you know, write a program like next token, next token, next token and stop and never do any edits and you cannot really use any terminal or whatever tool. It doesn't make sense, right? And that's the way people are solving coding at the time, basically like sampling a program from a language model without chain of thought, without tool call, without refactoring, without anything. So the first point is we should solve coding in a very interactive way and that's a very general principle that applies for various coding benchmarks. And also, I think you can make a lot of the agent task kind of like interactive coding. If you have Python and you can call any package, then you can literally also browse internet or do whatever you want, like control a robot or whatever. So that seems to be a very general paradigm. But obviously I think a bottleneck is at the time we're still doing, you know, very simple tasks like human eval or whatever coding benchmark people proposed. They were super hard in 2021, like 20%, but they're like 95% already in 2023. So obviously the next step is we need a better benchmark. And Carlos and John, which are the first authors of Swaybench, I think they come up with this great idea that we should just script GitHub and solve whatever human engineers are solving. And I think it's actually pretty easy to come up with the idea. And I think in the first week, they already made a lot of progress. They script the GitHub and they make all the same, but then there's a lot of painful info work and whatever, you know. I think the idea is super easy, but the engineering is super hard. And I feel like that's a very typical signal of a good work in the AI era now.Swyx [00:39:17]: I think also, I think the filtering was challenging, because if you look at open source PRs, a lot of them are just like, you know, fixing typos. I think it's challenging.Shunyu [00:39:27]: And to be honest, we didn't do a perfect job at the time. So if you look at the recent blog post with OpenAI, we improved the filtering so that it's more solvable.Swyx [00:39:36]: I think OpenAI was just like, look, this is a thing now. We have to fix this. These students just rushed it.Shunyu [00:39:45]: It's a good convergence of interests for me.Alessio [00:39:48]: Was that tied to you joining OpenAI? Or was that just unrelated?Shunyu [00:39:52]: It's a coincidence for me, but it's a good coincidence.Swyx [00:39:55]: There is a history of anytime a big lab adopts a benchmark, they fix it. Otherwise, it's a broken benchmark.Shunyu [00:40:03]: So naturally, once we propose swimmage, the next step is to solve it. But I think the typical way you solve something now is you collect some training samples, or you design some complicated agent method, and then you try to solve it. Either super complicated prompt, or you build a better model with more training data. But I think at the time, we realized that even before those things, there's a fundamental problem with the interface or the tool that you're supposed to use. Because that's like an ignored problem in some sense. What your tool is, or how that matters for your task. So what we found concretely is that if you just use the text terminal off the shelf as a tool for those agents, there's a lot of problems. For example, if you edit something, there's no feedback. So you don't know whether your edit is good or not. That makes the agent very confused and makes a lot of mistakes. There are a lot of small problems, you would say. Well, you can try to do prompt engineering and improve that, but it turns out to be actually very hard. We realized that the interface design is actually a very omitted part of agent design. So we did this switch agent work. And the key idea is just, even before you talk about what the agent is, you should talk about what the environment is. You should make sure that the environment is actually friendly to whatever agent you're trying to apply. That's the same idea for humans. Text terminal is good for some tasks, like git, pool, or whatever. But it's not good if you want to look at browser and whatever. Also, browser is a good tool for some tasks, but it's not a good tool for other tasks. We need to talk about how design interface, in some sense, where we should treat agents as our customers. It's like when we treat humans as a customer, we design human computer interfaces. We design those beautiful desktops or browsers or whatever, so that it's very intuitive and easy for humans to use. And this whole great subject of HCI is all about that. I think now the research idea of switch agent is just, we should treat agents as our customers. And we should do like, you know… AICI.Swyx [00:42:16]: AICI, exactly.Harrison [00:42:18]: So what are the tools that a suite agent should have, or a coding agent in general should have?Shunyu [00:42:24]: For suite agent, it's like a modified text terminal, which kind of adapts to a lot of the patterns of language models to make it easier for language models to use. For example, now for edit, instead of having no feedback, it will actually have a feedback of, you know, actually here you introduced like a syntax error, and you should probably want to fix that, and there's an ended error there. And that makes it super easy for the model to actually do that. And there's other small things, like how exactly you write arguments, right? Like, do you want to write like a multi-line edit, or do you want to write a single line edit? I think it's more interesting to think about the way of the development process of an ACI rather than the actual ACI for like a concrete application. Because I think the general paradigm is very similar to HCI and psychology, right? Basically, for how people develop HCIs, they do behavior experiments on humans, right? I do every test, right? Like, which interface is actually better? And I do those behavior experiments, kind of like psychology experiments to humans, and I change things. And I think what's really interesting for me, for this three-agent paper, is we can probably do the same thing for agents, right? We can do every test for those agents and do behavior tests. And through the process, we not only invent better interfaces for those agents, that's the practical value, but we also better understand agents. Just like when we do those A-B tests, we do those HCI, we better understand humans. Doing those ACI experiments, we actually better understand agents. And that's pretty cool.Harrison [00:43:51]: Besides that A-B testing, what are other processes that people can use to think about this in a good way?Swyx [00:43:57]: That's a great question.Shunyu [00:43:58]: And I think three-agent is an initial work. And what we do is the kind of the naive approach, right? You just try some interface, and you see what's going wrong, and then you try to fix that. We do this kind of iterative fixing. But I think what's really interesting is there'll be a lot of future directions that's very promising if we can apply some of the HCI principles more systematically into the interface design. I think that would be a very cool interdisciplinary research opportunity.Harrison [00:44:26]: You talked a lot about agent-computer interfaces and interactions. What about human-to-agent UX patterns? Curious for any thoughts there that you might have.Swyx [00:44:38]: That's a great question.Shunyu [00:44:39]: And in some sense, I feel like prompt engineering is about human-to-agent interface. But I think there can be a lot of interesting research done about... So prompting is about how humans can better communicate with the agent. But I think there could be interesting research on how agents can better communicate with humans, right? When to ask questions, how to ask questions, what's the frequency of asking questions. And I think those kinds of stuff could be very cool research.Harrison [00:45:07]: Yeah, I think some of the most interesting stuff that I saw here was also related to coding with Devin from Cognition. And they had the three or four different panels where you had the chat, the browser, the terminal, and I guess the code editor as well.Swyx [00:45:19]: There's more now.Harrison [00:45:19]: There's more. Okay, I'm not up to date. Yeah, I think they also did a good job on ACI.Swyx [00:45:25]: I think that's the main learning I have from Devin. They cracked that. Actually, there was no foundational planning breakthrough. The planner is actually pretty simple, but ACI that they broke through on.Shunyu [00:45:35]: I think making the tool good and reliable is probably like 90% of the whole agent. Once the tool is actually good, then the agent design can be much, much simpler. On the other hand, if the tool is bad, then no matter how much you put into the agent design, planning or search or whatever, it's still going to be trash.Harrison [00:45:53]: Yeah, I'd argue the same. Same with like context and instructions. Like, yeah, go hand in hand.Alessio [00:46:00]: On the tool, how do you think about the tension of like, for both of you, I mean, you're building a library, so even more for you. The tension between making now a language or a library that is like easy for the agent to grasp and write versus one that is easy for like the human to grasp and write. Because, you know, the trend is like more and more code gets written by the agent. So why wouldn't you optimize the framework to be as easy as possible for the model versus for the person?Swyx [00:46:24]: I think it's possible to design an interfaceShunyu [00:46:25]: that's both friendly to humans and agents. But what do you think?Harrison [00:46:29]: We haven't thought about that from the perspective, like we're not trying to design LangChain or LangGraph to be friendly. But I mean, I think to be friendly for agents to write.Swyx [00:46:42]: But I mean, I think we see this with like,Harrison [00:46:43]: I saw some paper that used TypeScript notation instead of JSON notation for tool calling and it got a lot better performance. So it's definitely a thing. I haven't really heard of anyone designing like a syntax or a language explicitly for agents, but there's clearly syntaxes that are better.Shunyu [00:46:59]: I think function calling is a good example where it's like a good interface for both human programmers and for agents, right? Like for developers, it's actually a very friendly interface because it's very concrete and you don't have to do prompt engineering anymore. You can be very systematic. And for models, it's also pretty good, right? Like it can use all the existing coding content. So I think we need more of those kinds of designs.Swyx [00:47:21]: I will mostly agree and I'll slightly disagree in terms of this, which is like, whether designing for humans also overlaps with designing for AI. So Malte Ubo, who's the CTO of Vercel, who is creating basically JavaScript's competitor to LangChain, they're observing that basically, like if the API is easy to understand for humans, it's actually much easier to understand for LLMs, for example, because they're not overloaded functions. They don't behave differently under different contexts. They do one thing and they always work the same way. It's easy for humans, it's easy for LLMs. And like that makes a lot of sense. And obviously adding types is another one. Like type annotations only help give extra context, which is really great. So that's the agreement. And then a disagreement is that when I use structured output to do my chain of thought, I have found that I change my field names to hint to the LLM of what the field is supposed to do. So instead of saying topics, I'll say candidate topics. And that gives me a better result because the LLM was like, ah, this is just a draft thing I can use for chain of thought. And instead of like summaries, I'll say topic summaries to link the previous field to the current field. So like little stuff like that, I find myself optimizing for the LLM where I, as a human, would never do that. Interesting.Shunyu [00:48:32]: It's kind of like the way you optimize the prompt, it might be different for humans and for machines. You can have a common ground that's both clear for humans and agents, but to improve the human performance versus improving the agent performance, they might move to different directions.Swyx [00:48:48]: Might move different directions. There's a lot more use of metadata as well, like descriptions, comments, code comments, annotations and stuff like that. Yeah.Harrison [00:48:56]: I would argue that's just you communicatingSwyx [00:48:58]: to the agent what it should do.Harrison [00:49:00]: And maybe you need to communicate a little bit more than to humans because models aren't quite good enough yet.Swyx [00:49:06]: But like, I don't think that's crazy.Harrison [00:49:07]: I don't think that's like- It's not crazy.Swyx [00:49:09]: I will bring this in because it just happened to me yesterday. I was at the cursor office. They held their first user meetup and I was telling them about the LLM OS concept and why basically every interface, every tool was being redesigned for AIs to use rather than humans. And they're like, why? Like, can we just use Bing and Google for LLM search? Why must I use Exa? Or what's the other one that you guys work with?Harrison [00:49:32]: Tavilli.Swyx [00:49:33]: Tavilli. Web Search API dedicated for LLMs. What's the difference?Shunyu [00:49:36]: Exactly. To Bing API.Swyx [00:49:38]: Exactly.Harrison [00:49:38]: There weren't great APIs for search. Like the best one, like the one that we used initially in LangChain was SERP API, which is like maybe illegal. I'm not sure.Swyx [00:49:49]: And like, you know,Harrison [00:49:52]: and now there are like venture-backed companies.Swyx [00:49:53]: Shout out to DuckDuckGo, which is free.Harrison [00:49:55]: Yes, yes.Swyx [00:49:56]: Yeah.Harrison [00:49:56]: I do think there are some differences though. I think you want, like, I think generally these APIs try to return small amounts of text information, clear legible field. It's not a massive JSON blob. And I think that matters. I think like when you talk about designing tools, it's not only the, it's the interface in the entirety, not only the inputs, but also the outputs that really matter. And so I think they try to make the outputs.Shunyu [00:50:18]: They're doing ACI.Swyx [00:50:19]: Yeah, yeah, absolutely.Harrison [00:50:20]: Really?Swyx [00:50:21]: Like there's a whole set of industries that are just being redone for ACI. It's weird. And so my simple answer to them was like the error messages. When you give error messages, they should be basically prompts for the LLM to take and then self-correct. Then your error messages get more verbose, actually, than you normally would with a human. Stuff like that. Like a little, honestly, it's not that big. Again, like, is this worth a venture-backed industry? Unless you can tell us. But like, I think Code Interpreter, I think is a new thing. I hope so.Alessio [00:50:52]: We invested in it to be so.Shunyu [00:50:53]: I think that's a very interesting point. You're trying to optimize to the extreme, then obviously they're going to be different. For example, the error—Swyx [00:51:00]: Because we take it very seriously. Right.Shunyu [00:51:01]: The error for like language model, the longer the better. But for humans, that will make them very nervous and very tired, right? But I guess the point is more like, maybe we should try to find a co-optimized common ground as much as possible. And then if we have divergence, then we should try to diverge. But it's more philosophical now.Alessio [00:51:19]: But I think like part of it is like how you use it. So Google invented the PageRank because ideally you only click on one link, you know, like the top three should have the answer. But with models, it's like, well, you can get 20. So those searches are more like semantic grouping in a way. It's like for this query, I'll return you like 20, 30 things that are kind of good, you know? So it's less about ranking and it's more about grouping.Shunyu [00:51:42]: Another fundamental thing about HCI is the difference between human and machine's kind of memory limit, right? So I think what's really interesting about this concept HCI versus HCI is interfaces that's optimized for them. You can kind of understand some of the fundamental characteristics, differences of humans and machines, right? Why, you know, if you look at find or whatever terminal command, you know, you can only look at one thing at a time or that's because we have a very small working memory. You can only deal with one thing at a time. You can only look at one paragraph of text at the same time. So the interface for us is by design, you know, a small piece of information, but more temporal steps. But for machines, that should be the opposite, right? You should just give them a hundred different results and they should just decide in context what's the most relevant stuff and trade off the context for temporal steps. That's actually also better for language models because like the cost is smaller or whatever. So it's interesting to connect those interfaces to the fundamental kind of differences of those.Harrison [00:52:43]: When you said earlier, you know, we should try to design these to maybe be similar as possible and diverge if we need to.Swyx [00:52:49]: I actually don't have a problem with them diverging nowHarrison [00:52:51]: and seeing venture-backed startups emerging now because we are different from machines code AI. And it's just so early on, like they may still look kind of similar and they may still be small differences, but it's still just so early. And I think we'll only discover more ways that they differ. And so I'm totally fine with them kind of like diverging earlySwyx [00:53:10]: and optimizing for the...Harrison [00:53:11]: I agree. I think it's more like, you know,Shunyu [00:53:14]: we should obviously try to optimize human interface just for humans. We're already doing that for 50 years. We should optimize agent interface just for agents, but we might also try to co-optimize both and see how far we can get. There's enough people to try all three directions. Yeah.Swyx [00:53:31]: There's a thesis I sometimes push, which is the sour lesson as opposed to the bitter lesson, which we're always inspired by human development, but actually AI develops its own path.Shunyu [00:53:40]: Right. We need to understand better, you know, what are the fundamental differences between those creatures.Swyx [00:53:45]: It's funny when really early on this pod, you were like, how much grounding do you have in cognitive development and human brain stuff? And I'm like

united states ai english google san francisco phd simple data italian mit language 3d basketball baseball decision tree reflection curious singapore id acting wikipedia patriots designing context excited classic boston celtics cto react nlp openai separate chain user residence existence ux entry api checks pov gpt bing wei makes voyager python brief history ciara atari promising github llama notion tot apis generative llm javascript neville pockets reflexion cognition sel google docs reasoning cpu agi dfs kojima ls smallville noaa koala ide cloudflare handball prs hinton b2b saas partially deepmind gan ilya alford tau tldr sdr alessio duckduckgo lms xai suno s4 rl google scholar lm json replicate canonical typescript karthik webshops cot hci ppo aci zork gpc brad taylor pagerank mcts refraction jupyter notebooks gpd aici langchain exa neurips loras opening song so google coala devday summarization code interpreter openai devday sales development reps latent space noam brown a2c

Weighin' in with Worm: S2 Episode 6 Loras Wrestling Jeren Glosser and Devin Bahr

Dubuque Area Baseball Podcast

Play Episode Listen Later Sep 18, 2024 53:54

What's going on guys? We are sitting down with Loras Head Coach and newly named Head Assistant Coach Devin Bahr. Glosser speaks about last year's season and the potential of the Duhawks going into the 2024-2025 season. We also get to know new head assistant coach Devin Bahr, a former 3x Wisconsin State Champion, World Team Member and 2024 NCAA Division 2 All American!

wrestling all american worm ncaa division bahr loras jeren weighin duhawks

Chapter Twenty-Eight - Cersei 6 - A Feast for Crows | A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Sep 10, 2024 67:25

Send us a textCersei gets an update on Margaery court, while being waylaid by sparrows. She negotiates a deal with the new High Septon. He'll bless Tommen and erase the crown's debt to the Faith for a law allowing the re-emergence of the Faith Militant. Finally, she makes a veiled threat to Margaery. Mackelly and Simon bow their heads in prayer that she knows what she's doing.Chapter Review:Queen Cersei Lannister rides to the Great Sept of Baelor with Taena Maryweather. Taena fills Cersei in on the men in Margaery's court. The list is filled with most of the young, handsome knights in King's Landing. Therefore, the addition of Grand Maester Pycelle stands out. The going is slow. The litter is blocked by sparrows clogging the streets. Ultimately, Cersei walks the remainder of the way. Upon seeing a pile of bones of slain faithful, Cersei promises to seek revenge. However, the sparrows want only protection for the living.Inside the sept, Cersei is shocked to see septons on hands and knees scrubbing. When she meets the High Septon, she says he should have met her in his finest robes and a crystal crown, not wet and dirty. He explains that they've sold their crowns and fine robes. Cersei asks to speak in private. They kneel at the feet of the Crone. Cersei wants the sparrows gone from King's Landing and a blessing from the High Septon for King Tommen. The High Septon wants better protection from the king for the faithful throughout the realm. Cersei suggests they protect themselves, but a law prohibits the faith from taking up arms. She offers to abolish the law for a blessing for the king and forgiveness on the crown's debt to the faith. The High Septon agrees.On the ride back to the Red Keep, Cersei's festive mood is killed by the sight of Queen Margaery and her court. The two queens exchange pleasantries and barbs. When Cersei sees Margaery and Loras together, it reminds her of Jaime and herself. An idea takes root and laughter bursts forth. Margaery wants to share in the jest, and Cersei promises her that she will.Characters/Places/Names/Events:Cersei Lannister - Mother to King Tommen, Queen Regent of the Seven Kingdoms.King Tommen Baratheon - Child king of the Seven Kingdoms.High Septon - The highest ranking member of the Faith of the Seven.Taena Merryweather - Socia Support the showSupport us: Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/watch?v=kthxycmF25M

spotify song game of thrones landing book reviews seven kingdoms george rr martin crone twenty eight cersei ice and fire cersei lannister asoiaf ross bugden high sparrow baelor marriott bonvoy loras song of ice and fire tommen red keep margaery margaery tyrell feast for crows high septon great sept faith militant arbor gold margery tyrell

Efficiency is Coming: 3000x Faster, Cheaper, Better AI Inference from Hardware Improvements, Quantization, and Synthetic Data Distillation

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Sep 3, 2024 65:18

AI Engineering is expanding! Join the first

tv amazon time world ai new york city google vision las vegas training research data speaker fun model unity 3d harry potter sons computers taiwan tv shows avatar context hitting ebay albert einstein ip ces characters saas cto efficiency gemini openai optimizing nvidia hardware qr oxygen gpt turbo generating improvements precision ml notably llama gin festa cheaper unreal npc siemens llm synthetic alternatively sora copilot anthropic exeter opus slenderman preference gpu agi npcs quirky elo astrophysics rembrandt second life rag jin gpus c4 google ai jetsons gan zenith tldr alessio raman suno computer vision mistral cpus mpc jensen huang iac google deepmind inference amazon go databricks convey chinchillas simulating carleton college computex jetson sean carroll distillation edge ai distill naila nvidia gtc metahuman character ai 70b lisa su loras game engines rlhf resnet v100 james wang mpcs latent space ternary lstms

Weighin in with Worm: S2 E2 Shane Liegel

Dubuque Area Baseball Podcast

Play Episode Listen Later Aug 21, 2024 53:48

Shane Liegel was a 3x state champion out of River Valley Wisconsin, he made his way to Loras where he was a 4x all American, an NWCA national champion, and a NCAA runner up. With his last year of eligibility Shane took his talent to Madison Wisconsin where he competed for the Badgers!

american ncaa worm badgers madison wisconsin loras nwca weighin

Weighin in with Worm S2E1: Guy Patron Jr.

Dubuque Area Baseball Podcast

Play Episode Listen Later Aug 14, 2024 42:36

Guy Patrone was the first 4x All American in Loras history; he also is the all time win leader for the Duhawks. We sit down with this New Orleans native to discuss his impact on Loras Wrestling and what his future holds

new orleans all american worm loras weighin duhawks

STEVE HELMINIAK LORAS PREVIEW (UFR).mp3

Upon Further Review

Play Episode Listen Later Jun 24, 2024 8:57

loras

Talking Wrestling with Head Coach Brett Wheelan

Let's Talk Wrestling

Play Episode Listen Later Apr 17, 2024 93:37

Brett is from Manchester, Iowa and wrestled for West Delaware. Unfortunately, injuries cut his career short. But Brett just couldn't get away from wrestling. He's been involved at the coaching level, holding roles at Waterloo West, Waverly Shell-Rock and Loras, to name a few. Currently, Brett is the head coach at West Des Moines Valley and guided them to a 15th place finish at state this year, which included a state finalist. In the show notes you'll see a link to Pay It FOREward Connolly, a non-profit organization that serves to financially aid persons with disabilities and their families. The organization was founded in memory of Andrew Connolly, a good friend of Brett Wheelan. Again, the link to the organization can be found in the show notes. If you know someone who could benefit from this amazing organization or if you simply want to know how you can help, all the information is found through that link. So please, sit back, relax and enjoy, Coach Brett Wheelan! Pay It FOREward Connolly: https://www.facebook.com/profile.php?id=100064756686289 Euphoria Coffee website: https://www.drinkeuphoriacoffee2go.com/ Let's Talk Wrestling website: https://letstalkwrestlingpodcast.my.canva.site/ --- Support this podcast: https://podcasters.spotify.com/pod/show/lets-talk-wrestling/support

iowa wrestling manchester head coach talk wrestling coach brett loras wheelan

Hawkeye Report Podcast 583

Hawkeye Report Podcast

Play Episode Listen Later Feb 22, 2024 53:23

We have a jam packed podcast this week. Tom Kakert and Kyle Huesmann open the show discussing the Iowa Basketball team suddenly picking up their two biggest wins of the season. That includes a win at Michigan State. We look back at the victory over the Spartans on Tuesday night and how Iowa was able to pull the upset. Iowa is now potentially seeing a shot at the NCAA Tournament in the distance. We dive into the Hawkeyes resume after the wins over Wisconsin and Michigan State. How does it stack up when compared to other potential programs being mentioned as being on the bubble? Then we discuss the Iowa women's basketball team hitting the road for a huge game on Thursday night at Indiana. Kyle previews this massive game that Iowa will likely need to keep their hopes of a Big Ten title alive. How does Iowa match up with the Hoosiers? We also look at the remaining games for Iowa and their competition for the title, Ohio State. Finally we wrap up discussing Iowa baseball. We look back at the three game series and the win on Tuesday over Loras. What have we learned so far about the Hawkeye baseball team? We also dive into a massive series in Jacksonville this weekend where Iowa will face Auburn, nationally ranked Virginia, and Wichita State.

wisconsin indiana iowa jacksonville ohio state michigan state university big ten ncaa tournament hawkeyes auburn caitlin clark hoosiers spartans wichita state report podcast iowa basketball loras tom kakert iowa baseball hawkeye report

2024 Loras College Baseball Season Preview

Dubuque Area Baseball Podcast

Play Episode Listen Later Feb 19, 2024 21:02

Head Coach Carl Tebon stops by the podcast and reflects over the past seasons where Loras beat Iowa, won a conference championship, and made a regional for the first time under coach Tebon. Of course, we talk about the upcoming 2024 Loras DuHawks. Thanks to our great sponsors: Xtreme Strength and Sports Performance Nolan Weber Agency EXIT Realty with Jerry Elbert Tricor Insurance Company Kramer Pest Control Bar Raising Fitness Foley Distributing Advanced Wellness Center

iowa season preview college baseball baseball season loras loras college

517: ASoS Jaime VIII (repost)

Close the Door: Game of Thrones, A Song of Ice and Fire Podcast

Play Episode Listen Later Feb 13, 2024 38:10

Spoilers, profanity, Jaime x Brienne. It's time for King's Landing Cribs Edition, where the magic doesn't happen. Are you a gasoline kinda gal or do you prefer manure? All we know is we're not into the smell of rotting flesh. Jaime trusts Loras will fall in love with Brienne if he just spends a few minutes alone with her. Game of Thrones. A Song of Ice and Fire. A Storm of Swords - Jaime VIII - originally recorded in 2019. Close The Door And Come Here - Episode 517

spoilers song fire game of thrones fall in love ice viii repost asos brienne loras

21.17a: 10th Annual Marathon (Part 1)

Hoopsville

Play Episode Listen Later Feb 2, 2024 127:03

Friday is the 10th Annual Hoopsville Marathon and is a bit of a "half" marathon. The "big show" is scheduled to be on air for at least six hours (it ends up going seven-and-a-half hours). This year's show features coaches, administrators, and others around Division III who give us a sense of the season to date, what's to come, and what else is going on in the Division and sport. We will talk about the ongoing NABC efforts with Coaches vs Cancer and Suits and Sneakers events. We hear from some of our pundits who tackle ever more difficult Top 25 polls. And we have our annual conversation with both national committee chairs. The marathon is also a chance to celebrate the final month of the Division III basketball regular season. And we restart the shows fundraising efforts to help continue our efforts into the future. Guests included (in Part 1): - Chris Martin, No. 20 Loras men's coach - Lucia Robinson-Gibbs, MIT women's coach - Justin Olufson, Webster women's coach - Gary Stewart, Stevenson men's coach, NABC Coaches v Cancer advocate - Bill Broderick, No. 4 Christopher Newport women's coach Hoopsville is presented by D3hoops.com.

mit cancer coaches marathon sneakers webster suits stevenson chris martin division iii loras gary stewart nabc christopher newport

"BOOK OF THE STRANGER" Game of Thrones Season 6 EP4 Recap

WATCH DEM THRONES by Black With No Chaser

Play Episode Listen Later Jan 29, 2024 56:01

Sansa Stark arrives at the Wall and reunites with Jon Snow, and later receives a message from Ramsay Bolton challenging Jon to come take back Winterfell and rescue Rickon Stark; Margaery Tyrell is reunited with her brother, Loras; Cersei and Jaime Lannister plot with their uncle Kevan and Olenna Tyrell to have them released; and Daenerys Targaryen faces the khals.If you want to keep the fun going with us throughout the week, come join our Facebook group. THE WATCH DEM THRONES FACEBOOK GROUPhttps://www.facebook.com/groups/126567443834910/?ref=share&mibextid=NSMWBTTO WATCH AND SUBSCRIBE:Facebook:https://www.facebook.com/blackwithnochaser/Twitter:https://twitter.com/BeBlackNoChaser?t=pVFV06lBFdZRu72ot4uCjA&s=09Twitter:https://twitter.com/WatchDemThrones?t=q0ngrYPlugf0ttzM2jo39A&s=09Apple Music: https://podcasts.apple.com/us/podcast/watch-dem-thrones-by-black-with-no-chaser/id1641754247Spotify:https://open.spotify.com/show/1qI1bJ1vIlobu502w6zrtN?si=mtsa3gZYRZW_3FmlCrv7UgBWNC RADIO: https://apps.apple.com/us/app/bwnc-radio/id6443800363Amazon Musichttps://music.amazon.com/podcasts/45279c3a-c09f-47d1-a3a3-88e6e2507230/watch-dem-thrones-by-black-with-no-chaserIHeartRadiohttps://www.iheart.com/podcast/269-watch-dem-thrones-by-black-101286659/?cmp=android_share&sc=android_social_share&pr=false#gameofthrones #demdragons #blackwithnochaser #houseofthedragonhbo #dragonseeds #theblacks #thegreens #houseofthedragon #youtube #targaryens #podcast #podsincolor #applemusic #spotifymusic #podsincolor #starks #lannisters #Velaryon

wall jon snow winterfell game of thrones season cersei daenerys targaryen kevan sansa stark jaime lannister book of the stranger loras ramsay bolton olenna tyrell velaryon margaery tyrell rickon stark

EA - AI doing philosophy = AI generating hands? by Wei Dai

The Nonlinear Library

Play Episode Listen Later Jan 15, 2024 4:48

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI doing philosophy = AI generating hands?, published by Wei Dai on January 15, 2024 on The Effective Altruism Forum. I've been playing around with Stable Diffusion recently, and an analogy occurred to me between today's AI's notoriously bad generation of hands and future AI's potentially bad reasoning about philosophy. In case you aren't already familiar, currently available image generation AIs are very prone to outputting bad hands, e.g., ones with four or six fingers, or two thumbs, or unnatural poses, or interacting with other objects in very strange ways. Perhaps what's especially striking is how bad AIs are at hands relative to other image generation capabilities, thus serving as a cautionary tale about differentially decelerating philosophy relative to other forms of intellectual progress, e.g., scientific and technological progress. Is anyone looking into differential artistic progress as a possible x-risk? /jk Some explanations I've seen for why AI is bad at hands: it's hard for AIs to learn hand generation because of how many poses a hand can make, how many different ways it can interact with other objects, and how many different viewing angles AIs need to learn to reproduce each 2D image provides only partial information about a hand (much of it is often obscured behind other objects or parts of itself) most hands in the training data are very low resolution (a tiny part of the overall image) and thus not helpful for training AI the proportion of hands in the training set is too low for the AI to devote much model capacity to hand generation ("misalignment" between the loss function and what humans care about probably also contributes to this) AI developers just haven't collected and trained AI on enough high quality hand images yet There are news articles about this problem going back to at least 2022, and I can see a lot of people trying to solve it (on Reddit, GitHub, arXiv) but progress has been limited. Straightforward techniques like prompt engineering and finetuning do not seem to help much. Here are 2 SOTA techniques, to give you a glimpse of what the technological frontier currently looks like (at least in open source): Post-process images with a separate ML-based pipeline to fix hands after initial generation. This creates well-formed hands but doesn't seem to take interactions with other objects into (sufficient or any) consideration. If you're not trying to specifically generate hands, but just don't want to see incidentally bad hands in images with humans in them, get rid of all hand-related prompts, LoRAs, textual inversions, etc., and just putting "hands" in the negative prompt. This doesn't eliminate all hands but reduces the number/likelihood of hands in the picture and also makes the remaining ones look better. (The idea behind this is that it makes the AI "try less hard" to generate hands, and perhaps focus more on central examples that it has more training on. Of course generating hands is ultimately not a very hard problem. Hand anatomy and its interactions with other objects pose no fundamental mysteries. Bad hands are easy for humans to recognize and therefore we have quick and easy feedback for how well we're solving the problem. We can use our explicit understanding of hands to directly help solve the problem (solution 1 above used at least the fact that hands are compact 3D objects), or just provide the AI with more high quality training data (physically taking more photos of hands if needed) until it recognizably fixed itself. What about philosophy? Well, scarcity of existing high quality training data, check. Lots of unhelpful data labeled "philosophy", check. Low proportion of philosophy in the training data, check. Quick and easy to generate more high quality data, no. Good explicit understanding of the principles involved, ...

ai hands 3d philosophy reddit speech ea generating ml 2d github straightforward sota stable diffusion arxiv rationalist loras wei dai

The AI-First Graphics Editor - with Suhail Doshi of Playground AI

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Dec 20, 2023 57:09

We are running an end of year survey for our listeners! Please let us know any feedback you have, what episodes resonated with you, and guest requests for 2024! Survey link here!Before language models became all the rage in November 2022, image generation was the hottest space in AI (it was the subject of our first piece on Latent Space!) In our interview with Sharif Shameem from Lexica we talked through the launch of StableDiffusion and the early days of that space. At the time, the toolkit was still pretty rudimentary: Lexica made it easy to search images, you had the AUTOMATIC1111 Web UI to generate locally, some HuggingFace spaces that offered inference, and eventually DALL-E 2 through OpenAI's platform, but not much beyond basic text-to-image workflows.Today's guest, Suhail Doshi, is trying to solve this with Playground AI, an image editor reimagined with AI in mind. Some of the differences compared to traditional text-to-image workflows:* Real-time preview rendering using consistency: as you change your prompt, you can see changes in real-time before doing a final rendering of it.* Style filtering: rather than having to prompt exactly how you'd like an image to look, you can pick from a whole range of filters both from Playground's model as well as Stable Diffusion (like RealVis, Starlight XL, etc). We talk about this at 25:46 in the podcast.* Expand prompt: similar to DALL-E3, Playground will do some prompt tuning for you to get better results in generation. Unlike DALL-E3, you can turn this off at any time if you are a prompting wizard* Image editing: after generation, you have tools like a magic eraser, inpainting pencil, etc. This makes it easier to do a full workflow in Playground rather than switching to another tool like Photoshop.Outside of the product, they have also trained a new model from scratch, Playground v2, which is fully open source and open weights and allows for commercial usage. They benchmarked the model against SDXL across 1,000 prompts and found that humans preferred the Playground generation 70% of the time. They had similar results on PartiPrompts:They also created a new benchmark, MJHQ-30K, for “aesthetic quality”:We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model's aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.We curate the high-quality dataset from Midjourney with 10 common categories, each category with 3K samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.Suhail was pretty open with saying that Midjourney is currently the best product for imagine generation out there, and that's why they used it as the base for this benchmark. I think it's worth comparing yourself to maybe the best thing and try to find like a really fair way of doing that. So I think more people should try to do that. I definitely don't think you should be kind of comparing yourself on like some Google model or some old SD, Stable Diffusion model and be like, look, we beat Stable Diffusion 1.5. I think users ultimately want care, how close are you getting to the thing that people mostly agree with? [00:23:47]We also talked a lot about Suhail's founder journey from starting Mixpanel in 2009, then going through YC again with Mighty, and eventually sunsetting that to pivot into Playground. Enjoy!Show Notes* Suhail's Twitter* “Starting my road to learn AI”* Bill Gates book trip* Playground* Playground v2 Announcement* $40M raise announcement* “Running infra dev ops for 24 A100s”* Mixpanel* Mighty* “I decided to stop working on Mighty”* Fast.ai* CivitTimestamps* [00:00:00] Intros* [00:02:59] Being early in ML at Mixpanel* [00:04:16] Pivoting from Mighty to Playground and focusing on generative AI* [00:07:54] How DALL-E 2 inspired Mighty* [00:09:19] Reimagining the graphics editor with AI* [00:17:34] Training the Playground V2 model from scratch to advance generative graphics* [00:21:11] Techniques used to improve Playground V2 like data filtering and model tuning* [00:25:21] Releasing the MJHQ30K benchmark to evaluate generative models* [00:30:35] The limitations of current models for detailed image editing tasks* [00:34:06] Using post-generation user feedback to create better benchmarks* [00:38:28] Concerns over potential misuse of powerful generative models* [00:41:54] Rethinking the graphics editor user experience in the AI era* [00:45:44] Integrating consistency models into Playground using preview rendering* [00:47:23] Interacting with the Stable Diffusion LoRAs community* [00:51:35] Running DevOps on A100s* [00:53:12] Startup ideas?TranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:15]Swyx: Hey, and today in the studio we have Suhail Doshi, welcome. [00:00:18]Suhail: Yeah, thanks. Thanks for having me. [00:00:20]Swyx: So among many things, you're a CEO and co-founder of Mixpanel, and I think about three years ago you left to start Mighty, and more recently, I think about a year ago, transitioned into Playground, and you've just announced your new round. How do you like to be introduced beyond that? [00:00:34]Suhail: Just founder of Playground is fine, yeah, prior co-founder and CEO of Mixpanel. [00:00:40]Swyx: Yeah, awesome. I'd just like to touch on Mixpanel a little bit, because it's obviously one of the more successful analytics companies we previously had amplitude on, and I'm curious if you had any reflections on the interaction of that amount of data that people would want to use for AI. I don't know if there's still a part of you that stays in touch with that world. [00:00:59]Suhail: Yeah, I mean, the short version is that maybe back in like 2015 or 2016, I don't really remember exactly, because it was a while ago, we had an ML team at Mixpanel, and I think this is when maybe deep learning or something really just started getting kind of exciting, and we were thinking that maybe given that we had such vast amounts of data, perhaps we could predict things. So we built two or three different features, I think we built a feature where we could predict whether users would churn from your product. We made a feature that could predict whether users would convert, we built a feature that could do anomaly detection, like if something occurred in your product, that was just very surprising, maybe a spike in traffic in a particular region, can we tell you that that happened? Because it's really hard to like know everything that's going on with your data, can we tell you something surprising about your data? And we tried all of these various features, most of it boiled down to just like, you know, using logistic regression, and it never quite seemed very groundbreaking in the end. And so I think, you know, we had a four or five person ML team, and I think we never expanded it from there. And I did all these Fast AI courses trying to learn about ML. And that was the- That's the first time you did fast AI. Yeah, that was the first time I did fast AI. Yeah, I think I've done it now three times, maybe. [00:02:12]Swyx: Oh, okay. [00:02:13]Suhail: I didn't know it was the third. No, no, just me reviewing it, it's maybe three times, but yeah. [00:02:16]Swyx: You mentioned prediction, but honestly, like it's also just about the feedback, right? The quality of feedback from users, I think it's useful for anyone building AI applications. [00:02:25]Suhail: Yeah. Yeah, I think I haven't spent a lot of time thinking about Mixpanel because it's been a long time, but sometimes I'm like, oh, I wonder what we could do now. And then I kind of like move on to whatever I'm working on, but things have changed significantly since. [00:02:39]Swyx: And then maybe we'll touch on Mighty a little bit. Mighty was very, very bold. My framing of it was, you will run our browsers for us because everyone has too many tabs open. I have too many tabs open and slowing down your machines that you can do it better for us in a centralized data center. [00:02:51]Suhail: Yeah, we were first trying to make a browser that we would stream from a data center to your computer at extremely low latency, but the real objective wasn't trying to make a browser or anything like that. The real objective was to try to make a new kind of computer. And the thought was just that like, you know, we have these computers in front of us today and we upgrade them or they run out of RAM or they don't have enough RAM or not enough disk or, you know, there's some limitation with our computers, perhaps like data locality is a problem. Why do I need to think about upgrading my computer ever? And so, you know, we just had to kind of observe that like, well, actually it seems like a lot of applications are just now in the browser, you know, it's like how many real desktop applications do we use relative to the number of applications we use in the browser? So it's just this realization that actually like, you know, the browser was effectively becoming more or less our operating system over time. And so then that's why we kind of decided to go, hmm, maybe we can stream the browser. Fortunately, the idea did not work for a couple of different reasons, but the objective is try to make sure new computer. [00:03:50]Swyx: Yeah, very, very bold. [00:03:51]Alessio: Yeah, and I was there at YC Demo Day when you first announced it. It was, I think, the last or one of the last in-person ones, at Pier34 in Mission Bay. How do you think about that now when everybody wants to put some of these models in people's machines and some of them want to stream them in, do you think there's maybe another wave of the same problem before it was like browser apps too slow, now it's like models too slow to run on device? [00:04:16]Suhail: Yeah. I mean, I've obviously pivoted away from Mighty, but a lot of what I somewhat believed at Mighty, maybe why I'm so excited about AI and what's happening, a lot of what Mighty was about was like moving compute somewhere else, right? Right now, applications, they get limited quantities of memory, disk, networking, whatever your home network has, et cetera. You know, what if these applications could somehow, if we could shift compute, and then these applications have vastly more compute than they do today. Right now it's just like client backend services, but you know, what if we could change the shape of how applications could interact with things? And it's changed my thinking. In some ways, AI has like a bit of a continuation of my belief that like perhaps we can really shift compute somewhere else. One of the problems with Mighty was that JavaScript is single-threaded in the browser. And what we learned, you know, the reason why we kind of abandoned Mighty was because I didn't believe we could make a new kind of computer. We could have made some kind of enterprise business, probably it could have made maybe a lot of money, but it wasn't going to be what I hoped it was going to be. And so once I realized that most of a web app is just going to be single-threaded JavaScript, then the only thing you could do largely withstanding changing JavaScript, which is a fool's errand most likely, make a better CPU, right? And there's like three CPU manufacturers, two of which sell, you know, big ones, you know, AMD, Intel, and then of course like Apple made the M1. And it's not like single-threaded CPU core performance, single-core performance was increasing very fast, it's plateauing rapidly. And even these different companies were not doing as good of a job, you know, sort of with the continuation of Moore's law. But what happened in AI was that you got like, if you think of the AI model as like a computer program, like just like a compiled computer program, it is literally built and designed to do massive parallel computations. And so if you could take like the universal approximation theorem to its like kind of logical complete point, you know, you're like, wow, I can get, make computation happen really rapidly and parallel somewhere else, you know, so you end up with these like really amazing models that can like do anything. It just turned out like perhaps the new kind of computer would just simply be shifted, you know, into these like really amazing AI models in reality. Yeah. [00:06:30]Swyx: Like I think Andrej Karpathy has always been, has been making a lot of analogies with the LLMOS. [00:06:34]Suhail: I saw his video and I watched that, you know, maybe two weeks ago or something like that. I was like, oh man, this, I very much resonate with this like idea. [00:06:41]Swyx: Why didn't I see this three years ago? [00:06:43]Suhail: Yeah. I think, I think there still will be, you know, local models and then there'll be these very large models that have to be run in data centers. I think it just depends on kind of like the right tool for the job, like any engineer would probably care about. But I think that, you know, by and large, like if the models continue to kind of keep getting bigger, you're always going to be wondering whether you should use the big thing or the small, you know, the tiny little model. And it might just depend on like, you know, do you need 30 FPS or 60 FPS? Maybe that would be hard to do, you know, over a network. [00:07:13]Swyx: You tackled a much harder problem latency wise than the AI models actually require. Yeah. [00:07:18]Suhail: Yeah. You can do quite well. You can do quite well. You definitely did 30 FPS video streaming, did very crazy things to make that work. So I'm actually quite bullish on the kinds of things you can do with networking. [00:07:30]Swyx: Maybe someday you'll come back to that at some point. But so for those that don't know, you're very transparent on Twitter. Very good to follow you just to learn your insights. And you actually published a postmortem on Mighty that people can read up on and willing to. So there was a bit of an overlap. You started exploring the AI stuff in June 2022, which is when you started saying like, I'm taking fast AI again. Maybe, was there more context around that? [00:07:54]Suhail: Yeah. I think I was kind of like waiting for the team at Mighty to finish up, you know, something. And I was like, okay, well, what can I do? I guess I will make some kind of like address bar predictor in the browser. So we had, you know, we had forked Chrome and Chromium. And I was like, you know, one thing that's kind of lame is that like this browser should be like a lot better at predicting what I might do, where I might want to go. It struck me as really odd that, you know, Chrome had very little AI actually or ML inside this browser. For a company like Google, you'd think there's a lot. Code is actually just very, you know, it's just a bunch of if then statements is more or less the address bar. So it seemed like a pretty big opportunity. And that's also where a lot of people interact with the browser. So, you know, long story short, I was like, hmm, I wonder what I could build here. So I started to take some AI courses and review the material again and get back to figuring it out. But I think that was somewhat serendipitous because right around April was, I think, a very big watershed moment in AI because that's when Dolly 2 came out. And I think that was the first truly big viral moment for generative AI. [00:08:59]Swyx: Because of the avocado chair. [00:09:01]Suhail: Yeah, exactly. [00:09:02]Swyx: It wasn't as big for me as Stable Diffusion. [00:09:04]Suhail: Really? [00:09:05]Swyx: Yeah, I don't know. Dolly was like, all right, that's cool. [00:09:07]Suhail: I don't know. Yeah. [00:09:09]Swyx: I mean, they had some flashy videos, but it didn't really register. [00:09:13]Suhail: That moment of images was just such a viral novel moment. I think it just blew people's mind. Yeah. [00:09:19]Swyx: I mean, it's the first time I encountered Sam Altman because they had this Dolly 2 hackathon and they opened up the OpenAI office for developers to walk in back when it wasn't as much of a security issue as it is today. I see. Maybe take us through the journey to decide to pivot into this and also choosing images. Obviously, you were inspired by Dolly, but there could be any number of AI companies and businesses that you could start and why this one, right? [00:09:45]Suhail: Yeah. So I think at that time, Mighty and OpenAI was not quite as popular as it is all of a sudden now these days, but back then they had a lot more bandwidth to kind of help anybody. And so we had been talking with the team there around trying to see if we could do really fast low latency address bar prediction with GPT-3 and 3.5 and that kind of thing. And so we were sort of figuring out how could we make that low latency. I think that just being able to talk to them and kind of being involved gave me a bird's eye view into a bunch of things that started to happen. Latency first was the Dolly 2 moment, but then stable diffusion came out and that was a big moment for me as well. And I remember just kind of like sitting up one night thinking, I was like, you know, what are the kinds of companies one could build? Like what matters right now? One thing that I observed is that I find a lot of inspiration when I'm working in a field in something and then I can identify a bunch of problems. Like for Mixpanel, I was an intern at a company and I just noticed that they were doing all this data analysis. And so I thought, hmm, I wonder if I could make a product and then maybe they would use it. And in this case, you know, the same thing kind of occurred. It was like, okay, there are a bunch of like infrastructure companies that put a model up and then you can use their API, like Replicate is a really good example of that. There are a bunch of companies that are like helping you with training, model optimization, Mosaic at the time, and probably still, you know, was doing stuff like that. So I just started listing out like every category of everything, of every company that was doing something interesting. I started listing out like weights and biases. I was like, oh man, weights and biases is like this great company. Do I want to compete with that company? I might be really good at competing with that company because of Mixpanel because it's so much of like analysis. But I was like, no, I don't want to do anything related to that. That would, I think that would be too boring now at this point. So I started to list out all these ideas and one thing I observed was that at OpenAI, they had like a playground for GPT-3, right? All it was is just like a text box more or less. And then there were some settings on the right, like temperature and whatever. [00:11:41]Swyx: Top K. [00:11:42]Suhail: Yeah, top K. You know, what's your end stop sequence? I mean, that was like their product before GPT, you know, really difficult to use, but fun if you're like an engineer. And I just noticed that their product kind of was evolving a little bit where the interface kind of was getting a little bit more complex. They had like a way where you could like generate something in the middle of a sentence and all those kinds of things. And I just thought to myself, I was like, everything is just like this text box and you generate something and that's about it. And stable diffusion had kind of come out and it was all like hugging face and code. Nobody was really building any UI. And so I had this kind of thing where I wrote prompt dash like question mark in my notes and I didn't know what was like the product for that at the time. I mean, it seems kind of trite now, but I just like wrote prompt. What's the thing for that? Manager. Prompt manager. Do you organize them? Like, do you like have a UI that can play with them? Yeah. Like a library. What would you make? And so then, of course, then you thought about what would the modalities be given that? How would you build a UI for each kind of modality? And so there are a couple of people working on some pretty cool things. And I basically chose graphics because it seemed like the most obvious place where you could build a really powerful, complex UI. That's not just only typing a box. It would very much evolve beyond that. Like what would be the best thing for something that's visual? Probably something visual. Yeah. I think that just that progression kind of happened and it just seemed like there was a lot of effort going into language, but not a lot of effort going into graphics. And then maybe the very last thing was, I think I was talking to Aditya Ramesh, who was the co-creator of DALL-E 2 and Sam. And I just kind of went to these guys and I was just like, hey, are you going to make like a UI for this thing? Like a true UI? Are you going to go for this? Are you going to make a product? For DALL-E. Yeah. For DALL-E. Yeah. Are you going to do anything here? Because if you are going to do it, just let me know and I will stop and I'll go do something else. But if you're not going to do anything, I'll just do it. And so we had a couple of conversations around what that would look like. And then I think ultimately they decided that they were going to focus on language primarily. And I just felt like it was going to be very underinvested in. Yes. [00:13:46]Swyx: There's that sort of underinvestment from OpenAI, but also it's a different type of customer than you're used to, presumably, you know, and Mixpanel is very good at selling to B2B and developers will figure on you or not. Yeah. Was that not a concern? [00:14:00]Suhail: Well, not so much because I think that, you know, right now I would say graphics is in this very nascent phase. Like most of the customers are just like hobbyists, right? Yeah. Like it's a little bit of like a novel toy as opposed to being this like very high utility thing. But I think ultimately, if you believe that you could make it very high utility, the probably the next customers will end up being B2B. It'll probably not be like a consumer. There will certainly be a variation of this idea that's in consumer. But if your quest is to kind of make like something that surpasses human ability for graphics, like ultimately it will end up being used for business. So I think it's maybe more of a progression. In fact, for me, it's maybe more like Mixpanel started out as SMB and then very much like ended up starting to grow up towards enterprise. So for me, I think it will be a very similar progression. But yeah, I mean, the reason why I was excited about it is because it was a creative tool. I make music and it's AI. It's like something that I know I could stay up till three o'clock in the morning doing. Those are kind of like very simple bars for me. [00:14:56]Alessio: So you mentioned Dolly, Stable Diffusion. You just had Playground V2 come out two days ago. Yeah, two days ago. [00:15:02]Suhail: Two days ago. [00:15:03]Alessio: This is a model you train completely from scratch. So it's not a cheap fine tune on something. You open source everything, including the weights. Why did you decide to do it? I know you supported Stable Diffusion XL in Playground before, right? Yep. What made you want to come up with V2 and maybe some of the interesting, you know, technical research work you've done? [00:15:24]Suhail: Yeah. So I think that we continue to feel like graphics and these foundation models for anything really related to pixels, but also definitely images continues to be very underinvested. It feels a little like graphics is in like this GPT-2 moment, right? Like even GPT-3, even when GPT-3 came out, it was exciting, but it was like, what are you going to use this for? Yeah, we'll do some text classification and some semantic analysis and maybe it'll sometimes like make a summary of something and it'll hallucinate. But no one really had like a very significant like business application for GPT-3. And in images, we're kind of stuck in the same place. We're kind of like, okay, I write this thing in a box and I get some cool piece of artwork and the hands are kind of messed up and sometimes the eyes are a little weird. Maybe I'll use it for a blog post, you know, that kind of thing. The utility feels so limited. And so, you know, and then we, you sort of look at Stable Diffusion and we definitely use that model in our product and our users like it and use it and love it and enjoy it, but it hasn't gone nearly far enough. So we were kind of faced with the choice of, you know, do we wait for progress to occur or do we make that progress happen? So yeah, we kind of embarked on a plan to just decide to go train these things from scratch. And I think the community has given us so much. The community for Stable Diffusion I think is one of the most vibrant communities on the internet. It's like amazing. It feels like, I hope this is what like Homebrew Club felt like when computers like showed up because it's like amazing what that community will do and it moves so fast. I've never seen anything in my life and heard other people's stories around this where an academic research paper comes out and then like two days later, someone has sample code for it. And then two days later, there's a model. And then two days later, it's like in nine products, you know, they're all competing with each other. It's incredible to see like math symbols on an academic paper go to well-designed features in a product. So I think the community has done so much. So I think we wanted to give back to the community kind of on our way. Certainly we would train a better model than what we gave out on Tuesday, but we definitely felt like there needs to be some kind of progress in these open source models. The last kind of milestone was in July when Stable Diffusion Excel came out, but there hasn't been anything really since. Right. [00:17:34]Swyx: And there's Excel Turbo now. [00:17:35]Suhail: Well, Excel Turbo is like this distilled model, right? So it's like lower quality, but fast. You have to decide, you know, what your trade off is there. [00:17:42]Swyx: It's also a consistency model. [00:17:43]Suhail: I don't think it's a consistency model. It's like it's they did like a different thing. Yeah. I think it's like, I don't want to get quoted for this, but it's like something called ad like adversarial or something. [00:17:52]Swyx: That's exactly right. [00:17:53]Suhail: I've read something about that. Maybe it's like closer to GANs or something, but I didn't really read the full paper. But yeah, there hasn't been quite enough progress in terms of, you know, there's no multitask image model. You know, the closest thing would be something called like EmuEdit, but there's no model for that. It's just a paper that's within meta. So we did that and we also gave out pre-trained weights, which is very rare. Usually you just get the aligned model and then you have to like see if you can do anything with it. So we actually gave out, there's like a 256 pixel pre-trained stage and a 512. And we did that for academic research because we come across people all the time in academia, they have access to like one A100 or eight at best. And so if we can give them kind of like a 512 pre-trained model, our hope is that there'll be interesting novel research that occurs from that. [00:18:38]Swyx: What research do you want to happen? [00:18:39]Suhail: I would love to see more research around things that users care about tend to be things like character consistency. [00:18:45]Swyx: Between frames? [00:18:46]Suhail: More like if you have like a face. Yeah, yeah. Basically between frames, but more just like, you know, you have your face and it's in one image and then you want it to be like in another. And users are very particular and sensitive to faces changing because we know we're trained on faces as humans. Not seeing a lot of innovation, enough innovation around multitask editing. You know, there are two things like instruct pics to pics and then the EmuEdit paper that are maybe very interesting, but we certainly are not pushing the fold on that in that regard. All kinds of things like around that rotation, you know, being able to keep coherence across images, style transfer is still very limited. Just even reasoning around images, you know, what's going on in an image, that kind of thing. Things are still very, very underpowered, very nascent. So therefore the utility is very, very limited. [00:19:32]Alessio: On the 1K Prompt Benchmark, you are 2.5x prefer to Stable Diffusion XL. How do you get there? Is it better images in the training corpus? Can you maybe talk through the improvements in the model? [00:19:44]Suhail: I think they're still very early on in the recipe, but I think it's a lot of like little things and you know, every now and then there are some big important things like certainly your data quality is really, really important. So we spend a lot of time thinking about that. But I would say it's a lot of things that you kind of clean up along the way as you train your model. Everything from captions to the data that you align with after pre-train to how you're picking your data sets, how you filter your data sets. I feel like there's a lot of work in AI that doesn't really feel like AI. It just really feels like just data set filtering and systems engineering and just like, you know, and the recipe is all there, but it's like a lot of extra work to do that. I think we plan to do a Playground V 2.1, maybe either by the end of the year or early next year. And we're just like watching what the community does with the model. And then we're just going to take a lot of the things that they're unhappy about and just like fix them. You know, so for example, like maybe the eyes of people in an image don't feel right. They feel like they're a little misshapen or they're kind of blurry feeling. That's something that we already know we want to fix. So I think in that case, it's going to be about data quality. Or maybe you want to improve the kind of the dynamic range of color. You know, we want to make sure that that's like got a good range in any image. So what technique can we use there? There's different things like offset noise, pyramid noise, terminal zero, SNR, like there are all these various interesting things that you can do. So I think it's like a lot of just like tricks. Some are tricks, some are data, and some is just like cleaning. [00:21:11]Swyx: Specifically for faces, it's very common to use a pipeline rather than just train the base model more. Do you have a strong belief either way on like, oh, they should be separated out to different stages for like improving the eyes, improving the face or enhance or whatever? Or do you think like it can all be done in one model? [00:21:28]Suhail: I think we will make a unified model. Yeah, I think it will. I think we'll certainly in the end, ultimately make a unified model. There's not enough research about this. Maybe there is something out there that we haven't read. There are some bottlenecks, like for example, in the VAE, like the VAEs are ultimately like compressing these things. And so you don't know. And then you might have like a big informational information bottleneck. So maybe you would use a pixel based model, perhaps. I think we've talked to people, everyone from like Rombach to various people, Rombach trained stable diffusion. I think there's like a big question around the architecture of these things. It's still kind of unknown, right? Like we've got transformers and we've got like a GPT architecture model, but then there's this like weird thing that's also seemingly working with diffusion. And so, you know, are we going to use vision transformers? Are we going to move to pixel based models? Is there a different kind of architecture? We don't really, I don't think there have been enough experiments. Still? Oh my God. [00:22:21]Swyx: Yeah. [00:22:22]Suhail: That's surprising. I think it's very computationally expensive to do a pipeline model where you're like fixing the eyes and you're fixing the mouth and you're fixing the hands. [00:22:29]Swyx: That's what everyone does as far as I understand. [00:22:31]Suhail: I'm not exactly sure what you mean, but if you mean like you get an image and then you will like make another model specifically to fix a face, that's fairly computationally expensive. And I think it's like not probably not the right way. Yeah. And it doesn't generalize very well. Now you have to pick all these different things. [00:22:45]Swyx: Yeah. You're just kind of glomming things on together. Yeah. Like when I look at AI artists, like that's what they do. [00:22:50]Suhail: Ah, yeah, yeah, yeah. They'll do things like, you know, I think a lot of ARs will do control net tiling to do kind of generative upscaling of all these different pieces of the image. Yeah. And I think these are all just like, they're all hacks ultimately in the end. I mean, it just to me, it's like, let's go back to where we were just three years, four years ago with where deep learning was at and where language was that, you know, it's the same thing. It's like we were like, okay, well, I'll just train these very narrow models to try to do these things and kind of ensemble them or pipeline them to try to get to a best in class result. And here we are with like where the models are gigantic and like very capable of solving huge amounts of tasks when given like lots of great data. [00:23:28]Alessio: You also released a new benchmark called MJHQ30K for automatic evaluation of a model's aesthetic quality. I have one question. The data set that you use for the benchmark is from Midjourney. Yes. You have 10 categories. How do you think about the Playground model, Midjourney, like, are you competitors? [00:23:47]Suhail: There are a lot of people, a lot of people in research, they like to compare themselves to something they know they can beat, right? Maybe this is the best reason why it can be helpful to not be a researcher also sometimes like I'm not trained as a researcher, I don't have a PhD in anything AI related, for example. But I think if you care about products and you care about your users, then the most important thing that you want to figure out is like everyone has to acknowledge that Midjourney is very good. They are the best at this thing. I'm happy to admit that. I have no problem admitting that. Just easy. It's very visual to tell. So I think it's incumbent on us to try to compare ourselves to the thing that's best, even if we lose, even if we're not the best. At some point, if we are able to surpass Midjourney, then we only have ourselves to compare ourselves to. But on First Blush, I think it's worth comparing yourself to maybe the best thing and try to find like a really fair way of doing that. So I think more people should try to do that. I definitely don't think you should be kind of comparing yourself on like some Google model or some old SD, Stable Diffusion model and be like, look, we beat Stable Diffusion 1.5. I think users ultimately want care, how close are you getting to the thing that people mostly agree with? So we put out that benchmark for no other reason to say like, this seems like a worthy thing for us to at least try, for people to try to get to. And then if we surpass it, great, we'll come up with another one. [00:25:06]Alessio: Yeah, no, that's awesome. And you killed Stable Diffusion Excel and everything. In the benchmark chart, it says Playground V2 1024 pixel dash aesthetic. Do you have kind of like, yeah, style fine tunes or like what's the dash aesthetic for? [00:25:21]Suhail: We debated this, maybe we named it wrong or something, but we were like, how do we help people realize the model that's aligned versus the models that weren't? Because we gave out pre-trained models, we didn't want people to like use those. So that's why they're called base. And then the aesthetic model, yeah, we wanted people to pick up the thing that makes things pretty. Who wouldn't want the thing that's aesthetic? But if there's a better name, we're definitely open to feedback. No, no, that's cool. [00:25:46]Alessio: I was using the product. You also have the style filter and you have all these different styles. And it seems like the styles are tied to the model. So there's some like SDXL styles, there's some Playground V2 styles. Can you maybe give listeners an overview of how that works? Because in language, there's not this idea of like style, right? Versus like in vision model, there is, and you cannot get certain styles in different [00:26:11]Suhail: models. [00:26:12]Alessio: So how do styles emerge and how do you categorize them and find them? [00:26:15]Suhail: Yeah, I mean, it's so fun having a community where people are just trying a model. Like it's only been two days for Playground V2. And we actually don't know what the model's capable of and not capable of. You know, we certainly see problems with it. But we have yet to see what emergent behavior is. I mean, we've just sort of discovered that it takes about like a week before you start to see like new things. I think like a lot of that style kind of emerges after that week, where you start to see, you know, there's some styles that are very like well known to us, like maybe like pixel art is a well known style. Photorealism is like another one that's like well known to us. But there are some styles that cannot be easily named. You know, it's not as simple as like, okay, that's an anime style. It's very visual. And in the end, you end up making up the name for what that style represents. And so the community kind of shapes itself around these different things. And so if anyone that's into stable diffusion and into building anything with graphics and stuff with these models, you know, you might have heard of like Proto Vision or Dream Shaper, some of these weird names, but they're just invented by these authors. But they have a sort of je ne sais quoi that, you know, appeals to users. [00:27:26]Swyx: Because it like roughly embeds to what you what you want. [00:27:29]Suhail: I guess so. I mean, it's like, you know, there's one of my favorite ones that's fine tuned. It's not made by us. It's called like Starlight XL. It's just this beautiful model. It's got really great color contrast and visual elements. And the users love it. I love it. And it's so hard. I think that's like a very big open question with graphics that I'm not totally sure how we'll solve. I don't know. It's, it's like an evolving situation too, because styles get boring, right? They get fatigued. Like it's like listening to the same style of pop song. I try to relate to graphics a little bit like with music, because I think it gives you a little bit of a different shape to things. Like it's not as if we just have pop music, rap music and country music, like all of these, like the EDM genre alone has like sub genres. And I think that's very true in graphics and painting and art and anything that we're doing. There's just these sub genres, even if we can't quite always name them. But I think they are emergent from the community, which is why we're so always happy to work with the community. [00:28:26]Swyx: That is a struggle. You know, coming back to this, like B2B versus B2C thing, B2C, you're going to have a huge amount of diversity and then it's going to reduce as you get towards more sort of B2B type use cases. I'm making this up here. So like you might be optimizing for a thing that you may eventually not need. [00:28:42]Suhail: Yeah, possibly. Yeah, possibly. I think like a simple thing with startups is that I worry sometimes by trying to be overly ambitious and like really scrutinizing like what something is in its most nascent phase that you miss the most ambitious thing you could have done. Like just having like very basic curiosity with something very small can like kind of lead you to something amazing. Like Einstein definitely did that. And then he like, you know, he basically won all the prizes and got everything he wanted and then basically did like kind of didn't really. He can dismiss quantum and then just kind of was still searching, you know, for the unifying theory. And he like had this quest. I think that happens a lot with like Nobel Prize people. I think there's like a term for it that I forget. I actually wanted to go after a toy almost intentionally so long as that I could see, I could imagine that it would lead to something very, very large later. Like I said, it's very hobbyist, but you need to start somewhere. You need to start with something that has a big gravitational pull, even if these hobbyists aren't likely to be the people that, you know, have a way to monetize it or whatever, even if they're, but they're doing it for fun. So there's something, something there that I think is really important. But I agree with you that, you know, in time we will absolutely focus on more utilitarian things like things that are more related to editing feats that are much harder. And so I think like a very simple use case is just, you know, I'm not a graphics designer. It seems like very simple that like you, if we could give you the ability to do really complex graphics without skill, wouldn't you want that? You know, like my wife the other day was set, you know, said, I wish Playground was better. When are you guys going to have a feature where like we could make my son, his name's Devin, smile when he was not smiling in the picture for the holiday card. Right. You know, just being able to highlight his, his mouth and just say like, make him smile. Like why can't we do that with like high fidelity and coherence, little things like that, all the way to putting you in completely different scenarios. [00:30:35]Swyx: Is that true? Can we not do that in painting? [00:30:37]Suhail: You can do in painting, but the quality is just so bad. Yeah. It's just really terrible quality. You know, it's like you'll do it five times and it'll still like kind of look like crooked or just artifact. Part of it's like, you know, the lips on the face, there's such little information there. So small that the models really struggle with it. Yeah. [00:30:55]Swyx: Make the picture smaller and you don't see it. That's my trick. I don't know. [00:30:59]Suhail: Yeah. Yeah. That's true. Or, you know, you could take that region and make it really big and then like say it's a mouth and then like shrink it. It feels like you're wrestling with it more than it's doing something that kind of surprises you. [00:31:12]Swyx: Yeah. It feels like you are very much the internal tastemaker, like you carry in your head this vision for what a good art model should look like. Do you find it hard to like communicate it to like your team and other people? Just because it's obviously it's hard to put into words like we just said. [00:31:26]Suhail: Yeah. It's very hard to explain. Images have such high bitrate compared to just words and we don't have enough words to describe these things. It's not terribly difficult. I think everyone on the team, if they don't have good kind of like judgment taste or like an eye for some of these things, they're like steadily building it because they have no choice. Right. So in that realm, I don't worry too much, actually. Like everyone is kind of like learning to get the eye is what I would call it. But I also have, you know, my own narrow taste. Like I don't represent the whole population either. [00:31:59]Swyx: When you benchmark models, you know, like this benchmark we're talking about, we use FID. Yeah. Input distance. OK. That's one measure. But like it doesn't capture anything you just said about smiles. [00:32:08]Suhail: Yeah. FID is generally a bad metric. It's good up to a point and then it kind of like is irrelevant. Yeah. [00:32:14]Swyx: And then so are there any other metrics that you like apart from vibes? I'm always looking for alternatives to vibes because vibes don't scale, you know. [00:32:22]Suhail: You know, it might be fun to kind of talk about this because it's actually kind of fresh. So up till now, we haven't needed to do a ton of like benchmarking because we hadn't trained our own model and now we have. So now what? What does that mean? How do we evaluate it? And, you know, we're kind of like living with the last 48, 72 hours of going, did the way that we benchmark actually succeed? [00:32:43]Swyx: Did it deliver? [00:32:44]Suhail: Right. You know, like I think Gemini just came out. They just put out a bunch of benchmarks. But all these benchmarks are just an approximation of how you think it's going to end up with real world performance. And I think that's like very fascinating to me. So if you fake that benchmark, you'll still end up in a really bad scenario at the end of the day. And so, you know, one of the benchmarks we did was we kind of curated like a thousand prompts. And I think that's kind of what we published in our blog post, you know, of all these tasks that we a lot of some of them are curated by our team where we know the models all suck at it. Like my favorite prompt that no model is really capable of is a horse riding an astronaut, the inverse one. And it's really, really hard to do. [00:33:22]Swyx: Not in data. [00:33:23]Suhail: You know, another one is like a giraffe underneath a microwave. How does that work? Right. There's so many of these little funny ones. We do. We have prompts that are just like misspellings of things. Yeah. We'll figure out if the models will figure it out. [00:33:36]Swyx: They should embed to the same space. [00:33:39]Suhail: Yeah. And just like all these very interesting weirdo things. And so we have so many of these and then we kind of like evaluate whether the models are any good at it. And the reality is that they're all bad at it. And so then you're just picking the most aesthetic image. We're still at the beginning of building like the best benchmark we can that aligns most with just user happiness, I think, because we're not we're not like putting these in papers and trying to like win, you know, I don't know, awards at ICCV or something if they have awards. You could. [00:34:05]Swyx: That's absolutely a valid strategy. [00:34:06]Suhail: Yeah, you could. But I don't think it could correlate necessarily with the impact we want to have on humanity. I think we're still evolving whatever our benchmarks are. So the first benchmark was just like very difficult tasks that we know the models are bad at. Can we come up with a thousand of these, whether they're hand rated and some of them are generated? And then can we ask the users, like, how do we do? And then we wanted to use a benchmark like party prompts. We mostly did that so people in academia could measure their models against ours versus others. But yeah, I mean, fit is pretty bad. And I think in terms of vibes, it's like you put out the model and then you try to see like what users make. And I think my sense is that we're going to take all the things that we notice that the users kind of were failing at and try to find like new ways to measure that, whether that's like a smile or, you know, color contrast or lighting. One benefit of Playground is that we have users making millions of images every single day. And so we can just ask them for like a post generation feedback. Yeah, we can just ask them. We can just say, like, how good was the lighting here? How was the subject? How was the background? [00:35:06]Swyx: Like a proper form of like, it's just like you make it, you come to our site, you make [00:35:10]Suhail: an image and then we say, and then maybe randomly you just say, hey, you know, like, how was the color and contrast of this image? And you say it was not very good, just tell us. So I think I think we can get like tens of thousands of these evaluations every single day to truly measure real world performance as opposed to just like benchmark performance. I would like to publish hopefully next year. I think we will try to publish a benchmark that anyone could use, that we evaluate ourselves on and that other people can, that we think does a good job of approximating real world performance because we've tried it and done it and noticed that it did. Yeah. I think we will do that. [00:35:45]Swyx: I personally have a few like categories that I consider special. You know, you know, you have like animals, art, fashion, food. There are some categories which I consider like a different tier of image. Top among them is text in images. How do you think about that? So one of the big wow moments for me, something I've been looking out for the entire year is just the progress of text and images. Like, can you write in an image? Yeah. And Ideogram came out recently, which had decent but not perfect text and images. Dolly3 had improved some and all they said in their paper was that they just included more text in the data set and it just worked. I was like, that's just lazy. But anyway, do you care about that? Because I don't see any of that in like your sample. Yeah, yeah. [00:36:27]Suhail: The V2 model was mostly focused on image quality versus like the feature of text synthesis. [00:36:33]Swyx: Well, as a business user, I care a lot about that. [00:36:35]Suhail: Yeah. Yeah. I'm very excited about text synthesis. And yeah, I think Ideogram has done a good job of maybe the best job. Dolly has like a hit rate. Yes. You know, like sometimes it's Egyptian letters. Yeah. I'm very excited about text synthesis. You know, I don't have much to say on it just yet. You know, you don't want just text effects. I think where this has to go is it has to be like you could like write little tiny pieces of text like on like a milk carton. That's maybe not even the focal point of a scene. I think that's like a very hard task that, you know, if you could do something like that, then there's a lot of other possibilities. Well, you don't have to zero shot it. [00:37:09]Swyx: You can just be like here and focus on this. [00:37:12]Suhail: Sure. Yeah, yeah. Definitely. Yeah. [00:37:16]Swyx: Yeah. So I think text synthesis would be very exciting. I'll also flag that Max Wolf, MiniMaxxier, which you must have come across his work. He's done a lot of stuff about using like logo masks that then map onto food and vegetables. And it looks like text, which can be pretty fun. [00:37:29]Suhail: That's the wonderful thing about like the open source community is that you get things like control net and then you see all these people do these just amazing things with control net. And then you wonder, I think from our point of view, we sort of go that that's really wonderful. But how do we end up with like a unified model that can do that? What are the bottlenecks? What are the issues? The community ultimately has very limited resources. And so they need these kinds of like workaround research ideas to get there. But yeah. [00:37:55]Swyx: Are techniques like control net portable to your architecture? [00:37:58]Suhail: Definitely. Yeah. We kept the Playground V2 exactly the same as SDXL. Not because not out of laziness, but just because we knew that the community already had tools. You know, all you have to do is maybe change a string in your code and then, you know, retrain a control net for it. So it was very intentional to do that. We didn't want to fragment the community with different architectures. Yeah. [00:38:16]Swyx: So basically, I'm going to go over three more categories. One is UIs, like app UIs, like mock UIs. Third is not safe for work, and then copyrighted stuff. I don't know if you care to comment on any of those. [00:38:28]Suhail: I think the NSFW kind of like safety stuff is really important. I kind of think that one of the biggest risks kind of going into maybe the U.S. election year will probably be very interrelated with like graphics, audio, video. I think it's going to be very hard to explain, you know, to a family relative who's not kind of in our world. And our world is like sometimes very, you know, we think it's very big, but it's very tiny compared to the rest of the world. Some people like there's still lots of humanity who have no idea what chat GPT is. And I think it's going to be very hard to explain, you know, to your uncle, aunt, whoever, you know, hey, I saw President Biden say this thing on a video, you know, I can't believe, you know, he said that. I think that's going to be a very troubling thing going into the world next year, the year after. [00:39:12]Swyx: That's more like a risk thing, like deepfakes, faking, political faking. But there's a lot of studies on how for most businesses, you don't want to train on not safe for work images, except that it makes you really good at bodies. [00:39:24]Suhail: Personally, we filter out NSFW type of images in our data set so that it's, you know, so our safety filter stuff doesn't have to work as hard. [00:39:32]Swyx: But you've heard this argument that not safe for work images are very good at human anatomy, which you do want to be good at. [00:39:38]Suhail: It's not like necessarily a bad thing to train on that data. It's more about like how you go and use it. That's why I was kind of talking about safety, you know, in part, because there are very terrible things that can happen in the world. If you have an extremely powerful graphics model, you know, suddenly like you can kind of imagine, you know, now if you can like generate nudes and then there's like you could do very character consistent things with faces, like what does that lead to? Yeah. And so I tend to think more what occurs after that, right? Even if you train on, let's say, you know, new data, if it does something to kind of help, there's nothing wrong with the human anatomy, it's very valid for a model to learn that. But then it's kind of like, how does that get used? And, you know, I won't bring up all of the very, very unsavory, terrible things that we see on a daily basis on the site, but I think it's more about what occurs. And so we, you know, we just recently did like a big sprint on safety. It's very difficult with graphics and art, right? Because there is tasteful art that has nudity, right? They're all over in museums, like, you know, there's very valid situations for that. And then there's the things that are the gray line of that, you know, what I might not find tasteful, someone might be like, that is completely tasteful, right? And then there are things that are way over the line. And then there are things that maybe you or, you know, maybe I would be okay with, but society isn't, you know? So where does that kind of end up on the spectrum of things? I think it's really hard with art. Sometimes even if you have like things that are not nude, if a child goes to your site, scrolls down some images, you know, classrooms of kids, you know, using our product, it's a really difficult problem. And it stretches mostly culture, society, politics, everything. [00:41:14]Alessio: Another favorite topic of our listeners is UX and AI. And I think you're probably one of the best all-inclusive editors for these things. So you don't just have the prompt, images come out, you pray, and now you do it again. First, you let people pick a seed so they can kind of have semi-repeatable generation. You also have, yeah, you can pick how many images and then you leave all of them in the canvas. And then you have kind of like this box, the generation box, and you can even cross between them and outpaint. There's all these things. How did you get here? You know, most people are kind of like, give me text, I give you image. You know, you're like, these are all the tools for you. [00:41:54]Suhail: Even though we were trying to make a graphics foundation model, I think we think that we're also trying to like re-imagine like what a graphics editor might look like given the change in technology. So, you know, I don't think we're trying to build Photoshop, but it's the only thing that we could say that people are largely familiar with. Oh, okay, there's Photoshop. What would Photoshop compare itself to pre-computer? I don't know, right? It's like, or kind of like a canvas, but you know, there's these menu options and you can use your mouse. What's a mouse? So I think that we're trying to re-imagine what a graphics editor might look like, not just for the fun of it, but because we kind of have no choice. Like there's this idea in image generation where you can generate images. That's like a super weird thing. What is that in Photoshop, right? You have to wait right now for the time being, but the wait is worth it often for a lot of people because they can't make that with their own skills. So I think it goes back to, you know, how we started the company, which was kind of looking at GPT-3's Playground, that the reason why we're named Playground is a homage to that actually. And, you know, it's like, shouldn't these products be more visual? These prompt boxes are like a terminal window, right? We're kind of at this weird point where it's just like MS-DOS. I remember my mom using MS-DOS and I memorized the keywords, like DIR, LS, all those things, right? It feels a little like we're there, right? Prompt engineering, parentheses to say beautiful or whatever, waits the word token more in the model or whatever. That's like super strange. I think a large portion of humanity would agree that that's not user-friendly, right? So how do we think about the products to be more user-friendly? Well, sure, you know, sure, it would be nice if I wanted to get rid of, like, the headphones on my head, you know, it'd be nice to mask it and then say, you know, can you remove the headphones? You know, if I want to grow, expand the image, you know, how can we make that feel easier without typing lots of words and being really confused? I don't even think we've nailed the UI UX yet. Part of that is because we're still experimenting. And part of that is because the model and the technology is going to get better. And whatever felt like the right UX six months ago is going to feel very broken now. So that's a little bit of how we got there is kind of saying, does everything have to be like a prompt in a box? Or can we do things that make it very intuitive for users? [00:44:03]Alessio: How do you decide what to give access to? So you have things like an expand prompt, which Dally 3 just does. It doesn't let you decide whether you should or not. [00:44:13]Swyx: As in, like, rewrites your prompts for you. [00:44:15]Suhail: Yeah, for that feature, I think once we get it to be cheaper, we'll probably just give it up. We'll probably just give it away. But we also decided something that might be a little bit different. We noticed that most of image generation is just, like, kind of casual. You know, it's in WhatsApp. It's, you know, it's in a Discord bot somewhere with Majorny. It's in ChatGPT. One of the differentiators I think we provide is at the expense of just lots of users necessarily. Mainstream consumers is that we provide as much, like, power and tweakability and configurability as possible. So the only reason why it's a toggle, because we know that users might want to use it and might not want to use it. There's some really powerful power user hobbyists that know what they're doing. And then there's a lot of people that just want something that looks cool, but they don't know how to prompt. And so I think a lot of Playground is more about going after that core user base that, like, knows, has a little bit more savviness and how to use these tools. You know, the average Dell user is probably not going to use ControlNet. They probably don't even know what that is. And so I think that, like, as the models get more powerful, as there's more tooling, hopefully you'll imagine a new sort of AI-first graphics editor that's just as, like, powerful and configurable as Photoshop. And you might have to master a new kind of tool. [00:45:28]Swyx: There's so many things I could go bounce off of. One, you mentioned about waiting. We have to kind of somewhat address the elephant in the room. Consistency models have been blowing up the past month. How do you think about integrating that? Obviously, there's a lot of other companies also trying to beat you to that space as well. [00:45:44]Suhail: I think we were the first company to integrate it. Ah, OK. [00:45:47]Swyx: Yeah. I didn't see your demo. [00:45:49]Suhail: Oops. Yeah, yeah. Well, we integrated it in a different way. OK. There are, like, 10 companies right now that have kind of tried to do, like, interactive editing, where you can, like, draw on the left side and then you get an image on the right side. We decided to kind of, like, wait and see whether there's, like, true utility on that. We have a different feature that's, like, unique in our product that is called preview rendering. And so you go to the product and you say, you know, we're like, what is the most common use case? The most common use case is you write a prompt and then you get an image. But what's the most annoying thing about that? The most annoying thing is, like, it feels like a slot machine, right? You're like, OK, I'm going to put it in and maybe I'll get something cool. So we did something that seemed a lot simpler, but a lot more relevant to how users already use these products, which is preview rendering. You toggle it on and it will show you a render of the image. And then graphics tools already have this. Like, if you use Cinema 4D or After Effects or something, it's called viewport rendering. And so we try to take something that exists in the real world that has familiarity and say, OK, you're going to get a rough sense of an early preview of this thing. And then when you're ready to generate, we're going to try to be as coherent about that image that you saw. That way, you're not spending so much time just like pulling down the slot machine lever. I think we were the first company to actually ship a quick LCM thing. Yeah, we were very excited about it. So we shipped it very quick. Yeah. [00:47:03]Swyx: Well, the demos I've been seeing, it's not like a preview necessarily. They're almost using it to animate their generations. Like, because you can kind of move shapes. [00:47:11]Suhail: Yeah, yeah, they're like doing it. They're animating it. But they're sort of showing, like, if I move a moon, you know, can I? [00:47:17]Swyx: I don't know. To me, it unlocks video in a way. [00:47:20]Suhail: Yeah. But the video models are already so much better than that. Yeah. [00:47:23]Swyx: There's another one, which I think is general ecosystem of Loras, right? Civit is obviously the most popular repository of Loras. How do you think about interacting with that ecosystem? [00:47:34]Suhail: The guy that did Lora, not the guy that invented Loras, but the person that brought Loras to Stable Diffusion actually works with us on some projects. His name is Simu. Shout out to Simu. And I think Loras are wonderful. Obviously, fine tuning all these Dreambooth models and such, it's just so heavy. And it's obvious in our conversation around styles and vibes, it's very hard to evaluate the artistry of these things. Loras give people this wonderful opportunity to create sub-genres of art. And I think they're amazing. Any graphics tool, any kind of thing that's expressing art has to provide some level of customization to its user base that goes beyond just typing Greg Rakowski in a prompt. We have to give more than that. It's not like users want to type these real artist names. It's that they don't know how else to get an image that looks interesting. They truly want originality and uniqueness. And I think Loras provide that. And they provide it in a very nice, scalable way. I hope that we find something even better than Loras in the long term, because there are still weaknesses to Loras, but I think they do a good job for now. Yeah. [00:48:39]Swyx: And so you would never compete with Civit? You would just kind of let people import? [00:48:43]Suhail: Civit's a site where all these things get kind of hosted by the community, right? And so, yeah, we'll often pull down some of the best things there. I think when we have a significantly better model, we will certainly build something that gets closer to that. Again, I go back to saying just I still think this is very nascent. Things are very underpowered, right? Loras are not easy to train. They're easy for an engineer. It sure would be nicer if I could just pick five or six reference images, right? And they might even be five or six different reference images that are not... They're just very different. They communicate a style, but they're actually like... It's like a mood board, right? And you have to be kind of an engineer almost to train these Loras or go to some site and be technically savvy, at least. It seems like it'd be much better if I could say, I love this style. Here are five images and you tell the model, like, this is what I want. And the model gives you something that's very aligned with what your style is, what you're talking about. And it's a style you couldn't even communicate, right? There's n

god ceo ai google apple real running training phd joe biden hero style startups chatgpt code whatsapp consistency reddit discord concerns b2b rethinking egyptian scaling bill gates releasing mighty survey intel ram cto expand images gemini openai mainstream integrating residence pivoting edm reimagining ux nobel prize api nsfw b2c sd chrome gpt input ui tron playground mosaic ml photoshop clip amd composition devops javascript sam altman prompt fps graphics versus cpu agi m1 smb ls dali midjourney interacting ars kubernetes gpus v2 alessio sharif yc ui ux xai 3k gans fid replicate chromium stable diffusion latency after effects uis pytorch dally vae andrej karpathy suhail mixpanel snr mission bay a100 simu lcm loras cinema 4d huggingface ideogram slurm photorealism rombach latent space max wolf stable diffusion xl like einstein suhail doshi

Short Time Shots: December 8, 2023

Short Time Wrestling Podcast

Play Episode Listen Later Dec 8, 2023 5:44

Twas a backyard brawl on Thursday night. I took in some high school wrestling as I watched Mounds View High School dispatch with Irondale … which are the two high schools in my school district. One is known as the sports school, the other is the band and robotics school. The sports school won 60-6. By the way, former Gopher Marty Morgan's youngest son is a 215-pounder for Mounds View and he's freaking HUGE. He's headed to South Dakota State. I was spotted in the crowd by one Mounds View parent. Was great to just watch some pretty good high school wrestling without actually working. There's some great HS wrestling kicking off today at the Walsh Ironman. Check my socials @jasonmbryant for some knowledge bombs. It's Short Time Shots, a look at the day's college wrestling scores and more, I'm your host, Hall of Fame wrestling writer and broadcaster Jason Bryant. On to the scores on the college scene from Thursday. Speaking of scoreboards, did you know you can find all the D1 dual meet scores and links to their results at collegewrestlingscoreboard.com AND I've added another redirect to help you follow all the tournaments from every division at collegewrestlingtournaments.com. Only D1 dual of the night, Michigan State improved to 4-0 after a 25-9 win over Buffalo ON THE STAGE up in New York. Buffalo did one of those theatre-style duals for the third year in a row. If you look at the dual, this one was probably closer than the score indicates as Michigan State picked up three victories that were two points or less. Still been a rough start for Buffalo this year. In Division II, Augustana (S.D.) knocked off 13th-ranked Northern State 21-13 out in the middle of South Dakota. The Vikings were ranked 18th in the most recent NWCA rankings, which is a tournament points ranking rather than a dual ranking. Augie's top-ranked Jaxson Rohman earned a major at 125 pounds, while top-ranked Wyatt Turnquist won 7-2 at 149 pounds for Northern State. Out west, San Francisco State shutout Simpson (Calif.) 41-0. In D3, second-ranked Wartburg rolled past the only non-Iowa team in the American Rivers Conference, which used to be the Iowa Conference. The Knights had five falls, the fastest coming at 197 pounds by top-ranked Mo Endene, the returning D3 champion at the weight in the 54-0 win. Third-ranked UW-Eau Claire rallied from down 14-3 after five weights and reeled off five straight W's to oust rival UW-Whitewater 25-14. Whitewater did pick up the dual's most notable win as ninth-ranked Dominick Mallinder bounced top-ranked Tyler Fleetwood 7-1 at 133 pounds. 20th-ranked Millikin topped Aurora 23-17. Bonus points were the difference for the Big Blue as three of the wins were bonus victories including falls by Coby Haney at 133 and Nathan Callaway at 157. Also in the CCIW, Elmhurst edged Concordia Wisconsin 24-23. Roanoke beat Shenandoah 43-4 out in my home state of Virginia. Three bonus victories were the difference for Loras as they topped Cornell College 21-19 in a matchup of Iowa schools. In the NAIA, Montana State-Northern rallied to beat Dickinson State 32-16, while No. 11 Providence (Mont.) beat Eastern Oregon 28-15. Top-ranked Hunter Sparks of Eastern Oregon got a fall in that one at 125 pounds. The Evergreen State College earned its first win, topping Grays Harbor College 50-3. Evergreen's mascot? The GEODUCKS. What is a Geoduck? Well, it's the largest burrowing clam in the world, native to the coastal waters of the Pacific. Makes sense, Evergreen is in Washington. Grays Harbor is too, just a mile or two from the Muddy Banks of the Wishkah. Some of you will get the reference. On the women's side, fourth ranked Colorado Mesa torched Texas Woman's University 44-5, and Adams State 45-4. UW-Stevens Point topped Wartburg 30-13 in the first home dual in school history for the new women's program at Wartburg. Minot State was a winner over Dakota Wesleyan 27-16. Evergreen's women also picked up a W, beating Grays Harbor 34-8. Shout outs to our latest Mat Talk patrons, Warrior Coach and KBnoswag. Subscribe, for free, to my Daily Wrestling News. Get it for free at mattalkonline.com/news, sign up free today. That's as always, delivered by our longtime friends at Resilite.

new york university washington speaking hall of fame iowa buffalo pacific vikings shots knights michigan state university south dakota makes d1 hs evergreen d3 roanoke big blue naia whitewater shenandoah augie south dakota state texas woman elmhurst evergreen state college short time wartburg eastern oregon san francisco state uw whitewater northern states cornell college loras jason bryant uw stevens point uw eau claire grays harbor adams state millikin nwca irondale colorado mesa geoduck geoducks concordia wisconsin

The Busy Person's Intro to Finetuning & Open Source AI - Wing Lian, Axolotl

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Dec 8, 2023 64:18

The Latent Space crew will be at NeurIPS on Tuesday! Reach out with any parties and papers of interest. We have also been incubating a smol daily AI Newsletter and Latent Space University is making progress.Good open models like Llama 2 and Mistral 7B (which has just released an 8x7B MoE model) have enabled their own sub-industry of finetuned variants for a myriad of reasons:* Ownership & Control - you take responsibility for serving the models* Privacy - not having to send data to a third party vendor* Customization - Improving some attribute (censorship, multiturn chat and chain of thought, roleplaying) or benchmark performance (without cheating)Related to improving benchmark performance is the ability to use smaller (7B, 13B) models, by matching the performance of larger models, which have both cost and inference latency benefits.Core to all this work is finetuning, and the emergent finetuning library of choice has been Wing Lian's Axolotl.AxolotlAxolotl is an LLM fine-tuner supporting SotA techniques and optimizations for a variety of common model architectures:It is used by many of the leading open source models:* Teknium: OpenHermes, Trismigestus, CollectiveCognition* OpenOrca: Mistral-OpenOrca, Mistral-SlimOrca* Nous Research: Puffin, Capybara, NousHermes* Pygmalion: Mythalion, Pygmalion* Eric Hartford: Dolphin, Samantha* DiscoResearch: DiscoLM 120B & 70B* OpenAccess AI Collective: Manticore, Minotaur, Jackalope, HippogriffAs finetuning is very formatting dependent, it also provides prompt interfaces and formatters between a range of popular model formats from Stanford's Alpaca and Steven Tey's ShareGPT (which led to Vicuna) to the more NSFW Pygmalion community.Nous Research MeetupWe last talked about Nous at the DevDay Recap at the e/acc “banger rave”. We met Wing at the Nous Research meetup at the a16z offices in San Francisco, where they officially announced their company and future plans:Including Nous Forge:Show NotesWe've already covered the nuances of Dataset Contamination and the problems with “Open Source” in AI, so we won't rehash those topics here but do read/listen to those if you missed it.* Axolotl GitHub and Discord* The Flan paper and dataset* StackLlama model and blogpost* Multipack paper* Our episode with Tri Dao* Mamba state space models - Tri Dao and Albert GuTimestamps* [00:00:00] Introducing Wing* [00:02:34] SF Open Source AI Meetup* [00:04:09] What is Axolotl?* [00:08:01] What is finetuning?* [00:08:52] Open Source Model Zoo* [00:10:53] Benchmarks and Contamination* [00:14:29] The Case for Open Source AI* [00:17:34] Orca and OpenOrca* [00:23:36] DiscoLM and Model Stacking* [00:25:07] Datasets and Evals over Models* [00:29:15] Distilling from GPT4* [00:33:31] Finetuning - LoRA, QLoRA, ReLoRA, GPTQ* [00:41:55] Axolotl vs HF Transformers* [00:48:00] 20x efficiency with StackLlama and Multipack* [00:54:47] Tri Dao and Mamba* [00:59:08] Roadmap for Axolotl* [01:01:20] The Open Source AI CommunityTranscript[00:00:00] Introducing Wing Lian[00:00:00] [00:00:00] swyx: Welcome to Latent Space, a special edition with Wing Lien, but also with our new guest host, Alex. Hello, hello. Welcome, welcome. Again, needs no introduction. I think it's like your sixth time on Latent Space already. I think so, yeah. And welcome, Wing. We just met, but you've been very prolific online. Thanks for having me.[00:00:30] Yeah. So you are in town. You're not local. You're in town. You're from Minneapolis?[00:00:35] Wing Lian: Annapolis. Annapolis. It's funny because a lot of people think it's Indianapolis. It's I've got Minneapolis, but I used to live out at least in the San Francisco Bay Area years ago from like 2008 to 2014. So it's fairly familiar here.[00:00:50] swyx: Yep. You're the maintainer of Axolotl now, which we'll get into. You're very, very prolific in the open source AI community, and you're also the founder of the Open Access AI Collective. Yeah. Cool. Awesome. Maybe we can go over a little bit of your backgrounds into tech and then coming into AI, and then we'll cover what[00:01:06] Wing Lian: happens and why you're here.[00:01:08] Yeah. So. Back on tech, so I started years ago, I started way back when I was scraping, Apartment websites for listings and then, and then building like SEO optimized pages and then just throwing Google AdSense on it.[00:01:24] And that got me through like college basically. Is[00:01:27] swyx: that decent money? And what year[00:01:28] Wing Lian: was this? Like 2004, 2005. Yeah, that's decent money. It's like thousand bucks a month. But as a college student, that's like. Gravy. Really good money, right? So, and then there's just too much competition It's just sort of like died off. I was writing stuff in like Perl back then using like like who nobody hosted anything on Perl anymore, right? Still did a little bit more like computer tech support and then software, and web more professionally.[00:01:54] So I spent some time working on applications in the blood industry. I came out to San Francisco for, I was at SGN, so Social Gaming Network, as a startup. They started doing, with Facebook apps, and then they pivoted into doing mobile apps. And then, from there, I spent time.[00:02:14] I've quite a few more startups since then and in the last few years I've been in the music space So like I was at United Masters for a while and then past year I've been at SoundCloud, but not doing that anymore and now that I have a lot more time It's just like all right.[00:02:30] We're going full bore on axolotl and we're gonna we're gonna crush AI So yeah,[00:02:34] SF Open Source AI Meetup[00:02:34] swyx: totally you so you're here in town for the open source. Yeah, I meet up that we had yesterday Yep, yeah, that was amazing. Yeah, it was a big collection. Olama, Noose Research, Alignment Lab, Anyone else that I missed? I mean, Jeremy Howard is his own thing.[00:02:47] Yeah.[00:02:49] And Alex, you're also there. You love to bring SF to the world. Your takes?[00:02:55] Alex Volkov: It's incredible that we recorded a Thursday Eye episode after that one. And LDJ, who's usually co hosts Thursday Eye, just like briefly mentioned, Oh yeah, I talked about it.[00:03:04] Like, I saw Karpathy, and then I talked to Jeremy Howard, and the guy from Mistral came in, and it's like, He's talking about all these, titans of industry, basically, that outside of SF, You just don't meet casually hanging out in the same space. You can't, pull somebody. He ran into the Laylow from Mistral, he ran into him while, drinking water.[00:03:20] He didn't even know he was there. It's just, that type of stuff is really hard to find outside of SF. So, absolutely, absolutely great. And also, presentations from Alignment Labs, presentations from News Research, news issues, talked about. Forge, and some of[00:03:33] swyx: the other stuff they announced. We can say now they're officially a company.[00:03:36] I met Technium.[00:03:37] He[00:03:37] Alex Volkov: came over here. He didn't want to get recorded. But maybe.[00:03:41] Wing Lian: We'll wear him down at some point. Yeah, I'm excited for Forge. They've positioned it as this agentic sort of framework where it's just Drag and drop things and, fill in text with where you want to inject different variables and it opens up all of these potentials for data pipelines now, right?[00:03:56] And using your own local LLMs and not relying on GPT 4 or anything like that. Yeah, yeah,[00:04:02] swyx: good stuff. Okay, so let's maybe go into the Axolotl origin story and then we have, we have some intro or background.[00:04:09] What is Axolotl?[00:04:09] swyx: To do on like the open source model universe and also on fine tuning, but maybe just, since you're talking about your personal journey, what was your personal journey into[00:04:18] Wing Lian: axolotl?[00:04:19] Yeah, so my personal journey started like back in mid March, completely unrelated to AI and axolotl. And it really started, I fell while skiing, I torqued. Great 3 MCL sprain and being sort of like an active person that can no longer be active because the two, couldn't play soccer, because that is requires to have having knees until I, it's healed.[00:04:42] So I. I decided I needed to find something to do to take up my free time. And that became, well, let's learn how to train in, these language models. It was everywhere. So I was like, all right, I'm just going to sit down, learn. I think I used like other, I think I was using like Alpacalora.[00:05:00] Cause I think the Alpaca paper had just came out, come out then. So I was like using Alpacalora repo and sort of like learning how to use like. None of us were like GPU rich back then, and none of us, most of us still we're still all GPU poor, but I was doing what was it, like 4 bit, Alpaca Lord, there was like a 4 bit version where we were doing quant, or 8, no, 8 bit quantizations, and then I think they had released QLOR a little bit later, and I think right when, before QLOR came out, I was already starting to do fine tunes, but having this need to sort of like mix data sets together, and If you've ever looked at all the various different datasets available on HuggingFace, they all have various different prompt formats, and, it's sort of a nightmare, and then I think the other piece is if you've ever tried to fine tune, at least Back then probably the ecosystem's a little better now.[00:05:54] Everybody required that you say, alright, you put your hyperparameters as command line arguments. And so it's always like, well, I now have to go copy and paste my previous thing and to change things out. And I really wanted it. to be in a YAML file because it was more portable and reproducible.[00:06:09] So I was doing that and then the QLOR paper came out. Tim Dettmer announced that and then somebody looked it up for me yesterday and it's like between that announcement it took us seven days to get that integrated into Axolotl, right? Which is like, it's not. I wouldn't say it's really fast, but in a manner that, is in a, a reusable framework, I think it was quite the accomplishment then.[00:06:33] And so we started, picking up traction with people there. And then it's just been building models, and then just iterating what my needs are. So, yeah. Excellent. Yeah. I[00:06:44] Alex Volkov: want to ask, for folks who are listening who never heard of Axolotl, now do you describe how you got there?[00:06:49] Can you, how do you summarize this for folks who maybe haven't fine tuned anything. They know about open source LLM exists, they maybe know like LLAML, what's XLR for somebody who doesn't know. I've never heard of a data set curation[00:07:01] Wing Lian: creation before. We sort of have to take a step back and understand that, when you've got these language models, you have what I think most people refer to as like base models, also known as like foundational models, right?[00:07:15] Where some benefactor, whether it's Meta or Mistral or whoever, has gone and spent all this money. To train these models on huge corpuses of text, right? And these, these corpuses, they're generally good across lots of different things, but they're really good at just saying, talking on and on and on, but they're not good at, following instructions or having chats or anything like that.[00:07:40] So, when you think about fine tuning, it's like Saying, all right, we have this really sort of good generalized, text completion thing, and I want to turn it into something that I can talk to or have, follow instructions. So, I think fine tuning is probably best defined in like that.[00:07:58] swyx: Okay, got it.[00:07:59] And we actually[00:08:01] What is finetuning?[00:08:01] swyx: Do want to make sure that we have like an overall introduction to fine tuning for people because again like trying to make sure that we bring everyone along in this, in this journey. We already went into Loras and QLoras without explaining what[00:08:12] Wing Lian: they are. Oh yes, yes, sorry.[00:08:14] swyx: And so I will put things in my words and you can correct me as, as, as my I'll be the village idiot here.[00:08:21] So, so fine tuning is basically sort of grabbing an open source model off the shelf, and then basically doing further training on it with a custom dataset of your own. Primarily, people use it, think about it as fine tuning for JSON output, or fine tuning for a style of response. Let's say you wanted to tell jokes, or be funny, or be short, or whatever.[00:08:43] Just the open source AI community has really fine tuned in all sorts of different manner. I think we'll go over those those things now. Let's go over those things now, and then we'll talk about fine tuning methods.[00:08:52] Open Source Model Zoo[00:08:52] swyx: So there's a universe of people who fine tune stuff. Yesterday in your slides, you had, I'll just list some of these and then we'll maybe go through some of them, right?[00:08:59] So Technium is personally leading Open Hermes, which is I think the sort of premier model out of the news. news community. There's OpenOrca, which you had a hand in. News, the news research itself also has Capybara and Puffin and all the others. There's Pygmalion, which I've never messed with.[00:09:14] Eric Hartford, I am aware of his Uncensored Models and his Samantha Models. Disco Research with Disco LM. And then you personally have done Manticore, Minotaur, Jackalope, and Hippogriff. What should people know about all these names? Being part of AI Twitter is seeing all these things and going dude, I'm being DDoS'ed by all these things and I don't know how different they are.[00:09:32] What should people know? Yeah, so[00:09:34] Wing Lian: I think on a lot of these models, generally, we like to think of those as sort of general models, so If you think about it, what is GPT 4, what is Chad GPT? It's a good general model, and then, One of the services I think that OpenAI offers is like these fine tunings where you're a business and you have very specific business use cases and you might fine tune for that use case.[00:10:00] All of these models are really just general use case that you can then go and maybe Fine tune another lore over it for your use cases, but they tend to be good. With good being relative, it's open source. Open source AI is still sort of is infancy. So, good is, it's pretty reasonable.[00:10:18] It's probably still better than most, high schoolers at answering questions and being able to like figure things out and, and reasoning skills and math and those sorts of things, right?[00:10:27] swyx: And also as measured on the Hugging[00:10:29] Wing Lian: Face leaderboard. Yes, well, that's like a whole other discussion, right, there's a whole other, group of people who, and I, I mostly agree with them that, benchmarks can be, are pretty bogus these days, LM says, I think they published something recently where, even if you think the dataset's not contaminated, you can go and, find contamination And maybe we should step back and say what contamination is, right?[00:10:53] Benchmarks and Contamination[00:10:53] Wing Lian: So we have all of these data, when you go and do these benchmarks, there's a specific data set where there are these questions and usually it's multiple choice. And what can happen is, well, sometimes someone It puts the question, maybe maliciously, maybe accidentally, into the training dataset, and now the, the, your model knows how to answer the test questions really well, but it doesn't, it hasn't generalized the ability to actually do that[00:11:20] Alex Volkov: right.[00:11:21] We've seen some folks competitively announce models that are like the best at that leaderboard, but then it's, it's quite obvious that, In open source? Yeah, and in that leaderboard, for Hugging Face specific, I don't know if LMCs, if that had suffered, but we, there's been some models that seem to have been competitively trained and some leakage happened into their,[00:11:41] swyx: like, supposal.[00:11:43] I understand, once there's been a credible assertion, Hugging Face actually does take them down, right? Yeah, yeah,[00:11:48] Alex Volkov: which is really hard to know, right?[00:11:50] swyx: It's really hard to know, sometimes it's like a pure accident,[00:11:52] Alex Volkov: it's oh, oops. You're going through a mixer. I think, a responsible So acknowledgement, that this kind of happened to you is also important.[00:11:58] I saw LDJ from news research can acknowledge that. Because many of these datasets are collections of other datasets. There's a bunch of people are baking, basically. It's alchemy. Right. And so sometimes you don't know. Sometimes you pull an open source dataset and they announce, oh, you know what, actually, the MMLU benchmark which we used to Specifically identify models that did go into this data set, that then went into that data set.[00:12:22] So sometimes it's actually an accident and folks take it down. But I've seen some competitive folks who want to put their name out there because people are starting to notice which is the top[00:12:30] swyx: model. For those who want a fun take on this so the file one dataset. FindOne model from Microsoft was accused of being contaminated.[00:12:37] And I saw this joke paper that was fantastic. It was called, training on the test set is all you need. It's a super small model that just memorizes everything. It was fantastic. So yeah, contamination, I think we've actually covered it in a previous episode before. So we're good. But again, I want to give people a map into the open source AI model, the universe.[00:12:57] And Alex, you can also jump in here because you guys have spent a lot more time with them than I have. So, what should people know about Technium? What should people know about Noose? And then we can go down the list. Yeah,[00:13:05] Wing Lian: I think so. I think if we start with, Technium. When you talk to him, he's gonna say, I think, I think his response is that he wants to build GP4 on his laptop, right?[00:13:14] So, very, very good at building general models. I think with Noose, Noose Research, they're looking at more, sort of, More, more research focused things, like their Yarn models, I don't, I don't, they didn't actually train their, they have their own trainer for their Yarn models, but So they did not use Xlato for that one?[00:13:30] They didn't use that, but like Is that, you don't have support for it? I think we do support Yarn, I think, I'd have to double check that answer. Yeah, I'm just kind of curious what you can and cannot support, and Yeah, I mean, Yarn is supportable, it's basically, I think it's just replacing, I think, the rope part of that, so Yeah, not, not a big deal.[00:13:48] Yeah, it's not a big deal, it's just I haven't gotten to it, not enough people have asked, I think a lot of people have asked for other things, so it's just, squeaky wheel, right? I think at the end of the day, people are like building these data sets and I think if you sort of map things chronologically, these make more sense because it's like, how do we incrementally improve all of these models?[00:14:07] So a lot of these models are just incremental improvements over the last thing, right? Whether it is sort of through methods of how do we, how did we curate the data set? How did we improve the quality of the data set? So, you maybe LDJ talked about it right on I think for, for Capybara and Puffin, like how those, those were very specific dataset curation techniques that he works on.[00:14:29] The Case for Open Source AI[00:14:29] Alex Volkov: So there's, folks are doing this for dataset curation. Folks are doing this for skillset building as well. Definitely people understand that open source is like very important, especially after the, the, the, the, the march, the debacle, the OpenAI weekend that we all had. And people started noticing that even after developer day in OpenAI, the APIs went out.[00:14:48] And then after that, the whole leadership of the company is swiftly changed and people, there was worries about, you know. How can people continue building AI products based on these like shaky grounds that turned attention definitely to Technium at least in open RMS I started seeing this more and more on Twitter, but also other models and many companies They're gonna start with open AI just to get there quick, and then they they think about okay Maybe I don't want to share my knowledge.[00:15:13] Maybe I don't want to sign up for Microsoft. Maybe they will change their terms and conditions so What else is out there? They turned to other companies. Up until yesterday, Google was nowhere to be found. We've talked about Gemini a little bit before in a previous And you can tune in[00:15:26] swyx: to[00:15:26] Alex Volkov: Thursday Eye.[00:15:26] Yeah, you can tune in to Thursday Eye. We covered the Gemini release a little bit. And but many are turning into the open source community and seeing that Meta released and continues to release and commit to open source AI. Mistral came out and the model is way smaller than LLAMA and performs Significantly better.[00:15:43] People play with OpenRMS, which is currently techniums based, news researched, sourced, axolotl trained OpenRMS, I assume, right? And then they play with this and they see that, okay, this is like GPT 3. 5 quality. We had GPT 4. 5 birthday just a week ago. A week ago, a year ago, a week ago, we never, interacted with these models of this caliber.[00:16:04] And now there's one open source, one that's on my laptop, completely offline, that, I can continue improving for my use cases. So enterprises, companies are also noticing this. And the open source community folks are building the skill set, not only the data sets. They're building the actual kind of, here's how we're going to do this, with Axelotl, with these data sets.[00:16:21] The curation pieces. Now. Interesting. There's like recipes of curation. The actual model training is kind of a competitive thing where people go and compete on these leaderboards that we talked about, the LMC arena, and that recently added open air and recently added open chat and a bunch of other stuff that are super cool.[00:16:37] The hug and face open source leaderboard. And so there's a competitive aspect to this. There's the open source. Aspect to this, like Technium says, I want GPT 4 on my laptop. There's the, let me build a skill set that potentially turns into a company, like we saw with Noose. Noose just, started organizing, a bunch of people on Discord, and suddenly, they're announcing their company.[00:16:54] It's happening across all these modalities, and suddenly all these people who saw these green pastures and a fairly quick way to, hey, here's a cool online community I can, start doing cool stuff with. You mentioned the same in the beginning, right? Like, after your accident, what's cool, let me try this out.[00:17:08] Suddenly I start noticing that there's a significant movement of interest in enterprising companies into these areas. And, this skill set, these data sets, and this community is now very Very important, important enough to create an event which pulls in Andrei Karpathy from OpenAI to come and see what's new Jeremy Howard, like the event that we just talked about, people are flying over and this is just a meetup.[00:17:28] So, definitely, the community is buzzing right now and I think Axelot is a big piece as well.[00:17:34] Orca and OpenOrca[00:17:34] Wing Lian: Cool. Maybe we can talk about like Orca real quick, Orca, OpenOrca rather, I think there was a lot of buzz when, the first Orca paper came out. And just briefly, what is Orca? Yeah, Orca was basically having traces of like chain of thought reasoning, right?[00:17:48] So they go and they, they distill sort of GPT 4. They take, they take a sampling of data from the Flan dataset. Maybe we can like add some show notes in the Flan dataset. Yeah, but we've covered it. Okay, cool. Use GPT 4 to say, all right, explain this in a step by step reasoning, right?[00:18:06] And then you take that and you, they train the model and it showed, very good improvements across a lot of benchmarks. So OpenOrca was sort of the open reproduction of that since Microsoft Research never released that particular data set. And going back to sort of the Hugging Face leaderboard thing, those models did really well.[00:18:23] And then I think, so sort of the follow up to that was SlimOrca, right? I think Going into and building the OpenOrca dataset, we never really went in and, validated the actual answers that GPT 4 gave us, so what we did was one from OpenChat actually cross referenced the original Flan, the original Flan response, the human responses, the correct answers with the dataset, and then I went and took it and sent all of, both of them to GPT 4 and said, is this answer mostly correct, right?[00:18:54] Yeah. And then we were able to filter the dataset from, At least of the GPT 4 only answers from like 800, 000 to like 500, 000 answers or rows and then, and then retrain the model and it had the same performance as the original model to within I think, 0. 1 percent here about, and 30 percent less data.[00:19:13] So, yeah. Okay.[00:19:15] swyx: Interesting. So, I mean, there's, there's so much there that I want to highlight, but yeah. Orca is interesting. I do want people to know about it. Putting chain of thought into the data set like it's just makes a ton of sense one thing I think it would be helpful for people to scope thing these things out is how much data are we talking about when when you When people are fine tuning and then how much time or resources or money does it take to train to fine[00:19:36] Wing Lian: tune?[00:19:37] Yeah, so I think there's a little bit of overlap there with sort of like fine tuning techniques, but let's say Orca and I think even Hermes, they're both relatively large data sets like 10 billion tokens. Yeah. So large data sets being or the original Orca was, or the original open Orca was 800,000 rows.[00:19:55] I believe it was somewhere in the ballpark of like a gigabyte of data, of gigabyte, of text data. And I, I don't. I believe, Hermes was, is like a quarter million rows of data, I don't know the actual byte size on that particular one. So, going and training a, let's, let's say everybody's training 7 billion Mistral right now, right?[00:20:15] So, to tri I, I believe to fine tune 7 billion Mistral on, let's say, 8 A6000s, which have 48 gigabytes of VRAM, I believe, It takes about 40 hours, so 40, and then that's, depending on where you get your compute, 40 times 6, so it's like 500 to fine tune that model, so, and, and that's assuming you get it right the first time, right?[00:20:44] So, you know.[00:20:45] swyx: Is, is that something that X. Lotto handles, like, getting it right the first[00:20:48] Wing Lian: time? If you talk to anybody, it's like you've probably tried at least three or four runs or experiments to like find the right hyperparameters. And after a while you sort of have a feel for like which, where you need your hyperparameters to be.[00:21:04] Usually you might do like a partial training run, do some benchmark. So I guess for Al Farouk, whether you're going by his. This is Jeremy, he's, his actual name, or his twitter handle. He released the Dharma dataset, which is basically a subset of all the benchmarks. And Axolotl actually supports, you know taking that subset and then just running many benchmarks across your model every time you're doing an evaluation so you can sort of like see sort of relative it's not going to be the actual benchmark score, but you can get ideas alright, is this benchmark improving, is this benchmark decreasing, based on, you know Wait,[00:21:39] swyx: why don't you run the full benchmark?[00:21:41] What, what, what The[00:21:42] Wing Lian: full benchmarks take Take a long time. Significant, yeah, significant amount of time. Yeah. And Okay, so that's like[00:21:48] swyx: mini MMLU. Yeah. Like,[00:21:49] Wing Lian: mini BigBench or whatever. Yep, exactly.[00:21:51] Alex Volkov: It's really cool. We, when I joined Web2Masters just recently, and one of the things that I try to do is hey I'm not, I'm a software engineer by trade, I don't have an MLE background, But I joined a company that does primarily MLE, and I wanted to learn from the community, Because a lot of the open source community, they use weights and biases, And the benchmark that you said that Pharrell did, remind me of the name, sorry.[00:22:13] Dharma? Dharma, yeah, yeah. So Luigi showed me how Dharma shows inside the dashboard. In Wi and Biases dashboard and so you can actually kinda see the trending run and then you can see per each kind of iteration or, or epoch or you can see the model improving trending so you can on top of everything else.[00:22:29] The wi and biases gives like hyper parameter tracking, which like you, you started with common line and that's really hard to like remember. Also the Dharma data set, like the quick, the mini orca mini, you mini many different things. It's pretty cool to like visualize them as well. And I, I heard that he's working on a new version of, of Dharma, so Dharma 2, et cetera.[00:22:47] So hopefully, hopefully we'll see that soon, but definitely it's hard, right? You start this training around, it said like 40, 50 hours. Sometimes, sometimes it's like your SSHing into this machine. You, you start a process, you send it with God and you just go about your day, collecting data sets, and then you have to return.[00:23:04] And the whole process of instrumentation of this is still a little bit like squeaky but definitely. Tuning performance, or like grabbing performance in the middle of this, like with Dharma and some other tools, is very helpful to know that you're not wasting precious resources going somewhere you shouldn't go.[00:23:21] Yeah.[00:23:22] swyx: Yeah. Very cool. Maybe I'll, I'll, before we go into like sort of more details on fine tuning stuff, I just wanted to round out the rest of the Excel autoverse. There's, there's still Eric Hartford stuff. I don't know if you want to talk about Pygmalion, Disco, anything that you know about[00:23:35] Wing Lian: those, those things.[00:23:36] DiscoLM and Model Stacking[00:23:36] Wing Lian: Yeah, I think like one of the, definitely one of the more interesting ones was like the Disco 120b, right? Yeah, I know nothing about it. Yeah. So, so. Alpen from Pygmalion AI, right, so they, so Pygmalion is a sort of a, it's, it's, they have their own community, a lot of it is based around, roleplay models, those sorts of things, and Alpen, like, put together, merged together Llama270B, so, and Alpen, like, put together, merged together Llama270B, so, I don't remember how he stacked them together, whether he merged the layers in between. There's a whole, there's a whole toolkit for that by Charles Goddard, where you can like take a single model and like stack them together or multiple models merge.[00:24:18] That's like a whole other talk and a whole other tool set, but was able to create this 120. Billion parameter model out of a LAMA two 70 B. And then I believe the, yeah, disco is a fine tune of, of the, the, the sort of the base one 20 B is, I believe Goliath one 20 B. So, and, and what are the[00:24:37] swyx: headline results that people should know about[00:24:39] Wing Lian: disco?[00:24:39] I think for the headline results, I, I've, I haven't played with it personally because it's. It's a very large model and there's a lot of GPU, right? But, like, from what I've heard anecdotally, it performs really well. The responses are very good. Even with, like, just, even the base model is a lot better than, Llama70b.[00:24:57] So, and we, I think generally everybody's like, we would all love to fine tune Llama70b, but it's just, it's so much, it's so much memory, so much compute, right?[00:25:07] Datasets and Evals over Models[00:25:07] Wing Lian: I[00:25:07] Alex Volkov: want to touch on this point because the interesting thing That comes up out of being in this ecosphere and being friends with open source folks, tracking week to week state of the art performance on different models.[00:25:19] First of all, a lot of the stuff that the folks do a couple of weeks ago, and then something like Mistral comes out, and a lot of the stuff back then, Doesn't technically make sense anymore. Like the artifacts of that work, the actual artifacts, they don't no longer make sense. They're like lower on the on, on the hug and face leaderboard or lower on LM CS leaderboard.[00:25:36] But some of the techniques that people use, definitely the datasets. The datasets keep traveling, right? So open airmen, for example, is the dataset. The tum cleaned up for only. Open sourceable data that previously was just Hermes. And that, it was previously used to train Lama. And then once Mistral came out, it was used to train Mistral.[00:25:54] And then it became significantly better on the 7b base Mistral. So the data sets keep traveling, keep getting better a little bit here and there. And so the techniques improve as well. It looks like both things are simultaneously true. The artifacts of a month and a half ago. The, the actual models themselves, it's great the hug and face has them, because not every company can keep up with the next weeks', oh, I, I'll install this model instead, sell this model instead.[00:26:19] But the, the techniques and the, the dataset keep improving as we go further, and I think that's really cool. However, the outcome of this is that for a long time. For many, many people, including us, that we do this every week. We literally talk with people who release these models every week. It's really hard to know.[00:26:36] So, there's a few aspects of this. One, I think, like you said, the bigger model, the 70B models, you actually have to have somebody like Perplexity, for example, giving you access to the 70B really fast. Or you have to, like, Actually, find some compute, and it's expensive, especially for the bigger models. For example Falcon 180B came out, like the hugest open source model.[00:26:56] How do you evaluate this if you can't run it? Nobody liked it. It's really, so first of all, nobody liked it, but secondly, only the people who were able to find compute enough to run inference on this, they only had like, I can't run this on my laptop, and so that's why it's much easier, something like OpenRMS 7 to be, 7B, it's much easier, because you can run this on your MacBook.[00:27:14] It's much easier to evaluate. It's much easier to figure out the vibes, right? Everybody talks about the vibes as an evaluation check. If you're plugged in enough, if you follow the right people, if they say pretty much the same things all independently, then you run into a problem of whether they're repeating, and their stochastic parents are repeating the same thing, or they actually evaluated themselves.[00:27:31] Yeah, you never know. But, you never know, but like, I think on a large enough scale on Twitter, you start getting the feel. And we all know that like, OpenRMS is one of the top performing models, benchmarks, but also vibes. And I just wanted to highlight this vibes checks thing because you can have the benchmarks, you can have the evaluations, they potentially have contamination in them, potentially they not necessarily tell you the whole story because some models are good on benchmarks, but then you talk to them, they're not super helpful.[00:28:00] And I think it's a combination of the benchmarks, the leaderboards, the chatbot, because LMSys, remember, their ranking is not only based on benchmarks, it's also people playing with their arena stuff. People actually like humans, like, get two answers. I think they completely ignore benchmarks. Yeah, and then They only do ELO.[00:28:18] Oh, they do ELO completely, right? So that, for example, is just like people playing with both models and say, Hey, I prefer this one, I prefer that one. But also there's like some selection bias. The type of people who will go to LMCs to play with the models, they're a little bit specific in terms of like who they are.[00:28:33] It's very interesting. There's so many models. People are doing this in this way, that way. Some people are doing this for academic rigor only to test out new ideas. Some people are actually doing this like the Intel fine tunes of Mistral. Intel wanted to come out and show that their hardware approach is possible, Mistral, etc.[00:28:51] And it's really hard to know, like, what to pick, what to use. And especially on the bigger models, like you said, like the Llama 70B, the Falcon 180B. It's really because, like, who has the compute to validate those? So I would mention that, like, use with caution. Like, go and research and see if the biggest model that just released was actually worth the tokens and the money you spend on it.[00:29:12] To try and, if you're a business, to integrate it.[00:29:15] Distilling from GPT4[00:29:15] swyx: Since you said use of caution, I'll bring in one issue that has always been in the back of my mind whenever I look at the entire universe of open source AI models, which is that 95 percent of the data is derived from GPC 4, correct?[00:29:30] Which technically you can't use for commercial licenses,[00:29:34] Wing Lian: right?[00:29:35] swyx: What is the community's stance on this kind of stuff?[00:29:40] Wing Lian: I think from the community stance, like I feel like a lot of us are just experimenting, so for us, it's like, we're not going and building a product that we're trying to sell, right?[00:29:49] We're just building a product because we think it's interesting and we want to use it in our day to day lives, whether or not we try and integrate it. Personal use, yeah. Yeah, personal use, so like, as long as we're not selling it, yeah, it's fine. But[00:30:01] swyx: like, I as a company cannot just take OpenHermes and start serving[00:30:05] Alex Volkov: it and make money on it.[00:30:06] OpenHermes you can. Because the opening of OpenHermes, I think, is a clean up. That did after the regular Hermes, please folks, check your licenses before you listen to podcasts and say, Hey, I will tell you though, you could say the same thing about OpenAI. You could say the same thing kind of makes sense, where OpenAI or StabilityAI trains their diffusion model on a bunch of pictures on the internet, and then the court kind of doesn't strike down Sarah Silverman, I think, or somebody else, who came and said, hey, this has my work in it, because of the way how it processes, and the model eventually builds this knowledge into the model, and then it doesn't actually reproduce one to one what happened in the dataset.[00:30:45] You could claim the same thing for open source. Like, we're using And by we, I mean the, the open source community that I like happily report on uses GPT 4 to rank, for example, which is the better answer you, you, that's how you build one, one type of data set, right? Or DPO or something like this, you, you basically generate data set of like a question and four answers, for example, and then you go to GPT 4 and say, Hey, smartest model in the world right now, up to Gemini Ultra, that we should mention as well.[00:31:11] Which one of those choices is better? But the choices themselves are not necessarily written with GPT 4. Some of them may be, so there's like full syntactic datasets. But there's also, datasets are just ranked with GPT 4. But they're actually generated with a sillier model, or like the less important model.[00:31:25] The lines are very blurry as to what type of stuff is possible or not possible. And again, when you use this model that's up on Hug Face, the license says you can use this. OpenAI is not going to come after you, the user. If anything, OpenAI will try to say, hey, let's prevent this, this type of thing happening, and the brain, but I honestly don't think that they could know even, not that it makes it okay, it's just like, They also kind of do this with the Internet's archive, and also, I think that some of it is for use.[00:31:55] You use models to help you augment tasks, which is what GPT 4 lets you do.[00:32:00] swyx: Yeah, the worst thing that OpenAI can do is just kick you off OpenAI. That's because it's only enforced in the terms of service.[00:32:05] Alex Volkov: Sure, but just like to make sure, to clarify who they're going to kick out, they could kick out like News, for example, if news are abusing their service, a user of the open source, fully Apache 2 open source, for example, They won't get kicked out if they use both, just because they use both.[00:32:22] I don't believe so. I don't think OpenAI has a claim for that.[00:32:25] swyx: Well, we're not lawyers, but I just want to mention it for people to know it's an issue.[00:32:30] Wing Lian: And one of the things, like, I talked to someone recently, and I think that they also are like interested in it, but also to the point of like, right, if I use a model trained on data, using GPT for data, But I use that model to then regenerate new data.[00:32:46] Is that model, is that data okay? So like you start going down this whole rabbit hole. So yeah. All right.[00:32:53] swyx: Fantastic. Cool. Well, I think that's roughly highlights most of the open source universe. You also have your own models. Do you want to shout out any one of them? Yeah.[00:33:01] Wing Lian: I mean, I think like, I think Early on, Manicore got a lot of love.[00:33:04] I think it was mostly popular in, like, the roleplay communities. It was, it tended to be pretty truthful. It tended to be, like, have relatively good answers, depending on who you ask, right? But, I think for me, it was just, Releasing models was a way to try and, like, continue to build out the product, figure out what I needed to put into the product, how do I make it faster, and, if you've got to, like, go and debug your product, you may as well have it do something useful.[00:33:29] Awesome. So, yeah.[00:33:31] Finetuning - LoRA, QLoRA, ReLoRA, GPTQ[00:33:31] swyx: Okay, and then maybe we'll talk about just fine tuning techniques. So this is going to be a little bit more technical than just talking about model names and datasets. So we started off talking about LoRa, QLoRa. I just learned from your readme there's ReLoRa. Which I've never heard about.[00:33:45] Could you maybe talk about, like, just parameter efficient fine tuning that whole, that[00:33:50] Wing Lian: whole journey, like, what people should know. Yeah, so with parameter efficient fine tuning, I think the popular ones, again, being, let's, we'll start with lore, right? So, usually what you do is you freeze all the layers on your base, on the base model, and then you, at the same time, you sort of introduce additional Oh, this is tight.[00:34:08] No. You introduce, another set of layers over it, and then you train those, and it is done in a way that is mathematically possible, particularly with LORs that you can, then you, you, When you, when you train the model, you, you run your inputs through the base model, whose weights are frozen, but you, then you also run it through the additional weights, and then at the end you combine the weights, and then, and then, or you combine the weights to get your outputs, and then at the end, and when you're done training, you're left with this other set of weights, right, that are completely independent, and And then from that, what you can do is, some person smarter than I figured out, well, oh, they've done it in such a way that now I can merge these weights back into the original model without changing the architecture of the model, right?[00:35:03] So, so, that tends to be, like, the go to, and You're training much fewer parameters so that when you do that, yes, you still need to have all of the original weights, but you have a smaller gradient, you have a smaller optimizer state, and you're just training less weights, so you can tend to train those models on, like, much smaller GPUs.[00:35:27] swyx: Yeah. And it's roughly like, what I've seen, what I've seen out there is roughly like 1 percent the number of parameters that you're trading. Yeah, that sounds about right. Which is that much cheaper. So Axelotl supports full fine tune, LoRa, QLoRa,[00:35:40] Wing Lian: Q. Yes. So, so QLoRa is, is very similar to LoRa. The paper was, if I remember correctly, the paper was Rather, traditionally, most people who did Loras were, were, they were quant, they were putting the model weights in 8 bit, and then fine tune, parameter efficient fine tuning over the Lora weights, and then with QLora, they were quantizing all of those, they were then quantizing the weights down to 4 bit, right, and then I believe they were also training on all of the linear layers in the model.[00:36:15] And then with ReLore, that was an interesting paper, and then, I think, like, it got implemented. Some people in the community tried it, tried it out, and it showed that it didn't really have the impact that the paper indicated that it would. And from what I was told recently, that they re I guess they re released something for Relora, like, a few weeks ago, and that it's possibly better.[00:36:44] I personally haven't had the time. What was the[00:36:46] swyx: main difference,[00:36:47] Wing Lian: apart from quantization? I don't know. Okay. What was the main difference, sorry?[00:36:49] swyx: Apart from quantization, right? Like,[00:36:50] Wing Lian: Qlora's thing was, like, we'll just drop off some bits. With Relora, what they did was, you would go through, you would define some number of steps that you would train, like, your Lora with, or your Qlora.[00:37:01] Like, you could do Like, ReqLore, if you really wanted to, you would, you would train your LoRa for some number of steps, And then you would merge those weights into your base model, and then you would start over. So by starting, so, then by starting over, The optimizer has to find, like, sort of, re optimize again, and find what's the best direction to move in, and then do it all again, and then merge it in, do it all again, and theoretically, according to the paper, doing ReLore, you can do parameter efficient fine tuning, but still have sort of, like, the performance gains of doing a full fine tuning, so.[00:37:38] swyx: Yeah, and[00:37:39] Wing Lian: GPTQ? And GPTQ, so it's, I think with GPTQ, it's very similar to, more similar to QLore, where you're, it's mostly a quantization of the weights down to like 4 bit, where GPTQ is a very, is a specific methodology or implementation of quantization, so. Got it.[00:37:57] Alex Volkov: Wang, for, for folks who use Axolotl, your users, some people who maybe, Want to try it out?[00:38:03] And do they need to know the differences? Do they need to know the implementation details of QLora versus ReLora? Or is it okay for them to just know that Axolotl is the place that already integrated them? And if that's true, if that's all they need to know, how do they choose which method to use? Yeah,[00:38:22] Wing Lian: so I think like, I think most people aren't going to be using ReLora.[00:38:25] I think most people are going to be using either Lora or QLora. And I think they should have it. They should have an understanding of why they might want to use one over the other. Most people will say that with Qlora, the quality of the final model is not quite as good as like if you were to do a LoRa or a full fine tune, right?[00:38:44] Just because, you've quantized these down, so your accuracy is probably a little off, and so that by the time you've done the Qlora, you're not moving the weights how you would on a full fine tune with the full parameter weights.[00:38:56] Interesting.[00:38:57] swyx: Okay, cool. For people who are more interested, obviously, read the papers. I just wanted to give people, like, a high level overview of what these things are. And you've done people a service by making it easy for people to try it out. I'm going to, I'm going to also ask a question which I know to be wrong, but I'm curious because I get asked this all the time.[00:39:15] What is the difference between all these kinds of fine tunes[00:39:17] Wing Lian: and RLHF? Okay, between all of these sorts of fine tunes and RLHF. So all of these sorts of fine tunes are based, are, ideally, this, they are taking knowledge that the base model already knows about, and presenting it in a way to the model that you're having the model answer like, Use what it already knows to sort of answer in a particular way, whether it's, you're extracting general knowledge, a particular task, right?[00:39:44] Instruct, tune, chat, those sorts of things. And then generally with RLHF, so what is, let's go back, what is it? Reinforcement Learning with Human Feedback. So if we start with the human feedback part, What you're doing is you generally have, you have like a given prompt and then you, maybe you have one, maybe you have two, I think, like if you look at with Starling, you have like up to what, seven different, seven different possible responses, and you're sort of ranking those responses on, on some sort of metric, right, whether the metric is how much I, I might like that answer versus or I think with like starling is like how how how helpful was the answer how accurate was the answer how toxic was the answer those sorts of things on some sort of scale right and then using that to go back and like sort of Take a model and nudge it in the direction of giving that feedback, to be able to answer questions based on those preferences.[00:40:42] swyx: Yeah, so you can apply, and is it commutative? Can you apply fine tuning after and onto an RLHF model? Or should the RLHF apply, come in afterwards,[00:40:54] Wing Lian: after the fine tune? Um, I, yeah, I don't know that there's There's been enough research for one way or another, like, I don't know.[00:41:02] That's a question that's been asked on Discord. Yeah, like, I definitely would say I don't know the answer. Go and try it and report back to me and let me know so I can answer for the next guy.[00:41:10] swyx: It's shocking how much is still unknown about all these things. Well, I mean, that's what research is for, right?[00:41:16] Wing Lian: So actually I, I think I saw on the top of a leaderboard, it was a, it was a mytral base model, and they didn't actually fine tune it. They, or they, they just did RLH, they did like an RLHF fine tune on it using like, I don't, I don't recall which dataset, but it was like, and it benchmarked really well.[00:41:37] But yeah, you'd have to go and look at it. But, so it is interesting, like going back to that, it's like. Traditionally, most people will fine tune the model and then do like a DPO, PPO, some sort of reinforcement learning over that, but that particular model was, it seemed like they skipped like the supervised fine tuning or Scott.[00:41:55] Axolotl vs HF Transformers[00:41:55] swyx: Cool. One thing I did also want to comment about is the overall, like, landscape, competitive landscape, I don't know. Hugging Face Transformers, I think, has a PFT module.[00:42:05] Wing Lian: Yeah, yeah, the PEFT, the Parameter Efficient Fine Tuning, yep. Is that a competitor to you? No, no, so we actually use it. We're just a wrapper over sort of, sort of the HuggingFace stuff.[00:42:15] So, so that is their own sort of module where They have, taken the responsibility or yeah, the responsibility of like where you're doing these parameter efficient fine tuning methods and just sort of like, it is in that particular package where transformers is mostly responsible for sort of like the modeling code and, and the trainer, right.[00:42:35] And then sort of, there's an integration between the two and, there's like a variety of other fine tuning packages, I think like TRL, TRLX, that's the stability AI one. Yeah, I think TRL likes the stability, yeah, Carper, and TRL is a hugging face trainer. Even that one's just another wrapper over, over the transformers library and the path library, right?[00:43:00] But what we do is we have taken sort of those, yes, we've We also use that, but we also have more validation, right? So, there are some of us who have done enough fine tunes where like, Oh, this and this just don't go together, right? But most people don't know that, so like Example?[00:43:19] Like, people want to One and one doesn't go together. I don't have an example offhand, but if you turn this knob and this knob, right? You would think, all right, maybe this will work, but you don't know until you try. And then by the time you find out it doesn't work, it's like maybe five minutes later, it's failed.[00:43:34] It's failed in the middle of training or it's failed during the evaluation step. And you're like, ah, so we've, we've added a lot of, we've added a lot more validation in it. So that like, when you've, you've created your configuration, you run it through and now you say. The validation code says this is probably not right or probably not what you don't, not what you want.[00:43:52] So are you like a, you[00:43:53] swyx: do some linting of your YAML file?[00:43:56] Wing Lian: There, I guess you could call it linting, it's sort of like Is there a set of rules out[00:44:00] swyx: there somewhere? Yeah, there's a set of rules in there. That's amazing, you should write documentation like This rule is because, this user at this time, like, ran into this bug and that's what we invested in.[00:44:10] It's like a good collection[00:44:11] Wing Lian: of knowledge. Yeah, it is, and I guess like, if you really wanted to, like, figure it out, I guess you could, like, git blame everything, and But, yeah, it's, so, I think that's always a useful thing, it's like Because people want to experiment but they don't, people will get frustrated when you've experiment, you're experimenting and it breaks and you don't know why or you know why and you've just gone down the rabbit hole, right?[00:44:37] So, so I think that's one of the big features that's, that I think I find important because it's It prevents you from doing things you probably shouldn't have, and it, and sometimes we will let you do those things, but we'll try and warn, warn you that you've done that.[00:44:50] I[00:44:51] Alex Volkov: have a follow up question on this, actually, because yesterday we hung out to this open source event, and I spent time by you a couple times, like when people told you, oh, XLR, I use XLR, it's super cool, and then the first thing you asked is, like, immediately, like, what can we improve?[00:45:04] And yes, from multiple folks, and I think we talked about this a little bit, where there's It's a developer tool. It's like a machine learning slash developer tool. Your purpose in this is to help and keep people, as much as possible, like, Hey, here's the best set of things that you can use right now. The bear libraries are, or the bear trainer, for example, is a bear trainer.[00:45:28] And also, maybe we should talk about how fast you're implementing these things. So you mentioned the first implementation took a week or so. Now there's a core maintainer group, right? There's like, features are landing, like Qlora, for example. Neftune, I don't know if that's one example of something that people potentially said that it's going to be cool, and then eventually, like, one of those things that didn't really shake out, like, people quickly tested this out.[00:45:48] So, there's a ton of Wait, Neftune is cancelled? I don't know if it's fully canceled, but based on vibes, I heard that it's not that great. So like, but the whole point that I'm trying to make with Neftune as well is that being existing in the community of like XLR or like, I don't know, even following the, the GitHub options or following the Discord, it's a fairly good way to like, learn these, Kind of gut feelings that you just, you just said, right?[00:46:14] Like where this, maybe this knob, that knob doesn't work. Some of these are not written down. Some of these are like tribal knowledge that passes from place to place. Axel is like a great collection of many of them. And so, do you get That back also from community of folks who just use, like, how do you know who uses this?[00:46:30] I think that's still an issue, like, knowing if they trained with XLR or should they add this to things? Talk about, how do you get feedback and how else you should get feedback?[00:46:38] Wing Lian: Yeah, I mean, most of the feedback comes from the Discord, so people come in and , they don't get a training running, they run into, like, obscure errors or, errors that That's a lot of things that maybe, maybe as a product we could catch, but like, there's a lot of things that at some point we need to go and do and it's just on the list somewhere.[00:46:58] Right that's why when people come up, I'm like, what, what were your pain points? Because like, as a developer tool, if you're not happy with it, or you come in and in the first, Takes you 30 minutes and you're still not happy. You leave the tool and you may, you might move on maybe to a better tool, maybe to, one with less frustration, but it may not be as good, right?[00:47:17] So I'm trying to like, figure out, all right, how can I reduce all this frustration? Because like for me, I use it every day for the most part, right? And so I am blind to that, right? Mm-Hmm. . Mm-Hmm. . I just know, I, I go do this, this, and this. It pretty much mostly works, right? But, so I don't have sort of that, alright, that learning curve that other people are seeing and don't understand their pain points.[00:47:40] Yeah,[00:47:40] Alex Volkov: you don't have the The ability to onboard yourself as a new user completely new to the whole paradigm to like get into the doors of like, Oh, no, I don't even know how to like ask about this problem or error.[00:47:53] swyx: Cool. The last few things I wanted to cover was also just the more advanced stuff that you covered yesterday.[00:48:00] 20x efficiency with StackLlama and Multipack[00:48:00] swyx: So I'll just, caution this as like, yeah, this is more advanced. But you mentioned Stackllama and Multipack. What are they[00:48:06] Wing Lian: and what should people know? Yeah, so, so, Stack Llama was, that paper came out, so Stack Llama I think was like, two, two, two separate, two separate concepts that they announced, so the first one was They being hugging face.[00:48:20] Yeah, sorry, yes, they being hugging face, so the first one being sort of like, this idea of packing, like some packing sequences together, so like, if we think about training data, right, your training data is, let's say, to keep the math easy, let's say your training data is 500, We, we, we, we will use the terminology words.[00:48:39] Let's say your training data is 500 words long, and let's say your, your context length, you know how much data your, that your model can accept is like, or that you want feed into your model. It's, let's say, we won't use tokens again, we'll we'll use it is it's 4,000 tokens, right? So if you're training at 4K Con or four 4,000 4K contacts and you're only using 500 of it, you're sitting like with the other 1500.[00:49:05] 3, 500 words that you're not using, right? And typically that's either filled with these PAD tokens, so I think I made the analogy last night that it's like having sort of like a glass here you fill it up with a shot of liquor and then you're and that's your training data and then you just fill it up with more water and those are your PAD tokens and it's just, it doesn't do much, right?[00:49:27] It's still the same thing, but you still have to go through all of that to go through all your training data. And then, so what Stack Llama showed was you could just sort of take your training data, append the next row of training data until you filled that entire 4k context, so in this example, right, with 500 words to 4k, that's 8 rows of training data.[00:49:48] But, the problem with that is, is that with a lot of these transformer models, they're very much relying on attention, right? So, like, if you now have this sequence of words that now, in order for the, the model has seen all of these other words before, right? And then it sees another set of words, another set of words, but it's learning everything in context of all the words that it's seen before.[00:50:13] We haven't corrected the attention for that. And just real quickly, since I said that that paper was two concepts, the other one was, I believe it was like a reinforcement learning, but outside the scope of this. So going from that, I implemented that early on because I was like, Oh, wow, this is really great.[00:50:29] And. Yes, because it saves you a bunch of time, but the trade off is a little bit of accuracy, ultimately, but it still did pretty well. I think when I did Manicore, I think it used sort of that concept from Stack Llama of just sort of appending these sequences together, right? And then sort of the next evolution of that is Multipack, right?[00:50:51] So, there was a separate paper on that, it was, I believe it was referenced, it got referenced in the Orca paper, where you could, you could properly mask those out using like a, I think it was like a lower block triangular attention mask, and then sort of, so, So, there's that. I did try implementing that, manually recreating that mask, but then one from the OpenChat, so he was helping with OpenOrca as well, and he had done an implementation of Multipack, and where he used FlashAttention, so FlashAttention So that was released by TreeDAO, and it was this huge performance gain.[00:51:35] Everybody uses it now, even the Transformers library now, they've taken all of these, like, people are taking all of these models and sort of like, making it compatible with FlashAttention. But in Flash Tension, there is one particular implementation that lets you say, Well, I'm sending you all of these sequences like you would in Stack Llama, But let me send you another, another, Set of information about, this is where this set of sequences is, this is where the second set of sequences is.[00:52:06] So like, if it was like, 500 words long, and you stacked them all together, you would just send it a row of information that was like, 0, 500, 1000, 1500, etc, etc, out to 4000. And it would know, alright, I need to break this up, and then run the forward pass with it. And then it would be able to, and it was much more, much more performant.[00:52:29] And I think you end up seeing like 10x, 20x improvements over sort of, I mean, I think FlashAttention was like a 2x improvement, and then adding that with the Multipack, you start to see like, depending on, how much data you have, up to like a 20x improvement sometimes. 20x. 20x. Wow. Yeah.[00:52:48] And I only know the 20x because I, like, before last night, I was like, I re ran the alpaca, I looked up the alpaca paper because it was like, I just need a frame of reference where somebody did it, and I think they used eight A100s for three hours, and they said it cost them 100. I don't, I don't think eight A100s cost, I don't know how much it costs right now.[00:53:14] But I ended up rerunning it. Usually a dollar an hour, right? Yeah, so eight. The cheapest is like a[00:53:18] Alex Volkov: dollar, a dollar an hour for one.[00:53:20] Wing Lian: Yeah, so that's still like 24, 25. But maybe if you're going on Azure, maybe it's like, maybe it's 100 on Azure. I mean, it used to be more expensive, like, a year ago.[00:53:31] Yeah, and then, so I re ran it with sort of like, I turned on all of the optimizations just to see what it would be. And like, and usually Multipack is the biggest optimization, so Multipack with Flash Detention. And it, I think I spun it up on 8 L40s, and it ran, and I didn't let it run all the way through, I just grabbed the time, the estimated completion time, and it was like 30 minutes, so it would have cost like 4 or 5 to run the entire, like, reproduce the alpaca paper, right?[00:54:00] Which is crazy. It's crazy. 20x,[00:54:02] Alex Volkov: yeah. I want to ask about, like, you said you turned on all the optimization. Is that the yaml file with xlodl, you just go and like check off, like, I want this, I want that? Yeah, yeah,[00:54:10] Wing Lian: so there's like one particular yaml file in there, That, there's one particular YAML file in there that's like, it's under examples, llama2, fft, optimize.[00:54:20] So, I think someone had created one where they just turned, they put in all of the optimizations and turned them on. I mean, it actually, it does run, which is like, sort of surprising sometimes, because sometimes, you optimize this, optimize this, and sometimes they just don't work together, but, yeah.[00:54:36] Just turn the knobs on, and like, fine tuning should really just be that easy, right? I just want to flip the knob and move on with my life and not figure out how to implement it.[00:54:47] Tri Dao and Mamba[00:54:47] Alex Volkov: Specifically, the guy behind FlashAttention came up with something new. You want to talk about this a little bit? You want to briefly cover Mamba?[00:54:53] Yeah, let's talk about Mamba. Let's talk about Mamba. So, what is Mamba?[00:54:57] Wing Lian: Oh, gosh. I

god ai google internet pr giving personal talk news san francisco reach microsoft putting open uber soundcloud busy discord minneapolis stanford seo billion indianapolis privacy releasing disco goliath fantastic drag models intel roadmap excel folks gemini openai significant transformers apartments sf wing tuning san francisco bay area gpt open source 4k lama traditionally forge github llama takes dharma apis macbook hermes lors llm azure gravy apache nano lotto biases amin gpu pharrell mamba aspect perplexity elo annapolis orca ddos prs alpen contamination gpus benchmarks hugging yarn instruct starling sarah silverman 7b minotaur distilling pad fine tuning mcl alpaca modify lm mistral noose json pygmalion sota trl hyena puffins a16z xlr microsoft research ppo flan axolotl reinforcement learning dpo manticore jackalope pft google adsense stability ai capybara datasets open source ai lmc gpc yaml lay low mle 70b jeremy howard neurips loras carper sgn huggingface hippogriff rlhf vram united masters gemini ultra technium vicuna latent space human feedback which i've relora

Short Time Shots - December 1, 2023

Short Time Wrestling Podcast

Play Episode Listen Later Dec 1, 2023 4:01

Las Vegas is on everyone's mind who follows the world of Division I wrestling and that also means me. There's a bunch of Vegas-related content in the newsletter today, so again, you can find out how to get that daily at the end of the show. I was on the mic Thursday here in the Twin Cities - more on that in a bit, this is Short Time Shots, a look at the day's results in and around college wrestling. Right now, if you hear any banging around, it's because I'm getting a new furnace installed this morning. Yup. Thankfully before it his absolute zero here this winter. Our installer used to wrestle, too. Links: Did you know you can find all the D1 dual meet scores and links to their results at collegewrestlingscoreboard.com AND I've added another redirect to help you follow all the tournaments from every division at collegewrestlingtournaments.com. As to the aforementioned Minnesota match, Division III's top-ranked team, Augsburg, blanked No. 25 Loras 46-0 at Si Melby Hall. Augsburg won four matchups between ranked wrestlers, including returning national champion Sam Stuhl earning an 11-2 major decision over No. 13 Jalen Schropp at 141 pounds. Top-ranked heavyweight Tyler Kim and 184-pounder Bentley Schwanebeck-Ostermann picked big bonus victories - Kim with the fall and BSO (his name is too long to keep repeating) with the tech. Other scores around Division III saw Carthage beat Elmhurst 29-15, and Central, that's it, just Central, pushed past Augustana (Ill.) 36-9. UW-Stevens Point trounced UW-Oshkoh (B'Gosh) 40-18. Augustana did beat Simpson (Iowa) The only Division I match of the night featured a pair of transitional programs as Bellarmine heads out to Missouri and beat Lindenwood 35-7. There were some exciting duals in Division II, including Kutztown winning the last five bouts to pull away past Seton Hill 34-15. Both of Kutztown's ranked wrestlers - No. 11 Dalton Gimbor at 165 and third-ranked Matthew Weinberg at 184 - won. Gimbor beat Kane Kettering while Weinberg picked up a forfeit. I hate forfeit. Millersville beat East Stroudsburg 23-19. Colorado School of Mines won the last six weights to top New Mexico Highlands 31-13. Colorado Mesa beat rival Western Colorado 24-17. Davenport beat NAIA Cleary 37-12. Down south, Coker blanked Bluefield State (that's in West Virginia), while Belmont Abbey beat Emory & Henry 48-3. Emory & Henry is transitioning from Division III to Division II. In the NAIA, Ottawa beat Avila 35-13. On the women's side, Grand View beat Central Methodist 27-18, Cedar Crest beat Alvernia 32-16 in the school's first ever home women's wrestling dual. Sioux Falls new women's program improved to 2-0 with a win over Minot State 39-13 and Lock Haven beat Elmira 35-12. Subscribe, for free, to my Daily Wrestling News. Get it for free at mattalkonline.com/news, sign up free today. That's as always, delivered by our longtime friends at Resilite.

las vegas minnesota missouri west virginia ottawa shots d1 mines twin cities davenport avila augsburg weinberg sioux falls carthage naia colorado school coker division ii division iii bso grandview elmhurst bellarmine short time lindenwood augustana western colorado kutztown loras millersville lock haven uw stevens point seton hill cedar crest colorado mesa new mexico highlands

Chapter Seventy-Two - Jaime 9 - A Storm of Swords | A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Nov 21, 2023 71:23

Jaime has a curious goodbye with Arya Stark. He refuses to leave the Kingsguard, despite a strong push from his family. He resists an advance from his sister, which results in a heated argument between the two. Finally, he gives Brienne a jaw-dropping gift and then tasks her with a surprising mission in return. Simon and Mackelly squint hard at “Arya Stark.”Chapter Review:Jaime Lannister grows bored of watching the king sign documents that offer rewards or punishment to houses, for their roles in the war. He returns to the White Sword Tower. On his way, he runs into a girl claiming to be Arya Stark. Jaime is skeptical.Cersei waits for him in his chambers. She begs him to talk to their father about Tywin's plan to send her back to Casterly Rock, effectively separating her from her son Tommen. Jaime doesn't think it'll do any good, but Cersei believes it will, if he agrees to leave the Kingsguard. Jaime refuses. Cersei makes a sexual advance that Jaime rebuffs. She throws insults as she storms out.Jaime calls for Ser Loras Tyrell and Brienne of Tarth. Loras admits Brienne might be telling the truth about Renly Baratheon's death. After dismissing Loras, Jaime gives Brienne the Valyrian steel sword his father gave him. In return, he asks she use it to find Sansa Stark and keep her safe.Characters/Places/Names/Events:Jaime Lannister - Member of the royal Kingsguard. Twin brother to queen Cersei and older brother to Tyrion. Nicknamed the "Kingslayer" for killing King Aerys Targaryen.Ser Loras Tyrell - Member of the Kingsguard. Young, dashing, talented knight from the Reach. The third son of Lord Mace Tyrell of Highgarden.Brienne of Tarth - Warrior woman. Once "Brienne the Blue" in King Renly's Rainbow Guard. On a mission to rescue the Stark girls for Catelyn Stark.Cersei Lannister - twin to Jaime, mother to Joffrey, Tommen, and Myrcella. Tywin Lannister - father of Jaime, Cersei, Tyrion. Lord of Casterly Rock. Hand of the King.Tyrion Lannister - youngest sibling of Jaime and Cersei. Imprisoned awaiting trial for murdering his nephew King Jofrrey.Tommen Lannister - King of the Andals, Roynar, and the First men. Lord of the Seven Kingdoms. Protector of the Realm.We've got four new sustainer tiers on our Buy Me a Coffee site. Something for every budget and level of interest. Check it out! Support the showSupport us: Buy from our store Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/wa...

spotify lord young song reach coffee game of thrones book club twin stark realm protector book reviews imprisoned seven kingdoms george rr martin cersei tyrion nicknamed brienne joffrey seventy two arya stark tyrion lannister ice and fire cersei lannister sansa stark asoiaf kingslayer ross bugden tarth marriott bonvoy loras tywin song of ice and fire valyrian tywin lannister tommen jamie lannister red keep kingsguard casterly rock catelyn stark highgarden myrcella renly baratheon andals storm of swords king renly arbor gold

Short Time Shots: November 17, 2023

Short Time Wrestling Podcast

Play Episode Listen Later Nov 17, 2023 3:50

It's Friday, and I'll be heading back to my native lands this weekend after the Husky Hockey squad finishes up with Duluth. This is NOT a hockey podcast, it's Short Time Shots, a look back at the day's scores around college wrestling. Here's what went down on Thursday. Second-ranked NC State dominated Army West Point 34-6. Jakob Camacho had a takedown and nearfall points late to break open a close bout to earn a 9-1 major decision over Army's Ethan Berginc at 125 pounds. Ed Scott beat Nathan Lukez 14-8 at 157 and Dylan Fishback majored Ben Pasiuk 15-7 at 184 pounds. Minnesota rolled Bucknell in Lewisburg 37-3. Minnesota true freshman Tyler Wells continued to be the second most famous wrestler from Princeton, Minnesota behind Bob Backlund, but his college legend begins as he topped Kurt Phipps at 133 pounds. The Gophers picked up five bonus victories including falls by 125-pound All-American Patrick McKee and heavyweight Bennett Tabor. Tabor pinned Bucknell's Dorian Crosby in sudden victory. Former EWL rivals met up in Pennsylvania as West Virginia bested Clarion 32-7. Bonus victories were registered by Jordan Titus at 141 pounds, Ty Watters at 149, Peyton Hall at 165, Dennis Robin at 184 and Michael Wolfgram at heavyweight. The Mountaineers came in ranked 24th this week. They improved to 4-0. In Division II, top-ranked Lander won eight of 10 to defeat UNC Pembroke 31-9. Lander opened up with second-ranked James Joplin edging 11th-ranked Logan Seliga 6-4 at 125 pounds. Lander's Kaden Kuenzi picked off No. 11 Logan Robinson 8-7 at 133 pounds. The nationally ranked trio of David Hunsberger, Anthony Yacovetti and Juan Edmond-Holmes all picked up bonus victories on the night. Down in Missouri, Third-ranked St. Cloud State picked up a pair of wins, beating Central Missouri 37-11 and host Drury 46-5. The Mules did pick up a ranked win at 133 pounds where eighth-ranked Tommy Frezza majored ninth-ranked Caleb Meekins 11-1. St. Cloud State got one back at 165 pounds where fourth-ranked Anthony Herrera pinned eight-ranked John Ridle in the first period. Central Missouri beat Drury 47-4. Minnesota State Moorhead beat NAIA Jamestown 38-10. In Division III, 15th-ranked Coe beat Central 32-13 with All-American Will Esmoil and L.J. Richardson picking up bonus victories for the Kohawks, yes, with a K. Loras pounded the Beavers of Buena Vista 43-3. Augustana (Ill.) singed the Firebirds of Carthage 22-21 in the CCIW, while Augustana's women beat Carthage 29-20. Pacific beat Linfield 34-24 on the men's side, while Linfield's women prevailed 20-18. In the NAIA, Dakota Wesleyan beat Northwestern (Iowa) 26-21, Ottawa (Kan.) beat York (Neb.) 43-7. The Ottawa women also beat York 40-10. Cornerstone topped NJCAA Henry Ford, the college, not the man, 48-3. Providence (Mont.) beat North Idaho of the NJCAA 39-3. Northeastern Oklahoma A&M beat Carl Albert State 33-8 in junior college action. Subscribe, for free, to my Daily Wrestling News. Get it for free at mattalkonline.com/news, sign up free today. That's as always, delivered by our longtime friends at Resilite. Go Huskies. WOOOOO!

minnesota army pennsylvania missouri pacific west virginia ottawa shots richardson cornerstone nc state beavers duluth lander mules carthage naia tabor gophers coe mountaineers drury buena vista bucknell clarion firebirds north idaho short time central missouri bob backlund njcaa augustana lewisburg cloud state linfield army west point loras ed scott unc pembroke northwestern iowa anthony herrera

"SONS OF THE HARPY" Game of Thrones Season 5 EP4 Recap

WATCH DEM THRONES by Black With No Chaser

Play Episode Listen Later Nov 7, 2023 97:02

We pick up where we left off with Jorah Mormont stealing a boat to navigate with his captive, Tyrion. Aboard a merchant ship, Jaime and Bronn sail to Dorne to rescue Myrcella and getting betrayed and caught up while doing so. Cersei has the Iron Bank on her ass and they want 10% of what's owed to them...and she ain't got it. She sends Mace Tyrell to go talk to them as well as getting him out of the way. Cersei also plots with the High Sparrow to try and manipulate the faith to do her dirty work. The faith attacks the bars and brothels and takes Loras as captive for being gay. We also see Ellaria and the sandsnakes decide to do what Oberyn's brother won't do...avenge his death against the Lannisters. The Red Priestess tried to throw that hotbox on Jon and he refuses. And finally, we see the Sons mount an attack on Dany and her people with Ser Barriston paying the price for it....If you haven't already, please to subscribe to the audio version of the podcast on all your favorite podcast platforms...If you want to keep the fun going with us throughout the week, come join our Facebook group.JOIN THE WATCH DEM THRONES FACEBOOK GROUPhttps://www.facebook.com/groups/126567443834910/?ref=share&mibextid=NSMWBTTO WATCH AND SUBSCRIBE:Facebook:https://www.facebook.com/blackwithnochaser/Twitter:https://twitter.com/BeBlackNoChaser?t=pVFV06lBFdZRu72ot4uCjA&s=09Twitter:https://twitter.com/WatchDemThrones?t=q0ngrYPlugf0ttzM2jo39A&s=09Apple Music: https://podcasts.apple.com/us/podcast/watch-dem-thrones-by-black-with-no-chaser/id1641754247Spotify:https://open.spotify.com/show/1qI1bJ1vIlobu502w6zrtN?si=mtsa3gZYRZW_3FmlCrv7UgBWNC RADIO: https://apps.apple.com/us/app/bwnc-radio/id6443800363Amazon Musichttps://music.amazon.com/podcasts/45279c3a-c09f-47d1-a3a3-88e6e2507230/watch-dem-thrones-by-black-with-no-chaserIHeartRadiohttps://www.iheart.com/podcast/269-watch-dem-thrones-by-black-101286659/?cmp=android_share&sc=android_social_share&pr=false#gameofthrones #demdragons #blackwithnochaser #houseofthedragonhbo #dragonseeds #theblacks #thegreens #houseofthedragon #youtube #targaryens #podcast #podsincolor #applemusic #spotifymusic #podsincolor #starks #lannisters #Velaryon

sons aboard dany game of thrones season cersei lannister tyrion harpy dorne bronn high sparrow loras jorah mormont iron bank oberyn velaryon myrcella ellaria mace tyrell red priestess

Chapter Sixty-Seven - Jamie 8 - A Storm of Swords | A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Oct 17, 2023 74:57

Jaime reminisces about the bygone glory days of the Kingsguard. He turns his attention to righting the wrongs of the current White Swords. He goes man-by-man through his sworn brothers, calling them out on the issues he has with them. Lastly, he has a heart-to-heart with Loras about his claim that Brienne killed Renly. Jenny and Mackelly applaud Jaime's closing arguments.Chapter Review:Jaime Lannister reads through the White Book, the written history of the Kingsguard, while waiting for his sworn brothers to join. When his brothers arrive, Jaime admonishes them for allowing the king to die under their watch. He then gauges their opinions on Tyrion's involvement in the death.He goes through each man and calls them out on issues. Ser Boros Blount will now be King Tommen's taste tester, as he's unfit to protect the boy he gave up so easily. He then questions why he's never heard of Ser Osmund Kettleblack before. After that, he asks Ser Meryn Trant to show him where in their vows it says to beat women and children, like he did to Sansa Stark. He praises Balon Swann's character but questions how he'd handle his family challenging Tommen's rule.Finally, he turns his attention to Ser Loras Tyrell, who reminds Jaime of himself at that age. Jaime makes a case for Brienne's possible innocence in the murder of Renly Baratheon. His argument is enough to seed doubt in Loras' mind. Jaime tasks Loras with talking with Brienne and deciding her innocence or guilt. Whatever Loras determines is the case, Jaime will abide by it.Characters/Places/Names/Events:Jaime Lannister - Member of the Kingsguard. Twin brother to queen Cersei and older brother to Tyrion.Ser Boros Blount - Member of the Kingsguard. Gave up Tommen to Tyrion's men without a fight.Ser Meryn Trant - Member of the KIngsguard. Beat Sansa Stark on the command of King Joffrey.Ser Osmund Kettleblack - Member of the Kingsguard. Former sellsword, with a mysterious past.Ser Balon Swann - Member of the Kingsguard. Seemingly honorable and valiant.Ser Loras Tyrell - Member of the Kingsguard. Young, dashing, talented knight from the Reach. The third son of Lord Mace Tyrell of Highgarden.Brienne of Tarth - Warrior woman. Once "Brienne the Blue" in King Renly's Rainbow Guard. On a mission to rescue the Stark girls for Catelyn Stark. Support the showSupport us: Buy from our store Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/wa...

spotify young song reach game of thrones twin stark sixty book reviews george rr martin cersei tyrion brienne ice and fire sansa stark asoiaf ross bugden marriott bonvoy loras song of ice and fire tommen jamie lannister renly kingsguard king joffrey catelyn stark highgarden renly baratheon storm of swords brienne of tarth loras tyrell meryn trant king renly arbor gold

Weighin' in with Worm Interview Loras 2X AA Rick Healey

Dubuque Area Baseball Podcast

Play Episode Listen Later Oct 4, 2023 37:08

Loras Hall of Fame Athlete Rick Healey joins me this week before his induction into the Hall. We discuss the struggles and balances of being a young father while chasing dreams of All American Status; also his thoughts about going into the hall this weekend!

worm healey loras weighin

"FIRST OF HIS NAME" Game of Thrones Season 4 EP5 Recap

WATCH DEM THRONES by Black With No Chaser

Play Episode Listen Later Aug 28, 2023 85:23

In the episode, Tommen Baratheon is crowned as the new king of the Seven Kingdoms. Over in Meereen Sansa learns of the Purple Wedding and Wish Daario brings her a fleet of ships. Next we find Lil Finger and Sansa arriving at the Eyrie and are welcomed by Lysa Arryn. At King's Landing, Cersei discusses wedding arrangements with Tywin. It's decided that Margaery will marry Tommen in a fortnight, and Cersei will marry Loras a fortnight later. Daenerys Targaryen decides to stay in Meereen; and Jon Snow leads an attack on the Night's Watch mutineers at Craster's Keep plus much much more....JOIN THE WATCH DEM THRONES FACEBOOK GROUPhttps://www.facebook.com/groups/126567443834910/?ref=share&mibextid=NSMWBTTO WATCH AND SUBSCRIBE:Facebook:https://www.facebook.com/blackwithnochaser/Twitter:https://twitter.com/BeBlackNoChaser?t=pVFV06lBFdZRu72ot4uCjA&s=09Twitter:https://twitter.com/WatchDemThrones?t=q0ngrYPlugf0ttzM2jo39A&s=09Apple Music: https://podcasts.apple.com/us/podcast/watch-dem-thrones-by-black-with-no-chaser/id1641754247Spotify:https://open.spotify.com/show/1qI1bJ1vIlobu502w6zrtN?si=mtsa3gZYRZW_3FmlCrv7UgBWNC RADIO: https://apps.apple.com/us/app/bwnc-radio/id6443800363Amazon Musichttps://music.amazon.com/podcasts/45279c3a-c09f-47d1-a3a3-88e6e2507230/watch-dem-thrones-by-black-with-no-chaserIHeartRadiohttps://www.iheart.com/podcast/269-watch-dem-thrones-by-black-101286659/?cmp=android_share&sc=android_social_share&pr=false#gameofthrones #demdragons #blackwithnochaser #houseofthedragonhbo #dragonseeds #theblacks #thegreens #houseofthedragon #youtube #targaryens #podcast #podsincolor #applemusic #spotifymusic #podsincolor

night landing jon snow seven kingdoms game of thrones season sansa cersei daenerys targaryen eyrie loras tywin tommen meereen craster margaery purple wedding lysa arryn

2023 Indoor Nationals Preview Part 2 with Cheick Traore, Mike Jasa, and Yakob Ekoue

D3 Glory Days Podcast

Play Episode Listen Later Mar 7, 2023 54:00

In a two part episode, we sit down with 7 athletes getting ready to compete at the D3 Indoor Track and Field national championships. We discuss how their season has gone so far, tactics they use to stay calm before big meets, and any predictions they may have. Time Stamp Cheick Traore - 200/400, Ramapo - 2:25 Mike Jasa - 800, 4x4, DMR, Loras - 18:32 Yakob Ekoue - SP/WT, UW-Eau Claire, 36:35 BPN Ambassador We're excited to be Ambassadors for Bare Performance Nutrition. A supplement company that gets you ready for next competition. Whether it's protein or electrolytes, their supplements are geared to help you compete at your best. I've been taking their electrolytes to refuel after a run and can it increase my recovery. If you're interested in trying them out, head over to their website and try today. How to Support D3 Glory Days: D3 Glory Days Venmo. We launched a Patreon! Subscribe and leave us a review on Apple Podcasts Instagram, Twitter and Strava. D3 Glory Days Merch

field ambassadors nationals indoor strava jasa dmr loras bare performance nutrition cheick uw eau claire ramapo

Podcasts about loras

Best podcasts about loras

Duhawk Digest

Dubuque Area Baseball Podcast

Short Time Wrestling Podcast

Tuesday's with Nick & Cole

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Close the Door: Game of Thrones, A Song of Ice and Fire Podcast

D3 Glory Days Podcast

WATCH DEM THRONES by Black With No Chaser

Hoopsville

LWLA Podcast

Not A Podcast ASOIAF Re-Read Podcast

Unspoiled! A Song Of Ice And Fire

Hoop Heads

D3football.com » D3football.com Around the Nation Podcast

Isle Of Faces

D3 Nation

On The Mat

#SoccerChat

The Podcast About Division III Baseball

Escuchando Peliculas

GeoCastAway | GeoNáufragos

Game of Microphones: A Game of Thrones Podcast

In The Room

The Nonlinear Library

The top AI news from the past week, every ThursdAI

The Run Around Iowa

Latest news about loras

Latest podcast episodes about loras

Israel & Hezbollah Renew Uneasy Ceasefire + Report: IRGC Cells in Iraq Launched Attacks Against Kuwait, Saudi Arabia, & UAE + US Renames Combatant Command: Indo-PACOM Out; PACOM Name Restored

UFR 2486 Segment 6 Brandon Novak (Regional College Football: Loras)

Why Video Agent models are next — Ethan He, xAI Grok Imagine

Zeit der Krähen - 242 Jaime

ATN Podcast 406: Looking ahead at Draft, Duhawks, Diplomats

23.37b: Selection Sunday Special - Part 2

Reference Images vs LoRAs: Which Actually Works?

23.15: Unranked Success

ThursdAI - Jan 8 - Vera Rubin's 5x Jump, Ralph Wiggum Goes Viral, GPT Health Launches & XAI Raises $20B Mid-Controversy

Perspectives on: GenAI in Fashion

GPT‑5.1's New Brain, Grok's 2M Context, Omnilingual ASR, and a Terminal UI That Sparks Joy

Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)

Why Fine-Tuning Lost and RL Won

Weighin' in with Worm: Season 3 Episode 6: Loras Head Mens Wrestling Coach Devin Bahr

ComfyUI Explained: How AI Image Generation Actually Works (Step-by-Step)

MLA 026 AI Video Generation: Veo 3 vs Sora, Kling, Runway, Stable Video Diffusion

The Potential of AI Video with Seth Woodall

Weighin' in with Worm Episode 24: Regionals, Conference, and Key City Showdown..Oh My!

Inside the DuHawks Dugout: Coach Carl Tebon Previews the 2025 Season

AI Engineering for Art — with comfyanonymous, of ComfyUI

Latent.Space 2024 Year in Review

Weighin' in with Worm S2 Episode 15: First Competitions are here!

Chapter Thirty-Six - Cersei 8 - A Feast for Crows - A Song of Ice and Fire (ASOIAF)

551: AFfC Jaime II (repost)

Building AGI in Real Time (OpenAI Dev Day 2024)

Language Agents: From Reasoning to Acting

Weighin' in with Worm: S2 Episode 6 Loras Wrestling Jeren Glosser and Devin Bahr

Chapter Twenty-Eight - Cersei 6 - A Feast for Crows | A Song of Ice and Fire (ASOIAF)

Efficiency is Coming: 3000x Faster, Cheaper, Better AI Inference from Hardware Improvements, Quantization, and Synthetic Data Distillation

Weighin in with Worm: S2 E2 Shane Liegel

Weighin in with Worm S2E1: Guy Patron Jr.

STEVE HELMINIAK LORAS PREVIEW (UFR).mp3

Talking Wrestling with Head Coach Brett Wheelan

Hawkeye Report Podcast 583

2024 Loras College Baseball Season Preview

517: ASoS Jaime VIII (repost)

21.17a: 10th Annual Marathon (Part 1)

"BOOK OF THE STRANGER" Game of Thrones Season 6 EP4 Recap

EA - AI doing philosophy = AI generating hands? by Wei Dai

The AI-First Graphics Editor - with Suhail Doshi of Playground AI

Short Time Shots: December 8, 2023

The Busy Person's Intro to Finetuning & Open Source AI - Wing Lian, Axolotl

Short Time Shots - December 1, 2023

Chapter Seventy-Two - Jaime 9 - A Storm of Swords | A Song of Ice and Fire (ASOIAF)

Short Time Shots: November 17, 2023

"SONS OF THE HARPY" Game of Thrones Season 5 EP4 Recap

Chapter Sixty-Seven - Jamie 8 - A Storm of Swords | A Song of Ice and Fire (ASOIAF)

Weighin' in with Worm Interview Loras 2X AA Rick Healey

"FIRST OF HIS NAME" Game of Thrones Season 4 EP5 Recap