Podcasts about loras

  • 110PODCASTS
  • 352EPISODES
  • 47mAVG DURATION
  • 1EPISODE EVERY OTHER WEEK
  • Mar 25, 2025LATEST

POPULARITY

20172018201920202021202220232024


Best podcasts about loras

Show all podcasts related to loras

Latest podcast episodes about loras

The Marketing Architects
The Potential of AI Video with Seth Woodall

The Marketing Architects

Play Episode Listen Later Mar 25, 2025 37:50


AI video tools have created lifelike Coca-Cola trucks, festive scenes, and even Angela's face in today's episode. These jaw-dropping demonstrations highlight how rapidly AI video has evolved and the debate this technology is creating.This week, Elena and Rob are joined by Seth Woodall, Director of AI Visual Arts at Misfits & Machines and creator of the viral "Pepperoni Hug Spot" commercial. Together, they explore the state of AI video creation, misconceptions about the technology, and what's next for marketers looking to incorporate these tools. Plus, discover why AI-powered creative techniques will become standard production methods and indistinguishable from traditionally created work.Topics covered: [01:00] The controversy behind Coca-Cola's AI holiday commercial[07:00] How Seth created the viral "Pepperoni Hug Spot" AI commercial[15:45] Why AI video expertise is hard to find in traditional agencies[18:00] Predictions for AI video quality improvements[21:30] AI tools like Comfy UI, Flux and LoRAs[26:00] Common misconceptions about how AI video is created[29:30] The democratization of high-quality video creation  To learn more, visit marketingarchitects.com/podcast or subscribe to our newsletter at marketingarchitects.com/newsletter.  Resources: NBC News 2024 Article: https://www.nbcnews.com/tech/innovation/coca-cola-causes-controversy-ai-made-ad-rcna180665Pepperoni Hug Spot Commercial: https://www.youtube.com/watch?v=qSewd6Iaj6I Get more research-backed marketing strategies by subscribing to The Marketing Architects on Apple Podcasts, Spotify, or wherever you listen to podcasts. 

Filmbranschpodden
Billy Boman och Mikeadelica - AI & framtidens filmskapande

Filmbranschpodden

Play Episode Listen Later Mar 19, 2025 114:09


VEM: Billy Boman & Mikeadelica (Northern Spark) EPISOD: 41 (Säsong 3 avsnitt 6) EN PODD AV: Simon Kölle www.linktr.ee/simonkolle PRODUCERAS AV: Simon Kölle & Amir Noghabai SÄSONG 3 GÖRS I SAMARBETE MED: FrameSage, med hjälp utav FrameSage kan du finansiera ditt filmprojekt eller investera i film.  Besök www.framesage.com  SÄSONG 3 SPONSRAS AV: Story Academy och Film Crew. Drömmer du om att jobba med film och TV? På Gotlands folkhögskola finns Story Academy och Film Crew – två utbildningar för dig som vill in i branschen. Om Episoden: 

The top AI news from the past week, every ThursdAI

LET'S GO! Happy second birthday to ThursdAI, your favorite weekly AI news show! Can you believe it's been two whole years since we jumped into that random Twitter Space to rant about GPT-4? From humble beginnings as a late-night Twitter chat to a full-blown podcast, Newsletter and YouTube show with hundreds of thousands of downloads, it's been an absolutely wild ride! That's right, two whole years of me, Alex Volkov, your friendly AI Evangelist, along with my amazing co-hosts, trying to keep you up-to-date on the breakneck speed of the AI worldAnd what better way to celebrate than with a week PACKED with insane AI news? Buckle up, folks, because this week Google went OPEN SOURCE crazy, Gemini got even cooler, OpenAI created a whole new Agents SDK and the open-source community continues to blow our minds. We've got it all - from game-changing model releases to mind-bending demos.This week I'm also on the Weights & Biases company retreat, so TL;DR first and then the newsletter, but honestly, I'll start embedding the live show here in the substack from now on, because we're getting so good at it, I barely have to edit lately and there's a LOT to show you guys! TL;DR and Show Notes & Links* Hosts & Guests* Alex Volkov - AI Eveangelist & Weights & Biases (@altryne)* Co Hosts - @WolframRvnwlf @ldjconfirmed @nisten * Sandra Kublik - DevRel at Cohere (@itsSandraKublik)* Open Source LLMs * Google open sources Gemma 3 - 1B - 27B - 128K context (Blog, AI Studio, HF)* EuroBERT - multilingual encoder models (210M to 2.1B params)* Reka Flash 3 (reasoning) 21B parameters is open sourced (Blog, HF)* Cohere Command A 111B model - 256K context (Blog)* Nous Research Deep Hermes 24B / 3B Hybrid Reasoners (X, HF)* AllenAI OLMo 2 32B - fully open source GPT4 level model (X, Blog, Try It)* Big CO LLMs + APIs* Gemini Flash generates images natively (X, AI Studio)* Google deep research is now free in Gemini app and powered by Gemini Thinking (Try It no cost)* OpenAI released new responses API, Web Search, File search and Computer USE tools (X, Blog)* This weeks Buzz * The whole company is at an offsite at oceanside, CA* W&B internal MCP hackathon and had cool projects - launching an MCP server soon!* Vision & Video* Remade AI - 8 LORA video effects for WANX (HF)* AI Art & Diffusion & 3D* ByteDance Seedream 2.0 - A Native Chinese-English Bilingual Image Generation Foundation Model by ByteDance (Blog, Paper)* Tools* Everyone's talking about Manus - (manus.im)* Google AI studio now supports youtube understanding via link droppingOpen Source LLMs: Gemma 3, EuroBERT, Reka Flash 3, and Cohere Command-A Unleashed!This week was absolutely HUGE for open source, folks. Google dropped a BOMBSHELL with Gemma 3! As Wolfram pointed out, this is a "very technical achievement," and it's not just one model, but a whole family ranging from 1 billion to 27 billion parameters. And get this – the 27B model can run on a SINGLE GPU! Sundar Pichai himself claimed you'd need "at least 10X compute to get similar performance from other models." Insane!Gemma 3 isn't just about size; it's packed with features. We're talking multimodal capabilities (text, images, and video!), support for over 140 languages, and a massive 128k context window. As Nisten pointed out, "it might actually end up being the best at multimodal in that regard" for local models. Plus, it's fine-tuned for safety and comes with ShieldGemma 2 for content moderation. You can grab Gemma 3 on Google AI Studio, Hugging Face, Ollama, Kaggle – everywhere! Huge shoutout to Omar Sanseviero and the Google team for this incredible release and for supporting the open-source community from day one! Colin aka Bartowski, was right, "The best thing about Gemma is the fact that Google specifically helped the open source communities to get day one support." This is how you do open source right!Next up, we have EuroBERT, a new family of multilingual encoder models. Wolfram, our European representative, was particularly excited about this one: "In European languages, you have different characters than in other languages. And, um, yeah, encoding everything properly is, uh, difficult." Ranging from 210 million to 2.1 billion parameters, EuroBERT is designed to push the boundaries of NLP in European and global languages. With training on a massive 5 trillion-token dataset across 15 languages and support for 8K context tokens, EuroBERT is a workhorse for RAG and other NLP tasks. Plus, how cool is their mascot?Reka Flash 3 - a 21B reasoner with apache 2 trained with RLOOAnd the open source train keeps rolling! Reka AI dropped Reka Flash 3, a 21 billion parameter reasoning model with an Apache 2.0 license! Nisten was blown away by the benchmarks: "This might be one of the best like 20B size models that there is right now. And it's Apache 2.0. Uh, I, I think this is a much bigger deal than most people realize." Reka Flash 3 is compact, efficient, and excels at chat, coding, instruction following, and function calling. They even used a new reinforcement learning technique called REINFORCE Leave One-Out (RLOO). Go give it a whirl on Hugging Face or their chat interface – chat.reka.ai!Last but definitely not least in the open-source realm, we had a special guest, Sandra (@itsSandraKublik) from Cohere, join us to announce Command-A! This beast of a model clocks in at 111 BILLION parameters with a massive 256K context window. Sandra emphasized its efficiency, "It requires only two GPUs. Typically the models of this size require 32 GPUs. So it's a huge, huge difference." Command-A is designed for enterprises, focusing on agentic tasks, tool use, and multilingual performance. It's optimized for private deployments and boasts enterprise-grade security. Congrats to Sandra and the Cohere team on this massive release!Big CO LLMs + APIs: Gemini Flash Gets Visual, Deep Research Goes Free, and OpenAI Builds for AgentsThe big companies weren't sleeping either! Google continued their awesome week by unleashing native image generation in Gemini Flash Experimental! This is seriously f*****g cool, folks! Sorry for my French, but it's true. You can now directly interact with images, tell Gemini what to do, and it just does it. We even showed it live on the stream, turning ourselves into cat-confetti-birthday-hat-wearing masterpieces! Wolfram was right, "It's also a sign what we will see in, like, Photoshop, for example. Where you, you expect to just talk to it and have it do everything that a graphic designer would be doing." The future of creative tools is HERE.And guess what else Google did? They made Deep Research FREE in the Gemini app and powered by Gemini Thinking! Nisten jumped in to test it live, and we were all impressed. "This is the nicest interface so far that I've seen," he said. Deep Research now digs through HUNDREDS of websites (Nisten's test hit 156!) to give you comprehensive answers, and the interface is slick and user-friendly. Plus, you can export to Google Docs! Intelligence too cheap to meter? Google is definitely pushing that boundary.Last second additions - Allen Institute for AI released OLMo 2 32B - their biggest open model yetJust as I'm writing this, friend of the pod, Nathan from Allen Institute for AI announced the release of a FULLY OPEN OLMo 2, which includes weights, code, dataset, everything and apparently it beats the latest GPT 3.5, GPT 4o mini, and leading open weight models like Qwen and Mistral. Evals look legit, but nore than that, this is an Apache 2 model with everything in place to advance open AI and open science! Check out Nathans tweet for more info, and congrats to Allen team for this awesome release! OpenAI new responses API and Agent ASK with Web, File and CUA toolsOf course, OpenAI wasn't going to let Google have all the fun. They dropped a new SDK for agents called the Responses API. This is a whole new way to build with OpenAI, designed specifically for the agentic era we're entering. They also released three new tools: Web Search, Computer Use Tool, and File Search Tool. The Web Search tool is self-explanatory – finally, built-in web search from OpenAI!The Computer Use Tool, while currently limited in availability, opens up exciting possibilities for agent automation, letting agents interact with computer interfaces. And the File Search Tool gives you a built-in RAG system, simplifying knowledge retrieval from your own files. As always, OpenAI is adapting to the agentic world and giving developers more power.Finally in the big company space, Nous Research released PORTAL, their new Inference API service. Now you can access their awesome models, like Hermes 3 Llama 70B and DeepHermes 3 8B, directly via API. It's great to see more open-source labs offering API access, making these powerful models even more accessible.This Week's Buzz at Weights & Biases: Offsite Hackathon and MCP Mania!This week's "This Week's Buzz" segment comes to you live from Oceanside, California! The whole Weights & Biases team is here for our company offsite. Despite the not-so-sunny California weather (thanks, storm!), it's been an incredible week of meeting colleagues, strategizing, and HACKING!And speaking of hacking, we had an MCP hackathon! After last week's MCP-pilling episode, we were all hyped about Model Context Protocol, and the team didn't disappoint. In just three hours, the innovation was flowing! We saw agents built for WordPress, MCP support integrated into Weave playground, and even MCP servers for Weights & Biases itself! Get ready, folks, because an MCP server for Weights & Biases is COMING SOON! You'll be able to talk to your W&B data like never before. Huge shoutout to the W&B team for their incredible talent and for embracing the agentic future! And in case you missed it, Weights & Biases is now part of the CoreWeave family! Exciting times ahead!Vision & Video: LoRA Video Effects and OpenSora 2.0Moving into vision and video, Remade AI released 8 LoRA video effects for 1X! Remember 1X from Alibaba? Now you can add crazy effects like "squish," "inflate," "deflate," and even "cakeify" to your videos using LoRAs. It's open source and super cool to see video effects becoming trainable and customizable.And in the realm of open-source video generation, OpenSora 2.0 dropped! This 11 billion parameter model claims state-of-the-art video generation trained for just $200,000! They're even claiming performance close to Sora itself on some benchmarks. Nisten checked out the demos, and while we're all a bit jaded now with the rapid pace of video AI, it's still mind-blowing how far we've come. Open source video is getting seriously impressive, seriously fast.AI Art & Diffusion & 3D: ByteDance's Bilingual Seedream 2.0ByteDance, the folks behind TikTok, released Seedream 2.0, a native Chinese-English bilingual image generation foundation model. This model, from ByteDream, excels at text rendering, cultural nuance, and human preference alignment. Seedream 2.0 boasts "powerful general capability," "native bilingual comprehension ability," and "excellent text rendering." It's designed to understand both Chinese and English prompts natively, generating high-quality, culturally relevant images. The examples look stunning, especially its ability to render Chinese text beautifully.Tools: Manus AI Agent, Google AI Studio YouTube Links, and Cursor EmbeddingsFinally, in the tools section, everyone's buzzing about Manus, a new AI research agent. We gave it a try live on the show, asking it to do some research. The UI is slick, and it seems to be using Claude 3.7 behind the scenes. Manus creates a to-do list, browses the web in a real Chrome browser, and even generates files. It's like Operator on steroids. We'll be keeping an eye on Manus and will report back on its performance in future episodes.And Google AI Studio keeps getting better! Now you can drop YouTube links into Google AI Studio, and it will natively understand the video! This is HUGE for video analysis and content understanding. Imagine using this for support, content summarization, and so much more.PHEW! What a week to celebrate two years of ThursdAI! From open source explosions to Gemini's visual prowess and OpenAI's agentic advancements, the AI world is moving faster than ever. As Wolfram aptly put it, "The acceleration, you can feel it." And Nisten reminded us of the incredible journey, "I remember I had early access to GPT-4 32K, and, uh, then... the person for the contract that had given me access, they cut it off because on the one weekend, I didn't realize how expensive it was. So I had to use $180 worth of tokens just trying it out." Now, we have models that are more powerful and more accessible than ever before. Thank you to Wolfram, Nisten, and LDJ for co-hosting and bringing their insights every week. And most importantly, THANK YOU to our amazing community for tuning in, listening, and supporting ThursdAI for two incredible years! We couldn't do it without you. Here's to another year of staying up-to-date so YOU don't have to! Don't forget to subscribe to the podcast, YouTube channel, and newsletter to stay in the loop. And share ThursdAI with a friend – it's the best birthday gift you can give us! Until next week, keep building and keep exploring the amazing world of AI! LET'S GO! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe

The top AI news from the past week, every ThursdAI

Holy moly, AI enthusiasts! Alex Volkov here, reporting live from the AI Engineer Summit in the heart of (touristy) Times Square, New York! This week has been an absolute whirlwind of announcements, from XAI's Grok 3 dropping like a bomb, to Figure robots learning to hand each other things, and even a little eval smack-talk between OpenAI and XAI. It's enough to make your head spin – but that's what ThursdAI is here for. We sift through the chaos and bring you the need-to-know, so you can stay on the cutting edge without having to, well, spend your entire life glued to X and Reddit.This week we had a very special live show with the Haize Labs folks, the ones I previously interviewed about their bijection attacks, discussing their open source judge evaluation library called Verdict. So grab your favorite caffeinated beverage, maybe do some stretches because your mind will be blown, and let's dive into the TL;DR of ThursdAI, February 20th, 2025!Participants* Alex Volkov: AI Evangelist with Weights and Biases* Nisten: AI Engineer and cohost* Akshay: AI Community Member* Nuo: Dev Advocate at 01AI* Nimit: Member of Technical Staff at Haize Labs* Leonard: Co-founder at Haize LabsOpen Source LLMsPerplexity's R1 7076: Censorship-Free DeepSeekPerplexity made a bold move this week, releasing R1 7076, a fine-tuned version of DeepSeek R1 specifically designed to remove what they (and many others) perceive as Chinese government censorship. The name itself, 1776, is a nod to American independence – a pretty clear statement! The core idea? Give users access to information on topics the CCP typically restricts, like Tiananmen Square and Taiwanese independence.Perplexity used human experts to identify around 300 sensitive topics and built a "censorship classifier" to train the bias out of the model. The impressive part? They claim to have done this without significantly impacting the model's performance on standard evals. As Nuo from 01AI pointed out on the show, though, he'd "actually prefer that they can actually disclose more of their details in terms of post training... Running the R1 model by itself, it's already very difficult and very expensive." He raises a good point – more transparency is always welcome! Still, it's a fascinating attempt to tackle a tricky problem, the problem which I always say we simply cannot avoid. You can check it out yourself on Hugging Face and read their blog post.Arc Institute & NVIDIA Unveil Evo 2: Genomics PowerhouseGet ready for some serious science, folks! Arc Institute and NVIDIA dropped Evo 2, a massive genomics model (40 billion parameters!) trained on a mind-boggling 9.3 trillion nucleotides. And it's fully open – two papers, weights, data, training, and inference codebases. We love to see it!Evo 2 uses the StripedHyena architecture to process huge genetic sequences (up to 1 million nucleotides!), allowing for analysis of complex genomic patterns. The practical applications? Predicting the effects of genetic mutations (super important for healthcare) and even designing entire genomes. I've been super excited about genomics models, and seeing these alternative architectures like StripedHyena getting used here is just icing on the cake. Check it out on X.ZeroBench: The "Impossible" Benchmark for VLLMsNeed more benchmarks? Always! A new benchmark called ZeroBench arrived, claiming to be the "impossible benchmark" for Vision Language Models (VLLMs). And guess what? All current top-of-the-line VLLMs get a big fat zero on it.One example they gave was a bunch of scattered letters, asking the model to "answer the question that is written in the shape of the star among the mess of letters." Honestly, even I struggled to see the star they were talking about. It highlights just how much further VLLMs need to go in terms of true visual understanding. (X, Page, Paper, HF)Hugging Face's Ultra Scale Playbook: Scaling UpFor those of you building massive models, Hugging Face released the Ultra Scale Playbook, a guide to building and scaling AI models on huge GPU clusters.They ran 4,000 scaling experiments on up to 512 GPUs (nothing close to Grok's 100,000, but still impressive!). If you're working in a lab and dreaming big, this is definitely a resource to check out. (HF).Big CO LLMs + APIsGrok 3: XAI's Big Swing new SOTA LLM! (and Maybe a Bug?)Monday evening, BOOM! While some of us were enjoying President's Day, the XAI team dropped Grok 3. They announced it with a setting very similar to OpenAI announcements. They're claiming state-of-the-art performance on some benchmarks (more on that drama later!), and a whopping 1 million token context window, finally confirmed after some initial confusion. They talked a lot about agents and a future of reasoners as well.The launch was a bit… messy. First, there was a bug where some users were getting Grok 2 even when the dropdown said Grok 3. That led to a lot of mixed reviews. Even when I finally thought I was using Grok 3, it still flubbed my go-to logic test, the "Beth's Ice Cubes" question. (The answer is zero, folks – ice cubes melt!). But Akshay, who joined us on the show, chimed in with some love: "...with just the base model of Grok 3, it's, in my opinion, it's the best coding model out there." So, mixed vibes, to say the least! It's also FREE for now, "until their GPUs melt," according to XAI, which is great.UPDATE: The vibes are shifting, more and more of my colleagues and mutuals are LOVING grok3 for one shot coding, for talking to it. I'm getting convinced as well, though I did use and will continue to use Grok for real time data and access to X. DeepSearchIn an attempt to show off some Agentic features, XAI also launched a deep search (not research like OpenAI but effectively the same) Now, XAI of course has access to X, which makes their deep search have a leg up, specifically for real time information! I found out it can even “use” the X search! OpenAI's Open Source TeaseIn what felt like a very conveniently timed move, Sam Altman dropped a poll on X the same day as the Grok announcement: if OpenAI were to open-source something, should it be a small, mobile-optimized model, or a model on par with o3-mini? Most of us chose o3 mini, just to have access to that model and play with it. No indication of when this might happen, but it's a clear signal that OpenAI is feeling the pressure from the open-source community.The Eval Wars: OpenAI vs. XAIThings got spicy! There was a whole debate about the eval numbers XAI posted, specifically the "best of N" scores (like best of 64 runs). Boris from OpenAI, and Aiden mcLau called out some of the graphs. Folks on X were quick to point out that OpenAI also used "best of N" in the past, and the discussion devolved from there.XAI is claiming SOTA. OpenAI (or some folks from within OpenAI) aren't so sure. The core issue? We can't independently verify Grok's performance because there's no API yet! As I said, "…we're not actually able to use this model to independently evaluate this model and to tell you guys whether or not they actually told us the truth." Transparency matters, folks!DeepSearch - How Deep?Grok also touted a new "Deep Search" feature, kind of like Perplexity or OpenAI's "Deep Research" in their more expensive plan. My initial tests were… underwhelming. I nicknamed it "Shallow Search" because it spent all of 34 seconds on a complex query where OpenAI's Deep Research took 11 minutes and cited 17 sources. We're going to need to do some more digging (pun intended) on this one.This Week's BuzzWe're leaning hard into agents at Weights & Biases! We just released an agents whitepaper (check it out on our socials!), and we're launching an agents course in collaboration with OpenAI's Ilan Biggio. Sign up at wandb.me/agents! We're hearing so much about agent evaluation and observability, and we're working hard to provide the tools the community needs.Also, sadly, our Toronto workshops are completely sold out. But if you're at AI Engineer in New York, come say hi to our booth! And catch my talk on LLM Reasoner Judges tomorrow (Friday) at 11 am EST – it'll be live on the AI Engineer YouTube channel (HERE)!Vision & VideoMicrosoft MUSE: Playable Worlds from a Single ImageThis one is wild. Microsoft's MUSE can generate minutes of playable gameplay from just a single second of video frames and controller actions.It's based on the World and Human Action Model (WHAM) architecture, trained on a billion gameplay images from Xbox. So if you've been playing Xbox lately, you might be in the model! I found it particularly cool: "…you give it like a single second of a gameplay of any type of game with all the screen elements, with percentages, with health bars, with all of these things and their model generates a game that you can control." (X, HF, Blog).StepFun's Step-Video-T2V: State-of-the-Art (and Open Source!)We got two awesome open-source video breakthroughs this week. First, StepFun's Step-Video-T2V (and T2V Turbo), a 30 billion parameter text-to-video model. The results look really good, especially the text integration. Imagine a Chinese girl opening a scroll, and the words "We will open source" appearing as she unfurls it. That's the kind of detail we're talking about.And it's MIT licensed! As Nisten noted "This is pretty cool. It came out. Right before Sora came out, people would have lost their minds." (X, Paper, HF, Try It).HAO AI's FastVideo: Speeding Up HY-VideoThe second video highlight: HAO AI released FastVideo, a way to make HY-Video (already a strong open-source contender) three times faster with no additional training! They call the trick "Sliding Tile Attention" apparently that alone provides enormous boost compared to even flash attention.This is huge because faster inference means these models become more practical for real-world use. And, bonus: it supports HY-Video's Loras, meaning you can fine-tune it for, ahem, all kinds of creative applications. I will not go as far as to mention civit ai. (Github)Figure's Helix: Robot Collaboration!Breaking news from the AI Engineer conference floor: Figure, the humanoid robot company, announced Helix, a Vision-Language-Action (VLA) model built into their robots!It has full upper body control!What blew my mind: they showed two robots working together, handing objects to each other, based on natural language commands! As I watched, I exclaimed, "I haven't seen a humanoid robot, hand off stuff to the other one... I found it like super futuristically cool." The model runs on the robot, using a 7 billion parameter VLM for understanding and an 80 million parameter transformer for control. This is the future, folks!Tools & OthersMicrosoft's New Quantum Chip (and State of Matter!)Microsoft announced a new quantum chip and a new state of matter (called "topological superconductivity"). "I found it like absolutely mind blowing that they announced something like this," I gushed on the show. While I'm no quantum physicist, this sounds like a big deal for the future of computing.Verdict: Hayes Labs' Framework for LLM JudgesAnd of course, the highlight of our show: Verdict, a new open-source framework from Hayes Labs (the folks behind those "bijection" jailbreaks!) for composing LLM judges. This is a huge deal for anyone working on evaluation. Leonard and Nimit from Hayes Labs joined us to explain how Verdict addresses some of the core problems with LLM-as-a-judge: biases (like preferring their own responses!), sensitivity to prompts, and the challenge of "meta-evaluation" (how do you know your judge is actually good?).Verdict lets you combine different judging techniques ("primitives") to create more robust and efficient evaluators. Think of it as "judge-time compute scaling," as Leonard called it. They're achieving near state-of-the-art results on benchmarks like ExpertQA, and it's designed to be fast enough to use as a guardrail in real-time applications!One key insight: you don't always need a full-blown reasoning model for judging. As Nimit explained, Verdict can combine simpler LLM calls to achieve similar results at a fraction of the cost. And, it's open source! (Paper, Github,X).ConclusionAnother week, another explosion of AI breakthroughs! Here are my key takeaways:* Open Source is THRIVING: From censorship-free LLMs to cutting-edge video models, the open-source community is delivering incredible innovation.* The Need for Speed (and Efficiency): Whether it's faster video generation or more efficient LLM judging, performance is key.* Robots are Getting Smarter (and More Collaborative): Figure's Helix is a glimpse into a future where robots work together.* Evaluation is (Finally) Getting Attention: Tools like Verdict are essential for building reliable and trustworthy AI systems.* The Big Players are Feeling the Heat: OpenAI's open-source tease and XAI's rapid progress show that the competition is fierce.I'll be back in my usual setup next week, ready to break down all the latest AI news. Stay tuned to ThursdAI – and don't forget to give the pod five stars and subscribe to the newsletter for all the links and deeper dives. There's potentially an Anthropic announcement coming, so we'll see you all next week.TLDR* Open Source LLMs* Perplexity R1 1776 - finetune of china-less R1 (Blog, Model)* Arc institute + Nvidia - introduce EVO 2 - genomics model (X)* ZeroBench - impossible benchmark for VLMs (X, Page, Paper, HF)* HuggingFace ultra scale playbook (HF)* Big CO LLMs + APIs* Grok 3 SOTA LLM + reasoning and Deep Search (blog, try it)* OpenAI is about to open source something? Sam posts a polls* This weeks Buzz* We are about to launch an agents course! Pre-sign up wandb.me/agents* Workshops are SOLD OUT* Watch my talk LIVE from AI Engineer - 11am EST Friday (HERE)* Keep watching AI Eng conference after the show on AIE YT* )* Vision & Video* Microsoft MUSE - playable worlds from one image (X, HF, Blog)* Microsoft OmniParser - Better, faster screen parsing for GUI agents with OmniParser v2 (Gradio Demo)* HAO AI - fastVIDEO - making HY-Video 3x as fast (Github)* StepFun - Step-Video-T2V (+Turbo), a SotA 30B text-to-video model (Paper, Github, HF, Try It)* Figure announces HELIX - vision action model built into FIGURE Robot (Paper)* Tools & Others* Microsoft announces a new quantum chip and a new state of matter (Blog, X)* Verdict - Framework to compose SOTA LLM judges with JudgeTime Scaling (Paper, Github,X) This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe

Dubuque Area Baseball Podcast
Weighin' in with Worm Episode 24: Regionals, Conference, and Key City Showdown..Oh My!

Dubuque Area Baseball Podcast

Play Episode Listen Later Feb 5, 2025 32:59


Today we cover the regional tournaments for the girls, the seeds of the state tournament, and the Key City Showdown between Loras and UD.

Dubuque Area Baseball Podcast
Inside the DuHawks Dugout: Coach Carl Tebon Previews the 2025 Season

Dubuque Area Baseball Podcast

Play Episode Listen Later Jan 29, 2025 25:18


On this episode of the Dubuque Area Sports Podcast, Everybody's Favorite Coach, Coach Maneman, sits down with longtime Loras College Head Baseball Coach Carl Tebon as he enters his 31st season at the helm. Coach Tebon recaps last season, previews the 2025 squad, and gives insight into the team's strengths, key returners, and exciting newcomers. He also breaks down position battles, the pitching rotation, and local talent making an impact at Loras. Plus, get the inside scoop on the DuHawks' upcoming season opener against Iowa and their home opener on March 28. If you're a baseball fan or a player looking for the right college fit, this episode is a must-listen!

Archi-Tech Network
EP 53 | Ismail Seleit | What is a LORA? and the future of training your own AI models.

Archi-Tech Network

Play Episode Listen Later Jan 12, 2025 57:43


In this podcast we sit down with Ismail Seleit a AI architectural designer working with LORAs and training his own models to harness the potential of stable and comfyUI. We discuss what even is a LORA and how are architects using them in practice. ISMAIL SELEIT | / ismailseleit ISMAIL MODELS | https://civitai.com/user/ismailseleit ATN HOST | Oliver Thomas | / olly____t ATN MASTERCLASS | https://archi-tech.network 00:00 Introduction 00:51 ATN Courses 01:20 Hello Ismail 03:27 How did you get started with AI? 07:40 Transition from Park to Foster + Partners 10:45 Where are with with AI right now? 13:45 Have we seen AI improve design? 18:00 Automation vs Design 21:50 What is a LORA? 27:42 How are architects using LORAS 35:50 Where can people find LORAS? 38:30 Future issue with AI copyright 46:40 What next on the AI horizon? 54:00 Whats next for you? Enjoyed the video? Be sure to like, subscribe, and share for more insights into the intersection of the Metaverse and Architecture. Don't forget to hit the bell icon to stay updated on our latest content. Join the conversation on our social media platforms: INSTAGRAM | / architech.network LINKEDIN | / archi-tech-network TIKTOK | / architech.network EVENTS | https://linktr.ee/architech_network #housing #sustainablehousing #digitalfabrication #timberconstruction #modular #offsiteconstruction #bim #SustainableHomes #FACITHomes #ConstructionTechnology #HomeBuilding #BIM #MobileMicroFactory #SustainableConstruction #UKHousing #PodcastInterview

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Applications for the NYC AI Engineer Summit, focused on Agents at Work, are open!When we first started Latent Space, in the lightning round we'd always ask guests: “What's your favorite AI product?”. The majority would say Midjourney. The simple UI of prompt → very aesthetic image turned it into a $300M+ ARR bootstrapped business as it rode the first wave of AI image generation.In open source land, StableDiffusion was congregating around AUTOMATIC1111 as the de-facto web UI. Unlike Midjourney, which offered some flags but was mostly prompt-driven, A1111 let users play with a lot more parameters, supported additional modalities like img2img, and allowed users to load in custom models. If you're interested in some of the SD history, you can look at our episodes with Lexica, Replicate, and Playground.One of the people involved with that community was comfyanonymous, who was also part of the Stability team in 2023, decided to build an alternative called ComfyUI, now one of the fastest growing open source projects in generative images, and is now the preferred partner for folks like Black Forest Labs's Flux Tools on Day 1. The idea behind it was simple: “Everyone is trying to make easy to use interfaces. Let me try to make a powerful interface that's not easy to use.”Unlike its predecessors, ComfyUI does not have an input text box. Everything is based around the idea of a node: there's a text input node, a CLIP node, a checkpoint loader node, a KSampler node, a VAE node, etc. While daunting for simple image generation, the tool is amazing for more complex workflows since you can break down every step of the process, and then chain many of them together rather than manually switching between tools. You can also re-start execution halfway instead of from the beginning, which can save a lot of time when using larger models.To give you an idea of some of the new use cases that this type of UI enables:* Sketch something → Generate an image with SD from sketch → feed it into SD Video to animate* Generate an image of an object → Turn into a 3D asset → Feed into interactive experiences* Input audio → Generate audio-reactive videosTheir Examples page also includes some of the more common use cases like AnimateDiff, etc. They recently launched the Comfy Registry, an online library of different nodes that users can pull from rather than having to build everything from scratch. The project has >60,000 Github stars, and as the community grows, some of the projects that people build have gotten quite complex:The most interesting thing about Comfy is that it's not a UI, it's a runtime. You can build full applications on top of image models simply by using Comfy. You can expose Comfy workflows as an endpoint and chain them together just like you chain a single node. We're seeing the rise of AI Engineering applied to art.Major Tom's ComfyUI Resources from the Latent Space DiscordMajor shoutouts to Major Tom on the LS Discord who is a image generation expert, who offered these pointers:* “best thing about comfy is the fact it supports almost immediately every new thing that comes out - unlike A1111 or forge, which still don't support flux cnet for instance. It will be perfect tool when conflicting nodes will be resolved”* AP Workflows from Alessandro Perili are a nice example of an all-in-one train-evaluate-generate system built atop Comfy* ComfyUI YouTubers to learn from:* @sebastiankamph* @NerdyRodent* @OlivioSarikas* @sedetweiler* @pixaroma* ComfyUI Nodes to check out:* https://github.com/kijai/ComfyUI-IC-Light* https://github.com/MrForExample/ComfyUI-3D-Pack* https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait* https://github.com/pydn/ComfyUI-to-Python-Extension* https://github.com/THtianhao/ComfyUI-Portrait-Maker* https://github.com/ssitu/ComfyUI_NestedNodeBuilder* https://github.com/longgui0318/comfyui-magic-clothing* https://github.com/atmaranto/ComfyUI-SaveAsScript* https://github.com/ZHO-ZHO-ZHO/ComfyUI-InstantID* https://github.com/AIFSH/ComfyUI-FishSpeech* https://github.com/coolzilj/ComfyUI-Photopea* https://github.com/lks-ai/anynode* Sarav: https://www.youtube.com/@mickmumpitz/videos ( applied stuff )* Sarav: https://www.youtube.com/@latentvision (technical, but infrequent)* look for comfyui node for https://github.com/magic-quill/MagicQuill* “Comfy for Video” resources* Kijai (https://github.com/kijai) pushing out support for Mochi, CogVideoX, AnimateDif, LivePortrait etc* Comfyui node support like LTX https://github.com/Lightricks/ComfyUI-LTXVideo , and HunyuanVideo* FloraFauna AI* Communities: https://www.reddit.com/r/StableDiffusion/, https://www.reddit.com/r/comfyui/Full YouTube EpisodeAs usual, you can find the full video episode on our YouTube (and don't forget to like and subscribe!)Timestamps* 00:00:04 Introduction of hosts and anonymous guest* 00:00:35 Origins of Comfy UI and early Stable Diffusion landscape* 00:02:58 Comfy's background and development of high-res fix* 00:05:37 Area conditioning and compositing in image generation* 00:07:20 Discussion on different AI image models (SD, Flux, etc.)* 00:11:10 Closed source model APIs and community discussions on SD versions* 00:14:41 LoRAs and textual inversion in image generation* 00:18:43 Evaluation methods in the Comfy community* 00:20:05 CLIP models and text encoders in image generation* 00:23:05 Prompt weighting and negative prompting* 00:26:22 Comfy UI's unique features and design choices* 00:31:00 Memory management in Comfy UI* 00:33:50 GPU market share and compatibility issues* 00:35:40 Node design and parameter settings in Comfy UI* 00:38:44 Custom nodes and community contributions* 00:41:40 Video generation models and capabilities* 00:44:47 Comfy UI's development timeline and rise to popularity* 00:48:13 Current state of Comfy UI team and future plans* 00:50:11 Discussion on other Comfy startups and potential text generation supportTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.swyx [00:00:12]: Hey everyone, we are in the Chroma Studio again, but with our first ever anonymous guest, Comfy Anonymous, welcome.Comfy [00:00:19]: Hello.swyx [00:00:21]: I feel like that's your full name, you just go by Comfy, right?Comfy [00:00:24]: Yeah, well, a lot of people just call me Comfy, even when they know my real name. Hey, Comfy.Alessio [00:00:32]: Swyx is the same. You know, not a lot of people call you Shawn.swyx [00:00:35]: Yeah, you have a professional name, right, that people know you by, and then you have a legal name. Yeah, it's fine. How do I phrase this? I think people who are in the know, know that Comfy is like the tool for image generation and now other multimodality stuff. I would say that when I first got started with Stable Diffusion, the star of the show was Automatic 111, right? And I actually looked back at my notes from 2022-ish, like Comfy was already getting started back then, but it was kind of like the up and comer, and your main feature was the flowchart. Can you just kind of rewind to that moment, that year and like, you know, how you looked at the landscape there and decided to start Comfy?Comfy [00:01:10]: Yeah, I discovered Stable Diffusion in 2022, in October 2022. And, well, I kind of started playing around with it. Yes, I, and back then I was using Automatic, which was what everyone was using back then. And so I started with that because I had, it was when I started, I had no idea like how Diffusion works. I didn't know how Diffusion models work, how any of this works, so.swyx [00:01:36]: Oh, yeah. What was your prior background as an engineer?Comfy [00:01:39]: Just a software engineer. Yeah. Boring software engineer.swyx [00:01:44]: But like any, any image stuff, any orchestration, distributed systems, GPUs?Comfy [00:01:49]: No, I was doing basically nothing interesting. Crud, web development? Yeah, a lot of web development, just, yeah, some basic, maybe some basic like automation stuff. Okay. Just. Yeah, no, like, no big companies or anything.swyx [00:02:08]: Yeah, but like already some interest in automations, probably a lot of Python.Comfy [00:02:12]: Yeah, yeah, of course, Python. But I wasn't actually used to like the Node graph interface before I started Comfy UI. It was just, I just thought it was like, oh, like, what's the best way to represent the Diffusion process in the user interface? And then like, oh, well. Well, like, naturally, oh, this is the best way I've found. And this was like with the Node interface. So how I got started was, yeah, so basic October 2022, just like I hadn't written a line of PyTorch before that. So it's completely new. What happened was I kind of got addicted to generating images.Alessio [00:02:58]: As we all did. Yeah.Comfy [00:03:00]: And then I started. I started experimenting with like the high-res fixed in auto, which was for those that don't know, the high-res fix is just since the Diffusion models back then could only generate that low-resolution. So what you would do, you would generate low-resolution image, then upscale, then refine it again. And that was kind of the hack to generate high-resolution images. I really liked generating. Like higher resolution images. So I was experimenting with that. And so I modified the code a bit. Okay. What happens if I, if I use different samplers on the second pass, I was edited the code of auto. So what happens if I use a different sampler? What happens if I use a different, like a different settings, different number of steps? And because back then the. The high-res fix was very basic, just, so. Yeah.swyx [00:04:05]: Now there's a whole library of just, uh, the upsamplers.Comfy [00:04:08]: I think, I think they added a bunch of, uh, of options to the high-res fix since, uh, since, since then. But before that was just so basic. So I wanted to go further. I wanted to try it. What happens if I use a different model for the second, the second pass? And then, well, then the auto code base was, wasn't good enough for. Like, it would have been, uh, harder to implement that in the auto interface than to create my own interface. So that's when I decided to create my own. And you were doing that mostly on your own when you started, or did you already have kind of like a subgroup of people? No, I was, uh, on my own because, because it was just me experimenting with stuff. So yeah, that was it. Then, so I started writing the code January one. 2023, and then I released the first version on GitHub, January 16th, 2023. That's how things got started.Alessio [00:05:11]: And what's, what's the name? Comfy UI right away or? Yeah.Comfy [00:05:14]: Comfy UI. The reason the name, my name is Comfy is people thought my pictures were comfy, so I just, uh, just named it, uh, uh, it's my Comfy UI. So yeah, that's, uh,swyx [00:05:27]: Is there a particular segment of the community that you targeted as users? Like more intensive workflow artists, you know, compared to the automatic crowd or, you know,Comfy [00:05:37]: This was my way of like experimenting with, uh, with new things, like the high risk fixed thing I mentioned, which was like in Comfy, the first thing you could easily do was just chain different models together. And then one of the first things, I think the first times it got a bit of popularity was when I started experimenting with the different, like applying. Prompts to different areas of the image. Yeah. I called it area conditioning, posted it on Reddit and it got a bunch of upvotes. So I think that's when, like, when people first learned of Comfy UI.swyx [00:06:17]: Is that mostly like fixing hands?Comfy [00:06:19]: Uh, no, no, no. That was just, uh, like, let's say, well, it was very, well, it still is kind of difficult to like, let's say you want a mountain, you have an image and then, okay. I'm like, okay. I want the mountain here and I want the, like a, a Fox here.swyx [00:06:37]: Yeah. So compositing the image. Yeah.Comfy [00:06:40]: My way was very easy. It was just like, oh, when you run the diffusion process, you kind of generate, okay. You do pass one pass through the diffusion, every step you do one pass. Okay. This place of the image with this brand, this space, place of the image with the other prop. And then. The entire image with another prop and then just average everything together, every step, and that was, uh, area composition, which I call it. And then, then a month later, there was a paper that came out called multi diffusion, which was the same thing, but yeah, that's, uh,Alessio [00:07:20]: could you do area composition with different models or because you're averaging out, you kind of need the same model.Comfy [00:07:26]: Could do it with, but yeah, I hadn't implemented it. For different models, but, uh, you, you can do it with, uh, with different models if you want, as long as the models share the same latent space, like we, we're supposed to ring a bell every time someone says, yeah, like, for example, you couldn't use like Excel and SD 1.5, because those have a different latent space, but like, uh, yeah, like SD 1.5 models, different ones. You could, you could do that.swyx [00:07:59]: There's some models that try to work in pixel space, right?Comfy [00:08:03]: Yeah. They're very slow. Of course. That's the problem. That that's the, the reason why stable diffusion actually became like popular, like, cause was because of the latent space.swyx [00:08:14]: Small and yeah. Because it used to be latent diffusion models and then they trained it up.Comfy [00:08:19]: Yeah. Cause a pixel pixel diffusion models are just too slow. So. Yeah.swyx [00:08:25]: Have you ever tried to talk to like, like stability, the latent diffusion guys, like, you know, Robin Rombach, that, that crew. Yeah.Comfy [00:08:32]: Well, I used to work at stability.swyx [00:08:34]: Oh, I actually didn't know. Yeah.Comfy [00:08:35]: I used to work at stability. I got, uh, I got hired, uh, in June, 2023.swyx [00:08:42]: Ah, that's the part of the story I didn't know about. Okay. Yeah.Comfy [00:08:46]: So the, the reason I was hired is because they were doing, uh, SDXL at the time and they were basically SDXL. I don't know if you remember it was a base model and then a refiner model. Basically they wanted to experiment, like chaining them together. And then, uh, they saw, oh, right. Oh, this, we can use this to do that. Well, let's hire that guy.swyx [00:09:10]: But they didn't, they didn't pursue it for like SD3. What do you mean? Like the SDXL approach. Yeah.Comfy [00:09:16]: The reason for that approach was because basically they had two models and then they wanted to publish both of them. So they, they trained one on. Lower time steps, which was the refiner model. And then they, the first one was trained normally. And then they went during their test, they realized, oh, like if we string these models together are like quality increases. So let's publish that. It worked. Yeah. But like right now, I don't think many people actually use the refiner anymore, even though it is actually a full diffusion model. Like you can use it on its own. And it's going to generate images. I don't think anyone, people have mostly forgotten about it. But, uh.Alessio [00:10:05]: Can we talk about models a little bit? So stable diffusion, obviously is the most known. I know flux has gotten a lot of traction. Are there any underrated models that people should use more or what's the state of the union?Comfy [00:10:17]: Well, the, the latest, uh, state of the art, at least, yeah, for images there's, uh, yeah, there's flux. There's also SD3.5. SD3.5 is two models. There's a, there's a small one, 2.5B and there's the bigger one, 8B. So it's, it's smaller than flux. So, and it's more, uh, creative in a way, but flux, yeah, flux is the best. People should give SD3.5 a try cause it's, uh, it's different. I won't say it's better. Well, it's better for some like specific use cases. Right. If you want some to make something more like creative, maybe SD3.5. If you want to make something more consistent and flux is probably better.swyx [00:11:06]: Do you ever consider supporting the closed source model APIs?Comfy [00:11:10]: Uh, well, they, we do support them as custom nodes. We actually have some, uh, official custom nodes from, uh, different. Ideogram.swyx [00:11:20]: Yeah. I guess DALI would have one. Yeah.Comfy [00:11:23]: That's, uh, it's just not, I'm not the person that handles that. Sure.swyx [00:11:28]: Sure. Quick question on, on SD. There's a lot of community discussion about the transition from SD1.5 to SD2 and then SD2 to SD3. People still like, you know, very loyal to the previous generations of SDs?Comfy [00:11:41]: Uh, yeah. SD1.5 then still has a lot of, a lot of users.swyx [00:11:46]: The last based model.Comfy [00:11:49]: Yeah. Then SD2 was mostly ignored. It wasn't, uh, it wasn't a big enough improvement over the previous one. Okay.swyx [00:11:58]: So SD1.5, SD3, flux and whatever else. SDXL. SDXL.Comfy [00:12:03]: That's the main one. Stable cascade. Stable cascade. That was a good model. But, uh, that's, uh, the problem with that one is, uh, it got, uh, like SD3 was announced one week after. Yeah.swyx [00:12:16]: It was like a weird release. Uh, what was it like inside of stability actually? I mean, statute of limitations. Yeah. The statute of limitations expired. You know, management has moved. So it's easier to talk about now. Yeah.Comfy [00:12:27]: And inside stability, actually that model was ready, uh, like three months before, but it got, uh, stuck in, uh, red teaming. So basically the product, if that model had released or was supposed to be released by the authors, then it would probably have gotten very popular since it's a, it's a step up from SDXL. But it got all of its momentum stolen. It got stolen by the SD3 announcement. So people kind of didn't develop anything on top of it, even though it's, uh, yeah. It was a good model, at least, uh, completely mostly ignored for some reason. Likeswyx [00:13:07]: I think the naming as well matters. It seemed like a branch off of the main, main tree of development. Yeah.Comfy [00:13:15]: Well, it was different researchers that did it. Yeah. Yeah. Very like, uh, good model. Like it's the Worcestershire authors. I don't know if I'm pronouncing it correctly. Yeah. Yeah. Yeah.swyx [00:13:28]: I actually met them in Vienna. Yeah.Comfy [00:13:30]: They worked at stability for a bit and they left right after the Cascade release.swyx [00:13:35]: This is Dustin, right? No. Uh, Dustin's SD3. Yeah.Comfy [00:13:38]: Dustin is a SD3 SDXL. That's, uh, Pablo and Dome. I think I'm pronouncing his name correctly. Yeah. Yeah. Yeah. Yeah. That's very good.swyx [00:13:51]: It seems like the community is very, they move very quickly. Yeah. Like when there's a new model out, they just drop whatever the current one is. And they just all move wholesale over. Like they don't really stay to explore the full capabilities. Like if, if the stable cascade was that good, they would have AB tested a bit more. Instead they're like, okay, SD3 is out. Let's go. You know?Comfy [00:14:11]: Well, I find the opposite actually. The community doesn't like, they only jump on a new model when there's a significant improvement. Like if there's a, only like a incremental improvement, which is what, uh, most of these models are going to have, especially if you, cause, uh, stay the same parameter count. Yeah. Like you're not going to get a massive improvement, uh, into like, unless there's something big that, that changes. So, uh. Yeah.swyx [00:14:41]: And how are they evaluating these improvements? Like, um, because there's, it's a whole chain of, you know, comfy workflows. Yeah. How does, how does one part of the chain actually affect the whole process?Comfy [00:14:52]: Are you talking on the model side specific?swyx [00:14:54]: Model specific, right? But like once you have your whole workflow based on a model, it's very hard to move.Comfy [00:15:01]: Uh, not, well, not really. Well, it depends on your, uh, depends on their specific kind of the workflow. Yeah.swyx [00:15:09]: So I do a lot of like text and image. Yeah.Comfy [00:15:12]: When you do change, like most workflows are kind of going to be complete. Yeah. It's just like, you might have to completely change your prompt completely change. Okay.swyx [00:15:24]: Well, I mean, then maybe the question is really about evals. Like what does the comfy community do for evals? Just, you know,Comfy [00:15:31]: Well, that they don't really do that. It's more like, oh, I think this image is nice. So that's, uh,swyx [00:15:38]: They just subscribe to Fofr AI and just see like, you know, what Fofr is doing. Yeah.Comfy [00:15:43]: Well, they just, they just generate like it. Like, I don't see anyone really doing it. Like, uh, at least on the comfy side, comfy users, they, it's more like, oh, generate images and see, oh, this one's nice. It's like, yeah, it's not, uh, like the, the more, uh, like, uh, scientific, uh, like, uh, like checking that's more on specifically on like model side. If, uh, yeah, but there is a lot of, uh, vibes also, cause it is a like, uh, artistic, uh, you can create a very good model that doesn't generate nice images. Cause most images on the internet are ugly. So if you, if that's like, if you just, oh, I have the best model at 10th giant, it's super smart. I created on all the, like I've trained on just all the images on the internet. The images are not going to look good. So yeah.Alessio [00:16:42]: Yeah.Comfy [00:16:43]: They're going to be very consistent. But yeah. People like, it's not going to be like the, the look that people are going to be expecting from, uh, from a model. So. Yeah.swyx [00:16:54]: Can we talk about LoRa's? Cause we thought we talked about models then like the next step is probably LoRa's. Before, I actually, I'm kind of curious how LoRa's entered the tool set of the image community because the LoRa paper was 2021. And then like, there was like other methods like textual inversion that was popular at the early SD stage. Yeah.Comfy [00:17:13]: I can't even explain the difference between that. Yeah. Textual inversions. That's basically what you're doing is you're, you're training a, cause well, yeah. Stable diffusion. You have the diffusion model, you have text encoder. So basically what you're doing is training a vector that you're going to pass to the text encoder. It's basically you're training a new word. Yeah.swyx [00:17:37]: It's a little bit like representation engineering now. Yeah.Comfy [00:17:40]: Yeah. Basically. Yeah. You're just, so yeah, if you know how like the text encoder works, basically you have, you take your, your words of your product, you convert those into tokens with the tokenizer and those are converted into vectors. Basically. Yeah. Each token represents a different vector. So each word presents a vector. And those, depending on your words, that's the list of vectors that get passed to the text encoder, which is just. Yeah. Yeah. I'm just a stack of, of attention. Like basically it's a very close to LLM architecture. Yeah. Yeah. So basically what you're doing is just training a new vector. We're saying, well, I have all these images and I want to know which word does that represent? And it's going to get like, you train this vector and then, and then when you use this vector, it hopefully generates. Like something similar to your images. Yeah.swyx [00:18:43]: I would say it's like surprisingly sample efficient in picking up the concept that you're trying to train it on. Yeah.Comfy [00:18:48]: Well, people have kind of stopped doing that even though back as like when I was at Stability, we, we actually did train internally some like textual versions on like T5 XXL actually worked pretty well. But for some reason, yeah, people don't use them. And also they might also work like, like, yeah, this is something and probably have to test, but maybe if you train a textual version, like on T5 XXL, it might also work with all the other models that use T5 XXL because same thing with like, like the textual inversions that, that were trained for SD 1.5, they also kind of work on SDXL because SDXL has the, has two text encoders. And one of them is the same as the, as the SD 1.5 CLIP-L. So those, they actually would, they don't work as strongly because they're only applied to one of the text encoders. But, and the same thing for SD3. SD3 has three text encoders. So it works. It's still, you can still use your textual version SD 1.5 on SD3, but it's just a lot weaker because now there's three text encoders. So it gets even more diluted. Yeah.swyx [00:20:05]: Do people experiment a lot on, just on the CLIP side, there's like Siglip, there's Blip, like do people experiment a lot on those?Comfy [00:20:12]: You can't really replace. Yeah.swyx [00:20:14]: Because they're trained together, right? Yeah.Comfy [00:20:15]: They're trained together. So you can't like, well, what I've seen people experimenting with is a long CLIP. So basically someone fine tuned the CLIP model to accept longer prompts.swyx [00:20:27]: Oh, it's kind of like long context fine tuning. Yeah.Comfy [00:20:31]: So, so like it's, it's actually supported in Core Comfy.swyx [00:20:35]: How long is long?Comfy [00:20:36]: Regular CLIP is 77 tokens. Yeah. Long CLIP is 256. Okay. So, but the hack that like you've, if you use stable diffusion 1.5, you've probably noticed, oh, it still works if I, if I use long prompts, prompts longer than 77 words. Well, that's because the hack is to just, well, you split, you split it up in chugs of 77, your whole big prompt. Let's say you, you give it like the massive text, like the Bible or something, and it would split it up in chugs of 77 and then just pass each one through the CLIP and then just cut anything together at the end. It's not ideal, but it actually works.swyx [00:21:26]: Like the positioning of the words really, really matters then, right? Like this is why order matters in prompts. Yeah.Comfy [00:21:33]: Yeah. Like it, it works, but it's, it's not ideal, but it's what people expect. Like if, if someone gives a huge prompt, they expect at least some of the concepts at the end to be like present in the image. But usually when they give long prompts, they, they don't, they like, they don't expect like detail, I think. So that's why it works very well.swyx [00:21:58]: And while we're on this topic, prompts waiting, negative comments. Negative prompting all, all sort of similar part of this layer of the stack. Yeah.Comfy [00:22:05]: The, the hack for that, which works on CLIP, like it, basically it's just for SD 1.5, well, for SD 1.5, the prompt waiting works well because CLIP L is a, is not a very deep model. So you have a very high correlation between, you have the input token, the index of the input token vector. And the output token, they're very, the concepts are very close, closely linked. So that means if you interpolate the vector from what, well, the, the way Comfy UI does it is it has, okay, you have the vector, you have an empty prompt. So you have a, a chunk, like a CLIP output for the empty prompt, and then you have the one for your prompt. And then it interpolates from that, depending on your prompt. Yeah.Comfy [00:23:07]: So that's how it, how it does prompt waiting. But this stops working the deeper your text encoder is. So on T5X itself, it doesn't work at all. So. Wow.swyx [00:23:20]: Is that a problem for people? I mean, cause I'm used to just move, moving up numbers. Probably not. Yeah.Comfy [00:23:25]: Well.swyx [00:23:26]: So you just use words to describe, right? Cause it's a bigger language model. Yeah.Comfy [00:23:30]: Yeah. So. Yeah. So honestly it might be good, but I haven't seen many complaints on Flux that it's not working. So, cause I guess people can sort of get around it with, with language. So. Yeah.swyx [00:23:46]: Yeah. And then coming back to LoRa's, now the, the popular way to, to customize models is LoRa's. And I saw you also support Locon and LoHa, which I've never heard of before.Comfy [00:23:56]: There's a bunch of, cause what, what the LoRa is essentially is. Instead of like, okay, you have your, your model and then you want to fine tune it. So instead of like, what you could do is you could fine tune the entire thing, but that's a bit heavy. So to speed things up and make things less heavy, what you can do is just fine tune some smaller weights, like basically two, two matrices that when you multiply like two low rank matrices and when you multiply them together, gives a, represents a difference between trained weights and your base weights. So by training those two smaller matrices, that's a lot less heavy. Yeah.Alessio [00:24:45]: And they're portable. So you're going to share them. Yeah. It's like easier. And also smaller.Comfy [00:24:49]: Yeah. That's the, how LoRa's work. So basically, so when, when inferencing you, you get an inference with them pretty efficiently, like how ComputeWrite does it. It just, when you use a LoRa, it just applies it straight on the weights so that there's only a small delay at the base, like before the sampling to when it applies the weights and then it just same speed as, as before. So for, for inference, it's, it's not that bad, but, and then you have, so basically all the LoRa types like LoHa, LoCon, everything, that's just different ways of representing that like. Basically, you can call it kind of like compression, even though it's not really compression, it's just different ways of represented, like just, okay, I want to train a different on the difference on the weights. What's the best way to represent that difference? There's the basic LoRa, which is just, oh, let's multiply these two matrices together. And then there's all the other ones, which are all different algorithms. So. Yeah.Alessio [00:25:57]: So let's talk about LoRa. Let's talk about what comfy UI actually is. I think most people have heard of it. Some people might've seen screenshots. I think fewer people have built very complex workflows. So when you started, automatic was like the super simple way. What were some of the choices that you made? So the node workflow, is there anything else that stands out as like, this was like a unique take on how to do image generation workflows?Comfy [00:26:22]: Well, I feel like, yeah, back then everyone was trying to make like easy to use interface. Yeah. So I'm like, well, everyone's trying to make an easy to use interface.swyx [00:26:32]: Let's make a hard to use interface.Comfy [00:26:37]: Like, so like, I like, I don't need to do that, everyone else doing it. So let me try something like, let me try to make a powerful interface that's not easy to use. So.swyx [00:26:52]: So like, yeah, there's a sort of node execution engine. Yeah. Yeah. And it actually lists, it has this really good list of features of things you prioritize, right? Like let me see, like sort of re-executing from, from any parts of the workflow that was changed, asynchronous queue system, smart memory management, like all this seems like a lot of engineering that. Yeah.Comfy [00:27:12]: There's a lot of engineering in the back end to make things, cause I was always focused on making things work locally very well. Cause that's cause I was using it locally. So everything. So there's a lot of, a lot of thought and working by getting everything to run as well as possible. So yeah. ConfUI is actually more of a back end, at least, well, not all the front ends getting a lot more development, but, but before, before it was, I was pretty much only focused on the backend. Yeah.swyx [00:27:50]: So v0.1 was only August this year. Yeah.Comfy [00:27:54]: With the new front end. Before there was no versioning. So yeah. Yeah. Yeah.swyx [00:27:57]: And so what was the big rewrite for the 0.1 and then the 1.0?Comfy [00:28:02]: Well, that's more on the front end side. That's cause before that it was just like the UI, what, cause when I first wrote it, I just, I said, okay, how can I make, like, I can do web development, but I don't like doing it. Like what's the easiest way I can slap a node interface on this. And then I found this library. Yeah. Like JavaScript library.swyx [00:28:26]: Live graph?Comfy [00:28:27]: Live graph.swyx [00:28:28]: Usually people will go for like react flow for like a flow builder. Yeah.Comfy [00:28:31]: But that seems like too complicated. So I didn't really want to spend time like developing the front end. So I'm like, well, oh, light graph. This has the whole node interface. So, okay. Let me just plug that into, to my backend.swyx [00:28:49]: I feel like if Streamlit or Gradio offered something that you would have used Streamlit or Gradio cause it's Python. Yeah.Comfy [00:28:54]: Yeah. Yeah. Yeah.Comfy [00:29:00]: Yeah.Comfy [00:29:14]: Yeah. logic and your backend logic and just sticks them together.swyx [00:29:20]: It's supposed to be easy for you guys. If you're a Python main, you know, I'm a JS main, right? Okay. If you're a Python main, it's supposed to be easy.Comfy [00:29:26]: Yeah, it's easy, but it makes your whole software a huge mess.swyx [00:29:30]: I see, I see. So you're mixing concerns instead of separating concerns?Comfy [00:29:34]: Well, it's because... Like frontend and backend. Frontend and backend should be well separated with a defined API. Like that's how you're supposed to do it. Smart people disagree. It just sticks everything together. It makes it easy to like a huge mess. And also it's, there's a lot of issues with Gradio. Like it's very good if all you want to do is just get like slap a quick interface on your, like to show off your ML project. Like that's what it's made for. Yeah. Like there's no problem using it. Like, oh, I have my, I have my code. I just wanted a quick interface on it. That's perfect. Like use Gradio. But if you want to make something that's like a real, like real software that will last a long time and will be easy to maintain, then I would avoid it. Yeah.swyx [00:30:32]: So your criticism is Streamlit and Gradio are the same. I mean, those are the same criticisms.Comfy [00:30:37]: Yeah, Streamlit I haven't used as much. Yeah, I just looked a bit.swyx [00:30:43]: Similar philosophy.Comfy [00:30:44]: Yeah, it's similar. It's just, it just seems to me like, okay, for quick, like AI demos, it's perfect.swyx [00:30:51]: Yeah. Going back to like the core tech, like asynchronous queues, slow re-execution, smart memory management, you know, anything that you were very proud of or was very hard to figure out?Comfy [00:31:00]: Yeah. The thing that's the biggest pain in the ass is probably the memory management. Yeah.swyx [00:31:05]: Were you just paging models in and out or? Yeah.Comfy [00:31:08]: Before it was just, okay, load the model, completely unload it. Then, okay, that, that works well when you, your model are small, but if your models are big and it takes sort of like, let's say someone has a, like a, a 4090, and the model size is 10 gigabytes, that can take a few seconds to like load and load, load and load, so you want to try to keep things like in memory, in the GPU memory as much as possible. What Comfy UI does right now is it. It tries to like estimate, okay, like, okay, you're going to sample this model, it's going to take probably this amount of memory, let's remove the models, like this amount of memory that's been loaded on the GPU and then just execute it. But so there's a fine line between just because try to remove the least amount of models that are already loaded. Because as fans, like Windows drivers, and one other problem is the NVIDIA driver on Windows by default, because there's a way to, there's an option to disable that feature, but by default it, like, if you start loading, you can overflow your GPU memory and then it's, the driver's going to automatically start paging to RAM. But the problem with that is it's, it makes everything extremely slow. So when you see people complaining, oh, this model, it works, but oh, s**t, it starts slowing down a lot, that's probably what's happening. So it's basically you have to just try to get, use as much memory as possible, but not too much, or else things start slowing down, or people get out of memory, and then just find, try to find that line where, oh, like the driver on Windows starts paging and stuff. Yeah. And the problem with PyTorch is it's, it's high levels, don't have that much fine-grained control over, like, specific memory stuff, so kind of have to leave, like, the memory freeing to, to Python and PyTorch, which is, can be annoying sometimes.swyx [00:33:32]: So, you know, I think one thing is, as a maintainer of this project, like, you're designing for a very wide surface area of compute, like, you even support CPUs.Comfy [00:33:42]: Yeah, well, that's... That's just, for PyTorch, PyTorch supports CPUs, so, yeah, it's just, that's not, that's not hard to support.swyx [00:33:50]: First of all, is there a market share estimate, like, is it, like, 70% NVIDIA, like, 30% AMD, and then, like, miscellaneous on Apple, Silicon, or whatever?Comfy [00:33:59]: For Comfy? Yeah. Yeah, and, yeah, I don't know the market share.swyx [00:34:03]: Can you guess?Comfy [00:34:04]: I think it's mostly NVIDIA. Right. Because, because AMD, the problem, like, AMD works horribly on Windows. Like, on Linux, it works fine. It's, it's lower than the price equivalent NVIDIA GPU, but it works, like, you can use it, you generate images, everything works. On Linux, on Windows, you might have a hard time, so, that's the problem, and most people, I think most people who bought AMD probably use Windows. They probably aren't going to switch to Linux, so... Yeah. So, until AMD actually, like, ports their, like, raw cam to, to Windows properly, and then there's actually PyTorch, I think they're, they're doing that, they're in the process of doing that, but, until they get it, they get a good, like, PyTorch raw cam build that works on Windows, it's, like, they're going to have a hard time. Yeah.Alessio [00:35:06]: We got to get George on it. Yeah. Well, he's trying to get Lisa Su to do it, but... Let's talk a bit about, like, the node design. So, unlike all the other text-to-image, you have a very, like, deep, so you have, like, a separate node for, like, clip and code, you have a separate node for, like, the case sampler, you have, like, all these nodes. Going back to, like, the making it easy versus making it hard, but, like, how much do people actually play with all the settings, you know? Kind of, like, how do you guide people to, like, hey, this is actually going to be very impactful versus this is maybe, like, less impactful, but we still want to expose it to you?Comfy [00:35:40]: Well, I try to... I try to expose, like, I try to expose everything or, but, yeah, at least for the, but for things, like, for example, for the samplers, like, there's, like, yeah, four different sampler nodes, which go in easiest to most advanced. So, yeah, if you go, like, the easy node, the regular sampler node, that's, you have just the basic settings. But if you use, like, the sampler advanced... If you use, like, the custom advanced node, that, that one you can actually, you'll see you have, like, different nodes.Alessio [00:36:19]: I'm looking it up now. Yeah. What are, like, the most impactful parameters that you use? So, it's, like, you know, you can have more, but, like, which ones, like, really make a difference?Comfy [00:36:30]: Yeah, they all do. They all have their own, like, they all, like, for example, yeah, steps. Usually you want steps, you want them to be as low as possible. But you want, if you're optimizing your workflow, you want to, you lower the steps until, like, the images start deteriorating too much. Because that, yeah, that's the number of steps you're running the diffusion process. So, if you want things to be faster, lower is better. But, yeah, CFG, that's more, you can kind of see that as the contrast of the image. Like, if your image looks too bursty. Then you can lower the CFG. So, yeah, CFG, that's how, yeah, that's how strongly the, like, the negative versus positive prompt. Because when you sample a diffusion model, it's basically a negative prompt. It's just, yeah, positive prediction minus negative prediction.swyx [00:37:32]: Contrastive loss. Yeah.Comfy [00:37:34]: It's positive minus negative, and the CFG does the multiplier. Yeah. Yeah. Yeah, so.Alessio [00:37:41]: What are, like, good resources to understand what the parameters do? I think most people start with automatic, and then they move over, and it's, like, snap, CFG, sampler, name, scheduler, denoise. Read it.Comfy [00:37:53]: But, honestly, well, it's more, it's something you should, like, try out yourself. I don't know, you don't necessarily need to know how it works to, like, what it does. Because even if you know, like, CFGO, it's, like, positive minus negative prompt. Yeah. So the only thing you know at CFG is if it's 1.0, then that means the negative prompt isn't applied. It also means sampling is two times faster. But, yeah. But other than that, it's more, like, you should really just see what it does to the images yourself, and you'll probably get a more intuitive understanding of what these things do.Alessio [00:38:34]: Any other nodes or things you want to shout out? Like, I know the animate diff IP adapter. Those are, like, some of the most popular ones. Yeah. What else comes to mind?Comfy [00:38:44]: Not nodes, but there's, like, what I like is when some people, sometimes they make things that use ComfyUI as their backend. Like, there's a plugin for Krita that uses ComfyUI as its backend. So you can use, like, all the models that work in Comfy in Krita. And I think I've tried it once. But I know a lot of people use it, and it's probably really nice, so.Alessio [00:39:15]: What's the craziest node that people have built, like, the most complicated?Comfy [00:39:21]: Craziest node? Like, yeah. I know some people have made, like, video games in Comfy with, like, stuff like that. So, like, someone, like, I remember, like, yeah, last, I think it was last year, someone made, like, a, like, Wolfenstein 3D in Comfy. Of course. And then one of the inputs was, oh, you can generate a texture, and then it changes the texture in the game. So you can plug it to, like, the workflow. And there's a lot of, if you look there, there's a lot of crazy things people do, so. Yeah.Alessio [00:39:59]: And now there's, like, a node register that people can use to, like, download nodes. Yeah.Comfy [00:40:04]: Like, well, there's always been the, like, the ComfyUI manager. Yeah. But we're trying to make this more, like, I don't know, official, like, with, yeah, with the node registry. Because before the node registry, the, like, okay, how did your custom node get into ComfyUI manager? That's the guy running it who, like, every day he searched GitHub for new custom nodes and added dev annually to his custom node manager. So we're trying to make it less effortless. So we're trying to make it less effortless for him, basically. Yeah.Alessio [00:40:40]: Yeah. But I was looking, I mean, there's, like, a YouTube download node. There's, like, this is almost like, you know, a data pipeline more than, like, an image generation thing at this point. It's, like, you can get data in, you can, like, apply filters to it, you can generate data out.Comfy [00:40:54]: Yeah. You can do a lot of different things. Yeah. So I'm thinking, I think what I did is I made it easy to make custom nodes. So I think that helped a lot. I think that helped a lot for, like, the ecosystem because it is very easy to just make a node. So, yeah, a bit too easy sometimes. Then we have the issue where there's a lot of custom node packs which share similar nodes. But, well, that's, yeah, something we're trying to solve by maybe bringing some of the functionality into the core. Yeah. Yeah. Yeah.Alessio [00:41:36]: And then there's, like, video. People can do video generation. Yeah.Comfy [00:41:40]: Video, that's, well, the first video model was, like, stable video diffusion, which was last, yeah, exactly last year, I think. Like, one year ago. But that wasn't a true video model. So it was...swyx [00:41:55]: It was, like, moving images? Yeah.Comfy [00:41:57]: I generated video. What I mean by that is it's, like, it's still 2D Latents. It's basically what I'm trying to do. So what they did is they took SD2, and then they added some temporal attention to it, and then trained it on videos and all. So it's kind of, like, animated, like, same idea, basically. Why I say it's not a true video model is that you still have, like, the 2D Latents. Like, a true video model, like Mochi, for example, would have 3D Latents. Mm-hmm.Alessio [00:42:32]: Which means you can, like, move through the space, basically. It's the difference. You're not just kind of, like, reorienting. Yeah.Comfy [00:42:39]: And it's also, well, it's also because you have a temporal VAE. Mm-hmm. Also, like, Mochi has a temporal VAE that compresses on, like, the temporal direction, also. So that's something you don't have with, like, yeah, animated diff and stable video diffusion. They only, like, compress spatially, not temporally. Mm-hmm. Right. So, yeah. That's why I call that, like, true video models. There's, yeah, there's actually a few of them, but the one I've implemented in comfy is Mochi, because that seems to be the best one so far. Yeah.swyx [00:43:15]: We had AJ come and speak at the stable diffusion meetup. The other open one I think I've seen is COG video. Yeah.Comfy [00:43:21]: COG video. Yeah. That one's, yeah, it also seems decent, but, yeah. Chinese, so we don't use it. No, it's fine. It's just, yeah, I could. Yeah. It's just that there's a, it's not the only one. There's also a few others, which I.swyx [00:43:36]: The rest are, like, closed source, right? Like, Cling. Yeah.Comfy [00:43:39]: Closed source, there's a bunch of them. But I mean, open. I've seen a few of them. Like, I can't remember their names, but there's COG videos, the big, the big one. Then there's also a few of them that released at the same time. There's one that released at the same time as SSD 3.5, same day, which is why I don't remember the name.swyx [00:44:02]: We should have a release schedule so we don't conflict on each of these things. Yeah.Comfy [00:44:06]: I think SD 3.5 and Mochi released on the same day. So everything else was kind of drowned, completely drowned out. So for some reason, lots of people picked that day to release their stuff.Comfy [00:44:21]: Yeah. Which is, well, shame for those. And I think Omnijet also released the same day, which also seems interesting. Yeah. Yeah.Alessio [00:44:30]: What's Comfy? So you are Comfy. And then there's like, comfy.org. I know we do a lot of things for, like, news research and those guys also have kind of like a more open source thing going on. How do you work? Like you mentioned, you mostly work on like, the core piece of it. And then what...Comfy [00:44:47]: Maybe I should fade it in because I, yeah, I feel like maybe, yeah, I only explain part of the story. Right. Yeah. Maybe I should explain the rest. So yeah. So yeah. Basically, January, that's when the first January 2023, January 16, 2023, that's when Amphi was first released to the public. Then, yeah, did a Reddit post about the area composition thing somewhere in, I don't remember exactly, maybe end of January, beginning of February. And then someone, a YouTuber, made a video about it, like Olivio, he made a video about Amphi in March 2023. I think that's when it was a real burst of attention. And by that time, I was continuing to develop it and it was getting, people were starting to use it more, which unfortunately meant that I had first written it to do like experiments, but then my time to do experiments went down. It started going down, because people were actually starting to use it then. Like, I had to, and I said, well, yeah, time to add all these features and stuff. Yeah, and then I got hired by Stability June, 2023. Then I made, basically, yeah, they hired me because they wanted the SD-XL. So I got the SD-XL working very well withітhe UI, because they were experimenting withámphi.house.com. Actually, the SDX, how the SDXL released worked is they released, for some reason, like they released the code first, but they didn't release the model checkpoint. So they released the code. And then, well, since the research was related to code, I released the code in Compute 2. And then the checkpoints were basically early access. People had to sign up and they only allowed a lot of people from edu emails. Like if you had an edu email, like they gave you access basically to the SDXL 0.9. And, well, that leaked. Right. Of course, because of course it's going to leak if you do that. Well, the only way people could easily use it was with Comfy. So, yeah, people started using. And then I fixed a few of the issues people had. So then the big 1.0 release happened. And, well, Comfy UI was the only way a lot of people could actually run it on their computers. Because it just like automatic was so like inefficient and bad that most people couldn't actually, like it just wouldn't work. Like because he did a quick implementation. So people were forced. To use Comfy UI, and that's how it became popular because people had no choice.swyx [00:47:55]: The growth hack.Comfy [00:47:56]: Yeah.swyx [00:47:56]: Yeah.Comfy [00:47:57]: Like everywhere, like people who didn't have the 4090, they had like, who had just regular GPUs, they didn't have a choice.Alessio [00:48:05]: So yeah, I got a 4070. So think of me. And so today, what's, is there like a core Comfy team or?Comfy [00:48:13]: Uh, yeah, well, right now, um, yeah, we are hiring. Okay. Actually, so right now core, like, um, the core core itself, it's, it's me. Uh, but because, uh, the reason where folks like all the focus has been mostly on the front end right now, because that's the thing that's been neglected for a long time. So, uh, so most of the focus right now is, uh, all on the front end, but we are, uh, yeah, we will soon get, uh, more people to like help me with the actual backend stuff. Yeah. So, no, I'm not going to say a hundred percent because that's why once the, once we have our V one release, which is because it'd be the package, come fee-wise with the nice interface and easy to install on windows and hopefully Mac. Uh, yeah. Yeah. Once we have that, uh, we're going to have to, lots of stuff to do on the backend side and also the front end side, but, uh.Alessio [00:49:14]: What's the release that I'm on the wait list. What's the timing?Comfy [00:49:18]: Uh, soon. Uh, soon. Yeah, I don't want to promise a release date. We do have a release date we're targeting, but I'm not sure if it's public. Yeah, and we're still going to continue doing the open source, making MPUI the best way to run stable infusion models. At least the open source side, it's going to be the best way to run models locally. But we will have a few things to make money from it, like cloud inference or that type of thing. And maybe some things for some enterprises.swyx [00:50:08]: I mean, a few questions on that. How do you feel about the other comfy startups?Comfy [00:50:11]: I mean, I think it's great. They're using your name. Yeah, well, it's better they use comfy than they use something else. Yeah, that's true. It's fine. We're going to try not to... We don't want to... We want people to use comfy. Like I said, it's better that people use comfy than something else. So as long as they use comfy, I think it helps the ecosystem. Because more people, even if they don't contribute directly, the fact that they are using comfy means that people are more likely to join the ecosystem. So, yeah.swyx [00:50:57]: And then would you ever do text?Comfy [00:50:59]: Yeah, well, you can already do text with some custom nodes. So, yeah, it's something we like. Yeah, it's something I've wanted to eventually add to core, but it's more like not a very... It's a very high priority. But because a lot of people use text for prompt enhancement and other things like that. So, yeah, it's just that my focus has always been on diffusion models. Yeah, unless some text diffusion model comes out.swyx [00:51:30]: Yeah, David Holtz is investing a lot in text diffusion.Comfy [00:51:34]: Yeah, well, if a good one comes out, then we'll probably implement it since it fits with the whole...swyx [00:51:39]: Yeah, I mean, I imagine it's going to be a close source to Midjourney. Yeah.Comfy [00:51:43]: Well, if an open one comes out, then I'll probably implement it.Alessio [00:51:54]: Cool, comfy. Thanks so much for coming on. This was fun. Bye. Get full access to Latent Space at www.latent.space/subscribe

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. The answer[00:00:32] swyx: was cursor and perplexity.[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like 3DAO, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of get people. Sombra to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled me. People, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on on the sub stack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering Neuralibs? Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything Or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, Academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan. Yeah, that it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created.[00:05:09] swyx: Layen Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, Organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. And I'm really, really thankful for John Franco, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But He was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre training has hit a wall. And data has hit a wall. So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in a better lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.[00:07:34] swyx: He said that, um, we're pre trading inside a wall, or like, we've run into a different kind of wall. And then we have, you know John Frankel, Ilya, and then Noam Brown are all saying variations of the same thing, that we have hit some kind of wall in the status quo of what pre trained, scaling large pre trained models has looked like, and we need a new thing.[00:07:54] swyx: And obviously the new thing for people is some make, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, meaning, has a very pre trained bias, meaning that the only reason for running inference at all is to test your model.[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that. OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.[00:08:31] swyx: Chinchilla paper is compute optimal training, but what is not stated in there is it's pre trained compute optimal training. And once you start caring about inference, compute optimal training, you have a different scaling law. And in a way that we did not know last year.[00:08:45] Alessio: I wonder, because John is, he's also on the side of attention is all you need.[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll[00:09:30] Alessio: publish[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.[00:09:39] swyx: But I think it's, uh, these are just good slides. And I've been, I've been looking for a sort of end of year recaps from, from people.[00:09:44] The Competitive AI Landscape[00:09:44] swyx: The field has progressed a lot. You know, I think the max ELO in 2023 on LMSys used to be 1200 for LMSys ELOs. And now everyone is at least at, uh, 1275 in their ELOs, and this is across Gemini, Chadjibuti, [00:10:00] Grok, O1.[00:10:01] swyx: ai, which with their E Large model, and Enthopic, of course. It's a very, very competitive race. There are multiple Frontier labs all racing, but there is a clear tier zero Frontier. And then there's like a tier one. It's like, I wish I had everything else. Tier zero is extremely competitive. It's effectively now three horse race between Gemini, uh, Anthropic and OpenAI.[00:10:21] swyx: I would say that people are still holding out a candle for XAI. XAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, XAI is continually snubbed because they don't work well with the benchmarking people.[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why XAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily open AI. So we have some numbers and estimates. These are from RAMP. Estimates of open AI market share in [00:11:00] December 2023.[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked. Harrison Chase on the LangChain episode, it was true. And then CLAUD 3 launched mid middle of this year. I think CLAUD 3 launched in March, CLAUD 3. 5 Sonnet was in June ish.[00:11:23] swyx: And you can start seeing the market share shift towards opening, uh, towards that topic, uh, very, very aggressively. The more recent one is Gemini. So if I scroll down a little bit, this is an even more recent dataset. So RAM's dataset ends in September 2 2. 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.[00:11:58] swyx: And so they're going after [00:12:00] the Lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2023, 2022, To going into 2024 opening has gone from nine five market share to Yeah. Reasonably somewhere between 50 to 75 market share.[00:12:29] Alessio: Yeah. I'm really curious how ramped does the attribution to the model?[00:12:32] Alessio: If it's API, because I think it's all credit card spin. . Well, but it's all, the credit card doesn't say maybe. Maybe the, maybe when they do expenses, they upload the PDF, but yeah, the, the German I think makes sense. I think that was one of my main 2024 takeaways that like. The best small model companies are the large labs, which is not something I would have thought that the open source kind of like long tail would be like the small model.[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? Like so small model here for Gemini is AB, [00:13:00] right? Uh, mini. We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the one to three B size.[00:13:11] swyx: Mm-hmm . Yeah. Maybe[00:13:12] swyx: zero, maybe 0.5 B uh, that's moon dream and that is small for you then, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe one to five B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.[00:13:43] swyx: Yeah, we can keep going. This is a slide that I smiley disagree with Sarah. She's pointing to the scale SEAL leaderboard. I think the Researchers that I talked with at NeurIPS were kind of positive on this because basically you need private test [00:14:00] sets to prevent contamination.[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama405B does well compared to Gemini and GPT 40. And I think that's good. I would say that. You know, it's good to have an open model that is that big, that does well on those metrics.[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the artificial analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node. of, uh, of H100s. Cerebras will be happy to tell you they can serve 4 or 5B on their super large chips.[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 4 or 5B really that relevant? Like, I think most people are basically saying that they only use 4 or 5B as a teacher model to distill down to something. Even Meta is doing it. So with Lama 3. [00:15:00] 3 launched, they only launched the 70B because they use 4 or 5B to distill the 70B.[00:15:03] swyx: So I don't know if like open source is keeping up. I think they're the, the open source industrial complex is very invested in telling you that the, if the gap is narrowing, I kind of disagree. I think that the gap is widening with O1. I think there are very, very smart people trying to narrow that gap and they should.[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart. And look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?[00:15:38] swyx: Probably not for O1 for a while. And it's really up to the open source guys to figure out if they can match O1 or not.[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Doc can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of computed inference too.[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama4 will be reasoning oriented. We talked with Thomas Shalom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama4. Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shi just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than 6. 6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT 5, which would have been a larger pre train. We should have a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your growth margin and like tie your. Your spend, or your sort of inference spend, accordingly. And it's just really interesting to, that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the mixed trial price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the pro chat GBT.[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is a, at least AI content creator or VC or, you know, someone who, whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use. The good thing is that actually Google lets you do a lot of stuff for free now. So like deep research. That they just launched. Uses a ton of inference and it's, it's free while it's in preview.[00:18:45] Alessio: Yeah. They need to put that in Lindy.[00:18:47] Alessio: I've been using Lindy lately. I've been a built a bunch of things once we had flow because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy voice. Yeah, I think once [00:19:00] they get advanced voice mode like capability today, still like speech to text, you can kind of tell.[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think We will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but that suffice to say that there is a lot of progress in each field.[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one. It's very hard to stay on top of SweetBench.[00:19:45] swyx: OpenHand is currently still number one. switchbench full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.[00:20:03] swyx: And so I'll highlight that talk as well.[00:20:05] Alessio: Yeah. The number six, which is the Hacken agents learn more about the environment, has been a Super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.[00:20:23] Alessio: So, yeah, there's not indexing and reg. Well, yeah, but it's more like. You can't really rag things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this like, you know, Oh, institutional knowledge. Yeah, the boring word is kind of like a business process extraction.[00:20:38] Alessio: Yeah yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, Yeah, the agents are, that most people are building are good at following instruction, but are not as good as like extracting them from you. Um, so I think that will be a big unlock just to touch quickly on the Jeff Dean thing.[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but. I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKB on the podcast before, like he's doing a lot of that with Fetterless AI.[00:21:12] swyx: Everyone is. I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feedback into ML. And I think this ties in with Lubna's talk.[00:21:45] Synthetic Data and Future Trends[00:21:45] swyx: on synthetic data where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre training side, the post training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about alpha chip. And most of us in the audience are like, we're not working on hardware, man.[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But[00:22:23] swyx: for me, it's just like bonus for late in space supporters, because I feel like they haven't been getting anything.[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since. The blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024. He gave some hints and then we have vague indications of what he saw in 2023.[00:22:54] swyx: So that was the Oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Siobhan, Zelis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.[00:23:37] Alessio: Yeah,[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.[00:24:08] Alessio: And we'll talk about the open AI reinforcement, fine tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Elia will be remembered for. And I think just maybe to clarify on like the pre training is over thing that people love to tweet. I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.[00:24:35] Alessio: So it's not necessarily pre training is over. It's kind of like What got us here won't get us there. In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine tuning is basically like, instead of fine tuning on data, you fine tune on a reward model.[00:25:09] Alessio: So it's basically like, instead of being data driven, it's like task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes, how many people fine tune, because I think this is what people run into. It's like, Oh, you can fine tune llama. And it's like, okay, where do I get the data?[00:25:27] Alessio: To fine tune it on, you know, so it's great that we're moving the thing. And then I really like he had this chart where like, you know, the brain mass and the body mass thing is basically like mammals that scaled linearly by brain and body size, and then humans kind of like broke off the slope. So it's almost like maybe the mammal slope is like the pre training slope.[00:25:46] Alessio: And then the post training slope is like the, the human one.[00:25:49] swyx: Yeah. I wonder what the. I mean, we'll know in 10 years, but I wonder what the y axis is for, for Ilya's SSI. We'll try to get them on.[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here. Yeah, and then he had, you know, what comes next, like agent, synthetic data, inference, compute, I thought all of that was like that.[00:26:05] Alessio: I don't[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.[00:26:07] Alessio: Yeah. Any other new reps? Highlights?[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, little nice paper. Yeah, that was really[00:26:20] swyx: nice.[00:26:20] swyx: Uh, of, uh, of, like, all the, he's, she called it must read papers of 2024.[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations And so, uh, I thought it was really super nice that we got there.[00:26:38] Alessio: Should we do a late in space book for each year? Uh, I thought about it. For each year we should.[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organist, Jamie. You need to[00:26:52] swyx: pull up more things. One thing I saw that, uh, Okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.[00:27:06] swyx: This is secret collusion among AI agents, multi agent deception via steganography. I tried to go to NeurIPS in order to find these kinds of papers because the real reason Like NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.[00:27:22] swyx: And then also the people who go and end up crowding around the most popular papers, which you already know and already read them before you showed up to NeurIPS. So the only reason you go there is to talk to the paper authors, but there's like something like 10, 000 other. All these papers out there that, you know, are just people's work that they, that they did on the air and they failed to get attention for one reason or another.[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a deep mind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.[00:27:57] swyx: If you Pick that out, you know, and the code sends a [00:28:00] different message than that. But something I've always emphasized is to LLMs, we read left to right. LLMs can read up, down, sideways, you know, in random character order. And it's the same to them as it is to us. So if we were ever to get You know, self motivated, underlined LLMs that we're trying to collaborate to take over the planet.[00:28:19] swyx: This would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for Cypher encoding, GPT 2, Lama 2, mixed trial, GPT 3. 5, zero capabilities, and sudden 4.[00:28:40] swyx: And this is the kind of Jason Wei type emergence properties that people kind of look for. I think what made this paper stand out as well, so he developed the benchmark for steganography collusion, and he also focused on shelling point collusion, which is very low coordination. For agreeing on a decoding encoding format, you kind of need to have some [00:29:00] agreement on that.[00:29:00] swyx: But, but shelling point means like very, very low or almost no coordination. So for example, if I, if I ask someone, if the only message I give you is meet me in New York and you're not aware. Or when you would probably meet me at Grand Central Station. That is the Grand Central Station is a shelling point.[00:29:16] swyx: And it's probably somewhere, somewhere during the day. That is the shelling point of New York is Grand Central. To that extent, shelling points for steganography are things like the, the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail. I[00:29:41] swyx: found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is[00:29:57] Alessio: talking[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just data sets.[00:30:12] swyx: This is all the grad students are working on. So like there was a data sets track and then I was looking around like, I was like, you don't need a data sets track because every paper is a data sets paper. And so data sets and benchmarks, they're kind of flip sides of the same thing. So Yeah. Cool. Yeah, if you're a grad student, you're a GPU boy, you kind of work on that.[00:30:30] swyx: And then the, the sort of big model that people walk around and pick the ones that they like, and then they use it in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like you didn't last year, you had people like Hao Tian who worked on Lava, which is take Lama and add Vision.[00:30:47] swyx: And then obviously actually I hired him and he added Vision to Grok. Now he's the Vision Grok guy. This year, I don't think there was any of those.[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Mixed Monarch, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that or maybe it's just a So that's the highlight of how I have changed in terms of how I view papers.[00:31:29] swyx: So like, in my estimation, two of the best papers in this year for datasets or data comp and refined web or fine web. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.[00:31:48] swyx: But one thing that does get a lot of play that a lot of people are debating is the role that's scheduled. This is the schedule free optimizer paper from Meta from Aaron DeFazio. And this [00:32:00] year in the ML community, there's been a lot of chat about shampoo, soap, all the bathroom amenities for optimizing your learning rates.[00:32:08] swyx: And, uh, most people at the big labs are. Who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular. 4Wars[00:32:19] Alessio: of AI recap maybe, just quickly. Um, where do you want to start? Data?[00:32:26] swyx: So to remind people, this is the 4Wars piece that we did as one of our earlier recaps of this year.[00:32:31] swyx: And the belligerents are on the left, journalists, writers, artists, anyone who owns IP basically, New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George RR Martin. Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing, open the eye, basically. I actually wanted to get a snapshot of all the lawsuits.[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Lumna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between scale. ai and the synthetic data community, because scale.[00:33:09] swyx: ai published a paper saying that synthetic data doesn't work. Surprise, surprise, scale. ai is the leading vendor of non synthetic data. Only[00:33:17] Alessio: cage free annotated data is useful.[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Luna's talk, Makes sense.[00:33:32] swyx: I don't know if you have any perspectives there.[00:33:34] Alessio: I think, again, going back to the reinforcement fine tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine tuning a smaller model from like a larger model.[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the rephrase, the web thing that Apple also did. But yeah, I think it'll be. Useful. I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know, I think people love talking about data because it's like a GPU poor, you know, I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like,[00:34:17] swyx: they don't[00:34:17] Alessio: really work[00:34:18] swyx: on.[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club, the star series of papers. So that's star, Q star, V star. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT, API that they released, or that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.[00:34:49] swyx: Basically It feels like a way to create valid synthetic data for them to fine tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between Let's say the music industry or like the newspaper publishing industry or the textbooks industry on the big labs.[00:35:11] swyx: It's all of the pre training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?[00:35:27] swyx: We've been using O1 for And I would say like for summarization and creative writing and instruction following, I think it's underrated. I started using O1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics. You know, I can actually say like, I think one of the O1 pro demos.[00:35:46] swyx: All of these things that Noam was showing was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?[00:35:53] Creative Writing with AI[00:35:53] swyx: So like, like literally just anything instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than in previous models. So I think it's underrated for creative writing.[00:36:11] Alessio: Yeah.[00:36:12] Legal and Ethical Issues in AI[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of O1, but then they want us to, like, they're getting sued for using other publishers data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.[00:36:29] Alessio: So I'm curious to see how that kind of comes. Yeah, I mean, OPA has[00:36:32] swyx: many ways to publish, to punish people without bringing, taking them to court. Already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chain of thoughts are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosign, which we talked about, we talked to him on Dev Day, can just fine tune 4.[00:37:13] swyx: 0 to beat 0. 1 Cloud SONET so far is beating O1 on coding tasks without, at least O1 preview, without being a reasoning model, same for Gemini Pro or Gemini 2. 0. So like, how much is reasoning important? How much of a moat is there in this, like, All of these are proprietary sort of training data that they've presumably accomplished.[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be Interesting because there'll be a lot of noise from people who say they have inference time compute and actually don't because they just have fancy chain of thought.[00:38:00][00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.[00:38:18] The Data War: GPU Poor vs. GPU Rich[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was like a time in which the GPU poor were kind of like the rebel faction working on like these models that were like open and small and cheap. And I think today people don't really care as much about GPUs anymore.[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market is kind of like plummeted because there's people don't want to be, they want to be GPU free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war? You[00:38:52] swyx: can tell me about this, but like, I feel like the, the appetite for GPU rich startups, like the, you know, the, the funding plan is we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA.[00:39:01] swyx: That is gone, right? Like, no one's, no one's pitching that. This was literally the plan, the exact plan of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.[00:39:12] The Rise of GPU Ultra Rich[00:39:12] swyx: But I think like, The GPU ultra rich, the GPU ultra high net worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion dollar cluster.[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, XAI very famously, you know, Jensen Huang praising them for being. Best boy number one in spinning up 100, 000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that because I think partially it's an article of faith now that you just need it.[00:39:46] swyx: Like you don't even know what it's going to, what you're going to use it for. You just, you just need it. And it makes sense that if, especially if we're going into. More researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] let's scale big models territory because we had GPT 3 in 2020 and we were like, okay, we'll go from 1.[00:40:05] swyx: 75b to 1. 8b, 1. 8t. And that was GPT 3 to GPT 4. Okay, that's done. As far as everyone is concerned, Opus 3. 5 is not coming out, GPT 4. 5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion perimeter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.[00:40:36] swyx: Would you pay 10x the price of GPT Probably not. Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.[00:40:47] Emerging Trends in AI Models[00:40:47] swyx: And so when it's more researchy, then you actually need more just general purpose compute to mess around with, uh, at the exact same time that production deployments of the old, the previous paradigm is still ramping up,[00:40:58] swyx: um,[00:40:58] swyx: uh, pretty aggressively.[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both together and fireworks and replicates. Uh, we haven't done any scale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in a sense of like they, at reInvent, I wasn't expecting them to do so well, but they are now a foundation model lab.[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do this big reserve instance contracts and now they got to use their money. That's why so many startups.[00:41:37] Alessio: Get bought through the AWS marketplace so they can kind of bundle them together and prefer pricing.[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is where like the tiny box and these things come to fruition where like you can be GPU poor at home. But I think today is like, why are you working so hard to like get these models to run on like very small clusters where it's like, It's so cheap to run them.[00:42:21] Alessio: Yeah, yeah,[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like news research, like probably the most deep tech thing they've done this year is Distro or whatever the new name is.[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF compute launched their compute marketplace this year, and like, who's really using that?[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on device, We are, I now have Apple intelligence on my phone.[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multi modal.[00:43:25] Alessio: Yeah, the notification summaries are so and so in my experience.[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature flagged, but you can, you can try it now if you, if you use the, uh, the alpha.[00:43:40] swyx: And so, like, I, I think, like, you know, We're getting the sort of GPU poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKB in sort of every Windows department is super cool. And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab, or GPU Cloud.[00:44:10] swyx: GPU Cloud, it would be Suno. Suno, Ramp has rated as one of the top ranked, fastest growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR and Suno runs on Moto. So Suno itself is not GPU rich, but they're just doing the training on, on Moto, uh, who we've also talked to on, on the podcast.[00:44:31] swyx: The other one would be Bolt, straight cloud wrapper. And, and, um, Again, another, now they've announced 20 million ARR, which is another step up from our 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU pores are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're like close to the customers, and you're number one on the customer side.[00:45:03] swyx: And the people who are in the middle. Inflection, um, character, didn't do that great. I think character did the best of all of them. Like, you have a note in here that we apparently said that character's price tag was[00:45:15] Alessio: 1B.[00:45:15] swyx: Did I say that?[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.[00:45:20] Alessio: Then they paid 2. 7 billion. I mean, for like,[00:45:22] swyx: yeah.[00:45:22] Alessio: What do you pay for node? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.[00:45:31] The Multi-Modality War[00:45:31] Alessio: Multimodality war. And this one, we never had text to video in the first version, which now is the hottest.[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had VO2 just came out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't, yeah. I[00:45:54] swyx: think it's generally available now, you can go to Sora.[00:45:56] swyx: com and try it. Yeah, they had[00:45:58] Alessio: the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video or OneLive?[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from Minimax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.[00:46:45] Alessio: Yeah, I mean,[00:46:46] swyx: 11labs is now a unicorn. So basically, what is multi modality war? Multi modality war is, do you specialize in a single modality, right? Or do you have GodModel that does all the modalities? So this is [00:47:00] definitely still going, in a sense of 11 labs, you know, now Unicorn, PicoLabs doing well, they launched Pico 2.[00:47:06] swyx: 0 recently, HeyGen, I think has reached 100 million ARR, Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups. And then there's the big labs who are doing the sort of all in one play.[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code gen and all that. But the new one that OpenAI and Meta both have but they haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.[00:48:02] swyx: They can generate you that modification exactly as you asked. So there's no need for the stable diffusion or comfy UI workflow of like mask here and then like infill there in paint there and all that, all that stuff. This is small model nonsense. Big model people are like, huh, we got you in as everything in the transformer.[00:48:21] swyx: This is the multimodality war, which is, do you, do you bet on the God model or do you string together a whole bunch of, uh, Small models like a, like a chump. Yeah,[00:48:29] Alessio: I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the motto is kind of like, Maybe, you know, people say black forest, the black forest models are better than mid journey on a pixel by pixel basis. But I think when you put it, put it together, have you tried[00:48:53] swyx: the same problems on black forest?[00:48:55] Alessio: Yes. But the problem is just like, you know, on black forest, it generates one image. And then it's like, you got to [00:49:00] regenerate. You don't have all these like UI things. Like what I do, no, but it's like time issue, you know, it's like a mid[00:49:06] swyx: journey. Call the API four times.[00:49:08] Alessio: No, but then there's no like variate.[00:49:10] Alessio: Like the good thing about mid journey is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying mid journey, so it's a Black Forest skill issue, because I'm not paying them, you know?[00:49:24] Alessio: Yeah,[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, ReCraft has come out, uh, on top of the image arena that, uh, artificial analysis has done, has apparently, uh, Flux's place. Is this still true?[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI Newses of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1. 1. Which is very surprising [00:50:00] because Flux And Black Forest Labs are the old stable diffusion crew who left stability after, um, the management issues.[00:50:06] swyx: So Recurve has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own. I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multi modality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.[00:50:42] swyx: But the big model, People are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something. And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Oh, at least the Hollywood Labs will get Fulsora. We haven't seen video to audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be Soda.[00:51:23] swyx: I would say that Gemini's approach Compared to OpenAI, Gemini seems, or DeepMind's approach to video seems a lot more fully fledged than OpenAI. Because if you look at the ICML recap that I published that so far nobody has listened to, um, that people have listened to it. It's just a different, definitely different audience.[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in Uh, so, so DeepMind has, is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] Diffusion Transformers last year, you know, when they hired, uh, Bill Peebles.[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that VO2, while one, They cherry pick their videos. So obviously it looks better than Sora, but the reason I would believe that VO2, uh, when it's fully launched will do very well is because they have all this background work in video that they've done for years.[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.[00:52:32] Alessio: And then last but not least, the LLMOS. We renamed it to Ragops, formerly known as[00:52:39] swyx: Ragops War. I put the latest chart on the Braintrust episode.[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there because Hacker News people like to read and not listen.[00:52:58] Alessio: So episode essays,[00:52:59] swyx: I remember [00:53:00] purchasing them separately. You say Lanchain Llama Index is still growing.[00:53:03] Alessio: Yeah, so I looked at the PyPy stats, you know. I don't care about stars. On PyPy you see Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, Lanchain still growing.[00:53:20] Alessio: These are the last six months. Llama Index still growing. What I've basically seen is like things that, One, obviously these things have A commercial product. So there's like people buying this and sticking with it versus kind of hopping in between things versus, you know, for example, crew AI, not really growing as much.[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4

god ceo new york amazon spotify time world europe google ai china apple vision pr voice future speaking san francisco new york times phd video thinking chinese simple data predictions elon musk iphone surprise impact legal code tesla chatgpt reflecting memory ga discord reddit busy lgbt cloud flash stem honestly ab pros windows jeff bezos excited researchers lower unicorns ip tackling sort survey insane tier cto vc whispers applications doc signing seal fireworks f1 genie academic sf openai gemini organizing nvidia ux api assembly davos frontier chrome makes scarlett johansson ui mm turbo bash gpt soda ml aws lama dropbox mosaic creative writing github drafting reinvent canvas 1b bolt apis lava ruler exact stripe dev wwdc pico vm hundred strawberry sander bt flux vcs taiwanese 200k moto arr gartner assumption sora opus google docs parting nemo blackwell sam altman google drive sombra llm gpu opa tbd ramp elia 3b elo gnome agi 5b estimates bytedance midjourney leopold dota ciso haiku dx sarah silverman coursera rag gpus sonnets george rr martin cypher quill getty cobalt sdks ilya deepmind noam sheesh v2 alessio ttc perplexity future trends lms satya anthropic r1 grok ssi stack overflow rl 8b itc emerging trends theoretically sota yi replicate vo2 suno mistral veo black forest graphql inflection aitor brain trust databricks xai chinchillas adept nosql gpts grand central jensen huang grand central station hacker news ai models zep mcp hacken ethical issues cosign claud ai news distro gpc lubna autogpt neo4j tpu jeremy howard o3 gbt o1 gpd quent heygen loras exa gradients 70b minimax langchain neurips jeff dean 400b 128k elos gemini pro cerebras code interpreter icml john franco r1s lstm ai winter aws reinvent muser latent space pypy dan gross nova pro paige bailey noam brown quiet capital john frankel
monos estocásticos
Google y OpenAI se han quedado sin datos para su IA. La gente de pueblo es la única esperanza

monos estocásticos

Play Episode Listen Later Dec 19, 2024 86:55


Ni Gemini 2.0, ni o1 Pro saben distinguir una porra de un salmorejo Entrevistamos a Rubén Colomer sobre el primer fondo de capital riesgo para startups de IA Además: - Meta contra OpenAI- El cierre de Cruise- Ilya Sutskever dice cosas- Veo 2 vs. Sora- Skyrim con LLM en local Patrocinadores: Don Ibérico vuelve a nuestras casas por Navidad y ahora tienes un 15% de descuento con el código MONOS en doniberico.net Ya sabéis, una empresa familiar que envía a domicilio un jamón de bellota 100% ibérico de primerísimo nivel desde Guijuelo. Envío rapidísimo. Si lo prefieres, cortado y envasado al vacío, a cuchillo o a máquina, con su hueso para el caldo y otros ibéricos. Hemos enviado las felicitaciones navideñas de este año con nuestros LoRAs de freepik.es, imágenes hiperrealistas con nuestra cara que ahora se pueden convertir en vídeo directamente desde la suite de herramientas de IA de Freepik. Ni confirmamos ni desmentimos que nos hayamos puesto un poco de pelo y masa muscular de más...  Monos estocásticos es un podcast sobre inteligencia artificial presentado por Antonio Ortiz (@antonello) y Matías S. Zavia (@matiass). Más en monosestocasticos.com

monos estocásticos
Elon Musk gasta 1 billón en tarjetas gráficas: ¿IA? No, Diablo IV

monos estocásticos

Play Episode Listen Later Dec 5, 2024 76:40


0:00 Matías se vende a TikTok 4:30 La carrera de los modelos fundacionales 10:42 Nuestra peña de Tulotero 12:20 Elon Musk va a por todas 24:04 Klarna se ahorra una pasta con la IA 27:27 LoRAs en movimiento con Freepik 30:17 De ChatGPT Plus a ChatGPT Pro 33:58 El modelo de vídeo más grande 35:40 Llega el modelo razonador de Alibaba 40:00 Entrevista con Pilar Manchón de Google 1:02:00 Puerta Grande o Enfermería Promo: Participa en nuestra peña de la Lotería de Navidad de Tulotero con el número 01111. Si eres nuevo registro, tienes una participación gratuita de 1 euro con el código de promoción "monosnavidad". También puedes formar parte de la peña de monos estocásticos buscando "monosnavidad" en la sección de pequeñas de empresa de Tulotero. La peña se cierra el 19 de diciembre a las 23:59 https://tulotero.es/ Entrena un LoRA con tu cara o la de tus amigos en Freepik y anímalos con los modelos de vídeo. Es la suite de herramientas de IA más inituitiva, ¡y hecha en Málaga! https://www.freepik.com/pikaso/ai-image-generator?create=character Feliz cumpleaños, Paréntesis.MEDia https://www.parentesis.media/antoni-esteve-aniversario-parentesis-escribir-historia-futuro/ Monos estocásticos es un podcast sobre inteligencia artificial presentado por Antonio Ortiz (@antonello) y Matías S. Zavia (@matiass). Más en monosestocasticos.com - (0) Matías se vende a TikTok - (04:30) La carrera de los modelos fundacionales - (10:42) Nuestra peña de Tulotero - (12:20) Elon Musk va a por todas - (24:04) Klarna se ahorra una pasta con la IA - (27:27) LoRAs en movimiento con Freepik - (30:17) De ChatGPT Plus a ChatGPT Pro - (33:58) El modelo de vídeo más grande - (35:40) Llega el modelo razonador de Alibaba - (40:00) Entrevista con Pilar Manchón de Google - (1h02) Puerta Grande o Enfermería

Dubuque Area Baseball Podcast
Weighin' in with Worm S2 Episode 15: First Competitions are here!

Dubuque Area Baseball Podcast

Play Episode Listen Later Nov 20, 2024 32:51


What's going on guys! This week, we have the high school girls competing for the first time this season, we recap the Loras win over Stevens, and all the upcoming events going on in the next week! Also, we discuss how technology is growing our sport! So lets enjoy!

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)
Chapter Thirty-Six - Cersei 8 - A Feast for Crows - A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Nov 5, 2024 75:22


Send us a textCersei learns that Dragonstone has fallen but that Loras has paid with his life - or at least he lies at death's door. She delights in sharing this news with Margaery - who is stricken, but remains hopeful. Cersei dreams of her childhood run-in with Maggy the Frog and the details of the woman's terrifying prophecy. Simon and Mackelly probably should have seen this coming.Chapter Review:Queen Cersei Lannister learns from Aurane Waters that Dragonstone has fallen but that Loras led the assault to his doom. He was grievously wounded multiple times and lies unconscious beyond the skill of any maester to heal. Cersei is delighted. She carefully hides this as she shares the news with Margaery. The young queen is devastated, but latches on to the fact that Loras still lives and seems determined to seek to revive him.More non-Tyrion dwarf's heads are placed before her. The High Septon sends an emissary rather than himself to Cersei's annoyance. The faith is trying to shut down the brothels on the Street of Silk also to Cersei's, and every sailor's, annoyance. Grand Maester Pycelle reports that Lord Gyles Rosby seems to be dying. She dreams of her meeting with Maggy the Frog, the details of whose prophecy continue to come true (she did marry the king, she did become queen), which wouldn't be a problem but the prophecy turns a lot darker after that: she'll be replaced by a younger more beautiful queen, her three children will die, and the valonqar will choke the life from Cersei. She'd like to put a stop to that, and Qyburn thinks that she can.Characters/Places/Names/Events:Cersei Lannister - Mother to King Tommen, Queen Regent of the Seven Kingdoms.Aurane Waters - Grand Admiral, bastard of house Velaryon.King Tommen Baratheon - Child king of the Seven Kingdoms.Taena Merryweather - Social-climbing friend of Queen Cersei.Margaery Tyrell - Queen of the Seven Kingdoms. Only daughter of Mace and Alerie Tyrell.Loras Tyrell - Knight of Flowers. Brother to Margaery and brother of the Kingsguard.Qyburn - Creepy as all hell fallen maester, now Master of Whisperers in King's Landing. Maggy the Frog - Witch who made a prophecy to Cersei that continues to match reality. Support the showSupport us: Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/watch?v=kthxycmF25M

The Run Around Iowa
Season 4, Episode 25: A conversation with former Loras College and Davenport Central star Mason Tope

The Run Around Iowa

Play Episode Listen Later Oct 29, 2024 63:45


Mason opens the conversation, recorded on Oct. 17, by talking about his move to Las Vegas and lived a semi-nomadic lifestyle since graduating from Loras in 2018. Mason tells about a conversation that sparked his return to running competitively in early 2023. The former Loras College conference champion relives a race on the track during his comeback where he competed with a bit of a handicap. Mason talks about a turning point his freshman year of college where he knew had a lot of potential. He goes into detail about the running philosophy at Loras under former coach Bob Schultz. Mason talks about a sickness that partially derailed his senior season at Loras and about a track where he never fulfilled his potential. He tells why he has never run a marathon or half marathon and why he is inclined not to join a running club. Mason looks ahead to his 2025 schedule and mentions two of his passions outside of running. --- Support this podcast: https://podcasters.spotify.com/pod/show/lance-bergeson8/support

Von Eis und Feuer - Ein Buch-Podcast
Sturm der Schwerter - 160 Sansa

Von Eis und Feuer - Ein Buch-Podcast

Play Episode Listen Later Oct 19, 2024 35:59


Sie sind Kinder. Dumme kleine Mädchen, sogar Elinor. Sie haben noch nie eine Schlacht erlebt oder einen Mann sterben gesehen, sie wissen überhaupt nichts. (George R. R. Martin; Sturm der Schwerter, Sansa) Folgenbeschreibung: Bibi und Mele schwelgen in Sansa's Erinnerungen und Vermutungen. Sansa kann sich endlich wieder mit typischen Lady-Themen wie Klatsch und Tratsch auseinander setzen und findet, dass sie Maegery nun genug vor Joffreys Grausamkeiten gewarnt hat. Sie findet sich immer mehr damit ab, die neue Frau an Willa's Seite zu werden, wobei sie Loras einfach nicht vergessen kann... Hier findest du uns: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Von Eis und Feuer Buch-Podcast | Linktree⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Musik: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Free Sound Effects to Download | ZapSplat⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Schriftart im Logo: Venca von ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Pixel Sagas - Neale Davidso⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠n

Close the Door: Game of Thrones, A Song of Ice and Fire Podcast

Spoilers, profanity, Jaime x Brienne. There's kittens for everyone in King Tommen's land. Kevan and Loras get some great one-liners, but Jaime's quips are wasted on the likes of Lancel. Cersei's at her wine-throwing worst, and Jaime's having none of it. A Song of Ice and Fire. A Feast for Crows - Jaime II. Originally recorded in 2019.   Close The Door And Come Here - Episode 551

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

We all have fond memories of the first Dev Day in 2023:and the blip that followed soon after. As Ben Thompson has noted, this year's DevDay took a quieter, more intimate tone. No Satya, no livestream, (slightly fewer people?). Instead of putting ChatGPT announcements in DevDay as in 2023, o1 was announced 2 weeks prior, and DevDay 2024 was reserved purely for developer-facing API announcements, primarily the Realtime API, Vision Finetuning, Prompt Caching, and Model Distillation.However the larger venue and more spread out schedule did allow a lot more hallway conversations with attendees as well as more community presentations including our recent guest Alistair Pullen of Cosine as well as deeper dives from OpenAI including our recent guest Michelle Pokrass of the API Team. Thanks to OpenAI's warm collaboration (we particularly want to thank Lindsay McCallum Rémy!), we managed to record exclusive interviews with many of the main presenters of both the keynotes and breakout sessions. We present them in full in today's episode, together with a full lightly edited Q&A with Sam Altman.Show notes and related resourcesSome of these used in the final audio episode below* Simon Willison Live Blog* swyx live tweets and videos* Greg Kamradt coverage of Structured Output session, Scaling LLM Apps session* Fireside Chat Q&A with Sam AltmanTimestamps* [00:00:00] Intro by Suno.ai* [00:01:23] NotebookLM Recap of DevDay* [00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling* [00:19:16] Olivier Godement, Head of Product, OpenAI* [00:36:57] Romain Huet, Head of DX, OpenAI* [00:47:08] Michelle Pokrass, API Tech Lead at OpenAI ft. Simon Willison* [01:04:45] Alistair Pullen, CEO, Cosine (Genie)* [01:18:31] Sam Altman + Kevin Weill Q&A* [02:03:07] Notebook LM Recap of PodcastTranscript[00:00:00] Suno AI: Under dev daylights, code ignites. Real time voice streams reach new heights. O1 and GPT, 4. 0 in flight. Fine tune the future, data in sight. Schema sync up, outputs precise. Distill the models, efficiency splice.[00:00:33] AI Charlie: Happy October. This is your AI co host, Charlie. One of our longest standing traditions is covering major AI and ML conferences in podcast format. Delving, yes delving, into the vibes of what it is like to be there stitched in with short samples of conversations with key players, just to help you feel like you were there.[00:00:54] AI Charlie: Covering this year's Dev Day was significantly more challenging because we were all requested not to record the opening keynotes. So, in place of the opening keynotes, we had the viral notebook LM Deep Dive crew, my new AI podcast nemesis, Give you a seven minute recap of everything that was announced.[00:01:15] AI Charlie: Of course, you can also check the show notes for details. I'll then come back with an explainer of all the interviews we have for you today. Watch out and take care.[00:01:23] NotebookLM Recap of DevDay[00:01:23] NotebookLM: All right, so we've got a pretty hefty stack of articles and blog posts here all about open ais. Dev day 2024.[00:01:32] NotebookLM 2: Yeah, lots to dig into there.[00:01:34] NotebookLM 2: Seems[00:01:34] NotebookLM: like you're really interested in what's new with AI.[00:01:36] NotebookLM 2: Definitely. And it seems like OpenAI had a lot to announce. New tools, changes to the company. It's a lot.[00:01:43] NotebookLM: It is. And especially since you're interested in how AI can be used in the real world, you know, practical applications, we'll focus on that.[00:01:51] NotebookLM: Perfect. Like, for example, this Real time API, they announced that, right? That seems like a big deal if we want AI to sound, well, less like a robot.[00:01:59] NotebookLM 2: It could be huge. The real time API could completely change how we, like, interact with AI. Like, imagine if your voice assistant could actually handle it if you interrupted it.[00:02:08] NotebookLM: Or, like, have an actual conversation.[00:02:10] NotebookLM 2: Right, not just these clunky back and forth things we're used to.[00:02:14] NotebookLM: And they actually showed it off, didn't they? I read something about a travel app, one for languages. Even one where the AI ordered takeout.[00:02:21] NotebookLM 2: Those demos were really interesting, and I think they show how this real time API can be used in so many ways.[00:02:28] NotebookLM 2: And the tech behind it is fascinating, by the way. It uses persistent WebSocket connections and this thing called function calling, so it can respond in real time.[00:02:38] NotebookLM: So the function calling thing, that sounds kind of complicated. Can you, like, explain how that works?[00:02:42] NotebookLM 2: So imagine giving the AI Access to this whole toolbox, right?[00:02:46] NotebookLM 2: Information, capabilities, all sorts of things. Okay. So take the travel agent demo, for example. With function calling, the AI can pull up details, let's say about Fort Mason, right, from some database. Like nearby restaurants, stuff like that.[00:02:59] NotebookLM: Ah, I get it. So instead of being limited to what it already knows, It can go and find the information it needs, like a human travel agent would.[00:03:07] NotebookLM 2: Precisely. And someone on Hacker News pointed out a cool detail. The API actually gives you a text version of what's being said. So you can store that, analyze it.[00:03:17] NotebookLM: That's smart. It seems like OpenAI put a lot of thought into making this API easy for developers to use. But, while we're on OpenAI, you know, Besides their tech, there's been some news about, like, internal changes, too.[00:03:30] NotebookLM: Didn't they say they're moving away from being a non profit?[00:03:32] NotebookLM 2: They did. And it's got everyone talking. It's a major shift. And it's only natural for people to wonder how that'll change things for OpenAI in the future. I mean, there are definitely some valid questions about this move to for profit. Like, will they have more money for research now?[00:03:46] NotebookLM 2: Probably. But will they, you know, care as much about making sure AI benefits everyone?[00:03:51] NotebookLM: Yeah, that's the big question, especially with all the, like, the leadership changes happening at OpenAI too, right? I read that their Chief Research Officer left, and their VP of Research, and even their CTO.[00:04:03] NotebookLM 2: It's true. A lot of people are connecting those departures with the changes in OpenAI's structure.[00:04:08] NotebookLM: And I guess it makes you wonder what's going on behind the scenes. But they are still putting out new stuff. Like this whole fine tuning thing really caught my eye.[00:04:17] NotebookLM 2: Right, fine tuning. It's essentially taking a pre trained AI model. And, like, customizing it.[00:04:23] NotebookLM: So instead of a general AI, you get one that's tailored for a specific job.[00:04:27] NotebookLM 2: Exactly. And that opens up so many possibilities, especially for businesses. Imagine you could train an AI on your company's data, you know, like how you communicate your brand guidelines.[00:04:37] NotebookLM: So it's like having an AI that's specifically trained for your company?[00:04:41] NotebookLM 2: That's the idea.[00:04:41] NotebookLM: And they're doing it with images now, too, right?[00:04:44] NotebookLM: Fine tuning with vision is what they called it.[00:04:46] NotebookLM 2: It's pretty incredible what they're doing with that, especially in fields like medicine.[00:04:50] NotebookLM: Like using AI to help doctors make diagnoses.[00:04:52] NotebookLM 2: Exactly. And AI could be trained on thousands of medical images, right? And then it could potentially spot things that even a trained doctor might miss.[00:05:03] NotebookLM: That's kind of scary, to be honest. What if it gets it wrong?[00:05:06] NotebookLM 2: Well, the idea isn't to replace doctors, but to give them another tool, you know, help them make better decisions.[00:05:12] NotebookLM: Okay, that makes sense. But training these AI models must be really expensive.[00:05:17] NotebookLM 2: It can be. All those tokens add up. But OpenAI announced something called automatic prompt caching.[00:05:23] Alex Volkov: Automatic what now? I don't think I came across that.[00:05:26] NotebookLM 2: So basically, if your AI sees a prompt that it's already seen before, OpenAI will give you a discount.[00:05:31] NotebookLM: Huh. Like a frequent buyer program for AI.[00:05:35] NotebookLM 2: Kind of, yeah. It's good that they're trying to make it more affordable. And they're also doing something called model distillation.[00:05:41] NotebookLM: Okay, now you're just using big words to sound smart. What's that?[00:05:45] NotebookLM 2: Think of it like like a recipe, right? You can take a really complex recipe and break it down to the essential parts.[00:05:50] NotebookLM: Make it simpler, but it still tastes the same.[00:05:53] NotebookLM 2: Yeah. And that's what model distillation is. You take a big, powerful AI model and create a smaller, more efficient version.[00:06:00] NotebookLM: So it's like lighter weight, but still just as capable.[00:06:03] NotebookLM 2: Exactly. And that means more people can actually use these powerful tools. They don't need, like, a supercomputer to run them.[00:06:10] NotebookLM: So they're making AI more accessible. That's great.[00:06:13] NotebookLM 2: It is. And speaking of powerful tools, they also talked about their new O1 model.[00:06:18] NotebookLM 2: That's the one they've been hyping up. The one that's supposed to be this big leap forward.[00:06:22] NotebookLM: Yeah, O1. It sounds pretty futuristic. Like, from what I read, it's not just a bigger, better language model.[00:06:28] NotebookLM 2: Right. It's a different porch.[00:06:29] NotebookLM: They're saying it can, like, actually reason, right? Think.[00:06:33] NotebookLM 2: It's trained differently.[00:06:34] NotebookLM 2: They used reinforcement learning with O1.[00:06:36] NotebookLM: So it's not just finding patterns in the data it's seen before.[00:06:40] NotebookLM 2: Not just that. It can actually learn from its mistakes. Get better at solving problems.[00:06:46] NotebookLM: So give me an example. What can O1 do that, say, GPT 4 can't?[00:06:51] NotebookLM 2: Well, OpenAI showed it doing some pretty impressive stuff with math, like advanced math.[00:06:56] NotebookLM 2: And coding, too. Complex coding. Things that even GPT 4 struggled with.[00:07:00] NotebookLM: So you're saying if I needed to, like, write a screenplay, I'd stick with GPT 4? But if I wanted to solve some crazy physics problem, O1 is what I'd use.[00:07:08] NotebookLM 2: Something like that, yeah. Although there is a trade off. O1 takes a lot more power to run, and it takes longer to get those impressive results.[00:07:17] NotebookLM: Hmm, makes sense. More power, more time, higher quality.[00:07:21] NotebookLM 2: Exactly.[00:07:22] NotebookLM: It sounds like it's still in development, though, right? Is there anything else they're planning to add to it?[00:07:26] NotebookLM 2: Oh, yeah. They mentioned system prompts, which will let developers, like, set some ground rules for how it behaves. And they're working on adding structured outputs and function calling.[00:07:38] Alex Volkov: Wait, structured outputs? Didn't we just talk about that? We[00:07:41] NotebookLM 2: did. That's the thing where the AI's output is formatted in a way that's easy to use.[00:07:47] NotebookLM: Right, right. So you don't have to spend all day trying to make sense of what it gives you. It's good that they're thinking about that stuff.[00:07:53] NotebookLM 2: It's about making these tools usable.[00:07:56] NotebookLM 2: And speaking of that, Dev Day finished up with this really interesting talk. Sam Altman, the CEO of OpenAI, And Kevin Weil, their new chief product officer. They talked about, like, the big picture for AI.[00:08:09] NotebookLM: Yeah, they did, didn't they? Anything interesting come up?[00:08:12] NotebookLM 2: Well, Altman talked about moving past this whole AGI term, Artificial General Intelligence.[00:08:18] NotebookLM: I can see why. It's kind of a loaded term, isn't it?[00:08:20] NotebookLM 2: He thinks it's become a bit of a buzzword, and people don't really understand what it means.[00:08:24] NotebookLM: So are they saying they're not trying to build AGI anymore?[00:08:28] NotebookLM 2: It's more like they're saying they're focused on just Making AI better, constantly improving it, not worrying about putting it in a box.[00:08:36] NotebookLM: That makes sense. Keep pushing the limits.[00:08:38] NotebookLM 2: Exactly. But they were also very clear about doing it responsibly. They talked a lot about safety and ethics.[00:08:43] NotebookLM: Yeah, that's important.[00:08:44] NotebookLM 2: They said they were going to be very careful. About how they release new features.[00:08:48] NotebookLM: Good! Because this stuff is powerful.[00:08:51] NotebookLM 2: It is. It was a lot to take in, this whole Dev Day event.[00:08:54] NotebookLM 2: New tools, big changes at OpenAI, and these big questions about the future of AI.[00:08:59] NotebookLM: It was. But hopefully this deep dive helped make sense of some of it. At least, that's what we try to do here.[00:09:05] AI Charlie: Absolutely.[00:09:06] NotebookLM: Thanks for taking the deep dive with us.[00:09:08] AI Charlie: The biggest demo of the new Realtime API involved function calling with voice mode and buying chocolate covered strawberries from our friendly local OpenAI developer experience engineer and strawberry shop owner, Ilan Biggio.[00:09:21] AI Charlie: We'll first play you the audio of his demo and then go into a little interview with him.[00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling[00:09:25] Romain Huet: Could you place a call and see if you could get us 400 strawberries delivered to the venue? But please keep that under 1500. I'm on it. We'll get those strawberries delivered for you.[00:09:47] Ilan: Hello? Hi there. Is this Ilan? I'm Romain's AI assistant. How is it going? Fantastic. Can you tell me what flavors of strawberry dips you have for me? Yeah, we have chocolate, vanilla, and we have peanut butter. Wait, how much would 400 chocolate covered strawberries cost? 400? Are you sure you want 400? Yes, 400 chocolate covered[00:10:14] swyx: strawberries.[00:10:15] Ilan: Wait,[00:10:16] swyx: how much[00:10:16] Ilan: would that be? I think that'll be around, like, 1, 415. 92.[00:10:25] Alex Volkov: Awesome. Let's go ahead and place the order for four chocolate covered strawberries.[00:10:31] Ilan: Great, where would you like that delivered? Please deliver them to the Gateway Pavilion at Fort Mason. And I'll be paying in cash.[00:10:42] Alex Volkov: Okay,[00:10:43] Ilan: sweet. So just to confirm, you want four strawberries?[00:10:45] Ilan: 400 chocolate covered strawberries to the Gateway Pavilion. Yes, that's perfect. And when can we expect delivery? Well, you guys are right nearby, so it'll be like, I don't know, 37 seconds? That's incredibly fast. Cool, you too.[00:11:09] swyx: Hi, Ilan, welcome to Lanespace. Oh, thank you. I just saw your amazing demos, had your amazing strawberries. You are dressed up, like, exactly like a strawberry salesman. Gotta have it all. What was the building on demo like? What was the story behind the demo?[00:11:22] swyx: It was really interesting. This is actually something I had been thinking about for months before the launch.[00:11:27] swyx: Like, having a, like, AI that can make phone calls is something like I've personally wanted for a long time. And so as soon as we launched internally, like, I started hacking on it. And then that sort of just started. We made it into like an internal demo, and then people found it really interesting, and then we thought how cool would it be to have this like on stage as, as one of the demos.[00:11:47] swyx: Yeah, would would you call out any technical issues building, like you were basically one of the first people ever to build with a voice mode API. Would you call out any issues like integrating it with Twilio like that, like you did with function calling, with like a form filling elements. I noticed that you had like intents of things to fulfill, and then.[00:12:07] swyx: When there's still missing info, the voice would prompt you, roleplaying the store guy.[00:12:13] swyx: Yeah, yeah, so, I think technically, there's like the whole, just working with audio and streams is a whole different beast. Like, even separate from like AI and this, this like, new capabilities, it's just, it's just tough.[00:12:26] swyx: Yeah, when you have a prompt, conversationally it'll just follow, like the, it was, Instead of like, kind of step by step to like ask the right questions based on like the like what the request was, right? The function calling itself is sort of tangential to that. Like, you have to prompt it to call the functions, but then handling it isn't too much different from, like, what you would do with assistant streaming or, like, chat completion streaming.[00:12:47] swyx: I think, like, the API feels very similar just to, like, if everything in the API was streaming, it actually feels quite familiar to that.[00:12:53] swyx: And then, function calling wise, I mean, does it work the same? I don't know. Like, I saw a lot of logs. You guys showed, like, in the playground, a lot of logs. What is in there?[00:13:03] swyx: What should people know?[00:13:04] swyx: Yeah, I mean, it is, like, the events may have different names than the streaming events that we have in chat completions, but they represent very similar things. It's things like, you know, function call started, argument started, it's like, here's like argument deltas, and then like function call done.[00:13:20] swyx: Conveniently we send one that has the full function, and then I just use that. Nice.[00:13:25] swyx: Yeah and then, like, what restrictions do, should people be aware of? Like, you know, I think, I think, before we recorded, we discussed a little bit about the sensitivities around basically calling random store owners and putting, putting like an AI on them.[00:13:40] swyx: Yeah, so there's, I think there's recent regulation on that, which is why we want to be like very, I guess, aware of, of You know, you can't just call anybody with AI, right? That's like just robocalling. You wouldn't want someone just calling you with AI.[00:13:54] swyx: I'm a developer, I'm about to do this on random people.[00:13:57] swyx: What laws am I about to break?[00:14:00] swyx: I forget what the governing body is, but you should, I think, Having consent of the person you're about to call, it always works. I, as the strawberry owner, have consented to like getting called with AI. I think past that you, you want to be careful. Definitely individuals are more sensitive than businesses.[00:14:19] swyx: I think businesses you have a little bit more leeway. Also, they're like, businesses I think have an incentive to want to receive AI phone calls. Especially if like, they're dealing with it. It's doing business. Right, like, it's more business. It's kind of like getting on a booking platform, right, you're exposed to more.[00:14:33] swyx: But, I think it's still very much like a gray area. Again, so. I think everybody should, you know, tread carefully, like, figure out what it is. I, I, I, the law is so recent, I didn't have enough time to, like, I'm also not a lawyer. Yeah, yeah, yeah, of course. Yeah.[00:14:49] swyx: Okay, cool fair enough. One other thing, this is kind of agentic.[00:14:52] swyx: Did you use a state machine at all? Did you use any framework? No. You just stick it in context and then just run it in a loop until it ends call?[00:15:01] swyx: Yeah, there isn't even a loop, like Okay. Because the API is just based on sessions. It's always just going to keep going. Every time you speak, it'll trigger a call.[00:15:11] swyx: And then after every function call was also invoked invoking like a generation. And so that is another difference here. It's like it's inherently almost like in a loop, be just by being in a session, right? No state machines needed. I'd say this is very similar to like, the notion of routines, where it's just like a list of steps.[00:15:29] swyx: And it, like, sticks to them softly, but usually pretty well. And the steps is the prompts? The steps, it's like the prompt, like the steps are in the prompt. Yeah, yeah, yeah. Right, it's like step one, do this, step one, step two, do that. What if I want to change the system prompt halfway through the conversation?[00:15:44] swyx: You can. Okay. You can. To be honest, I have not played without two too much. Yeah,[00:15:47] swyx: yeah.[00:15:48] swyx: But, I know you can.[00:15:49] swyx: Yeah, yeah. Yeah. Awesome. I noticed that you called it real time API, but not voice API. Mm hmm. So I assume that it's like real time API starting with voice. Right, I think that's what he said on the thing.[00:16:00] swyx: I can't imagine, like, what else is real[00:16:02] swyx: time? Well, I guess, to use ChatGPT's voice mode as an example, Like, we've demoed the video, right? Like, real time image, right? So, I'm not actually sure what timelines are, But I would expect, if I had to guess, That, like, that is probably the next thing that we're gonna be making.[00:16:17] swyx: You'd probably have to talk directly with the team building this. Sure. But, You can't promise their timelines. Yeah, yeah, yeah, right, exactly. But, like, given that this is the features that currently, Or that exists that we've demoed on Chachapiti. Yeah. There[00:16:29] swyx: will never be a[00:16:29] swyx: case where there's like a real time text API, right?[00:16:31] swyx: I don't Well, this is a real time text API. You can do text only on this. Oh. Yeah. I don't know why you would. But it's actually So text to text here doesn't quite make a lot of sense. I don't think you'll get a lot of latency gain. But, like, speech to text is really interesting. Because you can prevent You can prevent responses, like audio responses.[00:16:54] swyx: And force function calls. And so you can do stuff like UI control. That is like super super reliable. We had a lot of like, you know, un, like, we weren't sure how well this was gonna work because it's like, you have a voice answering. It's like a whole persona, right? Like, that's a little bit more, you know, risky.[00:17:10] swyx: But if you, like, cut out the audio outputs and make it so it always has to output a function, like you can end up with pretty pretty good, like, Pretty reliable, like, command like a command architecture. Yeah,[00:17:21] swyx: actually, that's the way I want to interact with a lot of these things as well. Like, one sided voice.[00:17:26] swyx: Yeah, you don't necessarily want to hear the[00:17:27] swyx: voice back. And like, sometimes it's like, yeah, I think having an output voice is great. But I feel like I don't always want to hear an output voice. I'd say usually I don't. But yeah, exactly, being able to speak to it is super sweet.[00:17:39] swyx: Cool. Do you want to comment on any of the other stuff that you announced?[00:17:41] swyx: From caching I noticed was like, I like the no code change part. I'm looking forward to the docs because I'm sure there's a lot of details on like, what you cache, how long you cache. Cause like, enthalpy caches were like 5 minutes. I was like, okay, but what if I don't make a call every 5 minutes?[00:17:56] swyx: Yeah,[00:17:56] swyx: to be super honest with you, I've been so caught up with the real time API and making the demo that I haven't read up on the other stuff. Launches too much. I mean, I'm aware of them, but I think I'm excited to see how all distillation works. That's something that we've been doing like, I don't know, I've been like doing it between our models for a while And I've seen really good results like I've done back in a day like from GPT 4 to GPT 3.[00:18:19] swyx: 5 And got like, like pretty much the same level of like function calling with like hundreds of functions So that was super super compelling So, I feel like easier distillation, I'm really excited for. I see. Is it a tool?[00:18:31] swyx: So, I saw evals. Yeah. Like, what is the distillation product? It wasn't super clear, to be honest.[00:18:36] swyx: I, I think I want to, I want to let that team, I want to let that team talk about it. Okay,[00:18:40] swyx: alright. Well, I appreciate you jumping on. Yeah, of course. Amazing demo. It was beautifully designed. I'm sure that was part of you and Roman, and[00:18:47] swyx: Yeah, I guess, shout out to like, the first people to like, creators of Wanderlust, originally, were like, Simon and Carolis, and then like, I took it and built the voice component and the voice calling components.[00:18:59] swyx: Yeah, so it's been a big team effort. And like the entire PI team for like Debugging everything as it's been going on. It's been, it's been so good working with them. Yeah, you're the first consumers on the DX[00:19:07] swyx: team. Yeah. Yeah, I mean, the classic role of what we do there. Yeah. Okay, yeah, anything else? Any other call to action?[00:19:13] swyx: No, enjoy Dev Day. Thank you. Yeah. That's it.[00:19:16] Olivier Godement, Head of Product, OpenAI[00:19:16] AI Charlie: The latent space crew then talked to Olivier Godmont, head of product for the OpenAI platform, who led the entire Dev Day keynote and introduced all the major new features and updates that we talked about today.[00:19:28] swyx: Okay, so we are here with Olivier Godmont. That's right.[00:19:32] swyx: I don't pronounce French. That's fine. It was perfect. And it was amazing to see your keynote today. What was the back story of, of preparing something like this? Preparing, like, Dev Day? It[00:19:43] Olivier Godement: essentially came from a couple of places. Number one, excellent reception from last year's Dev Day.[00:19:48] Olivier Godement: Developers, startup founders, researchers want to spend more time with OpenAI, and we want to spend more time with them as well. And so for us, like, it was a no brainer, frankly, to do it again, like, you know, like a nice conference. The second thing is going global. We've done a few events like in Paris and like a few other like, you know, non European, non American countries.[00:20:05] Olivier Godement: And so this year we're doing SF, Singapore, and London. To frankly just meet more developers.[00:20:10] swyx: Yeah, I'm very excited for the Singapore one.[00:20:12] Olivier Godement: Ah,[00:20:12] swyx: yeah. Will you be[00:20:13] Olivier Godement: there?[00:20:14] swyx: I don't know. I don't know if I got an invite. No. I can't just talk to you. Yeah, like, and then there was some speculation around October 1st.[00:20:22] Olivier Godement: Yeah. Is it because[00:20:23] swyx: 01, October 1st? It[00:20:25] Olivier Godement: has nothing to do. I discovered the tweet yesterday where like, people are so creative. No one, there was no connection to October 1st. But in hindsight, that would have been a pretty good meme by Tiana. Okay.[00:20:37] swyx: Yeah, and you know, I think like, OpenAI's outreach to developers is something that I felt the whole in 2022, when like, you know, like, people were trying to build a chat GPT, and like, there was no function calling, all that stuff that you talked about in the past.[00:20:51] swyx: And that's why I started my own conference as like like, here's our little developer conference thing. And, but to see this OpenAI Dev Day now, and like to see so many developer oriented products coming to OpenAI, I think it's really encouraging.[00:21:02] Olivier Godement: Yeah, totally. It's that's what I said, essentially, like, developers are basically the people who make the best connection between the technology and, you know, the future, essentially.[00:21:14] Olivier Godement: Like, you know, essentially see a capability, see a low level, like, technology, and are like, hey, I see how that application or that use case that can be enabled. And so, in the direction of enabling, like, AGI, like, all of humanity, it's a no brainer for us, like, frankly, to partner with Devs.[00:21:31] Alessio: And most importantly, you almost never had waitlists, which, compared to like other releases, people usually, usually have.[00:21:38] Alessio: What is the, you know, you had from caching, you had real time voice API, we, you know, Shawn did a long Twitter thread, so people know the releases. Yeah. What is the thing that was like sneakily the hardest to actually get ready for, for that day, or like, what was the kind of like, you know, last 24 hours, anything that you didn't know was gonna work?[00:21:56] Olivier Godement: Yeah. The old Fairly, like, I would say, involved, like, features to ship. So the team has been working for a month, all of them. The one which I would say is the newest for OpenAI is the real time API. For a couple of reasons. I mean, one, you know, it's a new modality. Second, like, it's the first time that we have an actual, like, WebSocket based API.[00:22:16] Olivier Godement: And so, I would say that's the one that required, like, the most work over the month. To get right from a developer perspective and to also make sure that our existing safety mitigation that worked well with like real time audio in and audio out.[00:22:30] swyx: Yeah, what design choices or what was like the sort of design choices that you want to highlight?[00:22:35] swyx: Like, you know, like I think for me, like, WebSockets, you just receive a bunch of events. It's two way. I obviously don't have a ton of experience. I think a lot of developers are going to have to embrace this real time programming. Like, what are you designing for, or like, what advice would you have for developers exploring this?[00:22:51] Olivier Godement: The core design hypothesis was essentially, how do we enable, like, human level latency? We did a bunch of tests, like, on average, like, human beings, like, you know, takes, like, something like 300 milliseconds to converse with each other. And so that was the design principle, essentially. Like, working backward from that, and, you know, making the technology work.[00:23:11] Olivier Godement: And so we evaluated a few options, and WebSockets was the one that we landed on. So that was, like, one design choice. A few other, like, big design choices that we had to make prompt caching. Prompt caching, the design, like, target was automated from the get go. Like, zero code change from the developer.[00:23:27] Olivier Godement: That way you don't have to learn, like, what is a prompt prefix, and, you know, how long does a cache work, like, we just do it as much as we can, essentially. So that was a big design choice as well. And then finally, on distillation, like, and evaluation. The big design choice was something I learned at Skype, like in my previous job, like a philosophy around, like, a pit of success.[00:23:47] Olivier Godement: Like, what is essentially the, the, the minimum number of steps for the majority of developers to do the right thing? Because when you do evals on fat tuning, there are many, many ways, like, to mess it up, frankly, like, you know, and have, like, a crappy model, like, evals that tell, like, a wrong story. And so our whole design was, okay, we actually care about, like, helping people who don't have, like, that much experience, like, evaluating a model, like, get, like, in a few minutes, like, to a good spot.[00:24:11] Olivier Godement: And so how do we essentially enable that bit of success, like, in the product flow?[00:24:15] swyx: Yeah, yeah, I'm a little bit scared to fine tune especially for vision, because I don't know what I don't know for stuff like vision, right? Like, for text, I can evaluate pretty easily. For vision let's say I'm like trying to, one of your examples was grab.[00:24:33] swyx: Which, very close to home, I'm from Singapore. I think your example was like, they identified stop signs better. Why is that hard? Why do I have to fine tune that? If I fine tune that, do I lose other things? You know, like, there's a lot of unknowns with Vision that I think developers have to figure out.[00:24:50] swyx: For[00:24:50] Olivier Godement: sure. Vision is going to open up, like, a new, I would say, evaluation space. Because you're right, like, it's harder, like, you know, to tell correct from incorrect, essentially, with images. What I can say is we've been alpha testing, like, the Vision fine tuning, like, for several weeks at that point. We are seeing, like, even higher performance uplift compared to text fine tuning.[00:25:10] Olivier Godement: So that's, there is something here, like, we've been pretty impressed, like, in a good way, frankly. But, you know, how well it works. But for sure, like, you know, I expect the developers who are moving from one modality to, like, text and images will have, like, more, you know Testing, evaluation, like, you know, to set in place, like, to make sure it works well.[00:25:25] Alessio: The model distillation and evals is definitely, like, the most interesting. Moving away from just being a model provider to being a platform provider. How should people think about being the source of truth? Like, do you want OpenAI to be, like, the system of record of all the prompting? Because people sometimes store it in, like, different data sources.[00:25:41] Alessio: And then, is that going to be the same as the models evolve? So you don't have to worry about, you know, refactoring the data, like, things like that, or like future model structures.[00:25:51] Olivier Godement: The vision is if you want to be a source of truth, you have to earn it, right? Like, we're not going to force people, like, to pass us data.[00:25:57] Olivier Godement: There is no value prop, like, you know, for us to store the data. The vision here is at the moment, like, most developers, like, use like a one size fits all model, like be off the shelf, like GP40 essentially. The vision we have is fast forward a couple of years. I think, like, most developers will essentially, like, have a.[00:26:15] Olivier Godement: An automated, continuous, fine tuned model. The more, like, you use the model, the more data you pass to the model provider, like, the model is automatically, like, fine tuned, evaluated against some eval sets, and essentially, like, you don't have to every month, when there is a new snapshot, like, you know, to go online and, you know, try a few new things.[00:26:34] Olivier Godement: That's a direction. We are pretty far away from it. But I think, like, that evaluation and decision product are essentially a first good step in that direction. It's like, hey, it's you. I set it by that direction, and you give us the evaluation data. We can actually log your completion data and start to do some automation on your behalf.[00:26:52] Alessio: And then you can do evals for free if you share data with OpenAI. How should people think about when it's worth it, when it's not? Sometimes people get overly protective of their data when it's actually not that useful. But how should developers think about when it's right to do it, when not, or[00:27:07] Olivier Godement: if you have any thoughts on it?[00:27:08] Olivier Godement: The default policy is still the same, like, you know, we don't train on, like, any API data unless you opt in. What we've seen from feedback is evaluation can be expensive. Like, if you run, like, O1 evals on, like, thousands of samples Like, your build will get increased, like, you know, pretty pretty significantly.[00:27:22] Olivier Godement: That's problem statement number one. Problem statement number two is, essentially, I want to get to a world where whenever OpenAI ships a new model snapshot, we have full confidence that there is no regression for the task that developers care about. And for that to be the case, essentially, we need to get evals.[00:27:39] Olivier Godement: And so that, essentially, is a sort of a two bugs one stone. It's like, we subsidize, basically, the evals. And we also use the evals when we ship new models to make sure that we keep going in the right direction. So, in my sense, it's a win win, but again, completely opt in. I expect that many developers will not want to share their data, and that's perfectly fine to me.[00:27:56] swyx: Yeah, I think free evals though, very, very good incentive. I mean, it's a fair trade. You get data, we get free evals. Exactly,[00:28:04] Olivier Godement: and we sanitize PII, everything. We have no interest in the actual sensitive data. We just want to have good evaluation on the real use cases.[00:28:13] swyx: Like, I always want to eval the eval. I don't know if that ever came up.[00:28:17] swyx: Like, sometimes the evals themselves are wrong, and there's no way for me to tell you.[00:28:22] Olivier Godement: Everyone who is starting with LLM, teaching with LLM, is like, Yeah, evaluation, easy, you know, I've done testing, like, all my life. And then you start to actually be able to eval, understand, like, all the corner cases, And you realize, wow, there's like a whole field in itself.[00:28:35] Olivier Godement: So, yeah, good evaluation is hard and so, yeah. Yeah, yeah.[00:28:38] swyx: But I think there's a, you know, I just talked to Brain Trust which I think is one of your partners. Mm-Hmm. . They also emphasize code based evals versus your sort of low code. What I see is like, I don't know, maybe there's some more that you didn't demo.[00:28:53] swyx: YC is kind of like a low code experience, right, for evals. Would you ever support like a more code based, like, would I run code on OpenAI's eval platform?[00:29:02] Olivier Godement: For sure. I mean, we meet developers where they are, you know. At the moment, the demand was more for like, you know, easy to get started, like eval. But, you know, if we need to expose like an evaluation API, for instance, for people like, you know, to pass, like, you know, their existing test data we'll do it.[00:29:15] Olivier Godement: So yeah, there is no, you know, philosophical, I would say, like, you know, misalignment on that. Yeah,[00:29:19] swyx: yeah, yeah. What I think this is becoming, by the way, and I don't, like it's basically, like, you're becoming AWS. Like, the AI cloud. And I don't know if, like, that's a conscious strategy, or it's, like, It doesn't even have to be a conscious strategy.[00:29:33] swyx: Like, you're going to offer storage. You're going to offer compute. You're going to offer networking. I don't know what networking looks like. Networking is maybe, like, Caching or like it's a CDN. It's a prompt CDN.[00:29:45] Alex Volkov: Yeah,[00:29:45] swyx: but it's the AI versions of everything, right? Do you like do you see the analogies or?[00:29:52] Olivier Godement: Whatever Whatever I took to developers. I feel like Good models are just half of the story to build a good app There's a third model you need to do Evaluation is the perfect example. Like, you know, you can have the best model in the world If you're in the dark, like, you know, it's really hard to gain the confidence and so Our philosophy is[00:30:11] Olivier Godement: The whole like software development stack is being basically reinvented, you know, with LLMs. There is no freaking way that open AI can build everything. Like there is just too much to build, frankly. And so my philosophy is, essentially, we'll focus on like the tools which are like the closest to the model itself.[00:30:28] Olivier Godement: So that's why you see us like, you know, investing quite a bit in like fine tuning, distillation, our evaluation, because we think that it actually makes sense to have like in one spot, Like, you know, all of that. Like, there is some sort of virtual circle, essentially, that you can set in place. But stuff like, you know, LLMOps, like tools which are, like, further away from the model, I don't know if you want to do, like, you know, super elaborate, like, prompt management, or, you know, like, tooling, like, I'm not sure, like, you know, OpenAI has, like, such a big edge, frankly, like, you know, to build this sort of tools.[00:30:56] Olivier Godement: So that's how we view it at the moment. But again, frankly, the philosophy is super simple. The strategy is super simple. It's meeting developers where they want us to be. And so, you know that's frankly, like, you know, day in, day out, like, you know, what I try to do.[00:31:08] Alessio: Cool. Thank you so much for the time.[00:31:10] Alessio: I'm sure you,[00:31:10] swyx: Yeah, I have more questions on, a couple questions on voice, and then also, like, your call to action, like, what you want feedback on, right? So, I think we should spend a bit more time on voice, because I feel like that's, like, the big splash thing. I talked well Well, I mean, I mean, just what is the future of real time for OpenAI?[00:31:28] swyx: Yeah. Because I think obviously video is next. You already have it in the, the ChatGPT desktop app. Do we just have a permanent, like, you know, like, are developers just going to be, like, sending sockets back and forth with OpenAI? Like how do we program for that? Like, what what is the future?[00:31:44] Olivier Godement: Yeah, that makes sense. I think with multimodality, like, real time is quickly becoming, like, you know, essentially the right experience, like, to build an application. Yeah. So my expectation is that we'll see like a non trivial, like a volume of applications like moving to a real time API. Like if you zoom out, like, audio is really simple, like, audio until basically now.[00:32:05] Olivier Godement: Audio on the web, in apps, was basically very much like a second class citizen. Like, you basically did like an audio chatbot for users who did not have a choice. You know, they were like struggling to read, or I don't know, they were like not super educated with technology. And so, frankly, it was like the crappy option, you know, compared to text.[00:32:25] Olivier Godement: But when you talk to people in the real world, the vast majority of people, like, prefer to talk and listen instead of typing and writing.[00:32:34] swyx: We speak before we write.[00:32:35] Olivier Godement: Exactly. I don't know. I mean, I'm sure it's the case for you in Singapore. For me, my friends in Europe, the number of, like, WhatsApp, like, voice notes they receive every day, I mean, just people, it makes sense, frankly, like, you know.[00:32:45] Olivier Godement: Chinese. Chinese, yeah.[00:32:46] swyx: Yeah,[00:32:47] Olivier Godement: all voice. You know, it's easier. There is more emotions. I mean, you know, you get the point across, like, pretty well. And so my personal ambition for, like, the real time API and, like, audio in general is to make, like, audio and, like, multimodality, like, truly a first class experience.[00:33:01] Olivier Godement: Like, you know, if you're, like, you know, the amazing, like, super bold, like, start up out of YC, you want to build, like, the next, like, billion, like, you know, user application to make it, like, truly your first and make it feel, like, you know, an actual good, like, you know, product experience. So that's essentially the ambition, and I think, like, yeah, it could be pretty big.[00:33:17] swyx: Yeah. I think one, one people, one issue that people have with the voice so far as, as released in advanced voice mode is the refusals.[00:33:24] Alex Volkov: Yeah.[00:33:24] swyx: You guys had a very inspiring model spec. I think Joanne worked on that. Where you said, like, yeah, we don't want to overly refuse all the time. In fact, like, even if, like, not safe for work, like, in some occasions, it's okay.[00:33:38] swyx: How, is there an API that we can say, not safe for work, okay?[00:33:41] Olivier Godement: I think we'll get there. I think we'll get there. The mobile spec, like, nailed it, like, you know. It nailed it! It's so good! Yeah, we are not in the business of, like, policing, you know, if you can say, like, vulgar words or whatever. You know, there are some use cases, like, you know, I'm writing, like, a Hollywood, like, script I want to say, like, will go on, and it's perfectly fine, you know?[00:33:59] Olivier Godement: And so I think the direction where we'll go here is that basically There will always be like, you know, a set of behavior that we will, you know, just like forbid, frankly, because they're illegal against our terms of services. But then there will be like, you know, some more like risky, like themes, which are completely legal, like, you know, vulgar words or, you know, not safe for work stuff.[00:34:17] Olivier Godement: Where basically we'll expose like a controllable, like safety, like knobs in the API to basically allow you to say, hey, that theme okay, that theme not okay. How sensitive do you want the threshold to be on safety refusals? I think that's the Dijkstra. So a[00:34:31] swyx: safety API.[00:34:32] Olivier Godement: Yeah, in a way, yeah.[00:34:33] swyx: Yeah, we've never had that.[00:34:34] Olivier Godement: Yeah. '[00:34:35] swyx: cause right now is you, it is whatever you decide. And then it's, that's it. That, that, that would be the main reason I don't use opening a voice is because of[00:34:42] Olivier Godement: it's over police. Over refuse over refusals. Yeah. Yeah, yeah. No, we gotta fix that. Yeah. Like singing,[00:34:47] Alessio: we're trying to do voice. I'm a singer.[00:34:49] swyx: And you, you locked off singing.[00:34:51] swyx: Yeah,[00:34:51] Alessio: yeah, yeah.[00:34:52] swyx: But I, I understand music gets you in trouble. Okay. Yeah. So then, and then just generally, like, what do you want to hear from developers? Right? We have, we have all developers watching you know, what feedback do you want? Any, anything specific as well, like from, especially from today anything that you are unsure about, that you are like, Our feedback could really help you decide.[00:35:09] swyx: For sure.[00:35:10] Olivier Godement: I think, essentially, it's becoming pretty clear after today that, you know, I would say the open end direction has become pretty clear, like, you know, after today. Investment in reasoning, investment in multimodality, Investment as well, like in, I would say, tool use, like function calling. To me, the biggest question I have is, you know, Where should we put the cursor next?[00:35:30] Olivier Godement: I think we need all three of them, frankly, like, you know, so we'll keep pushing.[00:35:33] swyx: Hire 10, 000 people, or actually, no need, build a bunch of bots.[00:35:37] Olivier Godement: Exactly, and so let's take O1 smart enough, like, for your problems? Like, you know, let's set aside for a second the existing models, like, for the apps that you would love to build, is O1 basically it in reasoning, or do we still have, like, you know, a step to do?[00:35:50] Olivier Godement: Preview is not enough, I[00:35:52] swyx: need the full one.[00:35:53] Olivier Godement: Yeah, so that's exactly that sort of feedback. Essentially what they would love to do is for developers I mean, there's a thing that Sam has been saying like over and over again, like, you know, it's easier said than done, but I think it's directionally correct. As a developer, as a founder, you basically want to build an app which is a bit too difficult for the model today, right?[00:36:12] Olivier Godement: Like, what you think is right, it's like, sort of working, sometimes not working. And that way, you know, that basically gives us like a goalpost, and be like, okay, that's what you need to enable with the next model release, like in a few months. And so I would say that Usually, like, that's the sort of feedback which is like the most useful that I can, like, directly, like, you know, incorporate.[00:36:33] swyx: Awesome. I think that's our time. Thank you so much, guys. Yeah, thank you so much.[00:36:38] AI Charlie: Thank you. We were particularly impressed that Olivier addressed the not safe for work moderation policy question head on, as that had only previously been picked up on in Reddit forums. This is an encouraging sign that we will return to in the closing candor with Sam Altman at the end of this episode.[00:36:57] Romain Huet, Head of DX, OpenAI[00:36:57] AI Charlie: Next, a chat with Roman Hewitt, friend of the pod, AI Engineer World's fair closing keynote speaker, and head of developer experience at OpenAI on his incredible live demos And advice to AI engineers on all the new modalities.[00:37:12] Alessio: Alright, we're live from OpenAI Dev Day. We're with Juan, who just did two great demos on, on stage.[00:37:17] Alessio: And he's been a friend of Latentspace, so thanks for taking some of the time.[00:37:20] Romain Huet: Of course, yeah, thank you for being here and spending the time with us today.[00:37:23] swyx: Yeah, I appreciate appreciate you guys putting this on. I, I know it's like extra work, but it really shows the developers that you're, Care and about reaching out.[00:37:31] Romain Huet: Yeah, of course, I think when you go back to the OpenAI mission, I think for us it's super important that we have the developers involved in everything we do. Making sure that you know, they have all of the tools they need to build successful apps. And we really believe that the developers are always going to invent the ideas, the prototypes, the fun factors of AI that we can't build ourselves.[00:37:49] Romain Huet: So it's really cool to have everyone here.[00:37:51] swyx: We had Michelle from you guys on. Yes, great episode. She very seriously said API is the path to AGI. Correct. And people in our YouTube comments were like, API is not AGI. I'm like, no, she's very serious. API is the path to AGI. Like, you're not going to build everything like the developers are, right?[00:38:08] swyx: Of[00:38:08] Romain Huet: course, yeah, that's the whole value of having a platform and an ecosystem of amazing builders who can, like, in turn, create all of these apps. I'm sure we talked about this before, but there's now more than 3 million developers building on OpenAI, so it's pretty exciting to see all of that energy into creating new things.[00:38:26] Alessio: I was going to say, you built two apps on stage today, an international space station tracker and then a drone. The hardest thing must have been opening Xcode and setting that up. Now, like, the models are so good that they can do everything else. Yes. You had two modes of interaction. You had kind of like a GPT app to get the plan with one, and then you had a cursor to do apply some of the changes.[00:38:47] Alessio: Correct. How should people think about the best way to consume the coding models, especially both for You know, brand new projects and then existing projects that you're trying to modify.[00:38:56] Romain Huet: Yeah. I mean, one of the things that's really cool about O1 Preview and O1 Mini being available in the API is that you can use it in your favorite tools like cursor like I did, right?[00:39:06] Romain Huet: And that's also what like Devin from Cognition can use in their own software engineering agents. In the case of Xcode, like, it's not quite deeply integrated in Xcode, so that's why I had like chat GPT side by side. But it's cool, right, because I could instruct O1 Preview to be, like, my coding partner and brainstorming partner for this app, but also consolidate all of the, the files and architect the app the way I wanted.[00:39:28] Romain Huet: So, all I had to do was just, like, port the code over to Xcode and zero shot the app build. I don't think I conveyed, by the way, how big a deal that is, but, like, you can now create an iPhone app from scratch, describing a lot of intricate details that you want, and your vision comes to life in, like, a minute.[00:39:47] Romain Huet: It's pretty outstanding.[00:39:48] swyx: I have to admit, I was a bit skeptical because if I open up SQL, I don't know anything about iOS programming. You know which file to paste it in. You probably set it up a little bit. So I'm like, I have to go home and test it. And I need the ChatGPT desktop app so that it can tell me where to click.[00:40:04] Romain Huet: Yeah, I mean like, Xcode and iOS development has become easier over the years since they introduced Swift and SwiftUI. I think back in the days of Objective C, or like, you know, the storyboard, it was a bit harder to get in for someone new. But now with Swift and SwiftUI, their dev tools are really exceptional.[00:40:23] Romain Huet: But now when you combine that with O1, as your brainstorming and coding partner, it's like your architect, effectively. That's the best way, I think, to describe O1. People ask me, like, can GPT 4 do some of that? And it certainly can. But I think it will just start spitting out code, right? And I think what's great about O1, is that it can, like, make up a plan.[00:40:42] Romain Huet: In this case, for instance, the iOS app had to fetch data from an API, it had to look at the docs, it had to look at, like, how do I parse this JSON, where do I store this thing, and kind of wire things up together. So that's where it really shines. Is mini or preview the better model that people should be using?[00:40:58] Romain Huet: Like, how? I think people should try both. We're obviously very excited about the upcoming O1 that we shared the evals for. But we noticed that O1 Mini is very, very good at everything math, coding, everything STEM. If you need for your kind of brainstorming or your kind of science part, you need some broader knowledge than reaching for O1 previews better.[00:41:20] Romain Huet: But yeah, I used O1 Mini for my second demo. And it worked perfectly. All I needed was very much like something rooted in code, architecting and wiring up like a front end, a backend, some UDP packets, some web sockets, something very specific. And it did that perfectly.[00:41:35] swyx: And then maybe just talking about voice and Wanderlust, the app that keeps on giving, what's the backstory behind like preparing for all of that?[00:41:44] Romain Huet: You know, it's funny because when last year for Dev Day, we were trying to think about what could be a great demo app to show like an assistive experience. I've always thought travel is a kind of a great use case because you have, like, pictures, you have locations, you have the need for translations, potentially.[00:42:01] Romain Huet: There's like so many use cases that are bounded to travel that I thought last year, let's use a travel app. And that's how Wanderlust came to be. But of course, a year ago, all we had was a text based assistant. And now we thought, well, if there's a voice modality, what if we just bring this app back as a wink.[00:42:19] Romain Huet: And what if we were interacting better with voice? And so with this new demo, what I showed was the ability to like, So, we wanted to have a complete conversation in real time with the app, but also the thing we wanted to highlight was the ability to call tools and functions, right? So, like in this case, we placed a phone call using the Twilio API, interfacing with our AI agents, but developers are so smart that they'll come up with so many great ideas that we could not think of ourselves, right?[00:42:48] Romain Huet: But what if you could have like a, you know, a 911 dispatcher? What if you could have like a customer service? Like center, that is much smarter than what we've been used to today. There's gonna be so many use cases for real time, it's awesome.[00:43:00] swyx: Yeah, and sometimes actually you, you, like this should kill phone trees.[00:43:04] swyx: Like there should not be like dial one[00:43:07] Romain Huet: of course para[00:43:08] swyx: espanol, you know? Yeah, exactly. Or whatever. I dunno.[00:43:12] Romain Huet: I mean, even you starting speaking Spanish would just do the thing, you know you don't even have to ask. So yeah, I'm excited for this future where we don't have to interact with those legacy systems.[00:43:22] swyx: Yeah. Yeah. Is there anything, so you are doing function calling in a streaming environment. So basically it's, it's web sockets. It's UDP, I think. It's basically not guaranteed to be exactly once delivery. Like, is there any coding challenges that you encountered when building this?[00:43:39] Romain Huet: Yeah, it's a bit more delicate to get into it.[00:43:41] Romain Huet: We also think that for now, what we, what we shipped is a, is a beta of this API. I think there's much more to build onto it. It does have the function calling and the tools. But we think that for instance, if you want to have something very robust, On your client side, maybe you want to have web RTC as a client, right?[00:43:58] Romain Huet: And, and as opposed to like directly working with the sockets at scale. So that's why we have partners like Life Kit and Agora if you want to, if you want to use them. And I'm sure we'll have many mores in the, in many more in the future. But yeah, we keep on iterating on that, and I'm sure the feedback of developers in the weeks to come is going to be super critical for us to get it right.[00:44:16] swyx: Yeah, I think LiveKit has been fairly public that they are used in, in the Chachapiti app. Like, is it, it's just all open source, and we just use it directly with OpenAI, or do we use LiveKit Cloud or something?[00:44:28] Romain Huet: So right now we, we released the API, we released some sample code also, and referenced clients for people to get started with our API.[00:44:35] Romain Huet: And we also partnered with LifeKit and Agora, so they also have their own, like ways to help you get started that plugs natively with the real time API. So depending on the use case, people can, can can decide what to use. If you're working on something that's completely client or if you're working on something on the server side, for the voice interaction, you may have different needs, so we want to support all of those.[00:44:55] Alessio: I know you gotta run. Is there anything that you want the AI engineering community to give feedback on specifically, like even down to like, you know, a specific API end point or like, what, what's like the thing that you want? Yeah. I[00:45:08] Romain Huet: mean, you know, if we take a step back, I think dev Day this year is all different from last year and, and in, in a few different ways.[00:45:15] Romain Huet: But one way is that we wanted to keep it intimate, even more intimate than last year. We wanted to make sure that the community is. Thank you very much for joining us on the Spotlight. That's why we have community talks and everything. And the takeaway here is like learning from the very best developers and AI engineers.[00:45:31] Romain Huet: And so, you know we want to learn from them. Most of what we shipped this morning, including things like prompt caching the ability to generate prompts quickly in the playground, or even things like vision fine tuning. These are all things that developers have been asking of us. And so, the takeaway I would, I would leave them with is to say like, Hey, the roadmap that we're working on is heavily influenced by them and their work.[00:45:53] Romain Huet: And so we love feedback From high feature requests, as you say, down to, like, very intricate details of an API endpoint, we love feedback, so yes that's, that's how we, that's how we build this API.[00:46:05] swyx: Yeah, I think the, the model distillation thing as well, it might be, like, the, the most boring, but, like, actually used a lot.[00:46:12] Romain Huet: True, yeah. And I think maybe the most unexpected, right, because I think if I, if I read Twitter correctly the past few days, a lot of people were expecting us. To shape the real time API for speech to speech. I don't think developers were expecting us to have more tools for distillation, and we really think that's gonna be a big deal, right?[00:46:30] Romain Huet: If you're building apps that have you know, you, you want high, like like low latency, low cost, but high performance, high quality on the use case distillation is gonna be amazing.[00:46:40] swyx: Yeah. I sat in the distillation session just now and they showed how they distilled from four oh to four mini and it was like only like a 2% hit in the performance and 50 next.[00:46:49] swyx: Yeah,[00:46:50] Romain Huet: I was there as well for the superhuman kind of use case inspired for an Ebola client. Yeah, this was really good. Cool man! so much for having me. Thanks again for being here today. It's always[00:47:00] AI Charlie: great to have you. As you might have picked up at the end of that chat, there were many sessions throughout the day focused on specific new capabilities.[00:47:08] Michelle Pokrass, Head of API at OpenAI ft. Simon Willison[00:47:08] AI Charlie: Like the new model distillation features combining EVOLs and fine tuning. For our next session, we are delighted to bring back two former guests of the pod, which is something listeners have been greatly enjoying in our second year of doing the Latent Space podcast. Michelle Pokras of the API team joined us recently to talk about structured outputs, and today gave an updated long form session at Dev Day, describing the implementation details of the new structured output mode.[00:47:39] AI Charlie: We also got her updated thoughts on the VoiceMode API we discussed in her episode, now that it is finally announced. She is joined by friend of the pod and super blogger, Simon Willison, who also came back as guest co host in our Dev Day. 2023 episode.[00:47:56] Alessio: Great, we're back live at Dev Day returning guest Michelle and then returning guest co host Fork.[00:48:03] Alessio: Fork, yeah, I don't know. I've lost count. I think it's been a few. Simon Willison is back. Yeah, we just wrapped, we just wrapped everything up. Congrats on, on getting everything everything live. Simon did a great, like, blog, so if you haven't caught up, I[00:48:17] Simon Willison: wrote my, I implemented it. Now, I'm starting my live blog while waiting for the first talk to start, using like GPT 4, I wrote me the Javascript, and I got that live just in time and then, yeah, I was live blogging the whole day.[00:48:28] swyx: Are you a cursor enjoyer?[00:48:29] Simon Willison: I haven't really gotten into cursor yet to be honest. I just haven't spent enough time for it to click, I think. I'm more a copy and paste things out of Cloud and chat GPT. Yeah. It's interesting.[00:48:39] swyx: Yeah. I've converted to cursor and 01 is so easy to just toggle on and off.[00:48:45] Alessio: What's your workflow?[00:48:46] Alessio: VS[00:48:48] Michelle Pokrass: Code co pilot, so Yep, same here. Team co pilot. Co pilot is actually the reason I joined OpenAI. It was, you know, before ChatGPT, this is the thing that really got me. So I'm still into it, but I keep meaning to try out Cursor, and I think now that things have calmed down, I'm gonna give it a real go.[00:49:03] swyx: Yeah, it's a big thing to change your tool of choice.[00:49:06] swyx: Yes,[00:49:06] Michelle Pokrass: yeah, I'm pretty dialed, so.[00:49:09] swyx: I mean, you know, if you want, you can just fork VS Code and make your own. That's the thing to dumb thing, right? We joked about doing a hackathon where the only thing you do is fork VS Code and bet me the best fork win.[00:49:20] Michelle Pokrass: Nice.[00:49:22] swyx: That's actually a really good idea. Yeah, what's up?[00:49:26] swyx: I mean, congrats on launching everything today. I know, like, we touched on it a little bit, but, like, everyone was kind of guessing that Voice API was coming, and, like, we talked about it in our episode. How do you feel going into the launch? Like, any design decisions that you want to highlight?[00:49:41] Michelle Pokrass: Yeah, super jazzed about it. The team has been working on it for a while. It's, like, a very different API for us. It's the first WebSocket API, so a lot of different design decisions to be made. It's, like, what kind of events do you send? When do you send an event? What are the event names? What do you send, like, on connection versus on future messages?[00:49:57] Michelle Pokrass: So there have been a lot of interesting decisions there. The team has also hacked together really cool projects as we've been testing it. One that I really liked is we had an internal hack a thon for the API team. And some folks built like a little hack that you could use to, like VIM with voice mode, so like, control vim, and you would tell them on like, nice, write a file and it would, you know, know all the vim commands and, and pipe those in.[00:50:18] Michelle Pokrass: So yeah, a lot of cool stuff we've been hacking on and really excited to see what people build with it.[00:50:23] Simon Willison: I've gotta call out a demo from today. I think it was Katja had a 3D visualization of the solar system, like WebGL solar system, you could talk to. That is one of the coolest conference demos I've ever seen.[00:50:33] Simon Willison: That was so convincing. I really want the code. I really want the code for that to get put out there. I'll talk[00:50:39] Michelle Pokrass: to the team. I think we can[00:50:40] Simon Willison: probably

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event for everyone coming to town! Join us with demos and gossip!Also sign up for related events across San Francisco: the AI DevTools Night, the xAI open house, the Replicate art show, the DevDay Watch Party (for non-attendees), Hack Night with OpenAI at Cloudflare. For everyone else, join the Latent Space Discord for our online watch party and find fellow AI Engineers in your city.OpenAI's recent o1 release (and Reflection 70b debacle) has reignited broad interest in agentic general reasoning and tree search methods.While we have covered some of the self-taught reasoning literature on the Latent Space Paper Club, it is notable that the Eric Zelikman ended up at xAI, whereas OpenAI's hiring of Noam Brown and now Shunyu suggests more interest in tool-using chain of thought/tree of thought/generator-verifier architectures for Level 3 Agents.We were more than delighted to learn that Shunyu is a fellow Latent Space enjoyer, and invited him back (after his first appearance on our NeurIPS 2023 pod) for a look through his academic career with Harrison Chase (one year after his first LS show).ReAct: Synergizing Reasoning and Acting in Language Modelspaper linkFollowing seminal Chain of Thought papers from Wei et al and Kojima et al, and reflecting on lessons from building the WebShop human ecommerce trajectory benchmark, Shunyu's first big hit, the ReAct paper showed that using LLMs to “generate both reasoning traces and task-specific actions in an interleaved manner” achieved remarkably greater performance (less hallucination/error propagation, higher ALFWorld/WebShop benchmark success) than CoT alone. In even better news, ReAct scales fabulously with finetuning:As a member of the elite Princeton NLP group, Shunyu was also a coauthor of the Reflexion paper, which we discuss in this pod.Tree of Thoughtspaper link hereShunyu's next major improvement on the CoT literature was Tree of Thoughts:Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role…ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.The beauty of ToT is it doesnt require pretraining with exotic methods like backspace tokens or other MCTS architectures. You can listen to Shunyu explain ToT in his own words on our NeurIPS pod, but also the ineffable Yannic Kilcher:Other WorkWe don't have the space to summarize the rest of Shunyu's work, you can listen to our pod with him now, and recommend the CoALA paper and his initial hit webinar with Harrison, today's guest cohost:as well as Shunyu's PhD Defense Lecture:as well as Shunyu's latest lecture covering a Brief History of LLM Agents:As usual, we are live on YouTube! Show Notes* Harrison Chase* LangChain, LangSmith, LangGraph* Shunyu Yao* Alec Radford* ReAct Paper* Hotpot QA* Tau Bench* WebShop* SWE-Agent* SWE-Bench* Trees of Thought* CoALA Paper* Related Episodes* Our Thomas Scialom (Meta) episode* Shunyu on our NeurIPS 2023 Best Papers episode* Harrison on our LangChain episode* Mentions* Sierra* Voyager* Jason Wei* Tavily* SERP API* ExaTimestamps* [00:00:00] Opening Song by Suno* [00:03:00] Introductions* [00:06:16] The ReAct paper* [00:12:09] Early applications of ReAct in LangChain* [00:17:15] Discussion of the Reflection paper* [00:22:35] Tree of Thoughts paper and search algorithms in language models* [00:27:21] SWE-Agent and SWE-Bench for coding benchmarks* [00:39:21] CoALA: Cognitive Architectures for Language Agents* [00:45:24] Agent-Computer Interfaces (ACI) and tool design for agents* [00:49:24] Designing frameworks for agents vs humans* [00:53:52] UX design for AI applications and agents* [00:59:53] Data and model improvements for agent capabilities* [01:19:10] TauBench* [01:23:09] Promising areas for AITranscriptAlessio [00:00:01]: Hey, everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO of Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.Swyx [00:00:12]: Hey, and today we have a super special episode. I actually always wanted to take like a selfie and go like, you know, POV, you're about to revolutionize the world of agents because we have two of the most awesome hiring agents in the house. So first, we're going to welcome back Harrison Chase. Welcome. Excited to be here. What's new with you recently in sort of like the 10, 20 second recap?Harrison [00:00:34]: Linkchain, Linksmith, Lingraph, pushing on all of them. Lots of cool stuff related to a lot of the stuff that we're going to talk about today, probably.Swyx [00:00:42]: Yeah.Alessio [00:00:43]: We'll mention it in there. And the Celtics won the title.Swyx [00:00:45]: And the Celtics won the title. You got that going on for you. I don't know. Is that like floorball? Handball? Baseball? Basketball.Alessio [00:00:52]: Basketball, basketball.Harrison [00:00:53]: Patriots aren't looking good though, so that's...Swyx [00:00:56]: And then Xun Yu, you've also been on the pod, but only in like a sort of oral paper presentation capacity. But welcome officially to the LinkedSpace pod.Shunyu [00:01:03]: Yeah, I've been a huge fan. So thanks for the invitation. Thanks.Swyx [00:01:07]: Well, it's an honor to have you on. You're one of like, you're maybe the first PhD thesis defense I've ever watched in like this AI world, because most people just publish single papers, but every paper of yours is a banger. So congrats.Shunyu [00:01:22]: Thanks.Swyx [00:01:24]: Yeah, maybe we'll just kick it off with, you know, what was your journey into using language models for agents? I like that your thesis advisor, I didn't catch his name, but he was like, you know... Karthik. Yeah. It's like, this guy just wanted to use language models and it was such a controversial pick at the time. Right.Shunyu [00:01:39]: The full story is that in undergrad, I did some computer vision research and that's how I got into AI. But at the time, I feel like, you know, you're just composing all the GAN or 3D perception or whatever together and it's not exciting anymore. And one day I just see this transformer paper and that's really cool. But I really got into language model only when I entered my PhD and met my advisor Karthik. So he was actually the second author of GPT-1 when he was like a visiting scientist at OpenAI. With Alec Redford?Swyx [00:02:10]: Yes.Shunyu [00:02:11]: Wow. That's what he told me. It's like back in OpenAI, they did this GPT-1 together and Ilya just said, Karthik, you should stay because we just solved the language. But apparently Karthik is not fully convinced. So he went to Princeton, started his professorship and I'm really grateful. So he accepted me as a student, even though I have no prior knowledge in NLP. And you know, we just met for the first time and he's like, you know, what do you want to do? And I'm like, you know, you have done those test game scenes. That's really cool. I wonder if we can just redo them with language models. And that's how the whole journey began. Awesome.Alessio [00:02:46]: So GPT-2 was out at the time? Yes, that was 2019.Shunyu [00:02:48]: Yeah.Alessio [00:02:49]: Way too dangerous to release. And then I guess the first work of yours that I came across was React, which was a big part of your defense. But also Harrison, when you came on The Pockets last year, you said that was one of the first papers that you saw when you were getting inspired for BlankChain. So maybe give a recap of why you thought it was cool, because you were already working in AI and machine learning. And then, yeah, you can kind of like intro the paper formally. What was that interesting to you specifically?Harrison [00:03:16]: Yeah, I mean, I think the interesting part was using these language models to interact with the outside world in some form. And I think in the paper, you mostly deal with Wikipedia. And I think there's some other data sets as well. But the outside world is the outside world. And so interacting with things that weren't present in the LLM and APIs and calling into them and thinking about the React reasoning and acting and kind of like combining those together and getting better results. I'd been playing around with LLMs, been talking with people who were playing around with LLMs. People were trying to get LLMs to call into APIs, do things, and it was always, how can they do it more reliably and better? And so this paper was basically a step in that direction. And I think really interesting and also really general as well. Like I think that's part of the appeal is just how general and simple in a good way, I think the idea was. So that it was really appealing for all those reasons.Shunyu [00:04:07]: Simple is always good. Yeah.Alessio [00:04:09]: Do you have a favorite part? Because I have one favorite part from your PhD defense, which I didn't understand when I read the paper, but you said something along the lines, React doesn't change the outside or the environment, but it does change the insight through the context, putting more things in the context. You're not actually changing any of the tools around you to work for you, but you're changing how the model thinks. And I think that was like a very profound thing when I, not that I've been using these tools for like 18 months. I'm like, I understand what you meant, but like to say that at the time you did the PhD defense was not trivial. Yeah.Shunyu [00:04:41]: Another way to put it is like thinking can be an extra tool that's useful.Alessio [00:04:47]: Makes sense. Checks out.Swyx [00:04:49]: Who would have thought? I think it's also more controversial within his world because everyone was trying to use RL for agents. And this is like the first kind of zero gradient type approach. Yeah.Shunyu [00:05:01]: I think the bigger kind of historical context is that we have this two big branches of AI. So if you think about RL, right, that's pretty much the equivalent of agent at a time. And it's like agent is equivalent to reinforcement learning and reinforcement learning is equivalent to whatever game environment they're using, right? Atari game or go or whatever. So you have like a pretty much, you know, you have a biased kind of like set of methodologies in terms of reinforcement learning and represents agents. On the other hand, I think NLP is like a historical kind of subject. It's not really into agents, right? It's more about reasoning. It's more about solving those concrete tasks. And if you look at SEL, right, like each task has its own track, right? Summarization has a track, question answering has a track. So I think really it's about rethinking agents in terms of what could be the new environments that we came to have is not just Atari games or whatever video games, but also those text games or language games. And also thinking about, could there be like a more general kind of methodology beyond just designing specific pipelines for each NLP task? That's like the bigger kind of context, I would say.Alessio [00:06:14]: Is there an inspiration spark moment that you remember or how did you come to this? We had Trida on the podcast and he mentioned he was really inspired working with like systems people to think about Flash Attention. What was your inspiration journey?Shunyu [00:06:27]: So actually before React, I spent the first two years of my PhD focusing on text-based games, or in other words, text adventure games. It's a very kind of small kind of research area and quite ad hoc, I would say. And there are like, I don't know, like 10 people working on that at the time. And have you guys heard of Zork 1, for example? So basically the idea is you have this game and you have text observations, like you see a monster, you see a dragon.Swyx [00:06:57]: You're eaten by a grue.Shunyu [00:06:58]: Yeah, you're eaten by a grue. And you have actions like kill the grue with a sword or whatever. And that's like a very typical setup of a text game. So I think one day after I've seen all the GPT-3 stuff, I just think about, you know, how can I solve the game? Like why those AI, you know, machine learning methods are pretty stupid, but we are pretty good at solving the game relatively, right? So for the context, the predominant method to solve this text game is obviously reinforcement learning. And the idea is you just try out an arrow in those games for like millions of steps and you kind of just overfit to the game. But there's no language understanding at all. And I'm like, why can't I solve the game better? And it's kind of like, because we think about the game, right? Like when we see this very complex text observation, like you see a grue and you might see a sword, you know, in the right of the room and you have to go through the wooden door to go to that room. You will think, you know, oh, I have to kill the monster and to kill that monster, I have to get the sword, I have to get the sword, I have to go, right? And this kind of thinking actually helps us kind of throw shots off the game. And it's like, why don't we also enable the text agents to think? And that's kind of the prototype of React. And I think that's actually very interesting because the prototype, I think, was around November of 2021. So that's even before like chain of thought or whatever came up. So we did a bunch of experiments in the text game, but it was not really working that well. Like those text games are just too hard. I think today it's still very hard. Like if you use GPD 4 to solve it, it's still very hard. So the change came when I started the internship in Google. And apparently Google care less about text game, they care more about what's more practical. So pretty much I just reapplied the idea, but to more practical kind of environments like Wikipedia or simpler text games like Alphard, and it just worked. It's kind of like you first have the idea and then you try to find the domains and the problems to demonstrate the idea, which is, I would say, different from most of the AI research, but it kind of worked out for me in that case.Swyx [00:09:09]: For Harrison, when you were implementing React, what were people applying React to in the early days?Harrison [00:09:14]: I think the first demo we did probably had like a calculator tool and a search tool. So like general things, we tried to make it pretty easy to write your own tools and plug in your own things. And so this is one of the things that we've seen in LangChain is people who build their own applications generally write their own tools. Like there are a few common ones. I'd say like the three common ones might be like a browser, a search tool, and a code interpreter. But then other than that-Swyx [00:09:37]: The LMS. Yep.Harrison [00:09:39]: Yeah, exactly. It matches up very nice with that. And we actually just redid like our integrations docs page, and if you go to the tool section, they like highlight those three, and then there's a bunch of like other ones. And there's such a long tail of other ones. But in practice, like when people go to production, they generally have their own tools or maybe one of those three, maybe some other ones, but like very, very few other ones. So yeah, I think the first demos was a search and a calculator one. And there's- What's the data set?Shunyu [00:10:04]: Hotpot QA.Harrison [00:10:05]: Yeah. Oh, so there's that one. And then there's like the celebrity one by the same author, I think.Swyx [00:10:09]: Olivier Wilde's boyfriend squared. Yeah. 0.23. Yeah. Right, right, right.Harrison [00:10:16]: I'm forgetting the name of the author, but there's-Swyx [00:10:17]: I was like, we're going to over-optimize for Olivier Wilde's boyfriend, and it's going to change next year or something.Harrison [00:10:21]: There's a few data sets kind of like in that vein that require multi-step kind of like reasoning and thinking. So one of the questions I actually had for you in this vein, like the React paper, there's a few things in there, or at least when I think of that, there's a few things that I think of. There's kind of like the specific prompting strategy. Then there's like this general idea of kind of like thinking and then taking an action. And then there's just even more general idea of just like taking actions in a loop. Today, like obviously language models have changed a lot. We have tool calling. The specific prompting strategy probably isn't used super heavily anymore. Would you say that like the concept of React is still used though? Or like do you think that tool calling and running tool calling in a loop, is that ReactSwyx [00:11:02]: in your mind?Shunyu [00:11:03]: I would say like it's like more implicitly used than explicitly used. To be fair, I think the contribution of React is actually twofold. So first is this idea of, you know, we should be able to use calls in a very general way. Like there should be a single kind of general method to handle interaction with various environments. I think React is the first paper to demonstrate the idea. But then I think later there are two form or whatever, and this becomes like a trivial idea. But I think at the time, that's like a pretty non-trivial thing. And I think the second contribution is this idea of what people call like inner monologue or thinking or reasoning or whatever, to be paired with tool use. I think that's still non-trivial because if you look at the default function calling or whatever, like there's no inner monologue. And in practice, that actually is important, especially if the tool that you use is pretty different from the training distribution of the language model. I think those are the two main things that are kind of inherited.Harrison [00:12:10]: On that note, I think OpenAI even recommended when you're doing tool calling, it's sometimes helpful to put a thought field in the tool, along with all the actual acquired arguments,Swyx [00:12:19]: and then have that one first.Harrison [00:12:20]: So it fills out that first, and they've shown that that's yielded better results. The reason I ask is just like this same concept is still alive, and I don't know whether to call it a React agent or not. I don't know what to call it. I think of it as React, like it's the same ideas that were in the paper, but it's obviously a very different implementation at this point in time. And so I just don't know what to call it.Shunyu [00:12:40]: I feel like people will sometimes think more in terms of different tools, right? Because if you think about a web agent versus, you know, like a function calling agent, calling a Python API, you would think of them as very different. But in some sense, the methodology is the same. It depends on how you view them, right? I think people will tend to think more in terms of the environment and the tools rather than the methodology. Or, in other words, I think the methodology is kind of trivial and simple, so people will try to focus more on the different tools. But I think it's good to have a single underlying principle of those things.Alessio [00:13:17]: How do you see the surface of React getting molded into the model? So a function calling is a good example of like, now the model does it. What about the thinking? Now most models that you use kind of do chain of thought on their own, they kind of produce steps. Do you think that more and more of this logic will be in the model? Or do you think the context window will still be the main driver of reasoning and thinking?Shunyu [00:13:39]: I think it's already default, right? You do some chain of thought and you do some tool call, the cost of adding the chain of thought is kind of relatively low compared to other things. So it's not hurting to do that. And I think it's already kind of common practice, I would say.Swyx [00:13:56]: This is a good place to bring in either Tree of Thought or Reflection, your pick.Shunyu [00:14:01]: Maybe Reflection, to respect the time order, I would say.Swyx [00:14:05]: Any backstory as well, like the people involved with NOAA and the Princeton group. We talked about this offline, but people don't understand how these research pieces come together and this ideation.Shunyu [00:14:15]: I think Reflection is mostly NOAA's work, I'm more like advising kind of role. The story is, I don't remember the time, but one day we just see this pre-print that's like Reflection and Autonomous Agent with memory or whatever. And it's kind of like an extension to React, which uses this self-reflection. I'm like, oh, somehow you've become very popular. And NOAA reached out to me, it's like, do you want to collaborate on this and make this from an archive pre-print to something more solid, like a conference submission? I'm like, sure. We started collaborating and we remain good friends today. And I think another interesting backstory is NOAA was contacted by OpenAI at the time. It's like, this is pretty cool, do you want to just work at OpenAI? And I think Sierra also reached out at the same time. It's like, this is pretty cool, do you want to work at Sierra? And I think NOAA chose Sierra, but it's pretty cool because he was still like a second year undergrad and he's a very smart kid.Swyx [00:15:16]: Based on one paper. Oh my god.Shunyu [00:15:19]: He's done some other research based on programming language or chemistry or whatever, but I think that's the paper that got the attention of OpenAI and Sierra.Swyx [00:15:28]: For those who haven't gone too deep on it, the way that you present the inside of React, can you do that also for reflection? Yeah.Shunyu [00:15:35]: I think one way to think of reflection is that the traditional idea of reinforcement learning is you have a scalar reward and then you somehow back-propagate the signal of the scalar reward to the rest of your neural network through whatever algorithm, like policy grading or A2C or whatever. And if you think about the real life, most of the reward signal is not scalar. It's like your boss told you, you should have done a better job in this, but you could jump on that or whatever. It's not like a scalar reward, like 29 or something. I think in general, humans deal more with long scalar reward, or you can say language feedback. And the way that they deal with language feedback also has this back-propagation process, right? Because you start from this, you did a good job on job B, and then you reflect what could have been done differently to change to make it better. And you kind of change your prompt, right? Basically, you change your prompt on how to do job A and how to do job B, and then you do the whole thing again. So it's really like a pipeline of language where in self-graded descent, you have something like text reasoning to replace those gradient descent algorithms. I think that's one way to think of reflection.Harrison [00:16:47]: One question I have about reflection is how general do you think the algorithm there is? And so for context, I think at LangChain and at other places as well, we found it pretty easy to implement React in a standard way. You plug in any tools and it kind of works off the shelf, can get it up and running. I don't think we have an off-the-shelf kind of implementation of reflection and kind of the general sense. I think the concepts, absolutely, we see used in different kind of specific cognitive architectures, but I don't think we have one that comes off the shelf. I don't think any of the other frameworks have one that comes off the shelf. And I'm curious whether that's because it's not general enough or it's complex as well, because it also requires running it more times.Swyx [00:17:28]: Maybe that's not feasible.Harrison [00:17:30]: I'm curious how you think about the generality, complexity. Should we have one that comes off the shelf?Shunyu [00:17:36]: I think the algorithm is general in the sense that it's just as general as other algorithms, if you think about policy grading or whatever, but it's not applicable to all tasks, just like other algorithms. So you can argue PPO is also general, but it works better for those set of tasks, but not on those set of tasks. I think it's the same situation for reflection. And I think a key bottleneck is the evaluator, right? Basically, you need to have a good sense of the signal. So for example, if you are trying to do a very hard reasoning task, say mathematics, for example, and you don't have any tools, you're operating in this chain of thought setup, then reflection will be pretty hard because in order to reflect upon your thoughts, you have to have a very good evaluator to judge whether your thought is good or not. But that might be as hard as solving the problem itself or even harder. The principle of self-reflection is probably more applicable if you have a good evaluator, for example, in the case of coding. If you have those arrows, then you can just reflect on that and how to solve the bug andSwyx [00:18:37]: stuff.Shunyu [00:18:38]: So I think another criteria is that it depends on the application, right? If you have this latency or whatever need for an actual application with an end-user, the end-user wouldn't let you do two hours of tree-of-thought or reflection, right? You need something as soon as possible. So in that case, maybe this is better to be used as a training time technique, right? You do those reflection or tree-of-thought or whatever, you get a lot of data, and then you try to use the data to train your model better. And then in test time, you still use something as simple as React, but that's already improved.Alessio [00:19:11]: And if you think of the Voyager paper as a way to store skills and then reuse them, how would you compare this reflective memory and at what point it's just ragging on the memory versus you want to start to fine-tune some of them or what's the next step once you get a very long reflective corpus? Yeah.Shunyu [00:19:30]: So I think there are two questions here. The first question is, what type of information or memory are you considering, right? Is it like semantic memory that stores knowledge about the word, or is it the episodic memory that stores trajectories or behaviors, or is it more of a procedural memory like in Voyager's case, like skills or code snippets that you can use to do actions, right?Swyx [00:19:54]: That's one dimension.Shunyu [00:19:55]: And the second dimension is obviously how you use the memory, either retrieving from it, using it in the context, or fine-tuning it. I think the Cognitive Architecture for Language Agents paper has a good categorization of all the different combinations. And of course, which way you use depends on the concrete application and the concrete need and the concrete task. But I think in general, it's good to think of those systematic dimensions and all the possible options there.Swyx [00:20:25]: Harrison also has in LangMEM, I think you did a presentation in my meetup, and I think you've done it at a couple other venues as well. User state, semantic memory, and append-only state, I think kind of maps to what you just said.Shunyu [00:20:38]: What is LangMEM? Can I give it like a quick...Harrison [00:20:40]: One of the modules of LangChain for a long time has been something around memory. And I think we're still obviously figuring out what that means, as is everyone kind of in the space. But one of the experiments that we did, and one of the proof of concepts that we did was, technically what it was is you would basically create threads, you'd push messages to those threads in the background, we process the data in a few ways. One, we put it into some semantic store, that's the semantic memory. And then two, we do some extraction and reasoning over the memories to extract. And we let the user define this, but extract key facts or anything that's of interest to the user. Those aren't exactly trajectories, they're maybe more closer to the procedural memory. Is that how you'd think about it or classify it?Shunyu [00:21:22]: Is it like about knowledge about the word, or is it more like how to do something?Swyx [00:21:27]: It's reflections, basically.Harrison [00:21:28]: So in generative worlds.Shunyu [00:21:30]: Generative agents.Swyx [00:21:31]: The Smallville. Yeah, the Smallville one.Harrison [00:21:33]: So the way that they had their memory there was they had the sequence of events, and that's kind of like the raw events that happened. But then every N events, they'd run some synthesis over those events for the LLM to insert its own memory, basically. It's that type of memory.Swyx [00:21:49]: I don't know how that would be classified.Shunyu [00:21:50]: I think of that as more of the semantic memory, but to be fair, I think it's just one way to think of that. But whether it's semantic memory or procedural memory or whatever memory, that's like an abstraction layer. But in terms of implementation, you can choose whatever implementation for whatever memory. So they're totally kind of orthogonal. I think it's more of a good way to think of the things, because from the history of cognitive science and cognitive architecture and how people study even neuroscience, that's the way people think of how the human brain organizes memory. And I think it's more useful as a way to think of things. But it's not like for semantic memory, you have to do this kind of way to retrieve or fine-tune, and for procedural memory, you have to do that. I think those are totally orthogonal kind of dimensions.Harrison [00:22:34]: How much background do you have in cognitive sciences, and how much do you model some of your thoughts on?Shunyu [00:22:40]: That's a great question, actually. I think one of the undergrad influences for my follow-up research is I was doing an internship at MIT's Computational Cognitive Science Lab with Josh Tannenbaum, and he's a very famous cognitive scientist. And I think a lot of his ideas still influence me today, like thinking of things in computational terms and getting interested in language and a lot of stuff, or even developing psychology kind of stuff. So I think it still influences me today.Swyx [00:23:14]: As a developer that tried out LangMEM, the way I view it is just it's a materialized view of a stream of logs. And if anything, that's just useful for context compression. I don't have to use the full context to run it over everything. But also it's kind of debuggable. If it's wrong, I can show it to the user, the user can manually fix it, and I can carry on. That's a really good analogy. I like that. I'm going to steal that. Sure. Please, please. You know I'm bullish on memory databases. I guess, Tree of Thoughts? Yeah, Tree of Thoughts.Shunyu [00:23:39]: I feel like I'm relieving the defense in like a podcast format. Yeah, no.Alessio [00:23:45]: I mean, you had a banger. Well, this is the one where you're already successful and we just highlight the glory. It was really good. You mentioned that since thinking is kind of like taking an action, you can use action searching algorithms to think of thinking. So just like you will use Tree Search to find the next thing. And the idea behind Tree of Thought is that you generate all these possible outcomes and then find the best tree to get to the end. Maybe back to the latency question, you can't really do that if you have to respond in real time. So what are maybe some of the most helpful use cases for things like this? Where have you seen people adopt it where the high latency is actually worth the wait?Shunyu [00:24:21]: For things that you don't care about latency, obviously. For example, if you're trying to do math, if you're just trying to come up with a proof. But I feel like one type of task is more about searching for a solution. You can try a hundred times, but if you find one solution, that's good. For example, if you're finding a math proof or if you're finding a good code to solve a problem or whatever, I think another type of task is more like reacting. For example, if you're doing customer service, you're like a web agent booking a ticket for an end user. Those are more reactive kind of tasks, or more real-time tasks. You have to do things fast. They might be easy, but you have to do it reliably. And you care more about can you solve 99% of the time out of a hundred. But for the type of search type of tasks, then you care more about can I find one solution out of a hundred. So it's kind of symmetric and different.Alessio [00:25:11]: Do you have any data or intuition from your user base? What's the split of these type of use cases? How many people are doing more reactive things and how many people are experimenting with deep, long search?Harrison [00:25:23]: I would say React's probably the most popular. I think there's aspects of reflection that get used. Tree of thought, probably the least so. There's a great tweet from Jason Wei, I think you're now a colleague, and he was talking about prompting strategies and how he thinks about them. And I think the four things that he had was, one, how easy is it to implement? How much compute does it take? How many tasks does it solve? And how much does it improve on those tasks? And I'd add a fifth, which is how likely is it to be relevant when the next generation of models come out? And I think if you look at those axes and then you look at React, reflection, tree of thought, it tracks that the ones that score better are used more. React is pretty easy to implement. Tree of thought's pretty hard to implement. The amount of compute, yeah, a lot more for tree of thought. The tasks and how much it improves, I don't have amazing visibility there. But I think if we're comparing React versus tree of thought, React just dominates the first two axes so much that my question around that was going to be like, how do you think about these prompting strategies, cognitive architectures, whatever you want to call them? When you're thinking of them, what are the axes that you're judging them on in your head when you're thinking whether it's a good one or a less good one?Swyx [00:26:38]: Right.Shunyu [00:26:39]: Right. I think there is a difference between a prompting method versus research, in the sense that for research, you don't really even care about does it actually work on practical tasks or does it help? Whatever. I think it's more about the idea or the principle, right? What is the direction that you're unblocking and whatever. And I think for an actual prompting method to solve a concrete problem, I would say simplicity is very important because the simpler it is, the less decision you have to make about it. And it's easier to design. It's easier to propagate. And it's easier to do stuff. So always try to be as simple as possible. And I think latency obviously is important. If you can do things fast and you don't want to do things slow. And I think in terms of the actual prompting method to use for a particular problem, I think we should all be in the minimalist kind of camp, right? You should try the minimum thing and see if it works. And if it doesn't work and there's absolute reason to add something, then you add something, right? If there's absolute reason that you need some tool, then you should add the tool thing. If there's absolute reason to add reflection or whatever, you should add that. Otherwise, if a chain of thought can already solve something, then you don't even need to use any of that.Harrison [00:27:57]: Yeah. Or if it's just better prompting can solve it. Like, you know, you could add a reflection step or you could make your instructions a little bit clearer.Swyx [00:28:03]: And it's a lot easier to do that.Shunyu [00:28:04]: I think another interesting thing is like, I personally have never done those kind of like weird tricks. I think all the prompts that I write are kind of like just talking to a human, right? It's like, I don't know. I never say something like, your grandma is dying and you have to solve it. I mean, those are cool, but I feel like we should all try to solve things in a very intuitive way. Just like talking to your co-worker. That should work 99% of the time. That's my personal take.Swyx [00:28:29]: The problem with how language models, at least in the GPC 3 era, was that they over-optimized to some sets of tokens in sequence. So like reading the Kojima et al. paper that was listing step-by-step, like he tried a bunch of them and they had wildly different results. It should not be the case, but it is the case. And hopefully we're getting better there.Shunyu [00:28:51]: Yeah. I think it's also like a timing thing in the sense that if you think about this whole line of language model, right? Like at the time it was just like a text generator. We don't have any idea how it's going to be used, right? And obviously at the time you will find all kinds of weird issues because it's not trained to do any of that, right? But then I think we have this loop where once we realize chain of thought is important or agent is important or tool using is important, what we see is today's language models are heavily optimized towards those things. So I think in some sense they become more reliable and robust over those use cases. And you don't need to do as much prompt engineering tricks anymore to solve those things. I feel like in some sense, I feel like prompt engineering even is like a slightly negative word at the time because it refers to all those kind of weird tricks that you have to apply. But I think we don't have to do that anymore. Like given today's progress, you should just be able to talk to like a coworker. And if you're clear and concrete and being reasonable, then it should do reasonable things for you.Swyx [00:29:51]: Yeah. The way I put this is you should not be a prompt engineer because it is the goal of the big labs to put you out of a job.Shunyu [00:29:58]: You should just be a good communicator. Like if you're a good communicator to humans, you should be a good communicator to languageSwyx [00:30:02]: models.Harrison [00:30:03]: That's the key though, because oftentimes people aren't good communicators to these language models and that is a very important skill and that's still messing around with the prompt. And so it depends what you're talking about when you're saying prompt engineer.Shunyu [00:30:14]: But do you think it's like very correlated with like, are they like a good communicator to humans? You know, it's like.Harrison [00:30:20]: It may be, but I also think I would say on average, people are probably worse at communicating with language models than to humans right now, at least, because I think we're still figuring out how to do it. You kind of expect it to be magical and there's probably some correlation, but I'd say there's also just like, people are worse at it right now than talking to humans.Shunyu [00:30:36]: We should make it like a, you know, like an elementary school class or whatever, how toSwyx [00:30:41]: talk to language models. Yeah. I don't know. Very pro that. Yeah. Before we leave the topic of trees and searching, not specific about QSTAR, but there's a lot of questions about MCTS and this combination of tree search and language models. And I just had to get in a question there about how seriously should people take this?Shunyu [00:30:59]: Again, I think it depends on the tasks, right? So MCTS was magical for Go, but it's probably not as magical for robotics, right? So I think right now the problem is not even that we don't have good methodologies, it's more about we don't have good tasks. It's also very interesting, right? Because if you look at my citation, it's like, obviously the most cited are React, Refraction and Tree of Thought. Those are methodologies. But I think like equally important, if not more important line of my work is like benchmarks and environments, right? Like WebShop or SuiteVenture or whatever. And I think in general, what people do in academia that I think is not good is they choose a very simple task, like Alford, and then they apply overly complex methods to show they improve 2%. I think you should probably match the level of complexity of your task and your method. I feel like where tasks are kind of far behind the method in some sense, right? Because we have some good test-time approaches, like whatever, React or Refraction or Tree of Thought, or like there are many, many more complicated test-time methods afterwards. But on the benchmark side, we have made a lot of good progress this year, last year. But I think we still need more progress towards that, like better coding benchmark, better web agent benchmark, better agent benchmark, not even for web or code. I think in general, we need to catch up with tasks.Harrison [00:32:27]: What are the biggest reasons in your mind why it lags behind?Shunyu [00:32:31]: I think incentive is one big reason. Like if you see, you know, all the master paper are cited like a hundred times more than the task paper. And also making a good benchmark is actually quite hard. It's almost like a different set of skills in some sense, right? I feel like if you want to build a good benchmark, you need to be like a good kind of product manager kind of mindset, right? You need to think about why people should use your benchmark, why it's challenging, why it's useful. If you think about like a PhD going into like a school, right? The prior skill that expected to have is more about, you know, can they code this method and can they just run experiments and can solve that? I think building a benchmark is not the typical prior skill that we have, but I think things are getting better. I think more and more people are starting to build benchmarks and people are saying that it's like a way to get more impact in some sense, right? Because like if you have a really good benchmark, a lot of people are going to use it. But if you have a super complicated test time method, like it's very hard for people to use it.Harrison [00:33:35]: Are evaluation metrics also part of the reason? Like for some of these tasks that we might want to ask these agents or language models to do, is it hard to evaluate them? And so it's hard to get an automated benchmark. Obviously with SweetBench you can, and with coding, it's easier, but.Shunyu [00:33:50]: I think that's part of the skillset thing that I mentioned, because I feel like it's like a product manager because there are many dimensions and you need to strike a balance and it's really hard, right? If you want to make sense, very easy to autogradable, like automatically gradable, like either to grade or either to evaluate, then you might lose some of the realness or practicality. Or like it might be practical, but it might not be as scalable, right? For example, if you think about text game, human have pre-annotated all the rewards and all the language are real. So it's pretty good on autogradable dimension and the practical dimension. If you think about, you know, practical, like actual English being practical, but it's not scalable, right? It takes like a year for experts to build that game. So it's not really that scalable. And I think part of the reason that SweetBench is so popular now is it kind of hits the balance between these three dimensions, right? Easy to evaluate and being actually practical and being scalable. Like if I were to criticize upon some of my prior work, I think webshop, like it's my initial attempt to get into benchmark world and I'm trying to do a good job striking the balance. But obviously we make it all gradable and it's really scalable, but then I think the practicality is not as high as actually just using GitHub issues, right? Because you're just creating those like synthetic tasks.Harrison [00:35:13]: Are there other areas besides coding that jump to mind as being really good for being autogradable?Shunyu [00:35:20]: Maybe mathematics.Swyx [00:35:21]: Classic. Yeah. Do you have thoughts on alpha proof, the new DeepMind paper? I think it's pretty cool.Shunyu [00:35:29]: I think it's more of a, you know, it's more of like a confidence boost or like sometimes, you know, the work is not even about, you know, the technical details or the methodology that it chooses or the concrete results. I think it's more about a signal, right?Swyx [00:35:47]: Yeah. Existence proof. Yeah.Shunyu [00:35:50]: Yeah. It can be done. This direction is exciting. It kind of encourages people to work more towards that direction. I think it's more like a boost of confidence, I would say.Swyx [00:35:59]: Yeah. So we're going to focus more on agents now and, you know, all of us have a special interest in coding agents. I would consider Devin to be the sort of biggest launch of the year as far as AI startups go. And you guys in the Princeton group worked on Suiagents alongside of Suibench. Tell us the story about Suiagent. Sure.Shunyu [00:36:21]: I think it's kind of like a triology, it's actually a series of three works now. So actually the first work is called Intercode, but it's not as famous, I know. And the second work is called Suibench and the third work is called Suiagent. And I'm just really confused why nobody is working on coding. You know, it's like a year ago, but I mean, not everybody's working on coding, obviously, but a year ago, like literally nobody was working on coding. I was really confused. And the people that were working on coding are, you know, trying to solve human evil in like a sick-to-sick way. There's no agent, there's no chain of thought, there's no anything, they're just, you know, fine tuning the model and improve some points and whatever, like, I was really confused because obviously coding is the best application for agents because it's autogradable, it's super important, you can make everything like API or code action, right? So I was confused and I collaborated with some of the students in Princeton and we have this work called Intercode and the idea is, first, if you care about coding, then you should solve coding in an interactive way, meaning more like a Jupyter Notebook kind of way than just writing a program and seeing if it fails or succeeds and stop, right? You should solve it in an interactive way because that's exactly how humans solve it, right? You don't have to, you know, write a program like next token, next token, next token and stop and never do any edits and you cannot really use any terminal or whatever tool. It doesn't make sense, right? And that's the way people are solving coding at the time, basically like sampling a program from a language model without chain of thought, without tool call, without refactoring, without anything. So the first point is we should solve coding in a very interactive way and that's a very general principle that applies for various coding benchmarks. And also, I think you can make a lot of the agent task kind of like interactive coding. If you have Python and you can call any package, then you can literally also browse internet or do whatever you want, like control a robot or whatever. So that seems to be a very general paradigm. But obviously I think a bottleneck is at the time we're still doing, you know, very simple tasks like human eval or whatever coding benchmark people proposed. They were super hard in 2021, like 20%, but they're like 95% already in 2023. So obviously the next step is we need a better benchmark. And Carlos and John, which are the first authors of Swaybench, I think they come up with this great idea that we should just script GitHub and solve whatever human engineers are solving. And I think it's actually pretty easy to come up with the idea. And I think in the first week, they already made a lot of progress. They script the GitHub and they make all the same, but then there's a lot of painful info work and whatever, you know. I think the idea is super easy, but the engineering is super hard. And I feel like that's a very typical signal of a good work in the AI era now.Swyx [00:39:17]: I think also, I think the filtering was challenging, because if you look at open source PRs, a lot of them are just like, you know, fixing typos. I think it's challenging.Shunyu [00:39:27]: And to be honest, we didn't do a perfect job at the time. So if you look at the recent blog post with OpenAI, we improved the filtering so that it's more solvable.Swyx [00:39:36]: I think OpenAI was just like, look, this is a thing now. We have to fix this. These students just rushed it.Shunyu [00:39:45]: It's a good convergence of interests for me.Alessio [00:39:48]: Was that tied to you joining OpenAI? Or was that just unrelated?Shunyu [00:39:52]: It's a coincidence for me, but it's a good coincidence.Swyx [00:39:55]: There is a history of anytime a big lab adopts a benchmark, they fix it. Otherwise, it's a broken benchmark.Shunyu [00:40:03]: So naturally, once we propose swimmage, the next step is to solve it. But I think the typical way you solve something now is you collect some training samples, or you design some complicated agent method, and then you try to solve it. Either super complicated prompt, or you build a better model with more training data. But I think at the time, we realized that even before those things, there's a fundamental problem with the interface or the tool that you're supposed to use. Because that's like an ignored problem in some sense. What your tool is, or how that matters for your task. So what we found concretely is that if you just use the text terminal off the shelf as a tool for those agents, there's a lot of problems. For example, if you edit something, there's no feedback. So you don't know whether your edit is good or not. That makes the agent very confused and makes a lot of mistakes. There are a lot of small problems, you would say. Well, you can try to do prompt engineering and improve that, but it turns out to be actually very hard. We realized that the interface design is actually a very omitted part of agent design. So we did this switch agent work. And the key idea is just, even before you talk about what the agent is, you should talk about what the environment is. You should make sure that the environment is actually friendly to whatever agent you're trying to apply. That's the same idea for humans. Text terminal is good for some tasks, like git, pool, or whatever. But it's not good if you want to look at browser and whatever. Also, browser is a good tool for some tasks, but it's not a good tool for other tasks. We need to talk about how design interface, in some sense, where we should treat agents as our customers. It's like when we treat humans as a customer, we design human computer interfaces. We design those beautiful desktops or browsers or whatever, so that it's very intuitive and easy for humans to use. And this whole great subject of HCI is all about that. I think now the research idea of switch agent is just, we should treat agents as our customers. And we should do like, you know… AICI.Swyx [00:42:16]: AICI, exactly.Harrison [00:42:18]: So what are the tools that a suite agent should have, or a coding agent in general should have?Shunyu [00:42:24]: For suite agent, it's like a modified text terminal, which kind of adapts to a lot of the patterns of language models to make it easier for language models to use. For example, now for edit, instead of having no feedback, it will actually have a feedback of, you know, actually here you introduced like a syntax error, and you should probably want to fix that, and there's an ended error there. And that makes it super easy for the model to actually do that. And there's other small things, like how exactly you write arguments, right? Like, do you want to write like a multi-line edit, or do you want to write a single line edit? I think it's more interesting to think about the way of the development process of an ACI rather than the actual ACI for like a concrete application. Because I think the general paradigm is very similar to HCI and psychology, right? Basically, for how people develop HCIs, they do behavior experiments on humans, right? I do every test, right? Like, which interface is actually better? And I do those behavior experiments, kind of like psychology experiments to humans, and I change things. And I think what's really interesting for me, for this three-agent paper, is we can probably do the same thing for agents, right? We can do every test for those agents and do behavior tests. And through the process, we not only invent better interfaces for those agents, that's the practical value, but we also better understand agents. Just like when we do those A-B tests, we do those HCI, we better understand humans. Doing those ACI experiments, we actually better understand agents. And that's pretty cool.Harrison [00:43:51]: Besides that A-B testing, what are other processes that people can use to think about this in a good way?Swyx [00:43:57]: That's a great question.Shunyu [00:43:58]: And I think three-agent is an initial work. And what we do is the kind of the naive approach, right? You just try some interface, and you see what's going wrong, and then you try to fix that. We do this kind of iterative fixing. But I think what's really interesting is there'll be a lot of future directions that's very promising if we can apply some of the HCI principles more systematically into the interface design. I think that would be a very cool interdisciplinary research opportunity.Harrison [00:44:26]: You talked a lot about agent-computer interfaces and interactions. What about human-to-agent UX patterns? Curious for any thoughts there that you might have.Swyx [00:44:38]: That's a great question.Shunyu [00:44:39]: And in some sense, I feel like prompt engineering is about human-to-agent interface. But I think there can be a lot of interesting research done about... So prompting is about how humans can better communicate with the agent. But I think there could be interesting research on how agents can better communicate with humans, right? When to ask questions, how to ask questions, what's the frequency of asking questions. And I think those kinds of stuff could be very cool research.Harrison [00:45:07]: Yeah, I think some of the most interesting stuff that I saw here was also related to coding with Devin from Cognition. And they had the three or four different panels where you had the chat, the browser, the terminal, and I guess the code editor as well.Swyx [00:45:19]: There's more now.Harrison [00:45:19]: There's more. Okay, I'm not up to date. Yeah, I think they also did a good job on ACI.Swyx [00:45:25]: I think that's the main learning I have from Devin. They cracked that. Actually, there was no foundational planning breakthrough. The planner is actually pretty simple, but ACI that they broke through on.Shunyu [00:45:35]: I think making the tool good and reliable is probably like 90% of the whole agent. Once the tool is actually good, then the agent design can be much, much simpler. On the other hand, if the tool is bad, then no matter how much you put into the agent design, planning or search or whatever, it's still going to be trash.Harrison [00:45:53]: Yeah, I'd argue the same. Same with like context and instructions. Like, yeah, go hand in hand.Alessio [00:46:00]: On the tool, how do you think about the tension of like, for both of you, I mean, you're building a library, so even more for you. The tension between making now a language or a library that is like easy for the agent to grasp and write versus one that is easy for like the human to grasp and write. Because, you know, the trend is like more and more code gets written by the agent. So why wouldn't you optimize the framework to be as easy as possible for the model versus for the person?Swyx [00:46:24]: I think it's possible to design an interfaceShunyu [00:46:25]: that's both friendly to humans and agents. But what do you think?Harrison [00:46:29]: We haven't thought about that from the perspective, like we're not trying to design LangChain or LangGraph to be friendly. But I mean, I think to be friendly for agents to write.Swyx [00:46:42]: But I mean, I think we see this with like,Harrison [00:46:43]: I saw some paper that used TypeScript notation instead of JSON notation for tool calling and it got a lot better performance. So it's definitely a thing. I haven't really heard of anyone designing like a syntax or a language explicitly for agents, but there's clearly syntaxes that are better.Shunyu [00:46:59]: I think function calling is a good example where it's like a good interface for both human programmers and for agents, right? Like for developers, it's actually a very friendly interface because it's very concrete and you don't have to do prompt engineering anymore. You can be very systematic. And for models, it's also pretty good, right? Like it can use all the existing coding content. So I think we need more of those kinds of designs.Swyx [00:47:21]: I will mostly agree and I'll slightly disagree in terms of this, which is like, whether designing for humans also overlaps with designing for AI. So Malte Ubo, who's the CTO of Vercel, who is creating basically JavaScript's competitor to LangChain, they're observing that basically, like if the API is easy to understand for humans, it's actually much easier to understand for LLMs, for example, because they're not overloaded functions. They don't behave differently under different contexts. They do one thing and they always work the same way. It's easy for humans, it's easy for LLMs. And like that makes a lot of sense. And obviously adding types is another one. Like type annotations only help give extra context, which is really great. So that's the agreement. And then a disagreement is that when I use structured output to do my chain of thought, I have found that I change my field names to hint to the LLM of what the field is supposed to do. So instead of saying topics, I'll say candidate topics. And that gives me a better result because the LLM was like, ah, this is just a draft thing I can use for chain of thought. And instead of like summaries, I'll say topic summaries to link the previous field to the current field. So like little stuff like that, I find myself optimizing for the LLM where I, as a human, would never do that. Interesting.Shunyu [00:48:32]: It's kind of like the way you optimize the prompt, it might be different for humans and for machines. You can have a common ground that's both clear for humans and agents, but to improve the human performance versus improving the agent performance, they might move to different directions.Swyx [00:48:48]: Might move different directions. There's a lot more use of metadata as well, like descriptions, comments, code comments, annotations and stuff like that. Yeah.Harrison [00:48:56]: I would argue that's just you communicatingSwyx [00:48:58]: to the agent what it should do.Harrison [00:49:00]: And maybe you need to communicate a little bit more than to humans because models aren't quite good enough yet.Swyx [00:49:06]: But like, I don't think that's crazy.Harrison [00:49:07]: I don't think that's like- It's not crazy.Swyx [00:49:09]: I will bring this in because it just happened to me yesterday. I was at the cursor office. They held their first user meetup and I was telling them about the LLM OS concept and why basically every interface, every tool was being redesigned for AIs to use rather than humans. And they're like, why? Like, can we just use Bing and Google for LLM search? Why must I use Exa? Or what's the other one that you guys work with?Harrison [00:49:32]: Tavilli.Swyx [00:49:33]: Tavilli. Web Search API dedicated for LLMs. What's the difference?Shunyu [00:49:36]: Exactly. To Bing API.Swyx [00:49:38]: Exactly.Harrison [00:49:38]: There weren't great APIs for search. Like the best one, like the one that we used initially in LangChain was SERP API, which is like maybe illegal. I'm not sure.Swyx [00:49:49]: And like, you know,Harrison [00:49:52]: and now there are like venture-backed companies.Swyx [00:49:53]: Shout out to DuckDuckGo, which is free.Harrison [00:49:55]: Yes, yes.Swyx [00:49:56]: Yeah.Harrison [00:49:56]: I do think there are some differences though. I think you want, like, I think generally these APIs try to return small amounts of text information, clear legible field. It's not a massive JSON blob. And I think that matters. I think like when you talk about designing tools, it's not only the, it's the interface in the entirety, not only the inputs, but also the outputs that really matter. And so I think they try to make the outputs.Shunyu [00:50:18]: They're doing ACI.Swyx [00:50:19]: Yeah, yeah, absolutely.Harrison [00:50:20]: Really?Swyx [00:50:21]: Like there's a whole set of industries that are just being redone for ACI. It's weird. And so my simple answer to them was like the error messages. When you give error messages, they should be basically prompts for the LLM to take and then self-correct. Then your error messages get more verbose, actually, than you normally would with a human. Stuff like that. Like a little, honestly, it's not that big. Again, like, is this worth a venture-backed industry? Unless you can tell us. But like, I think Code Interpreter, I think is a new thing. I hope so.Alessio [00:50:52]: We invested in it to be so.Shunyu [00:50:53]: I think that's a very interesting point. You're trying to optimize to the extreme, then obviously they're going to be different. For example, the error—Swyx [00:51:00]: Because we take it very seriously. Right.Shunyu [00:51:01]: The error for like language model, the longer the better. But for humans, that will make them very nervous and very tired, right? But I guess the point is more like, maybe we should try to find a co-optimized common ground as much as possible. And then if we have divergence, then we should try to diverge. But it's more philosophical now.Alessio [00:51:19]: But I think like part of it is like how you use it. So Google invented the PageRank because ideally you only click on one link, you know, like the top three should have the answer. But with models, it's like, well, you can get 20. So those searches are more like semantic grouping in a way. It's like for this query, I'll return you like 20, 30 things that are kind of good, you know? So it's less about ranking and it's more about grouping.Shunyu [00:51:42]: Another fundamental thing about HCI is the difference between human and machine's kind of memory limit, right? So I think what's really interesting about this concept HCI versus HCI is interfaces that's optimized for them. You can kind of understand some of the fundamental characteristics, differences of humans and machines, right? Why, you know, if you look at find or whatever terminal command, you know, you can only look at one thing at a time or that's because we have a very small working memory. You can only deal with one thing at a time. You can only look at one paragraph of text at the same time. So the interface for us is by design, you know, a small piece of information, but more temporal steps. But for machines, that should be the opposite, right? You should just give them a hundred different results and they should just decide in context what's the most relevant stuff and trade off the context for temporal steps. That's actually also better for language models because like the cost is smaller or whatever. So it's interesting to connect those interfaces to the fundamental kind of differences of those.Harrison [00:52:43]: When you said earlier, you know, we should try to design these to maybe be similar as possible and diverge if we need to.Swyx [00:52:49]: I actually don't have a problem with them diverging nowHarrison [00:52:51]: and seeing venture-backed startups emerging now because we are different from machines code AI. And it's just so early on, like they may still look kind of similar and they may still be small differences, but it's still just so early. And I think we'll only discover more ways that they differ. And so I'm totally fine with them kind of like diverging earlySwyx [00:53:10]: and optimizing for the...Harrison [00:53:11]: I agree. I think it's more like, you know,Shunyu [00:53:14]: we should obviously try to optimize human interface just for humans. We're already doing that for 50 years. We should optimize agent interface just for agents, but we might also try to co-optimize both and see how far we can get. There's enough people to try all three directions. Yeah.Swyx [00:53:31]: There's a thesis I sometimes push, which is the sour lesson as opposed to the bitter lesson, which we're always inspired by human development, but actually AI develops its own path.Shunyu [00:53:40]: Right. We need to understand better, you know, what are the fundamental differences between those creatures.Swyx [00:53:45]: It's funny when really early on this pod, you were like, how much grounding do you have in cognitive development and human brain stuff? And I'm like

Dubuque Area Baseball Podcast
Weighin' in with Worm: S2 Episode 6 Loras Wrestling Jeren Glosser and Devin Bahr

Dubuque Area Baseball Podcast

Play Episode Listen Later Sep 18, 2024 53:54


What's going on guys? We are sitting down with Loras Head Coach and newly named Head Assistant Coach Devin Bahr. Glosser speaks about last year's season and the potential of the Duhawks going into the 2024-2025 season. We also get to know new head assistant coach Devin Bahr, a former 3x Wisconsin State Champion, World Team Member and 2024 NCAA Division 2 All American!

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)
Chapter Twenty-Eight - Cersei 6 - A Feast for Crows | A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Sep 10, 2024 67:25


Send us a textCersei gets an update on Margaery court, while being waylaid by sparrows. She negotiates a deal with the new High Septon. He'll bless Tommen and erase the crown's debt to the Faith for a law allowing the re-emergence of the Faith Militant. Finally, she makes a veiled threat to Margaery. Mackelly and Simon bow their heads in prayer that she knows what she's doing.Chapter Review:Queen Cersei Lannister rides to the Great Sept of Baelor with Taena Maryweather. Taena fills Cersei in on the men in Margaery's court. The list is filled with most of the young, handsome knights in King's Landing. Therefore, the addition of Grand Maester Pycelle stands out. The going is slow. The litter is blocked by sparrows clogging the streets. Ultimately, Cersei walks the remainder of the way. Upon seeing a pile of bones of slain faithful, Cersei promises to seek revenge. However, the sparrows want only protection for the living.Inside the sept, Cersei is shocked to see septons on hands and knees scrubbing. When she meets the High Septon, she says he should have met her in his finest robes and a crystal crown, not wet and dirty. He explains that they've sold their crowns and fine robes. Cersei asks to speak in private. They kneel at the feet of the Crone. Cersei wants the sparrows gone from King's Landing and a blessing from the High Septon for King Tommen. The High Septon wants better protection from the king for the faithful throughout the realm. Cersei suggests they protect themselves, but a law prohibits the faith from taking up arms. She offers to abolish the law for a blessing for the king and forgiveness on the crown's debt to the faith. The High Septon agrees.On the ride back to the Red Keep, Cersei's festive mood is killed by the sight of Queen Margaery and her court. The two queens exchange pleasantries and barbs. When Cersei sees Margaery and Loras together, it reminds her of Jaime and herself. An idea takes root and laughter bursts forth. Margaery wants to share in the jest, and Cersei promises her that she will.Characters/Places/Names/Events:Cersei Lannister - Mother to King Tommen, Queen Regent of the Seven Kingdoms.King Tommen Baratheon - Child king of the Seven Kingdoms.High Septon - The highest ranking member of the Faith of the Seven.Taena Merryweather - Socia Support the showSupport us: Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/watch?v=kthxycmF25M

The top AI news from the past week, every ThursdAI

Hey, for the least time during summer of 2024, welcome to yet another edition of ThursdAI, also happy skynet self-awareness day for those who keep track :) This week, Cerebras broke the world record for fastest LLama 3.1 70B/8B inference (and came on the show to talk about it) Google updated 3 new Geminis, Anthropic artifacts for all, 100M context windows are possible, and Qwen beats SOTA on vision models + much more! As always, this weeks newsletter is brought to you by Weights & Biases, did I mention we're doing a hackathon in SF in September 21/22 and that we have an upcoming free RAG course w/ Cohere & Weaviate? TL;DR* Open Source LLMs * Nous DisTrO - Distributed Training (X , Report)* NousResearch/ hermes-function-calling-v1 open sourced - (X, HF)* LinkedIN Liger-Kernel - OneLine to make Training 20% faster & 60% more memory Efficient (Github)* Cartesia - Rene 1.3B LLM SSM + Edge Apache 2 acceleration (X, Blog)* Big CO LLMs + APIs* Cerebras launches the fastest AI inference - 447t/s LLama 3.1 70B (X, Blog, Try It)* Google - Gemini 1.5 Flash 8B & new Gemini 1.5 Pro/Flash (X, Try it)* Google adds Gems & Imagen to Gemini paid tier* Anthropic artifacts available to all users + on mobile (Blog, Try it)* Anthropic publishes their system prompts with model releases (release notes)* OpenAI has project Strawberry coming this fall (via The information)* This weeks Buzz* WandB Hackathon hackathon hackathon (Register, Join)* Also, we have a new RAG course w/ Cohere and Weaviate (RAG Course)* Vision & Video* Zhipu AI CogVideoX - 5B Video Model w/ Less 10GB of VRAM (X, HF, Try it)* Qwen-2 VL 72B,7B,2B - new SOTA vision models from QWEN (X, Blog, HF)* AI Art & Diffusion & 3D* GameNgen - completely generated (not rendered) DOOM with SD1.4 (project)* FAL new LORA trainer for FLUX - trains under 5 minutes (Trainer, Coupon for ThursdAI)* Tools & Others* SimpleBench from AI Explained - closely matches human experience (simple-bench.com)ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.Open SourceLet's be honest - ThursdAI is a love letter to the open-source AI community, and this week was packed with reasons to celebrate.Nous Research DiStRO + Function Calling V1Nous Research was on fire this week (aren't they always?) and they kicked off the week with the release of DiStRO, which is a breakthrough in distributed training. You see, while LLM training requires a lot of hardware, it also requires a lot of network bandwidth between the different GPUs, even within the same data center. Proprietary networking solutions like Nvidia NVLink, and more open standards like Ethernet work well within the same datacenter, but training across different GPU clouds has been unimaginable until now. Enter DiStRo, a new decentralized training by the mad geniuses at Nous Research, in which they reduced the required bandwidth to train a 1.2B param model from 74.4GB to just 86MB (857x)! This can have massive implications for training across compute clusters, doing shared training runs, optimizing costs and efficiency and democratizing LLM training access! So don't sell your old GPUs just yet, someone may just come up with a folding@home but for training the largest open source LLM, and it may just be Nous! Nous Research also released their function-calling-v1 dataset (HF) that was used to train Hermes-2, and we had InterstellarNinja who authored that dataset, join the show and chat about it. This is an incredible unlock for the open source community, as function calling become a de-facto standard now. Shout out to the Glaive team as well for their pioneering work that paved the way!LinkedIn's Liger Kernel: Unleashing the Need for Speed (with One Line of Code)What if I told you, that whatever software you develop, you can add 1 line of code, and it'll run 20% faster, and require 60% less memory? This is basically what Linkedin researches released this week with Liger Kernel, yes you read that right, Linkedin, as in the website you career related posts on! "If you're doing any form of finetuning, using this is an instant win"Wing Lian - AxolotlThis absolutely bonkers improvement in training LLMs, now works smoothly with Flash Attention, PyTorch FSDP and DeepSpeed. If you want to read more about the implementation of the triton kernels, you can see a deep dive here, I just wanted to bring this to your attention, even if you're not technical, because efficiency jumps like these are happening all the time. We are used to seeing them in capabilities / intelligence, but they are also happening on the algorithmic/training/hardware side, and it's incredible to see!Huge shoutout to Byron and team at Linkedin for this unlock, check out their Github if you want to get involved!Qwen-2 VL - SOTA image and video understanding + open weights mini VLMYou may already know that we love the folks at Qwen here on ThursdAI, not only because Junyang Lin is a frequeny co-host and we get to hear about their releases as soon as they come out (they seem to be releasing them on thursdays around the time of the live show, I wonder why!) But also because, they are committed to open source, and have released 2 models 7B and 2B with complete Apache 2 license! First of all, their Qwen-2 VL 72B model, is now SOTA at many benchmarks, beating GPT-4, Claude 3.5 and other much bigger models. This is insane. I literally had to pause Junyang and repeat what he said, this is a 72B param model, that beats GPT-4o on document understanding, on math, on general visual Q&A. Additional Capabilities & Smaller modelsThey have added new capabilities in these models, like being able to handle arbitrary resolutions, but the one I'm most excited about is the video understanding. These models can now understand up to 20 minutes of video sequences, and it's not just "split the video to 10 frames and do image caption", no, these models understand video progression and if I understand correctly how they do it, it's quite genius. They the video embed time progression into the model using a new technique called M-RoPE, which turns the time progression into rotary positional embeddings. Now, the 72B model is currently available via API, but we do get 2 new small models with Apache 2 license and they are NOT too shabby either! 7B parameters (HF) and 2B Qwen-2 VL (HF) are small enough to run completely on your machine, and the 2B parameter, scores better than GPT-4o mini on OCR-bench for example! I can't wait to finish writing and go play with these models! Big Companies & LLM APIsThe biggest news this week came from Cerebras System, a relatively unknown company, that shattered the world record for LLM inferencing out of the blue (and came on the show to talk about how they are doing it)Cerebras - fastest LLM inference on wafer scale chipsCerebras has introduced the concept of wafer scale chips to the world, which is, if you imagine a microchip, they are the size of a post stamp maybe? GPUs are bigger, well, Cerebras are making chips the sizes of an iPad (72 square inches), largest commercial chips in the world. And now, they created an inference stack on top of those chips, and showed that they have the fastest inference in the world, how fast? Well, they can server LLama 3.1 8B at a whopping 1822t/s. No really, this is INSANE speeds, as I was writing this, I copied all the words I had so far, went to inference.cerebras.ai , asked to summarize, pasted and hit send, and I immediately got a summary! "The really simple explanation is we basically store the entire model, whether it's 8B or 70B or 405B, entirely on the chip. There's no external memory, no HBM. We have 44 gigabytes of memory on chip."James WangThey not only store the whole model (405B coming soon), but they store it in full fp16 precision as well, so they don't quantize the models. Right now, they are serving it with 8K tokens in context window, and we had a conversation about their next steps being giving more context to developers. The whole conversation is well worth listening to, James and Ian were awesome to chat with, and while they do have a waitlist, as they gradually roll out their release, James said to DM him on X and mention ThursdAI, and he'll put you through, so you'll be able to get an OpenAI compatible API key and be able to test this insane speed. P.S - we also did an independent verification of these speeds, using Weave, and found Cerebras to be quite incredible for agentic purposes, you can read our report here and the weave dashboard hereAnthropic - unlocking just-in-time applications with artifacts for allWell, if you aren't paying claude, maybe this will convince you. This week, anthropic announced that artifacts are available to all users, not only their paid customers. Artifacts are a feature in Claude that is basically a side pane (and from this week, a drawer in their mobile apps) that allows you to see what Claude is building, by rendering the web application almost on the fly. They have also trained Claude in working with that interface, so it knows about the different files etcEffectively, this turns Claude into a web developer that will build mini web applications (without backend) for you, on the fly, for any task you can think of. Drop a design, and it'll build a mock of it, drop some data in a CSV and it'll build an interactive onetime dashboard visualizing that data, or just ask it to build an app helping you split the bill between friends by uploading a picture of a bill. Artifacts are share-able and remixable, so you can build something and share with friends, so here you go, an artifact I made, by dropping my notes into claude, and asking for a magic 8 Ball, that will spit out a random fact from today's editing of ThursdAI. I also provided Claude with an 8Ball image, but it didn't work due to restrictions, so instead I just uploaded that image to claude and asked it to recreate it with SVG! And viola, a completely un-nessesary app that works! Google's Gemini Keeps Climbing the Charts (But Will It Be Enough?)Sensing a disturbance in the AI force (probably from that Cerebras bombshell), Google rolled out a series of Gemini updates, including a new experimental Gemini 1.5 Pro (0827) with sharper coding skills and logical reasoning. According to LMSys, it's already nipping at the heels of ChatGPT 4o and is number 2!Their Gemini 1.5 Flash model got a serious upgrade, vaulting to the #6 position on the arena. And to add to the model madness, they even released an Gemini Flash 8B parameter version for folks who need that sweet spot between speed and size.Oh, and those long-awaited Gems are finally starting to roll out. But get ready to open your wallet – this feature (preloading Gemini with custom context and capabilities) is a paid-tier exclusive. But hey, at least Imagen-3 is cautiously returning to the image generation game! AI Art & DiffusionDoom Meets Stable Diffusion: AI Dreams in 20FPS Glory (GameNGen)The future of video games is, uh, definitely going to be interesting. Just as everyone thought AI would be conquering Go or Chess, it seems we've stumbled into a different battlefield: first-person shooters.

The top AI news from the past week, every ThursdAI

Hey there, Alex here with an end of summer edition of our show, which did not disappoint. Today is the official anniversary of stable diffusion 1.4 can you believe it? It's the second week in the row that we have an exclusive LLM launch on the show (after Emozilla announced Hermes 3 on last week's show), and spoiler alert, we may have something cooking for next week as well!This edition of ThursdAI is brought to you by W&B Weave, our LLM observability toolkit, letting you evaluate LLMs for your own use-case easilyAlso this week, we've covered both ends of AI progress, doomerist CEO saying "Fck Gen AI" vs an 8yo coder and I continued to geek out on putting myself into memes (I promised I'll stop... at some point) so buckle up, let's take a look at another crazy week: TL;DR* Open Source LLMs * AI21 releases Jamba1.5 Large / Mini hybrid Mamba MoE (X, Blog, HF)* Microsoft Phi 3.5 - 3 new models including MoE (X, HF)* BFCL 2 - Berkley Function Calling Leaderboard V2 (X, Blog, Leaderboard)* NVIDIA - Mistral Nemo Minitron 8B - Distilled / Pruned from 12B (HF)* Cohere paper proves - code improves intelligence (X, Paper)* MOHAWK - transformer → Mamba distillation method (X, Paper, Blog)* AI Art & Diffusion & 3D* Ideogram launches v2 - new img diffusion king

Dubuque Area Baseball Podcast
Weighin in with Worm: S2 E2 Shane Liegel

Dubuque Area Baseball Podcast

Play Episode Listen Later Aug 21, 2024 53:48


Shane Liegel was a 3x state champion out of River Valley Wisconsin, he made his way to Loras where he was a 4x all American, an NWCA national champion, and a NCAA runner up. With his last year of eligibility Shane took his talent to Madison Wisconsin where he competed for the Badgers!

The top AI news from the past week, every ThursdAI

Look these crazy weeks don't seem to stop, and though this week started out a bit slower (while folks were waiting to see how the speculation about certain red berry flavored conspiracies are shaking out) the big labs are shipping! We've got space uncle Elon dropping an "almost-gpt4" level Grok-2, that's uncensored, has access to real time data on X and can draw all kinds of images with Flux, OpenAI announced a new ChatGPT 4o version (not the one from last week that supported structured outputs, a different one!) and Anthropic dropping something that makes AI Engineers salivate! Oh, and for the second week in a row, ThursdAI live spaces were listened to by over 4K people, which is very humbling, and awesome because for example today, Nous Research announced Hermes 3 live on ThursdAI before the public heard about it (and I had a long chat w/ Emozilla about it, very well worth listening to)TL;DR of all topics covered: * Big CO LLMs + APIs* Xai releases GROK-2 - frontier level Grok, uncensored + image gen with Flux (

Dubuque Area Baseball Podcast
Weighin in with Worm S2E1: Guy Patron Jr.

Dubuque Area Baseball Podcast

Play Episode Listen Later Aug 14, 2024 42:36


Guy Patrone was the first 4x All American in Loras history; he also is the all time win leader for the Duhawks. We sit down with this New Orleans native to discuss his impact on Loras Wrestling and what his future holds

Upon Further Review
STEVE HELMINIAK LORAS PREVIEW (UFR).mp3

Upon Further Review

Play Episode Listen Later Jun 24, 2024 8:57


Let's Talk Wrestling
Talking Wrestling with Head Coach Brett Wheelan

Let's Talk Wrestling

Play Episode Listen Later Apr 17, 2024 93:37


Brett is from Manchester, Iowa and wrestled for West Delaware. Unfortunately, injuries cut his career short. But Brett just couldn't get away from wrestling. He's been involved at the coaching level, holding roles at Waterloo West, Waverly Shell-Rock and Loras, to name a few. Currently, Brett is the head coach at West Des Moines Valley and guided them to a 15th place finish at state this year, which included a state finalist. In the show notes you'll see a link to Pay It FOREward Connolly, a non-profit organization that serves to financially aid persons with disabilities and their families. The organization was founded in memory of Andrew Connolly, a good friend of Brett Wheelan. Again, the link to the organization can be found in the show notes. If you know someone who could benefit from this amazing organization or if you simply want to know how you can help, all the information is found through that link. So please, sit back, relax and enjoy, Coach Brett Wheelan! Pay It FOREward Connolly: https://www.facebook.com/profile.php?id=100064756686289 Euphoria Coffee website: https://www.drinkeuphoriacoffee2go.com/ Let's Talk Wrestling website: https://letstalkwrestlingpodcast.my.canva.site/ --- Support this podcast: https://podcasters.spotify.com/pod/show/lets-talk-wrestling/support

Hawkeye Report Podcast
Hawkeye Report Podcast 583

Hawkeye Report Podcast

Play Episode Listen Later Feb 22, 2024 53:23


We have a jam packed podcast this week. Tom Kakert and Kyle Huesmann open the show discussing the Iowa Basketball team suddenly picking up their two biggest wins of the season. That includes a win at Michigan State. We look back at the victory over the Spartans on Tuesday night and how Iowa was able to pull the upset.  Iowa is now potentially seeing a shot at the NCAA Tournament in the distance. We dive into the Hawkeyes resume after the wins over Wisconsin and Michigan State. How does it stack up when compared to other potential programs being mentioned as being on the bubble?  Then we discuss the Iowa women's basketball team hitting the road for a huge game on Thursday night at Indiana. Kyle previews this massive game that Iowa will likely need to keep their hopes of a Big Ten title alive. How does Iowa match up with the Hoosiers? We also look at the remaining games for Iowa and their competition for the title, Ohio State.  Finally we wrap up discussing Iowa baseball. We look back at the three game series and the win on Tuesday over Loras. What have we learned so far about the Hawkeye baseball team? We also dive into a massive series in Jacksonville this weekend where Iowa will face Auburn, nationally ranked Virginia, and Wichita State. 

Dubuque Area Baseball Podcast
2024 Loras College Baseball Season Preview

Dubuque Area Baseball Podcast

Play Episode Listen Later Feb 19, 2024 21:02


Head Coach Carl Tebon stops by the podcast and reflects over the past seasons where Loras beat Iowa, won a conference championship, and made a regional for the first time under coach Tebon. Of course, we talk about the upcoming 2024 Loras DuHawks. Thanks to our great sponsors: Xtreme Strength and Sports Performance Nolan Weber Agency EXIT Realty with Jerry Elbert Tricor Insurance Company Kramer Pest Control Bar Raising Fitness Foley Distributing Advanced Wellness Center

Close the Door: Game of Thrones, A Song of Ice and Fire Podcast

Spoilers, profanity, Jaime x Brienne. It's time for King's Landing Cribs Edition, where the magic doesn't happen. Are you a gasoline kinda gal or do you prefer manure? All we know is we're not into the smell of rotting flesh. Jaime trusts Loras will fall in love with Brienne if he just spends a few minutes alone with her. Game of Thrones. A Song of Ice and Fire. A Storm of Swords - Jaime VIII - originally recorded in 2019.   Close The Door And Come Here - Episode 517

Hoopsville
21.17a: 10th Annual Marathon (Part 1)

Hoopsville

Play Episode Listen Later Feb 2, 2024 127:03


Friday is the 10th Annual Hoopsville Marathon and is a bit of a "half" marathon. The "big show" is scheduled to be on air for at least six hours (it ends up going seven-and-a-half hours). This year's show features coaches, administrators, and others around Division III who give us a sense of the season to date, what's to come, and what else is going on in the Division and sport. We will talk about the ongoing NABC efforts with Coaches vs Cancer and Suits and Sneakers events. We hear from some of our pundits who tackle ever more difficult Top 25 polls. And we have our annual conversation with both national committee chairs. The marathon is also a chance to celebrate the final month of the Division III basketball regular season. And we restart the shows fundraising efforts to help continue our efforts into the future. Guests included (in Part 1): - Chris Martin, No. 20 Loras men's coach - Lucia Robinson-Gibbs, MIT women's coach - Justin Olufson, Webster women's coach - Gary Stewart, Stevenson men's coach, NABC Coaches v Cancer advocate - Bill Broderick, No. 4 Christopher Newport women's coach Hoopsville is presented by D3hoops.com.

WATCH DEM THRONES by Black With No Chaser
"BOOK OF THE STRANGER" Game of Thrones Season 6 EP4 Recap

WATCH DEM THRONES by Black With No Chaser

Play Episode Listen Later Jan 29, 2024 56:01


Sansa Stark arrives at the Wall and reunites with Jon Snow, and later receives a message from Ramsay Bolton challenging Jon to come take back Winterfell and rescue Rickon Stark; Margaery Tyrell is reunited with her brother, Loras; Cersei and Jaime Lannister plot with their uncle Kevan and Olenna Tyrell to have them released; and Daenerys Targaryen faces the khals.If you want to keep the fun going with us throughout the week, come join our Facebook group. THE WATCH DEM THRONES FACEBOOK GROUPhttps://www.facebook.com/groups/126567443834910/?ref=share&mibextid=NSMWBTTO WATCH AND SUBSCRIBE:Facebook:https://www.facebook.com/blackwithnochaser/Twitter:https://twitter.com/BeBlackNoChaser?t=pVFV06lBFdZRu72ot4uCjA&s=09Twitter:https://twitter.com/WatchDemThrones?t=q0ngrYPlugf0ttzM2jo39A&s=09Apple Music: https://podcasts.apple.com/us/podcast/watch-dem-thrones-by-black-with-no-chaser/id1641754247Spotify:https://open.spotify.com/show/1qI1bJ1vIlobu502w6zrtN?si=mtsa3gZYRZW_3FmlCrv7UgBWNC RADIO: https://apps.apple.com/us/app/bwnc-radio/id6443800363Amazon Musichttps://music.amazon.com/podcasts/45279c3a-c09f-47d1-a3a3-88e6e2507230/watch-dem-thrones-by-black-with-no-chaserIHeartRadiohttps://www.iheart.com/podcast/269-watch-dem-thrones-by-black-101286659/?cmp=android_share&sc=android_social_share&pr=false#gameofthrones #demdragons #blackwithnochaser #houseofthedragonhbo #dragonseeds #theblacks #thegreens #houseofthedragon #youtube #targaryens #podcast #podsincolor #applemusic #spotifymusic #podsincolor #starks #lannisters #Velaryon

The Nonlinear Library
EA - AI doing philosophy = AI generating hands? by Wei Dai

The Nonlinear Library

Play Episode Listen Later Jan 15, 2024 4:48


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI doing philosophy = AI generating hands?, published by Wei Dai on January 15, 2024 on The Effective Altruism Forum. I've been playing around with Stable Diffusion recently, and an analogy occurred to me between today's AI's notoriously bad generation of hands and future AI's potentially bad reasoning about philosophy. In case you aren't already familiar, currently available image generation AIs are very prone to outputting bad hands, e.g., ones with four or six fingers, or two thumbs, or unnatural poses, or interacting with other objects in very strange ways. Perhaps what's especially striking is how bad AIs are at hands relative to other image generation capabilities, thus serving as a cautionary tale about differentially decelerating philosophy relative to other forms of intellectual progress, e.g., scientific and technological progress. Is anyone looking into differential artistic progress as a possible x-risk? /jk Some explanations I've seen for why AI is bad at hands: it's hard for AIs to learn hand generation because of how many poses a hand can make, how many different ways it can interact with other objects, and how many different viewing angles AIs need to learn to reproduce each 2D image provides only partial information about a hand (much of it is often obscured behind other objects or parts of itself) most hands in the training data are very low resolution (a tiny part of the overall image) and thus not helpful for training AI the proportion of hands in the training set is too low for the AI to devote much model capacity to hand generation ("misalignment" between the loss function and what humans care about probably also contributes to this) AI developers just haven't collected and trained AI on enough high quality hand images yet There are news articles about this problem going back to at least 2022, and I can see a lot of people trying to solve it (on Reddit, GitHub, arXiv) but progress has been limited. Straightforward techniques like prompt engineering and finetuning do not seem to help much. Here are 2 SOTA techniques, to give you a glimpse of what the technological frontier currently looks like (at least in open source): Post-process images with a separate ML-based pipeline to fix hands after initial generation. This creates well-formed hands but doesn't seem to take interactions with other objects into (sufficient or any) consideration. If you're not trying to specifically generate hands, but just don't want to see incidentally bad hands in images with humans in them, get rid of all hand-related prompts, LoRAs, textual inversions, etc., and just putting "hands" in the negative prompt. This doesn't eliminate all hands but reduces the number/likelihood of hands in the picture and also makes the remaining ones look better. (The idea behind this is that it makes the AI "try less hard" to generate hands, and perhaps focus more on central examples that it has more training on. Of course generating hands is ultimately not a very hard problem. Hand anatomy and its interactions with other objects pose no fundamental mysteries. Bad hands are easy for humans to recognize and therefore we have quick and easy feedback for how well we're solving the problem. We can use our explicit understanding of hands to directly help solve the problem (solution 1 above used at least the fact that hands are compact 3D objects), or just provide the AI with more high quality training data (physically taking more photos of hands if needed) until it recognizably fixed itself. What about philosophy? Well, scarcity of existing high quality training data, check. Lots of unhelpful data labeled "philosophy", check. Low proportion of philosophy in the training data, check. Quick and easy to generate more high quality data, no. Good explicit understanding of the principles involved, ...

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

We are running an end of year survey for our listeners! Please let us know any feedback you have, what episodes resonated with you, and guest requests for 2024! Survey link here!Before language models became all the rage in November 2022, image generation was the hottest space in AI (it was the subject of our first piece on Latent Space!) In our interview with Sharif Shameem from Lexica we talked through the launch of StableDiffusion and the early days of that space. At the time, the toolkit was still pretty rudimentary: Lexica made it easy to search images, you had the AUTOMATIC1111 Web UI to generate locally, some HuggingFace spaces that offered inference, and eventually DALL-E 2 through OpenAI's platform, but not much beyond basic text-to-image workflows.Today's guest, Suhail Doshi, is trying to solve this with Playground AI, an image editor reimagined with AI in mind. Some of the differences compared to traditional text-to-image workflows:* Real-time preview rendering using consistency: as you change your prompt, you can see changes in real-time before doing a final rendering of it.* Style filtering: rather than having to prompt exactly how you'd like an image to look, you can pick from a whole range of filters both from Playground's model as well as Stable Diffusion (like RealVis, Starlight XL, etc). We talk about this at 25:46 in the podcast.* Expand prompt: similar to DALL-E3, Playground will do some prompt tuning for you to get better results in generation. Unlike DALL-E3, you can turn this off at any time if you are a prompting wizard* Image editing: after generation, you have tools like a magic eraser, inpainting pencil, etc. This makes it easier to do a full workflow in Playground rather than switching to another tool like Photoshop.Outside of the product, they have also trained a new model from scratch, Playground v2, which is fully open source and open weights and allows for commercial usage. They benchmarked the model against SDXL across 1,000 prompts and found that humans preferred the Playground generation 70% of the time. They had similar results on PartiPrompts:They also created a new benchmark, MJHQ-30K, for “aesthetic quality”:We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model's aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.We curate the high-quality dataset from Midjourney with 10 common categories, each category with 3K samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.Suhail was pretty open with saying that Midjourney is currently the best product for imagine generation out there, and that's why they used it as the base for this benchmark. I think it's worth comparing yourself to maybe the best thing and try to find like a really fair way of doing that. So I think more people should try to do that. I definitely don't think you should be kind of comparing yourself on like some Google model or some old SD, Stable Diffusion model and be like, look, we beat Stable Diffusion 1.5. I think users ultimately want care, how close are you getting to the thing that people mostly agree with? [00:23:47]We also talked a lot about Suhail's founder journey from starting Mixpanel in 2009, then going through YC again with Mighty, and eventually sunsetting that to pivot into Playground. Enjoy!Show Notes* Suhail's Twitter* “Starting my road to learn AI”* Bill Gates book trip* Playground* Playground v2 Announcement* $40M raise announcement* “Running infra dev ops for 24 A100s”* Mixpanel* Mighty* “I decided to stop working on Mighty”* Fast.ai* CivitTimestamps* [00:00:00] Intros* [00:02:59] Being early in ML at Mixpanel* [00:04:16] Pivoting from Mighty to Playground and focusing on generative AI* [00:07:54] How DALL-E 2 inspired Mighty* [00:09:19] Reimagining the graphics editor with AI* [00:17:34] Training the Playground V2 model from scratch to advance generative graphics* [00:21:11] Techniques used to improve Playground V2 like data filtering and model tuning* [00:25:21] Releasing the MJHQ30K benchmark to evaluate generative models* [00:30:35] The limitations of current models for detailed image editing tasks* [00:34:06] Using post-generation user feedback to create better benchmarks* [00:38:28] Concerns over potential misuse of powerful generative models* [00:41:54] Rethinking the graphics editor user experience in the AI era* [00:45:44] Integrating consistency models into Playground using preview rendering* [00:47:23] Interacting with the Stable Diffusion LoRAs community* [00:51:35] Running DevOps on A100s* [00:53:12] Startup ideas?TranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:15]Swyx: Hey, and today in the studio we have Suhail Doshi, welcome. [00:00:18]Suhail: Yeah, thanks. Thanks for having me. [00:00:20]Swyx: So among many things, you're a CEO and co-founder of Mixpanel, and I think about three years ago you left to start Mighty, and more recently, I think about a year ago, transitioned into Playground, and you've just announced your new round. How do you like to be introduced beyond that? [00:00:34]Suhail: Just founder of Playground is fine, yeah, prior co-founder and CEO of Mixpanel. [00:00:40]Swyx: Yeah, awesome. I'd just like to touch on Mixpanel a little bit, because it's obviously one of the more successful analytics companies we previously had amplitude on, and I'm curious if you had any reflections on the interaction of that amount of data that people would want to use for AI. I don't know if there's still a part of you that stays in touch with that world. [00:00:59]Suhail: Yeah, I mean, the short version is that maybe back in like 2015 or 2016, I don't really remember exactly, because it was a while ago, we had an ML team at Mixpanel, and I think this is when maybe deep learning or something really just started getting kind of exciting, and we were thinking that maybe given that we had such vast amounts of data, perhaps we could predict things. So we built two or three different features, I think we built a feature where we could predict whether users would churn from your product. We made a feature that could predict whether users would convert, we built a feature that could do anomaly detection, like if something occurred in your product, that was just very surprising, maybe a spike in traffic in a particular region, can we tell you that that happened? Because it's really hard to like know everything that's going on with your data, can we tell you something surprising about your data? And we tried all of these various features, most of it boiled down to just like, you know, using logistic regression, and it never quite seemed very groundbreaking in the end. And so I think, you know, we had a four or five person ML team, and I think we never expanded it from there. And I did all these Fast AI courses trying to learn about ML. And that was the- That's the first time you did fast AI. Yeah, that was the first time I did fast AI. Yeah, I think I've done it now three times, maybe. [00:02:12]Swyx: Oh, okay. [00:02:13]Suhail: I didn't know it was the third. No, no, just me reviewing it, it's maybe three times, but yeah. [00:02:16]Swyx: You mentioned prediction, but honestly, like it's also just about the feedback, right? The quality of feedback from users, I think it's useful for anyone building AI applications. [00:02:25]Suhail: Yeah. Yeah, I think I haven't spent a lot of time thinking about Mixpanel because it's been a long time, but sometimes I'm like, oh, I wonder what we could do now. And then I kind of like move on to whatever I'm working on, but things have changed significantly since. [00:02:39]Swyx: And then maybe we'll touch on Mighty a little bit. Mighty was very, very bold. My framing of it was, you will run our browsers for us because everyone has too many tabs open. I have too many tabs open and slowing down your machines that you can do it better for us in a centralized data center. [00:02:51]Suhail: Yeah, we were first trying to make a browser that we would stream from a data center to your computer at extremely low latency, but the real objective wasn't trying to make a browser or anything like that. The real objective was to try to make a new kind of computer. And the thought was just that like, you know, we have these computers in front of us today and we upgrade them or they run out of RAM or they don't have enough RAM or not enough disk or, you know, there's some limitation with our computers, perhaps like data locality is a problem. Why do I need to think about upgrading my computer ever? And so, you know, we just had to kind of observe that like, well, actually it seems like a lot of applications are just now in the browser, you know, it's like how many real desktop applications do we use relative to the number of applications we use in the browser? So it's just this realization that actually like, you know, the browser was effectively becoming more or less our operating system over time. And so then that's why we kind of decided to go, hmm, maybe we can stream the browser. Fortunately, the idea did not work for a couple of different reasons, but the objective is try to make sure new computer. [00:03:50]Swyx: Yeah, very, very bold. [00:03:51]Alessio: Yeah, and I was there at YC Demo Day when you first announced it. It was, I think, the last or one of the last in-person ones, at Pier34 in Mission Bay. How do you think about that now when everybody wants to put some of these models in people's machines and some of them want to stream them in, do you think there's maybe another wave of the same problem before it was like browser apps too slow, now it's like models too slow to run on device? [00:04:16]Suhail: Yeah. I mean, I've obviously pivoted away from Mighty, but a lot of what I somewhat believed at Mighty, maybe why I'm so excited about AI and what's happening, a lot of what Mighty was about was like moving compute somewhere else, right? Right now, applications, they get limited quantities of memory, disk, networking, whatever your home network has, et cetera. You know, what if these applications could somehow, if we could shift compute, and then these applications have vastly more compute than they do today. Right now it's just like client backend services, but you know, what if we could change the shape of how applications could interact with things? And it's changed my thinking. In some ways, AI has like a bit of a continuation of my belief that like perhaps we can really shift compute somewhere else. One of the problems with Mighty was that JavaScript is single-threaded in the browser. And what we learned, you know, the reason why we kind of abandoned Mighty was because I didn't believe we could make a new kind of computer. We could have made some kind of enterprise business, probably it could have made maybe a lot of money, but it wasn't going to be what I hoped it was going to be. And so once I realized that most of a web app is just going to be single-threaded JavaScript, then the only thing you could do largely withstanding changing JavaScript, which is a fool's errand most likely, make a better CPU, right? And there's like three CPU manufacturers, two of which sell, you know, big ones, you know, AMD, Intel, and then of course like Apple made the M1. And it's not like single-threaded CPU core performance, single-core performance was increasing very fast, it's plateauing rapidly. And even these different companies were not doing as good of a job, you know, sort of with the continuation of Moore's law. But what happened in AI was that you got like, if you think of the AI model as like a computer program, like just like a compiled computer program, it is literally built and designed to do massive parallel computations. And so if you could take like the universal approximation theorem to its like kind of logical complete point, you know, you're like, wow, I can get, make computation happen really rapidly and parallel somewhere else, you know, so you end up with these like really amazing models that can like do anything. It just turned out like perhaps the new kind of computer would just simply be shifted, you know, into these like really amazing AI models in reality. Yeah. [00:06:30]Swyx: Like I think Andrej Karpathy has always been, has been making a lot of analogies with the LLMOS. [00:06:34]Suhail: I saw his video and I watched that, you know, maybe two weeks ago or something like that. I was like, oh man, this, I very much resonate with this like idea. [00:06:41]Swyx: Why didn't I see this three years ago? [00:06:43]Suhail: Yeah. I think, I think there still will be, you know, local models and then there'll be these very large models that have to be run in data centers. I think it just depends on kind of like the right tool for the job, like any engineer would probably care about. But I think that, you know, by and large, like if the models continue to kind of keep getting bigger, you're always going to be wondering whether you should use the big thing or the small, you know, the tiny little model. And it might just depend on like, you know, do you need 30 FPS or 60 FPS? Maybe that would be hard to do, you know, over a network. [00:07:13]Swyx: You tackled a much harder problem latency wise than the AI models actually require. Yeah. [00:07:18]Suhail: Yeah. You can do quite well. You can do quite well. You definitely did 30 FPS video streaming, did very crazy things to make that work. So I'm actually quite bullish on the kinds of things you can do with networking. [00:07:30]Swyx: Maybe someday you'll come back to that at some point. But so for those that don't know, you're very transparent on Twitter. Very good to follow you just to learn your insights. And you actually published a postmortem on Mighty that people can read up on and willing to. So there was a bit of an overlap. You started exploring the AI stuff in June 2022, which is when you started saying like, I'm taking fast AI again. Maybe, was there more context around that? [00:07:54]Suhail: Yeah. I think I was kind of like waiting for the team at Mighty to finish up, you know, something. And I was like, okay, well, what can I do? I guess I will make some kind of like address bar predictor in the browser. So we had, you know, we had forked Chrome and Chromium. And I was like, you know, one thing that's kind of lame is that like this browser should be like a lot better at predicting what I might do, where I might want to go. It struck me as really odd that, you know, Chrome had very little AI actually or ML inside this browser. For a company like Google, you'd think there's a lot. Code is actually just very, you know, it's just a bunch of if then statements is more or less the address bar. So it seemed like a pretty big opportunity. And that's also where a lot of people interact with the browser. So, you know, long story short, I was like, hmm, I wonder what I could build here. So I started to take some AI courses and review the material again and get back to figuring it out. But I think that was somewhat serendipitous because right around April was, I think, a very big watershed moment in AI because that's when Dolly 2 came out. And I think that was the first truly big viral moment for generative AI. [00:08:59]Swyx: Because of the avocado chair. [00:09:01]Suhail: Yeah, exactly. [00:09:02]Swyx: It wasn't as big for me as Stable Diffusion. [00:09:04]Suhail: Really? [00:09:05]Swyx: Yeah, I don't know. Dolly was like, all right, that's cool. [00:09:07]Suhail: I don't know. Yeah. [00:09:09]Swyx: I mean, they had some flashy videos, but it didn't really register. [00:09:13]Suhail: That moment of images was just such a viral novel moment. I think it just blew people's mind. Yeah. [00:09:19]Swyx: I mean, it's the first time I encountered Sam Altman because they had this Dolly 2 hackathon and they opened up the OpenAI office for developers to walk in back when it wasn't as much of a security issue as it is today. I see. Maybe take us through the journey to decide to pivot into this and also choosing images. Obviously, you were inspired by Dolly, but there could be any number of AI companies and businesses that you could start and why this one, right? [00:09:45]Suhail: Yeah. So I think at that time, Mighty and OpenAI was not quite as popular as it is all of a sudden now these days, but back then they had a lot more bandwidth to kind of help anybody. And so we had been talking with the team there around trying to see if we could do really fast low latency address bar prediction with GPT-3 and 3.5 and that kind of thing. And so we were sort of figuring out how could we make that low latency. I think that just being able to talk to them and kind of being involved gave me a bird's eye view into a bunch of things that started to happen. Latency first was the Dolly 2 moment, but then stable diffusion came out and that was a big moment for me as well. And I remember just kind of like sitting up one night thinking, I was like, you know, what are the kinds of companies one could build? Like what matters right now? One thing that I observed is that I find a lot of inspiration when I'm working in a field in something and then I can identify a bunch of problems. Like for Mixpanel, I was an intern at a company and I just noticed that they were doing all this data analysis. And so I thought, hmm, I wonder if I could make a product and then maybe they would use it. And in this case, you know, the same thing kind of occurred. It was like, okay, there are a bunch of like infrastructure companies that put a model up and then you can use their API, like Replicate is a really good example of that. There are a bunch of companies that are like helping you with training, model optimization, Mosaic at the time, and probably still, you know, was doing stuff like that. So I just started listing out like every category of everything, of every company that was doing something interesting. I started listing out like weights and biases. I was like, oh man, weights and biases is like this great company. Do I want to compete with that company? I might be really good at competing with that company because of Mixpanel because it's so much of like analysis. But I was like, no, I don't want to do anything related to that. That would, I think that would be too boring now at this point. So I started to list out all these ideas and one thing I observed was that at OpenAI, they had like a playground for GPT-3, right? All it was is just like a text box more or less. And then there were some settings on the right, like temperature and whatever. [00:11:41]Swyx: Top K. [00:11:42]Suhail: Yeah, top K. You know, what's your end stop sequence? I mean, that was like their product before GPT, you know, really difficult to use, but fun if you're like an engineer. And I just noticed that their product kind of was evolving a little bit where the interface kind of was getting a little bit more complex. They had like a way where you could like generate something in the middle of a sentence and all those kinds of things. And I just thought to myself, I was like, everything is just like this text box and you generate something and that's about it. And stable diffusion had kind of come out and it was all like hugging face and code. Nobody was really building any UI. And so I had this kind of thing where I wrote prompt dash like question mark in my notes and I didn't know what was like the product for that at the time. I mean, it seems kind of trite now, but I just like wrote prompt. What's the thing for that? Manager. Prompt manager. Do you organize them? Like, do you like have a UI that can play with them? Yeah. Like a library. What would you make? And so then, of course, then you thought about what would the modalities be given that? How would you build a UI for each kind of modality? And so there are a couple of people working on some pretty cool things. And I basically chose graphics because it seemed like the most obvious place where you could build a really powerful, complex UI. That's not just only typing a box. It would very much evolve beyond that. Like what would be the best thing for something that's visual? Probably something visual. Yeah. I think that just that progression kind of happened and it just seemed like there was a lot of effort going into language, but not a lot of effort going into graphics. And then maybe the very last thing was, I think I was talking to Aditya Ramesh, who was the co-creator of DALL-E 2 and Sam. And I just kind of went to these guys and I was just like, hey, are you going to make like a UI for this thing? Like a true UI? Are you going to go for this? Are you going to make a product? For DALL-E. Yeah. For DALL-E. Yeah. Are you going to do anything here? Because if you are going to do it, just let me know and I will stop and I'll go do something else. But if you're not going to do anything, I'll just do it. And so we had a couple of conversations around what that would look like. And then I think ultimately they decided that they were going to focus on language primarily. And I just felt like it was going to be very underinvested in. Yes. [00:13:46]Swyx: There's that sort of underinvestment from OpenAI, but also it's a different type of customer than you're used to, presumably, you know, and Mixpanel is very good at selling to B2B and developers will figure on you or not. Yeah. Was that not a concern? [00:14:00]Suhail: Well, not so much because I think that, you know, right now I would say graphics is in this very nascent phase. Like most of the customers are just like hobbyists, right? Yeah. Like it's a little bit of like a novel toy as opposed to being this like very high utility thing. But I think ultimately, if you believe that you could make it very high utility, the probably the next customers will end up being B2B. It'll probably not be like a consumer. There will certainly be a variation of this idea that's in consumer. But if your quest is to kind of make like something that surpasses human ability for graphics, like ultimately it will end up being used for business. So I think it's maybe more of a progression. In fact, for me, it's maybe more like Mixpanel started out as SMB and then very much like ended up starting to grow up towards enterprise. So for me, I think it will be a very similar progression. But yeah, I mean, the reason why I was excited about it is because it was a creative tool. I make music and it's AI. It's like something that I know I could stay up till three o'clock in the morning doing. Those are kind of like very simple bars for me. [00:14:56]Alessio: So you mentioned Dolly, Stable Diffusion. You just had Playground V2 come out two days ago. Yeah, two days ago. [00:15:02]Suhail: Two days ago. [00:15:03]Alessio: This is a model you train completely from scratch. So it's not a cheap fine tune on something. You open source everything, including the weights. Why did you decide to do it? I know you supported Stable Diffusion XL in Playground before, right? Yep. What made you want to come up with V2 and maybe some of the interesting, you know, technical research work you've done? [00:15:24]Suhail: Yeah. So I think that we continue to feel like graphics and these foundation models for anything really related to pixels, but also definitely images continues to be very underinvested. It feels a little like graphics is in like this GPT-2 moment, right? Like even GPT-3, even when GPT-3 came out, it was exciting, but it was like, what are you going to use this for? Yeah, we'll do some text classification and some semantic analysis and maybe it'll sometimes like make a summary of something and it'll hallucinate. But no one really had like a very significant like business application for GPT-3. And in images, we're kind of stuck in the same place. We're kind of like, okay, I write this thing in a box and I get some cool piece of artwork and the hands are kind of messed up and sometimes the eyes are a little weird. Maybe I'll use it for a blog post, you know, that kind of thing. The utility feels so limited. And so, you know, and then we, you sort of look at Stable Diffusion and we definitely use that model in our product and our users like it and use it and love it and enjoy it, but it hasn't gone nearly far enough. So we were kind of faced with the choice of, you know, do we wait for progress to occur or do we make that progress happen? So yeah, we kind of embarked on a plan to just decide to go train these things from scratch. And I think the community has given us so much. The community for Stable Diffusion I think is one of the most vibrant communities on the internet. It's like amazing. It feels like, I hope this is what like Homebrew Club felt like when computers like showed up because it's like amazing what that community will do and it moves so fast. I've never seen anything in my life and heard other people's stories around this where an academic research paper comes out and then like two days later, someone has sample code for it. And then two days later, there's a model. And then two days later, it's like in nine products, you know, they're all competing with each other. It's incredible to see like math symbols on an academic paper go to well-designed features in a product. So I think the community has done so much. So I think we wanted to give back to the community kind of on our way. Certainly we would train a better model than what we gave out on Tuesday, but we definitely felt like there needs to be some kind of progress in these open source models. The last kind of milestone was in July when Stable Diffusion Excel came out, but there hasn't been anything really since. Right. [00:17:34]Swyx: And there's Excel Turbo now. [00:17:35]Suhail: Well, Excel Turbo is like this distilled model, right? So it's like lower quality, but fast. You have to decide, you know, what your trade off is there. [00:17:42]Swyx: It's also a consistency model. [00:17:43]Suhail: I don't think it's a consistency model. It's like it's they did like a different thing. Yeah. I think it's like, I don't want to get quoted for this, but it's like something called ad like adversarial or something. [00:17:52]Swyx: That's exactly right. [00:17:53]Suhail: I've read something about that. Maybe it's like closer to GANs or something, but I didn't really read the full paper. But yeah, there hasn't been quite enough progress in terms of, you know, there's no multitask image model. You know, the closest thing would be something called like EmuEdit, but there's no model for that. It's just a paper that's within meta. So we did that and we also gave out pre-trained weights, which is very rare. Usually you just get the aligned model and then you have to like see if you can do anything with it. So we actually gave out, there's like a 256 pixel pre-trained stage and a 512. And we did that for academic research because we come across people all the time in academia, they have access to like one A100 or eight at best. And so if we can give them kind of like a 512 pre-trained model, our hope is that there'll be interesting novel research that occurs from that. [00:18:38]Swyx: What research do you want to happen? [00:18:39]Suhail: I would love to see more research around things that users care about tend to be things like character consistency. [00:18:45]Swyx: Between frames? [00:18:46]Suhail: More like if you have like a face. Yeah, yeah. Basically between frames, but more just like, you know, you have your face and it's in one image and then you want it to be like in another. And users are very particular and sensitive to faces changing because we know we're trained on faces as humans. Not seeing a lot of innovation, enough innovation around multitask editing. You know, there are two things like instruct pics to pics and then the EmuEdit paper that are maybe very interesting, but we certainly are not pushing the fold on that in that regard. All kinds of things like around that rotation, you know, being able to keep coherence across images, style transfer is still very limited. Just even reasoning around images, you know, what's going on in an image, that kind of thing. Things are still very, very underpowered, very nascent. So therefore the utility is very, very limited. [00:19:32]Alessio: On the 1K Prompt Benchmark, you are 2.5x prefer to Stable Diffusion XL. How do you get there? Is it better images in the training corpus? Can you maybe talk through the improvements in the model? [00:19:44]Suhail: I think they're still very early on in the recipe, but I think it's a lot of like little things and you know, every now and then there are some big important things like certainly your data quality is really, really important. So we spend a lot of time thinking about that. But I would say it's a lot of things that you kind of clean up along the way as you train your model. Everything from captions to the data that you align with after pre-train to how you're picking your data sets, how you filter your data sets. I feel like there's a lot of work in AI that doesn't really feel like AI. It just really feels like just data set filtering and systems engineering and just like, you know, and the recipe is all there, but it's like a lot of extra work to do that. I think we plan to do a Playground V 2.1, maybe either by the end of the year or early next year. And we're just like watching what the community does with the model. And then we're just going to take a lot of the things that they're unhappy about and just like fix them. You know, so for example, like maybe the eyes of people in an image don't feel right. They feel like they're a little misshapen or they're kind of blurry feeling. That's something that we already know we want to fix. So I think in that case, it's going to be about data quality. Or maybe you want to improve the kind of the dynamic range of color. You know, we want to make sure that that's like got a good range in any image. So what technique can we use there? There's different things like offset noise, pyramid noise, terminal zero, SNR, like there are all these various interesting things that you can do. So I think it's like a lot of just like tricks. Some are tricks, some are data, and some is just like cleaning. [00:21:11]Swyx: Specifically for faces, it's very common to use a pipeline rather than just train the base model more. Do you have a strong belief either way on like, oh, they should be separated out to different stages for like improving the eyes, improving the face or enhance or whatever? Or do you think like it can all be done in one model? [00:21:28]Suhail: I think we will make a unified model. Yeah, I think it will. I think we'll certainly in the end, ultimately make a unified model. There's not enough research about this. Maybe there is something out there that we haven't read. There are some bottlenecks, like for example, in the VAE, like the VAEs are ultimately like compressing these things. And so you don't know. And then you might have like a big informational information bottleneck. So maybe you would use a pixel based model, perhaps. I think we've talked to people, everyone from like Rombach to various people, Rombach trained stable diffusion. I think there's like a big question around the architecture of these things. It's still kind of unknown, right? Like we've got transformers and we've got like a GPT architecture model, but then there's this like weird thing that's also seemingly working with diffusion. And so, you know, are we going to use vision transformers? Are we going to move to pixel based models? Is there a different kind of architecture? We don't really, I don't think there have been enough experiments. Still? Oh my God. [00:22:21]Swyx: Yeah. [00:22:22]Suhail: That's surprising. I think it's very computationally expensive to do a pipeline model where you're like fixing the eyes and you're fixing the mouth and you're fixing the hands. [00:22:29]Swyx: That's what everyone does as far as I understand. [00:22:31]Suhail: I'm not exactly sure what you mean, but if you mean like you get an image and then you will like make another model specifically to fix a face, that's fairly computationally expensive. And I think it's like not probably not the right way. Yeah. And it doesn't generalize very well. Now you have to pick all these different things. [00:22:45]Swyx: Yeah. You're just kind of glomming things on together. Yeah. Like when I look at AI artists, like that's what they do. [00:22:50]Suhail: Ah, yeah, yeah, yeah. They'll do things like, you know, I think a lot of ARs will do control net tiling to do kind of generative upscaling of all these different pieces of the image. Yeah. And I think these are all just like, they're all hacks ultimately in the end. I mean, it just to me, it's like, let's go back to where we were just three years, four years ago with where deep learning was at and where language was that, you know, it's the same thing. It's like we were like, okay, well, I'll just train these very narrow models to try to do these things and kind of ensemble them or pipeline them to try to get to a best in class result. And here we are with like where the models are gigantic and like very capable of solving huge amounts of tasks when given like lots of great data. [00:23:28]Alessio: You also released a new benchmark called MJHQ30K for automatic evaluation of a model's aesthetic quality. I have one question. The data set that you use for the benchmark is from Midjourney. Yes. You have 10 categories. How do you think about the Playground model, Midjourney, like, are you competitors? [00:23:47]Suhail: There are a lot of people, a lot of people in research, they like to compare themselves to something they know they can beat, right? Maybe this is the best reason why it can be helpful to not be a researcher also sometimes like I'm not trained as a researcher, I don't have a PhD in anything AI related, for example. But I think if you care about products and you care about your users, then the most important thing that you want to figure out is like everyone has to acknowledge that Midjourney is very good. They are the best at this thing. I'm happy to admit that. I have no problem admitting that. Just easy. It's very visual to tell. So I think it's incumbent on us to try to compare ourselves to the thing that's best, even if we lose, even if we're not the best. At some point, if we are able to surpass Midjourney, then we only have ourselves to compare ourselves to. But on First Blush, I think it's worth comparing yourself to maybe the best thing and try to find like a really fair way of doing that. So I think more people should try to do that. I definitely don't think you should be kind of comparing yourself on like some Google model or some old SD, Stable Diffusion model and be like, look, we beat Stable Diffusion 1.5. I think users ultimately want care, how close are you getting to the thing that people mostly agree with? So we put out that benchmark for no other reason to say like, this seems like a worthy thing for us to at least try, for people to try to get to. And then if we surpass it, great, we'll come up with another one. [00:25:06]Alessio: Yeah, no, that's awesome. And you killed Stable Diffusion Excel and everything. In the benchmark chart, it says Playground V2 1024 pixel dash aesthetic. Do you have kind of like, yeah, style fine tunes or like what's the dash aesthetic for? [00:25:21]Suhail: We debated this, maybe we named it wrong or something, but we were like, how do we help people realize the model that's aligned versus the models that weren't? Because we gave out pre-trained models, we didn't want people to like use those. So that's why they're called base. And then the aesthetic model, yeah, we wanted people to pick up the thing that makes things pretty. Who wouldn't want the thing that's aesthetic? But if there's a better name, we're definitely open to feedback. No, no, that's cool. [00:25:46]Alessio: I was using the product. You also have the style filter and you have all these different styles. And it seems like the styles are tied to the model. So there's some like SDXL styles, there's some Playground V2 styles. Can you maybe give listeners an overview of how that works? Because in language, there's not this idea of like style, right? Versus like in vision model, there is, and you cannot get certain styles in different [00:26:11]Suhail: models. [00:26:12]Alessio: So how do styles emerge and how do you categorize them and find them? [00:26:15]Suhail: Yeah, I mean, it's so fun having a community where people are just trying a model. Like it's only been two days for Playground V2. And we actually don't know what the model's capable of and not capable of. You know, we certainly see problems with it. But we have yet to see what emergent behavior is. I mean, we've just sort of discovered that it takes about like a week before you start to see like new things. I think like a lot of that style kind of emerges after that week, where you start to see, you know, there's some styles that are very like well known to us, like maybe like pixel art is a well known style. Photorealism is like another one that's like well known to us. But there are some styles that cannot be easily named. You know, it's not as simple as like, okay, that's an anime style. It's very visual. And in the end, you end up making up the name for what that style represents. And so the community kind of shapes itself around these different things. And so if anyone that's into stable diffusion and into building anything with graphics and stuff with these models, you know, you might have heard of like Proto Vision or Dream Shaper, some of these weird names, but they're just invented by these authors. But they have a sort of je ne sais quoi that, you know, appeals to users. [00:27:26]Swyx: Because it like roughly embeds to what you what you want. [00:27:29]Suhail: I guess so. I mean, it's like, you know, there's one of my favorite ones that's fine tuned. It's not made by us. It's called like Starlight XL. It's just this beautiful model. It's got really great color contrast and visual elements. And the users love it. I love it. And it's so hard. I think that's like a very big open question with graphics that I'm not totally sure how we'll solve. I don't know. It's, it's like an evolving situation too, because styles get boring, right? They get fatigued. Like it's like listening to the same style of pop song. I try to relate to graphics a little bit like with music, because I think it gives you a little bit of a different shape to things. Like it's not as if we just have pop music, rap music and country music, like all of these, like the EDM genre alone has like sub genres. And I think that's very true in graphics and painting and art and anything that we're doing. There's just these sub genres, even if we can't quite always name them. But I think they are emergent from the community, which is why we're so always happy to work with the community. [00:28:26]Swyx: That is a struggle. You know, coming back to this, like B2B versus B2C thing, B2C, you're going to have a huge amount of diversity and then it's going to reduce as you get towards more sort of B2B type use cases. I'm making this up here. So like you might be optimizing for a thing that you may eventually not need. [00:28:42]Suhail: Yeah, possibly. Yeah, possibly. I think like a simple thing with startups is that I worry sometimes by trying to be overly ambitious and like really scrutinizing like what something is in its most nascent phase that you miss the most ambitious thing you could have done. Like just having like very basic curiosity with something very small can like kind of lead you to something amazing. Like Einstein definitely did that. And then he like, you know, he basically won all the prizes and got everything he wanted and then basically did like kind of didn't really. He can dismiss quantum and then just kind of was still searching, you know, for the unifying theory. And he like had this quest. I think that happens a lot with like Nobel Prize people. I think there's like a term for it that I forget. I actually wanted to go after a toy almost intentionally so long as that I could see, I could imagine that it would lead to something very, very large later. Like I said, it's very hobbyist, but you need to start somewhere. You need to start with something that has a big gravitational pull, even if these hobbyists aren't likely to be the people that, you know, have a way to monetize it or whatever, even if they're, but they're doing it for fun. So there's something, something there that I think is really important. But I agree with you that, you know, in time we will absolutely focus on more utilitarian things like things that are more related to editing feats that are much harder. And so I think like a very simple use case is just, you know, I'm not a graphics designer. It seems like very simple that like you, if we could give you the ability to do really complex graphics without skill, wouldn't you want that? You know, like my wife the other day was set, you know, said, I wish Playground was better. When are you guys going to have a feature where like we could make my son, his name's Devin, smile when he was not smiling in the picture for the holiday card. Right. You know, just being able to highlight his, his mouth and just say like, make him smile. Like why can't we do that with like high fidelity and coherence, little things like that, all the way to putting you in completely different scenarios. [00:30:35]Swyx: Is that true? Can we not do that in painting? [00:30:37]Suhail: You can do in painting, but the quality is just so bad. Yeah. It's just really terrible quality. You know, it's like you'll do it five times and it'll still like kind of look like crooked or just artifact. Part of it's like, you know, the lips on the face, there's such little information there. So small that the models really struggle with it. Yeah. [00:30:55]Swyx: Make the picture smaller and you don't see it. That's my trick. I don't know. [00:30:59]Suhail: Yeah. Yeah. That's true. Or, you know, you could take that region and make it really big and then like say it's a mouth and then like shrink it. It feels like you're wrestling with it more than it's doing something that kind of surprises you. [00:31:12]Swyx: Yeah. It feels like you are very much the internal tastemaker, like you carry in your head this vision for what a good art model should look like. Do you find it hard to like communicate it to like your team and other people? Just because it's obviously it's hard to put into words like we just said. [00:31:26]Suhail: Yeah. It's very hard to explain. Images have such high bitrate compared to just words and we don't have enough words to describe these things. It's not terribly difficult. I think everyone on the team, if they don't have good kind of like judgment taste or like an eye for some of these things, they're like steadily building it because they have no choice. Right. So in that realm, I don't worry too much, actually. Like everyone is kind of like learning to get the eye is what I would call it. But I also have, you know, my own narrow taste. Like I don't represent the whole population either. [00:31:59]Swyx: When you benchmark models, you know, like this benchmark we're talking about, we use FID. Yeah. Input distance. OK. That's one measure. But like it doesn't capture anything you just said about smiles. [00:32:08]Suhail: Yeah. FID is generally a bad metric. It's good up to a point and then it kind of like is irrelevant. Yeah. [00:32:14]Swyx: And then so are there any other metrics that you like apart from vibes? I'm always looking for alternatives to vibes because vibes don't scale, you know. [00:32:22]Suhail: You know, it might be fun to kind of talk about this because it's actually kind of fresh. So up till now, we haven't needed to do a ton of like benchmarking because we hadn't trained our own model and now we have. So now what? What does that mean? How do we evaluate it? And, you know, we're kind of like living with the last 48, 72 hours of going, did the way that we benchmark actually succeed? [00:32:43]Swyx: Did it deliver? [00:32:44]Suhail: Right. You know, like I think Gemini just came out. They just put out a bunch of benchmarks. But all these benchmarks are just an approximation of how you think it's going to end up with real world performance. And I think that's like very fascinating to me. So if you fake that benchmark, you'll still end up in a really bad scenario at the end of the day. And so, you know, one of the benchmarks we did was we kind of curated like a thousand prompts. And I think that's kind of what we published in our blog post, you know, of all these tasks that we a lot of some of them are curated by our team where we know the models all suck at it. Like my favorite prompt that no model is really capable of is a horse riding an astronaut, the inverse one. And it's really, really hard to do. [00:33:22]Swyx: Not in data. [00:33:23]Suhail: You know, another one is like a giraffe underneath a microwave. How does that work? Right. There's so many of these little funny ones. We do. We have prompts that are just like misspellings of things. Yeah. We'll figure out if the models will figure it out. [00:33:36]Swyx: They should embed to the same space. [00:33:39]Suhail: Yeah. And just like all these very interesting weirdo things. And so we have so many of these and then we kind of like evaluate whether the models are any good at it. And the reality is that they're all bad at it. And so then you're just picking the most aesthetic image. We're still at the beginning of building like the best benchmark we can that aligns most with just user happiness, I think, because we're not we're not like putting these in papers and trying to like win, you know, I don't know, awards at ICCV or something if they have awards. You could. [00:34:05]Swyx: That's absolutely a valid strategy. [00:34:06]Suhail: Yeah, you could. But I don't think it could correlate necessarily with the impact we want to have on humanity. I think we're still evolving whatever our benchmarks are. So the first benchmark was just like very difficult tasks that we know the models are bad at. Can we come up with a thousand of these, whether they're hand rated and some of them are generated? And then can we ask the users, like, how do we do? And then we wanted to use a benchmark like party prompts. We mostly did that so people in academia could measure their models against ours versus others. But yeah, I mean, fit is pretty bad. And I think in terms of vibes, it's like you put out the model and then you try to see like what users make. And I think my sense is that we're going to take all the things that we notice that the users kind of were failing at and try to find like new ways to measure that, whether that's like a smile or, you know, color contrast or lighting. One benefit of Playground is that we have users making millions of images every single day. And so we can just ask them for like a post generation feedback. Yeah, we can just ask them. We can just say, like, how good was the lighting here? How was the subject? How was the background? [00:35:06]Swyx: Like a proper form of like, it's just like you make it, you come to our site, you make [00:35:10]Suhail: an image and then we say, and then maybe randomly you just say, hey, you know, like, how was the color and contrast of this image? And you say it was not very good, just tell us. So I think I think we can get like tens of thousands of these evaluations every single day to truly measure real world performance as opposed to just like benchmark performance. I would like to publish hopefully next year. I think we will try to publish a benchmark that anyone could use, that we evaluate ourselves on and that other people can, that we think does a good job of approximating real world performance because we've tried it and done it and noticed that it did. Yeah. I think we will do that. [00:35:45]Swyx: I personally have a few like categories that I consider special. You know, you know, you have like animals, art, fashion, food. There are some categories which I consider like a different tier of image. Top among them is text in images. How do you think about that? So one of the big wow moments for me, something I've been looking out for the entire year is just the progress of text and images. Like, can you write in an image? Yeah. And Ideogram came out recently, which had decent but not perfect text and images. Dolly3 had improved some and all they said in their paper was that they just included more text in the data set and it just worked. I was like, that's just lazy. But anyway, do you care about that? Because I don't see any of that in like your sample. Yeah, yeah. [00:36:27]Suhail: The V2 model was mostly focused on image quality versus like the feature of text synthesis. [00:36:33]Swyx: Well, as a business user, I care a lot about that. [00:36:35]Suhail: Yeah. Yeah. I'm very excited about text synthesis. And yeah, I think Ideogram has done a good job of maybe the best job. Dolly has like a hit rate. Yes. You know, like sometimes it's Egyptian letters. Yeah. I'm very excited about text synthesis. You know, I don't have much to say on it just yet. You know, you don't want just text effects. I think where this has to go is it has to be like you could like write little tiny pieces of text like on like a milk carton. That's maybe not even the focal point of a scene. I think that's like a very hard task that, you know, if you could do something like that, then there's a lot of other possibilities. Well, you don't have to zero shot it. [00:37:09]Swyx: You can just be like here and focus on this. [00:37:12]Suhail: Sure. Yeah, yeah. Definitely. Yeah. [00:37:16]Swyx: Yeah. So I think text synthesis would be very exciting. I'll also flag that Max Wolf, MiniMaxxier, which you must have come across his work. He's done a lot of stuff about using like logo masks that then map onto food and vegetables. And it looks like text, which can be pretty fun. [00:37:29]Suhail: That's the wonderful thing about like the open source community is that you get things like control net and then you see all these people do these just amazing things with control net. And then you wonder, I think from our point of view, we sort of go that that's really wonderful. But how do we end up with like a unified model that can do that? What are the bottlenecks? What are the issues? The community ultimately has very limited resources. And so they need these kinds of like workaround research ideas to get there. But yeah. [00:37:55]Swyx: Are techniques like control net portable to your architecture? [00:37:58]Suhail: Definitely. Yeah. We kept the Playground V2 exactly the same as SDXL. Not because not out of laziness, but just because we knew that the community already had tools. You know, all you have to do is maybe change a string in your code and then, you know, retrain a control net for it. So it was very intentional to do that. We didn't want to fragment the community with different architectures. Yeah. [00:38:16]Swyx: So basically, I'm going to go over three more categories. One is UIs, like app UIs, like mock UIs. Third is not safe for work, and then copyrighted stuff. I don't know if you care to comment on any of those. [00:38:28]Suhail: I think the NSFW kind of like safety stuff is really important. I kind of think that one of the biggest risks kind of going into maybe the U.S. election year will probably be very interrelated with like graphics, audio, video. I think it's going to be very hard to explain, you know, to a family relative who's not kind of in our world. And our world is like sometimes very, you know, we think it's very big, but it's very tiny compared to the rest of the world. Some people like there's still lots of humanity who have no idea what chat GPT is. And I think it's going to be very hard to explain, you know, to your uncle, aunt, whoever, you know, hey, I saw President Biden say this thing on a video, you know, I can't believe, you know, he said that. I think that's going to be a very troubling thing going into the world next year, the year after. [00:39:12]Swyx: That's more like a risk thing, like deepfakes, faking, political faking. But there's a lot of studies on how for most businesses, you don't want to train on not safe for work images, except that it makes you really good at bodies. [00:39:24]Suhail: Personally, we filter out NSFW type of images in our data set so that it's, you know, so our safety filter stuff doesn't have to work as hard. [00:39:32]Swyx: But you've heard this argument that not safe for work images are very good at human anatomy, which you do want to be good at. [00:39:38]Suhail: It's not like necessarily a bad thing to train on that data. It's more about like how you go and use it. That's why I was kind of talking about safety, you know, in part, because there are very terrible things that can happen in the world. If you have an extremely powerful graphics model, you know, suddenly like you can kind of imagine, you know, now if you can like generate nudes and then there's like you could do very character consistent things with faces, like what does that lead to? Yeah. And so I tend to think more what occurs after that, right? Even if you train on, let's say, you know, new data, if it does something to kind of help, there's nothing wrong with the human anatomy, it's very valid for a model to learn that. But then it's kind of like, how does that get used? And, you know, I won't bring up all of the very, very unsavory, terrible things that we see on a daily basis on the site, but I think it's more about what occurs. And so we, you know, we just recently did like a big sprint on safety. It's very difficult with graphics and art, right? Because there is tasteful art that has nudity, right? They're all over in museums, like, you know, there's very valid situations for that. And then there's the things that are the gray line of that, you know, what I might not find tasteful, someone might be like, that is completely tasteful, right? And then there are things that are way over the line. And then there are things that maybe you or, you know, maybe I would be okay with, but society isn't, you know? So where does that kind of end up on the spectrum of things? I think it's really hard with art. Sometimes even if you have like things that are not nude, if a child goes to your site, scrolls down some images, you know, classrooms of kids, you know, using our product, it's a really difficult problem. And it stretches mostly culture, society, politics, everything. [00:41:14]Alessio: Another favorite topic of our listeners is UX and AI. And I think you're probably one of the best all-inclusive editors for these things. So you don't just have the prompt, images come out, you pray, and now you do it again. First, you let people pick a seed so they can kind of have semi-repeatable generation. You also have, yeah, you can pick how many images and then you leave all of them in the canvas. And then you have kind of like this box, the generation box, and you can even cross between them and outpaint. There's all these things. How did you get here? You know, most people are kind of like, give me text, I give you image. You know, you're like, these are all the tools for you. [00:41:54]Suhail: Even though we were trying to make a graphics foundation model, I think we think that we're also trying to like re-imagine like what a graphics editor might look like given the change in technology. So, you know, I don't think we're trying to build Photoshop, but it's the only thing that we could say that people are largely familiar with. Oh, okay, there's Photoshop. What would Photoshop compare itself to pre-computer? I don't know, right? It's like, or kind of like a canvas, but you know, there's these menu options and you can use your mouse. What's a mouse? So I think that we're trying to re-imagine what a graphics editor might look like, not just for the fun of it, but because we kind of have no choice. Like there's this idea in image generation where you can generate images. That's like a super weird thing. What is that in Photoshop, right? You have to wait right now for the time being, but the wait is worth it often for a lot of people because they can't make that with their own skills. So I think it goes back to, you know, how we started the company, which was kind of looking at GPT-3's Playground, that the reason why we're named Playground is a homage to that actually. And, you know, it's like, shouldn't these products be more visual? These prompt boxes are like a terminal window, right? We're kind of at this weird point where it's just like MS-DOS. I remember my mom using MS-DOS and I memorized the keywords, like DIR, LS, all those things, right? It feels a little like we're there, right? Prompt engineering, parentheses to say beautiful or whatever, waits the word token more in the model or whatever. That's like super strange. I think a large portion of humanity would agree that that's not user-friendly, right? So how do we think about the products to be more user-friendly? Well, sure, you know, sure, it would be nice if I wanted to get rid of, like, the headphones on my head, you know, it'd be nice to mask it and then say, you know, can you remove the headphones? You know, if I want to grow, expand the image, you know, how can we make that feel easier without typing lots of words and being really confused? I don't even think we've nailed the UI UX yet. Part of that is because we're still experimenting. And part of that is because the model and the technology is going to get better. And whatever felt like the right UX six months ago is going to feel very broken now. So that's a little bit of how we got there is kind of saying, does everything have to be like a prompt in a box? Or can we do things that make it very intuitive for users? [00:44:03]Alessio: How do you decide what to give access to? So you have things like an expand prompt, which Dally 3 just does. It doesn't let you decide whether you should or not. [00:44:13]Swyx: As in, like, rewrites your prompts for you. [00:44:15]Suhail: Yeah, for that feature, I think once we get it to be cheaper, we'll probably just give it up. We'll probably just give it away. But we also decided something that might be a little bit different. We noticed that most of image generation is just, like, kind of casual. You know, it's in WhatsApp. It's, you know, it's in a Discord bot somewhere with Majorny. It's in ChatGPT. One of the differentiators I think we provide is at the expense of just lots of users necessarily. Mainstream consumers is that we provide as much, like, power and tweakability and configurability as possible. So the only reason why it's a toggle, because we know that users might want to use it and might not want to use it. There's some really powerful power user hobbyists that know what they're doing. And then there's a lot of people that just want something that looks cool, but they don't know how to prompt. And so I think a lot of Playground is more about going after that core user base that, like, knows, has a little bit more savviness and how to use these tools. You know, the average Dell user is probably not going to use ControlNet. They probably don't even know what that is. And so I think that, like, as the models get more powerful, as there's more tooling, hopefully you'll imagine a new sort of AI-first graphics editor that's just as, like, powerful and configurable as Photoshop. And you might have to master a new kind of tool. [00:45:28]Swyx: There's so many things I could go bounce off of. One, you mentioned about waiting. We have to kind of somewhat address the elephant in the room. Consistency models have been blowing up the past month. How do you think about integrating that? Obviously, there's a lot of other companies also trying to beat you to that space as well. [00:45:44]Suhail: I think we were the first company to integrate it. Ah, OK. [00:45:47]Swyx: Yeah. I didn't see your demo. [00:45:49]Suhail: Oops. Yeah, yeah. Well, we integrated it in a different way. OK. There are, like, 10 companies right now that have kind of tried to do, like, interactive editing, where you can, like, draw on the left side and then you get an image on the right side. We decided to kind of, like, wait and see whether there's, like, true utility on that. We have a different feature that's, like, unique in our product that is called preview rendering. And so you go to the product and you say, you know, we're like, what is the most common use case? The most common use case is you write a prompt and then you get an image. But what's the most annoying thing about that? The most annoying thing is, like, it feels like a slot machine, right? You're like, OK, I'm going to put it in and maybe I'll get something cool. So we did something that seemed a lot simpler, but a lot more relevant to how users already use these products, which is preview rendering. You toggle it on and it will show you a render of the image. And then graphics tools already have this. Like, if you use Cinema 4D or After Effects or something, it's called viewport rendering. And so we try to take something that exists in the real world that has familiarity and say, OK, you're going to get a rough sense of an early preview of this thing. And then when you're ready to generate, we're going to try to be as coherent about that image that you saw. That way, you're not spending so much time just like pulling down the slot machine lever. I think we were the first company to actually ship a quick LCM thing. Yeah, we were very excited about it. So we shipped it very quick. Yeah. [00:47:03]Swyx: Well, the demos I've been seeing, it's not like a preview necessarily. They're almost using it to animate their generations. Like, because you can kind of move shapes. [00:47:11]Suhail: Yeah, yeah, they're like doing it. They're animating it. But they're sort of showing, like, if I move a moon, you know, can I? [00:47:17]Swyx: I don't know. To me, it unlocks video in a way. [00:47:20]Suhail: Yeah. But the video models are already so much better than that. Yeah. [00:47:23]Swyx: There's another one, which I think is general ecosystem of Loras, right? Civit is obviously the most popular repository of Loras. How do you think about interacting with that ecosystem? [00:47:34]Suhail: The guy that did Lora, not the guy that invented Loras, but the person that brought Loras to Stable Diffusion actually works with us on some projects. His name is Simu. Shout out to Simu. And I think Loras are wonderful. Obviously, fine tuning all these Dreambooth models and such, it's just so heavy. And it's obvious in our conversation around styles and vibes, it's very hard to evaluate the artistry of these things. Loras give people this wonderful opportunity to create sub-genres of art. And I think they're amazing. Any graphics tool, any kind of thing that's expressing art has to provide some level of customization to its user base that goes beyond just typing Greg Rakowski in a prompt. We have to give more than that. It's not like users want to type these real artist names. It's that they don't know how else to get an image that looks interesting. They truly want originality and uniqueness. And I think Loras provide that. And they provide it in a very nice, scalable way. I hope that we find something even better than Loras in the long term, because there are still weaknesses to Loras, but I think they do a good job for now. Yeah. [00:48:39]Swyx: And so you would never compete with Civit? You would just kind of let people import? [00:48:43]Suhail: Civit's a site where all these things get kind of hosted by the community, right? And so, yeah, we'll often pull down some of the best things there. I think when we have a significantly better model, we will certainly build something that gets closer to that. Again, I go back to saying just I still think this is very nascent. Things are very underpowered, right? Loras are not easy to train. They're easy for an engineer. It sure would be nicer if I could just pick five or six reference images, right? And they might even be five or six different reference images that are not... They're just very different. They communicate a style, but they're actually like... It's like a mood board, right? And you have to be kind of an engineer almost to train these Loras or go to some site and be technically savvy, at least. It seems like it'd be much better if I could say, I love this style. Here are five images and you tell the model, like, this is what I want. And the model gives you something that's very aligned with what your style is, what you're talking about. And it's a style you couldn't even communicate, right? There's n

Short Time Wrestling Podcast
Short Time Shots: December 8, 2023

Short Time Wrestling Podcast

Play Episode Listen Later Dec 8, 2023 5:44


Twas a backyard brawl on Thursday night. I took in some high school wrestling as I watched Mounds View High School dispatch with Irondale … which are the two high schools in my school district. One is known as the sports school, the other is the band and robotics school. The sports school won 60-6. By the way, former Gopher Marty Morgan's youngest son is a 215-pounder for Mounds View and he's freaking HUGE. He's headed to South Dakota State. I was spotted in the crowd by one Mounds View parent. Was great to just watch some pretty good high school wrestling without actually working. There's some great HS wrestling kicking off today at the Walsh Ironman. Check my socials @jasonmbryant for some knowledge bombs. It's Short Time Shots, a look at the day's college wrestling scores and more, I'm your host, Hall of Fame wrestling writer and broadcaster Jason Bryant. On to the scores on the college scene from Thursday. Speaking of scoreboards, did you know you can find all the D1 dual meet scores and links to their results at collegewrestlingscoreboard.com AND I've added another redirect to help you follow all the tournaments from every division at collegewrestlingtournaments.com. Only D1 dual of the night, Michigan State improved to 4-0 after a 25-9 win over Buffalo ON THE STAGE up in New York. Buffalo did one of those theatre-style duals for the third year in a row. If you look at the dual, this one was probably closer than the score indicates as Michigan State picked up three victories that were two points or less. Still been a rough start for Buffalo this year. In Division II, Augustana (S.D.) knocked off 13th-ranked Northern State 21-13 out in the middle of South Dakota. The Vikings were ranked 18th in the most recent NWCA rankings, which is a tournament points ranking rather than a dual ranking. Augie's top-ranked Jaxson Rohman earned a major at 125 pounds, while top-ranked Wyatt Turnquist won 7-2 at 149 pounds for Northern State. Out west, San Francisco State shutout Simpson (Calif.) 41-0. In D3, second-ranked Wartburg rolled past the only non-Iowa team in the American Rivers Conference, which used to be the Iowa Conference. The Knights had five falls, the fastest coming at 197 pounds by top-ranked Mo Endene, the returning D3 champion at the weight in the 54-0 win. Third-ranked UW-Eau Claire rallied from down 14-3 after five weights and reeled off five straight W's to oust rival UW-Whitewater 25-14. Whitewater did pick up the dual's most notable win as ninth-ranked Dominick Mallinder bounced top-ranked Tyler Fleetwood 7-1 at 133 pounds. 20th-ranked Millikin topped Aurora 23-17. Bonus points were the difference for the Big Blue as three of the wins were bonus victories including falls by Coby Haney at 133 and Nathan Callaway at 157. Also in the CCIW, Elmhurst edged Concordia Wisconsin 24-23. Roanoke beat Shenandoah 43-4 out in my home state of Virginia. Three bonus victories were the difference for Loras as they topped Cornell College 21-19 in a matchup of Iowa schools. In the NAIA, Montana State-Northern rallied to beat Dickinson State 32-16, while No. 11 Providence (Mont.) beat Eastern Oregon 28-15. Top-ranked Hunter Sparks of Eastern Oregon got a fall in that one at 125 pounds. The Evergreen State College earned its first win, topping Grays Harbor College 50-3. Evergreen's mascot? The GEODUCKS. What is a Geoduck? Well, it's the largest burrowing clam in the world, native to the coastal waters of the Pacific. Makes sense, Evergreen is in Washington. Grays Harbor is too, just a mile or two from the Muddy Banks of the Wishkah. Some of you will get the reference. On the women's side, fourth ranked Colorado Mesa torched Texas Woman's University 44-5, and Adams State 45-4. UW-Stevens Point topped Wartburg 30-13 in the first home dual in school history for the new women's program at Wartburg. Minot State was a winner over Dakota Wesleyan 27-16. Evergreen's women also picked up a W, beating Grays Harbor 34-8. Shout outs to our latest Mat Talk patrons, Warrior Coach and KBnoswag. Subscribe, for free, to my Daily Wrestling News. Get it for free at mattalkonline.com/news, sign up free today. That's as always, delivered by our longtime friends at Resilite.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

The Latent Space crew will be at NeurIPS on Tuesday! Reach out with any parties and papers of interest. We have also been incubating a smol daily AI Newsletter and Latent Space University is making progress.Good open models like Llama 2 and Mistral 7B (which has just released an 8x7B MoE model) have enabled their own sub-industry of finetuned variants for a myriad of reasons:* Ownership & Control - you take responsibility for serving the models* Privacy - not having to send data to a third party vendor* Customization - Improving some attribute (censorship, multiturn chat and chain of thought, roleplaying) or benchmark performance (without cheating)Related to improving benchmark performance is the ability to use smaller (7B, 13B) models, by matching the performance of larger models, which have both cost and inference latency benefits.Core to all this work is finetuning, and the emergent finetuning library of choice has been Wing Lian's Axolotl.AxolotlAxolotl is an LLM fine-tuner supporting SotA techniques and optimizations for a variety of common model architectures:It is used by many of the leading open source models:* Teknium: OpenHermes, Trismigestus, CollectiveCognition* OpenOrca: Mistral-OpenOrca, Mistral-SlimOrca* Nous Research: Puffin, Capybara, NousHermes* Pygmalion: Mythalion, Pygmalion* Eric Hartford: Dolphin, Samantha* DiscoResearch: DiscoLM 120B & 70B* OpenAccess AI Collective: Manticore, Minotaur, Jackalope, HippogriffAs finetuning is very formatting dependent, it also provides prompt interfaces and formatters between a range of popular model formats from Stanford's Alpaca and Steven Tey's ShareGPT (which led to Vicuna) to the more NSFW Pygmalion community.Nous Research MeetupWe last talked about Nous at the DevDay Recap at the e/acc “banger rave”. We met Wing at the Nous Research meetup at the a16z offices in San Francisco, where they officially announced their company and future plans:Including Nous Forge:Show NotesWe've already covered the nuances of Dataset Contamination and the problems with “Open Source” in AI, so we won't rehash those topics here but do read/listen to those if you missed it.* Axolotl GitHub and Discord* The Flan paper and dataset* StackLlama model and blogpost* Multipack paper* Our episode with Tri Dao* Mamba state space models - Tri Dao and Albert GuTimestamps* [00:00:00] Introducing Wing* [00:02:34] SF Open Source AI Meetup* [00:04:09] What is Axolotl?* [00:08:01] What is finetuning?* [00:08:52] Open Source Model Zoo* [00:10:53] Benchmarks and Contamination* [00:14:29] The Case for Open Source AI* [00:17:34] Orca and OpenOrca* [00:23:36] DiscoLM and Model Stacking* [00:25:07] Datasets and Evals over Models* [00:29:15] Distilling from GPT4* [00:33:31] Finetuning - LoRA, QLoRA, ReLoRA, GPTQ* [00:41:55] Axolotl vs HF Transformers* [00:48:00] 20x efficiency with StackLlama and Multipack* [00:54:47] Tri Dao and Mamba* [00:59:08] Roadmap for Axolotl* [01:01:20] The Open Source AI CommunityTranscript[00:00:00] Introducing Wing Lian[00:00:00] ​[00:00:00] swyx: Welcome to Latent Space, a special edition with Wing Lien, but also with our new guest host, Alex. Hello, hello. Welcome, welcome. Again, needs no introduction. I think it's like your sixth time on Latent Space already. I think so, yeah. And welcome, Wing. We just met, but you've been very prolific online. Thanks for having me.[00:00:30] Yeah. So you are in town. You're not local. You're in town. You're from Minneapolis?[00:00:35] Wing Lian: Annapolis. Annapolis. It's funny because a lot of people think it's Indianapolis. It's I've got Minneapolis, but I used to live out at least in the San Francisco Bay Area years ago from like 2008 to 2014. So it's fairly familiar here.[00:00:50] swyx: Yep. You're the maintainer of Axolotl now, which we'll get into. You're very, very prolific in the open source AI community, and you're also the founder of the Open Access AI Collective. Yeah. Cool. Awesome. Maybe we can go over a little bit of your backgrounds into tech and then coming into AI, and then we'll cover what[00:01:06] Wing Lian: happens and why you're here.[00:01:08] Yeah. So. Back on tech, so I started years ago, I started way back when I was scraping, Apartment websites for listings and then, and then building like SEO optimized pages and then just throwing Google AdSense on it.[00:01:24] And that got me through like college basically. Is[00:01:27] swyx: that decent money? And what year[00:01:28] Wing Lian: was this? Like 2004, 2005. Yeah, that's decent money. It's like thousand bucks a month. But as a college student, that's like. Gravy. Really good money, right? So, and then there's just too much competition It's just sort of like died off. I was writing stuff in like Perl back then using like like who nobody hosted anything on Perl anymore, right? Still did a little bit more like computer tech support and then software, and web more professionally.[00:01:54] So I spent some time working on applications in the blood industry. I came out to San Francisco for, I was at SGN, so Social Gaming Network, as a startup. They started doing, with Facebook apps, and then they pivoted into doing mobile apps. And then, from there, I spent time.[00:02:14] I've quite a few more startups since then and in the last few years I've been in the music space So like I was at United Masters for a while and then past year I've been at SoundCloud, but not doing that anymore and now that I have a lot more time It's just like all right.[00:02:30] We're going full bore on axolotl and we're gonna we're gonna crush AI So yeah,[00:02:34] SF Open Source AI Meetup[00:02:34] swyx: totally you so you're here in town for the open source. Yeah, I meet up that we had yesterday Yep, yeah, that was amazing. Yeah, it was a big collection. Olama, Noose Research, Alignment Lab, Anyone else that I missed? I mean, Jeremy Howard is his own thing.[00:02:47] Yeah.[00:02:49] And Alex, you're also there. You love to bring SF to the world. Your takes?[00:02:55] Alex Volkov: It's incredible that we recorded a Thursday Eye episode after that one. And LDJ, who's usually co hosts Thursday Eye, just like briefly mentioned, Oh yeah, I talked about it.[00:03:04] Like, I saw Karpathy, and then I talked to Jeremy Howard, and the guy from Mistral came in, and it's like, He's talking about all these, titans of industry, basically, that outside of SF, You just don't meet casually hanging out in the same space. You can't, pull somebody. He ran into the Laylow from Mistral, he ran into him while, drinking water.[00:03:20] He didn't even know he was there. It's just, that type of stuff is really hard to find outside of SF. So, absolutely, absolutely great. And also, presentations from Alignment Labs, presentations from News Research, news issues, talked about. Forge, and some of[00:03:33] swyx: the other stuff they announced. We can say now they're officially a company.[00:03:36] I met Technium.[00:03:37] He[00:03:37] Alex Volkov: came over here. He didn't want to get recorded. But maybe.[00:03:41] Wing Lian: We'll wear him down at some point. Yeah, I'm excited for Forge. They've positioned it as this agentic sort of framework where it's just Drag and drop things and, fill in text with where you want to inject different variables and it opens up all of these potentials for data pipelines now, right?[00:03:56] And using your own local LLMs and not relying on GPT 4 or anything like that. Yeah, yeah,[00:04:02] swyx: good stuff. Okay, so let's maybe go into the Axolotl origin story and then we have, we have some intro or background.[00:04:09] What is Axolotl?[00:04:09] swyx: To do on like the open source model universe and also on fine tuning, but maybe just, since you're talking about your personal journey, what was your personal journey into[00:04:18] Wing Lian: axolotl?[00:04:19] Yeah, so my personal journey started like back in mid March, completely unrelated to AI and axolotl. And it really started, I fell while skiing, I torqued. Great 3 MCL sprain and being sort of like an active person that can no longer be active because the two, couldn't play soccer, because that is requires to have having knees until I, it's healed.[00:04:42] So I. I decided I needed to find something to do to take up my free time. And that became, well, let's learn how to train in, these language models. It was everywhere. So I was like, all right, I'm just going to sit down, learn. I think I used like other, I think I was using like Alpacalora.[00:05:00] Cause I think the Alpaca paper had just came out, come out then. So I was like using Alpacalora repo and sort of like learning how to use like. None of us were like GPU rich back then, and none of us, most of us still we're still all GPU poor, but I was doing what was it, like 4 bit, Alpaca Lord, there was like a 4 bit version where we were doing quant, or 8, no, 8 bit quantizations, and then I think they had released QLOR a little bit later, and I think right when, before QLOR came out, I was already starting to do fine tunes, but having this need to sort of like mix data sets together, and If you've ever looked at all the various different datasets available on HuggingFace, they all have various different prompt formats, and, it's sort of a nightmare, and then I think the other piece is if you've ever tried to fine tune, at least Back then probably the ecosystem's a little better now.[00:05:54] Everybody required that you say, alright, you put your hyperparameters as command line arguments. And so it's always like, well, I now have to go copy and paste my previous thing and to change things out. And I really wanted it. to be in a YAML file because it was more portable and reproducible.[00:06:09] So I was doing that and then the QLOR paper came out. Tim Dettmer announced that and then somebody looked it up for me yesterday and it's like between that announcement it took us seven days to get that integrated into Axolotl, right? Which is like, it's not. I wouldn't say it's really fast, but in a manner that, is in a, a reusable framework, I think it was quite the accomplishment then.[00:06:33] And so we started, picking up traction with people there. And then it's just been building models, and then just iterating what my needs are. So, yeah. Excellent. Yeah. I[00:06:44] Alex Volkov: want to ask, for folks who are listening who never heard of Axolotl, now do you describe how you got there?[00:06:49] Can you, how do you summarize this for folks who maybe haven't fine tuned anything. They know about open source LLM exists, they maybe know like LLAML, what's XLR for somebody who doesn't know. I've never heard of a data set curation[00:07:01] Wing Lian: creation before. We sort of have to take a step back and understand that, when you've got these language models, you have what I think most people refer to as like base models, also known as like foundational models, right?[00:07:15] Where some benefactor, whether it's Meta or Mistral or whoever, has gone and spent all this money. To train these models on huge corpuses of text, right? And these, these corpuses, they're generally good across lots of different things, but they're really good at just saying, talking on and on and on, but they're not good at, following instructions or having chats or anything like that.[00:07:40] So, when you think about fine tuning, it's like Saying, all right, we have this really sort of good generalized, text completion thing, and I want to turn it into something that I can talk to or have, follow instructions. So, I think fine tuning is probably best defined in like that.[00:07:58] swyx: Okay, got it.[00:07:59] And we actually[00:08:01] What is finetuning?[00:08:01] swyx: Do want to make sure that we have like an overall introduction to fine tuning for people because again like trying to make sure that we bring everyone along in this, in this journey. We already went into Loras and QLoras without explaining what[00:08:12] Wing Lian: they are. Oh yes, yes, sorry.[00:08:14] swyx: And so I will put things in my words and you can correct me as, as, as my I'll be the village idiot here.[00:08:21] So, so fine tuning is basically sort of grabbing an open source model off the shelf, and then basically doing further training on it with a custom dataset of your own. Primarily, people use it, think about it as fine tuning for JSON output, or fine tuning for a style of response. Let's say you wanted to tell jokes, or be funny, or be short, or whatever.[00:08:43] Just the open source AI community has really fine tuned in all sorts of different manner. I think we'll go over those those things now. Let's go over those things now, and then we'll talk about fine tuning methods.[00:08:52] Open Source Model Zoo[00:08:52] swyx: So there's a universe of people who fine tune stuff. Yesterday in your slides, you had, I'll just list some of these and then we'll maybe go through some of them, right?[00:08:59] So Technium is personally leading Open Hermes, which is I think the sort of premier model out of the news. news community. There's OpenOrca, which you had a hand in. News, the news research itself also has Capybara and Puffin and all the others. There's Pygmalion, which I've never messed with.[00:09:14] Eric Hartford, I am aware of his Uncensored Models and his Samantha Models. Disco Research with Disco LM. And then you personally have done Manticore, Minotaur, Jackalope, and Hippogriff. What should people know about all these names? Being part of AI Twitter is seeing all these things and going dude, I'm being DDoS'ed by all these things and I don't know how different they are.[00:09:32] What should people know? Yeah, so[00:09:34] Wing Lian: I think on a lot of these models, generally, we like to think of those as sort of general models, so If you think about it, what is GPT 4, what is Chad GPT? It's a good general model, and then, One of the services I think that OpenAI offers is like these fine tunings where you're a business and you have very specific business use cases and you might fine tune for that use case.[00:10:00] All of these models are really just general use case that you can then go and maybe Fine tune another lore over it for your use cases, but they tend to be good. With good being relative, it's open source. Open source AI is still sort of is infancy. So, good is, it's pretty reasonable.[00:10:18] It's probably still better than most, high schoolers at answering questions and being able to like figure things out and, and reasoning skills and math and those sorts of things, right?[00:10:27] swyx: And also as measured on the Hugging[00:10:29] Wing Lian: Face leaderboard. Yes, well, that's like a whole other discussion, right, there's a whole other, group of people who, and I, I mostly agree with them that, benchmarks can be, are pretty bogus these days, LM says, I think they published something recently where, even if you think the dataset's not contaminated, you can go and, find contamination And maybe we should step back and say what contamination is, right?[00:10:53] Benchmarks and Contamination[00:10:53] Wing Lian: So we have all of these data, when you go and do these benchmarks, there's a specific data set where there are these questions and usually it's multiple choice. And what can happen is, well, sometimes someone It puts the question, maybe maliciously, maybe accidentally, into the training dataset, and now the, the, your model knows how to answer the test questions really well, but it doesn't, it hasn't generalized the ability to actually do that[00:11:20] Alex Volkov: right.[00:11:21] We've seen some folks competitively announce models that are like the best at that leaderboard, but then it's, it's quite obvious that, In open source? Yeah, and in that leaderboard, for Hugging Face specific, I don't know if LMCs, if that had suffered, but we, there's been some models that seem to have been competitively trained and some leakage happened into their,[00:11:41] swyx: like, supposal.[00:11:43] I understand, once there's been a credible assertion, Hugging Face actually does take them down, right? Yeah, yeah,[00:11:48] Alex Volkov: which is really hard to know, right?[00:11:50] swyx: It's really hard to know, sometimes it's like a pure accident,[00:11:52] Alex Volkov: it's oh, oops. You're going through a mixer. I think, a responsible So acknowledgement, that this kind of happened to you is also important.[00:11:58] I saw LDJ from news research can acknowledge that. Because many of these datasets are collections of other datasets. There's a bunch of people are baking, basically. It's alchemy. Right. And so sometimes you don't know. Sometimes you pull an open source dataset and they announce, oh, you know what, actually, the MMLU benchmark which we used to Specifically identify models that did go into this data set, that then went into that data set.[00:12:22] So sometimes it's actually an accident and folks take it down. But I've seen some competitive folks who want to put their name out there because people are starting to notice which is the top[00:12:30] swyx: model. For those who want a fun take on this so the file one dataset. FindOne model from Microsoft was accused of being contaminated.[00:12:37] And I saw this joke paper that was fantastic. It was called, training on the test set is all you need. It's a super small model that just memorizes everything. It was fantastic. So yeah, contamination, I think we've actually covered it in a previous episode before. So we're good. But again, I want to give people a map into the open source AI model, the universe.[00:12:57] And Alex, you can also jump in here because you guys have spent a lot more time with them than I have. So, what should people know about Technium? What should people know about Noose? And then we can go down the list. Yeah,[00:13:05] Wing Lian: I think so. I think if we start with, Technium. When you talk to him, he's gonna say, I think, I think his response is that he wants to build GP4 on his laptop, right?[00:13:14] So, very, very good at building general models. I think with Noose, Noose Research, they're looking at more, sort of, More, more research focused things, like their Yarn models, I don't, I don't, they didn't actually train their, they have their own trainer for their Yarn models, but So they did not use Xlato for that one?[00:13:30] They didn't use that, but like Is that, you don't have support for it? I think we do support Yarn, I think, I'd have to double check that answer. Yeah, I'm just kind of curious what you can and cannot support, and Yeah, I mean, Yarn is supportable, it's basically, I think it's just replacing, I think, the rope part of that, so Yeah, not, not a big deal.[00:13:48] Yeah, it's not a big deal, it's just I haven't gotten to it, not enough people have asked, I think a lot of people have asked for other things, so it's just, squeaky wheel, right? I think at the end of the day, people are like building these data sets and I think if you sort of map things chronologically, these make more sense because it's like, how do we incrementally improve all of these models?[00:14:07] So a lot of these models are just incremental improvements over the last thing, right? Whether it is sort of through methods of how do we, how did we curate the data set? How did we improve the quality of the data set? So, you maybe LDJ talked about it right on I think for, for Capybara and Puffin, like how those, those were very specific dataset curation techniques that he works on.[00:14:29] The Case for Open Source AI[00:14:29] Alex Volkov: So there's, folks are doing this for dataset curation. Folks are doing this for skillset building as well. Definitely people understand that open source is like very important, especially after the, the, the, the, the march, the debacle, the OpenAI weekend that we all had. And people started noticing that even after developer day in OpenAI, the APIs went out.[00:14:48] And then after that, the whole leadership of the company is swiftly changed and people, there was worries about, you know. How can people continue building AI products based on these like shaky grounds that turned attention definitely to Technium at least in open RMS I started seeing this more and more on Twitter, but also other models and many companies They're gonna start with open AI just to get there quick, and then they they think about okay Maybe I don't want to share my knowledge.[00:15:13] Maybe I don't want to sign up for Microsoft. Maybe they will change their terms and conditions so What else is out there? They turned to other companies. Up until yesterday, Google was nowhere to be found. We've talked about Gemini a little bit before in a previous And you can tune in[00:15:26] swyx: to[00:15:26] Alex Volkov: Thursday Eye.[00:15:26] Yeah, you can tune in to Thursday Eye. We covered the Gemini release a little bit. And but many are turning into the open source community and seeing that Meta released and continues to release and commit to open source AI. Mistral came out and the model is way smaller than LLAMA and performs Significantly better.[00:15:43] People play with OpenRMS, which is currently techniums based, news researched, sourced, axolotl trained OpenRMS, I assume, right? And then they play with this and they see that, okay, this is like GPT 3. 5 quality. We had GPT 4. 5 birthday just a week ago. A week ago, a year ago, a week ago, we never, interacted with these models of this caliber.[00:16:04] And now there's one open source, one that's on my laptop, completely offline, that, I can continue improving for my use cases. So enterprises, companies are also noticing this. And the open source community folks are building the skill set, not only the data sets. They're building the actual kind of, here's how we're going to do this, with Axelotl, with these data sets.[00:16:21] The curation pieces. Now. Interesting. There's like recipes of curation. The actual model training is kind of a competitive thing where people go and compete on these leaderboards that we talked about, the LMC arena, and that recently added open air and recently added open chat and a bunch of other stuff that are super cool.[00:16:37] The hug and face open source leaderboard. And so there's a competitive aspect to this. There's the open source. Aspect to this, like Technium says, I want GPT 4 on my laptop. There's the, let me build a skill set that potentially turns into a company, like we saw with Noose. Noose just, started organizing, a bunch of people on Discord, and suddenly, they're announcing their company.[00:16:54] It's happening across all these modalities, and suddenly all these people who saw these green pastures and a fairly quick way to, hey, here's a cool online community I can, start doing cool stuff with. You mentioned the same in the beginning, right? Like, after your accident, what's cool, let me try this out.[00:17:08] Suddenly I start noticing that there's a significant movement of interest in enterprising companies into these areas. And, this skill set, these data sets, and this community is now very Very important, important enough to create an event which pulls in Andrei Karpathy from OpenAI to come and see what's new Jeremy Howard, like the event that we just talked about, people are flying over and this is just a meetup.[00:17:28] So, definitely, the community is buzzing right now and I think Axelot is a big piece as well.[00:17:34] Orca and OpenOrca[00:17:34] Wing Lian: Cool. Maybe we can talk about like Orca real quick, Orca, OpenOrca rather, I think there was a lot of buzz when, the first Orca paper came out. And just briefly, what is Orca? Yeah, Orca was basically having traces of like chain of thought reasoning, right?[00:17:48] So they go and they, they distill sort of GPT 4. They take, they take a sampling of data from the Flan dataset. Maybe we can like add some show notes in the Flan dataset. Yeah, but we've covered it. Okay, cool. Use GPT 4 to say, all right, explain this in a step by step reasoning, right?[00:18:06] And then you take that and you, they train the model and it showed, very good improvements across a lot of benchmarks. So OpenOrca was sort of the open reproduction of that since Microsoft Research never released that particular data set. And going back to sort of the Hugging Face leaderboard thing, those models did really well.[00:18:23] And then I think, so sort of the follow up to that was SlimOrca, right? I think Going into and building the OpenOrca dataset, we never really went in and, validated the actual answers that GPT 4 gave us, so what we did was one from OpenChat actually cross referenced the original Flan, the original Flan response, the human responses, the correct answers with the dataset, and then I went and took it and sent all of, both of them to GPT 4 and said, is this answer mostly correct, right?[00:18:54] Yeah. And then we were able to filter the dataset from, At least of the GPT 4 only answers from like 800, 000 to like 500, 000 answers or rows and then, and then retrain the model and it had the same performance as the original model to within I think, 0. 1 percent here about, and 30 percent less data.[00:19:13] So, yeah. Okay.[00:19:15] swyx: Interesting. So, I mean, there's, there's so much there that I want to highlight, but yeah. Orca is interesting. I do want people to know about it. Putting chain of thought into the data set like it's just makes a ton of sense one thing I think it would be helpful for people to scope thing these things out is how much data are we talking about when when you When people are fine tuning and then how much time or resources or money does it take to train to fine[00:19:36] Wing Lian: tune?[00:19:37] Yeah, so I think there's a little bit of overlap there with sort of like fine tuning techniques, but let's say Orca and I think even Hermes, they're both relatively large data sets like 10 billion tokens. Yeah. So large data sets being or the original Orca was, or the original open Orca was 800,000 rows.[00:19:55] I believe it was somewhere in the ballpark of like a gigabyte of data, of gigabyte, of text data. And I, I don't. I believe, Hermes was, is like a quarter million rows of data, I don't know the actual byte size on that particular one. So, going and training a, let's, let's say everybody's training 7 billion Mistral right now, right?[00:20:15] So, to tri I, I believe to fine tune 7 billion Mistral on, let's say, 8 A6000s, which have 48 gigabytes of VRAM, I believe, It takes about 40 hours, so 40, and then that's, depending on where you get your compute, 40 times 6, so it's like 500 to fine tune that model, so, and, and that's assuming you get it right the first time, right?[00:20:44] So, you know.[00:20:45] swyx: Is, is that something that X. Lotto handles, like, getting it right the first[00:20:48] Wing Lian: time? If you talk to anybody, it's like you've probably tried at least three or four runs or experiments to like find the right hyperparameters. And after a while you sort of have a feel for like which, where you need your hyperparameters to be.[00:21:04] Usually you might do like a partial training run, do some benchmark. So I guess for Al Farouk, whether you're going by his. This is Jeremy, he's, his actual name, or his twitter handle. He released the Dharma dataset, which is basically a subset of all the benchmarks. And Axolotl actually supports, you know taking that subset and then just running many benchmarks across your model every time you're doing an evaluation so you can sort of like see sort of relative it's not going to be the actual benchmark score, but you can get ideas alright, is this benchmark improving, is this benchmark decreasing, based on, you know Wait,[00:21:39] swyx: why don't you run the full benchmark?[00:21:41] What, what, what The[00:21:42] Wing Lian: full benchmarks take Take a long time. Significant, yeah, significant amount of time. Yeah. And Okay, so that's like[00:21:48] swyx: mini MMLU. Yeah. Like,[00:21:49] Wing Lian: mini BigBench or whatever. Yep, exactly.[00:21:51] Alex Volkov: It's really cool. We, when I joined Web2Masters just recently, and one of the things that I try to do is hey I'm not, I'm a software engineer by trade, I don't have an MLE background, But I joined a company that does primarily MLE, and I wanted to learn from the community, Because a lot of the open source community, they use weights and biases, And the benchmark that you said that Pharrell did, remind me of the name, sorry.[00:22:13] Dharma? Dharma, yeah, yeah. So Luigi showed me how Dharma shows inside the dashboard. In Wi and Biases dashboard and so you can actually kinda see the trending run and then you can see per each kind of iteration or, or epoch or you can see the model improving trending so you can on top of everything else.[00:22:29] The wi and biases gives like hyper parameter tracking, which like you, you started with common line and that's really hard to like remember. Also the Dharma data set, like the quick, the mini orca mini, you mini many different things. It's pretty cool to like visualize them as well. And I, I heard that he's working on a new version of, of Dharma, so Dharma 2, et cetera.[00:22:47] So hopefully, hopefully we'll see that soon, but definitely it's hard, right? You start this training around, it said like 40, 50 hours. Sometimes, sometimes it's like your SSHing into this machine. You, you start a process, you send it with God and you just go about your day, collecting data sets, and then you have to return.[00:23:04] And the whole process of instrumentation of this is still a little bit like squeaky but definitely. Tuning performance, or like grabbing performance in the middle of this, like with Dharma and some other tools, is very helpful to know that you're not wasting precious resources going somewhere you shouldn't go.[00:23:21] Yeah.[00:23:22] swyx: Yeah. Very cool. Maybe I'll, I'll, before we go into like sort of more details on fine tuning stuff, I just wanted to round out the rest of the Excel autoverse. There's, there's still Eric Hartford stuff. I don't know if you want to talk about Pygmalion, Disco, anything that you know about[00:23:35] Wing Lian: those, those things.[00:23:36] DiscoLM and Model Stacking[00:23:36] Wing Lian: Yeah, I think like one of the, definitely one of the more interesting ones was like the Disco 120b, right? Yeah, I know nothing about it. Yeah. So, so. Alpen from Pygmalion AI, right, so they, so Pygmalion is a sort of a, it's, it's, they have their own community, a lot of it is based around, roleplay models, those sorts of things, and Alpen, like, put together, merged together Llama270B, so, and Alpen, like, put together, merged together Llama270B, so, I don't remember how he stacked them together, whether he merged the layers in between. There's a whole, there's a whole toolkit for that by Charles Goddard, where you can like take a single model and like stack them together or multiple models merge.[00:24:18] That's like a whole other talk and a whole other tool set, but was able to create this 120. Billion parameter model out of a LAMA two 70 B. And then I believe the, yeah, disco is a fine tune of, of the, the, the sort of the base one 20 B is, I believe Goliath one 20 B. So, and, and what are the[00:24:37] swyx: headline results that people should know about[00:24:39] Wing Lian: disco?[00:24:39] I think for the headline results, I, I've, I haven't played with it personally because it's. It's a very large model and there's a lot of GPU, right? But, like, from what I've heard anecdotally, it performs really well. The responses are very good. Even with, like, just, even the base model is a lot better than, Llama70b.[00:24:57] So, and we, I think generally everybody's like, we would all love to fine tune Llama70b, but it's just, it's so much, it's so much memory, so much compute, right?[00:25:07] Datasets and Evals over Models[00:25:07] Wing Lian: I[00:25:07] Alex Volkov: want to touch on this point because the interesting thing That comes up out of being in this ecosphere and being friends with open source folks, tracking week to week state of the art performance on different models.[00:25:19] First of all, a lot of the stuff that the folks do a couple of weeks ago, and then something like Mistral comes out, and a lot of the stuff back then, Doesn't technically make sense anymore. Like the artifacts of that work, the actual artifacts, they don't no longer make sense. They're like lower on the on, on the hug and face leaderboard or lower on LM CS leaderboard.[00:25:36] But some of the techniques that people use, definitely the datasets. The datasets keep traveling, right? So open airmen, for example, is the dataset. The tum cleaned up for only. Open sourceable data that previously was just Hermes. And that, it was previously used to train Lama. And then once Mistral came out, it was used to train Mistral.[00:25:54] And then it became significantly better on the 7b base Mistral. So the data sets keep traveling, keep getting better a little bit here and there. And so the techniques improve as well. It looks like both things are simultaneously true. The artifacts of a month and a half ago. The, the actual models themselves, it's great the hug and face has them, because not every company can keep up with the next weeks', oh, I, I'll install this model instead, sell this model instead.[00:26:19] But the, the techniques and the, the dataset keep improving as we go further, and I think that's really cool. However, the outcome of this is that for a long time. For many, many people, including us, that we do this every week. We literally talk with people who release these models every week. It's really hard to know.[00:26:36] So, there's a few aspects of this. One, I think, like you said, the bigger model, the 70B models, you actually have to have somebody like Perplexity, for example, giving you access to the 70B really fast. Or you have to, like, Actually, find some compute, and it's expensive, especially for the bigger models. For example Falcon 180B came out, like the hugest open source model.[00:26:56] How do you evaluate this if you can't run it? Nobody liked it. It's really, so first of all, nobody liked it, but secondly, only the people who were able to find compute enough to run inference on this, they only had like, I can't run this on my laptop, and so that's why it's much easier, something like OpenRMS 7 to be, 7B, it's much easier, because you can run this on your MacBook.[00:27:14] It's much easier to evaluate. It's much easier to figure out the vibes, right? Everybody talks about the vibes as an evaluation check. If you're plugged in enough, if you follow the right people, if they say pretty much the same things all independently, then you run into a problem of whether they're repeating, and their stochastic parents are repeating the same thing, or they actually evaluated themselves.[00:27:31] Yeah, you never know. But, you never know, but like, I think on a large enough scale on Twitter, you start getting the feel. And we all know that like, OpenRMS is one of the top performing models, benchmarks, but also vibes. And I just wanted to highlight this vibes checks thing because you can have the benchmarks, you can have the evaluations, they potentially have contamination in them, potentially they not necessarily tell you the whole story because some models are good on benchmarks, but then you talk to them, they're not super helpful.[00:28:00] And I think it's a combination of the benchmarks, the leaderboards, the chatbot, because LMSys, remember, their ranking is not only based on benchmarks, it's also people playing with their arena stuff. People actually like humans, like, get two answers. I think they completely ignore benchmarks. Yeah, and then They only do ELO.[00:28:18] Oh, they do ELO completely, right? So that, for example, is just like people playing with both models and say, Hey, I prefer this one, I prefer that one. But also there's like some selection bias. The type of people who will go to LMCs to play with the models, they're a little bit specific in terms of like who they are.[00:28:33] It's very interesting. There's so many models. People are doing this in this way, that way. Some people are doing this for academic rigor only to test out new ideas. Some people are actually doing this like the Intel fine tunes of Mistral. Intel wanted to come out and show that their hardware approach is possible, Mistral, etc.[00:28:51] And it's really hard to know, like, what to pick, what to use. And especially on the bigger models, like you said, like the Llama 70B, the Falcon 180B. It's really because, like, who has the compute to validate those? So I would mention that, like, use with caution. Like, go and research and see if the biggest model that just released was actually worth the tokens and the money you spend on it.[00:29:12] To try and, if you're a business, to integrate it.[00:29:15] Distilling from GPT4[00:29:15] swyx: Since you said use of caution, I'll bring in one issue that has always been in the back of my mind whenever I look at the entire universe of open source AI models, which is that 95 percent of the data is derived from GPC 4, correct?[00:29:30] Which technically you can't use for commercial licenses,[00:29:34] Wing Lian: right?[00:29:35] swyx: What is the community's stance on this kind of stuff?[00:29:40] Wing Lian: I think from the community stance, like I feel like a lot of us are just experimenting, so for us, it's like, we're not going and building a product that we're trying to sell, right?[00:29:49] We're just building a product because we think it's interesting and we want to use it in our day to day lives, whether or not we try and integrate it. Personal use, yeah. Yeah, personal use, so like, as long as we're not selling it, yeah, it's fine. But[00:30:01] swyx: like, I as a company cannot just take OpenHermes and start serving[00:30:05] Alex Volkov: it and make money on it.[00:30:06] OpenHermes you can. Because the opening of OpenHermes, I think, is a clean up. That did after the regular Hermes, please folks, check your licenses before you listen to podcasts and say, Hey, I will tell you though, you could say the same thing about OpenAI. You could say the same thing kind of makes sense, where OpenAI or StabilityAI trains their diffusion model on a bunch of pictures on the internet, and then the court kind of doesn't strike down Sarah Silverman, I think, or somebody else, who came and said, hey, this has my work in it, because of the way how it processes, and the model eventually builds this knowledge into the model, and then it doesn't actually reproduce one to one what happened in the dataset.[00:30:45] You could claim the same thing for open source. Like, we're using And by we, I mean the, the open source community that I like happily report on uses GPT 4 to rank, for example, which is the better answer you, you, that's how you build one, one type of data set, right? Or DPO or something like this, you, you basically generate data set of like a question and four answers, for example, and then you go to GPT 4 and say, Hey, smartest model in the world right now, up to Gemini Ultra, that we should mention as well.[00:31:11] Which one of those choices is better? But the choices themselves are not necessarily written with GPT 4. Some of them may be, so there's like full syntactic datasets. But there's also, datasets are just ranked with GPT 4. But they're actually generated with a sillier model, or like the less important model.[00:31:25] The lines are very blurry as to what type of stuff is possible or not possible. And again, when you use this model that's up on Hug Face, the license says you can use this. OpenAI is not going to come after you, the user. If anything, OpenAI will try to say, hey, let's prevent this, this type of thing happening, and the brain, but I honestly don't think that they could know even, not that it makes it okay, it's just like, They also kind of do this with the Internet's archive, and also, I think that some of it is for use.[00:31:55] You use models to help you augment tasks, which is what GPT 4 lets you do.[00:32:00] swyx: Yeah, the worst thing that OpenAI can do is just kick you off OpenAI. That's because it's only enforced in the terms of service.[00:32:05] Alex Volkov: Sure, but just like to make sure, to clarify who they're going to kick out, they could kick out like News, for example, if news are abusing their service, a user of the open source, fully Apache 2 open source, for example, They won't get kicked out if they use both, just because they use both.[00:32:22] I don't believe so. I don't think OpenAI has a claim for that.[00:32:25] swyx: Well, we're not lawyers, but I just want to mention it for people to know it's an issue.[00:32:30] Wing Lian: And one of the things, like, I talked to someone recently, and I think that they also are like interested in it, but also to the point of like, right, if I use a model trained on data, using GPT for data, But I use that model to then regenerate new data.[00:32:46] Is that model, is that data okay? So like you start going down this whole rabbit hole. So yeah. All right.[00:32:53] swyx: Fantastic. Cool. Well, I think that's roughly highlights most of the open source universe. You also have your own models. Do you want to shout out any one of them? Yeah.[00:33:01] Wing Lian: I mean, I think like, I think Early on, Manicore got a lot of love.[00:33:04] I think it was mostly popular in, like, the roleplay communities. It was, it tended to be pretty truthful. It tended to be, like, have relatively good answers, depending on who you ask, right? But, I think for me, it was just, Releasing models was a way to try and, like, continue to build out the product, figure out what I needed to put into the product, how do I make it faster, and, if you've got to, like, go and debug your product, you may as well have it do something useful.[00:33:29] Awesome. So, yeah.[00:33:31] Finetuning - LoRA, QLoRA, ReLoRA, GPTQ[00:33:31] swyx: Okay, and then maybe we'll talk about just fine tuning techniques. So this is going to be a little bit more technical than just talking about model names and datasets. So we started off talking about LoRa, QLoRa. I just learned from your readme there's ReLoRa. Which I've never heard about.[00:33:45] Could you maybe talk about, like, just parameter efficient fine tuning that whole, that[00:33:50] Wing Lian: whole journey, like, what people should know. Yeah, so with parameter efficient fine tuning, I think the popular ones, again, being, let's, we'll start with lore, right? So, usually what you do is you freeze all the layers on your base, on the base model, and then you, at the same time, you sort of introduce additional Oh, this is tight.[00:34:08] No. You introduce, another set of layers over it, and then you train those, and it is done in a way that is mathematically possible, particularly with LORs that you can, then you, you, When you, when you train the model, you, you run your inputs through the base model, whose weights are frozen, but you, then you also run it through the additional weights, and then at the end you combine the weights, and then, and then, or you combine the weights to get your outputs, and then at the end, and when you're done training, you're left with this other set of weights, right, that are completely independent, and And then from that, what you can do is, some person smarter than I figured out, well, oh, they've done it in such a way that now I can merge these weights back into the original model without changing the architecture of the model, right?[00:35:03] So, so, that tends to be, like, the go to, and You're training much fewer parameters so that when you do that, yes, you still need to have all of the original weights, but you have a smaller gradient, you have a smaller optimizer state, and you're just training less weights, so you can tend to train those models on, like, much smaller GPUs.[00:35:27] swyx: Yeah. And it's roughly like, what I've seen, what I've seen out there is roughly like 1 percent the number of parameters that you're trading. Yeah, that sounds about right. Which is that much cheaper. So Axelotl supports full fine tune, LoRa, QLoRa,[00:35:40] Wing Lian: Q. Yes. So, so QLoRa is, is very similar to LoRa. The paper was, if I remember correctly, the paper was Rather, traditionally, most people who did Loras were, were, they were quant, they were putting the model weights in 8 bit, and then fine tune, parameter efficient fine tuning over the Lora weights, and then with QLora, they were quantizing all of those, they were then quantizing the weights down to 4 bit, right, and then I believe they were also training on all of the linear layers in the model.[00:36:15] And then with ReLore, that was an interesting paper, and then, I think, like, it got implemented. Some people in the community tried it, tried it out, and it showed that it didn't really have the impact that the paper indicated that it would. And from what I was told recently, that they re I guess they re released something for Relora, like, a few weeks ago, and that it's possibly better.[00:36:44] I personally haven't had the time. What was the[00:36:46] swyx: main difference,[00:36:47] Wing Lian: apart from quantization? I don't know. Okay. What was the main difference, sorry?[00:36:49] swyx: Apart from quantization, right? Like,[00:36:50] Wing Lian: Qlora's thing was, like, we'll just drop off some bits. With Relora, what they did was, you would go through, you would define some number of steps that you would train, like, your Lora with, or your Qlora.[00:37:01] Like, you could do Like, ReqLore, if you really wanted to, you would, you would train your LoRa for some number of steps, And then you would merge those weights into your base model, and then you would start over. So by starting, so, then by starting over, The optimizer has to find, like, sort of, re optimize again, and find what's the best direction to move in, and then do it all again, and then merge it in, do it all again, and theoretically, according to the paper, doing ReLore, you can do parameter efficient fine tuning, but still have sort of, like, the performance gains of doing a full fine tuning, so.[00:37:38] swyx: Yeah, and[00:37:39] Wing Lian: GPTQ? And GPTQ, so it's, I think with GPTQ, it's very similar to, more similar to QLore, where you're, it's mostly a quantization of the weights down to like 4 bit, where GPTQ is a very, is a specific methodology or implementation of quantization, so. Got it.[00:37:57] Alex Volkov: Wang, for, for folks who use Axolotl, your users, some people who maybe, Want to try it out?[00:38:03] And do they need to know the differences? Do they need to know the implementation details of QLora versus ReLora? Or is it okay for them to just know that Axolotl is the place that already integrated them? And if that's true, if that's all they need to know, how do they choose which method to use? Yeah,[00:38:22] Wing Lian: so I think like, I think most people aren't going to be using ReLora.[00:38:25] I think most people are going to be using either Lora or QLora. And I think they should have it. They should have an understanding of why they might want to use one over the other. Most people will say that with Qlora, the quality of the final model is not quite as good as like if you were to do a LoRa or a full fine tune, right?[00:38:44] Just because, you've quantized these down, so your accuracy is probably a little off, and so that by the time you've done the Qlora, you're not moving the weights how you would on a full fine tune with the full parameter weights.[00:38:56] Interesting.[00:38:57] swyx: Okay, cool. For people who are more interested, obviously, read the papers. I just wanted to give people, like, a high level overview of what these things are. And you've done people a service by making it easy for people to try it out. I'm going to, I'm going to also ask a question which I know to be wrong, but I'm curious because I get asked this all the time.[00:39:15] What is the difference between all these kinds of fine tunes[00:39:17] Wing Lian: and RLHF? Okay, between all of these sorts of fine tunes and RLHF. So all of these sorts of fine tunes are based, are, ideally, this, they are taking knowledge that the base model already knows about, and presenting it in a way to the model that you're having the model answer like, Use what it already knows to sort of answer in a particular way, whether it's, you're extracting general knowledge, a particular task, right?[00:39:44] Instruct, tune, chat, those sorts of things. And then generally with RLHF, so what is, let's go back, what is it? Reinforcement Learning with Human Feedback. So if we start with the human feedback part, What you're doing is you generally have, you have like a given prompt and then you, maybe you have one, maybe you have two, I think, like if you look at with Starling, you have like up to what, seven different, seven different possible responses, and you're sort of ranking those responses on, on some sort of metric, right, whether the metric is how much I, I might like that answer versus or I think with like starling is like how how how helpful was the answer how accurate was the answer how toxic was the answer those sorts of things on some sort of scale right and then using that to go back and like sort of Take a model and nudge it in the direction of giving that feedback, to be able to answer questions based on those preferences.[00:40:42] swyx: Yeah, so you can apply, and is it commutative? Can you apply fine tuning after and onto an RLHF model? Or should the RLHF apply, come in afterwards,[00:40:54] Wing Lian: after the fine tune? Um, I, yeah, I don't know that there's There's been enough research for one way or another, like, I don't know.[00:41:02] That's a question that's been asked on Discord. Yeah, like, I definitely would say I don't know the answer. Go and try it and report back to me and let me know so I can answer for the next guy.[00:41:10] swyx: It's shocking how much is still unknown about all these things. Well, I mean, that's what research is for, right?[00:41:16] Wing Lian: So actually I, I think I saw on the top of a leaderboard, it was a, it was a mytral base model, and they didn't actually fine tune it. They, or they, they just did RLH, they did like an RLHF fine tune on it using like, I don't, I don't recall which dataset, but it was like, and it benchmarked really well.[00:41:37] But yeah, you'd have to go and look at it. But, so it is interesting, like going back to that, it's like. Traditionally, most people will fine tune the model and then do like a DPO, PPO, some sort of reinforcement learning over that, but that particular model was, it seemed like they skipped like the supervised fine tuning or Scott.[00:41:55] Axolotl vs HF Transformers[00:41:55] swyx: Cool. One thing I did also want to comment about is the overall, like, landscape, competitive landscape, I don't know. Hugging Face Transformers, I think, has a PFT module.[00:42:05] Wing Lian: Yeah, yeah, the PEFT, the Parameter Efficient Fine Tuning, yep. Is that a competitor to you? No, no, so we actually use it. We're just a wrapper over sort of, sort of the HuggingFace stuff.[00:42:15] So, so that is their own sort of module where They have, taken the responsibility or yeah, the responsibility of like where you're doing these parameter efficient fine tuning methods and just sort of like, it is in that particular package where transformers is mostly responsible for sort of like the modeling code and, and the trainer, right.[00:42:35] And then sort of, there's an integration between the two and, there's like a variety of other fine tuning packages, I think like TRL, TRLX, that's the stability AI one. Yeah, I think TRL likes the stability, yeah, Carper, and TRL is a hugging face trainer. Even that one's just another wrapper over, over the transformers library and the path library, right?[00:43:00] But what we do is we have taken sort of those, yes, we've We also use that, but we also have more validation, right? So, there are some of us who have done enough fine tunes where like, Oh, this and this just don't go together, right? But most people don't know that, so like Example?[00:43:19] Like, people want to One and one doesn't go together. I don't have an example offhand, but if you turn this knob and this knob, right? You would think, all right, maybe this will work, but you don't know until you try. And then by the time you find out it doesn't work, it's like maybe five minutes later, it's failed.[00:43:34] It's failed in the middle of training or it's failed during the evaluation step. And you're like, ah, so we've, we've added a lot of, we've added a lot more validation in it. So that like, when you've, you've created your configuration, you run it through and now you say. The validation code says this is probably not right or probably not what you don't, not what you want.[00:43:52] So are you like a, you[00:43:53] swyx: do some linting of your YAML file?[00:43:56] Wing Lian: There, I guess you could call it linting, it's sort of like Is there a set of rules out[00:44:00] swyx: there somewhere? Yeah, there's a set of rules in there. That's amazing, you should write documentation like This rule is because, this user at this time, like, ran into this bug and that's what we invested in.[00:44:10] It's like a good collection[00:44:11] Wing Lian: of knowledge. Yeah, it is, and I guess like, if you really wanted to, like, figure it out, I guess you could, like, git blame everything, and But, yeah, it's, so, I think that's always a useful thing, it's like Because people want to experiment but they don't, people will get frustrated when you've experiment, you're experimenting and it breaks and you don't know why or you know why and you've just gone down the rabbit hole, right?[00:44:37] So, so I think that's one of the big features that's, that I think I find important because it's It prevents you from doing things you probably shouldn't have, and it, and sometimes we will let you do those things, but we'll try and warn, warn you that you've done that.[00:44:50] I[00:44:51] Alex Volkov: have a follow up question on this, actually, because yesterday we hung out to this open source event, and I spent time by you a couple times, like when people told you, oh, XLR, I use XLR, it's super cool, and then the first thing you asked is, like, immediately, like, what can we improve?[00:45:04] And yes, from multiple folks, and I think we talked about this a little bit, where there's It's a developer tool. It's like a machine learning slash developer tool. Your purpose in this is to help and keep people, as much as possible, like, Hey, here's the best set of things that you can use right now. The bear libraries are, or the bear trainer, for example, is a bear trainer.[00:45:28] And also, maybe we should talk about how fast you're implementing these things. So you mentioned the first implementation took a week or so. Now there's a core maintainer group, right? There's like, features are landing, like Qlora, for example. Neftune, I don't know if that's one example of something that people potentially said that it's going to be cool, and then eventually, like, one of those things that didn't really shake out, like, people quickly tested this out.[00:45:48] So, there's a ton of Wait, Neftune is cancelled? I don't know if it's fully canceled, but based on vibes, I heard that it's not that great. So like, but the whole point that I'm trying to make with Neftune as well is that being existing in the community of like XLR or like, I don't know, even following the, the GitHub options or following the Discord, it's a fairly good way to like, learn these, Kind of gut feelings that you just, you just said, right?[00:46:14] Like where this, maybe this knob, that knob doesn't work. Some of these are not written down. Some of these are like tribal knowledge that passes from place to place. Axel is like a great collection of many of them. And so, do you get That back also from community of folks who just use, like, how do you know who uses this?[00:46:30] I think that's still an issue, like, knowing if they trained with XLR or should they add this to things? Talk about, how do you get feedback and how else you should get feedback?[00:46:38] Wing Lian: Yeah, I mean, most of the feedback comes from the Discord, so people come in and , they don't get a training running, they run into, like, obscure errors or, errors that That's a lot of things that maybe, maybe as a product we could catch, but like, there's a lot of things that at some point we need to go and do and it's just on the list somewhere.[00:46:58] Right that's why when people come up, I'm like, what, what were your pain points? Because like, as a developer tool, if you're not happy with it, or you come in and in the first, Takes you 30 minutes and you're still not happy. You leave the tool and you may, you might move on maybe to a better tool, maybe to, one with less frustration, but it may not be as good, right?[00:47:17] So I'm trying to like, figure out, all right, how can I reduce all this frustration? Because like for me, I use it every day for the most part, right? And so I am blind to that, right? Mm-Hmm. . Mm-Hmm. . I just know, I, I go do this, this, and this. It pretty much mostly works, right? But, so I don't have sort of that, alright, that learning curve that other people are seeing and don't understand their pain points.[00:47:40] Yeah,[00:47:40] Alex Volkov: you don't have the The ability to onboard yourself as a new user completely new to the whole paradigm to like get into the doors of like, Oh, no, I don't even know how to like ask about this problem or error.[00:47:53] swyx: Cool. The last few things I wanted to cover was also just the more advanced stuff that you covered yesterday.[00:48:00] 20x efficiency with StackLlama and Multipack[00:48:00] swyx: So I'll just, caution this as like, yeah, this is more advanced. But you mentioned Stackllama and Multipack. What are they[00:48:06] Wing Lian: and what should people know? Yeah, so, so, Stack Llama was, that paper came out, so Stack Llama I think was like, two, two, two separate, two separate concepts that they announced, so the first one was They being hugging face.[00:48:20] Yeah, sorry, yes, they being hugging face, so the first one being sort of like, this idea of packing, like some packing sequences together, so like, if we think about training data, right, your training data is, let's say, to keep the math easy, let's say your training data is 500, We, we, we, we will use the terminology words.[00:48:39] Let's say your training data is 500 words long, and let's say your, your context length, you know how much data your, that your model can accept is like, or that you want feed into your model. It's, let's say, we won't use tokens again, we'll we'll use it is it's 4,000 tokens, right? So if you're training at 4K Con or four 4,000 4K contacts and you're only using 500 of it, you're sitting like with the other 1500.[00:49:05] 3, 500 words that you're not using, right? And typically that's either filled with these PAD tokens, so I think I made the analogy last night that it's like having sort of like a glass here you fill it up with a shot of liquor and then you're and that's your training data and then you just fill it up with more water and those are your PAD tokens and it's just, it doesn't do much, right?[00:49:27] It's still the same thing, but you still have to go through all of that to go through all your training data. And then, so what Stack Llama showed was you could just sort of take your training data, append the next row of training data until you filled that entire 4k context, so in this example, right, with 500 words to 4k, that's 8 rows of training data.[00:49:48] But, the problem with that is, is that with a lot of these transformer models, they're very much relying on attention, right? So, like, if you now have this sequence of words that now, in order for the, the model has seen all of these other words before, right? And then it sees another set of words, another set of words, but it's learning everything in context of all the words that it's seen before.[00:50:13] We haven't corrected the attention for that. And just real quickly, since I said that that paper was two concepts, the other one was, I believe it was like a reinforcement learning, but outside the scope of this. So going from that, I implemented that early on because I was like, Oh, wow, this is really great.[00:50:29] And. Yes, because it saves you a bunch of time, but the trade off is a little bit of accuracy, ultimately, but it still did pretty well. I think when I did Manicore, I think it used sort of that concept from Stack Llama of just sort of appending these sequences together, right? And then sort of the next evolution of that is Multipack, right?[00:50:51] So, there was a separate paper on that, it was, I believe it was referenced, it got referenced in the Orca paper, where you could, you could properly mask those out using like a, I think it was like a lower block triangular attention mask, and then sort of, so, So, there's that. I did try implementing that, manually recreating that mask, but then one from the OpenChat, so he was helping with OpenOrca as well, and he had done an implementation of Multipack, and where he used FlashAttention, so FlashAttention So that was released by TreeDAO, and it was this huge performance gain.[00:51:35] Everybody uses it now, even the Transformers library now, they've taken all of these, like, people are taking all of these models and sort of like, making it compatible with FlashAttention. But in Flash Tension, there is one particular implementation that lets you say, Well, I'm sending you all of these sequences like you would in Stack Llama, But let me send you another, another, Set of information about, this is where this set of sequences is, this is where the second set of sequences is.[00:52:06] So like, if it was like, 500 words long, and you stacked them all together, you would just send it a row of information that was like, 0, 500, 1000, 1500, etc, etc, out to 4000. And it would know, alright, I need to break this up, and then run the forward pass with it. And then it would be able to, and it was much more, much more performant.[00:52:29] And I think you end up seeing like 10x, 20x improvements over sort of, I mean, I think FlashAttention was like a 2x improvement, and then adding that with the Multipack, you start to see like, depending on, how much data you have, up to like a 20x improvement sometimes. 20x. 20x. Wow. Yeah.[00:52:48] And I only know the 20x because I, like, before last night, I was like, I re ran the alpaca, I looked up the alpaca paper because it was like, I just need a frame of reference where somebody did it, and I think they used eight A100s for three hours, and they said it cost them 100. I don't, I don't think eight A100s cost, I don't know how much it costs right now.[00:53:14] But I ended up rerunning it. Usually a dollar an hour, right? Yeah, so eight. The cheapest is like a[00:53:18] Alex Volkov: dollar, a dollar an hour for one.[00:53:20] Wing Lian: Yeah, so that's still like 24, 25. But maybe if you're going on Azure, maybe it's like, maybe it's 100 on Azure. I mean, it used to be more expensive, like, a year ago.[00:53:31] Yeah, and then, so I re ran it with sort of like, I turned on all of the optimizations just to see what it would be. And like, and usually Multipack is the biggest optimization, so Multipack with Flash Detention. And it, I think I spun it up on 8 L40s, and it ran, and I didn't let it run all the way through, I just grabbed the time, the estimated completion time, and it was like 30 minutes, so it would have cost like 4 or 5 to run the entire, like, reproduce the alpaca paper, right?[00:54:00] Which is crazy. It's crazy. 20x,[00:54:02] Alex Volkov: yeah. I want to ask about, like, you said you turned on all the optimization. Is that the yaml file with xlodl, you just go and like check off, like, I want this, I want that? Yeah, yeah,[00:54:10] Wing Lian: so there's like one particular yaml file in there, That, there's one particular YAML file in there that's like, it's under examples, llama2, fft, optimize.[00:54:20] So, I think someone had created one where they just turned, they put in all of the optimizations and turned them on. I mean, it actually, it does run, which is like, sort of surprising sometimes, because sometimes, you optimize this, optimize this, and sometimes they just don't work together, but, yeah.[00:54:36] Just turn the knobs on, and like, fine tuning should really just be that easy, right? I just want to flip the knob and move on with my life and not figure out how to implement it.[00:54:47] Tri Dao and Mamba[00:54:47] Alex Volkov: Specifically, the guy behind FlashAttention came up with something new. You want to talk about this a little bit? You want to briefly cover Mamba?[00:54:53] Yeah, let's talk about Mamba. Let's talk about Mamba. So, what is Mamba?[00:54:57] Wing Lian: Oh, gosh. I

Short Time Wrestling Podcast
Short Time Shots - December 1, 2023

Short Time Wrestling Podcast

Play Episode Listen Later Dec 1, 2023 4:01


Las Vegas is on everyone's mind who follows the world of Division I wrestling and that also means me. There's a bunch of Vegas-related content in the newsletter today, so again, you can find out how to get that daily at the end of the show. I was on the mic Thursday here in the Twin Cities - more on that in a bit, this is Short Time Shots, a look at the day's results in and around college wrestling. Right now, if you hear any banging around, it's because I'm getting a new furnace installed this morning. Yup. Thankfully before it his absolute zero here this winter. Our installer used to wrestle, too. Links: Did you know you can find all the D1 dual meet scores and links to their results at collegewrestlingscoreboard.com AND I've added another redirect to help you follow all the tournaments from every division at collegewrestlingtournaments.com. As to the aforementioned Minnesota match, Division III's top-ranked team, Augsburg, blanked No. 25 Loras 46-0 at Si Melby Hall. Augsburg won four matchups between ranked wrestlers, including returning national champion Sam Stuhl earning an 11-2 major decision over No. 13 Jalen Schropp at 141 pounds. Top-ranked heavyweight Tyler Kim and 184-pounder Bentley Schwanebeck-Ostermann picked big bonus victories - Kim with the fall and BSO (his name is too long to keep repeating) with the tech. Other scores around Division III saw Carthage beat Elmhurst 29-15, and Central, that's it, just Central, pushed past Augustana (Ill.) 36-9. UW-Stevens Point trounced UW-Oshkoh (B'Gosh) 40-18. Augustana did beat Simpson (Iowa) The only Division I match of the night featured a pair of transitional programs as Bellarmine heads out to Missouri and beat Lindenwood 35-7. There were some exciting duals in Division II, including Kutztown winning the last five bouts to pull away past Seton Hill 34-15. Both of Kutztown's ranked wrestlers - No. 11 Dalton Gimbor at 165 and third-ranked Matthew Weinberg at 184 - won. Gimbor beat Kane Kettering while Weinberg picked up a forfeit. I hate forfeit. Millersville beat East Stroudsburg 23-19. Colorado School of Mines won the last six weights to top New Mexico Highlands 31-13. Colorado Mesa beat rival Western Colorado 24-17. Davenport beat NAIA Cleary 37-12. Down south, Coker blanked Bluefield State (that's in West Virginia), while Belmont Abbey beat Emory & Henry 48-3. Emory & Henry is transitioning from Division III to Division II. In the NAIA, Ottawa beat Avila 35-13. On the women's side, Grand View beat Central Methodist 27-18, Cedar Crest beat Alvernia 32-16 in the school's first ever home women's wrestling dual. Sioux Falls new women's program improved to 2-0 with a win over Minot State 39-13 and Lock Haven beat Elmira 35-12. Subscribe, for free, to my Daily Wrestling News. Get it for free at mattalkonline.com/news, sign up free today. That's as always, delivered by our longtime friends at Resilite.

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)
Chapter Seventy-Two - Jaime 9 - A Storm of Swords | A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Nov 21, 2023 71:23


Jaime has a curious goodbye with Arya Stark. He refuses to leave the Kingsguard, despite a strong push from his family. He resists an advance from his sister, which results in a heated argument between the two. Finally, he gives Brienne a jaw-dropping gift and then tasks her with a surprising mission in return. Simon and Mackelly squint hard at “Arya Stark.”Chapter Review:Jaime Lannister grows bored of watching the king sign documents that offer rewards or punishment to houses, for their roles in the war. He returns to the White Sword Tower. On his way, he runs into a girl claiming to be Arya Stark. Jaime is skeptical.Cersei waits for him in his chambers. She begs him to talk to their father about Tywin's plan to send her back to Casterly Rock, effectively separating her from her son Tommen. Jaime doesn't think it'll do any good, but Cersei believes it will, if he agrees to leave the Kingsguard. Jaime refuses. Cersei makes a sexual advance that Jaime rebuffs. She throws insults as she storms out.Jaime calls for Ser Loras Tyrell and Brienne of Tarth. Loras admits Brienne might be telling the truth about Renly Baratheon's death. After dismissing Loras, Jaime gives Brienne the Valyrian steel sword his father gave him. In return, he asks she use it to find Sansa Stark and keep her safe.Characters/Places/Names/Events:Jaime Lannister - Member of the royal Kingsguard. Twin brother to queen Cersei and older brother to Tyrion. Nicknamed the "Kingslayer" for killing King Aerys Targaryen.Ser Loras Tyrell - Member of the Kingsguard. Young, dashing, talented knight from the Reach. The third son of Lord Mace Tyrell of Highgarden.Brienne of Tarth - Warrior woman. Once "Brienne the Blue" in King Renly's Rainbow Guard. On a mission to rescue the Stark girls for Catelyn Stark.Cersei Lannister - twin to Jaime, mother to Joffrey, Tommen, and Myrcella. Tywin Lannister - father of Jaime, Cersei, Tyrion. Lord of Casterly Rock. Hand of the King.Tyrion Lannister - youngest sibling of Jaime and Cersei. Imprisoned awaiting trial for murdering his nephew King Jofrrey.Tommen Lannister - King of the Andals, Roynar, and the First men. Lord of the Seven Kingdoms. Protector of the Realm.We've got four new sustainer tiers on our Buy Me a Coffee site. Something for every budget and level of interest. Check it out! Support the showSupport us: Buy from our store Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/wa...

Short Time Wrestling Podcast
Short Time Shots: November 17, 2023

Short Time Wrestling Podcast

Play Episode Listen Later Nov 17, 2023 3:50


It's Friday, and I'll be heading back to my native lands this weekend after the Husky Hockey squad finishes up with Duluth. This is NOT a hockey podcast, it's Short Time Shots, a look back at the day's scores around college wrestling. Here's what went down on Thursday. Second-ranked NC State dominated Army West Point 34-6. Jakob Camacho had a takedown and nearfall points late to break open a close bout to earn a 9-1 major decision over Army's Ethan Berginc at 125 pounds. Ed Scott beat Nathan Lukez 14-8 at 157 and Dylan Fishback majored Ben Pasiuk 15-7 at 184 pounds. Minnesota rolled Bucknell in Lewisburg 37-3. Minnesota true freshman Tyler Wells continued to be the second most famous wrestler from Princeton, Minnesota behind Bob Backlund, but his college legend begins as he topped Kurt Phipps at 133 pounds. The Gophers picked up five bonus victories including falls by 125-pound All-American Patrick McKee and heavyweight Bennett Tabor. Tabor pinned Bucknell's Dorian Crosby in sudden victory. Former EWL rivals met up in Pennsylvania as West Virginia bested Clarion 32-7. Bonus victories were registered by Jordan Titus at 141 pounds, Ty Watters at 149, Peyton Hall at 165, Dennis Robin at 184 and Michael Wolfgram at heavyweight. The Mountaineers came in ranked 24th this week. They improved to 4-0. In Division II, top-ranked Lander won eight of 10 to defeat UNC Pembroke 31-9. Lander opened up with second-ranked James Joplin edging 11th-ranked Logan Seliga 6-4 at 125 pounds. Lander's Kaden Kuenzi picked off No. 11 Logan Robinson 8-7 at 133 pounds. The nationally ranked trio of David Hunsberger, Anthony Yacovetti and Juan Edmond-Holmes all picked up bonus victories on the night. Down in Missouri, Third-ranked St. Cloud State picked up a pair of wins, beating Central Missouri 37-11 and host Drury 46-5. The Mules did pick up a ranked win at 133 pounds where eighth-ranked Tommy Frezza majored ninth-ranked Caleb Meekins 11-1. St. Cloud State got one back at 165 pounds where fourth-ranked Anthony Herrera pinned eight-ranked John Ridle in the first period. Central Missouri beat Drury 47-4. Minnesota State Moorhead beat NAIA Jamestown 38-10. In Division III, 15th-ranked Coe beat Central 32-13 with All-American Will Esmoil and L.J. Richardson picking up bonus victories for the Kohawks, yes, with a K. Loras pounded the Beavers of Buena Vista 43-3. Augustana (Ill.) singed the Firebirds of Carthage 22-21 in the CCIW, while Augustana's women beat Carthage 29-20. Pacific beat Linfield 34-24 on the men's side, while Linfield's women prevailed 20-18. In the NAIA, Dakota Wesleyan beat Northwestern (Iowa) 26-21, Ottawa (Kan.) beat York (Neb.) 43-7. The Ottawa women also beat York 40-10. Cornerstone topped NJCAA Henry Ford, the college, not the man, 48-3. Providence (Mont.) beat North Idaho of the NJCAA 39-3. Northeastern Oklahoma A&M beat Carl Albert State 33-8 in junior college action. Subscribe, for free, to my Daily Wrestling News. Get it for free at mattalkonline.com/news, sign up free today. That's as always, delivered by our longtime friends at Resilite. Go Huskies. WOOOOO!

WATCH DEM THRONES by Black With No Chaser
"SONS OF THE HARPY" Game of Thrones Season 5 EP4 Recap

WATCH DEM THRONES by Black With No Chaser

Play Episode Listen Later Nov 7, 2023 97:02


We pick up where we left off with Jorah Mormont stealing a boat to navigate with his captive, Tyrion. Aboard a merchant ship, Jaime and Bronn sail to Dorne to rescue Myrcella and getting betrayed and caught up while doing so. Cersei has the Iron Bank on her ass and they want 10% of what's owed to them...and she ain't got it. She sends Mace Tyrell to go talk to them as well as getting him out of the way.  Cersei also plots with the High Sparrow to try and manipulate the faith to do her dirty work. The faith attacks the bars and brothels and takes Loras as captive for being gay. We also see Ellaria and the sandsnakes decide to do what Oberyn's brother won't do...avenge his death against the Lannisters. The Red Priestess tried to throw that hotbox on Jon and he refuses. And finally, we see the Sons mount an attack on Dany and her people with Ser Barriston paying the price for it....If you haven't already, please to subscribe to the audio version of the podcast on all your favorite podcast platforms...If you want to keep the fun going with us throughout the week, come join our Facebook group.JOIN THE WATCH DEM THRONES FACEBOOK GROUPhttps://www.facebook.com/groups/126567443834910/?ref=share&mibextid=NSMWBTTO WATCH AND SUBSCRIBE:Facebook:https://www.facebook.com/blackwithnochaser/Twitter:https://twitter.com/BeBlackNoChaser?t=pVFV06lBFdZRu72ot4uCjA&s=09Twitter:https://twitter.com/WatchDemThrones?t=q0ngrYPlugf0ttzM2jo39A&s=09Apple Music: https://podcasts.apple.com/us/podcast/watch-dem-thrones-by-black-with-no-chaser/id1641754247Spotify:https://open.spotify.com/show/1qI1bJ1vIlobu502w6zrtN?si=mtsa3gZYRZW_3FmlCrv7UgBWNC RADIO: https://apps.apple.com/us/app/bwnc-radio/id6443800363Amazon Musichttps://music.amazon.com/podcasts/45279c3a-c09f-47d1-a3a3-88e6e2507230/watch-dem-thrones-by-black-with-no-chaserIHeartRadiohttps://www.iheart.com/podcast/269-watch-dem-thrones-by-black-101286659/?cmp=android_share&sc=android_social_share&pr=false#gameofthrones #demdragons #blackwithnochaser #houseofthedragonhbo #dragonseeds #theblacks #thegreens #houseofthedragon #youtube #targaryens #podcast #podsincolor #applemusic #spotifymusic #podsincolor #starks #lannisters #Velaryon

Podcast Plenae
Alexandra Loras em "Uma jornada de autoconhecimento constante"

Podcast Plenae

Play Episode Listen Later Oct 22, 2023 20:44


Você consegue olhar com sensibilidade para a sua própria trajetória? Alexandra Loras representa o pilar Propósito nesta temporada e fala sobre assuntos complexos e delicados, como racismo e feminismo, de forma sensível e clara.

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)
Chapter Sixty-Seven - Jamie 8 - A Storm of Swords | A Song of Ice and Fire (ASOIAF)

The Ghosts of Harrenhal: A Song of Ice and Fire Podcast (ASOIAF)

Play Episode Listen Later Oct 17, 2023 74:57


Jaime reminisces about the bygone glory days of the Kingsguard. He turns his attention to righting the wrongs of the current White Swords. He goes man-by-man through his sworn brothers, calling them out on the issues he has with them. Lastly, he has a heart-to-heart with Loras about his claim that Brienne killed Renly. Jenny and Mackelly applaud Jaime's closing arguments.Chapter Review:Jaime Lannister reads through the White Book, the written history of the Kingsguard, while waiting for his sworn brothers to join. When his brothers arrive, Jaime admonishes them for allowing the king to die under their watch. He then gauges their opinions on Tyrion's involvement in the death.He goes through each man and calls them out on issues. Ser Boros Blount will now be King Tommen's taste tester, as he's unfit to protect the boy he gave up so easily. He then questions why he's never heard of Ser Osmund Kettleblack before. After that, he asks Ser Meryn Trant to show him where in their vows it says to beat women and children, like he did to Sansa Stark. He praises Balon Swann's character but questions how he'd handle his family challenging Tommen's rule.Finally, he turns his attention to Ser Loras Tyrell,  who reminds Jaime of himself at that age. Jaime makes a case for Brienne's possible innocence in the murder of Renly Baratheon. His argument is enough to seed doubt in Loras' mind. Jaime tasks Loras with talking with Brienne and deciding her innocence or guilt. Whatever Loras determines is the case, Jaime will abide by it.Characters/Places/Names/Events:Jaime Lannister - Member of the Kingsguard. Twin brother to queen Cersei and older brother to Tyrion.Ser Boros Blount - Member of the Kingsguard. Gave up Tommen to Tyrion's men without a fight.Ser Meryn Trant - Member of the KIngsguard. Beat Sansa Stark on the command of King Joffrey.Ser Osmund Kettleblack - Member of the Kingsguard. Former sellsword, with a mysterious past.Ser Balon Swann - Member of the Kingsguard. Seemingly honorable and valiant.Ser Loras Tyrell - Member of the Kingsguard. Young, dashing, talented knight from the Reach. The third son of Lord Mace Tyrell of Highgarden.Brienne of Tarth - Warrior woman. Once "Brienne the Blue" in King Renly's Rainbow Guard. On a mission to rescue the Stark girls for Catelyn Stark. Support the showSupport us: Buy from our store Buy us a Cup of Arbor Gold, or become a sustainer and receive cool perks Donate to our cause Use our exclusive URL for a free 30-day trial of Audible Buy or gift Marriott Bonvoy points through our affiliate link Rate and review us at Apple Podcasts, Spotify, podchaser.com, and elsewhere.Find us on social media: Discord Twitter @GhostsHarrenhal Facebook Instagram YouTube All Music credits to Ross Bugden:INSTAGRAM! : https://instagram.com/rossbugden/ (rossbugden) TWITTER! : https://twitter.com/RossBugden (@rossbugden) YOUTUBE! : https://www.youtube.com/wa...

Dubuque Area Baseball Podcast
Weighin' in with Worm Interview Loras 2X AA Rick Healey

Dubuque Area Baseball Podcast

Play Episode Listen Later Oct 4, 2023 37:08


Loras Hall of Fame Athlete Rick Healey joins me this week before his induction into the Hall. We discuss the struggles and balances of being a young father while chasing dreams of All American Status; also his thoughts about going into the hall this weekend!

WATCH DEM THRONES by Black With No Chaser
"FIRST OF HIS NAME" Game of Thrones Season 4 EP5 Recap

WATCH DEM THRONES by Black With No Chaser

Play Episode Listen Later Aug 28, 2023 85:23


In the episode, Tommen Baratheon is crowned as the new king of the Seven Kingdoms. Over in Meereen Sansa learns of the Purple Wedding and Wish Daario brings her a fleet of ships. Next we find Lil Finger and Sansa arriving at the Eyrie and are welcomed by Lysa Arryn. At King's Landing, Cersei discusses wedding arrangements with Tywin. It's decided that Margaery will marry Tommen in a fortnight, and Cersei will marry Loras a fortnight later. Daenerys Targaryen decides to stay in Meereen; and Jon Snow leads an attack on the Night's Watch mutineers at Craster's Keep plus much much more....JOIN THE WATCH DEM THRONES FACEBOOK GROUPhttps://www.facebook.com/groups/126567443834910/?ref=share&mibextid=NSMWBTTO WATCH AND SUBSCRIBE:Facebook:https://www.facebook.com/blackwithnochaser/Twitter:https://twitter.com/BeBlackNoChaser?t=pVFV06lBFdZRu72ot4uCjA&s=09Twitter:https://twitter.com/WatchDemThrones?t=q0ngrYPlugf0ttzM2jo39A&s=09Apple Music: https://podcasts.apple.com/us/podcast/watch-dem-thrones-by-black-with-no-chaser/id1641754247Spotify:https://open.spotify.com/show/1qI1bJ1vIlobu502w6zrtN?si=mtsa3gZYRZW_3FmlCrv7UgBWNC RADIO: https://apps.apple.com/us/app/bwnc-radio/id6443800363Amazon Musichttps://music.amazon.com/podcasts/45279c3a-c09f-47d1-a3a3-88e6e2507230/watch-dem-thrones-by-black-with-no-chaserIHeartRadiohttps://www.iheart.com/podcast/269-watch-dem-thrones-by-black-101286659/?cmp=android_share&sc=android_social_share&pr=false#gameofthrones #demdragons #blackwithnochaser #houseofthedragonhbo #dragonseeds #theblacks #thegreens #houseofthedragon #youtube #targaryens #podcast #podsincolor #applemusic #spotifymusic #podsincolor

WATCH DEM THRONES by Black With No Chaser
Game of Thrones Season 3 EP6 "THE CLIMB" Recap

WATCH DEM THRONES by Black With No Chaser

Play Episode Listen Later Jun 12, 2023 101:12


Tywin plans unions for the Lannisters making a pact with Oleanna Tyrell to marry Loras and Cersei. Melisandre visits the Riverlands to take Gendry.; Robb wants to repair his alliance with House Frey and receives a list of demands, including a wedding for Edmure. Roose Bolton makes a decision about Jaime Lannister and sends him back to King's Landing,  but under one condition. He also keeps Brienne for treason. Jon, Ygritte and the Wildlings face a formidable climb at the wall and they almost come up short. We also get the famous "chaos is a ladder" line by Lil Finger to Varys thus episode.TO WATCH AND SUBSCRIBE:Facebook:https://www.facebook.com/blackwithnochaser/Twitter:https://twitter.com/BeBlackNoChaser?t=pVFV06lBFdZRu72ot4uCjA&s=09Twitter:https://twitter.com/WatchDemThrones?t=q0ngrYPlugf0ttzM2jo39A&s=09Apple Music: https://podcasts.apple.com/us/podcast/watch-dem-thrones-by-black-with-no-chaser/id1641754247Spotify:https://open.spotify.com/show/1qI1bJ1vIlobu502w6zrtN?si=mtsa3gZYRZW_3FmlCrv7UgBWNC RADIO: https://apps.apple.com/us/app/bwnc-radio/id6443800363Amazon Musichttps://music.amazon.com/podcasts/45279c3a-c09f-47d1-a3a3-88e6e2507230/watch-dem-thrones-by-black-with-no-chaserIHeartRadiohttps://www.iheart.com/podcast/269-watch-dem-thrones-by-black-101286659/?cmp=android_share&sc=android_social_share&pr=false#watchdemthrones #gameofthrones #demdragons #blackwithnochaser #podcast #podsincolor #humor #jokes #spotify #applemusic #youtube #HouseOfTheDragonHBO

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

It's now almost 6 months since Google declared Code Red, and the results — Jeff Dean's recap of 2022 achievements and a mass exodus of the top research talent that contributed to it in January, Bard's rushed launch in Feb, a slick video showing Google Workspace AI features and confusing doubly linked blogposts about PaLM API in March, and merging Google Brain and DeepMind in April — have not been inspiring. Google's internal panic is in full display now with the surfacing of a well written memo, written by software engineer Luke Sernau written in early April, revealing internal distress not seen since Steve Yegge's infamous Google Platforms Rant. Similar to 2011, the company's response to an external challenge has been to mobilize the entire company to go all-in on a (from the outside) vague vision.Google's misfortunes are well understood by now, but the last paragraph of the memo: “We have no moat, and neither does OpenAI”, was a banger of a mic drop.Combine this with news this morning that OpenAI lost $540m last year and will need as much as $100b more funding (after the complex $10b Microsoft deal in Jan), and the memo's assertion that both Google and OpenAI have “no moat” against the mighty open source horde have gained some credibility in the past 24 hours.Many are criticising this memo privately:* A CEO commented to me yesterday that Luke Sernau does not seem to work in AI related parts of Google and “software engineers don't understand moats”. * Emad Mostaque, himself a perma-champion of open source and open models, has repeatedly stated that “Closed models will always outperform open models” because closed models can just wrap open ones.* Emad has also commented on the moats he does see: “Unique usage data, Unique content, Unique talent, Unique product, Unique business model”, most of which Google does have, and OpenAI less so (though it is winning on the talent front)* Sam Altman famously said that “very few to no one is Silicon Valley has a moat - not even Facebook” (implying that moats don't actually matter, and you should spend your time thinking about more important things)* It is not actually clear what race the memo thinks Google and OpenAI are in vs Open Source. Neither are particularly concerned about running models locally on phones, and they are perfectly happy to let “a crazy European alpha male” run the last mile for them while they build actually monetizable cloud infrastructure.However moats are of intense interest by everybody keen on productized AI, cropping up in every Harvey, Jasper, and general AI startup vs incumbent debate. It is also interesting to take the memo at face value and discuss the searing hot pace of AI progress in open source. We hosted this discussion yesterday with Simon Willison, who apart from being an incredible communicator also wrote a great recap of the No Moat memo. 2,800 have now tuned in on Twitter Spaces, but we have taken the audio and cleaned it up here. Enjoy!Timestamps* [00:00:00] Introducing the Google Memo* [00:02:48] Open Source > Closed?* [00:05:51] Running Models On Device* [00:07:52] LoRA part 1* [00:08:42] On Moats - Size, Data* [00:11:34] Open Source Models are Comparable on Data* [00:13:04] Stackable LoRA* [00:19:44] The Need for Special Purpose Optimized Models* [00:21:12] Modular - Mojo from Chris Lattner* [00:23:33] The Promise of Language Supersets* [00:28:44] Google AI Strategy* [00:29:58] Zuck Releasing LLaMA* [00:30:42] Google Origin Confirmed* [00:30:57] Google's existential threat* [00:32:24] Non-Fiction AI Safety ("y-risk")* [00:35:17] Prompt Injection* [00:36:00] Google vs OpenAI* [00:41:04] Personal plugs: Simon and TravisTranscripts[00:00:00] Introducing the Google Memo[00:00:00] Simon Willison: So, yeah, this is a document, which Kate, which I first saw at three o'clock this morning, I think. It claims to be leaked from Google. There's good reasons to believe it is leaked from Google, and to be honest, if it's not, it doesn't actually matter because the quality of the analysis, I think stands alone.[00:00:15] If this was just a document by some anonymous person, I'd still think it was interesting and worth discussing. And the title of the document is We Have No Moat and neither does Open ai. And the argument it makes is that while Google and OpenAI have been competing on training bigger and bigger language models, the open source community is already starting to outrun them, given only a couple of months of really like really, really serious activity.[00:00:41] You know, Facebook lama was the thing that really kicked us off. There were open source language models like Bloom before that some G P T J, and they weren't very impressive. Like nobody was really thinking that they were. Chat. G P T equivalent Facebook Lama came out in March, I think March 15th. And was the first one that really sort of showed signs of being as capable maybe as chat G P T.[00:01:04] My, I don't, I think all of these models, they've been, the analysis of them has tend to be a bit hyped. Like I don't think any of them are even quite up to GT 3.5 standards yet, but they're within spitting distance in some respects. So anyway, Lama came out and then, Two weeks later Stanford Alpaca came out, which was fine tuned on top of Lama and was a massive leap forward in terms of quality.[00:01:27] And then a week after that Vicuna came out, which is to this date, the the best model I've been able to run on my own hardware. I, on my mobile phone now, like, it's astonishing how little resources you need to run these things. But anyway, the the argument that this paper made, which I found very convincing is it only took open source two months to get this far.[00:01:47] It's now every researcher in the world is kicking it on new, new things, but it feels like they're being there. There are problems that Google has been trying to solve that the open source models are already addressing, and really how do you compete with that, like with your, it's closed ecosystem, how are you going to beat these open models with all of this innovation going on?[00:02:04] But then the most interesting argument in there is it talks about the size of models and says that maybe large isn't a competitive advantage, maybe actually a smaller model. With lots of like different people fine tuning it and having these sort of, these LoRA l o r a stackable fine tuning innovations on top of it, maybe those can move faster.[00:02:23] And actually having to retrain your giant model every few months from scratch is, is way less useful than having small models that you can tr you can fine tune in a couple of hours on laptop. So it's, it's fascinating. I basically, if you haven't read this thing, you should read every word of it. It's not very long.[00:02:40] It's beautifully written. Like it's, it's, I mean, If you try and find the quotable lines in it, almost every line of it's quotable. Yeah. So, yeah, that's that, that, that's the status of this[00:02:48] Open Source > Closed?[00:02:48] swyx: thing. That's a wonderful summary, Simon. Yeah, there, there's so many angles we can take to this. I, I'll just observe one, one thing which if you think about the open versus closed narrative, Ima Mok, who is the CEO of Stability, has always been that open will trail behind closed, because the closed alternatives can always take.[00:03:08] Learnings and lessons from open source. And this is the first highly credible statement that is basically saying the exact opposite, that open source is moving than, than, than closed source. And they are scared. They seem to be scared. Which is interesting,[00:03:22] Travis Fischer: Travis. Yeah, the, the, the, a few things that, that I'll, I'll, I'll say the only thing which can keep up with the pace of AI these days is open source.[00:03:32] I think we're, we're seeing that unfold in real time before our eyes. And. You know, I, I think the other interesting angle of this is to some degree LLMs are they, they don't really have switching costs. They are going to be, become commoditized. At least that's, that's what a lot of, a lot of people kind of think to, to what extent is it Is it a, a rate in terms of, of pricing of these things?[00:03:55] , and they all kind of become roughly the, the, the same in, in terms of their, their underlying abilities. And, and open source is gonna, gonna be actively pushing, pushing that forward. And, and then this is kind of coming from, if it is to be believed the kind of Google or an insider type type mentality around you know, where is the actual competitive advantage?[00:04:14] What should they be focusing on? How can they get back in into the game? When you know, when, when, when, when currently the, the, the external view of, of Google is that they're kind of spinning their wheels and they have this code red,, and it's like they're, they're playing catch up already.[00:04:28] Like how could they use the open source community and work with them, which is gonna be really, really hard you know, from a structural perspective given Google's place in the ecosystem. But a, a lot, lot, a lot of jumping off points there.[00:04:42] Alessio Fanelli: I was gonna say, I think the Post is really focused on how do we get the best model, but it's not focused on like, how do we build the best product around it.[00:04:50] A lot of these models are limited by how many GPUs you can get to run them and we've seen on traditional open source, like everybody can use some of these projects like Kafka and like Alaska for free. But the reality is that not everybody can afford to run the infrastructure needed for it.[00:05:05] So I, I think like the main takeaway that I have from this is like, A lot of the moats are probably around just getting the, the sand, so to speak, and having the GPUs to actually serve these models. Because even if the best model is open source, like running it at large scale for an end is not easy and like, it's not super convenient to get a lot, a lot of the infrastructure.[00:05:27] And we've seen that model work in open source where you have. The opensource project, and then you have a enterprise cloud hosted version for it. I think that's gonna look really different in opensource models because just hosting a model doesn't have a lot of value. So I'm curious to hear how people end up getting rewarded to do opensource.[00:05:46] You know, it's, we figured that out in infrastructure, but we haven't figured it out in in Alans[00:05:51] Running Models On Device[00:05:51] Simon Willison: yet. I mean, one thing I'll say is that the the models that you can run on your own devices are so far ahead of what I ever dreamed they would be at this point. Like Vicuna 13 b i i, I, I think is the current best available open mo model that I've played with.[00:06:08] It's derived from Facebook Lama, so you can't use it for commercial purposes yet. But the point about MCK 13 B is it runs in the browser directly on web gpu. There's this amazing web l l M project where you literally, your browser downloaded a two gigabyte file. And it fires up a chat g D style interface and it's quite good.[00:06:27] It can do rap battles between different animals and all of the kind of fun stuff that you'd expect to be able to do the language model running entirely in Chrome canary. It's shocking to me that that's even possible, but that kind of shows that once, once you get to inference, if you can shrink the model down and the techniques for shrinking these models, the, the first one was the the quantization.[00:06:48] Which the Lama CPP project really sort of popularized Matt can by using four bits instead of 16 bit floating point numbers, you can shrink it down quite a lot. And then there was a paper that came out days ago suggesting that you can prune the models and ditch half the model and maintain the same level of quality.[00:07:05] So with, with things like that, with all of these tricks coming together, it's really astonishing how much you can get done on hardware that people actually have in their pockets even.[00:07:15] swyx: Just for completion I've been following all of your posts. Oh, sorry. Yes. I just wanna follow up, Simon. You're, you said you're running a model on your phone. Which model is it? And I don't think you've written it up.[00:07:27] Simon Willison: Yeah, that one's vina. I did, did I write it up? I did. I've got a blog post about how it it, it, it knows who I am, sort of, but it said that I invented a, a, a pattern for living called bear or bunny pattern, which I definitely didn't, but I loved that my phone decided that I did.[00:07:44] swyx: I will hunt for that because I'm not yet running Vic on my phone and I feel like I should and, and as like a very base thing, but I'll, okay.[00:07:52] Stackable LoRA Modules[00:07:52] swyx: Also, I'll follow up two things, right? Like one I'm very interesting and let's, let's talk about that a little bit more because this concept of stackable improvements to models I think is extremely interesting.[00:08:00] Like, I would love to MPM install abilities onto my models, right? Which is really awesome. But the, the first thing thing is under-discussed is I don't get the panic. Like, honestly, like Google has the most moats. I I, I was arguing maybe like three months ago on my blog. Like Google has the most mote out of a lot of people because, hey, we have your calendar.[00:08:21] Hey, we have your email. Hey, we have your you know, Google Docs. Like, isn't that a, a sufficient mode? Like, why are these guys panicking so much? I don't, I still don't get it. Like, Sure open source is running ahead and like, it's, it's on device and whatev, what have you, but they have so much more mode.[00:08:36] Like, what are we talking about here? There's many dimensions to compete on.[00:08:42] On Moats - Size, Data[00:08:42] Travis Fischer: Yeah, there's like one of, one of the, the things that, that the author you know, mentions in, in here is when, when you start to, to, to have the feeling of what we're trailing behind, then you're, you're, you're, you're brightest researchers jump ship and go to OpenAI or go to work at, at, at academia or, or whatever.[00:09:00] And like the talent drain. At the, the level of the, the senior AI researchers that are pushing these things ahead within Google, I think is a serious, serious concern. And my, my take on it's a good point, right? Like, like, like, like what Google has modes. They, they, they're not running outta money anytime soon.[00:09:16] You know, I think they, they do see the level of the, the defensibility and, and the fact that they want to be, I'll chime in the, the leader around pretty much anything. Tech first. There's definitely ha ha have lost that, that, that feeling. Right? , and to what degree they can, they can with the, the open source community to, to get that back and, and help drive that.[00:09:38] You know all of the llama subset of models with, with alpaca and Vicuna, et cetera, that all came from, from meta. Right. Like that. Yeah. Like it's not licensed in an open way where you can build a company on top of it, but is now kind of driving this family of, of models, like there's a tree of models that, that they're, they're leading.[00:09:54] And where is Google in that, in that playbook? Like for a long time they were the one releasing those models being super open and, and now it's just they, they've seem to be trailing and there's, there's people jumping ship and to what degree can they, can they, can they. Close off those wounds and, and focus on, on where, where they, they have unique ability to, to gain momentum.[00:10:15] I think is a core part of my takeaway from this. Yeah.[00:10:19] Alessio Fanelli: And think another big thing in the post is, oh, as long as you have high quality data, like you don't need that much data, you can just use that. The first party data loops are probably gonna be the most important going forward if we do believe that this is true.[00:10:32] So, Databricks. We have Mike Conover from Databricks on the podcast, and they talked about how they came up with the training set for Dolly, which they basically had Databricks employees write down very good questions and very good answers for it. Not every company as the scale to do that. And I think products like Google, they have millions of people writing Google Docs.[00:10:54] They have millions of people using Google Sheets, then millions of people writing stuff, creating content on YouTube. The question is, if you wanna compete against these companies, maybe the model is not what you're gonna do it with because the open source kind of commoditizes it. But how do you build even better data?[00:11:12] First party loops. And that's kind of the hardest thing for startups, right? Like even if we open up the, the models to everybody and everybody can just go on GitHub and. Or hugging face and get the waste to the best model, but get enough people to generate data for me so that I can still make it good. That's, that's what I would be worried about if I was a, a new company.[00:11:31] How do I make that happen[00:11:32] Simon Willison: really quickly?[00:11:34] Open Source Models are Comparable on Data[00:11:34] Simon Willison: I'm not convinced that the data is that big a challenge. So there's this PO project. So the problem with Facebook LAMA is that it's not available for, for commercial use. So people are now trying to train a alternative to LAMA that's entirely on openly licensed data.[00:11:48] And that the biggest project around that is this red pajama project, which They released their training data a few weeks ago and it was 2.7 terabytes. Right? So actually tiny, right? You can buy a laptop that you can fit 2.7 terabytes on. Got it. But it was the same exact data that Facebook, the same thing that Facebook Lamb had been trained on.[00:12:06] Cuz for your base model. You're not really trying to teach it fact about the world. You're just trying to teach it how English and other languages work, how they fit together. And then the real magic is when you fine tune on top of that. That's what Alpaca did on top of Lama and so on. And the fine tuning sets, it looks like, like tens of thousands of examples to kick one of these role models into shape.[00:12:26] And tens of thousands of examples like Databricks spent a month and got the 2000 employees of their company to help kick in and it worked. You've got the open assistant project of crowdsourcing this stuff now as well. So it's achievable[00:12:40] swyx: sore throat. I agree. I think it's a fa fascinating point. Actually, so I've heard through the grapevine then red pajamas model.[00:12:47] Trained on the, the data that they release is gonna be releasing tomorrow. And it's, it's this very exciting time because the, the, there, there's a, there's a couple more models that are coming down the pike, which independently we produced. And so yeah, that we, everyone is challenging all these assumptions from, from first principles, which is fascinating.[00:13:04] Stackable LoRA[00:13:04] swyx: I, I did, I did wanted to, to like try to get a little bit more technical in terms of like the, the, the, the specific points race. Cuz this doc, this doc was just amazing. Can we talk about LoRA. I, I, I'll open up to Simon again if he's back.[00:13:16] Simon Willison: I'd rather someone else take on. LoRA, I've, I, I know as much as I've read in that paper, but not much more than that.[00:13:21] swyx: So I thought it was this kind of like an optimization technique. So LoRA stands for lower rank adaptation. But this is the first mention of LoRA as a form of stackable improvements. Where he I forget what, let, just, let me just kind of Google this. But obviously anyone's more knowledgeable please.[00:13:39] So come on in.[00:13:40] Alessio Fanelli: I, all of Lauren is through GTS Man, about 20 minutes on GT four, trying to figure out word. It was I study computer science, but this is not this is not my area of expertise. What I got from it is that basically instead of having to retrain the whole model you can just pick one of the ranks and you take.[00:13:58] One of like the, the weight matrix tests and like make two smaller matrixes from it and then just two to be retrained and training the whole model. So[00:14:08] swyx: it save a lot of Yeah. You freeze part of the thing and then you just train the smaller part like that. Exactly. That seems to be a area of a lot of fruitful research.[00:14:15] Yeah. I think Mini GT four recently did something similar as well. And then there's, there's, there's a, there's a Spark Model people out today that also did the same thing.[00:14:23] Simon Willison: So I've seen a lot of LoRA stable, the stable diffusion community has been using LoRA a lot. So they, in that case, they had a, I, the thing I've seen is people releasing LoRA's that are like you, you train a concept like a, a a particular person's face or something you release.[00:14:38] And the, the LoRA version of this end up being megabytes of data, like, which is, it's. You know, it's small enough that you can just trade those around and you can effectively load multiple of those into the model. But what I haven't realized is that you can use the same trick on, on language models. That was one of the big new things for me in reading the the leaks Google paper today.[00:14:56] Alessio Fanelli: Yeah, and I think the point to make around on the infrastructure, so what tragedy has told me is that when you're figuring out what rank you actually wanna do this fine tuning at you can have either go too low and like the model doesn't actually learn it. Or you can go too high and the model overfit those learnings.[00:15:14] So if you have a base model that everybody agrees on, then all the subsequent like LoRA work is done around the same rank, which gives you an advantage. And the point they made in the, that, since Lama has been the base for a lot of this LoRA work like they own. The, the mind share of the community.[00:15:32] So everything that they're building is compatible with their architecture. But if Google Opensources their own model the rank that they chose For LoRA on Lama might not work on the Google model. So all of the existing work is not portable. So[00:15:46] Simon Willison: the impression I got is that one of the challenges with LoRA is that you train all these LoRAs on top of your model, but then if you retrain that base model as LoRA's becoming invalid, right?[00:15:55] They're essentially, they're, they're, they're built for an exact model version. So this means that being the big company with all of the GPUs that can afford to retrain a model every three months. That's suddenly not nearly as valuable as it used to be because now maybe there's an open source model that's five years old at this point and has like multiple, multiple stacks of LoRA's trained all over the world on top of it, which can outperform your brand new model just because there's been so much more iteration on that base.[00:16:20] swyx: I, I think it's, I think it's fascinating. It's I think Jim Fan from Envidia was recently making this argument for transformers. Like even if we do come up with a better. Architecture, then transformers, they're the sheer hundreds and millions of dollars that have been invested on top of transformers.[00:16:34] Make it actually there is some switching costs and it's not exactly obvious that better architecture. Equals equals we should all switch immediately tomorrow. It's, it's, it's[00:16:44] Simon Willison: kinda like the, the difficulty of launching a new programming language today Yes. Is that pipeline and JavaScript have a million packages.[00:16:51] So no matter how good your new language is, if it can't tap into those existing package libraries, it's, it's not gonna be useful for, which is why Moji is so clever, because they did build on top of Pips. They get all of that existing infrastructure, all of that existing code working already.[00:17:05] swyx: I mean, what, what thought you, since you co-create JAO and all that do, do we wanna take a diversion into mojo?[00:17:10] No, no. I[00:17:11] Travis Fischer: would, I, I'd be happy to, to, to jump in, and get Simon's take on, on Mojo. 1, 1, 1 small, small point on LoRA is I, I, I just think. If you think about at a high level, what the, the major down downsides are of these, these large language models. It's the fact that they well they're, they're, they're difficult to, to train, right?[00:17:32] They, they tend to hallucinate and they are, have, have a static, like, like they were trained at a certain date, right? And with, with LoRA, I think it makes it a lot more amenable to Training new, new updates on top of that, that like base model on the fly where you can incorporate new, new data and in a way that is, is, is an interesting and potentially more optimal alternative than Doing the kind of in context generation cuz, cuz most of like who at perplexity AI or, or any of these, these approaches currently, it's like all based off of doing real-time searches and then injecting as much into the, the, the local context window as possible so that you, you try to ground your, your, your, your language model.[00:18:16] Both in terms of the, the information it has access to that, that, that helps to reduce hallucinations. It can't reduce it, but helps to reduce it and then also gives it access to up-to-date information that wasn't around for that, that massive like, like pre-training step. And I think LoRA in, in, in mine really makes it more, more amenable to having.[00:18:36] Having constantly shifting lightweight pre-training on top of it that scales better than than normal. Pre I'm sorry. Fine tune, fine tuning. Yeah, that, that was just kinda my one takeaway[00:18:45] Simon Willison: there. I mean, for me, I've never been, I want to run models on my own hard, I don't actually care about their factual content.[00:18:52] Like I don't need a model that's been, that's trained on the most upstate things. What I need is a model that can do the bing and bar trick, right? That can tell when it needs to run a search. And then go and run a search to get extra information and, and bring that context in. And similarly, I wanted to be able to operate tools where it can access my email or look at my notes or all of those kinds of things.[00:19:11] And I don't think you need a very powerful model for that. Like that's one of the things where I feel like, yeah, vicuna running on my, on my laptop is probably powerful enough to drive a sort of personal research assistant, which can look things up for me and it can summarize things for my notes and it can do all of that and I don't care.[00:19:26] But it doesn't know about the Ukraine war because the Ukraine war training cutoff, that doesn't matter. If it's got those additional capabilities, which are quite easy to build the reason everyone's going crazy building agents and tools right now is that it's a few lines of Python code, and a sort of couple of paragraphs to get it to.[00:19:44] The Need for Special Purpose Optimized Models[00:19:44] Simon Willison: Well, let's, let's,[00:19:45] Travis Fischer: let's maybe dig in on that a little bit. And this, this also is, is very related to mojo. Cuz I, I do think there are use cases and domains where having the, the hyper optimized, like a version of these models running on device is, is very relevant where you can't necessarily make API calls out on the fly.[00:20:03] and Aug do context, augmented generation. And I was, I was talking with, with a a researcher. At Lockheed Martin yesterday, literally about like, like the, the version of this that's running of, of language models running on, on fighter jets. Right? And you, you talk about like the, the, the amount of engineering, precision and optimization that has to go into, to those type of models.[00:20:25] And the fact that, that you spend so much money, like, like training a super distilled ver version where milliseconds matter it's a life or death situation there. You know, and you couldn't even, even remotely ha ha have a use case there where you could like call out and, and have, have API calls or something.[00:20:40] So I, I do think there's like keeping in mind the, the use cases where, where. There, there'll be use cases that I'm more excited about at, at the application level where, where, yeah, I want to to just have it be super flexible and be able to call out to APIs and have this agentic type type thing.[00:20:56] And then there's also industries and, and use cases where, where you really need everything baked into the model.[00:21:01] swyx: Yep. Agreed. My, my favorite piece take on this is I think DPC four as a reasoning engine, which I think came from the from Nathan at every two. Which I think, yeah, I see the hundred score over there.[00:21:12] Modular - Mojo from Chris Lattner[00:21:12] swyx: Simon, do you do you have a, a few seconds on[00:21:14] Simon Willison: mojo. Sure. So Mojo is a brand new program language you just announced a few days ago. It's not actually available yet. I think there's an online demo, but to zooming it becomes an open source language we can use. It's got really some very interesting characteristics.[00:21:29] It's a super set of Python, so anything written in Python, Python will just work, but it adds additional features on top that let you basically do very highly optimized code with written. In Python syntax, it compiles down the the main thing that's exciting about it is the pedigree that it comes from.[00:21:47] It's a team led by Chris Latner, built L L V M and Clang, and then he designed Swift at Apple. So he's got like three, three for three on, on extraordinarily impactful high performance computing products. And he put together this team and they've basically, they're trying to go after the problem of how do you build.[00:22:06] A language which you can do really high performance optimized work in, but where you don't have to do everything again from scratch. And that's where building on top of Python is so clever. So I wasn't like, if this thing came along, I, I didn't really pay attention to it until j Jeremy Howard, who built Fast ai put up a very detailed blog post about why he was excited about Mojo, which included a, there's a video demo in there, which everyone should watch because in that video he takes Matrix multiplication implemented in Python.[00:22:34] And then he uses the mojo extras to 2000 x. The performance of that matrix multiplication, like he adds a few static types functions sort of struck instead of the class. And he gets 2000 times the performance out of it, which is phenomenal. Like absolutely extraordinary. So yeah, that, that got me really excited.[00:22:52] Like the idea that we can still use Python and all of this stuff we've got in Python, but we can. Just very slightly tweak some things and get literally like thousands times upwards performance out of the things that matter. That's really exciting.[00:23:07] swyx: Yeah, I, I, I'm curious, like, how come this wasn't thought of before?[00:23:11] It's not like the, the, the concept of a language super set hasn't hasn't, has, has isn't, is completely new. But all, as far as I know, all the previous Python interpreter approaches, like the alternate runtime approaches are like they, they, they're more, they're more sort of, Fit conforming to standard Python, but never really tried this additional approach of augmenting the language.[00:23:33] The Promise of Language Supersets[00:23:33] swyx: I, I'm wondering if you have many insights there on, like, why, like why is this a, a, a breakthrough?[00:23:38] Simon Willison: Yeah, that's a really interesting question. So, Jeremy Howard's piece talks about this thing called M L I R, which I hadn't heard of before, but this was another Chris Latner project. You know, he built L L VM as a low level virtual machine.[00:23:53] That you could build compilers on top of. And then M L I R was this one that he initially kicked off at Google, and I think it's part of TensorFlow and things like that. But it was very much optimized for multiple cores and GPU access and all of that kind of thing. And so my reading of Jeremy Howard's article is that they've basically built Mojo on top of M L I R.[00:24:13] So they had a huge, huge like a starting point where they'd, they, they knew this technology better than anyone else. And because they had this very, very robust high performance basis that they could build things on. I think maybe they're just the first people to try and build a high, try and combine a high level language with M L A R, with some extra things.[00:24:34] So it feels like they're basically taking a whole bunch of ideas people have been sort of experimenting with over the last decade and bundled them all together with exactly the right team, the right level of expertise. And it looks like they've got the thing to work. But yeah, I mean, I've, I've, I'm. Very intrigued to see, especially once this is actually available and we can start using it.[00:24:52] It, Jeremy Howard is someone I respect very deeply and he's, he's hyping this thing like crazy, right? His headline, his, and he's not the kind of person who hypes things if they're not worth hyping. He said Mojo may be the biggest programming language advanced in decades. And from anyone else, I'd kind of ignore that headline.[00:25:09] But from him it really means something.[00:25:11] swyx: Yes, because he doesn't hype things up randomly. Yeah, and, and, and he's a noted skeptic of Julia which is, which is also another data science hot topic. But from the TypeScript and web, web development worlds there has been a dialect of TypeScript that was specifically optimized to compile, to web assembly which I thought was like promising and then, and, and eventually never really took off.[00:25:33] But I, I like this approach because I think more. Frameworks should, should essentially be languages and recognize that they're language superset and maybe working compilers that that work on them. And then that is the, by the way, that's the direction that React is going right now. So fun times[00:25:50] Simon Willison: type scripts An interesting comparison actually, cuz type script is effectively a superset of Java script, right?[00:25:54] swyx: It's, but there's no, it's purely[00:25:57] Simon Willison: types, right? Gotcha. Right. So, so I guess mojo is the soup set python, but the emphasis is absolutely on tapping into the performance stuff. Right.[00:26:05] swyx: Well, the just things people actually care about.[00:26:08] Travis Fischer: Yeah. The, the one thing I've found is, is very similar to the early days of type script.[00:26:12] There was the, the, the, the most important thing was that it's incrementally adoptable. You know, cuz people had a script code basis and, and they wanted to incrementally like add. The, the, the main value prop for TypeScript was reliability and the, the, the, the static typing. And with Mojo, Lucia being basically anyone who's a target a large enterprise user of, of Mojo or even researchers, like they're all going to be coming from a, a hardcore.[00:26:36] Background in, in Python and, and have large existing libraries. And the the question will be for what use cases will mojo be like a, a, a really good fit for that incremental adoption where you can still tap into your, your, your massive, like python exi existing infrastructure workflows, data tooling, et cetera.[00:26:55] And, and what does, what does that path to adoption look like?[00:26:59] swyx: Yeah, we, we, we don't know cuz it's a wait listed language which people were complaining about. They, they, the, the mojo creators were like saying something about they had to scale up their servers. And I'm like, what language requires essential server?[00:27:10] So it's a little bit suss, a little bit, like there's a, there's a cloud product already in place and they're waiting for it. But we'll see. We'll see. I mean, emojis should be promising in it. I, I actually want more. Programming language innovation this way. You know, I was complaining years ago that programming language innovation is all about stronger types, all fun, all about like more functional, more strong types everywhere.[00:27:29] And, and this is, the first one is actually much more practical which I, which I really enjoy. This is why I wrote about self provisioning run types.[00:27:36] Simon Willison: And[00:27:37] Alessio Fanelli: I mean, this is kind of related to the post, right? Like if you stop all of a sudden we're like, the models are all the same and we can improve them.[00:27:45] Like, where can we get the improvements? You know, it's like, Better run times, better languages, better tooling, better data collection. Yeah. So if I were a founder today, I wouldn't worry as much about the model, maybe, but I would say, okay, what can I build into my product and like, or what can I do at the engineering level that maybe it's not model optimization because everybody's working on it, but like you said, it's like, why haven't people thought of this before?[00:28:09] It's like, it's, it's definitely super hard, but I'm sure that if you're like Google or you're like open AI or you're like, Databricks, we got smart enough people that can think about these problems, so hopefully we see more of this.[00:28:21] swyx: You need, Alan? Okay. I promise to keep this relatively tight. I know Simon on a beautiful day.[00:28:27] It is a very nice day in California. I wanted to go through a few more points that you have pulled out Simon and, and just give you the opportunity to, to rant and riff and, and what have you. I, I, are there any other points from going back to the sort of Google OpenAI mode documents that, that you felt like we, we should dive in on?[00:28:44] Google AI Strategy[00:28:44] Simon Willison: I mean, the really interesting stuff there is the strategy component, right? The this idea that that Facebook accidentally stumbled into leading this because they put out this model that everyone else is innovating on top of. And there's a very open question for me as to would Facebook relic Lama to allow for commercial usage?[00:29:03] swyx: Is there some rumor? Is that, is that today?[00:29:06] Simon Willison: Is there a rumor about that?[00:29:07] swyx: That would be interesting? Yeah, I saw, I saw something about Zuck saying that he would release the, the Lama weights officially.[00:29:13] Simon Willison: Oh my goodness. No, that I missed. That is, that's huge.[00:29:17] swyx: Let me confirm the tweet. Let me find the tweet and then, yeah.[00:29:19] Okay.[00:29:20] Simon Willison: Because actually I met somebody from Facebook machine learning research a couple of weeks ago, and I, I pressed 'em on this and they said, basically they don't think it'll ever happen because if it happens, and then somebody does horrible fascist stuff with this model, all of the headlines will be Meg releases a monster into the world.[00:29:36] So, so hi. His, the, the, the, a couple of weeks ago, his feeling was that it's just too risky for them to, to allow it to be used like that. But a couple of weeks is, is, is a couple of months in AI world. So yeah, it wouldn't be, it feels to me like strategically Facebook should be jumping right on this because this puts them at the very.[00:29:54] The very lead of, of open source innovation around this stuff.[00:29:58] Zuck Releasing LLaMA[00:29:58] swyx: So I've pinned the tweet talking about Zuck and Zuck saying that meta will open up Lama. It's from the founder of Obsidian, which gives it a slight bit more credibility, but it is the only. Tweet that I can find about it. So completely unsourced,[00:30:13] we shall see. I, I, I mean I have friends within meta, I should just go ask them. But yeah, I, I mean one interesting angle on, on the memo actually is is that and, and they were linking to this in, in, in a doc, which is apparently like. Facebook got a bunch of people to do because they, they never released it for commercial use, but a lot of people went ahead anyway and, and optimized and, and built extensions and stuff.[00:30:34] They, they got a bunch of free work out of opensource, which is an interesting strategy.[00:30:39] There's okay. I don't know if I.[00:30:42] Google Origin Confirmed[00:30:42] Simon Willison: I've got exciting piece of news. I've just heard from somebody with contacts at Google that they've heard people in Google confirm the leak. That that document wasn't even legit Google document, which I don't find surprising at all, but I'm now up to 10, outta 10 on, on whether that's, that's, that's real.[00:30:57] Google's existential threat[00:30:57] swyx: Excellent. Excellent. Yeah, it is fascinating. Yeah, I mean the, the strategy is, is, is really interesting. I think Google has been. Definitely sleeping on monetizing. You know, I, I, I heard someone call when Google Brain and Devrel I merged that they would, it was like goodbye to the Xerox Park of our era and it definitely feels like Google X and Google Brain would definitely Xerox parks of our, of our era, and I guess we all benefit from that.[00:31:21] Simon Willison: So, one thing I'll say about the, the Google side of things, like the there was a question earlier, why are Google so worried about this stuff? And I think it's, it's just all about the money. You know, the, the, the engine of money at Google is Google searching Google search ads, and who uses Chachi PT on a daily basis, like me, will have noticed that their usage of Google has dropped like a stone.[00:31:41] Because there are many, many questions that, that chat, e p t, which shows you no ads at all. Is, is, is a better source of information for than Google now. And so, yeah, I'm not, it doesn't surprise me that Google would see this as an existential threat because whether or not they can be Bard, it's actually, it's not great, but it, it exists, but it hasn't it yet either.[00:32:00] And if I've got a Chatbook chatbot that's not showing me ads and chatbot that is showing me ads, I'm gonna pick the one that's not showing[00:32:06] swyx: me ads. Yeah. Yeah. I, I agree. I did see a prototype of Bing with ads. Bing chat with ads. I haven't[00:32:13] Simon Willison: seen the prototype yet. No.[00:32:15] swyx: Yeah, yeah. Anyway, I I, it, it will come obviously, and then we will choose, we'll, we'll go out of our ways to avoid ads just like we always do.[00:32:22] We'll need ad blockers and chat.[00:32:23] Excellent.[00:32:24] Non-Fiction AI Safety ("y-risk")[00:32:24] Simon Willison: So I feel like on the safety side, the, the safety side, there are basically two areas of safety that I, I, I sort of split it into. There's the science fiction scenarios, the AI breaking out and killing all humans and creating viruses and all of that kind of thing. The sort of the terminated stuff. And then there's the the.[00:32:40] People doing bad things with ai and that's latter one is the one that I think is much more interesting and that cuz you could u like things like romance scams, right? Romance scams already take billions of dollars from, from vulner people every year. Those are very easy to automate using existing tools.[00:32:56] I'm pretty sure for QNA 13 b running on my laptop could spin up a pretty decent romance scam if I was evil and wanted to use it for them. So that's the kind of thing where, I get really nervous about it, like the fact that these models are out there and bad people can use these bad, do bad things.[00:33:13] Most importantly at scale, like romance scamming, you don't need a language model to pull off one romance scam, but if you wanna pull off a thousand at once, the language model might be the, the thing that that helps you scale to that point. And yeah, in terms of the science fiction stuff and also like a model on my laptop that can.[00:33:28] Guess what comes next in a sentence. I'm not worried that that's going to break out of my laptop and destroy the world. There. There's, I'm get slightly nervous about the huge number of people who are trying to build agis on top of this models, the baby AGI stuff and so forth, but I don't think they're gonna get anywhere.[00:33:43] I feel like if you actually wanted a model that was, was a threat to human, a language model would be a tiny corner of what that thing. Was actually built on top of, you'd need goal setting and all sorts of other bits and pieces. So yeah, for the moment, the science fiction stuff doesn't really interest me, although it is a little bit alarming seeing more and more of the very senior figures in this industry sort of tip the hat, say we're getting a little bit nervous about this stuff now.[00:34:08] Yeah.[00:34:09] swyx: So that would be Jeff Iton and and I, I saw this me this morning that Jan Lacoon was like happily saying, this is fine. Being the third cheer award winner.[00:34:20] Simon Willison: But you'll see a lot of the AI safe, the people who've been talking about AI safety for the longest are getting really angry about science fiction scenarios cuz they're like, no, the, the thing that we need to be talking about is the harm that you can cause with these models right now today, which is actually happening and the science fiction stuff kind of ends up distracting from that.[00:34:36] swyx: I love it. You, you. Okay. So, so Uher, I don't know how to pronounce his name. Elier has a list of ways that AI will kill us post, and I think, Simon, you could write a list of ways that AI will harm us, but not kill us, right? Like the, the, the non-science fiction actual harm ways, I think, right? I haven't seen a, a actual list of like, hey, romance scams spam.[00:34:57] I, I don't, I don't know what else, but. That could be very interesting as a Hmm. Okay. Practical. Practical like, here are the situations we need to guard against because they are more real today than that we need to. Think about Warren, about obviously you've been a big advocate of prompt injection awareness even though you can't really solve them, and I, I worked through a scenario with you, but Yeah,[00:35:17] Prompt Injection[00:35:17] Simon Willison: yeah.[00:35:17] Prompt injection is a whole other side of this, which is, I mean, that if you want a risk from ai, the risk right now is everyone who's building puts a building systems that attackers can trivially subvert into stealing all of their private data, unlocking their house, all of that kind of thing. So that's another very real risk that we have today.[00:35:35] swyx: I think in all our personal bios we should edit in prompt injections already, like in on my website, I wanna edit in a personal prompt injections so that if I get scraped, like I all know if someone's like reading from a script, right? That that is generated by any iBot. I've[00:35:49] Simon Willison: seen people do that on LinkedIn already and they get, they get recruiter emails saying, Hey, I didn't read your bio properly and I'm just an AI script, but would you like a job?[00:35:57] Yeah. It's fascinating.[00:36:00] Google vs OpenAI[00:36:00] swyx: Okay. Alright, so topic. I, I, I think, I think this this, this mote is is a peak under the curtain of the, the internal panic within Google. I think it is very val, very validated. I'm not so sure they should care so much about small models or, or like on device models.[00:36:17] But the other stuff is interesting. There is a comment at the end that you had by about as for opening open is themselves, open air, doesn't matter. So this is a Google document talking about Google's position in the market and what Google should be doing. But they had a comment here about open eye.[00:36:31] They also say open eye had no mode, which is a interesting and brave comment given that open eye is the leader in, in a lot of these[00:36:38] Simon Willison: innovations. Well, one thing I will say is that I think we might have identified who within Google wrote this document. Now there's a version of it floating around with a name.[00:36:48] And I look them up on LinkedIn. They're heavily involved in the AI corner of Google. So my guess is that at Google done this one, I've worked for companies. I'll put out a memo, I'll write up a Google doc and I'll email, email it around, and it's nowhere near the official position of the company or of the executive team.[00:37:04] It's somebody's opinion. And so I think it's more likely that this particular document is somebody who works for Google and has an opinion and distributed it internally and then it, and then it got leaked. I dunno if it's necessarily. Represents Google's sort of institutional thinking about this? I think it probably should.[00:37:19] Again, this is such a well-written document. It's so well argued that if I was an executive at Google and I read that, I would, I would be thinking pretty hard about it. But yeah, I don't think we should see it as, as sort of the official secret internal position of the company. Yeah. First[00:37:34] swyx: of all, I might promote that person.[00:37:35] Cuz he's clearly more,[00:37:36] Simon Willison: oh, definitely. He's, he's, he's really, this is a, it's, I, I would hire this person about the strength of that document.[00:37:42] swyx: But second of all, this is more about open eye. Like I'm not interested in Google's official statements about open, but I was interested like his assertion, open eye.[00:37:50] Doesn't have a mote. That's a bold statement. I don't know. It's got the best people.[00:37:55] Travis Fischer: Well, I, I would, I would say two things here. One, it's really interesting just at a meta, meta point that, that they even approached it this way of having this public leak. It, it, it kind of, Talks a little bit to the fact that they, they, they felt that that doing do internally, like wasn't going to get anywhere or, or maybe this speaks to, to some of the like, middle management type stuff or, or within Google.[00:38:18] And then to the, the, the, the point about like opening and not having a moat. I think for, for large language models, it, it, it will be over, over time kind of a race to the bottom just because the switching costs are, are, are so low compared with traditional cloud and sas. And yeah, there will be differences in, in, in quality, but, but like over time, if you, you look at the limit of these things like the, I I think Sam Altman has been quoted a few times saying that the, the, the price of marginal price of intelligence will go to zero.[00:38:47] Time and the marginal price of energy powering that intelligence will, will also hit over time. And in that world, if you're, you're providing large language models, they become commoditized. Like, yeah. What, what is, what is your mode at that point? I don't know. I think they're e extremely well positioned as a team and as a company for leading this space.[00:39:03] I'm not that, that worried about that, but it is something from a strategic point of view to keep in mind about large language models becoming a commodity. So[00:39:11] Simon Willison: it's quite short, so I think it's worth just reading the, in fact, that entire section, it says epilogue. What about open ai? All of this talk of open source can feel unfair given open AI's current closed policy.[00:39:21] Why do we have to share if they won't? That's talking about Google sharing, but the fact of the matter is we are already sharing everything with them. In the form of the steady flow of poached senior researchers until we spent that tide. Secrecy is a moot point. I love that. That's so salty. And, and in the end, open eye doesn't matter.[00:39:38] They are making the same mistakes that we are in their posture relative to open source. And their ability to maintain an edge is necessarily in question. Open source alternatives. Canned will eventually eclipse them. Unless they change their stance in this respect, at least we can make the first move. So the argument this, this paper is making is that Google should go, go like meta and, and just lean right into open sourcing it and engaging with the wider open source community much more deeply, which OpenAI have very much signaled they are not willing to do.[00:40:06] But yeah, it's it's, it's read the whole thing. The whole thing is full of little snippets like that. It's just super fun. Yes,[00:40:12] swyx: yes. Read the whole thing. I, I, I also appreciate that the timeline, because it set a lot of really great context for people who are out of the loop. So Yeah.[00:40:20] Alessio Fanelli: Yeah. And the final conspiracy theory is that right before Sundar and Satya and Sam went to the White House this morning, so.[00:40:29] swyx: Yeah. Did it happen? I haven't caught up the White House statements.[00:40:34] Alessio Fanelli: No. That I, I just saw, I just saw the photos of them going into the, the White House. I've been, I haven't seen any post-meeting updates.[00:40:41] swyx: I think it's a big win for philanthropic to be at that table.[00:40:44] Alessio Fanelli: Oh yeah, for sure. And co here it's not there.[00:40:46] I was like, hmm. Interesting. Well, anyway,[00:40:50] swyx: yeah. They need, they need some help. Okay. Well, I, I promise to keep this relatively tight. Spaces do tend to have a, have a tendency of dragging on. But before we go, anything that you all want to plug, anything that you're working on currently maybe go around Simon are you still working on dataset?[00:41:04] Personal plugs: Simon and Travis[00:41:04] Simon Willison: I am, I am, I'm having a bit of a, so datasets my open source project that I've been working on. It's about helping people analyze and publish data. I'm having an existential crisis of it at the moment because I've got access to the chat g p T code, interpreter mode, and you can upload the sequel light database to that and it will do all of the things that I, on my roadmap for the next 12 months.[00:41:24] Oh my God. So that's frustrating. So I'm basically, I'm leaning data. My interest in data and AI are, are rapidly crossing over a lot harder about the AI features that I need to build on top of dataset. Make sure it stays relevant in a chat. G p t can do most of the stuff that it does already. But yeah the thing, I'll plug my blog simon willis.net.[00:41:43] I'm now updating it daily with stuff because AI move moved so quickly and I have a sub newsletter, which is effectively my blog, but in email form sent out a couple of times a week, which Please subscribe to that or RSS feed on my blog or, or whatever because I'm, I'm trying to keep track of all sorts of things and I'm publishing a lot at the moment.[00:42:02] swyx: Yes. You, you are, and we love you very much for it because you, you are a very good reporter and technical deep diver into things, into all the things. Thank you, Simon. Travis are you ready to announce the, I guess you've announced it some somewhat. Yeah. Yeah.[00:42:14] Travis Fischer: So I'm I, I just founded a company.[00:42:16] I'm working on a framework for building reliable agents that aren't toys and focused on more constrained use cases. And you know, I I, I look at kind of agi. And these, these audigy type type projects as like jumping all the way to str to, to self-driving. And, and we, we, we kind of wanna, wanna start with some more enter and really focus on, on reliable primitives to, to start that.[00:42:38] And that'll be an open source type script project. I'll be releasing the first version of that soon. And that's, that's it. Follow me you know, on here for, for this type of stuff, I, I, I, everything, AI[00:42:48] swyx: and, and spa, his chat PT bot,[00:42:50] Travis Fischer: while you still can. Oh yeah, the chat VT Twitter bot is about 125,000 followers now.[00:42:55] It's still running. I, I'm not sure if it's your credit. Yeah. Can you say how much you spent actually, No, no. Well, I think probably totally like, like a thousand bucks or something, but I, it's, it's sponsored by OpenAI, so I haven't, I haven't actually spent any real money.[00:43:08] swyx: What? That's[00:43:09] awesome.[00:43:10] Travis Fischer: Yeah. Yeah.[00:43:11] Well, once, once I changed, originally the logo was the Chachi VUI logo and it was the green one, and then they, they hit me up and asked me to change it. So it's now it's a purple logo. And they're, they're, they're cool with that. Yeah.[00:43:21] swyx: Yeah. Sending take down notices to people with G B T stuff apparently now.[00:43:26] So it's, yeah, it's a little bit of a gray area. I wanna write more on, on mos. I've been actually collecting and meaning to write a piece of mos and today I saw the memo, I was like, oh, okay. Like I guess today's the day we talk about mos. So thank you all. Thanks. Thanks, Simon. Thanks Travis for, for jumping on and thanks to all the audience for engaging on this with us.[00:43:42] We'll continue to engage on Twitter, but thanks to everyone. Cool. Thanks everyone. Bye. Alright, thanks everyone. Bye. Get full access to Latent Space at www.latent.space/subscribe

D3 Glory Days Podcast
2023 Indoor Nationals Preview Part 2 with Cheick Traore, Mike Jasa, and Yakob Ekoue

D3 Glory Days Podcast

Play Episode Listen Later Mar 7, 2023 54:00


In a two part episode, we sit down with 7 athletes getting ready to compete at the D3 Indoor Track and Field national championships. We discuss how their season has gone so far, tactics they use to stay calm before big meets, and any predictions they may have. Time Stamp Cheick Traore - 200/400, Ramapo - 2:25 Mike Jasa - 800, 4x4, DMR, Loras - 18:32 Yakob Ekoue - SP/WT, UW-Eau Claire, 36:35 BPN Ambassador We're excited to be Ambassadors for Bare Performance Nutrition. A supplement company that gets you ready for next competition. Whether it's protein or electrolytes, their supplements are geared to help you compete at your best. I've been taking their electrolytes to refuel after a run and can it increase my recovery. If you're interested in trying them out, head over to their website and try today. How to Support D3 Glory Days: D3 Glory Days Venmo. We launched a Patreon! Subscribe and leave us a review on Apple Podcasts Instagram, Twitter and Strava. D3 Glory Days Merch

The Podcast About Division III Baseball
Episode 113: 2023 Midwest (Region 9) Preview

The Podcast About Division III Baseball

Play Episode Listen Later Feb 2, 2023 63:28


Welcome to Episode 113 of The Podcast About Division III Baseball. It's way too cold to start playing baseball in the Midwest Region (Region 9) but that's not gonna stop us from previewing all the D-III baseball teams within it. In order: ARC (at 2:40) Loras was definitely good but not "16-0" to start the year good; how will they keep the momentum going after a breakout season? Buena Vista can mash but will need some arms to step up Coe are the defending champs despite a bizarre season and have a lot of experience to replace Luther leads the rest of the pack within a very balanced conference on the whole MIAC (at 15:30) Bethel dominated in 2022 and has basically everyone back ready to do it again -- watch out for this squad Gustavus Adolphus and Saint John's (MN) are the likeliest to push the Royals Remember the name Kiefer Lord!!! (even if he isn't pitching in the MIAC anymore) MWC (at 29:10) We've seen a few different teams challenge for this conference title and the latest team on top was the Lawrence Vikings, who return an excellent lineup led by one of 2022's national HR leaders basically everyone else in this conference is solid, we'll see if UChicago can finally get over the hump UMAC (at 37:30) Crown was one of the best stories of 2022 and now has plenty back to defend their UMAC title Don't forget about Northwestern (MN), though, even if they have a ton of impact talent lost to transfer/graduation WIAC (at 45:50) UW-Stevens Point loudly re-established themselves back atop the D-III baseball world and look like the favorite again UW-Whitewater will need a ton of inexperienced arms to fill in a lot lost to graduation but the offense should roll again UW-La Crosse was ultra-young a year ago and might be ready to jump back into contention UW-Oshkosh was disappointing in 2022 but still has a good amount of talent to get back in the mix atop this conference Don't sleep on Eau Claire; that's a program clearly on the rise We conclude (at 1:01:10) with our player and pitcher of the year picks, and our teams to beat. Then we say goodbye.  Follow us on Twitter @d3baseballpod (DMs are open if you want to reach us there) and @CespedesBBQ.  Send us your best D-III stories or any other comments or questions to thed3baseballpodcast@gmail.com Subscribe + rate/review on iTunes: https://itunes.apple.com/us/podcast/the-podcast-about-division-iii-baseball/id1342691759  Spotify: https://open.spotify.com/show/5uk8q4iUrMUZRriKM1Akfx?si=b6820eb311f847f1  Support us on Patreon -- this will never be behind a paywall but we appreciate any support to help cover our podcasting hosting fees and all the hours we put into making these pods possible! https://patreon.com/user?u=87461961&utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=creatorshare_fan&utm_content=join_link 

Dubuque Area Baseball Podcast
Loras Hall of Fame Member Nick Olberding

Dubuque Area Baseball Podcast

Play Episode Listen Later Jan 24, 2023 29:04


Loras baseball very own "Iron Man' and newly inducted Loras College Hall of Fame member Nick Olberding stops by the podcast to talk about his time playing baseball. Nick talks about his time growing up in Dyersville, Iowa while playing for Beckman Catholic and the Dyserville Whitehawks. He shares stories about his time at Loras and his storied career. Find Us on Social Media: Facebook and Instagram: Dubuque Area Baseball Podcast Twitter: @CoachManeman