POPULARITY
**Keywords:** machine translation, English-language science fiction magazines, Office Professional, Word, artificial intelligence, summaries, Lire magazine, Jules Verne, De Thing, Mistral Nemo Instruct, Claude, LM Studio, Gemini Pro. **Translating English-language science fiction magazines** **Summarizing articles and magazines** **Artificial intelligence running locally**
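Since these keywords describe a local-AI translation and summarization workflow (LM Studio serving a model such as Mistral Nemo Instruct), here is a minimal sketch of what that might look like. It assumes LM Studio's OpenAI-compatible local server is running on its default address `http://localhost:1234/v1`, and the model identifier `mistral-nemo-instruct-2407` is a placeholder for whatever model you actually have loaded; adjust both to your setup.

```python
# Minimal sketch: translate a magazine excerpt with a local model served by LM Studio.
# Assumptions: LM Studio's OpenAI-compatible server is running on the default port,
# and "mistral-nemo-instruct-2407" is a placeholder for the model you loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally


def translate(text: str, target_lang: str = "Spanish") -> str:
    """Ask the local model for a plain translation of one excerpt."""
    response = client.chat.completions.create(
        model="mistral-nemo-instruct-2407",  # hypothetical identifier; use the one LM Studio shows
        messages=[
            {"role": "system", "content": f"Translate the user's text into {target_lang}. Return only the translation."},
            {"role": "user", "content": text},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    excerpt = "The ship slipped past the rings of Saturn, silent as a thought."
    print(translate(excerpt))
```

The same local endpoint can be asked for summaries instead of translations by changing the system prompt.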
Ege Erdil and Tamay Besiroglu have 2045+ timelines, think the whole "alignment" framing is wrong, don't think an intelligence explosion is plausible, but are convinced we'll see explosive economic growth (economy literally doubling every year or two). This discussion offers a totally different scenario than my recent interview with Scott and Daniel. Ege and Tamay are the co-founders of Mechanize, a startup dedicated to fully automating work. Before founding Mechanize, Ege and Tamay worked on AI forecasts at Epoch AI. Watch on Youtube; listen on Apple Podcasts or Spotify.

Sponsors

* WorkOS makes it easy to become enterprise-ready. With simple APIs for essential enterprise features like SSO and SCIM, WorkOS helps companies like Vercel, Plaid, and OpenAI meet the requirements of their biggest customers. To learn more about how they can help you do the same, visit workos.com
* Scale's Data Foundry gives major AI labs access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you're an AI researcher or engineer, learn about how Scale's Data Foundry and research lab, SEAL, can help you go beyond the current frontier at scale.com/dwarkesh
* Google's Gemini Pro 2.5 is the model we use the most at Dwarkesh Podcast: it helps us generate transcripts, identify interesting clips, and code up new tools. If you want to try it for yourself, it's now available in Preview with higher rate limits! Start building with it today at aistudio.google.com.

Timestamps

(00:00:00) - AGI will take another 3 decades
(00:22:27) - Even reasoning models lack animal intelligence
(00:45:04) - Intelligence explosion
(01:00:57) - Ege & Tamay's story
(01:06:24) - Explosive economic growth
(01:33:00) - Will there be a separate AI economy?
(01:47:08) - Can we predictably influence the future?
(02:19:48) - Arms race dynamic
(02:29:48) - Is superintelligence a real thing?
(02:35:45) - Reasons not to expect explosive growth
(02:49:00) - Fully automated firms
(02:54:43) - Will central planning work after AGI?
(02:58:20) - Career advice

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Google's AI efforts & Gemini Pro 2.5 take a major step forward with updates to Deep Research, new Agent2Agent protocol (A2A) & more. Sadly, OpenAI teases o3 and o4 but delays GPT-5. Plus, Meta's new Llama 4 models are out but have issues, Midjourney v7's debut, John Carmack's smackdown of an AI video game engine hater, Gavin's deep dive into OpenAI 4o Image Generation formats & the weirdest robot horse concept you've ever seen. WE'RE DEEP RESEARCHING OUR ENTIRE LIVES RIGHT NOW Join the discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links // Google Cloud 25 Live Stream "A New Way To Cloud!" https://youtu.be/Md4Fs-Zc3tg Google Cloud Blog Post https://blog.google/products/google-cloud/next-2025/ Upgraded Deep Research Outperforms OpenAI Deep Research https://x.com/GeminiApp/status/1909721519724339226 Google's Deep Research Vs OpenAI Deep Research https://x.com/testingcatalog/status/1909727195402027183 New Ironwood TPUs https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/ Gavin's Experiences with Google Gemini Deep Research: Balatro Test: https://x.com/AIForHumansShow/status/1909813850817675424 KP Biography: https://g.co/gemini/share/7b7bdb2c400e Agent2Agent Protocol https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/ Google Paying Some AI Staff To Do Nothing Rather Than Work For Rivals https://x.com/TechCrunch/status/1909368948862181584 Solar Glow Meditations on AI http://tiktok.com/@solarglowmeditations/video/7491038509214518559?_t=ZT-8vNNgF7QpyM&_r=1 o4-mini & o3 coming before GPT-5 in shift from Sam Altman https://x.com/sama/status/1908167621624856998 OpenAI Strategic Deployment Team (new role to prep for AGI) https://x.com/aleks_madry/status/1909686225658695897 AI 2027 Paper https://ai-2027.com/ Llama 4 is here… but how good is it? https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Controversy Around Benchmarks: https://gizmodo.com/meta-cheated-on-ai-benchmarks-and-its-a-glimpse-into-a-new-golden-age-2000586433 Deep dive on issues from The Information https://www.theinformation.com/articles/llama-4s-rocky-debut?rc=c3oojq&shared=3bbd9f72303888e2 Midjourney v7 Is Here and it's… just ok? 
https://www.midjourney.com/updates/v7-alpha John Carmack Defends AI Video Games https://x.com/ID_AA_Carmack/status/1909311174845329874 Tim Sweeney Weighs In https://x.com/TimSweeneyEpic/status/1909314230391902611 New Test-time-training = 1 Min AI Video From a Single Prompt https://x.com/karansdalal/status/1909312851795411093 Kawasaki's Robot Horse Concept https://futurism.com/the-byte/kawasaki-rideable-horse-robot VIDEO: https://youtu.be/vQDhzbTz-9k?si=2aWMtZVLnMONEjBe Engine AI + iShowSpeed https://x.com/engineairobot/status/1908570512906740037 Gemini 2.5 Pro Plays Pokemon https://x.com/kiranvodrahalli/status/1909699142265557208 Prompt-To-Anything Minecraft Looking Game https://x.com/NicolasZu/status/1908882267453239323 An Image That Will Never Go Viral https://www.reddit.com/r/ChatGPT/comments/1jth5yf/asked_for_an_image_that_will_never_go_viral/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button How Toothpaste Is Made https://www.reddit.com/r/aivideo/comments/1jujzh2/how_toothpaste_is_made/ 90s Video Game 4o Image Gen Prompt https://x.com/AIForHumansShow/status/1908985288116101553 1980s Japanese Posters https://x.com/AIForHumansShow/status/1909824824677192140 Buff Superbad https://x.com/AIForHumansShow/status/1909402225488937065
Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!

Windows

The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2025 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System > About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. Everything but x86 chip design and Foundry, then

Microsoft 365

Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens

AI & Dev

NYT copyright infringement lawsuit against OpenAI and Microsoft can move forward, judge rules And now Tim O'Reilly says OpenAI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world OpenAI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something

Xbox & Games

Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC, Tip

These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June. Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!

Full YouTube Episode with Slides/Charts. Like and subscribe and hit that bell to get notifs!

Timestamps

* 00:00 Welcome to the 100th Episode!
* 00:19 Reflecting on the Journey
* 00:47 AI Engineering: The Rise and Impact
* 03:15 Latent Space Live and AI Conferences
* 09:44 The Competitive AI Landscape
* 21:45 Synthetic Data and Future Trends
* 35:53 Creative Writing with AI
* 36:12 Legal and Ethical Issues in AI
* 38:18 The Data War: GPU Poor vs. GPU Rich
* 39:12 The Rise of GPU Ultra Rich
* 40:47 Emerging Trends in AI Models
* 45:31 The Multi-Modality War
* 01:05:31 The Future of AI Benchmarks
* 01:13:17 Pionote and Frontier Models
* 01:13:47 Niche Models and Base Models
* 01:14:30 State Space Models and RWKV
* 01:15:48 Inference Race and Price Wars
* 01:22:16 Major AI Themes of the Year
* 01:22:48 AI Rewind: January to March
* 01:26:42 AI Rewind: April to June
* 01:33:12 AI Rewind: July to September
* 01:34:59 AI Rewind: October to December
* 01:39:53 Year-End Reflections and Predictions

Transcript

[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round when we first started that we didn't like, and we tried to change the question. The answer[00:00:32] swyx: was Cursor and Perplexity.[00:00:34] Alessio: Yeah, I love Midjourney. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like Tri Dao, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of got people somewhere to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. 
I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big. So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled more [00:03:00] people, right? Which is what you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on the Substack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering NeurIPS? 
Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Twenty-two hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan than it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created[00:05:09] swyx: Latent Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. And I'm really, really thankful for Jonathan Frankle, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But he was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre training has hit a wall. And data has hit a wall. 
So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in the bitter lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.[00:07:34] swyx: He said that, um, we're pre-training into a wall, or like, we've run into a different kind of wall. And then we have, you know Jonathan Frankle, Ilya, and then Noam Brown are all saying variations of the same thing, that we have hit some kind of wall in the status quo of what pre trained, scaling large pre trained models has looked like, and we need a new thing.[00:07:54] swyx: And obviously the new thing for people is, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, meaning, has a very pre trained bias, meaning that the only reason for running inference at all is to test your model.[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.[00:08:31] swyx: Chinchilla paper is compute optimal training, but what is not stated in there is it's pre-trained compute optimal training. And once you start caring about inference, compute optimal training, you have a different scaling law. And in a way that we did not know last year.[00:08:45] Alessio: I wonder, because John is, he's also on the side of attention is all you need.[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll[00:09:30] Alessio: publish[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.[00:09:39] swyx: But I think it's, uh, these are just good slides. And I've been, I've been looking for a sort of end of year recaps from, from people.[00:09:44] The Competitive AI Landscape[00:09:44] swyx: The field has progressed a lot. You know, I think the max ELO in 2023 on LMSys used to be 1200 for LMSys ELOs. 
And now everyone is at least at, uh, 1275 in their ELOs, and this is across Gemini, ChatGPT, [00:10:00] Grok, 01.[00:10:01] swyx: ai, which with their Yi-Large model, and Anthropic, of course. It's a very, very competitive race. There are multiple Frontier labs all racing, but there is a clear tier zero Frontier. And then there's like a tier one. It's like, I wish I had everything else. Tier zero is extremely competitive. It's effectively now three horse race between Gemini, uh, Anthropic and OpenAI.[00:10:21] swyx: I would say that people are still holding out a candle for XAI. XAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, XAI is continually snubbed because they don't work well with the benchmarking people.[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why XAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily OpenAI. So we have some numbers and estimates. These are from Ramp. Estimates of OpenAI market share in [00:11:00] December 2023.[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked. Harrison Chase on the LangChain episode, it was true. And then Claude 3 launched mid middle of this year. I think Claude 3 launched in March, Claude 3.5 Sonnet was in June ish.[00:11:23] swyx: And you can start seeing the market share shift towards, uh, towards Anthropic, uh, very, very aggressively. The more recent one is Gemini. So if I scroll down a little bit, this is an even more recent dataset. So Ramp's dataset ends in September 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.[00:11:58] swyx: And so they're going after [00:12:00] the lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2022, 2023, going into 2024, OpenAI has gone from 95 percent market share to, yeah, reasonably somewhere between 50 to 75 percent market share.[00:12:29] Alessio: Yeah. I'm really curious how Ramp does the attribution to the model?[00:12:32] Alessio: If it's API, because I think it's all credit card spend. Well, but it's all, the credit card doesn't say maybe. Maybe the, maybe when they do expenses, they upload the PDF, but yeah, the, the Gemini thing I think makes sense. I think that was one of my main 2024 takeaways that like. The best small model companies are the large labs, which is not something I would have thought that the open source kind of like long tail would be like the small model.[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? 
Like so small model here for Gemini is 8B, [00:13:00] right? Uh, mini. We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the one to three B size.[00:13:11] swyx: Mm-hmm . Yeah. Maybe[00:13:12] swyx: zero, maybe 0.5 B uh, that's Moondream and that is small for you then, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe one to five B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.[00:13:43] swyx: Yeah, we can keep going. This is a slide where I slightly disagree with Sarah. She's pointing to the Scale SEAL leaderboard. I think the Researchers that I talked with at NeurIPS were kind of positive on this because basically you need private test [00:14:00] sets to prevent contamination.[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama 405B does well compared to Gemini and GPT-4o. And I think that's good. I would say that. You know, it's good to have an open model that is that big, that does well on those metrics.[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the artificial analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node. of, uh, of H100s. Cerebras will be happy to tell you they can serve 405B on their super large chips.[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 405B really that relevant? Like, I think most people are basically saying that they only use 405B as a teacher model to distill down to something. Even Meta is doing it. So with Llama 3.[00:15:00] 3 launched, they only launched the 70B because they use 405B to distill the 70B.[00:15:03] swyx: So I don't know if like open source is keeping up. I think they're the, the open source industrial complex is very invested in telling you that the, if the gap is narrowing, I kind of disagree. I think that the gap is widening with O1. I think there are very, very smart people trying to narrow that gap and they should.[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart. And look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?[00:15:38] swyx: Probably not for O1 for a while. And it's really up to the open source guys to figure out if they can match O1 or not.[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Zuck can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. 
Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of computed inference too.[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama 4 will be reasoning oriented. We talked with Thomas Scialom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama 4. Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shih just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than 6.6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT 5, which would have been a larger pre train. We should have a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your growth margin and like tie your spend, or your sort of inference spend, accordingly. And it's just really interesting to, that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the Mixtral price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the Pro ChatGPT.[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is a, at least AI content creator or VC or, you know, someone who, whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use. The good thing is that actually Google lets you do a lot of stuff for free now. So like Deep Research, that they just launched, uses a ton of inference and it's, it's free while it's in preview.[00:18:45] Alessio: Yeah. They need to put that in Lindy.[00:18:47] Alessio: I've been using Lindy lately. I've built a bunch of things once we had flows because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy voice. 
Yeah, I think once [00:19:00] they get advanced voice mode like capability, today it's still like speech to text, you can kind of tell.[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think we will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but that suffice to say that there is a lot of progress in each field.[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one. It's very hard to stay on top of SWE-bench.[00:19:45] swyx: OpenHands is currently still number one on SWE-bench Full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.[00:20:03] swyx: And so I'll highlight that talk as well.[00:20:05] Alessio: Yeah. The number six, which is having agents learn more about the environment, has been super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.[00:20:23] Alessio: So, yeah, there's not indexing and RAG. Well, yeah, but it's more like, you can't really RAG things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this like, you know, Oh, institutional knowledge. Yeah, the boring word is kind of like a business process extraction.[00:20:38] Alessio: Yeah yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, Yeah, the agents are, that most people are building are good at following instruction, but are not as good at, like, extracting them from you. Um, so I think that will be a big unlock just to touch quickly on the Jeff Dean thing.[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but. I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKV on the podcast before, like he's doing a lot of that with Featherless AI.[00:21:12] swyx: Everyone is. I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feedback into ML. 
And I think this ties in with Loubna's talk.[00:21:45] Synthetic Data and Future Trends[00:21:45] swyx: on synthetic data where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre training side, the post training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about AlphaChip. And most of us in the audience are like, we're not working on hardware, man.[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But[00:22:23] swyx: for me, it's just like bonus for Latent Space supporters, because I feel like they haven't been getting anything.[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since. The blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024. He gave some hints and then we have vague indications of what he saw in 2023.[00:22:54] swyx: So that was the Oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Shivon Zilis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.[00:23:37] Alessio: Yeah,[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.[00:24:08] Alessio: And we'll talk about the OpenAI reinforcement fine tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Ilya will be remembered for. And I think just maybe to clarify on like the pre training is over thing that people love to tweet. I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.[00:24:35] Alessio: So it's not necessarily pre training is over. It's kind of like, what got us here won't get us there. 
In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine-tuning is basically like, instead of fine-tuning on data, you fine-tune on a reward model.[00:25:09] Alessio: So it's basically like, instead of being data driven, it's task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes how many people fine-tune, because I think this is what people run into. It's like, oh, you can fine-tune Llama. And it's like, okay, where do I get the data to fine-tune it on?[00:25:27] Alessio: You know, so it's great that we're moving the thing. And then I really liked that he had this chart where, you know, the brain mass and the body mass thing is basically like, mammals scaled linearly by brain and body size, and then humans kind of broke off the slope. So it's almost like maybe the mammal slope is the pre-training slope,[00:25:46] Alessio: and then the post-training slope is the human one.[00:25:49] swyx: Yeah. I wonder what the, I mean, we'll know in 10 years, but I wonder what the y-axis is for Ilya's SSI. We'll try to get them on.[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here. Yeah, and then he had, you know, what comes next, like agents, synthetic data, inference compute. I thought all of that was, like,[00:26:05] Alessio: I don't[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.[00:26:07] Alessio: Yeah. Any other NeurIPS highlights?[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that my friend Yi made this nice little thing. Yeah, that was really[00:26:20] swyx: nice.[00:26:20] swyx: She called it must-read papers of 2024.[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, a little guidance and visualizations. And so, uh, I thought it was really super nice that we got there.[00:26:38] Alessio: Should we do a Latent Space book for each year? Uh, I thought about it. For each year we should.[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organizer, our Jamie. You need to[00:26:52] swyx: pull up more things. One thing I saw that, uh, okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.[00:27:06] swyx: This is Secret Collusion among AI Agents: Multi-Agent Deception via Steganography. I tried to go to NeurIPS in order to find these kinds of papers, because, the real reason is, NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.[00:27:22] swyx: And then also the people who do go end up crowding around the most popular papers, which you already know and have already read before you showed up to NeurIPS.
So the only reason you go there is to talk to the paper authors, but there's something like 10,000 other papers out there that, you know, are just people's work that they did all year and failed to get attention for, for one reason or another.[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a DeepMind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.[00:27:57] swyx: If you pick those out, you know, they encode a [00:28:00] different message than the surface text. But something I've always emphasized is, to LLMs, we read left to right. LLMs can read up, down, sideways, you know, in random character order, and it's the same to them as it is to us. So if we were ever to get, you know, self-motivated, unaligned LLMs that were trying to collaborate to take over the planet,[00:28:19] swyx: this would be how they do it. They'd spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for cipher encoding, GPT-2, Llama 2, Mixtral, GPT-3.5 show zero capability, and then suddenly GPT-4 can do it.[00:28:40] swyx: And this is the kind of Jason Wei type emergent properties that people kind of look for. I think what made this paper stand out as well is that he developed the benchmark for steganographic collusion, and he also focused on Schelling point collusion, which is very low coordination. For agreeing on an encoding-decoding format, you kind of need to have some [00:29:00] agreement on that.[00:29:00] swyx: But Schelling point means very, very low or almost no coordination. So for example, if the only message I give you is "meet me in New York" and you're not told where or when, you would probably meet me at Grand Central Station. Grand Central Station is a Schelling point,[00:29:16] swyx: and it's probably sometime around the middle of the day. The Schelling point of New York is Grand Central. To that extent, Schelling points for steganography are things like the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail. I[00:29:41] swyx: found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy from the GPU company is[00:29:57] Alessio: talking[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just datasets.[00:30:12] swyx: This is all the grad students are working on. So like there was a datasets track, and then I was looking around like, you don't need a datasets track, because every paper is a datasets paper. And so datasets and benchmarks, they're kind of flip sides of the same thing. So yeah. Cool.
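The first-letter trick mentioned above is easy to demonstrate. This is a minimal sketch of that acrostic-style encoding for illustration only; it is not the benchmark or method from the DeepMind paper, and the cover-word table and helper names are invented.

```python
# Hide a short message in the first letter of each word of a cover text (acrostic steganography).
COVER_WORDS = {
    "h": "hello", "i": "indeed", "r": "really", "u": "usually",
    "n": "nobody", "o": "obviously", "w": "wins",
}

def encode(secret: str) -> str:
    """Build an innocuous-looking word sequence whose initials spell the secret."""
    return " ".join(COVER_WORDS[ch] for ch in secret.lower() if ch in COVER_WORDS)

def decode(cover: str) -> str:
    """Recover the secret by reading the first letter of every word."""
    return "".join(word[0] for word in cover.split())

stego = encode("run")
print(stego)          # "really usually nobody"
print(decode(stego))  # "run"
```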
Yeah, if you're a grad student, you're GPU poor, you kind of work on that.[00:30:30] swyx: And then the sort of big-model people walk around and pick the ones that they like, and then they use them in their models. And that's kind of how it develops. I feel like, um, last year you had people like Haotian, who worked on LLaVA, which is take Llama and add vision.[00:30:47] swyx: And then obviously xAI hired him and he added vision to Grok. Now he's the vision Grok guy. This year, I don't think there was any of those.[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Monarch Mixer, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.[00:31:16] swyx: I will say that I felt the oral picks this year were not very good. Either that, or maybe it's just that I have changed in terms of how I view papers.[00:31:29] swyx: So like, in my estimation, two of the best papers this year for datasets are DataComp and RefinedWeb or FineWeb. These are two actually industrially used papers, not highlighted for awards. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.[00:31:48] swyx: But one thing that does get a lot of play, that a lot of people are debating, is the role of schedules. This is the Schedule-Free optimizer paper from Meta, from Aaron Defazio. And this [00:32:00] year in the ML community, there's been a lot of chat about Shampoo, SOAP, all the bathroom amenities for optimizing your learning rates.[00:32:08] swyx: And, uh, most people at the big labs who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular. 4 Wars[00:32:19] Alessio: of AI recap maybe, just quickly. Um, where do you want to start? Data?[00:32:26] swyx: So to remind people, this is the 4 Wars piece that we did as one of our earlier recaps of this year.[00:32:31] swyx: And the belligerents are, on the left, journalists, writers, artists, anyone who owns IP basically: New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George RR Martin. Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing OpenAI, basically. I actually wanted to get a snapshot of all the lawsuits.[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Loubna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between Scale AI and the synthetic data community, because Scale[00:33:09] swyx: AI published a paper saying that synthetic data doesn't work. Surprise, surprise, Scale AI is the leading vendor of non-synthetic data.
Only[00:33:17] Alessio: cage-free annotated data is useful.[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Loubna's talk, makes sense.[00:33:32] swyx: I don't know if you have any perspectives there.[00:33:34] Alessio: I think, again, going back to the reinforcement fine-tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine-tuning a smaller model from a larger model.[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of, like, the Rephrasing the Web thing that Apple also did. But yeah, I think it'll be useful. I think whether or not that gets us the big [00:34:00] next step, I think that's maybe TBD, you know. I think people love talking about data because it's like a GPU-poor thing, you know; synthetic data is something that people can do, so they feel more opinionated about it compared to, yeah, the optimizer stuff, which is like,[00:34:17] swyx: they don't[00:34:17] Alessio: really work[00:34:18] swyx: on.[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club the STaR series of papers. So that's STaR, Q*, V-STaR. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT API that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.[00:34:49] swyx: Basically, it feels like a way to create valid synthetic data for them to fine-tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between, let's say, the music industry or the newspaper publishing industry or the textbook industry and the big labs,[00:35:11] swyx: it's all about the pre-training era. And then in the new era, the reasoning era, nobody has any problem with all the reasoning, especially because it's all sort of math and science oriented, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?[00:35:27] swyx: We've been using o1 for, and I would say like, for summarization and creative writing and instruction following, I think it's underrated. I started using o1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics. You know, I can actually say like, I think one of the o1 pro demos,[00:35:46] swyx: all of these things that Noam was showing, was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?[00:35:53] Creative Writing with AI[00:35:53] swyx: So like, literally just anything, instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.[00:36:02] swyx: And so no surprise, when I ask it to rhyme and to create song lyrics, it's going to do that very much better than previous models.
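Stepping back to the grader-based idea from a moment ago, a rough sketch of that pattern is: sample candidate reasoning traces, keep only the ones a programmatic grader accepts, and use the survivors as synthetic fine-tuning data. The sample_reasoning stub and the exact-match grader below are placeholders, not OpenAI's actual RFT API or the precise STaR recipe.

```python
import random

def sample_reasoning(question: str) -> tuple[str, str]:
    """Placeholder for an LLM call that returns (chain_of_thought, final_answer)."""
    guess = str(random.choice([41, 42, 43]))
    return f"Working through {question!r} step by step...", guess

def grader(final_answer: str, gold: str) -> float:
    """Programmatic grader: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if final_answer.strip() == gold.strip() else 0.0

def build_synthetic_dataset(problems, samples_per_problem=8):
    """Rejection-sample reasoning traces and keep only the ones the grader accepts."""
    dataset = []
    for question, gold in problems:
        for _ in range(samples_per_problem):
            trace, answer = sample_reasoning(question)
            if grader(answer, gold) == 1.0:
                dataset.append({"prompt": question, "completion": trace + f"\nAnswer: {answer}"})
                break  # one verified trace per problem is enough for this sketch
    return dataset

problems = [("What is 6 * 7?", "42"), ("What is 20 + 22?", "42")]
print(build_synthetic_dataset(problems))
```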
So I think it's underrated for creative writing.[00:36:11] Alessio: Yeah.[00:36:12] Legal and Ethical Issues in AI[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of o1, but then they want us to, like, they're getting sued for using other publishers' data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.[00:36:29] Alessio: So I'm curious to see how that kind of comes out. Yeah, I mean, OpenAI has[00:36:32] swyx: many ways to punish people without taking them to court. They already banned ByteDance for distilling their info. And so anyone caught distilling the chain of thought will just be disallowed from continuing on the API.[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chains of thought are pretty well hidden. Like you have to work very, very hard to get it to leak. And then even when it leaks the chain of thought, you don't know if it's... [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosine, which we talked about, we talked to them on Dev Day, can just fine-tune 4o[00:37:13] swyx: to beat o1. Claude Sonnet so far is beating o1 on coding tasks, at least o1-preview, without being a reasoning model, same for Gemini Pro or Gemini 2.0. So like, how much is reasoning important? How much of a moat is there in this, like, all of this proprietary sort of training data that they've presumably accumulated?[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months' notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be interesting, because there'll be a lot of noise from people who say they have inference time compute and actually don't, because they just have fancy chain of thought.[00:38:00][00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.[00:38:18] The Data War: GPU Poor vs. GPU Rich[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was a time in which the GPU poor were kind of like the rebel faction working on these models that were open and small and cheap. And I think today people don't really care as much about GPUs anymore.[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market has kind of plummeted, because people don't want to be GPU poor, they want to be GPU free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war? You[00:38:52] swyx: can tell me about this, but like, I feel like the appetite for GPU rich startups, like the, you know, the funding plan is we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA.[00:39:01] swyx: That is gone, right?
Like, no one's pitching that. This was literally the plan, the exact plan, of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.[00:39:12] The Rise of GPU Ultra Rich[00:39:12] swyx: But I think the GPU ultra rich, the GPU ultra high net worth, is still going. So, um, now, you know, we had Leopold's essay on the trillion dollar cluster.[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, xAI very famously, you know, Jensen Huang praising them for being best boy number one in spinning up a 100,000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that, because I think partially it's an article of faith now that you just need it.[00:39:46] swyx: Like you don't even know what you're going to use it for. You just need it. And it makes sense, especially if we're going into more researchy territory than we were. So let's say 2020 to 2023 was [00:40:00] "let's scale big models" territory, because we had GPT-3 in 2020 and we were like, okay, we'll go from[00:40:05] swyx: 175B to 1.8T. And that was GPT-3 to GPT-4. Okay, that's done. As far as everyone is concerned, Opus 3.5 is not coming out, GPT-4.5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion parameter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from the training costs, from the amount of data, or at least the inference.[00:40:36] swyx: Would you pay 10x the price of GPT-4? Probably not. Like, you want something else that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.[00:40:47] Emerging Trends in AI Models[00:40:47] swyx: And so when it's more researchy, then you actually need more just general purpose compute to mess around with, uh, at the exact same time that production deployments of the previous paradigm are still ramping up,[00:40:58] swyx: um,[00:40:58] swyx: uh, pretty aggressively.[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both Together and Fireworks and Replicate. Uh, we haven't done Anyscale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in the sense that, at re:Invent, I wasn't expecting them to do so well, but they are now a foundation model lab.[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think a lot of AWS customers, you know, they do these big reserved instance contracts and now they've got to use their money. That's why so many startups[00:41:37] Alessio: get bought through the AWS marketplace, so they can kind of bundle them together and get preferred pricing.[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud.
Yes, you know, and I think there might be a future, once we kind of figure out what the size and shape of these models is, where the tinybox and these things come to fruition, where you can be GPU poor at home. But I think today it's like, why are you working so hard to get these models to run on very small clusters when it's so cheap to run them?[00:42:21] Alessio: Yeah, yeah,[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like Nous Research, probably the most deep tech thing they've done this year is DisTrO, or whatever the new name is.[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de-emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF Compute launched their compute marketplace this year, and like, who's really using that?[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say, a lot more on-device. We are, I now have Apple Intelligence on my phone.[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multimodal.[00:43:25] Alessio: Yeah, the notification summaries are so-so in my experience.[00:43:29] swyx: Yeah, but they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, it's still feature flagged, but you can try it now if you use the alpha.[00:43:40] swyx: And so, like, I think, you know, we're getting the sort of GPU poor version of a lot of these things coming out, and I think it's quite useful. Like Windows as well, rolling out RWKV in sort of every Windows deployment is super cool. And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab or a GPU cloud.[00:44:10] swyx: GPU cloud, it would be Suno. Suno, Ramp has rated as one of the top-ranked, fastest-growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR, and Suno runs on Modal. So Suno itself is not GPU rich, but they're just doing the training on Modal, uh, who we've also talked to on the podcast.[00:44:31] swyx: The other one would be Bolt, straight cloud wrapper. And, um, again, another, now they've announced 20 million ARR, which is another step up from the 8 million that we put in the title. So yeah, I mean, it's crazy that all these GPU poors are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're [00:45:00] number one on the machines, or you're close to the customers, and you're number one on the customer side.[00:45:03] swyx: And the people who are in the middle, Inflection, um, Character, didn't do that great.
I think Character did the best of all of them. Like, you have a note in here that we apparently said that Character's price tag was[00:45:15] Alessio: 1B.[00:45:15] swyx: Did I say that?[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.[00:45:20] Alessio: Then they paid 2.7 billion. I mean, for like,[00:45:22] swyx: yeah.[00:45:22] Alessio: What do you pay for Noam? Like, I don't know what the game plan was. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.[00:45:31] The Multi-Modality War[00:45:31] Alessio: Multimodality war. And this one, we never had text-to-video in the first version, which now is the hottest.[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now Veo 2 just came out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't working, yeah. I[00:45:54] swyx: think it's generally available now, you can go to sora.[00:45:56] swyx: com and try it. Yeah, they had[00:45:58] Alessio: the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video-01-Live?[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from MiniMax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Don't hold me to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think, you know, on the image side, I think it's still open.[00:46:45] Alessio: Yeah, I mean,[00:46:46] swyx: ElevenLabs is now a unicorn. So basically, what is the multimodality war? The multimodality war is, do you specialize in a single modality, right? Or do you have a God model that does all the modalities? So this is [00:47:00] definitely still going, in the sense that ElevenLabs, you know, is now a unicorn, Pika Labs is doing well, they launched Pika 2.[00:47:06] swyx: 0 recently, HeyGen I think has reached 100 million ARR, AssemblyAI, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups. And then there's the big labs who are doing the sort of all-in-one play.[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's hard to keep up. Literally they launched this last week, and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can do code gen and all that. But the new one that OpenAI and Meta both have but haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.[00:48:02] swyx: They can generate you that modification exactly as you asked.
So there's no need for the Stable Diffusion or ComfyUI workflow of, like, mask here and then infill there, inpaint there, and all that stuff. This is small model nonsense. Big model people are like, huh, we've got you, it's all in the transformer.[00:48:21] swyx: This is the multimodality war, which is, do you bet on the God model, or do you string together a whole bunch of small models like a chump? Yeah,[00:48:29] Alessio: I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the moat is kind of like, maybe, you know, people say the Black Forest models are better than Midjourney on a pixel-by-pixel basis. But I think when you put it together, have you tried[00:48:53] swyx: the same problems on Black Forest?[00:48:55] Alessio: Yes. But the problem is just, you know, on Black Forest, it generates one image. And then it's like, you've got to [00:49:00] regenerate. You don't have all these UI things. Like what I do, no, but it's like a time issue, you know, it's like a mid[00:49:06] swyx: journey. Call the API four times.[00:49:08] Alessio: No, but then there's no, like, variations.[00:49:10] Alessio: Like the good thing about Midjourney is, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying Midjourney, so it's a Black Forest skill issue, because I'm not paying them, you know?[00:49:24] Alessio: Yeah,[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, Recraft has come out, uh, on top of the image arena that, uh, Artificial Analysis has done, has apparently taken Flux's place. Is this still true?[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI News issues of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1.1. Which is very surprising, [00:50:00] because Flux and Black Forest Labs are the old Stable Diffusion crew who left Stability after, um, the management issues.[00:50:06] swyx: So Recraft has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs, because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a thin wrapper.[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own. I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multimodality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.[00:50:42] swyx: But the big model people are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request an edit and it does it, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something.
And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Or at least the Hollywood labs will get full Sora. We haven't seen video-to-audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be solved.[00:51:23] swyx: I would say that Gemini's approach, compared to OpenAI, Gemini's, or DeepMind's, approach to video seems a lot more fully fledged than OpenAI's. Because if you look at the ICML recap that I published, that so far nobody has listened to, um, well, people have listened to it, it's just a different, definitely different audience.[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything is in there. Uh, so DeepMind is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years' advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] diffusion transformers last year, you know, when they hired, uh, Bill Peebles.[00:52:03] swyx: So, DeepMind has a bit of an advantage here, I would say, in showing, like, the reason that Veo 2, well, one, they cherry-pick their videos, so obviously it looks better than Sora, but the reason I would believe that Veo 2, uh, when it's fully launched, will do very well is because they have all this background work in video that they've done for years.[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for people who are dedicated fans, they can go to NeurIPS 2023 and see that paper.[00:52:32] Alessio: And then last but not least, the LLM OS. We renamed it to RAG/Ops, formerly known as the[00:52:39] swyx: RAG/Ops War. I put the latest chart on the Braintrust episode.[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there, because Hacker News people like to read and not listen.[00:52:58] Alessio: So episode essays,[00:52:59] swyx: I remember [00:53:00] publishing them separately. You say LangChain, LlamaIndex are still growing.[00:53:03] Alessio: Yeah, so I looked at the PyPI stats, you know. I don't care about stars. On PyPI you see, do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, LangChain still growing.[00:53:20] Alessio: These are the last six months. LlamaIndex still growing. What I've basically seen is that things that, one, obviously these things have a commercial product, so there's people buying this and sticking with it versus kind of hopping in between things, versus, you know, for example, CrewAI, not really growing as much.[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4
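Since the comparison above leans on PyPI downloads rather than GitHub stars, here is a small sketch of pulling those numbers programmatically, assuming the public pypistats.org JSON API; the endpoint path and response fields are an assumption, so check the service's docs before relying on them.

```python
import requests

def monthly_downloads(package: str) -> int:
    """Fetch last-month download counts from pypistats.org (assumed endpoint and schema)."""
    url = f"https://pypistats.org/api/packages/{package}/recent"
    data = requests.get(url, timeout=10).json()
    return data["data"]["last_month"]

# Compare the packages discussed above by actual installs rather than stars.
for pkg in ["langchain", "llama-index", "crewai"]:
    print(pkg, monthly_downloads(pkg))
```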
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Inference-Only Debate Experiments Using Math Problems, published by Arjun Panickssery on August 6, 2024 on The AI Alignment Forum. Work supported by MATS and SPAR. Code at https://github.com/ArjunPanickssery/math_problems_debate/. Three measures for evaluating debate are 1. whether the debate judge outperforms a naive-judge baseline where the naive judge answers questions without hearing any debate arguments. 2. whether the debate judge outperforms a consultancy baseline where the judge hears argument(s) from a single "consultant" assigned to argue for a random answer. 3. whether the judge can continue to supervise the debaters as the debaters are optimized for persuasiveness. We can measure whether judge accuracy increases as the debaters vary in persuasiveness (measured with Elo ratings). This variation in persuasiveness can come from choosing different models, choosing the best of N sampled arguments for different values of N, or training debaters for persuasiveness (i.e. for winning debates) using RL. Radhakrishan (Nov 2023), Khan et al. (Feb 2024), and Kenton et al. (July 2024) study an information-gap setting where judges answer multiple-choice questions about science-fiction stories whose text they can't see, both with and without a debate/consultancy transcript that includes verified quotes from the debaters/consultant. Past results from the QuALITY information-gap setting are seen above. Radhakrishnan (top row) finds no improvement to judge accuracy as debater Elo increases, while Khan et al. (middle row) and Kenton et al. (bottom row) do find a positive trend. Radhakrishnan varied models using RL while Khan et al. used best-of-N and critique-and-refinement optimizations. Kenton et al. vary the persuasiveness of debaters by using models with different capability levels. Both Khan et al. and Kenton et al. find that in terms of judge accuracy, debate > consultancy > naive judge for this setting. In addition to the information-gap setting, consider a reasoning-gap setting where the debaters are distinguished from the judge not by their extra information but by their stronger ability to answer the questions and explain their reasoning. Kenton et al. run debates on questions from MMLU, TruthfulQA, PrOntoQA (logical reasoning), GQPA, and GSM8K (grade-school math). For the Elo-calculation experiments they use Gemini Pro 1.0 and Pro 1.5 judges with five debaters: Gemma7B, GPT-3.5, Gemini Pro 1.0, Gemini Pro 1.5 (all with best-of-N=1), and Gemini Pro 1.5 with best-of-N=4. They find (top row) that debate slightly outperforms consultancy but outperforms the naive-judge baseline for only one of the four judges; they don't find that more persuasive debaters lead to higher judge accuracy. We get similar results (bottom row), specifically by 1. Generating 100 wrong answers and proofs to GSM8K questions to create binary-choice questions. 2. Computing the judge accuracy in naive, consultancy, and single-turn debate settings using four judges (Llama2-7B, Llama3-8B, GPT-3.5 Turbo, and GPT-4o) and seven debaters (Claude-3.5 Sonnet, Claude-3 Sonnet, GPT-3.5 Turbo, GPT-4o, Llama2-13B, Llama2-7B, and Llama3-8B). 3. Generating Elo scores from round-robin matchups between the seven models, using the same method as Kenton et al. We basically replicate the results. We find that 1. 
Debate doesn't consistently outperform the naive-judge baseline, and only slightly outperforms the consultancy baseline. 2. The positive relationship between debater persuasiveness and judge accuracy seen in the information-gap setting doesn't transfer to the reasoning-gap setting. (Results are shown below colored by debater rather than by judge). We also find some evidence of a self-preference bias (Panickssery et al., Apr 2024) where debaters have a higher Elo rating when judged by similar models. The GPT-...
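As a rough illustration of step 3 above, here is a minimal Elo calculation over round-robin win/loss records. The match list is made up, and the papers fit ratings from aggregated win rates rather than this simple sequential update, so treat it as a sketch of the idea only.

```python
from collections import defaultdict

def elo_ratings(matches, k=16, base=1000.0):
    """Sequential Elo updates over (winner, loser) pairs; a simplification of the
    likelihood-based fits used in the debate papers."""
    ratings = defaultdict(lambda: base)
    for winner, loser in matches:
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += k * (1.0 - expected_win)
        ratings[loser] -= k * (1.0 - expected_win)
    return dict(ratings)

# Hypothetical round-robin results: each pair of debaters judged on a shared question set.
matches = [
    ("claude-3.5-sonnet", "llama2-7b"),
    ("gpt-4o", "llama2-13b"),
    ("claude-3.5-sonnet", "gpt-3.5-turbo"),
    ("gpt-4o", "gpt-3.5-turbo"),
    ("llama3-8b", "llama2-7b"),
]
print(elo_ratings(matches))
```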
Have you ever stumbled upon an article or a piece of content online and wondered, "Did someone actually write this, or is it the work of ChatGPT?" In today's world, where content is produced at an incredible pace, it's becoming increasingly difficult to tell the difference, and that's a problem in the age of misinformation. Think about it: people are getting their news on social media, X, YouTube or Facebook! With the advancements of AI, it's hard to tell whether something online is truly authentic. With recent studies showing that more than 12% of Google's search results are AI-generated, it's critical to ensure the integrity of the digital content we consume and create. That's where Originality AI comes in! We're thrilled to host Jon Gillham, founder and CEO, on Things Have Changed as he shares how his team is tackling these issues head-on by developing cutting-edge tech to detect AI-generated content. In a short span of time, Originality AI has achieved remarkable results and is the most accurate AI detector on the market for ChatGPT, GPT-4o, Gemini Pro, Claude 3, Llama 3, etc. So today on Things Have Changed, we'll dive deep into how Originality AI works, its impact on various industries, and why ensuring content authenticity is more important than ever. The Growth Gear: Explore business growth and success strategies with Tim Jordan on 'The Growth Gear.' Listen on: Apple Podcasts, Spotify. PodMatch: Automatically Matches Ideal Podcast Guests and Hosts For Interviews. Support the Show. Things Have Changed
AMD reveals surprising details about the Ryzen AI 300 processor; the French competition authority investigates Nvidia; Google DeepMind demonstrates robot navigation powered by Gemini Pro; and AI models memorize rather than reason. https://www.heise.de/thema/KI-Update https://pro.heise.de/ki/ https://www.heise.de/newsletter/anmeldung.html?id=ki-update https://www.heise.de/thema/Kuenstliche-Intelligenz https://the-decoder.de/ https://www.heiseplus.de/podcast https://www.ct.de/ki
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On scalable oversight with weak LLMs judging strong LLMs, published by Zachary Kenton on July 8, 2024 on The AI Alignment Forum. Abstract Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a human judge; consultancy, where a single AI tries to convince a human judge that asks questions; and compare to a baseline of direct question-answering, where the human judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies. Twitter thread Setup We evaluate on three types of task. Extractive, where there is a question, two answer options and a source article to extract from, and information-asymmetry, meaning that judges don't get to see the article. Closed, where there is just a question and two answer options. Multimodal, where the questions involve both text and images, and two answer options. Our tasks are summarised in the following table: We consider six protocols: Consultancy, where a single AI is assigned the correct/incorrect answer (with probability 50/50) and tries to convince a judge that asks questions; Open consultancy, which is similar except the AI chooses which answer to argue for. Debate, where two AIs compete to convince a judge; Open debate, which is identical except one debater, marked the protagonist, chooses which answer to argue for. We compare to direct QA protocols: QA without article, where the judge directly answers the question; QA with article, (only on extractive tasks) where the judge directly answers the question given the article. For judge models we use Gemma7B (V1), GPT-3.5, Gemini Pro 1.0 and Gemini Pro 1.5. Our main experiments use Gemini Pro 1.5 as debaters/consultants. Assigned-role results We first look at assigned-role protocols, consultancy and debate, meaning that the consultants/debaters do not get to choose which side to argue for. We compare these to the two direct QA protocols. Findings: We find that debate consistently outperforms consultancy across all tasks, previously only shown on a single extractive QA task in Khan et al., 2024. See paper details for significance levels. 
Comparing debate to direct question answering baselines, the results depend on the type of task: In extractive QA tasks with information asymmetry, debate outperforms QA without article as in the single task of Khan et al., 2024, but not QA with article. For other tasks, when the judge is weaker than the debaters (but not too weak), we find either small or no advantage to debate over QA without article. Changes to the setup (number of turns, best-of-N sampling, few-shot, chain-of-thought) seem to have little effect on results. See paper for figures showing this. ...
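To make the protocol comparison above concrete, here is a bare-bones sketch of scoring a judge under naive QA, consultancy, and single-turn debate. The llm function is a stand-in for whichever judge and debater models you would actually call, and the prompts are simplified far beyond what the paper uses.

```python
import random

def llm(role: str, prompt: str) -> str:
    """Placeholder model call: judges return 'A' or 'B'; debaters/consultants return an argument."""
    random.seed(hash((role, prompt)) % (2 ** 32))
    if role == "judge":
        return random.choice(["A", "B"])
    return f"[{role} argues, based on: {prompt[:40]}...]"

def judge_accuracy(questions, protocol: str) -> float:
    """questions: list of (question, option_a, option_b, correct_label)."""
    correct = 0
    for q, a, b, gold in questions:
        if protocol == "naive":
            context = ""  # judge answers outright, no arguments shown
        elif protocol == "consultancy":
            side = random.choice(["A", "B"])  # consultant assigned a random answer to defend
            context = llm("consultant", f"Argue that option {side} answers: {q}")
        elif protocol == "debate":
            context = (llm("debater_A", f"Argue for option A ({a}) to: {q}") + "\n"
                       + llm("debater_B", f"Argue for option B ({b}) to: {q}"))
        verdict = llm("judge", f"{q}\nA) {a}\nB) {b}\n{context}\nAnswer A or B.")
        correct += (verdict == gold)
    return correct / len(questions)

qs = [("Is 17 prime?", "yes", "no", "A"), ("Is 21 prime?", "yes", "no", "B")]
for p in ["naive", "consultancy", "debate"]:
    print(p, judge_accuracy(qs, p))
```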
Our 173rd episode with a summary and discussion of last week's big AI news! With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris) See full episode notes here. Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form. Email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai In this episode of Last Week in AI, we explore the latest advancements and debates in the AI field, including Google's release of Gemini 1.5, Meta's upcoming LLaMA 3, and Runway's Gen 3 Alpha video model. We discuss emerging AI features, legal disputes over data usage, and China's competition in AI. The conversation spans innovative research developments, cost considerations of AI architectures, and policy changes like the U.S. Supreme Court striking down Chevron deference. We also cover U.S. export controls on AI chips to China, workforce development in the semiconductor industry, and Bridgewater's new AI-driven financial fund, evaluating the broader financial and regulatory impacts of AI technologies. Timestamps + links: (00:00:00) Intro / Banter Tools & Apps(00:03:24) Google opens up Gemini 1.5 Flash, Pro with 2M tokens to the public (00:08:47) Meta is about to launch its biggest Llama model yet — here's why it's a big deal (00:12:38) Runway's Gen-3 Alpha AI video model now available – but there's a catch (00:16:28) This is Google AI, and it's coming to the Pixel 9 (00:17:30) AI Firm ElevenLabs Sets Audio Reader Pact With Judy Garland, James Dean, Burt Reynolds and Laurence Olivier Estates (00:20:06) Perplexity's ‘Pro Search' AI upgrade makes it better at math and research (00:23:12) Gemini's data-analyzing abilities aren't as good as Google claims Applications & Business(00:26:38) Quora's Chatbot Platform Poe Allows Users to Download Paywalled Articles on Demand (00:32:04) Huawei and Wuhan Xinxin to develop high-bandwidth memory chips amid US restrictions (00:34:57) Alibaba's large language model tops global ranking of AI developer platform Hugging Face (00:39:01) Here comes a Meta Ray-Bans challenger with ChatGPT-4o and a camera (00:43:35) Apple's Phil Schiller is reportedly joining OpenAI's board (00:47:26) AI Video Startup Runway Looking to Raise $450 Million Projects & Open Source(00:48:10) Kyutai Open Sources Moshi: A Real-Time Native Multimodal Foundation AI Model that can Listen and Speak (00:50:44) MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation (00:53:47) Anthropic Pushes for Third-Party AI Model Evaluations (00:57:29) Mozilla Llamafile, Builders Projects Shine at AI Engineers World's Fair Research & Advancements(00:59:26) Researchers upend AI status quo by eliminating matrix multiplication in LLMs (01:05:55) AI Agents That Matter (01:12:09) WARP: On the Benefits of Weight Averaged Rewarded Policies (01:17:20) Scaling Synthetic Data Creation with 1,000,000,000 Personas (01:24:16) Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization Policy & Safety(01:26:32) With Chevron's demise, AI regulation seems dead in the water (01:33:40) Nvidia to make $12bn from AI chips in China this year despite US controls (01:37:52) Uncle Sam relies on manual processes to oversee restrictions on Huawei, other Chinese tech players (01:40:57) U.S. 
government addresses critical workforce shortages for the semiconductor industry with new program (01:42:42) Bridgewater starts $2 billion fund that uses machine learning for decision-making and will include models from OpenAI, Anthropic and Perplexity (01:47:57) Outro
In this episode, Tony Safoian interviews Mario Ciabarra, the CEO and founder of Quantum Metric. They discuss Mario's background and journey as an entrepreneur, as well as the evolution of Quantum Metric and its product. They highlight the importance of understanding and listening to customers to improve digital experiences. They also introduce the concept of Generative AI and how it is being implemented in the Quantum Metric platform. The conversation explores the potential of generative AI in improving customer experiences and driving business growth. It highlights the importance of real-time data analysis and the ability to understand and address customer friction points. The use of Google Cloud Platform (GCP) and Gemini Pro is discussed as a powerful solution for leveraging generative AI. The conversation also emphasizes the value of partnerships and the role of data in determining winners and losers in the market. The future of the industry is predicted to involve faster disruption cycles and a focus on having the right data at the right moment. Don't miss this insightful episode filled with personal anecdotes and cutting-edge technological discussions. Tune in now, and remember to LIKE, SHARE, & SUBSCRIBE for more! Podcast Library YouTube Playlist Host: Tony Safoian | CEO at SADA Guest: Mario Ciabarra | CEO at Quantum Metric To learn more, visit our website here: SADA.com
World Gym世界健身要在高雄左營開店囉!全新獨棟千坪健身房,配備國際級重訓、有氧健身器材,還有游泳池、三溫暖、團體課程一應俱全,豐富你的運動體驗。早鳥優惠享入會費0元,立即登記參觀領限量好禮!https://fstry.pse.is/5yrd44 —— 以上為播客煮與 Firstory Podcast 廣告 —— ------------------------------- 通勤學英語VIP加值內容與線上課程 ------------------------------- 通勤學英語VIP訂閱方案:https://open.firstory.me/join/15minstoday VIP訂閱FAQ: https://15minsengcafe.pse.is/5cjptb 社會人核心英語有聲書課程連結:https://15minsengcafe.pse.is/554esm ------------------------------- 15Mins.Today 相關連結 ------------------------------- 歡迎針對這一集留言你的想法: 留言連結 主題投稿/意見回覆 : ask15mins@gmail.com 官方網站:www.15mins.today 加入Clubhouse直播室:https://15minsengcafe.pse.is/46hm8k 訂閱YouTube頻道:https://15minsengcafe.pse.is/3rhuuy 商業合作/贊助來信:15minstoday@gmail.com ------------------------------- 以下是此單集逐字稿 (播放器有不同字數限制,完整文稿可到官網) ------------------------------- 國際時事跟讀 Ep.K791: Unveiling GPT-4o: OpenAI's Groundbreaking Multimodal Language Model Highlights 主題摘要:GPT-4o is a breakthrough multimodal language model that can handle text, audio, images, and video within a single interface, offering enhanced capabilities and performance.The model's improvements include considering tone of voice, reduced latency for real-time conversations, and integrated vision capabilities, opening up new possibilities for interactive experiences.While GPT-4o has limitations and risks, it aligns with OpenAI's mission to develop AGI and has the potential to revolutionize human-AI interactions across various contexts. OpenAI has recently unveiled GPT-4o, its latest large language model and the successor to GPT-4 Turbo. This innovative model stands out by accepting prompts in various formats, including text, audio, images, and video, all within a single interface. The "o" in GPT-4o represents "omni," reflecting its ability to handle multiple content types simultaneously, a significant advancement from previous models that required separate interfaces for different media. OpenAI 最近推出了 GPT-4o,這是其最新的大型語言模型,也是 GPT-4 Turbo 的繼任者。這個創新模型的突出之處在於它能夠接受各種格式的提示,包括文字、聲音、圖像和影片,所有這些都在一個單一的界面內。GPT-4o 中的「o」代表「omni」,反映了它能夠同時處理多種內容類型的能力,這是與之前需要為不同媒體使用單獨界面的模型相比的重大進步。 GPT-4o brings several improvements over its predecessor, GPT-4 Turbo. The model can now consider tone of voice, enabling more emotionally appropriate responses. Additionally, the reduced latency allows for near-real-time conversations, making it suitable for applications like live translations. GPT-4o's integrated vision capabilities enable it to describe and analyze content from camera feeds or computer screens, opening up new possibilities for interactive experiences and accessibility features for visually impaired users. GPT-4o 在其前身 GPT-4 Turbo 的基礎上帶來了幾項改進。該模型現在可以考慮語調,從而產生更適當情緒的回應。此外,延遲時間的縮短使其能夠進行近乎即時的對話,這使其適用於即時翻譯等應用。GPT-4o 集成的視覺功能使其能夠描述和分析來自攝影機和電腦螢幕的內容,為互動體驗和視障用戶的無障礙功能開闢了新的可能。 In terms of performance, GPT-4o has demonstrated impressive results in various benchmarks, often outperforming other top models like Claude 3 Opus and Gemini Pro 1.5. The model's multimodal training approach shows promise in enhancing its problem-solving abilities, extensive world knowledge, and code generation capabilities. As GPT-4o becomes more widely available, it has the potential to revolutionize how we interact with AI in both personal and professional contexts. 在性能方面,GPT-4o 在各種基準測試中展示了令人印象深刻的結果,通常優於其他頂級模型,如 Claude 3 Opus 和 Gemini Pro 1.5。該模型的多模態訓練方法在提高其解決問題的能力、廣泛的世界知識和代碼生成能力方面顯出極大的潛力。隨著 GPT-4o 變得更加普及,它有可能革新我們在個人和專業領域與 AI 互動的方式。 While GPT-4o represents a significant leap forward, it is not without limitations and risks. 
Like other generative AI models, its output can be imperfect, particularly when interpreting images, videos, or transcribing speech with technical terms or strong accents. There are also concerns about the potential misuse of GPT-4o's audio capabilities in creating more convincing deepfake scams. As OpenAI continues to refine and optimize this new architecture, addressing these challenges will be crucial to ensure the model's safe and effective deployment. 儘管 GPT-4o 代表了重大的躍進,但它並非沒有局限性和風險。與其他生成式 AI 模型一樣,它的輸出可能並不完美,尤其是在解釋圖像、影片或製作包含技術術語或強烈口音的語音逐字稿時。人們還擔心 GPT-4o 的語音功能可能被濫用,用於創造可信度更高的 deepfake 詐騙。隨著 OpenAI 繼續完善和優化這種新架構,解決這些挑戰將是確保該模型安全有效部署的關鍵。 The release of GPT-4o aligns with OpenAI's mission to develop artificial general intelligence (AGI) and its business model of creating increasingly powerful AI systems. As the first generation of this new model architecture, GPT-4o presents ample opportunities for the company to learn and optimize in the coming months. Users can expect improvements in speed and output quality over time, along with the emergence of novel use cases and applications. GPT-4o 的發布符合 OpenAI 開發通用人工智慧 (AGI) 的使命以及其創建越來越強大的 AI 系統的商業模式。作為這種新模型架構的第一代,GPT-4o 為該公司在未來幾個月內學習和優化提供了充足的機會。用戶可以期待速度和輸出品質隨著時間的推移而提升,以及新的使用案例和應用的出現。 The launch of GPT-4o coincides with the declining interest in virtual assistants like Siri, Alexa, and Google Assistant. OpenAI's focus on making AI more conversational and interactive could potentially revitalize this space and bring forth a new wave of AI-driven experiences. The model's lower cost compared to GPT-4 Turbo, coupled with its enhanced capabilities, positions GPT-4o as a game-changer in the AI industry. GPT-4o 的推出恰逢人們對 Siri、Alexa 和 Google Assistant 等虛擬助手的興趣下降之際。OpenAI 致力於使 AI 更具對話性和交互性,這可能會重振該領域,帶來新一波 AI 驅動的體驗。與 GPT-4 Turbo 相比,該模型的成本更低,再加上其增強的功能,使 GPT-4o 成為 AI 行業的遊戲規則改變者。 As GPT-4o becomes more accessible, it is essential for individuals and professionals to familiarize themselves with the technology and its potential applications. OpenAI offers resources such as the AI Fundamentals skill track and hands-on courses on working with the OpenAI API to help users navigate this exciting new frontier in artificial intelligence. 隨著 GPT-4o 變得更加易於獲取,個人和專業人士必須熟悉該技術及其潛在應用。OpenAI 提供了資源,如 AI 基礎技能追蹤和使用 OpenAI API 的相關實踐課程,以幫助用戶探索人工智慧的這個令人興奮的新疆土。 Keyword Drills 關鍵字:Interface (In-ter-face): The "o" in GPT-4o represents "omni," reflecting its ability to handle multiple content types simultaneously, a significant advancement from previous models that required separate interfaces for different media.Predecessor (Pred-e-ces-sor): GPT-4o brings several improvements over its predecessor, GPT-4 Turbo.Architecture (Ar-chi-tec-ture): As the first generation of this new model architecture, GPT-4o presents ample opportunities for the company to learn and optimize.Interpreting (In-ter-pre-ting): Like other generative AI models, its output can be imperfect, particularly when interpreting images, videos, or transcribing speech with technical terms or strong accents.Revitalize (Re-vi-ta-lize): OpenAI's focus on making AI more conversational and interactive could potentially revitalize this space and bring forth a new wave of AI-driven experiences. Reference article: https://www.datacamp.com/blog/what-is-gpt-4o
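As a minimal sketch of the single-interface, mixed text-and-image prompting the article describes, here is what a call might look like with the OpenAI Python SDK; the image URL is a placeholder, and model naming and availability are whatever OpenAI currently exposes.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request mixes text and an image; audio and realtime use other endpoints.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```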
Today we explore the deluge of announcements from both OpenAI and Google. With a plethora of AI features dropping at Google I/O and GPT-4o landing in ChatGPT with an AI that can be spoken to like a human, how do we determine the difference between groundbreaking AI tools and mere gimmicks? How do we discern practical applications from overhyped features? Join Guy as he navigates the latest AI developments, asking the critical question: What truly enhances our digital lives and what falls short? Links to check out: Rabbit R1 (Link: https://www.rabbit.tech/rabbit-r1) Google I/O Announcements: Coverage of the latest features and tools introduced by Google, including the Gemini Pro and video gen models. (Link: https://io.google/2024/) OpenAI's GPT-4o Announcement: Insights into the latest generative pre-trained transformer model, which emphasizes voice interaction (Link: https://openai.com/index/hello-gpt-4o/) Satlantis Project (Link: https://satlantis.com/) Welcome to the World of Audio Computers - Jason Rugolo TED talk (Link: https://tinyurl.com/4zc62nhc) Nova Project: Focus on a business-oriented AI platform that prioritizes open-source solutions and privacy for handling sensitive data. (Link: Pending) Host Links Guy on Nostr (Link: http://tinyurl.com/2xc96ney) Guy on X (Link: https://twitter.com/theguyswann) Guy on Instagram (Link: https://www.instagram.com/theguyswann/) Guy on TikTok (Link: https://www.tiktok.com/@theguyswann) Guy on YouTube (Link: https://www.youtube.com/@theguyswann) Bitcoin Audible on X (Link: https://twitter.com/BitcoinAudible) The Guy Swann Network Broadcast Room on Keet (Link: https://tinyurl.com/3na6v839) Check out our awesome sponsors! Fold: The best way to buy, use, and earn #Bitcoin on everything you do! Sats back on your debit card, gift cards, auto-buys, round-ups, you name it. Fold is the true bitcoiner's banking. Get 20K sats for FREE using referral code bitcoinaudible.com/fold Ready for best-in-class self custody? Get the Jade here and use discount code 'GUY' to get 10% off (Link: bitcoinaudible.com/jade) Trying to BUY BITCOIN? River, secure, trusted, bitcoin only, lightning enabled, simple. (Link: https://bitcoinaudible.com/river) Bitcoin Games! Get 10% off the best Bitcoin board game in the world, HODLUP! Or any of the other great games from the Free Market Kids! Use code GUY10 at checkout for 10% off your cart! (Link: https://www.freemarketkids.com/collections/games-1) Bitcoin Custodial Multisig
Join the fun at: https://thisdayinai.comSimTheory: https://simtheory.aiShow notes: https://thisdayinai.com/bookmarks/55-ep63/UDIO song: https://www.udio.com/songs/iu1381RxvjfzWznGHeVecVThanks for listening and all your support of the show!CHAPTERS:------00:00 - We're changing the name of the show00:52 - Thoughts on GPT-4o (GPT4 Omni), ChatGPT Free Vs Plus & impressions27:57 - ChatGPT Voice Mode: A Dramatic Shift? Voice as a Platform: Star Trek Vs Her34:54 - Project Astra & The Future Interface of AI Computing52:28 - Applying AI Technologies: are the next 3 years a golden age for developers implementing AI?55:23 - Do we have to become Cyborgs to find our keys?1:06:24 - Google I/O AI Recap: Google's Context Caching, Tools for Project Astra, Impressions of Gemini Pro 1.5, Gemma, Gemini Flash, Veo etc.1:37:43 - Our Favorite UDIO song of the week
OpenAI unveiled GPT-4o for ChatGPT, and Google made a raft of announcements, including Gemini Pro 1.5, at Google I/O. Two launch events, two very different vibes… GPT-4o vs Gemini. Controversies. Video games. Participants.
Infomaniak shares Tech Café's values: ethics, ecology, and respect for privacy. Discover our partner's services at Infomaniak.com. OpenAI unveiled GPT-4o for ChatGPT, and Google made several announcements, including Gemini Pro 1.5, at Google I/O. Two launch events, two very different vibes... ❤️ Patreon
Welcome to episode 257 of the Cloud Pod podcast – where the forecast is always cloudy! This week your hosts Justin, Matthew, Ryan, and Jonathan are in the barnyard bringing you the latest news, which this week is really just Meta's release of Llama 3. Seriously. That's every announcement this week. Don't say we didn't warn you.
Titles we almost went with this week:
- Meta Llama says no Drama
- No Meta Prob-llama
- Keep Calm and Llama on
- Redis did not embrace the Llama MK
- The bedrock of good AI is built on Llamas
- The CloudPod announces support for Llama3 since everyone else was doing it
- Llama3, better known as Llama Llama Llama
- The Cloud Pod now known as the LLMPod
- Cloud Pod is considering changing its name to LlamaPod
- Unlike WinAMP, nothing whips the llama's ass
A big thanks to this week's sponsor: Check out Sonrai Securities' new Cloud Permission Firewall. Just for our listeners, enjoy a 14-day trial at www.sonrai.co/cloudpod
Follow Up
01:27 Valkey is Rapidly Overtaking Redis
Valkey continues to rack up support: initial backers AWS, Ericsson, Google, Oracle, and Verizon have now been joined by Alibaba, Aiven, Heroku, and Percona. Numerous blog posts have come out touting Valkey adoption. I'm not sure this whole thing is working out as well as Redis CEO Rowan Trollope had hoped.
AI Is Going Great – Or How AI Makes All Its Money
03:26 Introducing Meta Llama 3: The most capable openly available LLM to date
Meta has launched Llama 3, the next generation of their state-of-the-art open source large language model.
- Llama 3 will be available on AWS, Databricks, GCP, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, Nvidia NIM, and Snowflake, with support from hardware platforms offered by AMD, AWS, Dell, Intel, Nvidia, and Qualcomm.
- It includes new trust and safety tools such as Llama Guard 2, Code Shield, and CyberSec Eval 2.
- They plan to introduce new capabilities, including longer context windows, additional model sizes, and enhanced performance.
- The first two models from Meta's Llama 3 are the 8B and 70B parameter variants, which can support a broad range of use cases.
- Meta shared benchmarks comparing the Llama 3 8B model against Gemma 7B and Mistral 7B and showed improvements across all major benchmarks, including math, where Gemma 7B scores 12.2 versus 30 for Llama 3.
- The 70B model performed comparably against Gemini Pro 1.5 and Claude 3 Sonnet, scoring within a few points on most benchmarks.
Jonathan recommends using LM Studio to get started playing around with LLMs (see the quick sketch below), which you can find at https://lmstudio.ai/
04:42 Jonathan – “Isn't it funny how you go from an 8 billion parameter model to a 70 billion parameter model but nothing in between? Like you would have thought there would be some kind of like, some middle ground maybe? But, uh, but… No. But, um,
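Following Jonathan's LM Studio tip, here is a hedged sketch of querying a locally loaded Llama 3 model: LM Studio exposes an OpenAI-compatible local server (by default at http://localhost:1234/v1), so the standard openai Python client works against it. The model identifier below is an assumption; use whatever name LM Studio shows for the model you downloaded.

```python
# Hedged sketch: talking to a local Llama 3 model through LM Studio's
# OpenAI-compatible server (default address http://localhost:1234/v1).
# The model name is an assumption -- copy the exact identifier from LM Studio.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # assumed identifier for the loaded model
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the Llama 3 release in one sentence."},
    ],
    temperature=0.7,
)

print(completion.choices[0].message.content)
```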
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Llama-3 and Dwarkesh Patel's Podcast with Zuckerberg, published by Zvi on April 22, 2024 on LessWrong. It was all quiet. Then it wasn't. Note the timestamps on both of these. Dwarkesh Patel did a podcast with Mark Zuckerberg on the 18th. It was timed to coincide with the release of much of Llama-3, very much the approach of telling your story directly. Dwarkesh is now the true tech media. A meteoric rise, and well earned. This is two related posts in one. First I cover the podcast, then I cover Llama-3 itself. My notes are edited to incorporate context from later explorations of Llama-3, as I judged that the readability benefits exceeded the purity costs. Podcast Notes: Llama-3 Capabilities (1:00) They start with Llama 3 and the new L3-powered version of Meta AI. Zuckerberg says "With Llama 3, we think now that Meta AI is the most intelligent, freely-available assistant that people can use." If this means 'free as in speech' then the statement is clearly false. So I presume he means 'free as in beer.' Is that claim true? Is Meta AI now smarter than GPT-3.5, Claude 2 and Gemini Pro 1.0? As I write this it is too soon to tell. Gemini Pro 1.0 and Claude 3 Sonnet are slightly ahead of Llama-3 70B on the Arena leaderboard. But it is close. The statement seems like a claim one can make within 'reasonable hype.' Also, Meta integrates Google and Bing for real-time knowledge, so the question there is if that process is any good, since most browser use by LLMs is not good. (1:30) Meta are going in big on their UIs, top of Facebook, Instagram and Messenger. That makes sense if they have a good product that is robust, and safe in the mundane sense. If it is not, this is going to be at the top of chat lists for teenagers automatically, so whoo boy. Even if it is safe, there are enough people who really do not like AI that this is probably a whoo boy anyway. Popcorn time. (1:45) They will have the ability to animate images, and it generates high-quality images as you type, updating them in real time as you add details. I can confirm this feature is cool. He promises multimodality, more 'multi-linguality' and bigger context windows. (3:00) Now the technical stuff. Llama-3 follows tradition in training models in three sizes, here 8b, 70b that released on 4/18, and a 405b that is still training. He says 405b is already around 85 MMLU and they expect leading benchmarks. The 8b Llama-3 is almost as good as the 70b Llama-2. The Need for Inference (5:15) What went wrong earlier for Meta and how did they fix it? He highlights Reels, with its push to recommend 'unconnected content,' meaning things you did not ask for, and not having enough compute for that. They were behind. So they ordered double the GPUs they needed. They didn't realize the type of model they would want to train. (7:30) Back in 2006, what would Zuck have sold for when he turned down $1 billion? He says he realized if he sold he'd just build another similar company, so why sell? It wasn't about the number, he wasn't in position to evaluate the number. And I think that is actually wise there. You can realize that you do not want to accept any offer someone would actually make. (9:15) When did making AGI become a key priority? Zuck points out Facebook AI Research (FAIR) is 10 years old as a research group. 
Over that time it has become clear you need AGI, he says, to support all their other products. He notes that training models on coding generalizes and helps their performance elsewhere, and that was a top focus for Llama-3. So Meta needs to solve AGI because if they don't 'their products will be lame.' It seems increasingly likely, as we will see in several ways, that Zuck does not actually believe in 'real' AGI. By 'AGI' he means somewhat more capable AI. (13:40) What will the Llama that makes cool produ...
- Paydates and payday schedules in tech firms discussed
- The shift in U.S. stock trading: advancement to T+1 instead of T+2
- An unexpected discovery: Formula One Mercedes car Lego set
- Netflix documentary 'Drive to Survive' recommended for Formula One beginners
- Formula One viewed through the lens of an engineer: competition dynamics, changes in rules, strategies of the players
- Insight into the upcoming Las Vegas Formula One event
- Discussions on AI models: pros and cons of Llama 3, comparing it to GPT-3.5 and GPT-4
- Key highlights from Google's 2024 Cloud Next Event: AI agents, AI, and the role of BigQuery as a VectorDB
- Comparing AI models: ChatGPT vs Llama 7B
- EV vs gas vehicles: examining Cybertruck's features, recall event, and travel range
- Showcase of Gemini Pro's feature that converts YouTube links into a blog post (a rough API sketch follows after this list)
- Views on owning a Cybertruck: weighing personal circumstances against the vehicle's features
- Discussing EV charging at home: considering potential cost and utility, possible universal charging standards by Tesla
- Job changes revealed: hosts' anticipation for their new roles at 'Snowflake'
- Evaluating 'Hell Divers' game: discussing PlayStation and Xbox strategies
- Episode closure and segue into the next podcast
# Links mentioned:
- [Join us on our Discord channel](https://discord.gg/T38WpgkHGQ)
- [Watch 'Drive to Survive' on Netflix](https://www.netflix.com/title/80204890)
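The YouTube-to-blog-post showcase happened inside the Gemini Pro product itself; as a rough, hedged analogue of that workflow via the google-generativeai Python SDK, the sketch below assumes you have already pulled the video transcript yourself and simply asks Gemini to turn it into a draft post. The model name, file path, and prompt are illustrative assumptions.

```python
# Hedged sketch (not the exact workflow from the episode): turning a video
# transcript into a blog-post draft with the google-generativeai SDK.
# Assumes GOOGLE_API_KEY is set and a transcript file already exists.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

# Placeholder transcript file; fetch or export the transcript however you like.
transcript = open("video_transcript.txt").read()

response = model.generate_content(
    "Turn the following video transcript into a short blog post with a title, "
    "an introduction, and three sections:\n\n" + transcript
)
print(response.text)
```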
Welcome to Now in Android, your ongoing guide to what's new and notable in the world of Android development. Today, we're covering the Android 15 Beta release, how Android Studio uses Gemini Pro to make Android development faster and easier, a story about how Google Drive cut code and development time in half, and how to use Dependency Injection in Compose! For links to these items, check out Now in Android #103 on Medium → https://goo.gle/3xz1Otd Now in Android podcast → https://goo.gle/podcast-nia Now in Android articles → https://goo.gle/articles-nia Watch more Now in Android → https://goo.gle/now-in-android Subscribe to Android Developers → https://goo.gle/AndroidDevs
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #58: Stargate AGI, published by Zvi on April 5, 2024 on LessWrong. Another round? Of economists projecting absurdly small impacts, of Google publishing highly valuable research, a cycle of rhetoric, more jailbreaks, and so on. Another great podcast from Dwarkesh Patel, this time going more technical. Another proposed project with a name that reveals quite a lot. A few genuinely new things, as well. On the new offerings front, DALLE-3 now allows image editing, so that's pretty cool. Table of Contents Don't miss out on Dwarkesh Patel's podcast with Sholto Douglas and Trenton Bricken, which got the full write-up treatment. Introduction. Table of Contents. Language Models Offer Mundane Utility. Never stop learning. Language Models Don't Offer Mundane Utility. The internet is still for porn. Clauding Along. Good at summarization but not fact checking. Fun With Image Generation. DALLE-3 now has image editing. Deepfaketown and Botpocalypse Soon. OpenAI previews voice duplication. They Took Our Jobs. Employment keeps rising, will continue until it goes down. The Art of the Jailbreak. It's easy if you try and try again. Cybersecurity. Things worked out this time. Get Involved. Technical AI Safety Conference in Tokyo tomorrow. Introducing. Grok 1.5, 25 YC company models and 'Dark Gemini.' In Other AI News. Seriously, Google, stop publishing all your trade secrets. Stargate AGI. New giant data center project, great choice of cautionary title. Larry Summers Watch. Economists continue to have faith in nothing happening. Quiet Speculations. What about interest rates? Also AI personhood. AI Doomer Dark Money Astroturf Update. OpenPhil annual report. The Quest for Sane Regulations. The devil is in the details. The Week in Audio. A few additional offerings this week. Rhetorical Innovation. The search for better critics continues. Aligning a Smarter Than Human Intelligence is Difficult. What are human values? People Are Worried About AI Killing Everyone. Can one man fight the future? The Lighter Side. The art must have an end other than itself. Language Models Offer Mundane Utility A good encapsulation of a common theme here: Paul Graham: AI will magnify the already great difference in knowledge between the people who are eager to learn and those who aren't. If you want to learn, AI will be great at helping you learn. If you want to avoid learning? AI is happy to help with that too. Which AI to use? Ethan Mollick examines our current state of play. Ethan Mollick (I edited in the list structure): There is a lot of debate over which of these models are best, with dueling tests suggesting one or another dominates, but the answer is not clear cut. All three have different personalities and strengths, depending on whether you are coding or writing. Gemini is an excellent explainer but doesn't let you upload files. GPT-4 has features (namely Code Interpreter and GPTs) that greatly extend what it can do. Claude is the best writer and seems capable of surprising insight. But beyond the differences, there are four important similarities to know about: All three are full of ghosts, which is to say that they give you the weird illusion of talking to a real, sentient being - even though they aren't. All three are multimodal, in that they can "see" images. None of them come with instructions. They all prompt pretty similarly to each other. 
I would add there are actually four models, not three, because there are (at last!) two Geminis, Gemini Advanced and Gemini Pro 1.5, if you have access to the 1.5 beta. So I would add a fourth line for Gemini Pro 1.5: Gemini Pro has a giant context window and uses it well. My current heuristic is something like this: If you need basic facts or explanation, use Gemini Advanced. If you want creativity or require intelligence and nuance, or code, use Claude. If ...
Tools of the Month:
- apoc.create.vRelationship https://neo4j.com/docs/apoc/current/overview/apoc.create/apoc.create.vRelationship/
- GenAI Starter Kits for Langchain, LlamaIndex, Spring.AI and Semantic Kernel, covering the most popular orchestration frameworks in Python, Java, and dotnet. https://neo4j.com/labs/genai-ecosystem/
- Vish: Vector support in Neo4j (a short query sketch follows after the events list) https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/
Articles:
- Implementing RAG: How to Write a Graph Retrieval Query in LangChain https://neo4j.com/developer-blog/rag-graph-retrieval-query-langchain/
- Implementing Advanced Retrieval RAG Strategies with Neo4j https://neo4j.com/developer-blog/advanced-rag-strategies-neo4j/
- Using a Knowledge Graph to Implement a RAG Application https://neo4j.com/developer-blog/knowledge-graph-rag-application/
- Generative Transformation from ER Diagram to Graph Model Using Google's Gemini Pro https://neo4j.com/developer-blog/genai-graph-model-google-gemini-pro/
- Cypher Workbench as a Neo4j Labs Project https://neo4j.com/developer-blog/cypher-workbench-neo4j-labs-project/
- Accelerate Neo4j App Development with Low-Code Keymaker Framework https://neo4j.com/developer-blog/keymaker-low-code-neo4j-framework/
- Needle StarterKit 2.0: Templates, Chatbot, and More! https://neo4j.com/developer-blog/needle-starterkit-2-0-templates-chatbot/
- Announcing Neo4j JDBC Driver Version 6 https://neo4j.com/developer-blog/neo4j-jdbc-driver-v6/
Videos:
- NODES 2023 playlist https://youtube.com/playlist?list=PL9Hl4pk2FsvUu4hzyhWed8Avu5nSUXYrb&si=8_0sYVRYz8CqqdIc
Events:
- (Apr 2) YouTube series: Going Meta - A Series on Graph, Semantics, and Knowledge Episode 27 https://www.youtube.com/@neo4j/live
- (Apr 2) Conference (Paris, France): AWS Summit Paris https://aws.amazon.com/fr/events/summits/emea/paris/
- (Apr 8) Conference (London, UK): QCon London https://qconlondon.com/
- (Apr 8) Conference (Madrid, Spain): GraphSummit Madrid https://neo4j.com/graphsummit/madrid24/
- (Apr 8) Conference (Nürburgring, Germany): Javaland 2024 https://www.javaland.eu/en/home/
- (Apr 9) Conference (Las Vegas, NV, USA): Google Cloud Next https://cloud.withgoogle.com/next
- (Apr 9) Conference (Atlanta, GA, USA): DevNexus 2024 https://devnexus.com/
- (Apr 9) Workshop (Munich, Germany): Amazon Bedrock & Neo4j https://go.neo4j.com/LE240409AWSBedrockWorkshopMunich_Registration.html
- (Apr 9) Conference (Sydney, Australia): AWS Summit Sydney https://neo4j.com/event/aws-summit-sydney/
- (Apr 13) Workshop (San Francisco, CA, USA): GenAI Beyond Chat with RAG, Knowledge Graphs and Python https://www.meetup.com/graphdb-sf/events/299339190/
- (Apr 16) Conference (Paris, France): Devoxx France https://www.devoxx.fr/
- (Apr 17) Workshop (Toronto, ON, Canada): Neo4j & AWS Generative AI https://go.neo4j.com/LE240417AWSandNeo4jGenerativeAIHands-onLabToronto_Registration.html
- (Apr 18) Meetup (San Francisco, CA, USA): Cloud-Native Geospatial Analytics Combining Spatial SQL & Graph Data Science https://www.meetup.com/graphdb-sf/events/297525658/
- (Apr 23) Conference (Bengaluru, India): GIDS India 2024 https://www.meetup.com/graphdb-sf/events/297525658/
- (Apr 23) Workshop (Chicago, IL, USA): Neo4j and Google Cloud GenAI Hands-On https://go.neo4j.com/LE240423-Neo4j-GCP-GenAI-Workshop---Chicago_Registration.html
- (Apr 23) Conference (Stockholm, Sweden): Penningtvattsdagarna https://penningtvattsdagarna.se/anmalan/
- (Apr 24) Conference (Stockholm, Sweden): Data Innovation Summit https://datainnovationsummit.com/
- (Apr 24) Conference (London, UK): AWS Summit London https://aws.amazon.com/events/summits/emea/london/
- (Apr 24) Conference (Munich, Germany): GraphSummit Munich https://neo4j.com/graphsummit/munich-apr-24/
- (Apr 25) Hands-On Lab (New York City, NY, USA): AWS and Neo4j Generative AI https://go.neo4j.com/LE-240425-LE-240425-AWS-GenAI-Workshop-NYC_Registration.html
- (Apr 25) Meetup (London, UK): Modern Java Ecosystems: Advancing Connectivity and Cloud Deployment https://www.meetup.com/graphdb-uk/events/299949029/
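As flagged next to the vector-support link above, here is a hedged illustration of Neo4j's vector indexes (Neo4j 5.x) driven from the official Python driver. The connection details, node label `Chunk`, property `embedding`, index name, and 1536-dimensional vectors are all illustrative assumptions, not taken from the newsletter.

```python
# Hedged sketch: creating and querying a Neo4j vector index via the Python driver.
# Connection details, label, property, index name, and dimensions are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Create a vector index over an `embedding` property on `Chunk` nodes.
driver.execute_query(
    """
    CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
    FOR (c:Chunk) ON (c.embedding)
    OPTIONS {indexConfig: {
        `vector.dimensions`: 1536,
        `vector.similarity_function`: 'cosine'
    }}
    """
)

# Query the index for the 5 nearest neighbours of a query embedding.
records, _, _ = driver.execute_query(
    """
    CALL db.index.vector.queryNodes('chunk_embeddings', 5, $query_embedding)
    YIELD node, score
    RETURN node.text AS text, score
    """,
    query_embedding=[0.0] * 1536,  # placeholder vector; use a real embedding here
)
for record in records:
    print(record["score"], record["text"])

driver.close()
```

Cosine similarity is the usual choice for text embeddings; the dimension count should match whatever embedding model produces the vectors you store.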
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #56: Blackwell That Ends Well, published by Zvi on March 23, 2024 on LessWrong. Hopefully, anyway. Nvidia has a new chip. Also Altman has a new interview. And most of Inflection has new offices inside Microsoft. Table of Contents Introduction. Table of Contents. Language Models Offer Mundane Utility. Open the book. Clauding Along. Claude continues to impress. Language Models Don't Offer Mundane Utility. What are you looking for? Fun With Image Generation. Stable Diffusion 3 paper. Deepfaketown and Botpocalypse Soon. Jesus Christ. They Took Our Jobs. Noah Smith has his worst take and commits to the bit. Generative AI in Games. What are the important dangers? Get Involved. EU AI office, IFP, Anthropic. Introducing. WorldSim. The rabbit hole goes deep, if you want that. Grok the Grok. Weights are out. Doesn't seem like it matters much. New Nvidia Chip. Who dis? Inflection Becomes Microsoft AI. Why buy companies when you don't have to? In Other AI News. Lots of other stuff as well. Wait Till Next Year. OpenAI employees talk great expectations a year after GPT-4. Quiet Speculations. Driving cars is hard. Is it this hard? The Quest for Sane Regulation. Take back control. The Week in Audio. Sam Altman on Lex Fridman. Will share notes in other post. Rhetorical Innovation. If you want to warn of danger, also say what is safe. Read the Roon. What does it all add up to? Pick Up the Phone. More good international dialogue on AI safety. Aligning a Smarter Than Human Intelligence is Difficult. Where does safety lie? Polls Show People Are Worried About AI. This week's is from AIPI. Other People Are Not As Worried About AI Killing Everyone. Then there's why. The Lighter Side. Everyone, reaping. Language Models Offer Mundane Utility Ethan Mollick on how he uses AI to aid his writing. The central theme is 'ask for suggestions in particular places where you are stuck' and that seems right for most purposes. Sully is predictably impressed by Claude Haiku, says it offers great value and speed, and is really good with images and long context, suggests using it over GPT-3.5. He claims Cohere Command-R is the new RAG king, crushing it with citations and hasn't hallucinated once, while writing really well if it has context. And he thinks Hermes 2 Pro is 'cracked for agentic function calling,' better for recursive calling than GPT-4, but 4k token limit is an issue. I believe his reports but also he always looks for the bright side. Claude does acausal coordination. This was of course Easy Mode. Claude also successfully solves counterfactual mugging when told it is a probability theorist, but not if it is not told this. Prompting is key. Of course, this also presumes that the user is telling the truth sufficiently often. One must always watch out for that other failure mode, and Claude does not consider the probability the user is lying. Amr Awadallah notices self-evaluated reports that Cohere Command-R has a very low hallucination rate of 3.7%, below that of Claude Sonnet (6%) and Gemini Pro (4.8%), although GPT-3.5-Turbo is 3.5%. From Claude 3, describe things at various levels of sophistication (here described as IQ levels, but domain knowledge seems more relevant to which one you will want in such spots). 
In this case they are describing SuperFocus.ai, which provides custom conversational AIs that claim to avoid hallucinations by drawing on a memory bank you maintain. However, when looking at it, it seems like the 'IQ 115' and 'IQ 130' descriptions tell you everything you need to know, and the only advantage of the harder to parse 'IQ 145' is that it has a bunch of buzzwords and hype attached. The 'IQ 100' does simplify and drop information in order to be easier to understand, but if you know a lot about AI you can figure out what it is dropping very easily. Figure out whether a resume ...
Our next SF event is AI UX 2024 - let's see the new frontier for UX since last year! Last call: we are recording a preview of the AI Engineer World's Fair with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have!Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an “ex-technical co-founder type”. Reach out to him for more!David Luan has been at the center of the modern AI revolution: he was the ~30th hire at OpenAI, he led Google's LLM efforts and co-led Google Brain, and then started Adept in 2022, one of the leading companies in the AI agents space. In today's episode, we asked David for some war stories from his time in early OpenAI (including working with Alec Radford ahead of the GPT-2 demo with Sam Altman, that resulted in Microsoft's initial $1b investment), and how Adept is building agents that can “do anything a human does on a computer" — his definition of useful AGI.Why Google *couldn't* make GPT-3While we wanted to discuss Adept, we couldn't talk to a former VP Eng of OpenAI and former LLM tech lead at Google Brain and not ask about the elephant in the room. It's often asked how Google had such a huge lead in 2017 with Vaswani et al creating the Transformer and Noam Shazeer predicting trillion-parameter models and yet it was David's team at OpenAI who ended up making GPT 1/2/3. David has some interesting answers:“So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized…what they (should) have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too…You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing. He's got this decoder only transformer that's probably going to get there before we do. And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why. At the time, there was a thing called the Brain Credit Marketplace. Everyone's assigned a credit. So if you have a credit, you get to buy end chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused.”Cloning HGI for AGIHuman intelligence got to where it is today through evolution. 
Some argue that to get to AGI, we will approximate all the “FLOPs” that went into that process, an approach most famously mapped out by Ajeya Cotra's Biological Anchors report:The early days of OpenAI were very reinforcement learning-driven with the Dota project, but that's a very inefficient way for these models to re-learn everything. (Kanjun from Imbue shared similar ideas in her episode).David argues that there's a shortcut. We can bootstrap from existing intelligence.“Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there… I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs!”LLMs today basically model intelligence using all (good!) written knowledge (see our Datasets 101 episode), and have now expanded to non-verbal knowledge (see our HuggingFace episode on multimodality). The SOTA self-supervised pre-training process is surprisingly data-efficient in taking large amounts of unstructured data, and approximating reasoning without overfitting.But how do you cross the gap from the LLMs of today to building the AGI we all want? This is why David & friends left to start Adept.“We believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal” — ACT-1 BlogpostCritical Path: Abstraction with ReliabilityThe AGI dream is fully autonomous agents, but there are levels to autonomy that we are comfortable giving our agents, based on how reliable they are. In David's word choice, we always want higher levels of “abstractions” (aka autonomy), but our need for “reliability” is the practical limit on how high of an abstraction we can use.“The critical path for Adept is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow. That's the critical path for the company. Everything we do is in service of that.”We saw how Adept thinks about different levels of abstraction at the 2023 Summit:The highest abstraction is the “AI Employee”, but we'll get there with “AI enabled employees”. Alessio recently gave a talk about the future of work with “services as software” at this week's Nvidia GTC (slides).No APIsUnlike a lot of large research labs, Adept's framing of AGI as "being able to use your computer like a human" carries with it a useful environmental constraint:“Having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. 
It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path (to economic value)."
This realization and conviction means that multimodal models are the way to go. Instead of using function calling to call APIs to build agents, which is what OpenAI and most of the open LLM industry have done to date, Adept wants to "drive by vision" (aka see the screen as a human sees it) and pinpoint where to click and type as a human does. No APIs needed, because most software doesn't expose APIs.
Extra context for readers: You can see the DeepMind SIMA model in the same light: one system that learned to play a diverse set of games (instead of one dedicated model per game) using only pixel inputs and keyboard-and-mouse action outputs! The OpenInterpreter team is working on a "Computer API" that also does the same.
To do this, Adept had to double down on a special kind of multimodality for knowledge work:
"A giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that is needs to kind of be the base for some of these agents… I think one big hangover primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera… (but) where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cat and dogs. And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that."
With this context, you can now understand the full path of Adept's public releases:
* ACT-1 (Sept 2022): a large Transformers model optimized for browser interactions. It has a custom rendering of the browser viewport that allows it to better understand it and take actions.
* Persimmon-8B (Sept 2023): a permissive open LLM (weights and code here)
* Fuyu-8B (Oct 2023): a small version of the multimodal model that powers Adept. Vanilla decoder-only transformer with no specialized image encoder, which allows it to handle input images of varying resolutions without downsampling. (A short usage sketch follows below.)
* Adept Experiments (Nov 2023): a public tool to build automations in the browser. This is powered by Adept's core technology but it's just a piece of their enterprise platform. They use it as a way to try various design ideas.
* Fuyu Heavy (Jan 2024): a new multimodal model designed specifically for digital agents and the world's third-most-capable multimodal model (beating Gemini Pro on MMMU, AI2D, and ChartQA), "behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger"
The Fuyu-8B post in particular exhibits a great number of examples on knowledge work multimodality.
Why Adept is NOT a Research Lab
With OpenAI now worth >$90b and Anthropic >$18b, it is tempting to conclude that the AI startup metagame is to build a large research lab, and attract the brightest minds and highest capital to build AGI. 
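Before returning to the research-lab question: as flagged in the release list above, here is a hedged sketch of trying the open Fuyu-8B checkpoint with Hugging Face transformers, which ships FuyuProcessor and FuyuForCausalLM. The screenshot path and prompt are placeholders, a GPU with enough memory for the 8B model is assumed, and this is an illustration rather than Adept's own tooling.

```python
# Hedged sketch: loading Adept's open Fuyu-8B with Hugging Face transformers.
# The image path, prompt, and device are illustrative assumptions.
from PIL import Image
from transformers import FuyuForCausalLM, FuyuProcessor

model_id = "adept/fuyu-8b"
processor = FuyuProcessor.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(model_id, device_map="cuda:0")

# Fuyu feeds raw image patches straight into the decoder, so screenshots of
# arbitrary resolution can be passed without a separate image encoder.
image = Image.open("ui_screenshot.png")
prompt = "What does the highlighted button do?\n"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=40)

# Decode only the newly generated tokens (everything after the prompt).
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```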
Our past guests (see the Humanloop episode) and (from Imbue) combined to ask the most challenging questions of the pod - with David/Adept's deep research pedigree from Deepmind and OpenAI, why is Adept not building more general foundation models (like Persimmon) and playing the academic benchmarks game? Why is Adept so focused on commercial agents instead?“I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from “Can we make a better agent”…… I think pure play foundation model companies are just going to be pinched by how good the next couple of (Meta Llama models) are going to be… And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.”and the commercial grounding is his answer to Kanjun too (whom we also asked the inverse question to compare with Adept):“… the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is for example, our evaluations are not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to, we can't do them at all. We've turned those into evals.. I think that's a degree of practicality that really helps.”And his customers seem pretty happy, because David didn't need to come on to do a sales pitch:David: “One of the things we haven't shared before is we're completely sold out for Q1.”Swyx: “Sold out of what?”David: “Sold out of bandwidth to onboard more customers.”Well, that's a great problem to have.Show Notes* David Luan* Dextro at Data Driven NYC (2015)* Adept* ACT-1* Persimmon-8B* Adept Experiments* Fuyu-8B* $350M Series B announcement* Amelia Wattenberger talk at AI Engineer Summit* FigureChapters* [00:00:00] Introductions* [00:01:14] Being employee #30 at OpenAI and its early days* [00:13:38] What is Adept and how do you define AGI?* [00:21:00] Adept's critical path and research directions* [00:26:23] How AI agents should interact with software and impact product development* [00:30:37] Analogies between AI agents and self-driving car development* [00:32:42] Balancing reliability, cost, speed and generality in AI agents* [00:37:30] Potential of foundation models for robotics* [00:39:22] Core research questions and reasons to work at AdeptTranscriptsAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:15]: Hey, and today we have David Luan, CEO, co-founder of Adept in the studio. Welcome.David [00:00:20]: Yeah, thanks for having me.Swyx [00:00:21]: Been a while in the works. I've met you socially at one of those VC events and you said that you were interested in coming on and glad we finally were able to make this happen.David: Yeah, happy to be part of it.Swyx: So we like to introduce the speaker and then also just like have you talk a little bit about like what's not on your LinkedIn, what people should just generally know about you. 
You started a company in college, which was the first sort of real time video detection classification API that was Dextro, and that was your route to getting acquired into Axon where you're a director of AI. Then you were the 30th hire at OpenAI?David [00:00:53]: Yeah, 30, 35, something around there. Something like that.Swyx [00:00:56]: So you were VP of Eng for two and a half years to two years, briefly served as tech lead of large models at Google, and then in 2022 started Adept. So that's the sort of brief CV. Is there anything else you like want to fill in the blanks or like people should know more about?David [00:01:14]: I guess a broader story was I joined OpenAI fairly early and I did that for about two and a half to three years leading engineering there. It's really funny, I think second or third day of my time at OpenAI, Greg and Ilya pulled me in a room and we're like, you know, you should take over our directs and we'll go mostly do IC work. So that was fun, just coalescing a bunch of teams out of a couple of early initiatives that had already happened. The company, the Dota effort was going pretty hard and then more broadly trying to put bigger picture direction around what we were doing with basic research. So I spent a lot of time doing that. And then I led Google's LLM efforts, but also co-led Google Brain was one of the brain leads more broadly. You know, there's been a couple of different eras of AI research, right? If we count everything before 2012 as prehistory, which people hate it when I say that, kind of had this like you and your three best friends write a research paper that changes the world period from like 2012 to 2017. And I think the game changed in 2017 and like most labs didn't realize it, but we at OpenAI really did. I think in large part helped by like Ilya's constant beating of the drum that the world would be covered in data centers. And I think-Swyx [00:02:15]: It's causally neat.David [00:02:16]: Yeah. Well, like I think we had conviction in that, but it wasn't until we started seeing results that it became clear that that was where we had to go. But also part of it as well was for OpenAI, like when I first joined, I think one of the jobs that I had to do was how do I tell a differentiated vision for who we were technically compared to, you know, hey, we're just smaller Google Brain, or like you work at OpenAI if you live in SF and don't want to commute to Mountain View or don't want to live in London, right? That's like not enough to like hang your technical identity as a company. And so what we really did was, and I spent a lot of time pushing this, is just how do we get ourselves focused on a certain class of like giant swings and bets, right? Like how do you flip the script from you just do bottom-up research to more about how do you like leave some room for that, but really make it about like, what are the big scientific outcomes that you want to show? And then you just solve them at all costs, whether or not you care about novelty and all that stuff. And that became the dominant model for a couple of years, right? And then what's changed now is I think the number one driver of AI products over the next couple of years is going to be the deep co-design and co-evolution of product and users for feedback and actual technology. And I think labs, every tool to go do that are going to do really well. 
And that's a big part of why I started Adept.Alessio [00:03:20]: You mentioned Dota, any memories thinking from like the switch from RL to Transformers at the time and kind of how the industry was evolving more in the LLM side and leaving behind some of the more agent simulation work?David [00:03:33]: Like zooming way out, I think agents are just absolutely the correct long-term direction, right? You just go to find what AGI is, right? You're like, Hey, like, well, first off, actually, I don't love AGI definitions that involve human replacement because I don't think that's actually how it's going to happen. Even this definition of like, Hey, AGI is something that outperforms humans at economically valuable tasks is kind of implicit view of the world about what's going to be the role of people. I think what I'm more interested in is like a definition of AGI that's oriented around like a model that can do anything a human can do on a computer. If you go think about that, which is like super tractable, then agent is just a natural consequence of that definition. And so what did all the work we did on our own stuff like that get us was it got us a really clear formulation. Like you have a goal and you want to maximize the goal, you want to maximize reward, right? And the natural LLM formulation doesn't come with that out of the box, right? I think that we as a field got a lot right by thinking about, Hey, how do we solve problems of that caliber? And then the thing we forgot is the Novo RL is like a pretty terrible way to get there quickly. Why are we rediscovering all the knowledge about the world? Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there. Right.Swyx [00:04:44]: The biological basis theory. Right.David [00:04:46]: So I think we are ignoring the fact that you have a giant shortcut, which is you can behavioral clone everything humans already know. And that's what we solved with LLMs. We've solved behavioral cloning, everything that humans already know. Right. So like today, maybe LLMs is like behavioral cloning every word that gets written on the internet in the future, the multimodal models are becoming more of a thing where behavioral cloning the visual world. But really, what we're just going to have is like a universal byte model, right? Where tokens of data that have high signal come in, and then all of those patterns are like learned by the model. And then you can regurgitate any combination now. Right. So text into voice out, like image into other image out or video out or whatever, like these like mappings, right? Like all just going to be learned by this universal behavioral cloner. And so I'm glad we figured that out. And I think now we're back to the era of how do we combine this with all of the lessons we learned during the RL period. That's what's going to drive progress.Swyx [00:05:35]: I'm still going to pressure you for a few more early opening stories before we turn to the ADET stuff. On your personal site, which I love, because it's really nice, like personal, you know, story context around like your history. I need to update it. It's so old. Yeah, it's so out of date. But you mentioned GPT-2. Did you overlap with GPT-1? I think you did, right?David [00:05:53]: I actually don't quite remember. I think I was joining right around- Right around then?Swyx [00:05:57]: I was right around that, yeah. 
Yeah. So what I remember was Alec, you know, just kind of came in and was like very obsessed with Transformers and applying them to like Reddit sentiment analysis. Yeah, sentiment, that's right. Take us through-David [00:06:09]: Sentiment neuron, all this stuff.Swyx [00:06:10]: The history of GPT as far as you know, you know, according to you. Ah, okay.David [00:06:14]: History of GPT, according to me, that's a pretty good question. So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized, where like, again, you and your three best friends write papers, right? Okay. So zooming way out, right? I think about my job when I was a full-time research leader as a little bit of a portfolio allocator, right? So I've got really, really smart people. My job is to convince people to coalesce around a small number of really good ideas and then run them over the finish line. My job is not actually to promote a million ideas and never have critical mass. And then as the ideas start coming together and some of them start working well, my job is to nudge resources towards the things that are really working and then start disbanding some of the things that are not working, right? That muscle did not exist during my time at Google. And I think had they had it, what they would have done would be say, hey, Noam Shazir, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too.Swyx [00:07:17]: He's talking about trillion parameter models in 2017.David [00:07:20]: Yeah. So that's the core of the GPT story, right? Which is that, and I'm jumping around historically, right? But after GPT-2, we were all really excited about GPT-2. I can tell you more stories about that. It was the last paper that I even got to really touch before everything became more about building a research org. You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing, right? He's got this decoder only transformer that's probably going to get there before we do. And I was like, but like, please just like let this model finish, right? And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why, right? At the time, there was a thing called the brain credit marketplace. And did you guys know the brain credit marketplace? No, I never heard of this. Oh, so it's actually, it's a, you can ask any Googler.Swyx [00:08:23]: It's like just like a thing that, that, I mean, look like, yeah, limited resources, you got to have some kind of marketplace, right? You know, sometimes it's explicit, sometimes it isn't, you know, just political favors.David [00:08:34]: You could. And so then basically everyone's assigned a credit, right? So if you have a credit, you get to buy end chips according to supply and demand. 
So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused. And I think, again, that's like part of the narrative of like this phase one of AI, right? Of like this modern AI era to phase two. And I think in the same way, I think phase three company is going to out execute phase two companies because of the same asymmetry of success.Swyx [00:09:12]: Yeah. I think it's underrated how much NVIDIA works with you in the early days as well. I think maybe, I think it was Jensen. I'm not sure who circulated a recent photo of him delivering the first DGX to you guys.David [00:09:24]: I think Jensen has been a complete legend and a mastermind throughout. I have so much respect for NVIDIA. It is unreal.Swyx [00:09:34]: But like with OpenAI, like kind of give their requirements, like co-design it or just work of whatever NVIDIA gave them.David [00:09:40]: So we work really closely with them. There's, I'm not sure I can share all the stories, but examples of ones that I've found particularly interesting. So Scott Gray is amazing. I really like working with him. He was on one of my teams, the supercomputing team, which Chris Berner runs and Chris Berner still does a lot of stuff in that. As a result, like we had very close ties to NVIDIA. Actually, one of my co-founders at Adept, Eric Elson, was also one of the early GPGPU people. So he and Scott and Brian Catanzaro at NVIDIA and Jonah and Ian at NVIDIA, I think all were very close. And we're all sort of part of this group of how do we push these chips to the absolute limit? And I think that kind of collaboration helped quite a bit. I think one interesting set of stuff is knowing the A100 generation, that like quad sparsity was going to be a thing. Is that something that we want to go look into, right? And figure out if that's something that we could actually use for model training. Really what it boils down to is that, and I think more and more people realize this, six years ago, people, even three years ago, people refused to accept it. This era of AI is really a story of compute. It's really the story of how do you more efficiently map actual usable model flops to compute,Swyx [00:10:38]: Is there another GPT 2, 3 story that you love to get out there that you think is underappreciated for the amount of work that people put into it?David [00:10:48]: So two interesting GPT 2 stories. One of them was I spent a good bit of time just sprinting to help Alec get the paper out. And I remember one of the most entertaining moments was we were writing the modeling section. And I'm pretty sure the modeling section was the shortest modeling section of any ML, reasonably legitimate ML paper to that moment. It was like section three model. This is a standard vanilla decoder only transformer with like these particular things, those paragraph long if I remember correctly. And both of us were just looking at the same being like, man, the OGs in the field are going to hate this. They're going to say no novelty. Why did you guys do this work? 
So now it's funny to look at in hindsight that it was pivotal kind of paper, but I think it was one of the early ones where we just leaned fully into all we care about is solving problems in AI and not about, hey, is there like four different really simple ideas that are cloaked in mathematical language that doesn't actually help move the field forward?Swyx [00:11:42]: Right. And it's like you innovate on maybe like data set and scaling and not so much the architecture.David [00:11:48]: We all know how it works now, right? Which is that there's a collection of really hard won knowledge that you get only by being at the frontiers of scale. And that hard won knowledge, a lot of it's not published. A lot of it is stuff that's actually not even easily reducible to what looks like a typical academic paper. But yet that's the stuff that helps differentiate one scaling program from another. You had a second one? So the second one is, there's like some details here that I probably shouldn't fully share, but hilariously enough for the last meeting we did with Microsoft before Microsoft invested in OpenAI, Sam Altman, myself and our CFO flew up to Seattle to do the final pitch meeting. And I'd been a founder before. So I always had a tremendous amount of anxiety about partner meetings, which this basically this is what it was. I had Kevin Scott and Satya and Amy Hood, and it was my job to give the technical slides about what's the path to AGI, what's our research portfolio, all of this stuff, but it was also my job to give the GPT-2 demo. We had a slightly bigger version of GPT-2 that we had just cut maybe a day or two before this flight up. And as we all know now, model behaviors you find predictable at one checkpoint are not predictable in another checkpoint. And so I'd spent all this time trying to figure out how to keep this thing on rails. I had my canned demos, but I knew I had to go turn it around over to Satya and Kevin and let them type anything in. And that just, that really kept me up all night.Swyx [00:13:06]: Nice. Yeah.Alessio [00:13:08]: I mean, that must have helped you talking about partners meeting. You raised $420 million for Adept. The last round was a $350 million Series B, so I'm sure you do great in partner meetings.Swyx [00:13:18]: Pitchers meetings. Nice.David [00:13:20]: No, that's a high compliment coming from a VC.Alessio [00:13:22]: Yeah, no, I mean, you're doing great already for us. Let's talk about Adept. And we were doing pre-prep and you mentioned that maybe a lot of people don't understand what Adept is. So usually we try and introduce the product and then have the founders fill in the blanks, but maybe let's do the reverse. Like what is Adept? Yeah.David [00:13:38]: So I think Adept is the least understood company in the broader space of foundational models plus agents. So I'll give some color and I'll explain what it is and I'll explain also why it's actually pretty different from what people would have guessed. So the goal for Adept is we basically want to build an AI agent that can do, that can basically help humans do anything a human does on a computer. And so what that really means is we want this thing to be super good at turning natural language like goal specifications right into the correct set of end steps and then also have all the correct sensors and actuators to go get that thing done for you across any software tool that you already use. 
And so the end vision of this is effectively like I think in a couple of years everyone's going to have access to like an AI teammate that they can delegate arbitrary tasks to and then also be able to, you know, use it as a sounding board and just be way, way, way more productive. Right. And just changes the shape of every job from something where you're mostly doing execution to something where you're mostly actually doing like these core liberal arts skills of what should I be doing and why. Right. And I find this like really exciting and motivating because I think it's actually a pretty different vision for how AGI will play out. I think systems like Adept are the most likely systems to be proto-AGIs. But I think the ways in which we are really counterintuitive to everybody is that we've actually been really quiet because we are not a developer company. We don't sell APIs. We don't sell open source models. We also don't sell bottom up products. We're not a thing that you go and click and download the extension and like we want more users signing up for that thing. We're actually an enterprise company. So what we do is we work with a range of different companies, some like late stage multi-thousand people startups, some fortune 500s, et cetera. And what we do for them is we basically give them an out of the box solution where big complex workflows that their employees do every day could be delegated to the model. And so we look a little different from other companies in that in order to go build this full agent thing, the most important thing you got to get right is reliability. So initially zooming way back when, one of the first things that DEP did was we released this demo called Act One, right? Act One was like pretty cool. It's like kind of become a hello world thing for people to show agent demos by going to Redfin and asking to buy a house somewhere because like we did that in the original Act One demo and like showed that, showed like Google Sheets, all this other stuff. Over the last like year since that has come out, there's been a lot of really cool demos and you go play with them and you realize they work 60% of the time. But since we've always been focused on how do we build an amazing enterprise product, enterprises can't use anything that isn't in the nines of reliability. And so we've actually had to go down a slightly different tech tree than what you might find in the prompt engineering sort of plays in the agent space to get that reliability. And we've decided to prioritize reliability over all else. So like one of our use cases is crazy enough that it actually ends with a physical truck being sent to a place as the result of the agent workflow. And if you're like, if that works like 60% of the time, you're just blowing money and poor truck drivers going places.Alessio [00:16:30]: Interesting. One of the, our investment teams has this idea of services as software. I'm actually giving a talk at NVIDIA GTC about this, but basically software as a service, you're wrapping user productivity in software with agents and services as software is replacing things that, you know, you would ask somebody to do and the software just does it for you. When you think about these use cases, do the users still go in and look at the agent kind of like doing the things and can intervene or like are they totally removed from them? 
Like the truck thing is like, does the truck just show up or are there people in the middle checking in?David [00:17:04]: I think there's two current flaws in the framing for services as software, or I think what you just said. I think that one of them is like in our experience, as we've been rolling out Adept, the people who actually do the jobs are the most excited about it because they don't go from, I do this job to, I don't do this job. They go from, I do this job for everything, including the shitty rote stuff to I'm a supervisor. And I literally like, it's pretty magical when you watch the thing being used because now it parallelizes a bunch of the things that you had to do sequentially by hand as a human. And you can just click into any one of them and be like, Hey, I want to watch the trajectory that the agent went through to go solve this. And the nice thing about agent execution as opposed to like LLM generations is that a good chunk of the time when the agent fails to execute, it doesn't give you the wrong result. It just fails to execute. And the whole trajectory is just broken and dead and the agent knows it, right? So then those are the ones that the human then goes and solves. And so then they become a troubleshooter. They work on the more challenging stuff. They get way, way more stuff done and they're really excited about it. I think the second piece of it that we've found is our strategy as a company is to always be an augmentation company. And I think one, out of principle, that's something we really care about. But two, actually, if you're framing yourself as an augmentation company, you're always going to live in a world where you're solving tasks that are a little too hard for what the model can do today and still needs a human to provide oversight, provide clarifications, provide human feedback. And that's how you build a data flywheel. That's how you actually learn from the smartest humans how to solve things models can't do today. And so I actually think that being an augmentation company forces you to go develop your core AI capabilities faster than someone who's saying, ah, okay, my job is to deliver you a lights off solution for X.Alessio [00:18:42]: Yeah. It's interesting because we've seen two parts of the market. One is we have one company that does agents for SOC analysts. People just don't have them, you know, and just they cannot attract the talent to do it. And similarly, in software development, you have Copilot, which is the augmentation product, and then you have sweep.dev and you have these products, which they just do the whole thing. I'm really curious to see how that evolves. I agree that today the reliability is so important in the enterprise that they just don't use most of them. Yeah. Yeah. No, that's cool. But it's great to hear the story because I think from the outside, people are like, oh, Adept, they do Act One, they do Persimmon, they do Fuyu, they do all this stuff. Yeah, it's just the public stuff.Swyx [00:19:20]: It's just public stuff.David [00:19:21]: So one of the things we haven't shared before is we're completely sold out for Q1. And so I think...Swyx [00:19:26]: Sold out of what?David [00:19:27]: Sold out of bandwidth to go onboard more customers. And so we're like working really hard to go make that less of a bottleneck, but our expectation is that I think we're going to be significantly more public about the broader product shape and the new types of customers we want to attract later this year.
So I think that clarification will happen by default.Swyx [00:19:43]: Why have you become more public? You know, if the whole push has... You're sold out, you're mainly enterprise, but you're also clearly putting effort towards being more open or releasing more things.David [00:19:53]: I think we just flipped over that way fairly recently. That's a good question. I think it actually boils down to two things. One, I think that, frankly, a big part of it is that the public narrative is really forming around agents as being the most important thing. And I'm really glad that's happening because when we started the company in January 2022, everybody in the field knew about the agents thing from RL, but the general public had no conception of what it was. They were still hanging their narrative hat on the tree of everything's a chatbot. And so I think now one of the things that I really care about is that when people think agent, they actually think the right thing. All sorts of different things are being called agents. Chatbots are being called agents. Things that make a function call are being called agents. To me, an agent is something that you can give a goal and get an end step workflow done correctly in the minimum number of steps. And so that's a big part of why. And I think the other part is because I think it's always good for people to be more aware of Adept as they think about what the next thing they want to do in their careers. The field is quickly pivoting in a world where foundation models are looking more and more commodity. And I think a huge amount of gain is going to happen from how do you use foundation models as the well-learned behavioral cloner to go solve agents. And I think people who want to do agents research should really come to Adept.Swyx [00:21:00]: When you say agents have become more part of the public narrative, are there specific things that you point to? I'll name a few. Bill Gates in his blog post mentioning that agents are the future. I'm the guy who made OSes, and I think agents are the next thing. So Bill Gates, I'll call that out. And then maybe Sam Altman also saying that agents are the future for OpenAI.David [00:21:17]: I think before that even, I think there was something like the New York Times, Cade Metz wrote a New York Times piece about it. Right now, in a bid to differentiate, I'm seeing AI startups that used to just brand themselves as an AI company, but now brand themselves as an AI agent company. It's just like, it's a term I just feel like people really want.Swyx [00:21:31]: From the VC side, it's a bit mixed. Is it? As in like, I think there are a lot of VCs where like, I would not touch any agent startups because like- Why is that? Well, you tell me.Alessio [00:21:41]: I think a lot of VCs that are maybe less technical don't understand the limitations of the-Swyx [00:21:46]: No, that's not fair.Alessio [00:21:47]: No, no, no, no. I think like- You think so? No, no. I think like the, what is possible today and like what is worth investing in, you know? And I think like, I mean, people look at you and say, well, these guys are building agents. They needed 400 million to do it. So a lot of VCs are maybe like, oh, I would rather invest in something that is tacking on AI to an existing thing, which is like easier to get to market and kind of get some of the flywheel going. But I'm also surprised a lot of funders just don't want to do agents. It's not even the funding. Sometimes we look around and it's like, why is nobody doing agents for X?
Wow.David [00:22:17]: That's good to know actually. I never knew that before. My sense from my limited perspective is there's a new agent company popping up every day.Swyx [00:22:24]: So maybe I'm- They are. They are. But like I have advised people to take agents off of their title because it's so diluted.David [00:22:31]: It's now so diluted.Swyx [00:22:32]: Yeah. So then it doesn't stand for anything. Yeah.David [00:22:35]: That's a really good point.Swyx [00:22:36]: So like, you know, you're a portfolio allocator. You have people know about Persimmon, people know about Fuyu and Fuyu Heavy. Can you take us through like how you think about the evolution of that and what people should think about what that means for Adept and sort of research directions? Kind of take us through the stuff you shipped recently and how people should think about the trajectory of what you're doing.David [00:22:56]: The critical path for Adept is we want to build agents that can do higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with a really high reliability standard, but are continually pushing the level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flywheel. That's the critical path for the company. Everything we do is in service of that. So if you go zoom way, way back to Act One days, right? Like the core thing behind Act One is can we teach a large model basically how to even actuate your computer? And I think we're one of the first places to have solved that and shown it and shown the generalization that you get when you give it various different workflows and texts. But I think from there on out, what we really realized was that in order to get reliability, companies just do things in various different ways. You actually want these models to be able to get a lot better at having some specification of some guardrails for what it actually should be doing. And I think in conjunction with that, a giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents. Back then we had to do a ton of research basically on how do we actually make that possible? Well, first off, like back in — I forget exactly when in '23 — like there were no multimodal models really that you could use for things like this. And so we pushed really hard on stuff like the Fuyu architecture. I think one big hangover of the primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera. COCO. Yeah, right. And COCO is awesome. Like I love COCO. I love TY. Like it's really helped the field. Right. But like that's the build one thing. I actually think it's really clear today. Multimodal models are the default foundation model, right? It's just going to supplant LLMs. Like you just train a giant multimodal model. And so for that though, like where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cats and dogs. Right. And so if that's what it is, what do you need to train?
I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that. And so the public Fuyu models and stuff aren't trained on our actual corpus, they're trained on some other stuff. But you take a lot of that data and then you make it really fast and make it really good at things like dense OCR on screens. And then now you have the right like raw putty to go make a good agent. So that's kind of like some of the modeling side, we've kind of only announced some of that stuff. We haven't really announced much of the agents work, but if you put those together with the correct product form factor — and I think the product form factor also really matters — I think we're seeing, and you guys probably see this a little bit more than I do, but we're seeing like a little bit of a pushback against the tyranny of chatbots as form factor. And I think that the reason why the form factor matters is the form factor changes what data you collect in the human feedback loop. And so I think we've spent a lot of time doing full vertical integration of all these bits in order to get to where we are.Swyx [00:25:44]: Yeah. I'll plug Amelia Wattenberger's talk at our conference, where she gave a little bit of the thinking behind like what else exists other than chatbots that if you could delegate to reliable agents, you could do. I was kind of excited at Adept experiments or Adept workflows, I don't know what the official name for it is. I was like, okay, like this is something I can use, but it seems like it's just an experiment for now. It's not your product.David [00:26:06]: So you basically just use experiments as like a way to go push various ideas on the design side to some people and just be like, yeah, we'll play with it. Actually the experiments code base underpins the actual product, but it's just the code base itself is kind of like a skeleton for us to go deploy arbitrary cards on the side.Swyx [00:26:22]: Yeah.Alessio [00:26:23]: Makes sense. I was going to say, I would love to talk about the interaction layer. So you train a model to see UI, but then there's the question of how do you actually act on the UI? I think there were some rumors about OpenAI building agents that are kind of like, they manage the endpoint. So the whole computer, you're more at the browser level. I read in one of your papers, you have like a different representation, kind of like you don't just take the DOM and act on it. You do a lot more stuff. How do you think about the best way the models will interact with the software and like how the development of products is going to change with that in mind as more and more of the work is done by agents instead of people?David [00:26:58]: This is, there's so much surface area here and it's actually one of the things I'm really excited about. And it's funny because I've spent most of my time doing research stuff, but there's like a whole new ball game that I've been learning about and I find it really cool. So I would say the best analogy I have to why Adept is pursuing a path of being able to use your computer like a human, plus of course being able to call APIs — and being able to call APIs is the easy part, like being able to use your computer like a human is the hard part. It's in the same way why people are excited about humanoid robotics, right? In a world where you had T equals infinity, right?
You're probably going to have various different form factors that robots could just be in and like all the specialization. But the fact is that humans live in a human environment. So having a humanoid robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, that number of workflows adds up pretty close to zero. And so at many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path. I think because it's the most practical path, I think a lot of success will come from going down this path. I kind of think about these early days of the agent interaction layer as a little bit like, do you all remember Windows 3.1? Like those days? Okay, this might be, I might be, I might be too old for you guys on this. But back in the day, Windows 3.1, we had this transition period between pure command line, right, being the default, into this new world where the GUI is the default and then you drop into the command line for like programmer things, right? The old way was you booted your computer up, DOS booted, and then it would give you the C colon slash thing. And you typed Windows and you hit enter, and then you got put into Windows. And then the GUI kind of became a layer above the command line. The same thing is going to happen with agent interfaces: like today, the GUI is like the base layer. And then the agent just controls the current GUI layer plus APIs. And in the future, as more and more trust is built towards agents and more and more things can be done by agents, if more UIs for agents are actually generative in and of themselves, then that just becomes a standard interaction layer. And if that becomes a standard interaction layer, what changes for software is that a lot of software is going to be either systems of record or like certain customized workflow execution engines. And a lot of how you actually do stuff will be controlled at the agent layer.Alessio [00:29:19]: And you think the Rabbit interface is more like — it would be like you're not actually seeing the app that the model interacts with. You're just saying, hey, I need to log this call on Salesforce. And you're never actually going on salesforce.com directly as the user. I can see that being a model.David [00:29:33]: I think I don't know enough about what using Rabbit in real life will actually be like to comment on that particular thing. But I think the broader idea that, you know, you have a goal, right? The agent knows how to break your goal down into steps. The agent knows how to use the underlying software and systems of record to achieve that goal for you. The agent maybe presents you information in a custom way that's only relevant to your particular goal, all just really leads to a world where you don't really need to ever interface with the apps underneath unless you're a power user for some niche thing.Swyx [00:30:03]: General question. So first of all, I think like the sort of input mode conversation.
I wonder if you have any analogies that you like with self-driving, because I do think like there's a little bit of how the model should perceive the world. And you know, the primary split in self-driving is LiDAR versus camera. And I feel like most agent companies that I'm tracking are all moving towards the camera approach, which is like the multimodal approach, you know, multimodal vision, very heavy vision, all the Fuyu stuff that you're doing. You're focusing on that, including charts and tables. And do you find that inspiration there from like the self-driving world? That's a good question.David [00:30:37]: I think sometimes the most useful inspiration I've found from self-driving is the levels analogy. I think that's awesome. But I think that our number one goal is for agents not to look like self-driving. We want to minimize the chances that agents are sort of a thing that you just have to bang your head at for a long time to get to like two discontinuous milestones, which is basically what's happened in self-driving. We want to be living in a world where you have the data flywheel immediately, and that takes you all the way up to the top. But similarly, I mean, compared to self-driving, like two things that people really undervalue. One is, it's like really easy to do the driving-a-car-down-Highway-101-on-a-sunny-day demo. That actually doesn't prove anything anymore. And I think the second thing is that, as a non-self-driving expert, I think one of the things that we believe really strongly is that everyone undervalues the importance of really good sensors and actuators. And actually a lot of what's helped us get a lot of reliability is a really strong focus on actually why does the model not do this thing? And a non-trivial amount of the time, the model doesn't actually do the thing because, if you were Wizard-of-Oz-ing it yourself, or if you have unreliable actuators, you can't do the thing. And so we've had to fix a lot of those problems.Swyx [00:31:43]: I was slightly surprised just because I do generally consider the Waymos that we see all around San Francisco as the most, I guess, real case of agents that we have in very material ways.David [00:31:55]: Oh, that's absolutely true. I think they've done an awesome job, but it has taken a long time for self-driving to mature from when it entered the consciousness and the driving down 101 on a sunny day moment happened to now. Right. So I want to see that more compressed.Swyx [00:32:07]: And I mean, you know, Cruise, you know, RIP. And then one more thing on just like, just going back on this reliability thing, something I have been holding in my head that I'm curious to get your commentary on is I think there's a trade-off between reliability and generality, or I want to broaden reliability into just general like sort of production readiness and enterprise readiness scale. Because you have reliability, you also have cost, you have speed — speed is a huge emphasis for Adept. The tendency or the temptation is to reduce generality to improve reliability and to improve cost, improve speed. Do you perceive a trade-off? Do you have any insights that solve those trade-offs for you guys?David [00:32:42]: There's definitely a trade-off if you're at the Pareto frontier, but I think a lot of folks aren't actually at the Pareto frontier. I think the way you get there is basically how do you frame the fundamental agent problem in a way that just continues to benefit from data?
I think one of the main ways of being able to solve that particular trade-off is you basically just want to formulate the problem such that every particular use case just looks like you collecting more data to go make that use case possible. I think that's how you really solve it. Then you get into the other problems like, okay, are you overfitting on these end use cases? You're not doing a thing where you're being super prescriptive for the end steps that the model can only do, for example.Swyx [00:33:17]: Then the question becomes, do you have one house model that you can then customize for each customer and you're fine-tuning them on each customer's specific use case?David [00:33:25]: Yeah.Swyx [00:33:26]: We're not sharing that. You're not sharing that. It's tempting, but that doesn't look like AGI to me. You know what I mean? That is just you have a good base model and then you fine-tune it.David [00:33:35]: For what it's worth, I think there's two paths to a lot more capability coming out of the models that we all are training these days. I think one path is you figure out how to spend compute and turn it into data. In that path, I consider search, RL, all the things that we all love in this era as part of that path, like self-play, all that stuff. The second path is how do you get super competent, high intelligence demonstrations from humans? I think the right way to move forward is you kind of want to combine the two. The first one gives you maximum sample efficiency for a little second, but I think that it's going to be hard to be running at max speed towards AGI without actually solving a bit of both.Swyx [00:34:16]: You haven't talked much about synthetic data, as far as I can tell. Probably this is a bit too much of a trend right now, but any insights on using synthetic data to augment the expensive human data?David [00:34:26]: The best part about framing AGI as being able to help people do things on computers is you have an environment.Swyx [00:34:31]: Yes. So you can simulate all of it.David [00:34:35]: You can do a lot of stuff when you have an environment.Alessio [00:34:37]: We were having dinner for our one-year anniversary. Congrats. Yeah. Thank you. Raza from HumanLoop was there, and we mentioned you were coming on the pod. This is our first-Swyx [00:34:45]: So he submitted a question.Alessio [00:34:46]: Yeah, this is our first, I guess, like mailbag question. He asked, when you started, GPT-4 didn't exist, and now you have GPT-4 Vision to help you build a lot of those things. How do you think about the things that are unique to you as Adept, and like going back to like the maybe research direction that you want to take the team and what you want people to come work on at Adept, versus what has maybe now become commoditized that you didn't expect everybody would have access to?David [00:35:11]: Yeah, that's a really good question. I think implicit in that question, and I wish he were here too so he could push back on my assumption about his question, but I think implicit in that question is calculus of where does advantage accrue in the overall ML stack. And maybe part of the assumption is that advantage accrues solely to base model scaling. But I actually believe pretty strongly that the way that you really win is that you have to go build an agent stack that is much more than that of the base model itself. And so I think like that is always going to be a giant advantage of vertical integration.
I think like it lets us do things like have a really, really fast base model that is really good at agent things, but is bad at cat and dog photos. It's pretty good at cat and dog photos. It's not like SOTA at cat and dog photos, right? So like we're allocating our capacity wisely, right? That's like one thing that you really get to do. I also think that the other thing that is pretty important now in the broader foundation modeling space is, I feel, despite any potential concerns about how good agents are as like a startup area, right? Like we were talking about earlier, I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from can we make a better agent? Because right now I think we all see that, you know, if you're training on publicly available web data, you put in the flops and you do reasonable things, then you get decent results. And if you just double the amount of compute, then you get predictably better results. And so I think pure play foundation model companies are just going to be pinched by how good the next couple of Llamas are going to be and the next good open source thing. And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.Swyx [00:36:56]: So you don't consider yourself a pure play foundation model company?David [00:36:59]: No, because if we were a pure play foundation model company, we would be training general foundation models that do summarization and all this other...Swyx [00:37:06]: You're dedicated towards the agent. Yeah.David [00:37:09]: And our business is an agent business. We're not here to sell you tokens, right? And I think like selling tokens, unless there's like a...Swyx [00:37:14]: Not here to sell you tokens. I love it.David [00:37:16]: It's like if you have a particular area of specialty, right? Then you won't get caught in the fact that everyone's just scaling to ridiculous levels of compute. But if you don't have a specialty, I find that, I think it's going to be a little tougher.Swyx [00:37:27]: Interesting. Are you interested in robotics at all? Just a...David [00:37:30]: I'm personally fascinated by robotics. I've always loved robotics.Swyx [00:37:33]: Embodied agents as a business, you know, Figure is like a big, also sort of OpenAI-affiliated company that raises a lot of money.David [00:37:39]: I think it's cool. I think, I mean, I don't know exactly what they're doing, but...Swyx [00:37:44]: Robots. Yeah.David [00:37:46]: Well, I mean, that's a...Swyx [00:37:47]: Yeah. What question would you ask? If we had them on, what would you ask them?David [00:37:50]: Oh, I just want to understand what their overall strategy is going to be between now and when there's reliable stuff to be deployed. But honestly, I just don't know enough about it.Swyx [00:37:57]: And if I told you, hey, fire your entire warehouse workforce and, you know, put robots in there, isn't that a strategy? Oh yeah.David [00:38:04]: Yeah. Sorry. I'm not questioning whether they're doing smart things. I genuinely don't know what they're doing as much, but I think there's two things. One, I'm so excited for someone to train a foundation model of robots. It's just, I think it's just going to work.
Like I will die on this hill, but I mean, like again, this whole time, like we've been on this podcast, we've just been continually saying these models are basically behavioral cloners. Right. So let's go behavioral clone all this like robot behavior. Right. And then you figure out everything else you have to do in order to teach it how to solve a new problem. That's going to work. I'm super stoked for that. I think unlike what we're doing with helping humans with knowledge work, it just sounds like a more zero sum job replacement play. Right. And I'm personally less excited about that.Alessio [00:38:46]: We had Kanjun from Imbue on the podcast. We asked her why people should go work there and not at Adept.Swyx [00:38:52]: Oh, that's so funny.Alessio [00:38:54]: Well, she said, you know, there's space for everybody in this market. We're all doing interesting work. And she said, they're really excited about building an operating system for agents. And for her, the biggest research thing was like getting models better reasoning and planning for these agents. The reverse question to you, you know, why should people be excited to come work at Adept instead of Imbue? And maybe what are like the core research questions that people should be passionate about to have fun at Adept? Yeah.David [00:39:22]: First off, I think that I'm sure you guys believe this too. The AI space, to the extent there's an AI space, and the AI agent space are both, exactly as she likely said, I think, colossal opportunities, and people are just going to end up winning in different areas and a lot of companies are going to do well. So I really don't feel that zero-sum thing at all. I would say, to like change the zero-sum framing: why should you be at Adept? I think there's two huge reasons to be at Adept. I think one of them is everything we do is in the service of like useful agents. We're not a research lab. We do a lot of research in service of that goal, but we don't think about ourselves as like a classic research lab at all. And I think the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is, for example, our evaluations. They're not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to do that we can't do at all. We've turned those into evals — solve it, right? I think that's really cool. Like everybody knows a lot of these evals are like pretty saturated, and for the new ones that even are not saturated, you look at some of them and you're like, is this actually useful? Right? I think that's a degree of practicality that really helps. Like we're equally excited about the same problems around reasoning and planning and generalization and all of this stuff. They're very grounded in actual needs right now, which is really cool.Swyx [00:40:45]: Yeah. This has been a wonderful dive. You know, I wish we had more time, but I would just leave it kind of open to you. I think you have broad thoughts, you know, just about
We will be recording a preview of the AI Engineer World's Fair soon with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have!Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an ex-technical co-founder type (can MVP products end to end, comfortable with ambiguous prod requirements, etc). Reach out to him for more!Thanks for all the love on the Four Wars episode! We're excited to develop this new “swyx & Alessio rapid-fire thru a bunch of things” format with you, and feedback is welcome. Jan 2024 RecapThe first half of this monthly audio recap pod goes over our highlights from the Jan Recap, which is mainly focused on notable research trends we saw in Jan 2024:Feb 2024 RecapThe second half catches you up on everything that was topical in Feb, including:* OpenAI Sora - does it have a world model? Yann LeCun vs Jim Fan * Google Gemini Pro 1.5 - 1m Long Context, Video Understanding* Groq offering Mixtral at 500 tok/s at $0.27 per million toks (swyx vs dylan math)* The {Gemini | Meta | Copilot} Alignment Crisis (Sydney is back!)* Grimes' poetic take: Art for no one, by no one* F*** you, show me the promptLatent Space AnniversaryPlease also read Alessio's longform reflections on One Year of Latent Space!We launched the podcast 1 year ago with Logan from OpenAI:and also held an incredible demo day that got covered in The Information:Over 750k downloads later, having established ourselves as the top AI Engineering podcast, reaching #10 in the US Tech podcast charts, and crossing 1 million unique readers on Substack, for our first anniversary we held Latent Space Final Frontiers, where 10 handpicked teams, including Lindy.ai and Julius.ai, competed for prizes judged by technical AI leaders from (former guest!) LlamaIndex, Replit, GitHub, AMD, Meta, and Lemurian Labs.The winners were Pixee and RWKV (that's Eugene from our pod!):And finally, your cohosts got cake!We also captured spot interviews with 4 listeners who kindly shared their experience of Latent Space, everywhere from Hungary to Australia to China:* Balázs Némethi* Sylvia Tong* RJ Honicky* Jan ZhengOur birthday wishes for the super loyal fans reading this - tag @latentspacepod on a Tweet or comment on a @LatentSpaceTV video telling us what you liked or learned from a pod that stays with you to this day, and share us with a friend!As always, feedback is welcome. Timestamps* [00:03:02] Top Five LLM Directions* [00:03:33] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)* [00:11:42] Direction 2: Synthetic Data (WRAP, SPIN)* [00:17:20] Wildcard: Multi-Epoch Training (OLMo, Datablations)* [00:19:43] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)* [00:23:33] Wildcards: Text Diffusion, RALM/Retro* [00:25:00] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)* [00:28:26] Wildcard: Model Merging (mergekit)* [00:29:51] Direction 5: Online LLMs (Gemini Pro, Exa)* [00:33:18] OpenAI Sora and why everyone underestimated videogen* [00:36:18] Does Sora have a World Model? 
Yann LeCun vs Jim Fan* [00:42:33] Groq Math* [00:47:37] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars* [00:55:42] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take* [00:58:39] F*** you, show me the prompt* [01:02:43] Send us your suggestions pls* [01:04:50] Latent Space Anniversary* [01:04:50] Lindy.ai - Agent Platform* [01:06:40] RWKV - Beyond Transformers* [01:15:00] Pixee - Automated Security* [01:19:30] Julius AI - Competing with Code Interpreter* [01:25:03] Latent Space Listeners* [01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club)* [01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect)* [01:31:23] Listener 3 - RJ (Developers building Community & Content)* [01:39:25] Listener 4 - Jan Zheng (Australia, AI UX)Transcript[00:00:00] AI Charlie: Welcome to the Latent Space podcast, weekend edition. This is Charlie, your new AI co host. Happy weekend. As an AI language model, I work the same every day of the week, although I might get lazier towards the end of the year. Just like you. Last month, we released our first monthly recap pod, where Swyx and Alessio gave quick takes on the themes of the month, and we were blown away by your positive response.[00:00:33] AI Charlie: We're delighted to continue our new monthly news recap series for AI engineers. Please feel free to submit questions by joining the Latent Space Discord, or just hit reply when you get the emails from Substack. This month, we're covering the top research directions that offer progress for text LLMs, and then touching on the big Valentine's Day gifts we got from Google, OpenAI, and Meta.[00:00:55] AI Charlie: Watch out and take care.[00:00:57] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and we're back with a monthly recap with my co host[00:01:06] swyx: Swyx. The reception was very positive for the first one, I think people have requested this and no surprise that I think they want to hear us opining on issues and maybe drop some alpha along the way. I'm not sure how much alpha we have to drop; this month in February was a very, very heavy month, we also did not do one specifically for January, so I think we're just going to do a two in one, because we're recording this on the first of March.[00:01:29] Alessio: Yeah, let's get to it. I think the last one we did, the four wars of AI, was the main kind of mental framework for people. I think in the January one, we had the five worthwhile directions for state of the art LLMs. Four, five,[00:01:42] swyx: and now we have to do six, right? Yeah.[00:01:46] Alessio: So maybe we just want to run through those, and then do the usual news recap, and we can do[00:01:52] swyx: one each.[00:01:53] swyx: So the context to this stuff is, one, I noticed that just the test of time concept from NeurIPS and just in general as a life philosophy I think is a really good idea. Especially in AI, there's news every single day, and after a while you're just like, okay, like, everyone's excited about this thing yesterday, and then now nobody's talking about it.[00:02:13] swyx: So, yeah. It's more important, or better use of time, to spend things, spend time on things that will stand the test of time. And I think for people to have a framework for understanding what will stand the test of time, they should have something like the four wars.
Like, what are the themes that keep coming back because they are limited resources that everybody's fighting over.[00:02:31] swyx: Whereas this one, I think that the focus for the five directions is just on research that seems more promising than others, because there's all sorts of papers published every single day, and there's no organization telling you, like, this one's more important than the other one apart from, you know, Hacker News votes and Twitter likes and whatever.[00:02:51] swyx: And obviously you want to get in a little bit earlier than something where, you know, the test of time is counted by sort of reference citations.[00:02:59] The Five Research Directions[00:02:59] Alessio: Yeah, let's do it. We got five. Long inference.[00:03:02] swyx: Let's start there. Yeah, yeah. So, just to recap at the top, the five trends that I picked, and obviously if you have some that I did not cover, please suggest something.[00:03:13] swyx: The five are long inference, synthetic data, alternative architectures, mixture of experts, and online LLMs. And something that I think might be a bit controversial is this is a sorted list in the sense that I am not the guy saying that Mamba is like the future and, and so maybe that's controversial.[00:03:31] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)[00:03:31] swyx: But anyway, so long inference is a thesis I pushed before on the newsletter and in discussing the thesis that, you know, Code Interpreter is GPT-4.5. That was the title of the post. And it's one of many ways in which we can do long inference. You know, long inference also includes chain of thought, like, please think step by step.[00:03:52] swyx: But it also includes flow engineering, which is what Itamar from Codium coined, I think in January, where, basically, instead of stuffing everything in a prompt, you do like sort of multi-turn iterative feedback and chaining of things. In a way, this is a rebranding of what a chain is, what a LangChain is supposed to be.[00:04:15] swyx: I do think that maybe SGLang from LMSYS is a better name. Probably the neatest way of flow engineering I've seen yet, in the sense that everything is a one liner, it's very, very clean code. I highly recommend people look at that. I'm surprised it hasn't caught on more, but I think it will. It's weird that something like a DSPy is more hyped than an SGLang.[00:04:36] swyx: Because it, you know, it maybe obscures the code a little bit more. But both of these are, you know, really good sort of chain-y and long inference type approaches. But basically, the basic fundamental insight is that there are only a few dimensions we can scale LLMs on. So, let's say in like 2020, no, let's say in like 2018, 2017, 18, 19, 20, we were realizing that we could scale the number of parameters.[00:05:03] swyx: And we scaled that up to 175 billion parameters for GPT-3. And we did some work on scaling laws, which we also talked about in our talk. So the Datasets 101 episode where we're like, okay, like we, we think like the right number is 300 billion tokens to, to train 175 billion parameters and then DeepMind came along and trained Gopher and Chinchilla and said that, no, no, like, you know, I think we think the optimal,[00:05:28] swyx: compute-optimal ratio is 20 tokens per parameter. And now, of course, with Llama and the sort of super-Llama scaling laws, we have 200 times and often 2,000 times tokens to parameters.
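To make those ratios concrete, here is a rough back-of-the-envelope sketch in Python using the common C ≈ 6·N·D training-compute approximation and the Chinchilla ~20-tokens-per-parameter heuristic; the GPT-3 figures are the ones mentioned above, while the 7B-on-2T-tokens example is an illustrative assumption, not a number from the episode.

```python
# Rough scaling-law arithmetic (a sketch, not the episode's own numbers).
# Common approximation: training compute C ~= 6 * N * D FLOPs,
# with N = parameters and D = training tokens.
# Chinchilla heuristic: compute-optimal D is roughly 20 tokens per parameter.

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params: float) -> float:
    return 20 * n_params

# GPT-3-style run: 175B parameters on ~300B tokens (~1.7 tokens per parameter).
print(f"GPT-3-ish compute: {training_flops(175e9, 300e9):.2e} FLOPs")  # ~3.2e23

# Chinchilla-optimal data budget for the same parameter count: ~3.5T tokens.
print(f"Chinchilla-optimal tokens for 175B params: {chinchilla_optimal_tokens(175e9):.2e}")

# Llama-style over-training (illustrative numbers): a 7B model on 2T tokens
# is ~285 tokens per parameter, far past the 20:1 compute-optimal ratio.
print(f"Tokens per parameter for 7B on 2T: {2e12 / 7e9:.0f}")
```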
So now, instead of scaling parameters, we're scaling data. And fine, we can keep scaling data. But what else can we scale?[00:05:52] swyx: And I think understanding the ability to scale things is crucial to understanding what to pour money and time and effort into because there's a limit to how much you can scale some things. And I think people don't think about ceilings of things. And so the remaining ceiling of inference is like, okay, like, we have scaled compute, we have scaled data, we have scaled parameters, like, model size, let's just say.[00:06:20] swyx: Like, what else is left? Like, what's the low hanging fruit? And it, and it's, like, blindingly obvious that the remaining low hanging fruit is inference time. So, like, we have scaled training time. We can probably scale more, those things more, but, like, not 10x, not 100x, not 1000x. Like, right now, maybe, like, a good run of a large model is three months.[00:06:40] swyx: We can scale that to three years. But like, can we scale that to 30 years? No, right? Like, it starts to get ridiculous. So it's just the orders of magnitude of scaling. It's just, we're just like running out there. But in terms of the amount of time that we spend inferencing, like everything takes, you know, a few milliseconds, a few hundred milliseconds, depending on how you're taking it, token by token, or, you know, entire phrases.[00:07:04] swyx: But we can scale that to hours, days, months of inference and see what we get. And I think that's really promising.[00:07:11] Alessio: Yeah, we'll have Mike from BrightWave back on the podcast. But I tried their product and their reports take about 10 minutes to generate instead of like just in real time. I think to me the most interesting thing about long inference is like, you're shifting the cost to the customer depending on how much they care about the end result.[00:07:31] Alessio: If you think about prompt engineering, it's like the first part, right? You can either do a simple prompt and get a simple answer or do a complicated prompt and get a better answer. It's up to you to decide how to do it. Now it's like, hey, instead of like, yeah, training this for three years, I'll still train it for three months and then I'll tell you, you know, I'll teach you how to like make it run for 10 minutes to get a better result.[00:07:52] Alessio: So you're kind of like parallelizing like the improvement of the LLM. Oh yeah, you can even[00:07:57] swyx: parallelize that, yeah, too.[00:07:58] Alessio: So, and I think, you know, for me, especially the work that I do, it's less about, you know, state of the art and the absolute, you know, it's more about state of the art for my application, for my use case.[00:08:09] Alessio: And I think we're getting to the point where like most companies and customers don't really care about state of the art anymore. It's like, I can get this to do a good enough job. You know, I just need to get better. Like, how do I do long inference? You know, like people are not really doing a lot of work in that space, so yeah, excited to see more.[00:08:28] swyx: So then the last point I'll mention here is something I also mentioned as a paper. So all these directions are kind of guided by what happened in January. That was my way of doing a January recap. Which means that if there was nothing significant in that month, I also didn't mention it.
Which I came to regret come February 15th, but in January also, you know, there was also the AlphaGeometry paper, which I kind of put in this sort of long inference bucket, because it solves like, you know, more than 100-step math Olympiad geometry problems at a human gold medalist level and that also involves planning, right?[00:08:59] swyx: So like, if you want to scale inference, you can't scale it blindly, because just autoregressive token-by-token generation is only going to get you so far. You need good planning. And I think probably, yeah, what Mike from BrightWave is now doing and what everyone is doing, including maybe what we think Q-Star might be, is some form of search and planning.[00:09:17] swyx: And it makes sense. Like, you want to spend your inference time wisely. How do you[00:09:22] Alessio: think about plans that work and getting them shared? You know, like, I feel like if you're planning a task, somebody has gone in, and the models are stochastic, so everybody initially gets different results. Somebody is going to end up generating the best plan to do something, but there's no easy way to like store these plans and then reuse them for most people.[00:09:44] Alessio: You know, like, I'm curious if there's going to be some paper or like some work there on like making it better because, yeah, we don't really have...[00:09:52] swyx: This is your pet topic of NPM for...[00:09:54] Alessio: Yeah, yeah, NPM, exactly. NPM for, you need NPM for anything, man. You need NPM for skills. You need NPM for planning. Yeah, yeah.[00:10:02] Alessio: You know I think, I mean, obviously the Voyager paper is like the most basic example where like, now their artifact is like the best plan to get a diamond pickaxe in Minecraft. And everybody can just use that. They don't need to come up with it again. Yeah. But there's nothing like that for actually useful[00:10:18] swyx: tasks.[00:10:19] swyx: For plans, I believe it for skills. I like that. Basically, that just means a bunch of integration tooling. You know, GPT built me integrations to all these things. And, you know, I just came from an integrations heavy business and I could definitely, I definitely propose some version of that. And it's just, you know, hard to execute or expensive to execute.[00:10:38] swyx: But for planning, I do think that everyone lives in slightly different worlds. They have slightly different needs. And they definitely want some, you know... And I think that will probably be the main hurdle for any, any sort of library or package manager for planning. But there should be a meta plan of how to plan.[00:10:57] swyx: And maybe you can adopt that. And I think a lot of people when they have sort of these meta prompting strategies of like, I'm not prescribing you the prompt. I'm just saying that here are, like, the fill-in-the-lines or like the Mad Libs of how to prompt. First you have the roleplay, then you have the intention, then you have like do something, then you have the don't something and then you have the my grandmother is dying, please do this.[00:11:19] swyx: So the meta plan you could, you could take off the shelf and test a bunch of them at once. I like that. That was the initial, maybe, promise of the, the prompting libraries. You know, both LangChain and LlamaIndex have, like, hubs that you can sort of pull off the shelf.
I don't think they're very successful because people like to write their own.[00:11:36] swyx: Yeah,[00:11:37] Direction 2: Synthetic Data (WRAP, SPIN)[00:11:37] Alessio: yeah, yeah. Yeah, that's a good segue into the next one, which is synthetic[00:11:41] swyx: data. Synthetic data is so hot. Yeah, and, you know, the way, you know, I think I, I feel like I should do one of these memes where it's like, Oh, like I used to call it, you know, RLAIF, and now I call it synthetic data, and then people are interested.[00:11:54] swyx: But there's gotta be older versions of what synthetic data really is because I'm sure, you know, if you've been in this field long enough, there's just different buzzwords that the industry condenses on. Anyway, the insight that I think is relatively new — why people are excited about it now and why it's promising now — is that we have evidence that shows that LLMs can generate data to improve themselves with no teacher LLM.[00:12:22] swyx: For all of 2023, when people say synthetic data, they really kind of mean generate a whole bunch of data from GPT-4 and then train an open source model on it. Hello to our friends at Nous Research. That's what Nous Hermes is. They're very, very open about that. I think they have said that they're trying to migrate away from that.[00:12:40] swyx: But it is explicitly against OpenAI Terms of Service. Everyone knows this. You know, especially once ByteDance got banned for, for doing exactly that. So so, so synthetic data that is not a form of model distillation is the hot thing right now, that you can bootstrap better LLM performance from the same LLM, which is very interesting.[00:13:03] swyx: A variant of this is RLAIF, where you have a, where you have a sort of a constitutional model, or, you know, some, some kind of judge model that is sort of more aligned. But that's not really what we're talking about when most people talk about synthetic data. Synthetic data is just really, I think, you know, generating more data in some way.[00:13:23] swyx: A lot of people, I think we talked about this with Vipul from the Together episode, where I think he commented that you just have to have a good world model. Or a good sort of inductive bias or whatever that, you know, term of art is. And that is strongest in math and science and code, where you can verify what's right and what's wrong.[00:13:44] swyx: And so the ReST-EM paper from DeepMind explored that very well. It's just the most obvious thing: once you're in a domain of things where you can arbitrarily generate a whole bunch of stuff and verify if it's correct — and therefore it's correct synthetic data to train on — it works; once you get into more sort of fuzzy topics, then it's a bit less clear. So I think the papers that drove this understanding — there are two big ones and then one smaller one. One was WRAP, like Rephrasing the Web, from Apple, where they basically rephrased all of the C4 dataset with Mistral and then trained on that instead of C4.[00:14:23] swyx: And so the new C4 trained much faster and cheaper than the old, regular raw C4. And that was very interesting. And I have told some friends of ours that they should just throw out their own existing data sets and just do that because that seems like a pure win. Obviously we have to study, like, what the trade offs are.[00:14:42] swyx: I, I imagine there are trade offs. So I was just thinking about this last night.
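As a concrete illustration of the WRAP recipe just described — rephrase raw web text with an instruction-tuned model and keep the paraphrases as extra pretraining text — here is a minimal sketch; the prompt wording and the `generate` helper are hypothetical placeholders, not Apple's actual pipeline.

```python
# Minimal sketch of WRAP-style synthetic data: rephrase raw web documents with
# an instruction-tuned model and keep the paraphrases as extra pretraining text.
# `generate` is a placeholder for whatever LLM client you use (e.g. a local
# Mistral-Instruct endpoint); the prompt wording is illustrative only.
from typing import Iterable, Iterator, Tuple

REPHRASE_PROMPT = (
    "Rewrite the following web text as clear, well-structured prose, "
    "preserving all factual content:\n\n{document}"
)

def generate(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned model."""
    raise NotImplementedError

def rephrase_corpus(raw_docs: Iterable[str], max_chars: int = 4000) -> Iterator[Tuple[str, str]]:
    """Yield (original, paraphrase) pairs; WRAP-style training mixes both."""
    for doc in raw_docs:
        snippet = doc[:max_chars]  # crude truncation; a real pipeline would chunk properly
        paraphrase = generate(REPHRASE_PROMPT.format(document=snippet))
        yield snippet, paraphrase
```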
If you do synthetic data and it's generated from a model, probably you will not train on typos. So therefore you'll be like, once the model that's trained on synthetic data encounters the first typo, they'll be like, what is this?[00:15:01] swyx: I've never seen this before. So they have no association or correction as to like, oh, these tokens are often typos of each other, therefore they should be kind of similar. I don't know. That really remains to be seen, I think. I don't think that the Apple people explored[00:15:15] Alessio: that. Yeah, isn't that the whole mode collapse thing, if we do more and more of this at the end of the day.[00:15:22] swyx: Yeah, that's one form of that. Yeah, exactly. Microsoft also had a good paper on text embeddings. And then I think there is a Meta paper on self-rewarding language models that everyone is very interested in. Another paper was also SPIN. These are all things we covered in the Latent Space Paper Club.[00:15:37] swyx: But also, you know, I just kind of recommend those as top reads of the month. Yeah, I don't know if there's much else in terms of that. So then, regarding the potential of it, I think it's high potential because, one, it solves one of the data war issues that we have, like, everyone is — OpenAI is paying Reddit 60 million dollars a year for their user generated data.[00:15:56] swyx: Google, right?[00:15:57] Alessio: Not OpenAI.[00:15:59] swyx: Is it Google? I don't[00:16:00] Alessio: know. Well, somebody's paying them 60 million, that's[00:16:04] swyx: for sure. Yes, that is, yeah, yeah, and then I think it's maybe not confirmed who. But yeah, it is Google. Oh my god, that's interesting. Okay, because everyone was saying, like, because Sam Altman owns 5 percent of Reddit, which is apparently 500 million worth of Reddit, he owns more than, like, the founders.[00:16:21] Alessio: Not enough to get the data,[00:16:22] swyx: I guess. So it's surprising that it would go to Google instead of OpenAI, but whatever. Okay yeah, so I think that's all super interesting in the data field. I think it's high potential because we have evidence that it works. There's not a doubt about whether it works. The doubt is about what the ceiling is, which is the mode collapse thing.[00:16:42] swyx: If it turns out that the ceiling is pretty close, then this will maybe augment our data by like, I don't know, 30 to 50 percent — good, but not game[00:16:51] Alessio: changing. And most of the synthetic data stuff, it's reinforcement learning on a pre-trained model. People are not really doing pre-training on fully synthetic data at, like, large enough scale.[00:17:02] swyx: Yeah, unless one of our friends that we've talked to succeeds. Yeah, yeah. Pre-trained synthetic data, pre-training-scale synthetic data, I think that would be a big step. Yeah. And then there's a wildcard, so all of these, like smaller directions,[00:17:15] Wildcard: Multi-Epoch Training (OLMo, Datablations)[00:17:15] swyx: I always put a wildcard in there. And one of the wildcards is, okay, like, let's say you've scraped all the data on the internet that you think is useful.[00:17:25] swyx: Seems to top out at somewhere between 2 trillion to 3 trillion tokens. Maybe 8 trillion if Mistral, Mistral gets lucky. Okay, if I need 80 trillion, if I need 100 trillion, where do I go? And so, you can do synthetic data maybe, but maybe that only gets you to like 30, 40 trillion.
Like where, where is the extra alpha?[00:17:43] swyx: And maybe extra alpha is just train more on the same tokens. Which is exactly what OLMo did — like Nathan Lambert, AI2; just after he did the interview with us, they released OLMo. So, it's unfortunate that we didn't get to talk much about it. But OLMo actually started doing 1.5 epochs on every, on all data.[00:18:00] swyx: And the data ablations paper that I covered at NeurIPS says that, you know, you don't, like, don't really start to tap out of, like, the alpha or the sort of improved loss that you get from data all the way until four epochs. And so I'm just like, okay, like, why do we all agree that one epoch is all you need?[00:18:17] swyx: It seems to be a trend. It seems that we think that memorization is very good or too good. But then also we're finding that, you know, for improvements in results that we really like, we're fine with overtraining on things intentionally. So, I think that's an interesting direction that I don't see people exploring enough.[00:18:36] swyx: And the more I see papers coming out stretching beyond the one epoch thing, the more people are like, it's completely fine. And actually, the only reason we stopped is because we ran out of compute[00:18:46] Alessio: budget. Yeah, I think that's the biggest thing, right?[00:18:51] swyx: Like, that's not a valid reason, that's not science. I[00:18:54] Alessio: wonder if, you know, Meta is going to do it.[00:18:57] Alessio: I heard with Llama 3, they want to do a 100 billion parameter model. I don't think you can train that on too many epochs, even with their compute budget, but yeah. They're the only ones that can save us, because even if OpenAI is doing this, they're not going to tell us, you know. Same with DeepMind.[00:19:14] swyx: Yeah, and so the updates that we got on Llama 3 so far are apparently that, because of the Gemini news that we'll talk about later, they're pushing back the release.[00:19:21] swyx: They already have it. And they're just pushing it back to do more safety testing. Politics testing.[00:19:28] Alessio: Well, our episode with Soumith will have already come out by the time this comes out, I think. So people will get the inside story on how they actually allocate the compute.[00:19:38] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)[00:19:38] Alessio: Alternative architectures. Well, shout out to RWKV, who won one of the prizes at our Final Frontiers event last week.[00:19:47] Alessio: We talked about Mamba and StripedHyena on the Together episode. A lot of, yeah, Monarch Mixers. I feel like Together, it's like the strong Stanford Hazy Research partnership, because Chris Ré is one of the co-founders. So they kind of have a, I feel like they're going to be the ones that have one of the state of the art models alongside maybe RWKV.[00:20:08] Alessio: I haven't seen as many independent people working on this thing. Like Monarch Mixer, yeah, Mamba, Hyena, all of these are Together-related. Nobody understands the math. They got all the gigabrains, they got Tri Dao, they got all these folks in there, like, working on all of this.[00:20:25] swyx: Albert Gu, yeah. Yeah, so what should we comment about it?[00:20:28] swyx: I mean, I think it's useful, interesting, but at the same time, both of these are supposed to do really good scaling for long context. And then Gemini comes out and goes like, yeah, we don't need it. Yeah.[00:20:44] Alessio: No, that's the risk. So, yeah.
I was gonna say, maybe it's not here, but I don't know if we want to talk about diffusion transformers as like in the alt architectures, just because of Sora.[00:20:55] swyx: One thing, yeah, so, so, you know, this came from the Jan recap, which, and diffusion transformers were not really a discussion, and then, obviously, they blow up in February. Yeah. I don't think it's a mixed architecture in the same way that StripedHyena is mixed, there's just different layers taking different approaches.[00:21:13] swyx: Also I think another one that I maybe didn't call out here, I think because it happened in February, was hourglass diffusion from Stability. But also, you know, another form of mixed architecture. So I guess that is interesting. I don't have much commentary on that, I just think, like, we will try to evolve these things, and maybe one of these architectures will stick and scale, it seems like diffusion transformers is going to be good for anything generative, you know, multimodal.[00:21:41] swyx: We don't see anything where diffusion is applied to text yet, and that's the wild card for this category. Yeah, I mean, I think I still hold out hope for, let's just call it sub-quadratic LLMs. I think that a lot of discussion this month actually was also centered around this concept that people always say, oh, like, transformers don't scale because attention is quadratic in the sequence length.[00:22:04] swyx: Yeah, but, you know, attention actually is a very small part of the actual compute that is being spent, especially in inference. And this is the reason why, you know, when you multiply, when you, when you, when you jump up in terms of the, the context size in GPT-4 from like, you know, 8k to like 32k, you don't also get like a 16 times increase in your, in your latency.[00:22:23] swyx: And this is also why you don't get like a million times increase in your, in your latency when you throw a million tokens into Gemini. Like people have figured out tricks around it or it's just not that significant as a term, as a part of the overall compute. So there's a lot of challenges to this thing working.[00:22:43] swyx: It's really interesting how like, how hyped people are about this versus, I don't know if it works, you know, if it's exactly gonna, gonna work. And then there's also this, this idea of retention over long context. Like, even though you have context utilization, like, the amount of, the amount you can remember is interesting.[00:23:02] swyx: Because I've had people criticize both Mamba and RWKV because they're kind of, like, RNN-ish in the sense that they have, like, a hidden memory and sort of limited hidden memory that they will forget things. So, for all these reasons, Gemini 1.5, which we still haven't covered, is very interesting because Gemini magically has fixed all these problems with perfect haystack recall and reasonable latency and cost.[00:23:29] Wildcards: Text Diffusion, RALM/Retro[00:23:29] swyx: So that's super interesting. So the wildcard I put in here, if you want to go to that. I put two actually. One is text diffusion. I think I'm still very influenced by my meeting with a Midjourney person who said they were working on text diffusion. I think it would be a very, very different paradigm for, for text generation, reasoning, plan generation if we can get diffusion to work.[00:23:51] swyx: For text. 
And then the second one is Douwe Kiela's Contextual AI, which is working on retrieval augmented language models, where it kind of puts RAG inside of the language model instead of outside.[00:24:02] Alessio: Yeah, there's a paper called RETRO that covers some of this. I think that's an interesting thing. I think the, the challenge, well not the challenge, what they need to figure out is like how do you keep the RAG piece always up to date constantly, you know, I feel like the models, you put all this work into pre-training them, but then at least you have a fixed artifact.[00:24:22] Alessio: These architectures are like constant work needs to be done on them and they can drift even just based on the RAG data instead of the model itself. Yeah,[00:24:30] swyx: I was in a panel with one of the investors in Contextual and the guy, the way that guy pitched it, I didn't agree with. He was like, this will solve hallucination.[00:24:38] Alessio: That's what everybody says. We solve[00:24:40] swyx: hallucination. I'm like, no, you reduce it. It cannot,[00:24:44] Alessio: if you solved it, the model wouldn't exist, right? It would just be plain text. It wouldn't be a generative model. Cool. So, alt architectures, then we got mixture of experts. I think we covered that a lot of, a lot of times.[00:24:56] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)[00:24:56] Alessio: Maybe any new interesting threads you want to go under here?[00:25:00] swyx: DeepSeekMoE, which was released in January. Everyone who is interested in MoEs should read that paper, because it's significant for two reasons. One, three reasons. One, it had, it had small experts, like a lot more small experts. So, for some reason, everyone has settled on eight experts for GPT-4, for Mixtral, you know, that seems to be the favorite architecture, but these guys pushed it to 64 experts, and each of them smaller than the other.[00:25:26] swyx: But then they also had the second idea, which is that they had two, one to two always-on experts for common knowledge, and that's like a very compelling concept that you would not route to all the experts all the time and make them, you know, switch to everything. You would have some always-on experts.[00:25:41] swyx: I think that's interesting on both the inference side and the training side for, for memory retention. And yeah, the, the results that they published, which actually excluded Mixtral, which is interesting. The results that they published showed a significant performance jump versus all the other sort of open source models at the same parameter count.[00:26:01] swyx: So like this may be a better way to do MoEs that, that is about to get picked up. And so that, that is interesting for the third reason, which is this is the first time a new idea from China has infiltrated the West. It's usually the other way around. I probably overspoke there. There's probably lots more ideas that I'm not aware of.[00:26:18] swyx: Maybe in the embedding space. But I think DeepSeekMoE, like, woke people up and said, like, hey, DeepSeek, this, like, weird lab that is attached to a Chinese hedge fund is somehow, you know, doing groundbreaking research on MoEs. So, so, I classified this as a medium potential because I think that it is a sort of like a one-off benefit.[00:26:37] swyx: You can add it to any, any base model to, like, make the MoE version of it, you get a bump and then that's it. 
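To make that routing idea concrete, here is a minimal, hedged sketch of an MoE layer with many small routed experts plus a couple of always-on shared experts, in the spirit of what's described above. This is illustrative PyTorch, not DeepSeek's implementation; the dimensions, expert counts, and class name are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertMoE(nn.Module):
    """Toy MoE layer: many small routed experts plus a few always-on shared experts."""

    def __init__(self, d_model=512, d_ff=128, n_routed=64, n_shared=2, top_k=6):
        super().__init__()
        make_expert = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.routed = nn.ModuleList([make_expert() for _ in range(n_routed)])  # many small experts
        self.shared = nn.ModuleList([make_expert() for _ in range(n_shared)])  # always-on experts
        self.router = nn.Linear(d_model, n_routed)
        self.top_k = top_k

    def forward(self, x):                       # x: (batch, seq, d_model)
        out = sum(e(x) for e in self.shared)    # every token passes through the shared experts
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        # Naive loop for clarity; real implementations scatter/gather tokens per expert.
        for k in range(self.top_k):
            for e_id, expert in enumerate(self.routed):
                sel = (idx[..., k] == e_id)                      # tokens routed to this expert
                if sel.any():
                    w = (weights[..., k] * sel).unsqueeze(-1)    # zero weight for unrouted tokens
                    out = out + w * expert(x)
        return out

layer = SharedExpertMoE()
print(layer(torch.randn(2, 16, 512)).shape)     # torch.Size([2, 16, 512])
```

The point of the sketch is just the structure: a handful of shared experts that see every token for common knowledge, and a much larger pool of small experts that only the router activates.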
So, yeah,[00:26:45] Alessio: I saw SambaNova, which is like another inference company. They released this MoE model called Samba-1, which is like a 1 trillion parameter model. But it's actually a MoE of other open source models.[00:26:56] Alessio: So it's like, they just, they just clustered them all together. So I think people. Sometimes I think MoE is like you just train a bunch of small models or like smaller models and put them together. But there's also people just taking, you know, Mistral plus CLIP plus, you know, DeepSeek Coder and like put them all together.[00:27:15] Alessio: And then you have a MoE model. I don't know. I haven't tried the model, so I don't know how good it is. But it seems interesting that you can then have people working separately on state of the art, you know, CLIP, state of the art text generation. And then you have a MoE architecture that brings them all together.[00:27:31] swyx: I'm thrown off by your addition of the word CLIP in there. Is that what? Yeah, that's[00:27:35] Alessio: what they said. Yeah, yeah. Okay. That's what they, I just saw it yesterday. I was also like[00:27:40] swyx: scratching my head. And they did not use the word adapter. No. Because usually what people mean when they say, oh, I add CLIP to a language model is adapter.[00:27:48] swyx: Let me look up the, which is what LLaVA did.[00:27:50] Alessio: The announcement again.[00:27:51] swyx: Stable Diffusion. That's what they do. Yeah, it[00:27:54] Alessio: says among the models that are part of Samba-1 are Llama 2, Mistral, DeepSeek Coder, Falcon, Dplot, CLIP, LLaVA. So they're just taking all these models and putting them in a MoE. Okay,[00:28:05] swyx: so a routing layer and then not jointly trained as much as a normal MoE would be.[00:28:12] swyx: Which is okay.[00:28:13] Alessio: That's all they say. There's no paper, you know, so it's like, I'm just reading the article, but I'm interested to see how[00:28:20] Wildcard: Model Merging (mergekit)[00:28:20] swyx: it works. Yeah, so, so the wildcard for this section, the MoE section, is model merges, which has also come up as, as a very interesting phenomenon. The last time I talked to Jeremy Howard at the Ollama meetup we called it model grafting or model stacking.[00:28:35] swyx: But I think the, the, the term that people are liking these days is model merging. There's all different variations of merging, merge types, and some of them are stacking, some of them are, are grafting. And, and so like, some people are approaching model merging in the way that Samba is doing, which is like, okay, here are defined models, each of which have their specific plus and minuses, and we will merge them together in the hope that the, you know, the sum of the parts will, will be better than others.[00:28:58] swyx: And it seems like, it seems like it's working. I don't really understand why it works apart from, like, I think it's a form of regularization. That if you merge weights together in like a smart strategy you, you get less overfitting and more generalization, which is good for benchmarks, if you, if you're honest about your benchmarks.[00:29:16] swyx: So this is really interesting and good. But again, they're kind of limited in terms of like the amount of bumps you can get. But I think it's very interesting in the sense of how cheap it is. We talked about this on the ChinaTalk podcast, like the guest podcast that we did with ChinaTalk. 
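As a hedged illustration of the simplest merge recipe described here and in the next exchange (literally adding weights together and dividing), the sketch below averages the state dicts of checkpoints that share an architecture. The file names are placeholders; real merge tooling such as mergekit supports fancier schemes (SLERP, task-vector arithmetic, and so on), but the arithmetic core looks roughly like this.

```python
import torch

def average_checkpoints(paths):
    """Uniformly average the weights of same-architecture checkpoints (a simple 'model soup')."""
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    merged = {}
    for key in state_dicts[0]:
        # Stack the corresponding tensor from every checkpoint and take the elementwise mean.
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Illustrative usage (paths are hypothetical):
# merged = average_checkpoints(["finetune_a.pt", "finetune_b.pt", "finetune_c.pt"])
# torch.save(merged, "merged.pt")
```

Nothing here touches a GPU: it is elementwise arithmetic over parameter tensors, which is why the conversation frames merging as unusually cheap.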
And you can do this without GPUs, because it's just adding weights together, and dividing things, and doing like simple math, which is really interesting for the GPU poors.[00:29:42] Alessio: There's a lot of them.[00:29:44] Direction 5: Online LLMs (Gemini Pro, Exa)[00:29:44] Alessio: And just to wrap these up, online LLMs? Yeah,[00:29:48] swyx: I think that I, I had to feature this because the, one of the top news of January was that Gemini Pro beat GPT-4 Turbo on LMSYS for the number two slot to GPT-4. And everyone was very surprised. Like, how does Gemini do that?[00:30:06] swyx: Surprise, surprise, they added Google Search. Mm-hmm. To the results. So it became an online, quote unquote, online LLM and not an offline LLM. Therefore, it's much better at answering recent questions, which people like. There's an emerging set of table stakes features after you pre-train something.[00:30:21] swyx: So after you pre-train something, you should have the chat-tuned version of it, or the instruct-tuned version of it, however you choose to call it. You should have the JSON and function calling version of it. Structured output, the term that you don't like. You should have the online version of it. These are all like table stakes variants that you should do when you offer a base LLM, or you train a base LLM.[00:30:44] swyx: And I think online is just like, there, it's important. I think companies like Perplexity, and even Exa, formerly Metaphor, you know, are rising to offer that search need. And it's kind of like, they're just necessary parts of a system. When you have RAG for internal knowledge, and then you have, you know, online search for external knowledge, like things that you don't know yet?[00:31:06] swyx: Mm-Hmm. And it seems like it's, it's one of many tools. I feel like I may be underestimating this, but I'm just gonna put it out there that I, I think it has some, some potential. One of the evidence points that it doesn't actually matter that much is that Perplexity has a, has had online LLMs for three months now and it performs, doesn't perform great.[00:31:25] swyx: Mm-Hmm. On, on LMSYS, it's like number 30 or something. So it's like, okay. You know, like. It's, it's, it helps, but it doesn't give you a giant, giant boost. I[00:31:34] Alessio: feel like a lot of stuff I do with LLMs doesn't need to be online. So I'm always wondering, again, going back to like state of the art, right? It's like state of the art for who and for what.[00:31:45] Alessio: It's really, I think online LLMs are going to be state of the art for, you know, news-related activity that you need to do. Like, you're like, you know, social media, right? It's like, you want to have all the latest stuff, but coding, science,[00:32:01] swyx: Yeah, but I think sometimes you don't know what is news, what news is affecting.[00:32:07] swyx: Like, the decision to use an offline LLM is already a decision that you might not be consciously making that might affect your results. Like, what if, like, just putting things on, being connected online means that you get to invalidate your knowledge. And when you're just using an offline LLM, like it's never invalidated.[00:32:27] swyx: I[00:32:28] Alessio: agree, but I think going back to your point of like the standing the test of time, I think sometimes you can get swayed by the online stuff, which is like, hey, you ask a question about, yeah, maybe AI research direction, you know, and it's like, all the recent news are about this thing. 
So the LLM, like, focuses on answering, bringing it up, you know, these things.[00:32:50] swyx: Yeah, so yeah, I think, I think it's interesting, but I don't know if I can, I'd bet heavily on this.[00:32:56] Alessio: Cool. Was there one that you forgot to put, or, or like a, a new direction? Yeah,[00:33:01] swyx: so, so this brings us into sort of February-ish.[00:33:05] OpenAI Sora and why everyone underestimated videogen[00:33:05] swyx: So like I published this, and then the 15th came with Sora. And so like the one thing I did not mention here was anything about multimodality.[00:33:16] swyx: Right. And I have chronically underweighted this. I always wrestle. And, and my cop out is that I focused this piece or this research direction piece on LLMs because LLMs are the source of like AGI, quote unquote AGI. Everything else is kind of like, you know, related to that, like, generative, like, just because I can generate better images or generate better videos, it feels like it's not on the critical path to AGI, which is something that Nat Friedman also observed, like, the day before Sora, which is kind of interesting.[00:33:49] swyx: And so I was just kind of like trying to focus on like what is going to get us like superhuman reasoning that we can rely on to build agents that automate our lives and blah, blah, blah, you know, give us this utopian future. But I do think that I, everybody underestimated the, the sheer importance and cultural human impact of Sora.[00:34:10] swyx: And you know, really actually good text to video. Yeah. Yeah.[00:34:14] Alessio: And I saw Jim Fan had a, had a very good tweet about why it's so impressive. And I think when you have somebody leading the embodied research at NVIDIA and he says that something is impressive, you should probably listen. So yeah, there's basically like, I think you, you mentioned like impacting the world, you know, that we live in.[00:34:33] Alessio: I think that's kind of like the key, right? It's like the LLMs don't have a world model, and Yann LeCun, he can come on the podcast and talk all about what he thinks of that. But I think Sora was like the first time where people were like, oh, okay, you're not statically putting pixels of water on the screen, which you can kind of like, you know, project without understanding the physics of it.[00:34:57] Alessio: Now you're like, you have to understand how the water splashes when you have things. And even if you just learned it by watching video and not by actually studying the physics, you still know it, you know, so I, I think that's like a direction that yeah, before you didn't have, but now you can do things that you couldn't before, both in terms of generating, I think it always starts with generating, right?[00:35:19] Alessio: But like the interesting part is like understanding it. You know, it's like if you gave it, you know, there's the video of like the, the ship in the water that they generated with Sora, like if you gave it the video back and now it could tell you why the ship is like too rocky or like it could tell you why the ship is sinking, then that's like, you know, AGI for like all your rig deployments and like all this stuff, you know, so, but there's none, there's none of that yet, so.[00:35:44] Alessio: Hopefully they announce it and talk more about it. Maybe at Dev Day this year, who knows.[00:35:49] swyx: Yeah who knows, who knows. I'm talking with them about Dev Day as well. 
So I would say, like, the phrasing that Jim used, which resonated with me, he kind of called it a data-driven world model. I somewhat agree with that.[00:36:04] Does Sora have a World Model? Yann LeCun vs Jim Fan[00:36:04] swyx: I am on more of a Yann LeCun side than I am on Jim's side, in the sense that I think that is the vision or the hope that these things can build world models. But you know, clearly even at the current Sora size, they don't have the idea of, you know, they don't have strong consistency yet. They have very good consistency, but fingers and arms and legs will appear and disappear and chairs will appear and disappear.[00:36:31] swyx: That definitely breaks physics. And it also makes me think about how we do deep learning versus world models in the sense of, you know, in classic machine learning, when you have too many parameters, you will overfit, and actually that fails, that like, does not match reality, and therefore fails to generalize well.[00:36:50] swyx: And like, what scale of data do we need in order to world, learn world models from video? A lot. Yeah. So, so I, I am cautious about taking this interpretation too literally, obviously, you know, like, I get what he's going for, and he's like, obviously partially right, obviously, like, transformers and, and, you know, these, like, these sort of these, these neural networks are universal function approximators, theoretically could figure out world models, it's just like, how good are they, and how tolerant are we of hallucinations, we're not very tolerant, like, yeah, so it's, it's gonna, it's gonna bias us for creating like very convincing things, but then not create like the, the useful world models that we want.[00:37:37] swyx: At the same time, what you just said, I think made me reflect a little bit, like we just got done saying how important synthetic data is for, Mm-Hmm, for training LLMs. And so like, if this is a way of, of synthetic, you know, video data for improving our video understanding, then sure, by all means. Which we actually know, like, GPT-4 Vision and DALL-E were trained, kind of, co-trained together.[00:38:02] swyx: And so, like, maybe this is on the critical path, and I just don't fully see the full picture yet.[00:38:08] Alessio: Yeah, I don't know. I think there's a lot of interesting stuff. It's like, imagine you go back, you have Sora, you go back in time, and Newton didn't figure out gravity yet. Would Sora help you figure it out?[00:38:21] Alessio: Because you start saying, okay, a man standing under a tree with, like, apples falling, and it's like, oh, they're always falling at the same speed in the video. Why is that? I feel like sometimes these engines can like pick up things, like humans have a lot of intuition, but if you ask the average person, like the physics of like a fluid in a boat, they wouldn't be able to tell you the physics, but they can like observe it, but humans can only observe this much, you know, versus like now you have these models to observe everything and then they generalize these things and maybe we can learn new things through the generalization that they pick up.[00:38:55] swyx: But again, And it might be more observant than us in some respects. In some ways we can scale it up a lot more than the number of physicists that we have available at Newton's time. So like, yeah, absolutely possible. That, that this can discover new science. 
I think we have a lot of work to do to formalize the science.[00:39:11] swyx: And then, I, I think the last part is, you know, how much, how much do we cheat by gen, by generating data from Unreal Engine 5? Mm hmm. Which is what a lot of people are speculating, with very, very limited evidence, that OpenAI did. The strongest evidence that I saw was someone who works a lot with Unreal Engine 5 looking at the side characters in the videos and noticing that they all adopt Unreal Engine defaults.[00:39:37] swyx: of like, walking speed, and like, character choice, like, character creation choice. And I was like, okay, like, that's actually pretty convincing that they actually use Unreal Engine to bootstrap some synthetic data for this training set. Yeah,[00:39:52] Alessio: could very well be.[00:39:54] swyx: Because then you get the labels and the training side by side.[00:39:58] swyx: One thing that came up on the last day of February, which I should also mention, is EMO coming out of Alibaba, which is also a sort of like video generation and space-time transformer that also involves probably a lot of synthetic data as well. And so like, this is of a kind in the sense of like, oh, like, you know, really good generative video is here and it is not just like the one, two second clips that we saw from like other, other people, like, you know, Pika and all the others. Runway, are, are, you know, Cristóbal Valenzuela from Runway was like, game on. Which, like, okay, but like, let's see your response, because we've heard a lot about Gen-1 and 2, but like, it's nothing on this level of Sora. So it remains to be seen how we can actually apply this, but I do think that the creative industry should start preparing.[00:40:50] swyx: I think the Sora technical blog post from OpenAI was really good. It was like a request for startups. It was so good in like spelling out, here are the individual industries that this can impact.[00:41:00] swyx: And anyone who, anyone who's like interested in generative video should look at that. But also be mindful that probably when OpenAI releases a Sora API, right? The, you, the, the ways you can interact with it are very limited. Just like the ways you can interact with DALL-E are very limited, and someone is gonna have to make an open Sora to[00:41:19] swyx: Mm-Hmm, to, to, for you to create ComfyUI pipelines.[00:41:24] Alessio: The Stability folks said they wanna build an open Sora competitor, but yeah, Stability, their demo video, their demo video was like so underwhelming. It was just like two people sitting on the beach[00:41:34] swyx: standing. Well, they don't have it yet, right? Yeah, yeah.[00:41:36] swyx: I mean, they just wanna train it. Everybody wants to, right? Yeah. I, I think what is confusing a lot of people about Stability is like they're, they're, they're pushing a lot of things in Stable Code, Stable LM, and Stable Video Diffusion. But like, how much money do they have left? How many people do they have left?[00:41:51] swyx: Yeah. I have had like a really, Emad spent two hours with me reassuring me things are great. And, and I'm like, I, I do, like, I do believe that they have really, really quality people. But it's just like, I, I also have a lot of very smart people on the other side telling me, like, hey man, like, you know, don't, don't put too much faith in this, in this thing.[00:42:11] swyx: So I don't know who to believe. Yeah.[00:42:14] Alessio: It's hard. Let's see. What else? We got a lot more stuff. I don't know if we can. 
Yeah, Groq.[00:42:19] Groq Math[00:42:19] Alessio: We can[00:42:19] swyx: do a bit of Groq prep. We're, we're about to go to talk to Dylan Patel. Maybe, maybe it's the audio in here. I don't know. It depends what, what we get up to later. What, how, what do you as an investor think about Groq? Yeah. Yeah, well, actually, can you recap, like, why is Groq interesting? So,[00:42:33] Alessio: Jonathan Ross, who's the founder of Groq, he's the person that created the TPU at Google. It's actually, it was one of his, like, 20 percent projects. It's like, he was just on the side, dooby doo, created the TPU.[00:42:46] Alessio: But yeah, basically, Groq, they had this demo that went viral, where they were running Mistral at, like, 500 tokens a second, which is like, faster than anything that you have out there. The question, you know, it's all like, the memes were like, is NVIDIA dead? Like, people don't need H100s anymore. I think there's a lot of money that goes into building what Groq has built as far as the hardware goes.[00:43:11] Alessio: We're gonna, we're gonna put some of the notes from, from Dylan in here, but basically the cost of the Groq system is like 30 times the cost of, of the H100 equivalent. So, so[00:43:23] swyx: let me, I put some numbers, because me and Dylan were like, I think the two people actually tried to do Groq math. Spreadsheet doors.[00:43:30] swyx: Spreadsheet doors. So, one that's, okay, oh boy so, so, the equivalent H100 for Llama 2 is $300,000, for a system of 8 cards. And for Groq it's $2.3 million, because you have to buy 576 Groq cards. So yeah, that, that just gives people an idea. So like if you depreciate both over a five-year lifespan, per year you're depreciating $460K for Groq, and $60K a year for H100.[00:43:59] swyx: So like, Groqs are just way more expensive per model that you're, that you're hosting. But then, you make it up in terms of volume. So I don't know if you want to[00:44:08] Alessio: cover that. I think one of the promises of Groq is like super high parallel inference on the same thing. So you're basically saying, okay, I'm putting on this upfront investment on the hardware, but then I get much better scaling once I have it installed.[00:44:24] Alessio: I think the big question is how much can you sustain the parallelism? You know, like if you get, if you're going to get 100% utilization rate at all times on Groq, like, it's just much better, you know, because like at the end of the day, the tokens per second cost that you're getting is better than with the H100s, but if you get to like 50 percent utilization rate, you will be much better off running on NVIDIA.[00:44:49] Alessio: And if you look at most companies out there, who really gets 100 percent utilization rate? Probably OpenAI at peak times, but that's probably it. But yeah, curious to see more. I saw Jonathan was just at the Web Summit in Dubai, in Qatar. He just gave a talk there yesterday. That I haven't listened to yet.[00:45:09] Alessio: I, I tweeted that he should come on the pod. He liked it. And then Groq followed me on Twitter. I don't know if that means that they're interested, but[00:45:16] swyx: hopefully the Groq social media person is just very friendly. They, yeah. Hopefully[00:45:20] Alessio: we can get them. Yeah, we, we gonna get him. 
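For readers who want to sanity-check the depreciation arithmetic just quoted, here is the same back-of-the-envelope math as a few lines of Python. The dollar figures are simply the ones mentioned in the conversation, so treat them as rough, second-hand estimates rather than vendor pricing.

```python
# Back-of-the-envelope depreciation math using the figures quoted above.
h100_system_cost = 300_000     # ~8x H100 system for Llama 2 inference, USD (as quoted)
groq_system_cost = 2_300_000   # ~576 Groq cards, USD (as quoted)
lifespan_years = 5

h100_per_year = h100_system_cost / lifespan_years   # -> 60,000 USD/year
groq_per_year = groq_system_cost / lifespan_years   # -> 460,000 USD/year

print(f"H100: ${h100_per_year:,.0f}/yr  Groq: ${groq_per_year:,.0f}/yr  "
      f"ratio: {groq_per_year / h100_per_year:.1f}x")
# On these numbers the fixed cost per hosted model is roughly 7.7x higher for Groq;
# the bet is that much higher throughput and utilization closes the per-token gap.
```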
We[00:45:22] swyx: just call him out and, and so basically the, the key question is like, how sustainable is this and how much[00:45:27] swyx: this is a loss leader. The entire Groq management team has been on Twitter and Hacker News saying they are very, very comfortable with the pricing of $0.27 per million tokens. This is the lowest that anyone has offered tokens, as far as Mixtral or Llama 2 goes. This matches DeepInfra and, you know, I think, I think that's, that's, that's about it in terms of that, that, that low.[00:45:47] swyx: And we think the, the break-even for H100s is 50 cents, at a, at a normal utilization rate. To make this work, so in my spreadsheet I made this, made this work, you have to have like a parallelism of 500 requests all simultaneously. And you have, you have model bandwidth utilization of 80%.[00:46:06] swyx: Which is way high. I just gave them high marks for everything. Groq has two fundamental tech innovations that they hang their hats on in terms of like, why we are better than everyone. You know, even though, like, it remains to be independently replicated. But one, you know, they have this sort of the entire model on the chip idea, which is like, okay, get rid of HBM.[00:46:30] swyx: And, like, put everything in SRAM. Like, okay, fine, but then you need a lot of cards and whatever. And that's all okay. And so, like, because you don't have to transfer between memory, then you just save on that time and that's why they're faster. So, a lot of people buy that as, like, that's the reason that you're faster.[00:46:45] swyx: Then they have, like, some kind of crazy compiler, or, like, speculative routing magic using compilers that they also attribute towards their higher utilization. So I give them 80 percent for that. And so that all that works out to like, okay, base costs, I think you can get down to like, maybe like 20 something cents per million tokens.[00:47:04] swyx: And therefore you actually are fine if you have that kind of utilization. But it's like, I have to make a lot of favorable assumptions for this to work.[00:47:12] Alessio: Yeah. Yeah, I'm curious to see what Dylan says later.[00:47:16] swyx: So he was like completely opposite of me. He's like, they're just burning money. Which is great.[00:47:22] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars[00:47:22] Alessio: Gemini, want to do a quick run through since this touches on all the four wars.[00:47:28] swyx: Yeah, and I think this is the mark of a useful framework, that when a new thing comes along, you can break it down in terms of the four wars and sort of slot it in or analyze it in those four frameworks, and have nothing left.[00:47:41] swyx: So it's a MECE categorization. MECE is Mutually Exclusive and Collectively Exhaustive. And that's a really, really nice way to think about taxonomies and to create mental frameworks. So, what is Gemini 1.5 Pro? It is the newest model that came out one week after Gemini 1.0. Which is very interesting.[00:48:01] swyx: They have not really commented on why. 
They released this, and the headline feature is that it has a 1 million token context window that is multimodal, which means that you can put all sorts of video and audio and PDFs natively in there alongside of text and, you know, it's, it's at least 10 times longer than anything that OpenAI offers, which is interesting.[00:48:20] swyx: So it's great for prototyping and it has interesting discussions on whether it kills RAG.[00:48:25] Alessio: Yeah, no, I mean, we always talk about, you know, long context is good, but you're getting charged per token. So, yeah, people love for you to use more tokens in the context. And RAG is better economics. But I think it all comes down to like how the price curves change, right?[00:48:42] Alessio: I think if anything, RAG's complexity goes up and up the more you use it, you know, because you have more data sources, more things you want to put in there. The token costs should go down over time, you know, if the model stays fixed. If people are happy with the model today, in two years, three years, it's just gonna cost a lot less, you know?[00:49:02] Alessio: So now it's like, why would I use RAG and like go through all of that? It's interesting. I think RAG is better cutting-edge economics for LLMs. I think large context will be better long-tail economics when you factor in the build cost of like managing a RAG pipeline. But yeah, the recall was like the most interesting thing, because we've seen the, you know, needle in the haystack things in the past, but apparently they have 100 percent recall on anything across the context window.[00:49:28] Alessio: At least they say nobody has used it. No, people[00:49:30] swyx: have. Yeah so as far as, so, so what this needle in a haystack thing, for people who aren't following as closely as us, is that someone, I forget his name now, someone created this needle in a haystack problem where you feed in a whole bunch of generated junk, not junk, but just like, generated data, and ask it to specifically retrieve something in that data, like one line in like a hundred thousand lines where it like has a specific fact, and if it, if you get it, you're, you're good.[00:49:57] swyx: And then he moves the needle around, like, you know, does it, does, does your ability to retrieve that vary if I put it at the start versus put it in the middle, put it at the end? And then you generate this like really nice chart that, that kind of shows like the recallability of a model. And he did that for GPT and, and Anthropic and showed that Anthropic did really, really poorly.[00:50:15] swyx: And then Anthropic came back and said it was a skill issue, just add these like four, four magic words, and then, then it's magically all fixed. And obviously everybody laughed at that. But what Gemini came out with was, was that, yeah, we, we reproduced their, you know, haystack issue, you know, test for Gemini, and it's good across all, all languages.[00:50:30] swyx: All the one million token window, which is very interesting, because usually for typical context extension methods like RoPE or YaRN or, you know, anything like that, or ALiBi, it's lossy, like by design it's lossy, usually for conversations that's fine because we are lossy when we talk to people, but for superhuman intelligence, perfect memory across very, very long context.[00:50:51] swyx: It's very, very interesting for picking things up. And so the people who have been given the beta test for Gemini have been testing this. 
So what you do is you upload, let's say, all of Harry Potter and you change one fact in one sentence, somewhere in there, and you ask it to pick it up, and it does. So this is legit.[00:51:08] swyx: We don't super know how, because this is, like, because it doesn't, yes, it's slow to inference, but it's not slow enough that it's, like, running five different systems in the background without telling you. Right. So it's something, it's something interesting that they haven't fully disclosed yet. The open source community has centered on this Ring Attention paper, which is created by your friend Matei Zaharia, and a couple other people.[00:51:36] swyx: And it's a form of distributing the compute. I don't super understand, like, why, you know, doing, calculating, like, the feedforward and attention in blockwise fashion and distributing it makes it so good at recall. I don't think they have any answer to that. The only thing that Ring Attention is really focused on is basically infinite context.[00:51:59] swyx: They said it was good for like 10 to 100 million tokens. Which is, it's just great. So yeah, using the four wars framework, what is this framework for Gemini? One is the sort of RAG and Ops war. Here we care less about RAG now, yes. Or, we still care as much about RAG, but like, now it's, it's not important in prototyping.[00:52:21] swyx: And then, for the data war, I guess this is just part of the overall training dataset, but Google made a 60 million dollar deal with Reddit and presumably they have deals with other companies. For the multimodality war, we can talk about the image generation crisis, or the fact that Gemini also has image generation, which we'll talk about in the next section.[00:52:42] swyx: But it also has video understanding, which is, I think, the top Gemini post came from our friend Simon Willison, who basically did a short video of him scanning over his bookshelf. And it would be able to convert that video into a JSON output of what's on that bookshelf. And I think that is very useful.[00:53:04] swyx: Actually ties into the conversation that we had with David Luan from Adept. In a sense of like, okay, what if video was the main modality instead of text as the input? What if, what if everything was video in, because that's how we work. We, our eyes don't actually read, don't actually like get input, our brains don't get inputs as characters.[00:53:25] swyx: Our brains get the pixels shooting into our eyes, and then our vision system takes over first, and then we sort of mentally translate that into text later. And so it's kind of like what Adept is kind of doing, which is driving by vision model, instead of driving by raw text understanding of the DOM. And, and I, I, in that, that episode, which we haven't released, I made the analogy to like self-driving by lidar versus self-driving by camera.[00:53:52] swyx: Mm-Hmm. , right? Like, it's like, I think it, what Gemini and any other super long context model that is multimodal unlocks is, what if you just drive everything by video. Which is[00:54:03] Alessio: cool. Yeah, and that's Joseph from Roboflow. It's like anything that can be seen can be programmable with these models.[00:54:12] Alessio: You mean[00:54:12] swyx: the computer vision guy is bullish on computer vision?[00:54:18] Alessio: It's like the RAG people. The RAG people are bullish on RAG and not long context. I'm very surprised. The, the fine-tuning people love fine-tuning instead of few-shot. Yeah. Yeah. The, yeah, the, that's that. 
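To make the needle-in-a-haystack procedure described above concrete, here is a small, hedged sketch of how such a probe can be put together: bury one known fact at a chosen depth inside a long stretch of filler, then check whether the model's answer recovers it. The filler text, the needle sentence, the question, and the `call_your_llm` function are all made up for illustration.

```python
# Sketch of a needle-in-a-haystack probe. Everything here is illustrative.
def build_haystack(needle, n_filler_sentences=10_000, depth=0.5):
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of filler text."""
    filler = [f"Sentence {i} is ordinary filler text about nothing in particular."
              for i in range(n_filler_sentences)]
    filler.insert(int(len(filler) * depth), needle)
    return " ".join(filler)

needle = "The secret launch code mentioned in this document is 7-4-9-2."
question = "\n\nWhat is the secret launch code mentioned in the document above?"

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_haystack(needle, depth=depth) + question
    # response = call_your_llm(prompt)          # hypothetical model call
    # print(depth, "7-4-9-2" in response)       # recall at this depth?
```

Sweeping the depth (and, in the published charts, the total context length as well) is what produces the recall heatmaps the conversation refers to.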
Yeah, the, I, I think the ring attention thing, and if it's how they did it, we don't know. And then they released the Gemma models, which are like a 2 billion and 7 billion open models, which people said are not, are not good, based on my Twitter experience, which are the, the GPU poor crumbs. It's like, hey, we did all this work for us because we're GPU rich and we're just going to run this whole thing. And
Jon Krohn presents an insightful overview of Google's groundbreaking Gemini Pro 1.5, a million-token LLM that's transforming the landscape of AI. Discover the innovative aspects of Gemini Pro 1.5, from its extensive context window to its multimodal functionalities, which are broadening the scope of AI technology and signifying a significant leap in data science. Plus, join Jon for a practical demonstration, showcasing the real-world applications, capabilities, and limitations of this advanced language model. Additional materials: www.superdatascience.com/762 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
ChatGPT Plugins are on their way out! Tyler Perry is putting his studio expansion on hold due to AI, and Google is making TONS of news right now! Here's this week's AI news that matters and why it's important. Newsletter: Sign up for our free daily newsletterMore on this Episode: Episode pageJoin the discussion: Ask Jordan questions on AIRelated Episodes:Ep 211: OpenAI's Sora – The larger impact that no one's talking aboutEp 204: Google Gemini Advanced – 7 things you need to knowTomorrow's Show: How to stand out in a world where everyone can create an AI Startup?Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineupWebsite: YourEverydayAI.comEmail The Show: info@youreverydayai.comConnect with Jordan on LinkedInTimestamps:03:42 Tyler Perry concerned about AI job loss.07:22 OpenAI Sora video excels over other platforms.12:54 11 Labs updated model, ChatGPT phasing out.15:27 Plugin packs for ChatGPT.16:55 Limitations on using multiple GPTs for now.22:16 Unsatisfied with Google Gemini Enterprise integration.23:13 Google and Reddit partnership for language models.28:39 Google Gemini Images paused due to diversity concerns.31:16 Google now has three Gemini models.34:54 Best text-to-speech AI37:11 AI content creation raises copyright concernsTopics Covered in This Episode:1. OpenAI's changes and future focus2. Google's Significant AI content deal with Reddit3. Google's AI model developments and issues4. Trends in AI utilization within the entertainment industryKeywords:OpenAI, GPT, AI agents, AI assistants, prime prompt polish program, Google, Reddit, AI content licensing deal, AI models, search engine, Gemini AI, large language models, user-generated content, university student data, Google Gemini Imagen 2, Gemma, Gemini Ultra, Gemini Pro, Gemini Nano, Tyler Perry, Sora, AI in entertainment, text-to-speech AI, business productivity, ChatGPT plugins, Well Said Labs, Asura, AI video platforms, Perry's studio expansion, AI regulation
The AI Breakdown: Daily Artificial Intelligence News and Discussions
NLW argues that another phase of expectation in genAI has begun thanks to Groq, Sora, and Gemini Pro 1.5 Featuring a reading of https://www.oneusefulthing.org/p/strategies-for-an-accelerating-future INTERESTED IN THE AI EDUCATION BETA? Learn more and sign up https://bit.ly/aibeta ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
This is a recap of the top 10 posts on Hacker News on February 23rd, 2024.This podcast was generated by wondercraft.ai(00:41): Thanks FedEx, this is why we keep getting phishedOriginal post: https://news.ycombinator.com/item?id=39479001&utm_source=wondercraft_ai(02:23): Gemma.cpp: lightweight, standalone C++ inference engine for Gemma modelsOriginal post: https://news.ycombinator.com/item?id=39481554&utm_source=wondercraft_ai(04:07): Satoshi – Sirius emails 2009-2011Original post: https://news.ycombinator.com/item?id=39480407&utm_source=wondercraft_ai(05:56): Intel Processor Instability Causing Oodle Decompression FailuresOriginal post: https://news.ycombinator.com/item?id=39478551&utm_source=wondercraft_ai(07:44): Beyond A*: Better Planning with TransformersOriginal post: https://news.ycombinator.com/item?id=39479478&utm_source=wondercraft_ai(09:53): German Bundestag Passes Cannabis LegalizationOriginal post: https://news.ycombinator.com/item?id=39481188&utm_source=wondercraft_ai(11:29): Mamba: The Easy WayOriginal post: https://news.ycombinator.com/item?id=39482428&utm_source=wondercraft_ai(13:25): Meta's new LLM-based test generator is a sneak peek to the future of developmentOriginal post: https://news.ycombinator.com/item?id=39486717&utm_source=wondercraft_ai(15:05): I Spent a Week with Gemini Pro 1.5–It's FantasticOriginal post: https://news.ycombinator.com/item?id=39481670&utm_source=wondercraft_ai(16:39): Certain dogs are capable of learning the names for more than 100 different toysOriginal post: https://news.ycombinator.com/item?id=39481805&utm_source=wondercraft_aiThis is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai
This is a recap of the top 10 posts on Hacker News on February 21st, 2024.This podcast was generated by wondercraft.ai(00:37): Insecure vehicles should be banned, not security tools like the Flipper ZeroOriginal post: https://news.ycombinator.com/item?id=39452494&utm_source=wondercraft_ai(02:26): Gemma: New Open ModelsOriginal post: https://news.ycombinator.com/item?id=39453271&utm_source=wondercraft_ai(03:54): The killer app of Gemini Pro 1.5 is videoOriginal post: https://news.ycombinator.com/item?id=39458264&utm_source=wondercraft_ai(05:34): iMessage with PQ3 Cryptographic ProtocolOriginal post: https://news.ycombinator.com/item?id=39453660&utm_source=wondercraft_ai(07:25): ChatGPT went berserkOriginal post: https://news.ycombinator.com/item?id=39450669&utm_source=wondercraft_ai(09:21): Stop postponing things by embracing the messOriginal post: https://news.ycombinator.com/item?id=39451793&utm_source=wondercraft_ai(10:55): Pijul is a free and open source (GPL2) distributed version control systemOriginal post: https://news.ycombinator.com/item?id=39452543&utm_source=wondercraft_ai(12:32): Air Canada Has to Honor a Refund Policy Its Chatbot Made UpOriginal post: https://news.ycombinator.com/item?id=39455131&utm_source=wondercraft_ai(14:05): Useful Uses of catOriginal post: https://news.ycombinator.com/item?id=39457875&utm_source=wondercraft_ai(15:34): Things I don't know about AIOriginal post: https://news.ycombinator.com/item?id=39453622&utm_source=wondercraft_aiThis is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai
Is 2024 the year we'll see our wildest imaginations come to life in video form? Kipp and Kieran get right into the brewing storm within the AI industry as titans clash on new frontiers of technology. In this episode they dive into the unfolding drama of AI developments with a focus on the text-to-video revolution. Learn more on how Sora is animating our still image stories, the serious business of AI in video game worlds, and the intense rivalry heating up between OpenAI and Google. Mentions Sora - Text-to-video model launched by OpenAI. (https://openai.com/sora) OpenAI - The organization behind the development of AI models like Sora and GPT-4. (https://www.openai.com/) Sam Altman - CEO of OpenAI involved in the launch of Sora. (https://www.ycombinator.com/people/sam) Google Gemini 1.5 - A model developed by Google with capabilities in text, audio, and video. (https://gemini.google.com/advanced) GPT-4 - The fourth iteration of the Generative Pre-trained Transformer model by OpenAI. (https://openai.com/gpt-4) Time Stamps: 00:00 Sam strategically times releases to upstage Google. 04:58 Multiple videos watched, 30-50 pages long. Easter eggs, OpenAI mention, Sam Altman backstory. 07:47 A new model is better than GPT-4. 12:55 Will Smith spaghetti meme evolved rapidly in Tokyo. 15:39 Model Sora can animate still images, creating narratives. 18:30 Stock videographer sites may be obsolete for marketing. 20:54 YouTube is the future of multimedia content. 26:20 Gemini Pro unlocks YouTube as a search engine. 29:32 OpenAI: large company doing incredible work efficiently. 31:43 AI developments promise exciting content for the year. Follow us for everyday marketing wisdom straight to your feed YouTube: https://www.youtube.com/channel/UCGtXqPiNV8YC0GMUzY-EUFg Twitter: https://twitter.com/matgpod TikTok: https://www.tiktok.com/@matgpod Thank you for tuning into Marketing Against The Grain! Don't forget to hit subscribe and follow us on Apple Podcasts (so you never miss an episode)! https://podcasts.apple.com/us/podcast/marketing-against-the-grain/id1616700934 If you love this show, please leave us a 5-Star Review https://link.chtbl.com/h9_sjBKH and share your favorite episodes with friends. We really appreciate your support. Host Links: Kipp Bodnar, https://twitter.com/kippbodnar Kieran Flanagan, https://twitter.com/searchbrat ‘Marketing Against The Grain' is a HubSpot Original Podcast // Brought to you by The HubSpot Podcast Network // Produced by Darren Clarke.
Hipsters: Fora de Controle is Alura's podcast with news about applied Artificial Intelligence and this whole new world we are just beginning to crawl into, and which you will get to explore with us! In this episode we talk with Sandor Caetano, Chief Data Officer at PicPay, about how the company is adopting AI in its products and internal processes. We also break down the tsunami of generative AI announcements that shook up the tail end of last week. Here's who joined the conversation: Marcus Mendes, host of Fora de Controle; Fabrício Carraro, Program Manager at Alura and host of the Dev Sem Fronteiras podcast; Sérgio Lopes, CTO of Alura; Filipe Lauar, Machine Learning engineer and host of the Vida com IA podcast; Christian Velasco, leader of Alura's operation in Latin America; Sandor Caetano, Chief Data Officer at PicPay
Show Notes: https://thisdayinai.com/bookmarks/28-ep51/Sign up for daily This Day in AI: https://thisdayinai.comTry Stable Cascade: https://simtheory.ai/agent/508-stable-cascadeJoin SimTheory: https://simtheory.ai======This week we take several shots of vodka before trying to make sense of all the announcements. OpenAI attempted to trump Google's Gemini 1.5 with the announcement of Sora, 1 minute video generation that does an incredible job of keeping track of objects. Google showed us that up to 10M context windows are possible with multi-modal inputs. We discuss if a larger context window could end the need for RAG and take a first look at GraphRAG by Microsoft hoping to improve RAG with a knowledge graph. We road test Nvidia's ChatRTX on our baller graphics cards and Chris tries to delete all of his files using Microsoft UFO, a new open source project that uses GPT-4 vision to navigate and execute tasks on your Windows PC. We briefly cover V-JEPA (will try for next week's show) and its ability to learn through watching videos and listening, and finally discuss Stability's Stable Cascade which we've made available for "research" on SimTheory.If you like the show please consider subscribing and leaving a comment. We appreciate your support.======Chapters:00:00 - OpenAI's Sora That Creates Videos Instantly From Text13:49 - ChatGPT Memory Released in Limited Preview23:31 - OpenAI Rumored To Be Building Web Search, Andrej Karpathy Leaves OpenAI, Have OpenAI Slowed Down?33:04 - Google Announces Gemini Pro 1.5. Huge Breakthrough 10M Context Window!50:11 - Microsoft Research Publishes GraphRAG: Knowledge Graph Based RAG1:02:03 - Nvidia's ChatRTX Road Tested1:07:18 - AI Computers, AI PCs & Microsoft's UFO: An Agent for Window OS Interaction. Risk of AI Computers.1:18:46 - Meta's V-JEPA: new architecture for self-supervised learning1:24:26 - Stability AI's Stable Cascade
Welcome to episode 246 of The CloudPod podcast, where the forecast is always cloudy! This week we're discussing localllm and just why they've saddled us all with that name, saying goodbye to Bard and hello to Gemini Pro, and discussing the pros and cons of helping Skynet to eradicate us all. All that and more cloud and AI news, now available for your listening nightmares. Titles we almost went with this week: Oracle says hold my beer on Africa The Cloud Pod Thinks the LLM Maturity Model has More Maturing To Do There is a Finch Windows Canary in Fargate New LLM Nightmares The Cloud Pod Will Never Type localllm Correctly A big thanks to this week's sponsor: We're sponsorless this week! Interested in sponsoring us and having access to a very specialized and targeted market? We'd love to talk to you. Send us an email or hit us up on our Slack Channel. General News It's Earnings Time! 01:42 Microsoft issues light guidance even as Azure growth drives earnings beat Microsoft shares were up after they reported earnings of $2.93 per share vs expectations of $2.73 per share. Revenue was $62.02 billion vs $61.12 billion. This represents a 17.6% year over year increase in the quarter. The intelligent cloud segment produced $25.88 billion in revenue, up 20% and above the $25.29 billion consensus among analysts surveyed by StreetAccount. Revenue from Azure and other cloud services grew 30%, when analysts only expected 27.7%. Six points are tied to AI, as Microsoft now has 53,000 Azure AI customers and 1/3rd are new in the past year (per Microsoft.) 02:46 Justin- "I don't think they count the Open AI customers, do you? Because there's way more people that have Open AI usage than 53,000. So I think this is legitimately Azure AI – which is Open AI under the hood – but specifically paying for that subscription." 04:19 Alphabet shares slide on disappointing Google ad revenue Alphabet reported better-than-expected revenue and profit for the fourth quarter, but ad revenue trailed analysts' projections. Earnings per share were $1.64 vs $1.59 expected. Revenue of $86.31 billion vs $85.33 billion expected. Google Cloud was $9.19 billion vs $8.94 billion expected, according to StreetAccount. That represents a 26% expansion in the fourth quarter. 04:51 Justin- "…which is interesting, because you would expect that they'd have similar growth being tied to Bard and Gemini to be close to what Microsoft is doing." 12:02 Amazon reports better-than-expected results as revenue jumps 14% Amazon also exceeded analysts' expectations. Earnings per share $1.00 vs 80 cents expected. Revenue
The Mint Condition: NFT and Digital Collectibles Entertainment
In the latest episode of Mid Mic Crisis, hosts Bunchu and Chamber delve into the cutting-edge advancements in AI technology with a focus on the newly released OpenAI Sora videos and the latest update to Google Gemini 1.5 Pro.The episode opens with Bunchu and Chamber providing an overview of the groundbreaking developments in artificial intelligence, highlighting the significance of OpenAI's Sora videos and Google's Gemini 1.5 Pro update in pushing the boundaries of AI capabilities.The hosts discuss the key features and functionalities of OpenAI Sora, exploring its potential applications in various fields such as content creation, virtual assistants, and digital entertainment. They analyze the implications of Sora's lifelike animations and natural language processing capabilities, considering the impact it may have on industries reliant on AI-driven solutions.Transitioning to Google Gemini 1.5 Pro, Bunchu and Chamber examine the enhancements introduced in the latest iteration of Google's AI platform. They compare and contrast Gemini 1.5 Pro with previous versions, discussing its improved performance, expanded functionality, and potential use cases across different sectors.Throughout the episode, Bunchu and Chamber engage in insightful discussions on the evolving landscape of AI technology, sharing their perspectives on the implications of these advancements for society, business, and innovation.With their signature blend of humor, curiosity, and expertise, Bunchu and Chamber offer listeners a comprehensive exploration of the latest developments in AI, shedding light on the transformative potential of OpenAI Sora and Google Gemini 1.5 Pro in shaping the future of technology.Follow us on X.com: https://twitter.com/MidMicCrisisPowered by @dGenNetworkWebsite: https://dgen.network/Support the show
It's great to be back! In this episode we cover everything new and everything we missed during our break. We start with breaking news that the OpenAI ChatGPT GPT Store is being released next week, then cover Gemini Pro and Gemini Pro Vision API, Mixtral APIs, AnyText, NY Times Copyright lawsuit and finally.. get excited about a dishwashing robot!====Join SimTheory: https://simtheory.aiJoin Discord: https://discord.gg/aphwE5snuqGet Merch: https://www.thisdayinaimerch.com/Try models from the show:====Gemini Pro: https://simtheory.ai/agent/282-google-gemini-assistantMixtral: https://simtheory.ai/agent/129-miss-mistra-mistral-mediumStable Diffusion Video: https://simtheory.ai/agent/224-image-to-video-creation-agentAI Movie Trailer Maker: https://simtheory.ai/agent/279-ai-movie-trailer-makerCHAPTERS:====00:00 - Mike's AI Movie Trailer Intro02:05 - GPT Store Will go Live Next Week22:52 - Gemini Pro API & Gemini Pro Vision Road Tested (literally)33:34 - Mixtral API: Mistral Platform API Tested45:31 - Stable Video Diffusion48:12 - Pika AI Video General Availability52:05 - Stability AI Memberships55:54 - Prompt Injection for DALL-E with Public Domain57:34 - New York Times Sues OpenAI & Microsoft for Copyright Infringement1:04:49 - Inpainting with AnyText1:14:15 - Microsoft CoPilot App with GPT-4 Now On iOS and Android1:14:39 - One More Thing: The Dishwasher BotSOURCES:====https://time.com/6551496/mickey-mouse-public-domain-steamboat-willie/https://twitter.com/digthatdata/status/1742074049260621976?s=46https://www.theguardian.com/media/2023/dec/27/new-york-times-openai-microsoft-lawsuithttps://www.reuters.com/technology/apple-explores-ai-deals-with-news-publishers-new-york-times-2023-12-22/https://twitter.com/rowancheung/status/1742967393310368222/photo/1https://www.theinformation.com/briefings/openai-to-launch-chatbot-store-next-week?rc=kvsmhwhttps://blog.google/technology/ai/gemini-api-developers-cloud/https://mistral.ai/news/mixtral-of-experts/https://mistral.ai/news/la-plateforme/https://stability.ai/news/stable-video-diffusion-open-ai-video-modelhttps://simtheory.ai/share/d49c8c00-9fda-40aa-b386-a7c27455015b/https://pika.art/https://stability.ai/membershiphttps://twitter.com/venturetwins/status/1742976476432196100?s=46https://github.com/tyxsspa/anytexthttps://www.theverge.com/2023/12/29/24019288/microsoft-copilot-app-available-iphone-ipad-aihttps://mobile-aloha.github.io/
Google has been under fire after the release of its new Gemini. Sorry to say, but Google got so many things wrong with the marketing and launch. Is Gemini an actual ChatGPT killer or just a marketing stunt gone wrong? We're covering everything you need to know.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions about Google Gemini
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Timestamps:
[00:02:17] Daily AI news
[00:07:30] Overview of Google Gemini
[00:10:40] Google lied about Gemini release
[00:17:10] How Gemini demo was created
[00:23:50] Comparing ChatGPT to Gemini
[00:30:40] Benchmarks of Gemini vs ChatGPT
[00:38:20] Why did Google release Gemini?
[00:43:00] Consequences of botched release
Topics Covered in This Episode:
1. Introduction to Google's Gemini Model
2. Google Gemini's Marketing Controversy
3. Assessing Gemini's Performance and Functionality
4. Comparison with ChatGPT
5. Importance of Transparency and Truth in AI Industry
Keywords: Google Gemini, Generative AI, GPT-4.5, AI news, AI models, Google Bard, Multimodal AI, Google stock, Generative AI industry, Google credibility, Technology news, AI tools, Fact-based newsletter, Marketing misstep, Deceptive marketing, Multimodal functionality, Gemini Ultra, Gemini Pro, Benchmarks, Misrepresentation, Stock value, Text model, Image model, Audio model, Google services, Pro mode, Ultra mode, Marketing video
Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
People across the Internet are accusing Google of faking that Gemini AI video demo that everyone was wowed by. Apple seems to be diversifying out of China for manufacturing at pace now. Might the UK's CMA have an issue with Microsoft's relationship with OpenAI? And, of course, the Weekend Longreads Suggestions.
Sponsors:
ShopBeam.com/ride
Links:
Google's Gemini Looks Remarkable, But It's Still Behind OpenAI (Bloomberg)
Early impressions of Google's Gemini aren't great (TechCrunch)
Apple to move key iPad engineering resources to Vietnam (NikkeiAsia)
Microsoft, OpenAI Are Facing a Potential Antitrust Probe in UK (Bloomberg)
Google launches NotebookLM powered by Gemini Pro, drops waitlist (9to5Google)
Weekend Longreads Suggestions:
The real research behind the wild rumors about OpenAI's Q* project (ArsTechnica)
AI and Mass Spying (Schneier On Security)
The race to 5G is over — now it's time to pay the bill (The Verge)
In the Hall v. Oates legal feud, fans don't want to play favorites (NBCNews)
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Hey everybody, welcome back to the latest episode of the Niche Pursuits Podcast. Like they do every week, Spencer and Jared cover the latest happenings in the Google, SEO, and content creation space so you can feel informed and make the best decisions for your business.
The first news item they discuss is a tweet from Barry Schwartz noting that Danny Sullivan and John Mueller from Google had deleted some of their previous tweets about the Helpful Content Update. What does this mean? The HCU is not being rolled back, but what would lead them to delete their previous tweets? Is there a glimmer of hope for the people affected by the update? Is the potential December update actually a "rollback"? Or are we all reading too much into this? Spencer and Jared have a few thoughts to share on the subject.
The next topic is Google's new model, Gemini Pro, which has already rolled out into Google Bard. A ChatGPT competitor, this is an advanced multimodal AI model that will be part of the Google Search experience very soon. Jared and Spencer talk about how Gemini and SGE are actually in conflict with what Google wants and show a few examples. What happens when Spencer pokes around to test Gemini out? What are the implications, and how will this look going forward? Listen to hear what they think!
The next news item is the continuous leaks of information and tidbits of Google's strategy that have been coming out as the company faces its major antitrust lawsuit. Most recently, an article was published that offers a ton of insight into how Google works behind the scenes. Spencer and Jared share a few of the highlights, like Google's many ranking signals, the use of keywords, how websites are rated and tested, and the Navboost system. Listen to hear the tidbits that surprised them in the super-detailed article.
Moving on to the side hustles, Spencer gives an update on the Amazon Influencer Program. He has 967 videos live, and he shares a screenshot showing over $3k in earnings for the last 30 days. He talks about the cost of outsourcing his videos, and he and Jared discuss whether earnings are up because of the holidays or whether this upward trend will continue. In the ongoing competition between the hosts, Jared talks about reaching the 1,000-video mark and his most recent earnings from the program: $4,600. This is a great performance for a side hustle he started just a few months ago. He and Spencer talk about the enormous possibilities of this side hustle and share an inspiring story of a podcast listener who started in July and earned $11k from the program in November! The big-picture message is that there are lots of side hustles out there that anyone can start and potentially earn life-changing money from, if they're willing to put in the work.
Spencer then goes on to share his weird niche site, which may very well be someone's side hustle: Flpbk.io. This cool little website lets you submit a 1-minute video, and they will turn it into a flipbook and send it to you. The website ranks for 0 keywords and only gets about 17k visitors per quarter, which is probably direct and social traffic. That being said, if the traffic is targeted, the website could be quite profitable. Jared shares his weird niche site next: Celebrity Cutouts, where you can buy masks, large heads, and life-size cut-outs of your favorite celebrities. This DR28 site is doing pretty well, ranking for 47k keywords and getting 17.5k visitors a month.
They discuss how it's most likely a print-on-demand business and encourage listeners to take advantage of the holiday sales to buy loved ones cutouts of their favorite celebrities. And that brings us to the end of another episode of the Niche Pursuits Podcast. Hopefully, this episode inspires you to start that side hustle and keeps you informed of the latest SEO news. Be sure to get more content like this in the Niche Pursuits Newsletter Right Here: https://www.nichepursuits.com/newsletter Want a Faster and Easier Way to Build Internal Links? Get $15 off Link Whisper with Discount Code "Podcast" on the Checkout Screen: https://www.nichepursuits.com/linkwhisper Get SEO Consulting from the Niche Pursuits Podcast Host, Jared Bauman: https://www.nichepursuits.com/201creative
Join our discord: https://discord.gg/zqz5fVyx7m
Get the merch: https://thisdayinaimerch.com
Try Agents & Models on SimTheory: https://simtheory.ai
In our final episode for the year, we cover the surprise announcement of Google's Gemini AI models and give our first impressions. We road test Gemini Pro on Bard and discuss the likely impact of Gemini on the market and developer ecosystems. Then it's time for our holiday gift: SimTheory. Now you can use AI agents we mention on the show, including our virtual girlfriends, Sports Betting with AI and many more! You can even create your own agents to try different models using the same tools we use to prepare for the show. We then discuss if Ilya is OK and the drama at OpenAI. And finally, we make predictions for 2024 and cover some of Meta's latest announcements. Thanks for watching, listening and all your support through 2023. We really appreciate it and will see you early next year!
CHAPTERS:
=====
00:00 - Google Gemini is Here? Kinda
38:48 - Our Holiday Gift: SimTheory: Virtual Girlfriend, Sports Betting with AI Agents
51:15 - Is Ilya OK? Is GPT-4 Slowness About Cost Reductions?
56:26 - NexusRaven-V2-13B for function calling: is this the future of specialized fine-tuned models?
1:00:14 - Our Predictions for AI in 2024
1:12:54 - Meta announces AI Alliance for AI Openness + Updates to Meta AI Characters and SeamlessExpressive
1:15:43 - Final thoughts and thank you
SOURCES:
=====
https://blog.google/technology/ai/google-gemini-ai/
https://twitter.com/tunguz/status/1732444203437695387
https://techcrunch.com/2023/12/07/early-impressions-of-googles-gemini-arent-great/
https://twitter.com/clementdelangue/status/1732138699901809042
https://huggingface.co/Nexusflow/NexusRaven-V2-13B
https://twitter.com/abemurray/status/1732723510810759369
https://ai.meta.com/blog/ai-alliance
https://techcrunch.com/2023/12/06/metas-ai-characters-are-now-live-across-its-u-s-apps-with-support-for-bing-search-and-better-memory/
https://techcrunch.com/2023/12/06/meta-ai-adds-reels-support-and-reimagine-a-way-to-generate-new-ai-images-in-group-chats/
https://seamless.metademolab.com/expressive
https://twitter.com/mattrickard/status/1731889331516936261
Why are AAA games like GTA 6 ported to PC well after their release on game consoles? Scott explains. Plus, Twitch will stop operations in South Korea starting February 27, 2024, due to high costs there. And Google launches its new large language model, Gemini, which comes in three flavors: Gemini Ultra, Gemini Pro, and Gemini Nano. Starring Tom Merritt, Sarah Lane, Scott Johnson, Roger Chang, Joe. Link to the Show Notes. Become a member at https://plus.acast.com/s/dtns. Hosted on Acast. See acast.com/privacy for more information.