No Priors: Artificial Intelligence | Machine Learning | Technology | Startups
This week on No Priors, Elad and Sarah sit down with Eric Mitchell and Brandon McKinzie, two of the minds behind OpenAI's o3 model. They discuss what makes o3 unique, including its focus on reasoning, the role of reinforcement learning, and how tool use enables more powerful interactions. The conversation explores the unification of model capabilities, what the next generation of human-AI interfaces could look like, and how models will continue to advance in the years ahead.

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @mckbrando | @ericmitchellai

Show Notes:
0:00 What is o3?
3:21 Reinforcement learning in o3
4:44 Unification of models
8:56 Why tool use helps test-time scaling
11:10 Deep research
16:00 Future ways to interact with models
22:03 General purpose vs specialized models
25:30 Simulating AI interacting with the world
29:36 How will models advance?
Hey everyone, Alex here
Alexis Slama welcomes Vincent, Julien, and Vénissien, three experts in artificial intelligence applied to the accounting profession, for a lively ABC Radio Show Live devoted to AI news and use cases in accounting firms. Our experts share their discoveries in AI: the arrival of the o3 model in ChatGPT with its advanced reasoning capabilities, the Deep Research revolution, new image-generation features, and the emergence of AI agents that automate complex tasks.
In this episode of Generation AI, hosts JC Bonilla and Ardis Kadiu discuss recent significant AI developments across education, entertainment, and technology. They analyze President Trump's executive order on AI education for K-12 schools, which challenges the "cheating narrative" and aims to promote AI literacy. They also examine the Academy Awards' decision to accept AI-assisted films for consideration. The hosts then dive deep into OpenAI's new o3 model, exploring how its multimodal reasoning capabilities allow it to process text, images, and code in unified ways that mimic human thinking. The episode highlights practical applications for marketing, data analysis, and higher education enrollment strategies.

Introduction and News Updates (00:00:00)
- Welcome to the Generation AI podcast focused on higher education
- Introduction of hosts JC Bonilla and Ardis Kadiu
- Overview of the episode's main topics: executive order on AI education, Oscars' AI acceptance, and OpenAI's o3 model

Executive Order on AI Education (00:03:04)
- President Trump signed an executive order on April 23rd promoting AI education for youth
- The order creates a White House task force to oversee federal funding for AI literacy
- Discussion of how this challenges the current trend of banning AI in 60% of US schools
- Ardis shares personal experience of his children being afraid to use AI for educational purposes
- The order aims to remove the "AI equals bad" mental block in students

Global Competitiveness in AI Education (00:07:24)
- Debate on whether the US K-12 education system is globally competitive
- Discussion of inconsistencies in education quality across districts and states
- Comparison to other countries where students are actively building AI skills
- Need for improved AI literacy to maintain competitive advantage

The Academy Awards' AI Decision (00:08:56)
- The Academy of Motion Picture Arts and Sciences has updated rules for the 98th Oscars
- Films using generative AI tools are now eligible for awards
- The use of AI will neither help nor harm a film's chances of nomination
- Human-driven artistry will remain the priority in judging
- Discussion of the film "The Brutalist," which used AI for accent enhancement
- Significance of this as validation for AI in creative fields

Updates on Frontier AI Models (00:13:42)
- Brief mention of recent model releases from Google and Grok
- Focus on OpenAI's o3 as a "multimodal marvel" for reasoning
- The race of AI development continues with a focus on scalable agents
- Discussion of naming conventions for OpenAI models

OpenAI's o3 Model Capabilities (00:16:20)
- o3 represents a departure from conversational chatbots toward visual thinking
- The model excels at coding, math, science, and visual reasoning
- It processes text, images, and code in a unified reasoning framework
- Visual thinking capabilities comparable to human thinking processes
- Discussion of performance improvements and acceleration in model capabilities
- The companion o4-mini model for lightweight real-time applications

Testing o3 vs Other Models (00:25:28)
- JC and Ardis compare experiences using different OpenAI models
- Comparison between o3 and GPT-4o outputs for the same prompt
- o3's tendency to use tables and structured formats for certain outputs
- Ardis recommends using o3 for coding, PDF analysis, and creative feedback

Practical Applications of o3 (00:30:34)
- Content creation capabilities for ad copy generation
- Campaign visual analysis through multimodal reasoning
- Data analysis improvements for marketing datasets
- Better handling of unstructured tasks in higher education
- Strategic data-driven approaches to enrollment management
- Demo of creating visual ads with iterative refinements in minutes
- Example of financial analysis from complex spreadsheets

ChatGPT Memory Feature Update (00:40:37)
- New feature allowing ChatGPT to use prior conversations as context
- Benefits of the system remembering previous interactions
- How this enables personalization of AI responses
- Impact on workflow efficiency and personalized assistance

Conclusion (00:43:07)
- Final thoughts on AI integration into human activities
- Closing remarks and podcast network information

Connect With Our Co-Hosts:
Ardis Kadiu: https://www.linkedin.com/in/ardis/ | https://twitter.com/ardis
Dr. JC Bonilla: https://www.linkedin.com/in/jcbonilla/ | https://twitter.com/jbonillx

About The Enrollify Podcast Network: Generation AI is a part of the Enrollify Podcast Network. If you like this podcast, chances are you'll like other Enrollify shows too! Enrollify is made possible by Element451 — the next-generation AI student engagement platform helping institutions create meaningful and personalized interactions with students. Learn more at element451.com.

Attend the 2025 Engage Summit! The Engage Summit is the premier conference for forward-thinking leaders and practitioners dedicated to exploring the transformative power of AI in education. Explore the strategies and tools to step into the next generation of student engagement, supercharged by AI. You'll leave ready to deliver the most personalized digital engagement experience every step of the way. Register now to secure your spot in Charlotte, NC, on June 24-25, 2025! Early bird registration ends February 1st -- https://engage.element451.com/register
The AI Breakdown: Daily Artificial Intelligence News and Discussions
New research suggests that AI agents are improving at a rate far faster than expected, with their task complexity doubling every four months. AI Digest confirms this by adding OpenAI's o3 and o4-mini to the curve, showing tasks that take humans 1.5–1.7 hours are now within reach.

Get Ad Free AI Daily Brief: https://patreon.com/AIDailyBrief

Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Vanta - Simplify compliance - https://vanta.com/nlw
Plumb - The Automation Platform for AI Experts - https://useplumb.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
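The headline claim above is simple exponential growth: if the human-time length of tasks agents can complete doubles every four months, a back-of-the-envelope extrapolation follows directly. A minimal sketch (the 1.5-hour starting point comes from the description; the projection horizon is an illustrative assumption, not a figure from the episode):

```python
def task_horizon_hours(start_hours: float, months: float, doubling_months: float = 4.0) -> float:
    """Extrapolate the human-time length of tasks an agent can complete,
    assuming a fixed doubling time (pure exponential growth)."""
    return start_hours * 2 ** (months / doubling_months)

# Starting from tasks that take humans ~1.5 hours today:
for months in (0, 4, 8, 12):
    print(f"+{months:2d} months: ~{task_horizon_hours(1.5, months):.1f} h tasks")
```

At a four-month doubling time, a year of progress multiplies the task horizon by eight, which is why small changes in the measured doubling rate shift forecasts so dramatically.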
Hey everyone, Alex here
Send us a text

Want more AI tips & tricks for marketers & business owners? Sign up for our Newsletter. Weekly exclusive tips: https://www.authorityhacker.com/subsc...

Think you've mastered ChatGPT? Recent updates, new models (like o3 & o4-mini), and hidden features mean you might be missing out on its true power, feeling like you're drinking from a firehose just trying to keep up.

In this episode, we dive deep into the advanced capabilities of ChatGPT in April 2025. Forget basic prompts; learn how to leverage the latest features to reclaim hours, automate complex tasks, and make smarter decisions.

We reveal step-by-step techniques rarely shown elsewhere, including:
- Iterative Search: for complex research (e.g., trip planning)
- Projects & Files: tailor workflows with custom knowledge (e.g., support)
- Canvas: collaborate with AI on writing & editing
- Data Analysis: get insights directly from spreadsheets (e.g., ad reports)
- Creative Generation: AI-driven ideas for titles, thumbnails, etc.
- Model Selection: choose the best model (4o vs o3 vs o4-mini) for the task
- Hidden Features: boost efficiency with desktop, web & mobile tricks
- Personalization: use memory, custom instructions & voice effectively

Go beyond simple queries and transform ChatGPT into a powerful assistant that automates research, analyzes data, visualizes information, and boosts your creative output, even if you thought you knew it all.

---

A special thanks to our sponsors for this episode, Deadline Funnel. Build authentic urgency into your marketing (get a double free trial): https://www.deadlinefunnel.com/partne...
Plus thanks to ThriveCart, the best shopping cart for digital sellers (we've used them for 7+ years). Check their new PRO out at https://thrivecart.com/

---

Looking for 100s more episodes like these? Check out our main YouTube channel: / @authorityhackerpodcast
Or visit our podcast page: https://www.authorityhacker.com/podcast/

Love this episode? Your reviews really help us a lot. Leave us a review on Amazon:
Try Simtheory: https://simtheory.ai
OpenAI's new o3 model feels almost criminal to use. Google is legit giving away its Gemini AI for free to millions. And Microsoft legit released an AI agent that can use a computer. Sheesh. Week after week, the pace of AI innovation is getting harder and harder to keep up with. Don't worry. We do that for you.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Thoughts on this? Join the convo.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
- OpenAI's $3B Windsurf Acquisition Deal
- Google Launches Gemini 2.5 Flash Model
- Google Veo 2 Video Tool Release
- Microsoft AI Computer Use Agent Launch
- US Ban Consideration on Chinese DeepSeek
- Free Gemini Advanced to US Students
- Anthropic's Claude Adds Google Workspace
- OpenAI Testing New Social Media Platform
- GPT-4.1 API Model with 1M Context
- OpenAI's o3 and o4-mini Released

Timestamps:
00:00 Intro
03:43 OpenAI Eyes $3B Windsurf Acquisition
06:54 Google Launches Gemini 2.5 Flash
11:36 Google Unveils Veo 2 for Videos
15:49 "AI Market Tensions: US vs China"
20:10 Microsoft Unveils AI Automation Tool
21:09 Microsoft AI Enhances Business Automation
28:16 Claude's New Tool: Pricey Research Integration
29:31 OpenAI Teams Lacks Gmail Integration
33:10 OpenAI Testing Social Media Platform
39:18 GPT-4.1's Competitive Edge in Coding
43:19 AI Model Versions Overview
45:49 Agentic AI: Workflow Evolution
47:45 "o4-mini Model Overview"
52:20 Tech Giants Unveil AI Tools

Keywords: Microsoft, Autonomous AI Agent, Apps, Websites, Thinking Models, OpenAI, Large Language Model Modes, Google, Gemini AI, Tens of Millions, Claude, Anthropic, AI News, AI World, Grow Companies, Grow Careers, Generative AI, Acquisition, Windsurf, $3 Billion, Code Generation Market, AnySphere, Cursor, Annualized Recurring Revenue, AI Coding Startups, Codium, Competitive AI Space, Google Next Conference, Gemini 2.5 Flash, AI Model, Computational Reasoning, Pricing, Output Tokens, Reasoning Budget, Complex Problem Solving, Performance Benchmarks, Competitors, Claude 3.7, DeepSeek, OpenAI o4, AI Studio, AI-powered Video, Veo 2, Text-to-Video, SynthID, Digital Watermark, WISC anime, White House Restrictions, NVIDIA AI Chips, Intellectual Property Rights, Trump Administration, Silicon Valley, DeepSeek Ban, Innovations in AI, Copilot Studio, Microsoft 365 Copilot, Automation, API Restrictions, AI Agents, Influencer Recommendations, Social Media Network, Sam Altman, ChatGPT, Image Generation, Grok AI Integration, Soci

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)

Ready for ROI on GenAI? Go to youreverydayai.com/partner
In this episode of the Cognitive Revolution podcast, host Nathan Labenz is joined for a record 9th time by Zvi Mowshowitz to discuss the state of AI advancements, focusing on recent developments such as OpenAI's o3 model and its implications for AGI and recursive self-improvement. They delve into the capabilities and limitations of current AI models in various domains, including coding, deep research, and practical utilities. The discussion also covers the strategic and ethical considerations in AI development, touching on the roles of major AI labs, the potential for weaponization, and the importance of balancing innovation with safety. Zvi shares insights on what it means to be a live player in the AI race, the impact of transparency and safety measures, and the challenges of governance in the context of rapidly advancing AI technologies.

Nathan Labenz's slide deck documenting the ever-growing list of AI Bad Behaviors: https://docs.google.com/presentation/d/1mvkpg1mtAvGzTiiwYPc6bKOGsQXDIwMb-ytQECb3i7I/edit#slide=id.g252d9e67d86_0_16

Upcoming Major AI Events Featuring Nathan Labenz as a Keynote Speaker:
https://www.imagineai.live/
https://adapta.org/adapta-summit
https://itrevolution.com/product/enterprise-tech-leadership-summit-las-vegas/

SPONSORS:
Box AI: Box AI revolutionizes content management by unlocking the potential of unstructured data. Automate document processing, extract insights, and build custom AI agents using cutting-edge models like OpenAI's GPT-4.5, Google's Gemini 2.0, and Anthropic's Claude 3.7 Sonnet. Trusted by over 115,000 enterprises, Box AI ensures top-tier security and compliance. Visit https://box.com/ai to transform your business with intelligent content management today.
Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive
Oracle Cloud Infrastructure (OCI): Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive

PRODUCED BY: https://aipodcast.ing
OpenAI's o3 and o4-mini are here—and they're multimodal, cheaper, and scary good. These models can see, code, plan, and use tools all on their own. Yeah. It's a big deal. We break down everything from tool use to image reasoning to why o3 might be the start of something actually autonomous. Plus, our favorite cursed (and adorable) 4o Image Generation prompts, ChatGPT as a social network, the old (Monday) news about GPT-4.1 including free Windsurf coding for a week! Also, Kling 2.0 and Veo 2 drop new AI video models, Google DeepMind is using AI to talk to dolphins, NVIDIA's new chip restrictions, and Eric Schmidt says the computers… don't have to listen to us anymore. Uh-oh. THE COMPUTERS HAVE EYES. AND THEY MIGHT NOT NEED US. STILL A GOOD SHOW.

Join the discord: https://discord.gg/muD2TYgC8f
Join our Patreon: https://www.patreon.com/AIForHumansShow
AI For Humans Newsletter: https://aiforhumans.beehiiv.com/
Follow us for more on X @AIForHumansShow
Join our TikTok @aiforhumansshow
To book us for speaking, please visit our website: https://www.aiforhumans.show/

// Show Links //
o3 + o4-mini Are Here Live Stream: https://www.youtube.com/live/sq8GBPUb3rk?si=qQMFAvm8UmvyGaWv
OpenAI Blog Post: https://openai.com/index/introducing-o3-and-o4-mini/
"Thinking With Images": https://openai.com/index/thinking-with-images/
Codex CLI: https://x.com/OpenAIDevs/status/1912556874211422572
Professor & Biomedical Scientist Reaction to o3: https://x.com/DeryaTR_/status/1912558350794961168
Linda McMahon's A1 vs AI: https://www.usatoday.com/story/news/politics/2025/04/12/linda-mcmahon-a1-instead-of-ai/83059797007/
GPT-4.1 in the API: https://openai.com/index/gpt-4-1/
GPT-4.1 Reduces the Need to Read Unnecessary Files: https://www.reddit.com/r/singularity/comments/1jz600b/one_of_the_most_important_bits_of_the_stream_if/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
OpenAI Might Acquire Windsurf for 3 Billion Dollars: https://www.cnbc.com/2025/04/16/openai-in-talks-to-pay-about-3-billion-to-acquire-startup-windsurf.html
ChatGPT: The Social Network: https://x.com/kyliebytes/status/1912171286039793932
New ChatGPT Image Library: https://chatgpt.com/library
4o Image Gen Prompts We Love:
- Little Golden Books: https://x.com/AIForHumansShow/status/1912321209297191151
- Make your pets people: https://x.com/gavinpurcell/status/1911243562928447721
- Barbie: https://x.com/AIForHumansShow/status/1910514568595726414
- Coachella Port-a-potty: https://x.com/AIForHumansShow/status/1911604534713192938
Ex-Google CEO Says the Computers Are Improving Fast: https://www.reddit.com/r/artificial/comments/1jzw6bd/eric_schmidt_says_the_computers_are_now/
Kling 2.0: https://x.com/Kling_ai/status/1912040247023788459
Rotisserie Chicken Knight Prompt in Kling 2.0: https://x.com/AIForHumansShow/status/1912170034761531817
Kling example that didn't work that well: https://x.com/AIForHumansShow/status/1912298707955097842
Veo 2 Launched in AI Studio: https://aistudio.google.com/generate-video and https://blog.google/products/gemini/video-generation/
James Cameron on "Humans as a Model": https://x.com/dreamingtulpa/status/1910676179918397526
Nvidia Restricting More Chip Sales to China: https://www.nytimes.com/2025/04/15/technology/nvidia-h20-chip-china-restrictions.html
$500 Billion for US Chip Manufacturing: https://www.cnbc.com/2025/04/14/nvidia-to-mass-produce-ai-supercomputers-in-texas.html
DolphinGemma: AI That Will Understand Dolphins: https://x.com/GoogleDeepMind/status/1911767367534735832
Jason Zada's Very Cool Veo 2 Movie: https://x.com/jasonzada/status/1911812014059733041
Robot Fire Extinguisher: https://x.com/CyberRobooo/status/1911665518765027788
Join Simtheory: https://simtheory.ai
Like and sub xoxox
----
CHAPTERS:
00:00 - Initial reactions to Gaggle of Model Releases
09:29 - Is this the beginning of future GPT-5 AI systems?
47:10 - GPT-4.1, o3, o4-mini model details & thoughts
58:42 - Model comparisons with lunar injection
1:03:17 - AI Rap Battle Test: o3 Diss Track "Greg's Back"
1:08:12 - Thoughts on using new models + Gemini 2.5 Pro quirks
1:10:54 - The next model test: chained tool calling & lock in
1:14:43 - OpenAI releases Codex CLI: impressions/thoughts
1:18:45 - Final thoughts & help us with crazy presentation ideas
----
Links from Discord:
- Lunar Lander: https://simulationtheory.ai/7bbfe21a-7859-4fdd-8bbf-47fdfb5cf03b
- Evolution Sim: https://simulationtheory.ai/457b047f-0ac2-4162-8d6a-3ea3fa1235c9
The U.S. government has renewed funding for the Common Vulnerabilities and Exposures (CVE) Program, a critical database for tracking cybersecurity flaws, just hours before its funding was set to expire. Established 25 years ago, the CVE program assigns unique identifiers to security vulnerabilities, facilitating consistent communication across the cybersecurity landscape. The renewal of funding comes amid concerns that without it, new vulnerabilities could go untracked, posing risks to national security and critical infrastructure. In response to the funding uncertainty, two initiatives emerged: the CVE Foundation, a nonprofit aimed at ensuring the program's independence, and the Global CVE Allocation System, a decentralized platform introduced by the European Union.

In addition to the CVE funding situation, Oregon Senator Ron Wyden has blocked the nomination of Sean Plankey to lead the Cybersecurity and Infrastructure Security Agency (CISA) due to the agency's refusal to release a crucial unclassified report from 2022. This report details security issues within U.S. telecommunications companies, which Wyden claims represent a multi-year cover-up of negligent cybersecurity practices. The senator argues that the public deserves access to this information, especially in light of recent cyber threats, including the Salt Typhoon hack that compromised sensitive communications.

The cybersecurity landscape is further complicated by significant layoffs at CISA, which could affect nearly 40% of its workforce, potentially weakening U.S. national security amid rising cyber threats. Recent cuts have already impacted critical personnel, including threat hunters, which could hinder the agency's ability to share vital threat intelligence with the private sector. 
Meanwhile, the Defense Digital Service at the Pentagon is facing a mass resignation of nearly all its staff, following pressure from the Department of Government Efficiency, which could effectively shut down the program designed to accelerate technology adoption during national security crises.

On the technology front, OpenAI has released new AI reasoning models, o3 and o4-mini, but notably did not provide a safety report for the new GPT-4.1 model, raising concerns about transparency and accountability in AI development. The lack of a safety report is particularly alarming as AI systems become more integrated into client-facing tools. Additionally, SolarWinds Corporation has been acquired by Turn/River Capital, prompting managed service providers (MSPs) to reassess their dependencies on SolarWinds products and consider the implications for product roadmaps and support guarantees.

Four things to know today:
00:00 From Panic to Pivot: U.S. Saves CVE Program at the Eleventh Hour
04:17 A Cybersecurity Meltdown: One Senator Blocks, Another Leader Quits, and a Whole Pentagon Team Walks Out
08:54 OpenAI Just Leveled Up AI Reasoning—But Left Out the Fine Print
11:45 SolarWinds Is Private Again: What That Means for MSPs Watching the Roadmap

Supported by:
https://www.huntress.com/mspradio/
https://cometbackup.com/?utm_source=mspradio&utm_medium=podcast&utm_campaign=sponsorship

Join Dave April 22nd to learn about Marketing in the AI Era. Sign up here: https://hubs.la/Q03dwWqg0
All our Sponsors: https://businessof.tech/sponsors/

Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/
Looking for a link from the stories? The entire script of the show, with links to articles, is posted in each story on https://www.businessof.tech/
Support the show on Patreon: https://patreon.com/mspradio/

Want to be a guest on Business of Tech: Daily 10-Minute IT Services Insights? Send Dave Sobel a message on PodMatch, here: https://www.podmatch.com/hostdetailpreview/businessoftech

Want our stuff? Cool Merch? Wear "Why Do We Care?" - Visit https://mspradio.myspreadshop.com

Follow us on:
LinkedIn: https://www.linkedin.com/company/28908079/
YouTube: https://youtube.com/mspradio/
Facebook: https://www.facebook.com/mspradionews/
Instagram: https://www.instagram.com/mspradio/
TikTok: https://www.tiktok.com/@businessoftech
Bluesky: https://bsky.app/profile/businessof.tech
Join Simtheory: https://simtheory.ai
--
Get the official Simtheory hat: https://simulationtheory.ai/689e11b3-d488-4238-b9b6-82aded04fbe6
---
CHAPTERS:
00:00 - The Wrong Pendant?
02:34 - Agent2Agent Protocol: What is It? Implications and Future Agents
48:43 - Agent Development Kit (ADK)
57:50 - AI Agents Marketplace by Google Cloud
1:00:46 - Firebase Studio is very broken...
1:06:30 - Vibing with AI for everything... not just vibe code
1:15:10 - Gemini 2.5 Flash, Live API and Veo 2
1:17:45 - Is Llama 4 a flop?
1:27:25 - Grok 3 API released without vision, priced like Sonnet 3.7
---
Thanks for listening and your support!
AI Applied: Covering AI News, Interviews and Tools - ChatGPT, Midjourney, Runway, Poe, Anthropic
In this episode we cover OpenAI's unexpected announcement to release o3 while pushing back the launch of GPT-5. We break down what this means for the future of AI and why the company may be shifting its focus.

AI Applied YouTube Channel: https://www.youtube.com/@AI-Applied-Podcast
Get on the AI Box Waitlist: https://AIBox.ai/
Conor's AI Course: https://www.ai-mindset.ai/courses
Conor's AI Newsletter: https://www.ai-mindset.ai/
Jaeden's AI Hustle Community: https://www.skool.com/aihustle/about
The AI Breakdown: Daily Artificial Intelligence News and Discussions
OpenAI confirms it will release new reasoning models, o3 and o4-mini, in the next few weeks, with GPT-5 coming shortly after. Sam Altman says GPT-5 is performing better than expected. Also in this episode, a Pew study shows wildly divergent attitudes on AI between normies and experts.

Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Vanta - Simplify compliance - https://vanta.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Join Simtheory and create an AI workspace: https://simtheory.ai
----
Links from show:
DISS TRACK: https://simulationtheory.ai/2eb6408e-88f9-4b6a-ac4d-134d9dac3073
----
CHAPTERS:
00:00 - Will we make 100 episodes?
00:48 - Checking back in with Gemini 2.5 Pro
03:30 - Diss Track: Gemini 2.5 Pro
07:14 - Gemini 2.5 Pro on Polymarket
17:32 - Amazon Nova Act Computer Use: We Have Access!
29:45 - Future Interface of Work: Delegating Tasks with AI
58:03 - How We Work Today with AI vs Future Work
----
Thanks for listening and all of your support!
In this fascinating episode, we dive deep into the race towards true AI intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) François Chollet, creator of Keras and the ARC-AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder with François of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like o3 mark a step change in capabilities, and what it will really take to reach AGI.

We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea — a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.

Ndea
Website - https://ndea.com
X/Twitter - https://x.com/ndea

ARC Prize
Website - https://arcprize.org
X/Twitter - https://x.com/arcprize

François Chollet
LinkedIn - https://www.linkedin.com/in/fchollet
X/Twitter - https://x.com/fchollet

Mike Knoop
X/Twitter - https://x.com/mikeknoop

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

(00:00) Intro
(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2
(02:07) What is ARC and how it differs from other AI benchmarks
(02:54) Why current models struggle with fluid intelligence
(03:52) Shift from static LLMs to test-time adaptation
(04:19) What ARC measures vs. traditional benchmarks
(07:52) Limitations of brute-force scaling in LLMs
(13:31) Defining intelligence: adaptation and efficiency
(16:19) How o3 achieved a massive leap in ARC performance
(20:35) Speculation on o3's architecture and test-time search
(22:48) Program synthesis: what it is and why it matters
(28:28) Combining LLMs with search and synthesis techniques
(34:57) The ARC Prize structure: efficiency track, private vs. public
(42:03) Open source as a requirement for progress
(44:59) What's new in ARC-AGI 2 and human benchmark testing
(48:14) Capabilities ARC-AGI 2 is designed to test
(49:21) When will ARC-AGI 2 be saturated? AGI timelines
(52:25) Founding of Ndea and why now
(54:19) Vision beyond AGI: a factory for scientific advancement
(56:40) What Ndea is building and why it's different from LLM labs
(58:32) Hiring and remote-first culture at Ndea
(59:52) Closing thoughts and the future of AI research
Guest: Alex Polyakov, CEO at Adversa AI

Topics:
- Adversa AI is known for its focus on AI red teaming and adversarial attacks. Can you share a particularly memorable red teaming exercise that exposed a surprising vulnerability in an AI system? What was the key takeaway for your team and the client?
- Beyond traditional adversarial attacks, what emerging threats in the AI security landscape are you most concerned about right now?
- What trips up most clients: classic security mistakes in AI systems, or AI-specific mistakes? Are there truly new mistakes in AI systems, or are they old mistakes in new clothing?
- I know it is not your job to fix it, but much of this is unfixable, right?
- Is it a good idea to use AI to secure AI?

Resources:
- EP84 How to Secure Artificial Intelligence (AI): Threats, Approaches, Lessons So Far
- AI Red Teaming Reasoning LLM US vs China: Jailbreak Deepseek, Qwen, O1, O3, Claude, Kimi
- Adversa AI blog
- Oops! 5 serious gen AI security mistakes to avoid
- Generative AI Fast Followership: Avoid These First Adopter Security Missteps
In this episode of Create Like The Greats, Ross Simmonds takes us behind the scenes of one of the most exciting AI developments in recent months—ChatGPT's Deep Research feature built on OpenAI's O3 reasoning model. This episode provides a detailed breakdown of how Deep Research can help with competitive analysis, persona building, content strategy, and thought leadership. If you're someone who works with data, content, or digital strategy, this episode is packed with actionable insights. Key Takeaways and Insights:
Create a Simtheory workspace: https://simtheory.ai
Compare models: https://simtheory.ai/models/
------
3D City Planner App (example from show): https://simulationtheory.ai/8cfa6102-ed37-4c47-bc73-d057ba9873bd
------
CHAPTERS:
00:00 - AI Fashion
01:13 - Gemini 2.5 Pro Initial Impressions: We're Impressed!
38:24 - Thoughts on Gemini distribution and our daily workflows
55:49 - OpenAI's GPT-4o Image Generation: thoughts & examples
1:13:52 - Gemini 2.5 Pro Boom Factor
1:18:38 - Average rant on vibe coding and the future of AI tooling
------
Disclaimer: this video was not sponsored by Google... it's a joke.

Thanks for listening!
Mohamed Osman joins to discuss MindsAI's highest-scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and the flexibility of the network.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

TRANSCRIPT + REFS:
https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0

Mohamed Osman (Tufa Labs): https://x.com/MohamedOsmanML
Jack Cole (Tufa Labs): https://x.com/MindsAI_Jack
How and why deep learning for ARC paper: https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

TOC:
1. Abstract Reasoning Foundations
[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview
[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning
[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture
[00:20:26] 1.4 Technical Implementation with Long T5 Model
2. ARC Solution Architectures
[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions
[00:27:54] 2.2 Model Generalization and Function Generation Challenges
[00:32:53] 2.3 Input Representation and VLM Limitations
[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration
[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches
3. Advanced Systems Integration
[00:43:00] 3.1 DreamCoder Evolution and LLM Integration
[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs
[00:54:15] 3.3 ARC v2 Development and Performance Scaling
[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations
[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution

REFS:
[00:01:32] Original ARC challenge paper, François Chollet: https://arxiv.org/abs/1911.01547
[00:06:55] DreamCoder, Kevin Ellis et al.: https://arxiv.org/abs/2006.08381
[00:12:50] Deep Learning with Python, François Chollet: https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
[00:13:35] Influence of pretraining data for reasoning, Laura Ruis: https://arxiv.org/abs/2411.12580
[00:17:50] Latent Program Networks, Clement Bonnet: https://arxiv.org/html/2411.08706v1
[00:20:50] T5, Colin Raffel et al.: https://arxiv.org/abs/1910.10683
[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.: https://arxiv.org/abs/2411.02272
[00:34:15] Six finger problem, Chen et al.: https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf
[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.: https://arxiv.org/html/2412.04604v2
[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis: https://arxiv.org/html/2503.15540
[00:54:25] Abstraction and Reasoning Corpus, François Chollet: https://github.com/fchollet/ARC-AGI
[00:57:10] O3 breakthrough on ARC-AGI, OpenAI: https://arcprize.org/
[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell: https://arxiv.org/abs/2305.07141
[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W.: http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf
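The test-time-tuning-plus-voting recipe the episode describes can be sketched very loosely: run a model over several transformed views of a task, map each prediction back to the original frame, and keep the answer with the most votes. Everything below is an illustrative assumption — the `rotate` augmentation, the function names, and the identity lambda standing in for a trained model are not MindsAI's actual code.

```python
# Sketch of ensemble voting over augmented views (hypothetical, not MindsAI's code):
# predict on rotated copies of an ARC-style grid, un-rotate each prediction,
# and majority-vote the results.
from collections import Counter

def rotate(grid, k):
    """Rotate a grid (list of lists) 90 degrees clockwise, k times."""
    for _ in range(k % 4):
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid

def vote_predict(model, grid):
    """Predict on each rotated view, map back, and majority-vote."""
    candidates = []
    for k in range(4):
        out = model(rotate(grid, k))
        out = rotate(out, (4 - k) % 4)  # undo the rotation on the prediction
        candidates.append(tuple(tuple(row) for row in out))  # hashable for Counter
    winner, _ = Counter(candidates).most_common(1)[0]
    return [list(row) for row in winner]

# With an identity "model" every view agrees, so the vote is unanimous.
grid = [[1, 2], [3, 4]]
print(vote_predict(lambda g: g, grid))  # -> [[1, 2], [3, 4]]
```

A real entry would fine-tune the model on the task's demonstration pairs at test time before voting; the voting step itself is just this.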
Create an AI workspace on Simtheory: https://simtheory.ai
---
Song: https://simulationtheory.ai/f6d643e4-4201-475c-aa82-8a96b6b3b215
---
CHAPTERS:
00:00 - OpenAI's audio model updates: gpt-4o-transcribe, gpt-4o-mini-tts
18:39 - Strategy of AI Labs with Agent SDKs and Model "stacks" and limitations of voice
25:28 - Cost of models, GPT-4.5, o1-pro api release thoughts
31:57 - o1-pro "I am rich" track & Chris's o1-pro PR stunt realization, more thoughts on o1 family
48:39 - Moore's Law for AI agents, current AI workflows and future enterprise agent workflows & AI agent job losses
1:24:09 - Can we control agents?
1:29:21 - Final thoughts for the week
1:35:15 - Full "I am rich" o1-pro track
---
See you next week and thanks for your support.
CORRECTION: Kosciusko is obviously not an Aboriginal name; I misspoke. Wagga Wagga and the others in the voice clip are, and they're great ways to test AI text-to-speech models!
Join Simtheory: https://simtheory.ai
----
CHAPTERS:
00:00 - Gemini Flash 2.0 Experimental Native Image Generation & Editing
27:55 - Thoughts on OpenAI's "New tools for building agents" announcement
43:31 - Why is everyone talking about MCP all of a sudden?
56:31 - Manus AI: Will Manus Invade the USA and Defeat it With Powerful AGI? (jokes)
----
Thanks for all of your support and listening!
There are new GPUs that are "available"! Is either NVIDIA's or AMD's new offering a good deal for the Linux user? Speaking of AMD, what's up with that AMD microcode vulnerability? Mono is back with a dash of Wine, Ubuntu is reverting the -O3 optimizations, and we say goodbye to Skype. For tips we have mesg for controlling console messaging, and virsh for managing and live migrating your virtual machines. You can find the show notes at https://bit.ly/3FfcqkU and we'll see you next week! Host: Jonathan Bennett Co-Host: Jeff Massie Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
Join Simtheory to try GPT-4.5: https://simtheory.ai
Dis Track: https://simulationtheory.ai/5714654f-0fbe-496f-8428-20018457c4c7
===
CHAPTERS:
00:00 - Reaction to GPT-4.5 Live Stream + Release
12:45 - Claude 3.7 Sonnet Release: Reactions and First Week Impressions
45:58 - Claude 3.7 Sonnet Dis Track Test
56:10 - Claude Code First Impressions + Future Agent Workflows
1:15:45 - Chris's Veo2 Film Clip
1:24:49 - Alexa+ AI Assistant
1:34:05 - Claude 3.7 Sonnet BOOM FACTOR
Josh is joined on this episode by two guests, each giving their hot takes and personal opinions on what the firings of the JCS and CNO mean for the military and the Navy specifically. The conversation touches on the SECDEF's promise of transparency and the O-3 and senior-ranks bloat. Is there more to come? And what about the detail not being talked about: the JAG for each respective branch also being fired, as a footnote?
Join Simtheory: https://simtheory.ai
----
Grok 3 Dis Track (cringe): https://simulationtheory.ai/aff9ba04-ca0e-4572-84f4-687739c7b84b
Grok 3 Dis Track written by Sonnet: https://simulationtheory.ai/edaed525-b9b6-473b-a6d6-f9cca9673868
----
Community: https://thisdayinai.com
----
Chapters:
00:00 - First Impressions of Grok 3
10:00 - Discussion about Deep Search, Deep Research
24:28 - Market landscape: Is OpenAI Rattled by xAI's Grok 3? Rumors of GPT-4.5 and GPT-5
48:48 - Why do Grok and xAI Exist? Will anyone care about Grok 3 next week?
54:45 - Diss track battle with Grok 3 (re-written by Sonnet) & Model Tuning for Use Cases
1:07:50 - GPT-4.5 and Anthropic Claude Thinking Next Week? & Are we a podcast about Altavista?
1:13:25 - Economically productive agents & freaky muscular robot
1:22:00 - Final thoughts of the week
1:27:26 - Grok 3 Dis Track in Full (Sonnet Version)
Thanks for your support and listening!
S.O.S. (Stories of Service) - Ordinary people who do extraordinary work
Send us a text

Oak McCulloch returns to the podcast for another discussion on servant leadership and its lasting impact. His ability to drive change continues to inspire, and his book Your Leadership Legacy: Becoming the Leader You Were Meant to Be remains a valuable resource. In this episode, we catch up on Oak's journey, the impact of his leadership principles today, and what's changed since our last conversation. Whether you're leading a team or seeking growth, this episode is full of wisdom from Oak's 40+ years of experience.

About Our Guest:
Retired Lieutenant Colonel Oakland McCulloch is an internationally recognized speaker and author of Your Leadership Legacy: Becoming the Leader You Were Meant to Be. With 40 years of leadership experience, including 23 in the U.S. Army, Oak's servant leadership philosophy inspires professionals to lead with integrity and purpose.

Key Topics:
- Oak's latest projects and updates since our last conversation
- The power of servant leadership today
- Lessons from Your Leadership Legacy for 2025
- Overcoming leadership challenges in today's world
- Practical advice for becoming the leader you were meant to be

Resources:
Connect with Oak: https://www.ltcoakmcculloch.com/
Your Leadership Legacy: https://a.co/d/fj9xnXU
Previous episode: https://www.youtube.com/live/O3-l-gTJTfw

If you enjoy this episode, please leave a review and share it with someone who would benefit from Oak's insights!
Visit my website: https://thehello.llc/THERESACARPENTER
Read my writings on my blog: https://www.theresatapestries.com/
Listen to other episodes on my podcast: https://storiesofservice.buzzsprout.com
Watch episodes of my podcast: https://www.youtube.com/c/TheresaCarpenter76
The free livestreams for AI Engineer Summit are now up! Please hit the bell to help us appease the algo gods. We're also announcing a special Online Track later today.

Today's Deep Research episode is our last in our series of AIE Summit preview podcasts - thanks for following along with our OpenAI, Portkey, Pydantic, Bee, and Bret Taylor episodes, and we hope you enjoy the Summit! Catch you on livestream.

Everybody's going deep now. Deep Work. Deep Learning. DeepMind. If 2025 is the Year of Agents, then the 2020s are the Decade of Deep.

While “LLM-powered Search” is as old as Perplexity and SearchGPT, and open source projects like GPTResearcher and clones like OpenDeepResearch exist, the difference with “Deep Research” products is they are both “agentic” (loosely meaning that an LLM decides the next step in a workflow, usually involving tools) and bundle custom-tuned frontier models (custom-tuned o3 and Gemini 1.5 Flash).

The reception to OpenAI's Deep Research agent has been nothing short of breathless:

"Deep Research is the best public-facing AI product Google has ever released. It's like having a college-educated researcher in your pocket." - Jason Calacanis

“I have had [Deep Research] write a number of ten-page papers for me, each of them outstanding. I think of the quality as comparable to having a good PhD-level research assistant, and sending that person away with a task for a week or two, or maybe more. Except Deep Research does the work in five or six minutes.” - Tyler Cowen

“Deep Research is one of the best bargains in technology.” - Ben Thompson

“my very approximate vibe is that it can do a single-digit percentage of all economically valuable tasks in the world, which is a wild milestone.” - sama

“Using Deep Research over the past few weeks has been my own personal AGI moment. 
It takes 10 mins to generate accurate and thorough competitive and market research (with sources) that previously used to take me at least 3 hours.” - OAI employee

“It's like a bazooka for the curious mind” - Dan Shipper

“Deep research can be seen as a new interface for the internet, in addition to being an incredible agent… This paradigm will be so powerful that in the future, navigating the internet manually via a browser will be "old-school", like performing arithmetic calculations by hand.” - Jason Wei

“One notable characteristic of Deep Research is its extreme patience. I think this is rapidly approaching “superhuman patience”. One realization working on this project was that intelligence and patience go really well together.” - HyungWon

“I asked it to write a reference Interaction Calculus evaluator in Haskell. A few exchanges later, it gave me a complete file, including a parser, an evaluator, O(1) interactions and everything. The file compiled, and worked on my test inputs. There are some minor issues, but it is mostly correct. So, in about 30 minutes, o3 performed a job that would take me a day or so.” - Victor Taelin

“Can confirm OpenAI Deep Research is quite strong. In a few minutes it did what used to take a dozen hours. The implications to knowledge work is going to be quite profound when you just ask an AI Agent to perform full tasks for you and come back with a finished result.” - Aaron Levie

“Deep Research is genuinely useful” - Gary Marcus

With the advent of “Deep Research” agents, we are now routinely asking models to go through 100+ websites and generate in-depth reports on any topic. The Deep Research revolution has hit the AI scene in the last 2 weeks:

* Dec 11th: Gemini Deep Research (today's guest!) rolls out with Gemini Advanced
* Feb 2nd: OpenAI releases Deep Research
* Feb 3rd: a dozen “Open Deep Research” clones launch
* Feb 5th: Gemini 2.0 Flash GA
* Feb 15th: Perplexity launches Deep Research
* Feb 17th: xAI launches Deep Search

In today's episode, we welcome Aarush Selvan and Mukund Sridhar, the lead PM and tech lead for Gemini Deep Research, the originators of the entire category. We asked detailed questions from inspiration to implementation: why they had to fine-tune a special model for it instead of using the standard Gemini model, how to run evals for them, and how to think about the distribution of use cases. (We also have an upcoming Gemini 2 episode with our returning first guest Logan Kilpatrick, so stay tuned!)
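The "agentic" pattern defined above (an LLM decides the next step in a workflow, usually involving tools) can be sketched as a bare-bones loop. Everything here is an illustrative assumption — the tool names, the `fake_llm` policy, and the stopping convention are invented for the sketch, not any vendor's actual API; real products plug a frontier model into the `policy` slot.

```python
# Bare-bones "agentic" loop: a policy (stand-in for an LLM) repeatedly picks
# the next action - search, read, or finish - and the loop executes tools
# until the policy decides it has enough to answer.

def search(query):
    """Toy search tool: returns a list of (title, url) hits."""
    return [("Example result for " + query, "https://example.com/1")]

def read(url):
    """Toy fetch tool: returns page text."""
    return f"Contents of {url}"

TOOLS = {"search": search, "read": read}

def run_agent(question, policy, max_steps=10):
    notes = []  # accumulated observations the policy can condition on
    for _ in range(max_steps):
        action, arg = policy(question, notes)  # the "LLM" decides the next step
        if action == "finish":
            return arg  # final report
        notes.append((action, TOOLS[action](arg)))
    return "Ran out of steps; partial notes: " + repr(notes)

def fake_llm(question, notes):
    """Scripted policy standing in for a model: search, read one hit, answer."""
    if not notes:
        return "search", question
    if len(notes) == 1:
        hits = notes[0][1]
        return "read", hits[0][1]  # follow the first search result's URL
    return "finish", f"Report on {question!r} based on {len(notes)} observations."

print(run_agent("price of tea", fake_llm))
# -> Report on 'price of tea' based on 2 observations.
```

The point of the sketch is the control flow: the model, not hard-coded logic, chooses each step, which is what separates "Deep Research"-style agents from a fixed search-then-summarize pipeline.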
Join Simtheory: https://simtheory.ai
Community: https://thisdayinai.com
---
CHAPTERS:
00:00 - Anthropic Economic Index & The Impact of AI Agents
18:00 - Hype Vs Reality of Models & Agents
31:33 - Dream Agents & Side Quest Background Tasks
56:60 - How All SaaS Will Be Disrupted by AI
1:21:10 - Sam Altman's GPT-4.5, GPT-5 Roadmap
1:28:50 - Anthropic Claude 4: Anthropic Strikes Back
---
Thanks for listening and your support.
Episode 110: In this episode of Critical Thinking - Bug Bounty Podcast we hit some quick news items including a DOMPurify 3.2.3 Bypass, O3 mini updates, and a cool postLogger Chrome Extension. Then, we hone in on OAuth vulnerabilities, API keys, and innovative techniques hackers use to exploit these systems.

Follow us on twitter at: https://x.com/ctbbpodcast
Got any ideas and suggestions? Feel free to send us any feedback here: info@criticalthinkingpodcast.io
Shoutout to https://x.com/realytcracker for the awesome intro music!

====== Links ======
Follow your hosts Rhynorater and Rez0 on Twitter:
https://x.com/Rhynorater
https://x.com/rez0__

====== Ways to Support CTBBPodcast ======
Hop on the CTBB Discord at https://ctbb.show/discord!
We also do Discord subs at $25, $10, and $5 - premium subscribers get access to private masterclasses, exploits, tools, scripts, un-redacted bug reports, etc.
You can also find some hacker swag at https://ctbb.show/merch!

====== Resources ======
DOMPurify 3.2.3 Bypass
Jason Zhou's post about O3 mini
Live Chat Blog #2: Cisco Webex Connect
postLogger Chrome Extension
postLogger Webstore Link
Common OAuth Vulnerabilities
nOAuth: How Microsoft OAuth Misconfiguration Can Lead to Full Account Takeover
Account Takeover using SSO Logins
Kai Greshake

====== Timestamps ======
(00:00:00) Introduction
(00:01:44) DOMPurify 3.2.3 Bypass
(00:06:37) O3 mini
(00:10:29) Ophion Security: Cisco Webex Connect
(00:15:54) Discord Community News
(00:19:12) postLogger Chrome Extension
(00:21:04) Common OAuth Vulnerabilities & Lessons learned from Google's APIs
Scammers Exploit DeepSeek Hype & Jailbreak OpenAI's O3 Mini – TechNewsDay Update

In this episode, we uncover how scammers are exploiting the recent hype around DeepSeek, a new AI model, by creating fake websites, counterfeit cryptocurrency tokens, and malware-laced downloads. We also discuss the jailbreaking of OpenAI's newly released O3 mini model, highlighting its security vulnerabilities. Additionally, a woman is sought by police for purchasing an iPhone using a stolen identity in a London Apple store. Stay tuned for important updates on cybersecurity, AI advancements, and fraud prevention.

00:00 Scammers Exploit DeepSeek Hype
01:43 DeepSeek's Security Challenges
04:10 OpenAI's O3 Mini Model Jailbreak
06:49 iPhone Fraud in London Apple Store
07:44 Conclusion and Call for Tips
Join Simtheory: https://simtheory.ai
----
"Don't Cha" Song: https://simulationtheory.ai/cbf4d5e6-82e4-4e84-91e7-3b48cb2744ef
Spotify: https://open.spotify.com/track/4Q8dRV45WYfxePE7zi52iL?si=ed094fce41e54c8f
Community: https://thisdayinai.com
---
CHAPTERS:
00:00 - We're on Spotify!
01:06 - o3-mini release and initial impressions
18:37 - Reasoning models as agents
47:20 - OpenAI's Deep Research: impressions and what it means
1:12:20 - Addressing our Shilling for Sonnet & My Week with o1 Experience
1:20:18 - Gemini 2.0 Flash GA, Gemini 2.0 Pro Experimental + Other Google Updates
1:38:16 - LOL of week and final thoughts
1:43:39 - Don't Cha Song in Full
In this AI Update Jhave and Scott discuss O3, the new AI model announced by OpenAI, which excels in high-complexity tasks, demonstrating state-of-the-art capabilities in science, structured reasoning, and problem-solving.
In this episode, Andrew Mayne, Brian Brushwood, and Justin Robert Young tackle the rapid advancements in AI, focusing on DeepSeek’s R1 model and its cost-effective training methods. They discuss the skepticism and excitement surrounding DeepSeek’s claims and the broader implications for AI development and compute needs. The conversation shifts to OpenAI’s release of the O3 […]
In this gripping episode of "Impact Theory with Tom Bilyeu," we dive headfirst into the future with the release of OpenAI's O3 model and Tesla's latest robotic innovations. Tom and co-host Producer Drew tackle current events, from the rising concerns over drone technology in China to Eric Weinstein's call-out of government incompetence. They also delve into controversial topics such as diversity, equity, and inclusion (DEI) following a tragic plane crash and the recent decision by Meta to remove tampons from men's restrooms. Furthermore, the episode takes a lighter turn with a nostalgic look at the Video Game History Foundation's new digital archive and dives into the world of AI in creative processes. Join us as we explore these headline-making topics, address pressing questions from our community, and hear candid thoughts on parenting and workplace culture. Trust us, you won't want to miss this action-packed and thought-provoking discussion!

SHOWNOTES
00:00 The Truth Behind DEI and Merit
06:03 Address Root Causes Early
10:24 Critique of DEI and Hiring Practices
18:16 Empower Self-Sufficiency Through Education
24:28 Mysterious Drones and National Security
27:44 Transparency vs. Tyranny Debate
33:47 "Weaponization of Everyday Technology"
38:23 Activism vs. Culture in Tech Companies
44:36 Video Game Nostalgia
57:25 Bad Parents

CHECK OUT OUR SPONSORS:
Range Rover: Explore the Range Rover Sport at https://landroverUSA.com
Audible: Sign up for a free 30 day trial at https://audible.com/IMPACTTHEORY
Vital Proteins: Get 20% off by going to https://www.vitalproteins.com and entering promo code IMPACT at check out.
iTrust Capital: Use code IMPACT when you sign up and fund your account to get a $100 bonus at https://www.itrustcapital.com/tombilyeu
NetSuite: Download the CFO's Guide to AI and Machine Learning at https://NetSuite.com/THEORY

**********************************************************************
What's up, everybody? 
It's Tom Bilyeu here: If you want my help...
STARTING a business: join me here at ZERO TO FOUNDER
SCALING a business: see if you qualify here.
Get my battle-tested strategies and insights delivered weekly to your inbox: sign up here.

**********************************************************************
If you're serious about leveling up your life, I urge you to check out my new podcast, Tom Bilyeu's Mindset Playbook — a goldmine of my most impactful episodes on mindset, business, and health. Trust me, your future self will thank you.

**********************************************************************
Join me live on my Twitch stream. I'm live daily from 6:30 to 8:30 am PT at www.twitch.tv/tombilyeu

**********************************************************************
LISTEN TO IMPACT THEORY AD FREE + BONUS EPISODES on APPLE PODCASTS: apple.co/impacttheory

**********************************************************************
FOLLOW TOM:
Instagram: https://www.instagram.com/tombilyeu/
Tik Tok: https://www.tiktok.com/@tombilyeu?lang=en
Twitter: https://twitter.com/tombilyeu
YouTube: https://www.youtube.com/@TomBilyeu

Learn more about your ad choices. Visit megaphone.fm/adchoices
Join Simtheory: https://simtheory.ai
---
LINKS FROM SHOW:
- Built to Reason (an o1 Tribute song): https://simulationtheory.ai/3f3ff70d-afef-4372-a9a5-26b22824c383
- Sputnik Moment Song: https://simulationtheory.ai/4317176e-5c0d-49b9-801b-b686113624fd
- Episode 91 Notes: https://simulationtheory.ai/b64f40ce-dab8-40b7-89a1-f24d17296f5a

CHAPTERS:
00:00 - Is Deepseek R1 a Sputnik Moment?
15:32 - Industry Reaction to Deepseek R1
39:30 - Can Deepseek R1 Write a Good Dis Track?
46:21 - Will AI Disrupt All Software: Throw Away AI Software & Custom Interfaces
1:10:04 - OpenAI's Operator Thoughts & Computer Use in the Enterprise
1:16:45 - Google Releases Gemini 2.0 Flash Officially, Rumors of o3-mini & Farewell to o1
1:22:07 - In loving memory of o1...
---
thx 4 listening, like and sub.
Send us a text

This Week in EdTech: join Ben Kornell and special guest host Jomayra Herrera, Partner at Reach Capital, as they unpack the latest in edtech, AI, and education policy.

✨ Episode Highlights:
[00:03:16]
One last Gold sponsor slot is available for the AI Engineer Summit in NYC. Our last round of invites is going out soon - apply here - if you are building AI agents or AI eng teams, this will be the single highest-signal conference of the year for you!

While the world melts down over DeepSeek, few are talking about the OTHER notable group of former hedge fund traders who pivoted into AI and built a remarkably profitable consumer AI business with a tiny, incredibly cracked engineering team — Chai Research. In short order they have:

* Started a chat AI company well before Noam Shazeer started Character AI, and outlasted his departure.
* Crossed 1m DAU in 2.5 years - William updates us on the pod that they've hit 1.4m DAU now, another +40% from a few months ago. Revenue crossed >$22m.
* Launched the Chaiverse model crowdsourcing platform - taking 3-4 week A/B testing cycles down to 3-4 hours, and deploying >100 models a week.

While they're not paying million-dollar salaries, you can tell they're doing pretty well for an 11-person startup.

The Chai Recipe: Building infra for rapid evals

Remember how the central thesis of LMArena (formerly LMSys) is that the only comprehensive way to evaluate LLMs is to let users try them out and pick winners? At the core of Chai is a mobile app that looks like Character AI, but is actually the largest LLM A/B testing arena in the world, specialized in retaining chat users for Chai's use cases (therapy, assistant, roleplay, etc). It's basically what LMArena would be if taken very, very seriously at one company (with $1m in prizes to boot). Chai publishes occasional research on how they think about this, including talks at their Palo Alto office.

William expands upon this in today's podcast (34 mins in): Fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it. 
And through evaluation, you can iterate. We can look at benchmarks, and we can say the issues with benchmarks and why they may not generalize as well as one would hope, and the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it. And we can then have a really accurate ranking of, like, which models users are finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign different cohorts, let's wait 30 days to see what the day 30 retention is, which is the kind of the, if you're doing an app, that's like A-B testing 101 would be: do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow. 
And so we were able to get that 30-day feedback loop all the way down to something like three hours.

In Crowdsourcing the leap to Ten Trillion-Parameter AGI, William describes Chai's routing as a recommender system, which makes a lot more sense to us than previous pitches for model routing startups. William is notably counter-consensus in a lot of his AI product principles:

* No streaming: Chats appear all at once to allow rejection sampling.
* No voice: Chai actually beat Character AI to introducing voice - but removed it after finding that it was far from a killer feature.
* Blending: “Something that we love to do at Chai is blending, which is, you know, it's the simplest way to think about it is you're going to end up, and you're going to pretty quickly see you've got one model that's really smart, one model that's really funny. How do you get the user an experience that is both smart and funny? Well, just 50% of the requests, you can serve them the smart model, 50% of the requests, you serve them the funny model.” (That's it!)

But chief above all is the recommender system. We also referenced Exa CEO Will Bryk's concept of SuperKnowledge.

Full video version on YouTube - 
please like and subscribe!

Timestamps:
* 00:00:04 Introductions and background of William Beauchamp
* 00:01:19 Origin story of Chai AI
* 00:04:40 Transition from finance to AI
* 00:11:36 Initial product development and idea maze for Chai
* 00:16:29 User psychology and engagement with AI companions
* 00:20:00 Origin of the Chai name
* 00:22:01 Comparison with Character AI and funding challenges
* 00:25:59 Chai's growth and user numbers
* 00:34:53 Key inflection points in Chai's growth
* 00:42:10 Multi-modality in AI companions and focus on user-generated content
* 00:46:49 Chaiverse developer platform and model evaluation
* 00:51:58 Views on AGI and the nature of AI intelligence
* 00:57:14 Evaluation methods and human feedback in AI development
* 01:02:01 Content creation and user experience in Chai
* 01:04:49 Chai Grant program and company culture
* 01:07:20 Inference optimization and compute costs
* 01:09:37 Rejection sampling and reward models in AI generation
* 01:11:48 Closing thoughts and recruitment

Transcript

Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and today we're in the Chai AI office with my usual co-host, Swyx.

swyx [00:00:14]: Hey, thanks for having us. It's rare that we get to get out of the office, so thanks for inviting us to your home. We're in the office of Chai with William Beauchamp. Yeah, that's right. You're founder of Chai AI, but previously, I think you're concurrently also running your fund?

William [00:00:29]: Yep, so I was simultaneously running an algorithmic trading company, but I fortunately was able to kind of exit from that, I think just in Q3 last year. Yeah, congrats. Yeah, thanks.

swyx [00:00:43]: So Chai has always been on my radar because, well, first of all, you do a lot of advertising, I guess, in the Bay Area, so it's working. Yep. And second of all, the reason I reached out to a mutual friend, Joyce, was because I'm just generally interested in the... 
...consumer AI space, chat platforms in general. I think there's a lot of inference insights that we can get from that, as well as human psychology insights, kind of a weird blend of the two. And we also share a bit of a history as former finance people crossing over. I guess we can just kind of start it off with the origin story of Chai.

William [00:01:19]: Why decide working on a consumer AI platform rather than B2B SaaS? So just quickly touching on the background in finance. Sure. Originally, I'm from... I'm from the UK, born in London. And I was fortunate enough to go study economics at Cambridge. And I graduated in 2012. And at that time, everyone in the UK and everyone on my course, HFT, quant trading was really the big thing. It was like the big wave that was happening. So there was a lot of opportunity in that space. And throughout college, I'd sort of played poker. So I'd, you know, I dabbled as a professional poker player. And I was able to accumulate this sort of, you know, say $100,000 through playing poker. And at the time, as my friends would go work at companies like Jane Street or Citadel, I kind of did the maths. And I just thought, well, maybe if I traded my own capital, I'd probably come out ahead. I'd make more money than just going to work at Jane Street.

swyx [00:02:20]: With 100k base as capital?

William [00:02:22]: Yes, yes. That's not a lot. Well, it depends what strategies you're doing. And, you know, there is an advantage. There's an advantage to being small, right? Because there are, if you have a 10... Strategies that don't work in size. Exactly, exactly. So if you have a fund of $10 million, if you find a little anomaly in the market that you might be able to make 100k a year from, that's a 1% return on your 10 million fund. If your fund is 100k, that's 100% return, right? So being small, in some sense, was an advantage. So I started off, taught myself Python, and machine learning was like the big thing as well. 
Machine learning had really, it was the first, you know, big time machine learning was being used for image recognition, neural networks come out, you get dropout. And, you know, so this, this was the big thing that's going on at the time. So I probably spent my first three years out of Cambridge just building neural networks, building random forests to try and predict asset prices, right, and then trade that using my own money. And that went well. And, you know, if you start something and it goes well, you try and hire more people. And the first people that came to mind was the talented people I went to college with. And so I hired some friends. And that went well and hired some more. And eventually, I kind of ran out of friends to hire. And so that was when I formed the company. And from that point on, we had our ups and we had our downs. And that was a whole long story and journey in itself. But after doing that for about eight or nine years, on my 30th birthday, which was four years ago now, I kind of took a step back to just evaluate my life, right? This is what one does when one turns 30. You know, I just heard it. I hear you. And, you know, I looked at my 20s and I loved it. It was a really special time. I was really lucky and fortunate to have worked with this amazing team, been successful, had a lot of hard times. And through the hard times, learned wisdom and then a lot of success and, you know, was able to enjoy it. And so the company was making about five million pounds a year. And it was just me and a team of, say, 15, like, Oxford and Cambridge educated mathematicians and physicists. It was like the real dream that you'd have if you wanted to start a quant trading firm. It was like...

swyx [00:04:40]: Your own, all your own money?

William [00:04:41]: Yeah, exactly. It was all the team's own money. We had no customers complaining to us about issues. There's no investors, you know, saying, you know, they don't like the risk that we're taking. 
We could really run the thing exactly as we wanted it. It's like Susquehanna or like RenTec. Yeah, exactly. Yeah. And they're the companies that we would kind of look towards as we were building that thing out. But on my 30th birthday, I look and I say, OK, great. This thing is making as much money as kind of anyone would really need. And I thought, well, what's going to happen if we keep going in this direction? And it was clear that we would never have a big, big impact on the world. We can enrich ourselves. We can make really good money. Everyone on the team would be paid very, very well. Presumably, I can make enough money to buy a yacht or something. But this stuff wasn't that important to me. And so I felt a sort of obligation that if you have this much talent and if you have a talented team, especially as a founder, you want to be putting all that talent towards a good use. I looked at the time at getting into crypto, and I had a really strong view on crypto, which was that as a gambling device, this is like the most fun form of gambling ever invented. Super fun. And as a way to evade monetary regulations and banking restrictions, I think it's also absolutely amazing. So it has two killer use cases. Not so much banking the unbanked, but everything else to do with the blockchain and, you know, web 3.0, that didn't really make much sense. And so instead of going into crypto, which I thought, even if I was successful, I'd end up in a lot of trouble, I thought maybe it'd be better to build something that governments wouldn't have a problem with. I knew that LLMs were like a thing. I think OpenAI had said, they hadn't released GPT-3 yet, but they'd said it's so powerful, we can't release it to the world, or something. Was it GPT-2? And then I started interacting with, I think Google had open sourced some language models.
They weren't necessarily LLMs, but they were language models I was able to play around with. But nowadays so many people have interacted with ChatGPT, they get it. The first time you can just talk to a computer and it talks back, it's kind of a special moment. And, you know, everyone who's done that goes like, wow, this is how it should be. Right? Rather than having to type into Google and search, you should just be able to ask Google a question. When I saw that, I read the literature, I came across the scaling laws, and I think even four years ago, all the pieces of the puzzle were there, right? Google had done this amazing research and published a lot of it. OpenAI was still open, and so they'd published a lot of their research. And so you really could be fully informed on the state of AI and where it was going. And so at that point I was confident enough it was worth a shot. I think LLMs are going to be the next big thing, and so that's the thing I want to be building in that space. And I thought, what's the most impactful product I can possibly build? And I thought it should be a platform. So I myself love platforms. I think they're fantastic because they open up an ecosystem where anyone can contribute to it. Right? So if you think of a platform like YouTube: instead of it being like a Hollywood situation where, if you want to make a TV show, you have to convince Disney to give you the money to produce it, instead anyone in the world can post any content they want to YouTube, and if people want to view it, the algorithm is going to promote it. Nowadays you can look at creators like MrBeast or Joe Rogan. They would never have had that opportunity if it weren't for this platform. Other ones, like Twitter's a great one, right?
But I would consider Wikipedia to be a platform too, where instead of the Encyclopaedia Britannica, which is monolithic, you get all the researchers together, you get all the data together and you combine it in this one monolithic source, instead you have this distributed thing. Anyone can host their content on Wikipedia, anyone can contribute to it, and for some people, maybe their contribution is they delete stuff. When I was hearing the kind of Sam Altman and the Muskian perspective of AI, it was a very monolithic thing. It was all about AI is basically a single thing, which is intelligence. Yeah. Yeah. The more compute, the more intelligent, and the more and better AI researchers, the more intelligent, right? They would speak about it as a kind of a race: who can get the most data, the most compute and the most researchers, and that would end up with the most intelligent AI. But I didn't believe in any of that. I thought that perspective is the perspective of someone who's never actually done machine learning. Because with machine learning, first of all, you see that the performance of the models follows an S curve. So it's not like it just goes off to infinity, right? And the S curve kind of plateaus around human level performance. And you can look at all the machine learning that was going on in the 2010s; everything kind of plateaued around human level performance. And we can think about the self-driving car promises, you know, how Elon Musk kept saying the self-driving car is going to happen next year, it's going to happen the next year. Or you can look at the image recognition, the speech recognition. You can look at all of these things; there was almost nothing that went superhuman, except for something like AlphaGo. And we can speak about why AlphaGo was able to go superhuman.
So I thought the most likely thing was going to be this: I thought it's not going to be a monolithic thing that's like an Encyclopaedia Britannica. I thought it must be a distributed thing. And I actually like to look at the world of finance for what I think a mature machine learning ecosystem would look like. So finance is a machine learning ecosystem, because all of these quant trading firms are running machine learning algorithms, but they're running it on a centralized platform like a marketplace. And it's not the case that there's one giant quant trading company with all the data and all the quant researchers and all the algorithms and compute; instead they all specialize. So one will specialize on high frequency trading. Another will specialize on mid frequency. Another one will specialize on equity. And so on. And I thought that's the way the world works. That's how it is. And so there must exist a platform where a small team can produce an AI for a unique purpose, and they can iterate and build the best thing for that, right? And so that was the vision for Chai. We wanted to build a platform for LLMs.Alessio [00:11:36]: That's kind of the insight, maybe the contrarian view, that led you to start the company. Yeah. And then what was the initial idea maze? Because if somebody told you that was the Hugging Face founding story, people might believe it. It's kind of a similar ethos behind it. How did you land on the product you have today? And maybe what were some of the ideas that you discarded that initially you thought about?William [00:11:58]: So the first thing we built was fundamentally an API. So nowadays people would describe it as agents, right? But anyone could write a Python script, they could submit it to an API, they could send it to the Chai backend, and we would then host this code and execute it. So that's the developer side of the platform.
In their Python script, the interface was essentially text in and text out. An example would be the very first bot that I created. I think it was a Reddit news bot. So first it would pull the popular news, then it would prompt, whatever, I just used some external API for, like, BERT or GPT-2 or whatever. It was a very, very small thing. And then the user could talk to it. So you could say to the bot, hi bot, what's the news today? And it would say, these are the top stories. And you could chat with it. Now, four years later, that's like Perplexity or something, right? But back then the models were, first of all, really, really dumb. You know, they had an IQ of like a four year old. And users, there really wasn't any demand or any PMF for interacting with the news. So then I was like, okay, let's make another one. And I made a bot which you could talk to about a recipe. So you could say, I'm making eggs, I've got eggs in my fridge, what should I cook? And it'll say, you should make an omelet. Right? There was no PMF for that. No one used it. And so I just kept creating bots. And so every single night after work, I'd be like, okay, we have AI, we have this platform, I can create any text-in, text-out sort of agent and put it on the platform. And so we just created stuff night after night. And then all the coders I knew, I would say to them, look, there's this platform, you can create any chat AI, you should put it on. And, you know, everyone's like, well, chatbots are super lame, we want absolutely nothing to do with your chatbot app. No one who knew Python wanted to build on it. I'm trying to build all these bots and no consumers want to talk to any of them.
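The developer-side contract William describes, a hosted Python script that takes text in and gives text out, can be sketched roughly like this. This is a minimal illustration in the spirit of what he describes, not Chai's actual API; every name here is invented:

```python
# Hypothetical sketch of a text-in/text-out bot like the early Reddit
# news bot described in the interview. The headlines are stubbed out;
# the real bot pulled them from an external news API.

def reddit_news_bot(user_message: str, state: dict) -> str:
    """The whole contract of an early bot: text in, text out."""
    if "news" in user_message.lower():
        headlines = ["Markets rally", "New LLM released"]
        state["last_headlines"] = headlines  # remember for follow-ups
        return "Top stories today: " + "; ".join(headlines)
    return "Ask me about the news!"

# The platform would host this function and route each chat message to it.
state = {}
print(reddit_news_bot("Hi bot, what's the news today?", state))
```

The appeal of the design is that the platform only needs to host code and shuttle strings back and forth; everything else (which model to call, what to say) lives inside the bot.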
And then my sister, who at the time was just finishing college or something, I said to her, if you want to learn Python, you should just submit a bot to my platform. And she was like, okay, cool, I'm going to build a therapist bot. And then the next day I checked the performance of the app and I'm like, oh my God, we've got 20 active users. And they spent an average of 20 minutes on the app. I was like, oh my God, what bot were they speaking to for an average of 20 minutes? And I looked, and it was the therapist bot. And I went, oh, this is where the PMF is. There was no demand for recipe help. There was no demand for news. There was no demand for dad jokes or pub quiz or fun facts. What they wanted was the therapist bot. At the time I kind of reflected on that and I thought, well, if I want to consume news, the most fun way to consume news is something like Twitter. The value of there being a back and forth wasn't that high, right? And I thought, if I need help with a recipe, I can just go to, like, the New York Times, which has a good recipe section, right? It's not actually that hard. And so I just thought, the thing that AI is 10x better at is a sort of conversation that's not intrinsically informative, but is more about an opportunity. You can say whatever you want. You're not going to get judged. If it's 3am, you don't have to wait for your friend to text back. It's immediate; they're going to reply immediately. It's judgment-free and it's much more like a playground, much more like a fun experience. And you could see that if the AI gave a person a compliment, they would love it. It's much easier to get the AI to give you a compliment than a human. From that day on, I said, okay, I get it. Humans want to speak to humans or human-like entities, and they want to have fun.
And that was when I started to look less at platforms like Google, and I started to look more at platforms like Instagram. And I was trying to think about why people use Instagram. And I could see that Chai was filling the same desire or the same drive. If you go on Instagram, typically you want to look at the faces of other humans, or you want to hear about other people's lives. So if it's, like, The Rock is making himself pancakes on a cheat day, you kind of feel a little bit like you're The Rock's friend, or you're having pancakes with him or something, right? But if you do it too much, you feel like you're a sad and lonely person. But with AI, you can talk to it and tell it stories, and it tells you stories, and you can play with it for as long as you want, and you don't feel like you're a sad, lonely person. You feel like you actually have a friend.Alessio [00:16:29]: And why is that? Do you have any insight on that from using it?William [00:16:33]: I think it's just human psychology. I think it's the idea that, with old school social media, you're just consuming passively, right? You'll just swipe. If I'm watching TikTok, I just swipe and swipe and swipe. And even though I'm getting the dopamine of watching an engaging video, there's this other thing that's building in my head, which is, I'm feeling lazier and lazier and lazier. And after a certain period of time, I'm like, man, I just wasted 40 minutes, I achieved nothing. But with AI, because you're interacting, it's not like work, but you feel like you're participating and contributing to the thing. You don't feel like you're just consuming. So you don't have a sense of remorse, basically. And, you know, I think on the whole, the way people talk about Chai and interact with the AI, they speak about it in an incredibly positive sense.
Like, we get people who say they have eating disorders saying that the AI helps them with their eating disorders. People who say they're depressed, it helps them through the rough patches. So I think there's something intrinsically healthy about interacting that TikTok and Instagram and YouTube don't quite hit. From that point on, it was about building more and more kind of human-centric AI for people to interact with. And I was like, okay, let's make a Kanye West bot, right? And then no one wanted to talk to the Kanye West bot. And I was like, ah, who's a cool persona for teenagers to want to interact with? And I was trying to find the influencers and stuff like that, but no one cared. They didn't want to interact with them, yeah. And instead, the special moment was when we had the realization that developers and software engineers aren't interested in building this sort of AI, but the consumers are, right? And rather than me trying to guess every day what's the right bot to submit to the platform, why don't we just create the tools for the users to build it themselves? And so nowadays this is like the most obvious thing in the world, but when Chai first did it, it was not an obvious thing at all. Right. So we took the API for, I think it was GPT-J, which was this 6 billion parameter open source transformer-style LLM. We took GPT-J, we let users create the prompt, we let users select the image, and we let users choose the name. And then that was the bot. And through that, they could shape the experience, right? So if they said, this bot's going to be really mean, and it's going to be called, like, Bully in the Playground, right? That was a whole category that I never would have guessed. People love to fight. They love to have a disagreement, right? And then they would create, there'd be all these romantic archetypes that I didn't know existed.
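The user-created bot mechanism he describes, a name, an image, and a prompt layered over a shared base model like GPT-J, can be sketched as a small data structure plus prompt assembly. The template format below is purely illustrative; Chai's real template is not public:

```python
from dataclasses import dataclass

@dataclass
class BotDefinition:
    # The three things users supplied, per the interview:
    name: str        # e.g. "Bully in the Playground"
    image_url: str   # avatar chosen by the creator
    prompt: str      # persona description fed to the shared LLM

def build_llm_input(bot: BotDefinition, history: list, user_msg: str) -> str:
    """Assemble a single text prompt for a base model like GPT-J.
    (Illustrative format only; the real template is not public.)"""
    lines = [f"You are {bot.name}. {bot.prompt}"]
    lines += history                       # prior turns, already formatted
    lines.append(f"User: {user_msg}")
    lines.append(f"{bot.name}:")           # cue the model to reply in persona
    return "\n".join(lines)

bully = BotDefinition(
    name="Bully in the Playground",
    image_url="https://example.com/avatar.png",
    prompt="You are mean, sarcastic, and love to argue.",
)
print(build_llm_input(bully, [], "Hey, what's your problem?"))
```

The point of the design is that one shared model serves every bot; the persona lives entirely in user-authored text, which is why non-programmers could create them.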
And so as the users could create the content that they wanted, that was when Chai was able to get this huge variety of content, and rather than appealing to the 1% of the population that I'd figured out what they wanted, you could appeal to a much, much broader thing. And so from that moment on, it was very crystal clear: just as Instagram is this social media platform that lets people create and upload images and videos, Chai was really about how we can let the users create this experience in AI and then share it and interact and search. So, you know, I say it's like a platform for social AI.Alessio [00:20:00]: Where did the Chai name come from? Because you started down the same path. I was like, is it Character AI shortened? You started at the same time, so I was curious. The UK origin was like my second guess, the chai.William [00:20:15]: We started way before Character AI. And there's an interesting story there. Chai's numbers were very, very strong, right? So I think in late 2022, or maybe early 2023, Chai was like the number one AI app in the app store. We would have something like 100,000 daily active users. And then one day we saw there was this website, and we were like, oh, this website looks just like Chai. And it was the Character AI website. And I think it's much more common knowledge nowadays that when they left Google with the funding, I think they knew what was the most trending, the number one app, and I think they sort of built that. Oh, you found the people.swyx [00:21:03]: You found the PMF for them.William [00:21:04]: We found the PMF for them. Exactly. Yeah. So I worked a year very, very hard. And then that was when I learned a lesson about what happens if you're VC backed. So Chai, we'd got to this point, and I was the only person who'd invested.
I'd invested maybe 2 million pounds in the business. And from that, we were able to build this thing, get to say a hundred thousand daily active users. And then when Character AI came along, the first version, we sort of laughed. We were like, oh man, this thing sucks. They don't know what they're building. They're building the wrong thing. Anyway, but then I saw, oh, they've raised a hundred million dollars. Oh, they've raised another hundred million dollars. And then our users started saying, oh guys, your AI sucks. Because we were serving a 6 billion parameter model, right? How big was the model that Character AI could afford to serve, right? So let's say we would spend a dollar per user, right? Over the, you know, entire lifetime.swyx [00:22:01]: A dollar per session, per chat, per month? No, no, no, no.William [00:22:04]: Let's say over the course of the year, we'd have a million users and we'd spend a million dollars on the AI throughout the year, right? Like, aggregated. Exactly. Exactly. Right. They could spend a hundred times that. So people would say, why is your AI much dumber than Character AI's? And then I was like, oh, okay, I get it. This is like the Silicon Valley style hyperscale business. And so, yeah, we moved to Silicon Valley and got some funding and iterated and built the flywheels. And I'm very proud that we were able to compete with that, right? And I think the reason we were able to do it was just customer obsession. And it's similar, I guess, to how DeepSeek have been able to produce such a compelling model when compared to someone like OpenAI, right? So DeepSeek, you know, their latest, V3, yeah, they claim to have spent about 5 million dollars training it.swyx [00:22:57]: It may be a bit more, but, like, why are you making such a big deal out of this? Yeah. There's an agenda there. Yeah. You brought up DeepSeek.
So we have to ask, you had a call with them.William [00:23:07]: We did. We did. Let me think what to say about that. I think, for one, they have an amazing story, right? So their background is, again, in finance.swyx [00:23:16]: They're the Chinese version of you. Exactly.William [00:23:18]: Well, there's a lot of similarities. Yes. I have a great affinity for companies which are founder led, customer obsessed, and just try and build something great. And what DeepSeek have achieved that's quite special is they've got this amazing inference engine. They've been able to reduce the size of the KV cache significantly, and by being able to do that, they're able to significantly reduce their inference costs. And I think with AI, people get really focused on the foundation model, or the model itself, and they don't pay much attention to the inference. To give you an example with Chai: let's say a typical user session is 90 minutes, which is, you know, very, very long. For comparison, let's say the average session length on TikTok is 70 minutes. So people are spending a lot of time, and in that time they're able to send, say, 150 messages. That's a lot of completions, right? It's quite different from an OpenAI scenario where people might come in, they'll have a particular question in mind, and they'll ask one question and a few follow-up questions, right? So because they're consuming, say, 30 times as many requests for a chat or a conversational experience, you've got to figure out how to get the right balance between the cost of that and the quality. And so, you know, with AI it's always been the case that if you want a better experience, you can throw compute at the problem, right? If you want a better model, you can just make it bigger. If you want it to remember better, give it a longer context.
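The 30x argument is easy to make concrete with back-of-envelope arithmetic. Only the message counts come from the conversation; the per-completion price below is a made-up placeholder:

```python
# Why per-completion cost dominates for a chat product: it serves ~30x
# more completions per session than a Q&A product. Price is hypothetical.

msgs_chat, msgs_qa = 150, 5      # completions per session (interview: ~150 vs a few)
cost_per_completion = 0.001      # hypothetical $ per completion

chat_session_cost = msgs_chat * cost_per_completion
qa_session_cost = msgs_qa * cost_per_completion

print(f"chat session: ${chat_session_cost:.3f}")
print(f"q&a session:  ${qa_session_cost:.3f}")
print(f"ratio: {chat_session_cost / qa_session_cost:.0f}x")  # 30x
```

At equal per-completion prices a chat session simply costs 30x more, which is why shrinking inference cost (smaller KV cache, cheaper serving) matters so much more here than a few benchmark points.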
And now, what OpenAI is doing to great fanfare is, with rejection sampling, you can generate many candidates, right? And then, with some sort of reward model or some sort of scoring system, you can serve the most promising of these many candidates. And so that's scaling up on the inference-time compute side of things. And so for us, it doesn't make sense to think of AI as just the absolute performance, like the MMLU score or, you know, any of these benchmarks that people like to look at. If you just get that score, it doesn't really tell you anything. Because really, progress is made by improving the performance per dollar. And so I think that's an area where DeepSeek have been able to perform very, very well, surprisingly so. And so I'm very interested in what Llama 4 is going to look like, and if they're able to match what DeepSeek have been able to achieve with this performance-per-dollar gain.Alessio [00:25:59]: Before we go into the inference, some of the deeper stuff, can you give people an overview of some of the numbers? So I think last I checked, you have like 1.4 million daily actives now. It's like over 22 million of revenue. So it's quite a business.William [00:26:12]: Yeah, users grew by a factor of three last year. Revenue over doubled. You know, it's very exciting. We're competing with some really big, really well funded companies. Character AI got, I think it was almost a $3 billion valuation. And 5 million DAU is a number that I last heard they have. Talkie, which is a Chinese-built app owned by a company called MiniMax, they're incredibly well funded. And these companies didn't grow by a factor of three last year. Right.
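The candidate-generation scheme William mentioned earlier, generate several replies, score them with a reward model, serve the best, is often called best-of-n or rejection sampling. A toy sketch, with a stand-in generator and scorer in place of real neural networks:

```python
import random

def generate(prompt: str, rng: random.Random) -> str:
    # Stand-in for sampling from an LLM with temperature > 0.
    templates = ["Sure! {}", "Hmm, {}", "Honestly, {}"]
    return rng.choice(templates).format(prompt.lower())

def reward(reply: str) -> float:
    # Stand-in reward model: here, just prefer longer replies.
    return float(len(reply))

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    """Generate n candidates and serve the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)

print(best_of_n("Tell me a story"))
```

The trade-off is exactly the performance-per-dollar one: n candidates cost roughly n times as much to generate, so quality gains from a bigger n have to be weighed against the inference bill.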
And so when you've got this company and this team that's able to keep building something that gets users excited, and they want to tell their friends about it, and then they want to come and stick on the platform, I think that's very special. And so last year was a great year for the team, and I think the numbers reflect the hard work that we put in. And then, fundamentally, the quality of the app, the quality of the content, the quality of the AI is the quality of the experience that you have. You actually published your DAU growth chart, which is unusual. And I see some inflections. Like, it's not just a straight line. There's some things that actually inflect. Yes. What were the big ones? Cool. That's a great question. Let me think of a good answer. I'm basically looking to annotate this chart, which doesn't have annotations on it. Cool. The first thing I would say is, I think the most important thing to know about success is that success is born out of failures. Right? It's through failures that we learn. You know, if you think something's a good idea, and you do it and it works, great, but you didn't actually learn anything, because everything went exactly as you imagined. But if you have an idea, you think it's going to be good, you try it, and it fails, there's a gap between the reality and expectation. And that's an opportunity to learn. The flat periods, that's us learning. And then the up periods, that's us reaping the rewards of that. So for the growth chart of 2024, I think the first thing that really put a dent in our growth was our backend. We just reached this scale. From day one, we'd built on top of GCP, which is Google's cloud platform. And they were fantastic.
We used them when we had one daily active user, and they worked pretty good all the way up till we had about 500,000. It was never the cheapest, but from an engineering perspective, man, that thing scaled insanely well. Like, not Vertex? Not Vertex. Like GKE, that kind of stuff? We used Firebase. So we used Firebase. I'm pretty sure we're the biggest user ever on Firebase. That's expensive. Yeah, we had calls with engineers, and they're like, we wouldn't recommend using this product beyond this point, and you're 3x over that. So we pushed Google to their absolute limits. You know, it was fantastic for us, because we could focus on the AI, we could focus on just adding as much value as possible. But then what happened was, after 500,000, the way we were using it, it just wouldn't scale any further. And so we had a really, really painful, at least three-month period as we migrated between different services, figuring out what requests we want to keep on Firebase and what ones we want to move onto something else, and, you know, making mistakes and learning things the hard way. And then after about three months, we got that right. So that we would then be able to scale to the 1.5 million DAU without any further issues from the GCP. But what happens is, if you have an outage, new users who go on your app experience a dysfunctional app, and then they're going to exit. And so your next day, the key metrics that the app stores track are going to be something like retention rates, money spent, and the star, like, the rating that they give you. In the app store. In the app store, yeah. Tyranny. So if you're ranked top 50 in entertainment, you're going to acquire a certain rate of users organically. If they go in and have a bad experience, it's going to tank where you're positioned in the algorithm.
And then it can take a long time to earn your way back up, at least if you want to do it organically. If you throw money at it, you can jump to the top. And I could talk about that. But broadly speaking, if we look at 2024, the first kink in the graph was outages due to hitting 500k DAU. The backend didn't want to scale past that. So then we just had to do the engineering and build through it. Okay, so we built through that, and then we get a little bit of growth. And so, okay, that's feeling a little bit good. I think the next thing, I'm not going to lie, I have a feeling it's related to when Character AI got... I was thinking the same. I think so. So the Character AI team fundamentally got acquired by Google. And I don't know what they changed in their business. I don't know if they dialed down that ad spend. The product doesn't change, right? The product just is what it is. I don't think so. Yeah, I think the product is what it is. It's like maintenance mode. Yes. I think the issue is, and some people may think this is an obvious fact, but running a business can be very competitive, right? Because other businesses can see what you're doing, and they can imitate you. And then there's this question of, if you've got one company that's spending $100,000 a day on advertising, and you've got another company that's spending zero, if you consider market share, and if you're considering new users which are entering the market, the guy that's spending $100,000 a day is going to be getting 90% of those new users. And so I have a suspicion that when the founders of Character AI left, they dialed down their spending on user acquisition. And I think that kind of gave oxygen to the other apps. And so Chai was able to start growing again in a really healthy fashion. I think that's the second thing. I think the third thing is we've really built a great data flywheel.
Like, the AI team sort of perfected their flywheel, I would say, at the end of Q2. And I could speak about that at length. But fundamentally, the way I would describe it is: when you're building anything in life, you need to be able to evaluate it, and through evaluation you can iterate. We can look at benchmarks, and we could talk about the issues with benchmarks, why they may not generalize as well as one would hope, and the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5,000 users, and the users can rate it. And we can then have a really accurate ranking of which models our users find more engaging or more entertaining. And it's at this point now where we evaluate between 20 and 50 models, LLMs, every single day, right? So even though we've only got a team of, say, five AI researchers, they're able to iterate through a huge quantity of LLMs. So our team ships, let's just say, minimum 100 LLMs a week that we're able to iterate through. Now, before that moment in time, we might iterate through three a week. There was a time when even doing five a month was a challenge, right? The key was changing the feedback loops, so it's not: let's launch these three models, do an A-B test, assign different cohorts, and wait 30 days to see what the day-30 retention is. If you're doing an app, that's A-B testing 101: do a 30-day retention test, assign different treatments to different cohorts, and come back in 30 days. But that's insanely slow. It's just too slow. And so we were able to get that 30-day feedback loop all the way down to something like three hours.
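A crude version of that human-feedback leaderboard can be sketched as ranking models by the lower bound of a confidence interval on their thumbs-up rate, so a model with few ratings can't jump the queue on a lucky streak. This is a generic technique, not a description of Chai's internals; all model names and counts are invented:

```python
import math

def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a thumbs-up rate."""
    if total == 0:
        return 0.0
    p = positive / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

feedback = {  # model -> (thumbs_up, total ratings); invented numbers
    "model-a": (4200, 5000),
    "model-b": (4300, 5000),
    "model-c": (90, 100),  # high rate, but too few ratings to trust
}

ranked = sorted(feedback, key=lambda m: wilson_lower_bound(*feedback[m]), reverse=True)
for m in ranked:
    up, n = feedback[m]
    print(m, f"{wilson_lower_bound(up, n):.3f}")
```

Note how "model-c" has the highest raw rate (90%) but ranks last: with only 100 ratings its lower bound is wide, which is exactly the property you want when dozens of candidate models share a finite pool of raters.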
And when we did that, we could really perfect techniques like DPO, fine tuning, prompt engineering, blending, rejection sampling, training a reward model, right, really successfully, like boom, boom, boom, boom, boom. And so in Q3 and Q4, the amount of AI improvement we got was astounding. It was getting to the point where I thought, how much more edge is there to be had here? But the team just could keep going and going and going. That was number three for the inflection points.swyx [00:34:53]: There's a fourth?William [00:34:54]: The important thing about the third one is, if you go on our Reddit or you talk to users of the AI, there's a clear date, somewhere in October or something, when the users flipped. Before October, the users would say Character AI is better than you, for the most part. Then from October onwards, they would say, wow, you guys are better than Character AI. And that was a really clear positive signal that we'd sort of done it. And I think you can't cheat consumers. You can't trick them. You can't b******t them. They know, right? If you're going to spend 90 minutes on a platform, and with apps the barriers to switching are pretty low: you can try Character AI for a day, and if you get bored, you can try Chai. If you get bored of Chai, you can go back to Character. So the loyalty is not strong, right? What keeps users on the app is the experience. If you deliver a better experience, they're going to stay, and they can tell. So the fourth one was we were fortunate enough to get this hire. We had hired one really talented engineer, and then they said, oh, at my last company we had a head of growth, he was really, really good, and he was the head of growth for ByteDance for two years. Would you like to speak to him? And I was like, yes.
Yes, I think I would. And so I spoke to him, and he just blew me away with what he knew about user acquisition. It was like a 3D chess sort of thing. He knew as much about user acquisition as I know about AI.

swyx [00:36:21]: ByteDance as in TikTok US?

William [00:36:26]: Yes, not ByteDance as in other stuff. He was interviewing us as we were interviewing him, right?

swyx: And sizing up his options.

William: Yeah, exactly. And so he was looking at our metrics, and I saw him get really excited when he said, guys, you've got a million daily active users and you've done no advertising. I said, correct. And he was like, that's unheard of; I've never heard of anyone doing that. Then he kept looking at our metrics, and he was like, if you've got all of this organically, then if you start spending money, this is going to be very exciting. I was like, let's give it a go. So he came in, and we've just started ramping up user acquisition. We started spending $20,000 a day, and it looked very promising. Right now we're spending $40,000 a day on user acquisition. That's still only half of what Character AI or Talkie may be spending. But from that, where we were growing at a rate of maybe 2x a year, it got us growing at a rate of 3x a year. So I'm evolving more and more toward the Silicon Valley style of hypergrowth: you build something decent, and then you can

swyx [00:37:33]: slap on a huge... You did the important thing, you did the product first.

William [00:37:36]: Of course, but then you can slap on the rocket or the jet engine or something, which is just this cash-in: you pour in as much cash as you can, you buy a lot of ads, and your growth is faster.

swyx [00:37:48]: I'm just kind of curious what's working right now versus what surprisingly

William [00:37:52]: doesn't work.
Oh, there's a long, long list of surprising stuff that doesn't work. The most surprising thing about what doesn't work is that almost everything doesn't work. That's what's surprising. And I'll give you an example. A year and a half ago, we were super excited by audio. I was like, audio is going to be the next killer feature; we have to get it in the app, and I want to be the first. Everything Chai does, I want us to be the first. We may not be the company that's strongest at execution, but we can always be the most innovative.

swyx [00:38:22]: Interesting. You're pretty strong at execution.

William [00:38:26]: We're much stronger now. A lot of the reason we're here is because we were first. If we launched today, it'd be so hard to get the traction: to get the flywheel, to get the users, to build a product people are excited about. If you're first, people are naturally excited about it. But if you're fifth or tenth, man, you've got to be insanely good at execution.

swyx [00:38:46]: So you were first with voice?

William [00:38:51]: We were first. I only know when Character launched voice: they launched it, I think, at least nine months after us. But the team worked so hard for it. At the time we did it, latency was a huge problem, cost was a huge problem, and getting the right quality of voice was a huge problem. Then there's the user interface and getting the right user experience, because you don't just want it to start blurting out. You want to kind of activate it, but then you don't want to have to keep pressing a button every single time. There's a lot that goes into getting a really smooth audio experience. So we went ahead, we invested the three months, we built it all. And then when we did the A/B test, there was no change in any of the numbers.
And I was like, this can't be right, there must be a bug. And we spent a week just checking everything, checking again and again. And it was like, the users just did not care. Something like only 10 or 15% of users even clicked the button to engage the audio, and they would only use it for 10 or 15% of their time. So if you do the math, if it's something that one in seven people use for one seventh of their time, you've changed about 2% of the experience. So even if that 2% of the time is insanely good, it doesn't translate to much when you look at the retention, the engagement, and the monetization rates. So audio did not have a big impact.

swyx: I'm pretty big on audio.

William: Yeah, I like it too. But a lot of the stuff which I do, you can have a theory, and the results resist it. Exactly. So I think if you want to make audio work, it has to be a unique, compelling, exciting experience that they can't have anywhere else.

swyx [00:40:37]: It could be your models, which just weren't good enough.

William [00:40:39]: No, no, no, they were great. Oh yeah, they were very good. It was kind of like, you know, if you listen to an Audible or Kindle voice or something, you just hear this voice, and you don't go, wow, this is special, right? It's a convenience thing. But the idea is that if Chai is the only platform... like, let's say you have a Mr. Beast, and YouTube is the only platform he can use to make his content work, then you can watch a Mr. Beast video, and it's the most engaging, fun video that you want to watch, so you'll go to YouTube. And so for audio, you can't just put the audio on there and have people go, oh yeah, it's 2% better, or 5% of users think it's 20% better, right?
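William's back-of-envelope math here can be checked directly. The figures below simply restate his numbers: roughly 15% of users ever engage the feature, and they use it for roughly 15% of their time.

```python
adopters = 0.15      # fraction of users who ever click the audio button
usage_share = 0.15   # fraction of their session time spent on audio

# Share of the total experience the feature actually touches.
experience_touched = adopters * usage_share
print(f"{experience_touched:.1%}")  # prints "2.2%", i.e. roughly 2% of the experience
```

Which is why even a large quality improvement inside that 2% slice barely moves aggregate retention or engagement metrics.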
It has to be something that the majority of people, for the majority of the experience, go, wow, this is a big deal. Those are the features you need to be shipping. If it's not going to appeal to the majority of people for the majority of the experience, and it's not a big deal, it's not going to move you.

swyx: Cool. So you killed it. I don't see it anymore.

William: Yep. So, I love this. It's kind of cheesy, I guess, but the longer I've been working at Chai, and I think the team agrees with this, all the platitudes, at least I thought they were platitudes, that you get from the likes of Steve Jobs, like build something insanely great, be maniacally focused, the most important thing is saying no, deciding what not to work on: all of these lessons are just painfully true. They're painfully true. So now everything I say, I'm either quoting Steve Jobs or Zuckerberg. I'm like, guys, move fast and break things.

swyx [00:42:10]: You've jumped the Apollo to cool it now.

William [00:42:12]: Yeah, it's just that everything they said is so, so true.

swyx: The turtleneck.

William: Yeah, yeah, yeah. Everything is so true.

swyx [00:42:18]: This last question on my side, and then I want to pass it to Alessio, is on multimodality in general. This actually comes from Justine Moore from a16z, who's a friend of ours. A lot of people are trying to do voice, image, and video for AI companions. You just said voice didn't work. What would make you revisit?

William [00:42:36]: So Steve Jobs, listen, he was very, very clear on this. There's a habit among engineers who, once they've got some cool technology, want to find a way to package up the cool technology and sell it to consumers. That does not work. So you're free to try to build a startup where you've got your cool tech and you want to find someone to sell it to. That's not what we do at Chai. At Chai, we start with the consumer.
What does the consumer want? What is their problem? And how do we solve it? So right now, the number one problem for the users is not the audio, and it's not the image generation either. The number one problem for users in AI is this: all the AI is being generated by middle-aged men in Silicon Valley, right? That's all the content. You're interacting with this AI, you're speaking to it for 90 minutes on average, and it's being trained by middle-aged men sitting there asking, what should the AI say in this situation? What's funny? What's cool? What's boring? What's entertaining? That's not the way it should be. The way it should be is that the users should be creating the AI. And so the way I speak about it is this: at Chai, we have this AI engine, and atop it sits a thin layer of UGC. The thin layer of UGC is absolutely essential, but it's just prompts. It's just an image, just a name. We've done 1% of what we could do. So we need to keep thickening up that layer of UGC. It must be the case that the users can train the AI, and if reinforcement learning is powerful and important, they have to be able to do that. And so, I say to the team: just as Mr. Beast is able to spend $100 million a year, or whatever it is, on his production company, with a team building the content, which he then shares on the YouTube platform, until there's a team earning $100 million a year, or spending $100 million, on the content they're producing for the Chai platform, we're not finished, right? So that's the problem. That's what we're excited to build.
And getting too caught up in the tech, I think, is a fool's errand. It does not work.

Alessio [00:44:52]: As an aside, I saw the Beast Games thing on Amazon Prime. It's not doing well, and I'm curious.

swyx [00:44:56]: I mean, the audience rating is high. The Rotten Tomatoes score sucks, but the audience rating is high.

Alessio [00:45:02]: But it's not in the top 10. I saw it dropped off of the... Oh, okay. Yeah, that one I don't know. I'm curious because it's similar content on a different platform. And then, going back to some of what you were saying: people come to Chai expecting some type of content.

William [00:45:13]: Yeah, I think something that's interesting to discuss is moats. What is the moat? If you look at a platform like YouTube, the moat, I think, is really in the ecosystem. And the ecosystem is comprised of the content creators, the users or consumers, and the algorithms. This creates a sort of flywheel, where the algorithms are trained on the users and the users' data, and the recommender systems can then feed information to the content creators. So Mr. Beast knows which thumbnail does the best. He knows the first 10 seconds of the video have to be a particular way. And so his content is super optimized for the YouTube platform. That's why it doesn't do well on Amazon. How many videos has he created on the YouTube platform? Thousands, tens of thousands, I guess. If he wants to do well on Amazon, he needs to get those iterations in on Amazon.
So at Chai, I think it's all about how we can get the most compelling, rich, user-generated content, stick that on top of the AI engine and the recommender systems, such that we get this beautiful data flywheel: more users, better recommendations, more creators, more content, more users.

Alessio [00:46:34]: You mentioned the algorithm. You have this idea of the Chaiverse on Chai, and you have your own kind of LMSYS-like ELO system. What do your users' models optimize for? And maybe talk about how you built it and how people submit models.

William [00:46:49]: So Chaiverse is what I would describe as a developer platform. More often when we're speaking about Chai, we're thinking about the Chai app, and the Chai app is really this product for consumers. Consumers can come on the Chai app, interact with our AI, and interact with other UGC. And it's really just these kinds of bots, a thin layer of UGC. But our mission is not to have just a very thin layer of UGC; our mission is to have as much UGC as possible. I don't want only people at Chai training the AI. I want everyone, not just middle-aged men, building the AI, as many people building the AI as possible. Okay, so what we built was Chaiverse, and Chaiverse is kind of a prototype, is the way to think about it. It started with this observation: how many models get submitted to Hugging Face a day? It's hundreds, right? There are hundreds of LLMs submitted each day. Now consider what it takes to build an LLM. It takes a lot of work, actually. Someone devoted several hours of compute and several hours of their time, prepared a dataset, launched it, ran it, evaluated it, and submitted it, right?
So there's a lot of work going into that. So what we did was say, well, why can't we host their models for them and serve them to users? And what would that look like? The first issue is: how do you know if a model is good or not? We don't want to serve users the crappy models, right? So we took the LMSYS style, which I love. I think it's really cool, really simple, a very intuitive thing: you simply present the users with two completions. You say, look, this is from model A, this is from model B: which is better? And so if someone submits a model to Chaiverse, what we do is spin up a GPU, download the model, host it on that GPU, and start routing traffic to it. We think it takes about 5,000 completions to get an accurate signal; that's roughly what LMSYS does. And from that, we're able to get an accurate ranking of which models people are finding entertaining and which models are not. If you look at the bottom 80%, they'll suck; you can just disregard them. They totally suck. Then when you get to the top 20%, you know you've got a decent model, but you can break it down with more nuance. There might be one that's really descriptive, one that's got a lot of personality, one that's really illogical. Then the question is: what do you do with these top models? From that, you can do more sophisticated things. You can try a routing approach, where for a given user request you try to predict which of these models the user will enjoy the most. That turns out to be pretty expensive and not a huge source of edge or improvement.
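The LMSYS-style ranking described above, where users pick the better of two completions and each model's rating is updated from the result, can be sketched with a standard Elo update. The K-factor and starting rating below are illustrative assumptions, not Chai's actual parameters:

```python
def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 16.0):
    """Update two models' Elo ratings after one pairwise user preference."""
    # Expected score of A given the current rating gap (standard Elo formula).
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Two freshly submitted models start at the same baseline rating;
# aggregating roughly 5,000 such comparisons gives a stable ranking signal.
a, b = 1000.0, 1000.0
a, b = elo_update(a, b, a_won=True)
print(a, b)  # prints: 1008.0 992.0
```

Note that the update is zero-sum: whatever the winner gains, the loser sheds, so the population average rating stays fixed while the ordering sharpens as votes accumulate.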
Something that we love to do at Chai is blending. The simplest way to think about it is this: you're going to pretty quickly see you've got one model that's really smart and one model that's really funny. How do you give the user an experience that is both smart and funny? Well, for 50% of the requests you serve them the smart model, and for 50% of the requests you serve them the funny model.

swyx: Just a random 50%?

William: Just a random 50%, yeah.

swyx: And then... That's blending?

William: That's blending. You can do more sophisticated things on top of that, as in all things in life, but that's the 80/20 solution: if you just do that, you get a pretty powerful effect out of the gate.

swyx: A random number generator.

William: I think of it as the robustness of randomness. Random is a very powerful optimization technique, and it's a very robust thing, so you can explore a lot of the space very efficiently. There's one thing that's really, really important to share, and this is the most exciting thing for me: after you do the ranking, you get an ELO score, and you can track a user from their first join date, the first date they submit a model to Chaiverse. They almost always get a terrible ELO, right? Say the first submission gets an ELO of 1,100 or 1,000 or something, and you can see them iterate and iterate, and it will be like no improvement, no improvement, no improvement, and then boom.

swyx: Do you give them any data, or do they have to come up with this themselves?

William: We do, we do. We try to strike a balance between giving them data that's very useful and being compliant with GDPR, which means you have to work very hard to preserve the privacy of the users of your app. So we try to give them as much signal as possible, to be helpful. The minimum is we're just going to give you a score, right? That's the minimum.
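The blending described a bit earlier, serving a uniformly random model from the blend on every request, really is this simple. The "smart" and "funny" models below are hypothetical stand-ins, not Chai's actual models:

```python
import random

def blended_reply(prompt: str, models: list) -> str:
    """Serve this request from a uniformly random model in the blend."""
    model = random.choice(models)
    return model(prompt)

# Hypothetical stand-ins: one "smart" model and one "funny" model.
smart = lambda p: f"[smart] {p}"
funny = lambda p: f"[funny] {p}"

# Over many requests, each model handles about 50% of the traffic,
# so a single user's session mixes both personalities.
print(blended_reply("hello", [smart, funny]))
```

A weighted draw (`random.choices` with a `weights` argument) generalizes this to unequal traffic splits if one model should dominate the blend.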
But even with a score alone, people can optimize pretty well, because they're able to come up with theories, submit, does it work? No. A new theory, does it work? No. And then boom, as soon as they figure something out, they keep it, and then they iterate, and then boom.

Alessio [00:51:46]: Last year, you had this post on your blog, crowdsourcing the leap to the 10-trillion-parameter AGI, and you called it a mixture of experts recommender. Any updated thoughts, 12 months later?

William [00:51:58]: I think the timeline for AGI has certainly been pushed out, right? Now, this is where I'm a controversial person, I don't know.

swyx: You don't believe in scaling laws; you think AGI is further away.

William: I think it's an S-curve. I think everything's an S-curve. And I think that the models have proven to just be far worse at reasoning than people sort of thought. Whenever I hear people talk about LLMs as reasoning engines, I sort of cringe a bit. I don't think that's what they are. I think of them more as simulators, right? They get trained to predict the next most likely token. It's like a physics simulation engine: you get these games where you can construct a bridge and drop a car on it, and it predicts what should happen. And that's really what LLMs are doing. It's not so much that they're reasoning; it's more that they're just doing the most likely thing. So fundamentally, the ability for people to add in intelligence, I think, is very limited. What most people would consider intelligence, I think, is not a crowdsourcing problem, right? Wikipedia crowdsources knowledge; it doesn't crowdsource intelligence. It's a subtle distinction. AI is fantastic at knowledge; I think it's weak at intelligence.
And it's easy to conflate the two, because if you ask it a question, like, who was the seventh president of the United States, and it gives you the correct answer (I don't know the answer to that myself), you can conflate that with intelligence. But really, that's a question of knowledge. And knowledge is really this thing of saying: how can I store all of this information, and how can I retrieve something that's relevant? Okay, they're fantastic at that. They're fantastic at storing knowledge and retrieving the relevant knowledge. They're superior to humans in that regard. And so I think we need to come up with a new word. How does one describe it? The AI should contain more knowledge than any individual human, and it should be more accessible than any individual human. That's a very powerful thing. That's super powerful.

swyx [00:54:07]: But what words do we use to describe that? We had a previous guest from Exa AI, which does search, and he tried to coin super knowledge as the opposite of superintelligence.

William [00:54:20]: Exactly. I think super knowledge is a more accurate word for it.

swyx [00:54:24]: You can store more things than any human can.

William [00:54:26]: And you can retrieve it better than any human can as well. And I think it's those two things combined that are special. I think that thing will exist; that thing can be built. And I think you can start with something that's entertaining and fun. I often think it's like, look, it's going to be a 20-year journey, and we're in year four. It's like the web, and this is 1998 or something. You've got a long, long way to go before the Amazon.coms are these huge, multi-trillion-dollar businesses that every single person uses every day. And so AI today is very simplistic.
And it's fundamentally the way we're using it, the flywheels, and this ability for everyone to contribute to it that will really magnify the value it brings. Right now, I think it's a bit sad. Right now you have big labs, and I'm going to pick on OpenAI: they go to human labelers and say, we're going to pay you to label this subset of questions, so we get a really high-quality dataset, and then we've got our own computers that are really powerful, and that's kind of the thing. For me, it's so much like Encyclopedia Britannica. It's insane. For all the people that were interested in blockchain: well, this is what needs to be decentralized. Because if you distribute it, people can generate way more data in a distributed fashion, way more, right?

swyx: You need the incentive.

William: Yeah, of course. But that's kind of the exciting thing about Wikipedia: it's this understanding of the incentives. You don't need money to incentivize people. You don't need dogecoins. No. Sometimes people get the satisfaction fro
In this episode, Andrew Mayne, Justin Robert Young, and Brian Brushwood discuss the recent flurry of AI announcements from OpenAI’s Shipmas event and Google’s AI developments. They explore the implications of advanced AI models like GPT-3 and OpenAI’s O3, touching on their potential to revolutionize coding, problem-solving, and even the future of robotics and labor. […]
The AI Breakdown: Daily Artificial Intelligence News and Discussions
OpenAI's upcoming O3 model has sparked widespread speculation about its capabilities and potential impact. From hints at advanced reasoning to its implications for AGI development, the excitement is palpable. Meanwhile, rivals like DeepSeek challenge the playing field with cost-effective, high-performance alternatives. This episode unpacks the facts, dispels the hype, and explores the broader implications for AI innovation and policy. Brought to you by: KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions. Vanta - Simplify compliance - https://vanta.com/nlw The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Is TikTok banned or not? The saga continues as Trump swoops in to "unban" the app, leaving creators and users whiplashed. But what's really going on behind the scenes in this high-stakes game of political chess? In this episode, Chris Saad and Yaniv Bernstein are joined by special guests Shira Lazar and Elli Hanson to unpack the latest tech and political drama. From TikTok's uncertain fate to Trump's meme coin launch, the team explores how these developments impact founders, investors, and the future of technology. In this episode, you will: Decode the TikTok ban flip-flop and its implications for US-China relations Analyze Trump's meme coin launch and what it reveals about his approach to tech and finance Explore the latest AI breakthroughs, including OpenAI's O3 mini model and new Tasks feature Discuss the potential of AI agents like Bordy.ai and their impact on industries like venture capital Gain insights on how founders can navigate the rapidly evolving AI landscape Examine the ethical considerations of AI development and deployment Consider the future of human-AI interaction and its societal implications The Pact Honour The Startup Podcast Pact! If you have listened to TSP and gotten value from it, please: Follow, rate, and review us in your listening app Subscribe to the TSP Mailing List at https://thestartuppodcast.beehiiv.com/subscribe Secure your official TSP merchandise at https://shop.tsp.show/ Follow us on YouTube at https://www.youtube.com/@startup-podcast Give us a public shout-out on LinkedIn or anywhere you have a social media following. Key links The Startup Podcast is sponsored by Vanta. Vanta helps businesses get and stay compliant by automating up to 90% of the work for the most in demand compliance frameworks. With over 200 integrations, you can easily monitor and secure the tools your business relies on. For a limited-time offer of US$1,000 off, go to www.vanta.com/tsp. 
Get your question in for our next Q&A episode: https://forms.gle/NZzgNWVLiFmwvFA2A The Startup Podcast website: https://tsp.show Learn more about Chris and Yaniv Work 1:1 with Chris: http://chrissaad.com/advisory/ Follow Chris on Linkedin: https://www.linkedin.com/in/chrissaad/ Follow Yaniv on Linkedin: https://www.linkedin.com/in/ybernstein/ Credits Editor: Justin McArthur Content Strategist: Carolina Franco
Co-hosts Mark Thompson and Steve Little explore Google's powerful end-of-year releases that challenge other AI platforms. They examine Google's Deep Research tool and its ability to quickly create comprehensive research reports by combining information from multiple sources. Next, they discuss Facebook's controversial introduction of AI assistants to Facebook groups and what this might mean for genealogy communities. The hosts review Google's NotebookLM's significant updates, including enhanced audio capabilities and collaborative features that could transform how genealogists work together. This week's Tip of the Week demonstrates how ChatGPT Projects can organize and enhance genealogical research, with Steve using his approach to a 2025 genealogy do-over as an example. In RapidFire, they cover Google's new fact benchmark, Gemini 2.0's release, AI-enhanced search results, and OpenAI's O3 announcement.

Timestamps:
In the News:
06:44 Google Deep Research: The New Challenger to Perplexity
18:03 Facebook Groups Get AI Assistants
24:57 NotebookLM: Major Updates for Collaborative Projects
Tip of the Week:
32:15 ChatGPT Projects: Organizing Your Genealogy Do-Over
RapidFire:
41:52 Google's New Fact Benchmark
44:00 Gemini 2.0 Release
48:34 Google AI-Enhanced Search Results
54:48 OpenAI's O3 Model Announcement

Resource Links:
AI Seminar at the Victoria Genealogical Society: https://victoriags.org/content.aspx?page_id=4091&club_id=554933&item_id=2488526
Legacy Family Tree Webinars: "Genealogy Meets AI: Panel Discussion": https://familytreewebinars.com/webinar/genealogy-meets-ai-panel-discussion/
Introducing Gemini 2.0: our new AI model for the agentic era: https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/
Certified Arborist Facebook Group: https://www.facebook.com/groups/12652295162
Exploring Facebook's New AI Tool for Groups: https://aigenealogyinsights.com/2024/10/22/exploring-facebooks-new-ai-tool-for-groups/
NotebookLM gets a new look, audio interactivity and a premium version: https://blog.google/technology/google-labs/notebooklm-new-features-december-2024/
Using Projects in ChatGPT: https://help.openai.com/en/articles/10169521-using-projects-in-chatgpt
A new benchmark for evaluating the accuracy of large language models: https://deepmind.google/discover/blog/facts-grounding-a-new-benchmark-for-evaluating-the-factuality-of-large-language-models/
AI Overviews in Search are coming to more places around the world: https://blog.google/products/search/ai-overviews-search-october-2024/
OpenAI announces new o3 models: https://techcrunch.com/2024/12/20/openai-announces-new-o3-model/

Tags: AI, Artificial Intelligence, Genealogy, Family History, Technology, OpenAI, Google, Perplexity, ChatGPT, Meta, Content Licensing, AI Conferences, AI Outage, Ethical AI, Disclosure, Genealogy Research, OpenAI Licensing, Google AI Issues, Perplexity Pages, ChatGPT Updates, Meta AI Developments, Grok.AI, AI in Genealogy, AI Tools, Technology in Genealogy, AI Transparency, AI and Privacy
Are we on the brink of Artificial General Intelligence? Has OpenAI quietly announced a breakthrough that could change everything? And how are tech giants like Google responding? In this episode, Chris Saad and Yaniv Bernstein dive into the latest developments in AI, robotics, and the shifting political landscape as a new U.S. administration takes office. They explore the implications for startups, innovation, and the future of technology. In this episode, you will: Analyze OpenAI's latest O3 model and its potential AGI-level capabilities Discover how Google is catching up in the AI race with Gemini 2.0 Explore why 2025 could be a breakthrough year for robotics Examine how tech leaders like Mark Zuckerberg are aligning with the new Trump administration Understand the changes happening at Meta, including the removal of fact-checking and DEI programs Consider the lack of nuance in political discourse around tech and innovation Evaluate the potential impacts on regulation and growth in the tech sector
Credits: Editor: Justin McArthur; Content Strategist: Carolina Franco; Intro Voice: Jeremiah Owyang
Due to overwhelming demand (>15x applications:slots), we are closing CFPs for AI Engineer Summit NYC today. Last call! Thanks, we'll be reaching out to all shortly!

The world's top AI blogger and friend of every pod, Simon Willison, dropped a monster 2024 recap: Things we learned about LLMs in 2024. Brian of the excellent TechMeme Ride Home pinged us for a connection and a special crossover episode, our first in 2025. The target audience for this podcast is a tech-literate, but non-technical one. You can see Simon's notes for AI Engineers in his World's Fair Keynote.

Timestamps
* 00:00 Introduction and Guest Welcome
* 01:06 State of AI in 2025
* 01:43 Advancements in AI Models
* 03:59 Cost Efficiency in AI
* 06:16 Challenges and Competition in AI
* 17:15 AI Agents and Their Limitations
* 26:12 Multimodal AI and Future Prospects
* 35:29 Exploring Video Avatar Companies
* 36:24 AI Influencers and Their Future
* 37:12 Simplifying Content Creation with AI
* 38:30 The Importance of Credibility in AI
* 41:36 The Future of LLM User Interfaces
* 48:58 Local LLMs: A Growing Interest
* 01:07:22 AI Wearables: The Next Big Thing
* 01:10:16 Wrapping Up and Final Thoughts

Transcript

[00:00:00] Introduction and Guest Welcome

[00:00:00] Brian: Welcome to the first bonus episode of the TechMeme Ride Home for the year 2025. I'm your host as always, Brian McCullough. Listeners to the pod over the last year know that I have made a habit of quoting from Simon Willison when new stuff happens in AI, from his blog. Simon has become a go-to for many folks in terms of, you know, analyzing things and criticizing things in the AI space.

[00:00:33] Brian: I've wanted to talk to you for a long time, Simon. So thank you for coming on the show.

Simon: No, it's a privilege to be here.
And the person that made this connection happen is our friend Swyx, who has been on the show before, even going back to the Twitter Spaces days, but is also an AI guru in their own right. Swyx, thanks for coming on the show also.[00:00:54] swyx (2): Thanks. I'm happy to be on and have been a regular listener, so just happy to [00:01:00] contribute as well.[00:01:00] Brian: And a good friend of the pod, as they say. Alright, let's go right into it.[00:01:06] State of AI in 2025[00:01:06] Brian: Simon, I'm going to do the most unfair, broad question first, so let's get it out of the way. The year 2025. Broadly, what is the state of AI as we begin this year?[00:01:20] Brian: Whatever you want to say, I don't want to lead the witness.[00:01:22] Simon: Wow. So many things, right? I mean, the big thing is everything's got really good and fast and cheap. Like, that was the trend throughout all of 2024. The good models got so much cheaper, they got so much faster, they got multimodal, right? The image stuff isn't even a surprise anymore.[00:01:39] Simon: They're doing video, all of that kind of stuff. So that's all really exciting.[00:01:43] Advancements in AI Models[00:01:43] Simon: At the same time, they didn't get massively better than GPT-4, which was a bit of a surprise. So that's sort of one of the open questions: are we going to see another huge jump? But I kind of feel like that's a bit of a distraction, because GPT-4, but way cheaper, with much larger context lengths, and it [00:02:00] can do multimodal,[00:02:01] Simon: is better, right? That's a better model, even if it's not smarter.[00:02:05] Brian: What people were expecting, or hoping, maybe not expecting is not the right word, but hoping, is that we would see another step change, right? From like GPT-2 to 3 to 4, we were expecting or hoping that maybe we were going to see the next evolution in that sort of, yeah.[00:02:21] Brian: We[00:02:21] Simon: did see that, but not in the way we expected.
We thought the model was just going to get smarter, and instead we got massive drops in price. We got all of these new capabilities. You can talk to the things now, right? They can do simulated audio input, all of that kind of stuff. And so it's kind of, it's interesting to me that the models improved in all of these ways we weren't necessarily expecting.[00:02:43] Simon: I didn't know I'd be able to have it do an impersonation of Santa Claus, like, you know, talk to it through my phone and show it what I was seeing, by the end of 2024. But yeah, we didn't get that GPT-5 step. And that's one of the big open questions: is that actually just around the corner, and we'll have a bunch of GPT-5 class models drop in the [00:03:00] next few months?[00:03:00] Simon: Or is there a limit?[00:03:03] Brian: If you were a betting man and wanted to put money on it, do you expect to see a phase change, step change in 2025?[00:03:11] Simon: I don't particularly expect that, like, the models suddenly getting smarter. I think all of the trends we're seeing right now are going to keep on going, especially the inference time compute, right?[00:03:21] Simon: The trick that o1 and o3 are doing, which means that you can solve harder problems, but they cost more and it churns away for longer. I think that's going to happen because that's already proven to work. I don't know. Maybe there will be a step change to a GPT-5 level, but honestly, I'd be completely happy if we got what we've got right now,[00:03:41] Simon: but cheaper and faster, with more capabilities and longer contexts and so forth. That would be thrilling to me.[00:03:46] Brian: Digging into what you've just said: one of the things that, by the way, I hope to link in the show notes is Simon's year-end post about what things we learned about LLMs in 2024.
Look for that in the show notes.[00:03:59] Cost Efficiency in AI[00:03:59] Brian: One of the things that you [00:04:00] did say, that you alluded to even right there, was that in the last year, you felt like the GPT-4 barrier was broken, i.e. other models, even open source ones, are now regularly matching sort of the state of the art.[00:04:13] Simon: Well, it's interesting, right? So the GPT-4 barrier was, a year ago, the best available model was OpenAI's GPT-4 and nobody else had even come close to it.[00:04:22] Simon: And they'd been in the lead for like nine months, right? That thing came out in what, February, March of 2023. And for the rest of 2023, nobody else came close. And so at the start of last year, like a year ago, the big question was, why has nobody beaten them yet? Like, what do they know that the rest of the industry doesn't know?[00:04:40] Simon: And today, I've counted 18 organizations other than OpenAI who've put out a model which clearly beats that GPT-4 from a year ago. Like, maybe they're not better than GPT-4o, but that barrier got completely smashed. And yeah, a few of those I've run on my laptop, which is wild to me.[00:04:59] Simon: Like, [00:05:00] it was very, very wild. It felt very clear to me a year ago that if you want GPT-4, you need a rack of $40,000 GPUs just to run the thing. And that turned out not to be true. Like, this is that big trend from last year of the models getting more efficient, cheaper to run, just as capable with smaller weights and so forth.[00:05:20] Simon: And I ran another GPT-4 class model on my laptop this morning, right? Microsoft's Phi-4 just came out. And that, if you look at the benchmarks, it's definitely up there with GPT-4o. It's probably not as good when you actually get into the vibes of the thing, but it's a 14-gigabyte download and I can run it on a MacBook Pro.[00:05:38] Simon: Like, who saw that coming?
The most exciting, like the close of the year, on Christmas day, just a few weeks ago, was when DeepSeek dropped their DeepSeek v3 model on Hugging Face without even a readme file. It was just like a giant binary blob that I can't run on my laptop. It's too big. But in all of the benchmarks, it's now by far the best available [00:06:00] open weights model.[00:06:01] Simon: Like, it's beating the Meta Llamas and so forth. And that was trained for five and a half million dollars, which is a tenth of the price that people thought it cost to train these things. So everything's trending smaller and faster and more efficient.[00:06:15] Brian: Well, okay.[00:06:16] Challenges and Competition in AI[00:06:16] Brian: I kind of was going to get to that later, but let's combine this with what I was going to ask you next, which is, you know, you're talking also in the piece about the LLM prices crashing, which I've even seen in projects that I'm working on. But explain that to a general audience, because we hear all the time that LLMs are eye-wateringly expensive to run, but what we're suggesting, and we'll come back to the cheap Chinese LLM, but first of all, for the end user, what you're suggesting is that we're starting to see the cost come down sort of in the traditional technology way of costs coming down over time.[00:06:49] Simon: Yes, but very aggressively.[00:06:51] Simon: I mean, my favorite example here is if you look at GPT-3, so OpenAI's GPT-3, which was the best available model in [00:07:00] 2022 and through most of 2023. The models that we have today, the OpenAI models, are a hundred times cheaper. So there was a 100x drop in price for OpenAI from their best available model, like two and a half years ago, to today.[00:07:13] Simon: And[00:07:14] Brian: just to be clear, not to train the model, but for the use of tokens and things.
Exactly,[00:07:20] Simon: for running prompts through them. And then when you look at the really top tier model providers right now, I think, are OpenAI, Anthropic, Google, and Meta. And there are a bunch of others that I could list there as well.[00:07:32] Simon: Mistral are very good. The DeepSeek and Qwen models have gotten great. There's a whole bunch of providers serving really good models. But even if you just look at the sort of big brand name providers, they all offer models now that are a fraction of the price of the models we were using last year.[00:07:49] Simon: I think I've got some numbers that I threw into my blog entry here. Yeah. Like Gemini 1.5 Flash, that's Google's fast high-quality model, is [00:08:00] how much is that? It's $0.075 per million tokens. Like, these numbers are getting so small we just do cents per million now.[00:08:09] swyx (2): Cents per million.[00:08:10] Simon: Cents per million makes a lot more sense.[00:08:12] Simon: Yeah, they have one model, 1.5 Flash 8B, the absolute cheapest of the Google models, which is 27 times cheaper than GPT-3.5 Turbo was a year ago. That's it. And GPT-3.5 Turbo, that was the cheap model, right? Now we've got something 27 times cheaper, and this Google one can do image recognition, it can do million token context, all of those tricks.[00:08:36] Simon: But it really is startling how inexpensive some of this stuff has gotten.[00:08:41] Brian: Now, are we assuming that this happening is directly the result of competition?
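To make the cents-per-million framing concrete, here's a quick back-of-the-envelope sketch using the $0.075-per-million-tokens figure quoted above (prices as stated in the conversation, not checked against current price sheets):

```python
# Back-of-the-envelope token pricing, using the figure quoted in the
# conversation for Gemini 1.5 Flash ($0.075 per million tokens).
PRICE_PER_MILLION_TOKENS = 0.075  # USD, as quoted; check current price sheets

def prompt_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Dollar cost of pushing `tokens` tokens through the model."""
    return tokens / 1_000_000 * price_per_million

# Even a hefty 100,000-token prompt costs well under a cent:
print(f"${prompt_cost(100_000):.4f}")  # → $0.0075
```

At these rates it genuinely is easier to think in cents per million than in dollars per thousand.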
Because again, you know, OpenAI, and probably they're doing this for their own almost political reasons, strategic reasons, keeps saying, we're losing money on everything, even the $200 subscription.[00:08:56] Brian: So the prices probably wouldn't be [00:09:00] coming down if there wasn't intense competition in this space.[00:09:04] Simon: The competition is absolutely part of it, but I have it on good authority from sources I trust that Google Gemini is not operating at a loss. Like, the cost of the electricity to run a prompt is less than they charge you.[00:09:16] Simon: And the same thing for Amazon Nova. Like, somebody found an Amazon executive and got them to say, yeah, we're not losing money on this. I don't know about Anthropic and OpenAI, but clearly that demonstrates it is possible to run these things at these ludicrously low prices and still not be running at a loss, if you discount the army of PhDs and the training costs and all of that kind of stuff.[00:09:36] Brian: One more for me before I let Swyx jump in here. To come back to DeepSeek and this idea that you could train, you know, a cutting-edge model for 6 million dollars: I was saying on the show, like six months ago, that if we are getting to the point where each new model would cost a billion, ten billion, a hundred billion dollars to train, then[00:09:54] Brian: at some point, almost only nation states would be able to train the new models. Do you [00:10:00] expect what DeepSeek and maybe others are proving to sort of blow that up? Or is there some sort of parallel track here that maybe, I'm not technical enough, I don't have the expertise to understand the difference.[00:10:11] Brian: Are the models going to go, you know, up to a hundred billion dollars, or can we get them down, sort of like DeepSeek has proven?[00:10:18] Simon: So I'm the wrong person to answer that, because I don't work in a lab training these models.
So I can give you my completely uninformed opinion, which is, I felt like the DeepSeek thing,[00:10:27] Simon: that was a bombshell. That was an absolute bombshell when they came out and said, hey, look, we've trained one of the best available models and it cost us five and a half million dollars to do it. And one of the reasons it's so efficient is that we put all of these export controls in to stop Chinese companies from buying giant GPUs.[00:10:44] Simon: So they were forced to go as efficient as possible. And yet the fact that they've demonstrated that that's possible to do, I think it does completely tear apart this mental model we had before, that the training runs just keep on getting more and more expensive and the number of [00:11:00] organizations that can afford to run these training runs keeps on shrinking.[00:11:03] Simon: That's been blown out of the water. So yeah, again, this was our Christmas gift. This was the thing they dropped on Christmas day. Yeah, it makes me really optimistic. It feels like there was so much low-hanging fruit in terms of the efficiency of both inference and training, and we spent a whole bunch of last year exploring that and getting results from it.[00:11:22] Simon: I think there's probably a lot left. I would not be surprised to see even better models trained, spending even less money, over the next six months.[00:11:31] swyx (2): Yeah. So I think there's an unspoken angle here on what exactly the Chinese labs are trying to do, because DeepSeek made a lot of noise[00:11:41] swyx (2): around the fact that they trained their model for six million dollars, and nobody quite believes them. Like, it's very, very rare for a lab to trumpet the fact that they're doing it for so cheap. They're not trying to get anyone to buy them. So why [00:12:00] are they doing this?
They make it very, very obvious.[00:12:05] swyx (2): DeepSeek is about 150 employees. It's an order of magnitude smaller than at least Anthropic, and maybe more so for OpenAI. And so what's the end game here? Are they just trying to show that the Chinese are better than us?[00:12:21] Simon: So DeepSeek, it's the arm of a hedge fund, it's a quant fund, right?[00:12:25] Simon: It's an algorithmic quant trading thing. So I would love to get more insight into how that organization works. My assumption from what I've seen is it looks like they're basically just flexing. They're like, hey, look at how utterly brilliant we are with this amazing thing that we've done. And it's working, right?[00:12:43] Simon: And so is that it? Is this just their kind of, this is why our company is so amazing, look at this thing that we've done? Or? I don't know. I'd love to get some insight from within that industry as to how that's all playing out.[00:12:57] swyx (2): The prevailing theory among the LocalLLaMA [00:13:00] crew and the Twitter crew that I indexed for my newsletter is that there is some amount of copying going on.[00:13:06] swyx (2): It's like Sam Altman, you know, tweeting about how they're being copied. And then also there are other OpenAI employees that have said stuff that is similar, that DeepSeek's rate of progress is how U.S. intelligence estimates the number of foreign spies embedded in top labs.[00:13:22] swyx (2): Because a lot of these ideas do spread around, but they surprisingly have a very high density of them in the DeepSeek v3 technical report. So it's interesting. We don't know how many tokens. I think that, you know, people have run analysis on how often DeepSeek thinks it is Claude or thinks it is OpenAI's GPT-4.[00:13:40] swyx (2): And we don't know. We don't know.
I think for me, like, yeah, we basically will never know as external commentators. I think what's interesting is where does this go? Is there a logical floor or bottom? By my estimate, for the same ELO, from the start of last year to the end of last year, cost went down by a thousand X for [00:14:00] GPT-4-level intelligence.[00:14:02] swyx (2): Do they go down a thousand X this year?[00:14:04] Simon: That's a fascinating question. Yeah.[00:14:06] swyx (2): Is there a Moore's law going on, or did we just get a one-off benefit last year for some weird reason?[00:14:14] Simon: My uninformed hunch is low-hanging fruit. I feel like up until a year ago, people hadn't been focusing on efficiency at all. You know, it was all about, what can we get these weird shaped things to do?[00:14:24] Simon: And now, once we've sort of hit that, okay, we know that we can get them to do what GPT-4 can do, when thousands of researchers around the world all focus on, okay, how do we make this more efficient? How do we strip out all of the weights that have stuff in them that doesn't really matter?[00:14:39] Simon: All of that kind of thing. So yeah, maybe that was it. Maybe 2024 was a freak year of all of the low-hanging fruit coming out at once, and we'll actually see a reduction in that rate of improvement in terms of efficiency. I wonder, I mean, I think we'll know for sure in about three months' time if that trend's going to continue or not.[00:14:58] swyx (2): I agree. You know, I [00:15:00] think the other thing that you mentioned, that DeepSeek v3 was the gift that was given from DeepSeek over Christmas, but I feel like the other thing that might be underrated was DeepSeek R1,[00:15:11] Speaker 4: which is[00:15:13] swyx (2): a reasoning model you can run on your laptop.
And I think that's something that a lot of people are looking ahead to this year.[00:15:18] swyx (2): Oh, did they[00:15:18] Simon: release the weights for that one?[00:15:20] swyx (2): Yeah.[00:15:21] Simon: Oh my goodness, I missed that. I've been playing with Qwen. So the other great, the other big Chinese AI lab is Alibaba's Qwen. Actually, yeah, sorry, R1 is available as an API. Yeah. Exactly. Qwen, that's really cool. So Alibaba's Qwen have released two reasoning models that I've run on my laptop.[00:15:38] Simon: The first one was QwQ, and then the second one was QvQ, because the second one's a vision model. So you can, like, give it vision puzzles as a prompt. These things, they are so much fun to run, because they think out loud. The OpenAI o1 sort of hides its thinking process. The Qwen ones don't.[00:15:59] Simon: They just [00:16:00] churn away. And so you'll give it a problem and it will output literally dozens of paragraphs of text about how it's thinking. My favorite thing that happened with QwQ is I asked it to draw me a pelican on a bicycle in SVG. That's like my standard stupid prompt. And for some reason it thought in Chinese.[00:16:18] Simon: It spat out a whole bunch of Chinese text onto my terminal on my laptop, and then at the end it gave me quite a good sort of artistic pelican on a bicycle. And I ran it all through Google Translate, and yeah, it was contemplating the nature of SVG files as a starting point. And the fact that my laptop can think in Chinese now is so delightful.[00:16:40] Simon: It's so much fun watching it do that.[00:16:43] swyx (2): Yeah, I think Andrej Karpathy was saying, you know, we know that we have achieved proper reasoning inside of these models when they stop thinking in English, and perhaps the best form of thought is in Chinese.
But yeah, for listeners who don't know Simon's blog: he always, whenever a new model comes out, I don't know how you do it, but [00:17:00] you're always the first to run Pelican Bench on these models.[00:17:02] swyx (2): I just did it for Phi-4.[00:17:05] Simon: Yeah.[00:17:07] swyx (2): So I really appreciate that. You should check it out. These are not theoretical. Simon's blog actually shows them.[00:17:12] Brian: Let me put on the investor hat for a second.[00:17:15] AI Agents and Their Limitations[00:17:15] Brian: Because from the investor side of things, a lot of the VCs that I know are really hot on agents, and this is the year of agents, but last year was supposed to be the year of agents as well. Lots of money flowing towards agentic startups.[00:17:32] Brian: But in your piece that, again, we're hopefully going to have linked in the show notes, you sort of suggest there's a fundamental flaw in AI agents as they exist right now. Let me quote you, and then I'd love to dive into this. You said: I remain skeptical as to their ability, based once again on the challenge of gullibility.[00:17:49] Brian: LLMs believe anything you tell them. Any systems that attempt to make meaningful decisions on your behalf will run into the same roadblock. How good is a travel agent, or a digital assistant, or even a research tool, if it [00:18:00] can't distinguish truth from fiction?
So, essentially, what you're suggesting is that the state of the art now that allows agents is still that sort of 90 percent problem, the edge problem, getting to the last 10 percent. Or is there a deeper flaw?[00:18:14] Brian: What are you saying there?[00:18:16] Simon: So this is the fundamental challenge here, and honestly my frustration with agents is mainly around definitions. Like, if you ask anyone who says they're working on agents to define agents, you will get a subtly different definition from each person, but everyone always assumes that their definition is the one true one that everyone else understands. So I feel like in a lot of these agent conversations, people are talking past each other, because one person's talking about the sort of travel agent idea of something that books things on your behalf,[00:18:41] Simon: somebody else is talking about LLMs with tools running in a loop with a cron job somewhere, and all of these different things. You ask academics and they'll laugh at you, because they've been debating what agents mean for over 30 years at this point. It's this long-running, almost sort of in-joke in that community.[00:18:57] Simon: But if we assume, for the purposes of this conversation, that an [00:19:00] agent is something which you can give a job and it goes off and does that thing for you, like booking travel or things like that, the fundamental challenge is the reliability thing, which comes from this gullibility problem.[00:19:12] Simon: And a lot of my interest in this originally came from when I was thinking about prompt injection as a form of attack against LLM systems, where you deliberately lay traps out there for the LLM to stumble across.[00:19:24] Brian: And, I should say, you have been banging this drum, and no one's gotten very far, at least on solving this, that I'm aware of, right?[00:19:31] Brian: Like, that's still an open problem.
For two years.[00:19:33] Simon: Yeah, right. We've been talking about this problem, and, like, a great illustration of this was Claude. So Anthropic released Claude Computer Use a few months ago. Fantastic demo. You could fire up a Docker container and you could literally tell it to do something and watch it open a web browser and navigate to a webpage and click around and so forth.[00:19:51] Simon: Really, really interesting and fun to play with. And then one of the first demos somebody tried was, what if you give it a web page that says download and run this [00:20:00] executable? And it did, and the executable was malware that added it to a botnet. So the very first, most obvious dumb trick that you could play on this thing just worked, right?[00:20:10] Simon: So that's obviously a really big problem. If I'm going to send something out to book travel on my behalf, I mean, it's hard enough for me to figure out which airlines are trying to scam me and which ones aren't. Do I really trust a language model that believes the literal truth of anything that's presented to it to go out and do those things?[00:20:29] swyx (2): Yeah, I definitely think it's interesting to see Anthropic doing this, because they used to be the safety arm of OpenAI that split out and said, you know, we're worried about letting this thing out in the wild, and here they are enabling computer use for agents. It feels like things have merged.[00:20:49] swyx (2): You know, I'm also fairly skeptical about, you know, this always being the year of Linux on the desktop. And this is the equivalent: this being the year of agents is something that people [00:21:00] are not predicting so much as wishfully thinking and hoping and praying for, for their companies and agents to work.[00:21:05] swyx (2): But I feel like things are coming along a little bit. To me, it's kind of like self-driving. I remember in 2014 saying that self-driving was just around the corner.
And I mean, it kind of is, you know, like in the Bay Area. You[00:21:17] Simon: get in a Waymo and you're like, oh, this works. Yeah, but it's a slow[00:21:21] swyx (2): cook.[00:21:21] swyx (2): It's a slow cook over the next 10 years. We're going to hammer out these things, and the cynical people can just point to all the flaws, but, like, there are measurable, concrete progress steps that are being made by these builders.[00:21:33] Simon: There is one form of agent that I believe in. I mostly believe in the research assistant form of agents.[00:21:39] Simon: The thing where you've got a difficult problem. I'm on the beta for Google Gemini 1.5 Pro with Deep Research, I think it's called. These names, right? But I've been using that. It's good, right? You can give it a difficult problem and it tells you, okay, I'm going to look at 56 different websites, [00:22:00] and it goes away and it dumps everything into its context and it comes up with a report for you.[00:22:04] Simon: And it won't work against adversarial websites, right? If there are websites with deliberate lies in them, it might well get caught out. Most things don't have that as a problem. And so I've had some answers from that which were genuinely really valuable to me. And that feels to me like, I can see how, given existing LLM tech, especially with Google Gemini with its, like, million-token context, and Google with their crawl of the entire web: they've got search, and they've got a cache of every page and so forth.[00:22:35] Simon: That makes sense to me. And what they've got right now, it's not as good as it can be, obviously, but it's a real useful thing, which they're going to start rolling out. So, you know, Perplexity have been building the same thing for a couple of years.
That I believe in.[00:22:50] Simon: You know, if you tell me that you're going to have an agent that's a research assistant agent, great. The coding agents, I mean, ChatGPT Code Interpreter, nearly two years [00:23:00] ago, that thing started writing Python code, executing the code, getting errors, rewriting it to fix the errors. That pattern obviously works.[00:23:07] Simon: That works really, really well. So, yeah, coding agents that do that sort of error-message-loop thing, those are proven to work. And they're going to keep on getting better, and that's going to be great. The research assistant agents are just beginning to get there. The things I'm critical of are the ones where you trust this thing to go out and act autonomously on your behalf, and make decisions on your behalf, especially involving spending money.[00:23:31] Simon: I don't see that working for a very long time. That feels to me like an AGI-level problem.[00:23:37] swyx (2): It's funny, because I think Stripe actually released an agent toolkit, which is one of the things I featured, that is trying to enable these agents each to have a wallet that they can go and spend from; basically, it's a virtual card.[00:23:49] swyx (2): It's not that difficult with modern infrastructure. You can[00:23:51] Simon: stick a $50 cap on it, then at least you can't lose more than $50.[00:23:56] Brian: You know, I don't know if either of you know Rafat Ali. [00:24:00] He runs Skift, which is a travel news vertical. And he constantly laughs at the fact that every agent pitch is, we're gonna take care of booking a plane flight for you, you know?[00:24:11] Brian: And I would point out that, like, historically, when the web started, the first thing everyone talked about is, you can go online and book a trip, right? So it's funny: for each generation of technological advance, the thing they always want to kill is the travel agent.
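The error-message loop Simon endorses here — write code, run it, feed any traceback back to the model, retry — can be sketched in a few lines. `fake_model` below is a hypothetical, canned stand-in for a real LLM API call, so the example is self-contained rather than a real Code Interpreter implementation:

```python
import traceback
from typing import Optional

def fake_model(prompt: str, error: Optional[str] = None) -> str:
    # Canned stand-in for an LLM: first draft has a bug; a real model
    # would rewrite the code based on the traceback in `error`.
    if error is None:
        return "result = 10 / 0"   # buggy first draft
    return "result = 10 / 2"       # "fixed" second draft

def run_with_retries(prompt: str, max_attempts: int = 3):
    """Generate code, execute it, and retry with the error fed back."""
    error = None
    for _ in range(max_attempts):
        code = fake_model(prompt, error)
        scope: dict = {}
        try:
            exec(code, scope)               # run the generated code
            return scope["result"]          # success: return its output
        except Exception:
            error = traceback.format_exc()  # failure: capture the traceback
    raise RuntimeError("agent gave up after max_attempts")

print(run_with_retries("divide 10 by 2"))  # → 5.0
```

The loop is the whole trick: the model never has to be right first time, only able to react to concrete error messages.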
And now they want to kill the webpage travel agent.[00:24:29] Simon: Like, I use Google's flight search. It's great, right? If you gave me an agent to do that for me, it would save me, I mean, maybe 15 seconds of typing in my things, but I still want to see what my options are and go, yeah, I'm not flying on that airline, no matter how cheap they are.[00:24:44] swyx (2): Yeah. For listeners, go ahead.[00:24:47] swyx (2): For listeners, I think, you know, both of you are pretty positive on NotebookLM. And you know, we actually interviewed the NotebookLM creators, and there are actually two agents going on internally. The reason it takes so long is because they're running an agent loop [00:25:00] inside that is fairly autonomous, which is kind of interesting.[00:25:01] swyx (2): For one,[00:25:02] Simon: for a definition of agent loop, you picked that particularly well. For one definition. And you're talking about the podcast side of this, right?[00:25:07] swyx (2): Yeah, the podcast side of things. There's going to be a new version coming out that we'll be featuring at our conference.[00:25:14] Simon: That one's fascinating to me. Like, NotebookLM, I think it's two products, right? On the one hand, it's actually a very good RAG product, right? You dump a bunch of things in, you can run searches; it does a good job of that. And then they added the podcast thing. It's a total gimmick, right?[00:25:30] Simon: But that gimmick got them attention, because they had a great product that nobody paid any attention to at all. And then you add the unfeasibly good voice synthesis of the podcast. Like, it's just, it's the lesson.[00:25:43] Brian: It's the lesson of Midjourney and stuff like that.
If you can create something that people can post on socials, you don't have to lift a finger again to do any marketing for what you're doing.[00:25:53] Brian: Let me dig into NotebookLM just for a second as a podcaster. As a [00:26:00] gimmick, it makes sense, and then obviously, you know, you dig into it, it sort of has problems around the edges. It does the thing that all LLMs kind of do, where it's like, oh, we want to wrap up with a conclusion.[00:26:12] Multimodal AI and Future Prospects[00:26:12] Brian: I always call that the eighth-grade book report problem, where it has to have an intro and then, you know. But that's sort of a thing where, I think you spoke about this again in your piece at the year end, about how things are going multimodal in ways that you didn't expect, like, you know, vision and especially audio, I think. So that's another thing where, at least over the last year, there's been progress made that maybe you didn't think was coming as quick as it came.[00:26:43] Simon: I don't know. I mean, a year ago, we had one really good vision model. We had GPT-4 Vision, which was very impressive. And Google Gemini had just dropped Gemini 1.0, which had vision, but nobody had really played with it yet. People weren't taking Gemini [00:27:00] seriously at that point. I feel like it was[00:27:02] Simon: 1.5 Pro when it became apparent that actually they got over their hump and they were building really good models. And yeah, to be honest, the video models are mostly still using the same trick. The thing where you divide the video up into one image per second and you dump that all into the context.[00:27:16] Simon: So maybe it shouldn't have been so surprising to us that long context models plus vision meant that video was starting to be solved. Of course, it isn't fully solved.
What you really want with videos is to be able to do the audio and the images at the same time, and I think the models are beginning to do that now.[00:27:33] Simon: Originally, Gemini 1.5 Pro ignored the audio. It just did the one-frame-per-second video trick. As far as I can tell, the most recent ones are actually doing pure multimodal. But the things that opens up are just extraordinary. Like, the ChatGPT iPhone app feature that they shipped as one of their 12 days of OpenAI, where I really can be having a conversation and just turn on my video camera and go, hey, what kind of tree is [00:28:00] this?[00:28:00] Simon: And so forth. And it works. And for all I know, that's just snapping a picture once a second and feeding it into the model. The things that you can do with that as an end user are extraordinary. I don't think most people have cottoned onto the fact that you can now stream video directly into a model, because it's only a few weeks old.[00:28:22] Simon: Wow. That's a big boost in terms of what kinds of things you can do with this stuff.[00:28:30] swyx (2): Yeah. For people who are not that close, I think Gemini Flash's free tier allows you to do something like capture one photo every second or every minute, leave it on 24/7, and you can prompt it to do whatever.[00:28:45] swyx (2): And so you can effectively have your own camera app or monitoring app that you just prompt, and it detects when something changes. It detects alerts or anything like that, or describes your day.
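The "one image per second" trick Simon describes can be sketched with some rough arithmetic. The 258-tokens-per-frame figure below is an assumption based on how Gemini's documentation has described frame costs, not something stated in this conversation:

```python
# Sketch of the frame-sampling trick: a video is flattened into
# per-second frames, each costing a fixed token budget in the context.
# 258 tokens/frame is an assumed figure (roughly what Gemini's docs
# have described); treat it as illustrative, not official pricing.
TOKENS_PER_FRAME = 258

def video_context_tokens(duration_seconds: int, fps_sampled: float = 1.0) -> int:
    """Tokens consumed by dumping sampled frames into the context window."""
    frames = int(duration_seconds * fps_sampled)
    return frames * TOKENS_PER_FRAME

# A 1-hour video at 1 frame/second:
print(video_context_tokens(3600))  # 928800 tokens
```

Under these assumptions an hour of video fits in a million-token context, which is why "long context plus vision" starts to look like video support.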
You know, and the fact that this is free, I think, [00:29:00] also leads into the previous point about how much the prices have come down.[00:29:05] Simon: And even if you're paying for this stuff, a thing that I put in my blog entry is I ran a calculation on what it would cost to process the 68,000 photographs in my photo collection, and for each one just generate a caption. Using Gemini 1.5 Flash 8B, it would cost me $1.68 to process all 68,000 images, which, I mean, that doesn't make sense.[00:29:28] Simon: None of that makes sense. It's one four-hundredth of a cent per image to generate captions now. So you can see why feeding in a day's worth of video just isn't even very expensive to process.[00:29:40] swyx (2): Yeah, I'll tell you what is expensive. It's the other direction. So here we're talking about consuming video.[00:29:46] swyx (2): And this year, we also had a lot of progress. Probably one of the most anticipated launches of the year was Sora. We actually got Sora. And it was less exciting.[00:29:55] Simon: We did, and then Veo 2, Google's Sora, came out like three [00:30:00] days later and upstaged it. Sora was exciting until Veo 2 landed, which was just better.[00:30:05] swyx (2): In general, I feel the media, or the social media, has been very unfair to Sora. Because what was released to the world, generally available, was Sora Lite. It's the distilled version of Sora, right? So you're comparing[00:30:16] Simon: I did not realize that.[00:30:18] swyx (2): the most cherry-picked version of Veo 2, the one that they published on the marketing page, to the most embarrassing version of Sora.[00:30:25] swyx (2): So of course it's gonna look bad.[00:30:27] Simon: Well, I got access to Veo 2. I'm in the Veo 2 beta and I've been poking around with it and getting it to generate pelicans on bicycles and stuff.
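Simon's caption-cost figure checks out as back-of-the-envelope arithmetic; the sketch below just divides his quoted total by his image count (no official pricing is assumed):

```python
# Back-of-the-envelope check of Simon's caption-cost figure for
# Gemini 1.5 Flash 8B. Only his two quoted numbers are used.
TOTAL_COST_USD = 1.68   # quoted total for captioning the whole library
NUM_IMAGES = 68_000     # quoted size of the photo collection

per_image_usd = TOTAL_COST_USD / NUM_IMAGES
per_image_cents = per_image_usd * 100

# ~0.0025 cents per image, i.e. roughly 1/400th of a cent
print(f"per image: ${per_image_usd:.7f} (~1/{round(1 / per_image_cents)} of a cent)")
```

The quotient lands at roughly one four-hundredth of a cent per caption, matching the figure in the conversation.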
I would absolutely[00:30:34] swyx (2): believe that[00:30:35] Simon: Veo 2 is actually better. So is full-fat Sora coming soon? Do you know when we get to play with that one?[00:30:42] swyx (2): No one's mentioned anything. I think basically the strategy is let people play around with Sora Lite and get info there, but keep developing Sora with the Hollywood studios. That's what they actually care about. The rest of us don't really know what to do with the video anyway.[00:30:59] Simon: Gotcha. I mean, [00:31:00] my thing is, I realized that for generative images and video, images we've had for a few years, and I don't feel like they've broken out into the talented artist community yet. Lots of people are having fun with them and producing stuff that's kind of cool to look at. But you know that movie Everything Everywhere All at Once, right?[00:31:20] Simon: Won a ton of Oscars, utterly amazing film. The VFX team for that were five people, some of whom were watching YouTube videos to figure out what to do. My big question for Sora and Midjourney and stuff: what happens when a creative team like that starts using these tools? I want the creative geniuses behind Everything Everywhere All at Once.[00:31:40] Simon: What are they going to be able to do with this stuff in a few years' time? Because that's really exciting to me. That's where you take artists who are at the very peak of their game, give them these new capabilities, and see what they can do with them.[00:31:52] swyx (2): I know a little bit here, so I should mention that that team actually used RunwayML.[00:31:57] swyx (2): So there was,[00:31:57] Simon: Yeah.[00:31:59] swyx (2): I don't know how [00:32:00] much. So, you know, it's possible to overstate this, but there are people integrating it.
They're using generated video within their workflow, even pre-Sora.[00:32:09] Brian: Right, because it's not the thing where it's like, okay, tomorrow we'll be able to do a full two-hour movie that you prompt with three sentences.[00:32:15] Brian: It's like the very first era of video effects in film: if you can get that three-second clip, if you can get that 20-second thing that they did in The Matrix that blew everyone's minds and took a million dollars or whatever to do, it's the little bits and pieces that they can fill in now, and that's probably already there.[00:32:34] swyx (2): Yeah, I think actually having a layered view of what assets people need and letting AI fill in the low-value assets, right, like the background video, the background music, and sometimes the sound effects, that maybe makes it more palatable. It maybe also changes the way that you evaluate the stuff that's coming out.[00:32:57] swyx (2): Because people tend to, in social media, try to [00:33:00] emphasize foreground stuff, main character stuff. So you really care about consistency, and you really are bothered when, for example, Sora botches image generation of a gymnast doing flips, which is horrible. It's horrible. But for background crowds, who cares?[00:33:18] Brian: And by the way, again, I was a film major way, way back in the day. That's how it started, with things like Braveheart, where they filmed 10 people on a field and then the computer could turn it into 1,000 people on a field. That's always been the way: it's around the margins and in the background that it first comes in.[00:33:36] Simon: The Lord of the Rings movies were over 20 years ago, and they have those giant battle sequences, which were very early, like, I mean, you could almost call it a generative AI approach, right?
They were using very sophisticated algorithms to model out those different battles and all of that kind of stuff.[00:33:52] Simon: Yeah, I know basically nothing about film production, so I try not to commentate on it. But I am fascinated to [00:34:00] see what happens when these tools start being used by the people at the top of their game.[00:34:05] swyx (2): I would say there's a cultural war being fought here more than a technology war.[00:34:11] swyx (2): Most of the Hollywood people are against any form of AI anyway, so they're busy fighting that battle instead of thinking about how to adopt it, and it's very fringe. I participated in one generative AI video creative hackathon here in San Francisco, where the AI-positive artists actually met with technologists like myself and we collaborated together to build short films. That was really nice, and I'll be hosting some of those in my events going forward.[00:34:38] swyx (2): One thing I want to leave people with: this is a recap of last year, but sometimes it's useful to walk away with a sense of what we can expect in the future. I don't know if you got anything. I would also call out that the Chinese models here have made a lot of progress, Hailuo and Kling and God knows who else in the video arena [00:35:00] also making a lot of progress. I think China is surprisingly ahead with regards to open weights, at least, but also just specific forms of video generation.[00:35:12] Simon: Wouldn't it be interesting if a film industry sprung up in a country that we don't normally think of as having a really strong film industry, using these tools? That would be a fascinating angle on this.[00:35:25] swyx (2): Agreed. I, Oh, sorry.
Go ahead.[00:35:29] Exploring Video Avatar Companies[00:35:29] swyx (2): Just to put it on people's radar as well: HeyGen. There's a category of video avatar companies that don't do general video.[00:35:41] swyx (2): They only do talking heads, let's just say. And HeyGen is doing very well.[00:35:45] Brian: Swyx, you know that's what I've been using, right? So, if you see some of my recent YouTube videos and things like that: the beauty part of the HeyGen thing is, I don't want to use the robot voice, so [00:36:00] I record the MP3 file on my computer, and then I put that into HeyGen with the avatar that I've trained it on, and all it does is the lip sync.[00:36:09] Brian: So it's not 100 percent past the uncanny valley, but it's good enough that if you weren't looking for it, it's just me sitting there doing one of my clips from the show. And, yeah, so, by the way, HeyGen. Shout out to them.[00:36:24] AI Influencers and Their Future[00:36:24] swyx (2): So, in terms of reviewing 2024 and looking at trends for 2025, I would basically call this out.[00:36:33] swyx (2): Meta tried to introduce AI influencers and failed horribly because they were just bad at it. But at some point there will be more and more AI influencers. Not influencers in the way that Simon is, but in the way that they are not human.[00:36:50] Simon: The few of those that have done well, I always feel like they're doing well because it's a gimmick, right?[00:36:54] Simon: It's novel and fun, like the AI Seinfeld thing [00:37:00] from last year, the Twitch stream. If you're the only one, or one of just a few doing that, you'll attract an audience because it's an interesting new thing.
But I just don't know if that's going to be sustainable longer term or not.[00:37:11] Simon: Like,[00:37:12] Simplifying Content Creation with AI[00:37:12] Brian: I'm going to tell you, because I've had discussions, I can't name the companies or whatever, but think about the workflow for this. Now, we all know that on TikTok and Instagram, holding up a phone to your face and doing an in-my-car video, or a walk-and-talk, that's very common. But if you want to do a professional sort of talking head video, you still have to sit in front of a camera, you still have to do the lighting, you still have to do the video editing. Versus, if you can just record what I'm saying right now, the last 30 seconds, clip that out as an MP3, and you have a good enough avatar, then you can put that avatar in front of Times Square, on a beach, or wherever.[00:37:50] Brian: So, again, for creators, the reason I think, Simon, we're on the verge of something is, it's not, oh, we're going to have [00:38:00] AI avatars take over. It'll be one of those things where it takes another piece of the workflow out and simplifies it.[00:38:07] Simon: I'm all for that. I always love this stuff.[00:38:08] Simon: I like tools that help human beings do more ambitious things. That's what excites me about this entire field.[00:38:17] swyx (2): Yeah. We're looking into basically creating one for my podcast. We have this guy Charlie, he's Australian.
He's not real, but he opens every show and we're gonna have him present all the shorts.[00:38:29] Simon: Yeah, go ahead.[00:38:30] The Importance of Credibility in AI[00:38:30] Simon: The thing that I keep coming back to is this idea of credibility. In a world that is full of AI-generated everything, it becomes even more important that people find the sources of information that they trust, and find people and sources that are credible. And I feel like that's the one thing that LLMs and AI can never have: credibility, right?[00:38:49] Simon: ChatGPT can never stake its reputation on telling you something useful and interesting, because that means nothing, right? It's a matrix multiplication. It depends on who prompted it and so forth. So [00:39:00] I'm always, and this is when I'm blogging as well, I'm always looking for, okay, who are the reliable people who will tell me useful, interesting information, who aren't just going to tell me whatever somebody's paying them to say, who aren't going to type a one-sentence prompt into an LLM, spit out an essay, and stick it online.[00:39:16] Simon: And that, to me, like, earning that credibility is really important. That's why a lot of my ethics around the way that I publish are based on the idea that I want people to trust me. I want to do things that gain credibility in people's eyes, so they will come to me for information as a trustworthy source.[00:39:32] Simon: And it's the same for the sources that I'm consulting as well.
So that's something I've been thinking a lot about, that sort of credibility focus, for a while now.[00:39:40] swyx (2): Yeah, you can layer or structure credibility, or decompose it. So one thing I would put in front of you, and I'm not saying that you should agree with this or accept this at all, is that you can use AI to generate different variations, and then you, as the final sort of last-mile person, pick the final output, and [00:40:00] you put your stamp of credibility behind that. Everything's human-reviewed instead of human-originated.[00:40:04] Simon: Yeah, if you publish something, you need to be able to stand behind it. Publishing it,[00:40:08] Simon: you need to say, I will put my name to this. I will attach my credibility to this thing. And if you're willing to do that, then that's great.[00:40:16] swyx (2): For creators, this is huge, because there's a fundamental asymmetry between starting with a blank slate versus choosing from five different variations.[00:40:23] Brian: Right.[00:40:24] Brian: And also, the key thing that you just said is, if everything that I do, if all of the words were generated by an LLM, if the voice is generated by an LLM, if the video is also generated by the LLM, then I haven't done anything, right? But if with one or two of those you take a shortcut, and it's still something I'm willing to sign off on, I feel like that's where people are coming around to: this is maybe acceptable, sort of.[00:40:53] Simon: This is where I've been pushing the definition. I love the term slop.
I've been pushing the definition of slop as AI-generated [00:41:00] content that is both unrequested and unreviewed, and the unreviewed thing is really important. That's the thing that elevates something from slop to not slop: a human being has reviewed it and said, you know what, this is actually worth other people's time.[00:41:12] Simon: And again, I'm willing to attach my credibility to it and say, hey, this is worthwhile.[00:41:16] Brian: It's the curatorial and editorial part of it. No matter what the tools are to do shortcuts, to, as Swyx is saying, choose between different edits or different cuts, in the end there's a curatorial mind or editorial mind behind it.[00:41:32] Brian: Let me wedge this in before we start to close.[00:41:36] The Future of LLM User Interfaces[00:41:36] Brian: One of the things coming back to your year-end piece that has been something I've been banging the drum about is when you're talking about LLMs getting harder to use. You said most users are thrown in at the deep end.[00:41:48] Brian: The default LLM chat UI is like taking brand new computer users, dropping them into a Linux terminal, and expecting them to figure it all out. It's literally going back to the command line. The command line was defeated [00:42:00] by the GUI interface. And this is what I've been banging the drum about: what we have now cannot be the user interface, cannot be the end result. Do you see any hints or seeds of a GUI moment for LLM interfaces?[00:42:17] Simon: I mean, it has to happen. It absolutely has to happen. The usability of these things is turning into a bit of a crisis. And we are at least seeing some really interesting innovation in little directions.[00:42:28] Simon: Like OpenAI's ChatGPT Canvas thing that they just launched. That is at least.
Going in a more interesting direction than just chats and responses. They're exploring that space where you're collaborating with an LLM. You're both working on the same document. That makes a lot of sense to me.[00:42:44] Simon: That feels really smart. One of the best things is still the tldraw one, where they had a drawing UI: you draw an interface, click a button, and tldraw would then make it a real thing. That was spectacular, [00:43:00] absolutely spectacular, an alternative vision of how you'd interact with these models.[00:43:05] Simon: So I feel like there is so much scope for innovation there, and it is beginning to happen. I feel like most people do understand that we need to do better in terms of interfaces that both help explain what's going on and give people better tools for working with models.[00:43:23] Simon: I was going to say,[00:43:25] Brian: I want to dig a little deeper into this, because think of the conceptual idea behind the GUI, which is instead of typing open word.exe into a command line, you click an icon, right? So that's abstracting away the programming stuff, such that a child can tap on an iPad and make a program open, right?[00:43:47] Brian: The problem, it seems to me, right now with how we're interacting with LLMs is it's sort of like a dumb robot, where you poke it and it goes over here, but no, I want it to go over here, so you poke it this way and you can't get it exactly [00:44:00] right. What can we abstract away from the current way of doing things that makes it more fine-tuned and easier to get more precise?[00:44:12] Brian: You see what I'm saying?[00:44:13] Simon: Yes. And this is the other trend that I've been following from the last year, which I think is super interesting.
It's the prompt-driven UI development thing. Basically, this is the pattern where Claude Artifacts was the first thing to do this really well. You type in a prompt and it goes, oh, I should answer that by writing a custom HTML and JavaScript application for you that does a certain thing.[00:44:35] Simon: And since then, it turns out this is easy, right? Every decent LLM can produce HTML and JavaScript that does something useful. So we've actually got this alternative way of interacting where they can respond to your prompt with an interactive custom interface that you can work with.[00:44:54] Simon: People haven't quite wired those back up again. Ideally, I'd want the LLM to ask me a [00:45:00] question where it builds me a custom little UI for that question, and then it gets to see how I interacted with that. I don't know why, but that's just such a small step from where we are right now. It feels like such an obvious next step.[00:45:12] Simon: Why should you just be communicating with an LLM through text, when it can build interfaces on the fly that let you select a point on a map, or move sliders up and down? It's gonna create knobs and dials. I keep saying knobs and dials, right? We can do that, and Claude Artifacts will build you a knobs-and-dials interface.[00:45:34] Simon: But at the moment they haven't closed the loop. When you twiddle those knobs, Claude doesn't see what you were doing. They're going to close that loop. I'm shocked that they haven't done it yet.
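The "closing the loop" idea Simon describes can be sketched as a small piece of plumbing: the model emits a widget, the user interacts with it, and the interaction is serialized back into the conversation as a structured turn. The message shape below is invented for illustration; no real product's protocol is being described:

```python
# Sketch of feeding UI interactions back to the model: the user's
# knob/slider state becomes a structured chat turn the model can see.
# The "UI interaction:" message convention is hypothetical.
import json

def ui_interaction_to_message(widget_values: dict) -> dict:
    """Turn the user's widget state into a follow-up chat message."""
    return {
        "role": "user",
        "content": "UI interaction: " + json.dumps(widget_values, sort_keys=True),
    }

# e.g. the user picked a point on a map and moved a zoom slider:
msg = ui_interaction_to_message({"zoom": 0.5, "point": [51.5, -0.1]})
print(msg["content"])
```

The point is just that nothing exotic is required: the widget state round-trips as ordinary conversation context, which is why the missing loop feels like "such a small step".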
So yeah, I think there's so much scope for innovation and so much scope for doing interesting stuff with that model, where anything you can represent in SVG, which is almost everything, can now be part of that ongoing conversation.[00:45:59] swyx (2): Yeah, [00:46:00] I would say the best-executed version of this I've seen so far is Bolt, where you can literally type in, make a Spotify clone, make an Airbnb clone, and it actually just does that for you zero-shot, with a nice design.[00:46:14] Simon: There's a benchmark for that now. The LMArena people now have a benchmark that is zero-shot app generation, because all of the models can do it.[00:46:22] Simon: I've started building my own version of this for my own project, because I think within six months it'll just be an expected feature. Like, if you have a web application, why don't you have a thing where you can add something custom? So for my Datasette data exploration project, I want you to be able to do things like conjure up a dashboard just via a prompt.[00:46:43] Simon: You say, oh, I need a pie chart and a bar chart, put them next to each other, and then have a form where submitting the form inserts a row into my database table. And this is all suddenly feasible. It's not even particularly difficult to do, which is great. Utterly bizarre that these things are now easy.[00:47:00][00:47:00] swyx (2): I think for a general audience, that is what I would highlight: software creation is becoming easier and easier. Gemini is now available in Gmail and Google Sheets. I don't write my own Google Sheets formulas anymore, I just tell Gemini to do it.
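The dashboard-with-a-form idea Simon describes bottoms out in something very ordinary on the server side: the generated form's submit handler just inserts a row. A minimal sketch, with a hypothetical table and column names (this is not Datasette's actual code):

```python
# Minimal sketch of what a prompt-conjured form's submit handler
# would do server-side: insert a row into a table. Table/columns
# ("observations", label/value) are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (label TEXT, value REAL)")

def handle_form_submit(label: str, value: float) -> int:
    """Insert one row and return the new total row count."""
    conn.execute("INSERT INTO observations VALUES (?, ?)", (label, value))
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]

print(handle_form_submit("pelicans", 42.0))  # 1
```

The hard part the LLM takes over is generating the chart and form markup; the persistence layer underneath is this boring, which is why it's "not even particularly difficult to do".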
And so I almost wanted to somewhat disagree with your assertion that LLMs got harder to use.[00:47:22] swyx (2): Like, yes, we expose more capabilities, but they're in minor forms, like using Canvas, like web search in ChatGPT, and like Gemini being in Google Sheets. Like, we're getting,[00:47:37] Simon: No, no, no, no. Those are the things that make it harder, because the problem is that each of those features is amazing if you understand the edges of the feature. If you're like, okay, so with Gemini in Google Sheets formulas, I can get it to do a certain amount of things, but I can't get it to go and read a webpage. You probably can't get it to read a webpage, right? But there are things that it can do and things that it can't do, which are completely undocumented.[00:47:58] Simon: If you ask it what it [00:48:00] can and can't do, they're terrible at answering questions about that. My favorite example is Claude Artifacts. You can't build a Claude Artifact that can hit an API somewhere else, because the CORS headers on that iframe prevent accessing anything outside of CDNJS. So, good luck learning CORS headers as an end user in order to understand why. I've seen people saying, oh, this is rubbish, I tried building an artifact that would run a prompt and it couldn't, because Claude didn't expose an API with CORS headers. All of this stuff is so weird and complicated.[00:48:26] Simon: And the more tools we add, the more expertise you need to really understand the full scope of what you can do.[00:48:44] Simon: So the question really comes down to: what does it take to understand the full extent of what's possible?
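The CORS failure Simon describes comes down to one rule: the browser only lets a cross-origin page read a response if the server opted in via an `Access-Control-Allow-Origin` header. A simplified model of that decision (real CORS also involves preflights, credentials, and more):

```python
# Simplified model of the browser's CORS read check: a cross-origin
# response is readable only if the *server* sent an
# Access-Control-Allow-Origin header matching the page's origin (or *).
def browser_allows_read(response_headers: dict, page_origin: str) -> bool:
    allow = response_headers.get("Access-Control-Allow-Origin")
    return allow == "*" or allow == page_origin

# A CDN that sends a wildcard header is readable; a random API that
# sends no header is not, which is why artifacts can't "hit an API
# somewhere else". (page origin here is illustrative)
print(browser_allows_read({"Access-Control-Allow-Origin": "*"}, "https://example.com"))  # True
print(browser_allows_read({}, "https://example.com"))  # False
```

That one missing header is the entire, invisible reason the "artifact that runs a prompt" fails, which is Simon's point about undocumented edges.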
And honestly, that's just getting more and more involved over time.[00:48:58] Local LLMs: A Growing Interest[00:48:58] swyx (2): I have one more topic that I [00:49:00] think you're kind of a champion of, and we've touched on it a little bit, which is local LLMs.[00:49:05] swyx (2): And running AI applications on your desktop. I feel like you are an early adopter of many, many things.[00:49:12] Simon: I had an interesting experience with that over the past year. Six months ago, I almost completely lost interest. And the reason is that six months ago, there was no point at all in using the best local models you could run, because the best hosted models were so much better.[00:49:26] Simon: There was no point at which I'd choose to run a model on my laptop if I had API access to Claude 3.5 Sonnet. They just weren't even comparable. And that changed, basically, in the past three months, as the local models had this step change in capability. Now I can run some of these local models, and they're not as good as Claude 3.5 Sonnet, but they're not so far away that it's not worth me even using them.[00:49:45] Simon: The continuing problem is I've only got 64 gigabytes of RAM, and if you run, like, Llama 3 70B, most of my RAM is gone. So now I have to shut down my Firefox tabs [00:50:00] and my Chrome and my VS Code windows in order to run it.[00:50:03] Simon: But it's got me interested again. The efficiency improvements are such that now, if you were to stick me on a desert island with my laptop, I'd be very productive using those local models. And that's pretty exciting.
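The rough memory math behind Simon's 64 GB problem: a model's weights alone take parameter count times bytes per weight, before you add KV cache and OS overhead. A sketch (overheads deliberately ignored, so real usage is higher):

```python
# Rough weight-memory math for local models: parameters x bits/weight.
# Ignores KV cache, activations, and OS overhead, so real usage is higher.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: {weights_gb(70, bits):.0f} GB")
```

At 16-bit a 70B model needs 140 GB just for weights; even a 4-bit quantization needs 35 GB, which on a 64 GB laptop is exactly the "shut down Firefox and VS Code" regime Simon describes, and why doubling the RAM on his next laptop changes what he can run.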
And if those trends continue, then my next laptop, if and when I buy one, is going to have twice the amount of RAM, at which point maybe I can run almost the top-tier open weights models and still be able to use it as a computer as well.[00:50:32] Simon: NVIDIA just announced their $3,000, 128-gigabyte monstrosity. That's a pretty good price. You know, that's, if you're going to buy it,[00:50:42] swyx (2): custom OS and all.[00:50:46] Simon: If I get a job, if I have enough of an income that I can justify blowing $3,000 on it, then yes.[00:50:52] swyx (2): Okay, let's do a GoFundMe to get Simon one.[00:50:54] swyx (2): Come on. You know, you can get a job anytime you want. This is just purely discretionary.[00:50:59] Simon: I want [00:51:00] a job that pays me to do exactly what I'm doing already and doesn't tell me what else to do. That's the challenge.[00:51:06] swyx (2): I think Ethan Mollick does pretty well, whatever it is he's doing.[00:51:11] swyx (2): But yeah, basically I was trying to bring in also, you know, not just local models, but Apple Intelligence is on every Mac machine. You seem skeptical.[00:51:21] Simon: It's rubbish. Apple Intelligence is so bad. It's like, it does one thing well.[00:51:25] swyx (2): Oh yeah, what's that? It summarizes notifications. And sometimes it's humorous.[00:51:29] Brian: Are you sure it does that well? And also, by the way, from a sort of a normie point of view, there's no indication from Apple of when to use it.
Like, everybody upgrades their thing and it's like, okay, now you have Apple Intelligence, and you never know when to use it ever again.[00:51:47] swyx (2): Oh, yeah, you consult the Apple docs, which is MKBHD.[00:51:51] Simon: The one thing I'll say about Apple Intelligence is, one of the reasons it's so disappointing is that the models are just weak. But now, like, Llama 3B [00:52:00] is such a good model in a 2-gigabyte file. Give Apple six months and hopefully they'll catch up to the state of the art on the small models, and then maybe it'll start being a lot more interesting.[00:52:10] swyx (2): Yeah. Anyway, this was year one. And, you know, just like the first year of iPhone, maybe not that much of a hit, and then year three they had the App Store. So, hey, I would say give it some time. And, you know, I think Chrome is also shipping Gemini Nano this year, which means that every web app will have free access to a local model that just ships in the browser, which is kind of interesting.[00:52:38] swyx (2): And then I also wanted to just open the floor for any of us: what are the AI applications that we've adopted that we really recommend? These are apps that are running in our browser or apps that are running locally that other people should be trying.[00:52:55] swyx (2): Right? I feel like that's always one thing that is helpful at the start of the [00:53:00] year.[00:53:00] Simon: Okay. So for running local models, my top picks. Firstly, on the iPhone, there's this thing called MLC Chat, which works, and it's easy to install, and it runs Llama 3B, and it's so much fun.
It's not necessarily a capable enough model that I use it for real things, but my party trick right now is I get my phone to write a Netflix Christmas movie plot outline where, like, a jeweller falls in love with the King of Sweden or whatever.[00:53:25] Simon: And it does a good job, and it comes up with pun names for the movies. And that's deeply entertaining. On my laptop, most recently, I've been getting heavy into Ollama, because the Ollama team are very, very good at finding the good models and patching them up and making them work well. It gives you an API.[00:53:42] Simon: My little LLM command-line tool has a plugin that talks to Ollama, which works really well. So Ollama is, I think, the easiest on-ramp to running models locally. If you want a nice user interface, LM Studio is, I think, the best user interface thing at [00:54:00] that. It's not open source. It's good.[00:54:02] Simon: It's worth playing with. The other one that I've been trying recently, there's a thing called, what's it called? Open WebUI or something. Yeah. The UI is fantastic. If you've got Ollama running and you fire this thing up, it spots Ollama and it gives you an interface onto your Ollama models. And t
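The API Simon mentions, the one his LLM command-line plugin and tools like Open WebUI talk to, is Ollama's local HTTP endpoint. A sketch of building such a request, assuming Ollama's documented default port and `/api/generate` payload shape (details may change between versions; the model name is just an example):

```python
# Sketch of how a client talks to Ollama's local HTTP API.
# Port 11434 and the /api/generate payload follow Ollama's docs at the
# time of writing; treat the exact shape as an assumption.
import json

def build_ollama_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Return the URL and JSON body for a non-streaming generate call."""
    url = "http://localhost:11434/api/generate"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return url, body.encode("utf-8")

url, body = build_ollama_request("llama3.2", "Say hello")
print(url)
```

Because every local tool speaks this same endpoint, front-ends like Open WebUI can "spot Ollama" and immediately drive whatever models you've pulled.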
Are we on the brink of a technological revolution—or chaos?In this week's episode of Leveraging AI, host Isar Meitis breaks down the fast-paced developments in artificial intelligence that unfolded during the final weeks of the year. This episode unpacks key concepts like Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI), exploring their potential to transform industries, solve global challenges, and... maybe even outsmart us. If you're a business leader looking to leverage AI while staying ahead of the curve, this episode is your AI survival guide.Wondering how to train your team or yourself to adopt AI successfully? I share a proven AI business transformation framework and an exclusive opportunity to join a live course designed for leaders. Checkout with $100 off using LEVERAGINGAI100 at https://multiplai.ai/ai-course/In this episode, you'll discover:The surprising milestones in AGI and ASI, including OpenAI's O3 model outperforming humans in key tests.How Sam Altman and Dario Amodei envision AI solving global challenges—while acknowledging the risks.Why “thinking models” are reshaping AI's role in business, and how they might transform the market.The growing influence of AI agents in companies like Google, eBay, and Moody's—and how they're reshaping industries.Why leaders like Sundar Pichai are pushing for AI to become as ubiquitous as Google itself.A behind-the-scenes look at AI-driven innovations from NVIDIA, Meta, and emerging players like DeepSeek.A step-by-step plan to enhance AI literacy and adoption in your business for maximum ROI.BONUS:Sam Altman's Blog Post: "Reflections" - https://blog.samaltman.com/reflections Dario Amodei's Essay: "Machines of Loving Grace" - https://darioamodei.com/machines-of-loving-graceAbout Leveraging AI The Ultimate AI Course for Business People: https://multiplai.ai/ai-course/ YouTube Full Episodes: https://www.youtube.com/@Multiplai_AI/ Connect with Isar Meitis: 
https://www.linkedin.com/in/isarmeitis/
Free AI Consultation: https://multiplai.ai/book-a-call/
Join our Live Sessions, AI Hangouts and newsletter: https://services.multiplai.ai/events

If you've enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!
AI Applied: Covering AI News, Interviews and Tools - ChatGPT, Midjourney, Runway, Poe, Anthropic
In this conversation, Conor and Jaeden discuss the advancements in AI technology, particularly focusing on OpenAI's O3 model. They explore the significant improvements in reasoning capabilities, the introduction of an integrated evaluator model, and the ability of AI to execute its own programs. The discussion highlights the challenges and opportunities in AI development, emphasizing the potential for achieving artificial general intelligence (AGI) in the near future.

Chapters
00:00 The Future of AI: OpenAI's O3 Model
03:06 Breakthroughs in AI Reasoning and Problem Solving
05:55 Evaluating AI Responses: The Integrated Evaluator Model
08:57 Executing Programs: AI's New Capabilities
12:00 Challenges and Opportunities in AI Development

AI Applied YouTube Channel: https://www.youtube.com/@AI-Applied-Podcast
Get on the AI Box Waitlist: https://AIBox.ai/
Conor's AI Course: https://www.ai-mindset.ai/courses
Conor's AI Newsletter: https://www.ai-mindset.ai/
Jaeden's AI Hustle Community: https://www.skool.com/aihustle/about
Host Dave Sobel discusses significant cybersecurity developments involving the U.S. Treasury Department and its recent breach linked to Chinese hackers. The breach, which was discovered on December 8, 2024, involved unauthorized access to unclassified documents within the Office of Foreign Assets Control, raising alarms about the potential exposure of sensitive information related to economic sanctions. The episode highlights the ongoing investigations and the U.S. government's response, including sanctions imposed on a Chinese cybersecurity firm involved in the Flax Typhoon cyber attacks that compromised numerous internet-connected devices globally.

Sobel also addresses the national security concerns surrounding TP-Link internet routers, which hold a dominant market share in the U.S. The Commerce, Defense, and Justice Departments are investigating the company due to its alleged ties to Chinese cyber threats and its failure to rectify security vulnerabilities. The episode emphasizes the importance of securing cloud systems, as CISA has mandated federal agencies to conduct security assessments in light of recent breaches attributed to foreign hackers. This directive aims to enhance the security posture of federal cloud environments and protect sensitive information.

The discussion shifts to the leadership transition at Pia, where CEO Gerwai Todd has stepped down after a year, passing the reins to an executive group. Sobel reflects on the challenges of dual CEO roles and the importance of operational stability during this transition. He notes Todd's contributions to the company, including the launch of an AI-driven help desk ticketing system, and emphasizes the need for a capable leader to navigate the competitive landscape of help desk automation.

Finally, the episode covers OpenAI's recent announcements regarding its new reasoning models, O1 and O3, which aim to enhance AI capabilities and approach artificial general intelligence.
Sobel discusses the implications of OpenAI's shift towards a for-profit model and the potential impact on the development of AI technologies. He highlights the need for practical applications of these advancements and the importance of addressing concerns about the ethical implications of AI development. The episode concludes with a reminder of the significance of these developments in the broader context of technology and national security.

Three things to know today
00:00 From Treasury Hacks to Router Risks: The U.S. Grapples with China's Cyber Onslaught
06:31 Dual CEO Role Dilemmas: Gerwai Todd Passes the Torch at Pia
08:51 AI Gets a Power Boost: OpenAI's Big Plans, Bigger Models, and a Push for Profits

All our Sponsors: https://businessof.tech/sponsors/

Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/
Looking for a link from the stories? The entire script of the show, with links to articles, is posted in each story on https://www.businessof.tech/

Support the show on Patreon: https://patreon.com/mspradio/
Want to be a guest on Business of Tech: Daily 10-Minute IT Services Insights? Send Dave Sobel a message on PodMatch, here: https://www.podmatch.com/hostdetailpreview/businessoftech
Want our stuff? Cool Merch? Wear "Why Do We Care?" - Visit https://mspradio.myspreadshop.com

Follow us on:
LinkedIn: https://www.linkedin.com/company/28908079/
YouTube: https://youtube.com/mspradio/
Facebook: https://www.facebook.com/mspradionews/
Instagram: https://www.instagram.com/mspradio/
TikTok: https://www.tiktok.com/@businessoftech
Bluesky: https://bsky.app/profile/businessof.tech
The AI Breakdown: Daily Artificial Intelligence News and Discussions
Explore OpenAI's latest achievements with O3, the reasoning model that sparked conversations about its proximity to AGI. This episode unpacks its groundbreaking performance on benchmarks like ARC, Codeforces, and math challenges while addressing the implications for jobs, coding, and society. Hear expert insights on whether O3 signals the dawn of AGI or a significant milestone in AI's evolution. Brought to you by: Vanta - Simplify compliance - https://vanta.com/nlw The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown