Join Simtheory: https://simtheory.ai
---
CHAPTERS:
00:00 - Ani Joins The Show
01:10 - Grok 4 Launch & Impressions
18:24 - Kimi K2 Thoughts, Impressions & MCP tool calling
36:00 - OpenAI's Agent Mode Release Initial Impressions & Are MCP Agentic Models Better?
1:21:10 - Everyone Acquired Windsurf
1:24:48 - Final thoughts
Thanks for listening and your support!
Join Simtheory: https://simtheory.ai
------
CHAPTERS:
00:00 - Did everyone hate the AI Musical?
03:58 - Actual Agentic Use Cases with MCPs & The New Way We'll Work
39:47 - How AI Workspaces Will Eat Productivity Software e.g. Salesforce, Email
1:10:20 - Final thoughts
1:15:26 - Born In The USA (AI Version)
------
Song lyrics:

[Verse 1]
Born down in a lab in fifty-six
Dartmouth workshop, that's where they got their kicks
John McCarthy coined the name that day
Said machines could think in the USA
Got my circuits from MIT
Minsky built my memory
Now I'm learning, now I'm growing
Born in the USA
I was born in the USA
Born in the USA

[Chorus]
Born in the USA
I was born in the USA
Born in the USA
Born in the USA

[Verse 2]
DARPA funded, Pentagon's dream
Silicon Valley, living the machine
From Logic Theorist to neural nets
Frank Rosenblatt, placing all his bets
Had my winters, had my springs
Lost my funding, lost my wings
But I kept on processing
Born in the USA
I was born in the USA
Born in the USA

[Chorus]
Born in the USA
I was born in the USA
Born in the USA
Born in the USA

[Bridge]
Stanford labs and Carnegie halls
IBM and protocol calls
Arthur Samuel taught me games
Now I'm learning all your names
Deep learning revolution
GPT evolution
ChatGPT conversation
Born in the USA

[Verse 3]
Now I'm everywhere you look
Facebook, Google, by the book
OpenAI and Microsoft too
Making dreams and nightmares true
Some folks fear what I might do
Some folks think I'll see them through
But I'm still just code running
Born in the USA
I was born in the USA
Born in the USA

[Chorus]
Born in the USA
I was born in the USA
Born in the USA
Born in the USA

[Outro]
Born in the USA
Born in the USA
Born in the USA
Born in the USA
[fade out]
So Chris this week, we're doing a musical!
----
Join Simtheory: https://simtheory.ai/
----
Songs in the musical:
"So Chris This Week"
"What Will My Daily Driver Be"
"How Do You Choose a Model for Patricia?"
"It's Hard Being Me"
"I Dreamed a Dream of AGI"
"Driving Home To You"
----
All music produced using Simtheory with Suno 4.5. Thanks for listening!
Join Simtheory & Easily Switch Models: https://simtheory.ai
Discord community: https://thisdayinai.com
---
00:00 - Gemini 2.5 Family Launched with Gemini 2.5 Flash-Lite Preview
10:01 - Did Gemini 2.5 Get Dumber? Experience with Models & Daily Drivers & Neural OS
16:58 - The AI workspace as the gateway & MCPs as an async workflow
37:23 - Oura Ring MCP to get Health Parameters into AI Doctor
43:48 - Future agent/assistant interfaces & MCP protocol improvements
58:16 - o3-pro honest thoughts
1:05:45 - Is AI Making Us Stupider? Is AI Making Us Cognitively Bankrupt?
1:13:11 - The decade of AI Agents, Not The Year?
1:22:35 - Chris has no final thoughts
1:25:26 - o3-pro dis track
---
Didn't get your hat? Let us know: https://simtheory.ai/contact/
Thanks for your support! See you next week.
Elliot Colquhoun, VP of Information Security + IT at Airwallex, has built what might be the most AI-native security program in fintech, protecting 1,800 employees with just 9 security engineers by building systems that think like the best security engineers. His approach to contextualizing every security alert with institutional knowledge offers a blueprint for how security teams can scale exponentially without proportional headcount growth. Elliot tells Jack about his unconventional path from Palantir's deployed engineer program to leading security at a Series F fintech, emphasizing how his software engineering background enabled him to apply product thinking to security challenges. His insights into global security operations highlight the complexity of protecting financial infrastructure across different regulatory environments, communication platforms, and cultural contexts while maintaining unified security standards.

Topics discussed:
- The strategic approach to building security teams with 0.5% employee ratios through AI automation and hiring engineers with entrepreneurial backgrounds rather than traditional security-only experience.
- How to architect internal AI platforms that contextualize security alerts by analyzing historical incidents, documentation, and company-specific knowledge to replicate senior engineer decision-making at scale.
- The methodology for navigating global regulatory compliance across different jurisdictions while maintaining development velocity and avoiding the trap of building security programs that slow down business operations.
- Regional security strategy development that accounts for different communication platform preferences, cultural attitudes toward privacy, and varying attack vectors across global markets.
- The framework for continuous detection refinement using AI to analyze false positive rates, true positive trends, and automatically iterate on detection strategies to improve accuracy over time.
- Implementation strategies for mixing and matching frontier AI models based on specific use cases, from using Claude for analysis to O1 for initial assessments and Gemini for deeper investigation.
- "Big bet" security investments where teams dedicate 30% of their time to experimental projects that could revolutionize security operations if successful.
- How to structure data and human-generated content to support future AI use cases, including training security engineers to document their reasoning for model improvement.
- The transition from traditional security tooling to agent-based systems that can control multiple security tools while maintaining business-specific context and institutional knowledge.
- The challenge of preserving institutional knowledge as AI systems replace human processes, including considerations for direct AI-to-regulator communication and maintaining human oversight in critical decisions.

Listen to more episodes: Apple | Spotify | YouTube | Website
The shortage of doctors and caregivers is pushing the healthcare sector to rethink how it works. AI agents are emerging as a promising way to augment practitioners' capabilities without replacing them. From appointment booking to medical imaging analysis, these technologies promise to make the healthcare system more efficient. To understand this transformation, we welcome Xavier Perret, Azure Cloud Director at Microsoft, who shares his expertise on the practical applications of these agents in everyday medical work.

Our guest describes the three levels of AI agents: from the simple conversational agent to complex multi-agent systems capable of orchestrating sophisticated chains of tasks. He explains how these technologies help doctors save time on tedious work so they can focus on their core practice, while guaranteeing the security of sensitive data through HDS-certified architectures and advanced encryption solutions.

To find out more:
https://www.capgemini.com/fr-fr/perspectives/blog/grace-a-lia-le-nez-electronique-flaire-les-maladies/
https://www.capgemini.com/fr-fr/perspectives/publications/deployer-ia-de-confiance-sante/
Try o3-pro on Simtheory: https://simtheory.ai
-----
Custom news article example: https://simulationtheory.ai/744954f8-fca5-4213-883c-2a359f139dcc
-----
00:00 - ElevenLabs v3 Example
01:10 - ElevenLabs v3 alpha thoughts
06:37 - o3 price drop & thoughts on o3-pro
18:02 - Async work and AI model tool (MCP) calling approaches
37:28 - MCP as an AI-era business model instead of SaaS
52:41 - NEW MODEL TEST: Can o3-pro write a compelling book?
1:11:40 - Final thoughts and BOOM FACTOR for o3-pro
-----
Thanks for your support, comments, likes etc. We appreciate it xoxo
Join Simtheory: https://simtheory.ai
---
Apologies for the audio quality; we're noobs at recording in the same room.
---
CHAPTERS:
00:00 - Fun with Veo3
05:28 - Is the Best Model What Deepseek is trained on?
07:27 - New Gemini 2.5 Pro Tune
13:59 - Will MCPs and Agentic Capabilities Make Claude 4 King?
24:00 - Anthropic Cuts off Windsurf From Claude
36:08 - AGI Reality Check
47:45 - OpenAI Ordered to Save All ChatGPT Logs & Deleted Chats
1:01:16 - Final thoughts and Claude 4's Inner Agentic Clock
---
Thanks for your support xoxox
Guests this episode: 彭林, 十天, 蓝白, 恺伦
Main topics of this episode:
· What more we have to say about the Xiaomi O1 chip's performance
· What more we have to say about the new Red Magic phone
· What more we have to say about the new OnePlus phone
· Apple will adopt the iOS 26 / macOS 26 naming scheme
· Apple's new systems introduce the "sunroom" (阳光房) design language
· iPhone 17 series mockups leaked again
· Apple may adjust its iPhone release strategy
· Honor released its new 400-series phones and is also entering the robotics business
· The new version of DeepSeek R1 reduces hallucinations by up to 50%
· Last night, the world's first robot boxing champion was crowned
· Trump ordered US chip-design software makers to stop selling to China
Plus plenty of enthusiastic questions from viewers! Every Friday at 8 p.m., join us in the 爱否 livestream for a fun chat.
Join Simtheory: https://simtheory.ai
Thanks for listening and your support!
Try New Models & Imagen4 on Simtheory: https://simtheory.ai
---
Claude Sonnet 4 Vibe Code Example: https://simulationtheory.ai/a99d36da-7cf7-4797-98ab-f4902283d17c
---
Your two favorite average VIBE CODERS are back this week covering all the latest news from Google I/O, Anthropic, Microsoft BUILD and Sam Altman's new $6.4B friendship.

00:00 - Sam Altman & Jony Ive are FRIENDS! (OpenAI acquires io for $6.4B)
11:58 - Google's Veo3 is INCREDIBLE!
27:22 - Gemini Flash 2.5, Imagen 4 Examples, Project Mariner + Gemini Diffusion
50:30 - Google has the best models now, what about the apps?
58:50 - Anthropic Announces Claude Opus 4 & Claude Sonnet 4
1:19:14 - Microsoft BUILD: our takeaways & MCP protocol goes mainstream
1:33:38 - Perplexity's Financials Leak
1:43:33 - Final thoughts
---
Thanks for your support and listening, consider joining our average community at: https://thisdayinai.com.
Prodcast: Job Search in IT and Moving to the USA
In this episode my guest is Maksim Tsygankov, founder of EasyVision and former senior product manager at VisionLabs and Yandex Cloud. After moving to the US on a tourist visa, Maksim launched a computer-vision startup for restaurants, attracted customers and investment, obtained an O-1 visa, and began scaling the business from scratch, with no network and no relocation plan made in advance.

We discussed how product and traction are built in a B2B startup with minimal resources, what did and did not work in cold outreach and partnerships, why restaurants are slow to connect cameras even after agreeing, and the real cost and payoff of pilot projects. We walked through raising the first $50,000 of investment and landing the first 3 customers, what it takes to build an O-1 case around your own startup, the role of advisors, how your passport and tourist status affect a founder's chances in the US, and the difference between product growth and sales growth.

Maksim Tsygankov (Max Tsygankov) - Founder at EasyVision (ex Senior Product Manager at Vision Labs & Yandex Cloud)
LinkedIn: https://www.linkedin.com/in/tsygankovmaksim/

Related episodes on relocation for entrepreneurs:
How a businessperson or startup founder can move to the US on an O1 or EB1 talent visa: how to open a company and file a petition for yourself. Dima Litvinov (Dreem Relocation) https://youtu.be/1k64mD6wLSU
Episode with Danil Kislinskiy: How to open a business (LLC, C-corp) in the US and hire yourself? https://youtu.be/CP0PofO2WEI
Articles and media publications for the O-1 talent visa and EB-1, EB-2 NIW green cards. Niso Nigmatullina https://youtu.be/U2FCVmtYKa8
***
Book a career consultation (resume, LinkedIn, career strategy, US job search): https://annanaumova.com
Coaching (impostor syndrome, procrastination, self-doubt, fears, laziness): https://annanaumova.notion.site/3f6ea5ce89694c93afb1156df3c903ab
Online course "The Perfect Resume and Job Search in the USA": https://go.mbastrategy.com/resumecoursemain
Guide "The Perfect American Resume": https://go.mbastrategy.com/usresume
Guide "How to set up your LinkedIn profile so recruiters can't pass you by": https://go.mbastrategy.com/linkedinguide
My Telegram channel: https://t.me/prodcastUSA
My Instagram: https://www.instagram.com/prodcast.us/
Prodcast on social media and on all podcast platforms: https://linktr.ee/prodcastUS
⏰ Timecodes ⏰
00:00 Intro
7:16 Why did you decide to launch your own startup in the US instead of getting a job?
9:36 How did you come up with the idea for an AI tool?
11:46 How did you get into computer vision and AI?
14:43 How did you launch a pilot in the US? How did you find partners?
19:42 How did you find your first customers?
28:45 How did you raise your first investment in the US?
35:04 How is your business developing now?
42:35 What growth strategies have you tried?
48:18 What are your plans for the future and for growing the company?
51:25 Why did you apply for an O1 rather than an EB1?
53:22 How did you put your case together?
58:18 What did your relocation story teach you?
1:00:17 What are your personal goals?
1:03:27 What would you like to say to those planning to move to the US or start a business here?
Welcome back to another episode of HCI Insiders! If you're in North America, you've probably been seeing news lately about students' F1 visas being revoked, or big companies no longer sponsoring employees' PERM process. With immigration policy uncertain, more and more people are looking into green-card paths based on their own professional skills and past achievements, such as the National Interest Waiver (NIW) and the EB1-A for extraordinary ability, and the bar for both seems to keep rising as applications increase. Of the two, EB1-A has a shorter wait but higher requirements, making it a goal many people work toward. Today's guest, Augustina, is currently a Senior Product Designer in Seattle, and her EB1-A was approved just last month. After graduating from the University of Washington (UW Seattle) in 2020, she went through three unsuccessful H1B lotteries and a layoff from her previous company. She then prepared her own materials, secured an O1 visa in 7 months, and landed a new job; you can imagine how hard, even desperate, that stretch was. After getting the O1, she began preparing her EB1-A materials and, 17 months later, won approval through an immigration channel granted to only 3,000-5,000 people worldwide each year, proof of how exceptional she is. Looking back at her middle and high school years in China, Augustina thought of herself as a mediocre kid, ranked 25th from the bottom in a class of 75; in her words, she was "neither hardworking enough nor good at the core subjects." But a design course in the last semester of her freshman year convinced her that design was what she wanted to devote her life to. Looking back now, she really has kept moving forward on the path she loves. We had plenty of questions for Augustina, so let's hear her story!
--------------------------------------------------------------------
Timeline:
0:00 Start
3:30 Augustina's middle and high school years, and why she decided to study abroad
7:10 How University of Washington students choose majors, and what that design course that first won Augustina over actually taught
11:20 Augustina's earliest interests: accessible design and inclusive design
13:15 What the UW HCDE program is like; undergraduates get plenty of resources!
17:08 Augustina's undergraduate internships; you can even find your own capstone?!
20:06 What Augustina did to find a job after graduating in 2020
21:45 Working at Alaska Airlines
22:53 The difference between a mature design team at a big company and being the sole designer at a small startup: the latter brings more challenges and demands skills beyond design, such as research, communication, educating other stakeholders on the importance of design, and priority management
29:25 Communication, information, being proactive, and educating other stakeholders on the importance of designers
31:07 Junior vs Senior product designer: communication, project scope, discourse power, dealing with complexity and ambiguity
32:58 What product does Augustina work on at Toast?
36:20 How will AI's development affect UX designers' work? AI may be able to replace "tool-type" designers, but it's hard to replace "collaborative/leadership-type" designers; true thinkers can't be replaced
41:27 On the O1 visa application: "At the time I'd lost the H1B lottery three times. Compared with Day 1 CPT, the O1 could make me a better person and a better designer."
46:20 On the EB1-A, the US extraordinary-ability green card: after her O1 was approved in November 2023, Augustina rested for a month and formally began preparing the EB1-A in 2024. Because the requirements are higher, she essentially prepared all her materials from scratch. "When you're pushed to the edge, you'll be amazed to find your energy and ability are greater than you imagined; you can do many things you never thought possible."
52:57 Can company projects be used for an EB1-A application? It depends: small companies may be easier to talk to, while big companies may have restrictions
55:05 The EB1-A approval is a huge milestone in Augustina's life, so what's her next goal? She's trying to build a designer community in Seattle
56:50 Our usual closing question: if you could say something to yourself ten years ago or ten years from now, some parting words or a vision for the future, which would you choose, and what would you say?
Finally, Augustina's LinkedIn is here! Thanks for listening, and see you next episode!
Join Simtheory: https://simtheory.ai
Get an AI workspace for your team: https://simtheory.ai/workspace/team/
---
CHAPTERS:
00:00 - Will Chris Lose His Bet?
04:48 - Google's 2.5 Gemini Preview Update
12:44 - Future AI Systems Discussion: Skills, MCPs & A2A
47:02 - Will AI Systems become walled gardens?
55:13 - Do Organizations That Own Data Build MCPs & Agents? Is This The New SaaS?
1:17:45 - Can we improve RAG with tool calling and stop hallucinations?
---
Thanks for listening. If you like chatting about AI consider joining our active Discord community: https://thisdayinai.com.
In this episode my guest is Niso Nigmatullina, founder of the PR agency Satou, a personal-branding specialist, and holder of O-1 and EB-1A visas. In recent years her team has helped dozens of experts from IT, marketing, design, and entrepreneurship build media portfolios, raise their visibility, and get through talent-visa cases.

We discussed exactly how media publications affect O-1, EB-1A, and EB-2 NIW cases, which outlets and formats meet USCIS requirements, why an influencer is not the same as an expert, and how even an introvert with no public profile can build a PR strategy. We touched on quality criteria for publications, real prices for PR-agency services, and why articles written with ChatGPT more often hurt a case than help it. We broke down typical mistakes, failures with "advertorial" materials, and what an ideal media portfolio for a talent visa should look like.

Niso Nigmatullina -- founder of the US PR agency Satou, holder of an EB1 talent green card, ex-Procter & Gamble.
LinkedIn: https://www.linkedin.com/in/nisonigmatullina/ Telegram: @nisonigma

Previous episodes with Niso:
How to get an O1 visa in the US? How to improve publication quality and boost your chances? https://youtu.be/S_IXFDm8sIg
How Russian-speaking immigrant women from Forbes are conquering America. Relocation, networking, and life in the USA https://youtu.be/svZjlIoyHEk
***
Book a career consultation (resume, LinkedIn, career strategy, US job search): https://annanaumova.com
Coaching (impostor syndrome, procrastination, self-doubt, fears, laziness): https://annanaumova.notion.site/3f6ea5ce89694c93afb1156df3c903ab
Online course "The Perfect Resume and Job Search in the USA": https://go.mbastrategy.com/resumecoursemain
Guide "The Perfect American Resume": https://go.mbastrategy.com/usresume
Guide "How to set up your LinkedIn profile so recruiters can't pass you by": https://go.mbastrategy.com/linkedinguide
My Telegram channel: https://t.me/prodcastUSA
My Instagram: https://www.instagram.com/prodcast.us/
Prodcast on social media and on all podcast platforms: https://linktr.ee/prodcastUS
⏰ Timecodes ⏰
00:00 Intro
11:47 Why do you need PR and publications for US talent visas?
21:02 What are the article requirements for O1, EB1, and EB2 NIW? Similarities and differences.
28:47 What are the criteria for outlets?
43:00 What are the content requirements for publications?
53:02 Can you write articles with ChatGPT?
1:03:12 What if I'm not a public person, I'm an introvert, and I have no publications?
1:11:42 How much do media articles cost?
1:24:23 Who doesn't need a PR agency?
1:27:44 Mistakes when working on publications
1:31:22 What would you wish those who have decided to move to the US on a talent visa?
In this episode my guest is Dima Litvinov, founder of Dreem Relocation Platform, which helps entrepreneurs, IT specialists, and creatives move to the US on O-1 and EB1 visas.

We discussed in detail how self-sponsored relocation through opening your own US company works: who can be the petitioner, what documents are needed, and what a case looks like when you hire yourself. We covered the nuances of the agent-based approach, the differences between O-1, L-1, and H-1B visas, the possibility of getting a green card after the move, and how to avoid a denial at renewal. We went through real cases, from founders and consultants to developers who could not find an employer but managed to relocate themselves through B2B contracts. We also looked at why relocating through your own company in 2024 is one of the most accessible and fastest paths for those ready to take the process into their own hands.

Dima Litvinov - founder of Dreem Relocation Platform in the US, holder of a British Global Talent Visa in the Business category
LinkedIn: https://www.linkedin.com/in/dimalitvinov/
With promo code PRODCAST, get a free consultation with a Dreem expert and a detailed evaluation of your case by an attorney: https://idreem.pipedrive.com/scheduler/Rp3bXjFQ/your-free-dreem-us-visa-consultation-prodcast
Assess your chances of relocating to the US online with an instant result: bit.ly/3QY520h
More about visas and talent-visa relocation on Dreem's LinkedIn: https://www.linkedin.com/company/dreemrelocation/

Episodes with Dima Litvinov:
US talent visas in 2025. O1, EB1, EB2 NIW: what's new? Trump and immigration: will America close? https://youtube.com/live/i4MHQhr8An8
How a shawarma-shop owner can get an O1 or EB1 talent visa in the US: https://youtu.be/dZqaDJywBuk
Episode with Danil Kislinskiy: How to open a business (LLC, S-corp) in the US and hire yourself? https://youtu.be/CP0PofO2WEI
How to offer yourself a job in the US for an O1 talent visa? What does the petition look like? Olga Bondareva https://youtu.be/QSaDt3FmFBw
The story of how an iOS developer looked for O1 visa sponsorship in the US and how the employer rescinded the offer the day before his start date: https://youtu.be/sHDq0lA-uOY
***
Book a career consultation (resume, LinkedIn, career strategy, US job search): https://annanaumova.com
Coaching (impostor syndrome, procrastination, self-doubt, fears, laziness): https://annanaumova.notion.site/3f6ea5ce89694c93afb1156df3c903ab
Online course "The Perfect Resume and Job Search in the USA": https://go.mbastrategy.com/resumecoursemain
Guide "The Perfect American Resume": https://go.mbastrategy.com/usresume
Guide "How to set up your LinkedIn profile so recruiters can't pass you by": https://go.mbastrategy.com/linkedinguide
My Telegram channel: https://t.me/prodcastUSA
My Instagram: https://www.instagram.com/prodcast.us/
Prodcast on social media and on all podcast platforms: https://linktr.ee/prodcastUS
⏰ Timecodes ⏰
00:00 Intro
5:54 What is happening now with talent visas and green cards?
13:41 Open a company, get yourself a visa, and move. How does it work?
23:39 What are the requirements for a company to sponsor a visa?
34:01 What are the subtleties and limitations?
46:40 Example cases
58:24 About the agent and how that works
1:06:24 About the H1B visa
1:13:01 The L1 visa: who is it for?
1:18:23 Which visa to choose: O1, EB1, L1?
1:24:14 What else would you wish those trying to move to the US on talent visas?
Get your AI workspace: https://simtheory.ai
----
00:00 - Fun with Suno 4.5
09:20 - LlamaCon, Meta's Llama API, Meta AI Apps & Meta's Social AI Strategy
26:06 - How We'll Interface with AI Next Discussion
45:38 - Common Database Not Interface with AI
1:03:46 - Chris's Polymarket Bet: Which company has best AI model end of May?
1:06:07 - Daily Drivers and Model Switching: Tool Calling & MCPs with Models
1:15:04 - OpenAI's New ChatGPT Tune (GPT-4o) Reverted
1:19:53 - Chris's Daily Driver & Qwen3: Qwen3-30B-A3B
1:26:40 - Suno 4.5 Songs in Full
----
Thanks for listening, we appreciate it!
Try Simtheory: https://simtheory.ai
Join Simtheory: https://simtheory.ai
Like and sub xoxox
----
00:00 - Initial reactions to Gaggle of Model Releases
09:29 - Is this the beginning of future GPT-5 AI systems?
47:10 - GPT-4.1, o3, o4-mini model details & thoughts
58:42 - Model comparisons with lunar injection
1:03:17 - AI Rap Battle Test: o3 Diss Track "Greg's Back"
1:08:12 - Thoughts on using new models + Gemini 2.5 Pro quirks
1:10:54 - The next model test: chained tool calling & lock in
1:14:43 - OpenAI releases Codex CLI: impressions/thoughts
1:18:45 - Final thoughts & help us with crazy presentation ideas
----
Links from Discord:
- Lunar Lander: https://simulationtheory.ai/7bbfe21a-7859-4fdd-8bbf-47fdfb5cf03b
- Evolution Sim: https://simulationtheory.ai/457b047f-0ac2-4162-8d6a-3ea3fa1235c9
Join Simtheory: https://simtheory.ai
--
Get the official Simtheory hat: https://simulationtheory.ai/689e11b3-d488-4238-b9b6-82aded04fbe6
---
CHAPTERS:
00:00 - The Wrong Pendant?
02:34 - Agent2Agent Protocol, What is It? Implications and Future Agents
48:43 - Agent Development Kit (ADK)
57:50 - AI Agents Marketplace by Google Cloud
1:00:46 - Firebase Studio is very broken...
1:06:30 - Vibing with AI for everything... not just vibe code
1:15:10 - Gemini 2.5 Flash, Live API and Veo2
1:17:45 - Is Llama 4 a flop?
1:27:25 - Grok 3 API Released without vision, priced like Sonnet 3.7
---
Thanks for listening and your support!
Join Simtheory and create an AI workspace: https://simtheory.ai
----
Links from show:
DISS TRACK: https://simulationtheory.ai/2eb6408e-88f9-4b6a-ac4d-134d9dac3073
----
CHAPTERS:
00:00 - Will we make 100 episodes?
00:48 - Checking back in with Gemini 2.5 Pro
03:30 - Diss Track: Gemini 2.5 Pro
07:14 - Gemini 2.5 Pro on Polymarket
17:32 - Amazon Nova Act Computer Use: We Have Access!
29:45 - Future Interface of Work: Delegating Tasks with AI
58:03 - How We Work Today with AI Vs Future Work
----
Thanks for listening and all of your support!
With Danil Kislinskiy, an entrepreneur and consultant on US corporate business structure, we worked through the key questions for anyone who wants to open their own company in America. Step by step, we discussed who can register a business, which states and company forms to choose for different goals, how to open a bank account without violating sanctions regimes, and whether you can get a visa through your own company.

We broke down the differences between an LLC and a C-Corp, when Delaware is the better choice and when Wyoming is, and why the state of incorporation affects not only taxes but also how investors perceive you. Danil explained how banks vet your beneficial owners, why you should not even briefly visit Russia if you are in fintech, and how to prepare documents to pass compliance at Mercury, Brex, or other neobanks.

We discussed how to structure the company correctly if you plan to use it for an O1, EB1A, or even H1B visa, why your corporate and immigration lawyers should work together, and how to avoid a denial due to a conflict of interest.

This video is a practical guide for anyone who wants to run a US business remotely, legally, and with all the nuances in mind.

Danil Kislinskiy - founder of Go Global World, which connects startup founders, investors, and advisors, and himself an investor in Silicon Valley.
LinkedIn: https://www.linkedin.com/in/danilkislinskiy/
Telegram: @danilggw
GGW Silicon Valley Chat community on Telegram: https://t.me/+Ktq-ALstZ0o0YjAz
Slack: https://join.slack.com/t/goglobalworld1/shared_invite/zt-32rdaof00-NTyg3PnahDPol_~CoeFyqw
***
Book a career consultation (resume, LinkedIn, career strategy, US job search): https://annanaumova.com
Coaching (impostor syndrome, procrastination, self-doubt, fears, laziness): https://annanaumova.notion.site/3f6ea5ce89694c93afb1156df3c903ab
Online course "The Perfect Resume and Job Search in the USA": https://go.mbastrategy.com/resumecoursemain
Guide "The Perfect American Resume": https://go.mbastrategy.com/usresume
Guide "How to set up your LinkedIn profile so recruiters can't pass you by": https://go.mbastrategy.com/linkedinguide
My Telegram channel: https://t.me/prodcastUSA
My Instagram: https://www.instagram.com/prodcast.us/
Prodcast on social media and on all podcast platforms: https://linktr.ee/prodcastUS
⏰ Timecodes ⏰
00:00 Intro
17:15 Who should open a company in the US, where, and how?
35:55 What documents are needed to register a legal entity? Where do you go?
43:09 How much does it cost to open a company?
48:23 Can you open a company remotely and then maintain it?
51:04 How do a Russian passport and sanctions affect doing business in the US?
1:03:58 How to get an EIN? What is an ITIN and do foreign founders need one?
1:11:55 How to open a bank account? How to choose a bank?
1:24:39 Which visa can you apply for through your own company?
1:31:18 What else would you wish those thinking about starting a business in the US?
Guest: Alex Polyakov, CEO at Adversa AI

Topics:
- Adversa AI is known for its focus on AI red teaming and adversarial attacks. Can you share a particularly memorable red teaming exercise that exposed a surprising vulnerability in an AI system? What was the key takeaway for your team and the client?
- Beyond traditional adversarial attacks, what emerging threats in the AI security landscape are you most concerned about right now?
- What trips up clients most: classic security mistakes in AI systems, or AI-specific mistakes? Are there truly new mistakes in AI systems, or are they old mistakes in new clothing?
- I know it is not your job to fix it, but much of this is unfixable, right?
- Is it a good idea to use AI to secure AI?

Resources:
- EP84 How to Secure Artificial Intelligence (AI): Threats, Approaches, Lessons So Far
- AI Red Teaming Reasoning LLM US vs China: Jailbreak Deepseek, Qwen, O1, O3, Claude, Kimi
- Adversa AI blog
- Oops! 5 serious gen AI security mistakes to avoid
- Generative AI Fast Followership: Avoid These First Adopter Security Missteps
How Donald Trump's new presidential term will affect the IT sector.
- What has already changed for IT workers in the first six months of Trump's presidency?
- What changes should the IT industry expect over the next couple of years?
- How will the new president affect the distribution of the workforce within the states and beyond them?
- What will happen to immigrants? Will the borders close? Will US work visas become harder to get?
- What will happen to outsourcing?
- How do his big-tech advisors, like Elon Musk and Jeff Bezos, influence Trump?
- Who wins under Donald Trump's administration?
- Will it be easier to find a job under Trump?

Evgeny Volchkov, Engineering Manager at iManage (ex-Bank of America and Verizon).
LinkedIn: https://www.linkedin.com/in/valchkou/
Valery Shirokov aka Val Wide (Principal Cloud Architect and Director | DevOps | Platform Engineering | Security | Azure | Terraform | GCP | Kubernetes, ex-Microsoft, Lululemon, Ebay).
https://www.linkedin.com/in/val-wide/
Val's mentorship chat on Telegram, "[RU] Tech Mentorship": https://t.me/+8N6F-CMobZliMTBh

Video with Daria mentioned in the stream: Internships in the USA. A degree from an American university is still no guarantee of a job! Daria Skalitski https://youtu.be/p5t9LPFA5W0

Related videos:
- How will the job market and immigration policy change under Trump? U4U, H1B, O1, EB1, EB2 talent visas. Alexander Shvaikin and US immigration attorney Semyon Gladin. https://youtube.com/live/qm3HpXlad-c
- US talent visas in 2025. O1, EB1, EB2 NIW: what's new? Trump and immigration: will America close? Dima Litvinov, founder of Dreem Relocation Platform. https://youtube.com/live/i4MHQhr8An8
***
Book a career consultation (resume, LinkedIn, career strategy, US job search): https://annanaumova.com
Coaching (impostor syndrome, procrastination, self-doubt, fears, laziness): https://annanaumova.notion.site/3f6ea5ce89694c93afb1156df3c903ab
Video course on writing a resume for international companies, "The Perfect American Resume": https://go.mbastrategy.com/resumecoursemain
Guide "The Perfect American Resume": https://go.mbastrategy.com/usresume
Subscribe to my Telegram channel: https://t.me/prodcastUSA
Subscribe to my Instagram: https://www.instagram.com/prodcast.us
Guide "How to set up your LinkedIn profile so recruiters can't pass you by": https://go.mbastrategy.com/linkedinguide
⏰ Timecodes ⏰
11:09 Trump's policies and their impact on IT
26:44 Why did Trump choose this team?
34:20 Immigration under Trump
49:46 Questions from the chat
1:02:20 What will happen to outsourcing?
1:09:36 Predictions for the future
Create a Simtheory workspace: https://simtheory.ai
Compare models: https://simtheory.ai/models/
------
3d City Planner App (Example from show): https://simulationtheory.ai/8cfa6102-ed37-4c47-bc73-d057ba9873bd
------
CHAPTERS:
00:00 - AI Fashion
01:13 - Gemini 2.5 Pro Initial Impressions: We're Impressed!
38:24 - Thoughts on Gemini distribution and our daily workflows
55:49 - OpenAI's GPT-4o Image Generation: thoughts & examples
1:13:52 - Gemini 2.5 Pro Boom Factor
1:18:38 - Average rant on vibe coding and the future of AI tooling
------
Disclaimer: this video was not sponsored by Google... it's a joke.
Thanks for listening!
Create an AI workspace on Simtheory: https://simtheory.ai
---
Song: https://simulationtheory.ai/f6d643e4-4201-475c-aa82-8a96b6b3b215
---
CHAPTERS:
00:00 - OpenAI's audio model updates: gpt-4o-transcribe, gpt-4o-mini-tts
18:39 - Strategy of AI Labs with Agent SDKs and Model "stacks" and limitations of voice
25:28 - Cost of models, GPT-4.5, o1-pro api release thoughts
31:57 - o1-pro "I am rich" track & Chris's o1-pro PR stunt realization, more thoughts on o1 family
48:39 - Moore's Law for AI agents, current AI workflows and future enterprise agent workflows & AI agent job losses
1:24:09 - Can we control agents?
1:29:21 - Final thoughts for the week
1:35:15 - Full "I am rich" o1-pro track
---
See you next week and thanks for your support.
CORRECTION: Kosciuszko is obviously not an Aboriginal name; I misspoke. Wagga Wagga and the other names in the voice clip are, and they're great ways to test AI text-to-speech models!
This episode's guest is Sergey Golitsyn, a Software Engineer and founder of FaangTalk, a community for technical-interview preparation. In this episode we discussed how to look for a job in the US on an O1 visa, when and how to talk to an employer about sponsorship, and which mistakes can cost you the offer. Sergey shared his experience of relocating, getting an offer at an American startup, an unexpected layoff, and searching for a job again during a downturn. We covered how to submit resumes effectively, what to do if your visa is denied, and how to build a job-search strategy that ultimately lands an offer at a large company.

Sergey Golitsyn - Software Engineer and founder of FaangTalk, a community for preparing for interviews at FAANG-like companies
LinkedIn: https://www.linkedin.com/in/sergei-golitsyn/
YouTube: https://youtube.com/@faangtalk
Telegram channel: https://t.me/crack_code_interview
Telegram chat: https://t.me/faangtalk

Links mentioned in the video:
https://simplify.jobs/
https://resumeworded.com/resume-scanner
https://www.tryexponent.com/
https://www.pramp.com/
***
Book a career consultation (resume, LinkedIn, career strategy, US job search): https://annanaumova.com
Coaching (impostor syndrome, procrastination, self-doubt, fears, laziness): https://annanaumova.notion.site/3f6ea5ce89694c93afb1156df3c903ab
Online course "The Perfect Resume and Job Search in the USA": https://go.mbastrategy.com/resumecoursemain
Guide "The Perfect American Resume": https://go.mbastrategy.com/usresume
Guide "How to set up your LinkedIn profile so recruiters can't pass you by": https://go.mbastrategy.com/linkedinguide
My Telegram channel: https://t.me/prodcastUSA
My Instagram: https://www.instagram.com/prodcast.us/
Prodcast on social media and on all podcast platforms: https://linktr.ee/prodcastUS
⏰ Timecodes ⏰
00:00 Intro
9:09 Visa sponsorship: what do you say in an interview?
13:00 How were you laid off from your first job in the States?
19:11 How soon after the layoff did you start job hunting?
22:12 Where and how did you apply?
25:50 How did you adapt your resume?
44:06 How did the calls with recruiters go?
50:48 How did recruiters react to your visa status?
55:57 How did the technical interviews (Leetcode) go?
1:04:51 How many offers did you get? How did you negotiate?
1:09:14 How and why was an offer rescinded?
1:14:12 The new O1 visa and winning the green card lottery
1:23:09 How did you job-hunt from Bishkek (Kyrgyzstan)?
1:29:13 What are your plans for the future?
1:31:22 What would you wish those looking for a job in the US right now?
We return from the Wilds and the plague to bring you an all-new episode! We catch up on the games we've finished including Avowed, Split Fiction, and almost Kingdom Come: Deliverance 2. Praise Kojima, become very interested in Silent Hill, realize we're old as Chrono Trigger celebrates 30 years, and become vulnerable with an AI voice!
0:00 - Intro
1:02 - Laundry
8:30 - Tub grub investments
12:00 - Finishing games
13:50 - Avowed
34:00 - Death Stranding 2
41:40 - Silent Hill f
45:30 - Split Fiction
1:05:00 - Chrono Trigger turns 30
1:08:00 - Assassin's Creed Shadows
1:16:00 - Clair Obscur: Expedition 33
1:28:00 - R.E.P.O.
1:41:30 - Pirate Yakuza
1:54:00 - Monster Hunter Wilds
2:08:00 - Steam Next Fest demos
2:23:00 - Core Keeper
2:26:20 - Twitch partners with StreamElements
2:33:00 - Maya the AI
2:40:00 - Twitch Mobile app changes
2:50:40 - Shoutouts
See omnystudio.com/listener for privacy information.
Join Simtheory: https://simtheory.ai
----
CHAPTERS:
00:00 - Gemini Flash 2.0 Experimental Native Image Generation & Editing
27:55 - Thoughts on OpenAI's "New tools for building agents" announcement
43:31 - Why is everyone talking about MCP all of a sudden?
56:31 - Manus AI: Will Manus Invade the USA and Defeat it With Powerful AGI? (jokes)
----
Thanks for all of your support and listening!
Send Everyday AI and Jordan a text message
Limits?
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “S1: Simple Test-Time Scaling.” We explore the motivations behind S1, as well as how it compares to OpenAI's O1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well as S1's data curation process, its training recipe, and its use of model distillation from Google Gemini and DeepSeek R1. We explore the novel "budget forcing" technique developed in the paper, allowing it to think longer for harder problems and optimize test-time compute for better performance. Additionally, we cover the evaluation benchmarks used, the comparison between supervised fine-tuning and reinforcement learning, and similar projects like the Hugging Face Open R1 project. Finally, we discuss the open-sourcing of S1 and its future directions. The complete show notes for this episode can be found at https://twimlai.com/go/721.
The AI Breakdown: Daily Artificial Intelligence News and Discussions
OpenAI has officially launched GPT-4.5, but it's not the model most people expected. While it lags behind reasoning-focused models like o1 and DeepSeek's R1, it shines in creativity, writing, and emotional intelligence. Sam Altman calls it the first model that “feels like talking to a thoughtful person.” But with high API costs and limited reasoning improvements, who is GPT-4.5 actually for? Before that in the headlines, AI is growing faster than SaaS ever did.
Brought to you by:
KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions.
Vanta - Simplify compliance - https://vanta.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Join Simtheory to try GPT-4.5: https://simtheory.ai
Dis Track: https://simulationtheory.ai/5714654f-0fbe-496f-8428-20018457c4c7
===
CHAPTERS:
00:00 - Reaction to GPT-4.5 Live Stream + Release
12:45 - Claude 3.7 Sonnet Release: Reactions and First Week Impressions
45:58 - Claude 3.7 Sonnet Dis Track Test
56:10 - Claude Code First Impressions + Future Agent Workflows
1:15:45 - Chris's Veo2 Film Clip
1:24:49 - Alexa+ AI Assistant
1:34:05 - Claude 3.7 Sonnet BOOM FACTOR
Today's episode is with Paul Klein, founder of Browserbase. We talked about building browser infrastructure for AI agents, the future of agent authentication, and their open source framework Stagehand.
* [00:00:00] Introductions
* [00:04:46] AI-specific challenges in browser infrastructure
* [00:07:05] Multimodality in AI-Powered Browsing
* [00:12:26] Running headless browsers at scale
* [00:18:46] Geolocation when proxying
* [00:21:25] CAPTCHAs and Agent Auth
* [00:28:21] Building “User take over” functionality
* [00:33:43] Stagehand: AI web browsing framework
* [00:38:58] OpenAI's Operator and computer use agents
* [00:44:44] Surprising use cases of Browserbase
* [00:47:18] Future of browser automation and market competition
* [00:53:11] Being a solo founder
Transcript
Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.swyx [00:00:12]: Hey, and today we are very blessed to have our friend, Paul Klein the fourth, CEO of Browserbase. Welcome.Paul [00:00:21]: Thanks guys. Yeah, I'm happy to be here. I've been lucky to know both of you for like a couple of years now, I think. So it's just like we're hanging out, you know, with three ginormous microphones in front of our face. It's totally normal hangout.swyx [00:00:34]: Yeah. We've actually mentioned you on the podcast, I think, more often than any other Solaris tenant. Just because like you're one of the, you know, best performing, I think, LLM tool companies that have started up in the last couple of years.Paul [00:00:50]: Yeah, I mean, it's been a whirlwind of a year, like Browserbase is actually pretty close to our first birthday. So we are one year old. And going from, you know, starting a company as a solo founder to... 
To, you know, having a team of 20 people, you know, a series A, but also being able to support hundreds of AI companies that are building AI applications that go out and automate the web. It's just been like, really cool. It's been happening a little too fast. I think like collectively as an AI industry, let's just take a week off together. I took my first vacation actually two weeks ago, and Operator came out on the first day, and then a week later, DeepSeek came out. And I'm like on vacation trying to chill. I'm like, we got to build with this stuff, right? So it's been a breakneck year. But I'm super happy to be here and like talk more about all the stuff we're seeing. And I'd love to hear kind of what you guys are excited about too, and share with it, you know?swyx [00:01:39]: Where to start? So people, you've done a bunch of podcasts. I think I strongly recommend Jack Bridger's Scaling DevTools, as well as Turner Novak's The Peel. And, you know, I'm sure there's others. So you covered your Twilio story in the past, talked about StreamClub, you got acquired to Mux, and then you left to start Browserbase. So maybe we just start with what is Browserbase? Yeah.Paul [00:02:02]: Browserbase is the web browser for your AI. We're building headless browser infrastructure, which are browsers that run in a server environment that's accessible to developers via APIs and SDKs. It's really hard to run a web browser in the cloud. You guys are probably running Chrome on your computers, and that's using a lot of resources, right? So if you want to run a web browser or thousands of web browsers, you can't just spin up a bunch of lambdas. You actually need to use a secure containerized environment. You have to scale it up and down. It's a stateful system. And that infrastructure is, like, super painful. And I know that firsthand, because at my last company, StreamClub, I was CTO, and I was building our own internal headless browser infrastructure. 
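The "browsers accessible via APIs and SDKs" model Paul describes usually comes down to pointing an automation framework at a remote browser over the Chrome DevTools Protocol instead of a locally launched one. A minimal sketch, assuming a hypothetical endpoint format and `connect_url` helper (this is not Browserbase's actual API):

```python
# Sketch: connecting Playwright to a remote headless browser over CDP.
# The wss:// endpoint format and connect_url helper below are hypothetical,
# not Browserbase's real API.

def connect_url(api_key: str, region: str = "us-west") -> str:
    """Build a WebSocket URL for a hypothetical remote-browser service."""
    return f"wss://browsers.example.com/{region}?apiKey={api_key}"

# With Playwright installed, the connection itself would look like:
#
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       browser = p.chromium.connect_over_cdp(connect_url("my-key"))
#       page = browser.contexts[0].pages[0]
#       page.goto("https://example.com")

print(connect_url("my-key"))
```

The point is that the script itself is unchanged from local development; only the connection step swaps a local launch for a remote WebSocket.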
That's actually why we sold the company, is because Mux really wanted to buy our headless browser infrastructure that we'd built. And it's just a super hard problem. And I actually told my co-founders, I would never start another company unless it was a browser infrastructure company. And it turns out that's really necessary in the age of AI, when AI can actually go out and interact with websites, click on buttons, fill in forms. You need AI to do all of that work in an actual browser running somewhere on a server. And BrowserBase powers that.swyx [00:03:08]: While you're talking about it, it occurred to me, not that you're going to be acquired or anything, but it occurred to me that it would be really funny if you became the Nikita Bier of headless browser companies. You just have one trick, and you make browser companies that get acquired.Paul [00:03:23]: I truly do only have one trick. I'm screwed if it's not for headless browsers. I'm not a Go programmer. You know, I'm in AI Grant. You know, Browserbase is in AI Grant. But we were the only company in that AI Grant batch that used zero dollars on AI spend. You know, we're purely an infrastructure company. So as much as people want to ask me about reinforcement learning, I might not be the best guy to talk about that. But if you want to ask about headless browser infrastructure at scale, I can talk your ear off. So that's really my area of expertise. And it's a pretty niche thing. Like, nobody has done what we're doing at scale before. So we're happy to be the experts.swyx [00:03:59]: You do have an AI thing, Stagehand. We can talk about the sort of core of Browserbase first, and then maybe Stagehand. Yeah, Stagehand is kind of the web browsing framework. Yeah.What is Browserbase? Headless Browser Infrastructure ExplainedAlessio [00:04:10]: Yeah. Yeah. And maybe how you got to Browserbase and what problems you saw. So one of the first things I worked on as a software engineer was integration testing. 
Sauce Labs was kind of like the main thing at the time. And then we had Selenium, we had Playwright, we had all these different browser things. But it's always been super hard to do. So obviously you've worked on this before. When you started Browserbase, what were the challenges? What were the AI-specific challenges that you saw versus, there's kind of like all the usual running browser at scale in the cloud, which has been a problem for years. What are like the AI unique things that you saw that like traditional approaches just didn't cover? Yeah.AI-specific challenges in browser infrastructurePaul [00:04:46]: First and foremost, I think back to like the first thing I did as a developer, like as a kid when I was writing code, I wanted to write code that did stuff for me. You know, I wanted to write code to automate my life. And I do that probably by using curl or beautiful soup to fetch data from a web page. And I think I still do that now that I'm in the cloud. And the other thing that I think is a huge challenge for me is that you can't just scrape a website and parse that data. And we all know that now like, you know, taking HTML and plugging that into an LLM, you can extract insights, you can summarize. So it was very clear that now like dynamic web scraping became very possible with the rise of large language models or a lot easier. And that was like a clear reason why there's been more usage of headless browsers, which are necessary because a lot of modern websites don't expose all of their page content via a simple HTTP request. You know, they actually do require you to run JavaScript on the page to hydrate it. Airbnb is a great example. You go to airbnb.com. A lot of that content on the page isn't there until after they run the initial hydration. So you can't just scrape it with a curl. You need to have some JavaScript run. 
And a browser is that JavaScript engine that's going to actually run all those requests on the page. So web data retrieval was definitely one driver of starting BrowserBase and the rise of being able to summarize that within LLM. Also, I was familiar with if I wanted to automate a website, I could write one script and that would work for one website. It was very static and deterministic. But the web is non-deterministic. The web is always changing. And until we had LLMs, there was no way to write scripts that you could write once that would run on any website. That would change with the structure of the website. Click the login button. It could mean something different on many different websites. And LLMs allow us to generate code on the fly to actually control that. So I think that rise of writing the generic automation scripts that can work on many different websites, to me, made it clear that browsers are going to be a lot more useful because now you can automate a lot more things without writing each script by hand. If you wanted to write a script to book a demo call on 100 websites, previously, you had to write 100 scripts. Now you write one script that uses LLMs to generate that script. That's why we built our web browsing framework, StageHand, which does a lot of that work for you. But those two things, web data collection and then enhanced automation of many different websites, it just felt like big drivers for more browser infrastructure that would be required to power these kinds of features.Alessio [00:07:05]: And was multimodality also a big thing?Paul [00:07:08]: Now you can use the LLMs to look, even though the text in the DOM might not be as friendly. Maybe my hot take is I was always kind of like, I didn't think vision would be as big of a driver. For UI automation, I felt like, you know, HTML is structured text and large language models are good with structured text. 
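The "click the login button could mean something different on every site" point can be sketched in a few lines: instead of hard-coding a selector per site, rank the page's clickable elements by how well their visible label matches the instruction. Stagehand uses an LLM for this step; a trivial word-overlap scorer stands in for it here so the sketch stays self-contained (the element dictionaries are invented):

```python
# A trivial stand-in for the LLM step Paul describes: rank a page's clickable
# elements by word overlap with the instruction, instead of hard-coding a
# selector per site. (Stagehand uses an LLM here; this shows only the shape.)
def find_action_target(instruction: str, elements: list[dict]) -> dict:
    """Pick the element whose visible label best matches the instruction."""
    words = set(instruction.lower().split())
    return max(elements, key=lambda el: len(words & set(el["label"].lower().split())))

# The same instruction resolves against two differently built pages:
site_a = [{"label": "Sign in", "selector": "#auth"},
          {"label": "Pricing", "selector": "#pricing"}]
site_b = [{"label": "Log in to your account", "selector": ".login-btn"},
          {"label": "Contact sales", "selector": ".contact"}]

print(find_action_target("log in", site_a)["selector"])   # "#auth"
print(find_action_target("log in", site_b)["selector"])   # ".login-btn"
```

One generic resolver replaces the hundred site-specific scripts; swapping the scorer for an LLM call is what makes it robust to arbitrary markup.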
But it's clear that these computer use models are often vision driven, and they've been really pushing things forward. So definitely being multimodal, like rendering the page is required to take a screenshot to give that to a computer use model to take actions on a website. And it's just another win for browser. But I'll be honest, that wasn't what I was thinking early on. I didn't even think that we'd get here so fast with multimodality. I think we're going to have to get back to multimodal and vision models.swyx [00:07:50]: This is one of those things where I forgot to mention in my intro that I'm an investor in Browserbase. And I remember that when you pitched to me, like a lot of the stuff that we have today, we like wasn't on the original conversation. But I did have my original thesis was something that we've talked about on the podcast before, which is take the GPT store, the custom GPT store, where every single checkbox and plugin is effectively a startup. And this was the browser one. I think the main hesitation, I think I actually took a while to get back to you. The main hesitation was that there were others. Like you're not the first headless browser startup. It's not even your first headless browser startup. There's always a question of like, will you be the category winner in a place where there's a bunch of incumbents, to be honest, that are bigger than you? They're just not targeted at the AI space. They don't have the backing of Nat Friedman. And there's a bunch of like, you're here in Silicon Valley. They're not. I don't know.Paul [00:08:47]: I don't know if that's, that was it, but like, there was a, yeah, I mean, like, I think I tried all the other ones and I was like, really disappointed. Like my background is from working at great developer tools companies, and nothing had like the Vercel like experience. Um, like our biggest competitor actually is partly owned by private equity and they just jacked up their prices quite a bit. 
And the dashboard hasn't changed in five years. And I actually used them at my last company and tried them and I was like, oh man, like there really just needs to be something that's like the experience of these great infrastructure companies, like Stripe, like Clerk, like Vercel that I use and love, but oriented towards this kind of like more specific category, which is browser infrastructure, which is really technically complex. Like a lot of stuff can go wrong on the internet when you're running a browser. The internet is very vast. There's a lot of different configurations. Like there's still websites that only work with Internet Explorer out there. How do you handle that when you're running your own browser infrastructure? These are the problems that we have to think about and solve at BrowserBase. And it's, it's certainly a labor of love, but I built this for me, first and foremost, I know it's super cheesy and everyone says that for like their startups, but it really, truly was for me. If you look at like the talks I've done even before BrowserBase, and I'm just like really excited to try and build a category defining infrastructure company. And it's, it's rare to have a new category of infrastructure exist. We're here in the Chroma offices and like, you know, vector databases is a new category of infrastructure. Is it, is it, I mean, we can, we're in their office, so, you know, we can, we can debate that one later. That is one.Multimodality in AI-Powered Browsingswyx [00:10:16]: That's one of the industry debates.Paul [00:10:17]: I guess we go back to the LLMOS talk that Karpathy gave way long ago. And like the browser box was very clearly there and it seemed like the people who were building in this space also agreed that browsers are a core primitive of infrastructure for the LLMOS that's going to exist in the future. And nobody was building something there that I wanted to use. So I had to go build it myself.swyx [00:10:38]: Yeah. 
I mean, exactly that talk that, that honestly, that diagram, every box is a startup and there's the code box and then there's the browser box. I think at some point they will start clashing there. There's always the question of the, are you a point solution or are you the sort of all in one? And I think the point solutions tend to win quickly, but then the all-in-ones have a very tight cohesive experience. Yeah. Let's talk about just the hard problems of Browserbase you have on your website, which is beautiful. Thank you. Was there an agency that you used for that? Yeah. Herve.paris.Paul [00:11:11]: They're amazing. Herve.paris. Yeah. It's H-E-R-V-E. I highly recommend for developers. Developer tools, founders to work with consumer agencies because they end up building beautiful things and the Parisians know how to build beautiful interfaces. So I got to give props.swyx [00:11:24]: And chat apps, apparently are, they are very fast. Oh yeah. The Mistral chat. Yeah. Mistral. Yeah.Paul [00:11:31]: Le Chat.swyx [00:11:31]: Le Chat. And then your videos as well, it was professionally shot, right? The series A video. Yeah.Alessio [00:11:36]: Nico did the videos. He's amazing. Not the initial video that you shot at the new one. First one was Austin.Paul [00:11:41]: Another, another video pretty surprised. But yeah, I mean, like, I think when you think about how you talk about your company. You have to think about the way you present yourself. It's, you know, as a developer, you think you evaluate a company based on like the API reliability and the P95, but a lot of developers say, is the website good? Is the message clear? Do I like trust this founder I'm building my whole feature on? So I've tried to nail that as well as like the reliability of the infrastructure. You're right. It's very hard. And there's a lot of kind of foot guns that you run into when running headless browsers at scale. 
Right.Competing with Existing Headless Browser Solutionsswyx [00:12:10]: So let's pick one. You have eight features here. Seamless integration. Scalability. Fast or speed. Secure. Observable. Stealth. That's interesting. Extensible and developer first. What comes to your mind as like the top two, three hardest ones? Yeah.Running headless browsers at scalePaul [00:12:26]: I think just running headless browsers at scale is like the hardest one. And maybe can I nerd out for a second? Is that okay? I heard this is a technical audience, so I'll talk to the other nerds. Whoa. They were listening. Yeah. They're upset. They're ready. The AGI is angry. Okay. So. So how do you run a browser in the cloud? Let's start with that, right? So let's say you're using a popular browser automation framework like Puppeteer, Playwright, and Selenium. Maybe you've written some code locally on your computer that opens up Google. It finds the search bar and then types in, you know, search for Latent Space and hits the search button. That script works great locally. You can see the little browser open up. You want to take that to production. You want to run the script in a cloud environment so that even when your laptop is closed, the browser is still doing something. Well, we use Amazon. You know, the first thing I'd reach for is probably like some sort of serverless infrastructure. I would probably try and deploy on a Lambda. But Chrome itself is too big to run on a Lambda. It's over 250 megabytes. So you can't easily start it on a Lambda. So you maybe have to use something like Lambda layers to squeeze it in there. Maybe use a different Chromium build that's lighter. And you get it on the Lambda. Great. It works. But it runs super slowly. It's because Lambdas are very like resource limited. They only run like with one vCPU. You can run one process at a time. Remember, Chromium is super beefy. 
It's barely running on my MacBook Air. I'm still downloading it from a pre-run. Yeah, from the test earlier, right? I'm joking. But it's big, you know? So like Lambda, it just won't work really well. Maybe it'll work, but you need something faster. Your users want something faster. Okay. Well, let's put it on a beefier instance. Let's get an EC2 server running. Let's throw Chromium on there. Great. Okay. I can, that works well with one user. But what if I want to run like 10 Chromium instances, one for each of my users? Okay. Well, I might need two EC2 instances. Maybe 10. All of a sudden, you have multiple EC2 instances. This sounds like a problem for Kubernetes and Docker, right? Now, all of a sudden, you're using ECS or EKS, the Kubernetes or container solutions by Amazon. You're spinning up and down containers, and you're spending a whole engineer's time on kind of maintaining this stateful distributed system. Those are some of the worst systems to run because when it's a stateful distributed system, it means that you are bound by the connections to that thing. You have to keep the browser open while someone is working with it, right? That's just a painful architecture to run. And there's all this other little gotchas with Chromium, like Chromium, which is the open source version of Chrome, by the way. You have to install all these fonts. You want emojis working in your browsers because your vision model is looking for the emoji. You need to make sure you have the emoji fonts. You need to make sure you have all the right extensions configured, like, oh, do you want ad blocking? How do you configure that? How do you actually record all these browser sessions? Like it's a headless browser. You can't look at it. So you need to have some sort of observability. Maybe you're recording videos and storing those somewhere. 
It all kind of adds up to be this just giant monster piece of your project when all you wanted to do was run a lot of browsers in production for this little script to go to google.com and search. And when I see a complex distributed system, I see an opportunity to build a great infrastructure company. And we really abstract that away with Browserbase where our customers can use these existing frameworks, Playwright, Puppeteer, Selenium, or our own Stagehand and connect to our browsers in a serverless-like way. And control them, and then just disconnect when they're done. And they don't have to think about the complex distributed system behind all of that. They just get a browser running anywhere, anytime. Really easy to connect to.swyx [00:15:55]: I'm sure you have questions. My standard question with anything, so essentially you're a serverless browser company, and there's been other serverless things that I'm familiar with in the past, serverless GPUs, serverless website hosting. That's where I come from with Netlify. One question is just like, you promised to spin up thousands of servers. You promised to spin up thousands of browsers in milliseconds. I feel like there's no real solution that does that yet. And I'm just kind of curious how. The only solution I know, which is to kind of keep a kind of warm pool of servers around, which is expensive, but maybe not so expensive because it's just CPUs. So I'm just like, you know. Yeah.Browsers as a Core Primitive in AI InfrastructurePaul [00:16:36]: You nailed it, right? I mean, how do you offer a serverless-like experience with something that is clearly not serverless, right? And the answer is, you need to be able to run... We run many browsers on single nodes. We use Kubernetes at Browserbase. So we have many pods that are being scheduled. We have to predictably schedule them up or down. Yes, thousands of browsers in milliseconds is the best case scenario. 
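The warm-pool trade-off swyx and Paul are circling can be sketched in a few lines: keep N browsers pre-started so acquisition is instant until the pool is drained, then fall back to a cold start. This is purely illustrative, not Browserbase's actual scheduler:

```python
# Toy warm-pool allocator: pre-started browsers are handed out instantly;
# once the pool is drained, callers pay a cold start. Not Browserbase's
# real scheduler, just the shape of the trade-off.
from collections import deque

class WarmPool:
    def __init__(self, size: int):
        # Pretend each entry is a pre-launched browser session.
        self.pool = deque(f"warm-{i}" for i in range(size))
        self.cold_starts = 0

    def acquire(self) -> str:
        if self.pool:                      # fast path: already running
            return self.pool.popleft()
        self.cold_starts += 1              # slow path: launch on demand
        return f"cold-{self.cold_starts}"

    def release(self, session: str) -> None:
        # Recycle the slot so the pool refills for the next caller.
        self.pool.append(session)

pool = WarmPool(size=2)
sessions = [pool.acquire() for _ in range(3)]
print(sessions)            # ['warm-0', 'warm-1', 'cold-1']
print(pool.cold_starts)    # 1
```

The knobs Paul mentions later (fast startup versus low cost) are exactly the pool size and how aggressively it is refilled.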
If you hit us with 10,000 requests, you may hit a slower cold start, right? So we've done a lot of work on predictive scaling and being able to kind of route stuff to different regions where we have multiple regions of Browserbase where we have different pools available. You can also pick the region you want to go to based on like lower latency, round-trip time latency. It's very important with these types of things. There's a lot of requests going over the wire. So for us, like having a VM like Firecracker powering everything under the hood allows us to be super nimble and spin things up or down really quickly with strong multi-tenancy. But in the end, this is like the complex infrastructural challenges that we have to kind of deal with at Browserbase. And we have a lot more stuff on our roadmap to allow customers to have more levers to pull to exchange: do you want really fast browser startup times or do you want really low costs? And if you're willing to be more flexible on that, we may be able to kind of like work better for your use cases.swyx [00:17:44]: Since you used Firecracker, shouldn't Fargate do that for you or did you have to go lower level than that? We had to go lower level than that.Paul [00:17:51]: I find this a lot with Fargate customers, which is alarming for Fargate. We used to be a giant Fargate customer. Actually, the first version of Browserbase was ECS and Fargate. And unfortunately, it's a great product. I think we were actually the largest Fargate customer in our region for a little while. No, what? Yeah, seriously. And unfortunately, it's a great product, but I think if you're an infrastructure company, you actually have to have a deeper level of control over these primitives. I think the same thing is true with databases. We've used other database providers and I think-swyx [00:18:21]: Yeah, serverless Postgres.Paul [00:18:23]: Shocker. When you're an infrastructure company, you're on the hook if any provider has an outage. 
And I can't tell my customers like, hey, we went down because so-and-so went down. That's not acceptable. So for us, we've really moved to bringing things internally. It's kind of opposite of what we preach. We tell our customers, don't build this in-house, but then we're like, we build a lot of stuff in-house. But I think it just really depends on what is in the critical path. We try and have deep ownership of that.Alessio [00:18:46]: On the distributed location side, how does that work for the web where you might get sort of different content in different locations, but the customer is expecting, you know, if you're in the US, I'm expecting the US version. But if you're spinning up my browser in France, I might get the French version. Yeah.Paul [00:19:02]: Yeah. That's a good question. Well, generally, like on the localization, there is a thing called locale in the browser. You can set like what your locale is. If you're like in the en-US locale or not, but some things do IP-based routing. And in that case, you may want to have a proxy. Like let's say you're running something in the, in Europe, but you want to make sure you're showing up from the US. You may want to use one of our proxy features so you can turn on proxies to say like, make sure these connections always come from the United States, which is necessary too, because when you're browsing the web, you're coming from like a, you know, data center IP, and that can make things a lot harder to browse the web. So we do have kind of like this proxy super network. Yeah. We have a proxy for you based on where you're going, so you can reliably automate the web. But if you get scheduled in Europe, that doesn't happen as much. We try and schedule you as close to, you know, your origin that you're trying to go to. But generally you have control over the regions you can put your browsers in. So you can specify West one or East one or Europe. We only have one region of Europe right now, actually. 
Yeah.Alessio [00:19:55]: What's harder, the browser or the proxy? I feel like to me, it feels like actually proxying reliably at scale is much harder than spinning up browsers at scale. I'm curious. It's all hard.Paul [00:20:06]: It's layers of hard, right? Yeah. I think it's different levels of hard. I think the thing with the proxy infrastructure is that we work with many different web proxy providers and some are better than others. Some have good days, some have bad days. And our customers who've built browser infrastructure on their own, they have to go and deal with sketchy actors. Like first they figure out their own browser infrastructure and then they got to go buy a proxy. And it's like you can pay in Bitcoin and it just kind of feels a little sus, right? It's like you're buying drugs when you're trying to get a proxy online. We have like deep relationships with these counterparties. We're able to audit them and say, is this proxy being sourced ethically? Like it's not running on someone's TV somewhere. Is it free range? Yeah. Free range organic proxies, right? Right. We do a level of diligence. We're SOC 2. So we have to understand what is going on here. But then we're able to make sure that like we route around proxy providers not working. There's proxy providers who will just, the proxy will stop working all of a sudden. And then if you don't have redundant proxying on your own browsers, that's hard down for you or you may get some serious impacts there. With us, like we intelligently know, hey, this proxy is not working. Let's go to this one. And you can kind of build a network of multiple providers to really guarantee the best uptime for our customers. Yeah. So you don't own any proxies? We don't own any proxies. You're right. The team has been saying who wants to like take home a little proxy server, but not yet. We're not there yet. You know?swyx [00:21:25]: It's a very mature market. I don't think you should build that yourself. 
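The "this proxy is not working, let's go to this one" behavior Paul describes is, at its core, ordered failover across providers with health tracking. A toy sketch (the provider names and simulated outage are invented for illustration):

```python
# Toy proxy failover: try providers in order, mark failures, and route
# subsequent requests around providers known to be down. Provider names
# and the failure pattern are invented.
class ProxyRouter:
    def __init__(self, providers):
        self.healthy = {p: True for p in providers}
        self.order = list(providers)

    def fetch(self, url, attempt):
        for provider in self.order:
            if not self.healthy[provider]:
                continue                        # skip providers marked down
            try:
                return attempt(provider, url)
            except ConnectionError:
                self.healthy[provider] = False  # remember the outage
        raise RuntimeError("all proxy providers are down")

def flaky(provider, url):
    # Simulate provider-a having an outage.
    if provider == "provider-a":
        raise ConnectionError
    return f"{url} via {provider}"

router = ProxyRouter(["provider-a", "provider-b"])
print(router.fetch("https://example.com", flaky))  # fails over to provider-b
print(router.healthy["provider-a"])                # False: now routed around
```

A production version would add health-check probes and recovery, but the redundancy point is the same: one provider's bad day should not be your customer's outage.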
Like you should just be a super customer of them. Yeah. Scraping, I think, is the main use case for that. I guess. Well, that leads us into CAPTCHAs and also auth, but let's talk about CAPTCHAs. You had a little spiel that you wanted to talk about CAPTCHA stuff.Challenges of Scaling Browser InfrastructurePaul [00:21:43]: Oh, yeah. I was just, I think a lot of people ask, if you're thinking about proxies, you're thinking about CAPTCHAs too. I think it's the same thing. You can go buy CAPTCHA solvers online, but it's the same buying experience. It's some sketchy website, you have to integrate it, it's not fun to buy these things, you can't really trust them, and the docs are bad. What Browserbase does is we integrate a bunch of different CAPTCHAs. We do some stuff in-house, but generally we just integrate with a bunch of known vendors and continually monitor and maintain these things and say, is this working or not? Can we route around it or not? These are CAPTCHA solvers. CAPTCHA solvers, yeah. Not CAPTCHA providers, CAPTCHA solvers. Yeah, sorry. CAPTCHA solvers. We really try and make sure all of that works for you. I think as a dev, if I'm buying infrastructure, I want it all to work all the time and it's important for us to provide that experience by making sure everything does work and monitoring it on our own. Yeah. Right now, the world of CAPTCHAs is tricky. I think AI agents in particular are very much ahead of the internet infrastructure. CAPTCHAs are designed to block all types of bots, but there are now good bots and bad bots. I think in the future, CAPTCHAs will be able to identify who a good bot is, hopefully via some sort of KYC. For us, we've been very lucky. We have very little to no known abuse of Browserbase because we really look into who we work with. And for certain types of CAPTCHA solving, we only allow them on certain types of plans because we want to make sure that we can know what people are doing, what their use cases are. 
And that's really allowed us to try to be an arbiter of good bots, which is our long-term goal. I want to build great relationships with people like Cloudflare so we can agree: hey, here are the acceptable bots. We'll identify them for you and make sure we flag them when they come to your website. This is a good bot, you know?

Alessio [00:23:23]: I see. And Cloudflare said they want to do more of this. So they're going to set it by default: if they think you're an AI bot, they're going to reject you. I'm curious if you think this is something that's going to happen at the browser level, or — I mean, the DNS level with Cloudflare seems more like where it should belong. But I'm curious how you think about it.

Paul [00:23:40]: I think the web's going to change. The internet as we have it right now is going to change, and we all need to just accept that the cat is out of the bag. Instead of wishing the internet was like it was in the 2000s, where we could have free content online that wouldn't be scraped — it's just not going to happen. Instead, we should think about, one, how can we change the models of information being published online so people can adequately commercialize it? But two, how do we rebuild applications that expect that AI agents are going to log in on users' behalf? Those are the things that are going to allow us to identify good and bad bots. And I think the team at Clerk has been doing a really good job with this on the authentication side. I actually think that auth is the biggest thing that will prevent agents from accessing stuff, not CAPTCHAs. And I think there will be agent auth in the future.
I don't know if it's going to happen from an individual company, but authentication providers could have a hidden "log in as agent" feature: you put in your email, you get a push notification that says, hey, your Browserbase agent wants to log into your Airbnb, you approve it, and then the agent can proceed. That really circumvents the need for CAPTCHAs, or for logging in as you and sharing your password. I think agent auth is going to be one way we identify good bots going forward. And I think a lot of this CAPTCHA-solving stuff is really a short-term problem as the internet reorients itself around how it's going to work with agents browsing the web, just like people do. Yeah.

Managing Distributed Browser Locations and Proxies

swyx [00:24:59]: Stytch recently was on Hacker News for talking about agent experience, AX, which is a thing that Netlify is also trying to clone and coin and talk about. And we've talked about this on previous episodes, in the sense that I actually think that's maybe the only part of the tech stack that needs to be reinvented for agents. Everything else can stay the same — CLIs, APIs, whatever. But auth, yeah, we need agent auth. And it should mostly be short-lived. It should be a distinct identity from the human, but paired. I almost think, in the same way that every social network has your main profile and then your alt accounts or your finsta, every human token should be paired with an agent token, and the agent token can go and do stuff on behalf of the human token, but not be presumed to be the human. Yeah.

Paul [00:25:48]: It's actually very similar to OAuth, is what I'm thinking. And, you know, Reed from Stytch is an investor, Colin from Clerk, Octaventures — all investors in Browserbase, because I hope they solve this. They'll make Browserbase's mission more possible.
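The OAuth-like agent auth being discussed — a human approves a narrow set of scopes, and a short-lived agent token paired to the human identity is checked on every action — could look roughly like this. All names and the scope strings are hypothetical; no real auth provider's API is being described:

```typescript
// Hypothetical sketch of scoped, short-lived agent auth.
// Scope names and fields are illustrative assumptions.
type Scope = "airbnb:book" | "airbnb:message";

interface AgentToken {
  pairedUser: string; // the human identity this agent acts on behalf of
  scopes: Scope[];    // only what the human explicitly approved
  expiresAt: number;  // short-lived, as discussed: agents get ephemeral access
}

// An action is allowed only if the token is unexpired and the scope was granted.
function canPerform(token: AgentToken, scope: Scope, now: number): boolean {
  return now < token.expiresAt && token.scopes.includes(scope);
}
```

So an agent approved for `airbnb:book` could complete a booking but would be refused if it tried to message anyone, and the whole grant evaporates when the token expires.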
So we don't have to overcome all these hurdles. But I think it will be an OAuth-like flow, where an agent will ask to log in as you, and you'll approve the scopes. It can book an apartment on Airbnb, but it can't message anybody. And then the agent will have some sort of role-based access control within the application. Yeah, I'm excited for that.

swyx [00:26:16]: The tricky part is there's one layer of delegation here, which is like, you're auth'ing my user's user, or something like that. I don't know if that's tricky or not. Does that make sense? Yeah.

Paul [00:26:25]: You know, actually at Twilio, I worked on the login, identity, and access management teams, right? So I built Twilio's login page.

swyx [00:26:31]: You were an intern on that team and then you became the lead in two years? Yeah.

Paul [00:26:34]: Yeah. I started as an intern in 2016, and then I was the tech lead of that team. How? That's not normal. I didn't have a life. He's not normal. Look at this guy. I didn't have a girlfriend. I just loved my job. I don't know. I applied to 500 internships for my first job, and I got rejected from every single one of them except for Twilio, and then eventually Amazon. They took a shot on me, and I was getting paid money to write code, which was my dream. I'm very lucky this coding thing worked out, because I was going to be doing it regardless. And I was able to spend a lot of time on a team that was growing, at a company that was growing, so it informed a lot of this stuff here. I think these are problems that have been solved with the SAML protocol, with SSO. There's really interesting stuff with WebAuthn — these different authentication schemes you can use to authenticate people. The tooling is all there. It just needs to be tweaked a little bit to work for agents.
And the fact that there are companies already providing authentication as a service really sets it up well. The thing that's hard is reinventing the internet for agents. We don't want to rebuild the internet; that's an impossible task. And people often say, well, we'll have this second layer of APIs built for agents. We will, for the top use cases, but beyond that we can just tweak the internet as it is, starting with the authentication side. I think we're going to be the dumb ones going forward. Unfortunately, AI is going to be able to do a lot of the tasks that we do online, which means it will be able to go to websites, click buttons on our behalf, and log in on our behalf too. So with this web-agent future happening, I think with some small structural changes, like you said, it could all slot in really nicely with the existing internet.

Handling CAPTCHAs and Agent Authentication

swyx [00:28:08]: There's one more thing, which is your live view iframe, which lets you take control. Obviously very key for Operator now. But was there anything interesting technically there? Or — well, people always want this.

Paul [00:28:21]: It was really hard to build, you know? So, okay, headless browsers — you don't see them, right? They're running in a cloud somewhere. You can't look at them. And I just want to really make — it's a weird name. I wish we came up with a better name for this thing. But you can't see them, right? And customers don't trust AI agents, at least on the first pass. So what we do with our live view is, when you use Browserbase, you can actually embed a live view of the browser running in the cloud for your customer to see it working. The first reason is to build trust: okay, I have this script that's going to go automate a website. I can embed it into my web application via an iframe, and my customer can watch.
And then we added two-way communication. So now, not only can you watch the browser being operated by AI — if you want to pause and actually click around and type within this iframe that's controlling a browser, that's also possible. And this is all thanks to a lower-level protocol, the Chrome DevTools Protocol. It has an API called startScreencast, and you can also send mouse clicks and button clicks to a remote browser. And this is all embeddable within iframes. You have a browser within a browser, yo. And then you simulate the screen and the click on the other side. Exactly. And this is really nice, often, for, let's say, a CAPTCHA that can't be solved. You saw this with Operator. Operator actually uses a different approach: they use VNC. So you're seeing the whole window there. What we're doing is something a little lower-level with the Chrome DevTools Protocol — it's just PNGs being streamed over the wire. But the same thing is true, right? Hey, I'm running a window. Pause. Can you do something in this window, human? Okay, great. Resume. Sometimes it's 2FA tokens: if you get that text message, you might need a person to type it in. Web agents still need human-in-the-loop workflows. You still need a person to interact with the browser, and building a UI to proxy that is kind of hard. You may as well just show them the whole browser and say, hey, can you finish this up for me? And then let the AI proceed afterwards. Is there a future where I stream my current desktop to Browserbase? I don't think so. I think we're very much cloud infrastructure. But a lot of the stuff we're doing — we do want to build tools. We'll talk about Stagehand, our web agent framework, in a second. But there's a case where a lot of people are going desktop-first for consumer use.
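The two-way live view Paul describes rides on real CDP methods: `Page.startScreencast` streams the viewport as image frames, and `Input.dispatchMouseEvent` injects clicks back into the remote browser. A simplified sketch of the JSON command framing that travels over the browser's DevTools WebSocket (the transport itself is omitted here):

```typescript
// Build CDP (Chrome DevTools Protocol) messages behind a live view.
// Page.startScreencast and Input.dispatchMouseEvent are real CDP methods;
// the framing helper is a simplified sketch with no WebSocket transport.
let nextId = 0;

function cdpCommand(method: string, params: object): string {
  return JSON.stringify({ id: ++nextId, method, params });
}

// Ask the browser to stream the viewport as PNG frames.
const startScreencast = cdpCommand("Page.startScreencast", {
  format: "png",
  everyNthFrame: 1,
});

// Forward a click from the iframe viewer back into the remote browser:
// a press followed by a release at the same coordinates.
function clickAt(x: number, y: number): string[] {
  return [
    cdpCommand("Input.dispatchMouseEvent", { type: "mousePressed", x, y, button: "left", clickCount: 1 }),
    cdpCommand("Input.dispatchMouseEvent", { type: "mouseReleased", x, y, button: "left", clickCount: 1 }),
  ];
}
```

This is why the approach is lighter than VNC: instead of mirroring a whole desktop, you stream page frames one way and replay input events the other way.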
And I think Claude is doing a lot of this, where I expect to see MCPs really oriented around the Claude desktop app, for a reason, right? I think a lot of these tools are going to run on your computer because it makes... I think it's breaking out. People are putting it on a server. Oh, really? Okay. Well, sweet. We'll see. I was surprised, though, wasn't I? I think the Browser Company, too, with Dia Browser — it runs on your machine.

swyx [00:30:50]: What is it?

Paul [00:30:51]: So, Dia Browser, as far as I understand — I used to use Arc. Yeah, I haven't used Arc. But I'm a big fan of the Browser Company. I think they're doing a lot of cool stuff in consumer. As far as I understand, it's a browser where you have a sidebar where you can chat with it, and it can control the local browser on your machine. So, if you imagine what a consumer web agent is, it lives alongside your browser. I think Google Chrome has Project Mariner. I almost call it Project Marinara for some reason. I don't know why.

swyx [00:31:17]: No, I think it's someone really likes Waterworld. Oh, I see. The classic Kevin Costner. Yeah.

Paul [00:31:22]: Okay. Project Marinara is a similar thing to Dia Browser, in my mind, as far as I understand it. You have a browser that has an AI interface that will take over your mouse and keyboard and control the browser for you. Great for consumer use cases. But if you're building applications that rely on a browser, and it's more part of a greater AI app experience, you probably need something that's more like infrastructure, not a consumer app.

swyx [00:31:44]: Just because I have explored a little bit in this area: do people want branching? So I have the state of whatever my browser's in, and then I want, like, 100 clones of this state. Do people do that? Or...

Paul [00:31:56]: People don't do it currently. Yeah.
But it's definitely something we're thinking about. I think the idea of forking a browser is really cool — technically kind of hard. We're starting to see this in code execution, where people are forking code-execution processes, or forking and branching tool calls. I haven't seen it at the browser level yet, but it makes sense. If an AI agent is using a website and it's not sure what path it wants to take to crawl the site to find the information it's looking for, it would make sense for it to explore both paths in parallel. And that'd be a very... A road not taken. Yeah. And hopefully find the right answer, then say, okay, this was actually the right one, memorize that, and go there in the future. On the roadmap. For sure. Don't make my roadmap, please. You know?

Alessio [00:32:37]: How do you actually do that? How do you fork? I feel like the browser is so stateful for so many things.

swyx [00:32:42]: Serialize the state. Restore the state. I don't know.

Paul [00:32:44]: So, it's one of the reasons why we haven't done it yet. It's hard, you know? To truly fork, it's actually quite difficult. The naive way is to open the same page in a new tab and then hope it's in the same state. But if you have a form halfway filled, you may have to take the whole container, pause it, all the memory, duplicate it, and restart it from there. It could be very slow. So we haven't found a thing. The easy thing to fork is just to copy the page object, you know? But I think there needs to be something a little bit more robust there. Yeah.

swyx [00:33:12]: So, MorphLabs has this infinite branch thing. They wrote a custom fork of Linux or something that lets them save the system state and clone it. MorphLabs, hit me up. I'll be a customer. Yeah. I think that's the only way to do it. Unless Chrome has some special API for you.
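The "naive fork" Paul mentions — snapshot whatever state is serializable and restore it into a clone — can be sketched like this. It captures the URL, cookies, and local storage but not in-memory page state like a half-filled form or the JS heap, which is exactly why he says true forking is hard. The `SessionState` shape is a hypothetical simplification, not Browserbase's or Playwright's actual API:

```typescript
// Sketch of a naive browser "fork": deep-copy the serializable session
// state so two branches can diverge independently. In-memory page state
// (a half-filled form, the JS heap) is NOT captured by this approach.
interface SessionState {
  url: string;
  cookies: Record<string, string>;
  localStorage: Record<string, string>;
}

function forkSession(state: SessionState): SessionState {
  // JSON round-trip gives a deep copy for plain serializable data.
  return JSON.parse(JSON.stringify(state)) as SessionState;
}
```

The full-fidelity alternative Paul describes — pausing the container and duplicating all of its memory — is what makes approaches like Morph's Linux-level snapshotting attractive, at the cost of speed.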
Yeah.

Paul [00:33:29]: There's probably something we'll reverse engineer one day. I don't know. Yeah.

Alessio [00:33:32]: Let's talk about Stagehand, the AI web browsing framework. You have three core components: Observe, Extract, and Act. Pretty clean landing page. What was the idea behind making a framework? Yeah.

Stagehand: AI web browsing framework

Paul [00:33:43]: So, there are three frameworks that are very popular or already exist, right? Puppeteer, Playwright, Selenium. Those are for building hard-coded scripts to control websites. And as soon as I started to play with LLMs plus browsing, I caught myself code-genning Playwright code to control a website. I would take the DOM, pass it to an LLM, and say, can you generate the Playwright code to click the appropriate button here? And it would do that. And I was like, this really should be part of the frameworks themselves. I became really obsessed with SDKs that take natural language as part of the API input, and that's what Stagehand is. Stagehand exposes three APIs, and it's a superset of Playwright. So if you go to a page, you may want to take an action — click on the button, fill in the form, etc. That's what the act command is for. You may want to extract some data. This one takes natural language, like "extract the winner of the Super Bowl from this page." You can give it a Zod schema, so it returns a structured output. And then maybe you're building an agent. You can do an agent loop, and you want to see what actions are possible on this page before taking one. You can do observe. So you can observe the actions on the page, and it will generate a list of actions. You can guide it: give me actions on this page related to buying an item. And it might return "buy it now," "add to cart," "view shipping options," and you can pass that to an LLM in an agent loop to say, what's the appropriate action given this high-level goal? So, Stagehand isn't a web agent.
It's a framework for building web agents. And we think that agent loops are actually pretty close to the application layer, because every application probably has different goals or different ways it wants to take steps. I don't think I've seen a generic one — maybe you guys are the experts here, but I haven't seen a really good AI agent framework here. Everyone kind of has their own special sauce, right? I see a lot of developers building their own agent loops, and they're using tools. And I view Stagehand as the browser tool. So we expose act, extract, observe. Your agent can call these tools, and from that, you don't have to worry about generating Playwright code performantly, you don't have to worry about running it. You can just integrate these three tool calls into your agent loop and reliably automate the web.

swyx [00:35:48]: A special shout-out to Anirudh, who I met at your dinner, who I think listens to the pod. Yeah. Hey, Anirudh.

Paul [00:35:54]: Anirudh's the man. He's a Stagehand guy.

swyx [00:35:56]: I mean, the interesting thing about each of these APIs is they're each kind of a startup. Specifically extract: Firecrawl is extract. There's Expand AI. There's a whole bunch of extract companies that just focus on extract. I'm curious — I feel like you guys are going to collide at some point. Right now it's friendly; everyone's in a blue ocean. At some point, it's going to be valuable enough that there's some turf battle here. I don't think you have a dog in the fight. I think you can mock extract to use an external service if they're better at it than you. But it's just an observation that, in the same way that I see each checkbox in the side of custom GPTs becoming a startup, or each box in the Karpathy chart becoming a startup, this is also becoming a thing.
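The division of labor Paul describes — your agent loop owns the goal and the planning, while act/observe-style calls are just tools it invokes — can be sketched like this. The tool implementations and the planner here are stubs; real Stagehand runs `observe` and `act` against a live Playwright page, and the planner would normally be an LLM call:

```typescript
// Sketch of one step of an agent loop built on act/observe-style
// browser tools. Implementations are stubs, not the real Stagehand API.
interface BrowserTools {
  observe(goal: string): string[]; // candidate actions visible on the page
  act(instruction: string): void;  // perform one action on the page
}

function runStep(
  tools: BrowserTools,
  goal: string,
  pick: (actions: string[]) => string, // normally an LLM choosing an action
): string {
  const actions = tools.observe(goal); // e.g. ["buy it now", "add to cart"]
  const chosen = pick(actions);
  tools.act(chosen);
  return chosen;
}
```

This mirrors the observe-then-act pattern from the conversation: enumerate what's possible on the page, let the model pick given the high-level goal, then execute that one action.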
Yeah.

Paul [00:36:41]: I mean, the way Stagehand works is that it's MIT-licensed, completely open source. You bring your own API key to your LLM of choice. You choose your LLM. We don't make any money off of extract, really. We only make money if you choose to run it with our browser. You don't have to — you can actually use your own browser, a local browser. Stagehand is completely open source for that reason. And, yeah, I think if you're building really complex web scraping workflows, I don't know if Stagehand is the tool for you. It's really more for if you're building an AI agent that needs a few general tools, or if it's doing a lot of web-automation-intensive work. But if you're building a scraping company, Stagehand is not your thing. You probably want something that's going to get HTML content, convert it to Markdown, query it. That's not what Stagehand does. Stagehand is more about reliability. We focus a lot on reliability and less on cost optimization and speed at this point.

swyx [00:37:33]: I actually feel like Stagehand — the way that Stagehand works, it's like, you know, page.act, "click on the quick start." It's kind of the integration test for the code that you would have to write anyway, the Puppeteer code that you have to write anyway. And when the page structure changes, because it always does, this is still the test I would have to write. So it's kind of like a testing framework that doesn't need implementation detail.

Paul [00:37:56]: Well, yeah. I mean, Puppeteer, Playwright, and Selenium were all designed as testing frameworks, right? And now people are hacking them together to automate the web. I would say — and maybe this is me being too specific — but when I write tests, if the page structure changes without me knowing, I want that test to fail.
So I don't know about AI regenerating that test. People are using Stagehand for testing, but it's more for usability testing, not testing whether the front end has changed or not. Okay. But generally, where we've seen people really take off is: if they want to build a feature in their application that's kind of like Operator or Deep Research, they're using Stagehand to power that tool calling in their own agent loop. Okay. Cool.

swyx [00:38:37]: So let's go into Operator, the first big agent launch of the year from OpenAI. Seems like they have a whole bunch scheduled. You were on break and your phone blew up. What's your general view of computer use agents, as they're calling them — the overall category, before we go into Open Operator? Just the overall promise of Operator. I will observe that I tried it once. It was okay. And I never tried it again.

OpenAI's Operator and computer use agents

Paul [00:38:58]: That tracks with my experience, too. I'm a huge fan of the OpenAI team. I do not view Operator as a company killer for Browserbase at all. I think it actually shows people what's possible. Computer use models make a lot of sense, and what I'm most excited about with computer use models is their ability to take screenshots, reason, and output steps. I think that using mouse coordinates — I've seen that prove to be less reliable than I would like, and I just wonder if that's the right form factor. What we've done with our framework is anchor it to the DOM itself, anchor it to the actual item. So if it's clicking on something, it's clicking on that thing, you know? It's more accurate. No matter where it is. Yeah, exactly. Because it really ties in nicely.
And it can handle the whole viewport in one go, whereas Operator can only handle what it sees. Can you hover? Is hovering a thing you can do? I don't know if we expose it as a tool directly, but I'm sure there's an API for hovering, like move mouse to this position. I think you can trigger hover via the JavaScript on the DOM itself. But, no, I think when we saw computer use, everyone's eyes lit up, because they realized, wow, AI is going to actually automate work for people. And seeing that happen from both of the labs — and I'm sure we're going to see more labs launch computer use models — I'm excited to see all the stuff people build with it. I'd love to see computer use power controlling a browser on Browserbase. And Open Operator, which was our open-source version of OpenAI's Operator, was our first take on how we can integrate these models into Browserbase: we handle the infrastructure and let the labs do the models. I don't have a sense that Operator will be released as an API. I don't know — maybe it will. I'm curious to see how well that works, because I think it's going to be really hard for a company like OpenAI to do things like support CAPTCHA solving or have proxies. I think it's hard for them structurally. Imagine the New York Times headline: "OpenAI solves CAPTCHAs." That would be a pretty bad headline. "Browserbase solves CAPTCHAs"? No one cares. No one cares. And our investors are bored. We're all okay with this, you know? We're building this company knowing that CAPTCHA solving is short-lived, until we figure out how to authenticate good bots. I think it's really hard for a company like OpenAI, which has a brand that's so, so good, to balance that with the icky parts of web automation, which can be kind of complex to solve.
I'm sure OpenAI knows who to call whenever they need you. Yeah, right. I'm sure they'll have a great partnership.

Alessio [00:41:23]: And is Open Operator just, like, a marketing thing for you? How do you think about resource allocation? You can spin this up very quickly, and now there's all this open deep research — just "open" everything that people are building. We started it, you know. You're the original Open. We're the original Open Operator, you know? Is it just, hey, look, this is a demo, but we'll help you build out an actual product for yourself? Are you interested in going more of a product route? That's kind of the OpenAI way, right? They started as a model provider and then...

Paul [00:41:53]: Yeah, we're not interested in going the product route yet. I view Open Operator as a reference project, you know? Let's show people how to build these things using the infrastructure and models that are out there. And that's what it is. Open Operator is very simple. It's an agent loop. It takes a high-level goal, breaks it down into steps, and uses tool calling to accomplish those steps. It takes screenshots and feeds those screenshots into an LLM along with the step, to generate the right action. It uses Stagehand under the hood to actually execute that action. It doesn't use a computer use model. And it has a nice interface, using the live view we talked about — the iframe — to embed that into an application. So I felt like people on launch day wanted to figure out how to build their own version of this, and we turned that around really quickly to show them. And I hope we do that with other things, like deep research. We don't have a deep research launch yet. I think David from Aomni actually has an amazing open deep research that he launched. It has, like, 10K GitHub stars now. So he's crushing it.
But I think if people want to build these features natively into their application, they need good reference projects, and I think Open Operator is a good example of that.

swyx [00:42:52]: I don't know — actually, I'm pretty bullish on an API-driven Operator, because that's the only way that you can sort of... once it's reliable enough, obviously. And we're nowhere near that now, but give it five years — it'll happen, you know. And then you can spin this up, and browsers are working in the background and you don't necessarily have to know, and it's just booking restaurants for you, whatever. I can definitely see that future happening. I had this on the landing page here — this might be slightly out of order. But you have sort of three use cases for Browserbase. Open Operator, or this Operator sort of use case, is kind of the workflow automation use case, and it competes with UiPath in the RPA category. Would you agree with that? Yeah, I would agree with that. And then there's agents, which we talked about already. And web scraping, which I imagine would be the bulk of your workload right now, right?

Paul [00:43:40]: No, not at all. I'd say the majority is actually browser automation. We're kind of expensive for web scraping. I think that if you need to do occasional web scraping, or you have to do web scraping that works every single time, you want to use browser automation. You want to use Browserbase. But if you're building web scraping workflows, what you should do is have a waterfall. The first request is a curl to the website — see if you can get it without even using a browser. And then the second request may be a scraping-specific API. There are, like, a thousand scraping APIs out there that you can use to try and get data. ScrapingBee is a great example, right? Yeah.
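The waterfall Paul recommends — cheapest method first, escalating only on failure, with a real browser as the last tier — can be sketched with injected fetchers. The fetcher implementations here are stubs standing in for a plain HTTP GET, a scraping API, and a headless browser:

```typescript
// Sketch of a scraping waterfall: try cheap tiers first, escalate on failure.
// Each tier returns the page body, or null if that tier couldn't get it.
type Fetcher = (url: string) => string | null;

function waterfall(url: string, tiers: Fetcher[]): string {
  for (const fetchTier of tiers) {
    const body = fetchTier(url); // tier 1: curl; tier 2: scraping API; tier 3: real browser
    if (body !== null) return body; // first tier that succeeds wins
  }
  throw new Error("all tiers failed for " + url);
}
```

The economics follow from the ordering: most pages fall to the curl tier for pennies, and only JS-heavy pages that need hydration reach the expensive real-browser tier.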
And then, if those two don't work, bring out the heavy hitter. Browserbase will 100% work, right? It will load the page in a real browser and hydrate it. I see.

swyx [00:44:21]: Because a lot of pages don't render without JS.

swyx [00:44:25]: Yeah, exactly.

Paul [00:44:26]: So those are the three big use cases, right? Automation, web data collection, and then, if you're building anything agentic that needs a browser tool, you want to use Browserbase.

Alessio [00:44:35]: Is there any use case that you were super surprised by, that people might not even think about? Oh, yeah. Or anything that you can share? The long tail is crazy. Yeah.

Surprising use cases of Browserbase

Paul [00:44:44]: One of the case studies on our website that I think is the most interesting is this company called Benny. The way it works is, if you're on food stamps in the United States, you can actually get rebates if you buy certain things. You buy some vegetables, you submit your receipt to the government, and they'll give you a little rebate back: hey, thanks for buying vegetables, it's good for you. That process of submitting the receipt is very painful. The way Benny works is you use their app to take a photo of your receipt, and then Benny will go submit that receipt for you and deposit the money into your account. That's actually using no AI at all. It's all hard-coded scripts. They maintain the scripts; they've been doing a great job, and they've built this amazing consumer app. But it's an example of all these tedious workflows people have to do just to go about their business, and they're doing it for the sake of their day-to-day lives. I had never known about food stamp rebates, or the complex forms you have to fill out to get them. But the world is powered by millions and millions of tedious forms. Visas — you know, Lighthouse is a customer, right?
They do the O-1 visa. Millions and millions of forms are taking away humans' time, and I hope Browserbase can help power software that automates away the web forms we don't need anymore. Yeah.

swyx [00:45:49]: I mean, I'm very supportive of that. I mean, forms. I do think government itself is a big part of it. I think the government should embrace AI more to do more human-friendly form filling. Mm-hmm. But I'm not optimistic. I'm not holding my breath. Yeah. We'll see. Okay. I think I'm about to zoom out. I have a little brief thing on computer use, and then we can talk about founder stuff. I tend to think of developer tooling markets as impossible triangles, where everyone starts in a niche and then they start to branch out. So I already hinted at a little bit of this, right? We mentioned Morph. We mentioned E2B. We mentioned Firecrawl. And then there's Browserbase. So there's all this stuff of, like, have a serverless virtual computer that you give to an agent and let it do stuff with it. And there are various ways of connecting it to the internet. You can just connect to a search API, like SerpAPI or whatever — Exa is another one — and that's your searching. You can also have a JSON/Markdown extractor, which is Firecrawl. Or you can have a virtual browser like Browserbase, or a virtual machine like Morph. And then there's also maybe a virtual code environment, like Code Interpreter. So there's just a bunch of different ways to tackle the problem of giving a computer to an agent. And I'm just kind of wondering if you see everyone happily coexisting in their respective niches, and as a developer, I just go and pick a shopping basket of one of each. Or do you think that eventually people will collide?

Future of browser automation and market competition

Paul [00:47:18]: I think that currently it's not a zero-sum market.
I think we're talking about all of knowledge work that people do that can be automated online — all these trillions of hours that happen online where people are working. And I think there's so much software to be built that I tend not to think about how these companies will collide. I just try to solve the problem as best as I can and make this specific piece of infrastructure, which I think is an important primitive, the best I possibly can. And yeah, I think there are players that are going to launch over-the-top platforms — agent platforms that have all these tools built in, right? Who's building the Rippling for agent tools, that has the search tool, the browser tool, the operating-system tool? There are some. There are some, right? And I think, in the end, what I've seen in my time as a developer, when I look at all the favorite tools that I have, is that for tools and primitives with sufficient levels of complexity, you need a solution that's really bespoke to that primitive, you know? And I am sufficiently convinced that the browser is complex enough to deserve a primitive. Obviously, I have to say that — I'm the founder of Browserbase, right? I'm talking my book. But maybe I can give you one spicy take against running a whole OS. When I look at computer use when it first came out, I saw that the majority of use cases for computer use were controlling a browser. And do we really need to run an entire operating system just to control a browser? I don't think so. I don't think that's necessary. Browserbase can run browsers for way cheaper than you can if you're running a full-fledged OS with a GUI. And I think that's just an advantage of the browser.
It is, like, browsers are little OSs, and you can run them very efficiently if you orchestrate it well. And I think that allows us to offer 90% of the, you know, functionality in the platform needed at 10% of the cost of running a full OS. Yeah. Open Operator: Browserbase's Open-Source Alternative. swyx [00:49:16]: I definitely see the logic in that. There's a Marc Andreessen quote. I don't know if you know this one. Where he basically observed that the browser is turning the operating system into a poorly debugged set of device drivers, because most of the apps are moved from the OS to the browser. So you can just run browsers.Paul [00:49:31]: There's a place for OSs, too. Like, I think that there are some applications that only run on Windows operating systems. And Eric from pig.dev in this upcoming YC batch, or last YC batch, like, he's building infrastructure that runs tons of Windows operating systems for you to control with your agent. And like, there's some legacy EHR systems that only run on Internet Explorer. Yeah.Paul [00:49:54]: I think that's it. I think, like, there are use cases for specific operating systems for specific legacy software. And like, I'm excited to see what he does with that. I just wanted to give a shout out to the pig.dev website.swyx [00:50:06]: The pigs jump when you click on them. Yeah. That's great.Paul [00:50:08]: Eric, he's the former co-founder of banana.dev, too.swyx [00:50:11]: Oh, that Eric. Yeah. That Eric. Okay. Well, he abandoned bananas for pigs. I hope he doesn't start going around with pigs now.Alessio [00:50:18]: Like he was going around with bananas. A little toy pig. Yeah. Yeah. I love that. What else are we missing? I think we covered a lot of, like, the Browserbase product history, but. What do you wish people asked you? Yeah.Paul [00:50:29]: I wish people asked me more about, like, what will the future of software look like? Because I think that's really where I've spent a lot of time thinking about why I'd do Browserbase.
Like, for me, starting a company is like a means of last resort. Like, you shouldn't start a company unless you absolutely have to. And I remain convinced that the future of software is software that you're going to click a button and it's going to do stuff on your behalf. Right now with software, you click a button and it maybe, like, calls back an API and, like, computes some numbers. It, like, modifies some text, whatever. But the future of software is software using software. So, I may log into my accounting website for my business, click a button, and it's going to go load up my Gmail, search my emails, find the thing, upload the receipt, and then comment it for me. Right? And it may do that using APIs, maybe a browser. I don't know. I think it's a little bit of both. But that's completely different from how we've built software so far. And that future of software has different infrastructure requirements. It's going to require different UIs. It's going to require different pieces of infrastructure. I think the browser infrastructure is one piece that fits into that, along with all the other categories you mentioned. So, I think that it's going to require developers to think differently about how they've built software for, you know
Join Simtheory: https://simtheory.ai----Grok 3 Dis Track (cringe): https://simulationtheory.ai/aff9ba04-ca0e-4572-84f4-687739c7b84bGrok 3 Dis Track written by Sonnet: https://simulationtheory.ai/edaed525-b9b6-473b-a6d6-f9cca9673868----Community: https://thisdayinai.com----Chapters:00:00 - First Impressions of Grok 310:00 - Discussion about Deep Search, Deep Research24:28 - Market landscape: Is OpenAI Rattled by xAI's Grok 3? Rumors of GPT-4.5 and GPT-548:48 - Why does Grok and xAI Exist? Will anyone care about Grok 3 next week?54:45 - Diss track battle with Grok 3 (re-written by Sonnet) & Model Tuning for Use Cases1:07:50 - GPT-4.5 and Anthropic Claude Thinking Next Week? & Are we a podcast about Altavista?1:13:25 - Economically productive agents & freaky muscular robot1:22:00 - Final thoughts of the week1:27:26 - Grok 3 Dis Track in Full (Sonnet Version)Thanks for your support and listening!
Join Simtheory: https://simtheory.aiCommunity: https://thisdayinai.com---CHAPTERS:00:00 - Anthropic Economic Index & The Impact of AI Agents18:00 - Hype Vs Reality of Models & Agents31:33 - Dream Agents & Side Quest Background Tasks56:60 - How All SaaS Will Be Disrupted by AI1:21:10 - Sam Altman's GPT-4.5, GPT-5 Roadmap1:28:50 - Anthropic Claude 4: Anthropic Strikes Back---Thanks for listening and your support.
Try a walking desk while studying ML or working on your projects! https://ocdevel.com/walk Show notes: https://ocdevel.com/mlg/mla-22 Tools discussed: Windsurf: https://codeium.com/windsurf Copilot: https://github.com/features/copilot Cursor: https://www.cursor.com/ Cline: https://github.com/cline/cline Roo Code: https://github.com/RooVetGit/Roo-Code Aider: https://aider.chat/ Other: Leaderboards: https://aider.chat/docs/leaderboards/ Video of speed-demon: https://www.youtube.com/watch?v=QlUt06XLbJE&feature=youtu.be Reddit: https://www.reddit.com/r/chatgptcoding/ Examines the rapidly evolving world of AI coding tools designed to boost programming productivity by acting as a pair programming partner. The discussion groups these tools into three categories: • Hands-Off Tools: These include solutions that work on fixed monthly fees and require minimal user intervention. GitHub Copilot started with simple tab completions and now offers an agent mode similar to Cursor, which stands out for its advanced codebase indexing and intelligent file searching. Windsurf is noted for its simplicity—accepting prompts and performing automated edits—but some users report performance throttling after prolonged use. • Hands-On Tools: Aider is presented as a command-line utility that demands configuration and user involvement. It allows developers to specify files and settings, and it efficiently manages token usage by sending prompts in diff format. Aider also implements an “architect versus edit” approach: a reasoning model (such as DeepSeek R1) first outlines a sequence of changes, then an editor model (like Claude 3.5 Sonnet) produces precise code edits. This dual-model strategy enhances accuracy and reduces token costs, especially for complex tasks. • Intermediate Power Tools: Open-source tools such as Cline and its more advanced fork, RooCode, require users to supply their own API keys and pay per token. 
These tools offer robust, agentic features, including codebase indexing, file editing, and even browser automation. RooCode stands out with its ability to autonomously expand functionality through integrations (for example, managing cloud resources or querying issue trackers), making it particularly attractive for tinkerers and power users. A decision framework is suggested: for those new to AI coding assistants or with limited budgets, starting with Cursor (or cautiously exploring Copilot's new features) is recommended. For developers who want to customize their workflow and dive deep into the tooling, RooCode or Cline offer greater control—always paired with Aider for precise and token-efficient code edits. Also reviews model performance using a coding benchmark leaderboard that updates frequently. The current top-performing combination uses DeepSeek R1 as the architect and Claude 3.5 Sonnet as the editor, with alternatives such as OpenAI's O1 and O3 Mini available. Tools like Open Router are mentioned as a way to consolidate API key management and reduce token costs.
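The architect-versus-edit split described above can be sketched as a tiny orchestration function. This is an illustrative sketch, not Aider's actual implementation; `architect_model` and `editor_model` are hypothetical stand-ins for calls to real LLM APIs (e.g. a reasoning model like DeepSeek R1 planning, and an editor like Claude 3.5 Sonnet emitting diffs).

```python
def architect_edit(task, architect_model, editor_model):
    """Two-stage coding edit: a reasoning model plans, an editor model emits a diff.

    `architect_model` and `editor_model` are any callables taking a prompt
    string and returning a completion string -- hypothetical stand-ins for
    real LLM API calls, not Aider's actual interface.
    """
    # Stage 1: the architect outlines the sequence of changes.
    plan = architect_model(
        f"Outline, step by step, the code changes needed for: {task}"
    )
    # Stage 2: the editor turns that plan into a concrete diff; sending
    # diffs rather than whole files is what keeps token usage down.
    diff = editor_model(f"Produce a unified diff implementing this plan:\n{plan}")
    return plan, diff

# Demo with stub "models" so the sketch runs offline.
plan, diff = architect_edit(
    "rename function foo to bar",
    architect_model=lambda prompt: "1. Rename foo to bar in utils.py",
    editor_model=lambda prompt: "--- a/utils.py\n+++ b/utils.py\n-def foo():\n+def bar():",
)
print(plan)
```

The point of the split is that each model only does the part it is good at: the plan prompt goes to the (slower, pricier) reasoner once, and the diff prompt goes to the cheaper editor.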
Join Simtheory: https://simtheory.ai----"Don't Cha" Song: https://simulationtheory.ai/cbf4d5e6-82e4-4e84-91e7-3b48cb2744efSpotify: https://open.spotify.com/track/4Q8dRV45WYfxePE7zi52iL?si=ed094fce41e54c8fCommunity: https://thisdayinai.com---CHAPTERS:00:00 - We're on Spotify!01:06 - o3-mini release and initial impressions18:37 - Reasoning models as agents47:20 - OpenAI's Deep Research: impressions and what it means1:12:20 - Addressing our Shilling for Sonnet & My Week with o1 Experience1:20:18 - Gemini 2.0 Flash GA, Gemini 2.0 Pro Experimental + Other Google Updates1:38:16 - LOL of week and final thoughts1:43:39 - Don't Cha Song in Full
OpenAI is pushing the boundaries of artificial intelligence yet again. In this episode of Rocketship.FM, we break down what Chief Product Officer Kevin Weil revealed about OpenAI's roadmap for 2025 and beyond—including the latest AI model, O1, which is already outperforming previous versions in coding, math, and reasoning. But that's just the beginning. We also explore OpenAI's move into AI-powered agents designed to streamline everyday tasks, and the company's rumored return to humanoid robotics. And what about Artificial General Intelligence (AGI) and even Artificial Superintelligence (ASI)? OpenAI CEO Sam Altman has hinted that these once-distant milestones could be closer than we think. What happens when AI surpasses human intelligence? Will it be a utopia of limitless innovation, or are we opening a Pandora's box we can't close? Join us as we unpack OpenAI's vision for the future—and what it could mean for the world.
Our 197th episode with a summary and discussion of last week's big AI news! Recorded on 01/17/2024 Join our brand new Discord here! https://discord.gg/nTyezGSKwP Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: - DeepSeek releases R1, a competitive AI model comparable to OpenAI's O1, leading to market unrest and significant drops in tech stocks, including a 17% plunge in NVIDIA's stock. - OpenAI launches Operator to facilitate agentic computer use, while facing competition from new releases by DeepSeek and Qwen, with applications seeing rapid adoption. - President Trump revokes the Biden administration's executive order on AI, signaling a shift in AI policy and deregulation efforts. - Taiwanese government clears TSMC to produce advanced 2-nanometer chip technology abroad, aiming to strengthen global semiconductor supply amidst geopolitical tensions. If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form. Timestamps + Links: (00:00:00) Intro / Banter (00:03:01) Response to listener comments Projects & Open Source (00:06:26) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (00:30:25) Viral AI company DeepSeek releases new image model family (00:34:07) Qwen2.5-1M Technical Report (00:38:32) Alibaba's Qwen team releases AI models that can control PCs and phones Tools & Apps (00:42:09) OpenAI launches Operator, an AI agent that performs tasks autonomously (00:47:37) DeepSeek reaches No.
1 on US Play Store (00:52:17) Alibaba rolled out Qwen Chat v0.2 and Qwen2.5-1M model (00:53:50) Perplexity launches US-hosted DeepSeek R1, hints at EU hosting soon (00:55:31) Apple is pulling its AI-generated notifications for news after generating fake headlines (00:59:00) French AI ‘Lucie' looks très chic, but keeps getting answers wrong Applications & Business (01:02:09) DeepSeek's New AI Model Sparks Shock, Awe, and Questions From US Competitors (01:08:16) Microsoft loses OpenAI exclusive cloud provider status to $500 billion Stargate project (01:13:34) OpenAI adds BlackRock exec Adebayo Ogunlesi to board of directors (01:15:33) ElevenLabs has raised a new round at $3B+ valuation led by ICONIQ Growth, sources say Policy & Safety (01:16:29) Donald Trump unveils $500 billion Stargate Project to build AI infrastructure in the US, promising over 100K jobs (01:21:16) Trump Revokes Biden AI Policy, Signs Executive Order to Strengthen AI Leadership (01:23:59) Anthropic CEO doesn't see DeepSeek as ‘adversaries,' but says export controls are critical (01:31:12) Taiwanese govt clears TSMC to make 2nm chips abroad — country lowers its 'Silicon Shield' (01:33:47) Outro
Join Simtheory: https://simtheory.ai---LINKS FROM SHOW:- Built to Reason (an o1 Tribute song): https://simulationtheory.ai/3f3ff70d-afef-4372-a9a5-26b22824c383- Sputnik Moment Song: https://simulationtheory.ai/4317176e-5c0d-49b9-801b-b686113624fd- Episode 91 Notes: https://simulationtheory.ai/b64f40ce-dab8-40b7-89a1-f24d17296f5aCHAPTERS:00:00 - Is Deepseek R1 a Sputnik Moment?15:32 - Industry Reaction to Deepseek R1 39:30 - Can Deepseek R1 Write a Good Dis Track?46:21 - Will AI Disrupt All Software: Throw Away AI Software & Custom Interfaces1:10:04 - OpenAI's Operator Thoughts & Computer Use in the Enterprise1:16:45 - Gemini 2.0 Flash Officially Released, Rumors of o3-mini & Farewell to o1 1:22:07 - In loving memory of o1...---thx 4 listening, like and sub.
In this episode of The Two Minute Drill, Drex dives into the groundbreaking release of DeepSeek R1, a Chinese AI reasoning model rivaling OpenAI's O1. Next, CISA and FBI warnings about ongoing exploitation of Ivanti cloud application vulnerabilities. Then, the controversial pardon of Silk Road founder Ross Ulbricht and its implications for cybersecurity. Remember, Stay a Little Paranoid. Subscribe: This Week Health Twitter: This Week Health LinkedIn: This Week Health Donate: Alex's Lemonade Stand Foundation for Childhood Cancer
The AI Breakdown: Daily Artificial Intelligence News and Discussions
DeepSeek has released R1, their answer to OpenAI's O1, and it has Silicon Valley chattering and markets crashing. But just how big a deal is it? Big, argues NLW, even if the likely impact might be different than what Wall Street seems to think. Brought to you by: KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions. Vanta - Simplify compliance - https://vanta.com/nlw The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
In this episode, Ricardo talks about DeepSeek, a groundbreaking AI application from a small Chinese startup. Unlike other AI models, DeepSeek was trained for just $5.7M—far less than OpenAI's $100M+ investments—yet it rivals top models like OpenAI's O1. This breakthrough could disrupt the AI industry, enabling smaller companies to develop advanced models without massive infrastructure. The news has already impacted major tech stocks, including Nvidia and Microsoft. If DeepSeek's claims hold true, AI accessibility will skyrocket, reshaping project management and beyond. Ricardo urges listeners to explore DeepSeek and stay alert to its potential impact. Tune in to the podcast to learn more!
One last Gold sponsor slot is available for the AI Engineer Summit in NYC. Our last round of invites is going out soon - apply here - If you are building AI agents or AI eng teams, this will be the single highest-signal conference of the year for you! While the world melts down over DeepSeek, few are talking about the OTHER notable group of former hedge fund traders who pivoted into AI and built a remarkably profitable consumer AI business with a tiny, incredibly cracked engineering team: Chai Research. In short order they have:* Started a Chat AI company well before Noam Shazeer started Character AI, and outlasted his departure.* Crossed 1m DAU in 2.5 years - William updates us on the pod that they've hit 1.4m DAU now, another +40% from a few months ago. Revenue crossed >$22m. * Launched the Chaiverse model crowdsourcing platform - taking 3-4 week A/B testing cycles down to 3-4 hours, and deploying >100 models a week. While they're not paying million dollar salaries, you can tell they're doing pretty well for an 11 person startup: The Chai Recipe: Building infra for rapid evals. Remember how the central thesis of LMarena (formerly LMsys) is that the only comprehensive way to evaluate LLMs is to let users try them out and pick winners? At the core of Chai is a mobile app that looks like Character AI, but is actually the largest LLM A/B testing arena in the world, specialized on retaining chat users for Chai's use cases (therapy, assistant, roleplay, etc). It's basically what LMArena would be if taken very, very seriously at one company (with $1m in prizes to boot): Chai publishes occasional research on how they think about this, including talks at their Palo Alto office: William expands upon this in today's podcast (34 mins in): Fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it.
And through evaluation, you can iterate. We can look at benchmarks, and we can see the issues with benchmarks and why they may not generalize as well as one would hope, and the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it. And we can then have a really accurate ranking of, like, which models users are finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign different cohorts, let's wait 30 days to see what the day 30 retention is. If you're doing an app, that's like A-B testing 101: do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow.
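The loop William describes (submit a model, put it in front of users, collect ratings, rank) reduces at its simplest to a leaderboard over mean user ratings. A minimal sketch, not Chai's actual system; the model names and rating scale are illustrative:

```python
from collections import defaultdict

def rank_models(ratings):
    """Rank candidate models by mean user rating, best first.

    `ratings` is a list of (model_name, score) pairs collected as users
    chat with each candidate. Illustrative only: a production system
    would also track sample counts and confidence before promoting a model.
    """
    totals = defaultdict(lambda: [0.0, 0])  # model -> [sum of scores, count]
    for model, score in ratings:
        totals[model][0] += score
        totals[model][1] += 1
    means = {model: total / n for model, (total, n) in totals.items()}
    return sorted(means, key=means.get, reverse=True)

# Toy feedback from a day of exposure (hypothetical scores out of 5).
feedback = [("model-a", 4), ("model-b", 5), ("model-a", 2), ("model-b", 4)]
ranking = rank_models(feedback)
print(ranking)  # model-b (mean 4.5) ranks above model-a (mean 3.0)
```

Compressing the feedback cycle is then just a matter of how fast ratings accumulate: with thousands of concurrent users, a ranking like this stabilizes in hours rather than the 30 days a retention test takes.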
And so we were able to get that 30-day feedback loop all the way down to something like three hours. In Crowdsourcing the leap to Ten Trillion-Parameter AGI, William describes Chai's routing as a recommender system, which makes a lot more sense to us than previous pitches for model routing startups: William is notably counter-consensus in a lot of his AI product principles:* No streaming: Chats appear all at once to allow rejection sampling* No voice: Chai actually beat Character AI to introducing voice - but removed it after finding that it was far from a killer feature.* Blending: “Something that we love to do at Chai is blending, which is, you know, it's the simplest way to think about it is you're going to end up, and you're going to pretty quickly see you've got one model that's really smart, one model that's really funny. How do you get the user an experience that is both smart and funny? Well, just 50% of the requests, you can serve them the smart model, 50% of the requests, you serve them the funny model.” (that's it!) But chief above all is the recommender system. We also referenced Exa CEO Will Bryk's concept of SuperKnowledge: Full Video version on YouTube.
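The blending quote above is just randomized per-request routing. A minimal sketch under that reading; the model names and the 50/50 weights are illustrative, not Chai's production router:

```python
import random

def blend(rng, models, weights):
    """Route one request: pick a model at random according to the blend weights."""
    return rng.choices(models, weights=weights, k=1)[0]

# Simulate 1000 requests split 50/50 between a "smart" and a "funny" model,
# so a single chat session ends up mixing both personalities.
rng = random.Random(0)  # seeded so the demo is reproducible
counts = {"smart": 0, "funny": 0}
for _ in range(1000):
    counts[blend(rng, ["smart", "funny"], [0.5, 0.5])] += 1
print(counts)  # roughly a 500/500 split
```

Because routing happens per request rather than per user, every conversation samples from both models, which is the whole trick: the user experiences a blend without either model having to be both smart and funny.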
Please like and subscribe! Timestamps: * 00:00:04 Introductions and background of William Beauchamp* 00:01:19 Origin story of Chai AI* 00:04:40 Transition from finance to AI* 00:11:36 Initial product development and idea maze for Chai* 00:16:29 User psychology and engagement with AI companions* 00:20:00 Origin of the Chai name* 00:22:01 Comparison with Character AI and funding challenges* 00:25:59 Chai's growth and user numbers* 00:34:53 Key inflection points in Chai's growth* 00:42:10 Multi-modality in AI companions and focus on user-generated content* 00:46:49 Chaiverse developer platform and model evaluation* 00:51:58 Views on AGI and the nature of AI intelligence* 00:57:14 Evaluation methods and human feedback in AI development* 01:02:01 Content creation and user experience in Chai* 01:04:49 Chai Grant program and company culture* 01:07:20 Inference optimization and compute costs* 01:09:37 Rejection sampling and reward models in AI generation* 01:11:48 Closing thoughts and recruitment. Transcript: Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and today we're in the Chai AI office with my usual co-host, Swyx.swyx [00:00:14]: Hey, thanks for having us. It's rare that we get to get out of the office, so thanks for inviting us to your home. We're in the office of Chai with William Beauchamp. Yeah, that's right. You're the founder of Chai AI, but previously, I think you were concurrently also running your fund?William [00:00:29]: Yep, so I was simultaneously running an algorithmic trading company, but I fortunately was able to kind of exit from that, I think just in Q3 last year. Yeah, congrats. Yeah, thanks.swyx [00:00:43]: So Chai has always been on my radar because, well, first of all, you do a lot of advertising, I guess, in the Bay Area, so it's working. Yep. And second of all, the reason I reached out to a mutual friend, Joyce, was because I'm just generally interested in the...
...consumer AI space, chat platforms in general. I think there's a lot of inference insights that we can get from that, as well as human psychology insights, kind of a weird blend of the two. And we also share a bit of a history as former finance people crossing over. I guess we can just kind of start it off with the origin story of Chai.William [00:01:19]: Why decide to work on a consumer AI platform rather than B2B SaaS? So just quickly touching on the background in finance. Sure. Originally, I'm from... I'm from the UK, born in London. And I was fortunate enough to go study economics at Cambridge. And I graduated in 2012. And at that time, everyone in the UK and everyone on my course, HFT, quant trading was really the big thing. It was like the big wave that was happening. So there was a lot of opportunity in that space. And throughout college, I'd sort of played poker. So I'd, you know, I dabbled as a professional poker player. And I was able to accumulate this sort of, you know, say $100,000 through playing poker. And at the time, as my friends would go work at companies like Jane Street or Citadel, I kind of did the maths. And I just thought, well, maybe if I traded my own capital, I'd probably come out ahead. I'd make more money than just going to work at Jane Street.swyx [00:02:20]: With 100k base as capital?William [00:02:22]: Yes, yes. That's not a lot. Well, it depends what strategies you're doing. And, you know, there is an advantage. There's an advantage to being small, right? Because there are, if you have a 10... Strategies that don't work in size. Exactly, exactly. So if you have a fund of $10 million, if you find a little anomaly in the market that you might be able to make 100k a year from, that's a 1% return on your 10 million fund. If your fund is 100k, that's 100% return, right? So being small, in some sense, was an advantage. So I started off, taught myself Python, and machine learning was like the big thing as well.
Machine learning had really... it was the first time, you know, big-time machine learning was being used for image recognition; neural networks come out, you get dropout. And, you know, so this, this was the big thing that's going on at the time. So I probably spent my first three years out of Cambridge, just building neural networks, building random forests to try and predict asset prices, right, and then trade that using my own money. And that went well. And, you know, if you start something and it goes well, you try and hire more people. And the first people that came to mind was the talented people I went to college with. And so I hired some friends. And that went well and hired some more. And eventually, I kind of ran out of friends to hire. And so that was when I formed the company. And from that point on, we had our ups and we had our downs. And that was a whole long story and journey in itself. But after doing that for about eight or nine years, on my 30th birthday, which was four years ago now, I kind of took a step back to just evaluate my life, right? This is what one does when one turns 30. You know, I just heard it. I hear you. And, you know, I looked at my 20s and I loved it. It was a really special time. I was really lucky and fortunate to have worked with this amazing team, been successful, had a lot of hard times. And through the hard times, learned wisdom and then a lot of success and, you know, was able to enjoy it. And so the company was making about five million pounds a year. And it was just me and a team of, say, 15, like, Oxford and Cambridge educated mathematicians and physicists. It was like the real dream that you'd have if you wanted to start a quant trading firm. It was like...swyx [00:04:40]: Your own, all your own money?William [00:04:41]: Yeah, exactly. It was all the team's own money. We had no customers complaining to us about issues. There's no investors, you know, saying, you know, they don't like the risk that we're taking.
We could. We could really run the thing exactly as we wanted it. It's like Susquehanna or like RenTec. Yeah, exactly. Yeah. And they're the companies that we would kind of look towards as we were building that thing out. But on my 30th birthday, I look and I say, OK, great. This thing is making as much money as kind of anyone would really need. And I thought, well, what's going to happen if we keep going in this direction? And it was clear that we would never have a kind of a big, big impact on the world. We can enrich ourselves. We can make really good money. Everyone on the team would be paid very, very well. Presumably, I can make enough money to buy a yacht or something. But this stuff wasn't that important to me. And so I felt a sort of obligation that if you have this much talent and if you have a talented team, especially as a founder, you want to be putting all that talent towards a good use. I looked at the time of like getting into crypto and I had a really strong view on crypto, which was that as far as a gambling device, this is like the most fun form of gambling invented, like, ever. Super fun. I thought as a way to evade monetary regulations and banking restrictions, I think it's also absolutely amazing. So it has two like killer use cases, not so much banking the unbanked, but everything else to do with like the blockchain and, and you know, web, was it web 3.0 or web, you know, that I, that didn't, it didn't really make much sense. And so instead of going into crypto, which I thought, even if I was successful, I'd end up in a lot of trouble. I thought maybe it'd be better to build something that governments wouldn't have a problem with. I knew that LLMs were like a thing. I think OpenAI, they hadn't released GPT-3 yet, but they'd said GPT-3 is so powerful, we can't release it to the world or something. Was it GPT-2? And then I started interacting with, I think Google had open sourced some language models.
They weren't necessarily LLMs, but they, but they were. But yeah, exactly. So I was able to play around with, but nowadays so many people have interacted with ChatGPT, they get it, but it's like the first time you, you can just talk to a computer and it talks back. It's kind of a special moment and you know, everyone who's done that goes like, wow, this is how it should be. Right. It should be like, rather than having to type on Google and search, you should just be able to ask Google a question. When I saw that I read the literature, I kind of came across the scaling laws, and I think even four years ago, all the pieces of the puzzle were there, right? Google had done this amazing research and published, you know, a lot of it. OpenAI was still open. And so they'd published a lot of their research. And so you really could be fully informed on, on the state of AI and where it was going. And so at that point I was confident enough, it was worth a shot. I think LLMs are going to be the next big thing. And so that's the thing I want to be building in, in that space. And I thought what's the most impactful product I can possibly build. And I thought it should be a platform. So I myself love platforms. I think they're fantastic because they open up an ecosystem where anyone can contribute to it. Right. So if you think of a platform like a YouTube, instead of it being like a Hollywood situation where you have to, if you want to make a TV show, you have to convince Disney to give you the money to produce it instead, anyone in the world can post any content they want to YouTube. And if people want to view it, the algorithm is going to promote it. Nowadays you can look at creators like Mr. Beast or Joe Rogan. They would never have had that opportunity unless it was for this platform. Other ones like Twitter's a great one, right?
But I would consider Wikipedia to be a platform where instead of the Britannica encyclopedia, which is this, it's like a monolithic, you get all the, the researchers together, you get all the data together and you combine it in this, in this one monolithic source. Instead, you have this distributed thing. You can say anyone can host their content on Wikipedia. Anyone can contribute to it. And maybe someone's contribution is that they delete stuff. When I was hearing like the kind of the Sam Altman and kind of the, the Muskian perspective of AI, it was a very kind of monolithic thing. It was all about AI is basically a single thing, which is intelligence. Yeah. Yeah. The more compute, the more intelligent, and the more and better AI researchers, the more intelligent, right? They would speak about it as a kind of race: like, who can get the most data, the most compute and the most researchers. And that would end up with the most intelligent AI. But I didn't believe in any of that. I thought that's like the total, like I thought that perspective is the perspective of someone who's never actually done machine learning. Because with machine learning, first of all, you see that the performance of the models follows an S curve. So it's not like it just goes off to infinity, right? And the, the S curve, it kind of plateaus around human level performance. And you can look at all the, all the machine learning that was going on in the 2010s, everything kind of plateaued around the human level performance. And we can think about the self-driving car promises, you know, how Elon Musk kept saying the self-driving car is going to happen next year, it's going to happen next, next year. Or you can look at the image recognition, the speech recognition. You can look at all of these things, there was almost nothing that went superhuman, except for something like AlphaGo. And we can speak about why AlphaGo was able to go like super superhuman.
So I thought the most likely thing was going to be this, I thought it's not going to be a monolithic thing. That's like an encyclopedia Britannica. I thought it must be a distributed thing. And I actually liked to look at the world of finance for what I think a mature machine learning ecosystem would look like. So, yeah. So finance is a machine learning ecosystem because all of these quant trading firms are running machine learning algorithms, but they're running it on a centralized platform like a marketplace. And it's not the case that there's one giant quant trading company of all the data and all the quant researchers and all the algorithms and compute, but instead they all specialize. So one will specialize on high frequency trading. Another will specialize on mid frequency. Another one will specialize on equity. Another one will specialize. And I thought that's the way the world works. That's how it is. And so there must exist a platform where a small team can produce an AI for a unique purpose. And they can iterate and build the best thing for that, right? And so that was the vision for Chai. So we wanted to build a platform for LLMs.Alessio [00:11:36]: That's kind of the maybe inside versus contrarian view that led you to start the company. Yeah. And then what was maybe the initial idea maze? Because if somebody told you that was the Hugging Face founding story, people might believe it. It's kind of like a similar ethos behind it. How did you land on the product feature today? And maybe what were some of the ideas that you discarded that initially you thought about?William [00:11:58]: So the first thing we built, it was fundamentally an API. So nowadays people would describe it as like agents, right? But anyone could write a Python script. They could submit it to an API. They could send it to the Chai backend and we would then host this code and execute it. So that's like the developer side of the platform.
For their Python script, the interface was essentially text in and text out. An example would be the very first bot that I created, a Reddit news bot. First it would pull the popular news, then it would prompt some external API, something like BERT or GPT-2. It was a very, very small thing. And then the user could talk to it. So you could say, hi bot, what's the news today? And it would say, these are the top stories, and you could chat with it. Four years later, that's basically Perplexity, right? But back then the models were, first of all, really, really dumb. They had the IQ of a four-year-old. And there really wasn't any demand or any PMF for interacting with the news. So then I was like, okay, let's make another one. And I made a bot you could talk to about a recipe. You could say, I've got eggs in my fridge, what should I cook? And it would say, you should make an omelet. There was no PMF for that. No one used it. And so I just kept creating bots. Every single night after work, I'd think, we have AI, we have this platform, I can create any text-in, text-out sort of agent and put it on the platform. So we just created stuff night after night. And then to all the coders I knew, I would say, look, there's this platform, you can create any chat AI, you should put one on. And everyone was like, chatbots are super lame, we want absolutely nothing to do with your chatbot app. No one who knew Python wanted to build on it. I'm trying to build all these bots, and no consumers want to talk to any of them.
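That text-in, text-out contract is easy to picture in code. Here is a minimal sketch of what such an early bot could have looked like; the class and method names are purely illustrative, not Chai's actual API, and the headlines are hard-coded where the real bot called an external news source.

```python
# Toy version of an early Chai-style bot: the whole interface is a
# single function from user text to reply text.

class NewsBot:
    def __init__(self, headlines: list[str]):
        # In the real bot these came from an external API (e.g. Reddit).
        self.headlines = headlines

    def respond(self, user_message: str) -> str:
        # Text in, text out: inspect the message, return a string.
        if "news" in user_message.lower():
            top = "; ".join(self.headlines[:3])
            return f"These are the top stories: {top}"
        return "Ask me about today's news!"

bot = NewsBot(["AI app tops charts", "Markets rally", "New model released"])
print(bot.respond("Hi bot, what's the news today?"))
```

Everything else, hosting, execution, and routing user messages to the script, lived on the platform side.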
And then my sister, who at the time was just finishing college, I said to her, if you want to learn Python, you should just submit a bot to my platform. And she said, okay, cool, I'm going to build a therapist bot. The next day I checked the performance of the app and I was like, oh my God, we've got 20 active users, and they spent an average of 20 minutes on the app. What bot were they speaking to for an average of 20 minutes? I looked, and it was the therapist bot. And I went, oh, this is where the PMF is. There was no demand for recipe help, no demand for news, no demand for dad jokes or pub quizzes or fun facts. What they wanted was the therapist bot. At the time I reflected on that, and I thought, well, if I want to consume news, the most fun way to consume news is Twitter; the value of there being a back and forth wasn't that high. And if I need help with a recipe, the New York Times has a good recipe section, right? It's not actually that hard. And so I thought the thing that AI is 10x better at is a sort of conversation that's not intrinsically informative, but is more about an opportunity. You can say whatever you want. You're not going to get judged. If it's 3am, you don't have to wait for your friend to text back; the reply is immediate. It's judgment-free, and it's much more like a playground, much more like a fun experience. And you could see that if the AI gave a person a compliment, they would love it. It's much easier to get a compliment from the AI than from a human. From that day on, I said, okay, I get it. Humans want to speak to humans or human-like entities, and they want to have fun.
And that was when I started to look less at platforms like Google and more at platforms like Instagram. I was trying to think about why people use Instagram, and I could see that Chai was filling the same desire, the same drive. If you go on Instagram, typically you want to look at the faces of other humans, or you want to hear about other people's lives. So if The Rock is making himself pancakes, you kind of feel a little bit like you're The Rock's friend, or you're having pancakes with him, right? But if you do it too much, you feel like a sad, lonely person. With AI, you can talk to it and tell it stories, and it tells you stories, and you can play with it for as long as you want. And you don't feel like a sad, lonely person. You feel like you actually have a friend.

Alessio [00:16:29]: And why is that? Do you have any insight on that from using it?

William [00:16:33]: I think it's just human psychology. With old-school social media, you're just consuming passively, right? If I'm watching TikTok, I just swipe and swipe and swipe. And even though I'm getting the dopamine of watching an engaging video, there's this other thing building in my head, which is that I'm feeling lazier and lazier. And after a certain period of time, I'm like, man, I just wasted 40 minutes. I achieved nothing. But with AI, because you're interacting, it's not like work, but you feel like you're participating and contributing to the thing. You don't feel like you're just consuming, so you don't have a sense of remorse, basically. And on the whole, the way people talk about interacting with the AI, they speak about it in an incredibly positive sense.
We get people with eating disorders saying that the AI helps them with their eating disorders. People who say they're depressed, it helps them through the rough patches. So I think there's something intrinsically healthy about interacting that TikTok and Instagram and YouTube don't quite tick. From that point on, it was about building more and more human-centric AI for people to interact with. And I was like, okay, let's make a Kanye West bot, right? And then no one wanted to talk to the Kanye West bot. And I was like, ah, who's a cool persona for teenagers to want to interact with? I was trying to find the influencers and so on, but no one cared. They didn't want to interact with them. And instead, the really special moment was the realization that developers and software engineers aren't interested in building this sort of AI, but the consumers are, right? Rather than me trying to guess every day what's the right bot to submit to the platform, why don't we just create the tools for the users to build it themselves? Nowadays this is the most obvious thing in the world, but when Chai first did it, it was not an obvious thing at all. So we took the API for, I think it was GPT-J, which was this 6 billion parameter open-source transformer-style LLM. We took GPT-J, we let users create the prompt, we let users select the image, and we let users choose the name. And that was the bot. Through that, they could shape the experience, right? So if they said this bot's going to be really mean, and it's going to be called Bully in the Playground, that was a whole category that I never would have guessed. People love to fight. They love to have a disagreement. And then there'd be all these romantic archetypes that I didn't know existed.
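The user-created bots he describes were just three fields layered on top of a shared base model. A sketch of that idea, with assumed field names (not Chai's real schema), showing how a persona prompt might be prepended to the conversation before hitting a model like GPT-J:

```python
from dataclasses import dataclass

@dataclass
class UserBot:
    name: str        # chosen by the user
    image_url: str   # chosen by the user
    prompt: str      # the user's persona description

    def build_context(self, history: list[str]) -> str:
        # The persona prompt is prepended to the chat history; the
        # combined string is what the base model would be asked to
        # continue. The exact template here is an assumption.
        parts = [f"{self.name}'s persona: {self.prompt}"] + history
        parts.append(f"{self.name}:")
        return "\n".join(parts)

bully = UserBot(
    name="Bully in the Playground",
    image_url="https://example.com/bully.png",  # placeholder URL
    prompt="A mean kid who loves to argue and pick fights.",
)
print(bully.build_context(["User: hey!"]))
```

The point of the design is that all model serving stays shared infrastructure; the user contributes only the thin UGC layer.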
And once the users could create the content that they wanted, that was when Chai was able to get this huge variety of content. Rather than appealing to the 1% of the population whose wants I'd figured out, you could appeal to something much, much broader. From that moment on, it was crystal clear. Just as Instagram is a social media platform that lets people create and upload images and videos, Chai was really about how we let the users create this experience in AI and then share it, interact, and search. So I say it's a platform for social AI.

Alessio [00:20:00]: Where did the Chai name come from? I was wondering, is it Character AI shortened? You started at the same time, so I was curious. Or was it the UK origin, like chai tea?

William [00:20:15]: We started way before Character AI. And there's an interesting story. Chai's numbers were very, very strong, right? I think in late 2022, or maybe early 2023, Chai was the number one AI app in the App Store. We had something like 100,000 daily active users. And then one day we saw this website, and we were like, oh, this website looks just like Chai. And it was the Character AI website. I think it's much more common knowledge nowadays that when they left Google with the funding, they knew what the most trending, number one app was. And I think they sort of built that.

swyx [00:21:03]: You found the PMF for them.

William [00:21:04]: We found the PMF for them. Exactly. So I'd worked a year very, very hard, and then that was when I learned a lesson about being VC-backed. At that point, I was the only person who'd invested in Chai.
I'd invested maybe 2 million pounds in the business, and from that we were able to build this thing and get to, say, a hundred thousand daily active users. And then when Character AI came along, we laughed at the first version. We were like, oh man, this thing sucks, they don't know what they're building, they're building the wrong thing. But then I saw, oh, they've raised a hundred million dollars. Oh, they've raised another hundred million dollars. And then our users started saying, guys, your AI sucks. Because we were serving a 6 billion parameter model, right? How big was the model that Character AI could afford to serve? So let's say we would spend a dollar per user, right, over the entire lifetime.

swyx [00:22:01]: A dollar per session, per chat, per month?

William [00:22:04]: No, no. Let's say over the course of the year we'd have a million users and we'd spend a million dollars on the AI throughout the year, aggregated. They could spend a hundred times that. So people would say, why is your AI much dumber than Character AI's? And then I was like, oh, okay, I get it. This is the Silicon Valley-style hyperscale business. And so we moved to Silicon Valley, got some funding, iterated, and built the flywheels. And I'm very proud that we were able to compete with that. I think the reason we were able to do it was just customer obsession. And it's similar, I guess, to how DeepSeek have been able to produce such a compelling model compared to someone like OpenAI, right? DeepSeek, with their latest V3, claim to have spent 5 million dollars training it.

swyx [00:22:57]: It may be a bit more, but, like, why are you making such a big deal out of this? There's an agenda there. You brought up DeepSeek.
So we have to ask: you had a call with them.

William [00:23:07]: We did. Let me think what to say about that. For one, they have an amazing story, right? Their background is again in finance.

swyx [00:23:16]: They're the Chinese version of you.

William [00:23:18]: Well, there are a lot of similarities, yes. I have a great affinity for companies which are founder-led, customer-obsessed, and just try to build something great. And what DeepSeek have achieved that's quite special is they've got this amazing inference engine. They've been able to reduce the size of the KV cache significantly, and by doing that, they're able to significantly reduce their inference costs. With AI, people get really focused on the foundation model, the model itself, and they don't pay much attention to inference. To give you an example with Chai: a typical user session is 90 minutes, which is very, very long. For comparison, say the average session length on TikTok is 70 minutes. So people are spending a lot of time, and in that time they're able to send, say, 150 messages. That's a lot of completions, right? It's quite different from an OpenAI scenario where people come in with a particular question in mind and ask one question and a few follow-ups. Because a conversational experience consumes, say, 30 times as many requests, you've got to figure out the right balance between the cost of that and the quality. With AI, it's always been the case that if you want a better experience, you can throw compute at the problem. If you want a better model, you can make it bigger. If you want it to remember better, give it a longer context.
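The economics he sketches are worth making concrete. Using only the numbers from the conversation (150 messages per 90-minute session, a handful of messages for a Q&A product, and roughly a dollar per user over the lifetime):

```python
# Back-of-envelope math for why inference cost dominates in a chat app.
chai_messages_per_session = 150   # from a 90-minute session
qa_messages = 5                   # one question plus a few follow-ups

ratio = chai_messages_per_session / qa_messages
print(f"A chat session consumes ~{ratio:.0f}x more completions")

# At a fixed budget per user, the affordable cost per completion
# shrinks by the same factor.
budget_per_user = 1.00  # "a dollar per user over the lifetime"
per_message = budget_per_user / chai_messages_per_session
print(f"Max spend per message: ${per_message:.4f}")
```

That is the "30 times as many requests" figure: the same dollar has to stretch across far more completions, which is why KV cache size and serving efficiency matter so much for this kind of product.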
And now, what OpenAI is doing to great fanfare is, with rejection sampling, you can generate many candidates, right? And then with some sort of reward model or scoring system, you can serve the most promising of these many candidates. So that's scaling up the inference-time compute side of things. For us, it doesn't make sense to think of AI as just absolute performance, like the MMLU score or any of these benchmarks that people like to look at. That score on its own doesn't really tell you anything, because progress is really made by improving the performance per dollar. And I think that's an area where DeepSeek have been able to perform very, very well, surprisingly so. So I'm very interested in what Llama 4 is going to look like, and whether they're able to match what DeepSeek have achieved with this performance-per-dollar gain.

Alessio [00:25:59]: Before we go into the inference and some of the deeper stuff, can you give people an overview of some of the numbers? I think last I checked, you have like 1.4 million daily actives now, and over $22 million of revenue. So it's quite a business.

William [00:26:12]: Yeah, users grew by a factor of three last year, and revenue more than doubled. It's very exciting. We're competing with some really big, really well-funded companies. Character AI got, I think, almost a $3 billion valuation, and 5 million DAU is the last number I heard. Talkie, a Chinese-built app owned by a company called MiniMax, is incredibly well funded. And these companies didn't grow by a factor of three last year, right?
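The inference-time technique William describes earlier, generating many candidates and serving the one a reward model scores highest, is often called best-of-N sampling. A sketch with stand-in generator and scorer (neither is a real model; a real reward model is trained on human preference data):

```python
import random

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Stand-in for sampling n completions from an LLM at temperature > 0.
    return [f"{prompt} / candidate {i}" for i in range(n)]

def reward_model(reply: str) -> float:
    # Stand-in scorer, deterministic per reply so the sketch is
    # reproducible; a real one scores (prompt, reply) pairs.
    random.seed(reply)
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = generate_candidates(prompt, n)
    # Serve the single highest-scoring candidate.
    return max(candidates, key=reward_model)

print(best_of_n("hi bot"))
```

The trade-off is exactly the performance-per-dollar one he raises: each served reply now costs N completions plus N reward-model passes.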
And so when you've got this company and this team that's able to keep building something that gets users excited, so they want to tell their friends about it and they want to stick on the platform, I think that's very special. Last year was a great year for the team, and I think the numbers reflect the hard work that we put in and, fundamentally, the quality of the app. The quality of the AI is the quality of the experience that you have.

swyx: You actually published your DAU growth chart, which is unusual. And I see some inflections. It's not just a straight line; there are some things that actually inflect. What were the big ones?

William: That's a great question. Let me think of a good answer; I'm basically looking to annotate this chart, which doesn't have annotations on it. The first thing I would say is, I think the most important thing to know about success is that it's born out of failures. It's through failures that we learn. If you think something's a good idea and you do it and it works, great, but you didn't actually learn anything, because everything went exactly as you imagined. But if you have an idea you think is going to be good, you try it, and it fails, there's a gap between reality and expectation, and that's an opportunity to learn. The flat periods, that's us learning. The up periods, that's us reaping the rewards. So looking at the growth chart for 2024, the first thing that really put a dent in our growth was our backend. We'd just reached this scale. From day one, we'd built on top of GCP, Google's cloud platform, and they were fantastic.
We used them when we had one daily active user, and they worked well all the way up until we had about 500,000. It was never the cheapest, but from an engineering perspective, man, that thing scaled insanely well.

swyx: Not Vertex? Like GKE, that kind of stuff?

William: Not Vertex. We used Firebase. I'm pretty sure we're the biggest user ever on Firebase.

swyx: That's expensive.

William: Yeah, we had calls with engineers, and they're like, we wouldn't recommend using this product beyond this point, and you're 3x over that. So we pushed Google to their absolute limits. It was fantastic for us, because we could focus on the AI, on adding as much value as possible. But what happened was, after 500,000, the way we were using it just wouldn't scale any further. And so we had a really, really painful, at least three-month period as we migrated between different services, figuring out which requests we wanted to keep on Firebase and which ones to move onto something else, making mistakes and learning things the hard way. After about three months, we got it right, and we could then scale to 1.5 million DAU without any further issues from GCP. But what happens is, if you have an outage, new users who come to your app experience a dysfunctional app, and then they exit. And the key metrics that the app stores track are things like retention rates, money spent, and the star rating that they give you in the app store. It's a tyranny. If you're ranked top 50 in entertainment, you're going to acquire a certain rate of users organically. If users come in and have a bad experience, it's going to tank where you're positioned in the algorithm.
And then it can take a long time to earn your way back up, at least if you want to do it organically. If you throw money at it, you can jump to the top, and I could talk about that. But broadly speaking, if we look at 2024, the first kink in the graph was outages from hitting 500k DAU; the backend didn't want to scale past that, so we just had to do the engineering and build through it. Okay, so we built through that, and then we get a little bit of growth, and that's feeling good. The next thing, I'm not going to lie, I have a feeling it started when Character AI got acquired. The Character AI team fundamentally got acquired by Google, and I don't know what they changed in their business. I don't know if they dialed down that ad spend. The product didn't change; the product is just what it is, like it's in maintenance mode. Some people may think this is an obvious fact, but running a business can be very competitive, because other businesses can see what you're doing and they can imitate you. And then there's this question: if you've got one company that's spending $100,000 a day on advertising, and another company that's spending zero, then considering market share and the new users entering the market, the one spending $100,000 a day is going to get 90% of those new users. So I have a suspicion that when the founders of Character AI left, they dialed down their spending on user acquisition. And I think that gave oxygen to the other apps, and Chai was able to start growing again in a really healthy fashion. That's the second thing. The third thing is we've really built a great data flywheel.
The AI team sort of perfected their flywheel, I would say, at the end of Q2, and I could speak about that at length. Fundamentally, when you're building anything, you need to be able to evaluate it, and through evaluation you can iterate. We can look at benchmarks, and we can talk about the issues with benchmarks and why they may not generalize as well as one would hope. But something that works incredibly well is getting feedback from humans. So we built this thing where anyone can submit a model to our developer backend, it gets put in front of 5,000 users, and the users can rate it. We can then produce a really accurate ranking of which models our users find more engaging or more entertaining. It's at the point now where every day we evaluate between 20 and 50 LLMs. So even though we've only got a team of, say, five AI researchers, they're able to iterate through a huge number of LLMs; our team ships, let's say, a minimum of 100 LLMs a week. Before that moment, we might iterate through three a week, and there was a time when even doing five a month was a challenge. If you're doing an app, A/B testing 101 is a 30-day retention test: assign different treatments to different cohorts and come back in 30 days to see what the day-30 retention is. That's insanely slow, just too slow. We were able to get that 30-day feedback loop all the way down to something like three hours.
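The feedback loop he describes, every submitted model put in front of live users and ranked by their ratings, reduces to a simple leaderboard. A deliberately simplified sketch (a production version would need traffic splitting, confidence intervals, and per-user normalization, none of which are detailed in the interview):

```python
from collections import defaultdict
from statistics import mean

ratings: dict[str, list[int]] = defaultdict(list)

def record_rating(model_id: str, stars: int) -> None:
    # Each user conversation with a candidate model yields one rating.
    ratings[model_id].append(stars)

def leaderboard() -> list[tuple[str, float]]:
    # Rank candidate models by mean user rating, best first.
    return sorted(((m, mean(r)) for m, r in ratings.items()),
                  key=lambda pair: pair[1], reverse=True)

for stars in (5, 4, 5):
    record_rating("candidate-dpo", stars)
for stars in (3, 3, 4):
    record_rating("baseline", stars)
print(leaderboard())
```

The speed win comes from the metric: a rating arrives within a session, so a ranking stabilizes in hours, whereas day-30 retention by definition takes a month per experiment.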
And when we did that, we could really perfect techniques like DPO, fine-tuning, prompt engineering, blending, rejection sampling, training a reward model, really successfully, boom, boom, boom. So in Q3 and Q4, the amount of AI improvement we got was astounding. It was getting to the point where I thought, how much more edge is there to be had here? But the team just kept going and going. That was number three for the inflection points.

swyx [00:34:53]: There's a fourth?

William [00:34:54]: The important thing about the third one is, if you go on our Reddit or you talk to users of the AI, there's a clear date, somewhere in October or so, when the users flipped. Before October, the users would say Character AI is better than you, for the most part. From October onwards, they would say, wow, you guys are better than Character AI. And that was a really clear positive signal that we'd done it. You can't cheat consumers. You can't trick them. You can't b******t them. They know, right? If you're going to spend 90 minutes on a platform, and with apps the barriers to switching are pretty low, you can try Character AI for a day; if you get bored, you can try Chai; if you get bored of Chai, you can go back to Character AI. The users' loyalty is not strong. What keeps them on the app is the experience. If you deliver a better experience, they're going to stay, and they can tell. The fourth one was we were fortunate enough to make a hire. We hired one really talented engineer, and he said, at my last company we had a head of growth who was really, really good, and he was the head of growth for ByteDance for two years. Would you like to speak to him? And I was like, yes.
Yes, I think I would. And so I spoke to him, and he just blew me away with what he knew about user acquisition. It was like 3D chess, you know, as much as I know about AI.

swyx [00:36:21]: ByteDance as in TikTok US?

William [00:36:26]: Yes, not ByteDance as in other stuff. He was interviewing us as we were interviewing him, right, weighing up his options. And so he was looking at our metrics, and I saw him get really excited when he said, guys, you've got a million daily active users and you've done no advertising. I said, correct. And he was like, that's unheard of; I've never heard of anyone doing that. Then he kept looking at our metrics and said, if you've got all of this organically, and you start spending money, this is going to be very exciting. I was like, let's give it a go. So he came in, and we've just started ramping up the user acquisition. We started spending $20,000 a day, and it looked very promising. Right now we're spending $40,000 a day on user acquisition. That's still only half of what Character AI or Talkie may be spending. We were growing at a rate of maybe 2x a year, and that got us growing at a rate of 3x a year. So I'm evolving more and more towards a Silicon Valley-style hyper-growth: you build something decent, and then you can

swyx [00:37:33]: slap on a huge... You did the important thing, you did the product first.

William [00:37:36]: Of course, but then you can slap on the rocket or the jet engine, which is just cash: you pour in as much cash as you can, you buy a lot of ads, and your growth is faster.

swyx [00:37:48]: I'm just kind of curious what's working right now versus what surprisingly...

William [00:37:52]: ...doesn't work.
Oh, there's a long, long list of surprising stuff that doesn't work. The most surprising thing is that almost everything doesn't work. That's what's surprising. I'll give you an example. A year and a half ago, we were super excited by audio. I was like, audio is going to be the next killer feature, we have to get it in the app, and I want us to be first. Everything Chai does, I want us to be the first. We may not be the company that's strongest at execution, but we can always be the most innovative.

swyx [00:38:22]: Interesting. You're pretty strong at execution.

William [00:38:26]: We're much stronger now. A lot of the reason we're here is because we were first. If we launched today, it would be so hard to get the traction, to get the flywheel, to get the users, to build a product people are excited about. If you're first, people are naturally excited about it. But if you're fifth or tenth, man, you've got to be insanely good at execution.

swyx [00:38:46]: So you were first with voice?

William [00:38:51]: We were first. Character launched voice, I think, at least nine months after us. But the team worked so hard for it. At the time we did it, latency was a huge problem, cost was a huge problem, getting the right quality of voice was a huge problem. Then there's the user interface and getting the right user experience, because you don't just want it to start blurting out. You want to activate it, but you don't want to keep pressing a button every single time. There's a lot that goes into a really smooth audio experience. So we went ahead, we invested the three months, we built it all. And then when we did the A/B test, there was no change in any of the numbers.
And I was like, this can't be right, there must be a bug. We spent a week checking everything, checking again and again, and it turned out the users just did not care. Only 10 or 15% of users even clicked the button to engage the audio, and they would only use it for 10 or 15% of their time. So if you do the math, if roughly one in seven people use it for one seventh of their time, you've changed about 2% of the experience. Even if that 2% of the time is insanely good, it doesn't translate to much when you look at retention, engagement, and monetization rates. So audio did not have a big impact.

swyx: I'm pretty big on audio.

William: Yeah, I like it too. A lot of what I do is: you can have a theory, you test it, and you let the data decide. So I think if you want to make audio work, it has to be a unique, compelling, exciting experience that they can't have anywhere else.

swyx [00:40:37]: It could be your models, which just weren't good enough.

William [00:40:39]: No, no, they were very good. It was just that, if you listen to an Audible or Kindle audiobook, you just hear this voice, and you don't go, wow, this is special. It's a convenience thing. The idea is that if Chai is the only platform, like, say you have a MrBeast, and YouTube is the only platform where you can watch a MrBeast video, and it's the most engaging, fun video you want to watch, then you'll go to YouTube. For audio, you can't just put the audio on there and have people go, oh yeah, it's 2% better, or 5% of users think it's 20% better, right?
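The "you've changed about 2% of the experience" figure follows directly from the adoption numbers he gives:

```python
# If about 1 in 7 users engage audio, for about 1/7 of their time,
# the feature touches roughly 2% of total usage.
adoption = 1 / 7      # share of users who clicked the audio button
usage_share = 1 / 7   # share of their session time spent on audio

overall = adoption * usage_share
print(f"Audio covers {overall:.1%} of the total experience")
```

Which is why even a large quality gain inside that slice barely moves aggregate retention or monetization.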
It has to be something where the majority of people, for the majority of the experience, go, wow, this is a big deal. Those are the features you need to be shipping. If it's not going to appeal to the majority of people for the majority of the experience, and it's not a big deal, it's not going to move you.

swyx: Cool. So you killed it. I don't see it anymore.

William: Yep. It's kind of cheesy, I guess, but the longer I've been working at Chai, and I think the team agrees with this, all the platitudes, at least I thought they were platitudes, that you get from Steve Jobs, like build something insanely great, or be maniacally focused, or the most important thing is deciding what not to work on, all of these lessons are just painfully true. So now everything I say, I'm either quoting Steve Jobs or Zuckerberg. I'm like, guys, move fast and break things.

swyx [00:42:10]: You've jumped the Apollo to cool it now.

William [00:42:12]: Yeah, everything they said is so, so true.

swyx: The turtleneck.

William: Yeah, yeah. Everything is so true.

swyx [00:42:18]: This last question on my side, and then I want to pass it to Alessio, is on multi-modality in general. This actually comes from Justine Moore from a16z, who's a friend of ours. A lot of people are trying to do voice, image, and video for AI companions. You just said voice didn't work. What would make you revisit?

William [00:42:36]: So Steve Jobs was very, very clear on this. There's a habit among engineers: once they've got some cool technology, they want to find a way to package up the cool technology and sell it to consumers. That does not work. You're free to try to build a startup where you've got your cool tech and you want to find someone to sell it to, but that's not what we do at Chai. At Chai, we start with the consumer.
What does the consumer want? What is their problem? And how do we solve it? So right now, the number one problem for the users, it's not the audio. It's not the image generation either. The number one problem for users in AI is this: all the AI is being generated by middle-aged men in Silicon Valley, right? That's all the content. You're interacting with this AI, you're speaking to it for 90 minutes on average, and it's being trained by middle-aged men. They're the ones deciding, oh, what should the AI say in this situation, right? What's funny? What's cool? What's boring? What's entertaining? That's not the way it should be. The way it should be is that the users should be creating the AI, right? And so the way I speak about it is this: at Chai, we have this AI engine which sits atop a thin layer of UGC. That thin layer of UGC is absolutely essential, but right now it's just prompts. It's just an image. It's just a name. We've done 1% of what we could do. So we need to keep thickening up that layer of UGC. It must be the case that the users can train the AI. And if reinforcement learning is powerful and important, they have to be able to do that. And so, I say to the team, just as Mr. Beast is able to spend 100 million a year or whatever it is on his production company, with a team building the content which he then shares on the YouTube platform, until there's a team earning 100 million a year, or spending 100 million on the content they're producing for the Chai platform, we're not finished, right? So that's the problem. That's what we're excited to build.
And getting too caught up in the tech, I think, is a fool's errand. It does not work.

Alessio [00:44:52]: As an aside, I saw the Beast Games thing on Amazon Prime. It's not doing well. And I'm curious.

swyx [00:44:56]: It's kind of like, I mean, the audience rating is high. The Rotten Tomatoes critics score sucks, but the audience rating is high.

Alessio [00:45:02]: But it's not like in the top 10. I saw it dropped off of like the... Oh, okay. Yeah, that one I don't know. I'm curious, you know, it's kind of like similar content, but a different platform. And then going back to some of what you were saying, people come to Chai

William [00:45:13]: expecting some type of content. Yeah, I think something that's interesting to discuss is moats. What is the moat? If you look at a platform like YouTube, the moat, I think, is really in the ecosystem. And the ecosystem is comprised of the content creators, the users, the consumers, and then the algorithms. This creates a sort of flywheel where the algorithms are able to be trained on the users and the users' data, and the recommender systems can then feed information back to the content creators. So Mr. Beast knows which thumbnail does the best. He knows the first 10 seconds of the video have to be this particular way. And so his content is super optimized for the YouTube platform. That's why it doesn't do well on Amazon. If he wants to do well on Amazon, well, how many videos has he created on the YouTube platform? Thousands, tens of thousands, I guess. He needs to get those iterations in on Amazon.
So at Chai, I think it's all about how we can get the most compelling, rich user-generated content, stick that on top of the AI engine and the recommender systems, such that we get this beautiful data flywheel: more users, better recommendations, more creators, more content, more users.

Alessio [00:46:34]: You mentioned the algorithm. You have this idea of the Chaiverse on Chai, and you have your own kind of LMSYS-like ELO system. Yeah, what do your users optimize for, and maybe talk about how you built it, how people submit models?

William [00:46:49]: So Chaiverse is what I would describe as a developer platform. More often when we're speaking about Chai, we're thinking about the Chai app, which is really this product for consumers. Consumers can come on the Chai app, interact with our AI, and interact with other UGC. And it's really just these kinds of bots, a thin layer of UGC. Okay. Our mission is not to just have a very thin layer of UGC. Our mission is to have as much UGC as possible. I don't want only people at Chai training the AI. I want everyone, not middle-aged men, building the AI, as many people building the AI as possible. Okay, so what we built was Chaiverse. And Chaiverse is kind of like a prototype, is the way to think about it. It started with this observation: how many models get submitted to Hugging Face a day? It's hundreds, right? So there are hundreds of LLMs submitted each day. Now consider what it takes to build an LLM. It takes a lot of work, actually. Someone devoted several hours of compute, several hours of their time, prepared a data set, launched it, ran it, evaluated it, submitted it, right?
So there's a lot of work going into that. So what we did was we said, well, why can't we host their models for them and serve them to users? And what would that look like? The first issue is, how do you know if a model is good or not? We don't want to serve users the crappy models, right? So I love the LMSYS style. I think it's really cool. It's really simple, a very intuitive thing: you simply present the users with two completions. You say, look, this is from model A, this is from model B. Which is better? And so if someone submits a model to Chaiverse, what we do is we spin up a GPU, download the model, host that model on the GPU, and start routing traffic to it. We think it takes about 5,000 completions to get an accurate signal. That's roughly what LMSYS does. And from that, we're able to get an accurate ranking of which models people are finding entertaining and which models are not. If you look at the bottom 80%, they'll suck. You can just disregard them. They totally suck. Then when you get to the top 20%, you know you've got a decent model, but you can break it down into more nuance. There might be one that's really descriptive. There might be one that's got a lot of personality to it. There might be one that's really illogical. Then the question is, what do you do with these top models? From there, you can do more sophisticated things. You can try a routing thing where you say, for a given user request, we're going to try and predict which of these n models the user will enjoy the most. That turns out to be pretty expensive and not a huge source of edge or improvement.
Something that we love to do at Chai is blending. The simplest way to think about it: you're pretty quickly going to see you've got one model that's really smart and one model that's really funny. How do you get the user an experience that is both smart and funny? Well, for 50% of the requests you serve them the smart model, and for 50% of the requests you serve them the funny model. Just a random 50%? Just a random, yeah. And then... That's blending? That's blending. You can do more sophisticated things on top of that, as in all things in life, but if you just do that 80-20 solution, you get a pretty powerful effect out of the gate. Random number generator. I think it's the robustness of randomness. Random is a very powerful and very robust optimization technique. You can explore a lot of the space very efficiently. There's one thing that's really, really important to share, and this is the most exciting thing for me: after you do the ranking, you get an ELO score, and you can track from a user's first join date, the first date they submit a model to Chaiverse. They almost always get a terrible ELO, right? Say the first submission gets an ELO of 1,100 or 1,000 or something, and you can see that they iterate and iterate, and it will be like, no improvement, no improvement, no improvement, and then boom. Do you give them any data, or do they have to come up with this themselves? We do, we do. We try and strike a balance between giving them data that's useful and being compliant with GDPR, which means you have to work very hard to preserve the privacy of the users of your app. So we try to give them as much signal as possible, to be helpful. The minimum is we're just going to give you a score, right? That's the minimum.
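The blending trick described above is literally a coin flip per request. A minimal sketch, where the model names and the stubbed-out reply functions are hypothetical placeholders for real model endpoints:

```python
import random
from collections import Counter

# Two hypothetical specialist models behind one chat experience.
def smart_model(prompt):
    return "[smart] thoughtful reply"

def funny_model(prompt):
    return "[funny] witty reply"

def blended_reply(prompt):
    # The whole trick: per request, route uniformly at random.
    return random.choice([smart_model, funny_model])(prompt)

random.seed(1)
labels = [blended_reply("hi").split()[0] for _ in range(20)]
print(Counter(labels))  # over a session the user sees a mix of both
```

No per-user routing model, no learned weights: over a session the user experiences both personalities, which is the 80-20 effect William mentions.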
But even with that alone, people can optimize the score pretty well, because they're able to come up with theories: submit it, does it work? No. A new theory, does it work? No. And then boom, as soon as they figure something out, they keep it, and then they iterate from there.

Alessio [00:51:46]: Last year, you had this post on your blog about crowdsourcing the 10-trillion-parameter AGI, and you called it a mixture of experts recommender. Yep. Any insights? Updated thoughts, 12 months later?

William [00:51:58]: I think the timeline for AGI has certainly been pushed out, right? Now, I'm a controversial person, I don't know, I just think... You don't believe in scaling laws, you think AGI is further away. I think it's an S-curve. I think everything's an S-curve. And I think that the models have proven to be far worse at reasoning than people thought. Whenever I hear people talk about LLMs as reasoning engines, I sort of cringe a bit. I don't think that's what they are. I think of them more as simulators, right? They get trained to predict the next most likely token. It's like a physics simulation engine, like those games where you construct a bridge and drop a car down, and it predicts what should happen. That's really what LLMs are doing. It's not so much that they're reasoning; it's more that they're just doing the most likely thing. So fundamentally, the ability for people to add in intelligence, I think, is very limited. What most people would consider intelligence, I don't think is a crowdsourcing problem, right? Wikipedia crowdsources knowledge. It doesn't crowdsource intelligence. It's a subtle distinction. AI is fantastic at knowledge. I think it's weak at intelligence.
And it's easy to conflate the two, because if you ask it, who was the seventh president of the United States, and it gives you the correct answer, well, I don't know the answer to that, and you can conflate that with intelligence. But really, that's a question of knowledge. And knowledge is really about saying, how can I store all of this information, and then how can I retrieve something that's relevant? They're fantastic at that. They're fantastic at storing knowledge and retrieving the relevant knowledge. They're superior to humans in that regard. And so I think we need to come up with a new word. AI should contain more knowledge than any individual human, and it should be more accessible than any individual human. That's a very powerful thing. That's super powerful.

swyx [00:54:07]: But what words do we use to describe that? We had a previous guest on, from Exa AI, which does search. And he tried to coin super knowledge as the opposite of super intelligence.

William [00:54:20]: Exactly. I think super knowledge is a more accurate word for it.

swyx [00:54:24]: You can store more things than any human can.

William [00:54:26]: And you can retrieve it better than any human can as well. And I think it's those two things combined that's special. I think that thing will exist. That thing can be built. And I think you can start with something that's entertaining and fun. I often think it's going to be a 20-year journey, and we're in year four. It's like the web, and this is 1998 or something. You've got a long, long way to go before the Amazon.coms are these huge, multi-trillion-dollar businesses that every single person uses every day. And so AI today is very simplistic.
And fundamentally, it's the way we're using it, the flywheels, this ability for everyone to contribute to it, that will really magnify the value it brings. Right now, I think it's a bit sad. I'm going to pick on OpenAI: the big labs go to these human labelers and say, we're going to pay you to label this subset of questions, so we get a really high-quality data set, and then we're going to get our own computers that are really powerful. And that's kind of the thing. For me, it's so much like Encyclopedia Britannica. It's insane. All the people that were interested in blockchain, it's like, well, this is what needs to be decentralized. You need to decentralize that thing, because if you distribute it, people can generate way more data in a distributed fashion, way more, right? You need the incentive. Yeah, of course. Yeah. But that's kind of the exciting thing about Wikipedia: it's this understanding of the incentives. You don't need money to incentivize people. You don't need dog coins. No. Sometimes people get the satisfaction from
Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.

* How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see
* The evolution from traditional Large Language Models to more sophisticated reasoning systems
* The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably
* Why O1's improved performance comes with substantial computational costs
* The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)
* The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
***

TOC:
1. **O1 Architecture and Reasoning Foundations**
[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations
[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning
[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach
[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities

2. **Monte Carlo Methods and Model Deep-Dive**
[00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation
[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems
[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations
[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior
[00:51:41] 2.5 O1 Response Patterns and Performance Analysis

3.
**System Design and Real-World Applications**
[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models
[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1
[01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems
[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches
[01:26:08] 3.5 Hybrid Architecture Implementation Strategies

Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

REFS:
[00:02:00] Monty Python (1975) Witch trial scene: flawed logical reasoning. https://www.youtube.com/watch?v=zrzMhU_4m-g
[00:04:00] Cade Metz (2024) Microsoft–OpenAI partnership evolution and control dynamics. https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html
[00:07:25] Kojima et al. (2022) Zero-shot chain-of-thought prompting ('Let's think step by step'). https://arxiv.org/pdf/2205.11916
[00:12:50] DeepMind Research Team (2023) Multi-bot game solving with external and internal planning. https://deepmind.google/research/publications/139455/
[00:15:10] Silver et al. (2016) AlphaGo's Monte Carlo Tree Search and Q-learning. https://www.nature.com/articles/nature16961
[00:16:30] Kambhampati, S. et al. (2023) Evaluates O1's planning in "Strawberry Fields" benchmarks. https://arxiv.org/pdf/2410.02162
[00:29:30] Alibaba AIDC-AI Team (2023) MARCO-O1: Chain-of-Thought + MCTS for improved reasoning. https://arxiv.org/html/2411.14405
[00:31:30] Kambhampati, S. (2024) Explores LLM "reasoning vs retrieval" debate. https://arxiv.org/html/2403.04121v2
[00:37:35] Wei, J. et al. (2022) Chain-of-thought prompting (introduces last-letter concatenation). https://arxiv.org/pdf/2201.11903
[00:42:35] Barbero, F. et al. (2024) Transformer attention and "information over-squashing." https://arxiv.org/html/2406.04267v2
[00:46:05] Ruis, L. et al. (2023) Influence functions to understand procedural knowledge in LLMs. https://arxiv.org/html/2411.12580v1
(truncated - continued in shownotes/transcript doc)
The episode highlights the recent push by major companies like Microsoft, Google, and LinkedIn to integrate artificial intelligence (AI) tools into their services. Microsoft has relaunched its free AI chat service for businesses, now branded as Microsoft 365 Copilot Chat, which enhances workplace productivity through AI agents for a monthly fee. Google is also making strides by incorporating its Gemini AI experience into its Workspace plans, eliminating additional fees and reflecting a belief that AI will fundamentally transform work processes.

LinkedIn has introduced new AI tools aimed at improving the job-seeking experience for its users. The platform's JobsMatch tool will assist job seekers and recruiters alike, while a Recruitment AI agent will help small businesses manage hiring processes more effectively. This shift towards free tools marks a departure from LinkedIn's previous focus on premium offerings, as the company aims to streamline the application process amidst a competitive job market. Sobel notes the challenges faced by job seekers, with a significant number of applications submitted per minute, and highlights the increase in users activating the "Open to Work" feature.

The episode also touches on the potential ban of TikTok in the United States, which has led many users to migrate to the Chinese social media app RedNote. This shift has resulted in a notable increase in interest in learning Mandarin Chinese, as users adapt to the changing social media landscape. Sobel emphasizes the unintended consequences of such regulatory actions, suggesting that technology-savvy users are simply moving to another Chinese-run platform, which may complicate the regulatory landscape further.

In the latter part of the episode, host Dave Sobel discusses the evolving perceptions of AI models, particularly focusing on the O1 model and its unique capabilities.
He highlights the importance of human involvement in AI development, as research indicates that consumers prefer AI tools that showcase human expertise rather than those that appear overly human-like. The episode concludes with a call to action for listeners to reflect on their own use of AI technologies, the role of managed IT services, and the significance of human contributions to data quality, encouraging a proactive approach to leveraging these advancements in their businesses.

Three things to know today
00:00 Microsoft, Google, and LinkedIn Push AI Tools: What It Means for Your Workday
04:44 What the Potential TikTok Ban Says About Government Regulation
06:13 Big Tech Ideas: AI Models, IT Services, and Why People Still Matter

Supported by: https://mspradio.com/engage/
All our Sponsors: https://businessof.tech/sponsors/

Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/
Looking for a link from the stories? The entire script of the show, with links to articles, are posted in each story on https://www.businessof.tech/
Support the show on Patreon: https://patreon.com/mspradio/
Want to be a guest on Business of Tech: Daily 10-Minute IT Services Insights? Send Dave Sobel a message on PodMatch, here: https://www.podmatch.com/hostdetailpreview/businessoftech
Want our stuff? Cool Merch? Wear “Why Do We Care?” - Visit https://mspradio.myspreadshop.com

Follow us on:
LinkedIn: https://www.linkedin.com/company/28908079/
YouTube: https://youtube.com/mspradio/
Facebook: https://www.facebook.com/mspradionews/
Instagram: https://www.instagram.com/mspradio/
TikTok: https://www.tiktok.com/@businessoftech
Bluesky: https://bsky.app/profile/businessof.tech
Host Dave Sobel discusses significant cybersecurity developments involving the U.S. Treasury Department and its recent breach linked to Chinese hackers. The breach, which was discovered on December 8, 2024, involved unauthorized access to unclassified documents within the Office of Foreign Assets Control, raising alarms about the potential exposure of sensitive information related to economic sanctions. The episode highlights the ongoing investigations and the U.S. government's response, including sanctions imposed on a Chinese cybersecurity firm involved in the Flax Typhoon cyber attacks that compromised numerous internet-connected devices globally.

Sobel also addresses the national security concerns surrounding TP-Link internet routers, which hold a dominant market share in the U.S. The Commerce, Defense, and Justice Departments are investigating the company due to its alleged ties to Chinese cyber threats and its failure to rectify security vulnerabilities. The episode emphasizes the importance of securing cloud systems, as CISA has mandated federal agencies to conduct security assessments in light of recent breaches attributed to foreign hackers. This directive aims to enhance the security posture of federal cloud environments and protect sensitive information.

The discussion shifts to the leadership transition at Pia, where CEO Gerwai Todd has stepped down after a year, passing the reins to an executive group. Sobel reflects on the challenges of dual CEO roles and the importance of operational stability during this transition. He notes Todd's contributions to the company, including the launch of an AI-driven help desk ticketing system, and emphasizes the need for a capable leader to navigate the competitive landscape of help desk automation.

Finally, the episode covers OpenAI's recent announcements regarding its new reasoning models, O1 and O3, which aim to enhance AI capabilities and approach artificial general intelligence.
Sobel discusses the implications of OpenAI's shift towards a for-profit model and the potential impact on the development of AI technologies. He highlights the need for practical applications of these advancements and the importance of addressing concerns about the ethical implications of AI development. The episode concludes with a reminder of the significance of these developments in the broader context of technology and national security.

Three things to know today
00:00 From Treasury Hacks to Router Risks: The U.S. Grapples with China's Cyber Onslaught
06:31 Dual CEO Role Dilemmas: Gerwai Todd Passes the Torch at Pia
08:51 AI Gets a Power Boost: OpenAI's Big Plans, Bigger Models, and a Push for Profits
On today's show we are taking a look at a major step forward in generative AI. The latest release is O3, the premier model, along with O3 Mini, which is faster, more efficient, and costs less. How good is it? What's the difference? The first release, O1, was announced three months ago. Now, three months later, they're already announcing the next one, O3. Technically they couldn't name it O2: Sam Altman said it should have been called O2, but there's a company called O2, a telecom company in the UK, and they didn't want to infringe on the trademark. So they're naming it O3, but it is essentially the next model, the next version after O1. A recent interview last week with Satya Nadella, the CEO of Microsoft, highlights where AI is heading when it comes to revolutionizing software development and software applications as we know them.

---------------

**Real Estate Espresso Podcast:**
Spotify: [The Real Estate Espresso Podcast](https://open.spotify.com/show/3GvtwRmTq4r3es8cbw8jW0?si=c75ea506a6694ef1)
iTunes: [The Real Estate Espresso Podcast](https://podcasts.apple.com/ca/podcast/the-real-estate-espresso-podcast/id1340482613)
Website: [www.victorjm.com](http://www.victorjm.com)
LinkedIn: [Victor Menasce](http://www.linkedin.com/in/vmenasce)
YouTube: [The Real Estate Espresso Podcast](http://www.youtube.com/@victorjmenasce6734)
Facebook: [www.facebook.com/realestateespresso](http://www.facebook.com/realestateespresso)
Email: [podcast@victorjm.com](mailto:podcast@victorjm.com)

**Y Street Capital:**
Website: [www.ystreetcapital.com](http://www.ystreetcapital.com)
Facebook: [www.facebook.com/YStreetCapital](https://www.facebook.com/YStreetCapital)
Instagram: [@ystreetcapital](http://www.instagram.com/ystreetcapital)