The AI Breakdown: Daily Artificial Intelligence News and Discussions
January marked a clear break between the AI era people thought they were in and the one that actually arrived. Agentic coding crossed from novelty to default, tools like Claude Code reset expectations for what individuals can build, and systems such as OpenClaw and Moltbook showed how quickly agents are becoming ecosystems, not just features. This episode explains why the shift felt sudden, why it caught so many off guard, and why the real story isn't sentient agents but a widening gap between AI capability and real-world adoption. In the headlines: Nvidia and OpenAI, Intel's GPU pivot, Apple's embrace of agentic coding, developer dependence on Claude Code, and Disney's strategic turn toward experiences.
Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rackspace AI Launchpad - Build, test and scale intelligent workloads faster - http://rackspace.com/ailaunchpad
Zencoder - From vibe coding to AI-first engineering - http://zencoder.ai/zenflow
Optimizely Opal - The agent orchestration platform built for marketers - https://www.optimizely.com/theaidailybrief
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Section - Build an AI workforce at scale - https://www.sectionai.com/
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Timestamps:
0:00 feeeeeeling hot hot hot!
0:15 Adobe Animate discontinued, then not
2:14 Intel hires GPU veteran...
3:52 Moltbook (last time) + Rent-a-human site
7:22 QUICK BITS INTRO
7:33 Copilot in File Explorer
8:08 France raids X offices, Spain social ban
9:11 MORE Ryzen CPUs fried in ASRock mobos
10:07 AMD adopting Intel's 'FRED'
10:54 GitHub's plan to deal with vibe coding slop
NEWS SOURCES: https://lmg.gg/C94rA
Learn more about your ad choices. Visit megaphone.fm/adchoices
Apple shatters revenue records, Tim Cook teases new innovations coming this year, Walmart hits $1T market cap, everyone's still pouring money into AI, and OpenClaw's “skills” have serious security concerns.
Stephen's Newsletter Signup
Ad-Free + Bonus Episodes
Show Notes via Email
Watch on YouTube!
Join the Community
Email Us: podcast@primarytech.fm
@stephenrobles on Threads
@jasonaten on Threads
Sponsors
Shopify: Sign up for your one-dollar-per-month trial and start selling today at: shopify.com/primary
Quo: Try QUO for free PLUS get 20% off your first 6 months when you go to Quo.com/primary
Links from the show
Mac Power Users - Relay
Apple announces all-time record in revenue, iPhone sales - Six Colors
While Everyone Else Tries to Replace the iPhone, Apple Just Had Its Best Quarter Ever
New Mac configurator may point to separate CPU and GPU options - 9to5Mac
Tim Cook hints at ‘never been seen' innovations coming this year - 9to5Mac
Meta (META) Q4 2025 earnings
185 Billion Reasons Google Isn't Worried AI Will Kill Search
Google's subscriptions rise in Q4 as YouTube pulls $60B in yearly revenue | TechCrunch
It Took 64 Years to Build Walmart. It Took 3 Years to Turn It Into a $1 Trillion Tech Company
Xcode moves into agentic coding with deeper OpenAI and Anthropic integrations | TechCrunch
OpenClaw's AI ‘skill' extensions are a security nightmare | The Verge
Humans are infiltrating the social network for AI bots | The Verge
Anthropic's 'Dishonest' Ads Clearly Struck a Nerve With Sam Altman
Expect more upsells and subscription bundles from Apple, Creator Studio was just the start - 9to5Mac
Now anyone can tap Ring doorbells to search for lost dogs | The Verge
AirTag 2 Has Wild Range! #tech #airtag - YouTube
Google announces Pixel 10a with completely flat camera
Alexa Plus is now available to everyone in the US | The Verge
Apple Sports for iPhone updated with PGA, LPGA, and more - 9to5Mac
The SpaceX-xAI Merger Isn't About Data Centers in Space. It's About Bailing Out Musk's Biggest Gamble
Shortcuts Team Lead Hiring
Gemini Mac App Tweet
★ Support this podcast ★
Fidelity announces its FIDD stablecoin. Robinhood plans 24/7 tokenized stock trading. The EF PSE team shares a client-side GPU acceleration roadmap. And Uniswap adds CCAs on its web app. Read more: https://ethdaily.io/871 Sponsor: Arkiv is an Ethereum-aligned data layer for Web3. Arkiv brings the familiar concept of a traditional Web2 database into the Web3 ecosystem. Find out more at Arkiv.network Content is for informational purposes only, not endorsement or investment advice. The accuracy of information is not guaranteed.
On this episode of That Tech Pod, we talk with Logan Lawler, Senior Director at Dell Technologies, about what it takes to make AI actually work in the real world. Logan shares his 16-year journey at Dell and why his focus today is less on hype and more on practical infrastructure choices that enable AI at scale.
We break down Edge AI versus Cloud AI with clear, concrete examples, including how GPU-accelerated desktops, workstations, and hybrid cloud setups can turn “that's impossible” AI problems into manageable ones. Logan also highlights why storage, not compute, is often the biggest bottleneck, and the common mistakes organizations make when data can't keep up with GPUs. The conversation gets into energy and sustainability, from the environmental cost of massive data centers to what it means when nuclear power and AI collide. We also explore the human side of AI: whether instant answers are making us lazier, why struggle is still essential for learning, and how that idea shows up in parenting, education, and work. We close with real-world edge AI success stories, a few cautionary tales, and some lighter moments, making this a grounded discussion on AI, infrastructure, and the tradeoffs we rarely talk about.
Logan Lawler works at Dell Technologies, where he leads strategy for Dell Pro Precision AI Solutions. Over his 16-year career at Dell, he's worked across sales, marketing, and e-commerce, and now helps enterprises and creative studios leverage high-performance AI workstations and hybrid cloud infrastructure. A frequent speaker and media guest, Logan explains how GPU-accelerated PCs and storage solutions are transforming industries from film and animation to healthcare research. Logan was raised in Missouri and is a graduate of the University of Missouri. He now lives in Texas with his family.
Adi Polak talks to Bryan Oliver (Thoughtworks) about his career in platform engineering and large-scale AI infrastructure. Bryan's first job: building pools and teaching swimming lessons. His challenge: running large-scale GPU data centers while keeping AI workloads predictable and reliable.
SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo
This episode is a ray of hope for despairing gamers who have resigned themselves to AI corporations sweeping up every video card. We talk about chips on which training and inference run several times faster than on GPUs. What's the secret, and what should we expect in the future? We discuss it with Zigfrid Zvezdin of Cerebras! And we look forward to your likes, reposts, and comments in our messengers and social networks!
Telegram chat: https://t.me/podlodka
Telegram channel: https://t.me/podlodkanews
Facebook page: www.facebook.com/podlodkacast/
Twitter account: https://twitter.com/PodcastPodlodka
Hosts in this episode: Zhenya Katella, Egor Tolstoy
Useful links:
Guest's Telegram channel: https://t.me/zzigfrid
Guest's Telegram: https://t.me/ziggerzz
Guest's LinkedIn: https://www.linkedin.com/in/zigfrid/
Your host, Sebastian Hassinger, talks with Alumni Ventures managing partner Chris Sklarin about how one of the most active US venture firms is building a quantum portfolio while “democratizing” access to VC as an asset class for individual investors. They dig into Alumni Ventures' co‑investor model, how the firm thinks about quantum hardware, software, and sensing, and why quantum should be viewed as a long‑term platform with near‑term pockets of commercial value. Chris also explains how accredited investors can start seeing quantum deal flow through Alumni Ventures' syndicate.
Chris' background and Alumni Ventures in a nutshell
Chris is an MIT‑trained engineer who spent years in software startups before moving into venture more than 20 years ago.
Alumni Ventures is a roughly decade‑old firm focused on “democratizing venture capital” for individual investors, with over 11,000 LPs, more than 1.5 billion dollars raised, and about 1,300 active portfolio companies.
The firm has been repeatedly recognized as a highly active VC by CB Insights, PitchBook, Stanford GSB, and Time magazine.
How Alumni Ventures structures access for individuals
Most investors come in as individuals into LLC‑structured funds rather than traditional GP/LP funds.
Alumni Ventures always co‑invests alongside a lead VC, using the lead's conviction, sector expertise, and diligence as a key signal.
The platform also offers a syndicate where accredited investors can opt in to see and back individual deals, including those tagged for quantum.
Quantum in the Alumni Ventures portfolio
Alumni Ventures has 5–6 quantum‑related investments spanning hardware, software, and applications, including Rigetti, Atom Computing, Q‑CTRL, Classiq, and quantum‑error‑mitigation startup Qedma/Cadmus.
Rigetti was one of the firm's earliest quantum investments; the team followed on across multiple rounds and was able to return capital to investors after Rigetti's SPAC and a strong period in the public markets.
Chris also highlights interest in Cycle Dre (a new company from Rigetti's former CTO) and application‑layer companies like InQ and quantum sensing players.
Barbell funding and the “3–5 year” view
Chris responds to the now‑familiar “barbell” funding picture in quantum—a few heavily funded players and a long tail of small companies—by emphasizing near‑term revenue over pure science experiments.
He sees quantum entering an era where companies must show real products, customers, and revenue, not just qubit counts.
Over the next 3–5 years, he expects meaningful commercial traction first in areas like quantum sensing, navigation, and point solutions in chemistry and materials, with full‑blown fault‑tolerant systems further out.
Hybrid compute and NVIDIA's signal to the market
Chris points to Jensen Huang's GTC 2025 keynote slide on NVIDIA's hybrid quantum–GPU ecosystem, where Alumni Ventures portfolio companies such as Atom Computing, Classiq, and Rigetti appeared.
He notes that NVIDIA will not put “science projects” on that slide—those partnerships reflect a view that quantum processors will sit tightly coupled next to GPUs to handle specific workloads.
He also mentions a large commercial deal between NVIDIA and Groq (a classical AI chip company in his portfolio) as another sign of a more heterogeneous compute future that quantum will plug into.
Where near‑term quantum revenue shows up
Chris expects early commercial wins in sensing, GPS‑denied navigation, and other narrow but valuable applications before broad “quantum advantage” in general‑purpose computing.
Software and middleware players can generate revenue sooner by making today's hardware more stable, more efficient, or easier to program, and by integrating into classical and AI workflows.
He stresses that investors love clear revenue paths that fit into the 10‑year life of a typical venture fund.
University spin‑outs, clustering, and deal flow
Alumni Ventures certainly sees clustering around strong quantum schools like MIT, Harvard, and Yale, but Chris emphasizes that the “alumni angle” is secondary to the quality of the venture deal.
Mature tech‑transfer offices and standard Delaware C‑corps mean spinning out quantum IP from universities is now a well‑trodden path.
Chris leans heavily on network effects—Alumni Ventures' 800,000‑person network and 1,300‑company CEO base—as a key channel for discovering the most interesting quantum startups.
Managing risk in a 100‑hardware‑company world
With dozens of hardware approaches now in play, Chris uses Alumni Ventures' co‑investor model and lead‑investor diligence as a filter rather than picking purely on physics bets.
He looks for teams with credible near‑term commercial pathways and for mechanisms like sensing or middleware that can create value even if fault‑tolerant systems arrive later than hoped.
He compares quantum to past enabling waves like nanotech, where the biggest impact often shows up as incremental improvements rather than a single “big bang” moment.
Democratizing access to quantum venture
Alumni Ventures allows accredited investors to join its free syndicate, self‑attest accreditation, and then see deal materials—watermarked and under NDA—for individual investments, including quantum.
Chris encourages people to think in terms of diversified funds (20–30 deals per fund year) rather than only picking single names in what is a power‑law asset class.
He frames quantum as a long‑duration infrastructure play with near‑term pockets of usefulness, where venture can help investors participate in the upside without getting ahead of reality.
Welcome to Finance in Depth, produced by Xueqiu (Snowball), China's leading integrated wealth-management platform for investment discussion and trading, where smart investors gather. Today's piece is titled "VeriSilicon's conference call: a look at core competitiveness," from Weiqi Investment Research.

Discussions of high-tech stocks tend to go one of two ways: some investors stare at the losses on the financial statements and declare the company can barely survive, let alone be valued; others insist high tech is all about grand narratives and boundless horizons. So how should we actually look at it? Over the weekend, VeriSilicon (芯原股份) happened to hold a conference call. Many analysts and fund managers attended, and the company took it seriously: not just the board secretary or securities representative presenting, but the relevant core executives as well. Our team studied it and discussed it internally, and we are sharing the takeaways here.

The conclusion first: for high-tech stocks, and especially for companies that are not yet profitable, we may need to examine the industry's most fundamental logic: orders, the technology base, and position in the supply chain.

As usual, start with the company's 2025 earnings preannouncement. VeriSilicon's revenue grew a healthy 36% year over year, but it remained unprofitable, losing 450 million yuan for the year, versus a 600 million yuan loss in 2024. As of last Friday, its total market cap was close to 110 billion yuan. Why does a company with consecutive losses command such a high market cap? The market cap doesn't lie, but it does need to be read correctly.

First, we should break a common myth: judging a chip design company's value purely on current revenue or net profit. In the long-cycle semiconductor industry, revenue is often a lagging indicator, while orders are the leading indicator. The most striking figure in this preannouncement is not VeriSilicon's current revenue but the orders that point to its future. In 2025, VeriSilicon signed 5.96 billion yuan in new orders, up more than 100% year over year. Put that number under the microscope and its quality is remarkable. The growth did not come from red-ocean competition at the low end, such as the long-since-bloody markets for MCUs or low-end power management chips, but from riding the AI boom: of the nearly 6 billion yuan in new orders, AI-compute-related orders accounted for over 73%. That is an astonishing share. It means VeriSilicon has substantively completed a transfusion of its customer base, shifting from consumer-electronics-dominated to AI-compute-dominated. In the fourth quarter alone, new orders reached 2.71 billion yuan, up 70.2% quarter over quarter, showing accelerating momentum. The slope of this J-curve suggests that demand for high-performance compute chips has moved from proof-of-concept into a mass-production arms race.

The fourth quarter's orders also pushed the company's reservoir to a record: as of the end of 2025, VeriSilicon's backlog stood at 5.075 billion yuan, its ninth consecutive quarter at a high level. The quality of the backlog determines the certainty of future results, and crucially, these orders convert efficiently: based on industry norms and the company's disclosures, more than 80% is expected to convert into recognized revenue within a year. That gives VeriSilicon's 2026 growth a high degree of certainty.

Why can VeriSilicon take on orders at this scale? Credit its distinctive business model. Many people's understanding of the chip industry is still limited to the Intel-style IDM model (integrated design and manufacturing) or the fabless model (chip design and sales only). VeriSilicon represents a third evolutionary path: a design-lite model, known in the industry as chip-design-platform-as-a-service. For a more intuitive picture of this industrial division of labor, compare chip design to building a house. A traditional fabless company is a developer that buys the land, designs the building, and hires the construction crews itself. VeriSilicon's logic is very clear: it provides modular services and general contracting.

Concretely, VeriSilicon's business has two parts. The first is IP licensing, roughly a third of revenue. This is like supplying the kitchen, bathroom, and load-bearing-wall modules. In chip design, if every company wrote core functional blocks like GPUs, NPUs, and ISPs from scratch, it would be not only inefficient but also a patent minefield. VeriSilicon has a deep IP portfolio, and customers can buy these blocks directly, like prefabricated components. Gross margins here are extremely high, typically around 90%: classic technology-rent economics. The second part is one-stop chip customization, about two-thirds of revenue. This is like building the customer's house end to end, furnishings included. Margins are lower, around 20%, but the economies of scale are significant: the customer states its requirements, and VeriSilicon manages the whole flow from design through tape-out to packaging and test.

This model addresses the semiconductor industry's biggest pain point: as process nodes advance, chip R&D costs soar. In the 5nm and 3nm era, developing a single chip routinely costs hundreds of millions of dollars, typically 25% to 30% of revenue. For cloud providers like Microsoft, Google, and Amazon, and for emerging GPU companies, building an in-house design team of several thousand people is both costly and a major management risk. By providing services at scale, VeriSilicon spares these giants that cost and risk. This is why we call VeriSilicon the semiconductor industry's water seller: whatever happens to NVIDIA, AMD, or China's compute newcomers on the front lines, anyone who needs to build chips needs infrastructure builders like VeriSilicon. The essence of the model is a deepening industrial division of labor, a mark of a maturing semiconductor industry.

For a long time, the capital markets' biggest complaint about VeriSilicon was profitability, with heavy R&D spending the main drag. In 2025 that logic changed qualitatively. The company still lost 449 million yuan for the year, but the loss narrowed 25% year over year. The core reason is efficiency gained from scale. In 2025, R&D spending as a share of revenue fell nearly 11 percentage points from past highs, to 43%. One claim needs debunking here: that a falling R&D ratio means the company is cutting technology investment. Quite the opposite. This is not reduced R&D but a sharp rise in staff reuse; in industrial terms, declining marginal cost. VeriSilicon has a large R&D bench, 88% of whom hold master's degrees or above. When there were few projects, the cost of these well-paid engineers was spread over a limited number of projects, making each one very expensive. With orders doubling in 2025 and the project count surging, the bench built up over the years can now rotate smoothly across projects: an engineer who finishes the NPU block on project A can move straight to the same kind of block on project B. This reuse of knowledge and experience dramatically dilutes R&D costs. Revenue up 36% while the R&D cost ratio falls: that scissors gap means VeriSilicon is rapidly approaching its profitability threshold. It fits the growth curve of every platform company: heavy fixed-cost investment up front to build the infrastructure, and once past critical scale, profit pours out like water from an opened tap.

Of course, a business model only succeeds on a foundation of hard technology; without the diamond drill, you don't take on the porcelain job. In its volume-production business, VeriSilicon displays top-tier technical command. Advanced nodes (5nm and better) already account for 74% of revenue. That is a hardcore number: the more advanced the node, the harder the physics and the exponentially harder the design. Plenty of companies in China can design at 28nm; those that can design at 5nm can be counted on one hand. On the call, management described last year's landmark case: a 5nm automotive-grade large chip. Note the three keywords: 5nm, automotive-grade, and large. The die measures 400 square millimeters. In chip design, the larger the die, the harder yield control becomes, and the harder mask stitching and timing closure get. On top of that, automotive-grade parts face reliability and interference-immunity requirements far beyond consumer chips. VeriSilicon not only achieved first-silicon success, it went straight to volume production with no respin. In this industry, first-silicon success is the highest measure of design capability, because a 5nm tape-out costs tens of millions of dollars; failure means not just lost money but a missed market window. Few vendors besides Tesla have shown this kind of capability: Tesla's FSD chip is designed in-house, and VeriSilicon helped a customer reach a product of comparable class, with technical strength approaching NVIDIA's next generation. It shows VeriSilicon is not merely design-for-hire but an industry heavyweight with a genuine technology moat. That capability is its deepest defense.

Looking to 2026 and beyond, VeriSilicon's growth logic extends from cloud compute to edge AI. Today's AI boom centers on cloud training chips, the territory NVIDIA's GPUs rule. But by the logic of industry evolution, AI's final landing will be at the edge: phones, cars, glasses, and robots need local inference rather than constant dependence on the cloud. Beyond its solid data center business, VeriSilicon has planted deep roots in edge AI. Through its collaboration with Google on the open-source Coral NPU, it has entered the market for low-power wearables such as AI glasses and AR toys. This is a blue ocean with huge potential. New form factors like AI glasses are still on the eve of converting from one-time engineering fees to mass production, but VeriSilicon's accumulated low-power and graphics IP gives it an extremely favorable position in the ecosystem. Once the terminal form factor is settled, say when Apple or Meta ships a killer AR product, the whole supply chain will take off, and VeriSilicon's low-power NPU IP will become its second growth curve after compute chips. This is classic full-chain positioning: cloud high-performance compute in one hand, edge low-power inference in the other. However AI technology iterates and wherever the center of compute swings, VeriSilicon can find its place.

In sum, the VeriSilicon of 2025 is climbing off the bottom of a J-curve. The accumulated backlog, the continuing explosion in AI compute demand, and the marginal improvement in R&D efficiency together form a solid foundation for improving profitability. For investors, VeriSilicon is an excellent specimen to study. It shows that China's semiconductor rise is not only fabs like SMIC expanding capacity, nor only design giants like Huawei's HiSilicon breaking through, but also infrastructure builders like VeriSilicon providing design services and IP cores at the base of the stack. We don't need overheated emotion or illusory national pride. We need companies like VeriSilicon that grind through the 5nm design challenge and build the steel spine of China's semiconductor industry with one concrete order, one precise data point, one successful tape-out at a time. History has proven again and again that industrialization is a long marathon with no shortcuts. But with core technology mastered, the industrial chain connected, and efficient business models in place, China's manufacturing upgrade will be unstoppable no matter how the external environment shifts.

Finally, on the investment dimension, to borrow a line from one of our teammates: VeriSilicon has its own tape-out channels and its transformation has gone very well, but the valuation has run up beyond comprehension.
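To make the order math above concrete, here is a back-of-the-envelope sketch using only the figures quoted from the call; the variable names are ours for illustration, and the 80% conversion rate is the piece's stated industry norm, not company guidance.

```python
# Illustrative arithmetic from the figures quoted above (billions of CNY).
# The 80% one-year conversion rate is an industry norm cited in the piece, not guidance.

new_orders_2025 = 5.96        # total new orders signed in 2025
ai_share = 0.73               # AI-compute-related share of new orders
backlog_end_2025 = 5.075      # order backlog at end of 2025
conversion_within_1yr = 0.80  # share of backlog expected to become revenue in a year

ai_orders = new_orders_2025 * ai_share
visible_2026_revenue = backlog_end_2025 * conversion_within_1yr

print(f"AI-related new orders: ~{ai_orders:.2f}B CNY")                         # ~4.35B
print(f"Backlog likely converting in 2026: ~{visible_2026_revenue:.2f}B CNY")  # ~4.06B
```

That roughly four billion yuan of near-certain conversion is what the piece means when it says 2026 growth already carries a high degree of certainty.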
Gonka AI is a decentralized network that provides efficient AI computing power by leveraging global GPU resources for tasks like model training and inference. It challenges centralized providers like AWS and Google by using a novel "Proof of Work 2.0" mechanism, where nearly all compute goes to productive AI workloads rather than blockchain security.
Guest: David Liberman and Daniil Liberman - Co-founders
~This episode is sponsored by Gonka~
Website: https://gonka.ai/
X: https://x.com/gonka_ai
Discord: https://discord.gg/REcpeYc7P7
GitHub: https://github.com/gonka-ai/gonka/pulls
00:00 Intro
01:00 Gonka's mission
02:30 How can a platform use Gonka?
03:45 Network capacity surge
05:45 Scale growth in 18 months
08:45 Bitcoin of A.I.?
11:30 Why decentralization?
15:00 Security risks
18:30 Value for token holders
22:30 AI agents integration
25:30 Gonka use cases
28:00 Outro
#Crypto #AI #cryptocurrency
~Decentralized A.I. At Warp Speed
On this episode of The GAP Luke Lawrie and Joab Gilroy return for another year of talking about video games. The games they've been playing this week include Dispatch, Death Stranding 2, Hades II, Dying Light: The Beast, Nonolith, Hytale, Cubic Odyssey, The Séance of Blake Manor, Quarantine Zone, Patient Zero: Pandemic Simulator, and Magic: The Gathering – Lorwyn Eclipsed. Over in the news Meta lays off over 1,000 staff and shuts three VR studios, a new expansion is reportedly in development for The Witcher 3, GPU prices may spike due to hardware shortages, and The Division director Julian Gerighty departs Massive for DICE. This episode goes for 2 hours and 25 minutes, and it contains coarse language.
Timestamps –
00:00:00 – Start
00:10:09 – Magic: The Gathering – Lorwyn Eclipsed
00:15:29 – Patient Zero: Pandemic Simulator
00:22:20 – Quarantine Zone
00:29:46 – The Séance of Blake Manor
00:36:26 – Cubic Odyssey
00:40:44 – Hytale
00:44:39 – Nonolith
00:46:18 – Dying Light: The Beast
00:58:44 – Hades II
01:11:43 – Death Stranding 2
01:24:35 – Dispatch
01:37:39 – News
01:59:17 – Questions
02:14:39 – Weekly Plugs
02:19:36 – End of Show
Subscribe in a reader
iTunes / Spotify
Will AGI happen soon - or are we running into a wall?
In this episode, I'm joined by Tim Dettmers (Assistant Professor at CMU; Research Scientist at the Allen Institute for AI) and Dan Fu (Assistant Professor at UC San Diego; VP of Kernels at Together AI) to unpack two opposing frameworks from their essays: “Why AGI Will Not Happen” versus “Yes, AGI Will Happen.” Tim argues progress is constrained by physical realities like memory movement and the von Neumann bottleneck; Dan argues we're still leaving massive performance on the table through utilization, kernels, and systems—and that today's models are lagging indicators of the newest hardware and clusters.
Then we get practical: agents and the “software singularity.” Dan says agents have already crossed a threshold even for “final boss” work like writing GPU kernels. Tim's message is blunt: use agents or be left behind. Both emphasize that the leverage comes from how you use them—Dan compares it to managing interns: clear context, task decomposition, and domain judgment, not blind trust.
We close with what to watch in 2026: hardware diversification, the shift toward efficient, specialized small models, and architecture evolution beyond classic Transformers—including state-space approaches already showing up in real systems.
Sources:
Why AGI Will Not Happen - https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
Use Agents or Be Left Behind? A Personal Guide to Automating Your Own Work - https://timdettmers.com/2026/01/13/use-agents-or-be-left-behind/
Yes, AGI Can Happen - A Computational Perspective - https://danfu.org/notes/agi/
The Allen Institute for Artificial Intelligence
Website - https://allenai.org
X/Twitter - https://x.com/allen_ai
Together AI
Website - https://www.together.ai
X/Twitter - https://x.com/togethercompute
Tim Dettmers
Blog - https://timdettmers.com
LinkedIn - https://www.linkedin.com/in/timdettmers/
X/Twitter - https://x.com/Tim_Dettmers
Dan Fu
Blog - https://danfu.org
LinkedIn - https://www.linkedin.com/in/danfu09/
X/Twitter - https://x.com/realDanFu
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
Blog - https://mattturck.com
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) - Intro
(01:06) - Two essays, two frameworks on AGI
(01:34) - Tim's background: quantization, QLoRA, efficient deep learning
(02:25) - Dan's background: FlashAttention, kernels, alternative architectures
(03:38) - Defining AGI: what does it mean in practice?
(08:20) - Tim's case: computation is physical, diminishing returns, memory movement
(11:29) - “GPUs won't improve meaningfully”: the core claim and why
(16:16) - Dan's response: utilization headroom (MFU) + “models are lagging indicators”
(22:50) - Pre-training vs post-training (and why product feedback matters)
(25:30) - Convergence: usefulness + diffusion (where impact actually comes from)
(29:50) - Multi-hardware future: NVIDIA, AMD, TPUs, Cerebras, inference chips
(32:16) - Agents: did the “switch flip” yet?
(33:19) - Dan: agents crossed the threshold (kernels as the “final boss”)
(34:51) - Tim: “use agents or be left behind” + beyond coding
(36:58) - “90% of code and text should be written by agents” (how to do it responsibly)
(39:11) - Practical automation for non-coders: what to build and how to start
(43:52) - Dan: managing agents like junior teammates (tools, guardrails, leverage)
(48:14) - Education and training: learning in an agent world
(52:44) - What Tim is building next (open-source coding agent; private repo specialization)
(54:44) - What Dan is building next (inference efficiency, cost, performance)
(55:58) - Mega-kernels + Together Atlas (speculative decoding + adaptive speedups)
(58:19) - Predictions for 2026: small models, open-source, hardware, modalities
(1:02:02) - Beyond transformers: state-space and architecture diversity
(1:03:34) - Wrap
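Since "utilization headroom (MFU)" carries much of Dan's argument, here is a minimal sketch of the standard MFU (model FLOPs utilization) calculation: achieved training FLOPs divided by hardware peak. Every number below is an illustrative assumption, not a figure from the episode.

```python
# MFU sketch: achieved training FLOPs / hardware peak FLOPs.
# All numbers are illustrative assumptions, not episode figures.

def train_flops_per_token(n_params: float) -> float:
    # Common ~6N approximation for dense-transformer training FLOPs per token.
    return 6.0 * n_params

n_params = 7e9              # 7B-parameter model (assumed)
tokens_per_second = 5.0e4   # measured cluster throughput (assumed)
peak_flops = 8 * 989e12     # 8 GPUs at ~989 TFLOPS dense BF16 each (assumed spec)

mfu = train_flops_per_token(n_params) * tokens_per_second / peak_flops
print(f"MFU ≈ {mfu:.1%}")   # ~27% here: everything below 100% is Dan's "headroom"
```

Well-tuned large training runs are commonly reported in the 30-50% MFU range, which is why kernels and systems work can recover large speedups before any new hardware arrives.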
Episode 144
Happy New Year! This is one of my favorite episodes of the year — for the fourth time, Nathan Benaich and I did our yearly roundup of AI news and advancements, including selections from this year's State of AI Report.
If you've stuck around and continue to listen, I'm really thankful you're here. I love hearing from you.
You can find Nathan and Air Street Press here on Substack and on Twitter, LinkedIn, and his personal site. Check out his writing at press.airstreet.com.
Find me on Twitter (or LinkedIn if you want…) for updates on new episodes, and reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.
Outline
* (00:00) Intro
* (00:44) Air Street Capital and Nathan world
  * Nathan's path from cancer research and bioinformatics to AI investing
  * The “evergreen thesis” of AI from niche to ubiquitous
  * Portfolio highlights: Eleven Labs, Synthesia, Crusoe
* (03:44) Geographic flexibility: Europe vs. the US
  * Why SF isn't always the best place for original decisions
  * Industry diversity in New York vs. San Francisco
  * The Munich Security Conference and Europe's defense pivot
  * Playing macro games from a European vantage point
* (07:55) VC investment styles and the “solo GP” approach
  * Taste as the determinant of investments
  * SF as a momentum game with small information asymmetry
  * Portfolio diversity: defense (Delian), embodied AI (Syriact), protein engineering
  * Finding entrepreneurs who “can't do anything else”
* (10:44) State of AI progress in 2025
  * Momentous progress in writing, research, computer use, image, and video
  * We're in the “instruction manual” phase
  * The scale of investment: private markets, public markets, and nation states
* (13:21) Range of outcomes and what “going bad” looks like
  * Today's systems are genuinely useful—worst case is a valuation problem
  * Financialization of AI buildouts and GPUs
* (14:55) DeepSeek and China closing the capability gap
  * Seven-month lag analysis (Epoch AI)
  * Benchmark skepticism and consumer preferences (”Coca-Cola vs. Pepsi”)
  * Hedonic adaptation: humans reset expectations extremely quickly
  * Bifurcation of model companies toward specific product bets
* (18:29) Export controls and the “evolutionary pressure” argument
  * Selective pressure breeds innovation
  * Chinese companies rushing to public markets (Minimax, ZAI)
* (21:30) Reasoning models and test-time compute
  * Chain of thought faithfulness questions
  * Monitorability tax: does observability reduce quality?
  * User confusion about when models should “think”
  * AI for science: literature agents, hypothesis generation
* (23:53) Chain of thought interpretability and safety
  * Anthropomorphization concerns
  * Alignment faking and self-preservation behaviors
  * Cybersecurity as a bigger risk than existential risk
  * Models as payloads injected into critical systems
* (27:26) Commercial traction and AI adoption data
  * Ramp data: 44% of US businesses paying for AI (up from 5% in early 2023)
  * Average contract values up to $530K from $39K
  * State of AI survey: 92% report productivity gains
  * The “slow takeoff” consensus and human inertia
  * Use cases: meeting notes, content generation, brainstorming, coding, financial analysis
* (32:53) The industrial era of AI
  * Stargate and XAI data centers
  * Energy infrastructure: gas turbines and grid investment
  * Labs need to own models, data, compute, and power
  * Poolside's approach to owning infrastructure
* (35:40) Venture capital in the age of massive GPU capex
  * The GP lives in the present, the entrepreneur in the future, the LP in the past
  * Generality vs. specialism narratives
  * “Two or 20”: management fees vs. carried interest
  * Scaling funds to match entrepreneur ambitions
* (40:10) NVIDIA challengers and returns analysis
  * Chinese challengers: 6x return vs. 26x on NVIDIA
  * US challengers: 2x return vs. 12x on NVIDIA
  * Groq acquired for $20B; SambaNova markdown to $1.6B
  * “The tide is lifting all boats”—demand exceeds supply
* (44:06) The hardware lottery and architecture convergence
  * Transformer dominance and custom ASICs making a comeback
  * NVIDIA still 90–95% of published AI research
* (45:49) AI regulation: Trump agenda and the EU AI Act
  * Domain-specific regulators vs. blanket AI policy
  * State-level experimentation creates stochasticity
  * EU AI Act: “born before GPT-4, takes effect in a world shaped by GPT-7”
  * Only three EU member states compliant by late 2025
* (50:14) Sovereign AI: what it really means
  * True sovereignty requires energy, compute, data, talent, chip design, and manufacturing
  * The US is sovereign; the UK by itself is not
  * Form alliances or become world-class at one level of the stack
  * ASML and the Netherlands as an example
* (52:33) Open weight safety and containment
  * Three paths: model-based safeguards, scaffolding/ecosystem, procedural/governance
  * “Pandora's box is open”—containment on distribution, not weights
  * Leak risk: the most vulnerable link is often human
  * Developer–policymaker communication and regulator upskilling
* (55:43) China's AI safety approach
  * Matt Sheehan's work on Chinese AI regulation
  * Safety summits and China's participation
  * New Chinese policies: minor modes, mental health intervention, data governance
  * UK's rebrand from “safety” to “security” institutes
* (58:34) Prior predictions and patterns
  * Hits on regulatory/political areas; misses on semiconductor consolidation, AI video games
* (59:43) 2026 Predictions
  * A Chinese lab overtaking US on frontier (likely ZAI or DeepSeek, on scientific reasoning)
  * Data center NIMBYism influencing midterm politics
* (01:01:01) Closing
Links and Resources
Nathan / Air Street Capital
* Air Street Capital
* State of AI Report 2025
* Air Street Press — essays, analysis, and the Guide to AI newsletter
* Nathan on Substack
* Nathan on Twitter/X
* Nathan on LinkedIn
From Air Street Press (mentioned in episode)
* Is the EU AI Act Actually Useful? — by Max Cutler and Nathan Benaich
* China Has No Place at the UK AI Safety Summit (2023) — by Alex Chalmers and Nathan Benaich
Research & Analysis
* Epoch AI: Chinese AI Models Lag US by 7 Months — the analysis referenced on the US-China capability gap
* Sara Hooker: The Hardware Lottery — the essay on how hardware determines which research ideas succeed
* Matt Sheehan: China's AI Regulations and How They Get Made — Carnegie Endowment
Companies Mentioned
* Eleven Labs — AI voice synthesis (Air Street portfolio)
* Synthesia — AI video generation (Air Street portfolio)
* Crusoe — clean compute infrastructure (Air Street portfolio)
* Poolside — AI for code (Air Street portfolio)
* DeepSeek — Chinese AI lab
* Minimax — Chinese AI company
* ASML — semiconductor equipment
Other Resources
* Search Engine Podcast: Data Centers (Part 1 & 2) — PJ Vogt's two-part series on XAI data centers and the AI financing boom
* RAAIS Foundation — Nathan's AI research and education charity
Get full access to The Gradient at thegradientpub.substack.com/subscribe
This week, the hosts go deep on out-of-band updates, unwanted "innovations," and the uneasy cost of tech's latest gold rush. Plus, securing a Microsoft account is not as hard as some think, and neither are passkeys once you get past the jargon. And for developers, AI Dev Gallery offers a fascinating glimpse at what you can do for free with AI used against a CPU, GPU, or NPU.
Windows 11
Microsoft issues an emergency fix for a borked Windows Update. Right. A fix for a fix.
Hell freezes over, if only slightly: Microsoft quietly made some positive changes to forced OneDrive Folder Backup. Don't worry, it's still forced (and appears to be opt-in, but isn't). But you can back out more elegantly. So it's opt-out, not opt-in, but a step forward. Plus, a new behavior
Windows 11 on Arm PCs can now download games from the Xbox app (previously only through the Insider program)
Over 85 percent of Xbox games on PC work in WOA now
Prism emulator now supports AVX and AVX2 and Epic Anti-Cheat, and there is a new Windows Performance Fit feature offering guidance on which titles should play well.
Beta: New 25H2 build with account dialog modernization, Click to Do and desktop background improvements. Not for Dev, suggesting it's about to move to 26H1
Notepad and Paint get more features yet again. Notably, these updates are for Dev and Canary only, suggesting these might be 26Hx features (then again, versions don't matter, right?)
AI
Just say no: To AI, to Copilot, and to Satya Nadella
Our national nightmare is over: You can now (easily) hide Copilot in Microsoft Edge
ChatGPT Go is now available worldwide, ads are on the way because of course
Wikipedia partners with Amazon, Meta, Microsoft, more on AI
Xbox & gaming
January Xbox Update brings Game Sync Indicator, more
Solid second half of January for Xbox Game Pass
Microsoft will likely introduce a free, ad-supported Xbox Cloud Gaming tier because of course
Tips & picks
Tip of the week: Secure your Microsoft account
App pick of the week: AI Dev Gallery
RunAs Radio this week: Ideation to Implementation with Amber Vandenburg
Liquor pick of the week: Estancia Raicilla
Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell
Download or subscribe to Windows Weekly at https://twit.tv/shows/windows-weekly
Check out Paul's blog at thurrott.com
The Windows Weekly theme music is courtesy of Carl Franklin.
Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free audio and video feeds, a members-only Discord, and exclusive content. Join today: https://twit.tv/clubtwit
In this episode of Lead-Lag Live, Melanie Schafer sits down with Michael Mo, CEO of KULR Technology Group (NYSE: KULR), to explore why energy reliability is emerging as the critical constraint behind AI, robotics, drones, telecom infrastructure, and next-generation data centers.
Fresh off CES and following KULR's newly announced $30M telecom battery supply agreement, Mo explains how high-power, high-safety battery systems are becoming mission-critical as electrification accelerates. From NASA-proven thermal technologies to lithium-ion replacements for legacy lead-acid systems, KULR is positioning itself at the center of multiple multi-year secular growth trends.
The conversation covers AI data center power resilience, UAV and drone electrification, telecom backup systems, and why battery safety, reliability, and domestic supply chains matter more than ever as power demand explodes.
In this episode:
– Why power—not chips—may be the next AI bottleneck
– KULR's NASA-derived battery safety and thermal technologies
– The Cooler One platform and growth across drones, robotics, and aviation
– Replacing lead-acid batteries in telecom with lithium-based solutions
– Energy-as-a-Service and reducing total cost of ownership
– AI data center battery buffers and GPU-level power protection
– Scaling execution with a debt-free balance sheet and strong cash position
Lead-Lag Live brings you inside conversations with the leaders shaping markets at the intersection of technology, energy, and investing. Subscribe for insights that cut through the noise.
#AIInfrastructure #EnergyStorage #BatteryTechnology #Drones #Telecom #DataCenters #Electrification #KULR #MarketOutlook #CleanEnergy #Investing
Start your adventure with TableTalk Friday: A D&D Podcast at the link below or wherever you get your podcasts!
Youtube: https://youtube.com/playlist?list=PLgB6B-mAeWlPM9KzGJ2O4cU0-m5lO0lkr&si=W_-jLsiREjyAIgEs
Spotify: https://open.spotify.com/show/75YJ921WGQqUtwxRT71UQB?si=4R6kaAYOTtO2V
Support the show
As constraints on energy, water, and permitting collide with exploding demand for AI and compute, a once-fringe idea is moving rapidly toward the center of the conversation: putting data centers in space. Starcloud believes orbital infrastructure isn't science fiction—it's a necessary extension of the global compute stack if scaling is going to continue at anything close to its current pace.
Founded by Philip Johnston, Starcloud is building space-based compute systems designed to compete on cost, performance, and scale with terrestrial data centers. The company has already flown a data center–grade GPU in orbit and is now working toward larger, commercially viable systems that could reshape where and how AI is powered.
We discuss:
How energy and permitting constraints are reshaping the future of compute
Why space-based data centers may be economically inevitable, not optional
What Starcloud proved by running an H100 GPU in orbit
How launch costs, watts-per-kilogram, and chip longevity define the real economics
The national security implications of who controls future compute capacity
• Chapters •
00:00 - Intro
00:50 - The issue with data centers
02:20 - Explosion of the data center debates
04:58 - Philip's 5GW data center rendering and early conceptions of data centers in space at YC
08:16 - Proving people wrong
11:17 - The team at Starcloud today
12:29 - Competing against SpaceX's data center
14:42 - Sam Altman's beef with Starlink
16:52 - Economics of Orbital vs Terrestrial Data Centers by Andrew McCallip
21:33 - Where are we putting these things?
23:50 - Latency in space
25:59 - Political side of building data centers
28:36 - Starcloud 1
30:16 - Space based processors
30:51 - Shakespeare in space
32:00 - Hardening an Nvidia H100 against radiation and making chips in space economical
34:43 - Cooling systems in space
36:01 - How Starcloud is thinking about replacing failed GPUs
38:46 - The mission for Starcloud 2
40:05 - Competitors outside of SpaceX
40:49 - Getting to economical launch costs
44:35 - Will the next great wars be over water and power for data centers?
46:25 - What keeps Philip up at night?
47:11 - What keeps Mo up at night?
• Show notes •
Starcloud's website — https://www.starcloud.com/
Philip's socials — https://x.com/PhilipJohnston
Mo's socials — https://x.com/itsmoislam
Payload's socials — https://twitter.com/payloadspace / https://www.linkedin.com/company/payloadspace
Ignition's socials — https://twitter.com/ignitionnuclear / https://www.linkedin.com/company/ignition-nuclear/
Tectonic's socials — https://twitter.com/tectonicdefense / https://www.linkedin.com/company/tectonicdefense/
Valley of Depth archive — Listen: https://pod.payloadspace.com/
• About us •
Valley of Depth is a podcast about the technologies that matter — and the people building them. Brought to you by Arkaea Media, the team behind Payload (space), Ignition (nuclear energy), and Tectonic (defense tech), this show goes beyond headlines and hype. We talk to founders, investors, government officials, and military leaders shaping the future of national security and deep tech. From breakthrough science to strategic policy, we dive into the high-stakes decisions behind the world's hardest technologies.
Payload: www.payloadspace.com
Tectonic: www.tectonicdefense.com
Ignition: www.ignition-news.com
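As a rough illustration of the "launch costs, watts-per-kilogram, and chip longevity" framing, here is a toy amortization sketch. Every input is an assumption we picked for illustration, not a Starcloud or episode figure.

```python
# Toy economics: amortize launch cost into $/kWh of compute power delivered on orbit.
# On orbit, solar power is effectively free but launch mass is not, so launch
# amortization plays the role that electricity prices play on Earth.
# All inputs are illustrative assumptions, not Starcloud figures.

launch_cost_per_kg = 300.0   # $/kg to LEO (assumed future heavy-lift pricing)
watts_per_kg = 1000.0        # W of IT load per kg launched (assumed system design)
lifetime_years = 5.0         # useful on-orbit system lifetime (assumed)

hours = lifetime_years * 365 * 24
kwh_per_kg = (watts_per_kg / 1000.0) * hours
launch_cost_per_kwh = launch_cost_per_kg / kwh_per_kg

print(f"Energy served per kg over life: {kwh_per_kg:,.0f} kWh")  # 43,800 kWh
print(f"Launch capex per kWh: ${launch_cost_per_kwh:.4f}")       # ~$0.007
```

Under these optimistic inputs, launch amortizes to well under a cent per kWh, which is the heart of the "economically inevitable" case; plug in today's launch prices and lower watts-per-kilogram and the conclusion flips, which is exactly the debate in the episode.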
AI data centers are no longer just buildings full of racks. They are tightly coupled systems where power, cooling, IT, and operations all depend on each other, and where bad assumptions get expensive fast. On the latest episode of The Data Center Frontier Show, Editor-in-Chief Matt Vincent talks with Sherman Ikemoto of Cadence about what it now takes to design an “AI factory” that actually works. Ikemoto explains that data center design has always been fragmented. Servers, cooling, and power are designed by different suppliers, and only at the end does the operator try to integrate everything into one system. That final integration phase has long relied on basic tools and rules of thumb, which is risky in today's GPU-dense world. Cadence is addressing this with what it calls “DC elements”: digitally validated building blocks that represent real systems, such as NVIDIA's DGX SuperPOD with GB200 GPUs. These are not just drawings; they model how systems really behave in terms of power, heat, airflow, and liquid cooling. Operators can assemble these elements in a digital twin and see how an AI factory will actually perform before it is built. A key shift is designing directly to service-level agreements. Traditional uncertainty forced engineers to add large safety margins, driving up cost and wasting power. With more accurate simulation, designers can shrink those margins while still hitting uptime and performance targets, critical as rack densities move from 10–20 kW to 50–100 kW and beyond. Cadence validates its digital elements using a star system. The highest level, five stars, requires deep validation and supplier sign-off. The GB200 DGX SuperPOD model reached that level through close collaboration with NVIDIA. Ikemoto says the biggest bottleneck in AI data center buildouts is not just utilities or equipment; it is knowledge. The industry is moving too fast for old design habits. Physical prototyping is slow and expensive, so virtual prototyping through simulation is becoming essential, much like in aerospace and automotive design. Cadence's Reality Digital Twin platform uses a custom CFD engine built specifically for data centers, capable of modeling both air and liquid cooling and how they interact. It supports “extreme co-design,” where power, cooling, IT layout, and operations are designed together rather than in silos. Integration with NVIDIA Omniverse is aimed at letting multiple design tools share data and catch conflicts early. Digital twins also extend beyond commissioning. Many operators now use them in live operations, connected to monitoring systems. They test upgrades, maintenance, and layout changes in the twin before touching the real facility. Over time, the digital twin becomes the operating platform for the data center. Running real AI and machine-learning workloads through these models reveals surprises. Some applications create short, sharp power spikes in specific areas. To be safe, facilities often over-provision power by 20–30%, leaving valuable capacity unused most of the time. By linking application behavior to hardware and facility power systems, simulation can reduce that waste, crucial in an era where power is the main bottleneck. The episode also looks at Cadence's new billion-cycle power analysis tools, which allow massive chip designs to be profiled with near-real accuracy, feeding better system- and facility-level models. Cadence and NVIDIA have worked together for decades at the chip level. 
Now that collaboration has expanded to servers, racks, and entire AI factories. As Ikemoto puts it, the data center is the ultimate system—where everything finally comes together—and it now needs to be designed with the same rigor as the silicon inside it.
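To make the over-provisioning point concrete, here is a minimal sketch of the stranded-capacity arithmetic the episode describes; the inputs are illustrative assumptions, apart from the 20–30% margin range the episode cites.

```python
# Stranded-capacity sketch: headroom provisioned for short power spikes sits idle
# almost all the time. Inputs are illustrative; the 25% margin reflects the
# 20-30% range cited in the episode, not a Cadence or NVIDIA figure.

steady_it_load_mw = 100.0   # sustained IT load (assumed)
spike_margin = 0.25         # over-provisioning for workload power spikes
spike_duty_cycle = 0.02     # fraction of time spikes actually occur (assumed)

provisioned_mw = steady_it_load_mw * (1 + spike_margin)
idle_headroom_mw = provisioned_mw - steady_it_load_mw
avg_headroom_used_mw = idle_headroom_mw * spike_duty_cycle

print(f"Provisioned: {provisioned_mw:.0f} MW for a {steady_it_load_mw:.0f} MW steady load")
print(f"Headroom idle outside spikes: {idle_headroom_mw:.0f} MW")
print(f"Average headroom actually used: {avg_headroom_used_mw:.1f} MW")
```

The simulation argument is that if per-application spike behavior can be bounded in the digital twin, the margin can shrink and most of that idle 25 MW becomes sellable capacity.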
Wilder Lopes is the CEO and Founder of Ogre.run, working on AI-driven dependency resolution and reproducible code execution across environments.
How Universal Resource Management Transforms AI Infrastructure Economics // MLOps Podcast #357 with Wilder Lopes, CEO / Founder of Ogre.run
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Abstract
Enterprise organizations face a critical paradox in AI deployment: while 52% struggle to access needed GPU resources with 6-12 month waitlists, 83% of existing CPU capacity sits idle. This talk introduces an approach to AI infrastructure optimization through universal resource management that reshapes applications to run efficiently on any available hardware—CPUs, GPUs, or accelerators.
We explore how code reshaping technology can unlock the untapped potential of enterprise computing infrastructure, enabling organizations to serve 2-3x more workloads while dramatically reducing dependency on scarce GPU resources. The presentation demonstrates why CPUs often outperform GPUs for memory-intensive AI workloads, offering superior cost-effectiveness and immediate availability without architectural complexity.
// Bio
Wilder Lopes is a second-time founder, developer, and research engineer focused on building practical infrastructure for developers. He is currently building Ogre.run, an AI agent designed to solve code reproducibility.
Ogre enables developers to package source code into fully reproducible environments in seconds. Unlike traditional tools that require extensive manual setup, Ogre uses AI to analyze codebases and automatically generate the artifacts needed to make code run reliably on any machine. The result is faster development workflows and applications that work out of the box, anywhere.
// Related Links
Website: https://ogre.run
https://lopes.ai
https://substack.com/@wilderlopes
https://youtu.be/YCWkUub5x8c?si=7RPKqRhu0Uf9LTql
~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community: https://go.mlops.community/slack
Follow us on X/Twitter @mlopscommunity (https://x.com/mlopscommunity) or LinkedIn (https://go.mlops.community/linkedin)
Sign up for the next meetup: https://go.mlops.community/register
MLOps Swag/Merch: https://shop.mlops.community/
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Wilder on LinkedIn: /wilderlopes/
Timestamps:
[00:00] Secondhand Data Centers Challenges
[00:27] AI Hardware Optimization Debate
[03:40] LLMs on Older Hardware
[07:15] CXL Tradeoffs
[12:04] LLM on CPU Constraints
[17:07] Leveraging Existing Hardware
[22:31] Inference Chips Overview
[27:57] Fundamental Innovation in AI
[30:22] GPU CPU Combinations
[40:19] AI Hardware Challenges
[43:21] AI Perception Divide
[47:25] Wrap up
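The claim that CPUs can hold their own on memory-intensive AI work is easiest to see with a roofline-style estimate: single-stream LLM decoding reads roughly all model weights per generated token, so it is bounded by memory bandwidth rather than FLOPs. Below is a minimal sketch; the hardware and model numbers are generic assumptions, not figures from the episode or Ogre.run.

```python
# Bandwidth-bound decode ceiling: tokens/s ≈ memory bandwidth / bytes read per token,
# where one decoded token reads ~all model weights. Numbers are generic assumptions.

def decode_tokens_per_sec(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    model_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

params_b = 8.0         # 8B-parameter model (assumed)
bytes_per_param = 1.0  # 8-bit quantized weights (assumed)

for name, bw_gb_s in [("dual-socket server CPU", 600.0), ("mid-range GPU", 1000.0)]:
    tps = decode_tokens_per_sec(bw_gb_s, params_b, bytes_per_param)
    print(f"{name}: ~{tps:.0f} tokens/s upper bound")
```

On these assumptions the CPU ceiling (~75 tokens/s) is within 2x of the GPU's (~125 tokens/s), a far smaller gap than raw FLOPs comparisons suggest, which is the economic opening the talk describes.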
It is only mid-January 2026, and the world has already reached peak "end-times" energy. In this episode of the LTS podcast, Marcos, Mr. G, and Josue attempt to find the humor in our collective digital collapse as they navigate the total wreckage of the GPU market and the AI-driven "hardware apocalypse." Between dodging cosmic glitches and whatever military-grade "extraterrestrials" are currently haunting the skies, the trio dives into the soul-deep debate of anime subbing versus dubbing—passionately arguing that Japanese voice acting in Demon Slayer is the only thing still providing pure emotional stability in this cursed timeline. If the first two weeks of the year are already this unhinged, at least we'll have high-quality animation to watch while the RAM prices send us into early retirement.
#AnimeRecommendations #Gaming #TechIndustry
Mr. G Social Media: https://www.instagram.com/ameliorationautos?igsh=eTIwOGhxazA4ZGJs
IG: https://bit.ly/IG-LTS
LTS on X: https://bit.ly/LTSTweets
Buy Me Coffee: https://www.buymeacoffee.com/LTS2020
In this episode of The Circuit, Ben Bajarin and Jay Goldberg discuss the launch of Ben's new publication, "The Diligent Stack." The duo then performs a deep dive into TSMC's recent earnings, analyzing the risks of semiconductor cyclicality, the massive CapEx requirements for the future, and the specific bottlenecks in advanced packaging (CoWoS). Later, they shift focus to OpenAI's partnership with Cerebras and the introduction of ads to fund massive compute needs. Finally, they break down the latest data on GPU pricing, highlighting the significant premiums hyperscalers charge compared to NeoClouds and the difficulty of tracking pricing for Nvidia's new Grace Blackwell chips.
What happens when the AI race stops being about size and starts being about sense? In this episode of Tech Talks Daily, I sit down with Wade Myers from MythWorx, a company operating quietly while questioning some of the loudest assumptions in artificial intelligence right now. We recorded this conversation during the noise of CES week, when headlines were full of bigger models, more parameters, and ever-growing GPU demand. But instead of chasing scale, this discussion goes in the opposite direction and asks whether brute force intelligence is already running out of road. Wade brings a perspective shaped by years as both a founder and investor, and he explains why today's large language models are starting to collide with real-world limits around power, cost, latency, and sustainability. We talk openly about the hidden tax of GPUs, how adding more compute often feels like piling complexity onto already fragile systems, and why that approach looks increasingly shaky for enterprises dealing with technical debt, energy constraints, and long deployment cycles. What makes this conversation especially interesting is MythWorx's belief that the next phase of AI will look less like prediction engines and more like reasoning systems. Wade walks through how their architecture is modeled closer to human learning, where intelligence is learned once and applied many times, rather than dragging around the full weight of the internet to answer every question. We explore why deterministic answers, audit trails, and explainability matter far more in areas like finance, law, medicine, and defense than clever-sounding responses. There is also a grounded enterprise angle here. We talk about why so many organizations feel uneasy about sending proprietary data into public AI clouds, how private AI deployments are becoming a board-level concern, and why most companies cannot justify building GPU-heavy data centers just to experiment. Wade draws parallels to the early internet and smartphone app eras, reminding us that the playful phase often comes before the practical one, and that disappointment is often a signal of maturation, not failure. We finish by looking ahead. Edge AI, small-footprint models, and architectures that reward efficiency over excess are all on the horizon, and Wade shares what MythWorx is building next, from faster model training to offline AI that can run on devices without constant connectivity. It is a conversation about restraint, reasoning, and realism at a time when hype often crowds out reflection. So if bigger models are no longer the finish line, what should business and technology leaders actually be paying attention to next, and are we ready to rethink what intelligence really means? Useful Links Connect with Wade Myers Learn More About MythWorx Thanks to our sponsors, Alcor, for supporting the show.
We're just out of the recent earnings season and we've seen a wild range of results and some interesting implications. Melissa Otto, CFA, head of S&P Global's Visible Alpha research team, returns to discuss what the markets have been saying and what she makes of the data with host Eric Hanselman. Macroeconomic effects are having some impact, as consumer sentiment diverges across the top and the bottom of the economy. In technology, there are mixed feelings about AI as the hunt continues for use cases with decisive revenue returns. The hyperscalers are continuing to invest capital at staggering rates and, so far, the markets have mostly approved. AI supply chain companies, like NVIDIA, are generally moving forward with solid results. The larger question is where the AI boom is headed. There are constraints not only in supply chains for data centers, but also in energy supply. Agentic AI has a lot of promise, but needs to prove out its value and earn trust, as providers look to improve efficiency with more targeted silicon, like ASICs, to stand up alongside the forests of GPUs being deployed. As investors hunt for improved returns, they may be rotating to international opportunities and small cap companies that might be able to see faster returns from AI deployments. More S&P Global Content: Next in Tech podcast: Agentic Customer Experience Nvidia GTC in DC Blackwell expectations increase Otto: Markets are grappling with how to price AI-related stocks Next in Tech podcast, Episode 239: AI Infrastructure For S&P Global Subscribers: A view of peaks and plateaus AI to lead tech spending in 2026, but orgs losing track of energy efficiency – Highlights from Macroeconomic Outlook, SME Tech Trends Hyperscaler earnings quarterly: Alphabet, Amazon and Microsoft charge ahead on AI capacity buildouts Agents are already driving workplace impact and agentic AI adoption – Highlights from VotE: AI & Machine Learning Big Picture 2026 AI Outlook: Unleashing agentic potential Credits: Host/Author: Eric Hanselman Guest: Melissa Otto, CFA Producer/Editor: Feranmi Adeoshun Published With Assistance From: Sophie Carr, Kyra Smith
Discover how Cerebras is challenging NVIDIA with a fundamentally different approach to AI hardware and large-scale inference. In this episode of Startup Project, Nataraj sits down with Andrew Feldman, co-founder and CEO of Cerebras Systems, to discuss how the company built a wafer-scale AI chip from first principles. Andrew shares the origin story of Cerebras, why they chose to rethink chip architecture entirely, and how system-level design decisions unlock new performance for modern AI workloads.
The conversation explores:
Why inference is becoming the dominant cost and performance bottleneck in AI
How Cerebras' wafer-scale architecture overcomes GPU memory and communication limits
What it takes to compete with incumbents like NVIDIA and AMD as a new chip company
The tradeoffs between training and inference at scale
Cerebras' product strategy across systems, cloud offerings, and enterprise deployments
This episode is a deep dive into AI infrastructure, semiconductor architecture, and system-level design, and is especially relevant for builders, engineers, and leaders thinking about the future of AI compute.
Philip Johnston is co-founder and CEO of Starcloud, a company building data centers in space to solve AI's power crisis. Starcloud has already launched the first NVIDIA H100 GPU into orbit and is partnering with cloud providers like Crusoe to scale orbital computing infrastructure.
As AI demand accelerates, data centers are running into a new bottleneck: access to reliable, affordable power. Grid congestion, interconnection delays, and cooling requirements are slowing the deployment of new AI data centers, even as compute demand continues to surge. Traditional data centers face 5-10 year lead times for new power projects due to permitting, interconnection queues, and grid capacity constraints.
In this episode, Philip explains why Starcloud is building data centers in orbit, where continuous solar power is available and heat can be rejected directly into the vacuum of space. He walks through Starcloud's first on-orbit GPU deployment, the realities of cooling and radiation in space, and how orbital data centers could relieve pressure on terrestrial power systems as AI infrastructure scales.
Episode recorded on Dec 11, 2025 (Published on Jan 13, 2026)
In this episode, we cover:
[04:59] What Starcloud's orbital data centers look like (and how they differ from terrestrial facilities)
[06:37] How SpaceX Starship's reusable launch vehicles change space economics
[10:45] The $500/kg breakeven point for space-based solar vs. Earth
[14:15] Why space solar panels produce 8x more energy than ground-based arrays
[21:19] Thermal management: Cooling NVIDIA GPUs in a vacuum using radiators
[25:57] Edge computing in orbit: Real-time inference on satellite imagery
[29:22] The Crusoe partnership: Selling power-as-a-service in space
[31:21] Starcloud's business model: Power, cooling, and connectivity
[34:18] Addressing critics: What could prevent orbital data centers from working
Key Takeaways:
Starcloud launched the first NVIDIA H100 GPU into orbit in November 2025
Space solar produces 8x more energy per square meter than terrestrial solar (a back-of-envelope check follows after these notes)
Breakeven launch cost for orbital data centers: $500/kg
Current customers: DOD and commercial Earth observation satellites needing real-time inference
Target: 10 gigawatts of orbital computing capacity by early 2030s
Enjoyed this episode? Please leave us a review! Share feedback or suggest future topics and guests at info@mcj.vc.
Connect with MCJ:
Cody Simms on LinkedIn
Visit mcj.vc
Subscribe to the MCJ Newsletter
*Editing and post-production work for this episode was provided by The Podcast Consultant
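As a quick sanity check on that 8x figure: in orbit a panel sees full, unattenuated sunlight essentially around the clock, while a ground array loses output to night, weather, atmosphere, and sun angle. A back-of-envelope sketch with generic assumed values, not Starcloud's numbers:

```python
# Back-of-envelope check of the "8x" space-vs-ground solar claim.
# All inputs are illustrative assumptions, not figures from the episode.

SOLAR_CONSTANT = 1361          # W/m^2 above the atmosphere
GROUND_PEAK = 1000             # W/m^2 at noon, clear sky, sea level
GROUND_CAPACITY_FACTOR = 0.18  # losses from night, weather, sun angle

# A dawn-dusk sun-synchronous orbit can see the sun ~24 hours a day.
space_wh = SOLAR_CONSTANT * 24
ground_wh = GROUND_PEAK * 24 * GROUND_CAPACITY_FACTOR

print(f"space:  {space_wh:,.0f} Wh/m^2/day")
print(f"ground: {ground_wh:,.0f} Wh/m^2/day")
print(f"ratio:  {space_wh / ground_wh:.1f}x")   # ~7.6x with these inputs
```

A slightly cloudier site or a lower panel tilt pushes the ground-side capacity factor down and the ratio toward the 8x quoted in the episode.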
An airhacks.fm conversation with Thomas Wuerthinger (@thomaswue) about: clarification of GraalVM release cadence changes and decoupling from openJDK releases, GraalVM focusing on LTS Java releases only (skipping non-LTS like Java 26), GraalVM as a multi-vendor polyglot project with community edition and third-party vendors like Red Hat BellSoft and microdoc, increased focus on python support due to AI popularity, GraalVM team alignment with Oracle Database organization, Oracle Multilingual Engine (MLE) for running JavaScript and Python in Oracle Database, MySQL MLE integration, native image support for stored procedures in Oracle Database, shipping lambda functions from client applications to database for temporary execution, treating Oracle Database as an operating system for running business logic, serverless workloads directly in Oracle Database, application snapshotting similar to CRaC but running in user space without kernel privileges, efficient scale-to-zero capabilities with native images, Oracle REST Data Services service generalization for serverless execution platform, database triggers for workflow systems and application wake-up, durable functions with transactional state storage in Oracle Database, comparison to AS400 architecture with transaction manager database and operating system in same memory, memory price increases making GraalVM native image more attractive, lower memory consumption benefits of native image beyond just startup time, CPU-based inference support with SIMD and Vector API, TornadoVM for GPU-based inference built on Graal compiler, WebAssembly compilation target for native images, edge function deployment with WebAssembly, Intel memory protection keys for sandboxed native image execution, native image layers for shared base libraries similar to docker layers, profile-guided optimizations for size reduction, upx binary compression for 3x size reduction, memory savings from eliminated class metadata and profiling data not garbage collector differences, 32-bit object headers in serial GC smaller than HotSpot, polyglot integration allowing Python and JavaScript embedding in Java applications, Micronaut framework compile-time annotation processing, quarkus framework best alignment with native image for smallest binaries, GraalVM roadmap focused on database synergies and serverless innovation Thomas Wuerthinger on twitter: @thomaswue
In this episode of the Crazy Wisdom podcast, host Stewart Alsop sits down with Peter Schmidt Nielsen, who is building FPGA-accelerated servers at Saturn Data. The conversation explores why servers need FPGAs, how these field-programmable gate arrays work as "IO expanders" for massive memory bandwidth, and why they're particularly well-suited for vector database and search applications. Peter breaks down the technical realities of FPGAs - including why they "really suck" in many ways compared to GPUs and CPUs - while explaining how his company is leveraging them to provide terabyte-per-second bandwidth to 1.3 petabytes of flash storage. The discussion ranges from distributed systems challenges and the CAP theorem to the hardware-software relationship in modern computing, offering insights into both the philosophical aspects of search technology and the nuts-and-bolts engineering of memory controllers and routing fabrics.
For more information about Peter's work, you can reach him on Twitter at @PTRSCHMDTNLSN or find his website at saturndata.com.
Timestamps
00:00 Introduction to FPGAs and Their Role in Servers
02:47 Understanding FPGA Limitations and Use Cases
05:55 Exploring Different Types of Servers
08:47 The Importance of Memory and Bandwidth
11:52 Philosophical Insights on Search and Access Patterns
14:50 The Relationship Between Hardware and Search Queries
17:45 Challenges of Distributed Systems
20:47 The CAP Theorem and Its Implications
23:52 The Evolution of Technology and Knowledge Management
26:59 FPGAs as IO Expanders
29:35 The Trade-offs of FPGAs vs. ASICs and GPUs
32:55 The Future of AI Applications with FPGAs
35:51 Exciting Developments in Hardware and Business
Key Insights
1. FPGAs are fundamentally "crappy ASICs" with serious limitations - Despite being programmable hardware, FPGAs perform far worse than general-purpose alternatives in most cases. A $100,000 high-end FPGA might only match the memory bandwidth of a $600 gaming GPU. They're only valuable for specific niches like ultra-low latency applications or scenarios requiring massive parallel I/O operations, making them unsuitable for most computational workloads where CPUs and GPUs excel.
2. The real value of FPGAs lies in I/O expansion, not computation - Rather than using FPGAs for their processing power, Saturn Data leverages them primarily as cost-effective ways to access massive amounts of DRAM controllers and NVMe interfaces. Their server design puts 200 FPGAs in a 2U enclosure with 1.3 petabytes of flash storage and terabyte-per-second read bandwidth, essentially using FPGAs as sophisticated I/O expanders.
3. Access patterns determine hardware performance more than raw specs - The way applications access data fundamentally determines whether specialized hardware will provide benefits. Applications that do sparse reads across massive datasets (like vector databases) benefit from Saturn Data's architecture, while those requiring dense computation or frequent inter-node communication are better served by traditional hardware. Understanding these patterns is crucial for matching workloads to appropriate hardware.
4. Distributed systems complexity stems from failure tolerance requirements - The difficulty of distributed systems isn't inherent but depends on what failures you need to tolerate. Simple approaches that restart on any failure are easy but unreliable, while Byzantine fault tolerance (like Bitcoin) is extremely complex.
Most practical systems, including banks, find middle ground by accepting occasional unavailability rather than trying to achieve perfect consistency, availability, and partition tolerance simultaneously.
5. Hardware specialization follows predictable cycles of generalization and re-specialization - Computing hardware consistently follows "Makimoto's Wave" - specialized hardware becomes more general over time, then gets leapfrogged by new specialized solutions. CPUs became general-purpose, GPUs evolved from fixed graphics pipelines to programmable compute, and now companies like Etched are creating transformer-specific ASICs. This cycle repeats as each generation adds programmability until someone strips it away for performance gains.
6. Memory bottlenecks are reshaping the hardware landscape - The AI boom has created severe memory shortages, doubling costs for DRAM components overnight. This affects not just GPU availability but creates opportunities for alternative architectures. When everyone faces higher memory costs, the relative premium for specialized solutions like FPGA-based systems becomes more attractive, potentially shifting the competitive landscape for memory-intensive applications.
7. Search applications represent ideal FPGA use cases due to their sparse access patterns - Vector databases and search workloads are particularly well-suited to FPGA acceleration because they involve searching through massive datasets with sparse access patterns rather than dense computation. These applications can effectively utilize the high bandwidth to flash storage and parallel I/O capabilities that FPGAs provide, making them natural early adopters for this type of specialized hardware architecture (a rough scan-time sketch follows below).
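To make the bandwidth point in insight 7 concrete, here is a rough, illustrative calculation of how long one brute-force pass over a large embedding corpus takes at different read bandwidths. The corpus size and hardware numbers are assumptions for illustration, not Saturn Data's specs:

```python
# How long does one brute-force similarity scan take? Streaming an
# embedding corpus is nearly pure sequential IO, so read bandwidth
# dominates. Illustrative assumptions, not Saturn Data specs.

N_VECTORS = 1_000_000_000   # one billion embeddings
DIM = 768                   # embedding dimension
BYTES_PER_VALUE = 2         # fp16

corpus_bytes = N_VECTORS * DIM * BYTES_PER_VALUE   # ~1.5 TB

read_bandwidth = {
    "single NVMe SSD (~7 GB/s)": 7e9,
    "flash array at 1 TB/s": 1e12,
    "GPU HBM (~3 TB/s, far less capacity)": 3e12,
}

for name, bw in read_bandwidth.items():
    print(f"{name:40s} {corpus_bytes / bw:7.1f} s per full scan")
```

At a terabyte per second, petabyte-scale flash starts to behave like a much larger, much cheaper-per-byte tier of memory for scan-heavy search workloads, which is why this architecture targets them first.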
Moiz Kohari, VP of Enterprise AI and Data Intelligence at DDN, breaks down what it actually takes to get AI into production and keep it there. If your org is stuck in pilot mode, this conversation will help you spot the real blockers, from trust and hallucinations to data architecture and GPU bottlenecks.
Key takeaways
• GenAI success in the enterprise is less about the demo and more about trust, accuracy, and knowing when the system should say "I don't know."
• "Operationalizing" usually fails at the handoff, when humans stay permanently in the loop and the business never captures the full benefit.
• Data architecture is the multiplier. If your data is siloed, slow, or hard to access safely, your AI roadmap stalls, no matter how good your models are.
• GPU spend is only worth it if your pipelines can feed the GPUs fast enough. A lot of teams are IO bound, so utilization stays low and budgets get burned.
• The real win is better decisions, faster. Moving from end of day batch thinking to intraday intelligence can change risk, margin, and response time in major ways.
Timestamped highlights
00:35 What DDN does, and why data velocity matters when GPUs are the pricey line item
02:12 AI vs GenAI in the enterprise, and why "taking the human out" is where value shows up
08:43 Hallucinations, trust, and why "always answering" creates real production risk
12:00 What teams do with the speed gains, and why faster delivery shifts you toward harder problems
12:58 From hours to minutes, how GPU acceleration changes intraday risk and decision making in finance
20:16 Data architecture choices, POSIX vs object storage, and why your IO layer can make or break AI readiness
A line worth stealing
"Speed is great, but trust is the frontier. If your system can't admit what it doesn't know, production is where the project stops."
Pro tips you can apply this week
• Pick one workflow where the output can be checked quickly, then design the path from pilot to production up front, including who approves what and how exceptions get handled.
• Audit your bottleneck before you buy more compute. If your GPUs are waiting on data, fix storage, networking, and pipeline throughput first (a rough audit sketch follows below).
• Build "confidence behavior" into the system. Decide when it should answer, when it should cite, and when it should escalate to a human.
Call to action
If you got value from this one, follow the show and turn on notifications so you do not miss the next episode.
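In the spirit of that second pro tip, here is the rough audit sketch: a few lines of arithmetic to check whether storage can keep the GPUs fed. Every number is a hypothetical placeholder to replace with your own measurements:

```python
# Quick IO audit: can the data pipeline keep the GPUs busy?
# All numbers are hypothetical placeholders -- plug in your own.

NUM_GPUS = 64
SAMPLES_PER_SEC_PER_GPU = 2000   # throughput target per GPU
BYTES_PER_SAMPLE = 250_000       # decoded sample size

required_gbps = NUM_GPUS * SAMPLES_PER_SEC_PER_GPU * BYTES_PER_SAMPLE / 1e9
DELIVERED_GBPS = 20              # measured storage + network throughput

utilization_cap = min(1.0, DELIVERED_GBPS / required_gbps)
print(f"pipeline must deliver {required_gbps:.0f} GB/s to keep GPUs busy")
print(f"GPU utilization is capped at ~{utilization_cap:.0%} by IO")
```

If the cap comes out well under 100%, extra GPUs only add to the burned budget; the fix is in storage, networking, or the pipeline, exactly as Kohari argues.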
What if we told you that CES did not feature any new GPUs? But it did feature more frames! MSI with LIGHTNING and GPU Safeguard, Phison's new controller, and that wily AMD with new Ryzen 7 9850X3D (and confirmed Ryzen 9 9950X3D2) - whee! Remember ReBoot, the computer-generated cartoon? Remember D-Link routers and zero days? Remember Intel? It's all here! That and everything old is new again, with old GPUs and CPUs coming back... because RAM.
Thanks again to our sponsor, Copilot Money! Get on your single pane of financial glass and bring order to your money and spending - it's even actually fun to save again. Get the web version and use our code for 26% off at http://try.copilot.money/pcper
Timestamps:
0:00 Intro
00:56 Patreon
01:37 Food with Josh
04:10 AMD announces Ryzen 7 9850X3D
05:41 AMD sort of confirmed the 9950X3D2
07:00 NVIDIA DLSS 4.5
09:34 Intel was at CES
12:50 MSI LIGHTNING returns
14:54 MSI also launching GPU Safeguard Plus PSUs
19:44 WD_Black is now Sandisk Optimus GX Pro
21:54 Phison has the most efficient SSD controller
26:11 ASUS ROG RGB Stripe OLED
28:44 First computer-animated TV show restored
33:29 Podcast sponsor - Copilot Money
34:57 (In)Security Corner
44:32 Gaming Quick Hits
1:06:31 Picks of the Week
1:24:08 Outro ★ Support this podcast on Patreon ★
BS Section and House Keeping
Discord Server geekoholics.com/discord/
Whatcha Been Playing?
Ready Or Not
Demeo X D&D
DorfRomantik
Stick it to the Stick man
Arc Raiders
Marvel's Cosmic Invasion
Dispatch (completed)
Donkey Kong Bananza (completed)
Lost Soul Aside
News: Cross Platform / PC / Misc.
Vince Zampella, co-creator of Call of Duty and Respawn Entertainment, has died
'It's going into very good hands': CD Projekt is selling GOG
Marathon art lead departs Bungie of his own volition, ahead of game's new release date
AMD and Nvidia will reportedly raise GPU prices "significantly" in 2026
Yes, aggression-matchmaking is now a thing in Arc Raiders
From the Air Force to arcades to home consoles: Sega co-founder David Rosen dies aged 95
Fortnite's latest collab with an adult animated series that's somehow still going is none other than South Park | Rock Paper Shotgun
Ubisoft close the studio behind Assassin's Creed Rebellion days after the developers vote to unionise | Rock Paper Shotgun
Top Five Most-Played Games on PlayStation and Xbox in 2025 in the US Were the Same as in 2024 - IGN
007 First Light has been Delayed
Top Selling Steam games of the year
Nintendo
Nintendo will offer an alternative to Switch 2's controversial Game Key Cards, dev claims
PlayStation
Sony announces the Hyperpop collection
PSA's:
Epic Games Store Freebies: Bloons TD6
Monthly PS games are out
Free 4 All
SpongeBob movie
Messing around with my Spectrum
Steam sale shame
Steelseries Nova 7 Wireless V2
Rewatching Reacher
Reading Reacher (The book that matches season 3)
Finished Gen V
I'm about to get on a plane and will be gone for a couple weeks, but didn't want to leave you Breaking Changeless so I did the thing where I stand up in front of a microphone and talked at you. Again. Like I do. Fun fact: this is the first and only time I've taken a phone call live, on-air! I was just too lazy to edit that out gracefully. Whenever I go to Japan solo, I experience moments of loneliness, so I'd really appreciate it if you sent me some praise or complaints or ideas to podcast@searls.co and I'll feel comforted by the knowledge that you exist. Your engagement sustains me. Lotta weird and dumb links this go-round: Eric Doggett is a great friend/artist Fortune Feimster isn't spelled how I would've guessed Ray-Ban Meta Wayfarer (Gen 2) is the best gift anyone's given me in a while POSSE Party's tutorial videos are just enough to convince you to either bother or not bother Reddit's r/selfhosted is at least a little self-aware I'm giving myself some grace when it comes to the newsletter EDID Emulators are a hardware product that exists only because Windows is bad Looking forward to trying Happy for remote Claude Code / Codex CLI work Aaron's puns, ranked Pebble Index 01 Google's / XReal Putting It All on Glasses Next Year XReal is partnering with Asus ROG, too Google and Apple partner on better Android-iPhone switching NYT profiles John Ternus AirPods Pro with IR cameras (instead of stem clicks?!) JPMorgan Chase Reaches a Deal to Take Over the Apple Credit Card (News+) The Clicks Power Keyboard looks rad Expedition 33's Game Awards sweep has me asking, who will be the first to VEGOT? Valve still sending this guy chocolates every Christmas The post-GeForce era: What if Nvidia abandons PC gaming? The 5090 could cost $5090 by the end of 2026 Nvidia plans heavy cuts to GPU supply in early 2026 Racks of AI chips are too damn heavy (News+) Vera Rubin is probably even heavier GPT Image 1.5 is better but not good enough PSA: make ChatGPT less warm, enthusiastic, and emoji-tastic (News+) You can (supposedly) buy your Instacart groceries without leaving ChatGPT The massive year-end Ed Zitron newsletter. Podcasts are AI now (News+) Copywriters reveal how AI has decimated their industry A Stanford degree won't save you (News+) 'Godfather of SaaS' Says He Replaced Most of His Sales Team With AI Agents NYC phone ban reveals some students can't read clocks Swearing Actually Seems to Make Humans Physically Stronger Corporation for Public Broadcasting to Shut Down After 58 Years Due to Trump Eliminating Funding Grok Is Generating Sexual Content Far More Graphic Than What's on X (News+) Outer Worlds 2 Ball X Pit Stranger Things Season 5 Reddit's terrific r/RealOrAI sub The RayNeo Air 3s are the display glasses I'd recommend if you can find them for $199 Murderbot UDCast universal subtitling (and the movie I wanted to watch) Beckygram.com
The AI boom is hitting real limits. In this episode, the Newcomer team breaks down why data centers are running out of power, what is really happening inside OpenAI after its latest shakeup, and why neocloud players like CoreWeave may be heading toward a financial crunch.Eric Newcomer, Tom Dotan, and Maline Renberg explain the investor panic behind the scenes, the brutal GPU economics, and what these cracks mean for the future of AI infrastructure.
A new week means new questions! Hope you have fun with these!
The Oscars are presented by which professional honorary organization headquartered in Beverly Hills, California?
The French company Van Cleef & Arpels is a business mainly specializing in what?
A quay is a structure primarily built on or along what kind of geographical feature?
The first successful tornado warning in history occurred in 1948 at Tinker Air Force Base near what city?
Till We Have Faces, The Great Divorce, and The Screwtape Letters are lesser-known novels by which author?
In video and tabletop games, what does "NPC" stand for?
Biellmann Spin, Lutz, and Crossover are all terms used in which sport?
In a computer or video gaming system, what does the acronym GPU stand for?
H2O2 is the chemical formula for what common product?
In Greek myth, which monster was beheaded by the hero Perseus?
Before decimalisation in the UK, how many pence made a shilling?
The German state that existed from 1701 to 1918 was known as the Kingdom of what?
Chad Kroeger and his Canadian chums enjoy this slightly sweet dark rye bread from Germany.
In military tech, falconets, culverins, and carronades were all types of what?
On The Office, what are the awards called that Michael hands out to his employees?
Music
Hot Swing, Fast Talkin, Bass Walker, Dances and Dames, Ambush by Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 http://creativecommons.org/licenses/by/3.0/
Don't forget to follow us on social media:
Patreon – patreon.com/quizbang – Please consider supporting us on Patreon. Check out our fun extras for patrons and help us keep this podcast going. We appreciate any level of support!
Website – quizbangpod.com Check out our website, it will have all the links for social media that you need and while you're there, why not go to the contact us page and submit a question!
Facebook – @quizbangpodcast – we post episode links and silly lego pictures to go with our trivia questions. Enjoy the silly picture and give your best guess, we will respond to your answer the next day to give everyone a chance to guess.
Instagram – Quiz Quiz Bang Bang (quizquizbangbang), we post silly lego pictures to go with our trivia questions. Enjoy the silly picture and give your best guess, we will respond to your answer the next day to give everyone a chance to guess.
Twitter – @quizbangpod We want to start a fun community for our fellow trivia lovers. If you hear/think of a fun or challenging trivia question, post it to our twitter feed and we will repost it so everyone can take a stab at it. Come for the trivia – stay for the trivia.
Ko-Fi – ko-fi.com/quizbangpod – Keep that sweet caffeine running through our body with a Ko-Fi, power us through a late night of fact checking and editing!
Happy New Year! You may have noticed that in 2025 we had moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!
We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They were then one of the few companies in Nat Friedman and Daniel Gross's AI Grant to raise a full seed round from the pair, and have now become the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.
We have chatted with both Clémentine Fourrier of Hugging Face's Open LLM Leaderboard and Anastasios Angelopoulos of the (freshly valued at $1.7B) LMArena on their approaches to LLM evals and trendspotting, but Artificial Analysis has staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.
George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really?
We discuss:
* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet
* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers
* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs.
leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)
* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs
* Omniscience Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest
* GDPval-AA: their version of OpenAI's GDPval (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)
* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)
* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents)
* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omniscience Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future
* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)
* V4 of the Intelligence Index coming soon: adding GDPval-AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (HumanEval-style coding is now trivial for small models)
Links to Artificial Analysis
* Website: https://artificialanalysis.ai
* George Cameron on X: https://x.com/georgecameron
* Micah Hill-Smith on X: https://x.com/micahhsmith
Full Episode on YouTube
Timestamps
* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
* 01:19 Business Model: Independence and Revenue Streams
* 04:33 Origin Story: From Legal AI to Benchmarking Need
* 16:22 AI Grant and Moving to San Francisco
* 19:21 Intelligence Index Evolution: From V1 to V3
* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology
* 13:52 Mystery Shopper Policy and Maintaining Independence
* 28:01 New Benchmarks: Omniscience Index for Hallucination Detection
* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning
* 23:01 GDPval-AA: Agentic Benchmark for Real Work Tasks
* 50:19 Stirrup Agent Harness: Open Source Agentic Framework
* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses
* 58:25 The Smiling Curve: Cost Falling While Spend Rising
* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits
* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges
* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas
* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions
* 1:16:50 Closing: The Insatiable Demand for Intelligence
Transcript
Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.
swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me.
Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks, and how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...
George [00:01:09]: Yeah, but you can't pay us for better results.
swyx [00:01:12]: Yes, exactly.
George [00:01:13]: Very important.
Micah [00:01:14]: Start off with a spicy take.
swyx [00:01:18]: Okay, how do I pay you?
Micah [00:01:20]: Let's get right into that.
swyx [00:01:21]: How do you make money?
Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026. Which is pretty soon now. We first ran the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, technologies across the AI stack for building stuff. We're very committed to doing that and intend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups. So we want to be... We want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But turns out a bunch of our stuff can be pretty useful to companies building AI stuff.
swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?
George [00:02:53]: So we have a benchmarking and insight subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. And so, for instance, one of the reports is a model deployment report, how to think about choosing between serverless inference, managed deployment solutions, or leasing chips. And running inference yourself is an example kind of decision that big enterprises face, and it's hard to reason through, like this AI stuff is really new to everybody. And so with our reports and insight subscription we try and help companies navigate that. We also do custom private benchmarking. And so that's very different from the public benchmarking that we publicize, and there's no commercial model around that. For private benchmarking, we'll at times create benchmarks, run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking.
Yeah. So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.
swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.
Micah [00:04:19]: George was in SF, but he's Australian, but he moved here already. Yeah.
swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll go to the private benchmark. Yeah.
George [00:04:33]: Why don't we even go back a little bit to like why we, you know, thought that it was needed? Yeah.
Micah [00:04:40]: The story kind of begins like in 2022, 2023, like both George and I have been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. So it actually worked pretty well for its era, I would say. Yeah. Yeah. So I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. So had like this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like you're trying to think about accuracy, a bunch of other metrics and performance and cost. And mostly just no one was doing anything to independently evaluate all the models. And certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things measured independently across all the models and providers. Honestly, it was probably meant to be a side project when we first started doing it.
swyx [00:05:49]: Like we didn't like get together and say like, Hey, like we're going to stop working on all this stuff. I'm like, this is going to be our main thing. When I first called you, I think you hadn't decided on starting a company yet.
Micah [00:05:58]: That's actually true. I don't even think we'd paused, like, George had a day job. I didn't quit working on my legal AI thing. Like it was genuinely a side project.
George [00:06:05]: We built it because we needed it as people building in the space and thought, oh, other people might find it useful too. So we'll buy a domain and link it to the Vercel deployment that we had and tweet about it. And, but very quickly it started getting attention. Thank you, Swyx, for, I think, doing an initial retweet and spotlighting it there. This project that we released. And then very quickly though, it was useful to others, but very quickly it became more useful as the number of models released accelerated. We had Mixtral 8x7B and it was a key. That's a fun one. Yeah. Like an open source model that really changed the landscape and opened up people's eyes to other serverless inference providers and thinking about speed, thinking about cost. And so that was a key. And so it became more useful quite quickly. Yeah.
swyx [00:07:02]: What I love talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data and that's obviously more relevant. But I want to stay on the origin story a little bit more.
When you started out, I would say, I think the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has some knowledge. I think there's some version of Excel sheet or a Google sheet where you just like copy and paste the numbers from every paper and just post it up there. And then sometimes they don't line up because they're independently run. And so your numbers are going to look better than... Your reproductions of other people's numbers are going to look worse because you don't prompt their models correctly or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. The way that if I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval framework harness. Yup.
Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, it's like if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website. Yeah. Yeah. Like one of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was just that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get- You can put the answer into the model. Yeah. That in the extreme. And like you get crazy cases like back when Google did Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4 and, like, constructed, I think never published, chain of thought examples, 32 of them, in every topic in MMLU to run it, to get the score. Like there are so many things that you- They never shipped Ultra, right? That's the one that never made it out. Not widely. Yeah. Yeah. Yeah. I mean, I'm sure it existed, but yeah. So we were pretty sure that we needed to run them ourselves and just run them in the same way across all the models. Yeah. And we also were certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.
swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.
Micah [00:09:36]: So like, I mean, we're paying for it personally at the start. There's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of like hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was like kind of fine. Yeah. Yeah. These days that's gone up an enormous amount for a bunch of reasons that we can talk about. But yeah, it wasn't that bad because you can also remember that like the number of models we were dealing with was hardly any and the complexity of the stuff that we wanted to do to evaluate them was a lot less. Like we were just asking some Q&A type questions and then one specific thing was for a lot of evals initially, we were just like sampling an answer.
You know, like, what's the answer for this? Like, we'd go to the answer directly without letting the models think. We weren't even doing chain of thought stuff initially. And that was the most useful way to get some results initially. Yeah.
swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right? Like because sometimes the models, the models can answer any way they feel fit and sometimes they actually do have the right answer, but they just returned the wrong format and they will get a zero for that unless you work it into your parser. And that involves more work. And so, I mean, but there's an open question whether you should give it points for not following your instructions on the format.
Micah [00:11:00]: It depends what you're looking at, right? Because you can, if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test it on its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's answered. But these days, it's mostly less of a problem. Like, if you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do, like, a simple regex.
swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple choice question, sometimes there's a bias towards the first answer, so you have to randomize the responses. All these nuances, like, once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's so dark magic.
Micah [00:11:47]: You've also got, like… You've got, like, the different degrees of variance in different benchmarks, right? Yeah. So, if you run four-option multi-choice on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see on a four-option multi-choice eval is pretty enormous if you only do a single run of it and it has a small number of questions, especially. So, like, one of the things that we do is run an enormous number of all of our evals when we're developing new ones and doing upgrades to our intelligence index to bring in new things. Yeah. So, that we can dial in the right number of repeats so that we can get to the 95% confidence intervals that we're comfortable with so that when we pull that together, we can be confident in intelligence index to at least as tight as, like, a plus or minus one at a 95% confidence. Yeah.
swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.
George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that's assuming one repeat in terms of how we report it because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there because of the repeats.
swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking, you don't have any special deals with the labs. They don't discount it. You just pay out of pocket or out of your sort of customer funds. Oh, there is a mix. So, the issue is that sometimes they may give you a special endpoint, which is… Ah, 100%.
So, the issue is that sometimes they may give you a special end point, which is… Ah, 100%.Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, like, on everything we do on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true for, like, the one you bring up, like, right here of the fact that if we're working with a lab, if they're giving us a private endpoint to evaluate a model, that it is totally possible. That what's sitting behind that black box is not the same as they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy. And so, and we're totally transparent with all the labs we work with about this, that we will register accounts not on our own domain and run both intelligence evals and performance benchmarks… Yeah, that's the job. …without them being able to identify it. And no one's ever had a problem with that. Because, like, a thing that turns out to actually be quite a good… …good factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.swyx [00:14:23]: That's true. I never thought about that. I've been in the database data industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.Micah [00:14:36]: I mean, okay, the biggest one, like, that I'll bring up, like, is more of a conceptual one, actually, than, like, direct shenanigans. It's that the things that get measured become things that get targeted by labs that they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans. Like, I'm not talking about training on test set. But if you know that you're going to be great at another particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing that preferably are going to be helpful for a wide range of how actual users want to use the thing that you're building. But will not necessarily work. Will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to, like, how we might use modern coding agents and stuff. But it's clearly not one for one. So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without there being a reflection of overall generalized intelligence of these models. Getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. Like, you used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant that you guys decided to join and move here. What was it like? I think you were in, like, batch two? Batch four. Batch four. 
Okay.
Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies and were extremely aligned. With the mission of what we were trying to do. Like, we're not quite typical of, like, a lot of the other AI startups that they've invested in.
swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they say any advice that really affected you in some way or, like, were one of the events very impactful? That's an interesting question.
Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.
swyx [00:17:09]: Which is also, like, a crazy list. Yeah.
George [00:17:11]: Oh, totally. Yeah, yeah, yeah. There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup and just working through the questions that don't have, like, clear answers and how to work through those kind of methodically and just, like, work through the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch and other companies in AI Grant are pushing the capabilities. Yeah. And I think that's a big part of what AI can do at this time. And so being in contact with them, making sure that Artificial Analysis is useful to them has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those, like, you know, building on AI.
swyx [00:17:59]: I think to some extent, I'm mixed opinion on that one because to some extent, your target audience is not people in AI Grant who are obviously at the frontier. Yeah. Do you disagree?
Micah [00:18:09]: To some extent. To some extent. But then, so a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypical power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well and what we're not doing well and what they want to see next from us. Yeah. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently for different models and different parts of your application to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So for many of them, no, they're like not commercial customers of ours, like we don't charge for all our data on the website. Yeah. They are absolutely some of our power users.
swyx [00:19:07]: So let's talk about just the evals as well. So you start out from the general like MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1 and how did you evolve it? Okay.
Micah [00:19:22]: So first, just like background, like we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we pulled together currently from 10 different eval data sets to give what we're pretty confident is the best single number to look at for how smart the models are.
Obviously, it doesn't tell the whole story. That's why we published the whole website of all the charts to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A type data sets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic data sets. It's got our own long context reasoning data set and some other use case focused stuff. As time goes on. The things that we're most interested in that are going to be important to the capabilities that are becoming more important for AI, what developers are caring about, are going to be first around agentic capabilities. So surprise, surprise. We're all loving our coding agents and how the model is going to perform like that and then do similar things for different types of work are really important to us. The linking to use cases to economically valuable use cases are extremely important to us. And then we've got some of the. Yeah. These things that the models still struggle with, like working really well over long contexts that are not going to go away as specific capabilities and use cases that we need to keep evaluating.
swyx [00:20:46]: But I guess one thing I was driving at was like the V1 versus the V2 and how bad it was over time.
Micah [00:20:53]: Like how we've changed the index to where we are.
swyx [00:20:55]: And I think that reflects on the change in the industry. Right. So that's a nice way to tell that story.
Micah [00:21:00]: Well, V1 would be completely saturated right now. Almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, I think how much progress has been made in the last two years. Like we obviously play the game constantly of like the today's version versus last week's version and the week before and all of the small changes in the horse race between the current frontier and who has the best like smaller than 10B model like right now this week. Right. And that's very important to a lot of developers and people and especially in this particular city of San Francisco. But when you zoom out a couple of years ago, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence. We can talk about more in a bit. So V1, V2, V3, we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about as opposed to like just the Q&A type stuff that MMLU and GPQA represented. Yeah.
swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and like looking around and asking questions about it. Yeah.
Micah [00:22:21]: Let's do it. Okay. This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.
George [00:22:26]: And I think a little bit about the direction that we want to take it. And we want to push benchmarks. Currently, the intelligence index and evals focus a lot on kind of raw intelligence. But we kind of want to diversify how we think about intelligence. And we can talk about it. But kind of new evals that we've kind of built and partnered on focus on topics like hallucination. And we've got a lot of topics that I think are not covered by the current eval set that should be.
And so we want to bring that forth. But before we get into that.swyx [00:23:01]: And so for listeners, just as a timestamp, right now, number one is Gemini 3 Pro High. Then followed by Cloud Opus at 70. Just 5.1 high. You don't have 5.2 yet. And Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.Micah [00:23:30]: This is such a favorite, right? Yeah. And almost every talk that George or I give at conferences and stuff, we always put this one up first to just talk about situating where we are in this moment in history. This, I think, is the visual version of what I was saying before about the zooming out and remembering how much progress there's been. If we go back to just over a year ago, before 01, before Cloud Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, open AI was untouchable for well over a year. And, I mean, you would remember that time period well of there being very open questions about whether or not AI was going to be competitive, like full stop, whether or not open AI would just run away with it, whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There's so many dots on it, but I think it reflects a little bit what we felt, like how crazy it's been.swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got service now in there that are less traditional names. Yeah.George [00:25:01]: It's models that we're kind of highlighting by default in our charts, in our intelligence index. Okay.swyx [00:25:07]: You just have a manually curated list of stuff.George [00:25:10]: Yeah, that's right. But something that I actually don't think every artificial analysis user knows is that you can customize our charts and choose what models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the all one jump. Look at that. September 2024. And the DeepSeek jump. Yeah.George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, a couple of weeks. It was Boxing Day in New Zealand when DeepSeek v3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024 and had run evals on the earlier ones and stuff. I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek v3. So this was the first of their v3 architecture, the 671b MOE.Micah [00:26:19]: And we were very, very impressed. 
That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of v3, and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with just an extremely strong base model, completely open weights, that we had as the best open weights model. So, yeah, that's the thing that you really see in the chart. They got us on Boxing Day last year.George [00:26:48]: Boxing Day is the day after Christmas, for those not familiar.swyx [00:26:55]: I'm from Singapore. A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.Micah [00:27:11]: I don't know. I'm not used to it. Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.George [00:27:20]: There's been a few naming changes. We added hardware benchmarking to the site, and so benchmarks at a kind of system level. And so then we changed our throughput metric: we now call it output speed, because throughput makes sense at a system level, so we took that name.swyx [00:27:32]: Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into. Maybe we can skip past all the, like, lots and lots of evals and stuff. The interesting ones to talk about today are a few of our recent things that probably not many people will be familiar with yet. So the first one of those is our Omniscience Index. This one is a little bit different to most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models, and to test hallucination by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying "I don't know" versus giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to a question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say "I don't know" instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models, and the labs creating them, to get higher scores. Almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything; there's no incentive to say "I don't know." So we did that for this one here.swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah. Yeah.George [00:29:31]: On that, one reason that we didn't put that into this index is that we think the way to do that is not to ask the models how confident they are.
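A minimal sketch of the penalty-scored metric Micah describes: plus one for a correct answer, minus one for an incorrect answer, zero for an abstention. The equal penalty weighting here is an assumption for illustration, not the published Omniscience formula:

```python
# Minimal sketch of a penalty-scored knowledge metric: +1 correct,
# -1 incorrect, 0 for "I don't know". The equal penalty weight is an
# illustrative assumption, not the published Omniscience formula.
def omniscience_score(answers: list[str]) -> float:
    """answers: per-question outcomes ('correct', 'incorrect', or 'abstain').
    Returns a score in [-100, 100]."""
    points = {"correct": 1, "incorrect": -1, "abstain": 0}
    return 100 * sum(points[a] for a in answers) / len(answers)

# A model that abstains when unsure beats one that guesses wrongly:
print(omniscience_score(["correct"] * 60 + ["abstain"] * 40))    # 60.0
print(omniscience_score(["correct"] * 60 + ["incorrect"] * 40))  # 20.0
```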
swyx [00:29:43]: I don't know. Maybe it might be, though. You put in like a JSON field, say, confidence, and maybe it spits out something. Yeah. You know, we have done a few evals podcasts over the years, and when we did one with Clémentine of Hugging Face, who maintains the Open LLM Leaderboard, this was one of her top requests: some kind of hallucination slash lack-of-confidence calibration thing. And so, hey, this is one of them.Micah [00:30:05]: And I mean, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. One of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here, with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not-really-measured vibes stuff that people like about some of the Claude models. Is the dataset public, or is there a held-out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one specifically, it would be very, very easy to have data contamination, because it is just factual knowledge questions. We'll update it over time to also prevent that, but yeah, we've kept most of it held out so that we can keep it reliable for a long time. It lets us do a bunch of really cool things, including breaking down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.swyx [00:31:23]: I would be interested. Let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. And yeah, would that be the other way around in a normal capability environment? I don't know. What do you make of that?George [00:31:37]: One interesting aspect is that we've found that there's not really a strong correlation between intelligence and hallucination. That's to say: how smart the models are in a general sense isn't correlated with their ability to, when they don't know something, say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over here, over Gemini 2.5 Flash and 2.5 Pro. And if I add Pro quickly here.swyx [00:32:07]: I bet Pro's really good. Uh, actually no, I meant the GPT Pros.George [00:32:12]: Oh yeah.swyx [00:32:13]: Because the GPT Pros are rumored, we don't know for a fact, to be like eight runs and then an LLM judge on top. Yeah.George [00:32:20]: So we saw a big jump in, this is accuracy, so this is just the percent that they get correct, and Gemini 3 Pro knew a lot more than the other models. So a big jump in accuracy, but relatively no change between the Google Gemini models, between releases, in the hallucination rate. Exactly. And so it's likely just a different post-training recipe that's driven this for the Claude models. Yeah.Micah [00:32:45]: Yeah. You can partially blame us, and how we define intelligence, for having until now not defined hallucination as a negative in the way that we think about intelligence.swyx [00:32:56]: And so that's what we're changing. Uh, I know many smart people who are confidently incorrect.
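For the confidence-field idea swyx floats, a standard way to score stated confidence against actual accuracy is expected calibration error; this sketch is a generic illustration of that idea, not anything Artificial Analysis runs:

```python
# Sketch of scoring a model's self-reported confidence field against
# accuracy, via the standard 10-bin expected calibration error (ECE).
# The binning scheme is the usual textbook choice, used for illustration.
def expected_calibration_error(confs: list[float], correct: list[bool],
                               bins: int = 10) -> float:
    n = len(confs)
    ece = 0.0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i, c in enumerate(confs)
               if lo <= c < hi or (b == bins - 1 and c == 1.0)]
        if not idx:
            continue
        avg_conf = sum(confs[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece

# Confidently wrong answers push ECE up; well-calibrated ones pull it down.
print(expected_calibration_error([0.9, 0.9, 0.1, 0.6],
                                 [True, False, False, True]))
```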
George [00:33:02]: Look at that. That is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context, where it's around knowledge, but in many cases people want the models to hallucinate, to have a go. Often that's the case in coding, or when you're trying to generate newer ideas. One eval that we added to Artificial Analysis is Critical Point, and it's really hard physics problems. Okay.swyx [00:33:32]: And is it sort of like a HumanEval type or something different, or like a FrontierMath type?George [00:33:37]: It's not dissimilar to FrontierMath. These are research questions that academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is only 9%.swyx [00:33:51]: And the people that created this, like Minway and, actually, Ofir, who was kind of behind SWE-bench. And what organization is this? Oh, it's Princeton.George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the models up to as high a temperature as they can when they're trying to explore new ideas in physics, as a thought partner, just because they want the models to hallucinate. Um, yeah, sometimes it's something new. Yeah, exactly.swyx [00:34:21]: So not right in every situation, but I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question is: this is one of many. Every lab has a system card that shows some kind of hallucination number, and you've chosen to not endorse that and made your own. And I think that's a choice. Totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun; you provide it as a service here. You have to fight the, well, who are we to do this? And your answer is that we have a lot of customers, you know. But, like, I guess, how do you convince the individual?Micah [00:35:08]: I mean, I think for hallucinations specifically, there are a bunch of different things that you might reasonably care about, and that you'd measure quite differently. Like, we've called this the Omniscience hallucination rate, not trying to declare that it's, like, Humanity's Last Hallucination. You could have some interesting naming conventions and all this stuff. The bigger-picture answer to that, and it's something that I actually wanted to mention just as George was explaining Critical Point as well, is that as we go forward, we are building evals internally, and we're partnering with academia and partnering with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not necessarily obsessed with the idea that everything we do, we have to do entirely within our own team. Critical Point is a cool example, where we were a launch partner for it, working with academia. We've got some partnerships coming up with a couple of leading companies.
Those ones, obviously, we have to be careful with on some of the independent stuff, but with the right disclosure, we're completely comfortable with that. A lot of the labs have released great data sets in the past that we've used to great success independently. And so, between all of those techniques, we're going to be releasing more stuff in the future. Cool.swyx [00:36:26]: Let's cover the last couple, and then I want to talk about your trends analysis stuff, you know? Totally.Micah [00:36:31]: So actually, I have one little factoid on Omniscience. If you go back up to accuracy on Omniscience: an interesting thing about this accuracy metric is that it tracks, more closely than anything else that we measure, the total parameter count of models. Makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just: what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. Rumors. I hear all sorts of numbers. I don't know what to trust.Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, and we've got all the open weights models, you can squint and see that likely the leading frontier models right now are quite a lot bigger than the one trillion parameters that the open weights models we're looking at here cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, six trillion for Grok 5, but that's not out yet. Take those together, have a look, and you might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? As long as they can serve it at a sustainable cost, that's about it. Like, yeah, totally.George [00:38:17]: They've also got different incentives in play compared to open weights models, who are thinking about supporting others in self-deployment. For the labs who are doing inference at scale, it's, I think, less about total parameters in many cases when thinking about inference costs, and more around the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously, if you're a developer or company using these things, it's exactly as you say: it doesn't matter. You should be looking at all the different ways that we measure intelligence. You should be looking at cost to run the index, and the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.swyx [00:38:56]: It's not as good for the content creator rumor mill, where I can say, oh, GPT-4 is this small circle, look at GPT-5 is this big circle. And then there used to be a thing for a while. Yeah.
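A sketch of the squint-and-extrapolate exercise Micah describes: fit accuracy against log parameter count on open weights models of known size, then invert the fit for a closed model's accuracy. All numbers here are invented for illustration:

```python
# Sketch of the extrapolation Micah describes: fit Omniscience accuracy
# against log10(total parameters) for open weights models (known sizes),
# then invert the fit to guess a closed model's size from its accuracy.
# All numbers below are invented for illustration.
import numpy as np

params_b = np.array([32, 70, 235, 671, 1000])  # open weights sizes (B params)
accuracy = np.array([18, 24, 31, 38, 41])      # hypothetical accuracy (%)

slope, intercept = np.polyfit(np.log10(params_b), accuracy, 1)

frontier_accuracy = 55.0                       # hypothetical closed model
implied_size_b = 10 ** ((frontier_accuracy - intercept) / slope)
print(f"implied size: ~{implied_size_b / 1000:.1f}T parameters")  # ~8T here
```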
Micah [00:39:07]: But that is, on its own, actually a very interesting one, right? That is: chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total size of the models, especially with the upcoming hardware generations. Yes.swyx [00:39:29]: So, you know, taking off my shitposting hat for a minute. Yes. Yes. At the same time, I do feel like, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out, and therefore we need to start exploring at least a different path. GDPVal, I think it's only like a month or so old. I was also very positive on it when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool. And you have your own version.George [00:39:59]: It's a fantastic data set. Yeah.swyx [00:40:01]: And maybe we'll recap it for people who are out of the loop. It's like 44 tasks, based on some kind of GDP cutoff, that are meant to represent broad white-collar work that is not just coding. Yeah.Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and input files for a lot of them. The 44 are divided into, like, 220, 225 maybe, subtasks, which are the level that we run through the agentic harness. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect; there are always going to be more things to look at, largely because in order to make the tasks well enough defined that you can run them, they need to only have a handful of input files and very specific instructions for that task. And so I think the easiest way to think about them is that they're like quite hard take-home exam tasks that you might do in an interview process.swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It is like, well, here's a zip file with a spreadsheet or a PowerPoint deck or a PDF: go nuts and answer this question.George [00:41:06]: OpenAI released a great data set, and they released a good paper which looks at performance of the different web chatbots on the data set. It's a great paper; I encourage people to read it. What we've done is taken that data set and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the data set, and then we developed an evaluator approach to compare outputs that's kind of AI-enabled: it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences. One data point there is that even with Gemini 3 Pro as the evaluator, Gemini 3 Pro, interestingly, doesn't actually do that well. So that's kind of a good example of what we've done in GDPVal AA.swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM judges is self-preference, that models usually prefer their own output, and in this case, it was not the case. Totally.
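A minimal sketch of criteria-based pairwise grading in the spirit George describes, where a judge model picks between two task outputs against the task criteria. The prompt and the `llm_call` hook are placeholders, not the actual GDPVal AA pipeline:

```python
# Sketch of criteria-based pairwise grading: give the judge model the task
# criteria and two candidate outputs, and ask which meets them better.
# `llm_call` is a placeholder for any judge-model client, not Artificial
# Analysis's actual pipeline.
import json

JUDGE_PROMPT = """You are grading a work task.
Task criteria:
{criteria}

Output A:
{a}

Output B:
{b}

Reply with JSON: {{"winner": "A" or "B", "reason": "..."}}"""

def judge_pair(llm_call, criteria: str, a: str, b: str) -> str:
    """llm_call: any function str -> str backed by the judge model."""
    raw = llm_call(JUDGE_PROMPT.format(criteria=criteria, a=a, b=b))
    return json.loads(raw)["winner"]
```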
Micah [00:42:08]: I think the way that we're thinking about the places where it makes sense to use an LLM-as-judge approach now is quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that, and MT-Bench was a great project that was a good example of this a while ago, was about judging conversations and a lot of style-type stuff. Here, the task that the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search, the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading it, we're running it through a pipeline to extract visual and text versions of the files to be able to provide that to Gemini, and we're providing the criteria for the task and getting it to pick which one more effectively meets the criteria of the task. Yeah. So we've got it choosing between two potential outputs. It turns out that it's just very, very good at getting that right, matching human preference a lot of the time, because, I think, it's got the raw intelligence, but it's combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and the fact that we're comparing against criteria, not just zero-shot asking the model to pick which one is better.swyx [00:43:26]: Got it. Why is this an Elo, and not a percentage, like GDPVal?George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks. Some of the tasks.swyx [00:43:43]: What task is that?George [00:43:45]: I mean, it's in the data set. Like be a YouTuber? It's a marketing video.Micah [00:43:49]: Oh, wow. What? Like, the model has to go find clips on the internet and try to put it together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code editor. I mean, the computer-use stuff doesn't work quite well enough, and so on and so on, but yeah.George [00:44:02]: And so there's no ground truth, necessarily, to compare against to work out percentage correct. It's hard to come up with correct or incorrect there. So it's on a relative basis, and we use an Elo approach to compare outputs from each of the models across the tasks.swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give it an Elo, so you have a human in there. It's just, I think what's helpful about GDPVal, the OpenAI one, is that 50% is meant to be a normal human, and maybe a domain expert is higher than that, but 50% was the bar for, like, well, if you've crossed 50, you are superhuman. Yeah.Micah [00:44:47]: So we haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. That's one of the reasons that presenting it as an Elo is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky comparing these exact tasks against human performance, because the way that you would go about it as a human is quite different to how the models would go about it. Yeah.
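A sketch of how pairwise judge verdicts can be turned into ratings with the standard Elo update; the K-factor and 400-point scale are the usual chess defaults, used here purely for illustration:

```python
# Sketch of turning pairwise judge verdicts into Elo-style ratings,
# using the standard Elo update. K=32 and the 400 scale are the usual
# chess defaults, chosen here for illustration.
def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32):
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if a_won else 0.0
    r_a += k * (score_a - expected_a)
    r_b += k * ((1 - score_a) - (1 - expected_a))
    return r_a, r_b

ratings = {"model_a": 1000.0, "model_b": 1000.0}
# Feed in judge verdicts, e.g. model_a beats model_b on one task:
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"], a_won=True)
print(ratings)  # {'model_a': 1016.0, 'model_b': 984.0}
```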
swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there. Is that like just one last, like...Micah [00:45:20]: Well, no, no, no, it is the best model released by Meta. And so it makes it into the homepage default set, still, for now.George [00:45:31]: Another inclusion that's quite interesting is that we also ran it across the latest versions of the web chatbots. And so we have...swyx [00:45:39]: Oh, that's right.George [00:45:40]: Oh, sorry.swyx [00:45:41]: I, yeah, I completely missed that. Okay.George [00:45:43]: No, not at all. So that's the one which has a checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude Opus 4.5 using the Claude web chatbot, it performs worse than the model in our agentic harness. In every case, the model performs better in our agentic harness than its web chatbot counterpart, the harness that they created.swyx [00:46:13]: Oh, my backwards explanation for that would be that, well, it's meant for consumer use cases, and here you're pushing it for something else.Micah [00:46:19]: The constraints are different, and the amount of freedom that you can give the model is different. Also, you have a cost goal. We let the models work as long as they want, basically. Yeah. Do you copy-paste manually into the chatbot? Yeah. Yeah. That was how we got the chatbot reference. We're not going to be keeping those updated at quite the same scale as hundreds of models.swyx [00:46:38]: Well, I don't know, talk to Browserbase. They'll automate it for you. You know, I have thought about, like, well, we should turn these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.Micah [00:46:53]: And that's grown a huge amount over the last year, right? The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the amount of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.swyx [00:47:10]: What tools and what data connections come to mind when you say that? What's interesting, what's notable work that people have done?Micah [00:47:15]: Oh, okay. So my favorite example on this is that until very recently, I would argue that it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's pulling something from wherever you store stuff at work. So for me, like Google Drive, OneDrive, or our Supabase databases if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on it. The thing that I find most impressive currently, that I am somewhat surprised works really well in late 2025, is that I can have models use the Supabase MCP to query, read-only, of course, and run a whole bunch of SQL queries to do pretty significant data analysis, and make charts and stuff, and it can read my Gmail and my Notion. And okay, you actually use that. That's good. That's good. Is that a Claude thing?
To various degrees, both: ChatGPT and Claude right now. I would say that this stuff barely works, in fairness, right now. Like...George [00:48:33]: Because people are actually going to try this after they hear it. If you get an email from Micah, odds are it wasn't written by a chatbot.Micah [00:48:38]: So, yeah, I think it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.swyx [00:48:46]: And so you can feel it coming, right? And yeah, this time next year, we'll come back and see where it's going. Totally. Supabase shout-out, another famous Kiwi. I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users, and we probably do some things more manually than we should in the Supabase support line, because he's being a little bit super friendly. One extra point regarding GDPVal AA is that, on the basis of the overperformance of the models compared to the chatbots, we realized that, oh, the reference harness that we built actually works quite well on generalist agentic tasks; this proves it, in a sense. And the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all that we give it is context management capabilities, a web search and web browsing tool, and a code execution environment. Anything else?Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. We give it, for GDPVal, a tool to view an image specifically, because the models, you know, can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context, we had to give them a custom tool. But yeah, exactly. You can explain it.George [00:50:21]: So it turned out that we created a good generalist agentic harness, and we released that on GitHub yesterday. It's called Stirrup. So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code, and the coding agents can work with it super well.swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on. I think maybe in other similar environments, the Terminal-Bench guys have done sort of the Harbor thing. And so it's a bundle of: well, we need our minimal harness, which for them is Terminus, and we also need the RL environments, or Docker deployment thing, to run independently. So I don't know if you've looked at Harbor at all. Is that like a standard that people want to adopt?George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host Terminal-Bench benchmarks on Artificial Analysis. We've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agents.
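A minimal sketch of a harness in the minimalist spirit George describes: the model drives the loop with just a web search tool, a code execution tool, and crude context management. The `llm`, `web_search`, and `run_code` callables are placeholders, not Stirrup's actual code (which is on GitHub):

```python
# Minimal sketch of a Stirrup-style harness: let the model drive the loop
# with a tiny tool set. `llm`, `web_search`, and `run_code` are placeholder
# callables, not Stirrup's actual implementation (see its GitHub repo).
import json

def agent_loop(llm, tools: dict, task: str, max_turns: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = llm(messages)  # returns JSON: {"tool":..., "args":...} or {"answer":...}
        action = json.loads(reply)
        if "answer" in action:               # the model decides when it's done
            return action["answer"]
        result = tools[action["tool"]](**action["args"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
        if len(messages) > 40:               # crude context management: drop old turns
            messages = messages[:1] + messages[-20:]
    return "max turns reached"

# tools = {"web_search": web_search, "run_code": run_code}
# print(agent_loop(llm, tools, "Summarize today's GPU pricing news."))
```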
I think where we're getting to is that these models have gotten smart enough, and have gotten better tools, that they can perform better when just given a minimalist set of tools and let run: let the model control the agentic workflow, rather than using another framework that's a bit more built out and tries to dictate the flow. Awesome.swyx [00:51:56]: Let's cover the Openness Index, and then let's go into the report stuff. So that's the last of the proprietary numbers, I guess. I don't know how you sort of classify all these. Yeah.Micah [00:52:07]: Or call it the last of the three new things that we're talking about from the last few weeks. Because, I mean, we do a mix of stuff: stuff where we're using open source, stuff where we open source what we do, and proprietary stuff that we don't always open source. Like the long context reasoning data set last year, we did open source. And then all of the work on performance benchmarks across the site: some of them we're looking to open source, but some of them we're constantly iterating on, and so on. So there's a huge mix, I would say, of stuff that is open source and not, across the site. So that's LCR, for people. Yeah, yeah, yeah, yeah.swyx [00:52:41]: But let's talk about openness.Micah [00:52:42]: Let's talk about the Openness Index. This here is, call it, a new way to think about how open models are. We, for a long time, have tracked whether the models are open weights and what the licenses on them are. And that's pretty useful; that tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are that is pretty important, and that we haven't tracked until now. And that's how much is disclosed about how it was made. So transparency about data, pre-training data and post-training data, and whether you're allowed to use that data, and transparency about methodology and training code. So basically, those are the components. We bring them together to score an Openness Index for models, so that you can in one place get the full picture of how open models are.swyx [00:53:32]: I feel like I've seen a couple other people try to do this, but they're not maintained. I do think this does matter. I don't know what the numbers mean, apart from, is there a max number? Is this out of 20?George [00:53:44]: It's out of 18 currently, and so we've got an Openness Index page. But essentially these are points: you get points for being more open across these different categories, and the maximum you can achieve is 18. So AI2, with their extremely open OLMo 3 32B Think model, is the leader, in a sense.swyx [00:54:04]: What about Hugging Face?George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to run, we need to get the intelligence benchmarks right to get it on the site.swyx [00:54:12]: You can't have an Openness Index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, the RefinedWeb and all that stuff. It's amazing. Or is it called FineWeb? FineWeb. FineWeb.Micah [00:54:23]: Yeah, yeah, no, totally. Yep. One of the reasons this is cool, right, is that if you're trying to understand the holistic picture of the models and what you can do with all the stuff the company's contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we do the Intelligence Index on, on the site. And it's just an extra view to understand.
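A sketch of a points-based openness score like the one George describes, capped at 18; the category names and point values below are illustrative assumptions, not the actual Artificial Analysis rubric:

```python
# Sketch of a points-based openness score capped at 18. The categories
# and point values are illustrative assumptions, not the actual rubric.
OPENNESS_RUBRIC = {
    "open_weights": 3,
    "permissive_license": 3,
    "pretraining_data_disclosed": 3,
    "posttraining_data_disclosed": 3,
    "data_usable": 2,
    "methodology_disclosed": 2,
    "training_code_released": 2,
}  # points sum to 18

def openness_index(model_facts: dict[str, bool]) -> int:
    return sum(pts for key, pts in OPENNESS_RUBRIC.items()
               if model_facts.get(key))

fully_open = {k: True for k in OPENNESS_RUBRIC}
print(openness_index(fully_open))  # 18
```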
swyx [00:54:43]: Can you scroll down to this? The trade-offs chart. Yeah, yeah. That one. Yeah. This really matters, right? Obviously, because you can b
a16z co-founder and General Partner Marc Andreessen joins an AMA-style conversation to explain why AI is the largest technology shift he has experienced, how the cost of intelligence is collapsing, and why the market still feels early despite rapid adoption. The discussion covers how falling model costs and fast capability gains are reshaping pricing, distribution, and competition across the AI stack, why usage-based and value-based pricing are becoming standard, and how startups and incumbents are navigating big versus small models and open versus closed systems. Marc also addresses China's progress, regulatory fragmentation, lessons from Europe, and why venture portfolios are designed to back multiple, conflicting outcomes at once. Resources:Follow Marc Andreessen on X: https://twitter.com/pmarcaFollow Jen Kha on X: https://twitter.com/jkhamehl Stay Updated:If you enjoyed this episode, be sure to like, subscribe, and share with your friends!Find a16z on X: https://twitter.com/a16zFind a16z on LinkedIn: https://www.linkedin.com/company/a16zListen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYXListen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711Follow our host: https://twitter.com/eriktorenberg Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
This is a recap of the top 10 posts on Hacker News on January 06, 2026. This podcast was generated by wondercraft.ai (00:30): Vietnam bans unskippable adsOriginal post: https://news.ycombinator.com/item?id=46514677&utm_source=wondercraft_ai(01:52): enclose.horseOriginal post: https://news.ycombinator.com/item?id=46509211&utm_source=wondercraft_ai(03:14): AWS raises GPU prices 15% on a Saturday, hopes you weren't paying attentionOriginal post: https://news.ycombinator.com/item?id=46511153&utm_source=wondercraft_ai(04:36): The Post-American InternetOriginal post: https://news.ycombinator.com/item?id=46509019&utm_source=wondercraft_ai(05:59): 65% of Hacker News posts have negative sentiment, and they outperformOriginal post: https://news.ycombinator.com/item?id=46512881&utm_source=wondercraft_ai(07:21): Opus 4.5 is not the normal AI agent experience that I have had thus farOriginal post: https://news.ycombinator.com/item?id=46515696&utm_source=wondercraft_ai(08:43): C Is Best (2025)Original post: https://news.ycombinator.com/item?id=46511470&utm_source=wondercraft_ai(10:05): Why is the Gmail app 700 MB?Original post: https://news.ycombinator.com/item?id=46514692&utm_source=wondercraft_ai(11:28): Stop Doom Scrolling, Start Doom Coding: Build via the terminal from your phoneOriginal post: https://news.ycombinator.com/item?id=46517458&utm_source=wondercraft_ai(12:50): Show HN: Prism.Tools – Free and privacy-focused developer utilitiesOriginal post: https://news.ycombinator.com/item?id=46511469&utm_source=wondercraft_aiThis is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai
The future of AI training is shaped by one constraint: keeping GPUs fed.In this episode, Lukas Biewald talks with CoreWeave SVP Corey Sanders about why general-purpose clouds start to break down under large-scale AI workloads.According to Corey, the industry is shifting toward a "Neo Cloud" model to handle the unique demands of modern models.They dive into the hardware and software stack required to maximize GPU utilization and achieve high goodput.Corey's conclusion is clear: AI demands specialization.Connect with us here:Corey Sanders: https://www.linkedin.com/in/corey-sanders-842b72/ CoreWeave: https://www.linkedin.com/company/coreweave/ Lukas Biewald: https://www.linkedin.com/in/lbiewald/ Weights & Biases: https://www.linkedin.com/company/wandb/(00:00) Trailer(00:57) Introduction(02:51) The Evolution of AI Workloads(06:22) CoreWeave's Technological Innovations(13:58) Customer Engagement and Future Prospects(28:49) Comparing Cloud Approaches(33:50) Balancing Executive Roles and Hands-On Projects(46:44) Product Development and Customer Feedback
It's been a travel-heavy hiatus—Mark's been living in Spain and Shashank's been bouncing across Asia (including a month in China)—but they're back to unpack a packed week of AI news. They start with the headline hardware story: the Groq (GROQ) deal/partnership dynamics and why ultra-fast inference is becoming the next battleground, plus how this could reshape access to cutting-edge serving across the ecosystem. From there, they pivot to NVIDIA's CES announcements and what “Vera Rubin” implies for data center upgrades, cost-per-token curves, and the messy real-world math of rolling hardware generations. Shashank then brings the future to life with on-the-ground stories from China: a Huawei “everything store” that feels like an Apple Store meets a luxury dealership, folding devices that look straight out of sci-fi, and a parade of robots—from coffee bots to delivery robots that can ride elevators and deliver to your hotel room. They also touch on companion-style consumer robots and why “cute” might be a serious product strategy. Finally, Mark announces the launch of Novacut, a long-form AI video editor built to turn hours of travel footage into a coherent vlog draft—plus export workflows for Premiere, DaVinci Resolve, and Final Cut. They close by talking about the 2026 shift from single model calls to “agentic” systems, including a fun (and slightly alarming) lesson from LLM outcome bias using poker hand reviews. Topics include: Groq inference, NVIDIA + CES, Vera Rubin GPUs, GPU depreciation math, China robotics, Huawei ecosystem, hotel delivery bots, companion robots, Novacut launch, Cursor vs agent workflows, and why agents still struggle with sparse feedback loops. Link mentioned: Novacut — https://novacut.ai
NVIDIA is rumored to be bringing back the GeForce RTX 3060 again due to skyrocketing RAM and GPU prices, with AI being blamed. Man, you really don't want to build a gaming PC in 2026...Watch this podcast episode on YouTube and all major podcast hosts including Spotify.CLOWNFISH TV is an independent, opinionated news and commentary podcast that covers Entertainment and Tech from a consumer's point of view. We talk about Gaming, Comics, Anime, TV, Movies, Animation and more. Hosted by Kneon and Geeky Sparkles.D/REZZED News covers Pixels, Pop Culture, and the Paranormal! We're an independent, opinionated entertainment news blog covering Video Games, Tech, Comics, Movies, Anime, High Strangeness, and more. As part of Clownfish TV, we strive to be balanced, based, and apolitical. Get more news, views and reviews on Clownfish TV News - https://more.clownfishtv.com/On YouTube - https://www.youtube.com/c/ClownfishTVOn Spotify - https://open.spotify.com/show/4Tu83D1NcCmh7K1zHIedvgOn Apple Podcasts - https://podcasts.apple.com/us/podcast/clownfish-tv-audio-edition/id1726838629
Happy New Year! NVIDIA just spent $20 billion to hollow out an AI company for its brains, while Meta and Google scramble to scoop up fresh talent before AI gets "too weird to manage." Who's winning, who's left behind, and what do these backroom deals mean for the future of artificial intelligence? Andrej Karpathy admits programmers cannot keep pace with AI advances Economic uncertainty in AI despite massive stock market influence Google, Anthropic, and Microsoft drive AI productization for business and consumers OpenAI, Claude, and Gemini battle for consumer AI dominance Journalism struggles to keep up with AI realities and misinformation tools Concerns mount over AI energy, water, and environmental impact narratives Meta buys Manus, expands AI agent ambitions with Llama model OpenAI posts high-stress "Head of Preparedness" job worth $555K+ Training breakthroughs: DeepSeek's mHC and comparisons to Action Park U.S. lawmakers push broad, controversial internet censorship bills Age verification and bans spark state laws, VPN workaround explosion U.S. drone ban labeled protectionist as industry faces tech shortages FCC security initiatives falter; Cyber Trust Mark program scrapped Waymo robotaxis stall in blackouts, raising AV urban planning issues School cellphone bans expose kids' struggle with analog clocks MetroCard era ends in NYC as tap-to-pay takes over subway access RAM, VRAM, and GPU prices soar as AI and gaming squeeze supply CES preview: Samsung QD-OLED TV, Sony AFEELA car, gadget show hype Remembering Stewart Cheifet and Computer Chronicles' legacy Host: Leo Laporte Guests: Dan Patterson and Joey de Villa Download or subscribe to This Week in Tech at https://twit.tv/shows/this-week-in-tech Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free audio and video feeds, a members-only Discord, and exclusive content. Join today: https://twit.tv/clubtwit Sponsors: zscaler.com/security canary.tools/twit - use code: TWIT monarch.com with code TWIT Melissa.com/twit redis.io
Gary Shapiro has spent decades at the center of the global consumer technology industry, leading the Consumer Technology Association (CTA) and building CES into one of the most important stages for innovation, policy, and deal-making on the planet. In this first episode of 2026, Gary joins Charlie, Rony, and Ted to preview CES, unpack the explosion of AI across every category, and deliver unusually blunt takes on tariffs, China, manufacturing, and U.S. innovation policy. He explains how CES has evolved from a TV-and-gadgets show into a global platform where boards meet, standards are set, and policymakers, chip designers, robotics firms, and health-tech startups all collide.In the News: Before Gary joins, the hosts break down Nvidia's $20 billion “not-a-deal” with Singapore's Groq, the stake in Intel, and what that combo might signal about the edge of the GPU bubble and the shift toward inference compute, x86, and U.S. industrial policy. They also dig into Netflix's acquisition of Ready Player Me and what it suggests about a Netflix metaverse and location-based entertainment strategy, plus Starlink's rapid growth and an onslaught of “AI everything” products ahead of CES.Gary walks through new features at this year's show: CES Foundry at the Fontainebleau for AI and quantum, expanded tracks on manufacturing, wearables, women's health, and accessibility, plus an AI-powered show app already fielding thousands of questions (top query: where to pick up badges). He also talks candidly about his biggest concern—that fragmented state-level AI regulation (1,200+ state bills in 2025) will crush startups while big players shrug—and why he believes federal standards via NIST are the only realistic path. The discussion ranges from AI-driven healthcare and precision agriculture to robotics, demographics, labor culture, global supply chains, and what CES might look like in 2056.5 Key Takeaways from Gary:AI is now the spine of CES. CES 2026 centers on AI as infrastructure: CES Foundry at the Fontainebleau for AI + quantum, AI training tracks for strategy, implementation, agentic AI, and AI-driven marketing, and an AI-powered app helping attendees navigate the show.Fragmented state AI laws are an existential risk for startups. Over 1,200 state AI bills in 2025—including proposals to criminalize agentic AI counseling—could create a compliance maze only large incumbents can survive, which is why Gary argues for federal standards via NIST.Wearables are becoming systems, not gadgets. Oura rings, wrist devices, body sensors, and subdermal glucose monitors are starting to be designed as interoperable families of devices, with partnerships emerging to combine data into unified health services.Robotics is breaking out of the industrial niche. CES will showcase the largest robotics presence yet, moving beyond factory arms and drones to humanoids, logistics, social companions, and applied AI systems across sectors.Tariffs, alliances, and AI will reshape manufacturing. Gary is skeptical of “Fortress USA” strategies that try to onshore everything, pointing instead to allied reshoring (Latin America, Europe, Japan, South Korea) and the long-term role of AI-powered robotics in changing labor economics and global supply chains.This episode is brought to you by Zappar, creators of Mattercraft—the leading visual development environment for building immersive 3D web experiences for mobile headsets and desktop. 
Mattercraft combines the power of a game engine with the flexibility of the web, and now features an AI assistant that helps you design, code, and debug in real time, right in your browser. Whether you're a developer, designer, or just getting started, start building smarter at mattercraft.io.See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
From undergraduate research seminars at Princeton to winning the Best Paper award at NeurIPS 2025, Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzcinski, and Benjamin Eysenbach defied conventional wisdom by scaling reinforcement learning networks to 1,000 layers deep—unlocking performance gains the RL community thought impossible. We caught up with the team live at NeurIPS to dig into the story behind RL1000: why deep networks have worked in language and vision but failed in RL for over a decade (spoiler: it's not just about depth, it's about the objective); how they discovered that self-supervised RL (learning representations of states, actions, and future states via contrastive learning) scales where value-based methods collapse; the critical architectural tricks that made it work (residual connections, layer normalization, and a shift from regression to classification); why scaling depth is more parameter-efficient than scaling width (linear vs. quadratic growth); how JAX and GPU-accelerated environments let them collect hundreds of millions of transitions in hours (the data abundance that unlocked scaling in the first place); the "critical depth" phenomenon where performance doesn't just improve—it multiplies once you cross 15M+ transitions and add the right architectural components; why this isn't just "make networks bigger" but a fundamental shift in RL objectives (their code doesn't have a line saying "maximize rewards"—it's pure self-supervised representation learning); how deep-teacher, shallow-student distillation could unlock deployment at scale (train frontier capabilities with 1,000 layers, distill down to efficient inference models); the robotics implications (goal-conditioned RL without human supervision or demonstrations, scaling architecture instead of scaling manual data collection); and their thesis that RL is finally ready to scale like language and vision—not by throwing compute at value functions, but by borrowing the self-supervised, representation-learning paradigms that made the rest of deep learning work.
We discuss:
- The self-supervised RL objective: instead of learning value functions (noisy, biased, spurious), they learn representations where states along the same trajectory are pushed together and states along different trajectories are pushed apart—turning RL into a classification problem (a loss sketch appears after the chapter list below)
- Why naive scaling failed: doubling depth degraded performance; doubling again with residual connections and layer norm suddenly skyrocketed performance in one environment—unlocking the "critical depth" phenomenon
- Scaling depth vs. width: depth grows parameters linearly while width grows quadratically, so depth is more parameter-efficient and sample-efficient for the same performance (back-of-envelope arithmetic below)
- The JAX + GPU-accelerated environments unlock: collecting thousands of trajectories in parallel meant data wasn't the bottleneck, and crossing 15M+ transitions was when deep networks really paid off (a vectorization sketch below)
- The blurring of RL and self-supervised learning: their code doesn't maximize rewards directly; it's an actor-critic, goal-conditioned RL algorithm, but the learning burden shifts to classification (cross-entropy loss, representation learning) instead of TD-error regression
- Why scaling batch size unlocks at depth: traditional RL doesn't benefit from larger batches because networks are too small to exploit the signal, but once you scale depth, batch size becomes another effective scaling dimension
— RL1000 Team (Princeton), 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities: https://openreview.net/forum?id=s0JVsx3bx1
Chapters: 00:00:00 Introduction: Best Paper Award and NeurIPS Poster Experience 00:01:11 Team Introductions and Princeton Research Origins 00:03:35 The Deep Learning Anomaly: Why RL Stayed Shallow 00:04:35 Self-Supervised RL: A Different Approach to Scaling 00:05:13 The Breakthrough Moment: Residual Connections and Critical Depth 00:07:15 Architectural Choices: Borrowing from ResNets and Avoiding Vanishing Gradients 00:07:50 Clarifying the Paper: Not Just Big Networks, But Different Objectives 00:08:46 Blurring the Lines: RL Meets Self-Supervised Learning 00:09:44 From TD Errors to Classification: Why This Objective Scales 00:11:06 Architecture Details: Building on Braw and SymbaFowl 00:12:05 Robotics Applications: Goal-Conditioned RL Without Human Supervision 00:13:15 Efficiency Trade-offs: Depth vs Width and Parameter Scaling 00:15:48 JAX and GPU-Accelerated Environments: The Data Infrastructure 00:18:05 World Models and Next State Classification 00:22:37 Unlocking Batch Size Scaling Through Network Capacity 00:24:10 Compute Requirements: State-of-the-Art on a Single GPU 00:21:02 Future Directions: Distillation, VLMs, and Hierarchical Planning 00:27:15 Closing Thoughts: Challenging Conventional Wisdom in RL Scaling
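To make the classification framing concrete, here is a minimal, hedged sketch of an InfoNCE-style contrastive loss of the kind described in the episode: same-trajectory (state-action, future-state) pairs are positives, everything else in the batch acts as a negative. The function names, shapes, and encoders here are illustrative assumptions, not code from the paper.

```python
import jax
import jax.numpy as jnp

def contrastive_rl_loss(sa_repr, goal_repr):
    """InfoNCE-style contrastive loss (illustrative sketch).

    sa_repr:   [B, d] — row i encodes (state_i, action_i)
    goal_repr: [B, d] — row i encodes a future state drawn from the
               same trajectory as row i (in the paper's setting these
               would come from very deep residual MLPs with layer norm).
    """
    # Pairwise similarities: entry (i, j) scores (state_i, action_i)
    # against future state j; the diagonal holds the positives.
    logits = sa_repr @ goal_repr.T                    # [B, B]
    log_probs = jax.nn.log_softmax(logits, axis=-1)   # classify over B
    # Cross-entropy with label i for row i: pull same-trajectory pairs
    # together, push pairs from different trajectories apart.
    return -jnp.mean(jnp.diagonal(log_probs))

# Toy usage with random arrays standing in for encoder outputs.
key1, key2 = jax.random.split(jax.random.PRNGKey(0))
sa = jax.random.normal(key1, (256, 64))
goals = jax.random.normal(key2, (256, 64))
loss = contrastive_rl_loss(sa, goals)
```

This framing also explains the batch-size observation above: a batch of B trajectories yields B positives and B(B-1) negative comparisons, so larger batches sharpen the classification signal—but only once the network has enough capacity to exploit it.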
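The linear-vs-quadratic claim is just parameter counting for a plain MLP; a hedged back-of-envelope version (real residual blocks add normalization and projection parameters, which don't change the asymptotics):

```latex
% An MLP with D hidden layers of width W has roughly D weight matrices
% of size W x W (ignoring input/output layers and biases):
\mathrm{params}(D, W) \approx D \cdot W^{2}
% Doubling depth doubles the count; doubling width quadruples it:
\mathrm{params}(2D, W) \approx 2\,\mathrm{params}(D, W), \qquad
\mathrm{params}(D, 2W) \approx 4\,\mathrm{params}(D, W)
```

Hence, for a fixed parameter budget, going deeper buys more layers of computation than going wider—the parameter-efficiency argument made in the episode.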
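And the data-abundance point rests on vectorizing the simulator itself. A minimal sketch of the pattern, assuming a hypothetical JAX-native environment whose step function is pure (the toy dynamics below are a stand-in, not any real benchmark):

```python
import jax
import jax.numpy as jnp

def env_step(state, action):
    """Toy pure-function dynamics standing in for a GPU-native simulator."""
    next_state = state + 0.01 * action
    reached_goal = jnp.linalg.norm(next_state) < 0.05
    return next_state, reached_goal

# vmap turns one environment into thousands running in lockstep on a
# single GPU; jit compiles the batched step into a single XLA program.
batched_step = jax.jit(jax.vmap(env_step))

key_s, key_a = jax.random.split(jax.random.PRNGKey(0))
n_envs = 4096
states = jax.random.normal(key_s, (n_envs, 8))
actions = jax.random.normal(key_a, (n_envs, 8))
next_states, done = batched_step(states, actions)  # 4096 transitions per call
```

At thousands of parallel environments and thousands of steps per second, "hundreds of millions of transitions in hours" becomes plausible arithmetic—which is what makes the 15M+ transition "critical depth" threshold cheap to cross.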
Jensen Huang is the founder, president, and CEO of NVIDIA, the company whose 1999 invention of the GPU helped transform gaming, computer graphics, and accelerated computing. Under his leadership, NVIDIA has grown into a full-stack computing infrastructure company reshaping AI and data-center technology across industries.www.nvidia.com www.youtube.com/nvidia Perplexity: Download the app or ask Perplexity anything at https://pplx.ai/rogan. Visible. Live in the know. Join today at https://www.visible.com/rogan Don't miss out on all the action - Download the DraftKings app today! Sign-up at https://dkng.co/rogan or with my promo code ROGAN GAMBLING PROBLEM? CALL 1-800-GAMBLER, (800) 327-5050 or visit gamblinghelplinema.org (MA). Call 877-8-HOPENY/text HOPENY (467369) (NY). Please Gamble Responsibly. 888-789-7777/visit ccpg.org (CT), or visit www.mdgamblinghelp.org (MD). 21+ and present in most states. (18+ DC/KY/NH/WY). Void in ONT/OR/NH. Eligibility restrictions apply. On behalf of Boot Hill Casino & Resort (KS). Pass-thru of per wager tax may apply in IL. 1 per new customer. Must register new account to receive reward Token. Must select Token BEFORE placing min. $5 bet to receive $200 in Bonus Bets if your bet wins. Min. -500 odds req. Token and Bonus Bets are single-use and non-withdrawable. Token expires 1/11/26. Bonus Bets expire in 7 days (168 hours). Stake removed from payout. Terms: sportsbook.draftkings.com/promos. Ends 1/4/26 at 11:59 PM ET. Sponsored by DK. Learn more about your ad choices. Visit podcastchoices.com/adchoices